Tech Giants Accused of Ethically Dubious Data Harvesting for AI Training

OpenAI, Google, and Meta reportedly scraped vast amounts of public and copyrighted content, including YouTube videos, to train their AI models amid a data shortage.

Overview

OpenAI reportedly transcribed over a million hours of YouTube videos using its Whisper model to train GPT-4, raising legal and ethical concerns.
Google and OpenAI have been accused of violating copyright law by training their AI models on transcribed YouTube content, despite prohibitions on unauthorized scraping.
Meta also engaged in questionable practices by harvesting copyrighted materials for AI training without compensation, according to reports.
These practices have sparked privacy and copyright concerns as the companies try to overcome a dwindling supply of high-quality training data.
Legal and technical measures are being considered or implemented by companies like Google to prevent unauthorized use of their content for AI training.

Particle.news
