OpenAI And Google Under Fire For Training AI On Millions Of Hours Of YouTube Videos

A scathing new report suggests that Google and other AI companies have been training AI models on YouTube videos, raising the question of whether those companies have violated content creators' copyrights. The report details the lengths to which companies like OpenAI and Meta have gone to scrape YouTube creators' content in order to train AI models.

This latest revelation comes on the heels of YouTube Chief Executive Officer Neal Mohan accusing OpenAI of using YouTube videos to train OpenAI’s text-to-video generator, Sora. All of this adds fuel to the fire for those who have been saying that AI companies, such as OpenAI, have been breaking copyright laws in their overreaching efforts to train their AI models. The latest accusations include OpenAI developing its Whisper audio transcription model to transcribe over 1 million hours of YouTube videos in order to train its GPT-4 AI model.

OpenAI spokesperson Lindsay Held remarked in a recent interview that the AI company collects “unique” datasets for its models to “help their understanding of the world” and to maintain its competitive edge around the world. Held added that OpenAI also utilizes “numerous sources including publicly available data and partnerships for non-public data.”

Sources have indicated that Google was aware of OpenAI and others using YouTube videos to train their AI models, but turned a blind eye to the situation because Google was doing the same thing. Google told The New York Times that it only does so with videos from creators who agree to having it done. It is not clear, however, whether other AI companies adhere to the same standard of conduct. Google’s rules prohibit “unauthorized scraping or downloading of YouTube content,” and Google said it was not “aware” of OpenAI breaking this golden rule.

As the battle for AI dominance continues to heat up, there is little doubt that AI companies will seek any means of gaining an advantage, or at least of staying competitive. The report also claimed that Google had a team tweak its privacy policy in June 2023 to broaden its coverage of the use of publicly available content, such as content in Google Docs and Google Sheets.

The new accusations only raise more questions about how far companies can go in order to train AI models. How far will the government allow companies like OpenAI, Meta, and Google to go before it finally sets clear boundaries that protect the public’s right to privacy and the copyrights to their creative works?