OpenAI may have used videos from YouTube and Instagram to train Sora, its AI video generator. OpenAI may be looking at a massive lawsuit if someone is able to prove it used social media posts without the users’ or platforms’ consent.
Although there isn’t much clear proof of this, several AI experts, including some people from Alphabet, believe that Sora, OpenAI’s AI video generator, was trained on data scraped from video and social media platforms like YouTube and Instagram.
If OpenAI did indeed use this data to train Sora, it did not get any clearances or licenses from the platforms. And this may cause trouble for Sam Altman’s AI venture.
According to YouTube Chief Executive Officer Neal Mohan, using YouTube videos to train OpenAI’s text-to-video generator would be a major violation of the platform’s terms of service.
Mohan stated that creators expect their content to be protected when they upload it to YouTube. One such expectation is that people on the platform will adhere to its rules. They also expect that their content will not be reused by anyone for any commercial purpose without their explicit approval.
In his first public remarks on the matter, Mohan clarified that he lacked direct knowledge of whether OpenAI had used YouTube videos to refine Sora. However, he emphasized that such actions would clearly breach YouTube’s terms of service.
The materials used to train AI models, including those behind popular content creation tools like ChatGPT and DALL-E, have become the subject of widespread debate. Generative AI tools like Sora rely on scraping the internet to improve their capabilities, be it for generating videos, photos, or even text.
While companies such as OpenAI and Google go toe-to-toe in a bid to develop more advanced AI models and be the first to reach AGI, they need extensive pools of content or data to train their models effectively.
In a recent interview, OpenAI’s Chief Technology Officer, Mira Murati, was put on the spot when she was asked if OpenAI scraped social media platforms and platforms like YouTube for video data to train Sora. Murati was unable to answer the question and said she was not sure.
Reports indicate that OpenAI has explored training its upcoming large language model, GPT-5, using transcriptions of public YouTube videos. Mohan noted that Google, YouTube’s parent company, respects individual contracts with creators before considering the use of YouTube videos for training its own AI model, Gemini.
Mohan also highlighted the importance of ensuring that any use of YouTube videos aligns with creators’ licensing agreements and terms of service. Although a portion of YouTube’s content may be utilized for training AI models like Gemini, Google and YouTube prioritize compliance with creators’ contracts and terms of service.