OpenAI Might’ve Breached YouTube’s Terms to Train Sora

OpenAI might’ve breached YouTube’s terms and conditions to train its text-to-video model Sora, says Google CEO Sundar Pichai.

“So you felt like they had broken your terms and conditions, or potentially, or if they had, that wouldn’t have been appropriate?” Nilay Patel, the editor-in-chief of The Verge, asked Pichai in an interview published Monday.

“That’s right. Yes, that’s right,” Pichai replied.

Sundar Pichai says he believes OpenAI’s Sora breached YouTube’s terms and conditions and he is sympathetic to creators whose content is being used to train AI models

— Tsarathustra (@tsarnick) May 20, 2024

Earlier in the interview, Pichai revealed that YouTube was still “following up and trying to understand” how OpenAI had trained Sora.

“Look we don’t know the details,” Pichai said. “We have terms and conditions, and we would expect people to abide by those terms and conditions when you build a product, so that’s how I felt about it.”

In February, the ChatGPT maker wowed the AI industry when it debuted Sora. The model, which takes its name from the Japanese word for “sky,” can generate high-quality videos from a simple text prompt.

But OpenAI has remained coy about the data it used to train Sora. The company’s CTO Mira Murati told The Wall Street Journal’s Joanna Stern in March that it “used publicly available data and licensed data.”

Murati, however, gave a far less definitive answer when Stern asked if OpenAI had taken data from platforms like YouTube and Instagram.

“I’m actually not sure about that,” Murati replied. “You know, if they were publicly available to use, there might be data. But I’m not sure. I’m not confident about it.”

Last month, YouTube CEO Neal Mohan told Bloomberg’s Emily Chang that while he didn’t know if OpenAI had trained Sora on YouTube videos, it would’ve been a “clear violation” of the platform’s terms of use if they did.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by,” Mohan said.

“It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service,” he continued. “Those are the rules of the road in terms of content on our platform.”

Representatives for Google and OpenAI didn’t immediately respond to requests for comment from BI sent outside regular business hours.

OpenAI’s YouTube troubles underscore the challenges faced by data-hungry AI companies trying to train their models. In October, Amazon-backed AI startup Anthropic said that it was using data that it generated itself to train its models.

And this wouldn’t be the only time OpenAI has courted controversy with how it works with content and creators.

On Monday, actress Scarlett Johansson said she was “shocked” and “angered” that the voice of OpenAI’s brand new virtual assistant sounded “eerily similar” to hers.

Johansson said in a statement that she had turned down OpenAI CEO Sam Altman’s offer to voice its latest GPT-4o model.

The model, which was released last week, included several voice options. Many social media users felt that one of the voices, named “Sky,” sounded like the AI chatbot that Johansson voiced in Spike Jonze’s “Her.” OpenAI said on Sunday that it was pausing Sky’s release.

We’ve heard questions about how we chose the voices in ChatGPT, especially Sky. We are working to pause the use of Sky while we address them.

Read more about how we chose these voices:

— OpenAI (@OpenAI) May 20, 2024

“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice — Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” OpenAI wrote in a blog post on the same day.

About the Author:

Early Bird