The AI program Sora generated a video featuring an artificial woman based on a text prompt
Sora/OpenAI
OpenAI has unveiled its latest artificial intelligence system, a program called Sora that can transform text descriptions into photorealistic videos. The video-generation model is spurring excitement about advancing AI technology, along with growing concerns over how deepfake videos could worsen misinformation and disinformation during a pivotal election year worldwide.
The Sora AI model can currently create videos up to 60 seconds long using either text instructions alone or text combined with an image. One demonstration video starts with a text prompt that describes how “a stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage”. Other examples include a dog frolicking in the snow, vehicles driving along roads and more fantastical scenarios such as sharks swimming in midair between city skyscrapers.
“As with other techniques in generative AI, there is no reason to believe that text-to-video will not continue to rapidly improve – moving us closer and closer to a time when it will be difficult to distinguish the fake from the real,” says Hany Farid at the University of California, Berkeley. “This technology, if combined with AI-powered voice cloning, could open up an entirely new front when it comes to creating deepfakes of people saying and doing things they never did.”
Sora is based in part on OpenAI’s preexisting technologies, such as the image generator DALL-E and the GPT large language models. Text-to-video AI models have lagged somewhat behind those other technologies in terms of realism and accessibility, but the Sora demonstration is an “order of magnitude more believable and less cartoonish” than what has come before, says Rachel Tobac, co-founder of SocialProof Security, a white-hat hacking organisation focused on social engineering.
To achieve this higher level of realism, Sora combines two different AI approaches. The first is a diffusion model similar to those used in AI image generators such as DALL-E. These models learn to gradually convert randomised image pixels into a coherent image. The second AI technique is called “transformer architecture” and is used to contextualise and piece together sequential data. For example, large language models use transformer architecture to assemble words into generally comprehensible sentences. In this case, OpenAI broke down video clips into visual “spacetime patches” that Sora’s transformer architecture could process.
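To make that two-part design concrete, here is a minimal sketch, in Python with PyTorch, of a diffusion-style transformer that learns to denoise a sequence of spacetime patches. This is not OpenAI's implementation: the class, the shapes and the single fixed noise level are illustrative assumptions only.

```python
# Minimal illustrative sketch of a diffusion transformer over "spacetime patches".
# NOT OpenAI's Sora code: every name, shape and hyperparameter here is an
# assumption chosen for brevity.
import torch
import torch.nn as nn

class ToySpacetimeDiffusionTransformer(nn.Module):
    def __init__(self, patch_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=patch_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Diffusion models are typically trained to predict the noise that was
        # added to the input, so the output matches the patch dimension.
        self.noise_head = nn.Linear(patch_dim, patch_dim)

    def forward(self, noisy_patches):
        # noisy_patches: (batch, num_patches, patch_dim), where each patch
        # stands in for a small block of pixels spanning space AND time.
        return self.noise_head(self.encoder(noisy_patches))

# One illustrative denoising training step:
model = ToySpacetimeDiffusionTransformer()
clean = torch.randn(2, 64, 256)   # stand-in for a patchified video clip
noise = torch.randn_like(clean)
noisy = clean + noise             # a single fixed noise level, for brevity
loss = nn.functional.mse_loss(model(noisy), noise)
loss.backward()
```

A real text-to-video system would also condition the model on the prompt and on a noise-level timestep, and would denoise iteratively over many steps rather than in one pass; those pieces are omitted here to keep the sketch short.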
Sora’s videos still contain plenty of mistakes, such as a walking human’s left and right legs swapping places, a chair randomly floating in midair or a bitten cookie magically having no bite mark. Still, Jim Fan, a senior research scientist at NVIDIA, took to the social media platform X to praise Sora as a “data-driven physics engine” that can simulate worlds.
The fact that Sora’s videos still display some strange glitches when depicting complex scenes with lots of movement suggests that such deepfake videos will be detectable for now, says Arvind Narayanan at Princeton University. But he also cautions that in the long run “we will need to find other ways to adapt as a society”.
OpenAI has held off on making Sora publicly available while it performs “red team” exercises where experts try to break the AI model’s safeguards in order to assess its potential for misuse. The select group of people currently testing Sora are “domain experts in areas like misinformation, hateful content and bias”, says an OpenAI spokesperson.
This testing is vital because artificial videos could let bad actors generate false footage in order to, for instance, harass someone or sway a political election. Misinformation and disinformation fuelled by AI-generated deepfakes rank as major concerns for leaders in academia, business, government and other sectors, as well as for AI experts.
“Sora is absolutely capable of creating videos that could trick everyday folks,” says Tobac. “Video does not need to be perfect to be believable as many people still don’t realise that video can be manipulated as easily as pictures.”
AI companies will need to collaborate with social media networks and governments to handle the scale of misinformation and disinformation likely to occur once Sora becomes open to the public, says Tobac. Defences could include implementing unique identifiers, or “watermarks”, for AI-generated content.
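As a rough illustration of that idea, the sketch below stamps an “AI-generated” label into a PNG file’s metadata using Python’s Pillow library. The tag names are hypothetical, and plain metadata can be stripped easily; production provenance schemes such as C2PA instead attach cryptographically signed manifests.

```python
# Illustrative provenance "watermark": stamping an AI-generated label into
# PNG metadata with Pillow. The tag names are hypothetical, and this toy
# approach is NOT tamper-proof, since metadata is trivially removable.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def tag_as_ai_generated(in_path: str, out_path: str, generator: str) -> None:
    img = Image.open(in_path)
    meta = PngInfo()
    meta.add_text("ai_generated", "true")   # hypothetical tag name
    meta.add_text("generator", generator)
    img.save(out_path, pnginfo=meta)

def read_tags(path: str) -> dict:
    # PNG text chunks are exposed as a dict on the opened image
    return Image.open(path).text

# Example (paths are placeholders):
# tag_as_ai_generated("frame.png", "frame_tagged.png", "example-model")
# print(read_tags("frame_tagged.png"))
```

Robust watermarks for video would more likely be embedded in the pixels or latent representations themselves, so they survive re-encoding and cropping; the metadata approach above is only the simplest possible version of the idea.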
When asked if OpenAI has any plans to make Sora more widely available in 2024, the OpenAI spokesperson described the company as “taking several important safety steps ahead of making Sora available in OpenAI’s products”. For instance, the company already uses automated processes aimed at preventing its commercial AI models from generating depictions of extreme violence, sexual content, hateful imagery and real politicians or celebrities. With more people than ever before participating in elections this year, those safety steps will be crucial.