This February, OpenAI previewed Sora, a new AI model that generates videos from text prompts in practically any style imaginable. The artificial intelligence research organization released a series of videos created from written prompts, and the results are impressive. Although several other text-to-video models exist or are in development, industry experts have highlighted the quality of Sora’s videos, saying that its introduction could represent a big leap in AI text-to-video generation. Here’s a breakdown of the system:
What is Sora?
Sora is a large-scale video generation model trained on several types of data, including videos and images of different durations, resolutions, and aspect ratios. It uses generative artificial intelligence to create clips based on written prompts, but it could expand beyond that. According to the developers, it is named after the Japanese word for sky, a reference to its “limitless creative potential”.
The system has been called a “text-to-video generator,” but according to OpenAI, it’s much more than that. Besides generating videos from text prompts, it can also be prompted with other types of input, such as pre-existing images or videos, which can be used to create looping videos, animate static images and extend videos forwards or backwards in time. Moreover, capabilities like 3D consistency, long-range coherence, object permanence and interaction with the environment suggest that the system has the potential to simulate aspects of the physical and digital world.
Sora uses a “transformer architecture” that operates on “spacetime patches” of video and image latent codes, which enables the model to generate high-fidelity videos. The patches act as transformer tokens, which allows Sora to train on videos and images regardless of their format. It also uses a video compression network to reduce the dimensionality of visual data, so that the model is trained on, and generates videos in, a compressed latent space.
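To make the idea of “spacetime patches” more concrete, here is a minimal sketch in Python with NumPy. It is not OpenAI’s code, which has not been published: the function name `to_spacetime_patches`, the patch sizes and the random arrays standing in for compressed latents are all assumptions chosen for illustration. It only shows how two clips with different durations and resolutions can be cut into tokens of the same width, so that a single transformer could in principle consume both.

```python
import numpy as np


def to_spacetime_patches(latent, pt, ph, pw):
    """Split a latent clip of shape (T, H, W, C) into flattened patches.

    Hypothetical helper for illustration only. Each patch spans pt frames,
    ph rows and pw columns; the result has one row per patch.
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Split every axis into (number of patches, extent of one patch).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three patch-index axes to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: one row ("token") per spacetime patch.
    return x.reshape(-1, pt * ph * pw * C)


# Random stand-ins for compressed latents (assumed shapes, not real data).
rng = np.random.default_rng(0)
clip_a = rng.standard_normal((8, 16, 16, 4))  # longer, square clip
clip_b = rng.standard_normal((4, 8, 24, 4))   # shorter, wide clip

tokens_a = to_spacetime_patches(clip_a, pt=2, ph=4, pw=4)
tokens_b = to_spacetime_patches(clip_b, pt=2, ph=4, pw=4)

# Same token width, different sequence lengths.
print(tokens_a.shape, tokens_b.shape)
```

In this sketch the two clips yield sequences of different lengths but identical token width, which is the property that lets a transformer handle mixed formats. A still image could be treated the same way as a one-frame clip by setting `pt=1`.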
However, the system is not perfect. The developers have highlighted current limitations, such as inaccuracies in modeling physics and object interactions. With further research, these limitations could be addressed, improving the model’s capabilities.
Users and critics have highlighted the possible dangers of Sora, especially considering the risks AI already poses, such as deepfakes (AI-edited videos of real people). Some have raised other concerns, such as how the tool could take jobs away from video creators, animators, editors and special-effects specialists. In addition, as AI technologies face regulation in the U.S. and elsewhere, there are questions about how Sora will operate in the future.
When will Sora be released?
OpenAI has not announced a release date for Sora. The company has said that it plans to release the model, but not in the near future.
Will I have to pay for Sora?
There have been no announcements on how Sora will be released or whether it will be a paid service. However, it is likely that OpenAI will charge users for Sora, as it does for its GPT-4 and DALL-E systems.
Are there any other systems like Sora?
Currently, Meta and Google are working on text-to-video models. Google’s Lumiere was presented in January 2024 but is still in development, while Meta’s Make-A-Video is also still in the works. There’s also Runway’s Gen-2. However, none of these systems has matched the quality of Sora’s introductory videos.