Sora, Veo 2, Pika, Ray2

Video generation models and software advanced significantly in 2024, a year capped by impressive model updates to Google DeepMind’s Veo 2, Pika 2.0 and Luma AI’s Ray2, as well as OpenAI’s Sora, which made its long-awaited public release.

AI companies are still actively pitching Hollywood studios and demoing their respective generative video tech. Sources who spoke with VIP+ agreed that video generation was getting very good but remained unclear on exactly how, and even whether, these models can yet be used effectively in film and TV content production.

For more perspective on how generative video will continue to advance and how the entertainment industry is evaluating these models in the current moment, Variety Intelligence Platform spoke with 11 authorities: leaders at AI companies building generative video models and software, executives at VFX studios, and filmmakers familiar with generative AI who have consulted directly with AI companies on their video models and are steering the development of AI production workflows at AI studios.

Filmmaker sources who were among the limited pool given beta access to Veo 2 all agreed that DeepMind had achieved a meaningful leap forward relative to other video generation models, superior even to OpenAI’s Sora, which has been generally regarded as the most impressive U.S. video model.

By contrast, sources reported being underwhelmed by the public version of Sora, which one described as “neutered” and “aesthetically limited” versus the version they had tested last year.

“We’re still in the early days of understanding what a good model is. It’s quite possible that when we made it ready for scale, it did lose some quality in certain areas. But it’s not scientific right now in terms of how we really measure that. We can [only evaluate] user preferences,” OpenAI’s Sora product lead Rohan Sahai told VIP+.

But in the bigger picture, sources argued for observing the trajectory rather than lamenting the limitations of the moment, given how much and how quickly video models have already improved.

“What’s possible today with these models was unthinkable in the wildest dreams of anyone even a year or two ago,” said Cristóbal Valenzuela, cofounder and CEO at Runway. “We’re not even close to the final stage or saturation point.”

Looking ahead to 2025, AI companies tended to agree their development focus for video generation would be further improving image quality, building more types of control and enabling faster generation times. AI developers are building capabilities into their models and honing software interfaces and features in direct response to creative industry needs and feedback.

Filmmaker sources remarked that AI companies, particularly Runway and Google DeepMind, have sincerely listened to their feedback, and said they were seeing it borne out in models and features designed to serve needs they had specified.

For professional creative use, video generation will be evaluated and therefore compete on several aspects, as described by filmmakers, VFX and AI developer sources:

Image Quality & Realism
The photo and physics realism of AI video outputs has improved with each model update. Sources said Google DeepMind’s Veo 2 and Chinese models Kling and Hailuo now deliver the most realistic outputs.

“Veo 2 is a whole other class. I’ve never seen video that looks that realistic come out of AI,” said Jason Zada, founder and filmmaker at AI studio Secret Level.

Video from Veo 2 is starting to be able to fool the eye, with imperfections becoming less easily perceptible, particularly to the untrained eye. “Many shots pass the ‘visual Turing test,’ in which most people would not be able to distinguish that it’s completely synthetic,” said filmmaker Paul Trillo, strategic partner at AI studio Asteria.

All video models still hallucinate. Motion in generative AI video often looks unrealistic and feels constrained or uncanny. Even the most advanced models fail to accurately replicate the physical dynamics of bodies in complex motion, such as walk or run cycles, martial arts, gymnastics or basketball, and misjudge physical weight (i.e., the visual changes that would result from an object being heavier or lighter).

Finally, human figures in AI video still remain in the uncanny valley, but no one VIP+ spoke with seemed to want AI-generated actors. All sources expected that human actor performances would continue to be both creatively desired and necessary for live action, though they didn’t rule out the possibility of compositing performers into generated backgrounds.

Controllability
Despite impressive advancement in image quality, many AI video tools still lack the controllability needed for high-end production use. Professional filmmakers and VFX artists need maximum control, though this hasn’t been easily or consistently achievable with video models.

But across the board, sources were confident that a breadth of sophisticated features would emerge and improve to offer more precise video creation and editing. More advanced, alternative AI production workflows, such as running open-source models like Stable Diffusion in ComfyUI, already provide levels of control and flexibility not natively available in video generation software.
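
As an illustration of the kind of control such workflows unlock, here is a minimal sketch, assuming the Hugging Face diffusers library and public ControlNet weights (the model IDs and input file name are illustrative; ComfyUI wires up the same pieces as a node graph rather than a script). Conditioning Stable Diffusion on an edge map extracted from a reference frame pins down composition in a way a text prompt alone cannot:

```python
# A sketch, not a production workflow: conditioning Stable Diffusion on a
# Canny edge map with ControlNet via Hugging Face's diffusers library.
# Model IDs are public checkpoints; "reference_frame.png" is hypothetical.
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Extract edges from a reference frame; they lock down composition,
# so the text prompt only has to describe look, not layout.
frame = np.array(load_image("reference_frame.png"))
edges = cv2.Canny(frame, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # single channel -> RGB
control_image = Image.fromarray(edges)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Structure comes from the edge map; the prompt supplies style and lighting.
result = pipe(
    "a rain-soaked city street at night, cinematic lighting",
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("styled_frame.png")
```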

“We will always [aim to] have the best model possible, but you also need better tooling on top,” said Runway’s Valenzuela. For example, the company’s Camera Control feature allows users to direct camera movement in a video.

Sources repeatedly referenced the challenge of prompt adherence, or the ability of a model to output video that accurately follows specific instructions contained in text prompts. Sources described a constant struggle with most video models misinterpreting their requests, requiring multiple rounds of trial and error — an extremely wasteful process, not just of time but of the massive energy and water resources needed to run generative AI models.

“We generate 10, 20, 30, 40 different videos just to get one that moves correctly and doesn’t have any weirdness,” said Secret Level’s Zada. “The magic would be I only have to ask once or twice, not 50 times, for the right thing.”

“It’s hard to communicate with the model to really get what you want. You ask for something and you’ll get something totally different,” said OpenAI’s Sahai. “It’s actually an iteration process and takes some work.”

Filmmaker sources to VIP+ each said that Veo 2 excels at adhering to even very complicated text prompts in ways other video models simply don’t. “It’s the first time I’ve really felt like an AI image or video tool is actually creating what’s in my head — and sometimes better,” said Trillo. “I’ve been giving it incredibly specific prompts with multiple characters, and it’s been able to keep it all coherent.”

While text-to-video has its place and prompt adherence could improve, several sources held the larger view that text-prompting a model was akin to a slot machine and wouldn’t ultimately be the best way to maximize a video model in a professional context because it lacks controllability.

“The text prompt is a very crude control mechanism for generating complex videos with motion and performance,” said Theodore Jones, VFX supervisor at Framestore.

“[Text] prompts for us represent only a fraction of how actual users are using Runway,” said Valenzuela.

“We don’t think the future of generative video is text-to-video. Text-to-video will play a part, but it’s not going to be the way that filmmakers will want to use this technology — just writing a text prompt and hoping you get the result you want. Artists will agree that you cannot describe everything in words you want to see onscreen,” said Hanno Basse, CTO at Stability AI (formerly CTO at Digital Domain).

Instead, he said, Stability had identified a list of “utilitarian” problems it was focused on solving, including with new products the company would release itself or with vendor partners — restoration via upscaling, relighting and inpainting (e.g., to remove or replace an object or change the background in a scene) among them. “There are a lot of waypoints where we can automate specific tasks or workloads in the content creation pipeline, one piece at a time. We’re trying to support this really high level of detailed control over the content creation process.” 
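
Inpainting of the kind Basse describes is already possible with Stability’s open image models. As a minimal sketch, assuming the diffusers library and Stability’s public inpainting weights (file names here are illustrative), a white mask marks the region to remove or replace while the rest of the frame is preserved:

```python
# A sketch assuming diffusers and Stability's open inpainting weights;
# file names are illustrative. White pixels in the mask mark the region
# to replace; the rest of the frame passes through untouched.
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

frame = load_image("plate_frame.png")   # the original shot
mask = load_image("object_mask.png")    # white where the unwanted object sits

# The prompt describes what should fill the masked region.
result = pipe(
    prompt="empty cobblestone street, overcast daylight",
    image=frame,
    mask_image=mask,
).images[0]
result.save("plate_frame_clean.png")
```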

Beyond text-to-video, filmmaker sources referenced image-to-video, video-to-video and fine-tuning as valuable capabilities, allowing them to derive new video based on a preferred image, video or curated set of imagery to more accurately match a specific character, style, background or even entire scene.
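
As a minimal image-to-video sketch, assuming the diffusers library and Stability’s open Stable Video Diffusion weights (the file names and parameter values are illustrative), a single approved still can seed a short clip that inherits its composition, character and lighting:

```python
# A sketch assuming diffusers and Stability's open Stable Video Diffusion
# weights; "approved_still.png" and the parameter values are illustrative.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Seed the clip with an approved still so the output inherits its
# composition, character and lighting rather than a text description.
image = load_image("approved_still.png").resize((1024, 576))

# motion_bucket_id trades off how much the scene is allowed to move;
# lower values keep the shot closer to the source frame.
frames = pipe(image, decode_chunk_size=8, motion_bucket_id=90).frames[0]
export_to_video(frames, "shot_v001.mp4", fps=7)
```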

Data Security & Legal Compliance
Enterprise users, particularly studios, need any data they create or upload to video generation software to be completely siloed and secure, and so they wouldn’t be comfortable using cloud-based services or having user data fed back into the model for training purposes. Multiple sources agreed major studios or any U.S. enterprise user would completely avoid Chinese video models such as Kling, Hailuo and HunyuanVideo due to their unknown data security risks.

Video generation software needs to be evaluated not only for its practical usefulness (how well it performs a task to achieve a desired creative result) but its legal compliance, where there’s still enormous uncertainty tied to model training data and copyrightability of AI outputs.

But even if these tools were entirely free of legal risk, filmmaker and VFX sources still expressed doubt about video generation in its current state for the highest-production-value film and TV projects, particularly anything intended for theatrical distribution.

“I’m finding it hard to see the use case for final pixel production because they’re very compressed images,” said Trillo. “I’m not sure if developers understand the hurdles it needs to clear to actually be projected in 8K resolution on a large screen. To make full films with this stuff, it has a ways to go, and we still have to ask ourselves why.”

More on video generation from VIP+ …

Jan. 22: A look at Veo 2’s leap forward, according to early testers and DeepMind developers
Jan. 27: Why film, TV and VFX studios are in a state of limbo toward using video generation models
