Sora, OpenAI’s recent AI video generator, seems to leapfrog every video generator that came before it. The clips look scarily realistic, suffer from an almost negligible number of visual bugs and inconsistencies, and seem to deeply “get” the gist of every prompt fed into the machine. While some telltale signs still give away the AI-generated origins of most clips, this is a new phase – not only regarding AI capabilities or the state of the video medium, but a new phase in the very fabric of our society.
Vision is by far the most prominent of our senses. An estimated 80% of our total perceived information comes from our visual receptors, aka our eyes. Vision-based perception has served humankind well for eons. Our excellent color vision helped us find ripe fruit, track prey, and avoid camouflaged predators. It helped us marvel at the gentle rays of sunrise and the delicate colors of butterflies. The significance of vision in our perception is undeniable. This system affects our every move and thus holds immense power. Just as a colorful flower or a poisonous frog evolved to capitalize on our vision-based decision-making, so did mass media. Sora, LTX Studio, Midjourney, DALL-E, and others are just another step in this direction. Or are they?
Is there a fundamental difference?
As with all interesting questions, there is no definitive answer. The world of filmmaking is packed with powerful visual manipulation (and don’t get me started on sound). I think the most important difference is the accessibility of these powerful tools. Visual communication walks two parallel paths: one toward technology-based democratization, the other toward growing control over publishing. The threshold for creating trustworthy-looking footage is ever-declining, but the more content is created, the more power shifts toward publishers and platforms. The Kodak Brownie gifted us countless more storytellers; it also gave Kodak immense power, both financial and cultural. As YouTube, Facebook, and others “democratized” broadcast and content publishing, they also shifted unprecedented power into the hands of a few new moguls. In that regard, OpenAI’s Sora is just another step along these long paths.
Sora makes trustworthy-looking visual creation potentially easy and accessible. One may argue that, unlike traditional filmmaking methods, this method is different because of its detachment from reality as source material. This is only partially true, since AI systems are all trained on found footage. Furthermore, no one will claim that the sandworms of Arrakis are filmed reality. So what is it about generative AI?
Magnitude is fundamental
Accessibility. As we’ve seen with every evolution and revolution in our field, accessibility is extremely influential. As technological barriers are breached, the critical mass of newcomers creates a tectonic shift. With the likes of Sora, LTX Studio, and other generative AI applications, this earthquake is starting.
How can we detect Sora-made videos?
Generative videos have seen significant improvement, but we can still spot extra legs, little red pandas popping out of nowhere, and wolf cubs vanishing into thin air (or thin fur). For now, we can fight the effects of this mischievous tech with our sharp visual instincts and critical thinking!
But it matters not. As Mark Twain once said:
A lie can travel halfway around the world before the truth can get its boots on.
Mark Twain (?)
Or maybe it was Benjamin Franklin? Thomas Jefferson? Winston Churchill? Terry Pratchett? The fact that we can’t be sure who actually said it may be the best demonstration of the point. The larger the content-producing populace, the more content is published, and the harder it becomes to sort truth from fiction, disinformation, manipulation, conspiracy theories, and outright lies. Even if AI video generators stop improving from this point forward, and even if we, the visually trained community, can continue to spot them, it matters not. Just think of the last time you saw a badly photoshopped image go viral. Photoshop has been around for over three decades, and still people fall for the most obvious edits. By the time we, or those trying to regulate this field, strap our boots on, it will be far too late.
How may generative AI affect the motion picture medium?
The effect generative AI has on the market has been thoroughly discussed, both here at CineD and all across the web. Perhaps the most obvious victim is the stock footage business. Tools like Google Lumiere and OpenAI’s Sora will fundamentally disrupt this field’s business model. Lightricks’ LTX Studio may do the same to more complex projects because of its controllable, selectable nature. Various other tools will affect various other fields. While none will directly replace the director, DOP, or editor, most if not all will minimize the need for less experienced staff, and in some cases will render hiring a professional obsolete. As the recent Willy Wonka experience fiasco showed, anyone can now produce relatively high-end visuals with little to no regard for reality.
Alas, generative AI such as Sora and LTX Studio will restructure much more than the market. It will disrupt the way visual information indicates truth and shapes our perception of the world. It will alter the medium itself.
Generative AI will restructure much more
Sora, like other generative AI engines, creates realistic-looking visuals with little to no direct connection to reality. While the same can be said of CGI, it is far less accessible. This mix of accessibility and endless visual possibilities is bound to send ripples throughout the medium. Given that most of our perception originates from visual information, I believe we will experience a fundamental restructuring of the way we experience the world.
Sora and the fabric of society
At this point, Sora is limited to silent one-minute clips. LTX Studio can generate audio and offers much better control and selection abilities, but lacks the visual finesse Sora provides. A plethora of other tools and applications can support other facets of cinematic creation, but at this point they are still limited, both in terms of access (not publicly available) and in terms of subject matter (certain content is banned).
It all seems to be just a matter of time. Technological limitations tend to disappear over time, and AI timelines are rather short in this regard. While certain fields of visual creation will soon suffer and may even disappear, this is the least of our concerns. The deconstruction of the visual medium as an indicator of truth and facts will confront us as creators, and society as a whole, with a tectonic shift in the role of visual information. This may lead to utopian or dystopian futures, or anything in between.
Do you fear such a future or welcome this change? Are generative AI tools sitting comfortably at your disposal or do you still prefer good old authentic filmmaking? Let us know in the comments.