With AI improving at a rate that has surprised everyone, even experts in their own fields, how do we cope with the future? What steps can we take to ensure we’re not swept away by a tsunami of rapid change?
For AI experts and followers, February 15, 2024 – the day OpenAI revealed Sora – will be indelibly etched into memory. It was the day they realised that technology had become impossible to predict.
Sora turns a short text prompt into a cinematic-looking video, and it has achieved this in perhaps a tenth of the time we expected. Only what seemed like seconds before, we were recoiling at the grotesque results of the previous generation of models – the one that gave us Will Smith eating spaghetti.
Cinematic-looking text-to-video is one thing, but Sora means much, much more than that.
A Sora point
For a start, Sora gives us startlingly better realism than we have seen before in synthetically generated humans and animals. There are still issues – especially when it comes to limbs and digits – but there’s little doubt that that’s a temporary shortcoming. For decades, CGI practitioners have been trying to make realistic human models. It’s hard enough with still images; it’s harder still with moving ones. There’s a list as long as a real human arm of lighting and physical phenomena that have to be emulated just right to get anywhere near real-world realism. For realistic skin, you need a mechanism to emulate phenomena like sub-surface scattering (SSS), which describes how light penetrates the surface of a material, scatters through it, and exits at a different location, buffeted by blood vessels and other bodily paraphernalia.
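To give a sense of what conventional CGI has to model by hand, here is a minimal sketch of that idea in Python – a toy exponential diffusion profile, not anything a production renderer (or Sora) actually uses, and with per-channel falloff distances invented purely for illustration:

```python
import numpy as np

# Toy diffusion profile: light entering skin at one point re-emerges
# nearby, attenuated exponentially with distance travelled inside the
# material. Real renderers use measured profiles; these per-channel
# falloff lengths are illustrative only. Red travels furthest in skin,
# which is why ears and fingers glow red when backlit.
FALLOFF_MM = np.array([1.0, 0.5, 0.25])  # R, G, B scattering distances

def sss_contribution(entry_points, exit_point, incoming_light):
    """Average the light exiting at `exit_point` from nearby entry points.

    entry_points:   (N, 3) surface positions where light enters
    exit_point:     (3,)   surface position being shaded
    incoming_light: (N, 3) RGB irradiance at each entry point
    """
    dist = np.linalg.norm(entry_points - exit_point, axis=1, keepdims=True)
    weights = np.exp(-dist / FALLOFF_MM)  # (N, 3) per-channel falloff
    return (incoming_light * weights).sum(axis=0) / len(entry_points)

# Light striking 1 mm and 3 mm away still tints the shaded point,
# predominantly in red - the soft, translucent look of real skin.
entries = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
light = np.ones((2, 3))
print(sss_contribution(entries, np.zeros(3), light))
```

Every one of those constants and falloff curves has to be engineered, tuned, and validated in a traditional pipeline, and SSS is just one entry on that arm-length list.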
And yet, Sora somehow manages to do this without “understanding” the phenomenon. It just does it. That’s incredible.
It also seems to do physics. Objects bounce off each other as if Sora understands their physical make-up, light reflects convincingly, and waves lap on the shore like waves lapping on the shore do. Some people speculated that Sora must be hooked up to a game engine like Unity or Unreal, but as far as I know, it isn’t. Sora has made its own “world model”, limited though it may be, without ever being explicitly taught one. It seems to have inferred the nature of the physical world from its generalised training material, which consists of images, not instructions.
Incredible though Sora is right now, it will only get better. This is as primitive as it will ever be.
You could compare it with the first-ever powered flight at Kitty Hawk, North Carolina, in 1903. The epochal proof of concept was, by today’s standards, underwhelming, but no less significant for that. And look how far aviation came. Only sixty-six years later, the supersonic airliner Concorde took off from Toulouse, France. That’s an incredible rate of progress, and yet it is nothing compared to what we are seeing now with AI. We have just experienced the equivalent of Kitty Hawk to Concorde in perhaps five years – and in reality, Spaghetti Carnage to Sora in little over a year.
The rate of progress will only increase, which makes it impossible to predict anything more than a month into the future, if even that. (It sometimes feels as if I’ll miss something sensational if I don’t check my AI contacts on Twitter every morning and afternoon.)
All of this poses multiple questions for every supposed answer. Will something like Sora ever master body language and complex facial expressions? Will it ever fool us into thinking we’re watching real actors? If, as seems likely, we can create an entire movie from a written script, who—or what—owns the rights, and to which aspects of the “production”?
Future shock revisited
While we are asking these questions, we can expect even more developments. How do we cope with this?
I’d like to propose, modestly, that we refer to Shapton’s Law, which states that we are in the foothills of a technological singularity when even experts in their own field are surprised by the rate of technological progress.
I would hesitate to say that I am an expert in AI, but I know others who are and who, like me, are currently surprised at the rate of progress. Sora is like an explosion. The entire creative industry is talking about it. In a remarkably short period of time, something that I thought would take several years has burst on the scene with much better performance than I—or virtually anyone outside of OpenAI—would have predicted. It’s transformational. And it’s disruptive.
Understandably, many commentators have treated Sora as merely a software package, saying, “It’s a very good app, but it can’t do this, and it often gets that wrong”. But that’s missing the wood for the trees. This is not about an app. It’s about a capability. It is about the fact that this is possible now, in this world. AI is not merely an app. It will become part of our daily fabric. It will flow into our lives like water from a tap or electricity from a wall socket.
We need to widen our cone of expectation. What does that mean? We typically have a very narrow field of vision for future events. Until now, that’s been OK, because things have moved slowly enough that we knew almost exactly where on the technological horizon to look for them.
But now – today – we have almost no way to know where the next big advance will come from or what form it will take. The best we can do is at least be looking in the right direction so that the distant, dim sunrise happens in front of us, and not behind us, unnoticed.