
ZDNET’s key takeaways:
- World models could help advance AI research, robotics, entertainment, and education.
- Genie 3, Google DeepMind’s world model, debuted on Tuesday.
- Google DeepMind says Genie 3 has an “understanding” of the world.
Imagine exploring a virtual environment without boundaries, where everything you see looks and behaves just as it would in reality.
This is precisely what many tech developers are working to create through AI “world models”: algorithms that build and act upon internal representations of the real world, imitating the human brain’s ability to predict how physical objects will behave.
World models like Google DeepMind’s new Genie 3 could have huge ramifications for AI agents, robotics, entertainment, education, and many other fields.
Here’s a look at what AI world models are, how they work, and why they matter.
What are AI world models?
Just as you’re able to imagine sunlight illuminating the fixtures of your living room, or the effect that a stone dropped into a still pond will have on the water’s surface, an AI “world model” can do more than just string words together or generate a lifelike image. It can make accurate predictions about the real world based on an ability to reason about how the basic physical mechanics of the world actually work.
This has particularly important implications for the field of AI-generated video. It’s one thing for a model to watch millions of videos of a glass falling to the floor and shattering, and to use that footage as a basis for generating new videos of the same event. It’s another for a model to intuitively grasp the physics of gravity, the distance that broken glass should scatter on carpet versus a tile floor, and the fact that a human hand carelessly touching one of those shards could lead to a wound and bleeding.
That latter capability has become the goal of major AI developers: world models that don’t just mimic scenarios they have seen, but can actually predict a virtually infinite number of new ones.
OpenAI’s Sora, for example, unveiled in February of last year, was an early example of a world model. It shocked the AI community with its ability to simulate real-world physics, such as light reflecting off pools of water on a simulated street.
Genie 3
Genie 3 is another illustrative example of the power of a world model.
From a simple natural language prompt, Genie 3 can generate dynamic simulations of virtual environments that evolve and change in response to a user’s actions. (Its predecessors, Genie and Genie 2, debuted last year in February and December, respectively.)
Unlike classic video games, which come with clearly bounded virtual spaces, world models like Genie 3 are able to expand their simulated environments as users interact with them.
“You’re not walking through a pre-built simulation,” a narrator says in a demo video introducing Genie 3. “Everything you see here is being generated live, as you explore it.”
Genie 3 comes with a feature Google DeepMind is calling “world memory,” which allows the model to represent changes that persist across time in the simulated environments. In the demo video, for example, a user is shown painting a wall with a paint roller; when they turn away and then direct their gaze back at the wall, the marks they made with the roller are still visible.
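Conceptually, “world memory” amounts to carrying persistent state forward from frame to frame, rather than regenerating each frame from the prompt alone. A toy sketch of that idea (all names here are hypothetical illustrations, not Genie 3’s actual internals):

```python
# Toy illustration of persistence across frames: user-made changes are
# stored and re-applied every time a frame is rendered, so looking away
# and back does not erase them. Purely hypothetical; not Genie 3's design.

base_scene = {"wall": "blank", "floor": "tile"}
world_memory: dict[str, str] = {}  # persistent user-made changes


def apply_edit(obj: str, new_state: str) -> None:
    """Record a change the user made to the environment."""
    world_memory[obj] = new_state


def render_frame(looking_at: str) -> str:
    """Render whatever the user is looking at, with persistent edits
    overriding the base scene on every frame."""
    state = world_memory.get(looking_at, base_scene[looking_at])
    return f"{looking_at}: {state}"


apply_edit("wall", "painted")
print(render_frame("floor"))  # look away  -> "floor: tile"
print(render_frame("wall"))   # look back  -> "wall: painted"
```

The key design point is that the edit lives outside any single frame, so the painted wall survives however many frames the user spends looking elsewhere.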
If you find yourself feeling bored while exploring a simulated environment, you can shake things up by prompting Genie 3 to cause an event. Something like: “A man on horseback carrying a bag full of money is being chased by Texas rangers, who are also riding horses. All of the hooves are kicking up huge plumes of dust.”
“We’re excited to see how Genie 3 can be used for next-generation gaming and entertainment,” the narrator says in the demo video, “and that’s just the beginning.”
Why do world models matter?
As the narrator in the Genie 3 demo video suggests, world models could have valuable applications beyond helping to generate more realistic, dynamic, and interactive forms of entertainment.
For example, they could help the AI industry build embodied agents that can navigate and interact with the real world. (This is essentially the challenge the autonomous vehicle industry has been trying to overcome since its inception, with limited success.)
They could also be used to simulate what the Genie 3 demo describes as “dangerous scenarios,” such as the scene of a recent natural disaster, to help first responders prepare for actual emergencies. Coupled with virtual reality headsets, immersion into world models could also help first responders to build muscle memory so that they can be better equipped to act calmly under duress.
Education could also benefit from the use of world models, especially in the case of students who are more receptive to visual information.
Do world models really “understand” the real world?
Trained on copious amounts of real-world data, algorithms gradually refine their ability to make predictions. Eventually — in a process that researchers are still working to understand — they can become so adept at this that, for all intents and purposes, we can say that they seem to “understand” some aspects of the world, such as the syntax of the English language or the physics of human body movement.
In its blog post, Google DeepMind defined world models as “AI systems that can use their understanding of the world to simulate aspects of it, enabling agents to predict both how an environment will evolve and how their actions will affect it.”
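That definition — predicting both how an environment will evolve and how an agent’s actions will affect it — can be sketched in a few lines. This is a minimal, hypothetical illustration (the names `WorldModel`, `predict_next`, and `plan` are invented for this example, not any real API):

```python
# Minimal sketch of the idea behind a world model: the agent "imagines"
# the outcome of each candidate action inside its internal model, then
# acts on the prediction. Hypothetical illustration only.

class WorldModel:
    """Toy 1-D world: the state is a position; actions move it left or right."""

    def predict_next(self, state: int, action: int) -> int:
        # Predict how the environment evolves given the agent's action.
        return state + action


def plan(model: WorldModel, state: int, goal: int, steps: int = 10) -> list[int]:
    """Greedy planner: simulate each action in the model, pick the best."""
    actions = []
    for _ in range(steps):
        if state == goal:
            break
        # "Imagine" both candidate actions inside the model...
        candidates = {a: model.predict_next(state, a) for a in (-1, +1)}
        # ...and choose the one predicted to land closest to the goal.
        best = min(candidates, key=lambda a: abs(candidates[a] - goal))
        state = candidates[best]
        actions.append(best)
    return actions


print(plan(WorldModel(), state=0, goal=3))  # → [1, 1, 1]
```

Real world models replace this toy transition function with a learned neural network trained on vast amounts of video or sensor data, but the loop — predict, evaluate, act — is the same in spirit.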
The use of the word “understanding” in this context is controversial, however. Some experts argue that AI can only reproduce patterns and therefore could never understand a concept the way a human being can; others take the opposite view, suggesting that human understanding may itself be nothing more than a sophisticated kind of pattern recognition.
If you blindfolded yourself and tried to walk through every room in your house, you could probably do so without injuring yourself or breaking something (assuming you’ve lived there a while). Similarly, today’s AI models are able to explore latent spaces of information in a manner that seems, at least to us humans, like they know the lay of the land.