Key Takeaways
- Veo generates high-quality, consistent video with a focus on cinematic styles and natural language input for creators.
- Imagen 3 improves image generation by rendering text more accurately, interpreting longer prompts, and generating images in a wider range of styles.
- Both Veo and Imagen 3 are available for select creators in private preview, showcasing Google’s advancements in AI models for video and image generation.
It’s a big day for AI at Google I/O, and on top of all the buzz around Gemini 1.5 Pro, Google DeepMind came out swinging with new AI models for video and image generation. The new image generation model, Imagen 3, packs some big improvements over previous models, while its video counterpart is called Veo.
Veo generates high-quality, consistent video
Veo is the more impressive of the two models, if only because video generation is a newer field that is improving more quickly. Veo is going up against OpenAI’s Sora, which was also very impressive, and it promises to deliver high-quality 1080p video, with a big emphasis on consistent results. Google says it can generate videos in a “wide range of cinematic and visual styles,” and it even understands terms like “timelapse,” so creators can describe all kinds of shots using natural language.
Google also highlights how Veo builds on years of work on generative models to understand what’s in a video and simulate real-world physics for more realistic results. To showcase the potential of Veo, Google worked with Donald Glover to create a project using the new model, featuring all kinds of shots that honestly could pass for real footage.
This tool is available today for select creators in VideoFX in private preview.
Imagen 3 ups the ante for image generation
On the still image side of things, Google introduced Imagen 3, the latest version of its image generation model, capable of producing realistic images with more detail and fewer artifacts than before. One of the big improvements in Imagen 3 is that it does a much better job of rendering text, which has been one of the telltale signs of an AI-generated image in the past. Now, you should actually get readable text more consistently.
Imagen 3 also does a better job of interpreting longer prompts, incorporating even the smaller details mentioned in them. You can describe elements of the foreground and background in additional detail, and Imagen 3 can still generate output that meets all the criteria in your prompt. Plus, it can generate images in a wider range of styles. As an example, the image above used the following prompt:
A weathered, wooden mech robot covered in flowering vines stands peacefully in a field of tall wildflowers, with a small bluebird resting on its outstretched hand. Digital cartoon, with warm colors and soft lines. A large cliff with waterfall looms behind.
Imagen 3 is also available today for select creators in ImageFX, and it will be coming soon to Vertex AI.