Google’s generative AI-powered chatbot Bard has been rebranded as Gemini, the tech giant announced on Thursday.
The updated version of Bard will be called Gemini Advanced, which you can access using mobile apps on both Android and iOS.
The chatbot has been rebuilt, giving consumers and enterprises the industry’s first multimodal generative AI platform, one that doesn’t rely solely on text to generate human-like responses.
Google is also set to release Gemini Ultra, the advanced tier for the underlying AI language model powering the chatbot.
Google Now Leads the GenAI Race, Says Expert
Gartner Vice President Analyst Chirag Dekate described Gemini as “a really big deal”, pointing out that it is currently the only native multimodal generative AI model available.
When backed by a multimodal model, a single generative AI engine can perform individual tasks with improved accuracy because it can learn from a far wider range of sources, an advantage that has now put Google ahead of its rivals in the genAI race.
Google’s efforts to take the lead in the generative AI race received a major boost in December 2023, when the tech giant unveiled its Gemini AI model for the first time.
After OpenAI launched ChatGPT, Google rushed to launch Bard as a counterweight in February last year. However, OpenAI was still ahead of Google for a long time, with ChatGPT continuing to prove more powerful.
Microsoft’s Copilot AI, which is based on the same large language model (LLM) as ChatGPT, is one of Bard’s biggest rivals. However, Dekate believes that “Google is no longer playing catch-up. Now, it is the other way around”.
Google emphasized the model’s multimodal capabilities, which enable it to combine various types of information, such as text, code, images, audio, and video, as both inputs and outputs.
Other major AI engines such as Google’s own PaLM 2, OpenAI’s GPT, and Llama 2 from Meta are LLM-only, which means they can only be trained on text.
Dekate compared multimodality to watching a movie, which would include watching the video, listening to the audio, and reading text from the subtitles at the same time. LLM-only models, on the other hand, are more like experiencing a movie by only reading a script, he explained.
Gemini AI’s multimodality could create a hyper-immersive, personalized experience. Dekate added that Google could change the marketplace if it manages to put that experience in the hands of enterprises and consumers.
While LLMs are good enough for simple text-to-text tasks, more diverse and complicated ones call for multimodal models.
For instance, a healthcare company can use a multimodal genAI engine to create a chatbot that takes inputs from MRI video scans, radiological images, and doctors’ audio notes. This could significantly improve prognostic accuracy and treatment outcomes.
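As a rough illustration of what such a multimodal request can look like in practice, the sketch below sends a radiological image together with a text prompt to a Gemini vision model through the google-generativeai Python SDK. The model name, file name, and prompt are placeholders, audio and video inputs are omitted, and this is not an endorsed clinical workflow.

```python
# Minimal sketch: one multimodal prompt mixing an image and text.
# File name, prompt, and API key are illustrative placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

# A multimodal Gemini model tier that accepts image + text input
model = genai.GenerativeModel("gemini-pro-vision")

scan = Image.open("chest_scan.png")  # hypothetical radiological image

# generate_content accepts a list mixing text and image parts
response = model.generate_content([
    "Summarize notable findings in this scan for a clinician.",
    scan,
])
print(response.text)
```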
2023 witnessed the emergence of task-specific AI models, such as text-to-text, text-to-image, text-to-video, image-to-text, and more.
Demis Hassabis, CEO of Google DeepMind, highlighted Gemini’s versatility and how well it performs across a wide range of applications.
As Gemini AI’s training neared completion, the DeepMind team working on it found that the model already surpassed all other AI models on several major benchmarks.