On December 6, 2023, Google announced its latest generative language model, Gemini. Prior to this, Google's language models consisted of the PaLM and LaMDA families. The new model is a multimodal AI that can understand text, images, audio, video, and code. It recognizes objects, actions, and scenes in an image, can process and understand speech, generate musical scores, and create video clips when prompted.
DeepMind, the developer of Gemini, is an artificial intelligence research laboratory founded in 2010 and acquired by Google in 2014. The team has achieved landmark successes with its AI models, including AlphaZero, which taught itself superhuman chess, shogi, and Go through self-play, and AlphaFold, which predicted the 3D structures of nearly all known proteins. With the new model, the tech company upgraded Google Bard to use Gemini Pro.
Let’s go hands-on with #GeminiAI.
Our newest AI model can reason across different types of inputs and outputs — like images and text. See Gemini's multimodal reasoning capabilities in action ↓
— Google (@Google) December 6, 2023
Capabilities and future plans
The Gemini model comes in three variants. Ultra is the largest and most capable model, built for highly complex tasks. Pro is a smaller, "scalable" model intended for a broad range of everyday tasks. Finally, Gemini Nano is the smallest and most efficient model, designed for on-device tasks.
In its blog post, Google claims state-of-the-art performance with Gemini. The company reported that Ultra scored 90.0% on the Massive Multitask Language Understanding (MMLU) benchmark, surpassing the performance of human experts. In comparison, GPT-4, OpenAI's most advanced language model, scored 86.4% on the same test. Google published a table comparing its new model with GPT-4 across various benchmarks, with Gemini scoring higher on almost every one.
🌟 Meet Gemini – our newest Foundation Model! 🚀
Gemini Nano is our most efficient model built for on-device tasks, which opens support for a range of important use cases. Learn how to build Android apps leveraging generative AI. ➡️ https://t.co/BPudPANMmt #BuildWithGemini
— Android Developers (@AndroidDev) December 6, 2023
Google also announced Gemini Nano for the Pixel 8 Pro. The model lets Pixel users summarize recordings in the Recorder app and use the Smart Reply feature in Gboard. Meanwhile, beyond integrating the Pro variant into Bard, the company plans to bring Pro to many other Google products. Its API is currently available to developers and enterprise customers via Google AI Studio and Google Cloud Vertex AI. Nano, in turn, will be available to Android developers through AICore in Android 14. As for Ultra, it will reach the public through Bard Advanced, which is expected to launch next year.
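For developers who want to try the Pro API, the basic flow is short. Here is a minimal sketch in Python, assuming the google-generativeai package and an API key generated in Google AI Studio; the model name "gemini-pro" and the placeholder key follow Google's launch documentation, but details may change over time.

import google.generativeai as genai  # pip install google-generativeai

# Authenticate with an API key from Google AI Studio (placeholder below).
genai.configure(api_key="YOUR_API_KEY")

# Instantiate the Gemini Pro model and send a simple text prompt.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Explain multimodal AI in one sentence.")

print(response.text)

At launch, Google also listed a vision-capable variant ("gemini-pro-vision") for image-and-text prompts, but the text-only call above is enough to confirm API access.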
YouTube: Gemini: Google’s newest and most capable AI model
Photo credit: The featured image is an example from Google DeepMind illustrating this approach, made available through Unsplash.
Did this article help you? If not, let us know what we missed.