The latest advances in artificial intelligence (AI) presented this past week aren’t trivial. So, where are things heading?
Demis Hassabis, the CEO of Google DeepMind, the company’s AI research division, is clear in his response: “Towards artificial general intelligence (AGI), which equals or surpasses human intelligence.”
Over the last decade, we’ve had virtual assistants, such as Siri, Alexa, Google Assistant and Bixby, with limited capabilities to respond and execute actions on linked devices. And since the end of the last century, search engines have helped us find answers and services in response to our queries. But both have their days numbered. Advances in the field of AI have begun to dig the graves of both services, in order to unify them into a single platform that’s capable of conversing like a human, analyzing documents (text, images or videos) in different domains, offering complex answers and solutions, and executing them on behalf of the user. The search engine and the virtual assistant are set to become a single tool, a “super-competent colleague,” in the words of Sam Altman, the CEO of OpenAI, that will be present in all aspects of our lives.
The new developments represent a crucial step in artificial intelligence. Until now, we had AI tools that already understood natural language, such as voice assistants (Siri or Alexa), applications that could convert a text request into images or videos (Sora), and chatbots that created text or summarized documents and meetings (ChatGPT). The systems released in recent days by both Google and OpenAI do much more. A document from DeepMind and a dozen universities and entities defines them as “artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user — across one or more domains — in line with the user’s expectations.”
According to the latest research, the key lies in the use of natural language, which facilitates interaction with the machine. This is combined with the machine’s autonomy to devise plans and carry them out “on behalf of the user,” with the breadth of sources it can draw on, and with its ability to grasp the user’s context in order to meet expectations. AI is no longer a simple tool: it’s a complex robot that knows who it’s talking to and what results it should produce. “[Artificial general intelligence] will have a profound impact on our individual and collective lives,” the document warns.
Aside from the innumerable ethical problems that this same work identifies, the transition that’s now beginning has an immediate technological consequence: the AI tools we’ve been using are becoming obsolete. The evolution of AI has begun to bury the search engine and the virtual assistant as we’ve known them.
The conventional search engine is dying
Larry Page and Sergey Brin founded Google in 1998, after meeting at Stanford University and publishing The Anatomy of a Large-Scale Hypertextual Web Search Engine. With this research, they developed a search engine that, in just one year, began to register 3.5 million queries a day. It’s now the most-used search engine in the world, with more than 3.5 billion daily information requests. But in this fast-moving era, the Google search engine as we know it is beginning its decline. Sergey Brin himself acknowledged this process last Tuesday, at Google’s headquarters in Mountain View, California.
“Over the last 25 years, we’ve invested a lot in the search engine. But we need to think about how we can meet the new needs of users… and I really think that [we can achieve this via] Google Gemini,” says Liz Reid, the new head of Search at Google, tasked with ushering in a new era. Google Gemini is the firm’s AI chatbot, a competitor to OpenAI’s ChatGPT.
Reid explains that the conventional system — which, she acknowledges, has been an “incredibly powerful tool” — requires “a lot of work” to maintain. The traditional search function (known as “googling”) for a restaurant or any other service near the user’s location offers a map of addresses and a list of web pages. The information seeker has to complete the process by scrolling through each page, one by one. Of course, those that pay to be at the top of the results get to be seen first. A Google user can also narrow down a search and specify what type of food or specific service they require, until they get a more precise list of websites.
This tedious process, it seems, is coming to an end. “We radically changed how it works,” says Sundar Pichai, Google’s CEO. And Reid concurs: “We’ve built a custom Gemini model, designed specifically for Search, that combines our real-time information with unprecedented quality and classification systems.”
The executive notes that, in the tests carried out, “people also click on a greater diversity of websites.” This trend will force the modification of SEO (search engine optimization) strategies, the techniques used to improve a website’s position in search engine results. Now, firms will have to adapt to how artificial intelligence processes information.
The new search engines plan, reason and understand the user’s context. At the person’s request, they can execute the purchase of a product or service, suggest add-ons, or make a reservation.
Understand, see, reason and plan
The new search engines have the ability to reason in steps. You can ask them, for example, for a weekly menu that, based on your previous interactions or the details you provide, adjusts to your tastes. The next step might be to make it vegetarian, modifying the already-selected recipes to fit that preference.
The new AI chatbots are also able to plan ahead. The user can ask them to schedule a trip anywhere, taking into account activities for both children and adults. Requests can be tailored to the user’s interests, whether they’re a nature lover or passionate about culture.
Likewise, these new chatbots can make purchases based on images. The user only has to upload a video and indicate, by voice or text, which person in it is wearing the item they’re looking for. The results will then show locations, prices and availability. Alternatively, a user can simply circle the piece of clothing in question on the screen with their finger. Or they might film an appliance malfunctioning, and the chatbot will offer information about repair services.
Google has been the first to launch its new search engine, but other companies are advancing as well: Microsoft, with the support of OpenAI, offers similar features through Copilot.
AI assistants
OpenAI, which has served as a driving force in the rapid development of artificial intelligence, hasn’t yet presented a similar product, although the firm made inroads with other major advances this past week. Google, for its part, already presented its next-generation AI assistants back in April to customers of Google Cloud, which houses a large part of developers’ work. These new assistants are set to make tools such as Siri, Alexa and Google Assistant obsolete.
On May 13, just 24 hours before Google announced Project Astra, its vision for the future of AI assistants, OpenAI presented GPT-4o, a conversational robot that’s also capable of seeing, hearing, and solving and executing tasks on behalf of the user. It will be accessible for free through the web and the mobile application, although the paid version, which offers five times the usage limits, will cost $20 per month.
Sam Altman, the CEO of OpenAI, agrees with DeepMind that these new assistants will revolutionize our lives, according to comments he made to MIT Technology Review. Altman also feels that AI tools that generate images from voice or text requests, such as DALL-E or Sora, or that generate text from prompts, such as the first versions of ChatGPT, have been just that: simple tools. Because they’re used for isolated tasks, he argues, they have a limited capacity to learn about us from our conversations with them.
The advanced AI assistant, according to Altman, will be capable of helping us outside the chat interface and taking real-world tasks off our hands. He envisions it as a “super-competent colleague that knows absolutely everything about my whole life, every email, every conversation I’ve ever had… but doesn’t feel like an extension [of me].”
Chirag Shah, a professor at the University of Washington who has no ties to the large firms leading these AI developments, concurs with this assessment. “This [new AI assistant] really knows you well… it can do many things for you and can work on multiple tasks and domains,” he told MIT Technology Review.
Google’s new assistant is called Astra. It will be fully operational at the end of 2024. Its most advanced version will be available through the AI Premium plan of Google One, free for the first two months and about $20 per month thereafter. Google is working to include this assistant on its mobile phones, smart glasses and other devices. “We’re open to all formats… but if OpenAI maintains a limited free version, we may have to do the same,” admits a Google executive, who asked not to be identified, during an interview with EL PAÍS in the United States.
Astra combines the new capabilities of the Gemini model with humanized abilities, such as empathy and senses (hearing and vision), to analyze and record context. This allows it to respond to user interactions related to whatever passes through a device’s camera and microphone.
Demis Hassabis explains further: “We’re processing a different flow of sensory information. These [AI assistants] can see and hear better what we do, understand the context in which we find ourselves and respond quickly in the conversation, making the pace and quality of the interaction [between the user and their device] much more natural.”
With these abilities, Astra and GPT-4o are able to accompany the user from their mobile phone. They can learn the context in which an interaction takes place while simultaneously answering a specific question, solving a mathematical problem, identifying a page of code, or recalling where we may have left a misplaced object.
The principal use of these spectacular capabilities will be in the workplace and the home. Users, for instance, can ask these next-generation AI assistants to identify expenses (such as a pending insurance payment or an electricity bill), analyze them, display them in a spreadsheet and flag savings options.
The assistant will sift through emails, stored documents and any file or website that contains information about expenses (so long as the user grants access), organize and summarize it all, and consult the webpages of the service providers in question. It will then propose a savings plan and, if the user requests it, execute renewals or cancellations of subscriptions.
There are numerous applications for these next-generation AI assistants. The objective is to have them support users across the board. Minsu Jang, the lead researcher at the Social Robotics Laboratory at South Korea’s Electronics and Telecommunications Research Institute, is working on the development of AI for task planning: “We plan to research and develop technologies that can predict task failures in uncertain situations and improve the response to the human when they ask for help. This technology is essential to achieving the era of one robot per home.”
Advantages and risks
The work done by a dozen universities and entities and commissioned by DeepMind examines the new developments in the field of artificial general intelligence, while identifying their advantages and risks. The experts highlight that these advanced assistants can “empower users” to achieve their goals or well-being, or act as “mentors, friends or trusted advisors.” Along these lines, a new study recently published in the Journal of the American Medical Informatics Association (JAMIA) reveals how these AI systems are capable of responding to different emotional states.
In an evaluation of ChatGPT, Gemini and Llama (Meta), the University of Illinois has shown the importance of this skill. “[Next-generation AI assistants] can, for example, help increase user awareness about healthy behaviors, bolster their emotional commitment to change and [make users] realize how their habits could affect the people around them,” explains Michelle Bak, a researcher in the field.
These assistants can also help users make better-informed decisions, develop their creativity, pursue personal training, or solve problems. And they can help people schedule their time more efficiently, freeing up more of it for personal activities and family relationships.
But this utopian world also has its shadows. This past week, Jan Leike, one of OpenAI’s leading researchers, resigned from the firm’s safety team, bluntly writing on social media that the company values the creation of new products more than safety. On his way out the door, he offered a harsh reflection on X: “Building smarter-than-human machines is an inherently dangerous endeavor.”
AI assistants could also violate privacy, and they’re potentially unsafe if they return an incorrect or even harmful answer that has the appearance of truth. These errors, known as hallucinations, are common in existing tools.
Likewise, these next-generation assistants could serve the particular interests of their developers, limiting their responses to objectives (which may be monetary) that aren’t the user’s own. They could also be programmed to prioritize the benefit of an individual user over the consequences for a community.
Furthermore, these virtual assistants can impose values on society, creating certain currents of opinion, or be used for malicious cyberattack campaigns. “We’ve investigated 36 parliamentary, regional and presidential elections held between September of 2023 and February of 2024 and discovered that, in at least 10 cases, videos and audio recordings with voice cloning were used for disinformation campaigns. In the context of the [upcoming] European elections, we can expect a new wave of deception in all countries,” explains Sergey Shykevich, threat intelligence group manager at Check Point Research.