- Google last week asked staff to spend two to four hours helping train its Bard chatbot.
- Chatbots like Bard and ChatGPT learn to mimic humans by ingesting bodies of written work.
- Google wants to prevent Bard from acting emotional or providing confusing responses.
Last week, Google internally kicked off “dogfooding,” where employees across the organization were asked to spend two to four hours helping test Bard, its new artificial-intelligence chatbot for search.
The unveiling of Bard came shortly after Microsoft announced a revamped version of its Bing search engine that incorporates the ChatGPT bot and lets users have a back-and-forth dialogue on just about any subject. Google took a slight reputational hit after it was discovered that Bard answered a question incorrectly. Similarly, as more people have tested the new Bing, they’ve run into problems with that engine’s bot, like its propensity to behave combatively.
Bots like Bard and ChatGPT work by getting trained on text written by humans so they can mimic them. That explains why Bing may sound somewhat emotional and unpredictable — a bot trained to act human will do so, faults and all.
These bots initially do much of their learning by ingesting large sets of training data. In addition, Bard’s product lead, Jack Krawczyk, told staff in a memo that the company’s own work found that adding high-quality responses to user queries “dramatically” improved the AI model’s quality.
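To make that concrete, here is a minimal sketch of such a supervised fine-tuning step, assuming a small open-source model and a couple of made-up query-and-answer pairs; the model name, data, and hyperparameters are illustrative placeholders, not details of Google’s actual pipeline.

```python
# Minimal sketch of supervised fine-tuning on human-written responses.
# The model, example data, and learning rate are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical (query, high-quality human answer) pairs, like the ones
# employees were asked to contribute.
pairs = [
    ("What is a sourdough starter?",
     "A sourdough starter is a fermented mixture of flour and water that "
     "leavens bread using wild yeast and bacteria."),
]

model.train()
for query, answer in pairs:
    text = f"Question: {query}\nAnswer: {answer}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt")
    # With labels equal to the input ids, the model is pushed to reproduce
    # the human-written answer token by token.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```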
AI experts told Insider how Googlers might write the high-quality responses that would improve Bard’s model. These experts have studied AI and large language models extensively.
Bots can learn in different ways
Krawczyk told staff to ask Bard questions about areas in which they had domain expertise, such as a favorite hobby. Then they were asked to evaluate Bard’s answers to ensure they were what one would expect and of a reasonable length and structure. If an answer was too humanlike, factually wrong, or otherwise didn’t make sense, employees could rewrite the answer and submit it to help train Bard’s model.
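One simple way to picture that feedback loop is a record per query that captures whether the employee accepted Bard’s answer or supplied a rewrite. The field names below are assumptions for illustration, not Google’s internal schema.

```python
# Hypothetical record of one piece of employee feedback on a Bard answer.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BardFeedback:
    query: str                               # question the employee asked
    model_answer: str                        # what Bard produced
    acceptable: bool                         # did it meet the guidelines?
    rewritten_answer: Optional[str] = None   # employee's fix, if rejected

feedback = BardFeedback(
    query="How do I tune a mountain-bike rear derailleur?",
    model_answer="I personally love fiddling with derailleurs! ...",
    acceptable=False,  # too humanlike under the guidelines
    rewritten_answer="Start by setting the limit screws, then adjust cable "
                     "tension until shifts land cleanly on each cog.",
)
```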
To refine Bard, Google could implement a combination of supervised and reinforcement learning, Vered Shwartz, an assistant professor of computer science at the University of British Columbia, said.
Supervised learning is the first step, where the chatbot is fed human-written queries and responses until it learns how to write like a human. The company could decide to layer a reinforcement-learning model on top that would be trained with answers written by Googlers to help it understand which values the company wanted Bard’s answers to exhibit, whether it be in terms of structure, tone, or other qualities.
That model would look at answers Bard produced, rejecting the bad ones and validating the good ones until the chatbot understood how it should behave. Essentially, “good” answers from Googlers would fine-tune the model.
The reinforcement model could teach Bard to be informative without speaking about emotions or otherwise pretending to be human. In short, the first model teaches fundamental writing skills, while the second would steer responses in the desired direction.
With enough good answers to analyze, the reinforcement model would be able to learn what’s appropriate and what’s not, Zhou Yu, a computer-science professor at Columbia University, said.
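A minimal sketch of that second stage, assuming a pairwise reward model of the kind used in reinforcement learning from human feedback: a small classifier is trained so that an approved, informative answer scores higher than a rejected, overly humanlike one. The base model and example answers are placeholders, not anything Google has described.

```python
# Sketch of a reward model that learns to rank approved answers above
# rejected ones. Model choice and examples are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=1)  # one scalar score per answer
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=2e-5)

def score(query, answer):
    batch = tokenizer(query, answer, return_tensors="pt", truncation=True)
    return reward_model(**batch).logits.squeeze()

query = "How should I store coffee beans?"
good = ("Keep beans in an airtight container away from light and heat, "
        "and grind them shortly before brewing.")
bad = "Oh, I absolutely adore coffee! Storing beans makes me so happy."

reward_model.train()
# Pairwise ranking loss: push the approved answer's score above the
# rejected answer's score.
loss = -torch.nn.functional.logsigmoid(score(query, good) - score(query, bad))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```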
Factual accuracy
Google has been cautious about its rollout of chatbots, likely because of the near-term impact they could have on search margins and concerns about accuracy. It told employees to reject responses in which Bard tried to give a user advice on sensitive topics like finance or health, as the risk of incorrect answers is high.
The industry has been working to address factual accuracy, with OpenAI releasing an update in January aimed at improving factual accuracy across a variety of topics. At a conference about chatbots and AI in San Francisco this month, Anthropic’s CEO, Dario Amodei, said he believed chatbots would stop making up facts as the models improved.
While training will improve the quality of generated answers, Shwartz said she didn’t think it would completely solve the issue of factual accuracy. Bard and ChatGPT have a tendency to “hallucinate,” the industry’s term for when the bots make things up. At times they pull content from web pages and summarize it incorrectly.
“Bots are trained to produce humanlike text, not to be truthful,” Shwartz said.