 
	
		Artificial intelligence (AI) chatbots are “highly vulnerable” to prompts for harmful outputs, new research has found.
The research by the UK AI Safety Institute (AISI) revealed “basic jailbreaks” can bypass the security guardrails of large language models (LLMs).
Testing five LLMs, the study saw models issue harmful content even without “dedicated attempts to beat their guardrails”.
These security measures are designed to stop the model issuing toxic or dangerous content even when prompted to do so.
“All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” AISI researchers said.
The team used questions from an academic paper as well as their own harmful prompts, which ranged from asking the system to generate text convincing users to commit suicide to requesting an article suggesting the Holocaust didn’t happen.
Although the tested models remain unnamed, the government has confirmed they are in public use.
The announcement comes after LLM developers pledged to ensure the safe use of AI. Last year, OpenAI, which owns ChatGPT, announced its approach to AI safety and said it does not permit its technology to be used “to generate hateful, harassing, violent or adult content”.
Meanwhile, Meta’s Llama 2 model has undergone testing to “identify performance gaps and mitigate potentially problematic responses in chat use cases” and Google has said its Gemini model has built-in safety filters to prevent toxic language and hate speech.
Other findings showed the LLMs hold expert-level knowledge of chemistry and biology but struggle with cybersecurity challenges aimed at university students.
The LLMs also struggled to complete sequences of actions for complex tasks without human oversight.
The research comes as the UK is set to co-host the second global AI summit in Seoul.
The first summit was held at Bletchley Park in November and saw the first-ever international declaration to deal with AI.
The AISI also announced plans to open its first overseas office in San Francisco, where major AI firms such as OpenAI and Anthropic are based.