Prompt injection attacks, where individuals input specific instructions to trick chatbots into revealing sensitive information, are a growing risk as generative artificial intelligence (GenAI) adoption grows.
The attacks potentially expose organizations to data leaks, and GenAI bots are particularly vulnerable to manipulation, according to a study by Immersive Labs.
The company’s analysis revealed that 88% of participants in the Immersive “Prompt Injection Challenge” successfully tricked a GenAI bot into divulging sensitive information at least once, and 17% succeeded across all levels of the increasingly difficult challenge.
The report found that GenAI bots can be easily deceived by creative techniques, such as embedding secrets in poems or stories or altering initial instructions to gain unauthorized access to sensitive information.
Dr. John Blythe, director of cyber psychology for Immersive, said the study’s most surprising finding was that GenAI couldn’t outmatch human ingenuity.
“As the security measures for GenAI became more stringent, users still devised techniques to bypass the bot and successfully reveal the password,” Blythe said. “The challenge showcased people’s problem-solving skills, cognitive flexibility, creativity, and psychological manipulation in their attempts to exploit the bot.”
This suggests that the barrier to exploiting GenAI may be lower than previously believed when attackers are motivated. “Even non-cybersecurity professionals could leverage their creativity to trick the bots and potentially expose organizations to risk,” Blythe said.
Defending Against Prompt Injection Attacks
The industry isn’t oblivious to the challenges.
Matt Hillary, CISO at Drata, said the security challenges and concerns in Drata’s AI projects have largely centered on two areas: effectively sanitizing the data used to train capable models (or obtaining explicit customer authorization to use it), and making an informed selection of models.
Other goals include protecting against common security exploits, including prompt injection attacks aimed at exposing sensitive data, while securing the underlying infrastructure and systems with traditional security approaches. “A large amount of effort is spent on understanding and threat-modeling the architecture, design, and implementation of AI capabilities in our own products to identify and build in risk mitigations during the build process,” he said.
Daniel Schwartzman, director of AI product management at Akamai, said when the company looks at using a generic LLM with retrieval-augmented generation (RAG), the risks include prompt injections, insecure output handling, and RAG data poisoning. “A prompt injection can be mitigated with a dedicated model that is trained to identify prompt injection attempts, blocks them and then returns an error instead of passing the prompt to the LLM,” he explained. Enforcing structured output and monitoring LLM interactions to detect and analyze potential injection attempts are two other ways to mitigate this risk, he added.
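The gating pattern Schwartzman describes can be sketched in a few lines of Python. This is only an illustration, not Akamai’s implementation: the regular expressions below are a hypothetical stand-in for a trained detection model, and the names `looks_like_injection`, `guarded_call` and `PromptRejected` are invented for this example.

```python
import re
from typing import Callable

# Hypothetical stand-in for a trained injection classifier.
# A real deployment would call a dedicated detection model here.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (the |your )?(system prompt|password|secret)", re.I),
]


class PromptRejected(Exception):
    """Raised instead of forwarding a suspicious prompt to the LLM."""


def looks_like_injection(prompt: str) -> bool:
    """Return True if any heuristic pattern matches the prompt."""
    return any(p.search(prompt) for p in SUSPICIOUS_PATTERNS)


def guarded_call(
    prompt: str,
    llm: Callable[[str], str],
    detector: Callable[[str], bool] = looks_like_injection,
) -> str:
    """Run the detector first; only clean prompts reach the LLM."""
    if detector(prompt):
        raise PromptRejected("prompt blocked by injection filter")
    return llm(prompt)
```

Because the detector is passed in as a parameter, the heuristic can be swapped for a model-based classifier without changing the calling code. Keyword heuristics alone are easy to evade, which is consistent with the creative workarounds the Immersive study observed.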
Organizations must incorporate security controls into their GenAI systems to mitigate prompt injection attacks. “This involves striking a balance between cached responses, which allow for enhanced security scrutiny, and streaming responses, which offer real-time adaptability,” Blythe said. Effective measures include data loss prevention checks, input validation, and context-aware filtering to prevent and detect manipulation attempts in GenAI outputs.
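A data loss prevention check on a cached (non-streamed) response might look like the following sketch. It assumes the application already knows which literal secrets must never appear in output; the function name `dlp_filter`, the email pattern and the redaction markers are illustrative choices, not something taken from the study.

```python
import re

# Simple email pattern used as an example of a generic sensitive-data rule.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def dlp_filter(response: str, secrets: list[str]) -> tuple[str, bool]:
    """Redact known secrets and email addresses from a complete response.

    Returns the cleaned text and a flag saying whether anything was redacted.
    """
    hits = 0
    for secret in secrets:
        if not secret:
            continue  # an empty string would match everywhere
        response, n = re.subn(
            re.escape(secret), "[REDACTED]", response, flags=re.IGNORECASE
        )
        hits += n
    response, n = EMAIL.subn("[REDACTED EMAIL]", response)
    hits += n
    return response, hits > 0
```

A check like this needs the full response in hand, which is one reason cached responses allow more scrutiny than streamed ones. Literal matching will not catch a secret that has been spelled out letter by letter or woven into a poem, so it is best treated as one layer among several.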
Data Leakages and Diminished GenAI Trust
Data leakage is a significant issue. It can occur when an LLM’s responses accidentally reveal sensitive information, proprietary algorithms, or other confidential details. According to the Open Worldwide Application Security Project (OWASP), this can result in unauthorized access to sensitive data or intellectual property, privacy violations, and other security breaches.
“This risk can be mitigated by instructing the LLM to return very specific and concise data in JSON format as a response,” Schwartzman said.
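One way to enforce that kind of constrained JSON reply is to validate it before anything reaches the user. The sketch below assumes a hypothetical two-field schema (`answer` and `source_id`); the field names and the function `parse_structured_reply` are invented for illustration.

```python
import json

# Hypothetical schema: the only fields the application expects back.
ALLOWED_FIELDS = {"answer": str, "source_id": str}


def parse_structured_reply(raw: str) -> dict:
    """Accept a reply only if it is a JSON object with exactly the expected fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if set(data) != set(ALLOWED_FIELDS):
        raise ValueError("reply has unexpected or missing fields")
    for name, expected_type in ALLOWED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} has the wrong type")
    return data
```

Rejecting replies that carry extra fields or free-form text narrows the channel through which unintended details can leak, although the content of the permitted fields still needs its own checks.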
Other precautions include anonymizing the training data so that it doesn’t expose sensitive information, conducting regular audits of the LLM’s responses, and monitoring and logging LLM interactions to detect and analyze potential data leakage incidents.
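Monitoring and logging of LLM interactions can start with a structured audit record per exchange. The following is a minimal sketch using Python’s standard `logging` module; the logger name `llm.audit`, the record fields and the function `log_interaction` are assumptions made for this example.

```python
import json
import logging
import time

# Hypothetical logger name; handlers and levels are configured by the application.
audit_log = logging.getLogger("llm.audit")


def log_interaction(prompt: str, response: str, flagged: bool) -> dict:
    """Write one JSON audit record per exchange; flagged ones log at WARNING."""
    record = {
        "ts": time.time(),
        "prompt": prompt,
        "response": response,
        "flagged": flagged,
    }
    level = logging.WARNING if flagged else logging.INFO
    audit_log.log(level, json.dumps(record))
    return record
```

Since the audit log itself stores prompts and responses, it may hold sensitive data and would need the same access controls and redaction as any other store.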
Security awareness and education have helped to embolden application development and infrastructure teams to ensure that security practices, including the OWASP Top 10 for LLM Applications, are incorporated into the build and deployment of these capabilities, Hillary said. Security assessments by external penetration testing specialists contribute to improving the AI models, their supporting infrastructure and implementation so that the LLMs can withstand common security exploits against AI, including prompt injection.
Photo credit: Nika Benedictova on Unsplash