Have we reached the point where it’s time to dial down all the fear, uncertainty, and doubt around ChatGPT?
Researchers at Hoxhunt think so.
In a blog post published Wednesday, the cybersecurity company reported that human red teamers still significantly outperform ChatGPT when it comes to socially engineering humans into clicking on malicious email links.
A study sampling 53,000 email users in more than 100 countries found that professional red teamers generated a click rate of 4.2%, while ChatGPT-generated emails induced just a 2.9% click rate.
The researchers said the evidence supports the argument that — at least for now — humans remain superior at tricking other humans. The study also found that users with higher security awareness were better at spotting phishing emails, whether human- or ChatGPT-generated, with failure rates dropping from more than 14% among less trained users to between 2% and 4% among experienced users.
Pyry Avist, co-founder and CTO at Hoxhunt, said that while the study offers some encouraging news, humans are only marginally better at the task, and any advantage may dissipate as more advanced models — like GPT-4, which was released to the public on Tuesday — come onto the scene. The results from Hoxhunt’s study were drawn from interactions with the GPT-3.5 version of ChatGPT.
But over the long term, the ultimate divide may not be between human and machine, but between those who learn how to leverage the technology in their phishing and those who don’t.
“This is a rapidly developing technology and it will reveal new capabilities to its operators all the time,” said Avist. “Moreover, the art and science of prompt engineering is developing. ChatGPT will yield better results as humans learn how to use it. The performance gap between human-only and AI-augmented phishing emails will likely narrow pretty fast.”
John Bambenek, a principal threat hunter at Netenrich, said ChatGPT helps people radically scale their writing. Instead of spending 20 minutes crafting a phishing email, Bambenek said, attackers can use a specialized large language model tool to do it for them.
“I am not sure the tactics will change much, just the output of phishing emails and web pages,” said Bambenek. “This means it will be much more important to get reputational tools for the web proxy and email filtering solutions that are using similar technology to find inauthentic messages quicker instead of relying on complaint-based detections.”
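Bambenek’s suggestion — filtering tools that apply the same class of models defensively — can be sketched as a two-stage triage pipeline: cheap heuristics first, with a model consulted only on borderline mail. The Python below is a hypothetical illustration only; the keyword list, thresholds, and stubbed llm_score function are assumptions made for the sketch, not anything Netenrich or any filtering vendor actually ships.

```python
import re

# Crude keyword heuristics used as a cheap first pass (illustrative only).
SUSPICIOUS_PATTERNS = [
    r"verify your account",
    r"urgent(ly)?",
    r"password (reset|expir)",
]

def heuristic_score(body: str) -> float:
    """Fraction of suspicious patterns found in the message body, in [0, 1]."""
    hits = sum(bool(re.search(p, body, re.IGNORECASE)) for p in SUSPICIOUS_PATTERNS)
    return hits / len(SUSPICIOUS_PATTERNS)

def llm_score(body: str) -> float:
    """Stub for a language-model classifier returning a phishing
    likelihood in [0, 1]. Replace with a real model or vendor API call."""
    return 0.0

def triage(body: str, escalate_at: float = 0.3, quarantine_at: float = 0.6) -> str:
    """Score cheaply first; consult the (stubbed) model only on borderline mail."""
    score = heuristic_score(body)
    if score >= escalate_at:
        score = max(score, llm_score(body))
    return "quarantine" if score >= quarantine_at else "deliver"

if __name__ == "__main__":
    # Trips all three heuristics, so the sketch routes it to quarantine.
    print(triage("Urgent: please verify your account password reset"))
```

The two-threshold design mirrors the economics Bambenek describes: heuristics are nearly free, so they run on everything, while the costlier model-based check is reserved for mail the heuristics can’t confidently clear.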
Matt Mullins, senior security researcher at Cybrary, said AI-generated content is still very much in its infancy, so it makes sense that red teamers outperformed AI. What’s interesting about phishing, and social engineering in general, is that there’s a degree of “art” to producing something that elicits a response, something that even sophisticated large language models struggle with today.
“That’s why I believe [humans] did better with click rates: humans can think ‘what would I click on?’ ChatGPT creates very template-like responses that are pretty consistent. When I think about the big wins I have gotten in the field, it wasn’t from a simple template, but from a very intelligently developed [and uniquely enticing] email that made users break out of their ‘good/bad’ paradigms of thought. The email made them stop and think ‘what is this?’, which creates the grey space that social engineers love.”
Melissa Bischoping, director of endpoint security research at Tanium, said it’s exciting to think about what tools like ChatGPT and Bing Chat will mean to security researchers, the scientific community, and the larger society.
“There’s truly an art to security and a need to understand human behavior,” said Bischoping. “AI is not capable of putting itself in the shoes of an attacker or a victim, and in my experience, it can fall prey to coloring too neatly inside the lines. While AI is a powerful augmentation of intelligence, it will never be a replacement for it.”
For now, Avist said, the language ChatGPT produces in response to an email prompt, from subject line to closing, is often correct in its syntax and grammar, but the results indicate that something about the tone is a bit off: it’s often too polite and formal.
“A more sneaky and manipulative attacker would push more emotional buttons, and seem a bit more like an actual person,” he said.