A 22-year-old college student has developed an app that he claims can detect whether text was written by ChatGPT, the explosively popular chatbot that is raising fears of plagiarism in academia.
Edward Tian, a senior at Princeton University, developed GPTZero over the new year break. It had 30,000 hits within a week of its launch.
Tian said his motivation was to address the use of artificial intelligence to cheat in exams: quick, credible academic writing that slips past anti-plagiarism software.
His initial tweet, which claimed the app could “quickly and efficiently” detect whether an essay had been written by artificial intelligence, went viral with more than 5m views.
I spent New Years building GPTZero — an app that can quickly and efficiently detect whether an essay is ChatGPT or human written
— Edward Tian (@edward_the6) January 3, 2023
Streamlit, the free platform that hosts GPTZero, has since supported Tian with the hosting and memory capacity needed to keep up with web traffic.
To determine whether text was written by artificial intelligence, the app calculates two measures: “perplexity”, which gauges how predictable a text is to a language model, and “burstiness”, which measures how much sentences vary in length and structure.
The more familiar the text is to the bot – because it was trained on similar data – the lower its perplexity, and the likelier it is to have been generated by AI.
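GPTZero’s exact implementation has not been published. The sketch below, which uses the open-source Hugging Face transformers library and the GPT-2 model as stand-ins, only illustrates how a detector of this kind could estimate perplexity and a simple burstiness score; the function names and scoring choices are assumptions, not Tian’s code.

```python
# Illustrative sketch only: GPTZero's actual method is not described in the
# article. Perplexity is estimated with GPT-2; "burstiness" is treated here
# as the spread of per-sentence perplexity -- an assumption about how such
# a detector could work.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2: lower means more predictable to the model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing the input ids as labels makes the model return the mean
        # cross-entropy loss over the sequence.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Spread of per-sentence perplexity; human writing tends to vary more."""
    sentences = [s.strip() for s in text.split(".") if len(s.split()) > 3]
    scores = [perplexity(s) for s in sentences]
    if len(scores) < 2:
        return 0.0
    mean = sum(scores) / len(scores)
    return (sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)) ** 0.5

sample = "Paste the essay you want to check here. It should be a few sentences long."
print(f"perplexity: {perplexity(sample):.1f}  burstiness: {burstiness(sample):.1f}")
```

Under this kind of scheme, an essay scoring low on both measures would be flagged as likely machine-written; the thresholds GPTZero actually uses are not public.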
Tian told subscribers that a newer version of the model used the same principles, but with an improved ability to detect AI-generated text.
“Through testing the new model on a dataset of BBC news articles and AI generated articles from the same headlines prompts, the improved model has a false positive rate of < 2%,” he said.
“[In] the coming months, I’ll be completely focused on building GPTZero, improving the model capabilities, and scaling the app out fully.”
Toby Walsh, Scientia professor of artificial intelligence at the University of New South Wales, wasn’t convinced.
He said that unless the app was picked up by a major company, it was unlikely to have much impact on the use of ChatGPT for plagiarism.
“It’s always an arms race between tech to identify synthetic text and the apps,” he said. “And it’s quite easy to ask ChatGPT to rewrite in a more personable style … like rephrasing as an 11-year-old.
“This will make it harder, but it won’t stop it.”
Walsh said users could also ask ChatGPT to add more “randomness” to its text, or obfuscate it with synonyms and grammatical edits, to evade detectors.
Meanwhile, he said, each app developed to spot synthetic text gave artificial intelligence programs a greater ability to evade detection.
And every time a user logged on to ChatGPT, they were generating human feedback – both implicit and explicit – that improved its filters.
“There’s a deep fundamental technical reason we’ll never win the arms race,” Walsh said.
“Every program used to identify synthetic text can be added to [the original program] to generate synthetic text to fool them … it’s always the case.
“We are training it but it’s getting better by the day.”
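Walsh’s point – that any detector can be folded back into the generation loop – can be illustrated with a toy sketch. The generate_rewrite and detector_score functions below are hypothetical stand-ins, not real APIs: the loop simply keeps rewriting a passage until the detector scores it as human.

```python
# Illustrative sketch of the "arms race" Walsh describes, not any real system:
# if a detector exposes a score, a generator can keep resampling rewrites
# until one slips past it. Both helper functions are hypothetical stand-ins.
import random

def generate_rewrite(text: str) -> str:
    # Stand-in for asking a chatbot to rephrase "in a more personable style".
    words = text.split()
    random.shuffle(words)            # crude placeholder for a paraphrase
    return " ".join(words)

def detector_score(text: str) -> float:
    # Stand-in for a detector like GPTZero: 1.0 means "definitely AI-written".
    return random.random()

def evade(text: str, threshold: float = 0.5, attempts: int = 20) -> str:
    """Resample rewrites until the detector rates the text as human-like."""
    candidate = text
    for _ in range(attempts):
        if detector_score(candidate) < threshold:
            return candidate         # detector is fooled; stop here
        candidate = generate_rewrite(text)
    return candidate                 # best effort after the attempt budget

print(evade("An essay produced by a chatbot and lightly edited by hand."))
```

In practice the rewriting would be done by the chatbot itself – for example, by prompting it to rephrase the essay as an 11-year-old, as Walsh suggests.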
Users of GPTZero have reported mixed results.
“It seemed like it was working – and it does work for texts which are generated by GPT models entirely or generated with semi-human intervention,” one subscriber wrote.
“However … it does not work well with essays written by good writers. It false flagged so many essays as AI-written.
“This is at the same time a very useful tool for professors, and on the other hand a very dangerous tool – trusting it too much would lead to exacerbation of the false flags.”
“Nice attempt, but ChatGPT is so good at what it does,” another subscriber wrote.
“I have pasted in roughly 350 words of French … mostly generated by ChatGPT. The text is slightly manually edited for a better style, and generated with a strong, enforced context leading to the presence of proper nouns.
“That text passes the GPTZero test as human … I am not totally convinced that proper human-AI cooperation can be flagged.”