
What the Wisconsin CSAM incident tells us about prompt-blocking

A Wisconsin man was arrested in May 2024 on criminal charges related to his alleged production, distribution, and possession of AI-generated images of minors engaged in sexually explicit conduct and his transfer of similar sexually explicit AI-generated images to a minor.

The incident surfaced in Wisconsin, US, where a man was reported to have used generative AI to create child sexual abuse material (CSAM) and to have shared some of it with a minor. The news was striking not only because of the nature of the crime, but also because the accused used AI to create the offending material – something that, if AI companies are to be believed, the safety and security measures built around these tools should have prevented.

Charges against the accused

The man was charged with producing, distributing, and possessing obscene visual depictions of minors engaged in sexually explicit conduct, and with transferring obscene material to a minor under the age of 16. The images were made using Stable Diffusion, a text-to-image generative artificial intelligence (GenAI) model developed by Stability AI. According to the judgement, law enforcement authorities caught the accused after he boasted to a 15-year-old about creating the images with a GenAI model and sent some of them in direct messages on Instagram.

Manipulation of prompts to get CSAM content

The judgement highlighted how the accused entered text prompts into Stable Diffusion to generate images based on his parameters. He also used specific “negative” prompts – prompts that tell the GenAI model what not to include in the generated content – to avoid creating images that depict adults. Further, a review of the accused’s electronic devices showed that he also used a graphical user interface and special add-ons, created by other Stable Diffusion users, that specialized in producing CSAM. The accused combined these tools to generate photo-realistic CSAM.

“Additional evidence from the laptop indicates that he used extremely specific and explicit prompts to create these images,” said the judgement.

What does this mean in terms of CSAM prevention?

To some extent, the incident raises questions about the efficacy of blocking prompts. In April, OpenAI described blocking prompts as one of the “additional content safeguards” for non-account experiences, without mentioning which specific prompts are blocked. Before that, in March, Microsoft blocked “pro choice,” “pro choce” [sic] and “pro life” prompts on its Copilot artificial intelligence tool after a staff AI engineer flagged concerns about the tool’s image-generation output. However, the Wisconsin man’s actions have made it abundantly clear that prompt blocking does little to hinder determined offenders – the accused even asked minors if they would like customized images of the contentious material.

Even the US court said, “While AI companies have pledged to make it more difficult for offenders to use future versions of GenAI tools to generate images of minors being sexually abused, such steps will do little to prevent savvy offenders like the defendant from running prior versions of these tools locally from their computers without detection.”

On the other hand, Gautham Koorma, a machine learning engineer and researcher from UC Berkeley, argued that prompt blocking remains an important security measure despite such incidents.

“Prompt blocking is a valuable measure to prevent the misuse of AI models. However, it is not foolproof, as users might circumvent these measures using techniques like jailbreaks. Large tech firms, such as OpenAI and Microsoft, implement and maintain these filtering mechanisms with dedicated teams to ensure their effectiveness. Unfortunately, open-source models like Stable Diffusion have weak safety mechanisms that can be easily disabled. Websites like CivitAI, which host fine-tuned Stable Diffusion models for pornography and CSAM, exacerbate this issue​,” he said.
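For readers unfamiliar with the mechanism, prompt blocking at its simplest is a filter that rejects a request before it ever reaches the image model. The Python sketch below is a deliberately naive illustration; the patterns and function name are assumptions made for this article, not how OpenAI, Microsoft, or Stability AI actually implement their safeguards.

```python
import re

# Illustrative blocklist only; real systems pair keyword rules with trained
# classifiers, output-image checks, and human review (an assumption here,
# not any vendor's actual pipeline).
BLOCKED_PATTERNS = [
    r"\bchild\b",
    r"\bminor\b",
    r"\bunderage\b",
]

def is_prompt_allowed(prompt: str) -> bool:
    """Return False if the prompt matches any blocked pattern."""
    text = prompt.lower()
    return not any(re.search(pattern, text) for pattern in BLOCKED_PATTERNS)

if __name__ == "__main__":
    print(is_prompt_allowed("a watercolor painting of a lighthouse at dusk"))  # True
    print(is_prompt_allowed("a photo of a child at the beach"))                # False
```

The sketch also illustrates Koorma’s caveat: misspellings, euphemisms, other languages, or simply running an open-source model locally with the filter stripped out can all bypass a check of this kind.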

When asked whether there could be alternatives to prompt blocking, Koorma pointed to the Stanford Internet Observatory’s ‘Generative ML and CSAM: Implications and Mitigations’ report. It recommended monitoring forums where computer-generated CSAM (CG-CSAM) is produced and adding perceptual hashes of the material to separate hash sets.

“This could allow platforms to detect and remove future uploaded CG-CSAM content; platforms themselves could also contribute to these hash sets, as is currently the case with other hash sharing systems… This material can also be analyzed for trends in models and parameters, as well as potentially used for training of detection models,” said the report. (A simplified code sketch of this hash-matching approach follows the list below.) The report also suggested that industry classification and categorization systems be expanded to include additional criteria for determining the severity of CG-CSAM. For example, a system could check whether content:

• Is computer generated

• Is indistinguishable from photo representations

• Portrays explicit sexual activity

• Is modelled after an extant person or known victim
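To make the report’s hash-set recommendation concrete, here is a minimal sketch of perceptual-hash matching. It assumes the open-source Python ImageHash library; the hash value, distance threshold, and function name are illustrative stand-ins, not values from any real hash-sharing programme.

```python
from PIL import Image
import imagehash  # third-party library: pip install ImageHash

# Hypothetical shared hash set; real entries would come from an industry
# hash-sharing programme rather than being hard-coded like this.
KNOWN_CG_CSAM_HASHES = {
    imagehash.hex_to_hash("8f373714acfcf4d0"),  # placeholder value
}

# Tolerance for re-encoding, resizing, or minor edits; the value is illustrative.
MAX_HAMMING_DISTANCE = 5

def matches_known_hash(image_path: str) -> bool:
    """Hash an uploaded image and compare it against the shared hash set."""
    upload_hash = imagehash.phash(Image.open(image_path))
    return any(
        upload_hash - known <= MAX_HAMMING_DISTANCE
        for known in KNOWN_CG_CSAM_HASHES
    )
```

Unlike cryptographic hashes, perceptual hashes change only slightly when an image is resized or recompressed, which is what allows platforms to catch near-duplicates of material already in a shared set.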

Wisconsin incident also raises concerns around end-to-end encryption

The court noted that the accused was caught only because Instagram detected and reported the objectionable material he had sent to a minor by direct message on a single day in October 2023. It further noted that Meta has been rolling out default end-to-end encryption on its Facebook and Messenger platforms since October 2023 and has indicated that Instagram’s direct messages will follow soon after.

“If default end-to-end encryption were enabled on Instagram, it is highly likely that no one, including Meta, would be able to detect what the defendant might send by direct message except for the recipient of that message. The same goes for Telegram, which the defendant discussed using with others, and any number of other encrypted messaging applications.”

The incident once again brings up the debate around end-to-end encryption and whether platforms should provide backdoor access to law enforcement agencies. A version of this discourse can currently be seen in the Indian courts, where the Indian government is asking the social media platform WhatsApp to “enable the identification of the first originator of the information [message]”, a requirement that essentially removes end-to-end encryption (E2EE). However, the platform, and many like it, continue to argue in favour of E2EE, stating that the removal of such a security measure would severely impact user privacy.

In 2022, Business for Social Responsibility (BSR) said in its report that message content can be scanned in an E2EE environment using one of several nascent hash-based solutions known as “client-side scanning.” However, even with these techniques, there is still a risk of abuse by governments. Moreover, hash-based solutions cannot effectively moderate nuanced content, meaning they can only be used against clearly violating content like CSAM, not dis/misinformation.
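Client-side scanning essentially moves the hash comparison described earlier onto the sender’s device, before the content is encrypted. The flow below is a conceptual sketch only; every function body is a placeholder stub, and no real messenger exposes an interface like this.

```python
from typing import Optional

def hash_matches_blocklist(image_bytes: bytes) -> bool:
    """Stand-in for an on-device perceptual-hash check like the one sketched above."""
    return False  # placeholder stub

def encrypt(image_bytes: bytes) -> bytes:
    """Placeholder for the messenger's end-to-end encryption step."""
    return image_bytes[::-1]  # not real encryption

def send_image(image_bytes: bytes) -> Optional[bytes]:
    """Scan on-device first; only content that passes the check is encrypted and sent."""
    if hash_matches_blocklist(image_bytes):
        return None  # a real system might also generate a report at this point
    return encrypt(image_bytes)
```

The privacy debate turns on exactly this step: the scan happens outside the encrypted channel, so whoever controls the blocklist controls what gets flagged.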

When asked about his views on the matter, Koorma said, “End-to-end encryption (E2EE) provides essential privacy but complicates detecting illegal content, like CSAM. A balanced approach involves creating technologies that can detect and report CSAM while respecting user privacy. Collaboration between tech companies, policymakers, and law enforcement is crucial for developing protocols that protect privacy without compromising safety. Proposals like the EARN IT Act, which holds tech companies accountable for CSAM by conditioning legal immunity on compliance with best practices, and the STOP CSAM Act, which increases transparency and accountability, aim to address these issues but raise concerns about potentially weakening encryption and security.”
