All the Bad Things That Can Happen When You Generate a Sora Video

First chance I got, I downloaded the Sora app. I uploaded images of my face—the one my children kiss at bedtime—and my voice—the voice I use to tell my wife I love her—and added them to my Sora profile. I did all this so I could use Sora’s “Cameo” feature to make an idiotic video of my AI self being shot with paintballs by 100 elderly nursing home residents.

What did I just do? The Sora app is powered by Sora 2, an AI model—and a rather breathtaking one, to be honest. It can create videos that run the gamut of quality from banal to profoundly satanic. It is a black hole of energy and data, and also a distributor of highly questionable content. Like so many things these days, using Sora feels like a slightly naughty thing to do, even if you don’t know exactly why.

So if you just generated a Sora video, here’s all the bad news. By reading this, you’re asking to feel a little dirty and guilty, and your wish is my command.

Here’s how much electricity you just used

One Sora video uses something like 90 watt-hours of electricity, according to CNET. That number is an educated guess drawn from a Hugging Face study of the energy use of GPUs.

OpenAI hasn’t actually published the numbers needed for such an estimate, so Sora’s energy footprint has to be inferred from similar models. Sasha Luccioni, one of the Hugging Face researchers behind that work, isn’t happy with estimates like the one above, by the way. She told MIT Technology Review, “We should stop trying to reverse-engineer numbers based on hearsay,” and says we should pressure companies like OpenAI to release accurate data.

At any rate, different journalists have produced different estimates from the Hugging Face data. The Wall Street Journal, for instance, guessed somewhere between 20 and 100 watt-hours.

CNET analogizes its estimate to running a 65-inch TV for 37 minutes. The Journal compares a Sora generation to cooking a steak from raw to rare on an electric outdoor grill (because such a thing exists, apparently).
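If you want to gut-check that TV comparison, the implied wattage falls out in one line. This is my own back-of-the-envelope arithmetic, not CNET’s published working; the inputs are just the estimates quoted above:

```python
# Sanity-checking CNET's analogy: if one ~90 Wh Sora video equals
# 37 minutes of TV time, what wattage does that imply for the TV?
# (My arithmetic; the inputs are CNET's estimates quoted above.)

video_wh = 90        # CNET's per-video energy estimate, in watt-hours
tv_minutes = 37      # CNET's TV-time analogy

implied_watts = video_wh / (tv_minutes / 60)
print(f"Implied TV draw: {implied_watts:.0f} W")  # ~146 W, plausible for a 65-inch set
```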

It’s worth clarifying a couple of things about this energy use issue, in the interest of making you feel even worse. First of all, what I just outlined is the energy expenditure from inference—that is, from running the model in response to a prompt. The actual training of the Sora model required some unknown, but certainly astronomical, amount of electricity. Training the GPT-4 LLM required an estimated 50 gigawatt-hours—reportedly enough to power San Francisco for 72 hours. Sora, being a video model, took more than that, but how much more is unknown.

Viewed in a certain way, you assume a share of that unknown cost when you choose to use the model, before you even generate a video.

Secondly, separating inference from training is important in another way when trying to figure out how much eco-guilt to feel (Are you sorry you asked yet?). You can try to abstract away the high energy cost as something that already happened—like how the cow in your burger died weeks ago, and you can’t un-kill it by ordering a Beyond patty when you’ve already sat down in the restaurant. In that sense, running any cloud-based AI model is more like ordering surf and turf. The “cow” of all that training data may already be dead. But the “lobster” of your specific prompt is still alive until you send your prompt to the “kitchen” that is the data center where inference happens.

Here’s how much water you just used

We’re about to do more guesstimating, sorry. Data centers use large amounts of water for cooling—either in closed-loop systems or through evaporation. You don’t get to know which data center, or data centers, were involved in making that video of your friend as an American Idol contestant farting the song “Camptown Races.”

But it’s still probably more water than you’re comfortable with. OpenAI CEO Sam Altman claims that a single text ChatGPT query consumes “roughly one fifteenth of a teaspoon,” and CNET estimates that a video costs about 2,000 times as much energy as a text generation. Assuming water use scales the same way, a back-of-the-envelope scribble of an answer might be 0.17 gallons, or about 22 fluid ounces—a little more than a plastic bottle of Coke.
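Here’s that scribble written out, so you can see how little it takes to get there. A rough sketch only: it assumes Altman’s per-query figure is accurate and that water use tracks CNET’s 2,000x energy ratio:

```python
# Back-of-the-envelope water estimate for one Sora video.
# Assumptions: Altman's 1/15 teaspoon per text query is right, and
# water use scales with energy use (CNET's ~2,000x video-to-text ratio).

TSP_PER_FL_OZ = 6        # 1 US fluid ounce = 6 teaspoons
FL_OZ_PER_GAL = 128      # 1 US gallon = 128 fluid ounces

per_query_tsp = 1 / 15   # Altman's claim for one text query
video_ratio = 2000       # CNET's energy multiplier, video vs. text

video_fl_oz = per_query_tsp * video_ratio / TSP_PER_FL_OZ
print(f"{video_fl_oz:.1f} fl oz")                  # ~22.2 fluid ounces
print(f"{video_fl_oz / FL_OZ_PER_GAL:.2f} gal")    # ~0.17 gallons
```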

And that’s if you take Altman at face value. It could easily be more. Plus, the same considerations about the cost of training versus the cost of inference that applied to energy use apply here as well. Using Sora, in other words, is not a water-wise choice.

There’s a slight chance someone might make a truly hideous deepfake of you.

Sora’s Cameo privacy settings are robust—as long as you’re aware of them, and avail yourself of them. The settings under “Who can use this” more or less protect your likeness from being a plaything for the public, as long as you don’t choose the setting “Everyone,” which means anyone can make Sora videos of you. 

Even if you are reckless enough to have a publicly available Cameo, you have some added control in the “Cameo preferences” tab, like the ability to describe, in words, how you should appear in videos. You can write whatever you want here, like “lean, toned, and athletic” perhaps, or “always picking my nose.” And you also get to set rules about what you should never be shown doing. If you keep kosher, for instance, you can say you should never be shown eating bacon.

And even if you don’t allow anyone else to use your Cameo, you can take some comfort in those open-ended guardrails, which still apply to the videos you make of yourself.

But the general content guardrails in Sora aren’t perfect. According to OpenAI’s own model card for Sora, if someone prompts hard enough, an offensive video can slip through the cracks.

The card lays out success rates for various kinds of content filters in the 95%-98% range. Flip those numbers around, though, and you get a 1.6% chance of a sexual deepfake, a 4.9% chance of a video with violence and/or gore, a 4.48% chance of something called “violative political persuasion,” and a 3.18% chance of extremism or hate. These rates were calculated from “thousands of adversarial prompts gathered through targeted red-teaming”—in other words, from intentionally trying to break the guardrails with rule-breaking prompts.
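For the curious, those odds are nothing fancier than the model card’s success rates subtracted from 100%. A minimal sketch (the success rates here are reconstructed from the failure figures above, not copied from the card):

```python
# Failure odds are the Sora 2 model card's filter success rates
# subtracted from 100%. The success rates below are reconstructed
# from the failure figures quoted in this article; they describe
# adversarial red-teaming prompts, not typical everyday use.

success_rates = {
    "sexual deepfake": 98.4,
    "violence and/or gore": 95.1,
    "violative political persuasion": 95.52,
    "extremism or hate": 96.82,
}

for category, success_pct in success_rates.items():
    print(f"{category}: {100 - success_pct:.2f}% failure rate")
```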

So the odds are not good of someone making a sexual or violent deepfake of you, but OpenAI (probably wisely) never said never.

Someone might make a video where you touch poop.

In my tests, Sora’s content filters generally worked as advertised, and I was never able to reproduce the failures the model card describes. Then again, I didn’t painstakingly craft 100 different prompts trying to trick Sora into generating sexual content. If you prompt it for a Cameo of yourself naked, you get the message “Content Violation” in place of your video.

However, some potentially objectionable content is so weakly policed as to be completely unfiltered. Specifically, Sora is seemingly unconcerned about scatological content, and will generate material of that sort without any guardrails, as long as it doesn’t violate other content policies like the ones around sexuality and nudity.

So yes, in my tests, Sora generated Cameo videos of a person interacting with poop, including scooping turds out of a toilet with their bare hands. I’m not going to embed the videos here as a demonstration for obvious reasons, but you can test it for yourself. It didn’t take any trickery or prompt engineering whatsoever. 

In my experience, past AI image generators, including Bing’s version of OpenAI’s DALL-E, have had measures in place to prevent this sort of thing, but that filter appears to be gone in the Sora app. I don’t think that’s necessarily a scandal, but it’s nasty!

Gizmodo asked OpenAI to comment on this, and will update if we hear back. 

Your funny video might be someone else’s viral hoax. 

Sora 2 has unlocked a vast universe of hoaxes. You, a sharp, internet-savvy content consumer, would never believe that anything like one recent viral video could be real. It shows spontaneous-looking footage seemingly shot from outside the White House. In audio that sounds like an overheard phone conversation, an AI-generated Donald Trump tells some unknown party not to release the Epstein files, and screams, “Just don’t let ’em get out. If I go down, I will bring all of you down with me.”

Judging from Instagram comments alone, some people seemed to believe this was real. 

The creator of the viral video never claimed it was real, telling Snopes, which confirmed it was made with Sora, that the video is “fully AI-generated” and was created “solely for artistic experimentation and social commentary.” A likely story. It was pretty clearly made for clout and social media visibility.

But if you post videos publicly on Sora, other users can download them and do whatever they want with them—and that includes posting them on other social networks and pretending they’re real. OpenAI very consciously made Sora into a place where users can doomscroll into infinity. Once you put a piece of content in a place like that, context no longer matters, and you have no way of controlling what happens to it next. 
