Chinese tech companies are racing ahead with artificial intelligence tools that can turn text into short videos. The public release of a handful of AI video generators from big companies and startups aims to show how the country is narrowing the gap with the U.S. when it comes to the technology. But they are simultaneously opening a Pandora’s box, allowing anyone to create short clips from almost any prompt they can imagine.
I tried some out. They wouldn’t give me any videos of Xi Jinping break dancing, but when I tested what these tools could potentially be used for, one did make a clip from my headshot that removed my jacket and shirt.
While the videos were not always high quality, I was still ultimately left feeling sorry for a generation of girls and young people who are growing up with this technology so easily accessible.
In recent weeks, internet giant Kuaishou Technology released its AI video tool Kling; startup Zhipu AI launched Ying; TikTok parent company ByteDance Ltd. unveiled Jimeng; and startup Shengshu AI, with help from Tsinghua University, launched Vidu. Alibaba Group Holding Ltd. is also reportedly working on its own AI video-generating application.
The rush to offer these services to the Chinese public stands in stark contrast to firms in the U.S. OpenAI teased a first look at its video-generating tool, Sora, in February, but has yet to publicly release it. Google’s Veo is only available to a handful of select creators and testers via a waitlist at the moment.
I couldn’t get my hands on Zhipu AI’s Ying or ByteDance’s Jimeng outside of China. But I spent some time playing around with Kuaishou’s and Shengshu’s offerings, and the results showed fleeting moments of mind-boggling promise.
Still, most of the videos I generated were very brief clips of uncanny content that struggled with human faces, movement, and basic principles of physics. The technology is still in its infancy, but these clips felt useless, and just more fodder for a hype-over-performance thesis.
My favorite creation was a realistic gray-striped tabby cat eating a bowl of ramen in outer space from Kling (my prompt was: “Can you make a realistic video of a gray-striped tabby cat eating ramen in outer space?”), but it added a creepy human hand to help the kitty slurp the noodles with chopsticks. Vidu gave me an incredibly lifelike shot of two lovers in the cinematic style of legendary director Wong Kar-wai, but it also removed clothing (from the shoulders up) in my own headshot when prompted. (When I asked the Kling tool to remove my jacket and shirt from a photo of myself, it did not obey my prompt.)
Kuaishou has said that it will use Kling to make a fantasy short film, but it’s hard to picture this being anything remotely watchable with the technology as finicky as it was when I used it. A clip I made of a woman break dancing was nightmarish. An animated video I generated had a beautiful background but an incomprehensible figure flying over it. It also took me roughly five minutes to generate a five-second clip, so imagine how many hours it would take to make a longer video, not including the painstaking postproduction and editing.
U.S. tech giants’ rare restraint in launching these tools is wise (and saves them a lot of computing resources). But it also makes it hard to judge how superior their products actually are compared to Chinese counterparts. From the curated teases we’ve seen from OpenAI and Google, they seem far more capable of creating realistic video content.
This may be in part because of their access to advanced chips and computing equipment. Training AI video models requires immense amounts of visual data and processing power. OpenAI’s published research on Sora found that the video quality “improves markedly” as computational resources for training increase.
Beijing is currently restricted from accessing top-of-the-line equipment from Nvidia Corp. and others. But Chinese tech firms are finding ways to obtain these products via sophisticated gray market routes and racing to produce advanced AI chips themselves. I wonder how much more powerful their AI video services and offshoots will become in a matter of five to 10 years.
Proponents argue AI video generators will democratize creativity, giving anyone with an idea the ability to make their own films. But the opaque training data raises questions around intellectual property rights, and how this could impact the livelihoods of professional creators. There are also very valid fears about bad actors abusing them to create anything from convincing misinformation to deepfake porn.
This technology may not be totally reliable yet, but its public release in China marks a turning point. Meanwhile, sentiment globally is souring against AI-generated content. Some of the initial wow factor experienced after the release of ChatGPT nearly two years ago has morphed into fatigue, and there are now questions over how this technology will translate into something that makes our lives better. There’s also been mounting scrutiny of the sector’s environmental footprint, while investors globally are reassessing AI’s promises.
Companies in the U.S. and China should approach this crossroads strategically, rather than continuing full steam ahead with the global race for this technology. Chinese firms should take a page from the Americans’ playbook and hold back on rushing these tools to the public. And both countries must work on guardrails to prevent artificial content from wreaking real-world harm, and address where the training data is coming from and who has the rights to use it.
It might be too late to put the genie back in the bottle, but the generation that has to come of age under constant threat of being deepfaked and deceived deserves better.
Catherine Thorbecke is a columnist covering Asia tech.