The short-video platform, which has over 600 million active users, announced the new tool on June 6. It’s called Kling. Like OpenAI’s Sora model, Kling is able to generate videos “up to two minutes long with a frame rate of 30fps and video resolution up to 1080p,” the company says on its website.
But unlike Sora, which remains inaccessible to the public four months after OpenAI trialed it, Kling was soon opened up for people to try themselves.
I was one of them. I got access to it after downloading Kuaishou’s video-editing tool, signing up with a Chinese phone number, getting on a waitlist, and filling out an additional form through Kuaishou’s user feedback groups. The model can’t process prompts written entirely in English, but you can get around that by either translating the phrase you want to use into Chinese or including one or two Chinese words.
So, first things first. Here are a few results I generated with Kling to show you what it’s like. Remember Sora’s impressive demo video of Tokyo’s street scenes or the cat darting through a garden? Here are Kling’s takes:
Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.
ZEYI YANG/MIT TECHNOLOGY REVIEW | KLING
Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
Prompt: A white and orange tabby cat is seen happily darting through a dense garden, as if chasing something. Its eyes are wide and happy as it jogs forward, scanning the branches, flowers, and leaves as it walks. The path is narrow as it makes its way between all the plants. The scene is captured from a ground-level angle, following the cat closely, giving a low and intimate perspective. The image is cinematic with warm tones and a grainy texture. The scattered daylight between the leaves and plants above creates a warm contrast, accentuating the cat’s orange fur. The shot is clear and sharp, with a shallow depth of field.
Remember the image of DALL-E’s horse-riding astronaut? I asked Kling to generate a video version too.
Prompt: An astronaut riding a horse in space.
There are a few things worth applauding here. None of these videos deviates from the prompt much, and the physics seem right—the panning of the camera, the ruffling leaves, and the way the horse and astronaut turn, showing Earth behind them. The generation process took around three minutes for each of them. Not the fastest, but totally acceptable.
But there are obvious shortcomings, too. The videos, while 720p in format, seem blurry and grainy; sometimes Kling ignores a major request in the prompt; and most important, all videos are currently capped at five seconds, which makes them far less dynamic or complex.
However, it’s not really fair to compare these results with things like Sora’s demos, which are hand-picked by OpenAI to release to the public and probably represent better-than-average results. These Kling videos are from the first attempts I had with each prompt, and I rarely included prompt-engineering keywords like “8k, photorealism” to fine-tune the results.