
Vidu: China Reveals Revolutionary Text-to-Video Generator to Rival OpenAI’s Sora

With the unveiling of this text-to-video generator, Shengshu Technology and Tsinghua University have demonstrated their commitment to pushing the boundaries of AI technology.

This partnership highlights the growing importance of AI research and development in China and its potential impact on various industries worldwide. 


China’s Next Step in AI Innovation

Shengshu Technology and Tsinghua University’s joint venture, Vidu, represents a significant milestone in China’s AI innovation journey.

This collaboration brings together the expertise of a tech startup and an esteemed academic institution to create a cutting-edge text-to-video generator. 

With Vidu’s unveiling at the Zhongguancun Forum in Beijing, it has garnered attention as a noteworthy competitor to OpenAI’s Sora.

While Sora can generate videos up to 60 seconds long, Interesting Engineering reported that Vidu lets users generate shorter, high-definition 16-second clips with a single click. 

While Vidu’s functionality may seem limited compared to Sora, its introduction marks a significant step forward in China’s AI technology landscape.

As the country continues to invest in AI research and development, Vidu exemplifies China’s commitment to innovation and technological advancement.

Zhu Jun, the chief scientist at Shengshu and deputy dean at Tsinghua’s Institute for AI, described Vidu as a significant advancement in self-reliant innovation, boasting breakthroughs in various domains.

Vidu is characterized by its imaginative capabilities, ability to simulate the physical world, and capacity to generate 16-second videos with consistent characters, scenes, and timelines.

Furthermore, Zhu highlighted Vidu’s proficiency in understanding “Chinese elements.” During the model’s debut, Shengshu Technology presented several demonstrations, including scenarios such as a panda playing a guitar on grass and a puppy swimming in a pool.

Advancements in Vidu’s Architectural Framework

Vidu is built on a proprietary visual transformer architecture called the Universal Vision Transformer (U-ViT). Its developers say the architecture combines two AI techniques: diffusion models and transformers. 

Furthermore, this architectural framework facilitates the creation of lifelike videos featuring dynamic camera movements, intricate facial expressions, and authentic lighting and shadow effects.
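Vidu's internals are not public, but the pairing the developers describe, a diffusion process whose noise predictor is a transformer, can be illustrated with a toy sketch. Everything below (dimensions, weights, schedule) is an illustrative assumption, not Vidu's actual design: a tiny single-head self-attention block stands in for the transformer denoiser inside a standard DDPM-style reverse loop.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence of patch tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def denoise_step(x_t, t, alphas_cumprod, weights, rng):
    """One DDPM-style reverse step, using the attention block as the
    noise predictor. Purely illustrative; a real model is far larger."""
    eps_hat = self_attention(x_t, *weights)          # predicted noise
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
    # Estimate the clean sample, then re-noise it to level t-1.
    x0_hat = (x_t - np.sqrt(1 - a_t) * eps_hat) / np.sqrt(a_t)
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1 - a_prev) * noise

rng = np.random.default_rng(0)
tokens, dim, steps = 16, 8, 10           # 16 "video patch" tokens
alphas_cumprod = np.cumprod(1 - np.linspace(1e-3, 0.05, steps))
weights = tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

x = rng.standard_normal((tokens, dim))   # start from pure noise
for t in reversed(range(steps)):
    x = denoise_step(x, t, alphas_cumprod, weights, rng)
print(x.shape)
```

The key design point the U-ViT line of work makes is that the denoiser operates on a sequence of tokens rather than a convolutional grid, which is why a transformer slots in naturally.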

Zhu noted that the introduction of Sora resonated with their technical direction, intensifying their resolve to continue their research efforts.


Unlike the many Chinese counterparts to OpenAI’s ChatGPT that emerged soon after its launch in November 2022, Chinese competitors have only recently begun to catch up to Sora’s capabilities.

Industry experts attribute this delay largely to Chinese companies’ limited access to computing power.

According to Li Yangwei, a Beijing-based technical consultant specializing in intelligent computing, running Sora requires eight NVIDIA A100 graphics processing units (GPUs) for over three hours to generate a one-minute video clip.

Li notes that Sora demands extensive computing power for inference.
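Li’s figure implies a simple back-of-the-envelope calculation of inference cost per second of generated footage. The helper below just encodes that arithmetic; the function name and parameters are illustrative, not from the article.

```python
def gpu_hours_per_video_second(num_gpus: int, wall_hours: float,
                               video_seconds: float) -> float:
    """GPU-hours of inference compute per second of generated footage."""
    return num_gpus * wall_hours / video_seconds

# Li Yangwei's estimate: 8 A100s running over 3 hours for a 60 s clip.
per_second = gpu_hours_per_video_second(8, 3.0, 60.0)
print(per_second)  # 0.4 GPU-hours per second of video
```

At that rate, even a 16-second clip like Vidu’s would represent several GPU-hours of inference, which is why compute access is the bottleneck the experts point to.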


Written by Inno Flores

ⓒ 2024 All rights reserved. Do not reproduce without permission.
