
After Google’s Veo 3 and OpenAI’s Sora showed the world how AI can generate hyperrealistic videos from text prompts, Chinese AI companies seem to be catching up. Chinese search giant Baidu recently launched its first video generation model, MuseSteamer, which is the first AI video generation tool to produce videos with synchronised Chinese audio.
The model allows users to generate visuals, sound effects, and spoken Chinese dialogue simultaneously. Reportedly, this benefits advertisers, marketers, and anyone who wants to make high-quality videos without spending millions on production or working through extended timelines. MuseSteamer is essentially a business-only AI tool that turns images into short videos. Baidu has also upgraded its search offerings, making them smarter, multimodal, and more personalised.
MuseSteamer is a vision language model (VLM), a type of AI model that combines the capabilities of computer vision and natural language processing. VLMs allow machines to understand and process information from both images and text, and to perform tasks that require a combined understanding of visual and textual data.
MuseSteamer can create 10-second clips in 1080p resolution with fully synced visuals, spoken dialogue, and sound effects. Those who have tried the model seem to be raving about its outputs. Here are some video samples shared by X users.
Baidu just launched something amazing today🚀
Introducing MuseSteamer – the first AI tool that can make full videos with Chinese voice, sound, and visuals, all perfectly synced.
You don’t need to record voice or add sound later – the AI does it all for you in one go!
Why it’s… pic.twitter.com/4PRN0CyyTM
— Chidanand Tripathi (@thetripathi58) July 2, 2025
Say hello to MuseSteamer — Baidu’s latest AI breakthrough! 🚨
Unveiled today, this cutting-edge tool generates complete videos in Chinese, with perfectly synced visuals, voice & sound effects — all in one go!
No need for separate voiceovers or post-editing — the AI handles… pic.twitter.com/gDancrK1Cd
— kamran Hassan (@Rana_kamran43) July 2, 2025
🚨 Baidu just launched the world’s first video model capable of generating videos with Chinese audio simultaneously
It’s called MuseSteamer, and creators are already using it to generate Chinese videos end-to-end
This is a massive leap for Chinese-language content creation. pic.twitter.com/soye3k5dz1
— Shruti (@heyshrutimishra) July 2, 2025
Guess what Baidu just dropped?
MuseSteamer—the world’s first video model that generates videos with perfectly synced Chinese audio.
This breakthrough changes the game for creators, marketers, and advertisers by enabling the synchronized generation of visuals, sound effects, and… pic.twitter.com/qPWk3rjTRi
— Parul Gautam (@Parul_Gautam7) July 2, 2025
🔥This New AI Builds Chinese Videos From Start to Finish — No Editing Needed
At today’s AI Day, Baidu introduced a powerful new way to create videos using AI.
It launched MuseSteamer, the world’s first video model that can generate videos with native Chinese audio, perfectly… pic.twitter.com/Q6opvO1FFt
— Markandey Sharma (@TechByMarkandey) July 2, 2025
The AI model is available in three tiers (Turbo, Pro, and Lite), all of which are aimed at enterprise users. While Google’s Veo 3 and OpenAI’s Sora are consumer-centric video models, MuseSteamer has been designed for businesses. The latest advancement from Baidu has intensified the generative AI race in China, where players such as ByteDance, Tencent, and Alibaba are already making rapid strides.
In May, at Google I/O, the Alphabet-owned company introduced its AI video generation model, Veo 3, which has been lauded for its hyperrealistic videos. With its latest offering, Baidu seems to be aiming to outpace giants like Google, OpenAI, and even Runway in this segment.