Meta AI researchers announced on Thursday that they have developed Seamless Communication, a new suite of artificial intelligence models that aims to enable more natural and authentic communication across languages, bringing the concept of a Universal Speech Translator closer to reality. The models were publicly released this week along with research papers and accompanying data.
The flagship model, called Seamless, merges capabilities from three other models — SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2 — into one unified system. According to the research paper, Seamless is “the first publicly available system that unlocks expressive cross-lingual communication in real-time.”
How Seamless works as a universal real-time translator
The Seamless translator represents a new frontier in the use of AI for communication across the globe. It combines three sophisticated neural network models to enable real-time translation between over 100 spoken and written languages while preserving the vocal style, emotion, and prosody of the speaker’s voice.
SeamlessExpressive focuses on preserving the vocal style and emotional nuances of the speaker’s voice when translating between languages. As described in the paper, “Translations should capture the nuances of human expression. While existing translation tools are skilled at capturing the content within a conversation, they typically rely on monotone, robotic text-to-speech systems for their output.”
SeamlessStreaming enables near real-time translation with only about two seconds of latency. The researchers say it is the “first massively multilingual model” to deliver such fast translation speeds across nearly 100 spoken and written languages.
The third model, SeamlessM4T v2, serves as the foundation for the other two models. It is an upgraded version of the original SeamlessM4T model released earlier this year. The new architecture delivers “improved consistency between text and speech output,” according to the paper.
“In sum, Seamless gives us a pivotal look at the technical foundation needed to turn the Universal Speech Translator from a science fiction concept into a real-world technology,” the researchers wrote.
Potential to transform global communication
The models’ capabilities could enable new voice-based communication experiences, from real-time multilingual conversations using smart glasses to automatically dubbed videos and podcasts. The researchers suggest the technology could also help break down language barriers for immigrants and others who struggle to communicate in a language they do not speak fluently.
“By publicly releasing our work, we hope that researchers and developers can expand the impact of our contributions by building technologies aimed at bridging multilingual connections in an increasingly interconnected and interdependent world,” the paper states.
However, the researchers acknowledge the technology could also be misused for voice phishing scams, deepfakes and other harmful applications. To promote safety and responsible use of the models, they implemented several measures, including audio watermarking and new techniques to reduce hallucinated toxic outputs.
Models publicly released on Hugging Face
In keeping with Meta’s commitment to open research and collaboration, the Seamless Communication models have been publicly released on Hugging Face and GitHub.
The collection includes the Seamless, SeamlessExpressive, SeamlessStreaming, and SeamlessM4T v2 models along with accompanying metadata.
By making these state-of-the-art speech and text translation models freely available, Meta hopes to enable fellow researchers and developers to build upon and extend this work to help connect people across languages and cultures. The release continues Meta’s push into open source AI and gives the research community a valuable new resource.
“Overall, the multidimensional experiences Seamless may engender could lead to a step change in how machine-assisted cross-lingual communication is accomplished,” the researchers concluded.