Gladia launches Solaria, an AI-based multilingual speech recognition model for speech-to-text transcription

Gladia, an AI transcription and audio intelligence provider, launched Solaria, a next-gen automatic speech recognition (ASR) model designed to redefine real-time communications for call centers and other voice-first platforms.

Solaria now empowers businesses to enhance and expand their customer service operations with AI-powered voice technology that delivers unmatched language coverage—supporting 40+ languages previously inaccessible with other solutions—without compromising quality or speed.

While outsourcing has long been a cost reduction strategy in the call center industry, businesses now face a new, critical challenge: providing seamless, multilingual support at scale. With 49% of global executives reporting financial losses due to language barriers, the demand for scalable, high-quality multilingual solutions has never been greater.

“We’ve seen in the market a huge surge in voice AI. It’s like voice is part of our life again, and we are introducing a new product called Solaria, which is a model that is real time with advanced capabilities,” said Jean-Louis Queguiner, CEO of Gladia, in an interview with GamesBeat. “And it’s going to be the fastest on the market, and the most accurate in the market, covering 100 languages.”

The product also has features like real-time sentiment analysis and real-time translation, he said. It handles speech-to-text transcription and translation. Doing this in real time matters for voice agents and call centers, where someone may have to answer a question that comes in a different language.

Solaria: An enterprise-ready model for global customer experience

Solaria is a speech-to-text (STT) engine built for global scalability. Solaria was designed to meet the demands of today’s contact centers, where both AI automation and human agents need high-accuracy, low-latency, and real-time multilingual support to succeed.

The model achieves industry-leading results in speech recognition, delivering both accuracy and fast processing speed. Recent benchmarks show Solaria has reached an unmatched 94% Word Accuracy Rate (WAR) average in English, Spanish, French and other common languages, while maintaining an ultra-low latency of 270 milliseconds, making the conversation feel natural and responsive.
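For readers unfamiliar with the metric, word accuracy is commonly reported as one minus the word error rate (WER), where WER counts the word-level substitutions, deletions, and insertions needed to turn the transcript into a reference text, divided by the reference length. The article does not describe how Gladia ran its benchmarks, so the following is only a minimal sketch of that common definition, not Gladia's methodology; the function names and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """Word accuracy, assumed here to be 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Hypothetical example: one wrong word out of seven.
reference = "please transfer me to the billing department"
hypothesis = "please transfer me to a billing department"
print(f"{word_accuracy_rate(reference, hypothesis):.1%}")
```

Under this definition, a 94% average corresponds to roughly six word errors per 100 reference words.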

While real-time speech-to-text is often measured by speed alone, accuracy and language coverage are equally crucial for businesses providing seamless services across regions.

Unlike other speech-to-text models that prioritize speed over usability, Solaria balances industry-leading accuracy and speed with unmatched language coverage: 100 languages in total, with exclusive support for 42 languages not matched by competitors. For high-population markets and key outsourcing hubs like Bangladesh, India, and the Philippines, Solaria now offers native-level accuracy in regional languages.

With native-level transcription, real-time code-switching, and translation across all supported languages, businesses can expand into global markets without constraints.

Designed for enterprise-scale voice automation, Solaria delivers:

Best-in-class accuracy in high-population languages such as Tagalog, Bengali, Punjabi, Tamil, Urdu, Persian, and Marathi.

Ability to adapt the model to industry-specific terminology (like medical or financial jargon) and have it extract critical data, like names, addresses, and numerical values.

Adaptive speech processing, ensuring high accuracy in noisy call center environments.

Enterprise-grade data security, in full compliance with GDPR, HIPAA, and SOC 2.

With the addition of Solaria to its product portfolio, Gladia allows businesses to enhance customer service by improving AI-powered voice agents, making IVRs and virtual assistants more reliable across multiple languages, while also optimizing human-assisted workflows with real-time transcriptions and translations to help agents provide more effective assistance.

“Speech is the most natural way to connect with the world—for the first time, automated speech recognition is closing the divide, enabling humans and AI to truly speak the same language,” said Jean-Louis Quéguiner, CEO of Gladia, in a statement. “With Solaria, we have made a breakthrough in AI-powered voice technology that unlocks new opportunities for businesses, driving efficiency and delivering more seamless, impactful customer experiences across diverse languages and markets. Solaria is built for next-generation voice platforms ready to lead this transformation on a global scale.”

Serving more than 700 enterprise customers worldwide, including Attention, Circleback, Method Financial, and VEED.IO, Gladia delivers enterprise-grade service and scalability, backed by dedicated support and infrastructure in the U.S. and Europe, guaranteeing reliable performance for mission-critical applications. Companies looking to scale globally, optimize operational costs, and enhance customer experiences can start building with Gladia’s API today.

As part of the Solaria launch, Gladia has partnered with LiveKit, a leading open-source developer framework for real-time AI voice agents, to power real-time, multilingual translation within AI-driven applications. This gives developers global language capabilities out of the box through seamless integration with Gladia’s API.

Following its $16 million Series A round in 2024 and today’s rollout of Solaria, Gladia has taken another critical step toward establishing itself as a leading end-to-end API audio infrastructure provider—combining speech recognition, generative AI, and voice generation capabilities to help enterprise users and developers tap into the full potential of real-time audio data.

Paris-based Gladia was founded in 2022 by Jean-Louis Queguiner (ex-OVHCloud) and Jonathan Soto (ex-MIT/Sigfox). Gladia’s product has been adopted by over 150,000 users and 700 enterprise clients—including industry leaders like Attention, Circleback, Method Financial, and VEED.IO.

There is a 300-millisecond delay between the moment you start speaking and the moment you receive the first voice-activity event. Transcription then takes 100 milliseconds, so results arrive almost instantly.

To improve accuracy further, Queguiner said the company needs to train on more data and to use data augmentation to make that data more robust. The company has enterprise pricing in place but has not disclosed it yet. He said it will be among the most affordable solutions in the market.

The company has nearly 40 employees.

About the Author:

Early Bird