Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
Hugging Face today has released SmolLM2, a new family of compact language models that achieve impressive performance while requiring far fewer computational resources than their larger counterparts.
The new models, released under the Apache 2.0 license, come in three sizes — 135M, 360M and 1.7B parameters — making them suitable for deployment on smartphones and other edge devices where processing power and memory are limited. Most notably, the 1.7B parameter version outperforms Meta’s Llama 1B model on several key benchmarks.
Performance comparison shows SmolLM2-1B outperforming larger rival models on most cognitive benchmarks, with particularly strong results in science reasoning and commonsense tasks. Credit: Hugging Face
Small models pack a powerful punch in AI performance tests
“SmolLM2 demonstrates significant advances over its predecessor, particularly in instruction following, knowledge, reasoning and mathematics,” according to Hugging Face’s model documentation. The largest variant was trained on 11 trillion tokens using a diverse dataset combination including FineWeb-Edu and specialized mathematics and coding datasets.
This development comes at a crucial time when the AI industry is grappling with the computational demands of running large language models (LLMs). While companies like OpenAI and Anthropic push the boundaries with increasingly massive models, there’s growing recognition of the need for efficient, lightweight AI that can run locally on devices.
The push for bigger AI models has left many potential users behind. Running these models requires expensive cloud computing services, which come with their own problems: slow response times, data privacy risks and high costs that small companies and independent developers simply can’t afford. SmolLM2 offers a different approach by bringing powerful AI capabilities directly to personal devices, pointing toward a future where advanced AI tools are within reach of more users and companies, not just tech giants with massive data centers.
A comparison of AI language models shows SmolLM2’s superior efficiency, achieving higher performance scores with fewer parameters than larger rivals like Llama3.2 and Gemma, where the horizontal axis represents the model size and the vertical axis shows accuracy on benchmark tests. Credit: Hugging Face
Edge computing gets a boost as AI moves to mobile devices
SmolLM2’s performance is particularly noteworthy given its size. On the MT-Bench evaluation, which measures chat capabilities, the 1.7B model achieves a score of 6.13, competitive with much larger models. It also shows strong performance on mathematical reasoning tasks, scoring 48.2 on the GSM8K benchmark. These results challenge the conventional wisdom that bigger models are always better, suggesting that careful architecture design and training data curation may be more important than raw parameter count.
The models support a range of applications including text rewriting, summarization and function calling. Their compact size enables deployment in scenarios where privacy, latency or connectivity constraints make cloud-based AI solutions impractical. This could prove particularly valuable in healthcare, financial services and other industries where data privacy is non-negotiable.
Industry experts see this as part of a broader trend toward more efficient AI models. The ability to run sophisticated language models locally on devices could enable new applications in areas like mobile app development, IoT devices, and enterprise solutions where data privacy is paramount.
The race for efficient AI: Smaller models challenge industry giants
However, these smaller models still have limitations. According to Hugging Face’s documentation, they “primarily understand and generate content in English” and may not always produce factually accurate or logically consistent output.
The release of SmolLM2 suggests that the future of AI may not solely belong to increasingly large models, but rather to more efficient architectures that can deliver strong performance with fewer resources. This could have significant implications for democratizing AI access and reducing the environmental impact of AI deployment.
The models are available immediately through Hugging Face’s model hub, with both base and instruction-tuned versions offered for each size variant.
VB Daily
Stay in the know! Get the latest news in your inbox daily
Thanks for subscribing. Check out more VB newsletters here.
An error occured.