Indian AI lab Sarvam's new models are a major bet on the viability of open-source AI | TechCrunch
Summary
Indian AI lab Sarvam launches a new generation of large language models, including 30B- and 105B-parameter models, aiming to challenge foreign AI systems with open-source models tailored for local languages.
Why It Matters
This development highlights India's commitment to reducing reliance on foreign AI technologies and fostering local innovation. By focusing on open-source models, Sarvam aims to democratize access to advanced AI capabilities, particularly in Indian languages, which could significantly impact various sectors including education, customer service, and technology development.
Key Takeaways
- Sarvam's new lineup includes 30B- and 105B-parameter models, designed for efficiency and tailored to Indian languages.
- The models utilize a mixture-of-experts architecture to reduce computing costs while maintaining performance.
- Sarvam aims to compete with major players like OpenAI and Google by focusing on real-world applications rather than just model size.
- The company plans to open-source its models, promoting transparency and collaboration in AI development.
- Sarvam has secured over $50 million in funding, indicating strong investor confidence in its vision.
Indian AI lab Sarvam on Tuesday unveiled a new generation of large language models, betting that smaller, efficient open-source AI models can take market share from the more expensive systems offered by its much larger U.S. and Chinese rivals.

The launch, announced at the India AI Impact Summit in New Delhi, aligns with the Indian government's push to reduce reliance on foreign AI platforms and tailor models to local languages and use cases.

Sarvam said the new lineup includes 30-billion- and 105-billion-parameter language models, a text-to-speech model, a speech-to-text model, and a vision model for parsing documents. These mark a sharp upgrade from the 2-billion-parameter Sarvam 1 model the company released in October 2024.

The 30-billion- and 105-billion-parameter models use a mixture-of-experts architecture, which activates only a fraction of their total parameters for any given input, significantly reducing computing costs, Sarvam said (a minimal sketch of the routing idea appears at the end of this article). The 30B model supports a 32,000-token context window aimed at real-time conversational use, while the larger model offers a 128,000-token window for more complex, multi-step reasoning tasks.

Sarvam's 30B model is placed against Google's Gemma 27B and OpenAI's GPT-OSS-20B, among other models. Image Credits: Sarvam

Sarvam said the new AI models were trained from scratch rather than fine-tuned on existing open-source systems. The 30B model was pre-trained on about 16 trillion tokens of text, while the 105B model was trained on trillions of tokens s...
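Sarvam has not published the internals of its mixture-of-experts design, but the general technique is well established. The sketch below is a minimal, illustrative NumPy implementation of top-k expert routing; every size in it (d_model, d_ff, n_experts, top_k) is made up for illustration and is not drawn from Sarvam's models. It shows the mechanism behind the compute savings: each token is routed to only a couple of expert networks, so most of the layer's parameters sit idle on any single forward pass.

```python
# Minimal sketch of mixture-of-experts (MoE) top-k routing, for illustration.
# All sizes (d_model, d_ff, n_experts, top_k) are hypothetical; Sarvam has not
# published these details. The point: each token activates only top_k of
# n_experts expert networks, so most parameters are untouched per token.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256   # hypothetical hidden and feed-forward sizes
n_experts, top_k = 8, 2   # 8 experts, only 2 active per token

# Router: a linear layer that scores each expert for each token.
W_router = rng.standard_normal((d_model, n_experts)) * 0.02

# Each expert is a small two-layer feed-forward network.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token (row of x) to its top_k experts and mix their outputs."""
    logits = x @ W_router                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per-token dispatch
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                   # softmax over the chosen experts
        for w, e in zip(weights, top[t]):
            W1, W2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ W1, 0.0) @ W2)  # ReLU FFN expert
    return out

tokens = rng.standard_normal((4, d_model))  # 4 dummy token embeddings
print(moe_layer(tokens).shape)              # (4, 64)
```

With 2 of 8 experts active per token in this toy setup, only about a quarter of the expert parameters do work on any given token, which is the basic reason an MoE model can match a dense model's capacity at a fraction of the inference cost Sarvam cites.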