Nemotron-Personas-India: Synthesized Data for Sovereign AI
A Blog post by NVIDIA on Hugging Face
Published October 13, 2025
Authors: Kiran Praveen, Utkarsh Vaidya, Evan A, Lipika Ramaswamy, Dhruv Nathawani, Dane Corneil, Yev Meyer (NVIDIA)

A compound AI approach to Indian personas grounded in real-world distributions

Open Data for India's AI Future

India represents one of the world's largest AI opportunities, with over 700 million internet users, a multitude of languages, and a rapidly growing developer ecosystem. Yet most open datasets reflect Western norms and English-only contexts, creating a data gap that limits AI adoption in India's multilingual, multi-script environment.

Today, we're releasing Nemotron-Personas-India, the first open synthetic dataset of Indic personas aligned to India's real-world demographic, geographic, and cultural distributions. Licensed under CC BY 4.0, the dataset offers a privacy-preserving, regulation-ready foundation for scaling AI systems that reflect Indian society, all without relying on sensitive personal data.

Built with NeMo Data Designer, NVIDIA's enterprise-grade synthetic data generation microservice, Nemotron-Personas-India extends our global collection of Sovereign AI datasets. It builds on the success of our US and Japan persona datasets and includes new features designed specif...
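The post describes the personas as aligned to real-world demographic, geographic, and cultural distributions, but this excerpt does not spell out the sampling procedure. The following is therefore only a minimal Python sketch of the general idea of distribution-aligned attribute sampling. It assumes that each persona starts from structured attributes drawn in proportion to target distributions. Every name and weight in it (`STATE_WEIGHTS`, `LANGUAGE_GIVEN_STATE`, `sample_persona`, and so on) is hypothetical, and the numbers are placeholders. They are not census figures and not the values NVIDIA used.

```python
import random
from collections import Counter

# PLACEHOLDER weights for illustration only: not real census statistics and
# not the distributions used to build Nemotron-Personas-India.
STATE_WEIGHTS = {"Tamil Nadu": 0.4, "West Bengal": 0.35, "Maharashtra": 0.25}

# Hypothetical language distribution conditioned on state (placeholder values).
LANGUAGE_GIVEN_STATE = {
    "Tamil Nadu": {"Tamil": 0.85, "English": 0.15},
    "West Bengal": {"Bengali": 0.8, "Hindi": 0.2},
    "Maharashtra": {"Marathi": 0.6, "Hindi": 0.3, "English": 0.1},
}

# Hypothetical urban/rural split, sampled independently of state in this sketch.
SETTING_WEIGHTS = {"urban": 0.35, "rural": 0.65}


def weighted_choice(rng: random.Random, dist: dict[str, float]) -> str:
    """Pick one key of `dist` with probability proportional to its weight."""
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


def sample_persona(rng: random.Random) -> dict[str, str]:
    """Sample the structured attributes of one hypothetical persona.

    State is drawn first; language is then drawn from the distribution for
    that state, so language stays consistent with geography instead of being
    sampled independently.
    """
    state = weighted_choice(rng, STATE_WEIGHTS)
    return {
        "state": state,
        "language": weighted_choice(rng, LANGUAGE_GIVEN_STATE[state]),
        "setting": weighted_choice(rng, SETTING_WEIGHTS),
    }


def empirical_share(personas: list[dict[str, str]], attribute: str) -> dict[str, float]:
    """Fraction of sampled personas that take each value of `attribute`."""
    counts = Counter(p[attribute] for p in personas)
    return {value: n / len(personas) for value, n in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    personas = [sample_persona(rng) for _ in range(20_000)]
    print(empirical_share(personas, "state"))
    print(empirical_share(personas, "setting"))
```

Drawing language conditional on state, rather than independently, keeps attribute combinations coherent. A check such as `empirical_share` then confirms that a large sample tracks the target shares before any narrative text is generated for each persona.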