NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset
About this article
A blog post by NVIDIA on Hugging Face
Published August 20, 2025

Authors: Dhruv Nathawani, Shuoyang Ding, Vitaly Lavrukhin, Jane Polak Scowcroft, Oleksii Kuchaiev (NVIDIA)

NVIDIA continues to release permissive datasets in support of the open ecosystem, now with the 6 Million Multilingual Reasoning Dataset. Building on the success of the recent Nemotron Post-Training Dataset v1 release used in the Llama Nemotron Super model, and our Llama Nemotron Post-Training Dataset release earlier this year, we're excited to release the reasoning dataset translated into five target languages: French, Spanish, German, Italian, and Japanese.

The newly released NVIDIA Nemotron Nano 2 9B brings these capabilities to the edge with leading accuracy and efficiency. It combines a hybrid Transformer-Mamba architecture with a configurable thinking budget, so you can dial accuracy, throughput, and cost to match your real-world needs.

Model Highlights (TL;DR)

- Model size: 9B parameters
- Architecture: Hybrid Transformer-Mamba (Mamba-2 plus a small number of attention layers) for higher throughput at accuracy similar to Transformer-only peers
- Throughput: Up to 6x higher token generation than other leading models in its size class
- Cost: Thinking budget lets...
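As a minimal sketch of how a multilingual reasoning dataset like this could be consumed, the snippet below groups records by their language code. Note the record schema here (`language`, `prompt`, `response` fields) and the sample rows are assumptions for illustration only; in practice you would fetch the real data with the Hugging Face `datasets` library's `load_dataset` and check the dataset card for the actual field names.

```python
from collections import defaultdict

# Hypothetical records mimicking a multilingual reasoning dataset.
# The field names ("language", "prompt", "response") are assumptions
# for illustration; consult the dataset card for the real schema.
samples = [
    {"language": "fr", "prompt": "Combien font 2 + 2 ?", "response": "4"},
    {"language": "de", "prompt": "Was ist 2 + 2?", "response": "4"},
    {"language": "fr", "prompt": "Capitale de la France ?", "response": "Paris"},
]

def group_by_language(records):
    """Partition records into per-language buckets keyed by language code."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["language"]].append(rec)
    return dict(buckets)

grouped = group_by_language(samples)
print(sorted(grouped))     # language codes present: ['de', 'fr']
print(len(grouped["fr"]))  # number of French samples: 2
```

The same grouping pattern applies unchanged whether the records come from a local list, a streamed Hugging Face dataset, or a JSONL dump of one of the five translated splits.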