Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

Hugging Face Blog 4 min read

About this article

A Blog post by NVIDIA on Hugging Face

Published August 4, 2025
Jay Rodge (jayrodge), NVIDIA
Contributors: David Austin, Raja Biswas, Gilberto Titericz Junior, NVIDIA

NVIDIA’s AI-Q Blueprint, the leading portable, open deep research agent, recently climbed to the top of the Hugging Face “LLM with Search” leaderboard on DeepResearch Bench. This is a significant step forward for the open-source AI stack, proving that developer-accessible models can power advanced agentic workflows that rival or surpass closed alternatives.

What sets AI-Q apart? It fuses two high-performance open LLMs, Llama 3.3-70B Instruct and Llama-3.3-Nemotron-Super-49B-v1.5, to orchestrate long-context retrieval, agentic reasoning, and robust synthesis.

Core Stack: Model Choices and Technical Innovations

Llama 3.3-70B Instruct: The foundation for fluent, structured report generation, derived from Meta’s Llama series and open-licensed for unrestricted deployment.

Llama-3.3-Nemotron-Super-49B-v1.5: An optimized, reasoning-focused variant. Built via Neural Architecture Search (NAS), knowledge distillation, and successive rounds of supervised and reinforcement learning, it excels at multi-step reasoning, query planning, tool use, and reflection, all with a reduced memory footprint for efficient deployment on standard GPUs.

The AI-Q reference example also includes:

NVIDIA NeMo Retriever for scalable, multimodal search (internal+external)...
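The division of labor described above (a reasoning model for query planning, an instruct model for report writing, and a retriever for search) can be pictured as a small pipeline. The sketch below is an illustration under stated assumptions, not the AI-Q Blueprint's actual code: the `ResearchPipeline` class, its method names, the prompts, and the stub components are all hypothetical, and each model is represented by a plain callable so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases: an LLM is any callable mapping a prompt to text,
# and a retriever maps a search query to a list of text passages.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


@dataclass
class ResearchPipeline:
    reasoner: LLM        # stands in for the reasoning-focused model
    writer: LLM          # stands in for the report-generation model
    retrieve: Retriever  # stands in for a search backend

    def plan(self, question: str) -> List[str]:
        # Ask the reasoning model for search queries, one per line.
        raw = self.reasoner(f"List search queries, one per line, for: {question}")
        return [line.strip() for line in raw.splitlines() if line.strip()]

    def run(self, question: str) -> str:
        # Plan queries, gather passages for each, then hand everything
        # to the writer model for synthesis.
        queries = self.plan(question)
        passages = [p for q in queries for p in self.retrieve(q)]
        context = "\n".join(passages)
        return self.writer(
            f"Question: {question}\nSources:\n{context}\nWrite a report."
        )


# Stub components (assumptions for illustration only).
def stub_reasoner(prompt: str) -> str:
    return "query one\nquery two\n"


def stub_retriever(query: str) -> List[str]:
    return [f"passage for {query}"]


def stub_writer(prompt: str) -> str:
    return f"REPORT based on {prompt.count('passage for')} passages"


pipeline = ResearchPipeline(stub_reasoner, stub_writer, stub_retriever)
report = pipeline.run("What is DeepResearch Bench?")
print(report)
```

Passing the models in as callables keeps the planning and writing roles separate, so either stub could be swapped for a real model client without touching the pipeline logic.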

Originally published on February 15, 2026. Curated by AI News.
