[2603.24596] X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
arXiv:2603.24596 (eess) [Submitted on 6 Mar 2026]

Title: X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
Authors: Di Cao, Dongjie Fu, Hai Yu, Siqi Zheng, Xu Tan, Tao Jin
Subjects: Electrical Engineering and Systems Science > Audio and Speech Processing (eess.AS)

Abstract: While the shift from cascaded dialogue systems to end-to-end (E2E) speech Large Language Models (LLMs) improves latency and paralinguistic modeling, E2E models often exhibit significant performance degradation compared to their text-based counterparts. The standard Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) training methods fail to close this gap. To address this, we propose X-OPD, a novel Cross-Modal On-Policy Distillation framework designed to systematically align the capabilities of Speech LLMs with those of their text-based counterparts. X-OPD enables the Speech LLM to explore its own distribution via on-policy rollouts; a text-based teacher model evaluates these trajectories and provides token-level feedback, effectively distilling the teacher's capabilities into the student's multi-modal representations. Extensive experiments across multiple benchmarks demonstrate that X-OPD significantly narrows the gap on complex tasks while preserving the model's inherent capabilities.
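The abstract does not specify implementation details, but the core idea of on-policy distillation with token-level teacher feedback can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the "models" are hypothetical fixed probability tables (`student_probs`, `teacher_probs` are invented names), and the per-token feedback is assumed here to be a reverse KL divergence, KL(student || teacher), evaluated at states the student itself visits.

```python
import math
import random

VOCAB = ["yes", "no", "maybe"]

# Hypothetical toy "models": each maps a prefix to a next-token distribution.
# A real speech LLM / text teacher would condition on the full context.
def student_probs(prefix):
    return {"yes": 0.5, "no": 0.3, "maybe": 0.2}

def teacher_probs(prefix):
    return {"yes": 0.7, "no": 0.2, "maybe": 0.1}

def on_policy_distill_loss(num_tokens=4, seed=0):
    """Roll out a trajectory by sampling from the STUDENT (on-policy),
    then score every visited state with the TEACHER: the per-token loss
    is the reverse KL between the two next-token distributions."""
    rng = random.Random(seed)
    prefix, total = [], 0.0
    for _ in range(num_tokens):
        p_s = student_probs(prefix)
        # On-policy exploration: the student samples from its own distribution,
        # so the teacher's feedback lands on states the student actually reaches.
        tok = rng.choices(list(p_s), weights=list(p_s.values()))[0]
        p_t = teacher_probs(prefix)
        # Token-level feedback: reverse KL, KL(student || teacher), at this state.
        kl = sum(p_s[v] * math.log(p_s[v] / p_t[v]) for v in VOCAB)
        total += kl
        prefix.append(tok)
    return total / num_tokens
```

In an actual training loop the KL term would be differentiated with respect to the student's parameters; the sketch only shows where the on-policy rollout and the token-level teacher signal enter the objective.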