[2603.09643] MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings


arXiv - AI 4 min read

About this article


Category: Computer Science > Emerging Technologies
arXiv:2603.09643 (cs) [Submitted on 10 Mar 2026 (v1), last revised 7 Apr 2026 (this version, v4)]
Authors: Anupam Purwar, Aditya Choudhary

Abstract: Current evaluation frameworks and benchmarks for LLM-powered agents focus on text-chat-driven agents; these frameworks do not expose the user's persona to the agent and thus operate in a user-agnostic environment. Importantly, in the customer experience management domain, the agent's behaviour evolves as the agent learns about the user's personality. With the proliferation of real-time TTS and multi-modal language models, LLM-based agents will gradually become multi-modal. Towards this, we propose the MM-tau-p$^2$ benchmark, with metrics for evaluating the robustness of multi-modal agents in a dual-control setting with and without persona adaptation of the user, while also incorporating user inputs into the planning process to resolve a user query. In particular, our work shows that even with state-of-the-art frontier LLMs such as GPT-5 and GPT-4.1, introducing multi-modality into LLM-based agents raises additional considerations, measured using metrics such as multi-modal robustness and turn overhead. Ove...

Originally published on April 08, 2026. Curated by AI News.

