[2602.16898] MALLVI: a multi agent framework for integrated generalized robotics manipulation
Summary
The paper presents MALLVI, a multi-agent framework for robotic manipulation that utilizes closed-loop feedback to enhance task planning and execution based on natural language instructions and environmental images.
Why It Matters
MALLVI addresses the limitations of existing robotic manipulation approaches by integrating multiple specialized agents, improving adaptability and success rates in dynamic environments. This matters for robotics applications that demand precise manipulation and interaction with complex, changing scenes.
Key Takeaways
- MALLVI employs a multi-agent system to enhance robotic manipulation tasks.
- The framework uses closed-loop feedback for better decision-making and adaptability.
- Specialized agents handle different aspects of manipulation, improving overall efficiency.
- The approach shows increased success rates in zero-shot manipulation scenarios.
- MALLVI's design allows for targeted error detection and recovery.
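The closed-loop behavior described above (execute an atomic action, let a VLM judge the result, then retry or advance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `evaluate_with_vlm`, `closed_loop_execute`, and the callback parameters are hypothetical names standing in for the framework's actual components.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    success: bool
    feedback: str

def evaluate_with_vlm(image: bytes, goal: str) -> StepResult:
    """Hypothetical stand-in for a VLM call that judges, from a fresh
    environment image, whether the last action achieved its subgoal."""
    # A real system would query a vision-language model here.
    return StepResult(success=True, feedback="subgoal reached")

def closed_loop_execute(actions, capture_image, execute, max_retries=3):
    """Run each atomic action, then ask the VLM whether to retry or advance."""
    for action in actions:
        for _attempt in range(max_retries):
            execute(action)
            result = evaluate_with_vlm(capture_image(), action)
            if result.success:
                break  # proceed to the next atomic action
        else:
            return False  # retries exhausted: report failure upstream
    return True
```

The inner retry loop is what distinguishes this from open-loop planning: a failed action is re-attempted against fresh visual feedback instead of silently corrupting the rest of the plan.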
Computer Science > Robotics, arXiv:2602.16898 (cs)
[Submitted on 18 Feb 2026]
Title: MALLVI: a multi agent framework for integrated generalized robotics manipulation
Authors: Iman Ahmadi, Mehrshad Taji, Arad Mahdinezhad Kashani, AmirHossein Jadidi, Saina Kashani, Babak Khalaj
Abstract: Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic environments. We present MALLVi, a Multi Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVi generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVi coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector) to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection an...
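The division of labor among the agents named in the abstract can be illustrated with a toy pipeline. Every function below is a hypothetical stub (the names mirror the paper's agent roles, but the bodies and signatures are assumptions for illustration only):

```python
def decomposer(instruction: str) -> list[str]:
    """Split a natural-language instruction into ordered subtasks (stub)."""
    return [s.strip() for s in instruction.split(" then ")]

def localizer(image, subtask: str) -> tuple[float, float]:
    """Ground the subtask's target object to image coordinates (stub)."""
    return (0.5, 0.5)

def thinker(subtask: str, location: tuple[float, float]) -> str:
    """Turn a grounded subtask into an executable atomic action (stub)."""
    return f"move_to{location}; act: {subtask}"

def reflector(history: list[str]) -> bool:
    """Inspect the action history for errors; True means re-plan (stub)."""
    return False

def run_pipeline(instruction: str, image) -> list[str]:
    """Chain the agents: decompose, localize, reason, then reflect."""
    plan: list[str] = []
    for subtask in decomposer(instruction):
        loc = localizer(image, subtask)
        plan.append(thinker(subtask, loc))
        if reflector(plan):
            break  # targeted error recovery would re-enter planning here
    return plan
```

The point of the sketch is the routing, not the stubs: each stage consumes the previous stage's output, so errors can be detected and corrected at the stage where they arise rather than only at the end of execution.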