Gaia2 and ARE: Empowering the community to study agents
Published September 22, 2025

Authors: Clémentine Fourrier (clefourrier), Grégoire Mialon (gregmialz), Maxime Lecanu (mlcu), Pierre Andrews (mortimerp9), Adrien Carreira (XciD), thibaud frere (tfrere), Avijit Ghosh (evijit), Romain Froger (RomainFroger), Dheeraj Mekala (dheeraj7596), Caroline Pascal (CarolinePascal), Ulyana Piterbarg (upiter)

In an ideal world, AI agents would be reliable assistants. Given a query, they would resolve ambiguity in the instructions, construct step-by-step plans, correctly identify the necessary resources, execute those plans without getting sidetracked, and adapt to unexpected events, all while maintaining accuracy and avoiding hallucinations. However, developing agents and testing these behaviors is no small feat: if you have ever tried to debug your own agent, you have probably noticed how tedious and frustrating it can be. Existing evaluation environments are tightly coupled to the tasks they evaluate, lack real-world flexibility, and do not reflect the messy reality of open-world agents: simulated pages never fail to load, events don't spontaneously emerge, and as...