AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

LLMs

[2602.00185] QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities

arXiv - AI · 4 min · about 2 hours ago

LLMs

[2506.22653] URSA: The Universal Research and Scientific Agent

arXiv - AI · 3 min · about 2 hours ago

AI Agents

[2505.00472] UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces

arXiv - AI · 3 min · about 2 hours ago

All Content

Machine Learning

[2505.24205] On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks

This paper explores the expressive power of Mixture-of-Experts (MoEs) in modeling complex tasks, demonstrating their efficiency in approx...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2505.24157] Experience-based Knowledge Correction for Robust Planning in Minecraft

The paper presents XENON, an advanced agent for robust planning in Minecraft that utilizes experience-based knowledge correction to impro...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2505.22475] Non-Asymptotic Analysis of (Sticky) Track-and-Stop

This paper presents a non-asymptotic analysis of the Sticky Track-and-Stop algorithm, extending its guarantees beyond asymptotic optimali...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.11348] AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition

The paper introduces AgentNoiseBench, a framework for evaluating the robustness of tool-using LLM agents under noisy conditions, highligh...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2505.12707] PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI

PLAICraft introduces a large-scale dataset capturing time-aligned vision, speech, and action data from multiplayer Minecraft, aimed at ad...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.02050] Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents

This article explores the role of entropy in optimizing tool-use behaviors for Large Language Model (LLM) agents, highlighting the correl...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2505.10992] ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks

The paper presents ReaCritic, a novel reasoning transformer-based critic model for deep reinforcement learning (DRL) in heterogeneous wir...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.00663] SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent

The paper presents SEISMO, a trajectory-aware LLM agent designed to enhance sample efficiency in molecular optimization, achieving signif...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2601.07611] DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning

DIAGPaper introduces a multi-agent framework for identifying and prioritizing weaknesses in scientific papers, addressing limitations of ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2601.01569] CaveAgent: Transforming LLMs into Stateful Runtime Operators

CaveAgent introduces a novel framework that transforms LLMs into stateful runtime operators, enhancing their ability to manage complex ta...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2510.18318] Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning

The paper presents Earth AI, a novel approach to geospatial analysis using foundation models and cross-modal reasoning to derive insights...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2502.07274] Forget Forgetting: Continual Learning in a World of Abundant Memory

The paper explores continual learning (CL) in AI, proposing a shift from minimizing memory usage to leveraging abundant memory while addr...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2509.00074] Language and Experience: A Computational Model of Social Learning in Complex Tasks

This article presents a computational model that explores how humans and AI can integrate linguistic guidance and direct experience for e...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2409.04332] Amortized Bayesian Workflow

The paper presents an Amortized Bayesian Workflow that combines fast amortized inference with accurate MCMC techniques, optimizing Bayesi...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2411.04760] Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks

This paper presents novel domain adaptation methods for Spiking Neural Networks (SNNs) to address performance drops due to mismatched tem...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2503.18825] EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents

The paper presents evaluation methods for assessing the economic decision-making capabilities of LLMs, focusing on benchmarks and litmus ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2503.10265] SurgRAW: Multi-Agent Workflow with Chain of Thought Reasoning for Robotic Surgical Video Analysis

The article presents SurgRAW, a multi-agent workflow utilizing Chain of Thought reasoning for enhanced robotic surgical video analysis, a...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.16708] Policy Compiler for Secure Agentic Systems

The article presents PCAS, a Policy Compiler designed to enforce complex authorization policies in LLM-based agents, improving compliance...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.16699] Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents

The paper presents a framework called Calibrate-Then-Act (CTA) that enables LLMs to optimize decision-making by balancing cost and uncert...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.16671] SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

The SPARC framework enhances automated C unit test generation by bridging the gap between program intent and syntactic constraints, impro...

arXiv - AI · 4 min · about 2 months ago
