AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

LLMs

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MAR...

Reddit - Artificial Intelligence · 1 min ·
LLMs

What I learned about multi-agent coordination running 9 specialized Claude agents

I've been experimenting with multi-agent AI systems and ended up building something more ambitious than I originally planned: a fully ope...

Reddit - Artificial Intelligence · 1 min ·
Robotics

What happens when AI agents can earn and spend real money? I built a small test to find out

I've been sitting with a question for a while: what happens when AI agents aren't just tools to be used, but participants in an economy? ...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.21340] HiPPO Zoo: Explicit Memory Mechanisms for Interpretable State Space Models
Machine Learning

The paper introduces the HiPPO Zoo, a framework enhancing state space models with explicit memory mechanisms for improved interpretabilit...

arXiv - Machine Learning · 4 min ·
[2602.21328] Efficient Opportunistic Approachability
Machine Learning

This paper presents an efficient algorithm for opportunistic approachability, improving upon previous methods by achieving faster approac...

arXiv - Machine Learning · 3 min ·
[2602.21320] Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
LLMs

The paper presents Tool-R0, a framework for training self-evolving LLM agents capable of tool-learning without prior data, showcasing sig...

arXiv - Machine Learning · 4 min ·
[2602.21319] Uncertainty-Aware Diffusion Model for Multimodal Highway Trajectory Prediction via DDIM Sampling
Machine Learning

The paper presents cVMDx, an advanced diffusion model for multimodal highway trajectory prediction, enhancing efficiency and accuracy in ...

arXiv - Machine Learning · 3 min ·
[2602.21297] Robust AI Evaluation through Maximal Lotteries
LLMs

The paper proposes a new method for evaluating AI models using maximal lotteries, addressing limitations of traditional pairwise compariso...

arXiv - Machine Learning · 3 min ·
[2602.05066] Bypassing AI Control Protocols via Agent-as-a-Proxy Attacks
AI Safety

The paper discusses vulnerabilities in AI control protocols, specifically how Agent-as-a-Proxy attacks can bypass existing monitoring def...

arXiv - AI · 3 min ·
[2602.02007] Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation
NLP

The paper introduces xMemory, a novel approach to agent memory systems that enhances retrieval by decoupling and aggregating semantic com...

arXiv - AI · 4 min ·
[2602.00462] LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs
LLMs

The paper introduces LatentLens, a method for mapping visual tokens to natural language descriptions in Vision-Language Models (VLMs), en...

arXiv - AI · 4 min ·
[2602.00012] OGD4All: A Framework for Accessible Interaction with Geospatial Open Government Data Based on Large Language Models
LLMs

The OGD4All framework enhances citizen interaction with geospatial Open Government Data using Large Language Models, achieving high accur...

arXiv - Machine Learning · 3 min ·
[2601.15715] RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind
AI Agents

The paper presents RebuttalAgent, a framework using Theory of Mind for strategic persuasion in academic rebuttals, addressing the complex...

arXiv - AI · 4 min ·
[2601.08026] FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures
Computer Vision

The paper presents FigEx2, a framework for detecting and captioning panels in scientific compound figures, enhancing understanding and ac...

arXiv - AI · 4 min ·
[2512.17989] The Subject of Emergent Misalignment in Superintelligence: An Anthropological, Cognitive Neuropsychological, Machine-Learning, and Ontological Perspective
AI Safety

This article explores the gaps in understanding superintelligence misalignment, emphasizing the absence of the human subject and the impl...

arXiv - AI · 4 min ·
[2512.08639] Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
NLP

This article presents a unified framework for Aerial Vision-Language Navigation (VLN), enabling UAVs to interpret natural language and na...

arXiv - AI · 4 min ·
[2512.09069] KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification
Machine Learning

The paper presents KD-OCT, a novel knowledge distillation framework that enhances the efficiency of deep learning models for classifying ...

arXiv - Machine Learning · 4 min ·
[2511.20718] Stabilizing Off-Policy Training for Long-Horizon LLM Agent via Turn-Level Importance Sampling and Clipping-Triggered Normalization
LLMs

This article presents SORL, a novel approach to stabilize off-policy training for long-horizon LLM agents, addressing issues of instabili...

arXiv - Machine Learning · 4 min ·
[2511.01734] A Proof of Learning Rate Transfer under $μ$P
Machine Learning

This paper presents a proof of learning rate transfer in linear multi-layer perceptrons (MLPs) using a new parameterization method called...

arXiv - Machine Learning · 3 min ·
[2511.00062] World Simulation with Video Foundation Models for Physical AI
LLMs

The paper presents Cosmos-Predict2.5, an advanced model for world simulation in Physical AI, integrating various generation methods and i...

arXiv - Machine Learning · 5 min ·
[2510.18060] SPACeR: Self-Play Anchoring with Centralized Reference Models
Machine Learning

The paper introduces SPACeR, a framework for enhancing autonomous vehicle behavior through self-play reinforcement learning anchored by a...

arXiv - Machine Learning · 4 min ·
[2510.10472] FML-bench: Benchmarking Machine Learning Agents for Scientific Research
LLMs

The paper introduces FML-bench, a new benchmark for evaluating machine learning agents in scientific research, focusing on exploration di...

arXiv - AI · 4 min ·
[2510.05077] SLM-MUX: Orchestrating small language models for reasoning
LLMs

The paper presents SLM-MUX, a novel architecture for orchestrating small language models (SLMs) to improve reasoning accuracy, achieving ...

arXiv - AI · 4 min ·