AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

LLMs

Anthropic launches Claude Managed Agents — composable APIs for shipping production AI agents 10x faster. Notion, Rakuten, Asana, and Sentry already in production.

Anthropic launches Claude Managed Agents in public beta — composable APIs for shipping production AI agents 10x faster Handles sandboxing...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Persistent memory system for AI agents: single SQLite file, no external server, no API keys. Free and open source - BrainCTL

Every agent I build forgets everything between sessions. I got tired of it and built brainctl. pip install brainctl, then: from agentmemo...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Why does Multi-Agent RL fail to act like a real society in Spatial Game Theory? [P] [R]

Hey everyone, I’m building a project for my university Machine Learning course called "Social network analysis using iterated game theory...

Reddit - Machine Learning · 1 min ·

All Content

[2510.10854] Discrete State Diffusion Models: A Sample Complexity Perspective
Machine Learning

This article presents a theoretical framework for discrete-state diffusion models, offering the first sample complexity bounds and insigh...

arXiv - AI · 3 min ·
[2510.06714] Dual Goal Representations
Machine Learning

The paper introduces dual goal representations for goal-conditioned reinforcement learning (GCRL), enhancing state characterization and i...

arXiv - AI · 3 min ·
[2510.03669] Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
LLMs

This article introduces the Token Hidden Reward (THR) metric, which enhances exploration-exploitation strategies in Group Relative Deep R...

arXiv - Machine Learning · 4 min ·
[2303.09807] TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
Machine Learning

The paper presents TKN, a transformer-based neural network designed for real-time video prediction, achieving a remarkable prediction rat...

arXiv - AI · 4 min ·
[2510.02410] OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data
LLMs

OpenTSLM introduces a new family of Time Series Language Models designed to enhance reasoning over multivariate medical data, outperformi...

arXiv - Machine Learning · 4 min ·
[2602.06855] AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents
LLMs

AIRS-Bench introduces a suite of 20 tasks designed to evaluate AI agents' capabilities in scientific research, highlighting areas of stre...

arXiv - AI · 4 min ·
[2602.05847] OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention
Machine Learning

The paper introduces OmniVideo-R1, a novel framework designed to enhance audio-visual reasoning through query intention and modality atte...

arXiv - AI · 3 min ·
[2509.23106] Effective Quantization of Muon Optimizer States
LLMs

The paper presents the 8-bit Muon optimizer, which enhances computational efficiency and reduces memory usage in large-scale machine lear...

arXiv - Machine Learning · 3 min ·
[2602.05354] PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents
Machine Learning

The paper introduces PATHWAYS, a benchmark assessing AI web agents' ability to discover and utilize hidden contextual information in mult...

arXiv - AI · 3 min ·
[2602.01848] ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems
AI Agents

The paper introduces ROMA, a Recursive Open Meta-Agent Framework designed to enhance performance in long-horizon multi-agent systems by a...

arXiv - AI · 4 min ·
[2509.16117] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
LLMs

The paper presents DiffusionNFT, a novel online reinforcement learning paradigm that optimizes diffusion models directly on the forward p...

arXiv - AI · 4 min ·
[2602.00851] Persuasion Propagation in LLM Agents
LLMs

The paper explores how user persuasion affects the behavior of large language model (LLM) agents during long-horizon tasks, revealing tha...

arXiv - AI · 3 min ·
[2601.21972] Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
LLMs

The paper presents Multi-Agent Actor-Critic (MAAC) methods for optimizing decentralized collaboration among large language models (LLMs),...

arXiv - AI · 4 min ·
[2601.21654] ScholarGym: Benchmarking Large Language Model Capabilities in the Information-Gathering Stage of Deep Research
LLMs

The paper introduces ScholarGym, an evaluation environment designed to benchmark large language models in the information-gathering phase...

arXiv - AI · 3 min ·
[2601.15311] Aeon: High-Performance Neuro-Symbolic Memory Management for Long-Horizon LLM Agents
LLMs

The paper presents Aeon, a Neuro-Symbolic Cognitive Operating System designed to enhance memory management in Long-Horizon LLM agents, ad...

arXiv - AI · 4 min ·
[2508.13415] MAVIS: Multi-Objective Alignment via Inference-Time Value-Guided Selection
LLMs

The paper introduces MAVIS, a framework for aligning large language models (LLMs) to multiple objectives at inference time, enhancing fle...

arXiv - Machine Learning · 4 min ·
[2601.08005] Internal Deployment Gaps in AI Regulation
AI Safety

This article examines the regulatory gaps in AI deployment within organizations, highlighting issues that allow internal systems to evade...

arXiv - AI · 3 min ·
[2601.05525] Explainable AI: Learning from the Learners
LLMs

This article discusses the importance of explainable AI (XAI) in enhancing trust and accountability in AI applications, particularly in s...

arXiv - Machine Learning · 3 min ·
[2601.04911] From Stories to Cities to Games: A Qualitative Evaluation of Behaviour Planning
AI Startups

This paper evaluates a novel behaviour planning approach, demonstrating its effectiveness across diverse domains such as storytelling, ur...

arXiv - AI · 3 min ·
[2508.08326] Weather-Driven Agricultural Decision-Making Using Digital Twins Under Imperfect Conditions
Machine Learning

This article explores the use of digital twin technology in agriculture, focusing on its ability to enhance decision-making under imperfe...

arXiv - Machine Learning · 3 min ·