AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

New privacy tool helps detect when AI agents become double agents
AI Agents

AI Tools & Products · 5 min ·
Boston's CIO wants the public — and other city governments — to use his open-source agentic AI tools
AI Agents

AI Tools & Products · 7 min ·
Machine Learning

[D] Your Agent, Their Asset: Real-world safety evaluation of OpenClaw agents (CIK poisoning raises attack success to ~64–74%)

Paper: https://arxiv.org/abs/2604.04759 This paper presents a real-world safety evaluation of OpenClaw, a personal AI agent with access t...

Reddit - Machine Learning · 1 min ·

All Content

[2508.10480] Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
Machine Learning

The paper introduces Pinet, a novel output layer for neural networks that optimizes hard constraints using orthogonal projection ...

arXiv - Machine Learning · 3 min ·
[2412.10999] Cocoa: Co-Planning and Co-Execution with AI Agents
NLP

The paper presents Cocoa, a system designed to enhance human-agent collaboration in AI tasks by allowing flexible co-planning and co-exec...

arXiv - AI · 4 min ·
[2405.05523] Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
Machine Learning

This paper introduces a novel Positional Recovery Training (Port) framework for improving temporal grounding in animal behavior analysis,...

arXiv - AI · 3 min ·
[2401.04536] Evaluating Language Model Agency through Negotiations
LLMs

This paper introduces a novel method for evaluating language model agency through negotiation games, addressing limitations of existing b...

arXiv - Machine Learning · 3 min ·
[2505.24205] On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
Machine Learning

This paper explores the expressive power of Mixture-of-Experts (MoEs) in modeling complex tasks, demonstrating their efficiency in approx...

arXiv - Machine Learning · 3 min ·
[2505.24157] Experience-based Knowledge Correction for Robust Planning in Minecraft
LLMs

The paper presents XENON, an advanced agent for robust planning in Minecraft that utilizes experience-based knowledge correction to impro...

arXiv - Machine Learning · 3 min ·
[2505.22475] Non-Asymptotic Analysis of (Sticky) Track-and-Stop
Machine Learning

This paper presents a non-asymptotic analysis of the Sticky Track-and-Stop algorithm, extending its guarantees beyond asymptotic optimali...

arXiv - Machine Learning · 3 min ·
[2602.11348] AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition
LLMs

The paper introduces AgentNoiseBench, a framework for evaluating the robustness of tool-using LLM agents under noisy conditions, highligh...

arXiv - AI · 4 min ·
[2505.12707] PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI
Machine Learning

PLAICraft introduces a large-scale dataset capturing time-aligned vision, speech, and action data from multiplayer Minecraft, aimed at ad...

arXiv - Machine Learning · 4 min ·
[2602.02050] Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
LLMs

This article explores the role of entropy in optimizing tool-use behaviors for Large Language Model (LLM) agents, highlighting the correl...

arXiv - AI · 4 min ·
[2505.10992] ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks
Machine Learning

The paper presents ReaCritic, a novel reasoning transformer-based critic model for deep reinforcement learning (DRL) in heterogeneous wir...

arXiv - Machine Learning · 4 min ·
[2602.00663] SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent
LLMs

The paper presents SEISMO, a trajectory-aware LLM agent designed to enhance sample efficiency in molecular optimization, achieving signif...

arXiv - Machine Learning · 4 min ·
[2601.07611] DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning
LLMs

DIAGPaper introduces a multi-agent framework for identifying and prioritizing weaknesses in scientific papers, addressing limitations of ...

arXiv - AI · 4 min ·
[2601.01569] CaveAgent: Transforming LLMs into Stateful Runtime Operators
LLMs

CaveAgent introduces a novel framework that transforms LLMs into stateful runtime operators, enhancing their ability to manage complex ta...

arXiv - AI · 4 min ·
[2510.18318] Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning
LLMs

The paper presents Earth AI, a novel approach to geospatial analysis using foundation models and cross-modal reasoning to derive insights...

arXiv - AI · 4 min ·
[2502.07274] Forget Forgetting: Continual Learning in a World of Abundant Memory
Machine Learning

The paper explores continual learning (CL) in AI, proposing a shift from minimizing memory usage to leveraging abundant memory while addr...

arXiv - Machine Learning · 4 min ·
[2509.00074] Language and Experience: A Computational Model of Social Learning in Complex Tasks
Machine Learning

This article presents a computational model that explores how humans and AI can integrate linguistic guidance and direct experience for e...

arXiv - Machine Learning · 4 min ·
[2409.04332] Amortized Bayesian Workflow
Machine Learning

The paper presents an Amortized Bayesian Workflow that combines fast amortized inference with accurate MCMC techniques, optimizing Bayesi...

arXiv - Machine Learning · 3 min ·
[2411.04760] Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks
Machine Learning

This paper presents novel domain adaptation methods for Spiking Neural Networks (SNNs) to address performance drops due to mismatched tem...

arXiv - Machine Learning · 4 min ·
[2503.18825] EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents
LLMs

The paper presents evaluation methods for assessing the economic decision-making capabilities of LLMs, focusing on benchmarks and litmus ...

arXiv - AI · 4 min ·