AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

AI Agents

New privacy tool helps detect when AI agents become double agents

AI Tools & Products · 5 min · about 1 hour ago

AI Agents

Boston's CIO wants the public — and other city governments — to use his open-source agentic AI tools

AI Tools & Products · 7 min · about 1 hour ago

Machine Learning

[D] Your Agent, Their Asset: Real-world safety evaluation of OpenClaw agents (CIK poisoning raises attack success to ~64–74%)

Paper: https://arxiv.org/abs/2604.04759 This paper presents a real-world safety evaluation of OpenClaw, a personal AI agent with access t...

Reddit - Machine Learning · 1 min · about 3 hours ago

All Content

Machine Learning

[2508.10480] Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers

The paper introduces Pinet, a novel output layer for neural networks that optimizes hard constraints using orthogonal projection ...

arXiv - Machine Learning · 3 min · about 2 months ago

NLP

[2412.10999] Cocoa: Co-Planning and Co-Execution with AI Agents

The paper presents Cocoa, a system designed to enhance human-agent collaboration in AI tasks by allowing flexible co-planning and co-exec...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2405.05523] Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training

This paper introduces a novel Positional Recovery Training (Port) framework for improving temporal grounding in animal behavior analysis,...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2401.04536] Evaluating Language Model Agency through Negotiations

This paper introduces a novel method for evaluating language model agency through negotiation games, addressing limitations of existing b...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2505.24205] On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks

This paper explores the expressive power of Mixture-of-Experts (MoEs) in modeling complex tasks, demonstrating their efficiency in approx...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2505.24157] Experience-based Knowledge Correction for Robust Planning in Minecraft

The paper presents XENON, an advanced agent for robust planning in Minecraft that utilizes experience-based knowledge correction to impro...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2505.22475] Non-Asymptotic Analysis of (Sticky) Track-and-Stop

This paper presents a non-asymptotic analysis of the Sticky Track-and-Stop algorithm, extending its guarantees beyond asymptotic optimali...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.11348] AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition

The paper introduces AgentNoiseBench, a framework for evaluating the robustness of tool-using LLM agents under noisy conditions, highligh...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2505.12707] PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI

PLAICraft introduces a large-scale dataset capturing time-aligned vision, speech, and action data from multiplayer Minecraft, aimed at ad...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.02050] Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents

This article explores the role of entropy in optimizing tool-use behaviors for Large Language Model (LLM) agents, highlighting the correl...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2505.10992] ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks

The paper presents ReaCritic, a novel reasoning transformer-based critic model for deep reinforcement learning (DRL) in heterogeneous wir...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.00663] SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent

The paper presents SEISMO, a trajectory-aware LLM agent designed to enhance sample efficiency in molecular optimization, achieving signif...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2601.07611] DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning

DIAGPaper introduces a multi-agent framework for identifying and prioritizing weaknesses in scientific papers, addressing limitations of ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2601.01569] CaveAgent: Transforming LLMs into Stateful Runtime Operators

CaveAgent introduces a novel framework that transforms LLMs into stateful runtime operators, enhancing their ability to manage complex ta...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2510.18318] Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning

The paper presents Earth AI, a novel approach to geospatial analysis using foundation models and cross-modal reasoning to derive insights...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2502.07274] Forget Forgetting: Continual Learning in a World of Abundant Memory

The paper explores continual learning (CL) in AI, proposing a shift from minimizing memory usage to leveraging abundant memory while addr...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2509.00074] Language and Experience: A Computational Model of Social Learning in Complex Tasks

This article presents a computational model that explores how humans and AI can integrate linguistic guidance and direct experience for e...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2409.04332] Amortized Bayesian Workflow

The paper presents an Amortized Bayesian Workflow that combines fast amortized inference with accurate MCMC techniques, optimizing Bayesi...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2411.04760] Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks

This paper presents novel domain adaptation methods for Spiking Neural Networks (SNNs) to address performance drops due to mismatched tem...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2503.18825] EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents

The paper presents evaluation methods for assessing the economic decision-making capabilities of LLMs, focusing on benchmarks and litmus ...

arXiv - AI · 4 min · about 2 months ago
