AI Agents
Autonomous agents, tool use, and agentic systems
Top This Week
Boston's CIO wants the public — and other city governments — to use his open-source agentic AI tools
[D] Your Agent, Their Asset: Real-world safety evaluation of OpenClaw agents (CIK poisoning raises attack success to ~64–74%)
Paper: https://arxiv.org/abs/2604.04759 This paper presents a real-world safety evaluation of OpenClaw, a personal AI agent with access t...
All Content
[2508.10480] Pinet: Optimizing hard-constrained neural networks with orthogonal projection layers
The paper introduces $\text{Pinet}$, a novel output layer for neural networks that optimizes hard constraints using orthogonal projection ...
[2412.10999] Cocoa: Co-Planning and Co-Execution with AI Agents
The paper presents Cocoa, a system designed to enhance human-agent collaboration in AI tasks by allowing flexible co-planning and co-exec...
[2405.05523] Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
This paper introduces a novel Positional Recovery Training (Port) framework for improving temporal grounding in animal behavior analysis,...
[2401.04536] Evaluating Language Model Agency through Negotiations
This paper introduces a novel method for evaluating language model agency through negotiation games, addressing limitations of existing b...
[2505.24205] On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
This paper explores the expressive power of Mixture-of-Experts (MoEs) in modeling complex tasks, demonstrating their efficiency in approx...
[2505.24157] Experience-based Knowledge Correction for Robust Planning in Minecraft
The paper presents XENON, an advanced agent for robust planning in Minecraft that utilizes experience-based knowledge correction to impro...
[2505.22475] Non-Asymptotic Analysis of (Sticky) Track-and-Stop
This paper presents a non-asymptotic analysis of the Sticky Track-and-Stop algorithm, extending its guarantees beyond asymptotic optimali...
[2602.11348] AgentNoiseBench: Benchmarking Robustness of Tool-Using LLM Agents Under Noisy Condition
The paper introduces AgentNoiseBench, a framework for evaluating the robustness of tool-using LLM agents under noisy conditions, highligh...
[2505.12707] PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI
PLAICraft introduces a large-scale dataset capturing time-aligned vision, speech, and action data from multiplayer Minecraft, aimed at ad...
[2602.02050] Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents
This article explores the role of entropy in optimizing tool-use behaviors for Large Language Model (LLM) agents, highlighting the correl...
[2505.10992] ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks
The paper presents ReaCritic, a novel reasoning transformer-based critic model for deep reinforcement learning (DRL) in heterogeneous wir...
[2602.00663] SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent
The paper presents SEISMO, a trajectory-aware LLM agent designed to enhance sample efficiency in molecular optimization, achieving signif...
[2601.07611] DIAGPaper: Diagnosing Valid and Specific Weaknesses in Scientific Papers via Multi-Agent Reasoning
DIAGPaper introduces a multi-agent framework for identifying and prioritizing weaknesses in scientific papers, addressing limitations of ...
[2601.01569] CaveAgent: Transforming LLMs into Stateful Runtime Operators
CaveAgent introduces a novel framework that transforms LLMs into stateful runtime operators, enhancing their ability to manage complex ta...
[2510.18318] Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning
The paper presents Earth AI, a novel approach to geospatial analysis using foundation models and cross-modal reasoning to derive insights...
[2502.07274] Forget Forgetting: Continual Learning in a World of Abundant Memory
The paper explores continual learning (CL) in AI, proposing a shift from minimizing memory usage to leveraging abundant memory while addr...
[2509.00074] Language and Experience: A Computational Model of Social Learning in Complex Tasks
This article presents a computational model that explores how humans and AI can integrate linguistic guidance and direct experience for e...
[2409.04332] Amortized Bayesian Workflow
The paper presents an Amortized Bayesian Workflow that combines fast amortized inference with accurate MCMC techniques, optimizing Bayesi...
[2411.04760] Zero-Shot Temporal Resolution Domain Adaptation for Spiking Neural Networks
This paper presents novel domain adaptation methods for Spiking Neural Networks (SNNs) to address performance drops due to mismatched tem...
[2503.18825] EconEvals: Benchmarks and Litmus Tests for Economic Decision-Making by LLM Agents
The paper presents evaluation methods for assessing the economic decision-making capabilities of LLMs, focusing on benchmarks and litmus ...