AI Agents

Autonomous agents, tool use, and agentic systems


Top This Week

AI Agents

AMD's GAIA now allows building custom AI agents via chat, becomes "true desktop app"


Reddit - Artificial Intelligence · 1 min · about 4 hours ago

LLMs

Claude Code x n8n

Hi everyone, I’ve been exploring MCP and integrating tools like n8n with Claude Code, and I’m trying to understand how practical this rea...

Reddit - Artificial Intelligence · 1 min · about 8 hours ago

AI Agents

Cloudflare just turned Browser Rendering into a lot more powerful MCP infrastructure

Browser Rendering now exposes the Chrome DevTools Protocol, which means MCP clients can access a remote browser directly. That’s a pretty...

Reddit - Artificial Intelligence · 1 min · about 15 hours ago

All Content

LLMs

[2602.13473] NeuroWeaver: An Autonomous Evolutionary Agent for Exploring the Programmatic Space of EEG Analysis Pipelines

NeuroWeaver is an autonomous evolutionary agent designed to optimize EEG analysis pipelines, addressing data constraints and computationa...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13477] OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage

The paper 'OMNI-LEAK' explores security vulnerabilities in multi-agent systems, revealing how a coordinated attack can lead to data leaka...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.13502] Translating Dietary Standards into Healthy Meals with Minimal Substitutions

This article presents a framework for creating nutritious meals that adhere to dietary standards with minimal substitutions, enhancing bo...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.13407] On-Policy Supervised Fine-Tuning for Efficient Reasoning

The paper presents a novel training strategy called on-policy supervised fine-tuning (SFT) for large reasoning models, simplifying the op...

arXiv - AI · 4 min · about 2 months ago

AI Safety

[2602.13372] MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

The paper introduces MoralityGym, a benchmark for assessing hierarchical moral alignment in AI decision-making, utilizing 98 ethical dile...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.13367] Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

Nanbeige4.1-3B is a novel small generalist language model that excels in reasoning, alignment, and code generation, demonstrating signifi...

arXiv - AI · 4 min · about 2 months ago

Robotics

[2602.13323] Contrastive explanations of BDI agents

This article discusses the extension of Belief-Desire-Intention (BDI) agents to provide contrastive explanations, enhancing transparency ...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13320] Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol

This article presents a theoretical framework for analyzing error propagation in tool-using LLM agents, proving linear growth of cumulati...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.13319] Situation Graph Prediction: Structured Perspective Inference for User Modeling

The paper presents Situation Graph Prediction (SGP), a novel approach for modeling user perspectives by reconstructing structured represe...

arXiv - AI · 3 min · about 2 months ago

AI Safety

[2602.13292] Mirror: A Multi-Agent System for AI-Assisted Ethics Review

The paper presents Mirror, a multi-agent system designed to enhance AI-assisted ethics reviews, addressing the limitations of current eth...

arXiv - AI · 4 min · about 2 months ago

AI Agents

[2602.13318] DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

DECKBench introduces a new evaluation framework for multi-agent systems focused on generating and editing academic slide decks, addressin...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.13283] Accuracy Standards for AI at Work vs. Personal Life: Evidence from an Online Survey

This article examines how individuals prioritize accuracy in AI tools differently in professional versus personal contexts, based on an o...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13280] BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation

The paper presents BEAGLE, a neuro-symbolic framework that simulates student learning behaviors in open-ended problem-solving environment...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.13275] Artificial Organisations

The paper 'Artificial Organisations' explores how multi-agent AI systems can achieve reliable outcomes through architectural design, draw...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13272] TemporalBench: A Benchmark for Evaluating LLM-Based Agents on Contextual and Event-Informed Time Series Tasks

TemporalBench introduces a benchmark for evaluating LLM-based agents on time series tasks, focusing on contextual and event-informed reas...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.13262] General learned delegation by clones

The paper presents SELFCEST, a novel approach that enhances language models by enabling them to create clones for improved reasoning effi...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13258] MAPLE: A Sub-Agent Architecture for Memory, Learning, and Personalization in Agentic AI Systems

The paper presents MAPLE, a novel sub-agent architecture designed to enhance memory, learning, and personalization in AI systems, address...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13255] DPBench: Large Language Models Struggle with Simultaneous Coordination

The paper introduces DPBench, a benchmark assessing how well large language models (LLMs) coordinate in multi-agent systems, revealing si...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.13235] Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains

The paper introduces Lang2Act, a novel framework for enhancing visual reasoning in Vision-Language Models (VLMs) through self-emergent li...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13237] NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models

NL2LOGIC presents a novel framework for translating natural language into first-order logic using large language models, enhancing accura...

arXiv - AI · 4 min · about 2 months ago

