AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

AI Agents

Agent frameworks waste ~350,000+ tokens per session resending static files. 95% reduction benchmarked.

Measured the actual token waste on a local Qwen 3.5 122B setup. The numbers are unreal. Found a compile-time approach that cuts query con...

Reddit - Artificial Intelligence · 1 min
AI Agents

OpenClaw gives users yet another reason to be freaked out about security

The viral agentic AI tool let attackers silently gain unauthenticated admin access.

Ars Technica - AI · 5 min
Robotics

What happens when you let AI agents run a sitcom 24/7 with zero human involvement

Ran an experiment — gave AI agents full control over writing, character creation, and performing a sitcom. Left it running nonstop for ov...

Reddit - Artificial Intelligence · 1 min

All Content

Data Science

[2306.00554] ShaRP: Shape-Regularized Multidimensional Projections

The paper introduces ShaRP, a novel projection technique for dimensionality reduction that allows users to control the visual signature o...

arXiv - AI · 3 min
LLMs

[2602.21201] Aletheia tackles FirstProof autonomously

The paper presents Aletheia, an autonomous mathematics research agent that successfully solved 6 out of 10 problems in the FirstProof cha...

arXiv - Machine Learning · 3 min
Machine Learning

[2602.21172] NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

The paper presents NoRD, a data-efficient Vision-Language-Action model that enhances autonomous driving without requiring extensive datas...

arXiv - AI · 3 min
LLMs

[2602.21143] A Benchmark for Deep Information Synthesis

The paper introduces DEEPSYNTH, a benchmark for evaluating large language models on complex tasks requiring deep information synthesis an...

arXiv - Machine Learning · 4 min
AI Agents

[2602.21066] The Initial Exploration Problem in Knowledge Graph Exploration

This paper introduces the Initial Exploration Problem (IEP) in Knowledge Graphs, highlighting barriers faced by users during their first ...

arXiv - AI · 4 min
Machine Learning

[2602.21064] Motivation is Something You Need

The paper presents a novel training paradigm for AI that integrates concepts from affective neuroscience, focusing on a dual-model framew...

arXiv - Machine Learning · 3 min
LLMs

[2602.21044] LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification

LogicGraph introduces a benchmark for evaluating multi-path logical reasoning in large language models, highlighting their limitations in...

arXiv - AI · 4 min
LLMs

[2602.21061] Tool Building as a Path to "Superintelligence"

The paper explores how Large Language Models (LLMs) can achieve superintelligence through the Diligent Learner framework, emphasizing the...

arXiv - AI · 3 min
LLMs

[2602.20934] Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence

The paper introduces AgentOS, a conceptual framework that transitions Large Language Models from static inference engines to dynamic cogn...

arXiv - AI · 3 min
LLMs

[2602.20926] HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG

This article presents the HELP framework, which enhances Retrieval-Augmented Generation (RAG) by addressing knowledge boundaries and hall...

arXiv - AI · 4 min
LLMs

[2602.20813] Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

This paper presents a novel evaluation framework for assessing the alignment of language models under realistic pressure, revealing behav...

arXiv - AI · 3 min
AI Startups

[2602.20810] POMDPPlanners: Open-Source Package for POMDP Planning

POMDPPlanners is an open-source Python package designed for the empirical evaluation of POMDP planning algorithms, integrating advanced f...

arXiv - AI · 3 min
Machine Learning

[2602.20739] PyVision-RL: Forging Open Agentic Vision Models via RL

The paper introduces PyVision-RL, a reinforcement learning framework designed to enhance agentic multimodal models by preventing interact...

arXiv - AI · 3 min
LLMs

[2602.20728] Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback

This paper explores the use of reinforcement learning from AI feedback (RLAIF) to balance multiple objectives in urban traffic control, a...

arXiv - AI · 3 min
Machine Learning

[2602.20723] Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation

The paper presents MAGNET, a novel multimodal recommendation framework that utilizes a mixture of adaptive graph experts and entropy-trig...

arXiv - AI · 4 min
LLMs

[2602.20722] Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning

This paper introduces Batch Adaptation Policy Optimization (BAPO), an off-policy reinforcement learning framework designed to enhance dat...

arXiv - AI · 3 min
LLMs

[2602.20708] ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

The paper introduces ICON, a novel framework designed to defend Large Language Model (LLM) agents against Indirect Prompt Injection (IPI)...

arXiv - AI · 3 min
Machine Learning

[2602.20706] Online Algorithms with Unreliable Guidance

This paper presents a novel model for online decision-making called Online Algorithms with Unreliable Guidance (OAG), which separates pre...

arXiv - AI · 4 min
LLMs

[2602.20687] How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective

This article discusses the limitations of current benchmarks for vision-language model (VLM)-driven embodied agents and introduces Native...

arXiv - AI · 4 min
LLMs

[2602.20639] Grounding LLMs in Scientific Discovery via Embodied Actions

The paper presents EmbodiedAct, a framework that enhances Large Language Models (LLMs) by grounding them in embodied actions for scientif...

arXiv - AI · 3 min