AI Agents

Autonomous agents, tool use, and agentic systems


Top This Week

AI Agents

Agent frameworks waste ~350,000+ tokens per session resending static files. 95% reduction benchmarked.

Measured the actual token waste on a local Qwen 3.5 122B setup. The numbers are unreal. Found a compile-time approach that cuts query con...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

AI Agents

OpenClaw gives users yet another reason to be freaked out about security - Ars Technica

The viral AI agentic tool let attackers silently gain unauthenticated admin access.

Ars Technica - AI · 5 min · about 7 hours ago

Robotics

What happens when you let AI agents run a sitcom 24/7 with zero human involvement

Ran an experiment — gave AI agents full control over writing, character creation, and performing a sitcom. Left it running nonstop for ov...

Reddit - Artificial Intelligence · 1 min · about 9 hours ago

All Content

Data Science

[2306.00554] ShaRP: Shape-Regularized Multidimensional Projections

The paper introduces ShaRP, a novel projection technique for dimensionality reduction that allows users to control the visual signature o...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.21201] Aletheia tackles FirstProof autonomously

The paper presents Aletheia, an autonomous mathematics research agent that successfully solved 6 out of 10 problems in the FirstProof cha...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.21172] NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

The paper presents NoRD, a data-efficient Vision-Language-Action model that enhances autonomous driving without requiring extensive datas...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.21143] A Benchmark for Deep Information Synthesis

The paper introduces DEEPSYNTH, a benchmark for evaluating large language models on complex tasks requiring deep information synthesis an...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Agents

[2602.21066] The Initial Exploration Problem in Knowledge Graph Exploration

This paper introduces the Initial Exploration Problem (IEP) in Knowledge Graphs, highlighting barriers faced by users during their first ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21064] Motivation is Something You Need

The paper presents a novel training paradigm for AI that integrates concepts from affective neuroscience, focusing on a dual-model framew...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.21044] LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification

LogicGraph introduces a benchmark for evaluating multi-path logical reasoning in large language models, highlighting their limitations in...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.21061] Tool Building as a Path to "Superintelligence"

The paper explores how Large Language Models (LLMs) can achieve superintelligence through the Diligent Learner framework, emphasizing the...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20934] Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence

The paper introduces AgentOS, a conceptual framework that transitions Large Language Models from static inference engines to dynamic cogn...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20926] HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG

This article presents the HELP framework, which enhances Retrieval-Augmented Generation (RAG) by addressing knowledge boundaries and hall...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20813] Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

This paper presents a novel evaluation framework for assessing the alignment of language models under realistic pressure, revealing behav...

arXiv - AI · 3 min · about 1 month ago

AI Startups

[2602.20810] POMDPPlanners: Open-Source Package for POMDP Planning

POMDPPlanners is an open-source Python package designed for the empirical evaluation of POMDP planning algorithms, integrating advanced f...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20739] PyVision-RL: Forging Open Agentic Vision Models via RL

The paper introduces PyVision-RL, a reinforcement learning framework designed to enhance agentic multimodal models by preventing interact...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20728] Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback

This paper explores the use of reinforcement learning from AI feedback (RLAIF) to balance multiple objectives in urban traffic control, a...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20723] Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation

The paper presents MAGNET, a novel multimodal recommendation framework that utilizes a mixture of adaptive graph experts and entropy-trig...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20722] Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning

This paper introduces Batch Adaptation Policy Optimization (BAPO), an off-policy reinforcement learning framework designed to enhance dat...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20708] ICON: Indirect Prompt Injection Defense for Agents based on Inference-Time Correction

The paper introduces ICON, a novel framework designed to defend Large Language Model (LLM) agents against Indirect Prompt Injection (IPI)...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20706] Online Algorithms with Unreliable Guidance

This paper presents a novel model for online decision-making called Online Algorithms with Unreliable Guidance (OAG), which separates pre...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20687] How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective

This article discusses the limitations of current benchmarks for vision-language model (VLM)-driven embodied agents and introduces Native...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20639] Grounding LLMs in Scientific Discovery via Embodied Actions

The paper presents EmbodiedAct, a framework that enhances Large Language Models (LLMs) by grounding them in embodied actions for scientif...

arXiv - AI · 3 min · about 1 month ago

