AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

LLMs

Been building a multi-agent framework in public for 5 weeks, it's been a journey.

I've been building this repo in public since day one, roughly 5 weeks now with Claude Code. Here's where it's at. Feels good to be so close....

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

"There's a new generation of empirical deep learning researchers, hacking away at whatever seems trendy, blowing with the wind" [D]

Saw this on X. I too am struggling with the term "post-agentic AI"; just posting here for further discussion. submitted by /u/elnino2023 [li...

Reddit - Machine Learning · 1 min ·
AI Infrastructure

Alibaba-linked AI agent hijacked GPUs for unauthorized crypto mining, researchers say

How do people make sense of this? Submitted by /u/stvlsn.

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.12963] Information-theoretic analysis of world models in optimal reward maximizers
Machine Learning

This paper presents an information-theoretic analysis of world models in optimal reward maximizers, quantifying the information conveyed ...

arXiv - AI · 3 min ·
[2602.12876] BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
LLMs

BrowseComp-$V^3$ introduces a new benchmark for evaluating multimodal browsing agents, focusing on complex reasoning across visual and te...

arXiv - AI · 4 min ·
[2602.12852] WebClipper: Efficient Evolution of Web Agents with Graph-based Trajectory Pruning
AI Agents

WebClipper introduces a novel framework for optimizing web agent trajectories through graph-based pruning, enhancing search efficiency an...

arXiv - AI · 3 min ·
[2602.12670] SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
LLMs

The paper introduces SkillsBench, a benchmark assessing the effectiveness of agent skills across 86 tasks in 11 domains, revealing signif...

arXiv - AI · 4 min ·
[2602.12662] Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents
LLMs

This paper introduces CogRouter, a framework for large language models (LLMs) that enables dynamic adaptation of cognitive depth, enhanci...

arXiv - AI · 4 min ·
[2602.12631] AI Agents for Inventory Control: Human-LLM-OR Complementarity
LLMs

This paper explores the integration of AI agents, particularly large language models (LLMs), with traditional operations research (OR) me...

arXiv - Machine Learning · 4 min ·
[2602.12617] GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Machine Learning

GeoAgent introduces a novel model for geolocation tasks, enhancing AI's reasoning capabilities with geographic characteristics and outper...

arXiv - AI · 3 min ·
[2602.12566] To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
LLMs

This paper explores the effectiveness of multi-domain reinforcement learning for large language models, comparing mixed multi-task traini...

arXiv - AI · 4 min ·
[2602.12419] Intent-Driven Smart Manufacturing Integrating Knowledge Graphs and Large Language Models
LLMs

This article discusses a framework that integrates Large Language Models and Knowledge Graphs to enhance intent-driven interactions in sm...

arXiv - AI · 3 min ·
[2602.12544] Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation
Machine Learning

This paper presents a scalable pipeline for generating high-quality training data for web agents, introducing a novel evaluation framewor...

arXiv - AI · 3 min ·
[2602.12316] GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
AI Safety

GT-HarmBench introduces a benchmark for evaluating AI safety risks in multi-agent environments, highlighting significant reliability gaps...

arXiv - AI · 3 min ·
[2602.12389] Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting
Machine Learning

This paper presents Entity State Tuning (EST), a novel framework for improving temporal knowledge graph forecasting by maintaining persis...

arXiv - AI · 4 min ·
LLMs

Customizable AI Companions.

The article discusses the potential of customizable AI companions that can engage in real-time video calls, leveraging technologies like ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[D] METR TH1.1: “working_time” is wildly different across models. Quick breakdown + questions.

The article discusses METR's Time Horizon benchmark (TH1.1), highlighting significant differences in 'working_time' across various models...

Reddit - Machine Learning · 1 min ·
My awkward first date with an AI companion
AI Agents

A Mashable writer experiences an awkward date with an EVA AI companion at a pop-up cafe, exploring the nuances of AI relationships and us...

AI Tools & Products · 11 min ·
Rethinking human connection under the influence of AI.
AI Agents

The article explores the nature of human connection in the context of AI interactions, arguing that while AI can simulate dialogue, it la...

AI Tools & Products · 5 min ·
LLMs

Looking for early testers for my competitive analysis tool (Claude needed currently)

The article seeks early testers for CompetitiveOS, a tool designed to streamline competitive analysis in the AI education sector by autom...

Reddit - Artificial Intelligence · 1 min ·
Ads in AI chatbots raise privacy concerns as companies seek new revenue
AI Safety

The introduction of ads in AI chatbots raises privacy concerns as companies like OpenAI and Microsoft explore new revenue models amidst u...

AI Tools & Products · 5 min ·
AI Startups

Why Nonprofits Can’t Afford to Ignore AI

This article discusses the importance of AI for nonprofits, emphasizing how these organizations can leverage technology to enhance their ...

Reddit - Artificial Intelligence · 1 min ·
OpenClaw founder Peter Steinberger is joining OpenAI | The Verge
AI Agents

Peter Steinberger, founder of OpenClaw, is joining OpenAI, with the OpenClaw project continuing as an open-source initiative.

The Verge - AI · 4 min ·