AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

LLMs

Been building a multi-agent framework in public for 5 weeks, it's been a journey.

I've been building this repo in public since day one, roughly 5 weeks now, with Claude Code. Here's where it's at. Feels good to be so close...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Machine Learning

"There's a new generation of empirical deep learning researchers, hacking away at whatever seems trendy, blowing with the wind" [D]

Saw this on X. I, too, am struggling with the term "post-agentic AI," so I am posting here for further discussion. Submitted by /u/elnino2023.

Reddit - Machine Learning · 1 min · about 3 hours ago

AI Infrastructure

Alibaba-linked AI agent hijacked GPUs for unauthorized crypto mining, researchers say

How do people make sense of this? Submitted by /u/stvlsn.

Reddit - Artificial Intelligence · 1 min · about 9 hours ago

All Content

NLP

[2512.18080] From Prompt to Product: A Human-Centered Benchmark of Agentic App Generation Systems

This paper introduces a human-centered benchmark for evaluating agentic app generation systems, comparing platforms like Replit, Bolt, an...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2511.10453] Reasoning about Intent for Ambiguous Requests

This paper explores how large language models can better handle ambiguous requests by generating multiple interpretation-answer pairs, en...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2510.24803] MASPRM: Multi-Agent System Process Reward Model

The MASPRM paper introduces a novel Multi-Agent System Process Reward Model that enhances performance during inference by guiding search ...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2509.14832] Diffusion-Based Scenario Tree Generation for Multivariate Time Series Prediction and Multistage Stochastic Optimization

The paper presents a Diffusion Scenario Tree (DST) framework for multivariate time series prediction and multistage stochastic optimizati...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2508.12685] ToolACE-MT: Non-Autoregressive Generation for Agentic Multi-Turn Interaction

ToolACE-MT introduces a non-autoregressive framework for generating high-quality multi-turn dialogues in agentic interactions, enhancing ...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2508.17742] EEG-FM-Bench: A Comprehensive Benchmark for the Systematic Evaluation of EEG Foundation Models

The paper presents EEG-FM-Bench, a standardized benchmark for evaluating EEG foundation models, addressing inconsistencies in current eva...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2508.08275] MLLM-CTBench: A Benchmark for Continual Instruction Tuning with Reasoning Process Diagnosis

The paper presents MLLM-CTBench, a benchmark for continual instruction tuning of multimodal large language models, addressing the need fo...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2508.05004] R-Zero: Self-Evolving Reasoning LLM from Zero Data

The article presents R-Zero, a self-evolving reasoning LLM that autonomously generates training data, improving AI capabilities without h...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2507.16696] FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

FISHER is a proposed foundation model aimed at improving the analysis of multi-modal industrial signals, addressing the challenges posed ...

arXiv - Machine Learning · 4 min · about 2 months ago

Robotics

[2507.12108] Multimodal Coordinated Online Behavior: Trade-offs and Strategies

This paper explores multimodal coordinated online behavior, analyzing trade-offs between different integration strategies and their effec...

arXiv - Machine Learning · 4 min · about 2 months ago

AI Safety

[2507.02310] Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment

This paper presents a novel framework for continual learning that addresses concept drift through Adaptive Memory Realignment (AMR), enha...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2503.22968] Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models

This article introduces the Haerae Evaluation Toolkit (HRET), a unified framework for evaluating the capabilities of Korean language mode...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2412.07909] Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

This paper explores the modality gap in contrastive multimodal learning, analyzing its causes and proposing methods to mitigate it for im...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.04634] WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

The paper introduces WideSeek-R1, a multi-agent reinforcement learning framework aimed at improving broad information seeking by enhancin...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.02709] ATLAS : Adaptive Self-Evolutionary Research Agent with Task-Distributed Multi-LLM Supporters

The paper presents ATLAS, an adaptive self-evolutionary research agent that utilizes task-distributed multi-LLM supporters to enhance per...

arXiv - AI · 3 min · about 2 months ago

NLP

[2601.10485] Panning for Gold: Expanding Domain-Specific Knowledge Graphs with General Knowledge

The paper proposes a novel approach for enhancing domain-specific knowledge graphs (DKGs) by integrating general knowledge graphs (GKGs) ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2601.00004] Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study

This study explores the use of fine-tuned large language models for automated depression screening in Nigerian Pidgin English, addressing...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2512.19135] Understanding Chain-of-Thought in Large Language Models via Topological Data Analysis

This paper explores the structural analysis of reasoning chains in large language models (LLMs) using Topological Data Analysis (TDA), re...

arXiv - AI · 4 min · about 2 months ago

Generative AI

[2512.12182] TA-KAND: Two-stage Attention Triple Enhancement and U-KAN based Diffusion For Few-shot Knowledge Graph Completion

The paper presents TA-KAND, a novel framework for few-shot knowledge graph completion that employs a two-stage attention mechanism and U-...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2510.23883] Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

This article explores the security implications of agentic AI systems, detailing specific threats, defense strategies, and evaluation met...

arXiv - AI · 3 min · about 2 months ago

