AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

AI Agents

Considering NeurIPS submission [D]

Wondering if it is worth submitting a paper I'm working on to NeurIPS. I have a formal mathematical proof for convergence of a novel agentic sys...

Reddit - Machine Learning · 1 min · about 4 hours ago

AI Agents

Agent frameworks waste ~350,000+ tokens per session resending static files. 95% reduction benchmarked.

Measured the actual token waste on a local Qwen 3.5 122B setup. The numbers are unreal. Found a compile-time approach that cuts query con...

Reddit - Artificial Intelligence · 1 min · about 9 hours ago

AI Agents

OpenClaw gives users yet another reason to be freaked out about security - Ars Technica

The viral AI agentic tool let attackers silently gain unauthenticated admin access.

Ars Technica - AI · 5 min · about 11 hours ago

All Content

LLMs

[2601.16449] Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding

The paper introduces Emotion-LLaMAv2 and MMEVerse, a new framework and benchmark aimed at enhancing multimodal emotion understanding thro...

arXiv - AI · 4 min · about 1 month ago

Generative AI

[2601.16210] PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation

The paper introduces PyraTok, a language-aligned pyramidal tokenizer designed to enhance video understanding and generation by improving ...

arXiv - AI · 3 min · about 1 month ago

Generative AI

[2601.15500] Low-Dimensional Adaptation of Rectified Flow: A Diffusion and Stochastic Localization Perspective

This paper explores the adaptation of Rectified Flow (RF) to low-dimensional target distributions, demonstrating improved sampling effici...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2601.14242] APEX-Agents

The APEX-Agents paper introduces a benchmark for evaluating AI agents' ability to perform complex tasks across various applications, show...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2601.00671] Fast-weight Product Key Memory

The paper introduces Fast-weight Product Key Memory (FwPKM), a novel memory layer designed to enhance sequence modeling in language model...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2512.24943] RAIR: A Rule-Aware Benchmark Uniting Challenging Long-Tail and Visual Salience Subset for E-commerce Relevance Assessment

The RAIR benchmark introduces a comprehensive dataset for evaluating e-commerce relevance, addressing the limitations of existing benchma...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2512.21877] CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics

CricBench introduces a multilingual benchmark for evaluating Large Language Models (LLMs) in cricket analytics, highlighting performance ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2512.16167] Ev-Trust: An Evolutionary Stable Trust Mechanism for Decentralized LLM-Based Multi-Agent Service Economies

The paper presents Ev-Trust, an evolutionary stable trust mechanism designed for decentralized LLM-based multi-agent service economies, a...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2512.04808] Setting up for failure: automatic discovery of the neural mechanisms of cognitive errors

This article presents a novel approach to uncovering neural mechanisms behind cognitive errors using recurrent neural networks (RNNs) tra...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.20629] MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

The paper presents MapReduce LoRA, a novel framework for optimizing generative models by addressing multi-preference alignment issues. It...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2511.06450] Countering Multi-modal Representation Collapse through Rank-targeted Fusion

This paper presents a novel framework, Rank-enhancing Token Fuser, to address multi-modal representation collapse in machine learning, en...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2511.05275] TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models

The paper presents TwinVLA, a modular framework for bimanual manipulation using two single-arm Vision-Language-Action models, enhancing d...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2510.25850] Debate2Create: Robot Co-design via Multi-Agent LLM Debate

The paper introduces Debate2Create, a framework for robot co-design that utilizes multi-agent LLM debate to optimize robot morphology and...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2511.17621] From Competition to Coordination: Market Making as a Scalable Framework for Safe and Aligned Multi-Agent LLM Systems

This article presents a market-making framework for coordinating multi-agent large language models (LLMs), enhancing trustworthiness and ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.16175] Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

The paper introduces Mantis, a Vision-Language-Action model that enhances visual foresight through a novel framework, achieving superior ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.14478] Agentic AI Systems in Electrical Power Systems Engineering: Current State-of-the-Art and Challenges

This article reviews the state-of-the-art in agentic AI systems within electrical power engineering, providing a taxonomy and practical a...

arXiv - AI · 4 min · about 1 month ago

AI Agents

[2511.02780] PoCo: Agentic Proof-of-Concept Exploit Generation for Smart Contracts

The paper presents PoCo, an automated framework for generating proof-of-concept exploits for smart contracts, enhancing security audits b...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2510.23038] Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

The paper presents TIR-Judge, a reinforcement learning framework that enhances Large Language Model (LLM) judges by integrating tool-base...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2510.18316] MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation

The paper presents MoMaGen, a novel approach for generating diverse datasets for multi-step bimanual mobile manipulation by addressing re...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.12666] PBPK-iPINNs: Inverse Physics-Informed Neural Networks for Physiologically Based Pharmacokinetic Brain Models

The paper presents PBPK-iPINNs, a method combining inverse physics-informed neural networks with physiologically based pharmacokinetic mo...

arXiv - Machine Learning · 4 min · about 1 month ago

