AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

AI Agents

Agent frameworks waste ~350,000+ tokens per session resending static files. 95% reduction benchmarked.

Measured the actual token waste on a local Qwen 3.5 122B setup. The numbers are unreal. Found a compile-time approach that cuts query con...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

AI Agents

OpenClaw gives users yet another reason to be freaked out about security - Ars Technica

The viral AI agentic tool let attackers silently gain unauthenticated admin access.

Ars Technica - AI · 5 min · about 5 hours ago

Robotics

What happens when you let AI agents run a sitcom 24/7 with zero human involvement

Ran an experiment — gave AI agents full control over writing, character creation, and performing a sitcom. Left it running nonstop for ov...

Reddit - Artificial Intelligence · 1 min · about 7 hours ago

All Content

Machine Learning

[2602.20480] VINA: Variational Invertible Neural Architectures

The paper presents VINA, a framework for Variational Invertible Neural Architectures, addressing theoretical gaps in normalizing flows an...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.20486] Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions

This article explores a hybrid dialogue system that integrates Large Language Models (LLMs) within a rule-based framework to enhance lear...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20449] Protein Language Models Diverge from Natural Language: Comparative Analysis and Improved Inference

This article explores the differences between protein language models (PLMs) and natural language models, highlighting how these distinct...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.20408] Examining and Addressing Barriers to Diversity in LLM-Generated Ideas

This article explores the limitations of diversity in ideas generated by large language models (LLMs) compared to human creativity, ident...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20379] Case-Aware LLM-as-a-Judge Evaluation for Enterprise-Scale RAG Systems

The paper presents a case-aware evaluation framework for enterprise-scale Retrieval-Augmented Generation (RAG) systems, addressing the li...

arXiv - AI · 3 min · about 1 month ago

NLP

[2602.20344] Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction

This article presents GraSPNet, a novel hierarchical self-supervised learning framework for molecular representation that enhances graph ...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.20323] Learning Physical Principles from Interaction: Self-Evolving Planning via Test-Time Memory

This article presents PhysMem, a memory framework that allows vision-language model planners to learn physical principles through interac...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20294] InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation

The paper presents InterviewSim, a framework for simulating personalities using large language models grounded in real interview data, en...

arXiv - AI · 4 min · about 1 month ago

AI Infrastructure

[2602.20292] Quantifying the Expectation-Realisation Gap for Agentic AI Systems

This article examines the expectation-realisation gap in agentic AI systems, revealing discrepancies between anticipated productivity gai...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20220] What Matters for Simulation to Online Reinforcement Learning on Real Robots

This paper explores design choices that enhance online reinforcement learning (RL) on physical robots, presenting findings from 100 train...

arXiv - AI · 3 min · about 1 month ago

AI Safety

[2602.20214] Right to History: A Sovereignty Kernel for Verifiable AI Agent Execution

This paper proposes the 'Right to History,' a principle ensuring individuals have a verifiable record of AI agent actions on personal har...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20206] Mitigating "Epistemic Debt" in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts

This paper explores the concept of 'Epistemic Debt' in novice programming using generative AI, proposing metacognitive scripts to enhance...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.20200] Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

The paper presents OptimusVLA, a dual-memory framework for robotic manipulation that enhances efficiency and robustness in action generat...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20197] Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning

The paper presents CalibRL, a hybrid-policy RLVR framework that enhances exploration in multi-modal reasoning tasks by balancing explorat...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Safety

[2602.20196] OpenPort Protocol: A Security Governance Specification for AI Agent Tool Access

The OpenPort Protocol introduces a governance-first approach for AI agents, ensuring secure access to application tools while addressing ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20181] Closing the Expertise Gap in Residential Building Energy Retrofits: A Domain-Specific LLM for Informed Decision-Making

This article presents a domain-specific large language model (LLM) designed to assist homeowners in making informed decisions about resid...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20177] Enhancing Heat Sink Efficiency in MOSFETs using Physics Informed Neural Networks: A Systematic Study on Coolant Velocity Estimation

This study explores the use of Physics Informed Neural Networks (PINNs) to optimize coolant velocity for enhancing heat sink efficiency i...

arXiv - Machine Learning · 4 min · about 1 month ago

Robotics

[2602.20169] Autonomous AI and Ownership Rules

This article explores the ownership rules surrounding AI-generated outputs, examining how they are linked to their creators and the impli...

arXiv - AI · 3 min · about 1 month ago

AI Infrastructure

[2601.12815] Multimodal Multi-Agent Empowered Legal Judgment Prediction

This paper presents JurisMMA, a novel framework for Legal Judgment Prediction (LJP) that utilizes multimodal data to enhance the accuracy...

arXiv - AI · 4 min · about 1 month ago
