AI Agents

Autonomous agents, tool use, and agentic systems

Top This Week

AI Agents

Agent frameworks waste ~350,000+ tokens per session resending static files. 95% reduction benchmarked.

Measured the actual token waste on a local Qwen 3.5 122B setup. The numbers are unreal. Found a compile-time approach that cuts query con...

Reddit - Artificial Intelligence · 1 min ·
OpenClaw gives users yet another reason to be freaked out about security - Ars Technica
AI Agents

The viral AI agentic tool let attackers silently gain unauthenticated admin access.

Ars Technica - AI · 5 min ·
Robotics

What happens when you let AI agents run a sitcom 24/7 with zero human involvement

Ran an experiment — gave AI agents full control over writing, character creation, and performing a sitcom. Left it running nonstop for ov...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2602.20480] VINA: Variational Invertible Neural Architectures
Machine Learning

The paper presents VINA, a framework for Variational Invertible Neural Architectures, addressing theoretical gaps in normalizing flows an...

arXiv - Machine Learning · 4 min ·
[2602.20486] Hybrid LLM-Embedded Dialogue Agents for Learner Reflection: Designing Responsive and Theory-Driven Interactions
LLMs

This article explores a hybrid dialogue system that integrates Large Language Models (LLMs) within a rule-based framework to enhance lear...

arXiv - AI · 3 min ·
[2602.20449] Protein Language Models Diverge from Natural Language: Comparative Analysis and Improved Inference
LLMs

This article explores the differences between protein language models (PLMs) and natural language models, highlighting how these distinct...

arXiv - Machine Learning · 4 min ·
[2602.20408] Examining and Addressing Barriers to Diversity in LLM-Generated Ideas
LLMs

This article explores the limitations of diversity in ideas generated by large language models (LLMs) compared to human creativity, ident...

arXiv - AI · 4 min ·
[2602.20379] Case-Aware LLM-as-a-Judge Evaluation for Enterprise-Scale RAG Systems
LLMs

The paper presents a case-aware evaluation framework for enterprise-scale Retrieval-Augmented Generation (RAG) systems, addressing the li...

arXiv - AI · 3 min ·
[2602.20344] Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
NLP

This article presents GraSPNet, a novel hierarchical self-supervised learning framework for molecular representation that enhances graph ...

arXiv - Machine Learning · 3 min ·
[2602.20323] Learning Physical Principles from Interaction: Self-Evolving Planning via Test-Time Memory
LLMs

This article presents PhysMem, a memory framework that allows vision-language model planners to learn physical principles through interac...

arXiv - AI · 3 min ·
[2602.20294] InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation
LLMs

The paper presents InterviewSim, a framework for simulating personalities using large language models grounded in real interview data, en...

arXiv - AI · 4 min ·
[2602.20292] Quantifying the Expectation-Realisation Gap for Agentic AI Systems
AI Infrastructure

This article examines the expectation-realisation gap in agentic AI systems, revealing discrepancies between anticipated productivity gai...

arXiv - AI · 3 min ·
[2602.20220] What Matters for Simulation to Online Reinforcement Learning on Real Robots
Machine Learning

This paper explores design choices that enhance online reinforcement learning (RL) on physical robots, presenting findings from 100 train...

arXiv - AI · 3 min ·
[2602.20214] Right to History: A Sovereignty Kernel for Verifiable AI Agent Execution
AI Safety

This paper proposes the 'Right to History,' a principle ensuring individuals have a verifiable record of AI agent actions on personal har...

arXiv - AI · 3 min ·
[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
LLMs

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, ...

arXiv - AI · 3 min ·
[2602.20206] Mitigating "Epistemic Debt" in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts
LLMs

This paper explores the concept of 'Epistemic Debt' in novice programming using generative AI, proposing metacognitive scripts to enhance...

arXiv - AI · 4 min ·
[2602.20200] Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation
Machine Learning

The paper presents OptimusVLA, a dual-memory framework for robotic manipulation that enhances efficiency and robustness in action generat...

arXiv - AI · 4 min ·
[2602.20197] Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
LLMs

The paper presents CalibRL, a hybrid-policy RLVR framework that enhances exploration in multi-modal reasoning tasks by balancing explorat...

arXiv - Machine Learning · 4 min ·
[2602.20196] OpenPort Protocol: A Security Governance Specification for AI Agent Tool Access
AI Safety

The OpenPort Protocol introduces a governance-first approach for AI agents, ensuring secure access to application tools while addressing ...

arXiv - AI · 4 min ·
[2602.20181] Closing the Expertise Gap in Residential Building Energy Retrofits: A Domain-Specific LLM for Informed Decision-Making
LLMs

This article presents a domain-specific large language model (LLM) designed to assist homeowners in making informed decisions about resid...

arXiv - AI · 3 min ·
[2602.20177] Enhancing Heat Sink Efficiency in MOSFETs using Physics Informed Neural Networks: A Systematic Study on Coolant Velocity Estimation
Machine Learning

This study explores the use of Physics Informed Neural Networks (PINNs) to optimize coolant velocity for enhancing heat sink efficiency i...

arXiv - Machine Learning · 4 min ·
[2602.20169] Autonomous AI and Ownership Rules
Robotics

This article explores the ownership rules surrounding AI-generated outputs, examining how they are linked to their creators and the impli...

arXiv - AI · 3 min ·
[2601.12815] Multimodal Multi-Agent Empowered Legal Judgment Prediction
AI Infrastructure

This paper presents JurisMMA, a novel framework for Legal Judgment Prediction (LJP) that utilizes multimodal data to enhance the accuracy...

arXiv - AI · 4 min ·