Large Language Models
GPT, Claude, Gemini, and other LLMs
Top This Week
Anthropic launches Claude Managed Agents — composable APIs for shipping production AI agents 10x faster. Notion, Rakuten, Asana, and Sentry already in production.
Anthropic launches Claude Managed Agents in public beta: composable APIs for shipping production AI agents 10x faster. Handles sandboxing...
6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous
Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow. Here's th...
All Content
[2603.02675] From Shallow to Deep: Pinning Semantic Intent via Causal GRPO
[2504.21023] ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
[2603.03258] Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals
[2603.02635] SaFeR-ToolKit: Structured Reasoning via Virtual Tool Calling for Multimodal Safety
[2603.03242] Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals
[2603.02630] MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks
[2603.03233] AI-for-Science Low-code Platform with Bayesian Adversarial Multi-Agent Framework
[2603.03203] No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models
[2603.02604] Heterogeneous Agent Collaborative Reinforcement Learning
[2603.03175] Saarthi for AGI: Towards Domain-Specific General Intelligence for Formal Verification
[2603.03147] Agentic AI-based Coverage Closure for Formal Verification
[2603.03080] Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation
[2603.03116] Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation
[2603.02510] ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution
[2603.03002] SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models
[2603.02482] MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
[2603.03078] RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization
[2603.03072] TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
[2603.03018] REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry
[2603.03005] OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents