Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Llms

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min ·
Llms

built an open source CLI that auto generates AI setup files for your projects just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share 150 github stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min ·
Llms

built an open source tool that auto generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.24639] Experiential Reflective Learning for Self-Improving LLM Agents
Llms

arXiv - AI · 3 min ·
[2603.25284] SliderQuant: Accurate Post-Training Quantization for LLMs
Llms

arXiv - AI · 4 min ·
[2603.25283] A Gait Foundation Model Predicts Multi-System Health Phenotypes from 3D Skeletal Motion
Llms

arXiv - AI · 3 min ·
[2603.25158] Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Llms

arXiv - AI · 4 min ·
[2603.25133] RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following
Llms

arXiv - AI · 3 min ·
[2603.25097] ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents
Llms

arXiv - AI · 4 min ·
[2603.25075] Sparse Visual Thought Circuits in Vision-Language Models
Llms

arXiv - AI · 3 min ·
[2603.25035] Mechanistically Interpreting Compression in Vision-Language Models
Llms

arXiv - AI · 3 min ·
[2603.25031] From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support
Llms

arXiv - AI · 4 min ·
[2603.24967] The Anatomy of Uncertainty in LLMs
Llms

arXiv - AI · 3 min ·
[2603.24961] Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math
Llms

arXiv - AI · 4 min ·
[2603.24947] Shopping with a Platform AI Assistant: Who Adopts, When in the Journey, and What For
Llms

arXiv - AI · 4 min ·
[2603.24943] FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol
Llms

arXiv - AI · 3 min ·
[2603.24929] LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics
Llms

arXiv - AI · 3 min ·
[2603.24866] How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning
Llms

arXiv - AI · 4 min ·
[2603.24787] ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing
Llms

arXiv - AI · 4 min ·
[2603.24768] Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design
Llms

arXiv - AI · 4 min ·
[2603.24747] Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach
Llms

arXiv - AI · 3 min ·
[2603.24676] When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs
Llms

arXiv - AI · 4 min ·
Llms

Claude's system prompt + XML tags is the most underused power combo right now

Most people just type into ChatGPT like it's Google. Claude with a structured system prompt using XML tags behaves like a completely diff...

Reddit - Artificial Intelligence · 1 min ·
