Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min · about 3 hours ago

built an open source CLI that auto generates AI setup files for your projects just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share 150 github stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

built an open source tool that auto generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

All Content

[2603.24639] Experiential Reflective Learning for Self-Improving LLM Agents

Abstract page for arXiv paper 2603.24639: Experiential Reflective Learning for Self-Improving LLM Agents

arXiv - AI · 3 min · 2 days ago

[2603.25284] SliderQuant: Accurate Post-Training Quantization for LLMs

Abstract page for arXiv paper 2603.25284: SliderQuant: Accurate Post-Training Quantization for LLMs

arXiv - AI · 4 min · 2 days ago

[2603.25283] A Gait Foundation Model Predicts Multi-System Health Phenotypes from 3D Skeletal Motion

Abstract page for arXiv paper 2603.25283: A Gait Foundation Model Predicts Multi-System Health Phenotypes from 3D Skeletal Motion

arXiv - AI · 3 min · 2 days ago

[2603.25158] Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Abstract page for arXiv paper 2603.25158: Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

arXiv - AI · 4 min · 2 days ago

[2603.25133] RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following

Abstract page for arXiv paper 2603.25133: RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following

arXiv - AI · 3 min · 2 days ago

[2603.25097] ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

Abstract page for arXiv paper 2603.25097: ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents

arXiv - AI · 4 min · 2 days ago

[2603.25075] Sparse Visual Thought Circuits in Vision-Language Models

Abstract page for arXiv paper 2603.25075: Sparse Visual Thought Circuits in Vision-Language Models

arXiv - AI · 3 min · 2 days ago

[2603.25035] Mechanistically Interpreting Compression in Vision-Language Models

Abstract page for arXiv paper 2603.25035: Mechanistically Interpreting Compression in Vision-Language Models

arXiv - AI · 3 min · 2 days ago

[2603.25031] From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support

Abstract page for arXiv paper 2603.25031: From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support

arXiv - AI · 4 min · 2 days ago

[2603.24967] The Anatomy of Uncertainty in LLMs

Abstract page for arXiv paper 2603.24967: The Anatomy of Uncertainty in LLMs

arXiv - AI · 3 min · 2 days ago

[2603.24961] Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

Abstract page for arXiv paper 2603.24961: Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

arXiv - AI · 4 min · 2 days ago

[2603.24947] Shopping with a Platform AI Assistant: Who Adopts, When in the Journey, and What For

Abstract page for arXiv paper 2603.24947: Shopping with a Platform AI Assistant: Who Adopts, When in the Journey, and What For

arXiv - AI · 4 min · 2 days ago

[2603.24943] FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol

Abstract page for arXiv paper 2603.24943: FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context...

arXiv - AI · 3 min · 2 days ago

[2603.24929] LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics

Abstract page for arXiv paper 2603.24929: LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics

arXiv - AI · 3 min · 2 days ago

[2603.24866] How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical Generative Reasoning

Abstract page for arXiv paper 2603.24866: How Far Are Vision-Language Models from Constructing the Real World? A Benchmark for Physical G...

arXiv - AI · 4 min · 2 days ago

[2603.24787] ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing

Abstract page for arXiv paper 2603.24787: ReLope: KL-Regularized LoRA Probes for Multimodal LLM Routing

arXiv - AI · 4 min · 2 days ago

[2603.24768] Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design

Abstract page for arXiv paper 2603.24768: Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineeri...

arXiv - AI · 4 min · 2 days ago

[2603.24747] Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

Abstract page for arXiv paper 2603.24747: Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

arXiv - AI · 3 min · 2 days ago

[2603.24676] When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs

Abstract page for arXiv paper 2603.24676: When Is Collective Intelligence a Lottery? Multi-Agent Scaling Laws for Memetic Drift in LLMs

arXiv - AI · 4 min · 2 days ago

Claude's system prompt + XML tags is the most underused power combo right now

Most people just type into ChatGPT like it's Google. Claude with a structured system prompt using XML tags behaves like a completely diff...

Reddit - Artificial Intelligence · 1 min · 3 days ago
