Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min ·
Built an open-source CLI that auto-generates AI setup files for your projects; just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share: 150 GitHub stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min ·
Built an open-source tool that auto-generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.25099] Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization
arXiv - AI · 4 min ·
[2603.25063] TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization
arXiv - Machine Learning · 4 min ·
[2603.25056] The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities
arXiv - AI · 4 min ·
[2603.25052] Closing the Confidence-Faithfulness Gap in Large Language Models
arXiv - AI · 3 min ·
[2603.24989] Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model
arXiv - AI · 4 min ·
[2603.24986] Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators
arXiv - AI · 3 min ·
[2603.24940] Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system
arXiv - AI · 4 min ·
[2603.24617] Multi-LLM Query Optimization
arXiv - Machine Learning · 3 min ·
[2603.25687] On Neural Scaling Laws for Weather Emulation through Continual Training
arXiv - Machine Learning · 4 min ·
[2603.24857] AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective
arXiv - Machine Learning · 4 min ·
[2603.24846] NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders
arXiv - Machine Learning · 4 min ·
[2603.25562] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
arXiv - AI · 4 min ·
[2603.24804] GoldiCLIP: The Goldilocks Approach for Balancing Explicit Supervision for Language-Image Pretraining
arXiv - Machine Learning · 4 min ·
[2603.24774] From Untestable to Testable: Metamorphic Testing in the Age of LLMs
arXiv - AI · 3 min ·
[2603.24772] Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset
arXiv - Machine Learning · 4 min ·
[2603.25385] GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
arXiv - AI · 4 min ·
[2603.25325] How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
arXiv - AI · 4 min ·
[2603.24721] Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models
arXiv - Machine Learning · 4 min ·
[2603.25186] Knowledge-Guided Retrieval-Augmented Generation for Zero-Shot Psychiatric Data: Privacy Preserving Synthetic Data Generation
arXiv - Machine Learning · 4 min ·
[2603.24651] When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews
arXiv - AI · 3 min ·