Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min ·
Built an open-source CLI that auto-generates AI setup files for your projects; just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share: 150 GitHub stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min ·
Built an open-source tool that auto-generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.25099] Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization
arXiv - AI · 4 min ·
[2603.25063] TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization
arXiv - Machine Learning · 4 min ·
[2603.25056] The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities
arXiv - AI · 4 min ·
[2603.25052] Closing the Confidence-Faithfulness Gap in Large Language Models
arXiv - AI · 3 min ·
[2603.24989] Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model
arXiv - AI · 4 min ·
[2603.24986] Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators
arXiv - AI · 3 min ·
[2603.24940] Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system
arXiv - AI · 4 min ·
[2603.24617] Multi-LLM Query Optimization
arXiv - Machine Learning · 3 min ·
[2603.25687] On Neural Scaling Laws for Weather Emulation through Continual Training
arXiv - Machine Learning · 4 min ·
[2603.24857] AI Security in the Foundation Model Era: A Comprehensive Survey from a Unified Perspective
arXiv - Machine Learning · 4 min ·
[2603.24846] NeuroVLM-Bench: Evaluation of Vision-Enabled Large Language Models for Clinical Reasoning in Neurological Disorders
arXiv - Machine Learning · 4 min ·
[2603.25562] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
arXiv - AI · 4 min ·
[2603.24804] GoldiCLIP: The Goldilocks Approach for Balancing Explicit Supervision for Language-Image Pretraining
arXiv - Machine Learning · 4 min ·
[2603.24774] From Untestable to Testable: Metamorphic Testing in the Age of LLMs
arXiv - AI · 3 min ·
[2603.24772] Evaluating Fine-Tuned LLM Model For Medical Transcription With Small Low-Resource Languages Validated Dataset
arXiv - Machine Learning · 4 min ·
[2603.25385] GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs
arXiv - AI · 4 min ·
[2603.25325] How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
arXiv - AI · 4 min ·
[2603.24721] Scalable Object Relation Encoding for Better 3D Spatial Reasoning in Large Language Models
arXiv - Machine Learning · 4 min ·
[2603.25186] Knowledge-Guided Retrieval-Augmented Generation for Zero-Shot Psychiatric Data: Privacy Preserving Synthetic Data Generation
arXiv - Machine Learning · 4 min ·
[2603.24651] When Consistency Becomes Bias: Interviewer Effects in Semi-Structured Clinical Interviews
arXiv - AI · 3 min ·