Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min
LLMs

built an open source CLI that auto generates AI setup files for your projects just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share 150 github stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min
LLMs

built an open source tool that auto generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min

All Content

[2603.25184] Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
LLMs

arXiv - AI · 4 min
[2603.25111] SEVerA: Verified Synthesis of Self-Evolving Agents
LLMs

arXiv - Machine Learning · 4 min
[2603.24629] Sketch2Simulation: Automating Flowsheet Generation via Multi Agent Large Language Models
LLMs

arXiv - AI · 4 min
[2603.25062] SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
LLMs

arXiv - Machine Learning · 3 min
[2603.25040] Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
LLMs

arXiv - Machine Learning · 5 min
[2603.24601] FED-HARGPT: A Hybrid Centralized-Federated Approach of a Transformer-based Architecture for Human Context Recognition
LLMs

arXiv - Machine Learning · 3 min
[2603.25033] Epistemic Compression: The Case for Deliberate Ignorance in High-Stakes AI
LLMs

arXiv - Machine Learning · 3 min
[2603.24596] X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
LLMs

arXiv - AI · 3 min
[2603.24595] Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels
LLMs

arXiv - AI · 4 min
[2506.11680] Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information
LLMs

arXiv - AI · 4 min
[2603.24883] Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization
LLMs

arXiv - Machine Learning · 4 min
[2603.24844] Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
LLMs

arXiv - AI · 4 min
[2603.25633] Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
LLMs

arXiv - AI · 4 min
[2603.25498] EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents
LLMs

arXiv - AI · 3 min
[2603.24780] Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback
LLMs

arXiv - Machine Learning · 4 min
[2603.25450] Cross-Model Disagreement as a Label-Free Correctness Signal
LLMs

arXiv - AI · 4 min
[2603.25412] Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
LLMs

arXiv - AI · 4 min
[2603.24709] Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards
LLMs

arXiv - Machine Learning · 4 min
[2603.24647] Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
LLMs

arXiv - Machine Learning · 4 min
[2603.25326] Evaluating Language Models for Harmful Manipulation
LLMs

arXiv - AI · 4 min
