Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min
LLMs

built an open source CLI that auto generates AI setup files for your projects just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share 150 github stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min
LLMs

built an open source tool that auto generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min

All Content

[2603.25184] Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
LLMs

arXiv - AI · 4 min
[2603.25111] SEVerA: Verified Synthesis of Self-Evolving Agents
LLMs

arXiv - Machine Learning · 4 min
[2603.24629] Sketch2Simulation: Automating Flowsheet Generation via Multi Agent Large Language Models
LLMs

arXiv - AI · 4 min
[2603.25062] SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
LLMs

arXiv - Machine Learning · 3 min
[2603.25040] Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale
LLMs

arXiv - Machine Learning · 5 min
[2603.24601] FED-HARGPT: A Hybrid Centralized-Federated Approach of a Transformer-based Architecture for Human Context Recognition
LLMs

arXiv - Machine Learning · 3 min
[2603.25033] Epistemic Compression: The Case for Deliberate Ignorance in High-Stakes AI
LLMs

arXiv - Machine Learning · 3 min
[2603.24596] X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
LLMs

arXiv - AI · 3 min
[2603.24595] Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels
LLMs

arXiv - AI · 4 min
[2506.11680] Malicious LLM-Based Conversational AI Makes Users Reveal Personal Information
LLMs

arXiv - AI · 4 min
[2603.24883] Learning to Staff: Offline Reinforcement Learning and Fine-Tuned LLMs for Warehouse Staffing Optimization
LLMs

arXiv - Machine Learning · 4 min
[2603.24844] Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
LLMs

arXiv - AI · 4 min
[2603.25633] Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance?
LLMs

arXiv - AI · 4 min
[2603.25498] EcoThink: A Green Adaptive Inference Framework for Sustainable and Accessible Agents
LLMs

arXiv - AI · 3 min
[2603.24780] Transformers in the Dark: Navigating Unknown Search Spaces via Bandit Feedback
LLMs

arXiv - Machine Learning · 4 min
[2603.25450] Cross-Model Disagreement as a Label-Free Correctness Signal
LLMs

arXiv - AI · 4 min
[2603.25412] Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models
LLMs

arXiv - AI · 4 min
[2603.24709] Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards
LLMs

arXiv - Machine Learning · 4 min
[2603.24647] Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
LLMs

arXiv - Machine Learning · 4 min
[2603.25326] Evaluating Language Models for Harmful Manipulation
LLMs

arXiv - AI · 4 min
