Large Language Models

GPT, Claude, Gemini, and other LLMs

This Week's Best | Monthly Best | Guide | Trending

Top This Week

Llms

Transformer Math Explorer [P]

This is an interactive math reference for transformer models, presented via dataflow graphs, all the way down to elementary math. Covers ...

Reddit - Machine Learning · 1 min · about 1 hour ago

Llms

Spotify wants to become the home for AI-generated personal audio | TechCrunch

Users will be able to create a podcast from Codex or Claude Code and import it to Spotify

TechCrunch - AI · 3 min · about 1 hour ago

Llms

We built something ChatGPT doesn't do — AI that delivers results, not answers

Most AI gives you text. We built cards. Here's what I mean. When you ask LookMood Agent to find you a job, you don't get advice on where ...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

All Content

Llms

[2603.03080] Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

Abstract page for arXiv paper 2603.03080: Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Reco...

arXiv - AI · 3 min · 2 months ago

Llms

[2603.03116] Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Abstract page for arXiv paper 2603.03116: Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02510] ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Abstract page for arXiv paper 2603.02510: ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evol...

arXiv - Machine Learning · 4 min · 2 months ago

Llms

[2603.03002] SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

Abstract page for arXiv paper 2603.03002: SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02482] MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Abstract page for arXiv paper 2603.02482: MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

arXiv - Machine Learning · 4 min · 2 months ago

Llms

[2603.03078] RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Abstract page for arXiv paper 2603.03078: RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

arXiv - AI · 4 min · 2 months ago

Llms

[2603.03072] TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Abstract page for arXiv paper 2603.03072: TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

arXiv - AI · 4 min · 2 months ago

Llms

[2603.03018] REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

Abstract page for arXiv paper 2603.03018: REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise T...

arXiv - AI · 4 min · 2 months ago

Llms

[2603.03005] OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

Abstract page for arXiv paper 2603.03005: OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Struct...

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02960] Architecting Trust in Artificial Epistemic Agents

Abstract page for arXiv paper 2603.02960: Architecting Trust in Artificial Epistemic Agents

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02939] ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

Abstract page for arXiv paper 2603.02939: ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative...

arXiv - AI · 3 min · 2 months ago

Llms

[2603.02908] SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

Abstract page for arXiv paper 2603.02908: SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs with...

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02858] LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates

Abstract page for arXiv paper 2603.02858: LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for R...

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02798] Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

Abstract page for arXiv paper 2603.02798: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

arXiv - AI · 3 min · 2 months ago

Llms

[2603.02787] Rethinking Code Similarity for Automated Algorithm Design with LLMs

Abstract page for arXiv paper 2603.02787: Rethinking Code Similarity for Automated Algorithm Design with LLMs

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02680] LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

Abstract page for arXiv paper 2603.02680: LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Opt...

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02268] PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis

Abstract page for arXiv paper 2603.02268: PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differentia...

arXiv - AI · 4 min · 2 months ago

Llms

[2603.02626] See and Remember: A Multimodal Agent for Web Traversal

Abstract page for arXiv paper 2603.02626: See and Remember: A Multimodal Agent for Web Traversal

arXiv - AI · 3 min · 2 months ago

Llms

[2603.02599] SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

Abstract page for arXiv paper 2603.02599: SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

arXiv - Machine Learning · 4 min · 2 months ago

Llms

[2603.02586] LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Abstract page for arXiv paper 2603.02586: LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

arXiv - AI · 3 min · 2 months ago

Previous Page 331 Next

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Subscribe to Newsletter

Daily or weekly digest • Unsubscribe anytime

Large Language Models

Top This Week

Transformer Math Explorer [P]

Spotify wants to become the home for AI-generated personal audio | TechCrunch

We built something ChatGPT doesn't do — AI that delivers results, not answers

All Content

[2603.03080] Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

[2603.03116] Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

[2603.02510] ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

[2603.03002] SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

[2603.02482] MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

[2603.03078] RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

[2603.03072] TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

[2603.03018] REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

[2603.03005] OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

[2603.02960] Architecting Trust in Artificial Epistemic Agents

[2603.02939] ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

[2603.02908] SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

[2603.02858] LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates

[2603.02798] Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

[2603.02787] Rethinking Code Similarity for Automated Algorithm Design with LLMs

[2603.02680] LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

[2603.02268] PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis

[2603.02626] See and Remember: A Multimodal Agent for Web Traversal

[2603.02599] SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

[2603.02586] LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Related Topics

Stay updated with AI News