Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

OpenAI starts laying foundations for ChatGPT ads in EU

submitted by /u/ThereWas

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

LLMs

How do you use AI Agents for EDA/Data Analysis and getting it ready for ML model training? [D]

Like in manual workflow I would study the given data by using various functions like pd.info() and all column wise, remove null, outliers...

Reddit - Machine Learning · 1 min · about 4 hours ago

LLMs

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost

The observation that started this: most of what people use AI for every day - summarising, drafting, classifying, extracting etc doesn't ...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

All Content

LLMs

[2603.03116] Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

Abstract page for arXiv paper 2603.03116: Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02510] ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Abstract page for arXiv paper 2603.02510: ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evol...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.03002] SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

Abstract page for arXiv paper 2603.03002: SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02482] MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Abstract page for arXiv paper 2603.02482: MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.03078] RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Abstract page for arXiv paper 2603.03078: RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.03072] TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Abstract page for arXiv paper 2603.03072: TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.03018] REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

Abstract page for arXiv paper 2603.03018: REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise T...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.03005] OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

Abstract page for arXiv paper 2603.03005: OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Struct...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02960] Architecting Trust in Artificial Epistemic Agents

Abstract page for arXiv paper 2603.02960: Architecting Trust in Artificial Epistemic Agents

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02939] ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

Abstract page for arXiv paper 2603.02939: ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02908] SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

Abstract page for arXiv paper 2603.02908: SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs with...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02858] LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates

Abstract page for arXiv paper 2603.02858: LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for R...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02798] Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

Abstract page for arXiv paper 2603.02798: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02787] Rethinking Code Similarity for Automated Algorithm Design with LLMs

Abstract page for arXiv paper 2603.02787: Rethinking Code Similarity for Automated Algorithm Design with LLMs

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02680] LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

Abstract page for arXiv paper 2603.02680: LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Opt...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02268] PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis

Abstract page for arXiv paper 2603.02268: PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differentia...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02626] See and Remember: A Multimodal Agent for Web Traversal

Abstract page for arXiv paper 2603.02626: See and Remember: A Multimodal Agent for Web Traversal

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02599] SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

Abstract page for arXiv paper 2603.02599: SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.02586] LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Abstract page for arXiv paper 2603.02586: LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02237] Concept Heterogeneity-aware Representation Steering

Abstract page for arXiv paper 2603.02237: Concept Heterogeneity-aware Representation Steering

arXiv - AI · 4 min · about 2 months ago
