Large Language Models

GPT, Claude, Gemini, and other LLMs


Top This Week

LLMs

it is impossible to stop AI chatbots from using quotes (any instance of the character ")

No matter how I phrase it in the instructions, how many times I repeat the rule not to use quotes, and which LLM I use, I have failed to ...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

LLMs

Converting XQuery to SQL with Local LLMs: Do I Need Fine-Tuning or a Better Approach? [P]

I am trying to convert XQuery statements into SQL queries within an enterprise context, with the constraint that the solution must rely...

Reddit - Machine Learning · 1 min · about 3 hours ago

LLMs

AI: Fragility of today's Claude Cowork type AI Agent Apps. RTZ 1061

...realities like memory management, highlight a longer road to resilient AI Agents and AGI

AI Tools & Products · 11 min · about 6 hours ago

All Content

LLMs

[2603.02510] ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evolution

Abstract page for arXiv paper 2603.02510: ParEVO: Synthesizing Code for Irregular Data: High-Performance Parallelism through Agentic Evol...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.03002] SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

Abstract page for arXiv paper 2603.03002: SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02482] MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Abstract page for arXiv paper 2603.02482: MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.03078] RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Abstract page for arXiv paper 2603.03078: RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.03072] TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Abstract page for arXiv paper 2603.03072: TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.03018] REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise Telemetry

Abstract page for arXiv paper 2603.03018: REGAL: A Registry-Driven Architecture for Deterministic Grounding of Agentic AI in Enterprise T...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.03005] OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Structured Agents

Abstract page for arXiv paper 2603.03005: OrchMAS: Orchestrated Reasoning with Multi Collaborative Heterogeneous Scientific Expert Struct...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02960] Architecting Trust in Artificial Epistemic Agents

Abstract page for arXiv paper 2603.02960: Architecting Trust in Artificial Epistemic Agents

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02939] ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative Policy Optimization

Abstract page for arXiv paper 2603.02939: ShipTraj-R1: Reinforcing Ship Trajectory Prediction in Large Language Models via Group Relative...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02908] SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs without Training

Abstract page for arXiv paper 2603.02908: SAE as a Crystal Ball: Interpretable Features Predict Cross-domain Transferability of LLMs with...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02858] LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for Reasoning about Debates

Abstract page for arXiv paper 2603.02858: LLM-based Argument Mining meets Argumentation and Description Logics: a Unified Framework for R...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02798] Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

Abstract page for arXiv paper 2603.02798: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02787] Rethinking Code Similarity for Automated Algorithm Design with LLMs

Abstract page for arXiv paper 2603.02787: Rethinking Code Similarity for Automated Algorithm Design with LLMs

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02680] LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

Abstract page for arXiv paper 2603.02680: LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Opt...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02268] PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis

Abstract page for arXiv paper 2603.02268: PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differentia...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02626] See and Remember: A Multimodal Agent for Web Traversal

Abstract page for arXiv paper 2603.02626: See and Remember: A Multimodal Agent for Web Traversal

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02599] SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

Abstract page for arXiv paper 2603.02599: SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.02586] LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Abstract page for arXiv paper 2603.02586: LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02237] Concept Heterogeneity-aware Representation Steering

Abstract page for arXiv paper 2603.02237: Concept Heterogeneity-aware Representation Steering

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02542] AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation

Abstract page for arXiv paper 2603.02542: AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical...

arXiv - AI · 4 min · about 2 months ago

