Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Claude on Claude

The Story of Anthropic’s Latest Controversies Regarding the Business of Its Prized Creation… As Told by the Thing Itself. Editor’s note: ...

Reddit - Artificial Intelligence · 1 min

Cut Claude usage by ~85% in a job search pipeline (16k → 900 tokens/app) — here’s what worked

Like many here, I kept running into Claude usage limits when building anything non-trivial. I was working with a job search automation pi...

Reddit - Artificial Intelligence · 1 min

"Authoritarian Parents In Rationalist Clothes": a piece I wrote in December about alignment

Posted today in light of the Claude Mythos model card release. Originally I wrote this for r/ControlProblem but realized it was getting o...

Reddit - Artificial Intelligence · 1 min

All Content

[2603.04444] vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

Abstract page for arXiv paper 2603.04444: vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

arXiv - AI · 4 min
[2603.04436] ZorBA: Zeroth-order Federated Fine-tuning of LLMs with Heterogeneous Block Activation

Abstract page for arXiv paper 2603.04436: ZorBA: Zeroth-order Federated Fine-tuning of LLMs with Heterogeneous Block Activation

arXiv - Machine Learning · 4 min
[2603.04443] AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Abstract page for arXiv paper 2603.04443: AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

arXiv - Machine Learning · 4 min
[2603.04429] What Is Missing: Interpretable Ratings for Large Language Model Outputs

Abstract page for arXiv paper 2603.04429: What Is Missing: Interpretable Ratings for Large Language Model Outputs

arXiv - AI · 4 min
[2603.04428] Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices

Abstract page for arXiv paper 2603.04428: Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Dev...

arXiv - Machine Learning · 4 min
[2603.04421] Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

Abstract page for arXiv paper 2603.04421: Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

arXiv - AI · 3 min
[2603.04419] Context-Dependent Affordance Computation in Vision-Language Models

Abstract page for arXiv paper 2603.04419: Context-Dependent Affordance Computation in Vision-Language Models

arXiv - Machine Learning · 4 min
[2603.04413] Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries

Abstract page for arXiv paper 2603.04413: Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Me...

arXiv - AI · 4 min
[2603.04411] One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

Abstract page for arXiv paper 2603.04411: One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv - Machine Learning · 3 min
[2603.04410] SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

Abstract page for arXiv paper 2603.04410: SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv - AI · 4 min
[2603.04409] Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

Abstract page for arXiv paper 2603.04409: Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

arXiv - AI · 4 min
[2603.04406] CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

Abstract page for arXiv paper 2603.04406: CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG M...

arXiv - AI · 4 min
[2603.04407] Semantic Containment as a Fundamental Property of Emergent Misalignment

Abstract page for arXiv paper 2603.04407: Semantic Containment as a Fundamental Property of Emergent Misalignment

arXiv - AI · 3 min
[2603.04405] Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

Abstract page for arXiv paper 2603.04405: Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

arXiv - Machine Learning · 4 min
[2603.05498] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Abstract page for arXiv paper 2603.05498: The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

arXiv - AI · 3 min
[2603.05485] Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Abstract page for arXiv paper 2603.05485: Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

arXiv - AI · 3 min
[2603.05399] Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

Abstract page for arXiv paper 2603.05399: Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

arXiv - AI · 3 min
[2603.05392] Legal interpretation and AI: from expert systems to argumentation and LLMs

Abstract page for arXiv paper 2603.05392: Legal interpretation and AI: from expert systems to argumentation and LLMs

arXiv - AI · 3 min
[2603.05294] STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Abstract page for arXiv paper 2603.05294: STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

arXiv - AI · 3 min
[2603.05290] X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Abstract page for arXiv paper 2603.05290: X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv - AI · 4 min