Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

I compiled every major AI agent security incident from 2024-2026 in one place - 90 incidents, all sourced, updated weekly

After tracking AI agent security incidents for the past year, I put together a single reference covering every major breach, vulnerabilit...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

[R] Forced Depth Consideration Reduces Type II Errors in LLM Self-Classification: Evidence from an Exploration Prompting Ablation Study - (200 trap prompts, 4 models, 8 Step-0 variants)

LLM-based task classifiers tend to misroute prompts that look simple at first glance but require deeper understanding - I call it "Type I...

Reddit - Machine Learning · 1 min · about 4 hours ago

I asked ChatGPT and Gemini to generate a world map

submitted by /u/Pitiful-Entrance5769

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

All Content

[2603.04421] Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

Abstract page for arXiv paper 2603.04421: Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

arXiv - AI · 3 min · about 1 month ago

[2603.04419] Context-Dependent Affordance Computation in Vision-Language Models

Abstract page for arXiv paper 2603.04419: Context-Dependent Affordance Computation in Vision-Language Models

arXiv - Machine Learning · 4 min · about 1 month ago

[2603.04413] Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries

Abstract page for arXiv paper 2603.04413: Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Me...

arXiv - AI · 4 min · about 1 month ago

[2603.04411] One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

Abstract page for arXiv paper 2603.04411: One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv - Machine Learning · 3 min · about 1 month ago

[2603.04410] SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

Abstract page for arXiv paper 2603.04410: SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv - AI · 4 min · about 1 month ago

[2603.04409] Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

Abstract page for arXiv paper 2603.04409: Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

arXiv - AI · 4 min · about 1 month ago

[2603.04406] CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

Abstract page for arXiv paper 2603.04406: CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG M...

arXiv - AI · 4 min · about 1 month ago

[2603.04407] Semantic Containment as a Fundamental Property of Emergent Misalignment

Abstract page for arXiv paper 2603.04407: Semantic Containment as a Fundamental Property of Emergent Misalignment

arXiv - AI · 3 min · about 1 month ago

[2603.04405] Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

Abstract page for arXiv paper 2603.04405: Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

arXiv - Machine Learning · 4 min · about 1 month ago

[2603.05498] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Abstract page for arXiv paper 2603.05498: The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

arXiv - AI · 3 min · about 1 month ago

[2603.05485] Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Abstract page for arXiv paper 2603.05485: Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

arXiv - AI · 3 min · about 1 month ago

[2603.05399] Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

Abstract page for arXiv paper 2603.05399: Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

arXiv - AI · 3 min · about 1 month ago

[2603.05392] Legal interpretation and AI: from expert systems to argumentation and LLMs

Abstract page for arXiv paper 2603.05392: Legal interpretation and AI: from expert systems to argumentation and LLMs

arXiv - AI · 3 min · about 1 month ago

[2603.05294] STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Abstract page for arXiv paper 2603.05294: STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

arXiv - AI · 3 min · about 1 month ago

[2603.05290] X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Abstract page for arXiv paper 2603.05290: X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv - AI · 4 min · about 1 month ago

[2603.05240] GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

Abstract page for arXiv paper 2603.05240: GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

arXiv - AI · 3 min · about 1 month ago

[2603.05129] MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus

Abstract page for arXiv paper 2603.05129: MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty C...

arXiv - AI · 4 min · about 1 month ago

[2603.05120] Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

Abstract page for arXiv paper 2603.05120: Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Re...

arXiv - AI · 3 min · about 1 month ago

[2603.05044] WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

Abstract page for arXiv paper 2603.05044: WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

arXiv - AI · 4 min · about 1 month ago

[2603.05040] Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

Abstract page for arXiv paper 2603.05040: Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

arXiv - AI · 3 min · about 1 month ago
