Large Language Models

GPT, Claude, Gemini, and other LLMs


Top This Week

LLMs

Gemini Can Now Create AI Images Using Your Own Photos and Videos

Gemini, Google Photos, Nano Banana 2, and Personal Intelligence have all combined to give you new features through the AI prompt box.

AI Tools & Products · 6 min · 17 minutes ago

LLMs

Claude Mythos: Finance ministers and top bankers raise serious concerns about AI model

Experts say Mythos potentially has an unprecedented ability to identify and exploit cyber-security weaknesses.

AI Tools & Products · 6 min · 17 minutes ago

LLMs

What is Anthropic's Claude Mythos and what risks does it pose?

The company's claim that the AI tool can outperform humans at some hacking and cyber-security tasks has sparked fears in the financial world.

AI Tools & Products · 6 min · 17 minutes ago

All Content

LLMs

[2603.04421] Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

Abstract page for arXiv paper 2603.04421: Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.04419] Context-Dependent Affordance Computation in Vision-Language Models

Abstract page for arXiv paper 2603.04419: Context-Dependent Affordance Computation in Vision-Language Models

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.04413] Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries

Abstract page for arXiv paper 2603.04413: Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Me...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.04411] One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

Abstract page for arXiv paper 2603.04411: One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2603.04410] SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

Abstract page for arXiv paper 2603.04410: SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.04409] Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

Abstract page for arXiv paper 2603.04409: Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.04406] CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

Abstract page for arXiv paper 2603.04406: CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG M...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.04407] Semantic Containment as a Fundamental Property of Emergent Misalignment

Abstract page for arXiv paper 2603.04407: Semantic Containment as a Fundamental Property of Emergent Misalignment

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.04405] Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

Abstract page for arXiv paper 2603.04405: Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.05498] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Abstract page for arXiv paper 2603.05498: The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05485] Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Abstract page for arXiv paper 2603.05485: Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05399] Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

Abstract page for arXiv paper 2603.05399: Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05392] Legal interpretation and AI: from expert systems to argumentation and LLMs

Abstract page for arXiv paper 2603.05392: Legal interpretation and AI: from expert systems to argumentation and LLMs

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05294] STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Abstract page for arXiv paper 2603.05294: STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05290] X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Abstract page for arXiv paper 2603.05290: X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.05240] GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

Abstract page for arXiv paper 2603.05240: GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05129] MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus

Abstract page for arXiv paper 2603.05129: MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty C...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.05120] Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

Abstract page for arXiv paper 2603.05120: Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Re...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.05044] WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

Abstract page for arXiv paper 2603.05044: WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.05040] Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

Abstract page for arXiv paper 2603.05040: Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

arXiv - AI · 3 min · about 1 month ago
