Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

[R] Forced Depth Consideration Reduces Type II Errors in LLM Self-Classification: Evidence from an Exploration Prompting Ablation Study - (200 trap prompts, 4 models, 8 Step-0 variants)

LLM-based task classifiers tend to misroute prompts that look simple at first glance but require deeper understanding - I call it "Type I...

Reddit - Machine Learning · 1 min · 3 minutes ago

I asked ChatGPT and Gemini to generate a world map

submitted by /u/Pitiful-Entrance5769

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Can't wait to use Mythos model - Anthropic refuses to release Claude Mythos publicly; model found thousands of zero-days across every major OS and browser. Launches Project Glasswing with Apple, Microsoft, Google, and others for defensive use.

Anthropic announced Project Glasswing, a defensive cybersecurity initiative with Apple, Microsoft, Google, AWS, NVIDIA, CrowdStrike, and ...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

All Content

[2603.05432] Ensembling Language Models with Sequential Monte Carlo

arXiv - Machine Learning · 4 min · about 1 month ago

[2603.05421] MobileFetalCLIP: Selective Repulsive Knowledge Distillation for Mobile Fetal Ultrasound Analysis

arXiv - Machine Learning · 3 min · about 1 month ago

[2603.05308] Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

arXiv - AI · 4 min · about 1 month ago

[2603.05210] Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding

arXiv - Machine Learning · 4 min · about 1 month ago

[2603.05299] WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

arXiv - Machine Learning · 3 min · about 1 month ago

[2603.05167] C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning

arXiv - AI · 3 min · about 1 month ago

[2603.05121] Measuring the Redundancy of Decoder Layers in SpeechLLMs

arXiv - AI · 3 min · about 1 month ago

[2603.04982] Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis

arXiv - AI · 4 min · about 1 month ago

[2603.04976] 3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding

arXiv - AI · 4 min · about 1 month ago

[2603.04968] When Weak LLMs Speak with Confidence, Preference Alignment Gets Stronger

arXiv - AI · 3 min · about 1 month ago

[2603.04918] BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

arXiv - Machine Learning · 3 min · about 1 month ago

[2603.04893] Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models

arXiv - AI · 4 min · about 1 month ago

[2603.04819] On the Strengths and Weaknesses of Data for Open-set Embodied Assistance

arXiv - Machine Learning · 4 min · about 1 month ago

[2603.04805] Attention's Gravitational Field: A Power-Law Interpretation of Positional Correlation

arXiv - AI · 3 min · about 1 month ago

[2603.04799] Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm

arXiv - AI · 4 min · about 1 month ago

[2603.04772] TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings

arXiv - AI · 3 min · about 1 month ago

[2603.04763] Evaluating GPT-5 as a Multimodal Clinical Reasoner: A Landscape Commentary

arXiv - Machine Learning · 4 min · about 1 month ago

[2603.04743] DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval

arXiv - AI · 4 min · about 1 month ago

[2603.04759] Stacked from One: Multi-Scale Self-Injection for Context Window Extension

arXiv - AI · 4 min · about 1 month ago

[2603.04727] Are Multimodal LLMs Ready for Surveillance? A Reality Check on Zero-Shot Anomaly Detection in the Wild

arXiv - AI · 4 min · about 1 month ago
