Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MAR...

Reddit - Artificial Intelligence · 1 min

Paper Finds That Leading AI Chatbots Like ChatGPT and Claude Remain Incredibly Sycophantic, Resulting in Twisted Effects on Users

https://futurism.com/artificial-intelligence/paper-ai-chatbots-chatgpt-claude-sycophantic Your AI chatbot isn’t neutral. Trust its advice...

Reddit - Artificial Intelligence · 1 min

Claude Code leak exposes a Tamagotchi-style ‘pet’ and an always-on agent | The Verge

Anthropic says “human error” resulted in a leak that exposed Claude Code’s source code. The leaked code, which has since been copied to G...

The Verge - AI · 4 min

All Content

[2512.03903] BERnaT: Basque Encoders for Representing Natural Textual Diversity

arXiv - AI · 3 min

[2512.05959] M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

arXiv - AI · 4 min

[2511.23455] The Price of Progress: Price Performance and the Future of AI

arXiv - Machine Learning · 4 min

[2511.19299] Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning

arXiv - Machine Learning · 4 min

[2511.22169] Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization

arXiv - AI · 4 min

[2511.17561] LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

arXiv - AI · 3 min

[2511.14977] SVBRD-LLM: Self-Verifying Behavioral Rule Discovery for Autonomous Vehicle Identification

arXiv - AI · 4 min

[2511.11828] Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

arXiv - Machine Learning · 4 min

[2511.06174] LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

arXiv - AI · 4 min

[2510.27543] DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models

arXiv - AI · 4 min

[2510.13232] What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

arXiv - AI · 4 min

[2510.08138] Understanding Temporal Logic Consistency in Video-Language Models through Cross-Modal Attention Discriminability

arXiv - AI · 4 min

[2510.06638] StaR-KVQA: Structured Reasoning Traces for Implicit-Knowledge Visual Question Answering

arXiv - AI · 4 min

[2510.05181] Auditing Pay-Per-Token in Large Language Models

arXiv - AI · 4 min

[2510.05092] Learning to Interpret Weight Differences in Language Models

arXiv - Machine Learning · 4 min

[2510.02375] Pretraining with hierarchical memories: separating long-tail and common knowledge

arXiv - Machine Learning · 4 min

[2510.02249] Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

arXiv - Machine Learning · 4 min

[2510.01037] CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

arXiv - Machine Learning · 4 min

[2508.05694] DMFI: A Dual-Modality Log Analysis Framework for Insider Threat Detection with LoRA-Tuned Language Models

arXiv - AI · 4 min

[2507.08704] Knowledge Fusion via Bidirectional Information Aggregation

arXiv - AI · 4 min
