Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MAR...

Reddit - Artificial Intelligence · 1 min

Paper Finds That Leading AI Chatbots Like ChatGPT and Claude Remain Incredibly Sycophantic, Resulting in Twisted Effects on Users

https://futurism.com/artificial-intelligence/paper-ai-chatbots-chatgpt-claude-sycophantic Your AI chatbot isn’t neutral. Trust its advice...

Reddit - Artificial Intelligence · 1 min

Claude Code leak exposes a Tamagotchi-style ‘pet’ and an always-on agent | The Verge

Anthropic says “human error” resulted in a leak that exposed Claude Code’s source code. The leaked code, which has since been copied to G...

The Verge - AI · 4 min

All Content

[2512.03903] BERnaT: Basque Encoders for Representing Natural Textual Diversity

arXiv - AI · 3 min

[2512.05959] M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

arXiv - AI · 4 min

[2511.23455] The Price of Progress: Price Performance and the Future of AI

arXiv - Machine Learning · 4 min

[2511.19299] Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning

arXiv - Machine Learning · 4 min

[2511.22169] Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization

arXiv - AI · 4 min

[2511.17561] LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

arXiv - AI · 3 min

[2511.14977] SVBRD-LLM: Self-Verifying Behavioral Rule Discovery for Autonomous Vehicle Identification

arXiv - AI · 4 min

[2511.11828] Conformal Constrained Policy Optimization for Cost-Effective LLM Agents

arXiv - Machine Learning · 4 min

[2511.06174] LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

arXiv - AI · 4 min

[2510.27543] DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models

arXiv - AI · 4 min

[2510.13232] What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

arXiv - AI · 4 min

[2510.08138] Understanding Temporal Logic Consistency in Video-Language Models through Cross-Modal Attention Discriminability

arXiv - AI · 4 min

[2510.06638] StaR-KVQA: Structured Reasoning Traces for Implicit-Knowledge Visual Question Answering

arXiv - AI · 4 min

[2510.05181] Auditing Pay-Per-Token in Large Language Models

arXiv - AI · 4 min

[2510.05092] Learning to Interpret Weight Differences in Language Models

arXiv - Machine Learning · 4 min

[2510.02375] Pretraining with hierarchical memories: separating long-tail and common knowledge

arXiv - Machine Learning · 4 min

[2510.02249] Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation

arXiv - Machine Learning · 4 min

[2510.01037] CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs

arXiv - Machine Learning · 4 min

[2508.05694] DMFI: A Dual-Modality Log Analysis Framework for Insider Threat Detection with LoRA-Tuned Language Models

arXiv - AI · 4 min

[2507.08704] Knowledge Fusion via Bidirectional Information Aggregation

arXiv - AI · 4 min
