Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Gemini caught a $280M crypto exploit before it hit the news, then retracted it as a hallucination because I couldn't verify it - because the news hadn't dropped yet

So this happened mere hours ago and I feel like I genuinely stumbled onto something worth documenting for people interested in AI behavio...

Reddit - Artificial Intelligence · 1 min · about 8 hours ago

GPT-4 vs Claude vs Gemini for coding — honest breakdown after 3 months of daily use

I am a solo developer who has been using all three seriously. Here is what I actually think: GPT-4o — Strengths: Large context window, st...

Reddit - Artificial Intelligence · 1 min · about 10 hours ago

You're giving feedback on a new version of ChatGPT

So I will be paying attention to these system messages more now- the last time I got one of these not so long back the 'tone' changed to ...

Reddit - Artificial Intelligence · 1 min · about 10 hours ago

All Content

[2503.03170] AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding

Abstract page for arXiv paper 2503.03170: AttackSeqBench: Benchmarking the Capabilities of LLMs for Attack Sequences Understanding

arXiv - AI · 4 min · about 2 months ago

[2502.08666] Hallucination, Monofacts, and Miscalibration: An Empirical Investigation

Abstract page for arXiv paper 2502.08666: Hallucination, Monofacts, and Miscalibration: An Empirical Investigation

arXiv - AI · 4 min · about 2 months ago

[2508.01077] The Lattice Geometry of Neural Network Quantization -- A Short Equivalence Proof of GPTQ and Babai's Algorithm

Abstract page for arXiv paper 2508.01077: The Lattice Geometry of Neural Network Quantization -- A Short Equivalence Proof of GPTQ and Ba...

arXiv - Machine Learning · 3 min · about 2 months ago

[2410.04949] Leverage Knowledge Graph and Large Language Model for Law Article Recommendation: A Case Study of Chinese Criminal Law

Abstract page for arXiv paper 2410.04949: Leverage Knowledge Graph and Large Language Model for Law Article Recommendation: A Case Study ...

arXiv - AI · 4 min · about 2 months ago

[2407.16893] The Price of Prompting: Profiling Energy Use in Large Language Models Inference

Abstract page for arXiv paper 2407.16893: The Price of Prompting: Profiling Energy Use in Large Language Models Inference

arXiv - AI · 4 min · about 2 months ago

[2506.07275] Tailored Behavior-Change Messaging for Physical Activity: Integrating Contextual Bandits and Large Language Models

Abstract page for arXiv paper 2506.07275: Tailored Behavior-Change Messaging for Physical Activity: Integrating Contextual Bandits and La...

arXiv - Machine Learning · 4 min · about 2 months ago

[2403.07183] Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Abstract page for arXiv paper 2403.07183: Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference...

arXiv - Machine Learning · 4 min · about 2 months ago

[2506.07218] Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward

Abstract page for arXiv paper 2506.07218: Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward

arXiv - Machine Learning · 4 min · about 2 months ago

[2506.03230] DiaBlo: Diagonal Blocks Are Sufficient For Finetuning

Abstract page for arXiv paper 2506.03230: DiaBlo: Diagonal Blocks Are Sufficient For Finetuning

arXiv - Machine Learning · 4 min · about 2 months ago

[2512.18857] CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematical Reasoning

Abstract page for arXiv paper 2512.18857: CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap in Mathematica...

arXiv - Machine Learning · 4 min · about 2 months ago

[2511.09710] Echoing: Identity Failures when LLM Agents Talk to Each Other

Abstract page for arXiv paper 2511.09710: Echoing: Identity Failures when LLM Agents Talk to Each Other

arXiv - AI · 4 min · about 2 months ago

[2503.22165] Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

Abstract page for arXiv paper 2503.22165: Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

arXiv - Machine Learning · 4 min · about 2 months ago

[2503.14572] Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation

Abstract page for arXiv paper 2503.14572: Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation

arXiv - Machine Learning · 4 min · about 2 months ago

[2510.12264] Reducing Belief Deviation in Reinforcement Learning for Active Reasoning

Abstract page for arXiv paper 2510.12264: Reducing Belief Deviation in Reinforcement Learning for Active Reasoning

arXiv - AI · 4 min · about 2 months ago

[2510.06410] Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?

Abstract page for arXiv paper 2510.06410: Off-Trajectory Reasoning: Can LLMs Collaborate on Reasoning Trajectory?

arXiv - AI · 4 min · about 2 months ago

[2510.05684] D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

Abstract page for arXiv paper 2510.05684: D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

arXiv - AI · 4 min · about 2 months ago

[2509.23725] MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models

Abstract page for arXiv paper 2509.23725: MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language M...

arXiv - AI · 4 min · about 2 months ago

[2509.22613] Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective

Abstract page for arXiv paper 2509.22613: Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Pers...

arXiv - Machine Learning · 4 min · about 2 months ago

[2507.08207] Toward a Dynamic Stackelberg Game-Theoretic Framework for Agentic AI Defense Against LLM Jailbreaking

Abstract page for arXiv paper 2507.08207: Toward a Dynamic Stackelberg Game-Theoretic Framework for Agentic AI Defense Against LLM Jailbr...

arXiv - AI · 3 min · about 2 months ago

[2505.19892] OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

Abstract page for arXiv paper 2505.19892: OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

arXiv - AI · 4 min · about 2 months ago
