Large Language Models

GPT, Claude, Gemini, and other LLMs

This Week's Best | Monthly Best | Guide | Trending

Top This Week

Llms

Been building a multi-agent framework in public for 7 weeks, its been a Journey.

I've been building this repo public since day one, roughly 7 weeks now with Claude Code. Here's where it's at. Feels good to be so close....

Reddit - Artificial Intelligence · 1 min · 33 minutes ago

Llms

We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]

TLDR; We were overpaying for OCR, so we compared flagship models with cheaper and older models. New mini-bench + leaderboard. Free tool t...

Reddit - Machine Learning · 1 min · about 2 hours ago

Llms

I used Jeff Bezos' Day 1 rule with ChatGPT to beat procrastination

I used Jeff Bezos’ Day 1 rule with ChatGPT to stop procrastinating. These simple prompts helped me start faster, overthink less and get m...

AI Tools & Products · 9 min · about 3 hours ago

All Content

Llms

[2603.03555] Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations

Abstract page for arXiv paper 2603.03555: Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03543] Tucano 2 Cool: Better Open Source LLMs for Portuguese

Abstract page for arXiv paper 2603.03543: Tucano 2 Cool: Better Open Source LLMs for Portuguese

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03541] RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

Abstract page for arXiv paper 2603.03541: RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

arXiv - AI · 3 min · about 2 months ago

Llms

[2603.03536] SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

Abstract page for arXiv paper 2603.03536: SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

arXiv - AI · 3 min · about 2 months ago

Llms

[2603.03946] Lang2Str: Two-Stage Crystal Structure Generation with LLMs and Continuous Flow Models

Abstract page for arXiv paper 2603.03946: Lang2Str: Two-Stage Crystal Structure Generation with LLMs and Continuous Flow Models

arXiv - Machine Learning · 4 min · about 2 months ago

Llms

[2603.03512] Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

Abstract page for arXiv paper 2603.03512: Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03508] Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi

Abstract page for arXiv paper 2603.03508: Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi

arXiv - AI · 3 min · about 2 months ago

Llms

[2603.03805] Relational In-Context Learning via Synthetic Pre-training with Structural Prior

Abstract page for arXiv paper 2603.03805: Relational In-Context Learning via Synthetic Pre-training with Structural Prior

arXiv - AI · 3 min · about 2 months ago

Llms

[2603.03417] Parallel Test-Time Scaling with Multi-Sequence Verifiers

Abstract page for arXiv paper 2603.03417: Parallel Test-Time Scaling with Multi-Sequence Verifiers

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03415] Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs

Abstract page for arXiv paper 2603.03415: Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03756] MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Abstract page for arXiv paper 2603.03756: MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Ba...

arXiv - Machine Learning · 3 min · about 2 months ago

Llms

[2603.03410] On Google's SynthID-Text LLM Watermarking System: Theoretical Analysis and Empirical Validation

Abstract page for arXiv paper 2603.03410: On Google's SynthID-Text LLM Watermarking System: Theoretical Analysis and Empirical Validation

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03379] MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

Abstract page for arXiv paper 2603.03379: MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03612] Why Are Linear RNNs More Parallelizable?

Abstract page for arXiv paper 2603.03612: Why Are Linear RNNs More Parallelizable?

arXiv - Machine Learning · 4 min · about 2 months ago

Llms

[2603.03371] Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

Abstract page for arXiv paper 2603.03371: Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

arXiv - AI · 4 min · about 2 months ago

Llms

[2603.03597] NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

Abstract page for arXiv paper 2603.03597: NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

arXiv - Machine Learning · 3 min · about 2 months ago

Llms

[2603.03538] Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

Abstract page for arXiv paper 2603.03538: Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

arXiv - Machine Learning · 4 min · about 2 months ago

Llms

[2603.03535] Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

Abstract page for arXiv paper 2603.03535: Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

arXiv - Machine Learning · 3 min · about 2 months ago

Llms

[2603.03352] Perfect score on IPhO 2025 theory by Gemini agent

Abstract page for arXiv paper 2603.03352: Perfect score on IPhO 2025 theory by Gemini agent

arXiv - AI · 3 min · about 2 months ago

Llms

[2603.03527] Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

Abstract page for arXiv paper 2603.03527: Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

arXiv - Machine Learning · 4 min · about 2 months ago

Previous Page 231 Next

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Subscribe to Newsletter

Daily or weekly digest • Unsubscribe anytime

Large Language Models

Top This Week

Been building a multi-agent framework in public for 7 weeks, its been a Journey.

We benchmarked 18 LLMs on OCR (7k+ calls) — cheaper/old models oftentimes win. Full dataset + framework open-sourced. [R]

I used Jeff Bezos' Day 1 rule with ChatGPT to beat procrastination

All Content

[2603.03555] Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations

[2603.03543] Tucano 2 Cool: Better Open Source LLMs for Portuguese

[2603.03541] RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

[2603.03536] SafeCRS: Personalized Safety Alignment for LLM-Based Conversational Recommender Systems

[2603.03946] Lang2Str: Two-Stage Crystal Structure Generation with LLMs and Continuous Flow Models

[2603.03512] Baseline Performance of AI Tools in Classifying Cognitive Demand of Mathematical Tasks

[2603.03508] Raising Bars, Not Parameters: LilMoo Compact Language Model for Hindi

[2603.03805] Relational In-Context Learning via Synthetic Pre-training with Structural Prior

[2603.03417] Parallel Test-Time Scaling with Multi-Sequence Verifiers

[2603.03415] Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs

[2603.03756] MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

[2603.03410] On Google's SynthID-Text LLM Watermarking System: Theoretical Analysis and Empirical Validation

[2603.03379] MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning

[2603.03612] Why Are Linear RNNs More Parallelizable?

[2603.03371] Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs

[2603.03597] NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

[2603.03538] Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

[2603.03535] Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

[2603.03352] Perfect score on IPhO 2025 theory by Gemini agent

[2603.03527] Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

Related Topics

Stay updated with AI News