Comparing SVG generation for top models
These are the top open and closed models: Opus 4.7, GPT-5.5 Pro, DeepSeek V4, GLM-5.1 and Gemini 3.1 Pro. They all show similar performan...
GPT, Claude, Gemini, and other LLMs
Thanks to the incredible feedback on my last post, I’m officially moving away from the "distributed veto" system (where 8 LLM agents argu...
Caught between fears of job loss and social stigma, Gen Z’s opinions of AI are hitting new lows.
arXiv:2603.03379 — MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
arXiv:2603.03612 — Why Are Linear RNNs More Parallelizable?
arXiv:2603.03371 — Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs
arXiv:2603.03597 — NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
arXiv:2603.03538 — Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
arXiv:2603.03535 — Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
arXiv:2603.03352 — Perfect score on IPhO 2025 theory by Gemini agent
arXiv:2603.03527 — Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis
arXiv:2603.03524 — Test-Time Meta-Adaptation with Self-Synthesis
arXiv:2603.03517 — MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery
arXiv:2603.03332 — Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
arXiv:2603.03330 — Certainty Robustness: Evaluating LLM Stability Under Self-Challenging Prompts
arXiv:2603.03329 — AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
arXiv:2603.03328 — StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
arXiv:2603.03326 — Controllable and Explainable Personality Sliders for LLMs at Inference Time
arXiv:2603.03325 — IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference
arXiv:2603.03324 — Controlling Chat Style in Language Models via Single-Direction Editing
arXiv:2603.03323 — Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
arXiv:2603.03322 — Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Di...
arXiv:2603.03321 — DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following