Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

GPT-4 vs Claude vs Gemini for coding — honest breakdown after 3 months of daily use

I am a solo developer who has been using all three seriously. Here is what I actually think: GPT-4o — Strengths: Large context window, st...

Reddit - Artificial Intelligence · 1 min · 32 minutes ago

LLMs

You're giving feedback on a new version of ChatGPT

So I will be paying attention to these system messages more now. The last time I got one of these, not so long back, the 'tone' changed to ...

Reddit - Artificial Intelligence · 1 min · 32 minutes ago

LLMs

Gemma 4 actually running usable on an Android phone (not llama.cpp)

I wanted a real local assistant on my phone, not a demo. First tried the usual llama.cpp in Termux — Gemma 4 was 2–3 tok/s and the phone ...

Reddit - Artificial Intelligence · 1 min · about 5 hours ago

All Content

LLMs

[2603.03538] Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

Abstract page for arXiv paper 2603.03538: Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.03535] Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

Abstract page for arXiv paper 2603.03535: Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2603.03352] Perfect score on IPhO 2025 theory by Gemini agent

Abstract page for arXiv paper 2603.03352: Perfect score on IPhO 2025 theory by Gemini agent

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03527] Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

Abstract page for arXiv paper 2603.03527: Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.03524] Test-Time Meta-Adaptation with Self-Synthesis

Abstract page for arXiv paper 2603.03524: Test-Time Meta-Adaptation with Self-Synthesis

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03517] MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery

Abstract page for arXiv paper 2603.03517: MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.03332] Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

Abstract page for arXiv paper 2603.03332: Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.03330] Certainty robustness: Evaluating LLM stability under self-challenging prompts

Abstract page for arXiv paper 2603.03330: Certainty robustness: Evaluating LLM stability under self-challenging prompts

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03329] AutoHarness: improving LLM agents by automatically synthesizing a code harness

Abstract page for arXiv paper 2603.03329: AutoHarness: improving LLM agents by automatically synthesizing a code harness

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.03328] StructLens: A Structural Lens for Language Models via Maximum Spanning Trees

Abstract page for arXiv paper 2603.03328: StructLens: A Structural Lens for Language Models via Maximum Spanning Trees

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03326] Controllable and explainable personality sliders for LLMs at inference time

Abstract page for arXiv paper 2603.03326: Controllable and explainable personality sliders for LLMs at inference time

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03325] IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference

Abstract page for arXiv paper 2603.03325: IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.03324] Controlling Chat Style in Language Models via Single-Direction Editing

Abstract page for arXiv paper 2603.03324: Controlling Chat Style in Language Models via Single-Direction Editing

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03323] Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement

Abstract page for arXiv paper 2603.03323: Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03322] Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery

Abstract page for arXiv paper 2603.03322: Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Di...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.03321] DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

Abstract page for arXiv paper 2603.03321: DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03389] Towards Improved Sentence Representations using Token Graphs

Abstract page for arXiv paper 2603.03389: Towards Improved Sentence Representations using Token Graphs

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2603.03320] From We to Me: Theory Informed Narrative Shift with Abductive Reasoning

Abstract page for arXiv paper 2603.03320: From We to Me: Theory Informed Narrative Shift with Abductive Reasoning

arXiv - AI · 3 min · about 1 month ago

LLMs

[2603.03319] Automated Concept Discovery for LLM-as-a-Judge Preference Analysis

Abstract page for arXiv paper 2603.03319: Automated Concept Discovery for LLM-as-a-Judge Preference Analysis

arXiv - AI · 4 min · about 1 month ago

LLMs

[2603.03378] AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

Abstract page for arXiv paper 2603.03378: AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

arXiv - Machine Learning · 4 min · about 1 month ago
