Top Large Language Models This Week
The most engaging large language model content from this week, curated by AI News.
-
1
LLM rankings are not a ladder: experimental results from a transitive benchmark graph [D]
I built a small website called LLM Win: https://llm-win.com. It turns LLM benchmark results into a directed graph: if model A beats model B on benchmark X, add an edge A -> B. Then it search...
Reddit - Machine Learning · 2 days ago -
2
[2605.07631] Inference Time Causal Probing in LLMs
Abstract page for arXiv paper 2605.07631: Inference Time Causal Probing in LLMs
arXiv - AI · about 9 hours ago -
3
[2605.07692] GASim: A Graph-Accelerated Hybrid Framework for Social Simulation
Abstract page for arXiv paper 2605.07692: GASim: A Graph-Accelerated Hybrid Framework for Social Simulation
arXiv - AI · about 9 hours ago -
4
[2605.07926] AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents
Abstract page for arXiv paper 2605.07926: AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents
arXiv - AI · about 9 hours ago -
5
[2605.08011] Abductive Reasoning with Probabilistic Commonsense
Abstract page for arXiv paper 2605.08011: Abductive Reasoning with Probabilistic Commonsense
arXiv - AI · about 8 hours ago -
6
[2605.06765] VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing
Abstract page for arXiv paper 2605.06765: VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing
arXiv - AI · about 8 hours ago -
7
[2605.04100] Regularized Centered Emphatic Temporal Difference Learning
Abstract page for arXiv paper 2605.04100: Regularized Centered Emphatic Temporal Difference Learning
arXiv - AI · 4 days ago -
8
[2605.06903] MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
Abstract page for arXiv paper 2605.06903: MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
arXiv - AI · about 8 hours ago -
9
[2605.06936] Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
Abstract page for arXiv paper 2605.06936: Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
arXiv - AI · about 8 hours ago -
10
[2605.07019] LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
Abstract page for arXiv paper 2605.07019: LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
arXiv - AI · about 8 hours ago -
11
[2605.07068] WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems
Abstract page for arXiv paper 2605.07068: WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems
arXiv - AI · about 8 hours ago -
12
[2605.07186] The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval
Abstract page for arXiv paper 2605.07186: The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval
arXiv - AI · about 8 hours ago -
13
[2605.07234] Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
Abstract page for arXiv paper 2605.07234: Reformulating KV Cache Eviction Problem for Long-Context LLM Inference
arXiv - AI · about 8 hours ago -
14
[2605.02209] Submodular Benchmark Selection
Abstract page for arXiv paper 2605.02209: Submodular Benchmark Selection
arXiv - AI · 6 days ago -
15
[2605.07299] EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
Abstract page for arXiv paper 2605.07299: EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
arXiv - AI · about 8 hours ago -
16
[2605.07314] DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation
Abstract page for arXiv paper 2605.07314: DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation
arXiv - AI · about 8 hours ago -
17
AEO? SEO? Help please?
Curious how many of you are regularly checking ChatGPT, Perplexity, Google AI search about your business? Not talking about page rankings. I'm talking about how models are referring/summarizing you...
Reddit - Artificial Intelligence · 7 days ago -
18
[2605.02765] U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
Abstract page for arXiv paper 2605.02765: U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning
arXiv - AI · 6 days ago -
19
[2605.07325] CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations
Abstract page for arXiv paper 2605.07325: CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations
arXiv - AI · about 8 hours ago -
20
[2605.07394] BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
Abstract page for arXiv paper 2605.07394: BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning
arXiv - AI · about 8 hours ago