Top Large Language Models This Week

The most engaging large language model content from this week, curated by AI News.

  1.

    LLM rankings are not a ladder: experimental results from a transitive benchmark graph [D]

    I built a small website called LLM Win: https://llm-win.com. It turns LLM benchmark results into a directed graph: if model A beats model B on benchmark X, add an edge A -> B. Then it search...

    Reddit - Machine Learning · 2 days ago
  2.

    [2605.07631] Inference Time Causal Probing in LLMs

    Abstract page for arXiv paper 2605.07631: Inference Time Causal Probing in LLMs

    arXiv - AI · about 9 hours ago
  3.

    [2605.07692] GASim: A Graph-Accelerated Hybrid Framework for Social Simulation

    Abstract page for arXiv paper 2605.07692: GASim: A Graph-Accelerated Hybrid Framework for Social Simulation

    arXiv - AI · about 9 hours ago
  4.

    [2605.07926] AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

    Abstract page for arXiv paper 2605.07926: AgentEscapeBench: Evaluating Out-of-Domain Tool-Grounded Reasoning in LLM Agents

    arXiv - AI · about 9 hours ago
  5.

    [2605.08011] Abductive Reasoning with Probabilistic Commonsense

    Abstract page for arXiv paper 2605.08011: Abductive Reasoning with Probabilistic Commonsense

    arXiv - AI · about 8 hours ago
  6.

    [2605.06765] VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

    Abstract page for arXiv paper 2605.06765: VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

    arXiv - AI · about 8 hours ago
  7.

    [2605.04100] Regularized Centered Emphatic Temporal Difference Learning

    Abstract page for arXiv paper 2605.04100: Regularized Centered Emphatic Temporal Difference Learning

    arXiv - AI · 4 days ago
  8.

    [2605.06903] MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

    Abstract page for arXiv paper 2605.06903: MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

    arXiv - AI · about 8 hours ago
  9.

    [2605.06936] Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

    Abstract page for arXiv paper 2605.06936: Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

    arXiv - AI · about 8 hours ago
  10.

    [2605.07019] LensVLM: Selective Context Expansion for Compressed Visual Representation of Text

    Abstract page for arXiv paper 2605.07019: LensVLM: Selective Context Expansion for Compressed Visual Representation of Text

    arXiv - AI · about 8 hours ago
  11.

    [2605.07068] WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems

    Abstract page for arXiv paper 2605.07068: WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems

    arXiv - AI · about 8 hours ago
  12.

    [2605.07186] The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

    Abstract page for arXiv paper 2605.07186: The Text Uncanny Valley: Non-Monotonic Performance Degradation in LLM Information Retrieval

    arXiv - AI · about 8 hours ago
  13.

    [2605.07234] Reformulating KV Cache Eviction Problem for Long-Context LLM Inference

    Abstract page for arXiv paper 2605.07234: Reformulating KV Cache Eviction Problem for Long-Context LLM Inference

    arXiv - AI · about 8 hours ago
  14.

    [2605.02209] Submodular Benchmark Selection

    Abstract page for arXiv paper 2605.02209: Submodular Benchmark Selection

    arXiv - AI · 6 days ago
  15.

    [2605.07299] EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams

    Abstract page for arXiv paper 2605.07299: EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams

    arXiv - AI · about 8 hours ago
  16.

    [2605.07314] DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation

    Abstract page for arXiv paper 2605.07314: DCGL: Dual-Channel Graph Learning with Large Language Models for Knowledge-Aware Recommendation

    arXiv - AI · about 8 hours ago
  17.

    AEO? SEO? Help please?

    Curious how many of you are regularly checking ChatGPT, Perplexity, Google AI search about your business? Not talking about page rankings. I'm talking about how models are referring/summarizing you...

    Reddit - Artificial Intelligence · 7 days ago
  18.

    [2605.02765] U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning

    Abstract page for arXiv paper 2605.02765: U-Define: Designing User Workflows for Hard and Soft Constraints in LLM-Based Planning

    arXiv - AI · 6 days ago
  19.

    [2605.07325] CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations

    Abstract page for arXiv paper 2605.07325: CSR: Infinite-Horizon Real-Time Policies with Massive Cached State Representations

    arXiv - AI · about 8 hours ago
  20.

    [2605.07394] BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

    Abstract page for arXiv paper 2605.07394: BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

    arXiv - AI · about 8 hours ago
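
As a rough illustration of the graph construction described in item 1 (add an edge A -> B when model A beats model B on a benchmark), here is a minimal Python sketch. The scores, the model and benchmark names, and the `build_graph` and `has_path` helpers are all hypothetical and not taken from LLM Win. The excerpt is cut off before it says what the site searches for, so the path search below is only one assumed use of such a graph.

```python
from collections import defaultdict, deque

# Hypothetical data: benchmark -> {model: score}. Made-up numbers, for illustration only.
scores = {
    "bench_x": {"model_a": 0.9, "model_b": 0.7, "model_c": 0.8},
    "bench_y": {"model_a": 0.5, "model_b": 0.6},
}


def build_graph(scores):
    """Add an edge A -> B whenever A scores strictly higher than B on some benchmark."""
    graph = defaultdict(set)
    for results in scores.values():
        for a, score_a in results.items():
            for b, score_b in results.items():
                if score_a > score_b:
                    graph[a].add(b)
    return graph


def has_path(graph, start, goal):
    """Breadth-first search (an assumed use): is there a chain of wins from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


graph = build_graph(scores)
print(has_path(graph, "model_b", "model_c"))
```

With these made-up scores, model_b reaches model_c through model_a even though model_c outscores model_b on bench_x, which is the kind of cycle a strict ladder ranking cannot represent.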
