Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

I built a router that automatically sends your AI tasks to the most appropriate model to handle them at low cost - 9,200 tasks in, $21 saved at $0.14 actual cost

The observation that started this: most of what people use AI for every day - summarising, drafting, classifying, extracting etc doesn't ...

Reddit - Artificial Intelligence · 1 min · 32 minutes ago

Anthropic just analyzed 1 million Claude conversations. 6% of people were asking Claude whether to quit their jobs, who to date, and if they should move countries.

They published the full research yesterday. Here's what shocked me: The breakdown of what people actually ask Claude for guidance on: Hea...

Reddit - Artificial Intelligence · 1 min · 32 minutes ago

The Download: a new Christian phone network, and debugging LLMs | MIT Technology Review

Elon Musk has admitted that xAI trained Grok on OpenAI models.

MIT Technology Review - AI · 7 min · about 1 hour ago

All Content

[2507.08207] Toward a Dynamic Stackelberg Game-Theoretic Framework for Agentic AI Defense Against LLM Jailbreaking

arXiv - AI · 3 min · about 2 months ago

[2505.19892] OptMerge: Unifying Multimodal LLM Capabilities and Modalities via Model Merging

arXiv - AI · 4 min · about 2 months ago

[2505.13909] Efficient Agent Training for Computer Use

arXiv - Machine Learning · 3 min · about 2 months ago

[2505.13180] ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models

arXiv - AI · 4 min · about 2 months ago

[2603.03269] LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

arXiv - Machine Learning · 4 min · about 2 months ago

[2603.03180] Type-Aware Retrieval-Augmented Generation with Dependency Closure for Solver-Executable Industrial Optimization Modeling

arXiv - AI · 4 min · about 2 months ago

[2603.03192] MoD-DPO: Towards Mitigating Cross-modal Hallucinations in Omni LLMs using Modality Decoupled Preference Optimization

arXiv - Machine Learning · 3 min · about 2 months ago

[2603.02952] Sparse autoencoders reveal organized biological knowledge but minimal regulatory logic in single-cell foundation models: a comparative atlas of Geneformer and scGPT

arXiv - Machine Learning · 4 min · about 2 months ago

[2603.03095] Compact Prompting in Instruction-tuned LLMs for Joint Argumentative Component Detection

arXiv - AI · 3 min · about 2 months ago

[2603.02775] From Solver to Tutor: Evaluating the Pedagogical Intelligence of LLMs with KMP-Bench

arXiv - Machine Learning · 4 min · about 2 months ago

[2603.03047] TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health

arXiv - AI · 4 min · about 2 months ago

[2603.02983] Contextualized Privacy Defense for LLM Agents

arXiv - AI · 3 min · about 2 months ago

[2603.02623] Uni-Skill: Building Self-Evolving Skill Repository for Generalizable Robotic Manipulation

arXiv - Machine Learning · 4 min · about 2 months ago

[2603.02949] SEALing the Gap: A Reference Framework for LLM Inference Carbon Estimation via Multi-Benchmark Driven Embodiment

arXiv - AI · 3 min · about 2 months ago

[2603.02909] Learning to Generate and Extract: A Multi-Agent Collaboration Framework For Zero-shot Document-level Event Arguments Extraction

arXiv - AI · 4 min · about 2 months ago

[2603.02830] Faster, Cheaper, More Accurate: Specialised Knowledge Tracing Models Outperform LLMs

arXiv - AI · 4 min · about 2 months ago

[2603.02789] OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets

arXiv - AI · 3 min · about 2 months ago

[2603.02470] Video TokenCom: Textual Intent-Guided Multi-Rate Video Token Communications with UEP-Based Adaptive Source-Channel Coding

arXiv - Machine Learning · 4 min · about 2 months ago

[2603.02760] Efficient Self-Evaluation for Diffusion Language Models via Sequence Regeneration

arXiv - AI · 3 min · about 2 months ago

[2603.02748] iGVLM: Dynamic Instruction-Guided Vision Encoding for Question-Aware Multimodal Understanding

arXiv - AI · 3 min · about 2 months ago
