Large Language Models

GPT, Claude, Gemini, and other LLMs


Top This Week

LLMs

[P] ClaudeFormer: Building a Transformer Out of Claudes — Collaboration Request

I'm looking to work with people interested in math, machine learning, or agentic coding, on creating a multi-agent framework to do fronti...

Reddit - Machine Learning · 1 min · about 1 hour ago

LLMs

I Asked ChatGPT 500 Questions. Here Are the Ads I Saw Most Often | WIRED

Ads are rolling out across the US on ChatGPT’s free tier. I asked OpenAI's bot 500 questions to see what these ads were like and how they...

Wired - AI · 9 min · about 3 hours ago

LLMs

Abacus.Ai Claw LLM consumes an incredible amount of credit without any usage :(

Three days ago, I clicked the "Deploy OpenClaw In Seconds" button to get an overview of the new service, but I didn't build any automatio...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

All Content

LLMs

[2603.25226] WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing

Abstract page for arXiv paper 2603.25226: WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.25196] A Decade-Scale Benchmark Evaluating LLMs' Clinical Practice Guidelines Detection and Adherence in Multi-turn Conversations

Abstract page for arXiv paper 2603.25196: A Decade-Scale Benchmark Evaluating LLMs' Clinical Practice Guidelines Detection and Adherence ...

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.25187] Probing the Lack of Stable Internal Beliefs in LLMs

Abstract page for arXiv paper 2603.25187: Probing the Lack of Stable Internal Beliefs in LLMs

arXiv - AI · 3 min · about 10 hours ago

LLMs

[2603.25250] Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language Models

Abstract page for arXiv paper 2603.25250: Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language ...

arXiv - Machine Learning · 4 min · about 10 hours ago

LLMs

[2603.25164] PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Abstract page for arXiv paper 2603.25164: PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented ...

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.25145] Learning to Rank Caption Chains for Video-Text Alignment

Abstract page for arXiv paper 2603.25145: Learning to Rank Caption Chains for Video-Text Alignment

arXiv - Machine Learning · 3 min · about 10 hours ago

LLMs

[2603.25155] Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models

Abstract page for arXiv paper 2603.25155: Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models

arXiv - AI · 3 min · about 10 hours ago

LLMs

[2603.25146] Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence

Abstract page for arXiv paper 2603.25146: Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.25015] Imperative Interference: Social Register Shapes Instruction Topology in Large Language Models

Abstract page for arXiv paper 2603.25015: Imperative Interference: Social Register Shapes Instruction Topology in Large Language Models

arXiv - AI · 3 min · about 10 hours ago

LLMs

[2603.25112] Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory

Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.24946] MobileDev-Bench: A Comprehensive Benchmark for Evaluating Language Models on Mobile Application Development

Abstract page for arXiv paper 2603.24946: MobileDev-Bench: A Comprehensive Benchmark for Evaluating Language Models on Mobile Application...

arXiv - Machine Learning · 4 min · about 10 hours ago

LLMs

[2603.24917] Estimating near-verbatim extraction risk in language models with decoding-constrained beam search

Abstract page for arXiv paper 2603.24917: Estimating near-verbatim extraction risk in language models with decoding-constrained beam search

arXiv - Machine Learning · 4 min · about 10 hours ago

LLMs

[2603.25099] Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization

Abstract page for arXiv paper 2603.25099: Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Opti...

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.25063] TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization

Abstract page for arXiv paper 2603.25063: TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visual...

arXiv - Machine Learning · 4 min · about 10 hours ago

LLMs

[2603.25056] The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities

Abstract page for arXiv paper 2603.25056: The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Create...

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.25052] Closing the Confidence-Faithfulness Gap in Large Language Models

Abstract page for arXiv paper 2603.25052: Closing the Confidence-Faithfulness Gap in Large Language Models

arXiv - AI · 3 min · about 10 hours ago

LLMs

[2603.24989] Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

Abstract page for arXiv paper 2603.24989: Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.24986] Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators

Abstract page for arXiv paper 2603.24986: Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators

arXiv - AI · 3 min · about 10 hours ago

LLMs

[2603.24940] Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system

Abstract page for arXiv paper 2603.24940: Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-i...

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2603.24617] Multi-LLM Query Optimization

Abstract page for arXiv paper 2603.24617: Multi-LLM Query Optimization

arXiv - Machine Learning · 3 min · about 10 hours ago


