Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Llms

C++ CuTe / CUTLASS vs CuTeDSL (Python) in 2026 — what should new GPU kernel / LLM inference engineers actually learn?[D]

For people just starting out in GPU kernel engineering or LLM inference (FlashAttention / FlashInfer / SGLang / vLLM style work), most jo...

Reddit - Machine Learning · 1 min ·
[2511.10262] MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models
Llms

[2511.10262] MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

Abstract page for arXiv paper 2511.10262: MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duple...

arXiv - AI · 4 min ·
[2602.07303] KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction
Llms

[2602.07303] KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction

Abstract page for arXiv paper 2602.07303: KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction

arXiv - AI · 4 min ·

All Content

[2603.02542] AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation
Llms

[2603.02542] AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation

Abstract page for arXiv paper 2603.02542: AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical...

arXiv - AI · 4 min ·
[2603.02236] CUDABench: Benchmarking LLMs for Text-to-CUDA Generation
Llms

[2603.02236] CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Abstract page for arXiv paper 2603.02236: CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

arXiv - AI · 3 min ·
[2603.02540] A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities
Llms

[2603.02540] A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

Abstract page for arXiv paper 2603.02540: A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

arXiv - AI · 3 min ·
[2603.02528] LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model
Llms

[2603.02528] LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

Abstract page for arXiv paper 2603.02528: LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

arXiv - AI · 4 min ·
[2603.02504] NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect
Llms

[2603.02504] NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect

Abstract page for arXiv paper 2603.02504: NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail E...

arXiv - AI · 4 min ·
[2603.02232] Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Llms

[2603.02232] Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback

Abstract page for arXiv paper 2603.02232: Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback

arXiv - AI · 4 min ·
[2603.02473] Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory
Llms

[2603.02473] Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

Abstract page for arXiv paper 2603.02473: Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

arXiv - AI · 3 min ·
[2603.02435] VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings
Llms

[2603.02435] VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Abstract page for arXiv paper 2603.02435: VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

arXiv - Machine Learning · 4 min ·
[2603.02229] Safety Training Persists Through Helpfulness Optimization in LLM Agents
Llms

[2603.02229] Safety Training Persists Through Helpfulness Optimization in LLM Agents

Abstract page for arXiv paper 2603.02229: Safety Training Persists Through Helpfulness Optimization in LLM Agents

arXiv - Machine Learning · 3 min ·
[2603.02228] Neural Paging: Learning Context Management Policies for Turing-Complete Agents
Llms

[2603.02228] Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Abstract page for arXiv paper 2603.02228: Neural Paging: Learning Context Management Policies for Turing-Complete Agents

arXiv - AI · 3 min ·
[2603.02240] SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning
Llms

[2603.02240] SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

Abstract page for arXiv paper 2603.02240: SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Mem...

arXiv - AI · 3 min ·
[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents
Llms

[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

Abstract page for arXiv paper 2603.02239: Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foun...

arXiv - AI · 4 min ·
[2603.02222] MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation
Llms

[2603.02222] MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation

Abstract page for arXiv paper 2603.02222: MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Eval...

arXiv - AI · 3 min ·
[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction
Llms

[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Abstract page for arXiv paper 2603.02221: MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabul...

arXiv - AI · 4 min ·
[2603.02219] NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels
Llms

[2603.02219] NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Abstract page for arXiv paper 2603.02219: NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

arXiv - AI · 3 min ·
[2603.02218] Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain
Llms

[2603.02218] Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Abstract page for arXiv paper 2603.02218: Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

arXiv - AI · 4 min ·
[2603.02216] ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Llms

[2603.02216] ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Abstract page for arXiv paper 2603.02216: ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

arXiv - AI · 4 min ·
[2603.02215] RxnNano:Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning
Llms

[2603.02215] RxnNano:Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

Abstract page for arXiv paper 2603.02215: RxnNano:Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchi...

arXiv - AI · 4 min ·
I stopped using Claude like a chatbot — 7 prompt shifts that reclaimed 10 hours of my week
Llms

I stopped using Claude like a chatbot — 7 prompt shifts that reclaimed 10 hours of my week

Is Claude underperforming? It’s probably not the model—it’s your prompts. Discover the 7 specific strategies, from 'Few-Shot' prompting t...

AI Tools & Products · 9 min ·
Llms

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...

Reddit - Machine Learning · 1 min ·
Previous Page 217 Next

Related Topics

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime