OpenAI starts laying foundations for ChatGPT ads in EU
submitted by /u/ThereWas
GPT, Claude, Gemini, and other LLMs
submitted by /u/ThereWas
As in a manual workflow, I would explore the given data column by column using functions like df.info(), then remove nulls, outliers...
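The workflow the post describes (inspect, drop nulls, filter outliers) might look like the following minimal pandas sketch; the DataFrame and the IQR outlier rule are my own illustrative assumptions, not from the original post.

```python
import pandas as pd

# Hypothetical example frame; the original post does not include data.
df = pd.DataFrame({
    "age": [25, 32, None, 28, 41, 250, 30],      # one null, one outlier (250)
    "income": [50_000, 62_000, 58_000, None, 61_000, 59_000, 60_000],
})

df.info()          # column dtypes and non-null counts, as in manual inspection

df = df.dropna()   # drop rows containing any null

# One common choice for outlier removal: the 1.5 * IQR rule on "age"
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```

The IQR rule is just one of several reasonable outlier filters (z-score and domain-specific caps are common alternatives); the post does not say which it uses.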
The observation that started this: most of what people use AI for every day - summarising, drafting, classifying, extracting, etc. - doesn't ...
Abstract page for arXiv paper 2603.02542: AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical...
Abstract page for arXiv paper 2603.02236: CUDABench: Benchmarking LLMs for Text-to-CUDA Generation
Abstract page for arXiv paper 2603.02540: A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities
Abstract page for arXiv paper 2603.02528: LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model
Abstract page for arXiv paper 2603.02504: NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail E...
Abstract page for arXiv paper 2603.02232: Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback
Abstract page for arXiv paper 2603.02473: Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory
Abstract page for arXiv paper 2603.02435: VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings
Abstract page for arXiv paper 2603.02229: Safety Training Persists Through Helpfulness Optimization in LLM Agents
Abstract page for arXiv paper 2603.02228: Neural Paging: Learning Context Management Policies for Turing-Complete Agents
Abstract page for arXiv paper 2603.02240: SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Mem...
Abstract page for arXiv paper 2603.02239: Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foun...
Abstract page for arXiv paper 2603.02222: MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Eval...
Abstract page for arXiv paper 2603.02221: MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabul...
Abstract page for arXiv paper 2603.02219: NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels
Abstract page for arXiv paper 2603.02218: Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain
Abstract page for arXiv paper 2603.02216: ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
Abstract page for arXiv paper 2603.02215: RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchi...
Is Claude underperforming? It's probably not the model; it's your prompts. Discover the 7 specific strategies, from 'Few-Shot' prompting t...
Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...