Large Language Models

GPT, Claude, Gemini, and other LLMs


Top This Week

LLMs

it is impossible to stop AI chatbots from using quotes (any instance of the character ")

no matter how i phrase it in the instructions, how many times i repeat the rule not to use quotes, and which LLM i use, i have failed to ...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

LLMs

Converting XQuery to SQL with Local LLMs: Do I Need Fine-Tuning or a Better Approach? [P]

I am trying to convert XQuery statements into SQL queries within an enterprise context, with the constraint that the solution must rely...

Reddit - Machine Learning · 1 min · about 5 hours ago

LLMs

AI: Fragility of today's Claude Cowork type AI Agent Apps. RTZ 1061

...realities like memory management, highlight a longer road to resilient AI Agents and AGI

AI Tools & Products · 11 min · about 7 hours ago

All Content

LLMs

[2603.02236] CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

Abstract page for arXiv paper 2603.02236: CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02540] A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

Abstract page for arXiv paper 2603.02540: A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02528] LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

Abstract page for arXiv paper 2603.02528: LLM-MLFFN: Multi-Level Autonomous Driving Behavior Feature Fusion via Large Language Model

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02504] NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail Effect

Abstract page for arXiv paper 2603.02504: NeuroProlog: Multi-Task Fine-Tuning for Neurosymbolic Mathematical Reasoning via the Cocktail E...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02232] Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback

Abstract page for arXiv paper 2603.02232: Beyond Binary Preferences: A Principled Framework for Reward Modeling with Ordinal Feedback

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02473] Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

Abstract page for arXiv paper 2603.02473: Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02435] VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

Abstract page for arXiv paper 2603.02435: VL-KGE: Vision-Language Models Meet Knowledge Graph Embeddings

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2603.02229] Safety Training Persists Through Helpfulness Optimization in LLM Agents

Abstract page for arXiv paper 2603.02229: Safety Training Persists Through Helpfulness Optimization in LLM Agents

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2603.02228] Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Abstract page for arXiv paper 2603.02228: Neural Paging: Learning Context Management Policies for Turing-Complete Agents

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02240] SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

Abstract page for arXiv paper 2603.02240: SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Mem...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

Abstract page for arXiv paper 2603.02239: Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foun...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02222] MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation

Abstract page for arXiv paper 2603.02222: MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Eval...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Abstract page for arXiv paper 2603.02221: MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabul...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02219] NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Abstract page for arXiv paper 2603.02219: NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

arXiv - AI · 3 min · about 2 months ago

LLMs

[2603.02218] Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Abstract page for arXiv paper 2603.02218: Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02216] ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Abstract page for arXiv paper 2603.02216: ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

arXiv - AI · 4 min · about 2 months ago

LLMs

[2603.02215] RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

Abstract page for arXiv paper 2603.02215: RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchi...

arXiv - AI · 4 min · about 2 months ago

LLMs

I stopped using Claude like a chatbot — 7 prompt shifts that reclaimed 10 hours of my week

Is Claude underperforming? It’s probably not the model—it’s your prompts. Discover the 7 specific strategies, from 'Few-Shot' prompting t...

AI Tools & Products · 9 min · about 2 months ago

LLMs

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...

Reddit - Machine Learning · 1 min · about 2 months ago

LLMs

IBM Confluent Deal And Claude Code Put AI Focus In View

IBM is acquiring Confluent to enhance its AI and cloud services for enterprise clients, while Anthropic has launched Claude Code, a codin...

AI Tools & Products · 6 min · about 2 months ago
