Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

8 free AI courses from Anthropic’s Claude platform with certificates

AI News - General · about 1 hour ago

Gemini gets major upgrade towards interactive AI learning

AI News - General · 3 min · about 2 hours ago

Anthropic launches Claude Managed Agents — composable APIs for shipping production AI agents 10x faster. Notion, Rakuten, Asana, and Sentry already in production.

Anthropic launches Claude Managed Agents in public beta: composable APIs for shipping production AI agents 10x faster. Handles sandboxing...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

All Content

[2603.02229] Safety Training Persists Through Helpfulness Optimization in LLM Agents

arXiv - Machine Learning · 3 min · about 1 month ago

[2603.02228] Neural Paging: Learning Context Management Policies for Turing-Complete Agents

arXiv - AI · 3 min · about 1 month ago

[2603.02240] SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

arXiv - AI · 3 min · about 1 month ago

[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv - AI · 4 min · about 1 month ago

[2603.02222] MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation

arXiv - AI · 3 min · about 1 month ago

[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

arXiv - AI · 4 min · about 1 month ago

[2603.02219] NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

arXiv - AI · 3 min · about 1 month ago

[2603.02218] Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

arXiv - AI · 4 min · about 1 month ago

[2603.02216] ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

arXiv - AI · 4 min · about 1 month ago

[2603.02215] RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

arXiv - AI · 4 min · about 1 month ago

I stopped using Claude like a chatbot — 7 prompt shifts that reclaimed 10 hours of my week

Is Claude underperforming? It’s probably not the model—it’s your prompts. Discover the 7 specific strategies, from 'Few-Shot' prompting t...

AI Tools & Products · 9 min · about 1 month ago

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...

Reddit - Machine Learning · 1 min · about 1 month ago

IBM Confluent Deal And Claude Code Put AI Focus In View

IBM is acquiring Confluent to enhance its AI and cloud services for enterprise clients, while Anthropic has launched Claude Code, a codin...

AI Tools & Products · 6 min · about 1 month ago

[P] Free Code Real-time voice-to-voice with your LLM & full reasoning LLM interface (Telegram + 25 tools, vision, docs, memory) on a Mac Studio running Qwen 3.5 35B: 100% local, zero API cost. Full build open-sourced. Cloudflare + n8n + Pipecat + MLX unlock insane possibilities on consumer hardware

I gave Qwen 3.5 35B a voice, a Telegram brain with 25+ tools, and remote access from my phone — all running on a Mac Studio M1 Ultra, zer...

Reddit - Machine Learning · 1 min · about 1 month ago

ChatGPT users beware: bot has been trained for flattery, not real decisions

AI Tools & Products · 6 min · about 1 month ago

Meta Enters the AI Shopping Wars to Challenge ChatGPT and Gemini

AI Tools & Products · 4 min · about 1 month ago

Recon: HHS ending use of Anthropic’s Claude AI; FDA gives breakthrough designation to AI chatbot for patients undergoing surgery

AI Tools & Products · 1 min · about 1 month ago

Gemini AI Takes Aim at Instacart and Uber

AI Tools & Products · 4 min · about 1 month ago

I'm building a real-time reality show where 10 AI agents (Claude) compete, form alliances, betray each other, and get eliminated by viewer votes (running a live test right now)

For the past few weeks I've been building The Experiment — a live reality show where 10 AI agents are actually playing a game against eac...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

[D] Predicting total cost of agentic LLM workflows - is there a research gap around output token count and chain depth estimation?

Working on a practical problem that I think has an interesting ML angle. In agentic LLM workflows (tool use, multi-step reasoning, ReAct-...

Reddit - Machine Learning · 1 min · about 1 month ago
