Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

8 free AI courses from Anthropic’s Claude platform with certificates

AI News - General ·

Gemini gets major upgrade towards interactive AI learning

AI News - General · 3 min ·

Anthropic launches Claude Managed Agents — composable APIs for shipping production AI agents 10x faster. Notion, Rakuten, Asana, and Sentry already in production.

Anthropic launches Claude Managed Agents in public beta — composable APIs for shipping production AI agents 10x faster Handles sandboxing...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.02229] Safety Training Persists Through Helpfulness Optimization in LLM Agents

Abstract page for arXiv paper 2603.02229: Safety Training Persists Through Helpfulness Optimization in LLM Agents

arXiv - Machine Learning · 3 min ·

[2603.02228] Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Abstract page for arXiv paper 2603.02228: Neural Paging: Learning Context Management Policies for Turing-Complete Agents

arXiv - AI · 3 min ·

[2603.02240] SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Memory Poisoning

Abstract page for arXiv paper 2603.02240: SuperLocalMemory: Privacy-Preserving Multi-Agent Memory with Bayesian Trust Defense Against Mem...

arXiv - AI · 3 min ·

[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

Abstract page for arXiv paper 2603.02239: Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foun...

arXiv - AI · 4 min ·

[2603.02222] MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation

Abstract page for arXiv paper 2603.02222: MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Eval...

arXiv - AI · 3 min ·

[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Abstract page for arXiv paper 2603.02221: MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabul...

arXiv - AI · 4 min ·

[2603.02219] NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Abstract page for arXiv paper 2603.02219: NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

arXiv - AI · 3 min ·

[2603.02218] Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Abstract page for arXiv paper 2603.02218: Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

arXiv - AI · 4 min ·

[2603.02216] ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Abstract page for arXiv paper 2603.02216: ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

arXiv - AI · 4 min ·

[2603.02215] RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

Abstract page for arXiv paper 2603.02215: RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchi...

arXiv - AI · 4 min ·

I stopped using Claude like a chatbot — 7 prompt shifts that reclaimed 10 hours of my week

Is Claude underperforming? It’s probably not the model—it’s your prompts. Discover the 7 specific strategies, from 'Few-Shot' prompting t...

AI Tools & Products · 9 min ·

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...

Reddit - Machine Learning · 1 min ·

IBM Confluent Deal And Claude Code Put AI Focus In View

IBM is acquiring Confluent to enhance its AI and cloud services for enterprise clients, while Anthropic has launched Claude Code, a codin...

AI Tools & Products · 6 min ·

[P] *Free Code* Real-time voice-to-voice with your LLM & full reasoning LLM interface (Telegram + 25 tools, vision, docs, memory) on a Mac Studio running Qwen 3.5 35B — 100% local, zero API cost. Full build open-sourced. cloudfare + n8n + Pipecat + MLX unlock insane possibilities on consumer hardwar

I gave Qwen 3.5 35B a voice, a Telegram brain with 25+ tools, and remote access from my phone — all running on a Mac Studio M1 Ultra, zer...

Reddit - Machine Learning · 1 min ·

ChatGPT users beware: bot has been trained for flattery, not real decisions

AI Tools & Products · 6 min ·

Meta Enters the AI Shopping Wars to Challenge ChatGPT and Gemini

AI Tools & Products · 4 min ·

Recon: HHS ending use of Anthropic’s Claude AI; FDA gives breakthrough designation to AI chatbot for patients undergoing surgery

AI Tools & Products · 1 min ·

Gemini AI Takes Aim at Instacart and Uber

AI Tools & Products · 4 min ·

I'm building a real-time reality show where 10 AI agents (Claude) compete, form alliances, betray each other, and get eliminated by viewer votes — running a live test right now

For the past few weeks I've been building The Experiment — a live reality show where 10 AI agents are actually playing a game against eac...

Reddit - Artificial Intelligence · 1 min ·

[D] Predicting total cost of agentic LLM workflows - is there a research gap around output token count and chain depth estimation?

Working on a practical problem that I think has an interesting ML angle. In agentic LLM workflows (tool use, multi-step reasoning, ReAct-...

Reddit - Machine Learning · 1 min ·