Large Language Models

GPT, Claude, Gemini, and other LLMs


Top This Week

Llms

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min · about 5 hours ago

Llms

built an open source CLI that auto generates AI setup files for your projects just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share 150 github stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

Llms

built an open source tool that auto generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

All Content

Llms

Ridiculous. Anthropic is behaving exactly like OpenAI.

Claude was fantastic when I paid monthly, right up until I chose to commit to a yearly Pro subscription. Now, a mere thirty-four text pro...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

[R] Interested in recent research into recall vs recognition in LLMs

I've casually seen LLMs correctly verify exact quotations that they either couldn't or wouldn't quote directly for me. I'm aware that the...

Reddit - Machine Learning · 1 min · 3 days ago

Llms

Pretrained ADAM v2 weights [D]

Hi everyone, I'm a master's student working on anatomy-aware unsupervised anomaly detection in chest X-rays. My thesis uses ADAM v2 (Auto...

Reddit - Machine Learning · 1 min · 3 days ago

Llms

Apple will reportedly allow other AI chatbots to plug into Siri | The Verge

Apple is planning to allow other third-party AI chatbots to work with Siri with a new “Extensions” feature in iOS 27, according to a repo...

The Verge - AI · 4 min · 3 days ago

Llms

[D] Why evaluating only final outputs is misleading for local LLM agents

Been running local agents with Ollama + LangChain lately and noticed something kind of uncomfortable — you can get a completely correct f...

Reddit - Machine Learning · 1 min · 3 days ago

Llms

[D] - 1M tokens/second serving Qwen 3.5 27B on B200 GPUs, benchmark results and findings

Wrote up the process of pushing Qwen 3.5 27B (dense, FP8) to 1.1M total tok/s on 96 B200 GPUs with vLLM v0.18.0. DP=8 nearly 4x'd through...

Reddit - Machine Learning · 1 min · 3 days ago

Llms

Reducing AI agent token consumption by 90% by fixing the retrieval layer

Quick insight from building retrieval infrastructure for AI agents: Most agents stuff 50,000 tokens of context into every prompt. They re...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

we built an open source library of AI agent prompts and configs, just hit 100 stars

yo so i been grinding on AI agents for a while now and honestly the biggest pain is everyone reinventing the wheel with system prompts an...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

we made a community repo of AI agent setups and configs, just hit 100 stars with 90 PRs

quick share for folks who build stuff with AI agents we started this repo because setting up AI agent workflows from scratch every single...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

Mistral releases a new open-source model for speech generation | TechCrunch

Mistral's new speech model can run on a smartwatch or a smartphone.

TechCrunch - AI · 4 min · 3 days ago

Llms

OpenAI shelves erotic chatbot ‘indefinitely’ | The Verge

OpenAI has paused plans to release a sexualized “adult mode” for ChatGPT, in its latest move to refocus on the company’s core products.

The Verge - AI · 4 min · 3 days ago

Llms

I open-sourced an always-on direct bridge between your LLM and your Mac. "Hey Q, read my screen and reply to this Slack message" please meet CODEC

TL;DR: Meet CODEC—a completely open-source tool that transforms any LLM into a personal computer agent. You can command it via text or vo...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

Claude Code hits $2.5B in revenue and ships auto mode, an AI classifier that decides what's safe to run on your machine

Anthropic dropped three features for Claude Code on Monday, but the interesting one is auto mode. Until now you had two choices: approve ...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

[D] Probabilistic Neuron Activation in Predictive Coding Algorithm using 1 Bit LLM Architecture

If we use a Predictive Coding architecture, we wouldn't need backpropagation anymore, which would work well for a non-deterministic system th...

Reddit - Machine Learning · 1 min · 3 days ago

Llms

Google Gemini still has no native chat export in 2025. Here's how I solved it for my research workflow.

One thing that's always bothered me about Gemini: you can run a 30-minute Deep Research session, get an incredible research report with 4...

Reddit - Artificial Intelligence · 1 min · 3 days ago

Llms

Anthropic report finds Claude users improve with experience | ETIH EdTech News

Anthropic’s latest report shows experienced Claude users achieve better outcomes and use AI for higher-value work, even as adoption broad...

AI Tools & Products · 8 min · 3 days ago

Llms

[2603.11804] OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs

Abstract page for arXiv paper 2603.11804: OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs

arXiv - Machine Learning · 4 min · 3 days ago

Llms

[2512.16917] Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Abstract page for arXiv paper 2512.16917: Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

arXiv - Machine Learning · 4 min · 3 days ago

Llms

[2512.07801] Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

Abstract page for arXiv paper 2512.07801: Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

arXiv - Machine Learning · 4 min · 3 days ago

Llms

[2510.12728] Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior

Abstract page for arXiv paper 2510.12728: Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior

arXiv - Machine Learning · 4 min · 3 days ago


