Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Llms

[R] GPT-5.4-mini regressed 22pp on vanilla prompting vs GPT-5-mini. Nobody noticed because benchmarks don't test this. Recursive Language Models solved it.

GPT-5.4-mini produces shorter, terser outputs by default. Vanilla accuracy dropped from 69.5% to 47.2% across 12 tasks (1,800 evals). The...

Reddit - Machine Learning · 1 min ·
Llms

built an open source CLI that auto generates AI setup files for your projects just hit 150 stars

hey everyone, been working on this side project called ai-setup and just hit a milestone i wanted to share 150 github stars, 90 PRs merge...

Reddit - Artificial Intelligence · 1 min ·
Llms

built an open source tool that auto generates AI context files for any codebase, 150 stars in

one of the most tedious parts of working with AI coding tools is having to manually write context files every single time. CLAUDE.md, .cu...

Reddit - Artificial Intelligence · 1 min ·

All Content

Llms

Ridiculous. Anthropic is behaving exactly like OpenAI.

Claude was fantastic when I paid monthly, right up until I chose to commit to a yearly Pro subscription. Now, a mere thirty-four text pro...

Reddit - Artificial Intelligence · 1 min ·
Llms

[R] Interested in recent research into recall vs recognition in LLMs

I've casually seen LLMs correctly verify exact quotations that they either couldn't or wouldn't quote directly for me. I'm aware that the...

Reddit - Machine Learning · 1 min ·
Llms

Pretrained ADAM v2 weights [D]

Hi everyone, I'm a master's student working on anatomy-aware unsupervised anomaly detection in chest X-rays. My thesis uses ADAM v2 (Auto...

Reddit - Machine Learning · 1 min ·
Llms

Apple will reportedly allow other AI chatbots to plug into Siri | The Verge

Apple is planning to allow other third-party AI chatbots to work with Siri with a new “Extensions” feature in iOS 27, according to a repo...

The Verge - AI · 4 min ·
Llms

[D] Why evaluating only final outputs is misleading for local LLM agents

Been running local agents with Ollama + LangChain lately and noticed something kind of uncomfortable — you can get a completely correct f...

Reddit - Machine Learning · 1 min ·
Llms

[D] - 1M tokens/second serving Qwen 3.5 27B on B200 GPUs, benchmark results and findings

Wrote up the process of pushing Qwen 3.5 27B (dense, FP8) to 1.1M total tok/s on 96 B200 GPUs with vLLM v0.18.0. DP=8 nearly 4x'd through...

Reddit - Machine Learning · 1 min ·
Llms

Reducing AI agent token consumption by 90% by fixing the retrieval layer

Quick insight from building retrieval infrastructure for AI agents: Most agents stuff 50,000 tokens of context into every prompt. They re...

Reddit - Artificial Intelligence · 1 min ·
Llms

we built an open source library of AI agent prompts and configs, just hit 100 stars

yo so i been grinding on AI agents for a while now and honestly the biggest pain is everyone reinventing the wheel with system prompts an...

Reddit - Artificial Intelligence · 1 min ·
Llms

we made a community repo of AI agent setups and configs, just hit 100 stars with 90 PRs

quick share for folks who build stuff with AI agents we started this repo because setting up AI agent workflows from scratch every single...

Reddit - Artificial Intelligence · 1 min ·
Llms

Mistral releases a new open-source model for speech generation | TechCrunch

Mistral's new speech model can run on a smartwatch or a smartphone.

TechCrunch - AI · 4 min ·
Llms

OpenAI shelves erotic chatbot ‘indefinitely’ | The Verge

OpenAI has paused plans to release a sexualized “adult mode” for ChatGPT, in its latest move to refocus on the company’s core products.

The Verge - AI · 4 min ·
Llms

I open-sourced an always-on direct bridge between your LLM and your Mac. "Hey Q, read my screen and reply to this Slack message" please meet CODEC

TL;DR: Meet CODEC—a completely open-source tool that transforms any LLM into a personal computer agent. You can command it via text or vo...

Reddit - Artificial Intelligence · 1 min ·
Llms

Claude Code hits $2.5B in revenue and ships auto mode, an AI classifier that decides what's safe to run on your machine

Anthropic dropped three features for Claude Code on Monday, but the interesting one is auto mode. Until now you had two choices: approve ...

Reddit - Artificial Intelligence · 1 min ·
Llms

[D] Probabilistic Neuron Activation in Predictive Coding Algorithm using 1 Bit LLM Architecture

If we use Predictive Coding architecture we wouldn't need backpropagation anymore which would work well for a non deterministic system th...

Reddit - Machine Learning · 1 min ·
Llms

Google Gemini still has no native chat export in 2025. Here's how I solved it for my research workflow.

One thing that's always bothered me about Gemini: you can run a 30-minute Deep Research session, get an incredible research report with 4...

Reddit - Artificial Intelligence · 1 min ·
Llms

Anthropic report finds Claude users improve with experience | ETIH EdTech News

Anthropic’s latest report shows experienced Claude users achieve better outcomes and use AI for higher-value work, even as adoption broad...

AI Tools & Products · 8 min ·
Llms

[2603.11804] OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs

Abstract page for arXiv paper 2603.11804: OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs

arXiv - Machine Learning · 4 min ·
Llms

[2512.16917] Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

Abstract page for arXiv paper 2512.16917: Generative Adversarial Reasoner: Enhancing LLM Reasoning with Adversarial Reinforcement Learning

arXiv - Machine Learning · 4 min ·
Llms

[2512.07801] Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

Abstract page for arXiv paper 2512.07801: Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

arXiv - Machine Learning · 4 min ·
Llms

[2510.12728] Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior

Abstract page for arXiv paper 2510.12728: Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior

arXiv - Machine Learning · 4 min ·
