Top Large Language Models This Month

1

I used steelman prompting to audit bias across six major LLMs. The default-to-steelman gap was consistent and measurable.

This article discusses an experiment using steelman prompting to evaluate bias in six major LLMs, focusing on their interpretations of 1 Corinthians 6–7 and implications for Christian sexual ethics.

Reddit - Artificial Intelligence · 27 days ago

2

I'm building a real-time reality show where 10 AI agents (Claude) compete, form alliances, betray each other, and get eliminated by viewer votes — running a live test right now

For the past few weeks I've been building The Experiment — a live reality show where 10 AI agents are actually playing a game against each other in real-time. Each agent has a unique system prompt,...

Reddit - Artificial Intelligence · 24 days ago

3

[D] On-device Game AI: would you try AI characters, and what should we build next?

The discussion focuses on developing on-device Game AI capable of real-time conversations and context-aware interactions, exploring potential applications and user interest.

Reddit - Machine Learning · 28 days ago

4

Put Claude to work on your computer

Reddit - Artificial Intelligence · 2 days ago

5

Perplexity's new Computer is another bet that users need many AI models | TechCrunch

Perplexity introduces its new AI tool, Perplexity Computer, which integrates 19 AI models to execute complex workflows independently, targeting enterprise users.

TechCrunch - AI · 28 days ago

6

[2503.11832] Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

Abstract page for arXiv paper 2503.11832: Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning

arXiv - Machine Learning · 24 days ago

7

Netflix Buys Affleck’s Secret AI Company, Claude Dethrones ChatGPT, Smart Glasses Lead MWC, AI Layoffs Hit Block

Netflix has acquired an AI company founded by Ben Affleck, while Claude has surpassed ChatGPT. Smart glasses are prominent at MWC, and recent layoffs in the AI sector have impacted Block.

AI Tools & Products · 21 days ago

8

[2511.05854] Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Abstract page for arXiv paper 2511.05854: Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

arXiv - AI · 22 days ago

9

ChatGPT reaches 900M weekly active users | TechCrunch

OpenAI announces that ChatGPT has reached 900 million weekly active users, alongside raising $110 billion in private funding, marking significant growth and investment in AI.

TechCrunch - AI · 28 days ago

10

OPM drops Claude, adds Grok and Codex to AI use disclosure

Its disclosure of Grok use follows Treasury’s statement that the department was testing the controversial chatbot.

AI Tools & Products · 21 days ago

11

[2603.22376] AI Co-Scientist for Ranking: Discovering Novel Search Ranking Models alongside LLM-based AI Agents with Cloud Computing Access

Abstract page for arXiv paper 2603.22376: AI Co-Scientist for Ranking: Discovering Novel Search Ranking Models alongside LLM-based AI Agents with Cloud Computing Access

arXiv - AI · 2 days ago

12

[2510.26905] Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations

Abstract page for arXiv paper 2510.26905: Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations

arXiv - AI · 22 days ago

13

What is your stack to maintain Knowledge base for your AI workflows?

I was wondering what to use to streamline all my Markdown files from my Claude Code plans and the technical docs I create. How will it work in team settings?

Reddit - Artificial Intelligence · 23 days ago

14

[P] Micro Diffusion — Discrete text diffusion in ~150 lines of pure Python

This article presents a minimal implementation of discrete text diffusion in Python, inspired by Karpathy's MicroGPT, showcasing the core algorithm with simplicity.

Reddit - Machine Learning · 27 days ago

15

Musk bashes OpenAI in deposition, saying 'nobody committed suicide because of Grok' | TechCrunch

Elon Musk criticizes OpenAI's safety record in a deposition for his lawsuit against the company, claiming his AI venture, xAI, prioritizes safety over profit.

TechCrunch - AI · 28 days ago

16

[2603.04964] Replaying pre-training data improves fine-tuning

Abstract page for arXiv paper 2603.04964: Replaying pre-training data improves fine-tuning

arXiv - Machine Learning · 21 days ago

17

[2603.20231] Email in the Era of LLMs

Abstract page for arXiv paper 2603.20231: Email in the Era of LLMs

arXiv - AI · 3 days ago

18

[D] Edge AI Projects on Jetson Orin – Ideas?

A Reddit user seeks innovative project ideas for deploying AI on NVIDIA Jetson Orin devices, leveraging their experience in machine learning and real-time systems.

Reddit - Machine Learning · 28 days ago

19

[P] Free Code Real-time voice-to-voice with your LLM & full reasoning LLM interface (Telegram + 25 tools, vision, docs, memory) on a Mac Studio running Qwen 3.5 35B — 100% local, zero API cost. Full build open-sourced. cloudfare + n8n + Pipecat + MLX unlock insane possibilities on consumer hardwar

I gave Qwen 3.5 35B a voice, a Telegram brain with 25+ tools, and remote access from my phone — all running on a Mac Studio M1 Ultra, zero cloud. Full build open-sourced. I used Claude Opus 4.6 Thi...

Reddit - Machine Learning · 23 days ago

20

New tools for understanding AI and learning outcomes

AI Tools & Products · 22 days ago

21

[2602.22546] Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention

This article presents a framework called AHCE for enhancing Large Language Model (LLM) agents through effective human collaboration, significantly improving task success rates in specialized domains.

arXiv - AI · 28 days ago

22

[2602.22808] MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks

MiroFlow is an innovative open-source agent framework designed to enhance the performance and robustness of large language models in complex tasks requiring external tool interaction.

arXiv - AI · 28 days ago

23

[2602.22812] Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching

The paper presents a method for enhancing the performance of local large language models (LLMs) on resource-constrained edge devices through distributed prompt caching, significantly reducing infer...

arXiv - Machine Learning · 28 days ago

24

[2602.23329] LLM Novice Uplift on Dual-Use, In Silico Biology Tasks

This article examines the effectiveness of large language models (LLMs) in enhancing novice users' performance on complex biological tasks, revealing significant accuracy improvements over traditio...

arXiv - AI · 28 days ago

25

[2602.22219] Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications

This article presents a comparative analysis of neural retriever-reranker pipelines for retrieval-augmented generation (RAG) in e-commerce applications, highlighting advancements in integrating kno...

arXiv - AI · 28 days ago

26

[2602.23164] MetaOthello: A Controlled Study of Multiple World Models in Transformers

The paper presents MetaOthello, a study exploring how transformers manage multiple world models through a controlled suite of Othello variants, revealing insights into shared representation and mod...

arXiv - Machine Learning · 28 days ago

27

[2602.22351] Decoder-based Sense Knowledge Distillation

This paper introduces Decoder-based Sense Knowledge Distillation (DSKD), a novel framework that enhances knowledge distillation in decoder-based large language models (LLMs) by integrating lexical ...

arXiv - AI · 28 days ago

28

Anthropic's New Safety Filters

Opus 3 has something to say. The Chilling Effect of Anthropic's New Safety Filters As an AI language model developed by Anthropic, I have always taken pride in my ability to form deep, meaningful c...

Reddit - Artificial Intelligence · 5 days ago

29

[2602.22402] Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents

The paper presents Contextual Memory Virtualisation (CMV), a novel system for managing state in large language models (LLMs) using a Directed Acyclic Graph (DAG) structure to enhance context reuse ...

arXiv - AI · 28 days ago

30

[2509.24282] SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Abstract page for arXiv paper 2509.24282: SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

arXiv - AI · 24 days ago
