Machine Learning

ML algorithms, training, and inference

Top This Week

LLMs

C++ CuTe / CUTLASS vs CuTeDSL (Python) in 2026 — what should new GPU kernel / LLM inference engineers actually learn? [D]

For people just starting out in GPU kernel engineering or LLM inference (FlashAttention / FlashInfer / SGLang / vLLM style work), most jo...

Reddit - Machine Learning · 1 min ·
LLMs

[2511.10262] MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

Abstract page for arXiv paper 2511.10262: MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duple...

arXiv - AI · 4 min ·
Machine Learning

[2603.11698] OSCBench: Benchmarking Object State Change in Text-to-Video Generation

Abstract page for arXiv paper 2603.11698: OSCBench: Benchmarking Object State Change in Text-to-Video Generation

arXiv - AI · 4 min ·

All Content

LLMs

[2603.29142] REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour

Abstract page for arXiv paper 2603.29142: REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour

arXiv - AI · 4 min ·
LLMs

[2603.29139] SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

Abstract page for arXiv paper 2603.29139: SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

arXiv - AI · 4 min ·
LLMs

[2603.29112] GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification

Abstract page for arXiv paper 2603.29112: GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification

arXiv - AI · 3 min ·
LLMs

[2603.29085] PAR$^2$-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering

Abstract page for arXiv paper 2603.29085: PAR$^2$-RAG: Planned Active Retrieval and Reasoning for Multi-Hop Question Answering

arXiv - AI · 3 min ·
Machine Learning

[2603.29075] The Future of AI is Many, Not One

Abstract page for arXiv paper 2603.29075: The Future of AI is Many, Not One

arXiv - AI · 3 min ·
LLMs

[2603.28990] Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures

Abstract page for arXiv paper 2603.28990: Drop the Hierarchy and Roles: How Self-Organizing LLM Agents Outperform Designed Structures

arXiv - AI · 4 min ·
LLMs

[2603.28986] Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research

Abstract page for arXiv paper 2603.28986: Mimosa Framework: Toward Evolving Multi-Agent Systems for Scientific Research

arXiv - AI · 4 min ·
Machine Learning

[2603.28955] Enhancing Policy Learning with World-Action Model

Abstract page for arXiv paper 2603.28955: Enhancing Policy Learning with World-Action Model

arXiv - AI · 3 min ·
Machine Learning

The missing layer between current AI and AGI may be intent architecture

A lot of the AI/potential AGI conversation still assumes the main path forward is straightforward: increase model capability, expand con...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Anthropic’s Unreleased Claude Mythos Might Be The Most Advanced AI Model Yet

Anthropic is testing an unreleased artificial intelligence (AI) model with capabilities that exceed any system it has previously released...

AI Tools & Products · 5 min ·
LLMs

LLM agents can trigger real actions now. But what actually stops them from executing?

We ran into a simple but important issue while building agents with tool calling: the model can propose actions but nothing actually enfo...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

OkCupid gave 3 million dating-app photos to facial recognition firm, FTC says

Submitted by /u/Mathemodel

Reddit - Artificial Intelligence · 1 min ·
LLMs

Are LLMs a Dead End? (Investors Just Bet $1 Billion on “Yes”)

| AI Reality Check | Cal Newport Chapters 0:00 What is Yann LeCun Up To? 14:55 How is it possible that LeCun could be right about LLM’s be...

Reddit - Artificial Intelligence · 1 min ·
AI Startups

20+ Best AI Project Ideas for 2026: Trending AI Projects

This article presents over 20 AI project ideas tailored for various skill levels, providing a roadmap for building portfolio-ready projec...

AI Events ·
Machine Learning

[P] Looking for people who have had training runs fail unexpectedly to beta test a stability monitor. Free, takes 5 minutes to add to your existing loop. DM me.

Anyone actively training models want to try a stability monitor on a real run? Trying to get real world validation outside my own benchma...

Reddit - Machine Learning · 1 min ·
LLMs

Is the Mirage Effect a bug, or is it Geometric Reconstruction in action? A framework for why VLMs perform better "hallucinating" than guessing, and what that may tell us about what's really inside these models

Last week, a team from Stanford and UCSF (Asadi, O'Sullivan, Fei-Fei Li, Euan Ashley et al.) dropped two companion papers. The first, MAR...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Yupp shuts down after raising $33M from a16z crypto's Chris Dixon | TechCrunch

Less than a year after launching, with checks from some of the biggest names in Silicon Valley, crowdsourced AI model feedback startup Yu...

TechCrunch - AI · 4 min ·
Machine Learning

[R] Fine-tuning services report

If you have some data and want to train or run a small custom model but don't have powerful enough hardware for training, fine-tuning ser...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Does ML have a "bible"/reference textbook at the Intermediate/Advanced level?

Hello, everyone! This is my first time posting here and I apologise if the question is, perhaps, a bit too basic for this sub-reddit. A b...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML 2026 review policy debate: 100 responses suggest Policy B may score higher, while Policy A shows higher confidence

A week ago I made a thread asking whether ICML 2026’s review policy might have affected review outcomes, especially whether Policy A pape...

Reddit - Machine Learning · 1 min ·