Why AI Is Training on Its Own Garbage (and How to Fix It)
Deep web data is the gold we can't touch, yet

Sabrine Bendimerad · Apr 8, 2026 · 7 min read

Image generated with Gemini

If you have been interested in AI for a while, you are probably a regular user of LLMs, agents, or chat tools. But have you ever asked yourself how these tools will be trained in the near future? What if we have already used up the data we need to train them?

Many researchers argue that we are running out of high-quality, human-generated data to train our models. New content goes up every day, that's a reality, but an increasing share of what gets added daily is itself AI-generated. So if you keep training on public web data, you are eventually training on the outputs of your own predecessors. The snake eating its tail. Researchers call this phenomenon model collapse: AI models start learning from the errors of their predecessors until the whole system degrades into nonsense.

But what if I told you we aren't actually running out of data? We've just been looking in the wrong place. In this article, I am going to break down the key insights from this brilliant paper.

The Web We Already Use and the Web That Matters

Most of us think of the web as a single source of information. In reality, there are at least two. There is the Surface Web: the indexed, public world of Reddit, Wikipedia, and news sites. This is what we have already scraped and overused for years to train the mainstream ...