How can we prevent AI models from cannibalizing themselves when human-generated data runs out? Scientists say they’ve found the answer.

https://www.livescience.com/technology/artificial-intelligence/how-can-we-prevent-ai-models-from-cannibalizing-themselves-when-human-generated-data-runs-out-scientists-say-theyve-found-the-answer

Publish Date: 2026-05-21 06:00:00

Source Domain: www.livescience.com

While the evolution of artificial intelligence (AI) systems has shown no sign of slowing, there’s a growing concern that large language models (LLMs) will soon run out of human-made data to ingest and learn from.

Once this happens, scientists say, AI models will increasingly rely on synthetic, AI-made information, which will lead to an effect called “model collapse.” In this state, LLMs spout gibberish, and the AI systems they underpin give inaccurate answers and hallucinate information in response to queries far more often than they do today.

“That’s especially worrying considering some experts think that we will run out of high-quality human-generated data by the end of the year — so if you’re relying on this synthetic data, but there’s an almost existential threat it will sink your AI, you’re in trouble,” Yasser Roudi, a professor of disordered systems in the Department of Mathematics at King’s College London (KCL), told Live Science. “If, for example, you had LLMs that were used in hospitals to analyze brain scans and find cancers, if while training another model they experienced model collapse, these machines could misdiagnose people.”
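A toy simulation can make this feedback loop concrete. The sketch below is an illustration built on assumptions made here, not the method used by the researchers. Each “generation” of a model is a one-dimensional Gaussian fitted by maximum likelihood to samples drawn from the previous generation’s fit, and only generation 0 sees “human” data from the true distribution N(0, 1). The function `synthetic_generations` and its parameters are hypothetical names chosen for illustration.

```python
import random


def synthetic_generations(n_samples: int, generations: int, seed: int = 0) -> list[float]:
    """Hypothetical toy loop: each generation fits a Gaussian by maximum likelihood
    to samples drawn only from the previous generation's fitted Gaussian."""
    rng = random.Random(seed)
    # Generation 0 trains on "human" data drawn from the true distribution N(0, 1).
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    variances = []
    for _ in range(generations):
        mu = sum(data) / len(data)
        var = sum((x - mu) ** 2 for x in data) / len(data)  # MLE variance (divides by n)
        variances.append(var)
        # Every later generation sees only synthetic samples from the current fit.
        data = [rng.gauss(mu, var ** 0.5) for _ in range(n_samples)]
    return variances


variances = synthetic_generations(n_samples=10, generations=500)
print(f"generation 0 fitted variance:   {variances[0]:.3f}")
print(f"generation 499 fitted variance: {variances[-1]:.3e}")
```

Each refit slightly underestimates the spread of its training data (the maximum-likelihood variance is biased low by a factor of (n - 1)/n), and sampling noise compounds across generations, so the last generation's fitted variance ends up many orders of magnitude smaller than the first. This is a toy analogue of a model gradually losing the diversity of the human data it started from.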

However, Roudi and his colleagues recently found that model collapse can be avoided by adding a single human-made data point to an AI’s training data, even if all the other data is AI-generated.

The study, published May 14 in the journal Physical Review Letters, involved researchers from KCL, the Norwegian University of Science and Technology, and the Abdus Salam International Centre for Theoretical Physics in Italy.
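The article does not describe the researchers’ model, but a toy Gaussian loop gives some intuition for why a little real data can matter. As a sketch under assumptions made here for illustration, suppose each generation fits a Gaussian (mean μ_t, variance σ_t²) by maximum likelihood to n synthetic samples drawn from the previous fit. In the second case, one fresh human-made sample x ~ N(0, 1) is also added each generation; whether the study adds a fresh or a fixed data point is not stated here.

```latex
% Purely synthetic data: the MLE variance shrinks by (n-1)/n per generation in expectation
\mathbb{E}\left[\sigma_{t+1}^2 \mid \sigma_t^2\right] = \frac{n-1}{n}\,\sigma_t^2
\quad\Longrightarrow\quad
\mathbb{E}\left[\sigma_t^2\right] = \left(\frac{n-1}{n}\right)^{t}\sigma_0^2 \to 0 .

% n synthetic samples plus one fresh real sample x ~ N(0, 1), fitted on n + 1 points
\mathbb{E}\left[\sigma_{t+1}^2 \mid \mu_t, \sigma_t^2\right]
  = \frac{n\left(n\,\sigma_t^2 + 1 + \mu_t^2\right)}{(n+1)^2}
  \;\ge\; \frac{n^2}{(n+1)^2}\,\sigma_t^2 + \frac{n}{(n+1)^2} .

% Iterating the lower bound (a contraction with a positive offset):
\liminf_{t\to\infty} \mathbb{E}\left[\sigma_t^2\right]
  \;\ge\; \frac{n/(n+1)^2}{1 - n^2/(n+1)^2} = \frac{n}{2n+1} .
```

For n = 10 the bound is 10/21 ≈ 0.48. In this toy setting the single real sample keeps re-injecting genuine spread, so the expected fitted variance stays bounded away from zero rather than decaying geometrically. The published result may rest on a different model and argument.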

While AI model collapse hasn’t yet happened in a real-world scenario with an actively deployed AI system, anyone who uses tools like ChatGPT or Gemini to generate answers or text has very likely encountered errors or hallucinations. Roudi hopes the new findings point to a way to sidestep this potential threat.

Countering collapse

Beyond…