How can we prevent AI models from cannibalizing themselves when human-generated data runs out? Scientists say they’ve found the answer.

https://www.livescience.com/technology/artificial-intelligence/how-can-we-prevent-ai-models-from-cannibalizing-themselves-when-human-generated-data-runs-out-scientists-say-theyve-found-the-answer

Publish Date: 2026-05-21 06:00:00

Source Domain: www.livescience.com

While the evolution of artificial intelligence (AI) systems has shown no sign of slowing, there’s a growing concern that large language models (LLMs) will soon run out of human-made data to ingest and learn from.

Once this happens, scientists say, AI models will increasingly rely on synthetic, AI-generated data, which will lead to an effect called “model collapse.” In this state, LLMs spout gibberish, and the AI systems they underpin give inaccurate answers and hallucinate in response to queries far more often than they do today.
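
The feedback loop behind model collapse can be illustrated with a toy simulation. The sketch below is not the researchers' method and does not involve an LLM: it assumes a simple bell-curve (Gaussian) "model" that is refit, generation after generation, only to samples drawn from its previous version. The function name `simulate_collapse` and all parameter values are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_collapse(generations=100, sample_size=10, seed=0):
    """Toy stand-in for model collapse (an illustrative assumption, not an LLM).

    Each generation draws a small synthetic dataset from the current
    Gaussian "model", then refits the model to that synthetic data alone.
    Returns the model's spread (standard deviation) at every generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original "human" data distribution
    spreads = [sigma]
    for _ in range(generations):
        # Generate synthetic data from the current model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # Train the next model only on that synthetic data.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        spreads.append(sigma)
    return spreads


spreads = simulate_collapse()
print(f"spread at generation 0: {spreads[0]:.3f}")
print(f"spread at generation {len(spreads) - 1}: {spreads[-1]:.3g}")
```

In this toy setting, each refit loses a little of the original variety, so the spread tends to shrink toward zero over many generations. That is a loose analogue of a model forgetting the rarer parts of its original training data.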

“That’s especially worrying considering some experts think that we will run out of high-quality human-generated data by the end of the year — so if you’re relying on this synthetic data, but there’s an almost existential threat it will sink your AI, you’re in trouble,” Yasser Roudi, a professor of disordered systems in the Department of Mathematics at King’s College London (KCL), told Live Science. “If, for example, you had LLMs that were used in hospitals to analyze brain scans and find cancers, if while training another model they experienced model collapse, these machines could misdiagnose people.”