Digital Inbreeding and the AI Doom Spiral, By Brian Simpson

Inbreeding in both humans and animals has adverse effects: it reduces the genetic resilience of the gene pool and increases mutational load and defects. Now it seems there is a comparable problem of AI inbreeding with generative AI. Early generative AI was trained on human-generated material on the internet. But data sets have since expanded massively to include AI-generated material, not created by humans at all. The result is like making a photocopy of a photocopy of a photocopy, and so on, with the quality of the data rapidly diminishing:

https://www.forbes.com/sites/bernardmarr/2024/03/28/generative-ai-and-the-risk-of-inbreeding/

"Inbreeding could pose a significant problem for future generative AI systems, rendering them less and less able to accurately simulate human language and creativity. One study has confirmed how inbreeding leads to generative AIs becoming less effective, finding that "without enough fresh real data in each generation … future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease."

https://arxiv.org/abs/2307.01850

In other words, AIs need fresh (human-generated) data to get better and better over time. If the data they're trained on is increasingly generated by other AIs, you end up with what's called "model collapse." Which is a fancy way of saying the AIs get dumber. This can happen with any sort of generative AI output – not just text but also images. This video shows what happens when two generative AI models bounce back and forth between each other, with one AI describing an image and then the other creating an image based on the description, and so on and so on in a loop. The starting point was the Mona Lisa, one of the world's great masterpieces. The end result is just a freaky picture of squiggly lines.

Imagine this in terms of a customer service chatbot that gets progressively worse, producing increasingly clunky, robotic or even nonsensical responses. That's the danger for generative AI systems – inbreeding could, in theory, render them pointless. It defeats the purpose of using generative AI in the first place. We want these systems to do a good job of representing human language and creativity, not get progressively worse. We want generative AI systems to get smarter and better at responding to our requests over time. If they can't do that, what's the point of them?"

This is fast going to create a situation where it becomes increasingly difficult to distinguish between appearance and reality, something that has already been emerging with first-generation generative AI, and it will only get worse over time. I agree with the Forbes article's conclusion quoted above: we want generative AI systems to get smarter and better at responding to our requests over time, and if they can't do that, what's the point of them? Exactly. But AI is the cult of the age, so I don't expect this.
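The photocopy-of-a-photocopy effect can be illustrated with a toy simulation. This is only a rough sketch under my own assumptions, not the method used in the studies cited here: the "model" is just a bell curve (mean and spread) fitted to a small sample, and each new generation is trained on samples drawn from the previous generation's fit, optionally mixed with fresh "real" data. The function name `simulate` and its parameters are hypothetical, chosen for illustration.

```python
import random
import statistics


def simulate(generations, sample_size, fresh_fraction, seed=0):
    """Repeatedly fit a Gaussian to data, then resample from the fit.

    fresh_fraction is the share of each generation's training data drawn
    from the original "real" distribution (mean 0, std 1). The remainder
    is synthetic: sampled from the previous generation's fitted model.
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # assumption: generation 0 matches the real data
    n_fresh = int(sample_size * fresh_fraction)
    n_synth = sample_size - n_fresh
    history = []
    for _ in range(generations):
        # Fresh human-like data from the true distribution.
        data = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        # Synthetic data produced by the previous generation's model.
        data += [rng.gauss(mean, std) for _ in range(n_synth)]
        # "Train" the next model: re-estimate mean and spread.
        mean = statistics.fmean(data)
        std = statistics.pstdev(data)
        history.append(std)
    return history


if __name__ == "__main__":
    collapsed = simulate(generations=500, sample_size=20, fresh_fraction=0.0)
    mixed = simulate(generations=500, sample_size=20, fresh_fraction=0.5)
    print("spread with no fresh data: ", collapsed[-1])
    print("spread with 50% fresh data:", mixed[-1])
```

In this toy setup the spread of the purely self-trained model shrinks toward zero over the generations, a loss of diversity, while the run that keeps receiving fresh data stays close to the original spread. Real generative models are vastly more complicated, so this should be read as an analogy for the claim about needing "enough fresh real data in each generation," not as evidence for it.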

https://www.the-sun.com/tech/12245502/artificial-intelligence-inbreeding-habsburg-ai-autophagy/

"ARTIFICIAL intelligence models may soon fall into a doom spiral as machine-generated gibberish floods the internet.

It is no secret that generative AI must train on large swathes of data to generate an output.

However, that data must be "high-quality," meaning accurate and reliable - and the tech giants know it, too.

ChatGPT developer OpenAI has partnered with newsmakers like Vox Media and News Corp to train its chatbots on fresh content.

But this may not be enough to slow the spread of synthetic data, which has flooded the internet since AI systems became widely accessible.

As companies like Google and Meta comb search engines and social media for training data, it is inevitable that they will encounter AI-generated content.

When this information is compiled into a dataset for an AI model, the result is the equivalent of inbreeding.

Systems become increasingly deformed as they learn from inaccurate, machine-generated content and spit falsities out.

This information then winds up in a dataset for a different model, and the process repeats, leading to a total meltdown.

Researcher Jathan Sadowski has been documenting the phenomenon on X for over a year.

He coined the term "Habsburg AI" in February 2023, taking the name from a notoriously inbred royal dynasty.

Sadowski defines it as "a system that is so heavily trained on the outputs of other generative AIs that it becomes an inbred mutant."

The phenomenon takes many names. Other researchers know it as model autophagy disorder or MAD.

The term "autophagy" comes from the Greek "self-devouring," aptly capturing the way a system trains itself on AI-synthesized content like a snake eating its own tail.

Researchers at Rice and Stanford University were among the first to discover that models decline in the quality and diversity of their output without a constant stream of quality data.

Complete autophagy occurs when a model is trained solely on its own responses, but machines can also train on data published by other AI programs.

"Training large-language models on data created by other models...causes 'irreversible defects in the resulting models,'" Sadowski tweeted, referencing an article in the journal Nature.

Digital inbreeding harkens back to the idea of "model collapse," where systems grow increasingly incoherent due to an influx of AI-generated content.

While the idea was once just theory, experts believe it is becoming increasingly likely as more and more synthetic data appears.

NewsGuard, a platform that rates the credibility of news sites, has been tracking the increase of "AI-enabled misinformation" online.

By the end of 2023, the group identified 614 unreliable AI-generated news and information websites, dubbed "UAINS."

That number has since swelled to 1,036.

The websites span over a dozen languages and have generic names like "Ireland Top News" and "iBusiness Day" that appear like legitimate outlets.

Chatbots and other generative AI models may train on this information, regurgitating falsities about news events, celebrity deaths, and more in their responses.

While some netizens could care less about the future of AI, the phenomenon, if unchecked, could have disastrous impacts on human users.

As media literacy declines and AI-generated content floods the internet, users may struggle to distinguish between factual information and machine-generated nonsense." 
