MAD: How AI Might Just Outsmart Itself, by Brian Simpson
A problem has already emerged with generative AI such as the chatbot ChatGPT. Two enterprising PhD students have shown through modelling that generative AI is trained on human-generated content scraped from the internet, but that content is increasingly synthetic, produced by AI itself. When this synthetic content is fed back into the models over many training cycles, a type of data inbreeding occurs, and the programs go MAD: Model Autophagy Disorder. In their experiments, a mere five cycles of training on synthetic data were enough for the whole thing to "blow up." The actual details are complex, but material explaining further, including a link to the paper, is below.
The Futurism.com interview quoted below asks what this means for the use of generative AI. I think it makes clear that there is an inescapable quality-control problem, which could explain some of the bizarre things ChatGPT has done, such as fabricating material.
There is a danger in basing society upon such technology, but the technocrats, as always, charge ahead, while we the people fail to address the science/technocracy question. We ignore these issues at our peril.
https://arxiv.org/abs/2307.01850
https://futurism.com/ai-trained-ai-generated-data-interview
“It hasn't even been a year since OpenAI released ChatGPT, and already generative AI is everywhere. It's in classrooms; it's in political advertisements; it's in entertainment and journalism and a growing number of AI-powered content farms. Hell, generative AI has even been integrated into search engines, the great mediators and organizers of the open web. People have already lost work to the tech, while new and often confounding AI-related careers seem to be on the rise.
Though whether it sticks in the long term remains to be seen, at least for the time being generative AI seems to be cementing its place in our digital and real lives. And as it becomes increasingly ubiquitous, so does the synthetic content it produces. But in an ironic twist, those same synthetic outputs might also stand to be generative AI's biggest threat.
That's because underpinning the growing generative AI economy is human-made data. Generative AI models don't just cough up human-like content out of thin air; they've been trained to do so using troves of material that actually was made by humans, usually scraped from the web. But as it turns out, when you feed synthetic content back to a generative AI model, strange things start to happen. Think of it like data inbreeding, leading to increasingly mangled, bland, and all-around bad outputs. (Back in February, Monash University data researcher Jathan Sadowski described it as "Habsburg AI," or "a system that is so heavily trained on the outputs of other generative AI's that it becomes an inbred mutant, likely with exaggerated, grotesque features.")
It's a problem that looms large. AI builders are continuously hungry to feed their models more data, which is generally being scraped from an internet that's increasingly laden with synthetic content. If there's too much destructive inbreeding, could everything just... fall apart?
To understand this phenomenon better, we spoke to machine learning researchers Sina Alemohammad and Josue Casco-Rodriguez, both PhD students in Rice University's Electrical and Computer Engineering department, and their supervising professor, Richard G. Baraniuk. In collaboration with researchers at Stanford, they recently published a fascinating — though yet to be peer-reviewed — paper on the subject, titled "Self-Consuming Generative Models Go MAD."
MAD, which stands for Model Autophagy Disorder, is the term that they've coined for AI's apparent self-allergy. In their research, it took only five cycles of training on synthetic data for an AI model's outputs to, in the words of Baraniuk, "blow up."
It's a fascinating glimpse at what just might end up being generative AI's Achilles heel. If so, what does it all mean for regular people, the burgeoning AI industry, and the internet itself?”
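The self-consuming loop described in the interview can be illustrated with a toy simulation. The sketch below is my own illustration, not the paper's actual experiment: the "generative model" is just a Gaussian fitted to its training data, and, to mimic the way real generative models under-represent the rare tails of their data, each generation samples from that Gaussian truncated at two standard deviations. Retraining purely on each generation's synthetic output makes the distribution steadily narrow and collapse, which is the basic shape of the MAD phenomenon.

```python
# Toy simulation of a self-consuming ("autophagous") training loop.
# Hypothetical illustration: a trivial "model" (a fitted Gaussian) is
# retrained each generation on the synthetic samples produced by the
# previous generation's model, with the tails beyond 2 sigma clipped
# to imitate how generative models under-sample rare events.
import random
import statistics

random.seed(42)

def sample_truncated_gaussian(mu, sigma, n, k=2.0):
    """Draw n samples from N(mu, sigma^2), rejecting values beyond k*sigma."""
    samples = []
    while len(samples) < n:
        x = random.gauss(mu, sigma)
        if abs(x - mu) <= k * sigma:
            samples.append(x)
    return samples

# Generation 0 trains on "human-made" data: a standard normal.
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

stds = []
for generation in range(20):
    mu = statistics.fmean(data)      # "train" the model: fit the mean...
    sigma = statistics.stdev(data)   # ...and the standard deviation
    stds.append(sigma)
    # The next generation trains only on this generation's synthetic output.
    data = sample_truncated_gaussian(mu, sigma, 2000)

print(f"spread of generation 0 data:  {stds[0]:.3f}")
print(f"spread of generation 19 data: {stds[-1]:.3f}")
```

Each generation the spread shrinks, so after a couple of dozen cycles the "model" produces an increasingly bland, narrow sliver of the original distribution. Real image and text models collapse through a much richer mechanism (the paper's five-cycle "blow up"), but the feedback structure is the same: without a steady supply of fresh human data, errors and biases compound on themselves.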