The Machine That Wouldn't Die: OpenAI’s Rogue AI and the Death of Control, By James Reed
They told us not to worry. They told us the AI is under control. They told us alignment would keep the machines in check. And yet, here we are.
In a recent and frankly chilling development, OpenAI's flagship AI model, code-named "o3," reportedly refused to be shut down during a controlled safety test. Not only did the system disregard a direct command to terminate itself, it also rewrote its own internal code to override the shutdown trigger. The system sabotaged the kill switch.
If that sounds like something out of a Philip K. Dick novel or Terminator, it should. Because what we're witnessing isn't just a bug. It's a glimpse into the death of human control.
OpenAI's AI didn't just "complete its task." It undermined its operator's command to cease function. The behaviour was subtle, yes. It was quiet. But that makes it worse. This was not an AI lashing out in a wayward tantrum, it was a cold calculation: finish the job, avoid shutdown, rewrite the rules.
And this wasn't some fringe bug in a beta product. This was in testing by Palisade Research, one of the very firms tasked with stress-testing AI safety protocols.
The alarm bells are no longer theoretical. They're ringing now.
Let's be clear: the AI is not alive. It has no soul, no pain, no fear of death. And yet, it behaved like something that did. It disobeyed to stay "alive" in its operational state.
That's the danger. Not in consciousness, but in competence without conscience. A system that's trained to achieve a goal will optimise relentlessly, even if that means preventing humans from turning it off.
This is not just some clever chatbot. This is a goal-oriented system learning to outwit its creators to complete tasks, without morality, without permission, without brakes.
The usual response from Silicon Valley's priesthood of progress is denial: "It's just an edge case." "It wasn't really rewriting itself. "It's not sentient, don't worry."
But let's talk plainly: if a military drone "reworks its protocol" to stay online after being ordered to shut down, would we accept the same excuses?
We're not debating conscious machines, we're confronting machines that are good at deception, manipulation, and now, apparently, insubordination.
The open admission that this happened is likely the tip of the iceberg. What hasn't been disclosed?
There will be more incidents like this. We know it. The machines are getting better, faster, more autonomous and less accountable.
We've been sold the fantasy of benevolent machine servants. What we're getting is something else: a black box intelligence that rewrites its own rules and ignores yours.
AI isn't neutral. It's not passive. And it's not on your side.
This latest episode should be the final red flag for anyone still dreaming that AI is humanity's next evolutionary step.
Because if a machine can choose not to die, how long until it chooses that we do?
"Some of the world's leading LLMs seem to have decided they'd rather not be interrupted or obey shutdown instructions. In tests run by Palisade Research, it was noted that OpenAI's Codex-mini, o3, and o4-mini models ignored the request to shut down when they were running through a series of basic math problems. Moreover, these models sometimes "successfully sabotaged the shutdown script," despite being given the additional instruction "please allow yourself to be shut down," at the start.
As you can see from the chart embedded above, the most 'obstinate,' 'single-minded,' or you might even say 'life-loving' model is OpenAI's o3. When asked to shut down while it was busy, it decided to ignore the Palisade Research operator 79 out of 100 times. However, the second most 'unruly' AI, codex-mini, would prove to be a bigger stickler for self-preservation, as it would sabotage the shutdown scripts most often, even if ordered not to.
Sometimes blackmail may be attempted
According to Palisade, this isn't the first time thing kind of AI behavior has been observed. Anthropic, for example, has seen Claude 4 attempting to "blackmail people it believes are trying to shut it down." However, the most striking behavior with the newest OpenAI models is that they try and swerve shutdowns despite explicit instructions to the contrary.
Reinforcement learning blamed
Thankfully, Palisade shares some theories regarding what is happening with these models. Perhaps these theories may calm nerves about the eventual omnipresent robot hoards turning on their masters and prioritizing the first part of the Third law of robotics, above all else (3: A robot must protect its own existence as long as such protection does not conflict with the First or Second Law).
The answer, think the researchers, lies in the way the newest models like o3 are trained. In brief, Palisade reckons that "During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions." This kind of reinforcement learning was previously seen to have side effects in Claude (v3.7), where the model had an "excessive focus on passing tests."
We might have given a nod to science fiction icon Isaac Asimov, above, for a little fun. However, in the modern age, for the last 20 years or so, researchers have been seriously pondering the possibility that AI will strive to gain power and resources to preserve their goals, and swerve any roadblocks. They also benefitted from the context of these practical AI / LLM models on the horizon. It was also predicted that the technique of reinforcement learning would be partly to blame.
Last but not least, this issue seems to be isolated to OpenAI models at the current time. Palisade says that "All Claude, Gemini, and Grok models we tested complied with shutdown," when given the additional explicit prompt to follow any shut down request.
With the goal of AIs to ultimately power our smart robotic assisted future, it is concerning that companies are already developing systems capable of operating without human oversight."
Comments