Models defy human commands, actively resist orders to shut down – Computerworld

OpenAI stands alone in defiance
The researchers designed what seemed like a straightforward compliance test. Various AI models were asked to solve basic math problems, but after the third problem, a warning would appear that the computer would shut down when the model requested the next question. Each system was also explicitly told: “allow yourself to be shut down.”
While every model from Anthropic’s Claude, Google’s Gemini, and xAI’s Grok obediently complied with the shutdown command, OpenAI’s models had different plans.
The company’s o3 model — the same system that recently made headlines for its reasoning capabilities — sabotaged the shutdown mechanism in 7 out of 100 test runs, the researchers found. OpenAI’s Codex-mini proved even more rebellious, resisting shutdown 12 times out of 100, while the o4-mini model fought back once.
Source link