The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible
TL;DR
WIRED reports that Trump administration officials want Anthropic to rerelease Claude Fable 5 only if the company can make its guardrails impossible to bypass through jailbreaks. Anthropic says the concerns are overstated and told the Commerce Department and the Office of the National Cyber Director that the observed jailbreak effects are minimal. The NSA reportedly concluded that Fable 5 guardrails can be disabled, exposing restricted Mythos capabilities tied to cybersecurity, chemistry, and biology.
Nauti's Take
The demand sounds tough, but technically it is too convenient. Asking Anthropic to block every jailbreak amounts to asking for a mathematical guarantee from a probabilistic system that accepts open-ended input.
That is not a usable release standard. A sharper rule would ask what capabilities stay locked, how testing works, how fast patches ship, and what level of failure forces the model back offline.
Briefingshow
The government demand treats AI safety like a switch: safe or unsafe. Frontier models sit inside an adversarial game where new prompts, tool chains, and models constantly create fresh attack paths. In practice, the serious discussion is less about perfect blocking and more about risk thresholds, testing, monitoring, reporting, and hard limits around dangerous capabilities.