The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible
TL;DR
The Trump administration is reportedly tying any rerelease of Claude Fable 5 to Anthropic proving that the model’s safety guardrails cannot be bypassed. Fable 5 was taken offline last week through export controls after officials raised concerns about jailbreaks exposing restricted cybersecurity, chemistry, and biology capabilities. Anthropic argues the risks are overstated and the practical effects are limited. Officials say the NSA found ways to disable guardrails and now see the fix as Anthropic’s problem.
Nauti's Take
The demand sounds tough, but it is technically messy. Serious security programs do not promise that a complex system can never be bypassed.
A better standard would focus on testing, disclosure, tiered access, monitoring, and fast fixes. Asking for absolute safety may produce stronger PR language, but it will not automatically produce safer models.
Briefingshow
This moves AI safety from voluntary red-teaming toward a political release condition. If governments demand absolute jailbreak resistance, they are setting a technical bar today’s models may not be able to meet. The risk is that frontier models get judged by impossible guarantees instead of measurable safety work, monitoring, and rapid mitigation.