The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible
TL;DR
WIRED reports that Trump administration officials want Anthropic to rerelease Claude Fable 5 only if its guardrails can no longer be bypassed through jailbreaks. Fable 5 was taken offline last week through export controls after officials raised concerns about cyber, chemistry, and biology capabilities linked to the Mythos model. Anthropic says the concerns are overstated and that the impact of jailbreaks is minimal. Officials say the NSA found ways to disable Fable 5's guardrails.
Nauti's Take
The demand sounds clean in policy terms but is shaky in technical terms. No frontier model can credibly be brought to zero jailbreaks while open-ended prompts remain the main interface.
A better path is layered red-teaming, tiered access, monitoring, incident reporting, and clear liability rules. Treating safety as a switch oversells what guardrails can actually do.
Briefing
This case shows how quickly AI safety disputes can move from model evaluations into direct government pressure. If regulators demand impossible guarantees, companies may be pushed to promise absolute safety in systems that remain probabilistic, evolving, and attackable by design.