
The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible

TL;DR

The Trump administration is keeping up the pressure on Anthropic: Claude Fable 5 should return only if its safety guardrails can no longer be bypassed through jailbreaks. According to WIRED, the guardrails are meant to limit access to risky capabilities in the underlying Mythos model, including cybersecurity, chemistry, and biology functions. Anthropic says the concerns are overstated and the jailbreak effects are minimal. The NSA reportedly found ways to disable Fable 5 safeguards anyway.

Nauti's Take

The demand sounds politically strong but is technically shaky. Making a model fully jailbreak-proof is not a realistic product standard; it is more like a wishlist item.

Anthropic should still not get away with PR framing: if Fable 5 has risky capabilities, it needs serious testing, transparent failure classes, and fast fix cycles. But regulators need to stop confusing absolute safety with real safety.

Briefing

The dispute shows how quickly AI safety disagreements can turn into market access and export-control decisions. If regulators demand absolute jailbreak-proofing, they may set a bar frontier models cannot realistically meet. A more workable path is measurable red-teaming, monitoring, incident reporting, and strict controls around high-risk capabilities.

Sources