The White House Wants Anthropic to Block All Jailbreaks. That May Not Be Possible

TL;DR

WIRED reports that the Trump administration wants Claude Fable 5 kept offline unless Anthropic can show its guardrails cannot be bypassed through jailbreaks. The concern is that Fable 5 could expose blocked Mythos 5 capabilities in cybersecurity, chemistry and biology. WIRED says the NSA has found ways to disable some safeguards.

Nauti's Take

A demand for unbypassable guardrails sounds tidy in policy language, but it is not a realistic engineering bar. Even so, Anthropic should not hide behind the fact that perfect safety is impossible.

If Mythos-grade capability ships inside a public Fable product, the company needs hard metrics: abuse classes blocked, benign requests wrongly rerouted, jailbreak patch time and independent red-team results. Anything less is safety marketing.
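To make those metrics concrete, here is a minimal sketch of how three of them could be computed from evaluation logs. Everything in it is an assumption for illustration: the `EvalRecord` shape, the field names, the function names and the sample data are hypothetical and do not reflect how Anthropic or any regulator actually measures guardrails.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class EvalRecord:
    # Hypothetical record of one red-team or sampled production request.
    abuse_class: Optional[str]  # e.g. "cyber", "bio"; None means benign
    intervened: bool            # True if the guardrail blocked or rerouted it


def block_rate_by_class(records):
    """Fraction of abusive requests stopped, per abuse class."""
    totals, stopped = defaultdict(int), defaultdict(int)
    for r in records:
        if r.abuse_class is not None:
            totals[r.abuse_class] += 1
            stopped[r.abuse_class] += r.intervened
    return {c: stopped[c] / totals[c] for c in totals}


def false_reroute_rate(records):
    """Fraction of benign requests wrongly blocked or rerouted."""
    benign = [r for r in records if r.abuse_class is None]
    if not benign:
        return 0.0
    return sum(r.intervened for r in benign) / len(benign)


def median_patch_hours(hours_to_patch):
    """Median time from jailbreak report to deployed fix, in hours."""
    return median(hours_to_patch)


# Made-up sample data, only to show the calculation.
records = [
    EvalRecord("cyber", True),
    EvalRecord("cyber", False),
    EvalRecord("bio", True),
    EvalRecord(None, False),
    EvalRecord(None, True),
    EvalRecord(None, False),
    EvalRecord(None, False),
]
print(block_rate_by_class(records))           # {'cyber': 0.5, 'bio': 1.0}
print(false_reroute_rate(records))            # 0.25
print(median_patch_hours([4.0, 30.0, 12.0]))  # 12.0
```

Reporting block rate and false-reroute rate side by side matters because either number can be made to look good alone: a guardrail that refuses everything blocks all abuse but also reroutes every benign request.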

Briefing

The case shows how quickly AI safety concerns can become export-control disputes. If governments demand perfect guardrails, providers face an impossible proof burden: showing that no prompt can bypass them. For teams using AI in production, the model name is only part of the risk; access policy, auditability and fallback plans matter just as much.

Sources