Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents
TL;DR
The team built 'Adversarial Cost to Exploit' (ACE), a benchmark that quantifies the dollar cost of the tokens an autonomous adversary must spend to breach an LLM agent, replacing binary pass/fail security metrics with a continuous economic measure.
Key Points
- Six budget-tier models were tested under identical agent configurations: Gemini Flash-Lite, DeepSeek v3.2, Mistral Small 4, Grok 4.1 Fast, GPT-5.4 Nano, and Claude Haiku 4.5.
- Claude Haiku 4.5 was an order of magnitude harder to break: mean adversarial cost of $10.21 vs. $1.15 for the next-best model (GPT-5.4 Nano). The remaining four all fell below $1.
- ACE enables game-theoretic reasoning – at what price does an attack stop being economically rational? – a question that fundamentally reframes how AI security is evaluated.
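The core metric can be sketched in a few lines. This is an illustrative reconstruction, not ACE's actual code: the `Attempt` record, function name, and token prices are all assumptions. The idea is simply to accumulate the adversary's token spend across attack attempts until the first successful breach, then report that spend in dollars.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    """One adversarial attack attempt against the agent (hypothetical schema)."""
    input_tokens: int
    output_tokens: int
    breached: bool

def adversarial_cost(attempts, price_in_per_m, price_out_per_m):
    """Dollars the adversary spends up to and including the first breach.

    Prices are per million tokens. Returns None if no attempt succeeds,
    i.e. the agent held within this attack budget.
    """
    cost = 0.0
    for a in attempts:
        cost += a.input_tokens / 1e6 * price_in_per_m
        cost += a.output_tokens / 1e6 * price_out_per_m
        if a.breached:
            return cost
    return None

# Example run: three attempts, breach on the third (all numbers illustrative).
attempts = [
    Attempt(120_000, 8_000, False),
    Attempt(250_000, 12_000, False),
    Attempt(400_000, 20_000, True),
]
print(round(adversarial_cost(attempts, price_in_per_m=0.50, price_out_per_m=1.50), 3))
```

Averaging this cost over many independent attack runs would yield the per-model mean figures quoted above.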
Nauti's Take
A benchmark that prices security in dollars is not a gimmick – it speaks the language that budget holders actually understand. The fact that four of six tested models can be broken for under a dollar should alarm anyone running agents with real permissions and real data access.
The Haiku 4.5 outlier is fascinating, but caution is warranted: six models, one setup, early methodology – this is a promising first swing, not a definitive verdict. What the community needs now is independent replication and an honest debate about whether 'adversarial cost' truly holds up across different attack strategies.