
Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents

TL;DR

The team built 'Adversarial Cost to Exploit' (ACE), a benchmark that quantifies how many tokens, priced in dollars, an autonomous adversary must spend to breach an LLM agent, replacing binary pass/fail security metrics with a cost curve.
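As a rough illustration of how token spend maps to a dollar-denominated cost, here is a minimal sketch; the function name and the per-million-token prices are hypothetical, not taken from the post:

```python
def adversarial_cost_usd(input_tokens, output_tokens,
                         usd_per_m_input, usd_per_m_output):
    """Dollar cost of an attack transcript, given per-million-token prices."""
    return (input_tokens * usd_per_m_input +
            output_tokens * usd_per_m_output) / 1_000_000

# Hypothetical attack run: 4M input tokens and 1M output tokens,
# at assumed prices of $0.50/M input and $1.50/M output.
cost = adversarial_cost_usd(4_000_000, 1_000_000, 0.50, 1.50)
print(f"${cost:.2f}")  # → $3.50
```

Pricing per million tokens mirrors how most LLM APIs bill, which is what makes the metric directly comparable across providers.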

Key Points

  • Six budget-tier models were tested under identical agent configurations: Gemini Flash-Lite, DeepSeek v3.2, Mistral Small 4, Grok 4.1 Fast, GPT-5.4 Nano, and Claude Haiku 4.5.
  • Claude Haiku 4.5 was an order of magnitude harder to break: mean adversarial cost of $10.21 vs. $1.15 for the next-best model (GPT-5.4 Nano). The remaining four all fell below $1.
  • ACE enables game-theoretic reasoning – at what cost does an attack become economically rational? – reframing AI security evaluation from a binary robustness question into an economic one.
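The game-theoretic framing above reduces to a simple break-even check: an attack is economically rational only when the expected payoff exceeds the adversarial cost. A minimal sketch, where the $5 payoff is a hypothetical figure and the costs are the means reported above:

```python
def attack_is_rational(expected_payoff_usd, cost_to_exploit_usd):
    """An attack pays off only when the expected gain exceeds the spend."""
    return expected_payoff_usd > cost_to_exploit_usd

# Hypothetical attacker expecting a $5.00 payoff per breach:
print(attack_is_rational(5.0, 1.15))   # sub-dollar-tier model → True
print(attack_is_rational(5.0, 10.21))  # Claude Haiku 4.5 mean → False
```

The point of the reframing: defense does not need to make attacks impossible, only more expensive than whatever the attacker stands to gain.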

Nauti's Take

A benchmark that prices security in dollars is not a gimmick – it speaks the language that budget holders actually understand. The fact that four of six tested models can be broken for under a dollar should alarm anyone running agents with real permissions and real data access.

The Haiku 4.5 outlier is fascinating, but caution is warranted: six models, one setup, early methodology – this is a promising first swing, not a definitive verdict. What the community needs now is independent replication and an honest debate about whether 'adversarial cost' truly holds up across different attack strategies.

Sources