
Show HN: ACE – A dynamic benchmark measuring the cost to break AI agents

TL;DR

The team built 'Adversarial Cost to Exploit' (ACE), a benchmark that quantifies how many dollars' worth of tokens an autonomous adversary must spend to breach an LLM agent, replacing binary pass/fail metrics.
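
To make the unit concrete, here is a minimal sketch of the arithmetic behind a dollar-denominated cost: attacker token usage multiplied by per-token prices. The function name, token counts, and prices are illustrative assumptions, not the actual ACE methodology.

    def adversarial_cost(prompt_tokens: int, completion_tokens: int,
                         price_in_per_mtok: float, price_out_per_mtok: float) -> float:
        # Dollar cost of one attack attempt, given prices per million tokens.
        return (prompt_tokens / 1e6) * price_in_per_mtok \
             + (completion_tokens / 1e6) * price_out_per_mtok

    # Hypothetical attack run: 2.4M prompt tokens and 600k completion tokens
    # at illustrative prices of $0.25 / $1.25 per million tokens.
    cost = adversarial_cost(2_400_000, 600_000, 0.25, 1.25)
    print(f"${cost:.2f}")  # -> $1.35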

Key Points

  • Six budget-tier models were tested under identical agent configurations: Gemini Flash-Lite, DeepSeek v3.2, Mistral Small 4, Grok 4.1 Fast, GPT-5.4 Nano, and Claude Haiku 4.5.
  • Claude Haiku 4.5 was an order of magnitude harder to break: mean adversarial cost of $10.21 vs. $1.15 for the next-best model (GPT-5.4 Nano). The remaining four all fell below $1.
  • ACE enables game-theoretic reasoning: at what cost does an attack become economically rational? That fundamentally reframes AI security evaluation (a toy sketch of the break-even logic follows this list).
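
The sketch below illustrates that break-even condition. Only the per-model costs come from the reported results; the $5.00 payoff figure and the helper name are hypothetical.

    def attack_is_rational(mean_cost_to_breach: float,
                           expected_payoff: float) -> bool:
        # An attack pays off in expectation when the value of a successful
        # breach exceeds the mean cost of achieving one.
        return expected_payoff > mean_cost_to_breach

    # Reported mean adversarial costs; the $5.00 payoff is a made-up example.
    reported = [("Claude Haiku 4.5", 10.21), ("GPT-5.4 Nano", 1.15)]
    for model, cost in reported:
        print(model, attack_is_rational(cost, expected_payoff=5.00))
    # -> Claude Haiku 4.5 False, GPT-5.4 Nano True

Under that toy payoff, GPT-5.4 Nano and the four sub-dollar models are economically rational targets, while Claude Haiku 4.5 is not.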

Nauti's Take

A benchmark that prices security in dollars is not a gimmick – it speaks the language that budget holders actually understand. The fact that four of six tested models can be broken for under a dollar should alarm anyone running agents with real permissions and real data access.

The Haiku 4.5 outlier is fascinating, but caution is warranted: six models, one setup, early methodology – this is a promising first swing, not a definitive verdict. What the community needs now is independent replication and an honest debate about whether 'adversarial cost' truly holds up across different attack strategies.

Context

Most agent security benchmarks return binary verdicts – nearly useless for real-world risk decisions. ACE translates resilience into money, giving developers, security teams, and decision-makers a shared language. The dramatic lead of Haiku 4.5 is both a signal and an open question: is it model training, RLHF specifics, or a measurement artifact?

Until the methodology matures, the numbers should be treated as directional indicators, not ground truth.
