15 / 1622

Production-grade AI agents for financial compliance: Lessons from Stripe

TL;DR

Stripe describes a compliance agent system on AWS that prepares financial reviews instead of making final decisions. Human reviewers remain accountable. The system breaks reviews into small sub-questions, orchestrates them as a DAG, and uses ReAct agents to fetch internal signals through tool calls while logging each step. AWS and Stripe report a 26 percent reduction in median handling time and over 96 percent reviewer helpfulness. Prompt caching is presented as a major cost lever.

Nauti's Take

Stripe shows what many agent projects are missing: not more autonomy, but better boundaries. The agent can research, pull signals, and prepare work, but it does not get a blank check for regulatory decisions.

That is what production-grade AI looks like in sensitive domains. The AWS angle is obvious, but the core lesson holds: agents scale through orchestration, cost control, logging, and clear accountability, not through magic.

Briefingshow

The interesting part is not that Stripe built an agent, but how tightly the agent is constrained. Compliance is a useful counterexample to full automation hype: small tested tasks, complete logs, and human decision authority. That is where agent demos start turning into operating infrastructure.

Sources