Production-grade AI agents for financial compliance: Lessons from Stripe
TL;DR
Stripe details a compliance agent system on AWS Bedrock that assists human reviewers in financial risk reviews instead of automating final decisions. The system decomposes complex reviews into small sub-questions orchestrated as a directed acyclic graph (DAG), so each sub-question runs only after the ones it depends on. Agent outputs act as prep work, while humans remain accountable for the judgment. The stack uses ReAct agents with tool calls, a dedicated agent service, and an LLM proxy for model access, monitoring, fallbacks, and prompt caching.
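To make the DAG idea concrete, here is a minimal sketch in Python using the standard library's graphlib module. It is not Stripe's implementation: the sub-question names, the DEPENDENCIES mapping, and the answer_sub_question stand-in (which replaces a real agent call) are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical sub-questions for one review; the names are illustrative only.
# Each key maps to the set of sub-questions whose answers it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "business_profile": set(),
    "website_check": set(),
    "risk_summary": {"business_profile", "website_check"},
}


def answer_sub_question(name: str, upstream: Dict[str, str]) -> str:
    """Stand-in for an agent call; a real system would invoke a model here."""
    basis = ", ".join(sorted(upstream)) or "no prior findings"
    return f"{name} (based on: {basis})"


def run_review(
    dependencies: Dict[str, Set[str]],
    answer: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Answer every sub-question after the sub-questions it depends on."""
    findings: Dict[str, str] = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = {dep: findings[dep] for dep in dependencies[name]}
        findings[name] = answer(name, upstream)
    return findings


if __name__ == "__main__":
    for question, finding in run_review(DEPENDENCIES, answer_sub_question).items():
        print(f"{question}: {finding}")
```

In this sketch the summary step only ever sees the findings of its declared dependencies, which keeps each agent call small and makes the order of work explicit rather than left to the model.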
Nauti's Take
This is PR-heavy, but still useful because Stripe exposes concrete architecture choices. The key lesson: the agent is not a magical coworker, but a tightly guided research process with tool access, audit trails, and clear accountability.
Anyone putting agents into production should first slice the task, build the control layer, and measure cost. Only then does the model debate matter.
Briefing
The important part is not the agent hype, but the operating model: small tasks, strict orchestration, full logs, and humans as the decision layer. Those are exactly the pieces many AI agent prototypes lack. Stripe shows that production systems in regulated domains look more like controlled infrastructure than a free-running chatbot.
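One way to picture "full logs, and humans as the decision layer" is a case record that logs every step and refuses a final decision from anything but a human reviewer. The Python sketch below is an assumption-laden illustration, not Stripe's design: the ReviewCase and AuditEvent classes and the "human:" actor prefix are hypothetical conventions invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class AuditEvent:
    timestamp: str
    actor: str
    action: str
    detail: str


@dataclass
class ReviewCase:
    case_id: str
    agent_findings: List[str] = field(default_factory=list)
    audit_log: List[AuditEvent] = field(default_factory=list)
    decision: Optional[str] = None

    def _log(self, actor: str, action: str, detail: str) -> None:
        # Append-only record of who did what, with a UTC timestamp.
        now = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(AuditEvent(now, actor, action, detail))

    def add_agent_finding(self, agent: str, finding: str) -> None:
        """Agents contribute prep work, never the final judgment."""
        self.agent_findings.append(finding)
        self._log(agent, "finding", finding)

    def decide(self, reviewer: str, decision: str) -> None:
        """Record the final decision; only human actors are accepted."""
        # Hypothetical convention: human actor ids start with "human:".
        if not reviewer.startswith("human:"):
            raise PermissionError("only a human reviewer may record the decision")
        self.decision = decision
        self._log(reviewer, "decision", decision)


if __name__ == "__main__":
    case = ReviewCase("case-1")
    case.add_agent_finding("agent:website_check", "Website matches stated business.")
    case.decide("human:reviewer_1", "approve")
    for event in case.audit_log:
        print(event.actor, event.action, event.detail)
```

The point of the design is that accountability is enforced by the data model itself: an agent can add findings, but the decision field can only be set through a path that names a human and leaves a log entry.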