Amazon SageMaker AI Async Inference now supports inline request payloads
TL;DR
As of June 17, 2026, SageMaker AI Async Inference supports inline payloads: InvokeEndpointAsync can now accept Body directly in the request. For raw payloads up to 128,000 bytes, teams can skip the S3 upload that each invocation previously required. That removes one network hop, the S3 PUT cost, and the IAM and cleanup work that came with small JSON prompts and structured data.
Nauti's Take
AWS frames this as a convenience update, and for once the convenience is real. Teams that used to create S3 keys, maintain permissions, and clean up stale input objects for every small prompt now get a cleaner path.
The 128,000-byte ceiling is the hard boundary: images, audio, and larger documents still belong on the S3 route. The practical win is less infrastructure clutter, not more AI magic.
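To make the size boundary concrete, here is a minimal sketch of how a caller might choose between the inline route and the S3 route. It only builds the request arguments and does not call AWS. The `Body` parameter name and the 128,000-byte limit are taken from the announcement as described above; the helper name `build_async_request`, its arguments, and the default content type are hypothetical, and whether a given SDK version accepts `Body` is an assumption to verify.

```python
# Ceiling for inline payloads as stated in the article (raw bytes).
INLINE_LIMIT_BYTES = 128_000


def build_async_request(endpoint_name, payload, s3_input_uri=None,
                        content_type="application/json"):
    """Build keyword arguments for an InvokeEndpointAsync call.

    Hypothetical helper: small payloads go inline via `Body` (parameter
    name per the article), larger ones must reference an object that was
    already uploaded to S3 via `InputLocation`.
    """
    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if len(payload) <= INLINE_LIMIT_BYTES:
        # Inline route: no S3 upload needed for this invocation.
        request["Body"] = payload
    elif s3_input_uri is not None:
        # S3 route: the caller uploaded the payload beforehand.
        request["InputLocation"] = s3_input_uri
    else:
        raise ValueError(
            f"payload is {len(payload)} bytes, above the inline limit of "
            f"{INLINE_LIMIT_BYTES}; upload it to S3 and pass s3_input_uri"
        )
    return request
```

The resulting dictionary could then be passed to the SageMaker runtime client, for example `client.invoke_endpoint_async(**request)` in boto3, assuming the installed SDK already exposes the inline parameter.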
Briefing
This is not a new model capability; it removes friction from AWS ML plumbing. Many agent, classification, and document workflows send small requests but need longer processing than real-time inference allows. For those cases, Async Inference now feels closer to a normal API call while keeping the S3-based output path.
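Because results still land in S3, a caller typically has to turn the returned output location into a bucket and key before fetching the result. The sketch below shows one way to do that split; the helper name `split_s3_uri` and the example URI are hypothetical, and it assumes the location arrives as an `s3://bucket/key` string, as in the `OutputLocation` field of the async invocation response.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key).

    Hypothetical helper for reading an async inference result later.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket; the path carries a leading slash to drop.
    return parsed.netloc, parsed.path.lstrip("/")


# Example with a made-up output location:
bucket, key = split_s3_uri("s3://example-bucket/async-output/result.out")
```

The bucket and key can then be handed to whatever S3 client the workflow already uses to poll for and download the result.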