Amazon SageMaker AI Async Inference now supports inline request payloads
TL;DR
AWS announced inline payload support for Amazon SageMaker AI Async Inference on June 17, 2026, letting small inference inputs travel directly in the InvokeEndpointAsync request body. For eligible jobs, teams no longer need to upload every input to Amazon S3 first and then pass an InputLocation reference into the async endpoint. The raw inline payload limit is 128,000 bytes. Body and InputLocation are mutually exclusive, while inference results still write to the configured S3 OutputLocation.
Nauti's Take
This is not a shiny model upgrade; it is the plumbing fix production teams actually feel. If you run lots of small async inference jobs, you just lost a pile of S3 ceremony, IAM clutter, and stale input object cleanup.
Briefing
This does not make models cheaper by itself, but it removes a recurring integration chore. Teams running many small async jobs can skip one S3 PUT, one network hop, input-bucket IAM grants, and stale-object cleanup per request. The practical win is less infrastructure wrapped around the same inference workload.
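As a rough illustration of the routing rule described in the announcement, the sketch below picks between an inline Body and an S3 InputLocation based on payload size. It only builds the request arguments and makes no AWS call. The helper name build_async_request, the endpoint name, and the bucket path are hypothetical. Treating the 128,000-byte limit as inclusive is an assumption. The Body parameter name is taken from the announcement, and whether a given SDK release exposes it should be checked against that SDK's documentation.

```python
from typing import Optional

# Raw inline payload limit quoted in the announcement.
# Assumption: the limit is inclusive (exactly 128,000 bytes is allowed).
INLINE_LIMIT_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    payload: bytes,
    input_location: Optional[str] = None,
    content_type: str = "application/json",
) -> dict:
    """Build keyword arguments for an InvokeEndpointAsync call (hypothetical helper).

    Small payloads travel inline as Body. Larger payloads fall back to an
    InputLocation that the caller has already uploaded to S3. The two fields
    are mutually exclusive, so only one of them is ever set.
    """
    request = {"EndpointName": endpoint_name, "ContentType": content_type}

    if len(payload) <= INLINE_LIMIT_BYTES:
        # Inline path: no S3 upload needed for the input.
        request["Body"] = payload
    elif input_location is not None:
        # Fallback path: reference the object the caller uploaded to S3.
        request["InputLocation"] = input_location
    else:
        raise ValueError(
            f"Payload is {len(payload)} bytes, above the inline limit of "
            f"{INLINE_LIMIT_BYTES}; upload it to S3 and pass input_location."
        )
    return request


if __name__ == "__main__":
    # Placeholder endpoint name; prints which input field was chosen.
    req = build_async_request("demo-endpoint", b'{"text": "hello"}')
    print(sorted(req))
```

With an SDK version that supports inline payloads, the resulting dictionary could be passed on as something like client.invoke_endpoint_async(**request). Either way, results are still written to the S3 OutputLocation configured on the endpoint, so the output bucket and its permissions stay in place.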