Amazon SageMaker AI Async Inference now supports inline request payloads
TL;DR
Amazon SageMaker AI Async Inference can now accept payloads directly in the request body of InvokeEndpointAsync. For inputs up to 128,000 bytes, teams no longer need to upload data to Amazon S3 before every call. The new path targets small JSON prompts and structured data. Body and InputLocation are mutually exclusive: a request that sets both, or whose inline body exceeds the size limit, is rejected immediately with a synchronous ValidationError.
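As a rough illustration of these rules, here is a minimal Python sketch of client-side routing between inline and S3 input. The helper name `build_async_request`, the JSON encoding, and the fallback behavior are assumptions for illustration. Only the `Body`/`InputLocation` field names and the 128,000-byte limit come from the announcement, and the sketch assumes the limit applies to the encoded body bytes. It only builds the request arguments and makes no network call.

```python
import json
from typing import Any, Optional

# Inline size limit stated in the announcement (assumed to apply to encoded bytes).
MAX_INLINE_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    payload: Any,
    input_location: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build keyword arguments for an async invocation.

    Sends the payload inline as Body when it fits within the limit; otherwise
    falls back to an S3 InputLocation the caller has already uploaded.
    Never sets both fields, since the service rejects that combination.
    """
    body = json.dumps(payload).encode("utf-8")
    request = {"EndpointName": endpoint_name, "ContentType": "application/json"}
    if len(body) <= MAX_INLINE_BYTES:
        # Small payload: send it inline and skip the S3 upload entirely.
        request["Body"] = body
    elif input_location is not None:
        # Too large for inline: reference the pre-uploaded S3 object instead.
        request["InputLocation"] = input_location
    else:
        raise ValueError(
            f"payload is {len(body)} bytes (> {MAX_INLINE_BYTES}); "
            "upload it to S3 and pass input_location"
        )
    return request


request = build_async_request("my-endpoint", {"prompt": "Summarize this ticket."})
print(sorted(request))  # ['Body', 'ContentType', 'EndpointName']
```

If your boto3 version already exposes the inline `Body` field, the resulting dict could be passed as `boto3.client("sagemaker-runtime").invoke_endpoint_async(**request)`; SDK versions that predate the feature may not accept `Body`. Checking the size on the client keeps routing decisions out of the error path instead of relying on the synchronous ValidationError.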
Nauti's Take
This is not a keynote feature; it is a monthly-bill feature. Anyone running async inference pipelines knows the S3 busywork: permissions, keys, cleanup, latency.
Inline payloads do not make the path magical, but they make it much less brittle.
Briefing
This sounds small, but it removes a real point of friction in async inference. Many AI requests fit within 128 KB yet still need more processing time than real-time inference allows. In that gap, mandatory S3 input uploads were often architectural overhead that added little real value.