
Amazon SageMaker AI Async Inference now supports inline request payloads

TL;DR

Amazon SageMaker AI Async Inference now accepts inline payloads in the Body field of InvokeEndpointAsync. For inputs up to 128,000 bytes, customers no longer need to upload data to Amazon S3 before every invocation. That makes small async workloads cleaner: one API call instead of S3 upload code, an input bucket, S3 write permissions, UUID key handling, and stale-object cleanup.
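As a rough illustration of the change, the sketch below builds the arguments for an InvokeEndpointAsync call and picks the inline path only when the serialized payload fits the 128,000-byte limit. The helper name `build_async_request`, the endpoint name, and the S3 URI are hypothetical. It also assumes the installed AWS SDK already exposes the new `Body` parameter, which is worth checking against the SDK version in use.

```python
import json
from typing import Any, Dict, Optional

# Inline limit as stated in the announcement (assumed here to be inclusive).
INLINE_LIMIT_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    payload: Dict[str, Any],
    s3_input_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for InvokeEndpointAsync (hypothetical helper).

    Small payloads are sent inline via Body. Larger payloads fall back to
    InputLocation, which must point at an object the caller already uploaded.
    """
    body = json.dumps(payload).encode("utf-8")
    request: Dict[str, Any] = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
    }
    if len(body) <= INLINE_LIMIT_BYTES:
        # Assumption: the SDK in use accepts Body on InvokeEndpointAsync.
        request["Body"] = body
    elif s3_input_uri is not None:
        request["InputLocation"] = s3_input_uri
    else:
        raise ValueError(
            f"payload is {len(body)} bytes, over the {INLINE_LIMIT_BYTES}-byte "
            "inline limit, and no S3 input URI was given"
        )
    return request
```

With a boto3 `sagemaker-runtime` client, the result would be passed as `client.invoke_endpoint_async(**build_async_request("my-endpoint", {"text": "hello"}))`. The response still carries an `OutputLocation` in S3, since only the input side moves inline.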

Nauti's Take

AWS frames this as a convenience feature, and that is exactly what it is: small, technical, but genuinely useful in day-to-day infrastructure work. Anyone who has built Async Inference pipelines around tiny JSON requests knows the odd detour through S3. Still, teams should not blindly move everything inline.

For inputs that must stay traceable, exceed the inline limit, or need to be replayed later, S3 remains the cleaner path.

Briefing

This is not a new model capability; it removes friction from AWS ML plumbing. Many agent, classification, and document workflows send small requests but need longer processing than real-time inference allows. For those cases, Async Inference now feels closer to a normal API call while keeping the S3-based output path.

Sources