
Amazon SageMaker AI Async Inference now supports inline request payloads

TL;DR

Amazon SageMaker AI Async Inference can now accept small inputs directly in the InvokeEndpointAsync request. The new Body parameter removes the need to upload each input to Amazon S3 before invoking the endpoint. Inline payloads are capped at 128,000 raw bytes. Body and InputLocation are mutually exclusive, and requests that exceed the size cap or set both parameters fail synchronously with a ValidationError.
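The rules above can be checked client-side before a request goes out. The following is a minimal Python sketch, not an official SDK helper. It assumes that a recent boto3 `sagemaker-runtime` client exposes the new parameter under the API name `Body`. The function name `build_async_request` and its defaults are hypothetical. The check only mirrors the documented limits and does not replace the service's own ValidationError.

```python
from typing import Optional, Union

# Limit stated in the announcement: 128,000 raw bytes for an inline Body.
MAX_INLINE_BODY_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    body: Optional[Union[bytes, str]] = None,
    input_location: Optional[str] = None,
    content_type: str = "application/json",
) -> dict:
    """Return keyword arguments for an InvokeEndpointAsync call (hypothetical helper).

    Enforces the documented rules locally so mistakes surface before the
    network call: exactly one of body / input_location, and an inline body
    no larger than MAX_INLINE_BODY_BYTES.
    """
    # Body and InputLocation are mutually exclusive, and one of them is needed.
    if (body is None) == (input_location is None):
        raise ValueError("set exactly one of body or input_location")

    request = {"EndpointName": endpoint_name, "ContentType": content_type}
    if body is not None:
        # Measure the limit on raw bytes, so encode text payloads first.
        raw = body.encode("utf-8") if isinstance(body, str) else body
        if len(raw) > MAX_INLINE_BODY_BYTES:
            raise ValueError(
                f"inline body is {len(raw)} bytes; limit is {MAX_INLINE_BODY_BYTES}"
            )
        request["Body"] = raw
    else:
        request["InputLocation"] = input_location
    return request


# Assumption: your boto3 version already accepts Body for this call.
#   runtime = boto3.client("sagemaker-runtime")
#   runtime.invoke_endpoint_async(**build_async_request("my-endpoint", body='{"text": "hi"}'))
```

Returning a plain dict keeps the validation independent of a live client, so the same checks can be unit-tested without AWS credentials.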

Nauti's Take

This is plumbing, but useful plumbing. AWS is removing a forced dependency that made small async inference calls feel heavier than they needed to be.

If a team is sending a few KB of JSON into a slower queued job, creating an S3 object first is ceremony, not architecture. The catch is clear: 128 KB is a small ceiling, and output still lands in S3. Good developer ergonomics, not an AI breakthrough.

Briefing

This is not a model upgrade; it is a workflow fix for teams that use Async Inference for longer-running jobs with small inputs. Until now, even those teams had to manage S3 buckets, write permissions, object naming, and cleanup logic just to submit a request. Inline payloads remove that overhead on the input side, but the architecture still depends on S3 because outputs are still written there.
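Outputs still land in S3, so a client that handles both small and large inputs keeps some S3 logic. The following is a minimal Python sketch that routes each payload by size and splits the `OutputLocation` URI returned by InvokeEndpointAsync into a bucket and key for retrieval. The function names and the fallback URI are hypothetical, and the caller is assumed to upload oversized payloads to that URI separately. Neither function is part of any AWS SDK.

```python
from typing import Tuple

# Limit stated in the announcement: 128,000 raw bytes for an inline Body.
MAX_INLINE_BODY_BYTES = 128_000


def choose_input(payload: bytes, fallback_uri: str) -> dict:
    """Use inline Body when the payload fits, otherwise point at S3.

    fallback_uri is a hypothetical S3 URI that the caller must upload the
    payload to before invoking; this function does not upload anything.
    """
    if len(payload) <= MAX_INLINE_BODY_BYTES:
        return {"Body": payload}
    return {"InputLocation": fallback_uri}


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key) for fetching async output."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri}")
    # Plain string split, so keys containing '?' or '#' are kept intact.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both bucket and key: {uri}")
    return bucket, key


# Hypothetical usage with boto3 (not executed here):
#   resp = runtime.invoke_endpoint_async(EndpointName="my-endpoint", **choose_input(data, uri))
#   bucket, key = split_s3_uri(resp["OutputLocation"])
#   s3.get_object(Bucket=bucket, Key=key)  # only succeeds once the job has finished
```

Inline input drops the upload step for small requests. Bucket permissions on the output side remain part of the deployment either way.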

Sources