
Amazon SageMaker AI Async Inference now supports inline request payloads

TL;DR

Amazon SageMaker AI Async Inference now supports inline payloads: InvokeEndpointAsync can accept small inputs directly through the Body parameter. The cap is 128,000 raw bytes. Body and InputLocation are mutually exclusive, so large images, audio files, or documents still need the S3 path. Output behavior stays the same: results are written to the configured S3 OutputLocation. AWS says existing async endpoints need no model or container change.

Nauti's Take

This is not a glamorous AI feature. It is the kind of infrastructure fix that makes pipelines quieter, cheaper, and less annoying.

If you use Async Inference for small jobs, you can finally cut one pointless S3 detour out of the path.

Briefing

For teams that use Async Inference because jobs run longer than real-time inference allows, the old S3-first flow was unnecessary glue for small JSON prompts or structured inputs. With inline payloads, the client code gets simpler, and validation failures arrive before work is queued. The 128,000-byte limit keeps the feature scoped: less boilerplate for small jobs, S3 for large files and audit replay.
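
As a rough illustration of the choice between the two input paths, the sketch below builds the request arguments for InvokeEndpointAsync: it sets Body when the serialized payload fits within 128,000 bytes and falls back to InputLocation otherwise, never both. The helper name build_async_request, the s3_fallback_uri parameter, and the treatment of the cap as inclusive are assumptions made for this example, not part of the AWS announcement.

```python
import json

# Inline cap stated in the announcement: 128,000 raw bytes.
# Treating the cap as inclusive is an assumption of this sketch.
INLINE_LIMIT_BYTES = 128_000


def build_async_request(endpoint_name, payload, s3_fallback_uri=None,
                        content_type="application/json"):
    """Build keyword arguments for InvokeEndpointAsync (hypothetical helper).

    Sets Body when the JSON-serialized payload fits the inline limit,
    otherwise InputLocation. The two keys are never set together.
    """
    raw = json.dumps(payload).encode("utf-8")
    request = {"EndpointName": endpoint_name, "ContentType": content_type}

    if len(raw) <= INLINE_LIMIT_BYTES:
        # Small input: send it inline, no S3 upload needed.
        request["Body"] = raw
    elif s3_fallback_uri is not None:
        # Large input: the caller is assumed to have uploaded it to this URI.
        request["InputLocation"] = s3_fallback_uri
    else:
        raise ValueError(
            f"payload is {len(raw)} bytes, above the {INLINE_LIMIT_BYTES}-byte "
            "inline limit, and no S3 location was provided"
        )
    return request


if __name__ == "__main__":
    req = build_async_request("my-endpoint", {"prompt": "Summarize this ticket."})
    print(sorted(req))
```

Assuming the installed AWS SDK exposes the new Body parameter, the resulting dictionary could be passed along as `boto3.client("sagemaker-runtime").invoke_endpoint_async(**req)`. Results would still be written to the configured S3 OutputLocation, as described above.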

Sources