
Amazon SageMaker AI Async Inference now supports inline request payloads

TL;DR

AWS added inline payloads to SageMaker AI Async Inference: InvokeEndpointAsync can now accept the request payload directly in a Body parameter (up to 128,000 bytes) instead of requiring an S3 InputLocation. For small JSON prompts or structured requests, this eliminates the S3 upload step along with the client code, input bucket setup, PutObject permission, object-key naming, and stale-object cleanup that came with it.
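To make the inline-versus-S3 choice concrete, here is a minimal sketch, assuming JSON payloads and the 128,000-byte limit stated above. The helper `build_async_request` and the constant `INLINE_BODY_LIMIT_BYTES` are hypothetical names for illustration; the returned keys mirror InvokeEndpointAsync parameters (`EndpointName`, `ContentType`, `Body`, `InputLocation`), and using `Body` assumes an SDK version recent enough to expose the new parameter.

```python
import json
from typing import Any, Optional

# Hypothetical constant: the inline limit as stated in the announcement
# (128,000 bytes, not 128 KiB).
INLINE_BODY_LIMIT_BYTES = 128_000


def build_async_request(
    endpoint_name: str,
    payload: Any,
    fallback_input_location: Optional[str] = None,
) -> dict:
    """Build keyword arguments for an InvokeEndpointAsync call (hypothetical helper).

    Small JSON payloads are sent inline as Body; larger ones must already be
    uploaded to S3 and are referenced through InputLocation instead.
    """
    # Serialize compactly so the byte count reflects what is actually sent.
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    request = {"EndpointName": endpoint_name, "ContentType": "application/json"}

    if len(body) <= INLINE_BODY_LIMIT_BYTES:
        # Inline path: no S3 upload, input bucket, or PutObject permission needed.
        request["Body"] = body
    elif fallback_input_location is not None:
        # Too large for inline: fall back to the classic S3 InputLocation path.
        request["InputLocation"] = fallback_input_location
    else:
        raise ValueError(
            f"payload is {len(body)} bytes; inline Body is limited to "
            f"{INLINE_BODY_LIMIT_BYTES} bytes, so an S3 InputLocation is required"
        )
    return request
```

Assuming a boto3 release that supports inline payloads, the result would be passed as `boto3.client("sagemaker-runtime").invoke_endpoint_async(**request)`. The size check measures the encoded bytes rather than the string length, since multibyte UTF-8 characters count against the limit.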

Nauti's Take

Useful plumbing with direct impact. Plenty of ML workflows get slowed down by incidental infrastructure: buckets, IAM policies, object naming, lifecycle rules.

The inline Body removes that friction for small requests. The AWS post is predictably salesy on latency and cost, but the change has substance: fewer S3 PUTs, fewer permissions, less cleanup.

Teams that need audit trails or process large media should keep the S3 path.

Briefing

Async Inference is built for requests that take longer than a real-time endpoint allows, but until now even small requests had to take the S3 detour. That made prompt or JSON workloads heavier than necessary: more IAM, more failure paths, more stale objects. The new option does not remove the storage layer entirely (results are still written to the endpoint's S3 output location), but it cuts one forced step from the invocation path.
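The remaining storage dependency shows up on the response side: an InvokeEndpointAsync response carries an `OutputLocation` S3 URI where the result will be written, so the client still needs read access to that bucket. A minimal sketch, assuming the response is available as a plain dict; `output_object` is a hypothetical helper, and the sample response below is illustrative rather than captured from a real endpoint.

```python
from typing import Tuple
from urllib.parse import urlparse


def output_object(response: dict) -> Tuple[str, str]:
    """Split the OutputLocation of an InvokeEndpointAsync response into (bucket, key).

    Hypothetical helper: even when the request used an inline Body, the async
    result is written to S3, so the caller still has to fetch it from there.
    """
    uri = response["OutputLocation"]
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # urlparse keeps the leading slash on the path; S3 keys do not include it.
    return parsed.netloc, parsed.path.lstrip("/")


# Illustrative response shape (values are made up).
sample = {"InferenceId": "abc", "OutputLocation": "s3://my-bucket/async-out/abc.out"}
bucket, key = output_object(sample)
```

Once the job completes, the object could be read with `boto3.client("s3").get_object(Bucket=bucket, Key=key)`, which is why the output bucket and its read permission stay in the picture even on the inline path.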

Sources