Accelerate Generative AI Inference on Amazon SageMaker AI with G7e Instances
TL;DR
Amazon has announced the availability of G7e instances, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, on Amazon SageMaker AI. You can provision instances with 1, 2, 4, or 8 GPUs, with each GPU providing 96 GB of GDDR7 memory. This enables cost-effective hosting of large foundation models such as GPT-OSS-120B and Qwen3.5-35B-A3B on a single node.
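To make the single-node claim concrete, here is a rough back-of-the-envelope sketch of how many GPUs a model's weights might need. Only the 96 GB per GPU and the 1, 2, 4, and 8 GPU options come from the announcement. Everything else is an assumption for illustration: the nominal parameter counts are read off the model names, the bytes-per-parameter values stand in for common precisions, and the 1.2x `overhead` factor (for KV cache and runtime buffers) and the `smallest_node` helper are hypothetical.

```python
GPU_MEMORY_GB = 96         # per-GPU memory, from the announcement
GPU_COUNTS = (1, 2, 4, 8)  # GPUs per node, from the announcement


def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


def smallest_node(params_billions: float, bytes_per_param: float,
                  overhead: float = 1.2):
    """Return the smallest listed GPU count whose total memory covers the
    weights times an assumed overhead factor, or None if none is enough."""
    needed_gb = weight_memory_gb(params_billions, bytes_per_param) * overhead
    for gpus in GPU_COUNTS:
        if gpus * GPU_MEMORY_GB >= needed_gb:
            return gpus
    return None


if __name__ == "__main__":
    # Nominal sizes taken from the model names; real checkpoints may differ.
    for name, params in (("GPT-OSS-120B", 120), ("Qwen3.5-35B-A3B", 35)):
        for label, nbytes in (("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
            print(f"{name} @ {label}: {smallest_node(params, nbytes)} GPU(s)")
```

Under these assumptions, a nominal 120B-parameter model needs the 4-GPU configuration at 16-bit precision but fits on a single 96 GB GPU at 4-bit. Actual requirements depend on the checkpoint format, context length, and serving stack, so treat this as a sizing heuristic rather than a guarantee.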
Nauti's Take
The addition of NVIDIA RTX PRO 6000 Blackwell GPUs to SageMaker is a meaningful step for teams that want to run large open-source models cost-effectively: hosting 120B+ parameter models previously meant expensive multi-node setups, and it can now be done on a single node. The trade-off: this is still managed-service infrastructure, so costs and vendor lock-in remain real considerations.
For mid-sized teams wanting to scale AI without building their own GPU clusters, this is a compelling option.