Container image
AWS publishes pre-built, security-patched SGLang DLCs. The SageMaker GPU image is available from the Amazon ECR registry (account763104351884) in each supported region. For example, in us-west-2:
Specifying the model
The SageMaker image resolves the model in this order:SM_SGLANG_MODEL_PATHenvironment variable — explicit Hugging Face ID or path./opt/ml/model— when SageMaker mounts model artifacts viaModelDataUrlorModelDataSource, the entrypoint uses this path by default.
HF_TOKEN.
Any SM_SGLANG_* environment variable is converted to a --<name> SGLang server argument
(for example, SM_SGLANG_CONTEXT_LENGTH=4096 becomes --context-length 4096).
Deploy with the SageMaker Python SDK
Deploy with Boto3
Model artifacts
WhenModelDataUrl (or ModelDataSource) points to a tarball or S3 prefix, SageMaker mounts the contents
at /opt/ml/model. The entrypoint defaults --model-path to that location, so SM_SGLANG_MODEL_PATH
can be omitted:
Notes
- GPU deployments require
inference_ami_version— the default SageMaker host AMI has incompatible NVIDIA drivers for CUDA 13 images. See the ProductionVariant API reference for valid values. - The endpoint exposes an OpenAI-compatible API, so the request body matches the SGLang server’s
/v1/chat/completionsschema.
