> ## Documentation Index > Fetch the complete documentation index at: https://docs.sglang.io/llms.txt > Use this file to discover all available pages before exploring further. # Loading Models from Object Storage SGLang supports direct loading of models from object storage (S3 and Google Cloud Storage) without requiring a full local download. This feature uses the `runai_streamer` load format to stream model weights directly from cloud storage, significantly reducing startup time and local storage requirements. ## Overview When loading models from object storage, SGLang uses a two-phase approach: 1. **Metadata Download** (once, before process launch): Configuration files and tokenizer files are downloaded to a local cache 2. **Weight Streaming** (lazy, during model loading): Model weights are streamed directly from object storage as needed ## Supported Storage Backends 1. **Amazon S3**: `s3://bucket-name/path/to/model/` 2. **Google Cloud Storage**: `gs://bucket-name/path/to/model/` 3. **Azure Blob**: `az://some-azure-container/path/` 4. **S3 compatible**: `s3://bucket-name/path/to/model/` ## Quick Start ### Basic Usage Simply provide an object storage URI as the model path: ```bash theme={null} # S3 python -m sglang.launch_server \ --model-path s3://my-bucket/models/llama-3-8b/ \ --load-format runai_streamer # Google Cloud Storage python -m sglang.launch_server \ --model-path gs://my-bucket/models/llama-3-8b/ \ --load-format runai_streamer ``` **Note**: The `--load-format runai_streamer` is automatically detected when using object storage URIs, so you can omit it: ```bash theme={null} python -m sglang.launch_server \ --model-path s3://my-bucket/models/llama-3-8b/ ``` ### With Tensor Parallelism ```bash theme={null} python -m sglang.launch_server \ --model-path gs://my-bucket/models/llama-70b/ \ --tp 4 \ --model-loader-extra-config '{"distributed": true}' ``` ## Configuration ### Load Format The `runai_streamer` load format is specifically designed for object storage, ssd and shared file systems ```bash theme={null} python -m sglang.launch_server \ --model-path s3://bucket/model/ \ --load-format runai_streamer ``` ### Extended Configuration Parameters Use `--model-loader-extra-config` to pass additional configuration as a JSON string: ```bash theme={null} python -m sglang.launch_server \ --model-path s3://bucket/model/ \ --model-loader-extra-config '{ "distributed": true, "concurrency": 8, "memory_limit": 2147483648 }' ``` #### Available Parameters

Parameter	Type	Description	Default
`distributed`	bool	Enable distributed streaming for multi-GPU setups. Automatically set to `true` for object storage paths and cuda alike devices.	Auto-detected
`concurrency`	int	Number of concurrent download streams. Higher values can improve throughput for large models.	4
`memory\_limit`	int	Memory limit (in bytes) for the streaming buffer.	System-dependent

## Performance Considerations ### Distributed Streaming For multi-GPU setups, enable distributed streaming to parallelize weight loading between the processes: ```bash theme={null} python -m sglang.launch_server \ --model-path s3://bucket/model/ \ --tp 8 \ --model-loader-extra-config '{"distributed": true}' ``` ## Limitations * **Supported Formats**: Currently only supports `.safetensors` weight format (recommended format) * **Supported Device**: Distributed streaming is supported on cuda alike devices. Otherwise fallback to non distributed streaming ## See Also * [Runai model streamer documentation](https://github.com/run-ai/runai-model-streamer)