
# Enabling cache for torch.compile

SGLang uses the `max-autotune-no-cudagraphs` mode of `torch.compile`. Auto-tuning in this mode can be slow.
If you deploy a model on many machines, you can ship the `torch.compile` cache to those machines so they skip the compilation step.

This is based on the [torch.compile caching tutorial](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html).

1. Generate the cache by setting `TORCHINDUCTOR_CACHE_DIR` and running the model once.

```bash
TORCHINDUCTOR_CACHE_DIR=/root/inductor_root_cache python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --enable-torch-compile
```

2. Copy the cache folder to the other machines and launch the server there with `TORCHINDUCTOR_CACHE_DIR` pointing to the copied folder.
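
One way to carry out step 2 is to pack the cache folder into a single archive, transfer it, and unpack it at the same path on each target machine. The sketch below is illustrative only: the helper names `pack_cache` and `unpack_cache` and the archive path `/tmp/inductor_cache.tar.gz` are hypothetical and not part of SGLang; the cache path and launch command are the ones from step 1.

```bash
# Hypothetical helpers (not part of SGLang) for moving the cache between machines.

# pack_cache CACHE_DIR ARCHIVE
# Archive CACHE_DIR, storing only its last path component in the archive.
pack_cache() {
  tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# unpack_cache ARCHIVE PARENT_DIR
# Extract the archive so the cache ends up at PARENT_DIR/<cache folder name>.
unpack_cache() {
  mkdir -p "$2"
  tar -xzf "$1" -C "$2"
}

# On the machine that generated the cache:
#   pack_cache /root/inductor_root_cache /tmp/inductor_cache.tar.gz
#
# Transfer the archive with whatever tool you use (scp, rsync, shared storage),
# then on each target machine:
#   unpack_cache /tmp/inductor_cache.tar.gz /root
#   TORCHINDUCTOR_CACHE_DIR=/root/inductor_root_cache python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --enable-torch-compile
```

A shipped cache is most likely to be reused when the target machines closely match the one that produced it (same GPU model and the same PyTorch version, for example). On a mismatch, expect compilation to run again rather than hit the cache.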
