This document describes how to run SGLang on Apple Silicon using Metal (MLX). If you encounter issues or have questions, please open an issue.
Install SGLang
You can install SGLang using one of the methods below.
Install from Source
Launch of the Serving Engine
Launch the server with:
SGLANG_USE_MLX=1 - Enables MLX as the SGLang runtime backend. If it is not set, SGLang falls back to torch.mps, which has less support.
--disable-cuda-graph - Disables CUDA graphs, which are not relevant for Apple Metal.
--disable-overlap-schedule - Disables overlap scheduling, which is implemented with MLX's async_eval(). Overlap scheduling is enabled by default, that is, when this flag is not present.
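The environment variable and flags described above can be combined into a single launch command. The following is a minimal sketch, not the official launch recipe: it only builds the environment and argument list and prints the equivalent shell command, without starting anything. The helper name `build_server_launch` and the model id `your-org/your-model` are hypothetical, and the use of `sglang.launch_server` with `--model-path` is an assumption about the entry point rather than something stated in this section.

```python
import os
import shlex


def build_server_launch(model_path, use_mlx=True, overlap=True):
    """Build (env, argv) for launching the SGLang server on Apple Silicon.

    Nothing is executed here; the caller decides how to run the command.
    """
    env = dict(os.environ)
    if use_mlx:
        # Select MLX as the runtime backend instead of the torch.mps fallback.
        env["SGLANG_USE_MLX"] = "1"
    else:
        env.pop("SGLANG_USE_MLX", None)

    argv = [
        "python", "-m", "sglang.launch_server",  # assumed entry point
        "--model-path", model_path,              # assumed flag, placeholder value
        "--disable-cuda-graph",                  # CUDA graphs do not apply to Metal
    ]
    if not overlap:
        # Overlap scheduling is on by default; this flag turns it off.
        argv.append("--disable-overlap-schedule")
    return env, argv


env, argv = build_server_launch("your-org/your-model")  # hypothetical model id
print("SGLANG_USE_MLX=" + env["SGLANG_USE_MLX"], shlex.join(argv))
```

To actually start the server, the returned pair could be passed to `subprocess.run(argv, env=env)`.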
Benchmarking with Requests
sglang.benchmark_one_batch calls the synchronous prefill/decode methods directly without going through the scheduler and the overlap code path.
sglang.benchmark_offline_throughput goes through the scheduler and the overlap code path, so overlap scheduling can be turned off for it with the --disable-overlap-schedule flag.
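One way to see the effect of overlap scheduling is to run the offline throughput benchmark twice, once with and once without `--disable-overlap-schedule`. The sketch below only prints the two command lines for such a comparison. The module name is copied from the text above as written, while `--model-path`, the helper `offline_bench_argv`, and the model id are assumptions made for illustration.

```python
import shlex

# Module name copied from the text above, not independently verified.
OFFLINE_BENCH_MODULE = "sglang.benchmark_offline_throughput"


def offline_bench_argv(model_path, overlap):
    """Return the argument list for one offline throughput run."""
    argv = [
        "python", "-m", OFFLINE_BENCH_MODULE,
        "--model-path", model_path,  # assumed flag, placeholder value
    ]
    if not overlap:
        # Route the run through the scheduler with overlap scheduling disabled.
        argv.append("--disable-overlap-schedule")
    return argv


for overlap in (True, False):
    label = "overlap on: " if overlap else "overlap off:"
    command = shlex.join(offline_bench_argv("your-org/your-model", overlap))
    print(label, "SGLANG_USE_MLX=1", command)
```

Because the one-batch benchmark bypasses the scheduler, the same comparison would not be meaningful there.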
