1. Model introduction
ERNIE-Image is Baidu’s text-to-image diffusion model family. SGLang Diffusion supports both the regular and Turbo checkpoints with the native `ErnieImagePipeline`.
| Model | Hugging Face model ID | Notes |
|---|---|---|
| ERNIE-Image | baidu/ERNIE-Image | Regular text-to-image checkpoint |
| ERNIE-Image-Turbo | baidu/ERNIE-Image-Turbo | Turbo text-to-image checkpoint |
2. Installation
Install SGLang with the diffusion dependencies.
3. Serve the model
The commands below target a single supported NVIDIA CUDA or AMD ROCm GPU. Start with `--performance-mode auto`; use `speed` only when the full pipeline fits comfortably on the selected GPU(s), and use `memory` when you need lower peak GPU memory.
Serve ERNIE-Image:
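As a minimal sketch, the launch commands for the two checkpoints can be assembled in Python so that they differ only in the model ID. The model IDs and the `--performance-mode` values come from this guide; the `sglang serve --model-path` entry point is an assumption about the installed CLI, and `build_serve_command` is a hypothetical helper, so confirm the flags with `sglang serve --help` before relying on it.

```python
import shlex

# Model IDs from the table in section 1.
CHECKPOINTS = {
    "regular": "baidu/ERNIE-Image",
    "turbo": "baidu/ERNIE-Image-Turbo",
}

# Performance modes named in this guide.
PERFORMANCE_MODES = ("auto", "speed", "memory")


def build_serve_command(variant="regular", performance_mode="auto"):
    """Return the argv list for serving one ERNIE-Image checkpoint.

    Assumption: the server is launched as `sglang serve --model-path ...`.
    """
    if performance_mode not in PERFORMANCE_MODES:
        raise ValueError(f"unknown performance mode: {performance_mode!r}")
    return [
        "sglang", "serve",
        "--model-path", CHECKPOINTS[variant],
        "--performance-mode", performance_mode,
    ]


# Print a copy-pasteable shell command for each checkpoint.
for variant in CHECKPOINTS:
    print(shlex.join(build_serve_command(variant)))
```

The returned list can be passed directly to `subprocess.Popen` if you prefer to start the server from a script rather than a shell.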
4. Generate an image
After the server starts, use the OpenAI-compatible image generation API.
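The request can be sketched with only the Python standard library. Several details here are assumptions rather than facts from this guide: the server address `http://localhost:30000`, the OpenAI-style `/v1/images/generations` route, the `1024x1024` size, the `b64_json` response format, and the `.png` output extension. The helper names are hypothetical; adjust all of these to match your deployment.

```python
import base64
import json
import urllib.request

# Assumed server address; change it to wherever the server is listening.
BASE_URL = "http://localhost:30000/v1"


def build_request(prompt, model="baidu/ERNIE-Image", size="1024x1024",
                  base_url=BASE_URL):
    """Build the POST request for one text-to-image generation."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": 1,
        "size": size,  # assumed resolution, not taken from this guide
        "response_format": "b64_json",
    }
    return urllib.request.Request(
        f"{base_url}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body):
    """Return the raw bytes of the first image in an OpenAI-style response."""
    data = response_body["data"]
    if not data:
        raise ValueError("response contained no images")
    return base64.b64decode(data[0]["b64_json"])


def generate_image(prompt, out_path="ernie_image.png", timeout=600, **kwargs):
    """Send the request to a running server and write the image to disk."""
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(decode_first_image(body))
    return out_path
```

With a server running, `generate_image("a watercolor fox in a snowy forest")` would write the result to `ernie_image.png`. Pass `model="baidu/ERNIE-Image-Turbo"` when the Turbo checkpoint is the one being served.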
5. Configuration tips
- ERNIE-Image is a text-to-image pipeline; do not pass `--image-path`.
- `--performance-mode auto` keeps conservative defaults while preserving explicit user flags.
- If the checkpoint includes a PE component, SGLang loads it automatically from `model_index.json`.
- Treat FSDP, SP/Ulysses/Ring, and TP as explicit benchmark knobs. Measure the target resolution, step count, and GPU type before making them production defaults.
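One way to follow the last tip is to time the same generation call under each candidate configuration. The harness below is a minimal sketch, not part of SGLang: `benchmark` is a hypothetical helper, and `generate` stands for any zero-argument callable that performs one full image generation at your target resolution and step count.

```python
import statistics
import time


def benchmark(generate, warmup=1, runs=3):
    """Time `generate()` and return latency statistics in seconds."""
    for _ in range(warmup):
        generate()  # warm-up calls are excluded from the statistics

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        latencies.append(time.perf_counter() - start)

    return {
        "runs": runs,
        "median_s": statistics.median(latencies),
        "min_s": min(latencies),
        "max_s": max(latencies),
    }
```

Restart the server with one parallelism setting changed at a time, rerun the same prompt, resolution, and step count, and compare the medians on the GPU type you plan to deploy on.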
