Prerequisites
- Python 3.11+ if you plan to use the OpenAI Python SDK.
Serve
Launch the server using the sglang serve command.
Start the server
- --model-path: Path to the model or model ID.
- --port: HTTP port to listen on (default: 30000).
GET /models
Returns information about the model served by this server, including model path, task type, pipeline configuration, and precision settings.
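A minimal Python sketch of this call, using only the standard library. The host and `BASE_URL` constant are assumptions (a local server on the default port 30000), the helper names are hypothetical, and the response is assumed to be JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def build_models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET /models request."""
    return urllib.request.Request(f"{base_url}/models", method="GET")


def fetch_model_info(base_url: str = BASE_URL) -> dict:
    """Send the request and decode the body as JSON; requires a running server."""
    with urllib.request.urlopen(build_models_request(base_url), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_model_info()` against a running server should return the model description as a dictionary, provided the JSON assumption holds.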
Curl Example:
Endpoints
Image Generation
The server implements an OpenAI-compatible Images API under the /v1/images namespace.
Create an image
Endpoint: POST /v1/images/generations
Python Example (b64_json response):
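A minimal sketch of the request with the Python standard library rather than the OpenAI SDK. It assumes a local server on port 30000, a JSON body with the OpenAI Images fields `prompt` and `response_format`, and the OpenAI response shape `{"data": [{"b64_json": ...}]}`. The helper names are hypothetical.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def build_generation_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/images/generations request asking for base64 output."""
    body = {"prompt": prompt, "response_format": "b64_json"}
    return urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_first_image(response_body: dict) -> bytes:
    """Decode the first image, assuming the OpenAI Images response shape."""
    return base64.b64decode(response_body["data"][0]["b64_json"])


def generate_image(prompt: str, out_path: str) -> None:
    """Send the request and write the decoded image; requires a running server."""
    with urllib.request.urlopen(build_generation_request(prompt), timeout=600) as resp:
        image_bytes = decode_first_image(json.load(resp))
    with open(out_path, "wb") as f:
        f.write(image_bytes)
```

The OpenAI Python SDK exposes the same endpoint through `client.images.generate(...)` once its `base_url` points at the server's `/v1` prefix.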
Note: If response_format=url is used and cloud storage is not configured, the API returns a relative URL like /v1/images/<IMAGE_ID>/content.
Edit an image
Endpoint: POST /v1/images/edits
This endpoint accepts a multipart form upload with input images and a text prompt. The server can return either a base64-encoded image or a URL to download the image.
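The multipart upload can be sketched with the standard library as follows. The form field names `image`, `prompt`, and `response_format` follow the OpenAI Images edit API and are assumed to be what this server expects. The host and helper names are also assumptions.

```python
import urllib.request
import uuid

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def encode_multipart(fields: dict[str, str], files: list[tuple[str, str, bytes]]) -> tuple[bytes, str]:
    """Encode text fields and (field, filename, content) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    for name, filename, content in files:
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_edit_request(prompt: str, image_name: str, image_bytes: bytes,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/images/edits request with one input image."""
    body, content_type = encode_multipart(
        {"prompt": prompt, "response_format": "b64_json"},
        [("image", image_name, image_bytes)],
    )
    return urllib.request.Request(
        f"{base_url}/v1/images/edits",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends the upload. Several input images can be sent by adding more tuples to the `files` list.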
Curl Example (b64_json response):
When response_format=url is used with POST /v1/images/generations or POST /v1/images/edits,
the API returns a relative URL like /v1/images/<IMAGE_ID>/content.
Endpoint: GET /v1/images/{image_id}/content
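Because the returned URL may be relative, a client has to resolve it against the server address before downloading. A minimal sketch, assuming a local server on port 30000 and hypothetical helper names:

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def resolve_image_url(url: str, base_url: str = BASE_URL) -> str:
    """Resolve a relative URL from the API against the server; absolute URLs pass through."""
    return urljoin(base_url, url)


def download_image(url: str, out_path: str, base_url: str = BASE_URL) -> None:
    """Fetch the image bytes and write them to disk; requires a running server."""
    with urllib.request.urlopen(resolve_image_url(url, base_url), timeout=60) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
```

Using `urljoin` means the same helper also works when cloud storage is configured and the API returns an absolute URL.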
Curl Example:
Video Generation
The server implements a subset of the OpenAI Videos API under the /v1/videos namespace.
Create a video (text-to-video)
Endpoint: POST /v1/videos
Python Example:
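A minimal standard-library sketch of a text-to-video request. It assumes a local server on port 30000 and that the endpoint accepts a JSON body with a `prompt` field. Other fields and the response shape are not shown in this section, so they are left out. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def build_video_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/videos request for a text-to-video job."""
    return urllib.request.Request(
        f"{base_url}/v1/videos",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_video(prompt: str, base_url: str = BASE_URL) -> dict:
    """Submit the job and return the decoded JSON response; requires a running server."""
    with urllib.request.urlopen(build_video_request(prompt, base_url), timeout=60) as resp:
        return json.load(resp)
```

If the response follows the OpenAI Videos API, it carries an `id` that can be used with GET /v1/videos/{video_id}/content.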
GET /v1/videos
Python Example:
GET /v1/videos/{video_id}/content
Python Example:
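A minimal sketch of downloading the rendered video with the standard library, streaming the body to disk rather than holding it in memory. The host and helper names are assumptions.

```python
import shutil
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def video_content_url(video_id: str, base_url: str = BASE_URL) -> str:
    """Build the content URL, percent-encoding the ID so it stays one path segment."""
    return f"{base_url}/v1/videos/{quote(video_id, safe='')}/content"


def download_video(video_id: str, out_path: str, base_url: str = BASE_URL) -> None:
    """Stream the video to out_path; requires a running server and a finished job."""
    url = video_content_url(video_id, base_url)
    with urllib.request.urlopen(url, timeout=600) as resp, open(out_path, "wb") as f:
        shutil.copyfileobj(resp, f)
```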
LoRA Management
The server supports dynamic loading, merging, and unmerging of LoRA adapters.
Important Notes:
- Mutual Exclusion: Only one LoRA can be merged (active) at a time
- Switching: To switch LoRAs, you must first unmerge the current one, then set the new one
- Caching: The server caches loaded LoRA weights in memory. Switching back to a previously loaded LoRA (same path) has little cost
POST /v1/set_lora
Parameters:
- lora_nickname (string or list of strings, required): A unique identifier for the LoRA adapter(s). Can be a single string or a list of strings for multiple LoRAs
- lora_path (string or list of strings/None, optional): Path to the .safetensors file(s) or Hugging Face repo ID(s). Required for the first load; optional if re-activating a cached nickname. If a list, must match the length of lora_nickname
- target (string or list of strings, optional): Which transformer(s) to apply the LoRA to. If a list, must match the length of lora_nickname. Valid values:
  - "all" (default): Apply to all transformers
  - "transformer": Apply only to the primary transformer (high noise for Wan2.2)
  - "transformer_2": Apply only to transformer_2 (low noise for Wan2.2)
  - "critic": Apply only to the critic model
- strength (float or list of floats, optional): LoRA strength for merge, default 1.0. If a list, must match the length of lora_nickname. Values < 1.0 reduce the effect, values > 1.0 amplify the effect
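The parameter rules above can be checked on the client before the request is sent. The sketch below is a client-side illustration, not the server's validation logic. It assumes a JSON request body and a local server on port 30000, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port
VALID_TARGETS = {"all", "transformer", "transformer_2", "critic"}


def build_set_lora_payload(lora_nickname, lora_path=None, target="all", strength=1.0) -> dict:
    """Check the documented constraints and return the request body."""
    expected = len(lora_nickname) if isinstance(lora_nickname, list) else 1
    for name, value in (("lora_path", lora_path), ("target", target), ("strength", strength)):
        # Any list-valued parameter must match the number of nicknames.
        if isinstance(value, list) and len(value) != expected:
            raise ValueError(f"{name} has {len(value)} entries, expected {expected}")
    targets = target if isinstance(target, list) else [target]
    unknown = set(targets) - VALID_TARGETS
    if unknown:
        raise ValueError(f"unknown target(s): {sorted(unknown)}")
    payload = {"lora_nickname": lora_nickname, "target": target, "strength": strength}
    if lora_path is not None:
        payload["lora_path"] = lora_path  # may be omitted when re-activating a cached nickname
    return payload


def build_set_lora_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /v1/set_lora request carrying the payload as JSON."""
    return urllib.request.Request(
        f"{base_url}/v1/set_lora",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_set_lora_payload(["style", "detail"], ["/path/to/style.safetensors", "/path/to/detail.safetensors"], strength=[0.8, 1.0])` passes validation (the nicknames and paths are placeholders), while a `lora_path` list of a different length raises `ValueError`.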
[!NOTE] When using multiple LoRAs:
- All list parameters (lora_nickname, lora_path, target, strength) must have the same length
- If target or strength is a single value, it will be applied to all LoRAs
- Multiple LoRAs applied to the same target will be merged in order
Merge LoRA Weights
Manually merges the currently set LoRA weights into the base model.
[!NOTE] set_lora automatically performs a merge, so this is typically only needed if you have manually unmerged but want to re-apply the same LoRA without calling set_lora again.
Endpoint: POST /v1/merge_lora_weights
Parameters:
- target (string, optional): Which transformer(s) to merge. One of "all" (default), "transformer", "transformer_2", "critic"
- strength (float, optional): LoRA strength for merge, default 1.0. Values < 1.0 reduce the effect, values > 1.0 amplify the effect
POST /v1/unmerge_lora_weights
Curl Example:
GET /v1/list_loras
Curl Example:
- If LoRA is not enabled for the current pipeline, the server will return an error.
- num_lora_layers_with_weights counts only layers that have LoRA weights applied for the active adapter.
Example: Switching LoRAs
- Set LoRA A:
- Generate with LoRA A…
- Unmerge LoRA A:
- Set LoRA B:
- Generate with LoRA B…
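The switch from one adapter to another can be sketched as an ordered pair of requests: unmerge first, then set. This is a sketch under assumptions: a local server on port 30000, JSON request bodies, and an empty JSON object as the body of the unmerge call. The nickname and path in the usage note are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000"  # assumption: local server on the default port


def post_json(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the given API path."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def switch_lora_requests(new_nickname: str, new_path: str,
                         base_url: str = BASE_URL) -> list[urllib.request.Request]:
    """Return the requests that replace the active LoRA, in the order they must be sent."""
    return [
        post_json("/v1/unmerge_lora_weights", {}, base_url),  # assumption: empty body
        post_json("/v1/set_lora", {"lora_nickname": new_nickname, "lora_path": new_path}, base_url),
    ]


def switch_lora(new_nickname: str, new_path: str, base_url: str = BASE_URL) -> None:
    """Send the requests one after another; requires a running server."""
    for req in switch_lora_requests(new_nickname, new_path, base_url):
        with urllib.request.urlopen(req, timeout=300) as resp:
            resp.read()
```

Calling `switch_lora("lora_b", "/path/to/lora_b.safetensors")` would cover the unmerge and set steps for LoRA B in the walkthrough above.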
Adjust Output Quality
The server supports adjusting output quality and compression levels for both image and video generation through the output-quality and output-compression parameters.
Parameters
- output-quality (string, optional): Preset quality level that automatically sets compression. Default is "default". Valid values:
  - "maximum": Highest quality (100)
  - "high": High quality (90)
  - "medium": Medium quality (55)
  - "low": Lower quality (35)
  - "default": Auto-adjust based on media type (50 for video, 75 for image)
- output-compression (integer, optional): Direct compression level override (0-100). Default is None. When provided (not None), takes precedence over output-quality.
  - 0: Lowest quality, smallest file size
  - 100: Highest quality, largest file size
Notes
- Precedence: When both output-quality and output-compression are provided, output-compression takes precedence
- Format Support: Quality settings apply to JPEG and video formats. PNG uses lossless compression and ignores these settings
- File Size vs Quality: Lower compression values (or “low” quality preset) produce smaller files but may show visible artifacts
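The precedence rule and preset table above can be expressed as a small function. This is an illustration of the documented behavior, not the server's code. The function and constant names are hypothetical.

```python
from typing import Optional

# Preset values as listed in the parameter description.
PRESET_LEVELS = {"maximum": 100, "high": 90, "medium": 55, "low": 35}
DEFAULT_LEVELS = {"video": 50, "image": 75}


def effective_compression(media_type: str,
                          output_quality: str = "default",
                          output_compression: Optional[int] = None) -> int:
    """Return the level that applies for media_type ("image" or "video")."""
    if output_compression is not None:
        # An explicit override wins over any preset.
        if not 0 <= output_compression <= 100:
            raise ValueError("output-compression must be between 0 and 100")
        return output_compression
    if output_quality == "default":
        return DEFAULT_LEVELS[media_type]
    return PRESET_LEVELS[output_quality]
```

For example, `effective_compression("video", "low", 80)` yields 80 because the explicit override wins, while `effective_compression("image")` falls back to the image default of 75.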
