Deployment
Install SGLang
Install SGLang
Unlimited-OCR support is in SGLang PR #29186. Until that PR is included in a tagged SGLang release, install from a build that contains the PR.Then run the Python output of the command panel below in that environment.
- Python (pip / uv)
- Docker
Command
--page-size 1, which is required by the current prefill-aware sliding-window attention path. It also disables radix cache by default, which is the better fit for batch OCR workloads where each request usually contains a different image.
Playground
Use the Playground to adjust tensor parallelism on top of the selected deployment cell.1. Model Introduction
Unlimited-OCR is Baidu’s multimodal OCR model for document parsing. It uses a sliding-window language backbone, but SGLang serves it with a prefill-aware sliding-window path so image and prompt tokens remain visible during long decode. The SGLang integration loads the standalone Unlimited-OCR architecture with SAM and CLIP vision encoders plus a DeepSeek-style language backbone. It supports OpenAI-compatible image requests and model-specific image processing options throughimages_config.
Resources: Hugging Face · SGLang PR #29186
2. Configuration Tips
- Attention backend: use
--attention-backend fa3 --page-size 1. The prefill-aware SWA page table is built with token-level locations, so page size 1 is required. - Radix cache: keep
--disable-radix-cachefor batch OCR over different documents. If your workload repeatedly asks about the same image and prompt, remove this flag to allow prefix reuse throughPureSWARadixCache. - Long OCR generations: keep the default prefill-aware SWA path enabled. It retains prompt and image KV while still applying a sliding window to generated text.
- Custom logit processor: keep
--enable-custom-logit-processorin the launch command. - Image modes: pass
images_config.image_modeper request. Supported modes aretiny,small,base,large, andgundam. Multiple images are supported only fortiny,small, andbase. - Default image mode: when
images_config.image_modeis omitted, SGLang usesgundam.
3. Advanced Usage
3.1 OCR request
OCR Example (Python)
OCR Example (Python)
Example
3.2 Choosing an image mode
Use lower modes to reduce prefill cost for simple images, and usegundam for high-detail document parsing.
| Mode | Use | Multiple images |
|---|---|---|
tiny | Lowest prefill cost. | Yes |
small | Lightweight OCR requests. | Yes |
base | Balanced quality and cost. | Yes |
large | Higher resolution single-image OCR. | No |
gundam | Default high-detail document parsing mode. | No |
