Unlimited-OCR - SGLang Documentation

Deployment

Install SGLang

Unlimited-OCR support is in SGLang PR #29186. Until that PR is included in a tagged SGLang release, install from a build that contains the PR.

Python (pip / uv)

pip install -U uv
uv venv --python 3.12 && source .venv/bin/activate

git clone https://github.com/sgl-project/sglang.git
cd sglang
git fetch origin pull/29186/head && git checkout FETCH_HEAD
uv pip install -e python

Then run the launch command for your hardware in that environment. See Configuration Tips for the flags it must include.

Docker

docker pull lmsysorg/sglang:dev

For how to run the image, see Install → Method 3: Using Docker. Replace the inner sglang serve ... command with the launch command for your hardware.

The launch command depends on your hardware. The recipe uses FlashAttention-3 (--attention-backend fa3) with --page-size 1, which the current prefill-aware sliding-window attention (SWA) path requires. It also disables the radix cache by default. This suits batch OCR workloads: each request usually contains a different image, so there is little shared prefix to reuse.
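The flags named in this recipe can be collected into a single argument list. The sketch below is a minimal illustration and does not reproduce the page's command generator. build_launch_command is a hypothetical helper. It assumes that the sglang serve entry point accepts the standard SGLang server flags --model-path, --tp-size, and --port. It also leaves out any hardware-specific settings the generator might add.

```python
import shlex
from typing import List


# Hypothetical helper: assembles the flags described in this recipe into an argv list.
# Assumes `sglang serve` accepts the standard SGLang server flags used below;
# check them against the build you installed.
def build_launch_command(
    tp_size: int = 1, reuse_prefixes: bool = False, port: int = 30000
) -> List[str]:
    cmd = [
        "sglang", "serve",
        "--model-path", "baidu/Unlimited-OCR",
        "--attention-backend", "fa3",   # FlashAttention-3
        "--page-size", "1",             # required by the prefill-aware SWA path
        "--enable-custom-logit-processor",
        "--tp-size", str(tp_size),      # tensor parallelism
        "--port", str(port),
    ]
    if not reuse_prefixes:
        # Batch OCR over different images: there are few shared prefixes to reuse.
        cmd.append("--disable-radix-cache")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(build_launch_command()))
```

Set reuse_prefixes=True to drop --disable-radix-cache. This corresponds to the Configuration Tips case where the same image and prompt are queried repeatedly.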


1. Model Introduction

Unlimited-OCR is Baidu's multimodal OCR model for document parsing. Its language backbone uses sliding-window attention. On its own, a sliding window would let the image and prompt tokens fall out of the attention window during a long decode. SGLang therefore serves the model with a prefill-aware sliding-window path that keeps those tokens visible. The SGLang integration loads the standalone Unlimited-OCR architecture: SAM and CLIP vision encoders plus a DeepSeek-style language backbone. It accepts OpenAI-compatible image requests and model-specific image processing options through images_config.

Resources: Hugging Face · SGLang PR #29186
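The difference between plain and prefill-aware sliding-window attention comes down to which key positions a decode token can attend to. The toy model below illustrates this and is not SGLang's implementation. visible_positions is a hypothetical function. It assumes that the window counts the query token itself (conventions differ between implementations) and that every prefill position (image plus prompt) is pinned as visible.

```python
from typing import List


# Toy model of attention visibility; not SGLang's implementation.
def visible_positions(
    query_pos: int, prefill_len: int, window: int, prefill_aware: bool
) -> List[int]:
    """Return the key positions a token at query_pos may attend to (causal)."""
    # Plain sliding window: the last `window` positions, including the query itself.
    start = max(0, query_pos - window + 1)
    visible = set(range(start, query_pos + 1))
    if prefill_aware:
        # Prefill-aware variant: all prefill (image + prompt) positions stay visible.
        visible |= set(range(min(prefill_len, query_pos + 1)))
    return sorted(visible)


if __name__ == "__main__":
    # 4 prefill tokens, window of 3, decoding the token at position 9.
    print(visible_positions(9, prefill_len=4, window=3, prefill_aware=False))
    print(visible_positions(9, prefill_len=4, window=3, prefill_aware=True))
```

With the plain window, the token at position 9 sees only positions 7 to 9, so the document image is no longer visible. The prefill-aware variant also keeps positions 0 to 3.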

2. Configuration Tips

Attention backend: use --attention-backend fa3 --page-size 1. The prefill-aware SWA page table is built with token-level locations, so page size 1 is required.
Radix cache: keep --disable-radix-cache for batch OCR over different documents. If your workload repeatedly asks about the same image and prompt, remove this flag to allow prefix reuse through PureSWARadixCache.
Long OCR generations: keep the default prefill-aware SWA path enabled. It retains prompt and image KV while still applying a sliding window to generated text.
Custom logit processor: keep --enable-custom-logit-processor in the launch command.
Image modes: pass images_config.image_mode per request. Supported modes are tiny, small, base, large, and gundam. Multiple images are supported only for tiny, small, and base.
Default image mode: when images_config.image_mode is omitted, SGLang uses gundam.

3. Advanced Usage

3.1 OCR request

OCR Example (Python)

from openai import OpenAI

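# Point the OpenAI client at SGLang's OpenAI-compatible server (default port 30000).
# The API key is a placeholder unless the server was launched with an API key.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")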

response = client.chat.completions.create(
    model="baidu/Unlimited-OCR",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "document parsing."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/your_document.png"
                    },
                },
            ],
        }
    ],
    max_tokens=2048,
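    temperature=0,  # greedy decoding for reproducible OCR output
    # SGLang-specific field, sent through extra_body: selects the image processing mode.
    extra_body={"images_config": {"image_mode": "gundam"}},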
)

print(response.choices[0].message.content)
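The example above passes the document as an http(s) URL. For a local file, one option is to inline the image as a base64 data URL in the same image_url field. The sketch below assumes the server accepts OpenAI-style data URLs. image_to_data_url and build_ocr_messages are hypothetical helpers that only build the request payload.

```python
import base64
from typing import Dict, List


def image_to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL for the image_url field."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_ocr_messages(
    image_bytes: bytes, prompt: str = "document parsing.", mime: str = "image/png"
) -> List[Dict]:
    """Build a single-image chat message in the same shape as the example above."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": image_to_data_url(image_bytes, mime)},
                },
            ],
        }
    ]
```

Usage: read the file with pathlib.Path("scan.png").read_bytes(). Pass the result of build_ocr_messages as messages= to client.chat.completions.create, keeping the other arguments from the example.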

3.2 Choosing an image mode

Use lower-resolution modes to reduce prefill cost for simple images, and use gundam for high-detail document parsing.

Mode	Intended use	Multiple images
`tiny`	Lowest prefill cost.	Yes
`small`	Lightweight OCR requests.	Yes
`base`	Balanced quality and cost.	Yes
`large`	Higher-resolution single-image OCR.	No
`gundam`	Default high-detail document parsing mode.	No
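The rules in this table can be enforced on the client before a request is sent. This is a minimal sketch, and make_images_config is a hypothetical helper, not an SGLang API. It encodes three rules from this page: the five supported modes, multiple images only for tiny, small, and base, and gundam when no mode is given. The returned dict is meant to be passed as extra_body.

```python
from typing import Dict, Optional

# Rules taken from this page; the helper itself is hypothetical, not an SGLang API.
SUPPORTED_MODES = {"tiny", "small", "base", "large", "gundam"}
MULTI_IMAGE_MODES = {"tiny", "small", "base"}
DEFAULT_MODE = "gundam"  # what SGLang uses when image_mode is omitted


def make_images_config(num_images: int, image_mode: Optional[str] = None) -> Dict:
    """Validate a mode for the given image count and return an extra_body dict."""
    if num_images < 1:
        raise ValueError("a request needs at least one image")
    mode = image_mode or DEFAULT_MODE
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unsupported image_mode {mode!r}")
    if num_images > 1 and mode not in MULTI_IMAGE_MODES:
        raise ValueError(
            f"image_mode {mode!r} accepts a single image; use tiny, small, or base"
        )
    return {"images_config": {"image_mode": mode}}
```

Filling in gundam explicitly when no mode is given matches the server default. It also keeps the multi-image check correct, because an omitted mode combined with several images would otherwise be rejected by the server.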
