
Deployment

Unlimited-OCR support is in SGLang PR #29186. Until that PR is included in a tagged SGLang release, install from a build that contains the PR.
Command
pip install -U uv
uv venv --python 3.12 && source .venv/bin/activate

git clone https://github.com/sgl-project/sglang.git
cd sglang
git fetch origin pull/29186/head && git checkout FETCH_HEAD
uv pip install -e python
Then, in that environment, run the Python launch command generated by the command panel below.
Pick your hardware to generate the launch command. The recipe uses FlashAttention-3 with --page-size 1, which is required by the current prefill-aware sliding-window attention path. It also disables radix cache by default, which is the better fit for batch OCR workloads where each request usually contains a different image.
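As a rough illustration of how these flags fit together, the sketch below assembles a launch command as an argument list. It is not the command panel's actual output. The model path baidu/Unlimited-OCR and port 30000 are borrowed from the request example in section 3.1 and are assumptions for your deployment, and build_launch_command is a hypothetical helper name.

```python
import shlex


def build_launch_command(
    model_path="baidu/Unlimited-OCR",  # assumption: same name as in the request example
    tp_size=1,                         # tensor parallelism degree
    port=30000,                        # assumption: port used by the request example
    disable_radix_cache=True,          # recipe default for batch OCR workloads
):
    """Return an argv list for launching SGLang with the flags this recipe describes."""
    cmd = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--attention-backend", "fa3",   # FlashAttention-3
        "--page-size", "1",             # required by the prefill-aware SWA path
        "--enable-custom-logit-processor",
        "--tp-size", str(tp_size),
        "--port", str(port),
    ]
    if disable_radix_cache:
        cmd.append("--disable-radix-cache")
    return cmd


# Print a shell-ready command line for a two-GPU deployment.
print(shlex.join(build_launch_command(tp_size=2)))
```

Passing disable_radix_cache=False drops the --disable-radix-cache flag, which suits workloads that repeatedly send the same image and prompt.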

Playground

Use the Playground to adjust tensor parallelism on top of the selected deployment cell.

1. Model Introduction

Unlimited-OCR is Baidu’s multimodal OCR model for document parsing. It uses a sliding-window language backbone, but SGLang serves it with a prefill-aware sliding-window path so that image and prompt tokens remain visible during long decodes. The SGLang integration loads the standalone Unlimited-OCR architecture with SAM and CLIP vision encoders plus a DeepSeek-style language backbone. It supports OpenAI-compatible image requests and model-specific image processing options through images_config.

Resources: Hugging Face · SGLang PR #29186

2. Configuration Tips

  • Attention backend: use --attention-backend fa3 --page-size 1. The prefill-aware SWA page table is built with token-level locations, so page size 1 is required.
  • Radix cache: keep --disable-radix-cache for batch OCR over different documents. If your workload repeatedly asks about the same image and prompt, remove this flag to allow prefix reuse through PureSWARadixCache.
  • Long OCR generations: keep the default prefill-aware SWA path enabled. It retains prompt and image KV while still applying a sliding window to generated text.
  • Custom logit processor: keep --enable-custom-logit-processor in the launch command.
  • Image modes: pass images_config.image_mode per request. Supported modes are tiny, small, base, large, and gundam. Multiple images are supported only for tiny, small, and base.
  • Default image mode: when images_config.image_mode is omitted, SGLang uses gundam.
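The image-mode rules in the list above can be checked on the client before a request is sent. The following is a minimal sketch, assuming you want to fail early rather than rely on the server's own handling. build_images_config is a hypothetical helper, not part of SGLang or the OpenAI client.

```python
MULTI_IMAGE_MODES = {"tiny", "small", "base"}   # modes that accept several images
SINGLE_IMAGE_MODES = {"large", "gundam"}        # modes limited to one image
DEFAULT_MODE = "gundam"                         # mode SGLang uses when image_mode is omitted


def build_images_config(num_images, image_mode=None):
    """Return an extra_body dict for the request, validating mode against image count."""
    mode = DEFAULT_MODE if image_mode is None else image_mode
    if mode not in MULTI_IMAGE_MODES | SINGLE_IMAGE_MODES:
        raise ValueError(f"unsupported image_mode: {mode!r}")
    if num_images > 1 and mode not in MULTI_IMAGE_MODES:
        raise ValueError(
            f"image_mode {mode!r} supports a single image; "
            f"use one of {sorted(MULTI_IMAGE_MODES)} for {num_images} images"
        )
    return {"images_config": {"image_mode": mode}}
```

The returned dictionary can be passed as extra_body to client.chat.completions.create.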

3. Advanced Usage

3.1 OCR request

Example
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="baidu/Unlimited-OCR",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "document parsing."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/your_document.png"
                    },
                },
            ],
        }
    ],
    max_tokens=2048,
    temperature=0,
    extra_body={"images_config": {"image_mode": "gundam"}},
)

print(response.choices[0].message.content)
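
For documents stored on local disk, one option is to embed the image as a base64 data URL instead of a public link. This is a sketch under the assumption that the server accepts data URLs in image_url, as OpenAI-compatible endpoints commonly do. Both helper names are hypothetical.

```python
import base64
import mimetypes


def image_bytes_to_data_url(data, filename):
    """Encode raw image bytes as a data URL, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        mime = "application/octet-stream"  # fallback when the extension is unknown
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_ocr_messages(image_urls, prompt="document parsing."):
    """Build a single user message with the prompt followed by one entry per image."""
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]
```

To use it, read the file with open(path, "rb").read(), convert it with image_bytes_to_data_url, and pass the result of build_ocr_messages as the messages argument of the request.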

3.2 Choosing an image mode

Use the lighter modes (tiny, small, or base) to reduce prefill cost for simple images, and use gundam for high-detail document parsing.
| Mode | Use | Multiple images |
| --- | --- | --- |
| tiny | Lowest prefill cost. | Yes |
| small | Lightweight OCR requests. | Yes |
| base | Balanced quality and cost. | Yes |
| large | Higher resolution single-image OCR. | No |
| gundam | Default high-detail document parsing mode. | No |
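
One possible way to turn the table into a selection rule is sketched below. The policy itself (base for multi-image requests, small for simple single images) is an illustrative assumption, not a recommendation from the model authors. pick_image_mode is a hypothetical helper.

```python
def pick_image_mode(num_images, high_detail=True):
    """Choose an image mode from the image count and the detail level needed."""
    if num_images < 1:
        raise ValueError("at least one image is required")
    if num_images > 1:
        # Only tiny, small, and base accept multiple images; base is the
        # highest-quality option among them according to the table.
        return "base"
    # Single image: gundam for dense documents, a lighter mode otherwise.
    return "gundam" if high_detail else "small"
```

Tune the thresholds against your own documents; a single simple receipt may be fine in tiny, while dense multi-column pages likely need gundam.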