1. Model Introduction
DiffusionGemma is a uniform-state (renoising) block-diffusion language model from Google. An encoder builds causal context, and a decoder denoises a fixed-length bidirectional canvas ofcanvas_length tokens. The Gemma4Renoise sampler runs max_denoising_steps reverse steps over the canvas, feeding the previous step’s logits back as self-conditioning and emitting the greedy argmax of the processed logits.
Key Features:
- Uniform-State Renoising: The canvas starts from random tokens and is refined each step by accepting confident positions and re-noising the rest, with no mask token.
- Encoder / Decoder Canvas: The encoder produces causal context KV, the decoder attends bidirectionally over the canvas.
- Self-Conditioning: Each step conditions on the previous step’s logits.
- EntropyBound Acceptance: Each step accepts the lowest-entropy canvas positions within an entropy budget and re-noises the rest.
- StableAndConfident Stopping: A canvas stops early once it is stable and confident.
- MoE Architecture: The 26B-A4B model uses a Mixture-of-Experts architecture for efficient inference.
- Multimodal Input: Accepts text and image inputs (via a ~550M vision encoder) and generates text output.
| Model | Architecture | Parameters |
|---|---|---|
| google/diffusiongemma-26B-A4B-it | MoE, uniform-state diffusion (text + image) | 25.2B total / 3.8B active |
| Spec | Value |
|---|---|
| Total Parameters | 25.2B |
| Active Parameters | 3.8B |
| Layers | 30 |
| Sliding Window | 1024 tokens |
| Context Length | Up to 256K tokens |
| Canvas Length | 256 |
| Vocabulary Size | 262K |
| Experts | 8 active / 128 total + 1 shared |
| Supported Modalities | Text, Image |
| Vision Encoder | ~550M parameters |
2. SGLang Installation
Please refer to the official SGLang installation guide for installation instructions. The checkpoint ships its own modeling code, so--trust-remote-code is required when serving.
3. Model Deployment
3.1 Basic Configuration
The required runtime settings are applied automatically forGemma4Renoise (the Triton attention backend, eager mode, and unchunked prefill, needed because the full-attention head_dim is 512 and the canvas uses bidirectional attention), so a default launch works:
Command
3.2 Configuration Tips
dLLM-Specific Parameters:| Parameter | Description | Recommended Value |
|---|---|---|
--dllm-algorithm | Diffusion decoding algorithm | Gemma4Renoise |
--trust-remote-code | Required to load the checkpoint’s modeling code | Always enabled |
--dllm-algorithm-config | Optional YAML overriding the renoise schedule | Checkpoint defaults |
Gemma4Renoise, so they do not need to be passed on the command line.
Sampling is governed by the renoise schedule. Request-level logprobs, penalties, logit_bias, and grammar / structured output (json_schema / regex / ebnf / structural_tag) are not applied and are rejected with a 400. Core sampling controls (temperature, top_k, top_p) are accepted but have no effect. Streaming is block-level: one fully-denoised canvas per chunk.
Gemma4Renoise Config (defaults follow the checkpoint’s generation_config.json):
Config
4. Model Invocation
4.1 Deployment
Start the server with the command from Section 3.1.4.2 Basic Usage
Example
4.3 Streaming
Streaming emits one fully-denoised canvas per chunk.Example
5. Benchmark
5.1 Speed Benchmark
Not benchmarked for speed.5.2 Accuracy Benchmark
Full test splits, every item scored (no failed-request exclusions). Text MCQ benchmarks use greedy generate-and-parse, MATH uses boxed-answer extraction plus sympy equivalence. MMLU, ARC-Challenge, and MATH-500 are the mean of two independent server launches.| Benchmark | Score |
|---|---|
| GSM8K | 95.4% |
| ARC-Challenge | 91.6% |
| HumanEval | 92.7% pass@1 |
| MMLU | 76.2% |
| MMLU-Pro | 73.7% |
| GSM-Symbolic | 92.2% |
| MATH-500 | 72.1% |
| AIME-2026 | 10.0% |
| HMMT-Feb-2025 | 10.0% |
| GPQA-main | 59.2% |
| Multimodal benchmark | Score |
|---|---|
| MMMU (val, MC) | 64.9% |
| MMMU-Pro (standard 10-opt, MC) | 57.3% |
| MathVista (testmini) | 68.4% |
| DocVQA (val) | 85.9% |
| ChartQA (test) | 61.7% |
| AI2D (test) | 78.7% |
| MMStar (val) | 65.9% |
