# Reasoning Parser

SGLang can separate reasoning content from "normal" content for reasoning models such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).

## Supported Models & Parsers

| Model | Reasoning tags | Parser | Notes |
| --- | --- | --- | --- |
| [Apertus 2509 models](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509) | `<\|inner_prefix\|>` … `<\|inner_suffix\|>` | `apertus2509` | Supports `enable_thinking` parameter |
| [DeepSeek‑R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d) | `<think>` … `</think>` | `deepseek-r1` | Supports all variants (R1, R1-0528, R1-Distill) |
| [DeepSeek‑V3 series](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) | `<think>` … `</think>` | `deepseek-v3` | Including [DeepSeek‑V3.2](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp). Supports `thinking` parameter |
| [Standard Qwen3 models](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `<think>` … `</think>` | `qwen3` | Supports `enable_thinking` parameter |
| [Qwen3-Thinking models](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507) | `<think>` … `</think>` | `qwen3` or `qwen3-thinking` | Always generates thinking content |
| [Kimi K2 Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking) | `◁think▷` … `◁/think▷` | `kimi_k2` | Uses special thinking delimiters. Also requires `--tool-call-parser kimi_k2` for tool use. |
| [GPT OSS](https://huggingface.co/openai/gpt-oss-120b) | `<\|channel\|>analysis<\|message\|>` … `<\|end\|>` | `gpt-oss` | N/A |
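
As a small illustration of how the table translates into a server launch, the sketch below maps a model family to its parser name and assembles a command line. The parser values are copied from the table; the dictionary keys and the helper name `build_launch_command` are hypothetical labels chosen for this example and are not part of SGLang.

```python
import shlex

# Parser names copied from the table above.
# The keys are informal labels invented for this sketch.
PARSER_BY_FAMILY = {
    "apertus-2509": "apertus2509",
    "deepseek-r1": "deepseek-r1",
    "deepseek-v3": "deepseek-v3",
    "qwen3": "qwen3",
    "qwen3-thinking": "qwen3-thinking",
    "kimi-k2-thinking": "kimi_k2",
    "gpt-oss": "gpt-oss",
}


def build_launch_command(model_path: str, family: str) -> str:
    """Return a launch command string with the matching --reasoning-parser value."""
    parser = PARSER_BY_FAMILY[family]  # raises KeyError for an unknown family
    args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--reasoning-parser", parser,
    ]
    return shlex.join(args)


print(build_launch_command("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "deepseek-r1"))
```

Keeping the mapping in one place makes it harder to pair a model with the wrong parser when several deployments share a launch script.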

### Model-Specific Behaviors

**Apertus 2509:**

* Uses `<|inner_prefix|>` and `<|inner_suffix|>` to delimit reasoning content. For agentic tool use, also specify `--tool-call-parser apertus2509`.

**DeepSeek-R1 Family:**

* DeepSeek-R1: No `<think>` start tag, jumps directly to thinking content
* DeepSeek-R1-0528: Generates both `<think>` start and `</think>` end tags
* Both are handled by the same `deepseek-r1` parser

**DeepSeek-V3 Family:**

* DeepSeek-V3.1/V3.2: Hybrid models supporting both thinking and non-thinking modes. Use the `deepseek-v3` parser and the `thinking` parameter (NOTE: not `enable_thinking`)

**Qwen3 Family:**

* Standard Qwen3 (e.g., Qwen3-2507): Use the `qwen3` parser; supports `enable_thinking` in chat templates
* Qwen3-Thinking (e.g., Qwen3-235B-A22B-Thinking-2507): Use the `qwen3` or `qwen3-thinking` parser; always thinks

**Kimi K2:**

* Kimi K2 Thinking: Uses special `◁think▷` and `◁/think▷` tags. For agentic tool use, also specify `--tool-call-parser kimi_k2`.

**GPT OSS:**

* GPT OSS: Uses special `<|channel|>analysis<|message|>` and `<|end|>` tags

## Usage

### Launching the Server

Specify the `--reasoning-parser` option.

```python
import requests
from openai import OpenAI
from sglang.test.doc_patch import launch_server_cmd
from sglang.utils import wait_for_server, print_highlight, terminate_process

server_process, port = launch_server_cmd(
    "python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --host 0.0.0.0 --reasoning-parser deepseek-r1 --log-level warning"
)

wait_for_server(f"http://localhost:{port}")
```

Note that `--reasoning-parser` defines the parser used to interpret responses.

### OpenAI Compatible API

With the OpenAI compatible API, the contract follows the [DeepSeek API design](https://api-docs.deepseek.com/guides/reasoning_model) established with the release of DeepSeek-R1:

* `reasoning_content`: The content of the chain of thought (CoT).
* `content`: The content of the final answer.

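
To make the two fields concrete, here is a minimal, self-contained sketch of how raw model text with `<think>` … `</think>` tags could be split into them. It is not SGLang's parser: the names `ParsedOutput` and `split_reasoning` are hypothetical, and treating text without an end tag as pure reasoning is an assumption made for this example.

```python
from typing import NamedTuple


class ParsedOutput(NamedTuple):
    reasoning_content: str  # chain-of-thought text
    content: str            # final answer text


def split_reasoning(raw: str, start: str = "<think>", end: str = "</think>") -> ParsedOutput:
    """Split raw model text into reasoning and answer parts (illustrative only)."""
    text = raw
    # The start tag is optional: DeepSeek-R1 may jump directly into thinking content.
    stripped = text.lstrip()
    if stripped.startswith(start):
        text = stripped[len(start):]
    # Everything before the end tag is reasoning; everything after is the answer.
    reasoning, sep, answer = text.partition(end)
    if not sep:
        # Assumption for this sketch: no end tag means the output is all reasoning.
        return ParsedOutput(text.strip(), "")
    return ParsedOutput(reasoning.strip(), answer.strip())


with_start = split_reasoning("<think>1 + 3 is 4.</think>The answer is 4.")
without_start = split_reasoning("1 + 3 is 4.</think>The answer is 4.")
print(with_start.reasoning_content)  # 1 + 3 is 4.
print(with_start.content)            # The answer is 4.
print(with_start == without_start)   # True
```

The last line shows why one parser can serve both DeepSeek-R1 variants: with or without the start tag, the split yields the same pair of fields.
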
```python
# Initialize OpenAI-like client
client = OpenAI(api_key="None", base_url=f"http://0.0.0.0:{port}/v1")
model_name = client.models.list().data[0].id

messages = [
    {
        "role": "user",
        "content": "What is 1+3?",
    }
]
```

#### Non-Streaming Request

```python
response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=False,  # Non-streaming
    extra_body={"separate_reasoning": True},
)
print_highlight("==== Reasoning ====")
print_highlight(response_non_stream.choices[0].message.reasoning_content)

print_highlight("==== Text ====")
print_highlight(response_non_stream.choices[0].message.content)
```

#### Streaming Request

```python
response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=True,  # Streaming
    extra_body={"separate_reasoning": True},
)

reasoning_content = ""
content = ""
# Accumulate the answer and the reasoning deltas separately.
for chunk in response_stream:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content

print_highlight("==== Reasoning ====")
print_highlight(reasoning_content)

print_highlight("==== Text ====")
print_highlight(content)
```

Optionally, you can set `stream_reasoning` to `False` to buffer the reasoning content and deliver it all at once in the last reasoning chunk (or in the first chunk after the reasoning content).

```python
response_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=True,  # Streaming
    extra_body={"separate_reasoning": True, "stream_reasoning": False},
)

reasoning_content = ""
content = ""
for chunk in response_stream:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    if chunk.choices[0].delta.reasoning_content:
        reasoning_content += chunk.choices[0].delta.reasoning_content

print_highlight("==== Reasoning ====")
print_highlight(reasoning_content)

print_highlight("==== Text ====")
print_highlight(content)
```

Reasoning separation is enabled by default when `--reasoning-parser` is specified. **To disable it, set the `separate_reasoning` option to `False` in the request.**

```python
response_non_stream = client.chat.completions.create(
    model=model_name,
    messages=messages,
    temperature=0.6,
    top_p=0.95,
    stream=False,  # Non-streaming
    extra_body={"separate_reasoning": False},
)

print_highlight("==== Original Output ====")
print_highlight(response_non_stream.choices[0].message.content)
```

### SGLang Native API

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# Render the chat messages into a single prompt string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, return_dict=False
)

gen_url = f"http://localhost:{port}/generate"
gen_data = {
    "text": prompt,
    "sampling_params": {
        "skip_special_tokens": False,
        "max_new_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.95,
    },
}
gen_response = requests.post(gen_url, json=gen_data).json()["text"]

print_highlight("==== Original Output ====")
print_highlight(gen_response)

# Send the raw text to the server to split reasoning from the answer.
parse_url = f"http://localhost:{port}/separate_reasoning"
separate_reasoning_data = {
    "text": gen_response,
    "reasoning_parser": "deepseek-r1",
}
separate_reasoning_response_json = requests.post(
    parse_url, json=separate_reasoning_data
).json()
print_highlight("==== Reasoning ====")
print_highlight(separate_reasoning_response_json["reasoning_text"])
print_highlight("==== Text ====")
print_highlight(separate_reasoning_response_json["text"])
```

```python
terminate_process(server_process)
```

### Offline Engine API

```python
import sglang as sgl
from sglang.srt.parser.reasoning_parser import ReasoningParser
from sglang.utils import print_highlight
from transformers import AutoTokenizer

llm = sgl.Engine(model_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
# `messages` is the chat message list defined in the OpenAI Compatible API section.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, return_dict=False
)
sampling_params = {
    "max_new_tokens": 1024,
    "skip_special_tokens": False,
    "temperature": 0.6,
    "top_p": 0.95,
}
result = llm.generate(prompt=prompt, sampling_params=sampling_params)

generated_text = result["text"]  # Assume there is only one prompt

print_highlight("==== Original Output ====")
print_highlight(generated_text)

parser = ReasoningParser("deepseek-r1")
reasoning_text, text = parser.parse_non_stream(generated_text)
print_highlight("==== Reasoning ====")
print_highlight(reasoning_text)
print_highlight("==== Text ====")
print_highlight(text)
```

```python
llm.shutdown()
```

## Supporting New Reasoning Model Schemas

To support a future reasoning model, implement its parser as a subclass of `BaseReasoningFormatDetector` in `python/sglang/srt/parser/reasoning_parser.py` and specify that parser for the new reasoning model schema accordingly.
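
A new schema mainly needs a start tag, an end tag, and a rule for tags that arrive split across streamed chunks. The sketch below shows that logic as a standalone class so it can run without SGLang. It does not use or reproduce the `BaseReasoningFormatDetector` interface: the class `TagStreamSplitter`, its methods, and the `<reason>` tags are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    reasoning: str = ""
    content: str = ""


class TagStreamSplitter:
    """Hypothetical incremental splitter for a start/end tag schema."""

    def __init__(self, start: str, end: str, starts_in_reasoning: bool = False):
        self.start = start
        self.end = end
        self.in_reasoning = starts_in_reasoning  # True if the model omits the start tag
        self.buffer = ""

    def _emit(self, out: Delta, text: str) -> None:
        if self.in_reasoning:
            out.reasoning += text
        else:
            out.content += text

    @staticmethod
    def _partial_suffix_len(text: str, tag: str) -> int:
        # Length of the longest proper prefix of `tag` that ends `text`.
        for k in range(min(len(tag) - 1, len(text)), 0, -1):
            if text.endswith(tag[:k]):
                return k
        return 0

    def feed(self, chunk: str) -> Delta:
        """Consume one streamed chunk and return the newly classified text."""
        self.buffer += chunk
        out = Delta()
        while self.buffer:
            tag = self.end if self.in_reasoning else self.start
            idx = self.buffer.find(tag)
            if idx != -1:
                # Full tag found: emit the text before it, then switch mode.
                self._emit(out, self.buffer[:idx])
                self.buffer = self.buffer[idx + len(tag):]
                self.in_reasoning = not self.in_reasoning
                continue
            # Hold back a suffix that may be the beginning of a split tag.
            keep = self._partial_suffix_len(self.buffer, tag)
            cut = len(self.buffer) - keep
            self._emit(out, self.buffer[:cut])
            self.buffer = self.buffer[cut:]
            break
        return out

    def flush(self) -> Delta:
        """Release any held-back text once the stream has ended."""
        out = Delta()
        self._emit(out, self.buffer)
        self.buffer = ""
        return out


splitter = TagStreamSplitter("<reason>", "</reason>")
reasoning, content = "", ""
for piece in ["<rea", "son>step one", " step two</rea", "son>answer", " here"]:
    delta = splitter.feed(piece)
    reasoning += delta.reasoning
    content += delta.content
print(reasoning)  # step one step two
print(content)    # answer here
```

Holding back a possible partial tag is the design choice that matters here: without it, a tag split across two chunks would leak into the reasoning or answer text.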