Open
Description
Feature Description
As reasoning models are becoming mainstream, we start to see some pattern:
- Most models use
<think>
,<reasoning>
, etc, basically a set of known tokens now - The "reasoning budget" can technically be supported by any models, not just Qwen, by keeping track of number of tokens between
<think>
and</think>
- "no think" is just a reasoning budget == 0
So I'm thinking about accepting an object like this for each request:
"reasoning": {
"budget": -1, // number of reasoning tokens budget
default: -1 (inf) ; 0 for no think
"format": "", // equivalent of --reasoning-format
if set to "deepseek", reasoning will be returned in "message.reasoning_content"
if set to "hide", it will be completely hidden
default: "none", return the reasoning with the message as normal
}
The reasoning format "hide" can be implemented via #13214 ; the "deepseek" format current only supported for non-stream, but I think we can modify a bit to support this.
For the budget, we don't yet have the logic to handle it.