Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196
Conversation
Force-pushed from 76549e1 to 379c7a8
Can you add chat_template_kwargs as a CLI argument as well?
I added it. I tested it using the updated command (you might want to check the escaping of the double quotes):
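(The original command isn't reproduced here; the following is a plausible sketch, assuming the flag takes a JSON object as a single string — exact quoting depends on your shell:)

```sh
llama-server --jinja \
  --chat-template-kwargs "{\"enable_thinking\": false}"
```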
Force-pushed from d1861c4 to 01b58b5
Very useful for the Qwen3 series. +1 for this feature!
Force-pushed from 2d1d595 to 28dd37e
It doesn't work with the --chat_template_kwargs option from the CLI: error: invalid argument: --chat-template-kwargs
@ggerganov is there any reason why this PR has not been accepted and merged yet?
This PR is implemented only for llama-server and its webui. llama-cli has unresolved bugs that prevent me from enabling this feature there.
Hope you'll integrate it into the CLI environment soon, thanks!
Force-pushed from d44d099 to 5b3de5d
It would be nice to have an enable_thinking checkbox or something like that in the llama.cpp webui too.
@celsowm Lack of eyes on this area would be my guess. With 438 open PRs (many obsolete), I've kind of come to accept that I'll need to pull in some PRs of interest to me when building.
vLLM and SGLang had this feature the day Qwen3 was released. Meanwhile, many useful enhancement and fix PRs become obsolete simply because of merge delays here in the llama.cpp community. Really sad about that.
This is so necessary when dealing with Qwen3! Can't wait to see this merged and be able to use the latest version with this <3
Force-pushed from dbbe6e4 to 9192412
Force-pushed from 9192412 to 5ad5a6f
@neolee I think disabling thoughts should be handled more manually, the …
@matteoserva Thanks! I think each and every code path should pass the extra context variables if we want to support generic use cases (although, as I just noted in a review comment, we should be wary of requiring users to set custom key-values for features that would be better handled in a generic & unified way across templates & models; ideally most users won't need to inspect any templates, and the ones who do can trivially create their own template overrides anyway).
Just to make it explicit, the added value of this PR is that:
Force-pushed from 5ad5a6f to e45a43a
Thanks for this PR @matteoserva, I'm cherry-picking it for my own use :)
Looks good to me after a few nits are addressed.
@ngxson would you mind giving it a final look and approving it?
@@ -73,6 +74,7 @@ struct common_chat_templates_inputs {
     bool parallel_tool_calls = false;
     bool extract_reasoning = true;
     std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
+    std::map<std::string, std::string> chat_template_kwargs;
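For illustration only, a minimal sketch of how such a map could be folded into the template's render context. The helper name `apply_chat_template_kwargs`, the `json` context object, and the parse-values-as-JSON behavior are assumptions for this sketch, not the actual llama.cpp code:

```cpp
#include <map>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Hypothetical helper: merge user-supplied kwargs into a jinja-style
// render context. Each value is parsed as JSON so that "false" becomes
// a real boolean and "\"foo\"" stays a string.
static void apply_chat_template_kwargs(
        const std::map<std::string, std::string> & kwargs,
        json & context) {
    for (const auto & [key, value] : kwargs) {
        context[key] = json::parse(value);
    }
}
```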
Took the liberty to push an update in your branch with support for propagating the extra context for all the chat formats with slightly simpler code.
Could you please update the PR description to mention the new alternative way to disable thinking globally (#13771, and discussions on an upcoming per-request mechanism for it: #13272), and to add a link to the vLLM API noting that the new param has the same syntax (I only realized that from your last comment :-)).
coding standard: cosmetic changes Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
… extra_context (+ the odd existing additional_context)
Force-pushed from 528deeb to 54f128a
Force-pushed from fb7bf27 to 54f128a
I applied the requested changes, but the latest rebase has been rough. Please review more carefully to make sure I didn't make any mistakes (sorry).
We do need this PR; maintainers, please review it.
This PR implements support for setting additional jinja parameters.
An example of this is `enable_thinking` in the Qwen3 models' template.

Main features:
- setting the jinja parameters from the command line using `--chat_template_kwargs` or the environment variable
- setting them from the client using the `chat_template_kwargs` parameter of the request

Notice
Per server: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) #13771, the preferred way to disable thinking with a command-line argument is now `--reasoning-budget 0`. The command-line setting can anyway be overridden by passing `chat_template_kwargs` with the request to the OAI-compatible API.
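As an illustration, a per-request override might look like this (a hedged sketch: the endpoint is the server's standard OAI-compatible one, and the model name is a placeholder):

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```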
Other info

The official template is still only partially compatible. I modified it to use only supported features.
It's here: https://pastebin.com/16ZpCLHk https://pastebin.com/GGuTbFRc

And it should be loaded with:

llama-server --jinja --chat-template-file {template_file}
It fixes #13160 and #13189
Test it with:
{"prompt":"\n<|im_start|>user\nGive me a short introduction to large language models.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"}