Added support for the "think" option for Ollama #3386
Conversation
1. Added the `think` field to Ollama's `ChatRequest`
2. Added the `thinking` field to Ollama's `Message`
3. Added the `think` property to `OllamaOptions`, allowing users to specify whether to enable or disable thinking

Signed-off-by: Sun Yuhan <1085481446@qq.com>
…fault behavior, thereby ensuring compatibility with older versions of Ollama calls. Signed-off-by: Sun Yuhan <1085481446@qq.com>
@tzolov @ilayaperumalg @markpollack Could you please help review this PR? Thank you.
…tainer image version of ollama to 0.9.0 Signed-off-by: Sun Yuhan <1085481446@qq.com>
Yes, we will review. Thanks
```java
 * If this value is not specified, it defaults to null, and Ollama will return
 * the thought process within the `content` field of the response, wrapped in `<thinking>` tags.
 */
@JsonProperty("think")
```
It seems like 'think' is not part of the options map in Ollama, but a 'top-level' field in the request object.

In https://ollama.com/blog/thinking there is the example

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    {
      "role": "user",
      "content": "how many r in the word strawberry?"
    }
  ],
  "think": true,
  "stream": false
}'
```

and the golang type supporting this feature also shows the same structure:
https://github.com/ollama/ollama/blob/45f56355d557b7130c7c07bbd6e1b634a758d946/api/types.go#L91

So it shouldn't be added to the options map.
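For comparison, here is a minimal sketch of a request record carrying `think` at the top level, in the style of the Jackson-annotated records `OllamaApi` already uses (abbreviated to a few fields; `Message` stands for the existing nested record):

```java
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.annotation.JsonProperty;

// Abbreviated sketch: "think" sits at the top level of the request,
// alongside (not inside) the options map, mirroring the Ollama API.
public record ChatRequest(
		@JsonProperty("model") String model,
		@JsonProperty("messages") List<Message> messages,
		@JsonProperty("stream") Boolean stream,
		@JsonProperty("think") Boolean think,
		@JsonProperty("options") Map<String, Object> options) {
}
```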
The comment in the code

> and Ollama will return the thought process within the `content` field of the response, wrapped in `<thinking>` tags.

seems to contradict what is documented on the Ollama web site, which shows the 'think' response as a separate field from 'content', not nested inside the 'content' field.
Maybe we want to also expose it in `OllamaOptions`, as that right now would be the only way to pass in this feature flag when making calls via `ChatModel` or `ChatClient`. I think if the feature of enabling thinking mode is implemented from a `ChatModel` or `ChatClient` level, the right solution will present itself. Can you improve this PR to handle this scenario, please?
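For illustration, enabling the flag through `OllamaOptions` could look roughly like this from a caller's perspective; the `think(...)` builder method is hypothetical here, standing in for whatever option this PR ends up exposing:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaOptions;

class ThinkingOptionSketch {

	// Sketch: passing the proposed "think" flag per request through the
	// portable ChatClient API. The think(true) call is hypothetical.
	String ask(OllamaChatModel chatModel) {
		return ChatClient.builder(chatModel).build()
				.prompt()
				.options(OllamaOptions.builder()
						.model("qwen3:4b")
						.think(true) // hypothetical option proposed by this PR
						.build())
				.user("why is the sky blue?")
				.call()
				.content();
	}

}
```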
> Maybe we want to also expose it in `OllamaOptions`, as that right now would be the only way to pass in this feature flag when making calls via `ChatModel` or `ChatClient`. I think if the feature of enabling thinking mode is implemented from a `ChatModel` or `ChatClient` level, the right solution will present itself. Can you improve this PR to handle this scenario, please?
Of course, no problem. I will continue improving this PR and update my progress here in a timely manner.

Before proceeding, I want to confirm whether I've correctly understood your point: are you suggesting that we should not add the `think` flag in `OllamaOptions`, but instead make adjustments at the `OllamaChatModel` level? Or do you mean that we should implement support for `think` at the `ChatModel` or `ChatClient` level? If it's the latter, I think we would also need to adjust the implementations of the different `ChatModel`s to support this option (depending, of course, on whether the underlying model actually supports it).
> seems to contradict what was documented on the ollama web site that shows the 'think' response as a separate field from 'content', and not nested inside the 'content' field.
Firstly, the content of that comment is actually a summary I derived from practical testing.

I think this is a form of backward compatibility Ollama implemented for users still employing the old parameter-passing method. Before Ollama supported the "think" flag, if we made a request to a model that supports thinking, like this:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:4b",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
```
Ollama would enable thinking by default and return the thought process wrapped in `<thinking>` tags within the `content` field of the response. The response would look something like this:
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.385341Z","message":{"role":"assistant","content":"\u003cthink\u003e"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.402756Z","message":{"role":"assistant","content":"\n"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.420851Z","message":{"role":"assistant","content":"Okay"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.439825Z","message":{"role":"assistant","content":","},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.457618Z","message":{"role":"assistant","content":" the"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.474711Z","message":{"role":"assistant","content":" user"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.491833Z","message":{"role":"assistant","content":" is"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:17:08.509124Z","message":{"role":"assistant","content":" asking"},"done":false}
This remains the case in the latest version of Ollama: if the `think` flag is not specified, the behavior of Ollama remains unchanged from before. This is what I intended to express in the comment:

> If this value is not specified, it defaults to null, and Ollama will return the thought process within the `content` field of the response, wrapped in `<thinking>` tags.
Only when we specify the `think` flag will Ollama return the thought process in the `thinking` field of the response:

```shell
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3:4b",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ],
  "think": true
}'
```
Response:

```json
{"model":"qwen3:4b","created_at":"2025-06-12T12:22:48.135211Z","message":{"role":"assistant","content":"","thinking":"Okay"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:22:48.152511Z","message":{"role":"assistant","content":"","thinking":","},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:22:48.169911Z","message":{"role":"assistant","content":"","thinking":" the"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:22:48.187023Z","message":{"role":"assistant","content":"","thinking":" user"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:22:48.204039Z","message":{"role":"assistant","content":"","thinking":" is"},"done":false}
{"model":"qwen3:4b","created_at":"2025-06-12T12:22:48.221233Z","message":{"role":"assistant","content":"","thinking":" asking"},"done":false}
```
I think it's the same for Spring AI. We should maintain compatibility with users who are using older versions, meaning that if the `think` flag is not specified, the returned format should remain unchanged.
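In Java terms, this compatibility argument is also why the flag maps naturally to a nullable `Boolean` that is omitted from the serialized request when unset; a minimal Jackson sketch (the fragment record is made up for illustration):

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;

// Sketch: a tri-state flag. When think is null, Jackson omits the field
// from the request body, preserving the pre-0.9.0 behavior shown above;
// true/false opt in or out explicitly.
@JsonInclude(JsonInclude.Include.NON_NULL)
public record ChatRequestFragment(
		@JsonProperty("think") Boolean think) {
}
```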
```diff
@@ -260,6 +262,7 @@ public Flux<ProgressResponse> pullModel(PullModelRequest pullModelRequest) {
 	public record Message(
 			@JsonProperty("role") Role role,
 			@JsonProperty("content") String content,
+			@JsonProperty("thinking") String thinking,
```
We should propagate the 'thinking' response back into the `ChatGenerationMetadata` so that it is accessible in the response when using `OllamaChatModel`. This would go inside the `internalCall` method of `OllamaChatModel`; please also add a test for it in `OllamaChatModelIT`.
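Something along these lines inside `internalCall` might work; the `"thinking"` metadata key and the exact `ChatGenerationMetadata` builder calls are assumptions for the sketch, not the final shape of this PR:

```java
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.metadata.ChatGenerationMetadata;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.ollama.api.OllamaApi;

class ThinkingMetadataSketch {

	// Sketch: copy the "thinking" text from the Ollama response into the
	// generation metadata so callers can read it from the ChatResponse.
	Generation buildGeneration(OllamaApi.ChatResponse response) {
		var metadata = ChatGenerationMetadata.builder()
				.finishReason(response.doneReason());
		if (response.message().thinking() != null) {
			metadata.metadata("thinking", response.message().thinking());
		}
		return new Generation(new AssistantMessage(response.message().content()),
				metadata.build());
	}

}
```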
No problem, I will add the implementation for this part.
Fixes #3383

As mentioned in issue #3383, Ollama added support for "think" in its latest 0.9.0 version:
https://github.com/ollama/ollama/releases
https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-chat-completion

This PR implements support for that attribute and includes the following key changes:

1. Added the `think` field to Ollama's `ChatRequest`
2. Added the `thinking` field to Ollama's `Message`
3. Added the `think` property to `OllamaOptions`, allowing users to specify whether to enable or disable thinking

Actually, there is currently another issue: as stated in Ollama's API documentation here, during requests to Ollama, the `message` field supports sending the model's own reasoning (thoughts) back to it. However, `AssistantMessage` does not currently support transmitting this field, which means the model will not be aware of its previous thoughts.

Therefore, perhaps we need to add a specialized `Message` implementation for Ollama, such as `OllamaAssistantMessage`; a rough sketch of that idea follows. I'm not sure whether this would be considered a significant change.
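A minimal sketch of what such a type could look like, assuming it extends Spring AI's `AssistantMessage`; the class name, constructor shape, and accessor are illustrative only, not an agreed design:

```java
import java.util.List;
import java.util.Map;

import org.springframework.ai.chat.messages.AssistantMessage;

// Sketch of the proposed OllamaAssistantMessage: an AssistantMessage that
// also carries the model's prior reasoning, so a conversation history can
// send the "thinking" text back to Ollama on subsequent requests.
public class OllamaAssistantMessage extends AssistantMessage {

	private final String thinking;

	public OllamaAssistantMessage(String content, String thinking,
			Map<String, Object> properties, List<ToolCall> toolCalls) {
		super(content, properties, toolCalls);
		this.thinking = thinking;
	}

	public String getThinking() {
		return this.thinking;
	}

}
```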