Support start strings, the opposite of stop tokens. #13214

matteoserva · 2025-04-30T16:23:17Z

This allows setting one or multiple start strings that behave the opposite of stop tokens.
The output is discarded until a start string is reached, then it is sent to client as usual even in streaming mode.
If no start string is found, the entire generated text is sent to client at the end of generation.

Use case:

Support for broken clients that don't behave well with reasoning models (github copilot) while still allowing the model to perform its reasoning. Example: set start string to "</think>

From command line: llama-server --start-string "</think>" --start-string "</thinking>"

From client:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "Qwen/Qwen3-8B",
  "messages": [
    {"role": "user", "content": "Hi"}
  ],
  "start_strings": ["</think>","</thinking>"]
}'

examples/server/server.cpp

ngxson · 2025-04-30T17:03:18Z

Please also add a test case under server/tests/unit/test_chat_completion.py

matteoserva · 2025-04-30T19:08:24Z

Please also add a test case under server/tests/unit/test_chat_completion.py

done

examples/server/server.cpp

ngxson · 2025-05-02T18:33:38Z

examples/server/server.cpp

@@ -2200,7 +2241,7 @@ struct server_context {
                pos = std::min(slot.n_sent_text, slot.generated_text.size());
            } else if (slot.has_next_token) {
                stop_pos = slot.find_stopping_strings(str_test, token_str.size(), false);
-                send_text = stop_pos == std::string::npos;
+                send_text = send_text && stop_pos == std::string::npos;
            }

            // check if there is any token to predict


Upon closer inspection, I think manipulating send_text is not an optimum solution here.

If you look at the streamed response, when start string is not yet found, the server will still send an empty string for each token generated, which is very wasteful. See the else branch of if (send_text) below.

Instead, I think this logic should be placed outside this if (!incomplete) block

ngxson

Sorry for slow review, this feature is very cool but I'm always not very confident about the code for processing the generated token. The logic has always been a bit fragile, so I may take it slow to think about all of the possible cases.

I think I will take over this PR for now, will push some modification directly here.

examples/server/utils.hpp

ngxson · 2025-05-02T19:41:21Z

examples/server/server.cpp

+                    std::string found_string = slot.params.start_strings[search_result.second];
+                    slot.generated_text.erase(
+                        slot.generated_text.begin(),
+                        slot.generated_text.begin() + found_pos + found_string.size());


If I understand correctly, the final thing we need is found_pos + found_string.size(), so probably find_first_substring don't even need to return the found_pos and found_string. Instead, it could return the position right after the found start string. This way, we don't even need to care about which start string is selected.

For example, my start string is </think>:

Ok I'm ready to give the answer</think>The final answer is ^ return this position

matteoserva · 2025-05-03T16:29:53Z

Sorry for slow review, this feature is very cool but I'm always not very confident about the code for processing the generated token. The logic has always been a bit fragile, so I may take it slow to think about all of the possible cases.

I think I will take over this PR for now, will push some modification directly here.

I took some time to refactor the server code and try to handle all the funny edge cases.
Maybe you could take a look at it.

ngxson · 2025-05-03T16:40:33Z

I don't have time for this rn, but tbh the new version looks even more hacky and risky than before. I would expect something very simple

matteoserva requested a review from ngxson as a code owner April 30, 2025 16:23

github-actions bot added examples server labels Apr 30, 2025

ngxson reviewed Apr 30, 2025

View reviewed changes

examples/server/server.cpp Outdated Show resolved Hide resolved

matteoserva force-pushed the start branch from 8af595f to 0524a08 Compare April 30, 2025 16:59

github-actions bot added the python python script changes label Apr 30, 2025

ngxson reviewed May 1, 2025

View reviewed changes

examples/server/server.cpp Outdated Show resolved Hide resolved

examples/server/server.cpp Outdated Show resolved Hide resolved

examples/server/server.cpp Outdated Show resolved Hide resolved

examples/server/server.cpp Outdated Show resolved Hide resolved

matteoserva force-pushed the start branch 2 times, most recently from 0bcef2a to c52ce58 Compare May 2, 2025 07:07

ngxson reviewed May 2, 2025

View reviewed changes

ngxson mentioned this pull request May 2, 2025

Feature Request: add per-request "reasoning" options in llama-server #13272

Open

matteoserva added 15 commits May 3, 2025 15:27

support --start-string

e57d8c5

can set start-string multiple times, doc

5c0c036

added doc for client parameter

513c419

remove whitespaces

a4b5247

use correct coding style

e4f4864

Added tests for start string feature

a7349d1

fixed formatting

b843667

precompute start string len, and keep start string state in slot

792604b

refactor the substring search function

124a92d

fix comments

5f99f8a

cleaning

521868f

fix substring pos calculation

8a12c96

initial refactoring of token processing

89d0c7a

refactoring the find_first_substring function

0c65d40

refactoring the process_token function

3bead57

matteoserva force-pushed the start branch from 8b1af0f to 3bead57 Compare May 3, 2025 16:18

remove empty whitespace

a943218

matteoserva marked this pull request as draft May 3, 2025 16:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support start strings, the opposite of stop tokens. #13214

Support start strings, the opposite of stop tokens. #13214

Uh oh!

matteoserva commented Apr 30, 2025

Uh oh!

Uh oh!

ngxson commented Apr 30, 2025

Uh oh!

matteoserva commented Apr 30, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ngxson May 2, 2025 •

edited

Loading

Uh oh!

ngxson left a comment

Uh oh!

Uh oh!

ngxson May 2, 2025 •

edited

Loading

Uh oh!

matteoserva commented May 3, 2025

Uh oh!

ngxson commented May 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

Support start strings, the opposite of stop tokens. #13214

Are you sure you want to change the base?

Support start strings, the opposite of stop tokens. #13214

Uh oh!

Conversation

matteoserva commented Apr 30, 2025

Uh oh!

Uh oh!

ngxson commented Apr 30, 2025

Uh oh!

matteoserva commented Apr 30, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ngxson May 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ngxson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ngxson May 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

matteoserva commented May 3, 2025

Uh oh!

ngxson commented May 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ngxson May 2, 2025 •

edited

Loading

ngxson May 2, 2025 •

edited

Loading

ngxson commented May 3, 2025 •

edited

Loading