server : fix thread.join() on exit #12831


Merged
merged 1 commit into from
Apr 8, 2025

Conversation

ngxson
Collaborator

@ngxson ngxson commented Apr 8, 2025

Supersede #12813

Resolve #12180

@qnixsynapse can you try if this fixes the issue?

Collaborator

@qnixsynapse qnixsynapse left a comment


LGTM

TBH, I prefer this approach over the other one.
Thanks for fixing it.

@ngxson ngxson merged commit 78a1ba0 into ggml-org:master Apr 8, 2025
48 of 51 checks passed
@tyronecai

tyronecai commented Apr 9, 2025

It's not working @ngxson @qnixsynapse

download the latest release b5083
wget https://github.com/ggml-org/llama.cpp/releases/download/b5083/llama-b5083-bin-ubuntu-arm64.zip

$ in terminal

LD_LIBRARY_PATH=build/bin build/bin/llama-server -m /data/ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --host 192.168.50.16 --port 9990 -ngl 29

$ in chrome <----

tell a long story

$ in terminal

ctrl + c

^Csrv    operator(): operator(): cleaning up before exit...
terminate called without an active exception
Aborted (core dumped)

@qnixsynapse
Collaborator

> it's not working @ngxson @qnixsynapse
>
> download the latest release b5083 wget https://github.com/ggml-org/llama.cpp/releases/download/b5083/llama-b5083-bin-ubuntu-arm64.zip
>
> $ in terminal
>
> LD_LIBRARY_PATH=build/bin build/bin/llama-server -m /data/ollama/models/blobs/sha256-2bada8a7450677000f678be90653b85d364de7db25eb5ea54136ada5f3933730 --host 192.168.50.16 --port 9990 -ngl 29
>
> $ in chrome <----
>
> tell a long story
>
> $ in terminal
>
> ctrl + c
>
> ^Csrv    operator(): operator(): cleaning up before exit...
> terminate called without an active exception
> Aborted (core dumped)

I think best here is to allow the server to throw an exception if a task is running and it receives a termination signal:

diff --git a/examples/server/server.cpp b/examples/server/server.cpp
index 1bf1ee87..c4a01547 100644
--- a/examples/server/server.cpp
+++ b/examples/server/server.cpp
@@ -1,3 +1,4 @@
+#include <stdexcept>
 #include "utils.hpp"
 
 #include "arg.h"
@@ -1763,7 +1764,7 @@ struct server_response {
             condition_results.wait(lock, [&]{
                 if (!running) {
                     SRV_DBG("%s : queue result stop\n", __func__);
-                    std::terminate(); // we cannot return here since the caller is HTTP code
+                    throw std::runtime_error("Server task terminated"); // we cannot return here since the caller is HTTP code
                 }
                 return !queue_results.empty();
             });
@@ -1797,7 +1798,7 @@ struct server_response {
             std::cv_status cr_res = condition_results.wait_for(lock, std::chrono::seconds(timeout));
             if (!running) {
                 SRV_DBG("%s : queue result stop\n", __func__);
-                std::terminate(); // we cannot return here since the caller is HTTP code
+                throw std::runtime_error("Server task terminated"); // we cannot return here since the caller is HTTP code
             }
             if (cr_res == std::cv_status::timeout) {
                 return nullptr;

@tyronecai

tyronecai commented Apr 9, 2025

the thread (std::thread t) is stuck at
https://github.com/ggml-org/llama.cpp/blob/master/examples/server/server.cpp#L2594C1-L2601C23

        while (true) {
            server_task_result_ptr result = queue_results.recv_with_timeout(id_tasks, HTTP_POLLING_SECONDS);

so maybe just:

        while (true) {
            if (!queue_tasks.running) {  return; }
            server_task_result_ptr result = queue_results.recv_with_timeout(id_tasks, HTTP_POLLING_SECONDS);

@ngxson
Collaborator Author

ngxson commented Apr 9, 2025

@qnixsynapse Yes, throwing an exception makes more sense, as the HTTP layer will catch it

@tyronecai That would work, but it's not optimal: we would still need to wait until recv_with_timeout returns after HTTP_POLLING_SECONDS. Also, I need a more generic solution that works with both recv and recv_with_timeout

@tyronecai

@ngxson

Would closing all client sockets in httplib.h's Server::stop be a good solution?

diff --git a/examples/server/httplib.h b/examples/server/httplib.h
index 0f981dc8..5255ce6e 100644
--- a/examples/server/httplib.h
+++ b/examples/server/httplib.h
@@ -1117,9 +1117,22 @@ private:

   virtual bool process_and_close_socket(socket_t sock);

+  inline void record_client_sock(socket_t sock) {
+    std::lock_guard<std::mutex> guard(client_sock_set_mutex_);
+    client_sock_set_.insert(sock);
+  }
+  inline void remove_client_sock(socket_t sock) {
+    std::lock_guard<std::mutex> guard(client_sock_set_mutex_);
+    client_sock_set_.erase(sock);
+  }
+
   std::atomic<bool> is_running_{false};
   std::atomic<bool> is_decommissioned{false};

+  // record all client socks
+  std::mutex client_sock_set_mutex_;
+  std::unordered_set<socket_t> client_sock_set_;
+
   struct MountPointEntry {
     std::string mount_point;
     std::string base_dir;
@@ -6564,6 +6577,13 @@ inline void Server::stop() {
     std::atomic<socket_t> sock(svr_sock_.exchange(INVALID_SOCKET));
     detail::shutdown_socket(sock);
     detail::close_socket(sock);
+
+    std::lock_guard<std::mutex> guard(client_sock_set_mutex_);
+    for (auto it = client_sock_set_.begin(); it != client_sock_set_.end();) {
+        detail::shutdown_socket(*it);
+        detail::close_socket(*it);
+        it = client_sock_set_.erase(it);
+    }
   }
   is_decommissioned = false;
 }
@@ -7014,8 +7034,10 @@ inline bool Server::listen_internal() {
       detail::set_socket_opt_time(sock, SOL_SOCKET, SO_SNDTIMEO,
                                   write_timeout_sec_, write_timeout_usec_);

+      record_client_sock(sock);
       if (!task_queue->enqueue(
               [this, sock]() { process_and_close_socket(sock); })) {
+        remove_client_sock(sock);
         detail::shutdown_socket(sock);
         detail::close_socket(sock);
       }
@@ -7436,6 +7458,7 @@ inline bool Server::process_and_close_socket(socket_t sock) {
                                nullptr);
       });

+  remove_client_sock(sock);
   detail::shutdown_socket(sock);
   detail::close_socket(sock);
   return ret;
@@ -9449,6 +9472,7 @@ inline bool SSLServer::process_and_close_socket(socket_t sock) {
     detail::ssl_delete(ctx_mutex_, ssl, sock, shutdown_gracefully);
   }

+  remove_client_sock(sock);
   detail::shutdown_socket(sock);
   detail::close_socket(sock);
   return ret;

@ngxson
Collaborator Author

ngxson commented Apr 9, 2025

I don't think @yhirose is going to like it; it's best to do this on the llama.cpp side

TL;DR for you @yhirose: we're looking for a way to stop the HTTP thread when the server stops, but it's quite tricky when the thread is busy waiting for something. I think throwing an exception would be best, right? It will be caught by the exception handler anyway

@tyronecai

Thanks for your patience. How about this, @ngxson?

diff --git a/tmp/a.cc b/../examples/server/server.cpp
index 1bf1ee87..9fe7d896 100644
--- a/tmp/a.cc
+++ b/../examples/server/server.cpp
@@ -1762,12 +1762,15 @@ struct server_response {
             std::unique_lock<std::mutex> lock(mutex_results);
             condition_results.wait(lock, [&]{
                 if (!running) {
-                    SRV_DBG("%s : queue result stop\n", __func__);
-                    std::terminate(); // we cannot return here since the caller is HTTP code
+                    SRV_DBG("%s : queue result stop\n", __func__);
+                    return true;
                 }
                 return !queue_results.empty();
             });
-
+            if (!running) {
+                SRV_DBG("%s : queue result stop1\n", __func__);
+                return std::make_unique<server_task_result_error>();
+            }
             for (size_t i = 0; i < queue_results.size(); i++) {
                 if (id_tasks.find(queue_results[i]->id) != id_tasks.end()) {
                     server_task_result_ptr res = std::move(queue_results[i]);
@@ -1796,8 +1799,8 @@ struct server_response {

             std::cv_status cr_res = condition_results.wait_for(lock, std::chrono::seconds(timeout));
             if (!running) {
-                SRV_DBG("%s : queue result stop\n", __func__);
-                std::terminate(); // we cannot return here since the caller is HTTP code
+                SRV_DBG("%s : queue result stop\n", __func__);
+                return std::make_unique<server_task_result_error>();
             }
             if (cr_res == std::cv_status::timeout) {
                 return nullptr;

Successfully merging this pull request may close these issues.

Misc. bug: [Server] Crashes with a coredump during termination