ggml : fix race-condition in ggml-rpc #13600
base: master
Conversation
The RPC backend is always called in a single thread. What kind of crashes do you observe?
I'm adding RPC support to Ollama, which depends on this feature. In Ollama, it's called from multiple Go threads, leading to race conditions and crashes. I assumed the backend was thread-safe.
I'd suggest adding proper synchronization in the Go code.
Most ggml-backend objects are not intended to be thread-safe; however, it should still be possible to use different objects in different threads simultaneously. The RPC client reuses the same socket for multiple objects, so the synchronization here seems necessary.
Talking specifically about Ollama for a second: adding synchronization at the Go layer is sub-optimal, since we do benefit from parallelization on other backends; RPC is the only one that doesn't support it well (without the following fix).
ggml/src/ggml-rpc/ggml-rpc.cpp
Outdated
```cpp
static std::mutex send_rpc_cmd_mutex;
std::lock_guard<std::mutex> lock(send_rpc_cmd_mutex);
```
Instead of a global mutex, it's probably better to associate it with the specific instance of the socket. Consider moving the mutex to `struct socket_t`.
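A minimal sketch of that suggestion, assuming a pared-down `socket_t` (the real struct also handles Windows sockets and construction details):

```cpp
#include <mutex>
#include <unistd.h>

// Sketch: each socket owns its own mutex, so two independent connections
// never contend with each other; only traffic on the same fd is serialized.
struct socket_t {
    int fd;
    std::mutex send_rpc_cmd_mutex;  // guards all traffic on this fd

    explicit socket_t(int fd) : fd(fd) {}
    ~socket_t() { close(fd); }      // POSIX close; the real struct also covers Windows
};
```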
Good idea, let me try that.
Looks good, I've updated the PR.
```diff
@@ -51,6 +51,7 @@ struct socket_t {
         close(this->fd);
 #endif
     }
+    std::mutex send_rpc_cmd_mutex;
```
I think it makes sense to use this mutex for both sending and receiving data, so better to rename it to `sock_mutex`.
Would it make sense for multiple threads to read from the same socket? Sounds like it would introduce new issues.
My assumption was that if third-party clients need multi-threaded send, they would also need multi-threaded receive. If this is not the case, you can leave it as is; just rename it to `send_mutex`.
```diff
@@ -386,6 +387,7 @@ static bool parse_endpoint(const std::string & endpoint, std::string & host, int
 // RPC request : | rpc_cmd (1 byte) | request_size (8 bytes) | request_data (request_size bytes) |
 // No response
 static bool send_rpc_cmd(const std::shared_ptr<socket_t> & sock, enum rpc_cmd cmd, const void * input, size_t input_size) {
+    std::lock_guard<std::mutex> lock(sock->send_rpc_cmd_mutex);
```
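For illustration, a sketch of how the rest of the send path might look under this lock, following the wire format in the comment above. `send_data` is written here as an assumed helper, and `enum rpc_cmd` / `socket_t` come from the surrounding file:

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <sys/socket.h>

// Assumed helper: loops over send() until the whole buffer is written.
static bool send_data(int fd, const void * data, size_t size) {
    size_t sent = 0;
    while (sent < size) {
        ssize_t n = send(fd, (const char *)data + sent, size - sent, 0);
        if (n <= 0) {
            return false;
        }
        sent += (size_t)n;
    }
    return true;
}

// Serializes | rpc_cmd | request_size | request_data | while holding the
// per-socket lock, so frames from different threads cannot interleave.
static bool send_rpc_cmd(const std::shared_ptr<socket_t> & sock, enum rpc_cmd cmd,
                         const void * input, size_t input_size) {
    std::lock_guard<std::mutex> lock(sock->send_rpc_cmd_mutex);
    uint8_t cmd_byte = cmd;
    if (!send_data(sock->fd, &cmd_byte, sizeof(cmd_byte))) {
        return false;
    }
    if (!send_data(sock->fd, &input_size, sizeof(input_size))) {
        return false;
    }
    return send_data(sock->fd, input, input_size);
}
```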
You should also acquire the lock in the overloaded `send_rpc_cmd` function below to prevent multiple threads reading from the socket at the same time; you may also need to use `std::recursive_mutex` for this to work.
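To illustrate the `std::recursive_mutex` point: if the request/response overload delegates to the request-only `send_rpc_cmd` above, the same thread would lock the mutex twice, which deadlocks with a plain `std::mutex`. A hedged sketch, where `recv_data` and the exact response framing are assumptions:

```cpp
// Sketch, assuming socket_t's member is changed to a std::recursive_mutex and
// that recv_data is a helper that loops over recv() until all bytes arrive.
static bool send_rpc_cmd(const std::shared_ptr<socket_t> & sock, enum rpc_cmd cmd,
                         const void * input, size_t input_size,
                         void * output, size_t output_size) {
    // Hold the lock across the whole request/response exchange so another
    // thread cannot interleave its own request on this socket.
    std::lock_guard<std::recursive_mutex> lock(sock->send_rpc_cmd_mutex);
    // Re-enters the same mutex inside the request-only overload; this is why
    // a recursive mutex is needed (a plain std::mutex would deadlock here).
    if (!send_rpc_cmd(sock, cmd, input, input_size)) {
        return false;
    }
    return recv_data(sock->fd, output, output_size);
}
```

An alternative that avoids the recursive mutex would be to move the unlocked send logic into a private helper and take the plain mutex once at each public entry point.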