
ggml : fix race-condition in ggml-rpc #13600


Open
wants to merge 2 commits into master

Conversation


@gkpln3 gkpln3 commented May 17, 2025

No description provided.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on May 17, 2025
@gkpln3 changed the title from "ggml : Fixed race-condition that leads to crashes when using ggml-rpc" to "ggml : fix race-condition that leads to crashes when using ggml-rpc" on May 17, 2025
@gkpln3 changed the title from "ggml : fix race-condition that leads to crashes when using ggml-rpc" to "ggml : fix race-condition in ggml-rpc" on May 17, 2025
@rgerganov (Collaborator)

The RPC backend is always called in a single thread. What kind of crashes do you observe?

@gkpln3 (Author) commented May 17, 2025

I'm adding RPC support to Ollama, which depends on this feature. In Ollama, it's called from multiple Go threads, leading to race conditions and crashes. I assumed ggml-rpc.cpp was intended to be thread-safe, given the use of mutexes elsewhere in the file.

@rgerganov (Collaborator)

While we use std::mutex in a few places, the RPC backend is not considered thread-safe. I think ggml backends in general are not required to be thread-safe by design; @slaren, please correct me if I am wrong.

I'd suggest adding proper synchronization in the Go code.

@slaren (Member) commented May 17, 2025

Most ggml-backend objects are not intended to be thread-safe; however, it should still be possible to use different objects from different threads simultaneously. The RPC client reuses the same socket for multiple objects, so the synchronization here seems necessary.
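
For illustration, a minimal sketch of the hazard described above, mirroring the wire format documented in ggml-rpc.cpp (| rpc_cmd (1 byte) | request_size (8 bytes) | request_data |). The write_raw helper is hypothetical, standing in for the actual send() loop on the socket fd:

#include <cstdint>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the low-level send loop; a real implementation
// would call send() on the fd until all bytes are written.
static bool write_raw(int fd, const void * data, size_t size) {
    (void) fd; (void) data; (void) size;
    return true;
}

// Unsafe: two threads calling this on the same fd can interleave their three
// writes, so the server may see a corrupted frame (cmd from thread A, size
// from A, data from B), which is the kind of crash reported in this PR.
static bool send_frame_unsafe(int fd, uint8_t cmd, const void * input, uint64_t input_size) {
    return write_raw(fd, &cmd, 1)
        && write_raw(fd, &input_size, sizeof(input_size))
        && write_raw(fd, input, (size_t) input_size);
}

// Safe: a mutex associated with the socket makes the whole frame atomic with
// respect to other senders on the same connection.
static bool send_frame_locked(std::mutex & sock_mutex, int fd, uint8_t cmd,
                              const void * input, uint64_t input_size) {
    std::lock_guard<std::mutex> lock(sock_mutex);
    return send_frame_unsafe(fd, cmd, input, input_size);
}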

@gkpln3 (Author) commented May 25, 2025

Speaking specifically about Ollama for a moment: adding synchronization at the Go layer is sub-optimal, since we do benefit from parallelization on the other backends; RPC is the only one that doesn't support it well (without this fix).

Comment on lines 389 to 390
static std::mutex send_rpc_cmd_mutex;
std::lock_guard<std::mutex> lock(send_rpc_cmd_mutex);
Member

Instead of a global mutex, it's probably better to associate it with the specific instance of the socket. Consider moving the mutex to struct socket_t.
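
To make the trade-off concrete, a short sketch contrasting the two placements (simplified; the actual struct and call site are shown in the diff hunks below, and the function names here are illustrative, not the real ggml-rpc signatures):

#include <memory>
#include <mutex>

// Before: a function-local static mutex serializes sends across *all*
// sockets, even connections to unrelated RPC servers.
static bool send_global_locked(int fd) {
    static std::mutex send_rpc_cmd_mutex;
    std::lock_guard<std::mutex> lock(send_rpc_cmd_mutex);
    // ... write the request frame to fd ...
    (void) fd;
    return true;
}

// After: each socket owns its mutex, so independent connections do not
// contend; only threads sharing one connection are serialized.
struct socket_t {
    int fd;
    std::mutex send_rpc_cmd_mutex;
};

static bool send_per_socket_locked(const std::shared_ptr<socket_t> & sock) {
    std::lock_guard<std::mutex> lock(sock->send_rpc_cmd_mutex);
    // ... write the request frame to sock->fd ...
    return true;
}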

Author

Good idea, let me try that.

Author

Looks good, I've updated the PR.

@@ -51,6 +51,7 @@ struct socket_t {
close(this->fd);
#endif
}
std::mutex send_rpc_cmd_mutex;
Collaborator

I think it makes sense to use this mutex for both sending and receiving data, so it would be better to rename it to sock_mutex.

@gkpln3 (Author) May 25, 2025

Would it make sense for multiple threads to read from the same socket? That sounds like it would introduce new issues.

@rgerganov (Collaborator) May 25, 2025

My assumption was that if third-party clients need multi-threaded send, they would also need multi-threaded receive. If this is not the case, you can leave it as is; just rename it to send_mutex.

@@ -386,6 +387,7 @@ static bool parse_endpoint(const std::string & endpoint, std::string & host, int
// RPC request : | rpc_cmd (1 byte) | request_size (8 bytes) | request_data (request_size bytes) |
// No response
static bool send_rpc_cmd(const std::shared_ptr<socket_t> & sock, enum rpc_cmd cmd, const void * input, size_t input_size) {
std::lock_guard<std::mutex> lock(sock->send_rpc_cmd_mutex);
Collaborator

You should also acquire the lock in the overloaded send_rpc_cmd function below, to prevent multiple threads from reading from the socket at the same time.

You may also need to use std::recursive_mutex for this to work.
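
For illustration, a sketch of why std::recursive_mutex would be needed, assuming the response-reading overload of send_rpc_cmd delegates to the send-only overload (simplified signatures, not the literal ggml-rpc code):

#include <cstddef>
#include <memory>
#include <mutex>

struct socket_t {
    int fd;
    // recursive_mutex because the second overload below re-enters the lock
    // via the first one; a plain std::mutex would deadlock on that path.
    std::recursive_mutex sock_mutex;
};

// Send-only overload: locks so that direct callers are serialized too.
static bool send_rpc_cmd(const std::shared_ptr<socket_t> & sock,
                         const void * input, size_t input_size) {
    std::lock_guard<std::recursive_mutex> lock(sock->sock_mutex);
    // ... write | cmd | request_size | request_data | to sock->fd ...
    (void) input; (void) input_size;
    return true;
}

// Send-and-receive overload: holds the lock across both the send and the
// receive, so no other thread can interleave a request between them.
static bool send_rpc_cmd(const std::shared_ptr<socket_t> & sock,
                         const void * input, size_t input_size,
                         void * output, size_t output_size) {
    std::lock_guard<std::recursive_mutex> lock(sock->sock_mutex);
    if (!send_rpc_cmd(sock, input, input_size)) { // re-acquires sock_mutex
        return false;
    }
    // ... read the response from sock->fd into output ...
    (void) output; (void) output_size;
    return true;
}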

Labels
ggml (changes relating to the ggml tensor library for machine learning)
4 participants