
Feature Request: Support RPC with -dev/-devd #10609

Closed
@person4268

Description


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

-dev/-devd currently don't appear to work with RPC, because the RPC devices are only created later in the initialization process:

130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
version: 4230 (0c39f44d)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server -m /mnt4/models/Mistral-Large-Instruct-2411-IQ3_XXS.gguf -ngl 18 -c 16384 --host 0.0.0.0 --log-colors -fa --no-mmap --rpc 192.168.0.104:50052 -md /mnt4/models/Ministral-8B-Instruct-2410.i1-Q6_K.gguf -devd "RPC[192.168.0.104:50052]" -ngld 99 -cd 8192 -dev ROCm0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
error while handling argument "-devd": invalid device: RPC[192.168.0.104:50052]

usage:
-dev,  --device <dev1,dev2,..>          comma-separated list of devices to use for offloading (none = don't
                                        offload)
                                        use --list-devices to see a list of available devices
                                        (env: LLAMA_ARG_DEVICE)


to show complete usage, run with -h
130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --list-devices --rpc 192.168.0.104:50052
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
Available devices:
  ROCm0: AMD Radeon RX 6700 XT (12272 MiB, 11872 MiB free)

Motivation

I have one computer that can run a large model but fit nothing else. I have another computer that can fit a smaller draft model and run it fairly quickly, so it would be nice to be able to run the draft model over RPC. To do so, I need to set -dev to my local machine's GPU and -devd to the system reachable over RPC.

Possible Implementation

The RPC devices would need to be created much earlier, before the -dev/-devd arguments are validated. I tried to see whether I could hack the feature in myself, but I wasn't sure how to approach it.
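
As a rough sketch of the direction (not a worked-out fix): loop over the --rpc endpoints and register the resulting devices before the argument parser validates -dev/-devd, so that names like "RPC[192.168.0.104:50052]" resolve. The helper name register_rpc_devices() is hypothetical; ggml_backend_rpc_add_device() and ggml_backend_device_register() are taken from the public ggml headers (ggml-rpc.h / ggml-backend.h), and whether calling them this early in startup is safe is exactly the part I'm unsure about.

```cpp
// Hypothetical helper: register RPC devices as soon as the --rpc value is
// known, so later -dev/-devd validation can find them in the device registry.
#include <sstream>
#include <string>

#include "ggml-backend.h"
#include "ggml-rpc.h"

static void register_rpc_devices(const std::string & servers) {
    // servers is the raw --rpc value, e.g. "192.168.0.104:50052,host2:50052"
    std::stringstream ss(servers);
    std::string endpoint;
    while (std::getline(ss, endpoint, ',')) {
        if (endpoint.empty()) {
            continue;
        }
        ggml_backend_dev_t dev = ggml_backend_rpc_add_device(endpoint.c_str());
        if (dev) {
            // Once registered, the device should be discoverable by name
            // (e.g. "RPC[192.168.0.104:50052]") when -dev/-devd are parsed.
            ggml_backend_device_register(dev);
        }
    }
}
```

This assumes the RPC backend is linked in and its endpoints are reachable at argument-parsing time; someone familiar with the backend registry would know whether that ordering is acceptable.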

Labels

enhancement (New feature or request)
