
Feature Request: Support RPC with -dev/-devd #10609

Closed
@person4268

Description


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

-dev/-devd currently don't appear to work with RPC, because the RPC devices are only created later in the initialization process:

130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
version: 4230 (0c39f44d)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server -m /mnt4/models/Mistral-Large-Instruct-2411-IQ3_XXS.gguf -ngl 18 -c 16384 --host 0.0.0.0 --log-colors -fa --no-mmap --rpc 192.168.0.104:50052 -md /mnt4/models/Ministral-8B-Instruct-2410.i1-Q6_K.gguf -devd "RPC[192.168.0.104:50052]" -ngld 99 -cd 8192 -dev ROCm0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
error while handling argument "-devd": invalid device: RPC[192.168.0.104:50052]

usage:
-dev,  --device <dev1,dev2,..>          comma-separated list of devices to use for offloading (none = don't
                                        offload)
                                        use --list-devices to see a list of available devices
                                        (env: LLAMA_ARG_DEVICE)


to show complete usage, run with -h
130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --list-devices --rpc 192.168.0.104:50052
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
Available devices:
  ROCm0: AMD Radeon RX 6700 XT (12272 MiB, 11872 MiB free)

Motivation

I have one computer that can run a large model but fit nothing else. I have another computer that can fit a smaller draft model and run it fairly quickly, so it would be nice to be able to run the draft model over RPC. To do so, I need to set -dev to my local machine's GPU and -devd to the system reachable over RPC.

Possible Implementation

The RPC devices would need to be created much earlier, before the -dev/-devd arguments are validated. I tried to see whether I could hack the feature in myself, but I wasn't sure how to approach it.
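
As a rough sketch of the direction (not a worked-out fix): loop over the --rpc endpoints and register the resulting devices before the argument parser validates -dev/-devd, so that names like "RPC[192.168.0.104:50052]" resolve. The helper name register_rpc_devices() is hypothetical; ggml_backend_rpc_add_device() and ggml_backend_device_register() are taken from the public ggml headers (ggml-rpc.h / ggml-backend.h), and whether calling them this early in startup is safe is exactly the part I'm unsure about.

```cpp
// Hypothetical helper: register RPC devices as soon as the --rpc value is
// known, so later -dev/-devd validation can find them in the device registry.
#include <sstream>
#include <string>

#include "ggml-backend.h"
#include "ggml-rpc.h"

static void register_rpc_devices(const std::string & servers) {
    // servers is the raw --rpc value, e.g. "192.168.0.104:50052,host2:50052"
    std::stringstream ss(servers);
    std::string endpoint;
    while (std::getline(ss, endpoint, ',')) {
        if (endpoint.empty()) {
            continue;
        }
        ggml_backend_dev_t dev = ggml_backend_rpc_add_device(endpoint.c_str());
        if (dev) {
            // Once registered, the device should be discoverable by name
            // (e.g. "RPC[192.168.0.104:50052]") when -dev/-devd are parsed.
            ggml_backend_device_register(dev);
        }
    }
}
```

This assumes the RPC backend is linked in and its endpoints are reachable at argument-parsing time; someone familiar with the backend registry would know whether that ordering is acceptable.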

Labels

enhancement (New feature or request)
