Closed
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
-dev/-devd currently don't appear to work with RPC, because RPC devices are only created later during startup, after the device arguments have already been checked:
130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
version: 4230 (0c39f44d)
built with cc (GCC) 14.2.1 20240910 for x86_64-pc-linux-gnu
person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server -m /mnt4/models/Mistral-Large-Instruct-2411-IQ3_XXS.gguf -ngl 18 -c 16384 --host 0.0.0.0 --log-colors -fa --no-mmap --rpc 192.168.0.104:50052 -md /mnt4/models/Ministral-8B-Instruct-2410.i1-Q6_K.gguf -devd "RPC[192.168.0.104:50052]" -ngld 99 -cd 8192 -dev ROCm0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
error while handling argument "-devd": invalid device: RPC[192.168.0.104:50052]
usage:
-dev, --device <dev1,dev2,..> comma-separated list of devices to use for offloading (none = don't
offload)
use --list-devices to see a list of available devices
(env: LLAMA_ARG_DEVICE)
to show complete usage, run with -h
130 person4268@person4269 ~/source/llama.cpp/build/bin (git)-[master] % ./llama-server --list-devices --rpc 192.168.0.104:50052
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6700 XT, compute capability 10.3, VMM: no
Available devices:
ROCm0: AMD Radeon RX 6700 XT (12272 MiB, 11872 MiB free)
Motivation
I have one computer that can run a large model but has no room for anything else, and another computer that can fit a smaller draft model and run it fairly quickly, so it would be useful to run the draft model over RPC. To do so, I need to set -dev to my local machine's GPU and -devd to the system over RPC.
Possible Implementation
RPC device creation would need to happen much earlier, before the -dev and -devd arguments are validated. I tried to see whether I could hack the feature in, but wasn't sure how to approach it.