Description
While working on #13365, I'm thinking about the use case where people can control llama-server completely via the web UI, including loading/unloading models and shutting down the server.
The reason I'm thinking about this is that I recently found myself going back to LM Studio quite often 😂. The llama.cpp server is good, but having to go back and forth between web <> CLI is not always a pleasant experience.
Basically, I'm thinking about three low-hanging fruits that could improve the situation:
Idea 1: allow loading / unloading models via API: in `server.cpp`, we can add a kind of "super" `main()` function that wraps around the current `main()`. The new `main()` will spawn an "interim" HTTP server that exposes an API to load a model. Of course, this functionality will be restricted to local deployments to avoid any security issues.
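A minimal sketch of what that "super" `main()` could look like, assuming the current entry point is renamed to `server_main()` (a hypothetical name) and reusing cpp-httplib, which the server already depends on; the control port 8081 is also just a placeholder:

```cpp
#include <string>
#include "httplib.h"

// the current main(), renamed -- hypothetical, not the actual function name
int server_main(int argc, char ** argv);

int main(int argc, char ** argv) {
    std::string model_path;
    httplib::Server ctrl;

    // POST /load with a model path in the body, then hand over to the real server
    ctrl.Post("/load", [&](const httplib::Request & req, httplib::Response & res) {
        model_path = req.body;
        res.set_content("loading " + model_path, "text/plain");
        ctrl.stop(); // exit the interim server's listen loop
    });

    // bind to loopback only, so the control API is never exposed remotely
    ctrl.listen("127.0.0.1", 8081);

    // rebuild argv for the existing entry point and run it
    const char * args[] = { argv[0], "-m", model_path.c_str() };
    return server_main(3, const_cast<char **>(args));
}
```

Unloading / switching models could work the same way: the wrapper regains control when the inner server shuts down and waits for the next `/load`.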
Idea 2: add a `-d, --detach` flag to make the CLI go "headless", so the user can close the terminal and the server keeps running in the background. It should be trivial to do on Mac and Linux, but may require some effort on Windows. We can also add an API to terminate the server process.
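For the Mac/Linux side, this is the classic fork + `setsid()` pattern; a minimal sketch (Windows omitted, since it would need a different mechanism such as spawning a detached child process):

```cpp
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// called early in main() when --detach is passed (hypothetical integration point)
static void detach_process() {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid > 0) { exit(0); } // parent returns to the shell immediately

    setsid(); // child becomes session leader, detached from the controlling terminal

    // redirect stdio to /dev/null so the detached server never writes to a closed terminal
    int fd = open("/dev/null", O_RDWR);
    if (fd >= 0) {
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO) { close(fd); }
    }
}
```

The "terminate" API then only needs the daemon's PID (e.g. written to a pidfile) or an HTTP endpoint on the running server that exits cleanly after draining in-flight requests.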
Idea 3: we can make a desktop shortcut that opens the web browser to the llama.cpp localhost page. Basically this will turn llama.cpp into an "app" on the desktop without spending too much effort on our side. This is nice-to-have, but just noting it here to see if anyone has a better idea.
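On Linux this could be as small as a `.desktop` file; a hypothetical sketch (the name, icon, and port are placeholders, 8080 being the server default):

```ini
[Desktop Entry]
Type=Application
Name=llama.cpp
Comment=Open the llama-server web UI
Exec=xdg-open http://127.0.0.1:8080
Icon=llama
Terminal=false
Categories=Network;Utility;
```

macOS and Windows have equivalents (an `.app` bundle wrapping `open <url>`, or a `.url`/`.lnk` shortcut).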
WDYT @ggerganov @slaren?
Also tagging @cebtenzzre if you have any suggestions for the `-d, --detach` mode.