Description
While working on #13365, I'm thinking about the use case where people can control llama-server completely via the web UI, including loading/unloading models and shutting down the server.
The reason I'm thinking about this is that I recently found myself going back to LM Studio quite often 😂. The llama.cpp server is good, but having to go back and forth between web <> CLI is not always a pleasant experience.
Basically, I'm thinking about three low-hanging fruits that could improve the situation:
Idea 1: allow loading / unloading models via API: in `server.cpp`, we can add a kind of "super" `main()` function that wraps around the current `main()`. The new `main()` will spawn an "interim" HTTP server that exposes an API to load a model. Of course, this functionality will be restricted to local deployments to avoid any security issues.
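A minimal sketch of what that "super" `main()` could look like, assuming the current entry point is renamed to `server_main()` (a hypothetical name) and reusing cpp-httplib, which the server already depends on; the control port 8081 is also just a placeholder:

```cpp
#include <string>
#include "httplib.h"

// the current main(), renamed -- hypothetical, not the actual function name
int server_main(int argc, char ** argv);

int main(int argc, char ** argv) {
    std::string model_path;
    httplib::Server ctrl;

    // POST /load with a model path in the body, then hand over to the real server
    ctrl.Post("/load", [&](const httplib::Request & req, httplib::Response & res) {
        model_path = req.body;
        res.set_content("loading " + model_path, "text/plain");
        ctrl.stop(); // exit the interim server's listen loop
    });

    // bind to loopback only, so the control API is never exposed remotely
    ctrl.listen("127.0.0.1", 8081);

    // rebuild argv for the existing entry point and run it
    const char * args[] = { argv[0], "-m", model_path.c_str() };
    return server_main(3, const_cast<char **>(args));
}
```

Unloading / switching models could work the same way: the wrapper regains control when the inner server shuts down and waits for the next `/load`.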
Idea 2: add a `-d, --detach` flag to make the CLI go "headless", so the user can close the terminal and the server keeps running in the background. It should be trivial to do on Mac and Linux, but may require some effort on Windows. We can also add an API to terminate the server process.
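For the Mac/Linux side, this is the classic fork + `setsid()` pattern; a minimal sketch (Windows omitted, since it would need a different mechanism such as spawning a detached child process):

```cpp
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// called early in main() when --detach is passed (hypothetical integration point)
static void detach_process() {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid > 0) { exit(0); } // parent returns to the shell immediately

    setsid(); // child becomes session leader, detached from the controlling terminal

    // redirect stdio to /dev/null so the detached server never writes to a closed terminal
    int fd = open("/dev/null", O_RDWR);
    if (fd >= 0) {
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO) { close(fd); }
    }
}
```

The "terminate" API then only needs the daemon's PID (e.g. written to a pidfile) or an HTTP endpoint on the running server that exits cleanly after draining in-flight requests.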
Idea 3: we can make a desktop shortcut that opens the web browser to the llama.cpp localhost page. Basically this will turn llama.cpp into an "app" on the desktop without spending too much effort on our side. This is nice-to-have, but just noting it here to see if anyone has a better idea.
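On Linux this could be as small as a `.desktop` file; a hypothetical sketch (the name, icon, and port are placeholders, 8080 being the server default):

```ini
[Desktop Entry]
Type=Application
Name=llama.cpp
Comment=Open the llama-server web UI
Exec=xdg-open http://127.0.0.1:8080
Icon=llama
Terminal=false
Categories=Network;Utility;
```

macOS and Windows have equivalents (an `.app` bundle wrapping `open <url>`, or a `.url`/`.lnk` shortcut).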
WDYT @ggerganov @slaren?
Also tagging @cebtenzzre if you have any suggestions for the `-d, --detach` mode.