(Discussion) Improve usability of llama-server #13367

Open
@ngxson

Description

While working on #13365, I'm thinking about the use case where people can control llama-server entirely via the web UI, including loading/unloading models and shutting down the server.

The reason I'm thinking about this idea is that I recently found myself going back to LM Studio quite often 😂. llama.cpp server is good, but having to go back and forth between web <> CLI is not always a pleasant experience.

Basically, I'm thinking about three low-hanging fruits that could improve the situation:

Idea 1: allow loading/unloading models via API: in server.cpp, we can add a kind of "super" main() function that wraps around the current main(). The new main will spawn an "interim" HTTP server that exposes an API to load a model. Of course, this functionality will be restricted to local deployment to avoid any security issues. A rough sketch follows.
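To make this concrete, here is a minimal sketch of what the interim server could look like, using cpp-httplib (which the server already bundles). The /load endpoint, the plain-text request body, and the load_model_and_serve() helper are all hypothetical, just to illustrate the hand-off:

```cpp
// Sketch only: an "interim" server that accepts a model path, then hands
// off to the real server. Endpoint and helper names are hypothetical.
#include <string>

#include "httplib.h" // cpp-httplib, already bundled with the server

// hypothetical: runs the current server main() logic for a given model
int load_model_and_serve(const std::string & model_path);

int main() {
    httplib::Server svr;
    std::string pending_model;

    svr.Post("/load", [&](const httplib::Request & req, httplib::Response & res) {
        pending_model = req.body; // plain-text path; a real API would use JSON
        res.set_content("loading " + pending_model + "\n", "text/plain");
        svr.stop(); // leave the interim server, hand off to the real one
    });

    // bind to loopback only, per the local-deployment restriction
    svr.listen("127.0.0.1", 8080);

    return pending_model.empty() ? 0 : load_model_and_serve(pending_model);
}
```

A client could then do something like `curl -X POST http://127.0.0.1:8080/load -d /path/to/model.gguf` to bring the real server up.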

Idea 2: add a -d, --detach flag to make the CLI go "headless", so the user can close the terminal and the server keeps running in the background. It should be trivial to do on Mac and Linux, but may require some effort on Windows. We can also add an API to terminate the server process. A sketch of the POSIX side is below.
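On Mac/Linux, the detach itself could be the classic double-fork daemonization; below is a sketch, assuming we factor it into a helper that runs before the server starts (the function name is made up). Windows would need a different mechanism, e.g. re-spawning the process with DETACHED_PROCESS, which is where the extra effort comes in:

```cpp
// Sketch of the POSIX side of --detach: classic double-fork daemonization.
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

static void detach_from_terminal() {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid > 0) { exit(0); }  // parent exits, shell gets its prompt back

    setsid();                  // new session: no controlling terminal

    pid = fork();              // second fork: process can never re-acquire a tty
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid > 0) { exit(0); }

    // redirect stdio to /dev/null so closing the terminal is harmless
    int fd = open("/dev/null", O_RDWR);
    if (fd >= 0) {
        dup2(fd, STDIN_FILENO);
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO) close(fd);
    }
}
```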

Idea 3: we can make a desktop shortcut that opens the web browser to the llama.cpp localhost page. Basically this makes llama.cpp feel like an "app" on the desktop, without spending too much effort on our side. This is a nice-to-have, but just noting it here to see if anyone else has a better idea. An illustrative Linux launcher is below.
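For the Linux side, an illustrative .desktop entry might be all it takes; name, icon, and port below are placeholders (8080 is the server's default):

```ini
# Illustrative only: opens the default browser at the llama-server web UI.
[Desktop Entry]
Type=Application
Name=llama.cpp Web UI
Comment=Open the llama-server web interface
Exec=xdg-open http://127.0.0.1:8080
Icon=utilities-terminal
Terminal=false
Categories=Development;
```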

WDYT @ggerganov @slaren ?

Also tagging @cebtenzzre if you have any suggestions for the -d, --detach mode
