Clean up server code

## Motivation

As seen in https://github.com/ggerganov/llama.cpp/issues/4216 , one of the important tasks is to refactor / clean up the server code so that it's easier to maintain. However, without a detailed plan, I personally feel this is unlikely to be achieved.

This issue is created so that we can discuss how to refactor or clean up the code.

The goal is to help both existing and new contributors easily find where to work in the code base.

## Current architecture

The current server implementation has 2 threads: one for the HTTP part and one for inference.

![image](https://github.com/ggerganov/llama.cpp/assets/7702203/6e44b6cc-04f0-465c-a3fb-dc5c4f13b8ae)

- Communication from the HTTP thread to the inference thread is done by `llama_server_queue.post(task)`
- Communication from the inference thread back to the HTTP thread is done by `llama_server_response.send(result)`
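
To make the two directions concrete, here is a minimal sketch of the same pattern using only the standard library. It is not the actual server code: `task`, `result`, `blocking_queue` and `round_trip` are hypothetical, simplified stand-ins, and the "inference" here just echoes the prompt.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical, simplified stand-ins for the real task/result structs.
struct task   { int id; std::string prompt; };
struct result { int id; std::string text; };

// Minimal blocking queue. The mutex lives inside the class,
// so callers never have to lock anything themselves.
template <typename T>
class blocking_queue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
};

// Plays the role of the HTTP thread: post a task, then wait for its result.
std::string round_trip(const std::string & prompt) {
    blocking_queue<task>   queue_tasks;    // HTTP ==> inference
    blocking_queue<result> queue_results;  // inference ==> HTTP

    // Inference thread: take one task, "process" it, send the result back.
    std::thread inference([&] {
        task t = queue_tasks.pop();
        queue_results.push({t.id, "echo: " + t.prompt});
    });

    queue_tasks.push({1, prompt});
    result r = queue_results.pop();
    inference.join();
    return r.text;
}
```

Calling `round_trip("hello")` posts one task and blocks until the inference thread replies. The real server keeps both threads alive for its whole lifetime and matches results to tasks by id, which this sketch leaves out.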

## Ideas

Feel free to suggest any ideas that you find helpful (please keep in mind that we are not introducing new features here, just rewriting the code):

- Abstract out `llama_server_queue` and `llama_server_response`; mutexes are now bound to these 2 structs (already finished)
  https://github.com/ggerganov/llama.cpp/pull/5065

- Rename and move structs to `utils.hpp`: https://github.com/ggerganov/llama.cpp/issues/5762#issuecomment-1968873115
  https://github.com/ggerganov/llama.cpp/pull/5779

- Investigate [httplib](https://github.com/yhirose/cpp-httplib?tab=readme-ov-file#post-routing-handler) to see if we can use more of the functions that already exist in this lib. For example, CORS can be done using `set_post_routing_handler` (the same idea as "middleware" in high-level web frameworks)

- Merge the handlers of `/v1/{endpoints}` and `/{endpoints}` to prevent code duplication
  https://github.com/ggerganov/llama.cpp/pull/5722

- Stop hard-coding JS files into hpp files, as they pollute the code base. They should instead be converted to hpp using [code generation](https://stackoverflow.com/questions/71906069/what-is-the-proper-way-of-using-a-source-generator-in-cmake) (like how `build-info.cpp` is generated in `common.cpp`)
  https://github.com/ggerganov/llama.cpp/pull/6661
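
As an illustration of the handler-merging idea above, one handler can be registered under both the `/v1/...` path and the plain path. This is only a sketch: `router`, `handler`, `make_router` and the endpoint names are hypothetical stand-ins, not the httplib API or the actual server routes.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical handler type: takes a request body, returns a response body.
using handler = std::function<std::string(const std::string & body)>;

// Hypothetical stand-in for the HTTP server's route table.
struct router {
    std::map<std::string, handler> routes;

    // Register the same handler under every given path.
    void post(const std::vector<std::string> & paths, const handler & h) {
        for (const auto & p : paths) {
            routes[p] = h;
        }
    }

    // Look up the path and run its handler, or return "404" if none matches.
    std::string dispatch(const std::string & path, const std::string & body) const {
        auto it = routes.find(path);
        return it == routes.end() ? "404" : it->second(body);
    }
};

router make_router() {
    router r;
    // The handler is written once...
    const handler handle_example = [](const std::string & body) {
        return "handled: " + body;
    };
    // ...and shared by both paths (illustrative endpoint names).
    r.post({"/example", "/v1/example"}, handle_example);
    return r;
}
```

With this shape, a fix to the handler applies to both paths at once, since there is only one copy of the logic.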
