Depends on: #5214
The `llamax` library will wrap `llama` and expose common high-level functionality. The main goal is to ease the integration of `llama.cpp` into 3rd party projects. Ideally, most projects would interface through the `llamax` API for all common use cases, while still having the option to use the low-level `llama` API for less common applications that require finer control of the state.
A simple way to think about `llamax` is that it will simplify all of the existing examples in `llama.cpp` by hiding the low-level stuff, such as managing the KV cache and batching requests.
Roughly, `llamax` will require its own state object and a run-loop function.
The specifics of the API are yet to be determined - suggestions are welcome.
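As one possible starting point for discussion, here is a minimal sketch of the "state object plus run loop" shape. Every name in it (`llamax_sketch`, `state`, `submit`, `step`, `run`, `output`, `next_token_fn`) is hypothetical and not part of `llama.cpp`. The model is replaced by a toy callback that maps a token sequence to the next token, so the sketch makes no real `llama` calls. It also re-feeds the full sequence on every step, which is exactly the work a real wrapper would avoid by managing the KV cache internally.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch only: none of these names exist in llama.cpp.
namespace llamax_sketch {

// Stand-in for the low-level model: returns the next token for a sequence.
// A real wrapper would call into the llama API here instead.
using next_token_fn = std::function<int(const std::vector<int>&)>;

struct request {
    std::vector<int> prompt;  // input token ids
    int n_predict = 0;        // number of tokens to generate
    std::vector<int> output;  // generated token ids
    bool done = false;
};

class state {
public:
    state(next_token_fn fn, std::size_t max_batch)
        : next_token_(std::move(fn)), max_batch_(max_batch) {}

    // Queue a generation request and return its id.
    std::size_t submit(std::vector<int> prompt, int n_predict) {
        request r;
        r.prompt = std::move(prompt);
        r.n_predict = n_predict;
        r.done = n_predict <= 0;
        requests_.push_back(std::move(r));
        return requests_.size() - 1;
    }

    // One iteration of the run loop: advance up to max_batch_ unfinished
    // requests by one token each. Returns false when no work was done.
    bool step() {
        std::size_t batched = 0;
        for (auto& r : requests_) {
            if (r.done) continue;
            if (batched == max_batch_) break;
            // Naive: rebuild the whole sequence. A real implementation would
            // reuse the KV cache and only decode the newest token.
            std::vector<int> seq = r.prompt;
            seq.insert(seq.end(), r.output.begin(), r.output.end());
            r.output.push_back(next_token_(seq));
            r.done = static_cast<int>(r.output.size()) >= r.n_predict;
            ++batched;
        }
        return batched > 0;
    }

    // Drive the loop until every queued request is finished.
    void run() {
        while (step()) {
        }
    }

    const std::vector<int>& output(std::size_t id) const {
        return requests_.at(id).output;
    }

private:
    next_token_fn next_token_;
    std::size_t max_batch_;
    std::vector<request> requests_;
};

}  // namespace llamax_sketch
```

With a toy callback such as `[](const std::vector<int>& seq) { return seq.back() + 1; }` and `max_batch = 2`, submitting several prompts and calling `run()` interleaves them two at a time until each has produced `n_predict` tokens. The caller never touches batching or cache state; it only submits work and drives (or polls) the loop, which is the kind of simplification the examples would benefit from.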