
Description
It feels like there is minimal or no testing before commits are merged into llama.cpp. I don't understand why there are so many unintentional consequences and bugs.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I expect ./server to function so that I can visit http://127.0.0.1:8080
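Besides the web UI, I'd expect the HTTP API itself to respond. A minimal check from another terminal (a sketch, assuming the server example's default /completion endpoint; the prompt and n_predict values here are just placeholders):
curl --request POST \
    --url http://127.0.0.1:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Hello, world", "n_predict": 16}'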
Current Behavior
After navigating to http://127.0.0.1:8080, the page is blank, and refreshing doesn't help. Here's the log:
./server -m ~/Pygmalion-Vicuna-1.1-7b.ggmlv3.Q4_0.bin -c 2048 -t 2 -b 7
{"timestamp":1692016653,"level":"INFO","function":"main","line":1179,"message":"build info","build":983,"commit":"1cd06fa"}
{"timestamp":1692016653,"level":"INFO","function":"main","line":1184,"message":"system info","n_threads":2,"total_threads":8,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | "}
llama.cpp: loading model from /data/data/com.termux/files/home/Pygmalion-Vicuna-1.1-7b.ggmlv3.Q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 5504
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_head_kv = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: n_gqa = 1
llama_model_load_internal: rnorm_eps = 5.0e-06
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 3647.96 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 3.42 MB
llama server listening at http://127.0.0.1:8080
{"timestamp":1692016659,"level":"INFO","function":"main","line":1413,"message":"HTTP server listening","hostname":"127.0.0.1","port":8080}
{"timestamp":1692016667,"level":"INFO","function":"log_server_request","line":1152,"message":"request","remote_addr":"127.0.0.1","remote_port":37016,"status":200,"method":"GET","path":"/","params":{}}
{"timestamp":1692016667,"level":"INFO","function":"log_server_request","line":1152,"message":"request","remote_addr":"127.0.0.1","remote_port":37016,"status":404,"method":"GET","path":"/json-schema-to-grammar.mjs","params":{}}
{"timestamp":1692016667,"level":"INFO","function":"log_server_request","line":1152,"message":"request","remote_addr":"127.0.0.1","remote_port":37016,"status":200,"method":"GET","path":"/index.js","params":{}}
{"timestamp":1692016667,"level":"INFO","function":"log_server_request","line":1152,"message":"request","remote_addr":"127.0.0.1","remote_port":37018,"status":200,"method":"GET","path":"/completion.js","params":{}}
Environment and Context
$ uname -a
Linux localhost 4.14.190-23725627-abG975WVLS8IWD1 #2 SMP PREEMPT Mon Apr 10 18:16:39 KST 2023 aarch64 Android
Python 3.11.4
GNU Make 4.4.1
Built for aarch64-unknown-linux-android
clang version 16.0.6
Target: aarch64-unknown-linux-android24
Thread model: posix
InstalledDir: /data/data/com.termux/files/usr/bin
Steps to Reproduce
- git clone
- cd llama.cpp
- cmake -B build -DCMAKE_C_FLAGS=-march=armv8.4a
- cd build
- cmake --build . --config Release
- ./server ...
./server works when built with make, but fails as described above when built with CMake.
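For comparison, the make-based build that works for me is roughly the following (a sketch from the repository root on Termux; exact make targets and flags may differ depending on the checkout):
make clean
make
./server -m ~/Pygmalion-Vicuna-1.1-7b.ggmlv3.Q4_0.bin -c 2048 -t 2 -b 7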