Hi,
I think a recent change might have caused this. I am using llama-2-7b-chat.Q4_K_M.gguf for a local Q&A RAG pipeline built with LlamaIndex. I developed a proof of concept on a machine with llama-cpp-python version 0.2.13 and saw this in the output:
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 2048.00 MB
When I recently installed llama-cpp-python on a new machine, these lines no longer appear in the output and my process has slowed down significantly. Can you please advise? Let me know if you need any additional information.
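
For reference, here is a minimal sketch of how the model is loaded underneath the LlamaIndex wrapper. The path, n_gpu_layers, and n_ctx values are illustrative, and offload_kqv is my guess at the setting that controls the KV-cache offload shown above (it may not exist in older versions):

from llama_cpp import Llama

# Illustrative settings; verbose=True prints the llama_kv_cache_init lines when offloading happens.
llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",
    n_gpu_layers=-1,    # offload all layers to the GPU
    n_ctx=4096,
    offload_kqv=True,   # assumption: controls whether the K/V cache is kept in VRAM
    verbose=True,
)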