
metal : the Q3_K and Q4_K kernels with LLAMA_QKK_64=1 are broken #3276

Closed
@ggerganov

Description

The following commands fail to generate coherent text:

LLAMA_QKK_64=1 make -j && ./main -m tmp/mnt/models/open-llama/3B-v2/ggml-model-q4_k.gguf -p "I believe the meaning of life is" -t 8 -ngl 1

LLAMA_QKK_64=1 make -j && ./main -m tmp/mnt/models/open-llama/3B-v2/ggml-model-q3_k.gguf -p "I believe the meaning of life is" -t 8 -ngl 1

Generation works correctly on the CPU (Arm and x86).
It also works on Metal with the following patch applied:

diff --git a/ggml-metal.m b/ggml-metal.m
index 1139ee3..ed9857f 100644
--- a/ggml-metal.m
+++ b/ggml-metal.m
@@ -889,7 +889,7 @@ void ggml_metal_graph_compute(
                                 src1t == GGML_TYPE_F32 &&
                                 [ctx->device supportsFamily:MTLGPUFamilyApple7] &&
                                 ne00%32 == 0 &&
-                                ne11 > 1) {
+                                ne11 >= 1) {
                                 switch (src0->type) {
                                     case GGML_TYPE_F32:  [encoder setComputePipelineState:ctx->pipeline_mul_mm_f32_f32];  break;
                                     case GGML_TYPE_F16:  [encoder setComputePipelineState:ctx->pipeline_mul_mm_f16_f32];  break;

So it seems the issue is in the QK_K == 64 branch of the kernel_mul_mat_q4_K_f32 kernel (and likely the Q3_K counterpart as well):

https://github.com/ggerganov/llama.cpp/blob/a40f2b656fab364ce0aff98dbefe9bd9c3721cc9/ggml-metal.metal#L1576-L1663

Might have been broken by #2615, but I haven't tested this yet.

Metadata

Labels: bug (Something isn't working)