Description
The following commands fail to generate coherent text:
LLAMA_QKK_64=1 make -j && ./main -m tmp/mnt/models/open-llama/3B-v2/ggml-model-q4_k.gguf -p "I believe the meaning of life is" -t 8 -ngl 1
LLAMA_QKK_64=1 make -j && ./main -m tmp/mnt/models/open-llama/3B-v2/ggml-model-q3_k.gguf -p "I believe the meaning of life is" -t 8 -ngl 1
Both commands work on the CPU (Arm and x86).
It also works with the following patch:
diff --git a/ggml-metal.m b/ggml-metal.m
index 1139ee3..ed9857f 100644
--- a/ggml-metal.m
+++ b/ggml-metal.m
@@ -889,7 +889,7 @@ void ggml_metal_graph_compute(
src1t == GGML_TYPE_F32 &&
[ctx->device supportsFamily:MTLGPUFamilyApple7] &&
ne00%32 == 0 &&
- ne11 > 1) {
+ ne11 >= 1) {
switch (src0->type) {
case GGML_TYPE_F32: [encoder setComputePipelineState:ctx->pipeline_mul_mm_f32_f32]; break;
case GGML_TYPE_F16: [encoder setComputePipelineState:ctx->pipeline_mul_mm_f16_f32]; break;
So it seems the issue is in the kernel_mul_mat_q4_K_f32 kernel, in the QK_K == 64 branch.
Might have been broken by #2615, but I haven't tested this yet.