
Error: 70B model quantizing on Mac: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192 #2285

Closed

@NilsHellwig

Used this model: https://huggingface.co/meta-llama/Llama-2-70b

Used these commands:

$ convert-pth-to-ggml.py models/LLaMa2-70B-meta 1
$ ./quantize ./models/LLaMa2-70B-meta/ggml-model-f16.bin ./models/LLaMa2-70B-meta/ggml-model-q4_0.bin 2
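
For context, the 70B checkpoint uses grouped-query attention, which this conversion path predates. A quick way to confirm that from the downloaded weights is to read the hyperparameter file next to the *.pth shards (a minimal sketch, assuming the standard Meta checkpoint layout; the n_kv_heads key comes from that layout, not from this report):

    import json
    from pathlib import Path

    # Hyperparameters shipped alongside the *.pth shards in the Meta checkpoint.
    params = json.loads(Path("models/LLaMa2-70B-meta/params.json").read_text())

    # 7B/13B are plain multi-head attention; the 70B adds n_kv_heads (GQA).
    # Expected output for the 70B: n_heads = 64, n_kv_heads = 8.
    print("n_heads    =", params["n_heads"])
    print("n_kv_heads =", params.get("n_kv_heads", params["n_heads"]))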

The 7B and 13B models convert and quantize without any problems; the error occurs only with the 70B model.

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024

llama.cpp: loading model from /Users/xyz/Desktop/llama.cpp/models/LLaMa2-70B-meta/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_layer    = 80
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size =    0.19 MB
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x  1024
llama_load_model_from_file: failed to load model
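
The mismatch in the log is exactly the grouped-query-attention geometry: 64 query heads but only 8 key/value heads, so the K projection is 8x narrower than a multi-head-attention converter expects. A worked check using only numbers from the log above (n_kv_head = 8 is an assumption taken from the published 70B config, not from the log):

    # Numbers from the log above.
    n_embd = 8192                  # n_embd
    n_head = 64                    # n_head
    head_dim = n_embd // n_head    # 128, matches n_rot in the log

    # What a multi-head-attention converter expects for wk:
    expected = (n_embd, n_head * head_dim)     # (8192, 8192)

    # What the 70B actually has under GQA:
    n_kv_head = 8
    actual = (n_embd, n_kv_head * head_dim)    # (8192, 1024)

    print(expected, actual)  # matches "expected 8192 x 8192, got 8192 x 1024"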
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 llm = Llama(model_path="/Users/xyz/Desktop/llama.cpp/models/LLaMa2-70B-meta/ggml-model-q4_0.bin", n_ctx=512, seed=43, n_threads=8, n_gpu_layers=1)

File /opt/homebrew/Caskroom/miniforge/base/envs/tensorflow_m1/lib/python3.11/site-packages/llama_cpp/llama.py:305, in Llama.__init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, verbose)
    300     raise ValueError(f"Model path does not exist: {model_path}")
    302 self.model = llama_cpp.llama_load_model_from_file(
    303     self.model_path.encode("utf-8"), self.params
    304 )
--> 305 assert self.model is not None
    307 self.ctx = llama_cpp.llama_new_context_with_model(self.model, self.params)
    309 assert self.ctx is not None

AssertionError: 
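
A side note on the bare AssertionError: llama-cpp-python only asserts that the returned model pointer is non-NULL, so the real diagnostic is the stderr line above, not the exception. A small guard makes the failure readable (a sketch; it reuses only the constructor arguments from the cell above):

    import os
    from llama_cpp import Llama

    model_path = "/Users/xyz/Desktop/llama.cpp/models/LLaMa2-70B-meta/ggml-model-q4_0.bin"
    if not os.path.exists(model_path):
        raise FileNotFoundError(model_path)

    try:
        llm = Llama(model_path=model_path, n_ctx=512, seed=43,
                    n_threads=8, n_gpu_layers=1)
    except AssertionError as err:
        # llama_load_model_from_file returned NULL; the actual cause (here the
        # wk shape mismatch) is printed by llama.cpp on stderr, not carried
        # in the exception itself.
        raise RuntimeError(f"llama.cpp could not load {model_path}; see stderr") from err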
