Description
Hi everybody,
I am trying to fine-tune a llama-2-13b-chat model, and I think I did everything correctly, but I still cannot apply my LoRA.
What I did was:
- I converted the Llama 2 weights into HF format using this script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
python convert_llama_weights_to_hf.py --input_dir ../models/llama-2-13b-chat --output_dir ../models/llama-2-13b-chat/llama-2-13b-chat-hf --model_size 13B
- Then I ran convert.py and converted it to fp32
python convert.py ../models/llama-2-13b-chat/llama-2-13b-chat-hf --outtype f32 --outfile ../models/llama-2-13b-chat/llama-2-13b-chat-hf-f32.bin
- Then I quantised it (I am using q5_k_m and not q4_0)
./quantize ../models/llama-2-13b-chat/llama-2-13b-chat-hf-f32.bin ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin q5_k_m
At this point I tested the models with ./main and they worked perfectly.
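(Roughly like this, just as an illustration and not my exact invocation:
./main -m ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin -p "What is quantum entanglement?" -n 128
and the same with the f32 file.)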
- So I created a dataset, trying to orient myself among many contradictory answers about the format; in the end I opted for the one that seemed the most widely used (a rough sketch of how I assemble the file follows the example below):
<s>[INST] <<SYS>>
In the context of physics
<</SYS>>What is quantum entanglement? [/INST] Quantum entanglement is the phenomenon that occurs when a group of particles are generated, interact, or share in such a way that the quantum state of each particle of the group cannot be described independently of the state of the others, including when the particles are separated by a large distance.
(I didn't put any </s> at the end, because for some reason the loss became NaN after fewer than 10 iterations)
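For completeness, this is roughly how I build train_llamacpp.txt (a minimal sketch only; qa_pairs is a placeholder for my real system/question/answer triples):

# sketch: write the training samples in the prompt format shown above
# (qa_pairs is illustrative; my real dataset is loaded from elsewhere)
qa_pairs = [
    ("In the context of physics",
     "What is quantum entanglement?",
     "Quantum entanglement is the phenomenon that occurs when ..."),
]

with open("train_llamacpp.txt", "w", encoding="utf-8") as f:
    for system, question, answer in qa_pairs:
        f.write(
            "<s>[INST] <<SYS>>\n"
            f"{system}\n"
            f"<</SYS>>{question} [/INST] {answer}\n"
        )
        # note: no closing </s>, since adding it made the loss go to NaN

Every sample starts with "<s>", which is why I pass --sample-start "<s>" to ./finetune.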
- I started the fine-tuning and left it running for almost 12 hours to complete 1 epoch
./finetune --model-base ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin --train-data ../datasets/FineTune/train_llamacpp.txt --threads 26 --sample-start "<s>" --ctx 512 -ngl 32
- Then I tested the model
./main -i -m ../models/llama-2-13b-chat/llama-2-13b-chat-hf-quantized_q5_k_m.bin --lora-base ../models/llama-2-13b-chat/llama-2-13b-chat-hf-f32.bin --lora ../models/llama-2-13b-chat/ggml-lora-LATEST-f32.gguf --color -p "What is entanglement in physics?"
The last lines of the logs are always:
.....
llama_apply_lora_from_file_internal: unsupported tensor dimension 1
llama_init_from_gpt_params: error: failed to apply lora adapter
ggml_metal_free: deallocating
main: error: unable to load model
I am running everything on an M1 Max with 64 GB of RAM and a 32-core GPU.
What could be the problem? I have already tried several things without success, which is why I'm writing here...
Thank you for any help
Luca