
Llama3 GGUF conversion with merged LORA Adapter seems to lose training data randomly #7062

Closed
@Sneakr

Description


I'm using Unsloth to fine-tune the llama3-8b Instruct model with a LoRA adapter.

1: I merge the base model with the LoRA adapter into safetensors (a sketch of this step is below).
2: Running inference in Python, both with the merged model directly and with the Unsloth-loaded model with the adapter on top, produces correct outputs as per the fine-tune.
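
For reference, the merge step looks roughly like this with plain PEFT instead of Unsloth's own saving helpers (an equivalent path; model and adapter paths are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base Instruct model in fp16 and apply the trained LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Fold the adapter weights into the base weights and save as safetensors;
# this directory is what later goes into the GGUF converter.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model", safe_serialization=True)
AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct"
).save_pretrained("merged-model")
```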

Bug:
GGUF conversion of the merged model does not produce the same output. The GGUF has lost some of its fine-tuned behavior while still retaining most of it.

I can ask it who it is, who created it, etc., and it responds with Llama and Meta as usual, but it incorporates the fine-tuned speech style and humor into the response. My fine-tuned model (before conversion) does not answer this way. A minimal side-by-side check is sketched below.
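
This is the kind of comparison I mean, assuming the GGUF is run through llama-cpp-python (file names are placeholders; raw prompt without the chat template, greedy decoding so the two runs are comparable):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_cpp import Llama

prompt = "Who created you?"

# Merged safetensors model through the HF loader (greedy decoding).
tok = AutoTokenizer.from_pretrained("merged-model")
hf = AutoModelForCausalLM.from_pretrained(
    "merged-model", torch_dtype=torch.float16, device_map="auto"
)
ids = tok(prompt, return_tensors="pt").to(hf.device)
out = hf.generate(**ids, max_new_tokens=64, do_sample=False)
print("HF  :", tok.decode(out[0], skip_special_tokens=True))

# Same prompt through the converted GGUF (temperature 0 for comparability).
gguf = Llama(model_path="merged-model-f16.gguf", n_ctx=2048)
print("GGUF:", gguf(prompt, max_tokens=64, temperature=0.0)["choices"][0]["text"])
```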

1: I tried merging the LoRA adapter with the original (non-fine-tuned) GGUF using llama.cpp: same results.
2: I tried running the llama.cpp server on the original (non-fine-tuned) GGUF with the adapter loaded via the server's command-line option: same results (a Python equivalent is sketched below).
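
The second experiment can also be reproduced from Python via llama-cpp-python's lora_path argument (my assumption of an equivalent setup; the adapter must already be converted to the format llama.cpp expects, and all paths are placeholders):

```python
from llama_cpp import Llama

# Original (non-fine-tuned) GGUF with the LoRA adapter applied at load time.
llm = Llama(
    model_path="llama3-8b-instruct-f16.gguf",  # base model GGUF, placeholder name
    lora_path="lora-adapter-converted.bin",    # adapter in llama.cpp's format, placeholder
    n_ctx=2048,
)
print(llm("Who created you?", max_tokens=64, temperature=0.0)["choices"][0]["text"])
```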

It seems that GGUF conversion is randomly losing fine-tuned behavior.

If this is the case, all GGUF conversions of fine-tuned models are basically out the window. The question then is how much non-fine-tuned models are affected by this.

I've tried F16 and Q8; same issues.

This is not a quantization issue: I get the exact same (correct) results running FP16 as well as 4-bit in Python with the HF loader or Unsloth; both work fine, as mentioned.
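
For completeness, this is the sanity check I mean, loading the merged model in both precisions through transformers with bitsandbytes 4-bit quantization (paths are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

tok = AutoTokenizer.from_pretrained("merged-model")
inputs = tok("Who created you?", return_tensors="pt")

# Load the merged model twice: plain fp16 and bitsandbytes 4-bit.
configs = {
    "fp16": dict(torch_dtype=torch.float16),
    "4bit": dict(quantization_config=BitsAndBytesConfig(load_in_4bit=True)),
}
for label, kwargs in configs.items():
    model = AutoModelForCausalLM.from_pretrained(
        "merged-model", device_map="auto", **kwargs
    )
    out = model.generate(**inputs.to(model.device), max_new_tokens=64, do_sample=False)
    print(label, ":", tok.decode(out[0], skip_special_tokens=True))
```

Both variants keep the fine-tuned identity and style, which is why I rule out quantization itself as the cause.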
