
convert-hf-to-gguf.py breaks on phi-2 #7219

Open
@CrispStrobe

Description


Converting phi-2 worked before the BPE pre-tokenizer fixes. Now it fails with:
File "/kaggle/working/llama.cpp/./convert-hf-to-gguf.py", line 432, in get_vocab_base_pre
raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
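
For context, get_vocab_base_pre() identifies the pre-tokenizer by checksum rather than by name: it encodes a fixed probe string with the model's tokenizer, hashes the resulting token IDs, and matches the hash against a list of known checksums. A minimal sketch of that logic (paraphrased, not the upstream code; the probe text and hash below are placeholders):

```python
from hashlib import sha256

def get_vocab_base_pre(tokenizer) -> str:
    # Tokenize a fixed probe string and hash the token IDs; the real
    # script ships a long multilingual probe text here.
    chktxt = "..."  # placeholder for the real probe text
    chktok = tokenizer.encode(chktxt)
    chkhsh = sha256(str(chktok).encode()).hexdigest()

    res = None
    if chkhsh == "<known checksum for llama-bpe>":  # placeholder value
        res = "llama-bpe"
    # ... one branch per known tokenizer ...

    if res is None:
        # this is the exception shown in the traceback above
        raise NotImplementedError(
            "BPE pre-tokenizer was not recognized - update get_vocab_base_pre()"
        )
    return res
```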

I thought this would be easy to solve by updating the hashes, but even then I cannot get past "llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'phi2'" without further code changes.

How is this supposed to be done? Like so? And why does the script break outright when there is no match for the pre-tokenizer string, instead of just falling back to a default, as illustrated in this diff?
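
On the converter side, the change would presumably amount to computing the new checksum and adding a branch for it. A sketch of computing that checksum (the probe text is a placeholder and must be copied verbatim from the script for the hash to match):

```python
# Compute the checksum a new `if chkhsh == ...: res = "phi-2"` branch
# would match on. Assumes transformers is installed; chktxt must be the
# exact probe text from convert-hf-to-gguf.py (elided here).
from hashlib import sha256
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
chktxt = "..."  # placeholder: copy the probe text from the script
chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()
print(chkhsh)
```

The second error ("unknown pre-tokenizer type: 'phi2'") is raised at model load time, which suggests the string written into the GGUF also has to be one the llama.cpp C++ vocab loader recognizes, so a converter-only change may not be enough.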
