Description
This was possible earlier, before the BPE pre-tokenizer fixes. Now the conversion fails with:
File "/kaggle/working/llama.cpp/./convert-hf-to-gguf.py", line 432, in get_vocab_base_pre
raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
I thought this would be easy to solve by updating the hashes, but even then I cannot get past `llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'phi2'`, seemingly not without code changes.
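For reference, here is a minimal sketch of the dispatch shape in `get_vocab_base_pre()` as I understand it (the check string and digest below are placeholders, not the upstream constants). A new model needs its digest added here; the resolved name is then written into the GGUF metadata, and llama.cpp must recognize that exact string at load time, which is where `'phi2'` fails:

```python
import logging
from hashlib import sha256

logger = logging.getLogger(__name__)

# Sketch of the dispatch inside get_vocab_base_pre(). The script hashes the
# tokenization of a fixed check string and matches the digest against a list
# of known models; unrecognized digests raise instead of defaulting.
def get_vocab_base_pre_sketch(tokenizer) -> str:
    chktxt = "fixed check string"  # placeholder, not the upstream check text
    chkhsh = sha256(str(tokenizer.encode(chktxt)).encode()).hexdigest()

    res = None
    if chkhsh == "0000000000000000000000000000000000000000000000000000000000000000":
        # placeholder digest; ref: https://huggingface.co/microsoft/phi-2 (assumed)
        res = "phi-2"

    if res is None:
        logger.warning("unrecognized BPE pre-tokenizer hash: %s", chkhsh)
        raise NotImplementedError(
            "BPE pre-tokenizer was not recognized - update get_vocab_base_pre()"
        )
    return res
```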
How is this supposed to be done? Like so? And why does the script break when there is no match for the pre-tokenizer string, instead of just falling back to a default, as illustrated in this diff?
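To make that second question concrete, here is a hedged sketch of the fallback behavior I mean (the function and table names are hypothetical, not the upstream API):

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical variant: look the digest up in a table and fall back to
# "default" with a loud warning instead of raising.
def get_vocab_base_pre_with_fallback(chkhsh: str, known_hashes: dict[str, str]) -> str:
    res = known_hashes.get(chkhsh)
    if res is None:
        logger.warning(
            "BPE pre-tokenizer hash %s not recognized; falling back to 'default'",
            chkhsh,
        )
        res = "default"  # assumption: the loader accepts "default" as a pre type
    return res
```

Presumably upstream raises instead of defaulting because a wrong pre-tokenizer regex produces subtly wrong token splits, so failing loudly at conversion time is considered safer than silently degrading output at inference time.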