🚀 The feature, motivation and pitch
As a follow-up to #1518, where we added an enum for tokenizer types to simplify `TokenizerArgs.__post_init__`, we need to improve it further to make onboarding new tokenizer types simpler:
Tasks
- Move TokenizerType to a centralized place
- We now have two of them:
Lines 67 to 69 in 0299a37
torchchat/torchchat/cli/builder.py
Lines 241 to 245 in 0299a37
- Check all getters of tokenizer types
- They may be simple enough to replace with inline checks
torchchat/torchchat/generate.py
Line 368 in 0299a37
- Add documentation for onboarding future tokenizers.
- We may need to point people to the model validation logic that has to be updated:
torchchat/torchchat/cli/builder.py
Lines 290 to 322 in 0299a37
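A rough sketch of where the tasks above could end up, not the actual torchchat code: a single shared `TokenizerType` enum, with callers comparing against it inline instead of going through per-type getters. The enum member names, the simplified `TokenizerArgs` fields, and the `validate_tokenizer` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TokenizerType(Enum):
    """Single shared definition; the member names here are assumptions."""

    NONE = 0
    TIKTOKEN = 1
    SENTENCEPIECE = 2
    HF_TOKENIZER = 3


@dataclass
class TokenizerArgs:
    """Simplified stand-in for the real TokenizerArgs (hypothetical fields)."""

    tokenizer_type: TokenizerType = TokenizerType.NONE


def validate_tokenizer(args: TokenizerArgs, expected: TokenizerType) -> None:
    """Hypothetical validation hook: compares enum values inline, no getters."""
    if args.tokenizer_type != expected:
        raise RuntimeError(
            f"model expects {expected.name}, "
            f"but tokenizer is {args.tokenizer_type.name}"
        )


# Matching types pass silently; a mismatch raises RuntimeError.
args = TokenizerArgs(tokenizer_type=TokenizerType.TIKTOKEN)
validate_tokenizer(args, TokenizerType.TIKTOKEN)
```

With this shape, onboarding a new tokenizer would mean adding one enum member and one branch wherever the type is consumed, which is also what the onboarding documentation could describe.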
To test, run a model with each tokenizer type:
- `python torchchat.py generate llama2`
- `python torchchat.py generate llama3`
- `python torchchat.py generate granite-code`
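If it helps, the three runs can be chained in a small script. This is only a convenience sketch: the `build_commands` and `run_all` helpers are hypothetical, and it assumes the script is started from the repository root with the models already available locally.

```python
import subprocess
import sys

# Model aliases taken from the commands listed above.
MODELS = ["llama2", "llama3", "granite-code"]


def build_commands(models):
    """Return one `torchchat.py generate <model>` argv list per model."""
    return [[sys.executable, "torchchat.py", "generate", m] for m in models]


def run_all(models):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in build_commands(models):
        subprocess.run(cmd, check=True)
```

Calling `run_all(MODELS)` then exercises each model in sequence and raises `subprocess.CalledProcessError` as soon as one run exits with a non-zero status.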
cc @Jack-Khuu @byjlw