bug: tokenizer only matches exact words and no subwords #85

Closed
@Green-Sky

For example, spacestation should probably be tokenized as space + station</w>, but right now the whole word is treated as a single unhandled token and ends up as <|endoftext|>:

[DEBUG] stable-diffusion.cpp:1077 - parse 'spacestation' to [['spacestation', 1], ]
[DEBUG] stable-diffusion.cpp:469  - split prompt "spacestation" to tokens ["<|endoftext|>", ]

(The resulting images have nothing to do with a space station.)
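
To illustrate the expected behaviour, here is a minimal sketch of CLIP-style byte-pair merging in Python. It is not the project's code: the `bpe` helper and the `TOY_MERGES` table are hypothetical, and the merge list is hand-written so that the example word splits the way the issue suggests. A real tokenizer would load its merge ranks from the model's vocabulary file, and the actual split of this word may differ.

```python
# Toy merge table (hypothetical, NOT the real CLIP merges), in priority order.
TOY_MERGES = [
    ("s", "p"), ("a", "c"), ("sp", "ac"), ("spac", "e"),
    ("s", "t"), ("a", "t"), ("st", "at"),
    ("i", "o"), ("io", "n</w>"), ("stat", "ion</w>"),
]
MERGE_RANKS = {pair: rank for rank, pair in enumerate(TOY_MERGES)}


def bpe(word, merge_ranks):
    """Split a word into subword units by repeatedly applying the best-ranked merge."""
    if not word:
        return []
    # Start from single characters; the last one carries the end-of-word marker.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    while len(symbols) > 1:
        pairs = set(zip(symbols, symbols[1:]))
        # Pick the adjacent pair with the lowest rank (highest priority).
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no known merge applies any more
        first, second = best
        merged = []
        i = 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == first and symbols[i + 1] == second:
                merged.append(first + second)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


print(bpe("spacestation", MERGE_RANKS))
```

With this toy table the word comes out as space and station</w> instead of a single unknown token. Each resulting piece would then be looked up in the vocabulary, so only pieces that are genuinely missing would need a fallback.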
