Skip to content

llama : refactor model loading code #1991

Closed
@ggerganov

Description

@ggerganov

In llama.cpp we have logic for supporting some very old model formats and features such as sharded models which is making the code unnecessary complicated and difficult to maintain. We should simplify it and remove support for old stuff that is no longer used.

Additionally, with the upcoming unified file format (ggml-org/ggml#220) we will have to look into reimplementing the code to use it and add support for loading non-LLaMA models as well. This will be an important step towards adding inference of new models such as MPT and Falcon. Therefore, simplifying the logic as much as possible will help to easily adopt the new unified file format when it is ready

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions