Feature Description
Modify llama.cpp to support on-the-fly "Frankenmerging" of the model in memory with itself.
Motivation
Frankenmerges, including auto-Frankenmerges, are becoming increasingly popular and appear to have properties that merit further study. This is Rich Sutton's "bitter lesson" in the small: stacking more decoder blocks means a greater total amount of computation in a single inference pass, and, perhaps surprisingly, under the right circumstances that greater accessible computation outweighs the noise induced by performing fairly brutal surgery on the ordering of decoder blocks.
Right now, experimentation takes place at the level of building new models with mergekit, which is slow. The ability to mix and match decoder blocks on the fly in llama.cpp would speed up iteration and experimentation, helping to better characterize the tradeoff between greater available net computation and the noise introduced by decoder surgery.
Possible Implementation
Something like this:
https://github.com/semiring/IRL-llama.cpp/blob/master/llama.cpp#L4346