Initial Mamba support (CPU-only) was recently introduced in #5328 by @compilade.
To run these models efficiently on the GPU, we appear to be missing kernel implementations for the following two ops (a naive sketch of both follows the list):
- `GGML_OP_SSM_CONV`
- `GGML_OP_SSM_SCAN`
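For reference, here is a minimal CUDA sketch of the math behind these two ops as described in the Mamba paper: `GGML_OP_SSM_CONV` is roughly a short causal depthwise 1-D convolution, and `GGML_OP_SSM_SCAN` is the sequential selective-scan recurrence. The kernel names, tensor layouts, and launch parameters below are illustrative assumptions and do not match ggml's actual memory layout for these ops:

```cuda
#include <cuda_runtime.h>

#define D_STATE 16  // typical Mamba SSM state size

// Causal depthwise 1-D convolution (roughly what GGML_OP_SSM_CONV computes):
// each channel d has its own short filter w[d] of length d_conv (~4).
__global__ void ssm_conv_naive(
        const float *x,   // [seq_len, d_inner] input
        const float *w,   // [d_inner, d_conv]  per-channel filters
        float       *out, // [seq_len, d_inner]
        int seq_len, int d_inner, int d_conv)
{
    const int d = blockIdx.x * blockDim.x + threadIdx.x;
    const int t = blockIdx.y;
    if (d >= d_inner || t >= seq_len) return;

    float acc = 0.0f;
    for (int k = 0; k < d_conv; ++k) {
        const int ts = t - (d_conv - 1) + k;  // causal: only look backwards
        if (ts >= 0) acc += w[d*d_conv + k] * x[ts*d_inner + d];
    }
    out[t*d_inner + d] = acc;
}

// Selective scan (the core of GGML_OP_SSM_SCAN), per channel d and state n:
//   h[n]   = exp(dt[t,d] * A[d,n]) * h[n] + dt[t,d] * B[t,n] * x[t,d]
//   y[t,d] = sum_n C[t,n] * h[n]
// The D skip connection and gating are left out here. One thread scans one
// channel sequentially over time, which keeps the recurrence exact.
__global__ void ssm_scan_naive(
        const float *x,   // [seq_len, d_inner] input activations
        const float *dt,  // [seq_len, d_inner] discretization step
        const float *A,   // [d_inner, D_STATE] state transition (negative entries)
        const float *B,   // [seq_len, D_STATE] input projection
        const float *C,   // [seq_len, D_STATE] output projection
        float       *y,   // [seq_len, d_inner] output
        int seq_len, int d_inner)
{
    const int d = blockIdx.x * blockDim.x + threadIdx.x;
    if (d >= d_inner) return;

    float h[D_STATE] = {0.0f};  // per-channel hidden state, held in registers

    for (int t = 0; t < seq_len; ++t) {
        const float xt  = x [t*d_inner + d];
        const float dtt = dt[t*d_inner + d];
        float yt = 0.0f;
        for (int n = 0; n < D_STATE; ++n) {
            const float dA = expf(dtt * A[d*D_STATE + n]); // in (0,1) since A < 0
            const float dB = dtt * B[t*D_STATE + n];
            h[n] = dA * h[n] + dB * xt;
            yt  += C[t*D_STATE + n] * h[n];
        }
        y[t*d_inner + d] = yt;
    }
}
```

Note that the naive per-channel scan serializes over the sequence, which is fine for single-token decoding (seq_len == 1) but leaves prompt processing underparallelized; the usual optimizations (e.g. a chunked/parallel scan as in the official Mamba CUDA kernels) can come later.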
Creating this issue to keep track of this work and give more visibility to the feature. Help with implementing the missing kernels for CUDA and Metal (and potentially other backends) is welcome. We can also discuss whether anything else is needed to better support this architecture in llama.cpp.