(draft) tts: Orpheus support #12487
Conversation
Working on each part incrementally; added a rough draft of SNAC conversion to .gguf.
Good job. Let us know if you have any questions. You might also find some answers by looking at the commits of the OuteTTS PR: #10784
SNAC uses the snake activation function. Added scaffolding to include `GGML_OP_SNAKE` as a new op. Should this be a unary op? The SNAC decoder uses noise blocks to enhance outputs; they're optional, so I'm omitting them for now until the model is integrated end-to-end. Next steps: write the `llm_graph_context` for SNAC, integrate the LM (seems straightforward, it's llama3), rewrite/extend/add to tts.cpp, then fix bugs and optimize.
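For reference, the snake activation (Ziyin et al., 2020) used in SNAC's decoder is snake(x) = x + sin²(αx)/α with a learnable per-channel α. Below is a minimal CPU sketch of those semantics, assuming a contiguous channels-by-samples layout; it is illustrative only, not the `GGML_OP_SNAKE` kernel from this PR:

```cpp
// Reference semantics for snake: snake(x) = x + sin^2(a*x)/a,
// with one learnable alpha per channel. Layout and names are assumptions.
#include <cmath>

static void snake_forward(float * x, const float * alpha,
                          int n_channels, int n_samples) {
    for (int c = 0; c < n_channels; ++c) {
        // guard against division by zero for a degenerate alpha
        const float a = alpha[c] != 0.0f ? alpha[c] : 1e-9f;
        for (int i = 0; i < n_samples; ++i) {
            float & v = x[c * n_samples + i];
            const float s = std::sin(a * v);
            v += (s * s) / a;
        }
    }
}
```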
I'm still working on this PR. Orpheus is outputting tokens fine; now ironing out issues in the SNAC graph. I'm aiming to get a reviewable PR out in a few days.
WIP orpheus tts
```
@@ -1391,6 +1392,55 @@ static const std::map<llm_arch, std::map<llm_tensor, const char *>> LLM_TENSOR_NAMES = {
        { LLM_TENSOR_POS_NET_ATTN_OUT, "posnet.%d.attn_output" },
    },
},
{
    LLM_ARCH_SNAC_DEC,
```
Fix this. Do we really need to create a new tensor type for every sub-block and res unit?
For now, yes, but this will probably be reworked soon. In the meantime, follow the existing pattern.
Ran forward passes with dummy codes. Output tensor shapes (raw audio samples) seem to match the expected shape given the number of input frames. Attempts with Orpheus coming soon. The gguf used in this commit is at: https://huggingface.co/jamorphy/snac-fwd-pass-devel-gguf
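As a sanity check on those shapes: each SNAC decoder block upsamples by some rate, so the output sample count should be the input frame count times the product of the rates. A hedged sketch of that check, where the rates array is a placeholder for whatever the converted model actually stores:

```cpp
// Hypothetical shape check: with decoder upsample rates r_i, the output
// should hold n_frames * prod(r_i) audio samples. Names are illustrative.
#include <cstdint>

static int64_t snac_expected_samples(int64_t n_frames,
                                     const int * rates, int n_rates) {
    int64_t hop = 1;
    for (int i = 0; i < n_rates; ++i) {
        hop *= rates[i];
    }
    return n_frames * hop;
}

// usage sketch: GGML_ASSERT(out->ne[0] == snac_expected_samples(n_frames, rates, n_rates));
```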
A forward pass
Running into speed troubles during graph compute, likely due to some operations being done on the CPU. Is there a profiling tool for the compute graph, or something similar? For now I'm logging in
For profiling individual ops, you can use:

```sh
# profile GGML_OP_ADD (see the source code for the defined perf tests)
./bin/test-backend-ops -o ADD perf
```

Although for now I think you can just focus on correctness and leave the performance optimizations for later.
Re. your question about performance: some conv_1d ops may not be available on all backends, so I suspect there are many copies back and forth between CPU and GPU.
My kyutai-mimi.cpp implementation runs much faster on CPU than on GPU because of this. Btw, I usually experiment in ggml-easy first, as there are many debugging tools there, then copy the cgraph over to llama.cpp once I'm happy with it. This could probably help you run faster experiments on ggml.
```cpp
cur = ggml_snake(ctx0, cur, alpha);

ggml_tensor * w = layer.decoder_blocks[1].up_weight;
ggml_tensor * s = ggml_cpy(ctx0, layer.decoder_blocks[1].up_scale,
```
Out of curiosity, why do we need to copy the tensor here?
Ran into many type mismatches: some ops expect f16 and others f32. ggml_cpy is just a workaround, and I suspect it may be the cause of the slowness. If I remember correctly, the bottleneck was ggml_mul running on the CPU.
In this case, you can use ggml_cast. But the best option is to force the dtype of this tensor to F16 when converting to GGUF.
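For what it's worth, a minimal sketch of that suggestion, reusing the tensor names from the draft diff above (which may change): ggml_cast inserts the type conversion into the graph without requiring a pre-allocated destination tensor the way ggml_cpy does.

```cpp
// Sketch: replace the ggml_cpy workaround with an explicit cast to F16.
// Field names follow the draft diff above and are not final.
ggml_tensor * w = layer.decoder_blocks[1].up_weight;
ggml_tensor * s = ggml_cast(ctx0, layer.decoder_blocks[1].up_scale, GGML_TYPE_F16);
```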
It's been some time since I looked at this, but I'll check out ggml-easy.
Has somebody already done this? https://github.com/foldl/chatllm.cpp/blob/master/models/orpheus.cpp
A rough draft of SNAC conversion to .gguf with convert_hf_to_gguf.py. I will add support for this model incrementally; in the meantime, the PR may be helpful to others.

The upstream config.json (https://huggingface.co/hubertsiuzdak/snac_24khz/resolve/main/config.json) does not contain the following, which I added manually:

This gets conversion working, but I will need to make some tweaks to infer this information from the weights and avoid changes to config.json. Next steps are to try decoding with some sample Orpheus tokens.

Reference issue: #12476