CUDA cannot generate images 

I ran into a strange problem: when running with CUDA, the output is a solid green image. The same thing works fine on another computer, though.

```
sd_cuda.exe  -m meinamix_meinaV11-f16.gguf -p "1girl" -v
Option:
    n_threads:       6
    mode:            txt2img
    model_path:      meinamix_meinaV11-f16.gguf
    output_path:     output.png
    init_img:
    prompt:          1girl
    negative_prompt:
    cfg_scale:       7.00
    width:           512
    height:          512
    sample_method:   euler_a
    schedule:        default
    sample_steps:    20
    strength:        0.75
    rng:             cuda
    seed:            42
    batch_count:     1
System Info:
    BLAS = 1
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:3701 - Using CUDA backend
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1
[INFO]  stable-diffusion.cpp:3715 - loading model from 'meinamix_meinaV11-f16.gguf'
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv   0:                              sd.model.name str
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv   1:                             sd.model.dtype i32
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv   2:                           sd.model.version i8
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv   3:                            sd.vocab.tokens arr
[INFO]  stable-diffusion.cpp:3743 - Stable Diffusion 1.x | meinamix_meinaV11.safetensors
[INFO]  stable-diffusion.cpp:3751 - model data type: f16
[DEBUG] stable-diffusion.cpp:3755 - loading vocab
[DEBUG] stable-diffusion.cpp:3771 - ggml tensor size = 416 bytes
[DEBUG] stable-diffusion.cpp:887  - clip params backend buffer size =  236.18 MB (449 tensors)
[DEBUG] stable-diffusion.cpp:2028 - unet params backend buffer size =  1641.16 MB (706 tensors)
[DEBUG] stable-diffusion.cpp:3118 - vae params backend buffer size =  95.47 MB (164 tensors)
[DEBUG] stable-diffusion.cpp:3780 - preparing memory for the weights
[DEBUG] stable-diffusion.cpp:3798 - loading weights
[DEBUG] stable-diffusion.cpp:3903 - model size = 1969.67MB
[INFO]  stable-diffusion.cpp:3913 - total memory buffer size = 1972.80MB (clip 236.18MB, unet 1641.16MB, vae 95.47MB)
[INFO]  stable-diffusion.cpp:3915 - loading model from 'meinamix_meinaV11-f16.gguf' completed, taking 0.92s
[INFO]  stable-diffusion.cpp:3939 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:3966 - finished loaded file
[DEBUG] stable-diffusion.cpp:4647 - prompt after extract and remove lora: "1girl"
[INFO]  stable-diffusion.cpp:4652 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:1118 - parse '1girl' to [['1girl', 1], ]
[DEBUG] stable-diffusion.cpp:521  - split prompt "1girl" to tokens ["1</w>", "girl</w>", ]
[DEBUG] stable-diffusion.cpp:1051 - learned condition compute buffer size: 1.58 MB
[DEBUG] stable-diffusion.cpp:4061 - computing condition graph completed, taking 455 ms
[DEBUG] stable-diffusion.cpp:1118 - parse '' to [['', 1], ]
[DEBUG] stable-diffusion.cpp:521  - split prompt "" to tokens []
[DEBUG] stable-diffusion.cpp:1051 - learned condition compute buffer size: 1.58 MB
[DEBUG] stable-diffusion.cpp:4061 - computing condition graph completed, taking 415 ms
[INFO]  stable-diffusion.cpp:4681 - get_learned_condition completed, taking 876 ms
[INFO]  stable-diffusion.cpp:4691 - sampling using Euler A method
[INFO]  stable-diffusion.cpp:4694 - generating image: 1/1
[DEBUG] stable-diffusion.cpp:2384 - diffusion compute buffer size: 552.57 MB
  |==================================================| 20/20 - 7.42s/it
[INFO]  stable-diffusion.cpp:4706 - sampling completed, taking 157.10s
[INFO]  stable-diffusion.cpp:4714 - generating 1 latent images completed, taking 157.12s
[INFO]  stable-diffusion.cpp:4716 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:3252 - vae compute buffer size: 1664.00 MB
[DEBUG] stable-diffusion.cpp:4605 - computing vae [mode: DECODE] graph completed, taking 6.65s
[INFO]  stable-diffusion.cpp:4724 - latent 1 decoded, taking 6.66s
[INFO]  stable-diffusion.cpp:4728 - decode_first_stage completed, taking 6.66s
[INFO]  stable-diffusion.cpp:4735 - txt2img completed in 164.66s
save result image to 'output.png'


```
![output](https://github.com/leejet/stable-diffusion.cpp/assets/5035199/d6208828-24dc-48f1-9b61-5f61ed0fa912)


CUDA cannot generate images #95