Description
Tagging @JohannesGaessler for visibility!
TLDR:
I'm running imatrix.cpp (latest llama.cpp) with --ubatch-size 8192, but am getting CUDA errors. My suspicion is that CUDA requires certain arguments to be < INT_MAX (2^31 - 1), so large physical batch sizes cause CUDA launch errors for MoEs. --ubatch-size 8191 works fine; 8192 does not.
Long form:
I'm running imatrix.cpp with a large physical batch size (8192), but sadly I get this error:
CUDA error: invalid configuration argument
current device: 0, in function ggml_cuda_mul_mat_id at llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2062
cudaGetLastError()
I.e., the error is raised here:
get_rows_cuda(src1->data, src1->type, ids_to_sorted, src1_sorted.ptr, type_src1_sorted,
    ne10, nb11, nb12, nb13,
    ne_get_rows, 1, 1, sizeof(int32_t), ne_get_rows*sizeof(int32_t), ne_get_rows*sizeof(int32_t),
    ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted, stream);
CUDA_CHECK(cudaGetLastError());
Using --ubatch-size 8192 causes the error to occur on Qwen 3 30B MoE; --ubatch-size 8191 works fine.
My suspicion is that CUDA requires these arguments to be < INT_MAX. Qwen 3 30B has 128 experts and a hidden dim of 2048, so 8192 * 2048 * 128 = 2147483648 > 2147483647 (INT_MAX), whereas 8191 * 2048 * 128 = 2147221504 < INT_MAX.
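For concreteness, a minimal standalone sketch of that arithmetic (the 128 experts and 2048 dim are from above; accumulating the product in 64 bits and comparing against INT_MAX is just to illustrate the suspected 32-bit overflow):

#include <climits>
#include <cstdint>
#include <cstdio>

int main() {
    // Qwen 3 30B MoE: 128 experts, hidden dim 2048.
    const int64_t n_expert = 128;
    const int64_t n_dim    = 2048;

    for (const int64_t ubatch : {8191LL, 8192LL}) {
        // Compute in 64 bits to see the true value, then check whether
        // it would still fit in a 32-bit int.
        const int64_t product = ubatch * n_dim * n_expert;
        printf("ubatch=%lld -> %lld (%s INT_MAX)\n",
               (long long) ubatch, (long long) product,
               product > INT_MAX ? "exceeds" : "fits under");
    }
    return 0;
}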
I.e., one of the arguments
ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted, ne_get_rows*ne10*ts_src1_sorted
is exceeding INT_MAX, thus causing CUDA to error out.
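If that is what's happening, a guard along these lines in ggml_cuda_mul_mat_id would flag the case before the launch (a sketch only: the names ne10, ne_get_rows, and ts_src1_sorted come from the snippet above, but the check itself is hypothetical, not existing llama.cpp code):

// Hypothetical guard: compute the byte strides in 64 bits and assert
// they fit in a 32-bit int before passing them to get_rows_cuda.
const int64_t nb_row   = ne10 * ts_src1_sorted;    // per-row byte stride
const int64_t nb_plane = ne_get_rows * nb_row;     // per-plane byte stride
GGML_ASSERT(nb_row   <= INT_MAX && "row stride overflows 32-bit int");
GGML_ASSERT(nb_plane <= INT_MAX && "plane stride overflows 32-bit int");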