Commit f0995d2

metal : use FA-vec kernel up to batch size 20 (#13496)
* batched-bench : fix pp batch contents
* metal : optimize multi-sequence FA vec kernel
  ggml-ci
* metal : use FA-vec kernel up to batch size 20
  ggml-ci
1 parent c252e0c commit f0995d2

1 file changed: +1 -1 lines changed

ggml/src/ggml-metal/ggml-metal.m

Lines changed: 1 addition & 1 deletion
@@ -4358,7 +4358,7 @@ static bool ggml_metal_encode_node(
     // TODO: add vec kernels for (ne00%64 == 0) and maybe also for (ne00%32 == 0)
     // for now avoiding mainly to keep the number of templates/kernels a bit lower
     // these are now trivial to add after: https://github.com/ggml-org/llama.cpp/pull/12612
-    if (ne01 >= 4 || (ne00%128 != 0 && ne00 != 96 && ne00 != 192 && ne00 != 576)) {
+    if (ne01 >= 20 || (ne00%128 != 0 && ne00 != 96 && ne00 != 192 && ne00 != 576)) {
         switch (src1->type) {
             case GGML_TYPE_F16:
                 {
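
To make the effect of the one-line change easier to see, here is a minimal sketch of the selection predicate in isolation. It assumes, based on the commit title, that `ne01` plays the role of the batch size and that the `if` branch shown in the diff is the non-vec path, so the FA-vec kernel is chosen when the condition is false. The helper names `fa_vec_head_size_ok` and `fa_use_vec_kernel` and the `max_batch` parameter are hypothetical and do not exist in ggml.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper (not part of ggml): true when ne00 is one of the
// sizes that the condition in the diff does not exclude.
static bool fa_vec_head_size_ok(int64_t ne00) {
    return ne00 % 128 == 0 || ne00 == 96 || ne00 == 192 || ne00 == 576;
}

// Hypothetical helper: the logical negation of the `if` condition in the diff.
// Assumption: the `if` branch is the non-vec path, so a false condition
// selects the FA-vec kernel. max_batch is the exclusive limit on ne01:
// 4 before this commit, 20 after it.
static bool fa_use_vec_kernel(int64_t ne01, int64_t ne00, int64_t max_batch) {
    return ne01 < max_batch && fa_vec_head_size_ok(ne00);
}
```

Under these assumptions, `fa_use_vec_kernel(8, 128, 4)` is false while `fa_use_vec_kernel(8, 128, 20)` is true, which is the case this commit changes. With the condition as written, `ne01 == 20` itself still falls outside the vec path.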
