Commit 08cfede

Moved local update kernels to separate functions that take fewer template params
Removed unnecessary template parameters from the kernel names submitted by these functions. As a consequence, the size of the `_tensor_accumulation_impl` shared object decreased from 49'360'152 bytes to 36'422'888 bytes, that is, by almost 13MB.
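
The effect described above can be sketched in plain C++. This is an illustration only, not the code from this commit: the tag classes, indexer and operator types, and function names below are all hypothetical, and a `std::set` of `std::type_index` stands in for the set of distinct kernels the compiler would have to emit. The idea is that a kernel name templated on every parameter of its caller yields one kernel per combination, whereas a helper templated only on what the kernel actually uses yields far fewer.

```cpp
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>

// Hypothetical parameter types a caller might be templated on.
struct StridedIndexer {};
struct ContigIndexer {};
struct SumOp {};
struct MaxOp {};

// "Before": the kernel name carries every template parameter of the caller.
template <typename T, typename IndexerT, typename OpT>
class local_update_krn_wide {};

// "After": the kernel name carries only the parameter the kernel depends on.
template <typename T>
class local_update_krn_narrow {};

// Each distinct kernel-name type stands for one emitted kernel.
using Registry = std::set<std::type_index>;

template <typename T, typename IndexerT, typename OpT>
void submit_wide(Registry &seen) {
    seen.insert(std::type_index(typeid(local_update_krn_wide<T, IndexerT, OpT>)));
}

// Separate function with fewer template parameters.
template <typename T>
void submit_local_update(Registry &seen) {
    seen.insert(std::type_index(typeid(local_update_krn_narrow<T>)));
}

template <typename T, typename IndexerT, typename OpT>
void submit_narrow(Registry &seen) {
    // Indexer- and operator-specific work would remain in this function;
    // only the local update step is delegated.
    submit_local_update<T>(seen);
}

// Instantiate every indexer/operator combination for one element type.
template <typename T>
void submit_all(Registry &wide, Registry &narrow) {
    submit_wide<T, StridedIndexer, SumOp>(wide);
    submit_wide<T, StridedIndexer, MaxOp>(wide);
    submit_wide<T, ContigIndexer, SumOp>(wide);
    submit_wide<T, ContigIndexer, MaxOp>(wide);
    submit_narrow<T, StridedIndexer, SumOp>(narrow);
    submit_narrow<T, StridedIndexer, MaxOp>(narrow);
    submit_narrow<T, ContigIndexer, SumOp>(narrow);
    submit_narrow<T, ContigIndexer, MaxOp>(narrow);
}

std::size_t count_wide_kernel_names() {
    Registry wide, narrow;
    submit_all<int>(wide, narrow);
    submit_all<float>(wide, narrow);
    return wide.size();
}

std::size_t count_narrow_kernel_names() {
    Registry wide, narrow;
    submit_all<int>(wide, narrow);
    submit_all<float>(wide, narrow);
    return narrow.size();
}
```

With two element types, two indexers, and two operators, the wide naming produces one kernel name per combination, while the narrow naming produces one per element type. The actual saving in a shared object depends on how many combinations the real code instantiates.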
1 parent 80f288c commit 08cfede

1 file changed

+216
-251
lines changed
