Moved local update kernels to separate functions that take fewer template params
Removed unnecessary template parameters from the kernel names submitted by these
functions. As a consequence, the size of the `_tensor_accumulation_impl` shared
object shrank from 49'360'152 bytes to 36'422'888 bytes, that is, by almost 13 MB.
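
A rough illustration of why this helps, written as plain C++ rather than the
actual kernel code: each distinct kernel-name type stands for a separately
compiled kernel, so a name that depends on fewer template parameters yields
fewer instantiations. All names here (`WideKernelName`, `NarrowKernelName`,
`StridedIndexer`, `ContigIndexer`, `submit_before`, `submit_after`) are
hypothetical and do not come from the changed sources.

```cpp
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>

// Hypothetical indexer types; in this sketch the kernel body is assumed
// not to depend on them.
struct StridedIndexer {};
struct ContigIndexer {};

// Before: the kernel name carries a parameter the kernel does not need.
template <typename T, typename IndexerT> struct WideKernelName {};

// After: the kernel name depends only on the element type.
template <typename T> struct NarrowKernelName {};

// Stand-ins for submitting a kernel: record the name type that would be
// used. Each distinct type models one compiled kernel.
template <typename T, typename IndexerT>
void submit_before(std::set<std::type_index> &names) {
    names.insert(std::type_index(typeid(WideKernelName<T, IndexerT>)));
}

template <typename T, typename IndexerT>
void submit_after(std::set<std::type_index> &names) {
    // IndexerT is still a parameter of the caller, but not of the name.
    names.insert(std::type_index(typeid(NarrowKernelName<T>)));
}

// Count distinct kernel names for 2 element types x 2 indexers.
std::size_t count_before() {
    std::set<std::type_index> names;
    submit_before<int, StridedIndexer>(names);
    submit_before<int, ContigIndexer>(names);
    submit_before<float, StridedIndexer>(names);
    submit_before<float, ContigIndexer>(names);
    return names.size();
}

std::size_t count_after() {
    std::set<std::type_index> names;
    submit_after<int, StridedIndexer>(names);
    submit_after<int, ContigIndexer>(names);
    submit_after<float, StridedIndexer>(names);
    submit_after<float, ContigIndexer>(names);
    return names.size();
}
```

In this toy setup the number of distinct names falls from the product of the
parameter choices to the number of element types alone; the real saving
depends on how many parameter combinations the library instantiates.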