I noticed that with the CUDA backend on an RTX 3090, the reported achieved memory bandwidth for matrix multiplication can be much greater than 936 GB/s, which is the hardware maximum. There must therefore be a bug in how these numbers are calculated.
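For reference, here is a rough sketch of the sanity check I have in mind. It is not the benchmark's actual code: the function names are made up for illustration, and it assumes the bandwidth should be computed from the minimum traffic of a matrix multiplication, that is, reading each input matrix once and writing the output once. Only the 936 GB/s figure comes from the hardware spec quoted above.

```python
# Hypothetical sanity check, not the project's real accounting.
# Assumption: A (m x k) and B (k x n) are each read once, C (m x n) is written once.

PEAK_BANDWIDTH_GBPS = 936.0  # RTX 3090 peak memory bandwidth quoted in this report


def min_matmul_bytes(m, n, k, bytes_per_element=4):
    """Minimum bytes moved for C = A @ B under the read-once/write-once assumption."""
    return (m * k + k * n + m * n) * bytes_per_element


def achieved_bandwidth_gbps(bytes_moved, seconds):
    """Convert bytes moved in a measured time span to GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def exceeds_peak(bandwidth_gbps, peak_gbps=PEAK_BANDWIDTH_GBPS):
    """True if a reported figure is above what the hardware can physically deliver."""
    return bandwidth_gbps > peak_gbps


if __name__ == "__main__":
    # Illustrative numbers only: a 4096^3 fp32 matmul timed at 1 ms.
    m = n = k = 4096
    seconds = 1e-3
    bw = achieved_bandwidth_gbps(min_matmul_bytes(m, n, k), seconds)
    print(f"{bw:.1f} GB/s, exceeds peak: {exceeds_peak(bw)}")
```

If the reported number is above the peak even with this minimal byte count, the timing is probably off. If it is only above the peak with the benchmark's own byte count, then the benchmark is likely counting bytes that are served from cache or shared memory rather than from DRAM. Both of these are guesses on my part.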