Closed
Description
In this line of code, you calculate the positional encoding for the Transformer by first taking the log and then applying the exponential function.
Could you please elaborate on why you do this instead of computing the value directly?
I'm aware that a log transformation turns multiplication into addition, but that does not seem to be what is happening here.
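To make the question concrete, here is a minimal sketch of the two forms I am comparing. It assumes the line computes the usual frequency term `1 / 10000^(2i / d_model)` in exp-of-log form; the function names, `d_model = 8`, and the plain-Python lists are my own illustration, not the tutorial's code.

```python
import math

def div_term_exp_log(d_model, base=10000.0):
    # Assumed exp-of-log form: exp(2i * (-ln(base) / d_model)) for even indices 2i.
    return [math.exp(i * (-math.log(base) / d_model)) for i in range(0, d_model, 2)]

def div_term_direct(d_model, base=10000.0):
    # Direct form: 1 / base^(2i / d_model) for the same even indices.
    return [1.0 / base ** (i / d_model) for i in range(0, d_model, 2)]

d_model = 8  # hypothetical small model dimension, chosen only for illustration
a = div_term_exp_log(d_model)
b = div_term_direct(d_model)

# Largest absolute difference between the two forms across all frequencies.
max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_abs_diff)
```

As far as I can tell, both forms give the same values up to floating-point rounding, which is why I am asking what the exp-of-log version is meant to achieve.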
cc @suraj813