Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 5)))
print(1 - np.diag(df.corr().abs()).min())
# pandas 1.4.0
# -4.440892098500626e-16
# pandas 1.3.5
# 0.0
Issue Description
With pandas 1.4.0, df.corr() returns a matrix whose diagonal is not exactly 1; it deviates at the level of floating-point precision.
In pandas 1.3.5, the diagonal of df.corr() was exactly 1.
The example above shows the difference.
This causes problems when using dist = 1 - df.corr().abs() as a distance matrix for clustering. In particular, the call to scipy.spatial.distance.squareform(dist) raises an error with pandas 1.4.0 when the diagonal of dist is not exactly 0.
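For anyone hitting the same error, here is a minimal sketch of a possible user-side workaround. It is not a fix for df.corr() itself. It assumes it is acceptable to overwrite the diagonal of the distance matrix with exact zeros and to symmetrise it before calling squareform; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 5)))

# Copy into a plain, writable NumPy array so it can be modified in place.
dist = (1 - df.corr().abs()).to_numpy(copy=True)

# Workaround (an assumption, not pandas behaviour): enforce exact symmetry
# and an exactly-zero diagonal, which squareform requires when checks=True.
dist = (dist + dist.T) / 2
np.fill_diagonal(dist, 0.0)

# With the default checks=True this no longer raises on the diagonal.
condensed = squareform(dist)
print(condensed.shape)  # (10,) for 5 columns: 5 * 4 / 2 pairs
```

Another option, if the input is otherwise trusted, is squareform(dist, checks=False), which skips the validation but leaves the tiny non-zero diagonal values in the square matrix.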
Expected Behavior
The diagonal of df.corr() should be exactly 1, with no floating-point deviation.
Installed Versions
Replace this line with the output of pd.show_versions()