Skip to content

numpy vs pandas: different estimation of covariance in presence of nan values #16837

Open
@asdf8601

Description

@asdf8601

Code Sample, a copy-pastable example if possible

# =============================================================================
# NUMPY VS PANDAS: DIFFERENT ESTIMATION OF COVARIANCE IN PRESENCE OF NAN VALUES
# =============================================================================
# data with nan values
M = np.random.randn(10,2)
# add missing values
M[0,0] = np.nan
M[1,1] = np.nan

# Covariance matrix calculations
# ==============================
# numpy
# -----
masked_arr = np.ma.array(M, mask=np.isnan(M))
cov_numpy = np.ma.cov(masked_arr, rowvar=0, allow_masked=True, ddof=1).data

# pandas
# ------
cov_pandas = pd.DataFrame(M).cov(min_periods=0).values

# Homemade covariance coefficient calculation
# (what each of them is actually doing)
# =============================================
# select elements to estimate the covariance matrix element (0,1)
x = M[:,0]
y = M[:,1]

mask_x = ~np.isnan(x)
mask_y = ~np.isnan(y)
mask_common = mask_x & mask_y

# numpy
# -----
xn = x-np.mean(x[mask_x])
yn = y-np.mean(y[mask_y])
cov_np = sum(a*b for a,b,c in zip(xn,yn, mask_common) if c)/(np.sum(mask_common)-1)

# pandas
# ------
xn = x-np.mean(x[mask_common])
yn = y-np.mean(y[mask_common])
cov_pd = sum(a*b for a,b,c in zip(xn,yn, mask_common) if c)/(np.sum(mask_common)-1)

Problem description

I try to calculate the covariance matrix in presence of missing values and I've note that numpy and pandas retrieve differents matrix and that difference increases when increase the presence of missing values. I let above a snippet of both implementations. For me is more useful numpy way, it's seems to be more robust in presence of missing values.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions