BUG: hash_pandas_object hash differs for NaN  #28363

Open
@naupa

Code Sample, a copy-pastable example if possible

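import numpy as np
import pandas as pd

from pandas.util import hash_pandas_object

# Reference: the hash of a Series holding np.nan.
x = pd.Series([np.nan])
# Series.data is deprecated, so dump the raw bytes via the underlying ndarray.
print(x.to_numpy().data.hex())

expected = hash_pandas_object(x)

# -sqrt(-1.) is NaN (NumPy emits a RuntimeWarning); it hashes like np.nan.
x1 = np.array([-np.sqrt(-1.)])
print(x1.data.hex())
hashed_x1 = hash_pandas_object(pd.Series(x1))
assert expected[0] == hashed_x1.values[0]

# sqrt(-1.) is also NaN, but this assertion fails because the hash differs.
x2 = np.array([np.sqrt(-1.)])
print(x2.data.hex())
hashed_x2 = hash_pandas_object(pd.Series(x2))
assert expected[0] == hashed_x2.values[0]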

Problem description

For the array x2, the NaN value produces a different hash than np.nan, which is not expected: every NaN should hash to the same value, regardless of how it was computed.
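The hex dumps printed above suggest why this can happen: IEEE 754 allows many distinct bit patterns for NaN, so two values that both test as NaN can still differ byte for byte. The following is a minimal standard-library sketch of that effect. The helper `float_bits` and the two bit patterns are chosen here for illustration only; they are not taken from pandas, and the patterns produced by `np.sqrt(-1.)` may depend on the platform.

```python
import math
import struct


def float_bits(value: float) -> str:
    """Return the IEEE 754 binary64 bit pattern of `value` as 16 hex digits."""
    return struct.pack(">d", value).hex()


# Two quiet NaNs built from explicit bit patterns (illustrative assumption):
# they differ only in the sign bit.
pos_nan = struct.unpack(">d", bytes.fromhex("7ff8000000000000"))[0]
neg_nan = struct.unpack(">d", bytes.fromhex("fff8000000000000"))[0]

# Both values are NaN as far as floating-point semantics are concerned...
print(math.isnan(pos_nan), math.isnan(neg_nan))

# ...but their raw bytes differ, so any hash computed over the raw bytes
# would treat them as different values.
print(float_bits(pos_nan), float_bits(neg_nan))
```

If hash_pandas_object hashes the raw float bytes, as the differing results above suggest, NaNs with different bit patterns would naturally end up with different hashes.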

Expected Output

# Workaround: calling fillna(np.nan) on the Series replaces every NaN
# with np.nan, after which the x2 assertion passes.

x2 = np.array([np.sqrt(-1.)])
print(x2.data.hex())
x2 = pd.Series(x2)
x2 = x2.fillna(np.nan)
hashed_x2 = hash_pandas_object(x2)
assert expected[0] == hashed_x2.values[0]
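The same workaround can be wrapped in a small helper that rewrites every NaN to `np.nan` before hashing. This is only a sketch: `hash_with_canonical_nan` is a hypothetical name, not a pandas API, and it assumes float64 input. The negative-sign NaN is built from an explicit bit pattern so the example does not rely on what `np.sqrt(-1.)` returns on a given platform.

```python
import numpy as np
import pandas as pd
from pandas.util import hash_pandas_object


def hash_with_canonical_nan(values):
    """Hash float data after overwriting every NaN with np.nan.

    Hypothetical helper for illustration; not part of pandas.
    """
    arr = np.array(values, dtype=np.float64)  # copy, so the input is untouched
    arr[np.isnan(arr)] = np.nan               # same bit pattern for every NaN
    return hash_pandas_object(pd.Series(arr))


# A NaN with the sign bit set, built from raw bits (illustrative assumption).
neg_nan = np.array([0xFFF8000000000000], dtype=np.uint64).view(np.float64)
plain_nan = np.array([np.nan])

hashed_neg = hash_with_canonical_nan(neg_nan)
hashed_plain = hash_with_canonical_nan(plain_nan)
print(hashed_neg[0] == hashed_plain[0])
```

Because both arrays hold identical bytes after normalization, the two hashes agree regardless of how the hash itself is computed. Whether pandas should perform this normalization internally is the question this issue raises.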

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.4.final.0
python-bits: 64
OS: Linux
OS-release: 5.2.8-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.24.2
pytest: 5.0.1
pip: 19.0.3
setuptools: 41.0.1
Cython: 0.29.13
numpy: 1.17.0
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


Labels: Bug, Missing-data, hashing
