Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
from pandas.util import hash_pandas_object

# Reference hash for a Series holding the standard np.nan
x = pd.Series([np.nan])
print(x.to_numpy().tobytes().hex())
expected = hash_pandas_object(x)

# -np.sqrt(-1.) is NaN and hashes the same as np.nan
x1 = np.array([-np.sqrt(-1.)])
print(x1.tobytes().hex())
hashed_x1 = hash_pandas_object(pd.Series(x1))
assert expected.iloc[0] == hashed_x1.iloc[0]

# np.sqrt(-1.) is also NaN, but this assertion fails
x2 = np.array([np.sqrt(-1.)])
print(x2.tobytes().hex())
hashed_x2 = hash_pandas_object(pd.Series(x2))
assert expected.iloc[0] == hashed_x2.iloc[0]  # AssertionError
Problem description
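For the array x2, the NaN value produces a different hash than np.nan, which is not expected: every NaN should hash to the same value. The printed bytes hint at the cause. On x86-64, np.sqrt(-1.) typically returns a NaN with the sign bit set, while np.nan and -np.sqrt(-1.) have it cleared, so hash_pandas_object presumably hashes the raw bit pattern rather than treating all NaNs alike.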
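A minimal NumPy-only sketch of why two NaNs can hash differently: it shows the raw IEEE 754 bits of np.nan and of a NaN with the sign bit set. The helper `float_bits` is hypothetical, and `np.copysign(np.nan, -1.0)` stands in for `np.sqrt(-1.)`, assuming the latter yields a sign-bit-set NaN as is typical on x86-64.

```python
import numpy as np


def float_bits(values):
    """Hypothetical helper: raw IEEE 754 bit pattern of each float64 as hex."""
    arr = np.asarray(values, dtype=np.float64)
    # Reinterpret the same 8 bytes per element as unsigned 64-bit integers.
    return [format(int(u), "016x") for u in arr.view(np.uint64)]


positive_nan = np.nan
# copysign gives a NaN with the sign bit set, standing in for np.sqrt(-1.)
negative_nan = np.copysign(np.nan, -1.0)

# Both values are NaN...
print(np.isnan([positive_nan, negative_nan]))
# ...but only one has the sign bit set...
print(np.signbit([positive_nan, negative_nan]))
# ...so their raw bytes, which a byte-based hash would see, differ.
print(float_bits([positive_nan, negative_nan]))
```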
Expected Output
# Workaround: normalise the NaN with fillna(np.nan) before hashing
x2 = np.array([np.sqrt(-1.)])
print(x2.tobytes().hex())
x2 = pd.Series(x2).fillna(np.nan)
hashed_x2 = hash_pandas_object(x2)
assert expected.iloc[0] == hashed_x2.iloc[0]
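The `fillna(np.nan)` workaround can also be applied to a plain NumPy array before the Series is built. The sketch below uses a hypothetical helper, `canonicalize_nans`, that overwrites every NaN (whatever its sign bit or payload) with `np.nan`; it assumes the hash depends only on the stored bytes.

```python
import numpy as np


def canonicalize_nans(values):
    """Hypothetical helper: float64 copy of `values` with every NaN set to np.nan.

    NaNs that differ in sign bit or payload all end up with the same bit
    pattern, so a byte-based hash treats them alike.
    """
    arr = np.array(values, dtype=np.float64)  # copy; the input is untouched
    arr[np.isnan(arr)] = np.nan
    return arr


# One ordinary value, one sign-bit-set NaN, one standard NaN
raw = np.array([1.5, np.copysign(np.nan, -1.0), np.nan])
clean = canonicalize_nans(raw)

print(raw.view(np.uint64))    # the two NaNs have different bits
print(clean.view(np.uint64))  # after normalisation they share one pattern
```

Under that assumption, `hash_pandas_object(pd.Series(canonicalize_nans(x2)))` should match `expected`. The mask comes from `np.isnan` rather than an equality test because NaN never compares equal to itself.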
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.4.final.0
python-bits: 64
OS: Linux
OS-release: 5.2.8-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
pandas: 0.24.2
pytest: 5.0.1
pip: 19.0.3
setuptools: 41.0.1
Cython: 0.29.13
numpy: 1.17.0
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None