Description
Code Sample (copy-pastable)
import pandas as pd
import numpy as np
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2], 'y': [1, 2, np.nan, np.nan, np.nan, 6]})
print(df)
# sum() leaves the NaNs in df untouched
df.groupby('x').sum()
print(df)
# nunique() overwrites the NaNs in df, even though its result is discarded
df.groupby('x').nunique()
print(df)
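On affected versions (the report uses pandas 1.0.1), one possible workaround is to run nunique on a deep copy, so that whatever happens to the copy's data cannot reach the caller's frame. This is a minimal sketch, not a fix: the helper `nunique_on_copy` is hypothetical and not part of pandas, and it relies only on the public `DataFrame.copy` and `groupby(...).nunique()` calls already used above.

```python
import numpy as np
import pandas as pd


def nunique_on_copy(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Hypothetical helper: run groupby().nunique() on a deep copy so the
    caller's frame cannot be modified, even on versions with this bug."""
    return frame.copy(deep=True).groupby(by).nunique()


df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2], 'y': [1, 2, np.nan, np.nan, np.nan, 6]})
snapshot = df.copy(deep=True)  # reference copy to detect any mutation

counts = nunique_on_copy(df, 'x')

# DataFrame.equals treats NaNs in the same positions as equal,
# so this is True only if df still holds its original three NaNs.
print(df.equals(snapshot))
# Per-group distinct non-NaN counts of y: group 1 has {1, 2}, group 2 has {6}.
print(counts['y'].tolist())
```

The extra deep copy costs one more pass over the data, so it is only worth keeping on versions that actually exhibit this bug.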
Problem description
Calling DataFrameGroupBy.nunique on a DataFrame modifies that DataFrame in place, replacing its NaN values. Simply calling the method, without assigning the result to anything, overwrites the existing NaNs with -9223372036854775808.
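For reference, -9223372036854775808 is -2**63, the most negative signed 64-bit integer. The standard-library sketch below (an illustration, not pandas code) decodes that value from its raw bit pattern and shows that, once stored as a float64, it prints as the -9.223372e+18 seen in the output.

```python
import struct

# Most negative value of a 64-bit two's-complement integer.
INT64_MIN = -(2 ** 63)
print(INT64_MIN)  # -9223372036854775808

# The same value decoded from the bit pattern 0x8000000000000000
# ('<q' = little-endian signed 64-bit integer).
raw = struct.unpack('<q', (1 << 63).to_bytes(8, 'little'))[0]
print(raw == INT64_MIN)  # True

# As a power of two it is exactly representable as a float64, and in
# scientific notation it prints like the corrupted cells of column y.
as_float = float(INT64_MIN)
print(f"{as_float:e}")  # -9.223372e+18
```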
There are two distinct issues:
- The replacement value appears to be related to issue #16674 ("API / BUG: How do we differentiate between -9223372036854775808 and iNaT?"), in how NaN is compared with -9223372036854775808 (thanks @mroeschke).
- The in-place modification of the original DataFrame, which seems to be the more serious problem.
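The overlap with #16674 comes from NaT itself: NumPy stores NaT in datetime64 arrays as that same minimum int64, and pandas' iNaT sentinel uses the same value. The NumPy sketch below illustrates that ambiguity; it is not the code path nunique actually takes, only a demonstration that the bit pattern alone cannot distinguish NaT from -9223372036854775808.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min  # -9223372036854775808

# Reinterpret the bits of a datetime64 array as int64:
# the NaT entry exposes the sentinel value.
stamps = np.array(['2020-03-01', 'NaT'], dtype='datetime64[ns]')
as_int = stamps.view('i8')
print(as_int[1] == INT64_MIN)  # True

# The reverse: a genuine int64 holding INT64_MIN reads back as NaT,
# so once dtype information is lost the two cannot be told apart.
back = np.array([INT64_MIN], dtype='i8').view('datetime64[ns]')
print(np.isnat(back[0]))  # True
```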
Expected Output
   x    y
0  1  1.0
1  1  2.0
2  1  NaN
3  2  NaN
4  2  NaN
5  2  6.0

   x    y
0  1  1.0
1  1  2.0
2  1  NaN
3  2  NaN
4  2  NaN
5  2  6.0

   x    y
0  1  1.0
1  1  2.0
2  1  NaN
3  2  NaN
4  2  NaN
5  2  6.0
My Output
   x    y
0  1  1.0
1  1  2.0
2  1  NaN
3  2  NaN
4  2  NaN
5  2  6.0

   x    y
0  1  1.0
1  1  2.0
2  1  NaN
3  2  NaN
4  2  NaN
5  2  6.0

   x             y
0  1  1.000000e+00
1  1  2.000000e+00
2  1 -9.223372e+18
3  2 -9.223372e+18
4  2 -9.223372e+18
5  2  6.000000e+00
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 2.6.32-754.27.1.el6.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : None
pytest : None
hypothesis : None
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0