Description
Code Sample, a copy-pastable example if possible
import pandas as pd
pd.DataFrame({'first': [1], 'second': [1], 'value': [16148277970000000000]}).groupby(['first', 'second'])['value'].max()
Problem description
When that code snippet runs, the result is not 16148277970000000000 as you would expect, but 16148277969999998976. Note that int(float(16148277970000000000)) == 16148277969999998976.
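To illustrate the float round trip noted above, here is a minimal sketch in plain Python with no pandas involved. The variable names are illustrative only, and the sketch does not claim to show where pandas performs the conversion, only that a pass through a 64-bit float would produce exactly this value.

```python
value = 16148277970000000000

# A float64 has a 53-bit significand, so a 64-bit integer this large
# is rounded to the nearest representable double.
round_tripped = int(float(value))

print(value.bit_length())      # 64
print(round_tripped)           # 16148277969999998976
print(value - round_tripped)   # 1024
```

The difference of 1024 is consistent with rounding: between 2**63 and 2**64, adjacent doubles are 2048 apart, and this value sits exactly halfway between two of them.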
Additional notes:
- The problem only appears to happen if there are multiple groupby keys. For example, just doing groupby(['first']) returns the expected result. So does removing the groupby statement entirely.
- The problem is not specific to max. I get the same problem for min, first, last, median, mean, but not head, tail, or apply.
- The problem also happens if you do .transform(max).
- The smallest number I've found so far that has a problem is 2**63 + 1.
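The 2**63 + 1 threshold can be put in context with a small plain-Python sketch. The helper names (INT64_MAX, fits_int64, survives_float) are hypothetical and chosen for illustration. One possible reading, which is an assumption and not something confirmed here, is that only values too large for int64 take the float path: 2**63 is the first such value, but it is exactly representable as a double, so 2**63 + 1 would be the first one to come back changed.

```python
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def fits_int64(n):
    """True if n can be stored in a signed 64-bit integer."""
    return -2**63 <= n <= INT64_MAX


def survives_float(n):
    """True if n is unchanged after a round trip through float64."""
    return int(float(n)) == n


for n in (2**53 + 1, INT64_MAX, 2**63, 2**63 + 1):
    print(n, fits_int64(n), survives_float(n))
```

Note that values such as 2**53 + 1 also fail the float round trip while still fitting in int64, so if the groupby result is correct for those, the conversion would seem to be limited to the unsigned 64-bit case.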
Expected Output
16148277970000000000
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.137+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.5
pytz : 2018.9
dateutil : 2.6.1
pip : 19.3.1
setuptools : 42.0.2
Cython : 0.29.14
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 5.5.0
pandas_datareader: 0.7.4
bs4 : 4.6.3
bottleneck : 1.3.1
fastparquet : None
gcsfs : 0.6.0
lxml.etree : 4.2.6
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.4.4
xarray : 0.14.1
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None