Multicolumn GroupBy appears to convert uint64s to floats #30859

Closed
@brianwgoldman

Code Sample, a copy-pastable example if possible

import pandas as pd

pd.DataFrame({'first': [1], 'second': [1], 'value': [16148277970000000000]}).groupby(['first', 'second'])['value'].max()

Problem description

When that code snippet runs, the result is not 16148277970000000000 as you would expect, but 16148277969999998976. Note that int(float(16148277970000000000)) == 16148277969999998976.
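The arithmetic in that note can be checked without pandas. This is a minimal sketch, assuming the value passes through a 64-bit float somewhere inside the groupby (the actual code path is not confirmed here); `roundtrip` is a hypothetical helper name, not a pandas function.

```python
# Hypothetical helper: push an integer through a 64-bit float and back.
def roundtrip(n: int) -> int:
    return int(float(n))

value = 16148277970000000000
result = roundtrip(value)

print(result)          # 16148277969999998976, the value the groupby returns
print(value - result)  # 1024, the precision lost in the conversion
```

Between 2**63 and 2**64 adjacent floats are 2048 apart, so any uint64 in that range that is not a multiple of 2048 gets rounded to a neighbouring value.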

Additional notes:

  1. The problem only appears to happen if there are multiple groupby keys. For example, just doing groupby(['first']) returns the expected result. So does removing the groupby statement entirely.
  2. The problem is not specific to max. I get the same problem for min, first, last, median, mean, but not head, tail, or apply.
  3. The problem also happens if you do .transform(max).
  4. The smallest number I've found so far that has a problem is 2**63 + 1.
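The 2**63 + 1 threshold in note 4 is consistent with a float round trip on uint64 values. This is a rough check under that assumption, not a confirmed diagnosis; `survives_float64` and `INT64_MAX` are names made up for this sketch.

```python
INT64_MAX = 2**63 - 1  # largest value that still fits in a signed 64-bit integer

# Hypothetical helper: True if n comes back unchanged from a float64 round trip.
def survives_float64(n: int) -> bool:
    return int(float(n)) == n

# 2**63 is the first value that needs uint64, but it is a power of two,
# so a float conversion would leave it unchanged and hide the problem.
print(survives_float64(INT64_MAX + 1))  # True

# 2**63 + 1 is the first uint64-only value that a float64 cannot hold exactly.
print(survives_float64(2**63 + 1))      # False
print(int(float(2**63 + 1)))            # 9223372036854775808, i.e. 2**63
```

If that reading is right, values up to 2**63 - 1 stay on an int64 path and are unaffected, which would explain why nothing smaller shows the problem.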

Expected Output

16148277970000000000

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.137+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.5
pytz : 2018.9
dateutil : 2.6.1
pip : 19.3.1
setuptools : 42.0.2
Cython : 0.29.14
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 5.5.0
pandas_datareader: 0.7.4
bs4 : 4.6.3
bottleneck : 1.3.1
fastparquet : None
gcsfs : 0.6.0
lxml.etree : 4.2.6
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.4.4
xarray : 0.14.1
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None
