BUG: Same function calls on the same DataFrameGroupBy object give different results #34271

Closed
@leetschau

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

Source code

import pandas as pd
inp = pd.DataFrame([[1, 2, 3, 4],
                    [5, 6, 7, 8],
                    [1, 10, 11, 12],
                    [5, 14, 15, 16],
                    [1, 18, 19, 20],
                    [21, 22, 23, 24],
                    [5, 26, 27, 28]],
                columns=['group', 'fa', 'fb', 'fc'])
print('pandas version:', pd.__version__)
grps = inp.groupby('group', as_index=True)
print('Column number in all groups:',
    grps.apply(lambda x: x.shape[1]).unique())
print('Column number in all groups:',
    grps.apply(lambda x: x.shape[1]).unique())
print('ID:', id(grps))
print('Shape of first dataframe in the group:', grps.first().shape)
print('ID:', id(grps))
print('Column number in all groups:',
    grps.apply(lambda x: x.shape[1]).unique())

Problem description

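In the code above, the same call, grps.apply(lambda x: x.shape[1]).unique(), returns different results on the same DataFrameGroupBy object (the id printed before and after is identical):

The first two calls, made before grps.first().shape is evaluated, return 4, which is the full column count of inp.
The third call, made after grps.first().shape is evaluated, returns 3.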
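Until the inconsistency is resolved, one possible way to get a column count that does not depend on earlier calls is to avoid the implicit column handling in apply altogether. The sketch below is only an illustration, not a fix for the underlying bug: the helper names widths_by_iteration and widths_by_selection are hypothetical, and it assumes that iterating over the groupby object yields full sub-frames and that an explicit column selection pins the columns passed to apply.

```python
import pandas as pd

inp = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [1, 10, 11, 12], [5, 14, 15, 16],
     [1, 18, 19, 20], [21, 22, 23, 24], [5, 26, 27, 28]],
    columns=["group", "fa", "fb", "fc"],
)
grps = inp.groupby("group")


def widths_by_iteration(gb):
    """Hypothetical helper: column counts seen when iterating the groups."""
    return sorted({int(sub.shape[1]) for _, sub in gb})


def widths_by_selection(gb, cols):
    """Hypothetical helper: column counts after selecting columns explicitly."""
    counts = gb[cols].apply(lambda x: x.shape[1])
    return sorted({int(n) for n in counts})


iter_before = widths_by_iteration(grps)
sel_before = widths_by_selection(grps, ["fa", "fb", "fc"])

grps.first()  # the call that changes the apply() result in this report

iter_after = widths_by_iteration(grps)
sel_after = widths_by_selection(grps, ["fa", "fb", "fc"])

print("iteration:", iter_before, iter_after)
print("selection:", sel_before, sel_after)
```

Iteration keeps the grouping column in each sub-frame, while the explicit selection names exactly the three value columns, so neither count relies on what apply decides to include.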

Running output

Column number in all groups: [4]
Column number in all groups: [4]
ID: 140686222651664
Shape of first dataframe in the group: (3, 3)
ID: 140686222651664
Column number in all groups: [3]

Expected Output

Column number in all groups: [4]
Column number in all groups: [4]
ID: 140686222651664
Shape of first dataframe in the group: (3, 3)
ID: 140686222651664
Column number in all groups: [4]
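The expectation above can also be phrased as a small consistency check: whatever the column count is, it should be the same before and after grps.first() is called. The following is a minimal sketch of such a check, reusing the data from this report; the function name column_counts is hypothetical, and on recent pandas versions the apply call may emit a deprecation warning about grouping columns.

```python
import pandas as pd

inp = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [1, 10, 11, 12], [5, 14, 15, 16],
     [1, 18, 19, 20], [21, 22, 23, 24], [5, 26, 27, 28]],
    columns=["group", "fa", "fb", "fc"],
)
grps = inp.groupby("group", as_index=True)


def column_counts(gb):
    """Hypothetical helper: distinct per-group column counts seen by apply()."""
    return [int(n) for n in gb.apply(lambda x: x.shape[1]).unique()]


before = column_counts(grps)
first_shape = grps.first().shape  # the intervening call from the report
after = column_counts(grps)

# On an affected version (1.0.2 in this report) this prints False.
print("consistent across calls:", before == after)
```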

Environment

Python 3.7.7, Ubuntu 18.04.

Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-96-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.2
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.1
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None
