
Should Groupby.sum modify _selected_obj? #32468

Closed
@MarcoGorelli

Description


Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": ["foo"] * 3 + ["bar"] * 3, "B": [1] * 6})

In [3]: g = df.groupby("A")

In [4]: g._selected_obj
Out[4]: 
     A  B
0  foo  1
1  foo  1
2  foo  1
3  bar  1
4  bar  1
5  bar  1

In [5]: g.sum()
Out[5]: 
     B
A     
bar  3
foo  3

In [6]: g._selected_obj
Out[6]: 
   B
0  1
1  1
2  1
3  1
4  1
5  1

Problem description

Noticed this while working on #32332

Expected Output

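I wasn't expecting g.sum() to have side effects: after the call, g._selected_obj no longer contains the grouping column A (compare Out[4] with Out[6]).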
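One way to check for this kind of side effect is to snapshot the attribute before the aggregation and compare afterwards. The following is a minimal sketch, not a pandas API: the helper name sum_and_check_selected_obj is hypothetical, and it assumes only that the private attribute _selected_obj exists on the GroupBy object. Because that attribute is internal, whether the check reports a change may differ between pandas versions; on the development build in the session above, Out[4] and Out[6] suggest it would.

```python
import pandas as pd


def sum_and_check_selected_obj(grouped):
    """Call grouped.sum() and report whether grouped._selected_obj changed.

    Hypothetical debugging helper: _selected_obj is a private pandas
    attribute, so this relies on internals that may change between versions.
    """
    # Snapshot the private attribute before the aggregation; copy it so the
    # snapshot cannot be affected if the cached object is mutated in place.
    before = grouped._selected_obj.copy()
    result = grouped.sum()
    # Re-read the attribute after the call and compare labels and values.
    after = grouped._selected_obj
    changed = not before.equals(after)
    return result, changed


df = pd.DataFrame({"A": ["foo"] * 3 + ["bar"] * 3, "B": [1] * 6})
g = df.groupby("A")
result, changed = sum_and_check_selected_obj(g)
print(result)
print("_selected_obj changed:", changed)
```

DataFrame.equals compares both the column labels and the values, so a frame that has silently dropped the grouping column A compares unequal to the snapshot.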

Output of pd.show_versions()

In [7]: pd.show_versions()
/home/SERILOCAL/m.gorelli/miniconda3/envs/pandas-dev/lib/python3.7/site-packages/fastparquet/dataframe.py:5: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import CategoricalIndex, RangeIndex, Index, MultiIndex

INSTALLED VERSIONS

commit : f7bed05
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-88-generic
Version : #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.0.dev0+694.gf7bed0583
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 45.1.0.post20200119
Cython : 0.29.14
pytest : 5.3.4
hypothesis : 5.3.0
sphinx : 2.3.1
blosc : 1.8.3
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : 0.3.2
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : 0.4.0
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
tabulate : 0.8.6
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.47.0

Metadata


Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
