BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna=True #35612

Closed
@arw2019

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.

xref #35014

Creating a separate issue because the dropna=True case requires a different fix from the dropna=False case (resolved by #35078).

Problem description

The setup is:

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
In [3]: gb = df.groupby("A", dropna=True)

All three of these commands:

In [4]: gb['B'].transform(len)
In [5]: gb[['B']].transform(len)
In [6]: gb.transform(len)

generate a variant of this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    487 
    488         if not isinstance(func, str):
--> 489             return self._transform_general(
    490                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    491             )

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    556 
    557         result.name = self._selected_obj.name
--> 558         result.index = self._selected_obj.index
    559         return result
    560 

/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
   5167         try:
   5168             object.__getattribute__(self, name)
-> 5169             return object.__setattr__(self, name, value)
   5170         except AttributeError:
   5171             pass

/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64 
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425 
    426     # ndarray compatibility

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214 
    215         if new_len != old_len:
--> 216             raise ValueError(
    217                 f"Length mismatch: Expected axis has {old_len} elements, new "
    218                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
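
The 3 and 4 in the message can be accounted for by counting rows directly. The following is a minimal sketch, assuming the mismatch comes from the NaN-keyed row being excluded from the groups while the result is assigned the full index of the selected object (as the `result.index = self._selected_obj.index` line in the traceback suggests). The names `rows_in_groups` and `rows_in_frame` are illustrative only and do not appear in pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
gb = df.groupby("A", dropna=True)

# Rows that belong to some group: with dropna=True the row whose
# key is NaN is not part of any group.
rows_in_groups = sum(len(group) for _, group in gb)

# Rows in the original frame, i.e. the length of the index that the
# transform result is assigned in the traceback above.
rows_in_frame = len(df.index)

print(rows_in_groups, rows_in_frame)  # 3 4
```

The per-group results cover only three rows, so assigning a four-element index to them fails the length check in `set_axis`.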

Expected Output

All three should return:

Out[9]: 
   B
0  2
1  2
2  1
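
For comparison, the same values can be obtained without relying on `dropna=True` inside `groupby`. This is only a sketch of a possible workaround, not the fix proposed for this issue: it assumes that dropping the NaN-keyed rows up front is acceptable, and the names `valid` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# Drop rows whose grouping key is missing before grouping, so the
# grouped rows and the index used for the result have the same length.
valid = df.dropna(subset=["A"])

# Each row of "B" is replaced by the size of the group it belongs to.
result = valid.groupby("A")[["B"]].transform(len)

print(result)
```

Here `result` holds the group sizes 2, 2, 1 on index 0, 1, 2, which matches the expected output above.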

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 9843926
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+54.g9843926e3
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1


Labels: Bug, Groupby, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
