BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna #35014

Closed
@TomAugspurger

Description

Code Sample, a copy-pastable example

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
In [3]: gb = df.groupby("A", dropna=False)
In [6]: gb['B'].transform(len)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)

~/sandbox/pandas/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    471         if not isinstance(func, str):
    472             return self._transform_general(
--> 473                 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
    474             )
    475

~/sandbox/pandas/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
    537
    538         result.name = self._selected_obj.name
--> 539         result.index = self._selected_obj.index
    540         return result
    541

~/sandbox/pandas/pandas/core/generic.py in __setattr__(self, name, value)
   5141         try:
   5142             object.__getattribute__(self, name)
-> 5143             return object.__setattr__(self, name, value)
   5144         except AttributeError:
   5145             pass

~/sandbox/pandas/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

~/sandbox/pandas/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    422         if not fastpath:
    423             # The ensure_index call above ensures we have an Index object
--> 424             self._mgr.set_axis(axis, labels)
    425
    426     # ndarray compatibility

~/sandbox/pandas/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    213         if new_len != old_len:
    214             raise ValueError(
--> 215                 f"Length mismatch: Expected axis has {old_len} elements, new "
    216                 f"values have {new_len} elements"
    217             )

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements

Problem description

Compare that with the following, both of which return the expected result:

In [4]: gb.transform(len)
Out[4]:
   B
0  2
1  2
2  1
3  1

In [5]: gb[['B']].transform(len)
Out[5]:
   B
0  2
1  2
2  1
3  1

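So the failure occurs only when selecting a single column, which produces a SeriesGroupBy object. That object does not inherit `dropna=False`, so the row whose key in `A` is missing gets dropped. The transformed result then has 3 elements and cannot take the original 4-element index, hence the `Length mismatch` error above.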
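The 3-versus-4 length mismatch can be illustrated with a minimal pure-Python sketch. The helper `transform_len` below is hypothetical and is not how pandas implements `transform`. It only assumes that rows with a missing group key are excluded when `dropna=True` (the default that the SeriesGroupBy apparently falls back to) and kept as their own group when `dropna=False`.

```python
import math

# Hypothetical sentinel used as the single group label for all missing keys.
_NA_GROUP = object()


def _is_missing(key):
    """Treat None and float NaN as missing, roughly like pandas' isna."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def transform_len(keys, dropna=True):
    """Hypothetical sketch of groupby(...).transform(len) on one column.

    Returns {row_position: size_of_that_row's_group}. With dropna=True,
    rows whose key is missing are skipped, so the result can be shorter
    than the input.
    """
    sizes = {}
    row_groups = []
    for pos, key in enumerate(keys):
        if _is_missing(key):
            if dropna:
                continue  # this row gets no result at all
            key = _NA_GROUP  # keep all missing keys together as one group
        sizes[key] = sizes.get(key, 0) + 1
        row_groups.append((pos, key))
    # Broadcast each group's size back to every row in that group.
    return {pos: sizes[key] for pos, key in row_groups}


keys = [0, 0, 1, None]  # column "A" from the example
print(transform_len(keys, dropna=True))   # {0: 2, 1: 2, 2: 1}
print(transform_len(keys, dropna=False))  # {0: 2, 1: 2, 2: 1, 3: 1}
```

With `dropna=True` only three rows receive a value, so putting the original four-row index back on the result fails, which is the situation in the traceback.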

Expected Output

A Series:

Out[6]:
0  2
1  2
2  1
3  1
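One possible workaround, assuming pandas 1.1 or later (the release that added the `dropna` keyword to `groupby`), is to select with a list as in `Out[5]` and then take column `B` from the resulting DataFrame. The sketch below also compares that result with the single-column path as a rough regression check. On affected versions `gb['B'].transform(len)` raises the `ValueError` shown above, so the comparison is skipped there.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
gb = df.groupby("A", dropna=False)

# Workaround: list selection keeps the DataFrameGroupBy path (Out[5]),
# then pull column "B" out as a Series.
workaround = gb[["B"]].transform(len)["B"]

# Single-column selection; raises ValueError on versions with this bug.
try:
    direct = gb["B"].transform(len)
except ValueError:
    direct = None

if direct is not None:
    # Once dropna propagates, both paths should give the same values.
    assert direct.tolist() == workaround.tolist()

print(workaround.tolist())
```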

Labels: Bug, Groupby, Missing-data