Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
xref #35014
Creating a separate issue because dropna=True requires a different fix from dropna=False (resolved by #35078).
Problem description
The setup is:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
In [3]: gb = df.groupby("A", dropna=True)
All three of these commands:
In [4]: gb['B'].transform(len)
In [5]: gb[['B']].transform(len)
In [6]: gb.transform(len)
raise a variant of this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-3bae7d67a46f> in <module>
----> 1 gb['B'].transform(len)
/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
487
488 if not isinstance(func, str):
--> 489 return self._transform_general(
490 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs
491 )
/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, engine, engine_kwargs, *args, **kwargs)
556
557 result.name = self._selected_obj.name
--> 558 result.index = self._selected_obj.index
559 return result
560
/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
5167 try:
5168 object.__getattribute__(self, name)
-> 5169 return object.__setattr__(self, name, value)
5170 except AttributeError:
5171 pass
/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
64
65 def __set__(self, obj, value):
---> 66 obj._set_axis(self.axis, value)
/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
422 if not fastpath:
423 # The ensure_index call above ensures we have an Index object
--> 424 self._mgr.set_axis(axis, labels)
425
426 # ndarray compatibility
/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
214
215 if new_len != old_len:
--> 216 raise ValueError(
217 f"Length mismatch: Expected axis has {old_len} elements, new "
218 f"values have {new_len} elements"
ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements
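The mismatch comes from the row whose key is NaN. With dropna=True its group is excluded, so the per-row result covers only 3 of the 4 original rows, and _transform_general then assigns the full 4-element index to it. The following simplified model of the broadcasting step makes the length difference visible. It is an illustration under that assumption, not pandas' actual implementation; the names pieces and kept_index are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
gb = df.groupby("A", dropna=True)

# Simplified model of a general transform: call func on each group and
# broadcast the scalar back to that group's rows. Not pandas' real code.
pieces = [pd.Series(len(group), index=group.index) for _, group in gb["B"]]
result = pd.concat(pieces).sort_index()

# Only rows whose key is not NaN end up in some group.
kept_index = df.index[df["A"].notna()]

print(len(result), len(df.index))  # 3 4
```

Assigning df.index (4 labels) to result (3 values) corresponds to the `result.index = self._selected_obj.index` step that raises in the traceback.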
Expected Output
All three should return the following (as a Series named B in the gb['B'] case):
Out[9]:
   B
0  2
1  2
2  1
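One possible workaround is to filter out the null keys before grouping. This is an assumed workaround, not the fix this issue calls for, and the variable name non_null is illustrative. It reproduces the expected values for the Series case:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})

# Workaround sketch: drop null keys before grouping so the transform
# result and the grouped object share the same 3-row index.
non_null = df[df["A"].notna()]
result = non_null.groupby("A")["B"].transform(len)

print(result.tolist())  # [2, 2, 1]
```

The same pre-filtering should also work for the gb[['B']] and gb forms, though only the Series case is shown here.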
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 9843926
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-42-generic
Version : #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.0.dev0+54.g9843926e3
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1