Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import pdb

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'B', 'B', 'B'],
        'col2': [1, 2, 3, 4, 5, 6],
    }
)


def fn(x):
    pdb.set_trace()  # breakpoint used for the debugger session shown below
    # Set the last value of col2 in each group to 0
    x.col2[x.index[-1]] = 0
    return x.col2


result = df.groupby(['col1'], as_index=False).apply(fn)
print(result)
Problem description
The expected output is:
0  0    1
   1    2
   2    0
1  3    4
   4    5
   5    0
Instead, I get a Series one row longer than expected:
0  0    1
   1    2
   2    0
1  3    4
   4    5
   5    6
   5    0
The problem seems to occur while processing the second group (col1 == 'B'), where the index labels (3, 4, 5) no longer coincide with the row positions (0, 1, 2). Stopping at the breakpoint (pdb.set_trace()) for that group and running the statements by hand gives the following:
-> x.col2[x.index[-1]] = 0
(Pdb) x.col2
3    4
4    5
5    6
Name: col2, dtype: int64
(Pdb) x.col2[5]
*** KeyError: 5
(Pdb) x.col2[5] = 0
(Pdb) x.col2
3    4
4    5
5    6
5    0
Name: col2, dtype: int64
(Pdb) x.col2[5]
5    6
5    0
Name: col2, dtype: int64
(Pdb) x.col2[5] = 0
(Pdb) x.col2
3    4
4    5
5    0
5    0
Name: col2, dtype: int64
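A possible workaround, offered as a sketch rather than a diagnosis of the behaviour above: do the write on an explicit copy of the group's column and use `.loc`, which is strictly label-based. The helper name `zero_last` is made up for this example, and selecting `["col2"]` before `apply` together with `group_keys=False` is a choice made here, not part of the original report. The index of the result may be laid out differently from the expected output in this issue, so only the values are compared.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "B", "B", "B"],
        "col2": [1, 2, 3, 4, 5, 6],
    }
)


def zero_last(col):
    # `col` is the col2 Series of a single group. Copy it so the write
    # never goes through an object shared with the original frame.
    col = col.copy()
    # .loc looks the key up as a label, so the label 5 in group 'B'
    # is not confused with a position.
    col.loc[col.index[-1]] = 0
    return col


result = df.groupby("col1", group_keys=False)["col2"].apply(zero_last)
print(result.tolist())
```

With this variant each group keeps its three rows and no extra row is appended; the original `df` is left unmodified.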
Expected output
0  0    1
   1    2
   2    0
1  3    4
   4    5
   5    0
This worked in an earlier pandas release. Unfortunately, I do not know which version that was.
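Until the affected version is identified, the same values can be produced without writing inside `apply` at all. The following is only a sketch of an alternative, not the approach used in the report: it collects the index labels of the last row of every group with `groupby(...).tail(1)` and then assigns through `DataFrame.loc` on a copy. The names `last_labels` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "B", "B", "B"],
        "col2": [1, 2, 3, 4, 5, 6],
    }
)

# Labels (from the original index) of the last row of every group.
last_labels = df.groupby("col1").tail(1).index

# Assign on a copy so the input frame stays untouched.
out = df.copy()
out.loc[last_labels, "col2"] = 0

print(out["col2"].tolist())
```

This yields a flat column aligned with the original index rather than the two-level index shown in the expected output, so it fits best when the group number in the index is not needed.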
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.5.13-050513-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.3
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 40.6.2
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.2.1
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1
numba : 0.45.1