groupby + shift drops group columns when as_index is False #13519

Closed
@tjader

Description

The behaviour of groupby + shift seems to have changed in 0.17 and 0.18 compared to 0.16.
With as_index=False, I would expect the columns that the groupby is performed on to remain in the output dataframe, but they are no longer present.

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'A': [1.0, 1, 1, 2, 2, 3], 'B': [1.0, 2, 3, 4, 5, 6]})
>>> df
     A    B
0  1.0  1.0
1  1.0  2.0
2  1.0  3.0
3  2.0  4.0
4  2.0  5.0
5  3.0  6.0
>>> df.groupby('A', as_index=False).shift(1)
     B
0  NaN
1  1.0
2  2.0
3  NaN
4  4.0
5  NaN

Expected Output

>>> pd.DataFrame({'A': [np.nan, 1, 1, np.nan, 2, np.nan], 'B': [np.nan, 1, 2, np.nan, 4, np.nan]})
     A    B
0  NaN  NaN
1  1.0  1.0
2  1.0  2.0
3  NaN  NaN
4  2.0  4.0
5  NaN  NaN
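
Possible workaround (a sketch only, not a confirmed fix and not necessarily how 0.16 behaved internally): the group column can be kept in the result either by shifting only the value column and assigning it back, or by grouping on the raw values of 'A' so that pandas does not treat 'A' as the grouping column and shifts it too. The names keep_keys and shift_keys are just illustrative, and the second variant assumes that grouping by a plain array leaves every column in the shifted output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 1, 1, 2, 2, 3], 'B': [1.0, 2, 3, 4, 5, 6]})

# Variant 1: shift only the value column within each group and
# write it back, so the group keys in 'A' stay as they were.
keep_keys = df.assign(B=df.groupby('A')['B'].shift(1))

# Variant 2: group by the raw values of 'A' instead of the column name.
# 'A' is then an ordinary column and gets shifted together with 'B',
# which is the shape shown under "Expected Output".
shift_keys = df.groupby(df['A'].values).shift(1)

print(keep_keys)
print(shift_keys)
```

Variant 1 is probably what most people want in practice, since the group keys stay intact; variant 2 is only there to reproduce the expected frame above.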

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.1
statsmodels: None
xarray: 0.7.2
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.1
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

I have also confirmed the issue on a similar Linux installation using pandas 0.18.1.
