Description
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({"numerical_col": [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
                   "date": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01",
                            "2018-03-01", "2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01"],
                   "categories": ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b"]})
df["date"] = pd.to_datetime(df["date"])
df.set_index("date").groupby("categories").tshift(0, "M")
Problem description
In short, calling tshift after a groupby drops the groupby key ("categories" in this case) from the index of the result when the shift is zero:
In [14]: df.set_index("date").groupby("categories").tshift(0, "M")
Out[14]:
            numerical_col
date
2018-01-01             -1
2018-01-01             -2
2018-01-01             -3
2018-02-01             -4
2018-02-01             -5
2018-03-01             -6
2018-01-01             -7
2018-01-01             -8
2018-02-01             -9
2018-02-01            -10
Expected Output
NOTE: This is the format returned for any non-zero shift (with the dates shifted accordingly). A zero shift should return the same format:
In [15]: df.set_index("date").groupby("categories").tshift(0, "M")
Out[15]:
                       numerical_col
categories date
a          2018-01-01             -1
           2018-01-01             -2
           2018-01-01             -3
           2018-02-01             -4
           2018-02-01             -5
           2018-03-01             -6
b          2018-01-01             -7
           2018-01-01             -8
           2018-02-01             -9
           2018-02-01            -10
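A possible workaround, as a minimal sketch rather than a fix in pandas itself: shift each group separately and reassemble the pieces with pd.concat, which attaches the group key as an outer index level whatever the shift size. The helper name shift_within_groups is hypothetical (it is not part of pandas), DataFrame.shift(periods, freq=...) stands in for tshift, and pd.offsets.MonthEnd() is used as the offset object behind the "M" alias.

```python
import pandas as pd

df = pd.DataFrame({
    "numerical_col": [-1, -2, -3, -4, -5, -6, -7, -8, -9, -10],
    "date": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01",
             "2018-03-01", "2018-01-01", "2018-01-01", "2018-02-01", "2018-02-01"],
    "categories": ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b"],
})
df["date"] = pd.to_datetime(df["date"])


def shift_within_groups(frame, key, periods, freq):
    """Hypothetical helper: shift each group's index and keep `key` as an index level."""
    pieces = {}
    for name, group in frame.groupby(key):
        # Drop the key column here; it comes back as the outer index level below.
        pieces[name] = group.drop(columns=key).shift(periods, freq=freq)
    # The dict keys become the outer index level, named after the groupby key.
    return pd.concat(pieces, names=[key])


month_end = pd.offsets.MonthEnd()  # offset object behind the "M" alias
indexed = df.set_index("date")
zero_shift = shift_within_groups(indexed, "categories", 0, month_end)
one_shift = shift_within_groups(indexed, "categories", 1, month_end)

# Both results carry the same two index levels, regardless of the shift size.
print(zero_shift.index.names)
print(one_shift.index.names)
```

The explicit loop is slower than a native groupby method on large frames, but it does not depend on how groupby decides whether to prepend the group keys.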
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.3-200.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.3
pytest: None
pip: 9.0.3
setuptools: 39.2.0
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None