Description
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np
# generate a simple df with a weekly DatetimeIndex, a group column and a value column
df = pd.DataFrame({
    'group': ['Group1', 'Group2', 'Group3'] * 3,
    'value': np.random.randint(100, 1000, size=9)},
    index=pd.date_range('1991-10-2', periods=3, freq='W-MON').repeat(3)
)
# this works as expected (the column is selected as a DataFrame via [['value']])
new_value = df.groupby('group')[['value']].apply(
    lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='W-MON'), fill_value=0)
)
# this fails (the column is selected as a Series via .value)
new_value = df.groupby('group').value.apply(
    lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='W-MON'), fill_value=0)
)
Problem description
We have a DataFrame with a group column and a DatetimeIndex, and we want to reindex the values per group so that every group covers all of the appropriate weekly dates. This works as expected when there is a gap for the reindexing to fill. However, it fails when there is no gap to be filled by reindex, as in the sample above, where every group already has every week:
ValueError: cannot reindex from a duplicate axis
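For reference, the condition that triggers the error can be checked directly. The sketch below rebuilds the frame from the code sample with fixed values instead of random ones (an assumption made only for reproducibility) and confirms that the frame's index contains duplicates while each group's own index already matches the full weekly range it spans, so the per-group reindex changes nothing. It only inspects the data; whether the ValueError itself is raised may depend on the pandas version (it was observed on 0.24.2).

```python
import pandas as pd

# Same layout as the code sample, with fixed values (assumption, for reproducibility)
df = pd.DataFrame(
    {
        'group': ['Group1', 'Group2', 'Group3'] * 3,
        'value': [100, 200, 300, 400, 500, 600, 700, 800, 900],
    },
    index=pd.date_range('1991-10-2', periods=3, freq='W-MON').repeat(3),
)

# Each date appears once per group, so the frame's index has duplicate labels
index_has_duplicates = not df.index.is_unique

# For each group, compare its index with the full weekly range it spans
no_gap = {
    name: s.index.equals(pd.date_range(s.index.min(), s.index.max(), freq='W-MON'))
    for name, s in df.groupby('group')['value']
}

print(index_has_duplicates)
print(no_gap)
```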
The issue can be worked around by selecting the column with [['value']] instead of .value, which turns the corresponding Series into a DataFrame.
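Another possible workaround, sketched here as an illustration rather than a fix, is to skip groupby.apply and reindex each group in a plain loop, then stitch the pieces together with pd.concat. The helper names reindex_weekly and reindex_per_group are hypothetical, the values are fixed instead of random for reproducibility, and df_gap is an extra frame with one row removed to show the gap case. Note that the result carries a (group, date) MultiIndex, which is not the same shape as the original Series.

```python
import pandas as pd

# Same layout as the code sample, with fixed values (assumption, for reproducibility)
df = pd.DataFrame(
    {
        'group': ['Group1', 'Group2', 'Group3'] * 3,
        'value': [100, 200, 300, 400, 500, 600, 700, 800, 900],
    },
    index=pd.date_range('1991-10-2', periods=3, freq='W-MON').repeat(3),
)


def reindex_weekly(s):
    """Reindex one group's values onto the full weekly range it spans, filling gaps with 0."""
    full_range = pd.date_range(s.index.min(), s.index.max(), freq='W-MON')
    return s.reindex(full_range, fill_value=0)


def reindex_per_group(frame):
    """Reindex 'value' per group without groupby.apply (hypothetical helper)."""
    pieces = {name: reindex_weekly(s) for name, s in frame.groupby('group')['value']}
    # The dict keys become the outer index level, the dates the inner level
    return pd.concat(pieces).rename_axis(['group', 'date'])


# No-gap case: every group already has every week, so values are unchanged
result_no_gap = reindex_per_group(df)

# Gap case: drop the row at position 3 (Group1, second week) by position,
# because dropping by label would remove every row sharing that date
df_gap = df.iloc[[0, 1, 2, 4, 5, 6, 7, 8]]
result_gap = reindex_per_group(df_gap)

print(result_no_gap)
print(result_gap)
```

Because each group is reindexed on its own and the pieces are only concatenated afterwards, nothing is ever aligned back against the duplicated original index.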
Expected Output
The original Series.
Output of pd.show_versions()
pandas: 0.24.2
pytest: 3.5.1
pip: 19.0.3
setuptools: 41.0.0
cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: none
xarray: none
ipython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.8.0
pytz: 2019.1
blosc: none
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: none
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml.etree: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: none
psycopg2: none
jinja2: 2.10
s3fs: none
fastparquet: none
pandas_gbq: none
pandas_datareader: none
gcsfs: None