reindex does not work for groupby series with DateTimeIndex #26209

Closed
@ptmminh

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

# generate a simple df with weekly DateTimeIndex, group and a value
df = pd.DataFrame({
    'group':['Group1','Group2','Group3']*3, 
    'value':np.random.randint(100, 1000, size=9)}, 
    index=pd.date_range('1991-10-2',periods=3, freq='W-MON').repeat(3)
)

# this works as expected
new_value = df.groupby('group')[['value']].apply(
    lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='W-MON'), fill_value=0)
)

# this fails with "ValueError: cannot reindex from a duplicate axis"
new_value = df.groupby('group').value.apply(
    lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='W-MON'), fill_value=0)
)

Problem description

We have a DataFrame with a group column and a DatetimeIndex, and we want to reindex the values per group so that every group covers every weekly date between its first and last entry. This works as expected when the reindexing has a gap to fill. However, when there is no gap to fill, the Series version (groupby('group').value) fails with:

ValueError: cannot reindex from a duplicate axis

A workaround is to select the column with [['value']] instead of .value, so that each group is passed to apply as a DataFrame rather than a Series.
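Another way to sidestep the problem, sketched here as an assumption rather than as the fix adopted by pandas, is to skip groupby.apply entirely: iterate over the groups, reindex each Series on its own, and stitch the pieces back together with pd.concat. The helper name reindex_weekly and the fixed values from np.arange are illustrative only and do not come from the original report.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the report, but with fixed values for reproducibility.
dates = pd.date_range("1991-10-02", periods=3, freq="W-MON")
df = pd.DataFrame(
    {"group": ["Group1", "Group2", "Group3"] * 3, "value": np.arange(100, 109)},
    index=dates.repeat(3),
)

def reindex_weekly(s):
    """Hypothetical helper: reindex one group's Series onto a full weekly range."""
    full = pd.date_range(s.index.min(), s.index.max(), freq="W-MON")
    return s.reindex(full, fill_value=0)

# Each group's index is unique, so reindexing one group at a time is safe.
pieces = {name: reindex_weekly(s) for name, s in df.groupby("group")["value"]}

# The dict keys become the outer level of the resulting MultiIndex.
result = pd.concat(pieces, names=["group", "date"])
```

Because every group is reindexed in isolation, the duplicate dates in the original frame never meet in a single reindex call.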

Expected Output

The original Series.
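To make the expectation concrete for a single group, the following sketch (with made-up values, not data from the report) reindexes one Series onto its own weekly range. Without a gap the values come back unchanged. With a gap, the missing week is filled with 0.

```python
import pandas as pd

weeks = pd.date_range("1991-10-07", periods=3, freq="W-MON")
s = pd.Series([10, 20, 30], index=weeks)

# Full weekly range between the first and last date of the Series.
full = pd.date_range(s.index.min(), s.index.max(), freq="W-MON")

# No gap: reindexing should hand back the same values.
no_gap = s.reindex(full, fill_value=0)

# Gap: drop the middle week, then reindex to fill it with 0.
gapped = s.drop(weeks[1])
filled = gapped.reindex(full, fill_value=0)
```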

Output of pd.show_versions()

installed versions
------------------
commit: none
python: 3.6.8.final.0
python-bits: 64
os: linux
os-release: 4.4.0-43-microsoft
machine: x86_64
processor: x86_64
byteorder: little
lc_all: none
lang: en_us.utf-8
locale: en_us.utf-8

pandas: 0.24.2
pytest: 3.5.1
pip: 19.0.3
setuptools: 41.0.0
cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: none
xarray: none
ipython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.8.0
pytz: 2019.1
blosc: none
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: none
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml.etree: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: none
psycopg2: none
jinja2: 2.10
s3fs: none
fastparquet: none
pandas_gbq: none
pandas_datareader: none
gcsfs: None

Metadata

Labels

Datetime: Datetime data dtype
Groupby
Indexing: Related to indexing on series/frames, not to indexes themselves
Needs Tests: Unit test(s) needed to prevent regressions
good first issue
