
groupby apply failed on dataframe with DatetimeIndex #26182

Closed
@goldenbull

Code Sample, a copy-pastable example if possible

# -*- coding: utf-8 -*-

import pandas as pd

def _do_calc_(_df):
    df2 = _df.copy()
    df2["vol_ma20"] = df2["volume"].rolling(20, min_periods=1).mean()
    return df2

dbars = pd.read_csv("2019.dbar_ftridx.csv.gz",
                    index_col=False,
                    encoding="utf-8-sig",
                    parse_dates=["trade_day"])
dbars = dbars.set_index("trade_day", drop=False)  # everything works fine if this line is commented out
df = dbars.groupby("exchange").apply(_do_calc_)
print(len(df))

Problem description

Here is the input data file:
2019.dbar_ftridx.csv.gz

This code runs fine with pandas 0.23. After upgrading to 0.24.2, it raises the following error:

Traceback (most recent call last):
  File "D:/test/groupby_bug.py", line 16, in <module>
    df = dbars.groupby("exchange").apply(_do_calc_)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 701, in apply
    return self._python_apply_general(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 712, in _python_apply_general
    not_indexed_same=mutated or self.mutated)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby\generic.py", line 318, in _wrap_applied_output
    not_indexed_same=not_indexed_same)
  File "C:\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py", line 918, in _concat_objects
    sort=False)
  File "C:\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 228, in concat
    copy=copy, sort=sort)
  File "C:\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 292, in __init__
    obj._consolidate(inplace=True)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5156, in _consolidate
    self._consolidate_inplace()
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5138, in _consolidate_inplace
    self._protect_consolidate(f)
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5127, in _protect_consolidate
    result = f()
  File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5136, in f
    self._data = self._data.consolidate()
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 922, in consolidate
    bm = self.__class__(self.blocks, self.axes)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 114, in __init__
    self._verify_integrity()
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 311, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "C:\Anaconda3\lib\site-packages\pandas\core\internals\managers.py", line 1691, in construction_error
    passed, implied))
ValueError: Shape of passed values is (432, 27), indices imply (1080, 27)

If I do not call set_index() on the DataFrame, it works fine. It seems something is wrong with the DatetimeIndex?
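As a possible workaround, the rolling mean can be computed per group with `transform` instead of `apply`, so pandas does not have to reassemble whole sub-frames. This is only a sketch: the helper name `add_vol_ma20` and its `window` parameter are made up for illustration, and whether it avoids the error on 0.24.2 is an assumption that has not been checked against the attached file.

```python
import pandas as pd


def add_vol_ma20(frame, window=20):
    """Return a copy of `frame` with a per-exchange rolling mean of `volume`.

    Hypothetical helper: it expects the `exchange` and `volume` columns used
    in the script above and leaves the index (DatetimeIndex or not) untouched.
    """
    out = frame.copy()
    rolled = out.groupby("exchange")["volume"].transform(
        lambda s: s.rolling(window, min_periods=1).mean()
    )
    # transform returns one value per input row, in the original row order,
    # so assigning positionally is safe even when the index has duplicates.
    out["vol_ma20"] = rolled.to_numpy()
    return out
```

Usage would be `df = add_vol_ma20(dbars)`. The result keeps the original row order and index rather than being regrouped by exchange.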

I don't know whether this error can be reproduced on your machine; I can reproduce it on all of mine.
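For anyone without the attached file, here is a sketch that builds a small synthetic stand-in and runs the same `groupby(...).apply(...)` call. The exchange and product names, the `product` column, and the helper `make_synthetic_dbars` are invented for illustration; only `trade_day`, `exchange` and `volume` come from the script above. It is not confirmed that synthetic data triggers the same `ValueError`, so the script only reports what happens on the installed version.

```python
import numpy as np
import pandas as pd


def make_synthetic_dbars(n_days=30, seed=0):
    """Build daily bars for a few made-up products (hypothetical data)."""
    rng = np.random.default_rng(seed)
    days = pd.bdate_range("2019-01-02", periods=n_days)
    # Two products share one exchange, so dates repeat within a group too.
    products = [("CFFEX", "IF"), ("CFFEX", "IC"), ("SHFE", "CU"), ("DCE", "M")]
    frames = [
        pd.DataFrame({
            "trade_day": days,
            "exchange": exchange,
            "product": product,
            "volume": rng.integers(1_000, 100_000, size=n_days),
        })
        for exchange, product in products
    ]
    return pd.concat(frames, ignore_index=True)


def rolling_volume_mean(group):
    """Same calculation as _do_calc_ in the report."""
    out = group.copy()
    out["vol_ma20"] = out["volume"].rolling(20, min_periods=1).mean()
    return out


dbars = make_synthetic_dbars().set_index("trade_day", drop=False)

try:
    result = dbars.groupby("exchange").apply(rolling_volume_mean)
    outcome = f"ok: {len(result)} rows"
except ValueError as exc:
    # This is the exception type shown in the traceback above.
    outcome = f"ValueError: {exc}"

print(pd.__version__, outcome)
```

If this runs cleanly on 0.24.2 while the real file still fails, the difference between the two inputs (for example dtypes or the pattern of repeated dates) would be the next thing to compare.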

Expected Output

4104

Output of pd.show_versions()

C:\Anaconda3\python.exe D:/test/groupby_bug.py

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: 4.4.0
pip: 19.0.3
setuptools: 41.0.0
Cython: 0.29.7
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.6
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None
