
BUG: Segmentation fault when doing pandas.core.window.rolling.RollingGroupBy.apply #36727

Closed
@geogunow

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame(
    [
        ["A", "group_1", pd.Timestamp(2019, 1, 1, 9)],
        ["B", "group_1", pd.Timestamp(2019, 1, 2, 9)],
        ["C", "group_2", pd.Timestamp(2019, 1, 3, 9)],
        ["D", "group_1", pd.Timestamp(2019, 1, 6, 9)],
        ["E", "group_1", pd.Timestamp(2019, 1, 7, 9)],
        ["F", "group_1", pd.Timestamp(2019, 1, 10, 9)],
        ["G", "group_2", pd.Timestamp(2019, 1, 20, 9)],
        ["H", "group_1", pd.Timestamp(2019, 4, 8, 9)],
    ],
    columns=["index", "group", "eventTime"],
).set_index("index")

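groups = df.groupby("group")
# the new column is added to df *after* the groupby object is created
df["count_to_date"] = groups.cumcount()
# 10-day time-based window within each group, using eventTime as the window axis
rolling_groups = groups.rolling("10d", on="eventTime")
# with the default raw=False, each window is passed in as a Series,
# so shape[0] is the number of rows in that window
group_size = rolling_groups.apply(lambda df: df.shape[0])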
print(group_size)

Problem description

The code above causes a segmentation fault inside pandas in versions after 1.0.5. Because my project depends on this code, I am restricted to pandas 1.0.5 until the bug is fixed. I am not sure what causes the segmentation fault, but all of the circumstances above are necessary to reproduce it (i.e., a DataFrame with a custom string index, a column added to the DataFrame after grouping, a rolling window applied per group, etc.).

I have reproduced this bug on a variety of machines and operating systems.
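Because a segmentation fault kills the Python interpreter outright instead of raising an exception, one way to test each of the circumstances above in isolation is to run every variant of the code sample in a child process and inspect its exit code. This is a minimal sketch assuming a POSIX system (Linux or macOS), where a child killed by a signal reports a negative return code; the helper names `run_isolated` and `crashed_with_segfault` are hypothetical and not part of pandas.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 60.0) -> int:
    """Run `snippet` in a fresh Python interpreter and return its exit code.

    Hypothetical helper. On POSIX, a child killed by a signal reports
    -<signal number>, so a crash cannot take down the calling process.
    """
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode


def crashed_with_segfault(returncode: int) -> bool:
    """Hypothetical helper: True if the child was killed by SIGSEGV (POSIX only)."""
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Simulate a segfault without pandas by sending SIGSEGV to the child itself.
    rc = run_isolated("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)")
    print(rc, crashed_with_segfault(rc))
```

To apply it to this bug, pass the code sample above as a string, e.g. `crashed_with_segfault(run_isolated(repro_source))`, and remove or change one circumstance at a time (the custom index, the column added after grouping, the per-group rolling window).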

Expected Output

                        eventTime  count_to_date
group   index                                   
group_1 A     2019-01-01 09:00:00            1.0
        B     2019-01-02 09:00:00            2.0
        D     2019-01-06 09:00:00            3.0
        E     2019-01-07 09:00:00            4.0
        F     2019-01-10 09:00:00            5.0
        H     2019-04-08 09:00:00            1.0
group_2 C     2019-01-03 09:00:00            1.0
        G     2019-01-20 09:00:00            1.0

Note: This is indeed the output of pandas 1.0.5 and earlier versions.
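As an independent cross-check of the expected `count_to_date` values, the window counts can be recomputed without pandas. This minimal sketch assumes pandas' default right-closed interval `(t - 10 days, t]` for offset-based windows such as `"10d"`; the `rolling_counts` helper is hypothetical and only reproduces this particular count, not `RollingGroupBy.apply` in general.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

# (index label, group, eventTime) rows copied from the code sample above
ROWS = [
    ("A", "group_1", datetime(2019, 1, 1, 9)),
    ("B", "group_1", datetime(2019, 1, 2, 9)),
    ("C", "group_2", datetime(2019, 1, 3, 9)),
    ("D", "group_1", datetime(2019, 1, 6, 9)),
    ("E", "group_1", datetime(2019, 1, 7, 9)),
    ("F", "group_1", datetime(2019, 1, 10, 9)),
    ("G", "group_2", datetime(2019, 1, 20, 9)),
    ("H", "group_1", datetime(2019, 4, 8, 9)),
]


def rolling_counts(rows, window=timedelta(days=10)):
    """Count rows per group inside the right-closed window (t - window, t]."""
    items_by_group = defaultdict(list)
    for label, group, t in rows:
        items_by_group[group].append((t, label))
    result = {}
    for group, items in items_by_group.items():
        items.sort()  # order each group by eventTime
        times = [t for t, _ in items]
        for i, (t, label) in enumerate(items):
            # first position whose time is strictly later than t - window
            start = bisect_right(times, t - window)
            result[(group, label)] = i - start + 1
    return result


if __name__ == "__main__":
    for key, count in sorted(rolling_counts(ROWS).items()):
        print(key, count)
```

The counts match the `count_to_date` column in the expected output. Since this sketch does not call pandas at all, it sidesteps the crash, but only for this specific window count.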

Output of pd.show_versions()

This is just one configuration, but the bug has been reproduced on three different machines (both Linux and macOS), all exhibiting the same behavior.

INSTALLED VERSIONS

commit : 2a7d332
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
Version : Darwin Kernel Version 17.7.0: Thu Jun 18 21:21:34 PDT 2020; root:xnu-4570.71.82.5~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.2
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Metadata


Labels: Regression (functionality that used to work in a prior pandas version); Segfault (non-recoverable error); Window (rolling, ewma, expanding)
