
BUG: DataFrameGroupBy.quantile segfault #28194

Closed
@benjaminriviere

Description


Code Sample

import pandas as pd
import numpy as np


df = pd.DataFrame(data={
    "x": [1, 1, 1],
    "y": [np.nan, np.nan, np.nan],
    "z": [1, 2, 3]
})

# Works on SeriesGroupBy
df.groupby(["x", "y"])["z"].quantile(0.5)

# Segfault on DataFrameGroupBy
df.groupby(["x", "y"])[["z"]].quantile(0.5)

Problem description

Hello all,
I just noticed that there is still an issue with the quantile function when it is used on a DataFrameGroupBy object with Pandas 0.25.1. When one of the groupby keys is a column containing only NaN values (so that no groups remain after grouping), a segfault occurs. The code sample above reproduces the bug. This issue did not occur with Pandas 0.24.2.
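The empty result follows from how groupby treats missing keys: by default, rows whose key contains NaN are dropped, so this frame ends up with no groups at all. A minimal sketch that checks this, assuming pandas 1.1 or later (the `dropna` argument of `groupby` does not exist in 0.25.1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1, 1, 1],
    "y": [np.nan, np.nan, np.nan],
    "z": [1, 2, 3],
})

# Default dropna=True: every row has a NaN in "y", so no groups survive
n_groups = df.groupby(["x", "y"]).ngroups
print(n_groups)

# dropna=False (pandas >= 1.1) keeps NaN as a group label instead,
# so all three rows fall into a single (1, NaN) group
kept = df.groupby(["x", "y"], dropna=False)[["z"]].quantile(0.5)
print(kept)
```

With no groups, the quantile call has nothing to compute, which is the case that crashes on DataFrameGroupBy in 0.25.1.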

Expected Output

Pandas 0.24.2 gives this output:

Empty DataFrame
Columns: []
Index: []
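Until a fix is available, one possible workaround is to skip the quantile call whenever grouping leaves no groups. The helper below, `safe_group_quantile`, is hypothetical (not part of pandas) and only a sketch; it returns an empty DataFrame that keeps the selected column labels, which differs slightly from the 0.24.2 output above.

```python
import numpy as np
import pandas as pd


def safe_group_quantile(df, keys, columns, q):
    """Hypothetical helper: group df by keys and take the q-quantile of columns,
    returning an empty DataFrame instead of calling quantile when no groups exist."""
    grouped = df.groupby(keys)[columns]
    if grouped.ngroups == 0:
        # Every row was dropped (e.g. a key column is all NaN): nothing to compute
        return pd.DataFrame(columns=columns)
    return grouped.quantile(q)


df = pd.DataFrame({
    "x": [1, 1, 1],
    "y": [np.nan, np.nan, np.nan],
    "z": [1, 2, 3],
})
print(safe_group_quantile(df, ["x", "y"], ["z"], 0.5))
```

When at least one group exists, the helper simply defers to the regular `quantile` call.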

Output of pd.show_versions()

My computer is running Ubuntu 18.04. Below are the installed packages (in a clean virtual env). The bug doesn't seem to be related to NumPy, as it also occurred with NumPy 1.16.4.

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.8.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-58-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.1
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
