Groupby on empty frame different results based on dtypes. #20888

Closed
@amanhanda

Code Sample, a copy-pastable example if possible

In [117]: import pandas as pd

In [118]: df = pd.DataFrame({"a":[1], "b":[2], "c":[3], "d":[4]})

In [119]: group_keys = ["a", "b", "c"]

In [120]: g = df[df.a==2].groupby(group_keys)

In [121]: g.first().index
Out[121]:
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'a', u'b', u'c'])

# Change the dtype of column "d" from int64 to object (str)
In [122]: df = pd.DataFrame({"a":[1], "b":[2], "c":[3], "d":["d"]})

In [123]: g = df[df.a==2].groupby(group_keys)

In [124]: g.first().index
Out[124]: Index([], dtype='object')

# In pandas 0.18.0, the same call returned a MultiIndex
In [36]: g.first().index
Out[36]:
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'a', u'b', u'c'])

Problem description

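The groupby should return a MultiIndex in both cases. When every column is int64, calling first() on a groupby of an empty frame returns an empty MultiIndex whose level names are the group keys. When column "d" has str (object) dtype, the same call returns a plain empty Index instead, so the index type depends on the dtype of a column that is not even a group key. This is inconsistent, and it is a change from the previous behavior: version 0.18.0 returned a MultiIndex in both cases.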
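The comparison described above can be condensed into a small script that reports the index type for each dtype side by side. This is only a sketch built from the frames in the code sample; the helper name `empty_groupby_index_type` is made up for illustration, and the printed types will depend on the installed pandas version, so no output is shown.

```python
import pandas as pd

GROUP_KEYS = ["a", "b", "c"]


def empty_groupby_index_type(d_value):
    """Return the class name of the index produced by first() on an empty groupby.

    Hypothetical helper for this report; d_value sets the dtype of column "d".
    """
    df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [d_value]})
    empty = df[df.a == 2]  # no row matches, so the filtered frame is empty
    return type(empty.groupby(GROUP_KEYS).first().index).__name__


int_case = empty_groupby_index_type(4)    # "d" is int64
str_case = empty_groupby_index_type("d")  # "d" is a string column

print("pandas version:", pd.__version__)
print("int64 'd' ->", int_case)
print("str 'd'   ->", str_case)
print("consistent:", int_case == str_case)
```

If the last line prints False, the installed version shows the inconsistency reported here.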

Expected Output

In [36]: g.first().index
Out[36]:
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'a', u'b', u'c'])
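Until both cases return the expected index, one possible stopgap is to normalize the result after the fact. The sketch below is an assumption about how a caller might work around the issue, not a fix in pandas itself; `ensure_multiindex` is a hypothetical helper name, and it only touches results that are empty and lack a MultiIndex.

```python
import pandas as pd


def ensure_multiindex(result, keys):
    """Give an empty groupby result an empty MultiIndex named after the keys.

    Hypothetical workaround: non-empty results and results that already
    carry a MultiIndex are returned unchanged.
    """
    if len(result) == 0 and not isinstance(result.index, pd.MultiIndex):
        result = result.copy()
        # One empty array per group key gives an empty index with len(keys) levels.
        result.index = pd.MultiIndex.from_arrays([[] for _ in keys], names=keys)
    return result


group_keys = ["a", "b", "c"]

# Stand-in for the problematic result: an empty frame with a flat object index.
flat = pd.DataFrame({"d": []}, index=pd.Index([], dtype="object"))
fixed = ensure_multiindex(flat, group_keys)

print(type(fixed.index).__name__)
print(list(fixed.index.names))
```

In practice the helper would wrap the call from the code sample, for example `ensure_multiindex(g.first(), group_keys)`.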

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-327.36.3.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.9.0
xarray: 0.10.2
IPython: 5.6.0
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.2
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.3.2
html5lib: 0.999
sqlalchemy: 1.2.6
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
