Regression: concatenating MultiIndex with empty RangeIndex raises #41234

Closed
@mlondschien

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


As also reported here: dask/dask#7610, PR #38671 introduced the following behaviour:

A reproducer with just pandas:

import pandas as pd

df1 = pd.DataFrame([[1, 2]], columns=pd.MultiIndex.from_tuples([('B', 1), ('C', 1)]))
df2 = pd.DataFrame(index=[0], columns=pd.RangeIndex(0))

pd.concat([df1, df2])  # raises on affected pandas versions

The error is triggered by columns=pd.RangeIndex(0) on the empty df2. By default, pandas creates a zero-length object-dtype Index for the columns, which works fine; with an empty RangeIndex, we now get this error.
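To make the distinction above concrete, here is a minimal sketch that compares the two kinds of zero-length index directly. Both indexes are constructed explicitly because the index an empty DataFrame gets by default can differ between pandas versions; the variable names are illustrative only.

```python
import pandas as pd

# Two zero-length indexes that differ only in type and dtype.
empty_object = pd.Index([], dtype=object)  # empty object-dtype Index
empty_range = pd.RangeIndex(0)             # empty RangeIndex (integer dtype)

for name, idx in [("object Index", empty_object), ("RangeIndex", empty_range)]:
    # Both have length 0, but their dtypes differ.
    print(f"{name}: len={len(idx)}, dtype={idx.dtype}, type={type(idx).__name__}")
```

Both indexes are empty, so neither contributes any column labels to the concatenation; only the dtype differs.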

This is a regression in pandas in itself. But I am also wondering where the empty RangeIndex is coming from (it seems that the groupby operation in dask results in some empty partitions).

Originally posted by @jorisvandenbossche in dask/dask#7610 (comment)

IIUC the fix for this could possibly be as simple as replacing

            if isinstance(self, ABCMultiIndex) and not is_object_dtype(
                unpack_nested_dtype(other)
            ):

with

            if isinstance(self, ABCMultiIndex) and not is_object_dtype(
                unpack_nested_dtype(other)
            ) and len(other) > 0:
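The effect of the extra `len(other) > 0` check can be sketched as a standalone predicate. This is only an illustration, not the pandas internals: `guard_applies` is a hypothetical helper, `pd.MultiIndex` stands in for `ABCMultiIndex`, and `other.dtype` stands in for the internal `unpack_nested_dtype(other)`. What the guarded branch actually does is not shown here.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def guard_applies(self_index: pd.Index, other: pd.Index) -> bool:
    """Hypothetical standalone version of the proposed condition."""
    # Assumption: other.dtype approximates unpack_nested_dtype(other).
    return (
        isinstance(self_index, pd.MultiIndex)
        and not is_object_dtype(other.dtype)
        and len(other) > 0  # proposed addition: empty indexes skip the branch
    )


mi = pd.MultiIndex.from_tuples([("B", 1), ("C", 1)])

print(guard_applies(mi, pd.RangeIndex(0)))               # False: empty, so skipped
print(guard_applies(mi, pd.RangeIndex(3)))               # True: non-empty, non-object
print(guard_applies(mi, pd.Index([], dtype=object)))     # False: object dtype
```

Without the length check, the first call would return True, which matches the empty-RangeIndex case in the reproducer.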

Labels: Index (Related to the Index class or subclasses), Regression (Functionality that used to work in a prior pandas version)
