BUG: concat fails if indexes are all the same and keys are not unique #43596

Merged
merged 6 commits into from
Sep 20, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -486,6 +486,7 @@ Reshaping
 - Bug in :meth:`DataFrame.append` failing to retain dtypes when appended columns do not match (:issue:`43392`)
 - Bug in :func:`concat` of ``bool`` and ``boolean`` dtypes resulting in ``object`` dtype instead of ``boolean`` dtype (:issue:`42800`)
 - Bug in :func:`crosstab` when inputs are are categorical Series, there are categories that are not present in one or both of the Series, and ``margins=True``. Previously the margin value for missing categories was ``NaN``. It is now correctly reported as 0 (:issue:`43505`)
+- Bug in :func:`concat` would fail when the ``objs`` argument all had the same index and the ``keys`` argument contained duplicates (:issue:`43595`)

 Sparse
 ^^^^^^
2 changes: 1 addition & 1 deletion pandas/core/reshape/concat.py
@@ -695,7 +695,7 @@ def _make_concat_multiindex(indexes, keys, levels=None, names=None) -> MultiInde
         else:
             levels = [ensure_index(x) for x in levels]

-    if not all_indexes_same(indexes):
+    if not all_indexes_same(indexes) or not all(level.is_unique for level in levels):
         codes_list = []

         # things are potentially different sizes, so compute the exact codes
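
The changed condition sends any call with a non-unique level down the general code path, even when every index is identical. A minimal sketch of just the added predicate follows. The names `unique_levels`, `duplicate_levels`, and `all_levels_unique` are illustrative assumptions and do not appear in pandas; only `Index.is_unique` is the real API used by the diff.

```python
import pandas as pd

# Hypothetical stand-ins for the `levels` list built from `keys`.
unique_levels = [pd.Index(["e", "f"])]
duplicate_levels = [pd.Index(["e", "e"])]


def all_levels_unique(levels):
    # Same predicate as the one added to the condition in the diff.
    return all(level.is_unique for level in levels)


# With duplicate keys the predicate is False, so `not all(...)` is True
# and the branch that computes exact codes is taken.
print(all_levels_unique(unique_levels))
print(all_levels_unique(duplicate_levels))
```
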
23 changes: 22 additions & 1 deletion pandas/tests/reshape/concat/test_concat.py
@@ -3,11 +3,16 @@
     deque,
 )
 from decimal import Decimal
-from warnings import catch_warnings
+from warnings import (
+    catch_warnings,
+    simplefilter,
+)

 import numpy as np
 import pytest

+from pandas.errors import PerformanceWarning
+
 import pandas as pd
 from pandas import (
     DataFrame,
@@ -560,6 +565,22 @@ def test_duplicate_keys(keys):
     tm.assert_frame_equal(result, expected)


+def test_duplicate_keys_same_frame():
+    # GH 43595
+    keys = ["e", "e"]
+    df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    result = concat([df, df], axis=1, keys=keys)
+    expected_values = [[1, 4, 1, 4], [2, 5, 2, 5], [3, 6, 3, 6]]
+    expected_columns = MultiIndex.from_tuples(
+        [(keys[0], "a"), (keys[0], "b"), (keys[1], "a"), (keys[1], "b")]
+    )
+    expected = DataFrame(expected_values, columns=expected_columns)
+    with catch_warnings():
+        # result.columns not sorted, resulting in performance warning
+        simplefilter("ignore", PerformanceWarning)
+        tm.assert_frame_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "obj",
     [
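
The scenario covered by `test_duplicate_keys_same_frame` can also be tried outside the test suite. The following is a minimal sketch, assuming a pandas version that includes this fix (1.4.0 or later, going by the whatsnew entry). The names `columns` and `values` are illustrative, and the warning filter mirrors the one in the test rather than being strictly required.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

with warnings.catch_warnings():
    # The resulting columns are not sorted, which can trigger a PerformanceWarning.
    warnings.simplefilter("ignore", pd.errors.PerformanceWarning)
    # Same frame twice (identical indexes) with duplicate keys.
    result = pd.concat([df, df], axis=1, keys=["e", "e"])

# Plain Python views of the result, convenient for inspection.
columns = list(result.columns)
values = result.to_numpy().tolist()

print(columns)
print(values)
```

On versions without the fix, the `concat` call is expected to raise instead of returning a frame, per the issue title.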