BUG: pd.concat with identical key leads to multi-indexing error #46546


Merged — 16 commits merged into pandas-dev:main on Apr 5, 2022

Conversation

GYHHAHA
Contributor

@GYHHAHA GYHHAHA commented Mar 28, 2022

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode MultiIndex labels Mar 29, 2022
@jreback jreback added this to the 1.5 milestone Mar 29, 2022
Contributor

@jreback jreback left a comment

@@ -705,7 +705,7 @@ def _make_concat_multiindex(indexes, keys, levels=None, names=None) -> MultiIndex:
             names = [None]

     if levels is None:
-        levels = [ensure_index(keys)]
+        levels = [ensure_index(keys).unique()]
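
The effect of the one-line change above can be checked directly — a minimal sketch, assuming a pandas version that includes this fix (1.5 or later):

```python
import pandas as pd

df1 = pd.DataFrame({"name": [1]})
df2 = pd.DataFrame({"name": [2]})
df3 = pd.DataFrame({"name": [3]})

# "x" appears twice in keys; after the fix the outer level is
# deduplicated to ["x", "y"] instead of ["x", "y", "x"].
result = pd.concat([df1, df2, df3], keys=["x", "y", "x"])
print(result.index.levels[0].tolist())  # ['x', 'y']
print(result.loc["x"])  # selects both "x" blocks without raising
```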
Contributor

hmm shouldn't this be the case for a specified levels as well?

Contributor Author

Can we check whether the level is unique before? If not, raise ValueError. The doc says it should be unique.

Contributor Author

@GYHHAHA GYHHAHA Mar 29, 2022

I find we actually do not have a check for duplicated levels in the concat function. Something like the following will not raise. Since this is an isolated problem, I will open another PR for it to avoid confusion.

df1 = pd.DataFrame({"A": [1]}, index=["x"])
df2 = pd.DataFrame({"A": [1]}, index=["y"])
pd.concat([df1, df2], levels=[["x", "y", "y"]]) # should raise
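
A hedged sketch of the validation being proposed here — the helper name is illustrative, not part of pandas internals:

```python
import pandas as pd

def check_unique_levels(levels):
    # Raise early if any user-supplied level contains duplicates,
    # mirroring the documented requirement that levels be unique.
    for level in levels:
        if not pd.Index(level).is_unique:
            raise ValueError(f"Level values must be unique: {list(level)}")

check_unique_levels([["x", "y"]])  # passes silently
try:
    check_unique_levels([["x", "y", "y"]])
except ValueError as exc:
    print(exc)  # Level values must be unique: ['x', 'y', 'y']
```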

Member

@rhshadrach rhshadrach Mar 31, 2022

@GYHHAHA - Agreed what you are pointing out is a separate issue, but here is an example that @jreback was referring to.

df1 = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]}).set_index(["x", "y"])
df2 = pd.DataFrame({"x": [7, 8], "y": [9, 10], "z": [11, 12]}).set_index(["x", "y"])
result = pd.concat([df1, df2, df1], keys=["x", "y", "x"], levels=[["x", "y", "x"]])
print(result.loc['x', 1, 3])

This also raises the same error that is being addressed here.

Contributor Author

@GYHHAHA GYHHAHA Mar 31, 2022

Why do you use .loc with the strings '1' and '3' instead of numeric values? @rhshadrach

Contributor Author

Yeah, now I get the error. I will look into this.

Member

Sounds good! I believe you just need to apply your change to the else clause highlighted here (but I could be wrong).

Contributor Author

But I believe this is caused by the duplicated levels input; if levels is [["x", "y"]], then it works fine. Maybe it is more suitable to address this in another PR related to the unique levels keyword. @rhshadrach

Member

Ah - I see your point; the user should not ever specify a level with duplicate values and so we can raise here instead. That makes sense to separate this off into a different PR; can you see if there is an issue for this already and open one if there isn't?

Contributor Author

@GYHHAHA GYHHAHA Mar 31, 2022

It seems that no such issue exists yet. I'll open one and link a PR to it after refining the performance-warning check in the current PR. Also, since we will raise for a duplicated level, calling unique() in the else clause is unnecessary.

@pep8speaks

pep8speaks commented Mar 29, 2022

Hello @GYHHAHA! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-04-04 20:33:50 UTC

@GYHHAHA GYHHAHA requested a review from jreback March 29, 2022 01:11
df2 = DataFrame({"name": [2]})
df3 = DataFrame({"name": [3]})
df_a = concat([df1, df2, df3], keys=["x", "y", "x"])
with tm.assert_produces_warning(PerformanceWarning):
Contributor

what is showing the performance warning?

Contributor Author

PerformanceWarning: indexing past lexsort depth may impact performance. — it is raised because the multi-index is unsorted.
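
For context, a small sketch of why the index ends up unsorted (illustrative, using the same frames as the test above):

```python
import pandas as pd

df1 = pd.DataFrame({"name": [1]})
df2 = pd.DataFrame({"name": [2]})
df3 = pd.DataFrame({"name": [3]})

# The outer level codes follow the key order x, y, x, so the
# resulting MultiIndex is not lexsorted; indexing it can then
# emit the "indexing past lexsort depth" PerformanceWarning.
df_a = pd.concat([df1, df2, df3], keys=["x", "y", "x"])
print(df_a.index.is_monotonic_increasing)               # False
print(df_a.sort_index().index.is_monotonic_increasing)  # True
```

Calling sort_index() restores lexsort order, which is the usual way to avoid the warning.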

Member

Can you specify match="indexing past lexsort depth"? This not only makes the check more specific; anyone reading the test will also get your answer to @jreback's question.
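
The match keyword checks the emitted warning message against a regex. A minimal illustration of the pattern — note that pandas._testing is an internal module used by the pandas test suite, not public API:

```python
import warnings
import pandas._testing as tm

# assert_produces_warning fails if no warning of the given category
# whose message matches the regex is emitted inside the block.
with tm.assert_produces_warning(UserWarning, match="indexing past lexsort depth"):
    warnings.warn("indexing past lexsort depth may impact performance.", UserWarning)
passed = True
```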

Contributor Author

sure, no problem

Member

@rhshadrach rhshadrach left a comment

Small request in the test, and I agree the path @jreback pointed out should be fixed and tested here too.

@GYHHAHA GYHHAHA requested review from rhshadrach and jreback April 4, 2022 22:06
Contributor

@jreback jreback left a comment

ok i think this is good cc @rhshadrach

Member

@rhshadrach rhshadrach left a comment

lgtm

@rhshadrach rhshadrach merged commit 54f352a into pandas-dev:main Apr 5, 2022
@rhshadrach
Member

Thanks @GYHHAHA

@GYHHAHA GYHHAHA deleted the patch-1 branch April 5, 2022 20:25
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Labels
Bug MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Development

Successfully merging this pull request may close these issues.

BUG: Result of pd.concat([], keys=) with identical key has trouble with MultiIndex .loc[]
4 participants