BUG: Fix duplicates in intersection of multiindexes #36927

Merged
merged 27 commits into from
Nov 29, 2020
Changes from 2 commits
Commits
cdefaae
Fix duplicates in intersectin of multiindexes
phofl Oct 6, 2020
fbd63f2
Fix duplicates in index intersection
phofl Oct 6, 2020
53a37d1
Modify test and avoid None issues
phofl Oct 6, 2020
5675a4e
Fix failing test
phofl Oct 6, 2020
134936c
Merge branch 'master' of https://github.com/pandas-dev/pandas into 36915
phofl Oct 9, 2020
582c0b9
Change comment
phofl Oct 9, 2020
7805de5
Add unique after intersection
phofl Oct 11, 2020
67691df
Merge branch 'master' of https://github.com/pandas-dev/pandas into 36915
phofl Oct 11, 2020
8fb0055
Merge branch '31326' into 36915
phofl Oct 11, 2020
66b519f
Fix merge bug
phofl Oct 11, 2020
cb1477b
Add tests and whatsnew
phofl Oct 11, 2020
0fb2561
Add rename
phofl Oct 26, 2020
3c19d57
Merge branch 'master' of https://github.com/pandas-dev/pandas into 36915
phofl Oct 26, 2020
10524fd
Fix check in merge operation
phofl Oct 26, 2020
3dde0ee
Exit set ops when nonunique
phofl Nov 12, 2020
a0a1a33
Merge branch 'master' of https://github.com/pandas-dev/pandas into 36915
phofl Nov 12, 2020
45dfb84
Merge branch 'master' of https://github.com/pandas-dev/pandas into 36915
phofl Nov 22, 2020
d71a499
Roll back to initial version
phofl Nov 22, 2020
d873d5a
Change whatsnew
phofl Nov 22, 2020
e90239a
Move whatsnew
phofl Nov 27, 2020
c2b448a
Merge branch 'master' of https://github.com/pandas-dev/pandas into 36915
phofl Nov 27, 2020
742716e
Change gh reference
phofl Nov 27, 2020
321797a
Remove pd
phofl Nov 27, 2020
a980ec0
Remove whatsnew from 1.2
phofl Nov 28, 2020
972fd48
Fix test
phofl Nov 28, 2020
fe1ded4
Make condition more clear and add assert
phofl Nov 28, 2020
8e4d47b
Use shape for equality check
phofl Nov 28, 2020
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v1.1.4.rst
@@ -14,7 +14,8 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
--
+
+- Regression in :meth:`MultiIndex.intersection` returned duplicates when at least one of the indexes had duplicates (:issue:`36915`)

 .. ---------------------------------------------------------------------------

6 changes: 5 additions & 1 deletion pandas/core/indexes/multi.py
@@ -3468,6 +3468,8 @@ def intersection(self, other, sort=False):
         other, result_names = self._convert_can_do_setop(other)

         if self.equals(other):
+            if self.has_duplicates:
+                return self.unique()
             return self

         if not is_object_dtype(other.dtype):
@@ -3486,10 +3488,12 @@ def intersection(self, other, sort=False):
         uniq_tuples = None  # flag whether _inner_indexer was successful
         if self.is_monotonic and other.is_monotonic:
             try:
-                uniq_tuples = self._inner_indexer(lvals, rvals)[0]
+                inner_tuples = self._inner_indexer(lvals, rvals)[0]
                 sort = False  # uniq_tuples is already sorted
             except TypeError:
                 pass
+            else:
+                uniq_tuples = algos.unique(inner_tuples)

         if uniq_tuples is None:
             other_uniq = set(rvals)
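
The monotonic branch above now keeps the raw join result in inner_tuples and only deduplicates it once the join has succeeded. The sketch below illustrates that idea in plain Python. It is a simplified model, not pandas code: inner_join_sorted, unique_preserving_order, and intersection_sorted are hypothetical helpers, and the join is only a stand-in that does not reproduce the exact output of _inner_indexer.

```python
def inner_join_sorted(left, right):
    """Hypothetical stand-in for a sorted inner join (not pandas' _inner_indexer).

    Both inputs must be sorted. Every left element that also occurs in
    right is emitted, so repeated left values show up repeatedly.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            out.append(left[i])
            i += 1  # advance only the left side so a repeated value matches again
    return out


def unique_preserving_order(values):
    """Drop repeats while keeping first-seen order (dicts keep insertion order)."""
    return list(dict.fromkeys(values))


def intersection_sorted(left, right):
    """Join first, then deduplicate, mirroring the order of steps in the diff."""
    inner_tuples = inner_join_sorted(left, right)
    return unique_preserving_order(inner_tuples)


left = [("val1", "test1"), ("val1", "test1")]
right = [("val1", "test1"), ("val1", "test1"), ("val2", "test2")]

print(inner_join_sorted(left, right))    # raw join result, duplicates included
print(intersection_sorted(left, right))  # duplicates removed
```

In this model the raw join returns the shared tuple twice, which is the kind of result the issue describes. The deduplication step is what makes the final output behave like a set operation.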
23 changes: 23 additions & 0 deletions pandas/tests/indexes/multi/test_setops.py
@@ -376,3 +376,26 @@ def test_setops_disallow_true(method):

     with pytest.raises(ValueError, match="The 'sort' keyword only takes"):
         getattr(idx1, method)(idx2, sort=True)
+
+
+@pytest.mark.parametrize(
+    ("tuples", "exp_tuples"),
+    [
+        ([("val1", "test1")], [("val1", "test1")]),
+        ([("val1", "test1"), ("val1", "test1")], [("val1", "test1")]),
+        (
+            [("val2", "test2"), ("val1", "test1")],
+            [("val2", "test2"), ("val1", "test1")],
+        ),
+    ],
+)
+def test_intersect_with_duplicates(tuples, exp_tuples):
+    # GH: 36915
+    left = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
+    right = pd.MultiIndex.from_tuples(
+        [("val1", "test1"), ("val1", "test1"), ("val2", "test2")],
+        names=["first", "second"],
+    )
+    result = left.intersection(right)
+    expected = pd.MultiIndex.from_tuples(exp_tuples, names=["first", "second"])
+    tm.assert_index_equal(result, expected)
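
For trying the behaviour outside the test suite, the second parametrized case can be written as a standalone script. This is a sketch that assumes a pandas version containing this fix. It uses only the public MultiIndex.from_tuples, has_duplicates, and intersection APIs, and the variable names are illustrative.

```python
import pandas as pd

names = ["first", "second"]
left = pd.MultiIndex.from_tuples(
    [("val1", "test1"), ("val1", "test1")], names=names
)
right = pd.MultiIndex.from_tuples(
    [("val1", "test1"), ("val1", "test1"), ("val2", "test2")], names=names
)

# Both inputs contain a repeated tuple.
print(left.has_duplicates, right.has_duplicates)

# With the fix, the intersection should hold each common tuple only once.
result = left.intersection(right)
print(list(result))

# The equal-index path: intersecting a duplicated index with itself
# goes through the self.equals(other) branch changed in the diff.
same = left.intersection(left)
print(list(same))
```

The last call exercises the early-return branch, where the diff replaces a plain return self with self.unique() when the index has duplicates.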