Bug: Incorrect IntervalIndex.is_overlapping #49933

Merged
merged 10 commits into from
Dec 5, 2022

Hikipeko
Contributor

@Hikipeko Hikipeko commented Nov 27, 2022

@Hikipeko Hikipeko changed the title Fix interval index Bug: Incorrect IntervalIndex.is_overlapping Nov 27, 2022
@@ -81,7 +81,9 @@ cdef class IntervalTree(IntervalMixin):
        """How to sort the left labels; this is used for binary search
        """
        if self._left_sorter is None:
            self._left_sorter = np.argsort(self.left)
            left_right = np.asarray([(self.left[i], self.right[i]) for i in range(0,
Member

This looks likely to be non-performant. Could we check for uniqueness and only do this in the non-unique case?
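A minimal sketch of the "only pay for it in the non-unique case" idea, written against plain NumPy arrays. The function name `left_sorter` and its arguments are hypothetical and this is not the code that was merged; it only illustrates detecting duplicate left endpoints after a single-key sort and falling back to a two-key sort when they exist.

```python
import numpy as np


def left_sorter(left, right):
    """Hypothetical helper: order intervals by left endpoint.

    Ties on the left endpoint are broken by the right endpoint, but the
    two-key sort only runs when duplicates are actually present.
    """
    order = np.argsort(left, kind="stable")
    sorted_left = left[order]
    # After sorting, equal values sit next to each other, so one linear
    # pass over adjacent pairs is enough to detect duplicates.
    if (sorted_left[1:] == sorted_left[:-1]).any():
        # np.lexsort uses the LAST key as the primary key.
        order = np.lexsort((right, left))
    return order


print(left_sorter(np.array([2, 0, 1]), np.array([3, 1, 2])))  # unique lefts
print(left_sorter(np.array([1, 0, 1]), np.array([3, 5, 2])))  # duplicate lefts
```

Whether the guard pays off in practice depends on how often duplicates occur and on array size, so it would need benchmarking before being preferred over an unconditional two-key sort.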

Contributor Author

That sounds like a better solution. I'm a first-time contributor here, so I'm not sure what to do next. Do I have to close the PR?

Member

You can push a new commit to update your pull request with the changes.

Member

I think elsewhere in the file we use np.lexsort([self.left, self.right]), which should be similar to this.
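For readers unfamiliar with `np.lexsort`, here is a small sketch of how it differs from a single-key `np.argsort` when left endpoints repeat. The endpoint values are made up for illustration. One detail worth keeping in mind: `np.lexsort` treats the last key in the sequence as the primary sort key, so the key order decides whether ties are broken on the right or the left endpoint.

```python
import numpy as np

# Hypothetical endpoints: three intervals share the left endpoint 1.
left = np.array([1, 0, 1, 1])
right = np.array([3, 5, 1, 2])

# Sorting on left alone leaves the order of tied rows up to the sort
# algorithm; kind="stable" keeps them in input order.
by_left = np.argsort(left, kind="stable")

# The last key is the primary key: sort by left, break ties with right.
by_left_then_right = np.lexsort((right, left))

# The same ordering from a Python-level sort on (left, right) pairs,
# which is the slower route for large arrays.
by_tuples = sorted(range(len(left)), key=lambda i: (left[i], right[i]))

print(by_left)             # [1 0 2 3]
print(by_left_then_right)  # [1 2 3 0]
print(by_tuples)           # [1, 2, 3, 0]
```

The vectorized `np.lexsort` call gives the same ordering as sorting pairs in Python, without building an intermediate list of tuples.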

Contributor Author

That sounds like a much better way to sort. I have made this change in my latest commit.

@mroeschke mroeschke added the Interval Interval data type label Nov 28, 2022
@mroeschke
Member

Also, is pandas/_libs/intervaltree.pxi.pdf supposed to be committed?

Member

@mroeschke mroeschke left a comment

Could you remove pandas/_libs/intervaltree.pxi.pdf?

@Hikipeko
Contributor Author

Hikipeko commented Dec 1, 2022

@mroeschke Fixed.

Edit: Strangely, it fails the "Python Dev / actions-311-dev (macOS-latest) (pull_request)" integration test.
Edit: It now passes.

@Hikipeko
Contributor Author

Hikipeko commented Dec 4, 2022

> Could you remove pandas/_libs/intervaltree.pxi.pdf?

I've removed this file. Is there anything else I need to do now?

Member

@mroeschke mroeschke left a comment

LGTM. Merge when ready @jbrockmendel

@mroeschke mroeschke added this to the 2.0 milestone Dec 5, 2022
@jbrockmendel jbrockmendel merged commit dba2080 into pandas-dev:main Dec 5, 2022
@jbrockmendel
Member

Thanks @Hikipeko

Labels
Interval Interval data type
Development

Successfully merging this pull request may close these issues.

BUG: Incorrect IntervalIndex.is_overlapping
3 participants