ENH: Implemented MultiIndex.searchsorted method (GH14833) #61435
base: main
Changes from 11 commits
cffb863
1ba7ff8
ac70f3e
275b0e2
0e0b9b5
4747609
9ac62ab
e2c2c5e
e88da57
94f7c44
1f4a1c9
ffd99d8
5e2caa4
6b0d0ab
@@ -147,14 +147,14 @@ def test_searchsorted(request, index_or_series_obj):
     # See gh-12238
     obj = index_or_series_obj
 
-    if isinstance(obj, pd.MultiIndex):
-        # See gh-14833
-        request.applymarker(
-            pytest.mark.xfail(
-                reason="np.searchsorted doesn't work on pd.MultiIndex: GH 14833"
-            )
-        )
-    elif obj.dtype.kind == "c" and isinstance(obj, Index):
+    # if isinstance(obj, pd.MultiIndex):
+    #     # See gh-14833
+    #     request.applymarker(
+    #         pytest.mark.xfail(
+    #             reason="np.searchsorted doesn't work on pd.MultiIndex: GH 14833"
+    #         )
+    #     )
+    if obj.dtype.kind == "c" and isinstance(obj, Index):
Can you remove this instead of commenting please?
Absolutely, I will do it right away.
         # TODO: Should Series cases also raise? Looks like they use numpy
         # comparison semantics https://github.com/numpy/numpy/issues/15981
         mark = pytest.mark.xfail(reason="complex objects are not comparable")

@@ -1029,3 +1029,28 @@ def test_get_loc_namedtuple_behaves_like_tuple():
     assert idx.get_loc(("i1", "i2")) == 0
     assert idx.get_loc(("i3", "i4")) == 1
     assert idx.get_loc(("i5", "i6")) == 2
+
+
+def test_searchsorted():
I think it would be better to divide this test. Ideally, when a test fails, we want to know what's wrong just by checking the test name, and also we want to still test everything else. With a single test, the first thing that fails will make the whole test fail. You can use pytest fixtures and parametrize to not repeat too much code.
Thank you for the suggestion, it is quite insightful for me. I have created three different functions as follows:
def test_searchsorted_single():...  # for single inputs
def test_searchsorted_lists():...  # list of single inputs
def test_searchsorted_invalid():...  # for invalid inputs
Would this be the correct way to do it?
Yes, that may seem silly, but then when a test fails it's very easy to know what it fails. Ideally an assert per test, and if more than one input is asserted, then pytest.mark.parametrize to reuse a test with multiple inputs.
No no, it is not at all silly. I understand the importance of descriptive test functions. Thank you.
+    # GH14833
+    mi = MultiIndex.from_tuples([("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0)])
+
+    assert np.all(mi.searchsorted(("b", 0)) == 2)
+    assert np.all(mi.searchsorted(("b", 0), side="right") == 3)
+    assert np.all(mi.searchsorted(("a", 0)) == 0)
+    assert np.all(mi.searchsorted(("a", -1)) == 0)
+    assert np.all(mi.searchsorted(("c", 1)) == 5)
+
+    result = mi.searchsorted([("a", 1), ("b", 0), ("c", 0)])
+    expected = np.array([1, 2, 4], dtype=np.intp)
+    tm.assert_numpy_array_equal(result, expected)
+
+    result = mi.searchsorted([("a", 1), ("b", 0), ("c", 0)], side="right")
+    expected = np.array([2, 3, 5], dtype=np.intp)
+    tm.assert_numpy_array_equal(result, expected)
+
+    with pytest.raises(ValueError, match="side must be either 'left' or 'right'"):
+        mi.searchsorted(("a", 1), side="middle")
+
+    with pytest.raises(TypeError, match="value must be a tuple or list"):
+        mi.searchsorted("a")
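
The split discussed in the review thread on this test could look roughly like the sketch below. It is only an illustration, not the code that ended up in the PR: so that it runs without the patched pandas build, it replaces `MultiIndex.searchsorted` with a hypothetical stand-in, `searchsorted`, built on the standard-library `bisect` module over a plain list of tuples (`TUPLES`). The stand-in, its name, and its list-of-tuples handling are assumptions; the inputs, expected positions, and error messages are copied from the test in the diff.

```python
from bisect import bisect_left, bisect_right

import pytest

# Same keys as the MultiIndex built in the diff, kept as a sorted list of tuples.
TUPLES = [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0)]


def searchsorted(tuples, value, side="left"):
    """Hypothetical stand-in for the PR's MultiIndex.searchsorted.

    NOT the PR's implementation: it relies on Python's lexicographic
    tuple comparison and on bisect to find insertion positions.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be either 'left' or 'right'")
    if not isinstance(value, (tuple, list)):
        raise TypeError("value must be a tuple or list")
    find = bisect_left if side == "left" else bisect_right
    if isinstance(value, list):
        # Assumption: a list is a sequence of tuple keys.
        return [find(tuples, v) for v in value]
    return find(tuples, value)


@pytest.mark.parametrize(
    "value, side, expected",
    [
        (("b", 0), "left", 2),
        (("b", 0), "right", 3),
        (("a", 0), "left", 0),
        (("a", -1), "left", 0),
        (("c", 1), "left", 5),
    ],
)
def test_searchsorted_single(value, side, expected):
    # One assert per test; each parameter set is reported separately.
    assert searchsorted(TUPLES, value, side=side) == expected


@pytest.mark.parametrize(
    "side, expected",
    [("left", [1, 2, 4]), ("right", [2, 3, 5])],
)
def test_searchsorted_lists(side, expected):
    values = [("a", 1), ("b", 0), ("c", 0)]
    assert searchsorted(TUPLES, values, side=side) == expected


@pytest.mark.parametrize(
    "value, side, error, match",
    [
        (("a", 1), "middle", ValueError, "side must be either 'left' or 'right'"),
        ("a", "left", TypeError, "value must be a tuple or list"),
    ],
)
def test_searchsorted_invalid(value, side, error, match):
    with pytest.raises(error, match=match):
        searchsorted(TUPLES, value, side=side)
```

With this layout a failure names both the test function and the offending parameter set, and the remaining cases still run. In the real test module the stand-in would presumably be replaced by calls on the `MultiIndex` itself, with `tm.assert_numpy_array_equal` for the array results.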
Better to remove this. We do have some notebooks in this repo iirc.
Yes, I will remove it.