Commit 0e5367b

tptopper-123 authored and committed
improve performance of Series.searchsorted
1 parent 0370740 commit 0e5367b

3 files changed: +26 -3 lines changed

asv_bench/benchmarks/series_methods.py

Lines changed: 19 additions & 0 deletions
@@ -130,6 +130,25 @@ def time_dropna(self, dtype):
         self.s.dropna()
 
 
+class SearchSorted(object):
+
+    goal_time = 0.2
+    params = ['int8', 'int16', 'int32', 'int64',
+              'uint8', 'uint16', 'uint32', 'uint64',
+              'float16', 'float32', 'float64',
+              'str']
+    param_names = ['dtype']
+
+    def setup(self, dtype):
+        N = 10**5
+        data = np.array([1] * N + [2] * N + [3] * N).astype(dtype)
+        self.s = Series(data)
+
+    def time_searchsorted(self, dtype):
+        key = '2' if dtype == 'str' else 2
+        self.s.searchsorted(key)
+
+
 class Map(object):
 
     goal_time = 0.2
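
For readers without an asv setup, the measurement in the new SearchSorted benchmark can be roughly approximated with the standard-library timeit module. This is a minimal sketch, not the asv harness: the smaller N, the single int64 dtype, the repeat count, and the variable names are all assumptions chosen to keep it quick.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: N is reduced from the benchmark's 10**5 so the sketch runs fast.
N = 10**4
data = np.array([1] * N + [2] * N + [3] * N).astype("int64")
s = pd.Series(data)

# Leftmost insertion point for 2 is the index of the first 2, i.e. N.
# Depending on the pandas version this comes back as a scalar or a
# length-1 array, so it is not printed in a fixed form here.
position = s.searchsorted(2)

# Time repeated scalar lookups; the repeat count is an arbitrary choice.
elapsed = timeit.timeit(lambda: s.searchsorted(2), number=100)
print("100 scalar lookups took %.6f s" % elapsed)
```

Absolute timings from a loop like this vary by machine and pandas version, so they are only meaningful when compared across two builds on the same hardware.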

doc/source/whatsnew/v0.24.0.txt

Lines changed: 2 additions & 1 deletion
@@ -499,7 +499,8 @@ Performance Improvements
 - Very large improvement in performance of slicing when the index is a :class:`CategoricalIndex`,
   both when indexing by label (using .loc) and position(.iloc).
   Likewise, slicing a ``CategoricalIndex`` itself (i.e. ``ci[100:200]``) shows similar speed improvements (:issue:`21659`)
-- Improved performance of :func:`Series.describe` in case of numeric dtpyes (:issue:`21274`)
+- Improved performance of :func:`Series.searchsorted` (:issue:`22034`)
+- Improved performance of :func:`Series.describe` in case of numeric dtypes (:issue:`21274`)
 - Improved performance of :func:`pandas.core.groupby.GroupBy.rank` when dealing with tied rankings (:issue:`21237`)
 - Improved performance of :func:`DataFrame.set_index` with columns consisting of :class:`Period` objects (:issue:`21582`,:issue:`21606`)
 - Improved performance of membership checks in :class:`Categorical` and :class:`CategoricalIndex`

pandas/core/series.py

Lines changed: 5 additions & 2 deletions
@@ -2082,8 +2082,11 @@ def __rmatmul__(self, other):
     def searchsorted(self, value, side='left', sorter=None):
         if sorter is not None:
             sorter = ensure_platform_int(sorter)
-        return self._values.searchsorted(Series(value)._values,
-                                         side=side, sorter=sorter)
+        if not is_extension_type(self._values):
+            value = np.asarray(value, dtype=self._values.dtype)
+            value = value[..., np.newaxis] if value.ndim == 0 else value
+
+        return self._values.searchsorted(value, side=side, sorter=sorter)
 
     # -------------------------------------------------------------------
     # Combination
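
The two added lines coerce the search value to the dtype of the underlying array and give a scalar a trailing axis, so the array's own searchsorted always receives an array. The sketch below reproduces that step on a plain NumPy array. The helper name coerce_search_value and the sample data are hypothetical and not part of the commit.

```python
import numpy as np


def coerce_search_value(value, dtype):
    """Hypothetical helper mirroring the two lines added in the diff."""
    value = np.asarray(value, dtype=dtype)
    # A scalar becomes a 0-d array; add a trailing axis so it has shape (1,).
    return value[..., np.newaxis] if value.ndim == 0 else value


haystack = np.array([1, 2, 2, 3], dtype="int64")

scalar_key = coerce_search_value(2, haystack.dtype)
list_key = coerce_search_value([0, 2, 4], haystack.dtype)

# A scalar input now yields a length-1 result array.
scalar_result = haystack.searchsorted(scalar_key, side="left")
# List input passes through unchanged apart from the dtype cast.
list_result = haystack.searchsorted(list_key, side="right")

print(scalar_key.shape, scalar_result, list_result)
```

Compared with the removed Series(value)._values call, this path skips building an intermediate Series for each lookup, which is presumably where the speedup recorded in the whatsnew entry comes from.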

0 commit comments
