
Commit 897fe27

topper-123 (tp) authored and committed
improve performance of Series.searchsorted
1 parent ffae158 commit 897fe27

3 files changed: +26 -3 lines changed

asv_bench/benchmarks/series_methods.py

Lines changed: 19 additions & 0 deletions
@@ -130,6 +130,25 @@ def time_dropna(self, dtype):
         self.s.dropna()


+class SearchSorted(object):
+
+    goal_time = 0.2
+    params = ['int8', 'int16', 'int32', 'int64',
+              'uint8', 'uint16', 'uint32', 'uint64',
+              'float16', 'float32', 'float64',
+              'str']
+    param_names = ['dtype']
+
+    def setup(self, dtype):
+        N = 10**5
+        data = np.array([1] * N + [2] * N + [3] * N).astype(dtype)
+        self.s = Series(data)
+
+    def time_searchsorted(self, dtype):
+        key = '2' if dtype == 'str' else 2
+        self.s.searchsorted(key)
+
+
 class Map(object):

     goal_time = 0.2
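
For orientation, the lookup that the new `SearchSorted` benchmark times can be reproduced outside asv. The sketch below is an approximation, not the benchmark itself: it calls `searchsorted` on plain NumPy arrays instead of a `Series`, uses a smaller `N` than the benchmark's `10**5`, and covers only a few of the parametrized dtypes. The `build` helper is a hypothetical name introduced for illustration.

```python
import numpy as np

N = 1000  # illustrative; the benchmark in the diff uses 10**5


def build(dtype):
    """Sorted data with three runs of equal values, as in the benchmark's setup()."""
    return np.array([1] * N + [2] * N + [3] * N).astype(dtype)


for dtype in ['int8', 'int64', 'float32', 'str']:
    data = build(dtype)
    # The benchmark searches for '2' on string data and 2 otherwise.
    key = '2' if dtype == 'str' else 2
    # With the default side='left', the result is the index of the first 2.
    print(dtype, data.searchsorted(key))
```

Because the data consists of `N` ones followed by `N` twos, the default left-sided search should land on index `N` for every dtype shown.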

doc/source/whatsnew/v0.24.0.txt

Lines changed: 2 additions & 1 deletion
@@ -500,7 +500,8 @@ Performance Improvements
 - Very large improvement in performance of slicing when the index is a :class:`CategoricalIndex`,
   both when indexing by label (using .loc) and position(.iloc).
   Likewise, slicing a ``CategoricalIndex`` itself (i.e. ``ci[100:200]``) shows similar speed improvements (:issue:`21659`)
-- Improved performance of :func:`Series.describe` in case of numeric dtpyes (:issue:`21274`)
+- Improved performance of :func:`Series.searchsorted` (:issue:`22034`)
+- Improved performance of :func:`Series.describe` in case of numeric dtypes (:issue:`21274`)
 - Improved performance of :func:`pandas.core.groupby.GroupBy.rank` when dealing with tied rankings (:issue:`21237`)
 - Improved performance of :func:`DataFrame.set_index` with columns consisting of :class:`Period` objects (:issue:`21582`,:issue:`21606`)
 - Improved performance of membership checks in :class:`Categorical` and :class:`CategoricalIndex`

pandas/core/series.py

Lines changed: 5 additions & 2 deletions
@@ -2089,8 +2089,11 @@ def __rmatmul__(self, other):
     def searchsorted(self, value, side='left', sorter=None):
         if sorter is not None:
             sorter = ensure_platform_int(sorter)
-        return self._values.searchsorted(Series(value)._values,
-                                         side=side, sorter=sorter)
+        if not is_extension_type(self._values):
+            value = np.asarray(value, dtype=self._values.dtype)
+            value = value[..., np.newaxis] if value.ndim == 0 else value
+
+        return self._values.searchsorted(value, side=side, sorter=sorter)

     # -------------------------------------------------------------------
     # Combination
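
The two added lines replace the temporary `Series(value)` with a direct NumPy conversion. A minimal sketch of that coercion step, using only NumPy, may make the effect easier to see. The helper name `coerce_search_value` is hypothetical and not part of pandas; the extension-type branch from the diff is left out.

```python
import numpy as np


def coerce_search_value(value, dtype):
    """Hypothetical helper mirroring the two added lines in the diff."""
    # Convert the needle(s) to the dtype of the array being searched.
    value = np.asarray(value, dtype=dtype)
    # A scalar becomes a 0-d array; give it one axis so the result is an array.
    return value[..., np.newaxis] if value.ndim == 0 else value


haystack = np.array([1, 2, 2, 3], dtype='int64')

scalar = coerce_search_value(2, haystack.dtype)        # shape (1,)
many = coerce_search_value([0, 2, 4], haystack.dtype)  # shape (3,)

print(haystack.searchsorted(scalar))               # [1]
print(haystack.searchsorted(many, side='right'))   # [0 3 4]
```

Promoting the scalar to a one-element array means the lookup returns an array for scalar input too, which is what the removed `Series(value)._values` call would also have passed to the underlying `searchsorted`.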
