
Commit 08605c4

Merge pull request #1866 from IntelPython/tune-merge-sort
Lower the threshold to use sequential sort
2 parents 2f327af + a9edf94 commit 08605c4


2 files changed: 4 additions, 1 deletion


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Improved performance of copy-and-cast operations from `numpy.ndarray` to `tensor.usm_ndarray` for contiguous inputs [gh-1829](https://github.com/IntelPython/dpctl/pull/1829)
 * Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices [gh-1850](https://github.com/IntelPython/dpctl/pull/1850)
 * Improved performance of `tensor.argsort` function for all types [gh-1859](https://github.com/IntelPython/dpctl/pull/1859)
+* Improved performance of `tensor.sort` and `tensor.argsort` for short arrays in the range [16, 64] elements [gh-1866](https://github.com/IntelPython/dpctl/pull/1866)
 
 ### Fixed
 
```
dpctl/tensor/libtensor/include/kernels/sorting/sort.hpp

Lines changed: 3 additions & 1 deletion
```diff
@@ -734,7 +734,9 @@ sycl::event stable_sort_axis1_contig_impl(
 
     auto comp = Comp{};
 
-    constexpr size_t sequential_sorting_threshold = 64;
+    // constant chosen experimentally to ensure monotonicity of
+    // sorting performance, as measured on GPU Max, and Iris Xe
+    constexpr size_t sequential_sorting_threshold = 16;
 
     if (sort_nelems < sequential_sorting_threshold) {
         // equal work-item sorts entire row
```
