Skip to content

Change descending from template parameter to an argument #1883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

oleksandr-pavlyk
Copy link
Contributor

This PR removes template parameter is_ascending from radix sort functions and introduces it as function call argument instead.

This halves the binary size due to reduction in device code portion. Furthermore, radix sorting functions are singled out into a dedicated Python module _tensor_sorting_radix_impl further reducing the build time and memory footprint and provide increased parallelization opportunities to the linker.

The overall build time drops from 45 minutes to 33 minutes on my Core i7 laptop.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

…eter

The intent is to reduce the build time, build memory footprint, and
binary size of the sorting_impl module.

With this change it stands at 46MB, before it was 72MB.
…x_impl

With this change, _tensor_sorting_impl goes back to 17MB, and
_tensor_sorting_radix_impl is 30MB. The memory footprint of linking
should be greatly reduced, speeding up the building process,
reducing the required memory footprint, and providing better
parallelisation opportunities for the build job.

The build time on my Core i7 reduced from 45 minutes to 33 minutes.
Copy link

github-actions bot commented Oct 29, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@oleksandr-pavlyk
Copy link
Contributor Author

On GPU Max 1100, sorting an array of int16 data-type of size 10241024 elements evaluates at about the same 2.10 ms with this change as it does before it.

Copy link
Collaborator

@ndgrigorian ndgrigorian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Windows build failure is unrelated and examples are failing in other PRs.

@oleksandr-pavlyk oleksandr-pavlyk merged commit d63dd70 into feature/radix-sort Oct 29, 2024
42 of 50 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the change-descending-from-template-parameter-to-an-argument branch October 29, 2024 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants