Skip to content

Add tests to cover scalar handling in launch() + Fix fp16 bug #669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 2, 2025

Conversation

leofang
Copy link
Member

@leofang leofang commented Jun 1, 2025

Description

closes #260.

The test coverage revealed a bug in the fp16 scalar handling. The existing handling is obviously wrong and it's me to blame, but I honestly can't recall why I believed the code was correct 😞

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link
Contributor

copy-pr-bot bot commented Jun 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@leofang leofang self-assigned this Jun 1, 2025
@leofang leofang added bug Something isn't working P0 High priority - Must do! test Improvements or additions to tests cuda.core Everything related to the cuda.core module labels Jun 1, 2025
@leofang leofang added this to the cuda.core beta 4 milestone Jun 1, 2025
Comment on lines +116 to 119
elif supported_type is __half_raw:
(<supported_type*>ptr).x = <int16_t>(arg.view(numpy_int16))
else:
(<supported_type*>ptr)[0] = <supported_type>(arg)
Copy link
Member Author

@leofang leofang Jun 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the bug: When arg is a np.float16 scalar, the old code would treat it as int16_t (due to lack of standard C++ identifier for fp16 before C++23), and the scalar would be static_cast to int16_t, which triggered non-trivial conversion operators. The new code ensures that there is no conversion and the bytes are reinterpret_cast'd, and in order to hit this new path we need a unique type identifier, which is __half_raw.

@leofang
Copy link
Member Author

leofang commented Jun 1, 2025

/ok to test f4852bd

@leofang leofang requested a review from rwgk June 1, 2025 04:53

This comment has been minimized.

@leofang
Copy link
Member Author

leofang commented Jun 1, 2025

/ok to test 63a68de

@kkraus14 kkraus14 merged commit 064b9ea into NVIDIA:main Jun 2, 2025
102 of 103 checks passed
Copy link

github-actions bot commented Jun 2, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuda.core Everything related to the cuda.core module P0 High priority - Must do! test Improvements or additions to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tests to cover cuda.core.experimental.launch()
2 participants