Skip to content

Enable building to target AMD GPUs #1731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 24, 2024
Merged

Conversation

ndgrigorian
Copy link
Collaborator

@ndgrigorian ndgrigorian commented Jul 13, 2024

This PR modifies cmake scripts throughout dpctl to enable building for AMD. This is done by either setting the DPCTL_TARGET_HIP environment variable to the intended build architecture, or using -DDPCTL_TARGET_HIP.

_dpctl_sycl_target_compile_options and _dpctl_sycl_target_link_options cmake lists are used to prevent branching logic in later scripts.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from 70b1e1c to 45f3f28 Compare July 13, 2024 22:59
Copy link

github-actions bot commented Jul 13, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@coveralls
Copy link
Collaborator

coveralls commented Jul 13, 2024

Coverage Status

coverage: 87.668% (-0.06%) from 87.725%
when pulling db487e8 on feature/enable-amd-builds
into 691c225 on master.

Copy link

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_104 ran successfully.
Passed: 894
Failed: 15
Skipped: 105

Copy link

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_180 ran successfully.
Passed: 893
Failed: 2
Skipped: 119

@oleksandr-pavlyk oleksandr-pavlyk changed the title Enable building for AMD Enable building to target AMD GPUs Aug 7, 2024
@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from 7b04098 to c53fb26 Compare August 15, 2024 18:16
Copy link

Array API standard conformance tests for dpctl=0.18.0dev0=py310ha798474_311 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@oleksandr-pavlyk
Copy link
Contributor

@ndgrigorian Regrettably use of sycl::log1p change necessary to enable compiling for AMD breaks compiling for CUDA.

Perhaps a preprocessor variable can be used to enable building for SPV/NVPTX or SPV/AMDGCN targets, but not for all three except after the bug gets fixed. It may be possible to write implementation of log1p to enable building for all three too.

@ndgrigorian
Copy link
Collaborator Author

@ndgrigorian Regrettably use of sycl::log1p change necessary to enable compiling for AMD breaks compiling for CUDA.

Perhaps a preprocessor variable can be used to enable building for SPV/NVPTX or SPV/AMDGCN targets, but not for all three except after the bug gets fixed. It may be possible to write implementation of log1p to enable building for all three too.

Yes, I only added the commit to make it convenient for the build failure to be reproduced.

Writing our own implementation is possible, too. I think that would be preferable, but on the other hand, it's a corner case to build for both CUDA and AMD.

@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from c53fb26 to 8449aa8 Compare October 24, 2024 20:05
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_164 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from 8449aa8 to 729f0cf Compare November 11, 2024 21:40
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_202 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from 729f0cf to 2e66a2c Compare November 20, 2024 22:29
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_254 ran successfully.
Passed: 895
Failed: 0
Skipped: 119

@ndgrigorian ndgrigorian marked this pull request as ready for review November 21, 2024 01:23
@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from 2e66a2c to 449670e Compare November 21, 2024 02:51
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_261 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@ndgrigorian
Copy link
Collaborator Author

ndgrigorian commented Nov 21, 2024

@oleksandr-pavlyk @antonwolfy

I have successfully built dpctl for both CUDA and HIP simultaneously using this branch on a machine with CUDA and ROCm installed (no AMD devices, however—only Intel and NVidia).

Errors came up while building, but I believe that these were caused by going OOM—errors showed up twice in prod.cpp, once in sum.cpp, and once in copy_and_cast_usm_to_usm.cpp. prod, sum, etc. seemed to work on both level-zero and CUDA without a problem, and tests passed. Even in verbose mode, the message wasn't especially helpful.

I have marked this PR as ready for review, the CUDA segfault is resolved.

@ndgrigorian
Copy link
Collaborator Author

@oleksandr-pavlyk @antonwolfy

I have successfully built dpctl for both CUDA and HIP simultaneously using this branch on a machine with CUDA and ROCm installed (no AMD devices, however—only Intel and NVidia).

Errors came up while building, but I believe that these were caused by going OOM—errors showed up twice in prod.cpp, once in sum.cpp, and once in copy_and_cast_usm_to_usm.cpp. prod, sum, etc. seemed to work on both level-zero and CUDA without a problem, and tests passed. Even in verbose mode, the message wasn't especially helpful.

I have marked this PR as ready for review, the CUDA segfault is resolved.

Worth noting that I tried building with both DPCTL_TARGET_HIP=gfx1100 and DPCTL_TARGET_HIP=gfx1030.

@oleksandr-pavlyk
Copy link
Contributor

Is this the command to use ?

python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_HIP=gfx1030"

@ndgrigorian
Copy link
Collaborator Author

Is this the command to use ?

python scripts/build_locally.py --verbose --cmake-opts="-DDPCTL_TARGET_HIP=gfx1030"

Yes, and DPCTL_TARGET_HIP=gfx1030 python scripts/build_locally.py --verbose should work too

Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_264 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from 99769d2 to db487e8 Compare November 22, 2024 00:04
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_264 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

Add DPCTL_TARGET_AMD cmake variable which is a string referring to the architecture to build for
@ndgrigorian ndgrigorian force-pushed the feature/enable-amd-builds branch from db487e8 to c316380 Compare November 22, 2024 19:12
Copy link

Array API standard conformance tests for dpctl=0.19.0dev0=py310hdf72452_278 ran successfully.
Passed: 894
Failed: 1
Skipped: 119

Copy link
Contributor

@oleksandr-pavlyk oleksandr-pavlyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you @ndgrigorian .

It makes sense to similarly allow DPCTL_TARGET_CUDA to take non-boolean values to specify architecture in a future.

@ndgrigorian ndgrigorian merged commit eefc82b into master Nov 24, 2024
52 of 54 checks passed
@ndgrigorian ndgrigorian deleted the feature/enable-amd-builds branch November 24, 2024 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants