enable AMD GPU #406

vickytsang · 2023-03-13T18:52:54Z

No description provided.

Signed-off-by: Vicky Tsang <vtsang@amd.com>

monai/deploy/packager/util.py

monai/deploy/runner/runner.py

MMelQin · 2023-03-17T02:40:09Z

Thank you @vickytsang for the pull request.

Do all AMD GPU device names contain the word "AMD"? I have no access to an AMD GPU, so it would be great if you could provide some reference. For the same reason, I cannot test a built package targeting an AMD GPU.

I have also left comments in the code, mostly on error handling.

Signed-off-by: Vicky Tsang <vtsang@amd.com>

vickytsang · 2023-03-24T19:48:55Z

> Thank you @vickytsang for the pull request.
>
> Do all AMD GPU device names contain the word "AMD"? I have no access to an AMD GPU, so it would be great if you could provide some reference. For the same reason, I cannot test a built package targeting an AMD GPU.
>
> I have also left comments in the code, mostly on error handling.

I've modified the implementation to use the rocminfo tool to identify the AMD target device. Below is an example of the output on an AMD GPU/ROCm-enabled system. The rocminfo tool returns an error if the AMD GPU device driver is not loaded; if the tool is missing, the shell reports "command not found".
Please give feedback on the preferred way to modify the associated unit tests.
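The rocminfo-based check described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code merged in monai/deploy/utils/deviceutil.py: the helper names `run_rocminfo` and `has_amd_gpu` are hypothetical. It assumes that a missing tool surfaces as `FileNotFoundError` from `subprocess.run` (no shell is involved, so there is no "command not found" message), that a driver problem shows up as a non-zero exit code, and that any agent reporting `Device Type: GPU` counts as a usable AMD GPU.

```python
import subprocess
from typing import Optional
from unittest import mock  # used only by the demo at the bottom


# Hypothetical helper for illustration; not the code merged in this PR.
def run_rocminfo(executable: str = "rocminfo", timeout: float = 30.0) -> Optional[str]:
    """Return rocminfo's stdout, or None if the tool is missing or reports an error."""
    try:
        result = subprocess.run(
            [executable],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (FileNotFoundError, PermissionError, subprocess.TimeoutExpired):
        # FileNotFoundError: the executable is not on PATH (what a shell
        # would report as "command not found").
        return None
    if result.returncode != 0:
        # Assumption: a missing or unloaded AMD GPU driver yields a non-zero exit code.
        return None
    return result.stdout


def has_amd_gpu(executable: str = "rocminfo") -> bool:
    """Hypothetical check: True if rocminfo runs and lists at least one GPU agent."""
    output = run_rocminfo(executable)
    if output is None:
        return False
    # Assumption: every GPU agent has a "Device Type: GPU" line, as in the sample output.
    return any(
        line.split(":", 1)[1].strip() == "GPU"
        for line in output.splitlines()
        if line.strip().startswith("Device Type:")
    )


if __name__ == "__main__":
    # Simulate a machine without ROCm installed, as a unit test might.
    with mock.patch("subprocess.run", side_effect=FileNotFoundError):
        print(has_amd_gpu())  # False
```

Because the check goes through `subprocess.run`, unit tests can patch it with `unittest.mock` and feed canned rocminfo output or simulated failures, so they run on machines without ROCm or an AMD GPU. That is one possible approach to the unit-test question, not the one adopted in the PR.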

=====================
/opt/rocm/bin/rocminfo
ROCk module is loaded
Able to open /dev/kfd read-write

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents

Agent 1

Name: AMD Ryzen 9 5950X 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 5950X 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3400
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 131896996(0x7dc96a4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 131896996(0x7dc96a4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A

Agent 2

Name: gfx1030
Uuid: GPU-XX
Marketing Name: Device 73bf
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 29631(0x73bf)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2660
BDFID: 12544
Internal Node ID: 1
Compute Unit: 80
SIMDs per CU: 4
Shader Engines: 8
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 64(0x40)
Max Work-item Per CU: 2048(0x800)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
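To turn output like the sample above into a device list, one option is to split it into per-agent blocks and filter on the `Device Type` and `Vendor Name` fields. In this sample the GPU's `Marketing Name` is `Device 73bf`, so a check for the string "AMD" in the marketing name would miss it, while `Vendor Name: AMD` is present. The sketch below is hypothetical (`parse_agents` and `amd_gpu_targets` are illustrative names). It assumes the `Key: value` line layout shown above and that each agent starts with an `Agent N` line, and it keeps only the first value of a repeated key so the agent's `Name` wins over the later ISA `Name`.

```python
import re
from typing import Dict, List, Optional

# Assumption: each agent section begins with a line like "Agent 2".
AGENT_HEADER = re.compile(r"^\s*Agent\s+\d+\s*$")


def parse_agents(text: str) -> List[Dict[str, str]]:
    """Split rocminfo-style text into one dict of "Key: value" fields per agent.

    Only the first occurrence of a key inside an agent is kept, so the agent's
    own "Name" (e.g. gfx1030) wins over the later ISA "Name" line.
    """
    agents: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        if AGENT_HEADER.match(raw):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in raw:
            # Skip the system preamble and lines without a key, e.g. "x 1024(0x400)".
            continue
        key, value = raw.split(":", 1)
        current.setdefault(key.strip(), value.strip())
    return agents


def amd_gpu_targets(text: str) -> List[str]:
    """Return the Name field (gfx target) of every agent that is an AMD GPU."""
    return [
        agent.get("Name", "")
        for agent in parse_agents(text)
        if agent.get("Device Type") == "GPU" and agent.get("Vendor Name") == "AMD"
    ]


# Excerpt of the sample output above.
SAMPLE = """\
Agent 1
Name: AMD Ryzen 9 5950X 16-Core Processor
Vendor Name: CPU
Device Type: CPU
Agent 2
Name: gfx1030
Marketing Name: Device 73bf
Vendor Name: AMD
Device Type: GPU
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
"""

print(amd_gpu_targets(SAMPLE))  # ['gfx1030']
```

Matching on `Vendor Name` rather than on the device or marketing name sidesteps the question of whether every AMD GPU name contains "AMD", at least for output shaped like this sample.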

monai/deploy/utils/deviceutil.py

Signed-off-by: Vicky Tsang <vtsang@amd.com>

sonarqubecloud · 2023-03-29T02:23:14Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
No Duplication information

MMelQin

Comments have been addressed.

MMelQin · 2023-04-06T17:57:36Z

We'll merge this PR as experimental support because:

- this only affects which additional base images the packager can support;
- the App SDK itself has no explicit dependency on CUDA or GPU devices; rather, it is the (custom) operators and applications that may, and the user must ensure the app works with ROCm and AMD GPUs for packaging;
- separately, the App SDK Packager has been going through major changes and will be provided by an underlying dependency package in the next release of the App SDK.

[WIP] enable AMD GPU

f8abbf7

Signed-off-by: Vicky Tsang <vtsang@amd.com>

vickytsang changed the title from "[WIP] enable AMD GPU" to "enable AMD GPU" Mar 15, 2023

MMelQin self-requested a review March 17, 2023 01:07

MMelQin reviewed Mar 17, 2023


monai/deploy/packager/util.py (outdated, resolved)

MMelQin reviewed Mar 17, 2023


monai/deploy/packager/util.py (outdated, resolved)

MMelQin reviewed Mar 17, 2023


monai/deploy/runner/runner.py (outdated, resolved)

MMelQin reviewed Mar 17, 2023


monai/deploy/runner/runner.py (outdated, resolved)

vickytsang added 3 commits March 24, 2023 12:25

Merge branch 'Project-MONAI:main' into main

07013c0

check for AMD GPU device and rocm installation with rocminfo

aaf14cc

Signed-off-by: Vicky Tsang <vtsang@amd.com>

update docs/tutorials with AMD GPU/rocm references

0e475f4

Signed-off-by: Vicky Tsang <vtsang@amd.com>

MMelQin reviewed Mar 24, 2023


monai/deploy/utils/deviceutil.py (resolved)

remove rocm dependency in packager

4989e6e

Signed-off-by: Vicky Tsang <vtsang@amd.com>

vickytsang force-pushed the main branch from d97af8d to 4989e6e March 29, 2023 02:22

MMelQin reviewed Apr 6, 2023


MMelQin merged commit f62199b into Project-MONAI:main Apr 6, 2023
