|
36 | 36 | # 4. Use TensorBoard to view results and analyze model performance
|
37 | 37 | # 5. Improve performance with the help of profiler
|
38 | 38 | # 6. Analyze performance with other advanced features
|
| 39 | +# 7. Additional Practices: Profiling PyTorch on AMD GPUs |
39 | 40 | #
|
40 | 41 | # 1. Prepare the data and model
|
41 | 42 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
@@ -392,6 +393,102 @@ def train(data):
|
392 | 393 | #
|
393 | 394 | # The "Communication Operations Stats" summarizes the detailed statistics of all communication ops in each worker.
|
394 | 395 |
|
| 396 | +###################################################################### |
| 397 | +# 7. Additional Practices: Profiling PyTorch on AMD GPUs |
| 398 | +# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 399 | +# |
| 400 | +# |
| 401 | +# The AMD ROCm Platform is an open-source software stack designed for GPU computation, consisting of drivers, development tools, and APIs. |
| 402 | +# We can run the above-mentioned steps on AMD GPUs. In this section, we will use Docker to install the ROCm base development image |
| 403 | +# before installing PyTorch. |
| 404 | + |
| 405 | + |
| 406 | +###################################################################### |
| 407 | +# For this example, let's create a directory called ``profiler_tutorial`` and save the code from **Step 1** as ``test_cifar10.py`` in this directory. |
| 408 | +# |
| 409 | +# .. code-block:: |
| 410 | +# |
| 411 | +# mkdir ~/profiler_tutorial |
| 412 | +# cd ~/profiler_tutorial |
| 413 | +# vi test_cifar10.py |
| 414 | + |
| 415 | + |
| 416 | +###################################################################### |
| 417 | +# At the time of this writing, the stable (``2.1.1``) Linux release of PyTorch for the ROCm platform targets `ROCm 5.6 <https://pytorch.org/get-started/locally/>`_. |
| 418 | +# |
| 419 | +# |
| 420 | +# - Obtain a base Docker image with the correct user-space ROCm version installed from `Docker Hub <https://hub.docker.com/repository/docker/rocm/dev-ubuntu-20.04>`_. |
| 421 | +# |
| 422 | +# It is ``rocm/dev-ubuntu-20.04:5.6``. |
| 423 | +# |
| 424 | +# - Start the ROCm base Docker container: |
| 425 | +# |
| 426 | +# |
| 427 | +# .. code-block:: |
| 428 | +# |
| 429 | +# docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 8G -v ~/profiler_tutorial:/profiler_tutorial rocm/dev-ubuntu-20.04:5.6 |
| 430 | +# |
| 431 | +# |
| 432 | +# - Inside the container, install the dependencies needed to install the wheel packages. |
| 433 | +# |
| 434 | +# .. code-block:: |
| 435 | +# |
| 436 | +# sudo apt update |
| 437 | +# sudo apt install libjpeg-dev python3-dev -y |
| 438 | +# pip3 install wheel setuptools |
| 439 | +# sudo apt install python-is-python3 |
| 440 | +# |
| 441 | +# |
| 442 | +# - Install the wheels: |
| 443 | +# |
| 444 | +# .. code-block:: |
| 445 | +# |
| 446 | +# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6 |
| 447 | +# |
| 448 | +# |
| 449 | +# - Install ``torch_tb_profiler``, and then run the Python file ``test_cifar10.py``: |
| 450 | +# |
| 451 | +# .. code-block:: |
| 452 | +# |
| 453 | +# pip install torch_tb_profiler |
| 454 | +# cd /profiler_tutorial |
| 455 | +# python test_cifar10.py |
| 456 | +# |
| 457 | +# |
| 458 | +# Now, we have all the data needed to view in TensorBoard: |
| 459 | +# |
| 460 | +# .. code-block:: |
| 461 | +# |
| 462 | +# tensorboard --logdir=./log |
| 463 | +# |
| 464 | +# Choose different views as described in **Step 4**. For example, below is the **Operator** View: |
| 465 | +# |
| 466 | +# .. image:: ../../_static/img/profiler_rocm_tensorboard_operartor_view.png |
| 467 | +# :scale: 25 % |
| 468 | + |
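If the run above produces no GPU activity in TensorBoard, it may help to confirm that the installed wheel is actually a ROCm build and that it can see the device. The following is a minimal sketch, not part of the tutorial code: the helper name ``rocm_summary`` is hypothetical, and it assumes that ROCm builds of PyTorch expose the GPU through the usual ``torch.cuda`` API and report a HIP version in ``torch.version.hip``.

```python
import importlib.util


def rocm_summary():
    """Collect basic facts about the installed PyTorch build (hypothetical helper)."""
    # Avoid a hard failure when PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    gpu_available = torch.cuda.is_available()
    summary = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Expected to be a version string on ROCm builds and None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": gpu_available,
    }
    if gpu_available:
        summary["device_name"] = torch.cuda.get_device_name(0)
    return summary


if __name__ == "__main__":
    for key, value in rocm_summary().items():
        print(f"{key}: {value}")
```

Inside the container, a ``hip_version`` of ``None`` or ``gpu_available: False`` would suggest that a non-ROCm wheel was installed or that the ``--device`` flags were not passed to ``docker run``.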
| 469 | + |
| 470 | +###################################################################### |
| 471 | +# At the time of this writing, the **Trace** view does not work and displays nothing. As a workaround, type ``chrome://tracing`` in the address bar of your Chrome browser. |
| 472 | +# |
| 473 | +# |
| 474 | +# - Copy the ``trace.json`` file under the ``~/profiler_tutorial/log/resnet18`` directory to your Windows machine. |
| 475 | +# You may need to use ``scp`` to copy the file if it is located on a remote machine. |
| 476 | +# |
| 477 | +# - On the ``chrome://tracing`` page, click the **Load** button to load the trace JSON file. |
| 478 | +# |
| 479 | +# .. image:: ../../_static/img/profiler_rocm_chrome_trace_view.png |
| 480 | +# :scale: 25 % |
| 481 | + |
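When a browser is not at hand, the same trace file can be inspected from Python. The sketch below is only an illustration and is not part of the tutorial: the function names ``load_trace`` and ``summarize_trace`` are hypothetical, and it assumes the file follows the Chrome trace event format, in which complete events carry ``"ph": "X"`` and a duration in microseconds under ``"dur"``.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Read a Chrome-format trace JSON file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_trace(trace, top_n=5):
    """Return up to top_n (event name, total duration in microseconds) pairs."""
    # A trace is either a bare list of events or an object with "traceEvents".
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        # Only complete events ("X") carry a duration; skip metadata and others.
        if event.get("ph") == "X" and "dur" in event:
            totals[event.get("name", "<unnamed>")] += event["dur"]
    # Sort by accumulated duration, longest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Tiny in-memory trace used in place of a real file for illustration.
    sample = {
        "traceEvents": [
            {"name": "aten::conv2d", "ph": "X", "ts": 0, "dur": 120},
            {"name": "aten::relu", "ph": "X", "ts": 130, "dur": 15},
            {"name": "aten::conv2d", "ph": "X", "ts": 150, "dur": 80},
            {"name": "process_name", "ph": "M"},
        ]
    }
    for name, total in summarize_trace(sample):
        print(f"{name}: {total:.0f} us")
```

To use it on a real run, pass the result of ``load_trace`` on the copied trace file to ``summarize_trace``.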
| 482 | + |
| 483 | +###################################################################### |
| 484 | +# As mentioned previously, you can move the graph and zoom in and out. |
| 485 | +# You can also use the keyboard to zoom and move around inside the timeline. |
| 486 | +# The ``w`` and ``s`` keys zoom in and out, centered on the mouse cursor, |
| 487 | +# and the ``a`` and ``d`` keys move the timeline left and right. |
| 488 | +# You can hit these keys multiple times until you see a readable representation. |
| 489 | + |
| 490 | + |
| 491 | + |
395 | 492 | ######################################################################
|
396 | 493 | # Learn More
|
397 | 494 | # ----------
|
|