add tutorial for PyTorch inference on AWS Graviton CPUs #2719
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2719
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 16d0f8f with merge base bcaa9f6.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thank you so much for this submission! I have a few editorial suggestions - please let me know if you have any questions.
@@ -0,0 +1,343 @@
PyTorch inference performance tuning on AWS Graviton Processors
PyTorch Inference Performance Tuning on AWS Graviton Processors
**Author**: `Sunita Nadampalli <https://github.com/snadampal>`_

`AWS Graviton <https://aws.amazon.com/ec2/graviton/>`_ is a series of ARM-based processors designed by AWS. AWS Graviton3 processors are optimized for ML workloads, including support for bfloat16, SVE and twice the Single Instruction Multiple Data (SIMD) bandwidth compared to Graviton2.
`AWS Graviton <https://aws.amazon.com/ec2/graviton/>`_ is a series of ARM-based processors designed by AWS. AWS Graviton3 processors are optimized for Machine Learning (ML) workloads, including support for ``bfloat16``, Scalable Vector Extension (SVE), and twice the Single Instruction Multiple Data (SIMD) bandwidth compared to Graviton2.
In this tutorial we will cover how to achieve the best inference performance for linear layer neural network with bfloa16 kernels and with the right backend on AWS Graviton3 processors (`AWS c7g instance <https://aws.amazon.com/ec2/instance-types/c7g/>`_).
In this tutorial we will cover how to achieve the best inference performance for linear layer neural network with ``bfloat16`` kernels and with the right backend on AWS Graviton3 processors (`AWS c7g instance <https://aws.amazon.com/ec2/instance-types/c7g/>`_).
4. Optimize memory allocation overhead with Linux Transparent huge pages
5. Conclusion

NOTE: An instance from Graviton3 family (``c7g/r7g/m7g``) is required for this tutorial in order to reproduce the speedup numbers shown below and documented elsewhere. We have used `c7g.xl (4vcpu) instance <https://aws.amazon.com/ec2/instance-types/c7g/>`_ for this tutorial.
.. note::
   To successfully run this tutorial and reproduce the speedup numbers shown below, you need an instance from the Graviton3 family (``c7g/r7g/m7g``) of hardware. For this tutorial, we used the `c7g.xl (4vcpu) instance <https://aws.amazon.com/ec2/instance-types/c7g/>`_.
1. Basic Usage
---------------

PyTorch natively supports AWS Graviton3 optimizations starting PyTorch 2.0 version. Please refer to this `blog <https://pytorch.org/blog/optimized-pytorch-w-graviton/>`_ for more details on the optimizations.
PyTorch natively supports AWS Graviton3 optimizations starting with PyTorch 2.0 version.
Please refer to this `blog <https://pytorch.org/blog/optimized-pytorch-w-graviton/>`_ for more details on the optimizations.
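As a quick, minimal sketch of the basic usage (the layer sizes and tensor shapes here are illustrative assumptions, not the tutorial's exact code), a linear-layer inference run on the Graviton CPU looks like this:

.. code-block:: python

   import torch
   from torch import nn

   print(torch.__version__)  # Graviton optimizations ship with the default wheels from 2.0 onward

   # A small linear layer as a smoke test; sizes are illustrative.
   layer = nn.Linear(64, 32).eval()
   with torch.inference_mode():
       out = layer(torch.randn(8, 64))
   print(out.shape)  # torch.Size([8, 32])

On a Graviton3 instance with PyTorch 2.0 or later, this path should pick up the Graviton-optimized kernels without any code changes.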
4. Optimize memory allocation overhead with Linux Transparent huge pages
-------------------------------------------------------------------------

We also observed that for these larger networks, tensor memory allocations take significant portion of the inference latency. This can be optimized by enabling Linux transparent huge pages allocations from PyTorch C10 memory allocator. Set the following environment variable to enable it.
We also observed that for these larger networks, tensor memory allocations take a significant portion of the inference latency. This can be optimized by enabling Linux transparent huge pages allocations from the PyTorch C10 memory allocator. Set the following environment variable to enable it:
``$ export THP_MEM_ALLOC_ENABLE=1``

For the batch dimension of 256 and with fast math mode
For the batch dimension of 256 and with fast math mode:
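The tutorial's benchmarking loop is not shown in this excerpt; as a minimal sketch (the stand-in model, input shape, and iteration count are assumptions), ``prof`` could be produced with ``torch.profiler`` along these lines:

.. code-block:: python

   import torch
   from torch import nn
   from torch.profiler import ProfilerActivity, profile, record_function

   # Stand-in linear-layer network; the tutorial's actual model differs.
   model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
   x = torch.randn(256, 256)  # batch dimension of 256

   with torch.inference_mode():
       with profile(activities=[ProfilerActivity.CPU]) as prof:
           with record_function("mymodel_inference"):
               for _ in range(100):
                   model(x)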
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

Following is the profiler output with THP memory allocations enabled
The following is the profiler output with THP memory allocations enabled:
aten::relu 0.04% 2.547ms 4.85% 325.115ms 1.626ms 200
====================== ============ ============ ============ ============ ============== ============

``Self CPU time total: 6.697s``
**Self CPU time total:** 6.697s
This is an additional ``1.08x or 8% (6.697s vs 7.262s)`` improvement on top of the already optimized fast math mode measured above.
This is an additional **1.08x or 8% (6.697s vs 7.262s)** improvement on top of the already optimized fast math mode measured above.
Also, can you please add to index.rst
d8c1437 to 363ed4a
Hi @svekars, thanks for the review, I have incorporated all your feedback on the doc.
Looks like I need to create these two, right? Looking into them.
@snadampal I'm thinking this probably fits better under recipes. Also, please fix the spellcheck. Words like OpenBLAS, Graviton, MKLDNN can be added to the en-wordlist.txt. Words like ...
logits = self.linear_relu_stack(x)
return logits

4. Let's create an instance of MyNeuralNetwork, and move it to the device:
4. Let's create an instance of ``MyNeuralNetwork``, and move it to the device:
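For context, that step might look like the following minimal sketch; the class body is an assumption reconstructed from the ``linear_relu_stack`` attribute quoted above, and the layer sizes are illustrative rather than the tutorial's exact values:

.. code-block:: python

   import torch
   from torch import nn

   device = "cpu"  # inference runs on the Graviton CPU

   class MyNeuralNetwork(nn.Module):
       def __init__(self):
           super().__init__()
           self.flatten = nn.Flatten()
           self.linear_relu_stack = nn.Sequential(
               nn.Linear(28 * 28, 512),
               nn.ReLU(),
               nn.Linear(512, 512),
               nn.ReLU(),
               nn.Linear(512, 10),
           )

       def forward(self, x):
           x = self.flatten(x)
           logits = self.linear_relu_stack(x)
           return logits

   model = MyNeuralNetwork().to(device)
   print(model)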
Hi @svekars, sure, I will fix the acronyms part. Can you please elaborate on what needs to be added to recipes_index.rst?
I'm planning to add this customcard item under the performance section in recipes_index.rst, but I'm wondering whether I need to provide the .html file or it gets built in the repo. Please clarify.
e408696 to a4a8a38
Hi @svekars, I have updated the PR for the recipes_index.rst and the en-wordlist update.
Speed up Inference with ``bfloat16`` Fast Math Kernels
----------------------------------------------------------

AWS Graviton3 processors support `bfloat16 MMLA instructions <https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/BFMMLA--BFloat16-floating-point-matrix-multiply-accumulate->`_. Arm Compute Library (`ACL <https://github.com/ARM-software/ComputeLibrary>`_) provides optimized ``bfloat16`` General Matrix Multiplication (GEMM) kernels for AWS Graviton processors, which are integrated into PyTorch via the MKLDNN backend starting with PyTorch 2.0. The inference performance can be optimized with the fast math GEMM kernels. To enable the fast math GEMM kernels, set the following environment variable:
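The excerpt is cut off before the variable itself. On the Graviton-optimized oneDNN/ACL builds this is the oneDNN floating-point math mode; treat the exact name below as an assumption if your build differs:

.. code-block:: bash

   # Enable the bfloat16 fast math GEMM kernels (oneDNN fp-math mode)
   $ export DNNL_DEFAULT_FPMATH_MODE=BF16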
Please add a line here for why this is not enabled by default
.. code-block:: bash

   $ export TORCH_MKLDNN_MATMUL_MIN_DIM=64
Please add a line or two for why this is not enabled by default
.. code-block:: bash

   $ export THP_MEM_ALLOC_ENABLE=1
Please add a line for why this is not enabled by default
This is an additional **1.08x or 8% (6.697s vs 7.262s)** improvement on top of the already optimized MKLDNN fast math mode measured above.

Conclusion
Please mention when each of the 3 optimizations should be used
a4a8a38 to 715cc6e
Addressed the feedback from @agunapal. Please let me know if this looks good now.
715cc6e to 899b9c4
Thanks for addressing the changes. LGTM
====================== ============ =========== ============= =========== ============ ============
Name                   Self CPU %   Self CPU    CPU total %   CPU total   CPU time avg # of Calls
====================== ============ =========== ============= =========== ============ ============
aten::addmm            97.61%       15.813s     98.61%        15.977s     53.255ms     300
aten::clamp_min        1.09%        177.032ms   1.09%         177.032ms   885.160us    200
aten::copy_            1.00%        162.054ms   1.00%         162.054ms   540.180us    300
mymodel_inference      0.22%        35.738ms    100.00%       16.201s     16.201s      1
aten::linear           0.02%        2.955ms     98.66%        15.985s     53.282ms     300
aten::t                0.01%        2.421ms     0.03%         5.043ms     16.810us     300
aten::relu             0.01%        2.356ms     1.11%         179.388ms   896.940us    200
====================== ============ =========== ============= =========== ============ ============
Can you indent lines 117-127 under the table directive for correct recognition? More info here.
Same comment for all table directives in the PR.
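For illustration, an indented table directive would look roughly like this (the caption and the abbreviated columns are only for demonstration):

.. code-block:: rst

   .. table:: output with the fast math mode

      =================  ==========  =========
      Name               Self CPU %  Self CPU
      =================  ==========  =========
      aten::addmm        97.61%      15.813s
      aten::relu         0.01%       2.356ms
      =================  ==========  =========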
The following is the profiler output with THP memory allocations enabled:

.. table:: output with the fast math and thp memory allocations
.. table:: output with the fast math and THP memory allocations
en-wordlist.txt (Outdated)

fastmath
latencies
openBLAS
thp
thp
en-wordlist.txt (Outdated)

addmm
aten
I think these two should get resolved after the indentation is fixed. Also, can you please sort alphabetically and then save? In vim, you should be able to just use ``:sort``.

addmm
aten
6f23e62 to aa79f4b
@svekars, even the indentation didn't fix the ...
@snadampal I think this is now in pretty good shape - can you resolve the merge conflict?
aa79f4b to 54d64b3
@svekars, the PR is rebased, could you please check and merge if it looks good.
@pytorchbot merge
54d64b3 to 7081f4b
I have rebased to the main branch.
Thanks @snadampal - this looks good to me - we will merge a few days before the release.
@snadampal Thanks for contributing the Graviton tutorial. It would be good to mention that it works with PT 2.0 and higher. Have you also run any tests with torch.compile? It would be good to include an update as a follow-up PR showcasing speedups with torch.compile and Graviton.
Hi @chauhang, in the tutorial I mentioned the following. Do you suggest adding PyTorch 2.0+ in the title itself?
Regarding torch.compile(), yes, that's the next thing I'm working on :) Currently the PyTorch changes are merged, but the oneDNN changes are still under review. Once they are merged, I will raise a PR for a torch.compile() tutorial.
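For readers curious about that follow-up, compiling the model is a one-line change; this is only a minimal sketch (``model`` and the input shape are assumptions carried over from the earlier snippets), not a measured Graviton result:

.. code-block:: python

   import torch

   compiled_model = torch.compile(model)  # assumes `model` is the eager-mode network above

   x = torch.randn(32, 28 * 28)
   with torch.inference_mode():
       out = compiled_model(x)  # the first call triggers compilation; later calls reuse it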
* Add recipe for PyTorch inference on AWS Graviton CPUs --------- Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
Fixes #ISSUE_NUMBER
Description
This is a new tutorial for AWS Graviton CPU inference
Checklist