Add semi-structured sparse tutorial #2574
Conversation
jcaip commented on Sep 26, 2023
- Add a new semi-structured sparsity tutorial
- Update index.rst
- Add image
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2574
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from ae6d49e to 4773257
https://docs-preview.pytorch.org/pytorch/tutorials/2574/prototype/semi_structured_sparse.html?highlight=accelelerating%20bert%20semi%20structured is the link to the rendered tutorial.
A few minor editorial nits. Thank you!
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
First pass without running tutorial code, just feedback on prose. Second pass later.
It is also known as **fine-grained structured sparsity** or **2:4 structured sparsity**.

Semi-structured sparsity derives its name from its unique sparsity pattern, where n out of every 2n elements are pruned.
In practice, we most often see n = 2, hence 2:4 sparsity. This sparsity pattern is particularly interesting for two reasons:
I would collapse this down to something like:
Today, we most often see n = 2, hence 2:4 sparsity. 2:4 semi-structured sparsity was designed to be efficiently accelerated on GPUs. In 2021 (?confirm?), NVIDIA introduced hardware support for semi-structured sparsity in their Ampere architecture, and have also released fast sparse kernels via `CUTLASS/cuSPARSELt <https://docs.nvidia.com/cuda/cusparselt/index.html>`_.
The second point isn't saying much unless you go into much more detail on alternative sparsity methods, and why they can compress more.
I think it's important to mention that 2:4 sparsity degrades model accuracy less than other sparse methods.
I'm thinking something like:
Today, we most often see n = 2, hence 2:4 sparsity. Semi-structured (2:4) sparsity is particularly interesting because it offers GPU hardware acceleration with minimal loss to model accuracy. In 2020, NVIDIA introduced hardware support for semi-structured sparsity in their Ampere architecture, and have also released fast sparse kernels via `CUTLASS/cuSPARSELt <https://docs.nvidia.com/cuda/cusparselt/index.html>`_.
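As a quick illustration of the 2:4 pattern under discussion, a minimal sketch (arbitrary shapes, not code from the tutorial):

```python
import torch

# A 2:4 pattern: in every contiguous group of 4 elements along a row,
# 2 are kept and 2 are pruned (here we keep the last 2 of each group).
weight = torch.randn(64, 64)
mask = torch.tensor([0, 0, 1, 1]).tile(64, 16).bool()
pruned = weight * mask

# Sanity check: each group of 4 contains at most 2 nonzero elements.
per_group_nnz = (pruned != 0).reshape(64, -1, 4).sum(dim=-1)
assert (per_group_nnz <= 2).all()
```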
assert torch.allclose(sparse_output, dense_output, atol=1e-3)
print(f"Dense: {dense_t:.3f}ms Sparse: {sparse_t:.3f}ms | Speedup: {(dense_t / sparse_t):.3f}x")

On my machine (A100 80GB), I see: `Dense: 0.870ms Sparse: 0.630ms | Speedup: 1.382x`
strike "my". On an A100, 80GB, we can see:
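For context, a sketch of how timings like `dense_t` and `sparse_t` could be collected with `torch.utils.benchmark`; the helper name and shapes here are assumptions, and absolute speedups will vary with GPU and problem size:

```python
import torch
from torch.utils.benchmark import Timer

def bench_ms(fn, x):
    # Median wall-clock time in milliseconds; Timer synchronizes CUDA for us.
    return Timer(stmt="fn(x)", globals={"fn": fn, "x": x}).blocked_autorange().median * 1e3

# Example: time a dense fp16 Linear layer on the GPU.
x = torch.rand(3072, 10240).half().cuda()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
dense_t = bench_ms(linear, x)
print(f"Dense: {dense_t:.3f}ms")
```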
# mask Linear weight to be 2:4 sparse
mask = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).cuda().bool()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
linear.weight = torch.nn.Parameter(mask * linear.weight)
why do you need to mask out weights on the dense model?
This is so the numerics match down the line when comparing against the sparse output. Also, we expect the input to ``to_sparse_semi_structured`` to be 2:4 sparse already; otherwise we error out.
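Roughly, the comparison in question looks like the following sketch, assuming the prototype `to_sparse_semi_structured` API and an Ampere-or-newer GPU (not verbatim tutorial code):

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Mask the dense weight into a 2:4 pattern first, so the dense and sparse
# paths compute with identical weights and the converter accepts the input.
mask = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).cuda().bool()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
linear.weight = torch.nn.Parameter(mask * linear.weight)

x = torch.rand(3072, 10240).half().cuda()
dense_output = linear(x)

# Swap in the compressed 2:4 representation and rerun the same input.
linear.weight = torch.nn.Parameter(to_sparse_semi_structured(linear.weight))
sparse_output = linear(x)

assert torch.allclose(sparse_output, dense_output, atol=1e-3)
```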
.. note::

This tutorial is designed for beginners to semi-structured sparsity / sparsity in general.
For users with existing 2:4 sparse models, accelerating ``nn.Linear`` layers for inference is as easy as running the following:
I would highlight the API here: "... with ``to_sparse_semi_structured``".
The general motivation behind sparsity is simple: if there are zeros in your network, you can avoid storing / doing compute with those parameters.
However, the specifics of sparsity are tricky. Zeroing out parameters doesn't affect the latency / memory overhead of our model out of the box.

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
introduce the word kernel here.
"This is because the dense tensor itself still contains the pruned elements, and the respective matrix multiplication kernel will still operate on those elements."
However, the specifics of sparsity are tricky. Zeroing out parameters doesn't affect the latency / memory overhead of our model out of the box.

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
In order to realize performance gains, we need to swap out our dense kernels for sparse kernels.
"our" -> "a"
This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
In order to realize performance gains, we need to swap out our dense kernels for sparse kernels.

These sparse kerenels work by allowing us to skip calculations involving pruned elements. To do this, these kerenls work on sparse matrices, which are do not store the pruned elements and store the specified elements in a compressed format.
strike "by allowing us" ... "work by skipping calculations...". Also kernels is spelled wrong.
These sparse kerenels work by allowing us to skip calculations involving pruned elements. To do this, these kerenls work on sparse matrices, which are do not store the pruned elements and store the specified elements in a compressed format.
For semi-structured sparsity, we store exactly half of the original parameters along with some compressed metadata about how the elements were arranged.

.. image:: https://developer-blogs.nvidia.com/wp-content/uploads/2023/06/2-4-structured-sparsity-pattern.png
If you're going to embed this image, give credit in a caption noting where it was taken from.
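On the storage claim in this hunk, a back-of-the-envelope sketch; the 2-bits-per-kept-element metadata figure is an assumption based on NVIDIA's published scheme, and the exact layout is kernel-specific:

```python
# Rough storage for a (3072, 10240) fp16 weight under 2:4 compression.
rows, cols = 3072, 10240
dense_bytes = rows * cols * 2                  # fp16: 2 bytes per element
kept_value_bytes = rows * (cols // 2) * 2      # exactly half the values survive
metadata_bytes = rows * (cols // 2) * 2 // 8   # assume ~2 bits of position info per kept value

compressed_bytes = kept_value_bytes + metadata_bytes
print(compressed_bytes / dense_bytes)          # ~0.5625, i.e. roughly 44% smaller
```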
:width: 80%

However, this is not the only way to store sparse tensors. There are other formats like `COO <https://pytorch.org/docs/2.1/sparse.html#sparse-coo-tensors>`_ representation, which are used with **unstructured sparsity**.
If you're going to include this blurb on COO, I'd move it up top into the intro where you talk about alternative sparsity methods. I don't think it makes sense here, since you're just getting into the flow of showcasing the API.
I want to frame semi-structured sparsity as a kind of "goldilocks" between unstructured and structured sparsity, but I agree that this is kind of abrupt.
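For reference on the COO format mentioned above, a minimal sketch of what it stores, independent of the semi-structured layout:

```python
import torch

dense = torch.tensor([[0.0, 2.0, 0.0],
                      [3.0, 0.0, 0.0]])
coo = dense.to_sparse()  # COO layout by default

# COO keeps only the nonzero values plus their (row, col) coordinates,
# so it can represent arbitrary (unstructured) sparsity patterns.
print(coo.indices())  # tensor([[0, 1], [1, 0]])
print(coo.values())   # tensor([2., 3.])
```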
index.rst (Outdated)
@@ -544,6 +544,13 @@ What's new in PyTorch tutorials?
   :link: advanced/static_quantization_tutorial.html
   :tags: Quantization

.. customcarditem::
Please add this to https://github.com/pytorch/tutorials/blob/main/prototype_source/prototype_index.rst rather than to index.rst.
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
LGTM! Happy to merge after approval from the tech reviewers.
Thanks for addressing comments, looks great