Add semi-structured sparse tutorial #2574
Conversation
jcaip commented on Sep 26, 2023
- Add a new semi-structured sparsity tutorial
- Update index.rst
- Add image
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2574
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from ae6d49e to 4773257
https://docs-preview.pytorch.org/pytorch/tutorials/2574/prototype/semi_structured_sparse.html?highlight=accelelerating%20bert%20semi%20structured is the link to the rendered tutorial.
A few minor editorial nits. Thank you!
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
First pass without running tutorial code, just feedback on prose. Second pass later.
It is also known as **fine-grained structured sparsity** or **2:4 structured sparsity**.

Semi-structured sparsity derives its name from its unique sparsity pattern, where n out of every 2n elements are pruned.
In practice, we most often see n = 2, hence 2:4 sparsity. This sparsity pattern is particularly interesting for two reasons:
I would collapse this down to something like:
Today, we most often see n = 2, hence 2:4 sparsity. 2:4 semi-structured sparsity was designed to be efficiently accelerated on GPUs. In 2021 (?confirm?), NVIDIA introduced hardware support for semi-structured sparsity in their Ampere architecture, and have also released fast sparse kernels via `CUTLASS/cuSPARSELt <https://docs.nvidia.com/cuda/cusparselt/index.html>`_.
The second point isn't saying much unless you go into much more detail on alternative sparsity methods, and why they can compress more.
I think it's important to mention that 2:4 sparsity degrades model accuracy less than other sparse methods.
I'm thinking something like:
Today, we most often see n = 2, hence 2:4 sparsity. Semi-structured (2:4) sparsity is particularly interesting because it offers GPU hardware acceleration with minimal loss to model accuracy. In 2020, NVIDIA introduced hardware support for semi-structured sparsity in their Ampere architecture, and have also released fast sparse kernels via `CUTLASS/cuSPARSELt <https://docs.nvidia.com/cuda/cusparselt/index.html>`_.
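As a quick illustration of the 2:4 pattern under discussion, a minimal sketch (arbitrary shapes, not code from the tutorial):

```python
import torch

# A 2:4 pattern: in every contiguous group of 4 elements along a row,
# 2 are kept and 2 are pruned (here we keep the last 2 of each group).
weight = torch.randn(64, 64)
mask = torch.tensor([0, 0, 1, 1]).tile(64, 16).bool()
pruned = weight * mask

# Sanity check: each group of 4 contains at most 2 nonzero elements.
per_group_nnz = (pruned != 0).reshape(64, -1, 4).sum(dim=-1)
assert (per_group_nnz <= 2).all()
```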
assert torch.allclose(sparse_output, dense_output, atol=1e-3)
print(f"Dense: {dense_t:.3f}ms Sparse: {sparse_t:.3f}ms | Speedup: {(dense_t / sparse_t):.3f}x")

On my machine (A100 80GB), I see: `Dense: 0.870ms Sparse: 0.630ms | Speedup: 1.382x`
strike "my". On an A100, 80GB, we can see:
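For context, a sketch of how timings like `dense_t` and `sparse_t` could be collected with `torch.utils.benchmark`; the helper name and shapes here are assumptions, and absolute speedups will vary with GPU and problem size:

```python
import torch
from torch.utils.benchmark import Timer

def bench_ms(fn, x):
    # Median wall-clock time in milliseconds; Timer synchronizes CUDA for us.
    return Timer(stmt="fn(x)", globals={"fn": fn, "x": x}).blocked_autorange().median * 1e3

# Example: time a dense fp16 Linear layer on the GPU.
x = torch.rand(3072, 10240).half().cuda()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
dense_t = bench_ms(linear, x)
print(f"Dense: {dense_t:.3f}ms")
```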
# mask Linear weight to be 2:4 sparse
mask = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).cuda().bool()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
linear.weight = torch.nn.Parameter(mask * linear.weight)
why do you need to mask out weights on the dense model?
This is so the numerics match down the line when comparing against the sparse output. Also, we expect the input to ``to_sparse_semi_structured`` to be 2:4 sparse already; otherwise we error out.
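Roughly, the comparison in question looks like the following sketch, assuming the prototype `to_sparse_semi_structured` API and an Ampere-or-newer GPU (not verbatim tutorial code):

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Mask the dense weight into a 2:4 pattern first, so the dense and sparse
# paths compute with identical weights and the converter accepts the input.
mask = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).cuda().bool()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
linear.weight = torch.nn.Parameter(mask * linear.weight)

x = torch.rand(3072, 10240).half().cuda()
dense_output = linear(x)

# Swap in the compressed 2:4 representation and rerun the same input.
linear.weight = torch.nn.Parameter(to_sparse_semi_structured(linear.weight))
sparse_output = linear(x)

assert torch.allclose(sparse_output, dense_output, atol=1e-3)
```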
.. note::

This tutorial is designed for beginners to semi-structured sparsity / sparsity in general.
For users with existing 2:4 sparse models, accelerating ``nn.Linear`` layers for inference is as easy as running the following:
I would highlight the API here: "... with ``to_sparse_semi_structured``".
The general motivation behind sparsity is simple: if there are zeros in your network, you can avoid storing / doing compute with those parameters.
However, the specifics of sparsity are tricky. Zeroing out parameters doesn't affect the latency / memory overhead of our model out of the box.

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
introduce the word kernel here.
"This is because the dense tensor itself still contains the pruned elements, and the respective matrix multiplication kernel will still operate on those elements."
However, the specifics of sparsity are tricky. Zeroing out parameters doesn't affect the latency / memory overhead of our model out of the box.

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
In order to realize performance gains, we need to swap out our dense kernels for sparse kernels.
"our" -> "a"
This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
In order to realize performance gains, we need to swap out our dense kernels for sparse kernels.

These sparse kerenels work by allowing us to skip calculations involving pruned elements. To do this, these kerenls work on sparse matrices, which are do not store the pruned elements and store the specified elements in a compressed format.
strike "by allowing us" ... "work by skipping calculations...". Also kernels is spelled wrong.
These sparse kerenels work by allowing us to skip calculations involving pruned elements. To do this, these kerenls work on sparse matrices, which are do not store the pruned elements and store the specified elements in a compressed format.
For semi-structured sparsity, we store exactly half of the original parameters along with some compressed metadata about how the elements were arranged.

.. image:: https://developer-blogs.nvidia.com/wp-content/uploads/2023/06/2-4-structured-sparsity-pattern.png
If you're going to embed this image, give credit in a caption noting where it was taken from.
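On the storage claim in this hunk, a back-of-the-envelope sketch; the 2-bits-per-kept-element metadata figure is an assumption based on NVIDIA's published scheme, and the exact layout is kernel-specific:

```python
# Rough storage for a (3072, 10240) fp16 weight under 2:4 compression.
rows, cols = 3072, 10240
dense_bytes = rows * cols * 2                  # fp16: 2 bytes per element
kept_value_bytes = rows * (cols // 2) * 2      # exactly half the values survive
metadata_bytes = rows * (cols // 2) * 2 // 8   # assume ~2 bits of position info per kept value

compressed_bytes = kept_value_bytes + metadata_bytes
print(compressed_bytes / dense_bytes)          # ~0.5625, i.e. roughly 44% smaller
```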
:width: 80%

However, this is not the only way to store sparse tensors. There are other formats like `COO <https://pytorch.org/docs/2.1/sparse.html#sparse-coo-tensors>`_ representation, which are used with **unstructured sparsity**.
If you're going to include this blurb on COO, I'd move it up top into the intro where you talk about alternative sparsity methods. I don't think it makes sense here, since you're just getting into the flow of showcasing the API.
I want to frame semi-structured sparsity as a kind of "goldilocks" between unstructured and structured sparsity, but I agree that this is kind of abrupt.
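For reference on the COO format mentioned above, a minimal sketch of what it stores, independent of the semi-structured layout:

```python
import torch

dense = torch.tensor([[0.0, 2.0, 0.0],
                      [3.0, 0.0, 0.0]])
coo = dense.to_sparse()  # COO layout by default

# COO keeps only the nonzero values plus their (row, col) coordinates,
# so it can represent arbitrary (unstructured) sparsity patterns.
print(coo.indices())  # tensor([[0, 1], [1, 0]])
print(coo.values())   # tensor([2., 3.])
```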
index.rst (Outdated)
@@ -544,6 +544,13 @@ What's new in PyTorch tutorials?
   :link: advanced/static_quantization_tutorial.html
   :tags: Quantization

.. customcarditem::
Please add this to https://github.com/pytorch/tutorials/blob/main/prototype_source/prototype_index.rst rather than to index.rst.
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
LGTM! Happy to merge after approval from the tech reviewers.
Thanks for addressing comments, looks great