
Add semi-structured sparse tutorial #2574


Merged: 17 commits from jcaip/semi-structured-sparse into main on Oct 3, 2023

Conversation

@jcaip jcaip (Contributor) commented Sep 26, 2023

  • Add new semi-structured sparsity tutorial
  • Update index.rst
  • Add image

@pytorch-bot pytorch-bot bot commented Sep 26, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2574

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jcaip jcaip force-pushed the jcaip/semi-structured-sparse branch from ae6d49e to 4773257 on September 26, 2023 at 11:37
@svekars svekars added the 2.1 label Sep 26, 2023
@jcaip jcaip requested a review from cpuhrsch September 26, 2023 16:05
@cpuhrsch (Contributor)

@svekars svekars (Contributor) left a comment


A few minor editorial nits. Thank you!

jcaip and others added 10 commits September 26, 2023 17:42
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>

@jisaacso jisaacso left a comment


First pass without running tutorial code, just feedback on prose. Second pass later.

It is also known as **fine-grained structured sparsity** or **2:4 structured sparsity**.

Semi-structured sparsity derives its name from its unique sparsity pattern, where n out of every 2n elements are pruned.
In practice, we most often see n = 2, hence 2:4 sparsity. This sparsity pattern is particularly interesting for two reasons:

I would collapse this down to something like

Today, we most often see n = 2, hence 2:4 sparsity. 2:4 semi-structured sparsity was designed to be efficiently accelerated on GPUs. In 2021 (?confirm?), NVIDIA introduced hardware support for semi-structured sparsity in their Ampere architecture, and have also released fast sparse kernels via CUTLASS/cuSPARSELt <https://docs.nvidia.com/cuda/cusparselt/index.html>_.

The second point isn't saying much unless you go into much more detail on alternative sparsity methods, and why they can compress more.

@jcaip jcaip (Contributor, Author) commented Oct 2, 2023


I think it's important to mention that 2:4 sparsity degrades model accuracy less than other sparse methods.

I'm thinking something like:

Today, we most often see n = 2, hence 2:4 sparsity. Semi-structured (2:4) sparsity is particularly interesting because it offers GPU hardware acceleration with minimal loss to model accuracy. In 2020, NVIDIA introduced hardware support for semi-structured sparsity in their Ampere architecture, and have also released fast sparse kernels via `CUTLASS/cuSPARSELt <https://docs.nvidia.com/cuda/cusparselt/index.html>`_.
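
As a concrete illustration of the 2:4 pattern being discussed (a sketch, not the tutorial's code; the shapes are arbitrary): in every group of four consecutive elements along a row, two are zeroed out.

import torch

# Build a 2:4 ("semi-structured") sparse weight: in each group of 4
# consecutive elements, exactly 2 are pruned to zero.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool).tile((128, 32))   # (128, 128) mask
weight = torch.randn(128, 128, dtype=torch.float16) * mask

# Check the pattern: no group of 4 holds more than 2 non-zero values.
groups = weight.reshape(128, -1, 4)
assert (groups != 0).sum(dim=-1).max() <= 2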

assert torch.allclose(sparse_output, dense_output, atol=1e-3)
print(f"Dense: {dense_t:.3f}ms Sparse: {sparse_t:.3f}ms | Speedup: {(dense_t / sparse_t):.3f}x")

On my machine (A100 80GB), I see: `Dense: 0.870ms Sparse: 0.630ms | Speedup: 1.382x`

strike "my". On an A100, 80GB, we can see:
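
For context, a comparison along these lines can be timed with ``torch.utils.benchmark`` (a sketch only, not the tutorial's exact benchmark code; the shapes mirror the snippet below, and an Ampere or newer GPU with fp16 weights is assumed):

import torch
from torch.utils.benchmark import Timer
from torch.sparse import to_sparse_semi_structured

# Assumed setup: fp16 on a CUDA Ampere+ GPU, dimensions divisible by 64.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile((3072, 2560))
W = torch.randn(3072, 10240, dtype=torch.float16, device="cuda") * mask
x = torch.randn(10240, 3072, dtype=torch.float16, device="cuda")
W_sparse = to_sparse_semi_structured(W)

# Time the same matmul with the dense and the compressed sparse weight.
dense_t = Timer("torch.mm(W, x)", globals={"torch": torch, "W": W, "x": x}).blocked_autorange().median * 1e3
sparse_t = Timer("torch.mm(W, x)", globals={"torch": torch, "W": W_sparse, "x": x}).blocked_autorange().median * 1e3
print(f"Dense: {dense_t:.3f}ms Sparse: {sparse_t:.3f}ms | Speedup: {dense_t / sparse_t:.3f}x")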

# mask Linear weight to be 2:4 sparse
mask = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).cuda().bool()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
linear.weight = torch.nn.Parameter(mask * linear.weight)

why do you need to mask out weights on the dense model?

@jcaip (Contributor, Author):

This is so the numerics match down the line when comparing against the sparse output. Also, we expect the input to to_sparse_semi_structured to be 2:4 sparse already, otherwise we error out.
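
Putting the snippets quoted in this thread together, the flow looks roughly like this (a sketch assembled from those snippets; the input shape is an assumption): the dense weight is masked to 2:4 first, then converted, and the two outputs are compared.

import torch
from torch.sparse import to_sparse_semi_structured

# Mask the dense weight to 2:4 first: the conversion expects an already
# 2:4-sparse tensor, and masking keeps the dense/sparse outputs comparable.
mask = torch.Tensor([0, 0, 1, 1]).tile((3072, 2560)).cuda().bool()
linear = torch.nn.Linear(10240, 3072).half().cuda().eval()
linear.weight = torch.nn.Parameter(mask * linear.weight)

x = torch.rand(3072, 10240).half().cuda()   # assumed input shape
with torch.inference_mode():
    dense_output = linear(x)
    # Swap the dense weight for its compressed semi-structured form.
    linear.weight = torch.nn.Parameter(to_sparse_semi_structured(linear.weight))
    sparse_output = linear(x)
    assert torch.allclose(sparse_output, dense_output, atol=1e-3)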

.. note::

This tutorial is designed for beginners to semi-structured sparsity / sparsity in general.
For users with existing 2:4 sparse models, accelerating ``nn.Linear`` layers for inference is as easy as running the following:

I would highlight the API here, "... with to_sparse_semi_structured "
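
A minimal sketch of that recipe (the placeholder model and masking step below are assumptions added so the snippet runs standalone; in practice the weights would already be 2:4 sparse, and an Ampere or newer GPU is required):

import torch
from torch.sparse import to_sparse_semi_structured

# Placeholder 2:4-sparse fp16 model for the sketch (a real use case would load one).
model = torch.nn.Sequential(torch.nn.Linear(10240, 3072)).half().cuda().eval()
mask = torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile((3072, 2560))
model[0].weight = torch.nn.Parameter(model[0].weight * mask)

# Swap every 2:4-sparse Linear weight for its compressed form.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        module.weight = torch.nn.Parameter(to_sparse_semi_structured(module.weight))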

The general motivation behind sparsity is simple: if there are zeros in your network, you can avoid storing / doing compute with those parameters.
However, the specifics of sparsity are tricky. Zeroing out parameters doesn't affect the latency / memory overhead of our model out of the box.

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.

introduce the word kernel here.

"This is because the dense tensor itself still contains the pruned elements, and the respective matrix multiplication kernel will still operate on those elements."

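The quoted point can be demonstrated directly (a sketch; the shape and the use of torch.utils.benchmark are choices made here, not taken from the tutorial): zeroing half of a dense weight leaves the dense matmul time essentially unchanged, because the dense kernel still reads and multiplies every stored element.

import torch
from torch.utils.benchmark import Timer

# A dense kernel does not get faster just because half the entries are zero:
# the zeros are still stored and still multiplied. A CUDA device is assumed.
W = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
W_pruned = W * torch.tensor([0, 0, 1, 1], dtype=torch.bool, device="cuda").tile((4096, 1024))
x = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")

t_dense = Timer("torch.mm(W, x)", globals={"torch": torch, "W": W, "x": x}).blocked_autorange()
t_pruned = Timer("torch.mm(W, x)", globals={"torch": torch, "W": W_pruned, "x": x}).blocked_autorange()
print(t_dense.median, t_pruned.median)   # roughly identical latencies
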
However, the specifics of sparsity are tricky. Zeroing out parameters doesn't affect the latency / memory overhead of our model out of the box.

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
In order to realize performance gains, we need to swap out our dense kernels for sparse kernels.

"our" -> "a"

This is because the dense tensor itself still contains the pruned elements and will still operate on those elements during matrix multiplication.
In order to realize performance gains, we need to swap out our dense kernels for sparse kernels.

These sparse kerenels work by allowing us to skip calculations involving pruned elements. To do this, these kerenls work on sparse matrices, which are do not store the pruned elements and store the specified elements in a compressed format.

strike "by allowing us" ... "work by skipping calculations...". Also kernels is spelled wrong.

These sparse kerenels work by allowing us to skip calculations involving pruned elements. To do this, these kerenls work on sparse matrices, which are do not store the pruned elements and store the specified elements in a compressed format.
For semi-structured sparsity, we store exactly half of the original parameters along with some compressed metadata about how the elements were arranged.

.. image:: https://developer-blogs.nvidia.com/wp-content/uploads/2023/06/2-4-structured-sparsity-pattern.png

if you're going to embed this image, give credit in a caption from where it was taken.
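
For the fp16 case, the storage claim in the quoted tutorial text ("exactly half of the original parameters along with some compressed metadata") works out roughly as follows (back-of-the-envelope arithmetic; the 2 bits of index metadata per kept element is the figure commonly cited for NVIDIA's 2:4 format and is an assumption here, not something stated in this PR):

# Rough storage arithmetic for a 2:4-sparse fp16 weight (illustrative only).
m, k = 3072, 10240
dense_bits  = m * k * 16            # every element stored as fp16
kept        = m * k // 2            # 2 of every 4 elements survive pruning
sparse_bits = kept * 16 + kept * 2  # kept fp16 values + ~2 bits of metadata each
print(sparse_bits / dense_bits)     # ~0.5625, i.e. a bit more than half the dense size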

:width: 80%


However, this is not the only way to store sparse tensors. There are other formats like `COO <https://pytorch.org/docs/2.1/sparse.html#sparse-coo-tensors>`_ representation, which are used with **unstructured sparsity**.

If you're going to include this blurb on COO, I'd move it up top into the intro where you talk about alternative sparsity methods. I don't think it makes sense here. You're just getting into the flow of showcasing the API

@jcaip (Contributor, Author):

I want to frame semi-structured sparsity as a kind of "goldilocks" between unstructured and structured sparsity, but I agree that this is kind of abrupt.
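
For comparison with the quoted COO blurb, a tiny unstructured example (independent of the tutorial code): non-zero values can sit anywhere, so both their coordinates and their values must be stored.

import torch

# Unstructured sparsity in COO format: only the coordinates and values of the
# non-zero elements are kept, with no constraint on where they fall.
dense = torch.tensor([[0.0, 1.5, 0.0],
                      [0.0, 0.0, 0.0],
                      [2.0, 0.0, 3.0]])
coo = dense.to_sparse().coalesce()
print(coo.indices())   # 2 x nnz matrix of coordinates
print(coo.values())    # the nnz stored values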

index.rst Outdated
@@ -544,6 +544,13 @@ What's new in PyTorch tutorials?
:link: advanced/static_quantization_tutorial.html
:tags: Quantization

.. customcarditem::

jcaip and others added 5 commits October 2, 2023 14:44
Co-authored-by: Svetlana Karslioglu <svekars@fb.com>
@svekars svekars (Contributor) left a comment


LGTM! Happy to merge after approval from the tech reviewers.


@jisaacso jisaacso left a comment


Thanks for addressing comments, looks great

@svekars svekars merged commit f381abf into main Oct 3, 2023
@svekars svekars deleted the jcaip/semi-structured-sparse branch October 3, 2023 16:03
5 participants