
Commit d45477c

Merge branch 'main' into docs/autoload
2 parents 9a1b2f7 + d862a95 commit d45477c

16 files changed (+1860 / -269 lines)

.ci/docker/requirements.txt

Lines changed: 3 additions & 1 deletion
@@ -68,5 +68,7 @@ iopath
 pygame==2.6.0
 pycocotools
 semilearn==0.3.2
-torchao==0.0.3
+torchao==0.5.0
 segment_anything==1.0
+torchrec==0.8.0
+fbgemm-gpu==0.8.0

.jenkins/metadata.json

Lines changed: 3 additions & 0 deletions
@@ -28,6 +28,9 @@
     "intermediate_source/model_parallel_tutorial.py": {
         "needs": "linux.16xlarge.nvidia.gpu"
     },
+    "intermediate_source/torchrec_intro_tutorial.py": {
+        "needs": "linux.g5.4xlarge.nvidia.gpu"
+    },
     "recipes_source/torch_export_aoti_python.py": {
         "needs": "linux.g5.4xlarge.nvidia.gpu"
     },

beginner_source/dist_overview.rst

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ Sharding primitives

 ``DTensor`` and ``DeviceMesh`` are primitives used to build parallelism in terms of sharded or replicated tensors on N-dimensional process groups.

-- `DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md>`__ represents a tensor that is sharded and/or replicated, and communicates automatically to reshard tensors as needed by operations.
+- `DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/README.md>`__ represents a tensor that is sharded and/or replicated, and communicates automatically to reshard tensors as needed by operations.
 - `DeviceMesh <https://pytorch.org/docs/stable/distributed.html#devicemesh>`__ abstracts the accelerator device communicators into a multi-dimensional array, which manages the underlying ``ProcessGroup`` instances for collective communications in multi-dimensional parallelisms. Try out our `Device Mesh Recipe <https://pytorch.org/tutorials/recipes/distributed_device_mesh.html>`__ to learn more.

 Communications APIs

en-wordlist.txt

Lines changed: 29 additions & 0 deletions
@@ -619,3 +619,32 @@ warmup
 webp
 wsi
 wsis
+Meta's
+RecSys
+TorchRec
+sharding
+TBE
+dtype
+EBC
+sharder
+hyperoptimized
+DMP
+unsharded
+lookups
+KJTs
+amongst
+async
+everytime
+prototyped
+GBs
+HBM
+gloo
+nccl
+Localhost
+gpu
+torchmetrics
+url
+colab
+sharders
+Criteo
+torchrec

index.rst

Lines changed: 10 additions & 2 deletions
@@ -439,6 +439,13 @@ Welcome to PyTorch Tutorials
    :link: advanced/python_custom_ops.html
    :tags: Extending-PyTorch,Frontend-APIs,C++,CUDA

+.. customcarditem::
+   :header: Compiled Autograd: Capturing a larger backward graph for ``torch.compile``
+   :card_description: Learn how to use compiled autograd to capture a larger backward graph.
+   :image: _static/img/thumbnails/cropped/generic-pytorch-logo.png
+   :link: intermediate/compiled_autograd_tutorial
+   :tags: Model-Optimization,CUDA
+
 .. customcarditem::
    :header: Custom C++ and CUDA Operators
    :card_description: How to extend PyTorch with custom C++ and CUDA operators.

@@ -846,7 +853,7 @@ Welcome to PyTorch Tutorials
    :header: Introduction to TorchRec
    :card_description: TorchRec is a PyTorch domain library built to provide common sparsity & parallelism primitives needed for large-scale recommender systems.
    :image: _static/img/thumbnails/torchrec.png
-   :link: intermediate/torchrec_tutorial.html
+   :link: intermediate/torchrec_intro_tutorial.html
    :tags: TorchRec,Recommender

 .. customcarditem::

@@ -1132,6 +1139,7 @@ Additional Resources
    intermediate/nvfuser_intro_tutorial
    intermediate/ax_multiobjective_nas_tutorial
    intermediate/torch_compile_tutorial
+   intermediate/compiled_autograd_tutorial
    intermediate/inductor_debug_cpu
    intermediate/scaled_dot_product_attention_tutorial
    beginner/knowledge_distillation_tutorial

@@ -1180,7 +1188,7 @@ Additional Resources
    :hidden:
    :caption: Recommendation Systems

-   intermediate/torchrec_tutorial
+   intermediate/torchrec_intro_tutorial
    advanced/sharding

 .. toctree::
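The TorchRec card above points at the renamed intro tutorial. As a rough orientation only, a minimal sketch of the kind of API that tutorial introduces (assuming torchrec==0.8.0 as pinned in requirements.txt; the table size, feature name, and batch below are made up for illustration):

import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig, KeyedJaggedTensor

# One embedding table, looked up by the sparse feature "f1".
ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="t1",
            embedding_dim=16,
            num_embeddings=100,
            feature_names=["f1"],
        ),
    ],
    device=torch.device("cpu"),
)

# A KeyedJaggedTensor batch: two examples, each with two IDs for "f1".
kjt = KeyedJaggedTensor(
    keys=["f1"],
    values=torch.tensor([0, 1, 2, 3]),
    lengths=torch.tensor([2, 2]),
)

out = ebc(kjt)          # KeyedTensor of pooled embeddings
print(out["f1"].shape)  # torch.Size([2, 16])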

intermediate_source/compiled_autograd_tutorial.rst

Lines changed: 302 additions & 0 deletions
Large diffs are not rendered by default.
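Since the new tutorial's 302-line diff is not rendered here, a hedged sketch of the workflow it documents (assuming the ``torch._dynamo.config.compiled_autograd`` flag available in recent PyTorch; the model and input are placeholders, not taken from the tutorial file):

import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

model = Model()
x = torch.randn(10)

# Ask dynamo to capture the backward pass as well as the forward.
torch._dynamo.config.compiled_autograd = True

@torch.compile
def train(model, x):
    loss = model(x).sum()
    loss.backward()  # backward graph is captured by compiled autograd

train(model, x)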

intermediate_source/scaled_dot_product_attention_tutorial.py

Lines changed: 5 additions & 5 deletions
@@ -244,7 +244,7 @@ def generate_rand_batch(

 ######################################################################
 # Using SDPA with ``torch.compile``
-# =================================
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
 # With the release of PyTorch 2.0, a new feature called
 # ``torch.compile()`` has been introduced, which can provide

@@ -324,9 +324,9 @@ def generate_rand_batch(
 #

 ######################################################################
-# Using SDPA with attn_bias subclasses`
-# ==========================================
-#
+# Using SDPA with attn_bias subclasses
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 # As of PyTorch 2.3, we have added a new submodule that contains tensor subclasses.
 # Designed to be used with ``torch.nn.functional.scaled_dot_product_attention``.
 # The module is named ``torch.nn.attention.bias`` and contains the following two

@@ -394,7 +394,7 @@ def generate_rand_batch(

 ######################################################################
 # Conclusion
-# ==========
+# ~~~~~~~~~~~
 #
 # In this tutorial, we have demonstrated the basic usage of
 # ``torch.nn.functional.scaled_dot_product_attention``. We have shown how
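For context on the retitled ``attn_bias`` section, a minimal sketch of the ``torch.nn.attention.bias`` subclasses it describes (assuming PyTorch >= 2.3; the shapes are illustrative and not from the tutorial):

import torch
import torch.nn.functional as F
from torch.nn.attention.bias import causal_lower_right

batch, heads, q_len, kv_len, head_dim = 2, 8, 6, 10, 64
query = torch.randn(batch, heads, q_len, head_dim)
key = torch.randn(batch, heads, kv_len, head_dim)
value = torch.randn(batch, heads, kv_len, head_dim)

# CausalBias tensor subclass: the causal mask is aligned to the lower-right
# corner, which matters when q_len != kv_len (e.g. decoding with a KV cache).
attn_bias = causal_lower_right(q_len, kv_len)

out = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_bias)
print(out.shape)  # torch.Size([2, 8, 6, 64])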
