
Commit f3c4ed7

Add FSDP reference (#2738)
The PyTorch Distributed Overview page (https://pytorch.org/tutorials/beginner/dist_overview.html) is widely used to learn the basics of the distributed package and its offerings. It seems this page was created before FSDP support was added to PyTorch. This PR adds the missing FSDP reference.
1 parent c703e69 commit f3c4ed7

2 files changed: +17 −1 lines changed


beginner_source/dist_overview.rst

Lines changed: 16 additions & 1 deletion
@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
    and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+   training on a single machine or across multiple machines when the data and
+   model cannot fit on one GPU.
+6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
    to launch distributed training if errors (e.g., out-of-memory) are expected or if
    resources can join and leave dynamically during training.
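For context on steps 4 and 6 of the trajectory above, here is a minimal sketch of wrapping a model in DistributedDataParallel and launching it with torchrun, the torch.distributed.elastic entry point. The script name, model, and hyperparameters are illustrative assumptions, not part of this commit:

```python
# minimal_ddp.py -- hypothetical script name; a minimal sketch assuming a
# CUDA-capable multi-GPU host. Launch with the elastic entry point, e.g.:
#   torchrun --nproc_per_node=4 minimal_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment,
    # so init_process_group can pick them up via the default env:// method.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and data, purely illustrative.
    model = torch.nn.Linear(10, 10).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(20, 10, device=local_rank)
    labels = torch.randn(20, 10, device=local_rank)

    loss = torch.nn.functional.mse_loss(ddp_model(inputs), labels)
    loss.backward()  # DDP all-reduces gradients across workers here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```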

@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context for distributed training with uneven inputs.
 
+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+(FSDP) is a type of data parallelism paradigm which, instead of maintaining a
+per-GPU copy of a model's parameters, gradients, and optimizer states, shards
+all of these states across data-parallel workers. Support for FSDP was added
+in PyTorch v1.11. The tutorial
+`Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+provides an in-depth explanation and an example of how FSDP works.
+
+
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
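The new section above describes FSDP sharding parameters, gradients, and optimizer states across data-parallel workers. A minimal sketch of that usage, assuming PyTorch >= 1.11, a multi-GPU CUDA host, and a torchrun launch (the script name, model, and optimizer are illustrative assumptions):

```python
# fsdp_sketch.py -- hypothetical script name; launch e.g. with:
#   torchrun --nproc_per_node=4 fsdp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 10),
    ).to(local_rank)

    # Wrapping with FSDP shards parameters, gradients, and optimizer state
    # across the data-parallel workers instead of replicating them per GPU.
    fsdp_model = FSDP(model)
    # Construct the optimizer after wrapping so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-3)

    inputs = torch.randn(8, 1024, device=local_rank)
    labels = torch.randint(0, 10, (8,), device=local_rank)

    loss = torch.nn.functional.cross_entropy(fsdp_model(inputs), labels)
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Compared with the DDP sketch earlier, the only structural change is the wrapper: each rank keeps only a shard of each state and gathers full parameters on demand during forward and backward, which is what lets models that do not fit on one GPU train across data-parallel workers.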

en-wordlist.txt

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ ExportDB
 FC
 FGSM
 FLAVA
+FSDP
 FX
 FX's
 FloydHub
