From 9cc665b7d99bbd6613e93c1c323d2c883927ceb5 Mon Sep 17 00:00:00 2001
From: Sahdev Zala
Date: Mon, 22 Jan 2024 12:53:56 -0500
Subject: [PATCH] Add FSDP reference

The PyTorch Distributed Overview page
(https://pytorch.org/tutorials/beginner/dist_overview.html) is widely used to
learn the basics of the distributed package and its offerings. It seems this
page was created before FSDP support was added to PyTorch. This PR adds the
missing FSDP reference.
---
 beginner_source/dist_overview.rst | 17 ++++++++++++++++-
 en-wordlist.txt                   |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index 7768dc4876c..7309693e0e1 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel `__
    and the `launching script `__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic `__
+5. Use multi-GPU `FullyShardedDataParallel `__
+   training on a single machine or across multiple machines when the model
+   cannot fit on one GPU.
+6. Use `torch.distributed.elastic `__
    to launch distributed training if errors (e.g., out-of-memory) are
    expected or if resources can join and leave dynamically during training.
 
@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context for distributed training with uneven inputs.
 
+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+`FullyShardedDataParallel `__ (FSDP) is a type of data parallelism
+paradigm. Unlike DistributedDataParallel (DDP), which maintains a per-GPU copy of a
+model's parameters, gradients, and optimizer states, FSDP shards all of these states
+across data-parallel workers. Support for FSDP was added in PyTorch v1.11. The tutorial
+`Getting Started with FSDP `__
+provides an in-depth explanation and example of how FSDP works.
+
+
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/en-wordlist.txt b/en-wordlist.txt
index 1ec9abb68de..da13503a56c 100644
--- a/en-wordlist.txt
+++ b/en-wordlist.txt
@@ -95,6 +95,7 @@ ExportDB
 FC
 FGSM
 FLAVA
+FSDP
 FX
 FX's
 FloydHub
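As a point of reference for reviewers, the FSDP usage described in the new section amounts to something like the following minimal sketch. It is illustrative only and not part of the patch; the toy model, the dimensions, and the ``torchrun`` launch are assumptions::

    # Minimal FSDP sketch (assumes two or more CUDA GPUs and a launch such as
    # `torchrun --nproc_per_node=2 fsdp_sketch.py`, which sets the rank and
    # world-size environment variables that init_process_group() reads).
    import os

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


    def main():
        dist.init_process_group("nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = nn.Sequential(
            nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)
        ).cuda()
        # Wrapping with FSDP shards parameters, gradients, and optimizer state
        # across the data-parallel workers instead of replicating them per GPU.
        model = FSDP(model)

        # Build the optimizer after wrapping so it references the sharded parameters.
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

        inputs = torch.randn(8, 1024, device="cuda")
        loss = model(inputs).sum()
        loss.backward()
        optimizer.step()

        dist.destroy_process_group()


    if __name__ == "__main__":
        main()

For real models, the Getting Started with FSDP tutorial referenced in the patch covers the additional pieces this sketch omits, such as auto-wrapping policies and mixed precision.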