
Commit f3c4ed7

Add FSDP reference (#2738)
The PyTorch Distributed Overview page (https://pytorch.org/tutorials/beginner/dist_overview.html) is widely used to learn the basics of the distributed package and its offerings. It seems this page was created before FSDP support was added to PyTorch. This PR adds the missing FSDP reference.
1 parent c703e69 commit f3c4ed7

2 files changed: +17 −1 lines changed


beginner_source/dist_overview.rst

Lines changed: 16 additions & 1 deletion
@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
    and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+   training on a single machine or across multiple machines when the data and
+   model cannot fit on one GPU.
+6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
    to launch distributed training if errors (e.g., out-of-memory) are expected or if
    resources can join and leave dynamically during training.
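For context on steps 4 and 6 of the trajectory above, here is a minimal sketch of wrapping a model in DistributedDataParallel and launching it with torchrun, the torch.distributed.elastic entry point. The script name, model, and hyperparameters are illustrative assumptions, not part of this commit:

```python
# minimal_ddp.py -- hypothetical script name; a minimal sketch assuming a
# CUDA-capable multi-GPU host. Launch with the elastic entry point, e.g.:
#   torchrun --nproc_per_node=4 minimal_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment,
    # so init_process_group can pick them up via the default env:// method.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and data, purely illustrative.
    model = torch.nn.Linear(10, 10).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(20, 10, device=local_rank)
    labels = torch.randn(20, 10, device=local_rank)

    loss = torch.nn.functional.mse_loss(ddp_model(inputs), labels)
    loss.backward()  # DDP all-reduces gradients across workers here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```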

@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context for distributed training with uneven inputs.
 
+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+(FSDP) is a type of data parallelism paradigm which, instead of maintaining a
+per-GPU copy of a model's parameters, gradients, and optimizer states, shards
+all of these states across data-parallel workers. Support for FSDP was added
+in PyTorch v1.11. The tutorial
+`Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+provides an in-depth explanation and an example of how FSDP works.
+
+
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
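The new section above describes FSDP sharding parameters, gradients, and optimizer states across data-parallel workers. A minimal sketch of that usage, assuming PyTorch >= 1.11, a multi-GPU CUDA host, and a torchrun launch (the script name, model, and optimizer are illustrative assumptions):

```python
# fsdp_sketch.py -- hypothetical script name; launch e.g. with:
#   torchrun --nproc_per_node=4 fsdp_sketch.py
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 10),
    ).to(local_rank)

    # Wrapping with FSDP shards parameters, gradients, and optimizer state
    # across the data-parallel workers instead of replicating them per GPU.
    fsdp_model = FSDP(model)
    # Construct the optimizer after wrapping so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-3)

    inputs = torch.randn(8, 1024, device=local_rank)
    labels = torch.randint(0, 10, (8,), device=local_rank)

    loss = torch.nn.functional.cross_entropy(fsdp_model(inputs), labels)
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Compared with the DDP sketch earlier, the only structural change is the wrapper: each rank keeps only a shard of each state and gathers full parameters on demand during forward and backward, which is what lets models that do not fit on one GPU train across data-parallel workers.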

en-wordlist.txt

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ ExportDB
 FC
 FGSM
 FLAVA
+FSDP
 FX
 FX's
 FloydHub
