
Commit 074597d

Add FSDP reference
The PyTorch Distributed Overview page (https://pytorch.org/tutorials/beginner/dist_overview.html) is widely used to learn the basics of the distributed package and its offerings. It appears the page was created before FSDP support was added to PyTorch. This PR adds the missing FSDP reference.
1 parent: c703e69


beginner_source/dist_overview.rst

Lines changed: 16 additions & 1 deletion
@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
    and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+   training on a single machine or across multiple machines when the data and model cannot
+   fit on one GPU.
+6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
    to launch distributed training if errors (e.g., out-of-memory) are expected or if
    resources can join and leave dynamically during training.
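To make steps 5 and 6 above concrete, here is a minimal sketch (not part of this commit; `ToyModel` and the layer sizes are hypothetical) of wrapping a model in FullyShardedDataParallel and launching it with `torchrun`, which is built on `torch.distributed.elastic`:

import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


class ToyModel(nn.Module):
    """Hypothetical model; stands in for anything too large to replicate cheaply."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

    def forward(self, x):
        return self.net(x)


def main():
    # torchrun (torch.distributed.elastic) sets RANK, WORLD_SIZE, and LOCAL_RANK
    # for every worker it spawns, e.g.:
    #   torchrun --nnodes=1 --nproc_per_node=4 fsdp_example.py
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # FSDP shards parameters, gradients, and optimizer states across the ranks.
    model = FSDP(ToyModel().cuda())
    # Build the optimizer after wrapping so it references the sharded parameters.
    optim = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()
    optim.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()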

@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context for distributed training with uneven inputs.
 
+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+`FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__ (FSDP)
+is a data parallelism paradigm that, instead of maintaining a full per-GPU copy
+of a model's parameters, gradients, and optimizer states, shards all of these
+states across the data-parallel workers. FSDP support was added in PyTorch v1.11.
+The `Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+tutorial provides an in-depth explanation and example of how FSDP works.
+
+
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
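As a rough illustration of the sharding described in the new section (again not part of this commit, and assuming PyTorch 1.12+ argument names, multiple GPUs, and a `torchrun` launch), the sketch below applies FSDP's size-based auto-wrap policy, as in the linked Getting Started with FSDP tutorial, and prints how many parameter elements each rank actually stores:

import functools
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# A made-up stack of layers; any sufficiently large model works the same way.
model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(8)]).cuda()
full_count = sum(p.numel() for p in model.parameters())

# Submodules above the size threshold become their own FSDP units, so their
# parameters, gradients, and optimizer states are sharded across the ranks.
model = FSDP(
    model,
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=1_000_000
    ),
)

# Outside of forward/backward each rank keeps only its shard of the flattened
# parameters; the exact count depends on padding and the FSDP implementation.
local_count = sum(p.numel() for p in model.parameters())
print(f"rank {dist.get_rank()}: full model {full_count} params, local shard ~{local_count}")

dist.destroy_process_group()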
