Add FSDP reference in PyTorch Distributed doc #2738

Merged 1 commit on Jan 22, 2024

17 changes: 16 additions & 1 deletion beginner_source/dist_overview.rst
@@ -74,7 +74,10 @@ common development trajectory would be:
4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
and the `launching script <https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md>`__,
if the application needs to scale across machine boundaries.
5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
   training on a single machine or across multiple machines when the data and model cannot
   fit on one GPU.
6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
to launch distributed training if errors (e.g., out-of-memory) are expected or if
resources can join and leave dynamically during training.
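The following is a minimal sketch of a ``DistributedDataParallel`` training script for the
multi-GPU steps above; the file name ``train.py``, the layer sizes, and the launch arguments
are assumptions for illustration. On a single node it could be launched with the elastic
launcher as ``torchrun --nproc_per_node=4 train.py``; a multi-node run would additionally
pass ``--nnodes`` and a rendezvous endpoint.

.. code-block:: python

   # train.py -- hypothetical example script
   import os

   import torch
   import torch.distributed as dist
   import torch.nn as nn
   from torch.nn.parallel import DistributedDataParallel as DDP

   # torchrun populates RANK, WORLD_SIZE, and LOCAL_RANK for every worker.
   dist.init_process_group("nccl")
   local_rank = int(os.environ["LOCAL_RANK"])
   torch.cuda.set_device(local_rank)

   model = nn.Linear(10, 10).cuda()
   ddp_model = DDP(model, device_ids=[local_rank])
   optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

   # Gradients are averaged across all ranks during backward().
   out = ddp_model(torch.randn(32, 10, device="cuda"))
   out.sum().backward()
   optimizer.step()

   dist.destroy_process_group()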

@@ -134,6 +137,18 @@ DDP materials are listed below:
5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
tutorial walks through using the generic join context for distributed training with uneven inputs.


``torch.distributed.FullyShardedDataParallel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
(FSDP) is a data parallelism paradigm that, instead of maintaining a per-GPU copy of a model's
parameters, gradients, and optimizer states, shards all of these states across
data-parallel workers. Support for FSDP was added in PyTorch v1.11. The tutorial
`Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
provides an in-depth explanation and example of how FSDP works.
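
The snippet below is a minimal sketch of wrapping a model with FSDP; it assumes the script
is launched with ``torchrun`` (so ``LOCAL_RANK`` and the other rendezvous environment
variables are already set) and that each rank has a CUDA device available. The layer sizes
and optimizer settings are arbitrary placeholders.

.. code-block:: python

   import os

   import torch
   import torch.distributed as dist
   import torch.nn as nn
   from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

   # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for each worker.
   dist.init_process_group("nccl")
   local_rank = int(os.environ["LOCAL_RANK"])
   torch.cuda.set_device(local_rank)

   # Wrapping the module shards its parameters, gradients, and optimizer
   # states across the data-parallel workers.
   model = nn.Linear(1024, 1024).cuda()
   fsdp_model = FSDP(model)
   optimizer = torch.optim.Adam(fsdp_model.parameters(), lr=1e-3)

   # A training step looks the same as in the non-distributed case.
   loss = fsdp_model(torch.randn(8, 1024, device="cuda")).sum()
   loss.backward()
   optimizer.step()

   dist.destroy_process_group()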


torch.distributed.elastic
~~~~~~~~~~~~~~~~~~~~~~~~~

1 change: 1 addition & 0 deletions en-wordlist.txt
@@ -95,6 +95,7 @@ ExportDB
FC
FGSM
FLAVA
FSDP
FX
FX's
FloydHub