diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index 7768dc4876c..7309693e0e1 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -74,7 +74,10 @@ common development trajectory would be:
 4. Use multi-machine `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
    and the `launching script <https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py>`__,
    if the application needs to scale across machine boundaries.
-5. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
+5. Use multi-GPU `FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__
+   training on a single machine or across multiple machines when the data and
+   model cannot fit on one GPU.
+6. Use `torch.distributed.elastic <https://pytorch.org/docs/stable/distributed.elastic.html>`__
    to launch distributed training if errors (e.g., out-of-memory) are expected or if
    resources can join and leave dynamically during training.
 
@@ -134,6 +137,18 @@ DDP materials are listed below:
 5. The `Distributed Training with Uneven Inputs Using the Join Context Manager <../advanced/generic_join.html>`__
    tutorial walks through using the generic join context
    for distributed training with uneven inputs.
+
+``torch.distributed.FullyShardedDataParallel``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+`FullyShardedDataParallel <https://pytorch.org/docs/stable/fsdp.html>`__ (FSDP) is a type of
+data parallelism paradigm which, rather than maintaining a per-GPU copy of a model's
+parameters, gradients, and optimizer states, shards all of these states across
+data-parallel workers. Support for FSDP was added in PyTorch v1.11. The tutorial
+`Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+provides an in-depth explanation and example of how FSDP works.
+
+
 
 torch.distributed.elastic
 ~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/en-wordlist.txt b/en-wordlist.txt
index 1ec9abb68de..da13503a56c 100644
--- a/en-wordlist.txt
+++ b/en-wordlist.txt
@@ -95,6 +95,7 @@ ExportDB
 FC
 FGSM
 FLAVA
+FSDP
 FX
 FX's
 FloydHub
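
For reference, a minimal sketch of the wrapping pattern the new FSDP paragraph describes. It assumes a ``torchrun`` launch with one process per GPU and the NCCL backend; the toy model, tensor sizes, and optimizer are illustrative placeholders rather than part of the patched tutorial.

.. code-block:: python

    # Minimal FSDP sketch: wrap a model so that its parameters, gradients, and
    # optimizer states are sharded across the data-parallel workers.
    # Assumes a launch such as: torchrun --nproc_per_node=NUM_GPUS this_script.py
    import os

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


    def main():
        # torchrun sets the environment variables used here (e.g., LOCAL_RANK).
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Placeholder model; wrapping it in FSDP shards its states across ranks
        # instead of replicating them as DDP would.
        model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
        model = FSDP(model)

        # Create the optimizer after wrapping so it references the sharded parameters.
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

        inputs = torch.randn(8, 1024, device="cuda")
        loss = model(inputs).sum()
        loss.backward()
        optimizer.step()

        dist.destroy_process_group()


    if __name__ == "__main__":
        main()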