
Commit 83ca5f9

Let existing PT Distributed tutorials link to the overview page
1 parent 785de98 commit 83ca5f9

7 files changed: +31 -8 lines


index.rst

Lines changed: 1 addition & 1 deletion
@@ -299,7 +299,7 @@ Welcome to PyTorch Tutorials
 
 .. customcarditem::
    :header: PyTorch Distributed Overview
-   :card_description: Have a high-level overview of all concepts and features in the distributed package. Use this to find the distributed training technology can best serve your application.
+   :card_description: Briefly go over all concepts and features in the distributed package. Use this document to find the distributed training technology that can best serve your application.
    :image: _static/img/thumbnails/cropped/PyTorch-Distributed-Overview.png
    :link: beginner/dist_overview.html
    :tags: Parallel-and-Distributed-Training

intermediate_source/ddp_tutorial.rst

Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,13 @@ Getting Started with Distributed Data Parallel
 =================================================
 **Author**: `Shen Li <https://mrshenli.github.io/>`_
 
+Prerequisites:
+
+- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
+- `DistributedDataParallel API documents <https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html>`__
+- `DistributedDataParallel notes <https://pytorch.org/docs/master/notes/ddp.html>`__
+
+
 `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__
 (DDP) implements data parallelism at the module level which can run across
 multiple machines. Applications using DDP should spawn multiple processes and
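
For orientation, here is a minimal sketch of the spawn-and-wrap pattern that the DDP tutorial describes; the gloo backend, port number, and toy linear model are illustrative assumptions, not code from this commit.

.. code-block:: python

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def worker(rank, world_size):
        # Every spawned process joins the same process group.
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        model = nn.Linear(10, 10)      # one replica per process
        ddp_model = DDP(model)         # gradients are all-reduced across ranks

        ddp_model(torch.randn(4, 10)).sum().backward()
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(worker, args=(2,), nprocs=2)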

intermediate_source/dist_pipeline_parallel_tutorial.rst

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ Distributed Pipeline Parallelism Using RPC
 
 Prerequisites:
 
+- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
 - `Single-Machine Model Parallel Best Practices <https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html>`__
 - `Getting started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__
 - RRef helper functions:

intermediate_source/dist_tuto.rst

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@ Writing Distributed Applications with PyTorch
 =============================================
 **Author**: `Séb Arnold <https://seba1511.com>`_
 
+Prerequisites:
+
+- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
+
 In this short tutorial, we will be going over the distributed package
 of PyTorch. We'll see how to set up the distributed setting, use the
 different communication strategies, and go over some of the internals of
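
As a pointer to what that setup looks like in practice, here is a minimal sketch of initializing the distributed package and running one collective; the gloo backend, address, port, and two-process layout are assumptions for illustration.

.. code-block:: python

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def run(rank, world_size):
        # Each rank contributes its tensor; all_reduce sums them in place on every rank.
        tensor = torch.ones(1) * rank
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        print(f"rank {rank} has {tensor.item()}")

    def init_process(rank, world_size):
        os.environ["MASTER_ADDR"] = "127.0.0.1"
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        run(rank, world_size)
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(init_process, args=(2,), nprocs=2)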

intermediate_source/rpc_async_execution.rst

Lines changed: 3 additions & 2 deletions
@@ -5,8 +5,9 @@ Implementing Batch RPC Processing Using Asynchronous Executions
 
 Prerequisites:
 
-- `Getting started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__
-- `Implementing a Parameter Server using Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html>`__
+- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
+- `Getting started with Distributed RPC Framework <rpc_tutorial.html>`__
+- `Implementing a Parameter Server using Distributed RPC Framework <rpc_param_server_tutorial.html>`__
 - `RPC Asynchronous Execution Decorator <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.functions.async_execution>`__
 
 This tutorial demonstrates how to build batch-processing RPC applications with
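
For readers unfamiliar with the decorator linked in the prerequisites, here is a small hedged sketch of ``@rpc.functions.async_execution``: the decorated function returns a ``torch.futures.Future``, and the RPC reply is sent when that future completes rather than when the function returns. The worker names, port, and the trivially completed future are illustrative assumptions.

.. code-block:: python

    import os
    import torch
    import torch.distributed.rpc as rpc
    import torch.multiprocessing as mp

    @rpc.functions.async_execution
    def async_add_one(x):
        # Return a Future instead of a value; the caller gets the result once the
        # Future is marked complete (here immediately, in a real batching server
        # only after enough requests have been accumulated).
        fut = torch.futures.Future()
        fut.set_result(x + 1)
        return fut

    def run(rank, world_size):
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29501"
        rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
        if rank == 0:
            print(rpc.rpc_sync("worker1", async_add_one, args=(torch.ones(2),)))
        rpc.shutdown()

    if __name__ == "__main__":
        mp.spawn(run, args=(2,), nprocs=2)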

intermediate_source/rpc_param_server_tutorial.rst

Lines changed: 10 additions & 5 deletions
@@ -4,6 +4,11 @@ Implementing a Parameter Server Using Distributed RPC Framework
 
 **Author**\ : `Rohan Varma <https://github.com/rohan-varma>`_
 
+Prerequisites:
+
+- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
+- `RPC API documents <https://pytorch.org/docs/master/rpc.html>`__
+
 This tutorial walks through a simple example of implementing a parameter server using PyTorch's `Distributed RPC framework <https://pytorch.org/docs/stable/rpc.html>`_. The parameter server framework is a paradigm in which a set of servers store parameters, such as large embedding tables, and several trainers query the parameter servers in order to retrieve the most up to date parameters. These trainers can run a training loop locally and occasionally synchronize with the parameter server to get the latest parameters. For more reading on the parameter server approach, check out `this paper <https://www.cs.cmu.edu/~muli/file/parameter_server_osdi14.pdf>`_.
 
 Using the Distributed RPC Framework, we'll build an example where multiple trainers use RPC to communicate with the same parameter server and use `RRef <https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.RRef>`_ to access states on the remote parameter server instance. Each trainer will launch its dedicated backward pass in a distributed fashion through stitching of the autograd graph across multiple nodes using distributed autograd.
@@ -78,7 +83,7 @@ Next, let's define some helper functions that will be useful for the rest of our
 
     # On the local node, call a method with first arg as the value held by the
     # RRef. Other args are passed in as arguments to the function called.
-    # Useful for calling instance methods. method could be any matching function, including
+    # Useful for calling instance methods. method could be any matching function, including
     # class methods.
     def call_method(method, rref, *args, **kwargs):
        return method(rref.local_value(), *args, **kwargs)
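
As an aside, a helper like ``call_method`` is typically paired with a caller-side wrapper; the ``remote_method`` below is a hedged sketch of that pattern (the name and signature are assumptions), issuing an RPC to the RRef's owner, which then unwraps the locally held value and invokes the method on it.

.. code-block:: python

    import torch.distributed.rpc as rpc

    # Hypothetical caller-side wrapper: run `method` on whatever object `rref`
    # points to, by sending the call to the worker that owns the RRef.
    def remote_method(method, rref, *args, **kwargs):
        args = [method, rref] + list(args)
        return rpc.rpc_sync(rref.owner(), call_method, args=args, kwargs=kwargs)

    # e.g. remote_method(ParameterServer.get_param_rrefs, param_server_rref)
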
@@ -119,7 +124,7 @@ Next, we'll define our forward pass. Note that regardless of the device of the m
         # Tensors must be moved in and out of GPU memory due to this.
         out = out.to("cpu")
         return out
-Next, we'll define a few miscellaneous functions useful for training and verification purposes. The first, ``get_dist_gradients``\ , will take in a Distributed Autograd context ID and call into the ``dist_autograd.get_gradients`` API in order to retrieve gradients computed by distributed autograd. More information can be found in the `distributed autograd documentation <https://pytorch.org/docs/stable/rpc.html#distributed-autograd-framework>`_. Note that we also iterate through the resulting dictionary and convert each tensor to a CPU tensor, as the framework currently only supports sending tensors over RPC. Next, ``get_param_rrefs`` will iterate through our model parameters and wrap them as a (local) `RRef <https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.RRef>`_. This method will be invoked over RPC by trainer nodes and will return a list of the parameters to be optimized. This is required as input to the `Distributed Optimizer <https://pytorch.org/docs/stable/rpc.html#module-torch.distributed.optim>`_\ , which requires all parameters it must optimize as a list of ``RRef``\ s.
+Next, we'll define a few miscellaneous functions useful for training and verification purposes. The first, ``get_dist_gradients``\ , will take in a Distributed Autograd context ID and call into the ``dist_autograd.get_gradients`` API in order to retrieve gradients computed by distributed autograd. More information can be found in the `distributed autograd documentation <https://pytorch.org/docs/stable/rpc.html#distributed-autograd-framework>`_. Note that we also iterate through the resulting dictionary and convert each tensor to a CPU tensor, as the framework currently only supports sending tensors over RPC. Next, ``get_param_rrefs`` will iterate through our model parameters and wrap them as a (local) `RRef <https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.RRef>`_. This method will be invoked over RPC by trainer nodes and will return a list of the parameters to be optimized. This is required as input to the `Distributed Optimizer <https://pytorch.org/docs/stable/rpc.html#module-torch.distributed.optim>`_\ , which requires all parameters it must optimize as a list of ``RRef``\ s.
 
 .. code-block:: python
@@ -224,7 +229,7 @@ Below, we initialize our ``TrainerNet`` and build a ``DistributedOptimizer``. No
     # Build DistributedOptimizer.
     param_rrefs = net.get_global_param_rrefs()
     opt = DistributedOptimizer(optim.SGD, param_rrefs, lr=0.03)
-Next, we define our main training loop. We loop through iterables given by PyTorch's `DataLoader <https://pytorch.org/docs/stable/data.html>`_. Before writing our typical forward/backward/optimizer loop, we first wrap the logic within a `Distributed Autograd context <https://pytorch.org/docs/stable/rpc.html#torch.distributed.autograd.context>`_. Note that this is needed to record RPCs invoked in the model's forward pass, so that an appropriate graph can be constructed which includes all participating distributed workers in the backward pass. The distributed autograd context returns a ``context_id`` which serves as an identifier for accumulating and optimizing gradients corresponding to a particular iteration.
+Next, we define our main training loop. We loop through iterables given by PyTorch's `DataLoader <https://pytorch.org/docs/stable/data.html>`_. Before writing our typical forward/backward/optimizer loop, we first wrap the logic within a `Distributed Autograd context <https://pytorch.org/docs/stable/rpc.html#torch.distributed.autograd.context>`_. Note that this is needed to record RPCs invoked in the model's forward pass, so that an appropriate graph can be constructed which includes all participating distributed workers in the backward pass. The distributed autograd context returns a ``context_id`` which serves as an identifier for accumulating and optimizing gradients corresponding to a particular iteration.
 
 As opposed to calling the typical ``loss.backward()`` which would kick off the backward pass on this local worker, we call ``dist_autograd.backward()`` and pass in our context_id as well as ``loss``\ , which is the root at which we want the backward pass to begin. In addition, we pass this ``context_id`` into our optimizer call, which is required to be able to look up the corresponding gradients computed by this particular backwards pass across all nodes.
 
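Putting those two paragraphs together, the loop looks roughly like the following hedged sketch; ``net``, ``opt``, ``loss_fn``, and ``train_loader`` are assumed to already exist.

.. code-block:: python

    import torch.distributed.autograd as dist_autograd

    for data, target in train_loader:
        # Record RPCs issued during the forward pass in a per-iteration context.
        with dist_autograd.context() as context_id:
            loss = loss_fn(net(data), target)
            # Kick off the backward pass across all participating workers,
            # starting from `loss` as the root.
            dist_autograd.backward(context_id, [loss])
            # The optimizer uses context_id to look up this iteration's gradients.
            opt.step(context_id)
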
@@ -259,7 +264,7 @@ The following simply computes the accuracy of our model after we're done trainin
     model.eval()
     correct_sum = 0
     # Use GPU to evaluate if possible
-    device = torch.device("cuda:0" if model.num_gpus > 0
+    device = torch.device("cuda:0" if model.num_gpus > 0
         and torch.cuda.is_available() else "cpu")
     with torch.no_grad():
         for i, (data, target) in enumerate(test_loader):
@@ -330,7 +335,7 @@ We've now completed our trainer and parameter server specific code, and all that
     assert args.num_gpus <= 3, f"Only 0-2 GPUs currently supported (got {args.num_gpus})."
     os.environ['MASTER_ADDR'] = args.master_addr
    os.environ["MASTER_PORT"] = args.master_port
-Now, we'll create a process corresponding to either a parameter server or trainer depending on our command line arguments. We'll create a ``ParameterServer`` if our passed in rank is 0, and a ``TrainerNet`` otherwise. Note that we're using ``torch.multiprocessing`` to launch a subprocess corresponding to the function that we want to execute, and waiting on this process's completion from the main thread with ``p.join()``. In the case of initializing our trainers, we also use PyTorch's `dataloaders <https://pytorch.org/docs/stable/data.html>`_ in order to specify train and test data loaders on the MNIST dataset.
+Now, we'll create a process corresponding to either a parameter server or trainer depending on our command line arguments. We'll create a ``ParameterServer`` if our passed in rank is 0, and a ``TrainerNet`` otherwise. Note that we're using ``torch.multiprocessing`` to launch a subprocess corresponding to the function that we want to execute, and waiting on this process's completion from the main thread with ``p.join()``. In the case of initializing our trainers, we also use PyTorch's `dataloaders <https://pytorch.org/docs/stable/data.html>`_ in order to specify train and test data loaders on the MNIST dataset.
 
 .. code-block:: python
 
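For reference, a minimal hedged sketch of that launch pattern; ``run_parameter_server`` and ``run_worker`` are placeholder names for the per-role entry points, and the rank and world-size values are illustrative.

.. code-block:: python

    import torch.multiprocessing as mp

    def run_parameter_server(rank, world_size):
        ...  # init_rpc as "parameter_server", then block until shutdown

    def run_worker(rank, world_size):
        ...  # init_rpc as a trainer, build the net, run the training loop

    if __name__ == "__main__":
        world_size = 3   # e.g. 1 parameter server + 2 trainers
        rank = 0         # normally parsed from the command line
        target = run_parameter_server if rank == 0 else run_worker
        # Launch the chosen role in a subprocess and wait for it from the main thread.
        p = mp.Process(target=target, args=(rank, world_size))
        p.start()
        p.join()
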
intermediate_source/rpc_tutorial.rst

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,11 @@ Getting Started with Distributed RPC Framework
 **Author**: `Shen Li <https://mrshenli.github.io/>`_
 
 
+Prerequisites:
+
+- `PyTorch Distributed Overview <../beginner/dist_overview.html>`__
+- `RPC API documents <https://pytorch.org/docs/master/rpc.html>`__
+
 This tutorial uses two simple examples to demonstrate how to build distributed
 training with the `torch.distributed.rpc <https://pytorch.org/docs/master/rpc.html>`__
 package, which was first introduced as an experimental feature in PyTorch v1.4.
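
As a quick taste of the primitives that tutorial builds on, here is a small hedged sketch of synchronous RPC and remote references; the worker names, port, and ``torch.add`` example are illustrative assumptions.

.. code-block:: python

    import os
    import torch
    import torch.distributed.rpc as rpc
    import torch.multiprocessing as mp

    def run(rank, world_size):
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29502"
        rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
        if rank == 0:
            # Blocking RPC: run torch.add on worker1 and return the result.
            print(rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3)))
            # remote() leaves the result on worker1 and hands back an RRef to it.
            rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
            print(rref.to_here())
        rpc.shutdown()   # blocks until all RPC workers are done

    if __name__ == "__main__":
        mp.spawn(run, args=(2,), nprocs=2)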
