Release/1.6 #1087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged: 38 commits, merged Jul 28, 2020

Commits
353174f  Add TorchScript fork/join tutorial (Jun 11, 2020)
28f044e  Merge pull request #1021 from jamesr66a/fork_join (Jun 18, 2020)
4e97bce  Add note about zipfile format in serialization tutorial (Jun 19, 2020)
90f4771  Merge pull request #1036 from jamesr66a/save_note (Jun 19, 2020)
823ddca  Profiler recipe (#1019) (ilia-cher, Jun 23, 2020)
910dfa9  [mobile] Mobile Perf Recipe (IvanKobzarev, Jun 16, 2020)
7b182fe  Merge branch 'master' into recipe_mobile_perf (Jun 24, 2020)
6285b8f  Merge pull request #1031 from IvanKobzarev/recipe_mobile_perf (Jun 24, 2020)
02aef1d  Minor syntax edits to mobile perf recipe (Jun 30, 2020)
d64b9aa  Remove built files (Jun 30, 2020)
8e4e379  [android] android native app recipe (IvanKobzarev, Jun 26, 2020)
9cc3ce8  [mobile_perf][recipe] Add ChannelsLast recommendation (IvanKobzarev, Jun 26, 2020)
a79c227  Merge branch 'master' into recipe_aarlink_customop (Jul 2, 2020)
7b615c1  Merge pull request #1041 from IvanKobzarev/recipe_aarlink_customop (Jul 2, 2020)
e6ac673  Merge branch 'release/1.6' into mobile_perf_channels_last (Jul 2, 2020)
86b5c4a  Adding distributed pipeline parallel tutorial (Apr 11, 2020)
fff3fba  Merge pull request #1042 from IvanKobzarev/mobile_perf_channels_last (Jul 2, 2020)
dec681d  Merge pull request #948 from mrshenli/pipeline (Jul 2, 2020)
f0be561  Add async execution tutorials (mrshenli, Jun 30, 2020)
aef2568  Merge pull request #1045 from mrshenli/batch (Jul 3, 2020)
51abc6a  Fix code block in pipeline tutorial (mrshenli, Jul 4, 2020)
2f3ab79  Merge pull request #1050 from mrshenli/fix (Jul 4, 2020)
f8465c3  Adding an Overview Page for PyTorch Distributed (#1056) (mrshenli, Jul 9, 2020)
d766eeb  [Mobile Perf Recipe] Add the benchmarking part for iOS (#1055) (xta0, Jul 13, 2020)
569e96f  RPC profiling recipe (#1068) (rohan-varma, Jul 15, 2020)
5aa9d48  Push latest changes from master into release/1.6 (#1074) (Jul 16, 2020)
bfb3bb5  Tutorial for DDP + RPC (#1071) (pritamdamania87, Jul 16, 2020)
f9f3088  Make RPC profiling recipe into prototype tutorial (#1078) (Jul 21, 2020)
c8e79e1  Add RPC tutorial (Jul 21, 2020)
f2c549d  Update to include recipes (Jul 21, 2020)
d9d152b  Add Graph Mode Dynamic Quant tutorial (#1065) (supriyar, Jul 21, 2020)
b53f272  Add mobile recipes images (jlin27, Jul 21, 2020)
a855e46  Update mobile recipe index (Jul 21, 2020)
f8b200d  Remove RPC Profiling recipe from index (Jul 22, 2020)
6e261b2  1.6 model freezing tutorial (#1077) (Jul 22, 2020)
def85bd  Update title (Jul 22, 2020)
7945534  Update recipes_index.rst (brianjo, Jul 28, 2020)
f4561a8  Update dcgan_faces_tutorial.py (gchanan, Jul 28, 2020)

4 changes: 4 additions & 0 deletions .gitignore
@@ -4,6 +4,7 @@ intermediate
advanced
pytorch_basics
recipes
prototype

#data things
_data/
@@ -117,3 +118,6 @@ ENV/
.DS_Store
cleanup.sh
*.swp

# PyTorch things
*.pt
16 changes: 13 additions & 3 deletions .jenkins/build.sh
@@ -86,6 +86,16 @@ if [[ "${JOB_BASE_NAME}" == *worker_* ]]; then
FILES_TO_RUN+=($(basename $filename .py))
fi
count=$((count+1))
done
for filename in $(find prototype_source/ -name '*.py' -not -path '*/data/*'); do
if [ $(($count % $NUM_WORKERS)) != $WORKER_ID ]; then
echo "Removing runnable code from "$filename
python $DIR/remove_runnable_code.py $filename $filename
else
echo "Keeping "$filename
FILES_TO_RUN+=($(basename $filename .py))
fi
count=$((count+1))
done
echo "FILES_TO_RUN: " ${FILES_TO_RUN[@]}

@@ -94,13 +104,13 @@ if [[ "${JOB_BASE_NAME}" == *worker_* ]]; then

# Step 4: If any of the generated files are not related the tutorial files we want to run,
# then we remove them
for filename in $(find docs/beginner docs/intermediate docs/advanced docs/recipes -name '*.html'); do
for filename in $(find docs/beginner docs/intermediate docs/advanced docs/recipes docs/prototype -name '*.html'); do
file_basename=$(basename $filename .html)
if [[ ! " ${FILES_TO_RUN[@]} " =~ " ${file_basename} " ]]; then
rm $filename
fi
done
for filename in $(find docs/beginner docs/intermediate docs/advanced docs/recipes -name '*.rst'); do
for filename in $(find docs/beginner docs/intermediate docs/advanced docs/recipes docs/prototype -name '*.rst'); do
file_basename=$(basename $filename .rst)
if [[ ! " ${FILES_TO_RUN[@]} " =~ " ${file_basename} " ]]; then
rm $filename
@@ -124,7 +134,7 @@ if [[ "${JOB_BASE_NAME}" == *worker_* ]]; then
rm $filename
fi
done
for filename in $(find docs/.doctrees/beginner docs/.doctrees/intermediate docs/.doctrees/advanced docs/.doctrees/recipes -name '*.doctree'); do
for filename in $(find docs/.doctrees/beginner docs/.doctrees/intermediate docs/.doctrees/advanced docs/.doctrees/recipes docs/.doctrees/prototype -name '*.doctree'); do
file_basename=$(basename $filename .doctree)
if [[ ! " ${FILES_TO_RUN[@]} " =~ " ${file_basename} " ]]; then
rm $filename
4 changes: 2 additions & 2 deletions README.md
@@ -14,8 +14,8 @@ We use sphinx-gallery's [notebook styled examples](https://sphinx-gallery.github
Here's how to create a new tutorial or recipe:
1. Create a notebook styled python file. If you want it executed while inserted into documentation, save the file with suffix `tutorial` so that file name is `your_tutorial.py`.
2. Put it in one of the beginner_source, intermediate_source, advanced_source based on the level. If it is a recipe, add to recipes_source.
2. For Tutorials, include it in the TOC tree at index.rst
3. For Tutorials, create a thumbnail in the [index.rst file](https://github.com/pytorch/tutorials/blob/master/index.rst) using a command like `.. customcarditem:: beginner/your_tutorial.html`. For Recipes, create a thumbnail in the [recipes_index.rst](https://github.com/pytorch/tutorials/blob/master/recipes_source/recipes_index.rst)
2. For Tutorials (except if it is a prototype feature), include it in the TOC tree at index.rst
3. For Tutorials (except if it is a prototype feature), create a thumbnail in the [index.rst file](https://github.com/pytorch/tutorials/blob/master/index.rst) using a command like `.. customcarditem:: beginner/your_tutorial.html`. For Recipes, create a thumbnail in the [recipes_index.rst](https://github.com/pytorch/tutorials/blob/master/recipes_source/recipes_index.rst)

In case you prefer to write your tutorial in jupyter, you can use [this script](https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe) to convert the notebook to python file. After conversion and addition to the project, please make sure the sections headings etc are in logical order.

Binary file added _static/img/rpc-images/batch.png
Binary file added _static/img/rpc_trace_img.png
Binary file added _static/img/thumbnails/cropped/android.png
Binary file added _static/img/thumbnails/cropped/ios.png
Binary file added _static/img/thumbnails/cropped/mobile.png
Binary file added _static/img/thumbnails/cropped/profiler.png
Binary file added _static/img/trace_img.png
4 changes: 2 additions & 2 deletions advanced_source/dynamic_quantization_tutorial.py
@@ -1,5 +1,5 @@
"""
(experimental) Dynamic Quantization on an LSTM Word Language Model
(beta) Dynamic Quantization on an LSTM Word Language Model
==================================================================

**Author**: `James Reed <https://github.com/jamesr66a>`_
@@ -13,7 +13,7 @@
to int, which can result in smaller model size and faster inference with only a small
hit to accuracy.

In this tutorial, we'll apply the easiest form of quantization -
In this tutorial, we'll apply the easiest form of quantization -
`dynamic quantization <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_ -
to an LSTM-based next word-prediction model, closely following the
`word language model <https://github.com/pytorch/examples/tree/master/word_language_model>`_
2 changes: 1 addition & 1 deletion advanced_source/neural_style_tutorial.py
@@ -83,7 +83,7 @@
# An important detail to note is that neural networks from the
# torch library are trained with tensor values ranging from 0 to 1. If you
# try to feed the networks with 0 to 255 tensor images, then the activated
# feature maps will be unable sense the intended content and style.
# feature maps will be unable to sense the intended content and style.
# However, pre-trained networks from the Caffe library are trained with 0
# to 255 tensor images.
#
159 changes: 159 additions & 0 deletions advanced_source/rpc_ddp_tutorial.rst
@@ -0,0 +1,159 @@
Combining Distributed DataParallel with Distributed RPC Framework
=================================================================
**Author**: `Pritam Damania <https://github.com/pritamdamania87>`_


This tutorial uses a simple example to demonstrate how you can combine
`DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__ (DDP)
with the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
to mix distributed data parallelism with distributed model parallelism when
training a simple model. Source code of the example can be found `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__.

Previous tutorials,
`Getting Started With Distributed Data Parallel <https://pytorch.org/tutorials/intermediate/ddp_tutorial.html>`__
and `Getting Started with Distributed RPC Framework <https://pytorch.org/tutorials/intermediate/rpc_tutorial.html>`__,
described how to perform distributed data parallel and distributed model
parallel training respectively. However, there are several training paradigms
where you might want to combine these two techniques. For example:

1) If we have a model with a sparse part (large embedding table) and a dense
part (FC layers), we might want to put the embedding table on a parameter
server and replicate the FC layer across multiple trainers using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
The `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
can be used to perform embedding lookups on the parameter server.
2) Enable hybrid parallelism as described in the `PipeDream <https://arxiv.org/abs/1806.03377>`__ paper.
We can use the `Distributed RPC framework <https://pytorch.org/docs/master/rpc.html>`__
to pipeline stages of the model across multiple workers and replicate each
stage (if needed) using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.

|
In this tutorial we will cover case 1 mentioned above. We have a total of 4
workers in our setup as follows:


1) 1 Master, which is responsible for creating an embedding table
(nn.EmbeddingBag) on the parameter server. The master also drives the
training loop on the two trainers.
2) 1 Parameter Server, which holds the embedding table in memory and
responds to RPCs from the Master and Trainers.
3) 2 Trainers, which store an FC layer (nn.Linear) that is replicated amongst
themselves using `DistributedDataParallel <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
The trainers are also responsible for executing the forward pass, backward
pass and optimizer step.

|
The entire training process is executed as follows:

1) The master creates an embedding table on the Parameter Server and holds an
`RRef <https://pytorch.org/docs/master/rpc.html#rref>`__ to it.
2) The master then kicks off the training loop on the trainers and passes the
embedding table RRef to the trainers.
3) The trainers create a ``HybridModel`` which first performs an embedding lookup
using the embedding table RRef provided by the master and then executes the
FC layer which is wrapped inside DDP.
4) The trainer executes the forward pass of the model and uses the loss to
execute the backward pass using `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__.
5) As part of the backward pass, the gradients for the FC layer are computed
first and synced to all trainers via allreduce in DDP.
6) Next, Distributed Autograd propagates the gradients to the parameter server,
where the gradients for the embedding table are updated.
7) Finally, the `Distributed Optimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__ is used to update all the parameters.


.. attention::

You should always use `Distributed Autograd <https://pytorch.org/docs/master/rpc.html#distributed-autograd-framework>`__
for the backward pass if you're combining DDP and RPC.


Now, let's go through each part in detail. First, we need to set up all of our
workers before we can perform any training. We create 4 processes such that
ranks 0 and 1 are our trainers, rank 2 is the master and rank 3 is the
parameter server.

We initialize the RPC framework on all 4 workers using the TCP init_method.
Once RPC initialization is done, the master creates an `EmbeddingBag <https://pytorch.org/docs/master/generated/torch.nn.EmbeddingBag.html>`__
on the Parameter Server using `rpc.remote <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.remote>`__.
The master then loops through each trainer and kicks off the training loop by
calling ``_run_trainer`` on each trainer using `rpc_async <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.rpc_async>`__.
Finally, the master waits for all training to finish before exiting.

The trainers first initialize a ``ProcessGroup`` for DDP with world_size=2
(for two trainers) using `init_process_group <https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group>`__.
Next, they initialize the RPC framework using the TCP init_method. Note that
the ports are different in RPC initialization and ProcessGroup initialization.
This is to avoid port conflicts between initialization of both frameworks.
Once the initialization is done, the trainers just wait for the ``_run_trainer``
RPC from the master.

The parameter server just initializes the RPC framework and waits for RPCs from
the trainers and master.


.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
:language: py
:start-after: BEGIN run_worker
:end-before: END run_worker
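
The worker setup itself lives in the linked ``main.py`` and is only pulled in
through the ``literalinclude`` directive above. For illustration, a minimal
sketch of such a ``run_worker`` function follows; the TensorPipe backend
options, the port numbers, and the embedding table sizes are assumptions and
may differ from the actual example.

.. code:: python

    import torch.distributed as dist
    import torch.distributed.rpc as rpc
    import torch.multiprocessing as mp
    import torch.nn as nn

    NUM_EMBEDDINGS = 100   # assumed sizes, for illustration only
    EMBEDDING_DIM = 16

    def run_worker(rank, world_size):
        # Ranks 0 and 1 are trainers, rank 2 is the master, rank 3 is the parameter server.
        rpc_backend_options = rpc.TensorPipeRpcBackendOptions(
            init_method="tcp://localhost:29501")

        if rank == 2:
            # Master: create the embedding table on the parameter server and hold an RRef to it.
            rpc.init_rpc("master", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_backend_options)
            emb_rref = rpc.remote(
                "ps", nn.EmbeddingBag,
                args=(NUM_EMBEDDINGS, EMBEDDING_DIM), kwargs={"mode": "sum"})

            # Kick off the training loop on both trainers and wait for them to finish.
            # _run_trainer is sketched further below.
            futs = [rpc.rpc_async("trainer{}".format(r), _run_trainer, args=(emb_rref, r))
                    for r in [0, 1]]
            for fut in futs:
                fut.wait()
        elif rank <= 1:
            # Trainers: a separate ProcessGroup for DDP (note the different port),
            # then RPC initialization. They simply wait for the _run_trainer RPC.
            dist.init_process_group(
                backend="gloo", rank=rank, world_size=2,
                init_method="tcp://localhost:29500")
            rpc.init_rpc("trainer{}".format(rank), rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_backend_options)
        else:
            # Parameter server: only initializes RPC and serves requests.
            rpc.init_rpc("ps", rank=rank, world_size=world_size,
                         rpc_backend_options=rpc_backend_options)

        # Block until all RPC activity across all workers has finished.
        rpc.shutdown()

    if __name__ == "__main__":
        mp.spawn(run_worker, args=(4,), nprocs=4, join=True)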

Before we discuss details of the Trainer, let's introduce the ``HybridModel`` that
the trainer uses. As described below, the ``HybridModel`` is initialized using an
RRef to the embedding table (emb_rref) on the parameter server and the ``device``
to use for DDP. The initialization of the model wraps an
`nn.Linear <https://pytorch.org/docs/master/generated/torch.nn.Linear.html>`__
layer inside DDP to replicate and synchronize this layer across all trainers.

The forward method of the model is pretty straightforward. It performs an
embedding lookup on the parameter server using an
`RRef helper <https://pytorch.org/docs/master/rpc.html#torch.distributed.rpc.RRef.rpc_sync>`__
and passes its output onto the FC layer.


.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
:language: py
:start-after: BEGIN hybrid_model
:end-before: END hybrid_model
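
As above, the real ``HybridModel`` comes from ``main.py``. A minimal sketch of
what it could look like is shown below, assuming CPU training with the gloo
backend and the 16-dimensional embeddings and 8 output classes used in the
previous sketch.

.. code:: python

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    class HybridModel(nn.Module):
        """
        Remote EmbeddingBag (held via emb_rref on the parameter server)
        followed by a local FC layer replicated across trainers with DDP.
        """
        def __init__(self, emb_rref, device):
            super().__init__()
            self.emb_rref = emb_rref
            self.device = device
            # Only the local FC layer is wrapped in DDP; the embedding table
            # stays on the parameter server and is reached via RPC.
            self.fc = DDP(nn.Linear(16, 8).to(device))

        def forward(self, indices, offsets):
            # Embedding lookup runs on the parameter server via the RRef helper,
            # then the result is fed through the DDP-wrapped FC layer locally.
            emb_lookup = self.emb_rref.rpc_sync().forward(indices, offsets)
            return self.fc(emb_lookup.to(self.device))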

Next, let's look at the setup on the Trainer. The trainer first creates the
``HybridModel`` described above using an RRef to the embedding table on the
parameter server and its own rank.

Now, we need to retrieve a list of RRefs to all the parameters that we would
like to optimize with `DistributedOptimizer <https://pytorch.org/docs/master/rpc.html#module-torch.distributed.optim>`__.
To retrieve the parameters for the embedding table from the parameter server,
we define a simple helper function ``_retrieve_embedding_parameters``, which
walks through all the parameters for the embedding table and returns
a list of RRefs. The trainer calls this method on the parameter server via RPC
to receive a list of RRefs to the desired parameters. Since the
DistributedOptimizer always takes a list of RRefs to parameters that need to
be optimized, we need to create RRefs even for the local parameters for our
FC layers. This is done by walking ``model.parameters()``, creating an RRef for
each parameter and appending it to a list. Note that ``model.parameters()`` only
returns local parameters and doesn't include ``emb_rref``.

Finally, we create our DistributedOptimizer using all the RRefs and define a
CrossEntropyLoss function.

.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
:language: py
:start-after: BEGIN setup_trainer
:end-before: END setup_trainer
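
A sketch of this trainer setup might look like the following.
``_run_trainer`` and ``_retrieve_embedding_parameters`` are named in the text
above; the SGD learning rate and the ``_train_loop`` helper are assumptions
for illustration.

.. code:: python

    import torch.nn as nn
    import torch.optim as optim
    import torch.distributed.rpc as rpc
    from torch.distributed.optim import DistributedOptimizer
    from torch.distributed.rpc import RRef

    def _retrieve_embedding_parameters(emb_rref):
        # Runs on the parameter server: wrap each embedding parameter in an RRef.
        return [RRef(p) for p in emb_rref.local_value().parameters()]

    def _run_trainer(emb_rref, rank):
        model = HybridModel(emb_rref, device="cpu")

        # RRefs to the remote embedding parameters, fetched from the parameter server...
        model_parameter_rrefs = rpc.rpc_sync(
            "ps", _retrieve_embedding_parameters, args=(emb_rref,))
        # ...plus RRefs for the local FC parameters (model.parameters() does not
        # include emb_rref, only the DDP-wrapped local layer).
        for param in model.parameters():
            model_parameter_rrefs.append(RRef(param))

        # One DistributedOptimizer over both remote and local parameters.
        opt = DistributedOptimizer(optim.SGD, model_parameter_rrefs, lr=0.05)
        criterion = nn.CrossEntropyLoss()

        # The per-batch training loop is sketched in the next block.
        _train_loop(model, opt, criterion, rank)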

Now we're ready to introduce the main training loop that is run on each trainer.
``get_next_batch`` is just a helper function to generate random inputs and
targets for training. We run the training loop for multiple epochs and for each
batch:

1) Set up a `Distributed Autograd Context <https://pytorch.org/docs/master/rpc.html#torch.distributed.autograd.context>`__
for Distributed Autograd.
2) Run the forward pass of the model and retrieve its output.
3) Compute the loss based on our outputs and targets using the loss function.
4) Use Distributed Autograd to execute a distributed backward pass using the loss.
5) Finally, run a Distributed Optimizer step to optimize all the parameters.

.. literalinclude:: ../advanced_source/rpc_ddp_tutorial/main.py
:language: py
:start-after: BEGIN run_trainer
:end-before: END run_trainer
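
To make the five steps above concrete, here is a sketch of such a training
loop. The shapes produced by ``get_next_batch`` are assumptions chosen to
match the embedding table and FC layer dimensions used in the earlier
sketches.

.. code:: python

    import torch
    import torch.distributed.autograd as dist_autograd

    def get_next_batch(rank):
        # Hypothetical helper: random EmbeddingBag indices/offsets and random targets.
        for _ in range(10):
            indices = torch.randint(0, 100, (32,))   # 100 = assumed number of embeddings
            offsets = torch.arange(0, 32, 4)         # 8 bags of 4 indices each
            target = torch.randint(0, 8, (8,))       # 8 = assumed number of classes
            yield indices, offsets, target

    def _train_loop(model, opt, criterion, rank, epochs=2):
        for epoch in range(epochs):
            for indices, offsets, target in get_next_batch(rank):
                # 1) Every forward/backward pair runs inside a distributed autograd context.
                with dist_autograd.context() as context_id:
                    # 2) Forward pass: remote embedding lookup + local DDP FC layer.
                    output = model(indices, offsets)
                    # 3) Loss for this batch.
                    loss = criterion(output, target)
                    # 4) Distributed backward pass: DDP allreduces the FC gradients,
                    #    distributed autograd propagates the embedding gradients
                    #    to the parameter server.
                    dist_autograd.backward(context_id, [loss])
                    # 5) Distributed optimizer step updates local and remote parameters.
                    opt.step(context_id)
            print("Trainer {} finished epoch {}".format(rank, epoch))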

Source code for the entire example can be found `here <https://github.com/pytorch/examples/tree/master/distributed/rpc/ddp_rpc>`__.