
Commit ac0e1d0

Merge pull request #489 from mrshenli/link
Adding links to recent tutorial pages
2 parents 06fe7c4 + 42b3de5

File tree

1 file changed: +17 -7 lines changed

intermediate_source/model_parallel_tutorial.py

Lines changed: 17 additions & 7 deletions
@@ -4,15 +4,15 @@
 *************************************************************
 **Author**: `Shen Li <https://mrshenli.github.io/>`_
 
-Data parallel and model parallel are widely-used distributed training
+Data parallel and model parallel are widely-used in distributed training
 techniques. Previous posts have explained how to use
 `DataParallel <https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html>`_
 to train a neural network on multiple GPUs. ``DataParallel`` replicates the
 same model to all GPUs, where each GPU consumes a different partition of the
 input data. Although it can significantly accelerate the training process, it
-does not work for some use cases where the model is large to fit into a single
-GPU. This post shows how to solve that problem by using model parallel and also
-shares some insights on how to speed up model parallel training.
+does not work for some use cases where the model is too large to fit into a
+single GPU. This post shows how to solve that problem by using model parallel
+and also shares some insights on how to speed up model parallel training.
 
 The high-level idea of model parallel is to place different sub-networks of a
 model onto different devices, and implement the ``forward`` method accordingly
@@ -23,11 +23,21 @@
 of model parallel. It is up to the readers to apply the ideas to real-world
 applications.
 
-Let us start with a toy model that contains two linear layers. To run this
-model on two GPUs, simply put each linear layer on a different GPU, and move
-inputs and intermediate outputs to match the layer devices accordingly.
+**Recommended Reading:**
+
+- https://pytorch.org/ For installation instructions
+- :doc:`/beginner/blitz/data_parallel_tutorial` Single-Machine Data Parallel
+- :doc:`/intermediate/ddp_tutorial` Combine Distributed Data Parallel and Model Parallel
 """
 
+######################################################################
+# Basic Usage
+# =======================
+#
+# Let us start with a toy model that contains two linear layers. To run this
+# model on two GPUs, simply put each linear layer on a different GPU, and move
+# inputs and intermediate outputs to match the layer devices accordingly.
+
 import torch
 import torch.nn as nn
 import torch.optim as optim
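For context, the newly added "Basic Usage" comment block introduces the toy model that the rest of the tutorial builds on: unlike ``DataParallel``, which replicates the whole model on every GPU and splits the input batch, model parallel splits the model itself across devices. A minimal sketch of such a two-GPU toy model is shown below (assuming two CUDA devices, ``cuda:0`` and ``cuda:1``, are available; the class name, layer sizes, and training snippet are illustrative, not necessarily the tutorial's exact code):

import torch
import torch.nn as nn
import torch.optim as optim


class ToyModel(nn.Module):
    # A two-layer toy model split across two GPUs:
    # net1 lives on cuda:0, net2 on cuda:1 (sizes are illustrative).
    def __init__(self):
        super().__init__()
        self.net1 = nn.Linear(10, 10).to('cuda:0')
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to('cuda:1')

    def forward(self, x):
        # Move the input and the intermediate output to the device
        # of the layer that consumes them.
        x = self.relu(self.net1(x.to('cuda:0')))
        return self.net2(x.to('cuda:1'))


model = ToyModel()
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

optimizer.zero_grad()
outputs = model(torch.randn(20, 10))
# Labels must live on the same device as the outputs (cuda:1 here).
labels = torch.randn(20, 5).to('cuda:1')
loss_fn(outputs, labels).backward()
optimizer.step()

The only change relative to a single-GPU model is the explicit ``.to(device)`` calls on the layers in ``__init__`` and on the tensors in ``forward``; the backward pass and the optimizer step work as usual across devices.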
