4 files changed: +22 -14 lines changed
@@ -22,7 +22,8 @@ matrix:
  # - intermediate_source/nvfuser_intro_tutorial.py
  # - intermediate_source/parametrizations.py
  # - intermediate_source/per_sample_grads.py
- - intermediate_source/pipeline_tutorial.py
+ # - intermediate_source/pipeline_tutorial.py
+ - intermediate_source/pruning_tutorial.py
  dictionary:
  wordlists:
  - en-wordlist.txt

+ subnetworks
+ sparsify
+ LeCun
+ prepruned
+ dimensionality
+ unpruned
  RPC
  multihead
  GPU's

  # As a result, our focus is on ``nn.TransformerEncoder`` and we split the model
  # such that half of the ``nn.TransformerEncoderLayer`` are on one GPU and the
  # other half are on another. To do this, we pull out the ``Encoder`` and
- # ``Decoder`` sections into seperate modules and then build an ``nn.Sequential``
+ # ``Decoder`` sections into separate modules and then build an ``nn.Sequential``
  # representing the original Transformer module.

  import sys
@@ -134,16 +134,17 @@ def forward(self, x):
  # length 6:
  #
  # .. math::
- # \begin{bmatrix}
- # \text{A} & \text{B} & \text{C} & \ldots & \text{X} & \text{Y} & \text{Z}
- # \end{bmatrix}
- # \Rightarrow
- # \begin{bmatrix}
- # \begin{bmatrix}\text{A} \\ \text{B} \\ \text{C} \\ \text{D} \\ \text{E} \\ \text{F}\end{bmatrix} &
- # \begin{bmatrix}\text{G} \\ \text{H} \\ \text{I} \\ \text{J} \\ \text{K} \\ \text{L}\end{bmatrix} &
- # \begin{bmatrix}\text{M} \\ \text{N} \\ \text{O} \\ \text{P} \\ \text{Q} \\ \text{R}\end{bmatrix} &
- # \begin{bmatrix}\text{S} \\ \text{T} \\ \text{U} \\ \text{V} \\ \text{W} \\ \text{X}\end{bmatrix}
- # \end{bmatrix}
+ #
+ # \begin{bmatrix}
+ # \text{A} & \text{B} & \text{C} & \ldots & \text{X} & \text{Y} & \text{Z}
+ # \end{bmatrix}
+ # \Rightarrow
+ # \begin{bmatrix}
+ # \begin{bmatrix}\text{A} \\ \text{B} \\ \text{C} \\ \text{D} \\ \text{E} \\ \text{F}\end{bmatrix} &
+ # \begin{bmatrix}\text{G} \\ \text{H} \\ \text{I} \\ \text{J} \\ \text{K} \\ \text{L}\end{bmatrix} &
+ # \begin{bmatrix}\text{M} \\ \text{N} \\ \text{O} \\ \text{P} \\ \text{Q} \\ \text{R}\end{bmatrix} &
+ # \begin{bmatrix}\text{S} \\ \text{T} \\ \text{U} \\ \text{V} \\ \text{W} \\ \text{X}\end{bmatrix}
+ # \end{bmatrix}
  #
  # These columns are treated as independent by the model, which means that
  # the dependence of ``G`` and ``F`` can not be learned, but allows more
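
The column layout described in this hunk can be illustrated with a short plain-Python sketch. The helper name `split_into_columns` and its `column_len` parameter are hypothetical, introduced only for illustration; this is not the tutorial's own batching function, which operates on tensors.

```python
import string


def split_into_columns(seq, column_len):
    """Split seq into consecutive columns of column_len items.

    Hypothetical helper for illustration only. Trailing items that do
    not fill a whole column are dropped.
    """
    n_cols = len(seq) // column_len
    trimmed = seq[: n_cols * column_len]
    return [
        list(trimmed[i * column_len:(i + 1) * column_len])
        for i in range(n_cols)
    ]


# The alphabet as the input sequence, split into columns of length 6.
columns = split_into_columns(string.ascii_uppercase, 6)
for col in columns:
    print("".join(col))
# prints:
# ABCDEF
# GHIJKL
# MNOPQR
# STUVWX
```

With 26 letters and a column length of 6, only 24 letters fit, so `Y` and `Z` are dropped. `F` ends the first column and `G` starts the second, which is why a model that treats columns independently cannot learn the dependence between them.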
@@ -339,8 +339,8 @@ def forward(self, x):
  # pruning this technique implements (supported options are ``global``,
  # ``structured``, and ``unstructured``). This is needed to determine
  # how to combine masks in the case in which pruning is applied
- # iteratively. In other words, when pruning a pre-pruned parameter,
- # the current prunining techique is expected to act on the unpruned
+ # iteratively. In other words, when pruning a prepruned parameter,
+ # the current pruning technique is expected to act on the unpruned
  # portion of the parameter. Specifying the ``PRUNING_TYPE`` will
  # enable the ``PruningContainer`` (which handles the iterative
  # application of pruning masks) to correctly identify the slice of the
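
The idea in this comment, that a later pruning round acts only on the still-unpruned portion of a prepruned parameter, can be sketched in plain Python for the unstructured case. The function `prune_smallest` and the sample weights are assumptions made up for this illustration; this is not how `PruningContainer` is implemented.

```python
def prune_smallest(values, mask, amount):
    """Return a new mask that zeroes `amount` more entries.

    Hypothetical helper for illustration only. Candidates are limited
    to entries that are still unpruned (mask == 1), so entries removed
    in an earlier round are never counted again.
    """
    candidates = [i for i, m in enumerate(mask) if m == 1]
    to_prune = sorted(candidates, key=lambda i: abs(values[i]))[:amount]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
    return new_mask


weights = [0.5, -0.1, 0.9, 0.05, -0.7, 0.3]
mask = [1] * len(weights)

# First round: removes the two smallest magnitudes overall.
mask = prune_smallest(weights, mask, 2)
# Second round: ranks only the four entries that survived round one.
mask = prune_smallest(weights, mask, 2)

pruned = [w * m for w, m in zip(weights, mask)]
print(mask)  # [0, 0, 1, 0, 1, 0]
```

If the second round ranked all six entries instead of the unpruned four, it would select the same two entries as the first round and remove nothing new.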