Changed file: intermediate_source/pipelining_tutorial.rst (+14 −10 lines)
@@ -67,7 +67,7 @@ chunks. First, let us define the model:
             h = layer(h, h)
 
         h = self.norm(h) if self.norm else h
-        output = self.output(h).float() if self.output else h
+        output = self.output(h).clone() if self.output else h
         return output
 
 Then, we need to import the necessary libraries in our script and initialize the distributed training process. In this case, we are defining some global variables to use
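The only functional change in this hunk swaps `.float()` for `.clone()`. A plausible motivation (my reading, not stated in the diff) is aliasing: for a model whose parameters are already float32, `.float()` is a no-op that returns the very same tensor object, while `.clone()` always materializes a fresh tensor, which is what a pipeline stage boundary needs. A minimal check of that difference:

```python
# Sketch: .float() vs .clone() on an already-float32 tensor.
# Assumes only stock PyTorch; nothing here is specific to pipelining.
import torch

h = torch.randn(2, 4)        # default dtype is float32
print(h.float() is h)        # True  -- no conversion needed, same object returned
print(h.clone() is h)        # False -- clone always allocates a new tensor
```

So with `.float()`, the stage's output could alias an internal activation, whereas `.clone()` guarantees a distinct output tensor regardless of dtype.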
@@ -109,32 +109,29 @@ Step 1: Partition the Transformer Model
 There are two different ways of partitioning the model:
 
 First is the manual mode in which we can manually create two instances of the model by deleting portions of
-attributes of the model. In this example for a 2 stage (2 ranks) the model is cut in half.
+attributes of the model. In this example for 2 stages (2 ranks) the model is cut in half.
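The manual mode described above can be sketched with plain Python stand-ins. The attribute names (`tok_embeddings`, `layers`, `norm`, `output`) follow the tutorial's Transformer; `ToyTransformer` and `partition_two_stages` are hypothetical helpers for illustration, not part of the tutorial's API:

```python
# Sketch of manual 2-stage partitioning: each stage is a copy of the
# full model with the attributes it does not own deleted or set to None
# (the tutorial's forward() skips None submodules at runtime).
import copy

class ToyTransformer:
    def __init__(self, n_layers=8):
        self.tok_embeddings = "embedding"   # stand-in for an nn.Embedding
        # dict keyed by str(i), mirroring an nn.ModuleDict of decoder layers
        self.layers = {str(i): f"layer_{i}" for i in range(n_layers)}
        self.norm = "norm"                  # stand-in for a norm layer
        self.output = "output_proj"         # stand-in for the output projection

def partition_two_stages(model):
    """Return (stage0, stage1): the model cut in half for 2 ranks."""
    n = len(model.layers)
    stage0 = copy.deepcopy(model)
    stage1 = copy.deepcopy(model)
    # Stage 0 keeps the embeddings and the first half of the layers.
    for i in range(n // 2, n):
        del stage0.layers[str(i)]
    stage0.norm = None
    stage0.output = None
    # Stage 1 keeps the second half of the layers, the norm, and the output.
    for i in range(n // 2):
        del stage1.layers[str(i)]
    stage1.tok_embeddings = None
    return stage0, stage1

stage0, stage1 = partition_two_stages(ToyTransformer())
print(sorted(stage0.layers))  # ['0', '1', '2', '3']
print(sorted(stage1.layers))  # ['4', '5', '6', '7']
```

With real `nn.Module` instances the deletions look the same (e.g. `del model.layers[str(i)]`, `model.norm = None`), and the forward pass works unchanged on each half because it checks each attribute before using it.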