
Commit 9ecc44a

Fix TF32 convergence issue (#1244)

* Fix TF32 convergence issue
* save
1 parent e6167a9 commit 9ecc44a

File tree

2 files changed: +18 −2 lines


beginner_source/examples_autograd/two_layer_net_autograd.py

Lines changed: 9 additions & 1 deletion
@@ -18,7 +18,15 @@
 
 dtype = torch.float
 device = torch.device("cpu")
-# device = torch.device("cuda:0") # Uncomment this to run on GPU
+# device = torch.device("cuda:0") # Uncomment this to run on GPU
+# torch.backends.cuda.matmul.allow_tf32 = False # Uncomment this to disable TF32 when running on GPU
+
+# The above line disables TensorFloat32. This is a feature that allows
+# networks to run at a much faster speed while sacrificing precision.
+# Although TensorFloat32 works well on most real models, for our toy model
+# in this tutorial the sacrificed precision causes a convergence issue.
+# For more information, see:
+# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
 
 # N is batch size; D_in is input dimension;
 # H is hidden dimension; D_out is output dimension.
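
For reference, torch.backends.cuda.matmul.allow_tf32 is the PyTorch flag this commit documents. The sketch below shows how the new line is meant to combine with the tutorial's device selection; it is a minimal illustration, assuming an NVIDIA GPU is available (the flag only affects matmuls on Ampere-or-newer hardware):

import torch

# Assumption: fall back to CPU when no CUDA device is present;
# the TF32 flag has no effect on CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Disable TF32 matmuls so the toy network trains in full float32
# precision, trading speed for accuracy on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = False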

beginner_source/examples_autograd/two_layer_net_custom_function.py

Lines changed: 9 additions & 1 deletion
@@ -48,7 +48,15 @@ def backward(ctx, grad_output):
 
 dtype = torch.float
 device = torch.device("cpu")
-# device = torch.device("cuda:0") # Uncomment this to run on GPU
+# device = torch.device("cuda:0") # Uncomment this to run on GPU
+# torch.backends.cuda.matmul.allow_tf32 = False # Uncomment this to disable TF32 when running on GPU
+
+# The above line disables TensorFloat32. This is a feature that allows
+# networks to run at a much faster speed while sacrificing precision.
+# Although TensorFloat32 works well on most real models, for our toy model
+# in this tutorial the sacrificed precision causes a convergence issue.
+# For more information, see:
+# https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
 
 # N is batch size; D_in is input dimension;
 # H is hidden dimension; D_out is output dimension.
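
To see the precision gap that breaks convergence in the toy model, one can compare the same float32 matmul with TF32 enabled and disabled. A hedged illustration, only meaningful on a TF32-capable (Ampere or newer) GPU; the 1024x1024 size is an arbitrary choice:

import torch

if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")

    torch.backends.cuda.matmul.allow_tf32 = True
    tf32_out = a @ b   # reduced-precision TF32 tensor-core matmul

    torch.backends.cuda.matmul.allow_tf32 = False
    fp32_out = a @ b   # full float32 matmul

    # On Ampere+ hardware the two results differ; that gap is the
    # "sacrificed precision" the tutorial comment describes.
    print((tf32_out - fp32_out).abs().max().item())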
