Commit f482f69

Update 2020-10-26-1.7-released.md
1 parent 86b28b4 commit f482f69


_posts/2020-10-26-1.7-released.md

Lines changed: 30 additions & 30 deletions
@@ -8,11 +8,11 @@ Today, we’re announcing the availability of PyTorch 1.7, along with updated do

A few of the highlights include:

- * 1- CUDA 11 is now officially supported with binaries available at [PyTorch.org](http://pytorch.org/)
- * 2- Updates and additions to profiling and performance for RPC, TorchScript, Stack traces and Benchmark utilities
- * 3- (Beta) Support for NumPy compatible Fast Fourier transforms (FFT) via torch.fft
- * 4- (Prototype) Support for Nvidia A100 generation GPUs and native TF32 format
- * 5- (Prototype) Distributed training on Windows now supported
+ 1. CUDA 11 is now officially supported with binaries available at [PyTorch.org](http://pytorch.org/)
+ 2. Updates and additions to profiling and performance for RPC, TorchScript, Stack traces and Benchmark utilities
+ 3. (Beta) Support for NumPy compatible Fast Fourier transforms (FFT) via torch.fft
+ 4. (Prototype) Support for Nvidia A100 generation GPUs and native TF32 format
+ 5. (Prototype) Distributed training on Windows now supported

To reiterate, starting [PyTorch 1.6](https://pytorch.org/blog/pytorch-feature-classification-changes/), features are now classified as stable, beta and prototype. You can see the detailed announcement [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). Note that the prototype features listed in this blog are available as part of this release.

@@ -64,8 +64,8 @@ Note that this is necessary, **but not sufficient**, for determinism **within a

See the documentation for ```torch.set_deterministic(bool)``` for the list of affected operations.

- * RFC | [Link](https://github.com/pytorch/pytorch/issues/15359)
- * Documentation | [Link](https://pytorch.org/docs/stable/generated/torch.set_deterministic.html)
+ * RFC ([Link](https://github.com/pytorch/pytorch/issues/15359))
+ * Documentation ([Link](https://pytorch.org/docs/stable/generated/torch.set_deterministic.html))

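As a side note on the API referenced in this hunk, a minimal usage sketch of ```torch.set_deterministic``` (assumptions: CPU tensors only, and the 1.7 behavior where operations without a deterministic implementation raise a RuntimeError):

```python
import torch

# Opt in globally: ops with deterministic implementations use them, and ops
# known to be nondeterministic raise a RuntimeError instead of silently running.
torch.set_deterministic(True)
print(torch.is_deterministic())  # True

x = torch.randn(8, 8)
y = x @ x  # deterministic on CPU
```
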

# Performance & Profiling

@@ -121,14 +121,14 @@ n = 33554432
0.89 ns / element
```

- Documentation | Link **Missing link**
+ Documentation (Link) **Missing link**

## [Beta] Stack traces added to profiler

Users can now see not only operator name/inputs in the profiler output table but also where the operator is in the code. The workflow requires very little change to take advantage of this capability. The [autograd profiler](https://pytorch.org/docs/stable/autograd.html#profiler) is used as before, but with optional new parameters: ```with_stack``` and ```group_by_stack_n```.

- * [Details Link](https://github.com/pytorch/pytorch/pull/43898/)
- * [Documentation Link](https://pytorch.org/docs/stable/autograd.html)
+ * Detail ([Link](https://github.com/pytorch/pytorch/pull/43898/))
+ * Documentation ([Link](https://pytorch.org/docs/stable/autograd.html))

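To make the two new parameters mentioned above concrete, a minimal sketch (assumptions: a toy CPU model; the sort key and row limit are arbitrary choices):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
inputs = torch.randn(32, 64)

# with_stack=True records the Python source location of each operator call.
with torch.autograd.profiler.profile(with_stack=True) as prof:
    model(inputs)

# group_by_stack_n groups the averaged stats by the top-n stack frames.
print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total", row_limit=10))
```
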

# Distributed Training & RPC

@@ -138,21 +138,21 @@ Torchelastic offers a strict superset of the current ```torch.distributed.launch

By bundling ```torchelastic``` in the same docker image as PyTorch, users can start experimenting with torchelastic right away without having to separately install ```torchelastic```. In addition to convenience, this work is a nice-to-have when adding support for elastic parameters in Kubeflow's existing distributed PyTorch operators.

- * Usage examples and how to get started | [Link](https://pytorch.org/elastic/0.2.0/examples.html)
+ * Usage examples and how to get started ([Link](https://pytorch.org/elastic/0.2.0/examples.html))

## [Beta] Support for uneven dataset inputs in DDP

PyTorch 1.7 introduces a new context manager to be used in conjunction with models trained using ```torch.nn.parallel.DistributedDataParallel``` to enable training with uneven dataset sizes across different processes. This feature enables greater flexibility when using DDP and prevents the user from having to manually ensure dataset sizes are the same across different processes. With this context manager, DDP will handle uneven dataset sizes automatically, which can prevent errors or hangs at the end of training.

- * RFC | [Link](https://github.com/pytorch/pytorch/issues/38174)
- * Documentation | [Link](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.join)
+ * RFC ([Link](https://github.com/pytorch/pytorch/issues/38174))
+ * Documentation ([Link](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.join))

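The documentation bullet above points at ```DistributedDataParallel.join```; a minimal sketch of that context manager (assumptions: a single-process Gloo group on localhost port 29500 so the example is self-contained; in real use each rank would run this with its own, possibly uneven, number of batches):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(10, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# join() keeps ranks that run out of data participating in DDP's collectives,
# so ranks with more batches do not hang waiting for the others.
num_batches = 5  # imagine this differs from rank to rank
with model.join():
    for _ in range(num_batches):
        opt.zero_grad()
        model(torch.randn(20, 10)).sum().backward()
        opt.step()

dist.destroy_process_group()
```
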

## [Beta] NCCL Reliability - Async Error/Timeout Handling

In the past, NCCL training runs would hang indefinitely due to stuck collectives, leading to a very unpleasant experience for users. This feature will abort stuck collectives and throw an exception/crash the process if a potential hang is detected. When used with something like torchelastic (which can recover the training process from the last checkpoint), users can have much greater reliability for distributed training. This feature is completely opt-in and sits behind an environment variable that must be explicitly set (otherwise users will see the same behavior as before).

- * Documentation | Link **Missing Link**
- * Usage examples | Link **Missing Link**
+ * Documentation (Link) **Missing Link**
+ * Usage examples (Link) **Missing Link**

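The hunk above does not name the opt-in environment variable (its documentation link is still missing); assuming it is ```NCCL_ASYNC_ERROR_HANDLING```, as in the 1.7-era distributed documentation, enabling it might look like this sketch:

```python
import os
import torch.distributed as dist

# Assumption: the opt-in flag is NCCL_ASYNC_ERROR_HANDLING. It must be set
# before the NCCL process group is created; otherwise behavior is unchanged.
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"

# Assumes a launcher (e.g. torchelastic) has set MASTER_ADDR, MASTER_PORT,
# RANK and WORLD_SIZE; stuck collectives now raise/abort instead of hanging.
dist.init_process_group(backend="nccl", init_method="env://")
```
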

## [Beta] TorchScript rpc_remote and rpc_sync

@@ -187,9 +187,9 @@ def script_rpc_remote_call(
    rref_res = rpc.remote(dst_worker_name, two_args_two_kwargs, args, kwargs)
    return rref_res.to_here()
```
- * Design doc | Link **Missing Link**
- * Documentation | Link **Missing Link**
- * Usage examples | Link **Missing Link**
+ * Design doc (Link) **Missing Link**
+ * Documentation (Link) **Missing Link**
+ * Usage examples (Link) **Missing Link**

## [Beta] Distributed optimizer with TorchScript support

@@ -199,9 +199,9 @@ In PyTorch 1.7, we are enabling the TorchScript support in distributed optimizer

Currently, the only optimizer that supports automatic conversion with TorchScript is ```Adagrad```; all other optimizers will still work as before, without TorchScript support. We are working on expanding the coverage to all PyTorch optimizers.

- * Design doc | Link **Missing Link**
- * Documentation | Link **Missing Link**
- * Usage examples | Link **Missing Link**
+ * Design doc (Link) **Missing Link**
+ * Documentation (Link) **Missing Link**
+ * Usage examples (Link) **Missing Link**

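A minimal, single-worker sketch of the distributed optimizer with ```Adagrad``` (assumptions: localhost rendezvous on port 29500 and an RPC world size of 1, purely to keep the example self-contained):

```python
import os
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch.distributed.optim import DistributedOptimizer

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
param_rrefs = [rpc.RRef(p) for p in model.parameters()]

# Adagrad is converted to its TorchScript functional variant under the hood;
# other optimizers fall back to the existing non-TorchScript path.
opt = DistributedOptimizer(torch.optim.Adagrad, param_rrefs, lr=0.05)

with dist_autograd.context() as ctx:
    loss = model(torch.randn(8, 4)).sum()
    dist_autograd.backward(ctx, [loss])
    opt.step(ctx)

rpc.shutdown()
```
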

## [Beta] Enhancements to RPC-based Profiling

@@ -213,15 +213,15 @@ Support for using the PyTorch profiler in conjunction with the RPC framework was

Users are now able to use familiar profiling tools such as ```with torch.autograd.profiler.profile()``` and ```with torch.autograd.profiler.record_function```; this works transparently with the RPC framework with full feature support, and profiles asynchronous functions and TorchScript functions.

- * Design doc | [Link](https://github.com/pytorch/pytorch/issues/39675)
- * Usage examples | [Link](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html)
+ * Design doc ([Link](https://github.com/pytorch/pytorch/issues/39675))
+ * Usage examples ([Link](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html))

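A minimal sketch of profiling RPC work (assumptions: one worker sending an RPC to itself over localhost port 29500; the built-in ```torch.add``` stands in for real remote work):

```python
import os
import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

# RPC activity issued inside the profiler context is captured, including
# asynchronous calls that are awaited later.
with torch.autograd.profiler.profile() as prof:
    fut = rpc.rpc_async("worker0", torch.add, args=(torch.ones(3), torch.ones(3)))
    fut.wait()

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
rpc.shutdown()
```
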

## [Beta] DDP memory reduction

As of PyTorch 1.6, DDP would put an extra copy of gradient tensors in communication buckets. This incurs additional memory overhead equivalent to the size of the gradients. In PyTorch 1.7, we added a ```gradient_as_bucket_view``` flag to the DDP constructor API. When this flag is set to ```True```, DDP will override ```param.grad``` with views that point into the communication buckets. This not only eliminates an extra in-memory copy of the gradients, but also avoids the additional read/write operations needed to synchronize communication buckets and ```param.grad``` values.

- * Design doc | [Link](https://github.com/pytorch/pytorch/issues/37030)
- * Documentation | [Link](https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html?highlight=distributeddataparallel)
+ * Design doc ([Link](https://github.com/pytorch/pytorch/issues/37030))
+ * Documentation ([Link](https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html?highlight=distributeddataparallel))

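A minimal sketch of the new constructor flag (assumptions: a single-process Gloo group on CPU and localhost port 29500; a real job would run under a multi-process launcher with one rank per device):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# gradient_as_bucket_view=True makes each param.grad a view into DDP's
# communication buckets, removing the extra gradient copy described above.
model = DDP(torch.nn.Linear(10, 10), gradient_as_bucket_view=True)

model(torch.randn(20, 10)).sum().backward()

dist.destroy_process_group()
```
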

## [Prototype] Windows support for Distributed Training

@@ -241,9 +241,9 @@ dist.init_process_group(

model = DistributedDataParallel(local_model, device_ids=[rank])
```
- * Design doc | [Link](https://github.com/pytorch/pytorch/issues/42095)
- * Documentation | [Link](https://pytorch.org/docs/master/distributed.html#backends-that-come-with-pytorch)
- * Acknowledgement | [gunandrose4u](https://github.com/gunandrose4u)
+ * Design doc ([Link](https://github.com/pytorch/pytorch/issues/42095))
+ * Documentation ([Link](https://pytorch.org/docs/master/distributed.html#backends-that-come-with-pytorch))
+ * Acknowledgement ([gunandrose4u](https://github.com/gunandrose4u))

# Mobile

@@ -275,8 +275,8 @@ c10::CPUCachingAllocator caching_allocator;

**NOTE**: The caching allocator is only available in mobile builds; using it outside of mobile builds has no effect.

- * Documentation | [Link](https://github.com/pytorch/pytorch/blob/master/c10/mobile/CPUCachingAllocator.h#L13-L43)
- * Usage examples [Link](https://github.com/pytorch/pytorch/blob/master/binaries/speed_benchmark_torch.cc#L207)
+ * Documentation ([Link](https://github.com/pytorch/pytorch/blob/master/c10/mobile/CPUCachingAllocator.h#L13-L43))
+ * Usage examples ([Link](https://github.com/pytorch/pytorch/blob/master/binaries/speed_benchmark_torch.cc#L207))

# torchvision
