Commit 8ae0a5e — Update 2020-10-26-pytorch-1.7-released.md
1 parent 735a934 · 1 file changed (+8, −69 lines): _posts/2020-10-26-pytorch-1.7-released.md

Today, we’re announcing the availability of PyTorch 1.7, along with updated domain libraries. The PyTorch 1.7 release includes a number of new APIs including support for NumPy-Compatible FFT operations, profiling tools and major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. In addition, several features moved to [stable](https://pytorch.org/docs/stable/index.html#pytorch-documentation) including custom C++ Classes, the memory profiler, extensions via custom tensor-like objects, user async functions in RPC and a number of other features in torch.distributed such as Per-RPC timeout, DDP dynamic bucketing and RRef helper.

A few of the highlights include:
* CUDA 11 is now officially supported with binaries available at [PyTorch.org](http://pytorch.org/)
* Updates and additions to profiling and performance for RPC, TorchScript and Stack traces in the autograd profiler
* (Beta) Support for NumPy compatible Fast Fourier transforms (FFT) via torch.fft
* (Stable) Native image I/O for JPEG and PNG formats
* (Beta) New Video Reader API
* torchaudio
  * (Stable) Added support for speech rec (wav2letter), text to speech (WaveRNN) and source separation (ConvTasNet)

To reiterate, starting with PyTorch 1.6, features are now classified as stable, beta and prototype. You can see the detailed announcement [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). Note that the prototype features listed in this blog are available as part of this release.

Find the full release notes [here](https://github.com/pytorch/pytorch/releases).

# Front End APIs

## [Beta] NumPy Compatible torch.fft module

FFT-related functionality is commonly used in a variety of scientific fields like signal processing. While PyTorch has historically supported a few FFT-related functions, the 1.7 release adds a new torch.fft module that implements FFT-related functions with the same API as NumPy.

This new module must be imported to be used in the 1.7 release, since its name conflicts with the historic (and now deprecated) torch.fft function.

**Example usage:**
```python
>>> import torch.fft
>>> t = torch.arange(4)
>>> torch.fft.fft(t)
tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j])

>>> t = torch.tensor([0.+1.j, 2.+3.j, 4.+5.j, 6.+7.j])
>>> torch.fft.fft(t)
tensor([12.+16.j, -8.+0.j, -4.-4.j,  0.-8.j])
```
* [Documentation](https://pytorch.org/docs/stable/fft.html#torch-fft)

## [Beta] C++ Support for Transformer NN Modules

Since [PyTorch 1.5](https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis/), we’ve continued to maintain parity between the python and C++ frontend APIs. This update allows developers to use the nn.transformer module abstraction from the C++ frontend. Moreover, developers no longer need to save a module from python/JIT and load it into C++, as it can now be used in C++ directly.
* [Documentation](https://pytorch.org/cppdocs/api/classtorch_1_1nn_1_1_transformer_impl.html#_CPPv4N5torch2nn15TransformerImplE)

## [Beta] torch.set_deterministic

Reproducibility (bit-for-bit determinism) may help identify errors when debugging or testing a program. To facilitate reproducibility, PyTorch 1.7 adds the ```torch.set_deterministic(bool)``` function that can direct PyTorch operators to select deterministic algorithms when available, and to throw a runtime error if an operation may result in nondeterministic behavior. By default, the flag this function controls is false and there is no change in behavior, meaning PyTorch may implement its operations nondeterministically by default.

More precisely, when this flag is true:

* Operations known to not have a deterministic implementation throw a runtime error;
* Operations with deterministic variants use those variants (usually with a performance penalty versus the non-deterministic version); and
* ```torch.backends.cudnn.deterministic = True``` is set.

Note that this is necessary, **but not sufficient**, for determinism **within a single run of a PyTorch program**. Other sources of randomness like random number generators, unknown operations, or asynchronous or distributed computation may still cause nondeterministic behavior.

See the documentation for ```torch.set_deterministic(bool)``` for the list of affected operations.
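
A minimal sketch (not from the original post) of toggling the flag:

```python
import torch

# Opt in to deterministic algorithms; the default is False.
torch.set_deterministic(True)

x = torch.randn(16, 16)
y = x @ x  # operations with deterministic implementations run as usual

# Operations known to be nondeterministic would instead raise a RuntimeError
# (see the documentation for the affected list).

torch.set_deterministic(False)  # restore the default, nondeterministic behavior
```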
* [RFC](https://github.com/pytorch/pytorch/issues/15359)
* [Documentation](https://pytorch.org/docs/stable/generated/torch.set_deterministic.html)

# Performance & Profiling

## [Beta] Stack traces added to profiler

Users can now see not only operator name/inputs in the profiler output table but also where the operator is in the code. The workflow requires very little change to take advantage of this capability. The user uses the [autograd profiler](https://pytorch.org/docs/stable/autograd.html#profiler) as before but with optional new parameters: ```with_stack``` and ```group_by_stack_n```. Caution: regular profiling runs should not use this feature as it adds significant overhead.
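
As a rough sketch (not part of the original post), the new parameters slot into the existing profiler API like this:

```python
import torch
from torch.autograd import profiler

x = torch.randn(128, 128)

# with_stack records the call site of each operator; it adds significant
# overhead, so enable it only for targeted profiling sessions.
with profiler.profile(with_stack=True) as prof:
    y = torch.mm(x, x)

# group_by_stack_n groups the averaged results by the top stack frames.
print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total"))
```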
* [Detail](https://github.com/pytorch/pytorch/pull/43898/)
* [Documentation](https://pytorch.org/docs/stable/autograd.html)

# Distributed Training & RPC

## [Stable] TorchElastic now bundled into PyTorch docker image

TorchElastic offers a strict superset of the current ```torch.distributed.launch``` CLI, with added features for fault-tolerance and elasticity. If the user is not interested in fault-tolerance, they can get exact functionality/behavior parity by setting ```max_restarts=0```, with the added convenience of auto-assigned ```RANK``` and ```MASTER_ADDR|PORT``` (versus manually specified in ```torch.distributed.launch```).

By bundling ```torchelastic``` in the same docker image as PyTorch, users can start experimenting with TorchElastic right away without having to separately install ```torchelastic```. In addition to convenience, this work is a nice-to-have when adding support for elastic parameters in the existing Kubeflow distributed PyTorch operators.

* [Usage examples and how to get started](https://pytorch.org/elastic/0.2.0/examples.html)

## [Beta] Support for uneven dataset inputs in DDP

PyTorch 1.7 introduces a new context manager to be used in conjunction with models trained using ```torch.nn.parallel.DistributedDataParallel``` to enable training with uneven dataset sizes across different processes. This feature enables greater flexibility when using DDP and prevents the user from having to manually ensure dataset sizes are the same across different processes. With this context manager, DDP will handle uneven dataset sizes automatically, which can prevent errors or hangs at the end of training.
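
A hedged sketch of how the context manager might be used; the process-group setup, the toy model and the per-rank batch counts below are illustrative assumptions, not from the post:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # assumes MASTER_ADDR/MASTER_PORT are set in the environment by the launcher
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    model = DDP(torch.nn.Linear(10, 1))

    # each rank may see a different number of batches
    inputs = [torch.randn(8, 10) for _ in range(5 + rank)]

    # join() keeps ranks that run out of data participating in the collective
    # communication, so no rank errors out or hangs at the end of training
    with model.join():
        for batch in inputs:
            model(batch).sum().backward()
```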
* [RFC](https://github.com/pytorch/pytorch/issues/38174)
* [Documentation](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.join)

## [Beta] NCCL Reliability - Async Error/Timeout Handling

In the past, NCCL training runs would hang indefinitely due to stuck collectives, leading to a very unpleasant experience for users. This feature will abort stuck collectives and throw an exception/crash the process if a potential hang is detected. When used with something like torchelastic (which can recover the training process from the last checkpoint), users can have much greater reliability for distributed training. This feature is completely opt-in and sits behind an environment variable that needs to be explicitly set in order to enable this functionality (otherwise users will see the same behavior as before).
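
One plausible sketch of the opt-in; the environment variable name and the timeout value below are assumptions, so check the linked documentation for the exact recipe:

```python
import os
from datetime import timedelta

import torch.distributed as dist

def init_distributed(rank, world_size):
    # Assumption: the opt-in variable is NCCL_ASYNC_ERROR_HANDLING and it must
    # be set before the process group is created.
    os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"
    dist.init_process_group(
        backend="nccl",
        rank=rank,
        world_size=world_size,
        timeout=timedelta(minutes=5),  # collectives stuck longer than this abort
    )
```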
* [RFC](https://github.com/pytorch/pytorch/issues/46874)
* [Documentation](https://pytorch.org/docs/stable/distributed.html?highlight=init_process_group#torch.distributed.init_process_group)

## [Beta] TorchScript ```rpc_remote``` and ```rpc_sync```

```torch.distributed.rpc.rpc_async``` has been available in TorchScript in prior releases. For PyTorch 1.7, this functionality is extended to the remaining two core RPC APIs, ```torch.distributed.rpc.rpc_sync``` and ```torch.distributed.rpc.remote```. This completes the major RPC APIs targeted for TorchScript support; it allows users to use the existing python RPC APIs within TorchScript (in a script function or script method, which releases the python Global Interpreter Lock) and could possibly improve application performance in multithreaded environments.
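
A hedged sketch of calling ```rpc_sync``` from a script function; the worker names and the ```add_tensors``` helper are made up for illustration:

```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def add_tensors(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a + b

@torch.jit.script
def remote_add(to: str, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # rpc_sync can now be invoked from inside a script function; the GIL is
    # released while the scripted code runs.
    return rpc.rpc_sync(to, add_tensors, (a, b))

# After rpc.init_rpc("worker0", rank=0, world_size=2) has been called on each
# process, worker0 could run:
#     result = remote_add("worker1", torch.ones(2), torch.ones(2))
```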
* [Documentation](https://pytorch.org/docs/stable/rpc.html#rpc)
* [Usage examples](https://github.com/pytorch/pytorch/blob/58ed60c259834e324e86f3e3118e4fcbbfea8dd1/torch/testing/_internal/distributed/rpc/jit/rpc_test.py#L505-L525)

## [Beta] Distributed optimizer with TorchScript support

PyTorch provides a broad set of optimizers for training algorithms, and these have been used repeatedly as part of the python API. However, users often want to use multithreaded training instead of multiprocess training, as it provides better resource utilization and efficiency in the context of large scale distributed training (e.g. Distributed Model Parallel) or any RPC-based training application. Users couldn’t do this with the distributed optimizer before, because achieving it requires getting rid of the python Global Interpreter Lock (GIL) limitation.

In PyTorch 1.7, we are enabling TorchScript support in the distributed optimizer to remove the GIL and make it possible to run the optimizer in multithreaded applications. The new distributed optimizer has the exact same interface as before, but it automatically converts the optimizers within each worker into TorchScript to make each one GIL-free. This is done by leveraging a functional optimizer concept and allowing the distributed optimizer to convert the computational portion of the optimizer into TorchScript. This will help use cases like distributed model parallel training and improve performance using multithreading.
```python
with dist_autograd.context() as context_id:
    # ... forward pass and dist_autograd.backward(context_id, [loss]) elided ...
    dist_optim.step(context_id)
```
* [RFC](https://github.com/pytorch/pytorch/issues/46883)
* [Documentation](https://pytorch.org/docs/stable/rpc.html#module-torch.distributed.optim)

## [Beta] Enhancements to RPC-based Profiling

Support for using the PyTorch profiler in conjunction with the RPC framework was first introduced in PyTorch 1.6. In PyTorch 1.7, the following enhancements have been made:

* Implemented better support for profiling TorchScript functions over RPC
* Achieved parity in terms of profiler features that work with RPC
* Added support for asynchronous RPC functions on the server-side (functions decorated with ```rpc.functions.async_execution```).

Users are now able to use familiar profiling tools such as ```with torch.autograd.profiler.profile()``` and ```with torch.autograd.profiler.record_function```, and this works transparently with the RPC framework with full feature support, profiling asynchronous functions and TorchScript functions.
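
For example (a sketch with made-up worker names, not from the original post), profiling an RPC from the caller’s side:

```python
import torch
import torch.distributed.rpc as rpc
from torch.autograd import profiler

# assumes rpc.init_rpc("worker0", rank=0, world_size=2) has already run here
# and that "worker1" is another initialized RPC worker
with profiler.profile() as prof:
    fut = rpc.rpc_async("worker1", torch.add, args=(torch.ones(2), torch.ones(2)))
    fut.wait()

# remote operations appear in the profile alongside local ones
print(prof.key_averages().table(sort_by="cpu_time_total"))
```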
* [Design doc](https://github.com/pytorch/pytorch/issues/39675)
* [Usage examples](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html)

## [Prototype] Windows support for Distributed Training

PyTorch 1.7 brings prototype support for ```DistributedDataParallel``` and collective communications on the Windows platform. In this release, the support only covers Gloo-based ```ProcessGroup``` and ```FileStore```.

To use this feature across multiple machines, please provide a file from a shared file system in ```init_process_group```.
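
A hedged sketch of what initialization with a file-based store could look like; the file path and the toy model below are placeholders, not from the post:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel

def init_and_wrap(rank, world_size):
    # On Windows, only the Gloo backend and file-based initialization are covered.
    # For multi-machine runs, point init_method at a file on a shared file system.
    dist.init_process_group(
        "gloo",
        init_method="file:///C:/tmp/ddp_init_file",  # placeholder path
        rank=rank,
        world_size=world_size,
    )
    local_model = torch.nn.Linear(10, 1)
    return DistributedDataParallel(local_model)
```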
* Acknowledgement ([gunandrose4u](https://github.com/gunandrose4u))

# Mobile

PyTorch Mobile supports both [iOS](https://pytorch.org/mobile/ios) and [Android](https://pytorch.org/mobile/android/) with binary packages available in [Cocoapods](https://cocoapods.org/) and [JCenter](https://mvnrepository.com/repos/jcenter) respectively. You can learn more about PyTorch Mobile [here](https://pytorch.org/mobile/home/).

## [Beta] PyTorch Mobile Caching allocator for performance improvements

On some mobile platforms, such as Pixel, we observed that memory is returned to the system more aggressively. This results in frequent page faults, as PyTorch, being a functional framework, does not maintain state for the operators: for most ops, outputs are allocated dynamically on each execution of the op. To ameliorate the performance penalties due to this, PyTorch 1.7 provides a simple caching allocator for CPU. The allocator caches allocations by tensor size and is currently available only via the PyTorch C++ API. The caching allocator itself is owned by the client, and thus the lifetime of the allocator is also maintained by client code. Such a client-owned caching allocator can then be used with a scoped guard, ```c10::WithCPUCachingAllocatorGuard```, to enable the use of cached allocations within that scope.

**Example usage:**
```cpp
c10::CPUCachingAllocator caching_allocator;
// Owned by client code; the client controls the allocator's lifetime.
...
{
  c10::WithCPUCachingAllocatorGuard caching_allocator_guard(&caching_allocator);
  // ... allocations made by ops inside this scope are served from the cache ...
}
...
```
**NOTE**: The caching allocator is only available in mobile builds; using the caching allocator outside of mobile builds won’t be effective.
* [Documentation](https://github.com/pytorch/pytorch/blob/master/c10/mobile/CPUCachingAllocator.h#L13-L43)
* [Usage examples](https://github.com/pytorch/pytorch/blob/master/binaries/speed_benchmark_torch.cc#L207)

# torchvision

## [Stable] Transforms now support Tensor inputs, batch computation, GPU, and TorchScript

torchvision transforms now inherit from ```nn.Module``` and can be torchscripted and applied on torch Tensor inputs as well as on PIL images. They also support Tensors with batch dimensions and work seamlessly on CPU/GPU devices:
```python
import torch
import torchvision.transforms as T

# transforms can be composed with nn.Sequential and then scripted
transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
)
scripted_transforms = torch.jit.script(transforms)

tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)

# works on batched Tensor inputs
out_image_batched = transforms(batched_image)
# and has torchscript support
out_image2 = scripted_transforms(tensor_image)
```
These improvements enable the following new features:

* support for GPU acceleration
* batched transformations e.g. as needed for videos
* transform multi-band torch tensor images (with more than 3-4 channels)
* torchscript transforms together with your model for deployment

**Note:** Exceptions for TorchScript support include ```Compose```, ```RandomChoice```, ```RandomOrder```, ```Lambda``` and those applied on PIL images, such as ```ToPILImage```.

## [Stable] Native image IO for JPEG and PNG formats

torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. Those operators support TorchScript, return ```CxHxW``` tensors in ```uint8``` format, and can thus now be part of your model for deployment in C++ environments.
```python
import torch
from torchvision.io import read_image, decode_image

# read_image returns a CxHxW uint8 tensor decoded from the file
tensor_image = read_image('path_to_image.jpeg')

# decode_image works on a 1d uint8 tensor holding the raw encoded bytes
raw_data = ...  # e.g. bytes fetched from a database, as a uint8 tensor
tensor_image = decode_image(raw_data)

# the operators support TorchScript
scripted_read_image = torch.jit.script(read_image)
```
## [Stable] RetinaNet detection model

This release adds pretrained models for RetinaNet with a ResNet50 backbone from [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), delivering improved accuracy on COCO val2017.
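
Loading the pretrained model follows the usual torchvision detection API; the snippet below is a sketch rather than part of the original post:

```python
import torch
import torchvision

# retinanet_resnet50_fpn is the new detection entry point in torchvision 0.8
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
model.eval()

images = [torch.rand(3, 512, 512)]
with torch.no_grad():
    predictions = model(images)  # list of dicts with boxes, labels and scores
```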

## [Beta] New Video Reader API

This release introduces a new video reading abstraction, which gives more fine-grained control of iteration over videos. It supports image and audio, and implements an iterator interface so that it is interoperable with other python libraries such as itertools.
```python
from itertools import takewhile

from torchvision.io import VideoReader

# "video" selects the video stream; "audio" would select the audio stream
reader = VideoReader('path_to_video.mp4', 'video')

# each frame is a dict with "data" and "pts" (presentation timestamp) keys;
# e.g. iterate over all frames with pts below 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader):
    pass
```
**Notes:**

* In order to use the Video Reader API beta, you must compile torchvision from source and have ffmpeg installed in your system.
* The VideoReader API is currently released as beta and its API may change following user feedback.

# torchaudio

With this release, torchaudio is expanding its support for models and [end-to-end applications](https://github.com/pytorch/audio/tree/master/examples), adding a wav2letter training pipeline and end-to-end text-to-speech and source separation pipelines. Please file an issue on [github](https://github.com/pytorch/audio/issues/new?template=questions-help-support.md) to provide feedback on them.

## [Stable] Speech Recognition

Building on the addition of the wav2letter model for speech recognition in the last release, we’ve now added an [example wav2letter training pipeline](https://github.com/pytorch/audio/tree/master/examples/pipeline_wav2letter) with the LibriSpeech dataset.
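
The model itself can be instantiated directly from ```torchaudio.models```; the constructor arguments in this sketch are assumptions based on the model's defaults:

```python
import torch
import torchaudio

# Wav2Letter operating directly on waveforms; num_classes matches the
# label set used by the training pipeline.
model = torchaudio.models.Wav2Letter(num_classes=40, input_type="waveform", num_features=1)

waveform = torch.randn(1, 1, 16000)  # (batch, channel, time)
log_probs = model(waveform)          # (batch, num_classes, time frames)
```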

## [Stable] Text-to-speech

With the goal of supporting text-to-speech applications, we added a vocoder based on the WaveRNN model, using the implementation from [this repository](https://github.com/fatchord/WaveRNN). The original implementation was introduced in "Efficient Neural Audio Synthesis". We also provide an [example WaveRNN training pipeline](https://github.com/pytorch/audio/tree/master/examples/pipeline_wavernn) that uses the LibriTTS dataset added to torchaudio in this release.

## [Stable] Source Separation

With the addition of the ConvTasNet model, based on the paper "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation," torchaudio now also supports source separation. An [example ConvTasNet training pipeline](https://github.com/pytorch/audio/tree/master/examples/source_separation) is provided with the wsj-mix dataset.
# Additional updates

## PyTorch Developer Day, November 12

Kicking off this November, we plan to host two separate virtual PyTorch events: one for developers and users to discuss PyTorch’s future development, called “Developer Day”, and another for the entire PyTorch ecosystem to showcase their work, network and collaborate, called “Ecosystem Day” (scheduled for early 2021).

The PyTorch Developer Day takes place on November 12, 2020 (PST) with a full day of technical talks, project deep dives, and a networking event. The talks will be available to the public, and the networking event that follows requires registration (space is limited).

* YouTube Premiere Link
* Facebook Watch Link
* Networking event registration

Cheers!

Team PyTorch