- {% for post in posts %}
diff --git a/_posts/2021-10-19-pytorch-1.10-main-release.md b/_posts/2021-10-19-pytorch-1.10-main-release.md
new file mode 100644
index 000000000000..321f60f8e02e
--- /dev/null
+++ b/_posts/2021-10-19-pytorch-1.10-main-release.md
@@ -0,0 +1,104 @@
+---
+layout: blog_detail
+title: 'PyTorch 1.10 Release, including CUDA Graphs APIs, TorchScript improvements'
+author: Team PyTorch
+---
+
+We are excited to announce the release of PyTorch 1.10. This release is composed of around 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.
+
+PyTorch 1.10 updates are focused on improving training and performance of PyTorch, as well as developer usability. The full release notes are available [here](https://github.com/pytorch/pytorch/releases/tag/v1.10.0). Highlights include:
+1. CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads
+2. New features to optimize usability and performance of TorchScript - profile-directed typing in TorchScript and an LLVM-based JIT compiler for CPUs
+3. Android NNAPI support is now in beta
+
+We are also releasing major updates to TorchAudio and TorchVision along with 1.10, as well as introducing TorchX - a new SDK for quickly building and deploying ML applications from research to production. See [this blog post](https://pytorch.org/blog/pytorch-1.10-new-library-releases/) for details. Features in PyTorch releases are classified as Stable, Beta, and Prototype. You can learn more about the definitions in [this blog post](https://pytorch.org/blog/pytorch-feature-classification-changes/).
+
+# Frontend APIs
+
+### (Stable) Python code transformations with FX
+
+FX provides a Pythonic platform for transforming and lowering PyTorch programs. It is a toolkit for pass writers to facilitate Python-to-Python transformation of functions and nn.Module instances. To make transforms easier to implement, this toolkit aims to support a subset of Python language semantics rather than the whole Python language. With 1.10, FX is moving to stable.
+
+You can learn more about FX in the [official documentation](https://pytorch.org/docs/master/fx.html) and [GitHub examples](https://github.com/pytorch/examples/tree/master/fx) of program transformations implemented using ```torch.fx```.
+
+### (Stable) *torch.special*
+
+A ```torch.special``` module, analogous to [SciPy’s special module](https://docs.scipy.org/doc/scipy/reference/special.html), is now available in stable. The module has 30 operations, including gamma, Bessel, and error functions. Refer to this [documentation](https://pytorch.org/docs/master/special.html) for more details.
+
+### (Stable) nn.Module Parametrization
+
+```nn.Module``` parametrization, a feature that allows users to parametrize any parameter or buffer of an ```nn.Module``` without modifying the ```nn.Module``` itself, is available in stable. This release adds weight normalization (```weight_norm```), orthogonal parametrization (matrix constraints and part of pruning), and more flexibility when creating your own parametrizations.
+
+Refer to this [tutorial](https://pytorch.org/tutorials/intermediate/parametrizations.html) and the general [documentation](https://pytorch.org/docs/master/generated/torch.nn.utils.parametrizations.spectral_norm.html?highlight=parametrize) for more details.
+
+### (Beta) *CUDA Graphs APIs Integration
+
+PyTorch now integrates CUDA Graphs APIs to reduce CPU overheads for CUDA workloads.
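+
+As a quick illustration, here is a minimal capture-and-replay sketch; the model, tensor sizes, and warm-up count below are placeholders rather than part of the release notes:
+
+```python
+import torch
+
+model = torch.nn.Linear(1024, 1024).cuda()
+static_input = torch.randn(8, 1024, device='cuda')
+
+# Warm up on a side stream before capture, as the CUDA Graphs API requires
+s = torch.cuda.Stream()
+s.wait_stream(torch.cuda.current_stream())
+with torch.cuda.stream(s):
+    for _ in range(3):
+        model(static_input)
+torch.cuda.current_stream().wait_stream(s)
+
+# Capture one forward pass into a graph, then replay it with new data
+g = torch.cuda.CUDAGraph()
+with torch.cuda.graph(g):
+    static_output = model(static_input)
+
+static_input.copy_(torch.randn(8, 1024, device='cuda'))
+g.replay()  # re-runs the captured kernels; results land in static_output
+```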
+
+CUDA Graphs greatly reduce the CPU overhead for CPU-bound CUDA workloads and thus improve performance by increasing GPU utilization. For distributed workloads, CUDA Graphs also reduce jitter, and since parallel workloads have to wait for the slowest worker, reducing jitter improves overall parallel efficiency.
+
+The integration allows seamless interop between the parts of the network captured by CUDA graphs and the parts that cannot be captured due to graph limitations.
+
+Read the [note](https://pytorch.org/docs/master/notes/cuda.html#cuda-graphs) for more details and examples, and refer to the general [documentation](https://pytorch.org/docs/master/generated/torch.cuda.CUDAGraph.html#torch.cuda.CUDAGraph) for additional information.
+
+# Distributed Training
+
+### Distributed Training Releases Now in Stable
+
+In 1.10, a number of features in the distributed package are moving from beta to stable:
+
+* **(Stable) Remote Module**: This feature allows users to operate a module on a remote worker as if it were a local module, with the RPCs transparent to the user. Refer to this [documentation](https://pytorch.org/docs/master/rpc.html#remotemodule) for more details.
+
+* **(Stable) DDP Communication Hook**: This feature allows users to override how DDP synchronizes gradients across processes. Refer to this [documentation](https://pytorch.org/docs/master/ddp_comm_hooks.html) for more details.
+
+* **(Stable) ZeroRedundancyOptimizer**: This feature can be used in conjunction with DistributedDataParallel to reduce the size of per-process optimizer states. With this stable release, it can now handle uneven inputs to different data-parallel workers. Check out this [tutorial](https://pytorch.org/tutorials/advanced/generic_join.html). We also improved the parameter partition algorithm to better balance memory and computation overhead across processes. Refer to this [documentation](https://pytorch.org/docs/master/distributed.optim.html) and this [tutorial](https://pytorch.org/tutorials/recipes/zero_redundancy_optimizer.html) to learn more.
+
+# Performance Optimization and Tooling
+
+### (Beta) Profile-directed typing in TorchScript
+
+TorchScript has a hard requirement for source code to have type annotations in order for compilation to be successful. For a long time, it was only possible to add missing or incorrect type annotations through trial and error (i.e., by fixing the type-checking errors generated by torch.jit.script one by one), which was inefficient and time-consuming.
+
+Now, we have enabled profile-directed typing for torch.jit.script by leveraging existing tools like MonkeyType, which makes the process much easier, faster, and more efficient. For more details, refer to the [documentation](https://pytorch.org/docs/1.9.0/jit.html).
+
+### (Beta) CPU Fusion
+
+In PyTorch 1.10, we've added an LLVM-based JIT compiler for CPUs that can fuse together sequences of `torch` library calls to improve performance. While we've had this capability for some time on GPUs, this release is the first time we've brought compilation to the CPU. You can check out a few performance results for yourself in this [Colab notebook](https://colab.research.google.com/drive/1xaH-L0XjsxUcS15GG220mtyrvIgDoZl6?usp=sharing).
+
+### (Beta) PyTorch Profiler
+
+The objective of PyTorch Profiler is to target the execution steps that are the most costly in time and/or memory, and to visualize the workload distribution between GPUs and CPUs.
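+
+As a minimal sketch of what a profiling run looks like (the model and sizes here are illustrative):
+
+```python
+import torch
+from torch.profiler import profile, ProfilerActivity
+
+model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
+inputs = torch.randn(32, 512)
+
+# Collect CPU timing and memory stats for one forward pass
+with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
+    model(inputs)
+
+print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
+```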
+PyTorch 1.10 includes the following key features:
+
+* **Enhanced Memory View**: This helps you understand your memory usage better. This tool helps you avoid out-of-memory errors by showing active memory allocations at various points of your program's run.
+
+* **Enhanced Automated Recommendations**: This provides automated performance recommendations to help optimize your model. The tool recommends changes to batch size, TensorCore usage, memory reduction technologies, etc.
+
+* **Distributed Training**: Gloo is now supported for distributed training jobs.
+
+* **Correlate Operators in the Forward & Backward Pass**: This maps the operators found in the forward pass to the backward pass, and vice versa, in a trace view.
+
+* **TensorCore**: This tool shows Tensor Core (TC) usage and provides recommendations for data scientists and framework developers.
+
+Refer to this [documentation](https://pytorch.org/docs/stable/profiler.html) for details. Check out this [tutorial](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html) to learn how to get started with this feature.
+
+# PyTorch Mobile
+
+### (Beta) Android NNAPI Support
+
+Last year we [released prototype support](https://medium.com/pytorch/pytorch-mobile-now-supports-android-nnapi-e2a2aeb74534) for Android’s Neural Networks API (NNAPI). NNAPI allows Android apps to run computationally intensive neural networks on the most powerful and efficient parts of the chips that power mobile phones, including GPUs (Graphics Processing Units) and NPUs (specialized Neural Processing Units).
+
+Try out this feature using the [tutorial](https://pytorch.org/tutorials/prototype/nnapi_mobilenetv2.html). Please provide your feedback or ask questions on [the forum](https://discuss.pytorch.org/c/mobile/18). You can also check out [this presentation](https://www.youtube.com/watch?v=B-2spa3UCTU) to learn more.
+
+### (Beta) PyTorch Bundle Inputs
+
+PyTorch now provides a utility that allows TorchScript models to have inputs bundled directly with them, streamlining the process of passing runnable inputs along with a model. These inputs can be used to actually run the model in benchmarking applications, to trace the used operators in something like mobile’s upcoming tracing-based selective build, or simply to specify input shapes for certain pipelines.
+
+You can find a tutorial for this feature here [], and provide your feedback on the [PyTorch Discussion Forum - Mobile](https://discuss.pytorch.org/c/mobile/18).
+
+Thanks for reading. If you’re interested in these updates and want to join the PyTorch community, we encourage you to join the [discussion forums](https://discuss.pytorch.org/) and [open GitHub issues](https://github.com/pytorch/pytorch/issues). To get the latest news from PyTorch, follow us on [Facebook](https://www.facebook.com/pytorch/), [Twitter](https://twitter.com/PyTorch), [Medium](https://medium.com/pytorch), [YouTube](https://www.youtube.com/pytorch), or [LinkedIn](https://www.linkedin.com/company/pytorch).
+
+Cheers!
+
+Team PyTorch
diff --git a/_posts/2021-10-21-pytorch-1.10-main-release.md b/_posts/2021-10-21-pytorch-1.10-main-release.md
deleted file mode 100644
index 03ef2ef2021b..000000000000
--- a/_posts/2021-10-21-pytorch-1.10-main-release.md
+++ /dev/null
@@ -1,171 +0,0 @@
----
-layout: blog_detail
-title: 'PyTorch 1.10 Release, including CUDA Graphs APIs, TorchScript improvements'
-author: Team PyTorch
----
-
-We are excited to announce the release of PyTorch 1.10. This release is composed of around 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.
-
-Along with 1.9, we are also releasing major updates to the PyTorch libraries, which you can read about in [this blog post](https://pytorch.org/blog/pytorch-1.9-new-library-releases/).
-
-We’d like to thank the community for their support and work on this latest release. We’d especially like to thank Quansight and Microsoft for their contributions.
-
-Features in PyTorch releases are classified as Stable, Beta, and Prototype. You can learn more about the definitions in [this blog post](https://pytorch.org/blog/pytorch-feature-classification-changes/).
-
-# Frontend APIs
-
-### (Stable) *torch.linalg*
-
-In 1.9, the *torch.linalg* module is moving to a stable release. Linear algebra is essential to deep learning and scientific computing, and the *torch.linalg* module extends PyTorch’s support for it with implementations of every function from [NumPy’s linear algebra module](https://numpy.org/doc/stable/reference/routines.linalg.html) (now with support for accelerators and autograd) and more, like [*torch.linalg.matrix_norm*](https://pytorch.org/docs/1.9.0/generated/torch.linalg.matrix_norm.html?highlight=matrix_norm#torch.linalg.matrix_norm) and [*torch.linalg.householder_product*](https://pytorch.org/docs/1.9.0/generated/torch.linalg.householder_product.html?highlight=householder_product#torch.linalg.householder_product). This makes the module immediately familiar to users who have worked with NumPy. Refer to [the documentation](https://pytorch.org/docs/1.9.0/linalg.html?highlight=linalg#module-torch.linalg).
-
-We plan to publish another blog post with more details on the *torch.linalg* module next week!
-
-### (Stable) Complex Autograd
-
-The Complex Autograd feature, released as a beta in PyTorch 1.8, is now stable. Since the beta release, we have extended support for Complex Autograd to over 98% of operators in PyTorch 1.9, improved testing for complex operators by adding more OpInfos, and added greater validation through the TorchAudio migration to native complex tensors (refer to [this issue](https://github.com/pytorch/audio/issues/1337)).
-
-This feature gives users the functionality to calculate complex gradients and optimize real-valued loss functions with complex variables. This is a required feature for multiple current and downstream prospective users of complex numbers in PyTorch, like TorchAudio, ESPnet, Asteroid, and FastMRI. Refer to [the documentation](https://pytorch.org/docs/1.9.0/notes/autograd.html#autograd-for-complex-numbers) for more details.
-
-### (Stable) torch.use_deterministic_algorithms()
-
-To help with debugging and writing reproducible programs, PyTorch 1.9 includes a *torch.use_deterministic_algorithms* option. When this setting is enabled, operations behave deterministically, if possible, or throw a runtime error if they might behave nondeterministically. Here are a couple of examples:
-
-```python
->>> a = torch.randn(100, 100, 100, device='cuda').to_sparse()
->>> b = torch.randn(100, 100, 100, device='cuda')
-
-# Sparse-dense CUDA bmm is usually nondeterministic
->>> torch.bmm(a, b).eq(torch.bmm(a, b)).all().item()
-False
-
->>> torch.use_deterministic_algorithms(True)
-
-# Now torch.bmm gives the same result each time, but with reduced performance
->>> torch.bmm(a, b).eq(torch.bmm(a, b)).all().item()
-True
-
-# CUDA kthvalue has no deterministic algorithm, so it throws a runtime error
->>> torch.zeros(10000, device='cuda').kthvalue(1)
-RuntimeError: kthvalue CUDA does not have a deterministic implementation...
-```
-
-PyTorch 1.9 also adds deterministic implementations for a number of indexing operations, including *index_add*, *index_copy*, and *index_put with accum=False*. For more details, refer to the [documentation](https://pytorch.org/docs/1.9.0/generated/torch.use_deterministic_algorithms.html?highlight=use_deterministic#torch.use_deterministic_algorithms) and [reproducibility note](https://pytorch.org/docs/1.9.0/notes/randomness.html?highlight=reproducibility).
-
-### (Beta) *torch.special*
-
-A *torch.special* module, analogous to [SciPy’s special module](https://docs.scipy.org/doc/scipy/reference/special.html), is now available in beta. This module contains many functions useful for scientific computing and working with distributions, such as *iv*, *ive*, *erfcx*, *logerfc*, and *logerfcx*. Refer to [the documentation](https://pytorch.org/docs/master/special.html) for more details.
-
-### (Beta) nn.Module parametrization
-
-```nn.Module``` parametrization allows users to parametrize any parameter or buffer of an ```nn.Module``` without modifying the ```nn.Module``` itself. It allows you to constrain the space in which your parameters live without the need for special optimization methods.
-
-This also contains a new implementation of the ```spectral_norm``` parametrization for PyTorch 1.9. More parametrizations will be added to this feature (weight_norm, matrix constraints, and part of pruning) for the feature to become stable in 1.10. For more details, refer to the [documentation](https://pytorch.org/docs/1.9.0/generated/torch.nn.utils.parametrizations.spectral_norm.html?highlight=parametrize) and [tutorial](https://pytorch.org/tutorials/intermediate/parametrizations.html).
-
-# PyTorch Mobile
-
-### (Beta) Mobile Interpreter
-
-We are releasing Mobile Interpreter, a streamlined version of the PyTorch runtime, in beta. The Interpreter executes PyTorch programs on edge devices with a reduced binary size footprint.
-
-Mobile Interpreter is one of the top requested features for PyTorch Mobile. This new release will significantly reduce binary size compared with the current on-device runtime. To get the binary size improvements with our interpreter (which can reduce the binary size by up to ~75% for a typical application), follow these instructions. As an example, using Mobile Interpreter, we can reach 2.6 MB compressed with MobileNetV2 in arm64-v7a Android. With this latest release, we are making it much simpler to integrate the interpreter by providing pre-built libraries for iOS and Android.
-
-### TorchVision Library
-
-Starting from 1.9, users can use the TorchVision library in their iOS/Android apps. The TorchVision library contains the C++ TorchVision ops and needs to be linked together with the main PyTorch library for iOS; for Android, it can be added as a Gradle dependency. This allows using TorchVision’s prebuilt MaskRCNN operators for object detection and segmentation.
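-
-A hypothetical export sketch (the model choice, save format, and file name are illustrative, not the official mobile recipe):
-
-```python
-import torch
-import torchvision
-
-# Script a detection model that relies on the C++ TorchVision ops, then
-# save it for consumption from an iOS/Android app
-model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True).eval()
-scripted = torch.jit.script(model)
-scripted.save("maskrcnn.pt")
-```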
-To learn more about the library, please refer to our tutorials and [demo apps](https://github.com/pytorch/android-demo-app/tree/master/D2Go).
-
-### Demo apps
-
-We are releasing a new video app based on the [PyTorchVideo](https://pytorchvideo.org/) library and an updated speech recognition app based on the latest torchaudio wav2vec model. Both are available on [iOS](https://github.com/pytorch/ios-demo-app) and [Android](https://github.com/pytorch/android-demo-app). In addition, we have updated the seven Computer Vision and three Natural Language Processing demo apps, including the HuggingFace DistilBERT and DeiT vision transformer models, with PyTorch Mobile v1.9. With the addition of these two apps, we now offer a full suite of demo apps covering image, text, audio, and video. To get started, check out our [iOS demo apps](https://github.com/pytorch/ios-demo-app) and [Android demo apps](https://github.com/pytorch/android-demo-app).
-
-# Distributed Training
-
-### (Beta) TorchElastic is now part of core
-
-[TorchElastic](https://github.com/pytorch/pytorch/issues/50621), which was open sourced over a year ago in the [pytorch/elastic](https://github.com/pytorch/elastic) GitHub repository, is a runner and coordinator for PyTorch worker processes. Since then, it has been adopted by various distributed torch use cases: 1) [deepspeech.pytorch](https://medium.com/pytorch/training-deepspeech-using-torchelastic-ad013539682) 2) [pytorch-lightning](https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html#torchelastic) 3) [Kubernetes CRD](https://github.com/pytorch/elastic/blob/master/kubernetes/README.md). Now, it is part of PyTorch core.
-
-As its name suggests, the core function of TorchElastic is to gracefully handle scaling events. A notable corollary of elasticity is that peer discovery and rank assignment are built into TorchElastic, enabling users to run distributed training on preemptible instances without requiring a gang scheduler. As a side note, [etcd](https://etcd.io/) used to be a hard dependency of TorchElastic. With the upstreaming, this is no longer the case, since we have added a “standalone” rendezvous based on c10d::Store. For more details, refer to the [documentation](https://pytorch.org/docs/1.9.0/distributed.elastic.html).
-
-### (Beta) Distributed Training Updates
-
-In addition to TorchElastic, there are a number of beta features available in the distributed package:
-
-* **(Beta) CUDA support is available in RPC**: Compared to CPU RPC and general-purpose RPC frameworks, CUDA RPC is a much more efficient way for P2P Tensor communication. It is built on top of TensorPipe, which can automatically choose a communication channel for each Tensor based on Tensor device type and channel availability on both the caller and the callee. Existing TensorPipe channels cover NVLink, InfiniBand, SHM, CMA, TCP, etc. See [this recipe](https://pytorch.org/tutorials/recipes/cuda_rpc.html) for how CUDA RPC helps to attain a 34x speedup compared to CPU RPC.
-
-* **(Beta) ZeroRedundancyOptimizer**: ZeroRedundancyOptimizer can be used in conjunction with DistributedDataParallel to reduce the size of per-process optimizer states. The idea of ZeroRedundancyOptimizer comes from the [DeepSpeed/ZeRO project](https://github.com/microsoft/DeepSpeed) and [Marian](https://github.com/marian-nmt/marian-dev), where the optimizer in each process owns a shard of model parameters and their corresponding optimizer states. When running `step()`, each optimizer only updates its own parameters, and then uses collective communication to synchronize updated parameters across all processes. Refer to [this documentation](https://pytorch.org/docs/master/distributed.optim.html) and this [tutorial](https://pytorch.org/tutorials/recipes/zero_redundancy_optimizer.html) to learn more.
-
-* **(Beta) Support for profiling distributed collectives**: PyTorch’s profiler tools, *torch.profiler* and *torch.autograd.profiler*, are able to profile distributed collectives and point-to-point communication primitives including allreduce, alltoall, allgather, send/recv, etc. This is enabled for all backends supported natively by PyTorch: gloo, mpi, and nccl. This can be used to debug performance issues, analyze traces that contain distributed communication, and gain insight into the performance of applications that use distributed training. To learn more, refer to [this documentation](https://pytorch.org/docs/1.9.0/distributed.html#profiling-collective-communication).
-
-# Performance Optimization and Tooling
-
-### (Stable) Freezing API
-
-Module Freezing is the process of inlining module parameters and attribute values as constants into the TorchScript internal representation. This allows further optimization and specialization of your program, both for TorchScript optimizations and lowering to other backends. It is used by the [optimize_for_mobile API](https://github.com/pytorch/pytorch/blob/master/torch/utils/mobile_optimizer.py), ONNX, and others.
-
-Freezing is recommended for model deployment. It helps TorchScript JIT optimizations optimize away the overhead and bookkeeping that is necessary for training, tuning, or debugging PyTorch models. It enables graph fusions that are not semantically valid on non-frozen graphs, such as fusing Conv-BN. For more details, refer to the [documentation](https://pytorch.org/docs/1.9.0/generated/torch.jit.freeze.html).
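-
-A minimal sketch of the API (the toy module is a placeholder):
-
-```python
-import torch
-
-# Freezing expects a scripted module in eval mode
-model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).eval()
-frozen = torch.jit.freeze(torch.jit.script(model))
-
-# Parameters and attributes are now inlined as constants, enabling
-# fusions such as Conv-BN
-out = frozen(torch.randn(1, 3, 32, 32))
-```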
-
-### (Beta) PyTorch Profiler
-
-The new PyTorch Profiler graduates to beta and leverages [Kineto](https://github.com/pytorch/kineto/) for GPU profiling and TensorBoard for visualization, and is now the standard across our tutorials and documentation.
-
-PyTorch 1.9 extends support for the new *torch.profiler* API to more builds, including Windows and Mac, and is recommended in most cases instead of the previous *torch.autograd.profiler* API. The new API supports existing profiler features, integrates with the CUPTI library (Linux-only) to trace on-device CUDA kernels, and provides support for long-running jobs, e.g.:
-
-```python
-def trace_handler(p):
-    output = p.key_averages().table(sort_by="self_cuda_time_total", row_limit=10)
-    print(output)
-    p.export_chrome_trace("/tmp/trace_" + str(p.step_num) + ".json")
-
-with profile(
-    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
-    # schedule argument specifies the iterations on which the profiler is active
-    schedule=torch.profiler.schedule(
-        wait=1,
-        warmup=1,
-        active=2),
-    # on_trace_ready argument specifies the handler for the traces
-    on_trace_ready=trace_handler
-) as p:
-    for idx in range(8):
-        model(inputs)
-        # profiler will trace iterations 2 and 3, and then 6 and 7 (counting from zero)
-        p.step()
-```
-
-More usage examples can be found on the [profiler recipe page](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html).
-
-The PyTorch Profiler TensorBoard plugin has new features for:
-* Distributed Training summary view with communications overview for NCCL
-* GPU Utilization and SM Efficiency in Trace view and GPU operators view
-* Memory Profiling view
-* Jump to source when launched from Microsoft VSCode
-* Ability to load traces from cloud object storage systems
-
-### (Beta) Inference Mode API
-
-Inference Mode API allows significant speed-up for inference workloads while remaining safe and ensuring no incorrect gradients can ever be computed. It offers the best possible performance when no autograd is required. For more details, refer to [the documentation for inference mode itself](https://pytorch.org/docs/1.9.0/generated/torch.inference_mode.html?highlight=inference%20mode#torch.inference_mode) and [the documentation explaining when to use it and the difference with no_grad mode](https://pytorch.org/docs/1.9.0/notes/autograd.html#locally-disabling-gradient-computation).
-
-### (Beta) *torch.package*
-
-*torch.package* is a new way to package PyTorch models in a self-contained, stable format. A package includes both the model’s data (e.g. parameters, buffers) and its code (model architecture). Packaging a model with its full set of Python dependencies, combined with a description of a conda environment with pinned versions, can be used to easily reproduce training. Representing a model in a self-contained artifact also allows it to be published and transferred throughout a production ML pipeline while retaining the flexibility of a pure-Python representation. For more details, refer to [the documentation](https://pytorch.org/docs/1.9.0/package.html).
-
-### (Prototype) prepare_for_inference
-
-prepare_for_inference is a new prototype feature that takes in a module and performs graph-level optimizations to improve inference performance, depending on the device. It is meant to be a PyTorch-native option that requires minimal changes to users’ workflows.
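-
-A sketch of the TorchScript entry point (assuming the torch.jit.optimize_for_inference API linked below; the toy module is a placeholder):
-
-```python
-import torch
-
-model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8)).eval()
-scripted = torch.jit.script(model)
-
-# Builds on freezing and applies additional device-dependent graph optimizations
-optimized = torch.jit.optimize_for_inference(scripted)
-out = optimized(torch.randn(1, 3, 32, 32))
-```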
-For more details, see the documentation for the TorchScript version [here](https://github.com/pytorch/pytorch/blob/master/torch/jit/_freeze.py#L168) or the FX version [here](https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/optimization.py#L234).
-
-### (Prototype) Profile-directed typing in TorchScript
-
-TorchScript has a hard requirement for source code to have type annotations in order for compilation to be successful. For a long time, it was only possible to add missing or incorrect type annotations through trial and error (i.e., by fixing the type-checking errors generated by *torch.jit.script* one by one), which was inefficient and time-consuming. Now, we have enabled profile-directed typing for *torch.jit.script* by leveraging existing tools like MonkeyType, which makes the process much easier, faster, and more efficient. For more details, refer to [the documentation](https://pytorch.org/docs/1.9.0/jit.html).
-
-Thanks for reading. If you’re interested in these updates and want to join the PyTorch community, we encourage you to join the [discussion forums](https://discuss.pytorch.org/) and [open GitHub issues](https://github.com/pytorch/pytorch/issues). To get the latest news from PyTorch, follow us on [Facebook](https://www.facebook.com/pytorch/), [Twitter](https://twitter.com/PyTorch), [Medium](https://medium.com/pytorch), [YouTube](https://www.youtube.com/pytorch), or [LinkedIn](https://www.linkedin.com/company/pytorch).
-
-Cheers!
-
-Team PyTorch
From e47c6ab288912c1e5b88b444d30381e511578526 Mon Sep 17 00:00:00 2001
From: arielmoguillansky
Date: Wed, 20 Oct 2021 15:01:49 -0300
Subject: [PATCH 4/6] [PYT-637]-pyt-1.10-new-library-releases

---
 .../2021-10-19-pytorch-1.10-main-release.md   |   2 +-
 ...10-19-pytorch-1.10-new-library-releases.md | 222 ++++++++++++++++++
 2 files changed, 223 insertions(+), 1 deletion(-)
 create mode 100644 _posts/2021-10-19-pytorch-1.10-new-library-releases.md

diff --git a/_posts/2021-10-19-pytorch-1.10-main-release.md b/_posts/2021-10-19-pytorch-1.10-main-release.md
index 321f60f8e02e..d38efc0dc184 100644
--- a/_posts/2021-10-19-pytorch-1.10-main-release.md
+++ b/_posts/2021-10-19-pytorch-1.10-main-release.md
@@ -31,7 +31,7 @@ A ```torch.special``` module, analogous to [SciPy’s special module](https://do
 
 Refer to this [tutorial](https://pytorch.org/tutorials/intermediate/parametrizations.html) and the general [documentation](https://pytorch.org/docs/master/generated/torch.nn.utils.parametrizations.spectral_norm.html?highlight=parametrize) for more details.
 
-### (Beta) *CUDA Graphs APIs Integration
+### (Beta) CUDA Graphs APIs Integration
 
 PyTorch now integrates CUDA Graphs APIs to reduce CPU overheads for CUDA workloads.
diff --git a/_posts/2021-10-19-pytorch-1.10-new-library-releases.md b/_posts/2021-10-19-pytorch-1.10-new-library-releases.md
new file mode 100644
index 000000000000..755ff7049d96
--- /dev/null
+++ b/_posts/2021-10-19-pytorch-1.10-new-library-releases.md
@@ -0,0 +1,222 @@
+---
+layout: blog_detail
+title: 'New Library Releases in PyTorch 1.10, including TorchX, TorchAudio, TorchVision'
+author: Team PyTorch
+---
+
+Today, we are announcing a number of new features and improvements to PyTorch libraries, alongside the [PyTorch 1.10 release](https://pytorch.org/blog/pytorch-1.10-released/).
+
+Some highlights include:
+
+* **TorchX** - a new SDK for quickly building and deploying ML applications from research & development to production.
+* **TorchAudio** - Added a text-to-speech pipeline, self-supervised model support, multi-channel support and an MVDR beamforming module, an RNN transducer (RNNT) loss function, and batch and filterbank support to the `lfilter` function. See the TorchAudio release notes [here](https://github.com/pytorch/audio/releases).
+* **TorchVision** - Added new RegNet and EfficientNet models, FX-based feature extraction added to utilities, two new Automatic Augmentation techniques: RandAugment and TrivialAugment, and updated training recipes. See the TorchVision release notes [here](https://github.com/pytorch/vision/releases).
+
+
+# Introducing TorchX
+
+TorchX is a new SDK for quickly building and deploying ML applications from research & development to production. It offers various built-in components that encode MLOps best practices and make advanced features like distributed training and hyperparameter optimization accessible to all.
+
+Users can get started with TorchX 0.1 with no added setup cost, since it supports popular ML schedulers and pipeline orchestrators that are already widely adopted and deployed in production. No two production environments are the same. To comply with various use cases, TorchX’s core APIs allow extensive customization at well-defined extension points, so that even the most unique applications can be serviced without customizing the whole vertical stack.
+
+Read the [documentation](https://pytorch.org/torchx) for more details and try out this feature using this quickstart [tutorial](https://pytorch.org/torchx/latest/examples/hello_world.html).
+
+
+# TorchAudio 0.10
+
+### (Stable) Text-to-speech pipeline
+
+TorchAudio now adds the Tacotron2 model and pretrained weights. It is now possible to build a text-to-speech pipeline with existing vocoder implementations like WaveRNN and Griffin-Lim. Building a TTS pipeline requires matching data processing and pretrained weights, which is often non-trivial for users. TorchAudio therefore introduces a bundle API, so that constructing pipelines for specific pretrained weights is easy. The following example illustrates this.
+
+```python
+import torchaudio
+
+bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
+
+# Build text processor, Tacotron2 and vocoder (WaveRNN) model
+processor = bundle.get_text_preprocessor()
+tacotron2 = bundle.get_tacotron2()
+Downloading:
+100%|███████████████████████████████| 107M/107M [00:01<00:00, 87.9MB/s]
+vocoder = bundle.get_vocoder()
+Downloading:
+100%|███████████████████████████████| 16.7M/16.7M [00:00<00:00, 78.1MB/s]
+
+text = "Hello World!"
+
+# Encode text
+input, lengths = processor(text)
+
+# Generate (mel-scale) spectrogram
+specgram, lengths, _ = tacotron2.infer(input, lengths)
+
+# Convert spectrogram to waveform
+waveforms, lengths = vocoder(specgram, lengths)
+
+# Save audio
+torchaudio.save('hello-world.wav', waveforms, vocoder.sample_rate)
+```
+
+For the details of this API, please refer to [the documentation](https://pytorch.org/audio/0.10.0/pipelines#tacotron2-text-to-speech). You can also try this from [the tutorial](https://pytorch.org/tutorials/intermediate/text_to_speech_with_torchaudio_tutorial.html).
+
+### (Beta) Self-Supervised Model Support
+
+TorchAudio added the HuBERT model architecture and pre-trained weight support for wav2vec 2.0 and HuBERT.
+HuBERT and wav2vec 2.0 are novel approaches to audio representation learning, and they yield high accuracy when fine-tuned on downstream tasks. These models can serve as baselines in future research; therefore, TorchAudio provides a simple way to run them. Similar to the TTS pipeline, the pretrained weights and associated information, such as expected sample rates and output class labels (for fine-tuned weights), are put together as a bundle, so that they can be used to build pipelines. The following example illustrates this.
+
+```python
+import torchaudio
+
+bundle = torchaudio.pipelines.HUBERT_ASR_LARGE
+
+# Build the model and load pretrained weight.
+model = bundle.get_model()
+Downloading:
+100%|███████████████████████████████| 1.18G/1.18G [00:17<00:00, 73.8MB/s]
+# Check the corresponding labels of the output.
+labels = bundle.get_labels()
+print(labels)
+('<s>', '<pad>', '</s>', '<unk>', '|', 'E', 'T', 'A', 'O', 'N', 'I', 'H', 'S', 'R', 'D', 'L', 'U', 'M', 'W', 'C', 'F', 'G', 'Y', 'P', 'B', 'V', 'K', "'", 'X', 'J', 'Q', 'Z')
+
+# Infer the label probability distribution
+waveform, sample_rate = torchaudio.load('hello-world.wav')
+
+emissions, _ = model(waveform)
+
+# Pass emission to (hypothetical) decoder
+transcripts = ctc_decode(emissions, labels)
+print(transcripts[0])
+HELLO WORLD
+```
+
+Please refer to the [documentation](https://pytorch.org/audio/0.10.0/pipelines#wav2vec-2-0-hubert-representation-learning) for more details and try out this feature using [tutorial, Google Colab, or examples].
+
+### (Beta) Multi-channel support and MVDR beamforming
+
+Far-field speech recognition is a more challenging task compared to near-field recognition. Multi-channel methods such as beamforming help reduce the noise and enhance the target speech.
+
+TorchAudio now adds support for differentiable Minimum Variance Distortionless Response (MVDR) beamforming on multi-channel audio using Time-Frequency masks. Researchers can easily assemble it with any multi-channel ASR pipeline. There are three solutions (ref_channel, stv_evd, stv_power), and it supports single-channel and multi-channel masks (averaged within the method). It provides an online option that recursively updates the parameters for streaming audio. We also provide a tutorial on how to apply MVDR beamforming to multi-channel audio in the example directory.
+
+```python
+import torchaudio
+from torchaudio.transforms import MVDR, Spectrogram, InverseSpectrogram
+
+# Load the multi-channel noisy audio
+waveform_mix, sr = torchaudio.load('mix.wav')
+# Initialize the stft and istft modules
+stft = Spectrogram(n_fft=1024, hop_length=256, return_complex=True, power=None)
+istft = InverseSpectrogram(n_fft=1024, hop_length=256)
+# Get the noisy spectrogram
+specgram_mix = stft(waveform_mix)
+# Get the Time-Frequency mask via machine learning models
+# (model is a placeholder for a trained mask estimation network)
+mask = model(waveform_mix)
+# Initialize the MVDR module
+mvdr = MVDR(ref_channel=0, solution="ref_channel", multi_mask=False)
+# Apply MVDR beamforming
+specgram_enhanced = mvdr(specgram_mix, mask)
+# Get the enhanced waveform via iSTFT
+waveform_enhanced = istft(specgram_enhanced, length=waveform_mix.shape[-1])
+```
+
+Please refer to the [documentation](https://pytorch.org/audio/0.10.0/transforms.html#mvdr) for more details and try out this feature using the [MVDR tutorial](https://github.com/pytorch/audio/blob/main/examples/beamforming/MVDR_tutorial.ipynb).
+
+### (Beta) RNN Transducer Loss
+
+The RNN transducer (RNNT) loss is part of the RNN transducer pipeline, a popular architecture for speech recognition tasks. It has recently received attention for use in streaming settings and has achieved state-of-the-art WER on the LibriSpeech benchmark.
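+
+A minimal usage sketch with toy sizes (batch 2, 10 time steps, target length 5, a 20-class vocabulary):
+
+```python
+import torch
+import torchaudio
+
+# logits: (batch, max time, max target length + 1, number of classes)
+logits = torch.randn(2, 10, 6, 20, requires_grad=True)
+targets = torch.randint(1, 20, (2, 5), dtype=torch.int32)
+logit_lengths = torch.tensor([10, 8], dtype=torch.int32)
+target_lengths = torch.tensor([5, 3], dtype=torch.int32)
+
+loss_fn = torchaudio.transforms.RNNTLoss(blank=0)
+loss = loss_fn(logits, targets, logit_lengths, target_lengths)
+loss.backward()
+```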
+
+TorchAudio’s loss function supports float16 and float32 logits, has autograd and TorchScript support, and can be run on both CPU and GPU; the GPU path has a custom CUDA kernel implementation for improved performance. The implementation is consistent with the original loss function in [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/pdf/1211.3711.pdf), but relies on code from [Alignment Restricted Streaming Recurrent Neural Network Transducer](https://arxiv.org/pdf/2011.03072.pdf). Special thanks to Jay Mahadeokar and Ching-Feng Yeh for their code contributions and guidance.
+
+Please refer to the [documentation](https://pytorch.org/audio/0.10.0/transforms.html#rnntloss) for more details.
+
+### (Beta) Batch support and filter bank support
+
+`torchaudio.functional.lfilter` now supports batch processing and multiple filters.
+
+### (Prototype) Emformer Module
+
+Automatic speech recognition (ASR) research and productization have increasingly focused on on-device applications. Towards supporting such efforts, TorchAudio now includes [Emformer](https://arxiv.org/abs/2010.10759), a memory-efficient transformer architecture that has achieved state-of-the-art results on LibriSpeech in low-latency streaming scenarios, as a prototype feature.
+
+Please refer to the [documentation](https://pytorch.org/audio/main/prototype.html#emformer) for more details.
+
+### GPU Build
+
+GPU builds that support custom CUDA kernels in TorchAudio, like the one being used for the RNN transducer loss, have been added. Following this change, TorchAudio’s binary distribution now includes CPU-only versions and CUDA-enabled versions. To use CUDA-enabled binaries, PyTorch also needs to be compatible with CUDA.
+
+
+# TorchVision 0.11
+
+### (Stable) New Models
+
+[RegNet](https://arxiv.org/abs/2003.13678) and [EfficientNet](https://arxiv.org/abs/1905.11946) are two popular architectures that can be scaled to different computational budgets. In this release, we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet; the accuracies of the pre-trained weights on ImageNet val are reported in [#4403](https://github.com/pytorch/vision/pull/4403#issuecomment-930381524), [#4530](https://github.com/pytorch/vision/pull/4530#issuecomment-933213238) and [#4293](https://github.com/pytorch/vision/pull/4293).
+
+The models can be used as follows:
+
+```python
+import torch
+from torchvision import models
+
+x = torch.rand(1, 3, 224, 224)
+
+regnet = models.regnet_y_400mf(pretrained=True)
+regnet.eval()
+predictions = regnet(x)
+
+efficientnet = models.efficientnet_b0(pretrained=True)
+efficientnet.eval()
+predictions = efficientnet(x)
+```
+
+See the full list of new models on the [torchvision.models](https://pytorch.org/vision/master/models.html) documentation page.
+
+We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.
+
+### (Beta) FX-based Feature Extraction
+
+A new feature extraction method has been added to our utilities. It uses [torch.fx](https://pytorch.org/docs/stable/fx.html) and enables us to retrieve the outputs of intermediate layers of a network, which is useful for feature extraction and visualization.
+
+Here is an example of how to use the new utility:
+
+```python
+import torch
+from torchvision.models import resnet50
+from torchvision.models.feature_extraction import create_feature_extractor
+
+x = torch.rand(1, 3, 224, 224)
+
+model = resnet50()
+
+return_nodes = {
+    "layer4.2.relu_2": "layer4"
+}
+model2 = create_feature_extractor(model, return_nodes=return_nodes)
+intermediate_outputs = model2(x)
+
+print(intermediate_outputs['layer4'].shape)
+```
+
+We would like to thank Alexander Soare for developing this utility.
+
+### (Stable) New Data Augmentations
+
+Two new Automatic Augmentation techniques were added: [RandAugment](https://arxiv.org/abs/1909.13719) and [TrivialAugment](https://arxiv.org/abs/2103.10158). They apply a series of transformations to the original data to enhance it and boost the performance of the models. The new techniques build on top of the previously added [AutoAugment](https://github.com/pytorch/vision/pull/3123) and focus on simplifying the approach, reducing the search space for the optimal policy, and improving the performance gain in terms of accuracy. These techniques enable users to reproduce recipes to achieve state-of-the-art performance on the offered models. Additionally, they enable users to apply these techniques for transfer learning and achieve optimal accuracy on new datasets.
+
+Both methods can be used as drop-in replacements for the AutoAugment technique, as seen below:
+
+```python
+from torchvision import transforms
+
+t = transforms.RandAugment()
+# t = transforms.TrivialAugmentWide()
+transformed = t(image)  # image: a PIL Image
+
+transform = transforms.Compose([
+    transforms.Resize(256),
+    transforms.RandAugment(),  # transforms.TrivialAugmentWide()
+    transforms.ToTensor()])
+```
+
+Read the [automatic augmentation transforms](https://pytorch.org/vision/master/transforms.html#automatic-augmentation-transforms) documentation for more details.
+
+We would like to thank Samuel G. Müller for contributing to TrivialAugment and for his help on refactoring the AA package.
+
+### Updated Training Recipes
+
+We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, [Mixup](https://arxiv.org/abs/1710.09412), [Cutmix](https://arxiv.org/abs/1905.04899), and other [SOTA primitives](https://github.com/pytorch/vision/issues/3911). These enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
+
+
+Thanks for reading. If you’re interested in these updates and want to join the PyTorch community, we encourage you to join [the discussion forums](https://discuss.pytorch.org/) and [open GitHub issues](https://github.com/pytorch/pytorch/issues). To get the latest news from PyTorch, follow us on [Facebook](https://www.facebook.com/pytorch/), [Twitter](https://twitter.com/PyTorch), [Medium](https://medium.com/pytorch), [YouTube](https://www.youtube.com/pytorch) or [LinkedIn](https://www.linkedin.com/company/pytorch).
+
+Cheers!
+
+-Team PyTorch
From 116e4619813c5e4c9fbb341d039e77be2647be6e Mon Sep 17 00:00:00 2001
From: arielmoguillansky
Date: Wed, 20 Oct 2021 18:15:00 -0300
Subject: [PATCH 5/6] Updated blog dates to 2021-10-21

---
 ...10-main-release.md => 2021-10-21-pytorch-1.10-main-release.md}         | 0
 ...eleases.md => 2021-10-21-pytorch-1.10-new-library-releases.md}         | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename _posts/{2021-10-19-pytorch-1.10-main-release.md => 2021-10-21-pytorch-1.10-main-release.md} (100%)
 rename _posts/{2021-10-19-pytorch-1.10-new-library-releases.md => 2021-10-21-pytorch-1.10-new-library-releases.md} (100%)

diff --git a/_posts/2021-10-19-pytorch-1.10-main-release.md b/_posts/2021-10-21-pytorch-1.10-main-release.md
similarity index 100%
rename from _posts/2021-10-19-pytorch-1.10-main-release.md
rename to _posts/2021-10-21-pytorch-1.10-main-release.md
diff --git a/_posts/2021-10-19-pytorch-1.10-new-library-releases.md b/_posts/2021-10-21-pytorch-1.10-new-library-releases.md
similarity index 100%
rename from _posts/2021-10-19-pytorch-1.10-new-library-releases.md
rename to _posts/2021-10-21-pytorch-1.10-new-library-releases.md

From c99fdfb8deaf37614501e8b6cde7d5b4c22ae1ae Mon Sep 17 00:00:00 2001
From: arielmoguillansky
Date: Thu, 21 Oct 2021 16:18:45 -0300
Subject: [PATCH 6/6] testing bundle version for netlify

---
 .gitignore   | 2 +-
 Gemfile.lock | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 477742363c05..90a766742d95 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,7 +1,7 @@
 .DS_Store
 node_modules
 yarn-error.log
-vendor
+/vendor
 # These are NOT autogenerated. Check in files as necessary.
 !docs/stable/_static/js/vendor/
 !docs/master/_static/js/vendor/
diff --git a/Gemfile.lock b/Gemfile.lock
index f1f29bd40749..7cbd60164233 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -272,3 +272,5 @@ DEPENDENCIES
 
 RUBY VERSION
    ruby 2.5.1p57
+BUNDLED WITH
+   2.2.22