Commit f482f69

Update 2020-10-26-1.7-released.md
1 parent 86b28b4 commit f482f69


_posts/2020-10-26-1.7-released.md

Lines changed: 30 additions & 30 deletions
@@ -8,11 +8,11 @@ Today, we’re announcing the availability of PyTorch 1.7, along with updated do

A few of the highlights include:

- * 1- CUDA 11 is now officially supported with binaries available at [PyTorch.org](http://pytorch.org/)
- * 2- Updates and additions to profiling and performance for RPC, TorchScript, Stack traces and Benchmark utilities
- * 3- (Beta) Support for NumPy compatible Fast Fourier transforms (FFT) via torch.fft
- * 4- (Prototype) Support for Nvidia A100 generation GPUs and native TF32 format
- * 5- (Prototype) Distributed training on Windows now supported
+ 1. CUDA 11 is now officially supported with binaries available at [PyTorch.org](http://pytorch.org/)
+ 2. Updates and additions to profiling and performance for RPC, TorchScript, Stack traces and Benchmark utilities
+ 3. (Beta) Support for NumPy compatible Fast Fourier transforms (FFT) via torch.fft
+ 4. (Prototype) Support for Nvidia A100 generation GPUs and native TF32 format
+ 5. (Prototype) Distributed training on Windows now supported

To reiterate, starting [PyTorch 1.6](https://pytorch.org/blog/pytorch-feature-classification-changes/), features are now classified as stable, beta and prototype. You can see the detailed announcement [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). Note that the prototype features listed in this blog are available as part of this release.

@@ -64,8 +64,8 @@ Note that this is necessary, **but not sufficient**, for determinism **within a

See the documentation for ```torch.set_deterministic(bool)``` for the list of affected operations.

- * RFC | [Link](https://github.com/pytorch/pytorch/issues/15359)
- * Documentation | [Link](https://pytorch.org/docs/stable/generated/torch.set_deterministic.html)
+ * RFC ([Link](https://github.com/pytorch/pytorch/issues/15359))
+ * Documentation ([Link](https://pytorch.org/docs/stable/generated/torch.set_deterministic.html))

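As a side note on the API referenced in this hunk, a minimal usage sketch of ```torch.set_deterministic``` (assumptions: CPU tensors only, and the 1.7 behavior where operations without a deterministic implementation raise a RuntimeError):

```python
import torch

# Opt in globally: ops with deterministic implementations use them, and ops
# known to be nondeterministic raise a RuntimeError instead of silently running.
torch.set_deterministic(True)
print(torch.is_deterministic())  # True

x = torch.randn(8, 8)
y = x @ x  # deterministic on CPU
```
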

# Performance & Profiling

@@ -121,14 +121,14 @@ n = 33554432
0.89 ns / element
```

- Documentation | Link **Missing link**
+ Documentation (Link) **Missing link**

## [Beta] Stack traces added to profiler

Users can now see not only operator name/inputs in the profiler output table but also where the operator is in the code. The workflow requires very little change to take advantage of this capability. The [autograd profiler](https://pytorch.org/docs/stable/autograd.html#profiler) is used as before, but with optional new parameters: ```with_stack``` and ```group_by_stack_n```.

- * [Details Link](https://github.com/pytorch/pytorch/pull/43898/)
- * [Documentation Link](https://pytorch.org/docs/stable/autograd.html)
+ * Detail ([Link](https://github.com/pytorch/pytorch/pull/43898/))
+ * Documentation ([Link](https://pytorch.org/docs/stable/autograd.html))

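To make the two new parameters mentioned above concrete, a minimal sketch (assumptions: a toy CPU model; the sort key and row limit are arbitrary choices):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
inputs = torch.randn(32, 64)

# with_stack=True records the Python source location of each operator call.
with torch.autograd.profiler.profile(with_stack=True) as prof:
    model(inputs)

# group_by_stack_n groups the averaged stats by the top-n stack frames.
print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total", row_limit=10))
```
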

# Distributed Training & RPC

@@ -138,21 +138,21 @@ Torchelastic offers a strict superset of the current ```torch.distributed.launch

By bundling ```torchelastic``` in the same docker image as PyTorch, users can start experimenting with torchelastic right away without having to separately install ```torchelastic```. In addition to convenience, this work is a nice-to-have when adding support for elastic parameters in Kubeflow's existing distributed PyTorch operators.

- * Usage examples and how to get started | [Link](https://pytorch.org/elastic/0.2.0/examples.html)
+ * Usage examples and how to get started ([Link](https://pytorch.org/elastic/0.2.0/examples.html))

## [Beta] Support for uneven dataset inputs in DDP

PyTorch 1.7 introduces a new context manager to be used in conjunction with models trained using ```torch.nn.parallel.DistributedDataParallel``` to enable training with uneven dataset sizes across different processes. This feature enables greater flexibility when using DDP and prevents the user from having to manually ensure dataset sizes are the same across different processes. With this context manager, DDP will handle uneven dataset sizes automatically, which can prevent errors or hangs at the end of training.

- * RFC | [Link](https://github.com/pytorch/pytorch/issues/38174)
- * Documentation | [Link](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.join)
+ * RFC ([Link](https://github.com/pytorch/pytorch/issues/38174))
+ * Documentation ([Link](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.join))

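The documentation bullet above points at ```DistributedDataParallel.join```; a minimal sketch of that context manager (assumptions: a single-process Gloo group on localhost port 29500 so the example is self-contained; in real use each rank would run this with its own, possibly uneven, number of batches):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(10, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# join() keeps ranks that run out of data participating in DDP's collectives,
# so ranks with more batches do not hang waiting for the others.
num_batches = 5  # imagine this differs from rank to rank
with model.join():
    for _ in range(num_batches):
        opt.zero_grad()
        model(torch.randn(20, 10)).sum().backward()
        opt.step()

dist.destroy_process_group()
```
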

## [Beta] NCCL Reliability - Async Error/Timeout Handling

In the past, NCCL training runs would hang indefinitely due to stuck collectives, leading to a very unpleasant experience for users. This feature will abort stuck collectives and throw an exception/crash the process if a potential hang is detected. When used with something like torchelastic (which can recover the training process from the last checkpoint), users can have much greater reliability for distributed training. This feature is completely opt-in and sits behind an environment variable that must be explicitly set (otherwise users will see the same behavior as before).

- * Documentation | Link **Missing Link**
- * Usage examples | Link **Missing Link**
+ * Documentation (Link) **Missing Link**
+ * Usage examples (Link) **Missing Link**

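The hunk above does not name the opt-in environment variable (its documentation link is still missing); assuming it is ```NCCL_ASYNC_ERROR_HANDLING```, as in the 1.7-era distributed documentation, enabling it might look like this sketch:

```python
import os
import torch.distributed as dist

# Assumption: the opt-in flag is NCCL_ASYNC_ERROR_HANDLING. It must be set
# before the NCCL process group is created; otherwise behavior is unchanged.
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"

# Assumes a launcher (e.g. torchelastic) has set MASTER_ADDR, MASTER_PORT,
# RANK and WORLD_SIZE; stuck collectives now raise/abort instead of hanging.
dist.init_process_group(backend="nccl", init_method="env://")
```
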

## [Beta] TorchScript rpc_remote and rpc_sync

@@ -187,9 +187,9 @@ def script_rpc_remote_call(
    rref_res = rpc.remote(dst_worker_name, two_args_two_kwargs, args, kwargs)
    return rref_res.to_here()
```
- * Design doc | Link **Missing Link**
- * Documentation | Link **Missing Link**
- * Usage examples | Link **Missing Link**
+ * Design doc (Link) **Missing Link**
+ * Documentation (Link) **Missing Link**
+ * Usage examples (Link) **Missing Link**

## [Beta] Distributed optimizer with TorchScript support

@@ -199,9 +199,9 @@ In PyTorch 1.7, we are enabling the TorchScript support in distributed optimizer

Currently, the only optimizer that supports automatic conversion with TorchScript is ```Adagrad```; all other optimizers will still work as before, without TorchScript support. We are working on expanding the coverage to all PyTorch optimizers.

- * Design doc | Link **Missing Link**
- * Documentation | Link **Missing Link**
- * Usage examples | Link **Missing Link**
+ * Design doc (Link) **Missing Link**
+ * Documentation (Link) **Missing Link**
+ * Usage examples (Link) **Missing Link**

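A minimal, single-worker sketch of the distributed optimizer with ```Adagrad``` (assumptions: localhost rendezvous on port 29500 and an RPC world size of 1, purely to keep the example self-contained):

```python
import os
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch.distributed.optim import DistributedOptimizer

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
param_rrefs = [rpc.RRef(p) for p in model.parameters()]

# Adagrad is converted to its TorchScript functional variant under the hood;
# other optimizers fall back to the existing non-TorchScript path.
opt = DistributedOptimizer(torch.optim.Adagrad, param_rrefs, lr=0.05)

with dist_autograd.context() as ctx:
    loss = model(torch.randn(8, 4)).sum()
    dist_autograd.backward(ctx, [loss])
    opt.step(ctx)

rpc.shutdown()
```
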

## [Beta] Enhancements to RPC-based Profiling

@@ -213,15 +213,15 @@ Support for using the PyTorch profiler in conjunction with the RPC framework was

Users are now able to use familiar profiling tools such as ```with torch.autograd.profiler.profile()``` and ```with torch.autograd.profiler.record_function```; this works transparently with the RPC framework with full feature support, and profiles asynchronous functions and TorchScript functions.

- * Design doc | [Link](https://github.com/pytorch/pytorch/issues/39675)
- * Usage examples | [Link](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html)
+ * Design doc ([Link](https://github.com/pytorch/pytorch/issues/39675))
+ * Usage examples ([Link](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html))

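A minimal sketch of profiling RPC work (assumptions: one worker sending an RPC to itself over localhost port 29500; the built-in ```torch.add``` stands in for real remote work):

```python
import os
import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

# RPC activity issued inside the profiler context is captured, including
# asynchronous calls that are awaited later.
with torch.autograd.profiler.profile() as prof:
    fut = rpc.rpc_async("worker0", torch.add, args=(torch.ones(3), torch.ones(3)))
    fut.wait()

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
rpc.shutdown()
```
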

## [Beta] DDP memory reduction

As of PyTorch 1.6, DDP would put an extra copy of gradient tensors in communication buckets. This incurs additional memory overhead equivalent to the size of the gradients. In PyTorch 1.7, we added a ```gradient_as_bucket_view``` flag to the DDP constructor API. When this flag is set to ```True```, DDP will override ```param.grad``` with views that point into the communication buckets. This not only eliminates an extra in-memory copy of the gradients, but also avoids the additional read/write operations needed to synchronize communication buckets and ```param.grad``` values.

- * Design doc | [Link](https://github.com/pytorch/pytorch/issues/37030)
- * Documentation | [Link](https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html?highlight=distributeddataparallel)
+ * Design doc ([Link](https://github.com/pytorch/pytorch/issues/37030))
+ * Documentation ([Link](https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html?highlight=distributeddataparallel))

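A minimal sketch of the new constructor flag (assumptions: a single-process Gloo group on CPU and localhost port 29500; a real job would run under a multi-process launcher with one rank per device):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# gradient_as_bucket_view=True makes each param.grad a view into DDP's
# communication buckets, removing the extra gradient copy described above.
model = DDP(torch.nn.Linear(10, 10), gradient_as_bucket_view=True)

model(torch.randn(20, 10)).sum().backward()

dist.destroy_process_group()
```
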

## [Prototype] Windows support for Distributed Training

@@ -241,9 +241,9 @@ dist.init_process_group(

model = DistributedDataParallel(local_model, device_ids=[rank])
```
- * Design doc | [Link](https://github.com/pytorch/pytorch/issues/42095)
- * Documentation | [Link](https://pytorch.org/docs/master/distributed.html#backends-that-come-with-pytorch)
- * Acknowledgement | [gunandrose4u](https://github.com/gunandrose4u)
+ * Design doc ([Link](https://github.com/pytorch/pytorch/issues/42095))
+ * Documentation ([Link](https://pytorch.org/docs/master/distributed.html#backends-that-come-with-pytorch))
+ * Acknowledgement ([gunandrose4u](https://github.com/gunandrose4u))

# Mobile

@@ -275,8 +275,8 @@ c10::CPUCachingAllocator caching_allocator;

**NOTE**: The caching allocator is only available in mobile builds; using it outside of mobile builds has no effect.

- * Documentation | [Link](https://github.com/pytorch/pytorch/blob/master/c10/mobile/CPUCachingAllocator.h#L13-L43)
- * Usage examples [Link](https://github.com/pytorch/pytorch/blob/master/binaries/speed_benchmark_torch.cc#L207)
+ * Documentation ([Link](https://github.com/pytorch/pytorch/blob/master/c10/mobile/CPUCachingAllocator.h#L13-L43))
+ * Usage examples ([Link](https://github.com/pytorch/pytorch/blob/master/binaries/speed_benchmark_torch.cc#L207))

# torchvision
