Ease-of-use quantization for PyTorch with Intel® Neural Compressor
==================================================================

Overview
--------

Most deep learning applications use 32-bit floating-point precision for
inference, but low-precision data types, especially int8, are attracting
more attention because of the significant performance boost they provide.
A key concern in adopting low precision is how to easily mitigate possible
accuracy loss and reach a predefined accuracy requirement.

Intel® Neural Compressor aims to address this concern by extending PyTorch
with accuracy-driven automatic tuning strategies that help users quickly find
the best quantized model on Intel hardware, including Intel Deep Learning
Boost (`Intel DL Boost <https://www.intel.com/content/www/us/en/artificial-intelligence/deep-learning-boost.html>`_)
and Intel Advanced Matrix Extensions (`Intel AMX <https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/intrinsics/intrinsics-for-amx-instructions/intrinsics-for-amx-tile-instructions.html>`_).

Intel® Neural Compressor has been released as an open-source project
on `GitHub <https://github.com/intel/neural-compressor>`_.

Features
--------

- **Ease-of-use Python API:** Intel® Neural Compressor provides simple frontend
  Python APIs and utilities for users to do neural network compression with a
  few lines of code changes. Typically, only five to six clauses need to be
  added to the original code.

- **Quantization:** Intel® Neural Compressor supports accuracy-driven automatic
  tuning of post-training static quantization, post-training dynamic
  quantization, and quantization-aware training on PyTorch FX graph mode and
  eager mode.

*This tutorial mainly focuses on the quantization part. For how to use Intel®
Neural Compressor to do pruning and distillation, please refer to the
corresponding documents in the Intel® Neural Compressor GitHub repo.*

Getting Started
---------------

Installation
~~~~~~~~~~~~

.. code:: bash

    # install stable version from pip
    pip install neural-compressor

    # install nightly version from pip
    pip install -i https://test.pypi.org/simple/ neural-compressor

    # install stable version from conda
    conda install neural-compressor -c conda-forge -c intel

*Supported Python versions are 3.6, 3.7, 3.8, and 3.9.*

Usages
~~~~~~

Only minor code changes are required for users to get started with the Intel®
Neural Compressor quantization API. Both PyTorch FX graph mode and eager mode
are supported.

Intel® Neural Compressor takes an FP32 model and a yaml configuration file as
inputs. To construct the quantization process, users can specify the following
settings via either the yaml configuration file or the Python APIs (see the
sketch after this list for how each setting maps onto the Python API):

1. Calibration Dataloader (needed for static quantization)
2. Evaluation Dataloader
3. Evaluation Metric

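These settings map onto attributes of the `Quantization` object in the Python
API, roughly as sketched below (`calib_loader`, `eval_loader`, and `MyMetric`
are hypothetical placeholders; full runnable examples follow later in this
tutorial):

.. code-block:: python3

    # sketch: the three settings as Python API attributes
    from neural_compressor.experimental import Quantization

    quantizer = Quantization("./conf.yaml")
    quantizer.calib_dataloader = calib_loader   # 1. calibration dataloader (placeholder)
    quantizer.eval_dataloader = eval_loader     # 2. evaluation dataloader (placeholder)
    quantizer.metric = MyMetric()               # 3. evaluation metric, if not set in yaml (placeholder)
    q_model = quantizer()                       # runs the accuracy-driven tuning loop
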
Intel® Neural Compressor supports some popular dataloaders and evaluation
metrics. For how to configure them in the yaml configuration file, refer to
`Built-in Datasets
<https://github.com/intel/neural-compressor/blob/master/docs/dataset.md>`_.

If users want to use a self-developed dataloader or evaluation metric, Intel®
Neural Compressor supports registering a customized dataloader/metric in
Python code, as in the sketch below.

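For example, a self-developed dataset can be wrapped with `common.DataLoader`
to serve as a calibration dataloader (a minimal sketch; `MyDataset` and
`samples` are hypothetical placeholders):

.. code-block:: python3

    # sketch: registering a customized calibration dataloader
    from neural_compressor.experimental import Quantization, common

    class MyDataset(object):
        """Hypothetical map-style dataset yielding (input, label) pairs."""
        def __init__(self, samples):
            # samples: a user-prepared list of (input_tensor, label) pairs
            self.samples = samples

        def __getitem__(self, index):
            return self.samples[index]

        def __len__(self):
            return len(self.samples)

    quantizer = Quantization("./conf.yaml")
    quantizer.calib_dataloader = common.DataLoader(MyDataset(samples))
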
For the yaml configuration file format, please refer to the `yaml template
<https://github.com/intel/neural-compressor/blob/master/neural_compressor/template/ptq.yaml>`_.

In the examples below, the code changes required for *Intel® Neural Compressor*
are highlighted with comments on the line above them.

Model
^^^^^

In this tutorial, the LeNet model is used to demonstrate how to work with
*Intel® Neural Compressor*.

.. code-block:: python3

    # main.py
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # LeNet Model definition
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
            self.conv2_drop = nn.Dropout2d()
            self.fc1 = nn.Linear(320, 50)
            self.fc1_drop = nn.Dropout()
            self.fc2 = nn.Linear(50, 10)

        def forward(self, x):
            x = F.relu(F.max_pool2d(self.conv1(x), 2))
            x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
            x = x.reshape(-1, 320)
            x = F.relu(self.fc1(x))
            x = self.fc1_drop(x)
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)

    model = Net()
    model.load_state_dict(torch.load('./lenet_mnist_model.pth'))

The pretrained model weight `lenet_mnist_model.pth` comes from
`here <https://drive.google.com/drive/folders/1fn83DF14tWmit0RTKWRhPq5uVXt73e0h?usp=sharing>`_.

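Note that if the checkpoint was saved on a GPU but is loaded on a CPU-only
machine, the standard `map_location` argument of `torch.load` can be used
(an optional variant of the loading line above):

.. code-block:: python3

    # optional: remap a GPU-saved checkpoint onto the CPU at load time
    model.load_state_dict(torch.load('./lenet_mnist_model.pth',
                                     map_location=torch.device('cpu')))
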
Accuracy-driven quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Intel® Neural Compressor supports accuracy-driven automatic tuning to generate
the optimal int8 model that meets a predefined accuracy goal.

Below is an example of how to quantize a simple network in PyTorch
`FX graph mode <https://pytorch.org/docs/stable/fx.html>`_ by auto-tuning.

.. code-block:: yaml

    # conf.yaml
    model:
      name: LeNet
      framework: pytorch_fx

    evaluation:
      accuracy:
        metric:
          topk: 1

    tuning:
      accuracy_criterion:
        relative: 0.01

.. code-block:: python3

    # main.py
    model.eval()

    from torchvision import datasets, transforms
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                       ])),
        batch_size=1)

    # launch code for Intel® Neural Compressor
    from neural_compressor.experimental import Quantization
    quantizer = Quantization("./conf.yaml")
    quantizer.model = model
    quantizer.calib_dataloader = test_loader
    quantizer.eval_dataloader = test_loader
    q_model = quantizer()
    q_model.save('./output')

In the `conf.yaml` file, the built-in metric `top1` of Intel® Neural Compressor
is specified as the evaluation method, and a `1%` relative accuracy loss is set
as the accuracy target for auto-tuning. Intel® Neural Compressor traverses the
possible quantization config combinations at the per-op level to find the
optimal int8 model that reaches the predefined accuracy target.

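For clarity, the `relative` criterion in `conf.yaml` can be read roughly as
follows (a minimal sketch of the assumed semantics, not the actual Intel®
Neural Compressor implementation):

.. code-block:: python3

    # sketch: a tuned int8 model is accepted when its accuracy drop relative
    # to the FP32 baseline stays within the configured tolerance (here 1%)
    def meets_accuracy_goal(fp32_acc, int8_acc, relative=0.01):
        return (fp32_acc - int8_acc) / fp32_acc <= relative
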
Besides the built-in metrics, Intel® Neural Compressor also supports customized
metrics defined in Python code:

.. code-block:: yaml

    # conf.yaml
    model:
      name: LeNet
      framework: pytorch_fx

    tuning:
      accuracy_criterion:
        relative: 0.01

.. code-block:: python3

    # main.py
    model.eval()

    from torchvision import datasets, transforms
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                       ])),
        batch_size=1)

    # define a customized metric
    class Top1Metric(object):
        def __init__(self):
            self.correct = 0

        def update(self, output, label):
            pred = output.argmax(dim=1, keepdim=True)
            self.correct += pred.eq(label.view_as(pred)).sum().item()

        def reset(self):
            self.correct = 0

        def result(self):
            return 100. * self.correct / len(test_loader.dataset)

    # launch code for Intel® Neural Compressor
    from neural_compressor.experimental import Quantization
    quantizer = Quantization("./conf.yaml")
    quantizer.model = model
    quantizer.calib_dataloader = test_loader
    quantizer.eval_dataloader = test_loader
    quantizer.metric = Top1Metric()
    q_model = quantizer()
    q_model.save('./output')

In the above example, a class that implements the `update()`, `reset()`, and
`result()` functions is defined to record the per-mini-batch results and to
calculate the final accuracy at the end of evaluation.

Quantization-aware training
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Besides post-training static quantization and post-training dynamic
quantization, Intel® Neural Compressor supports quantization-aware training
with an accuracy-driven automatic tuning mechanism.

Below is an example of how to do quantization-aware training on a simple
network in PyTorch `FX graph mode <https://pytorch.org/docs/stable/fx.html>`_.

.. code-block:: yaml

    # conf.yaml
    model:
      name: LeNet
      framework: pytorch_fx

    quantization:
      approach: quant_aware_training

    evaluation:
      accuracy:
        metric:
          topk: 1

    tuning:
      accuracy_criterion:
        relative: 0.01

.. code-block:: python3

    # main.py
    model.eval()

    from torchvision import datasets, transforms
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./data', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=1)

    import torch.optim as optim
    optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.1)

    def training_func(model):
        model.train()
        for epoch in range(1, 3):
            for batch_idx, (data, target) in enumerate(train_loader):
                optimizer.zero_grad()
                output = model(data)
                loss = F.nll_loss(output, target)
                loss.backward()
                optimizer.step()
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.item()))

    # launch code for Intel® Neural Compressor
    from neural_compressor.experimental import Quantization
    quantizer = Quantization("./conf.yaml")
    quantizer.model = model
    quantizer.q_func = training_func
    quantizer.eval_dataloader = test_loader
    q_model = quantizer()
    q_model.save('./output')

Performance-only quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Intel® Neural Compressor supports directly yielding an int8 model from a dummy
dataset for performance benchmarking purposes.

Below is an example of how to quantize a simple network in PyTorch
`FX graph mode <https://pytorch.org/docs/stable/fx.html>`_ with a dummy dataset.

.. code-block:: yaml

    # conf.yaml
    model:
      name: lenet
      framework: pytorch_fx

.. code-block:: python3

    # main.py
    model.eval()

    # launch code for Intel® Neural Compressor
    from neural_compressor.experimental import Quantization, common
    from neural_compressor.experimental.data.datasets.dummy_dataset import DummyDataset
    quantizer = Quantization("./conf.yaml")
    quantizer.model = model
    quantizer.calib_dataloader = common.DataLoader(DummyDataset([(1, 1, 28, 28)]))
    q_model = quantizer()
    q_model.save('./output')

Quantization outputs
~~~~~~~~~~~~~~~~~~~~

Users can see how many ops were quantized from the log printed by Intel®
Neural Compressor, as shown below:

::

    2021-12-08 14:58:35 [INFO] |********Mixed Precision Statistics*******|
    2021-12-08 14:58:35 [INFO] +------------------------+--------+-------+
    2021-12-08 14:58:35 [INFO] |         Op Type        | Total  |  INT8 |
    2021-12-08 14:58:35 [INFO] +------------------------+--------+-------+
    2021-12-08 14:58:35 [INFO] |  quantize_per_tensor   |   2    |   2   |
    2021-12-08 14:58:35 [INFO] |         Conv2d         |   2    |   2   |
    2021-12-08 14:58:35 [INFO] |       max_pool2d       |   1    |   1   |
    2021-12-08 14:58:35 [INFO] |          relu          |   1    |   1   |
    2021-12-08 14:58:35 [INFO] |       dequantize       |   2    |   2   |
    2021-12-08 14:58:35 [INFO] |       LinearReLU       |   1    |   1   |
    2021-12-08 14:58:35 [INFO] |         Linear         |   1    |   1   |
    2021-12-08 14:58:35 [INFO] +------------------------+--------+-------+

The quantized model will be generated under the `./output` directory, which
contains two files:

1. best_configure.yaml
2. best_model_weights.pt

The first file contains the quantization configuration of each op; the second
file contains the int8 weights as well as the zero point and scale info of the
activations.

Deployment
~~~~~~~~~~

Users can use the code below to load the quantized model and then run
inference or a performance benchmark.

.. code-block:: python3

    from neural_compressor.utils.pytorch import load
    int8_model = load('./output', model)

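As a usage sketch (reusing the `test_loader` defined earlier in this tutorial),
the loaded int8 model can then be run like any other PyTorch module:

.. code-block:: python3

    # sketch: run inference with the loaded int8 model
    int8_model.eval()
    with torch.no_grad():
        for data, _ in test_loader:
            output = int8_model(data)
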
Tutorials
---------

Please visit the `Intel® Neural Compressor GitHub repo
<https://github.com/intel/neural-compressor>`_ for more tutorials.