
Commit c4210c5

Merge branch 'main' of github.com:pytorch/tutorials into sphinx-tv-tutorial
2 parents: 25efd19 + 32d8341

4 files changed: +101 −30 lines

intermediate_source/torchvision_tutorial.rst

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ adding new custom datasets. The dataset should inherit from the standard
 The only specificity that we require is that the dataset ``__getitem__``
 should return a tuple:
 
-- image: :class:`torchvision.datapoints.Image` of shape ``[3, H, W]`` or a PIL Image of size ``(H, W)``
+- image: :class:`torchvision.datapoints.Image` of shape ``[3, H, W]``, a pure tensor, or a PIL Image of size ``(H, W)``
 - target: a dict containing the following fields
 
   - ``boxes``, :class:`torchvision.datapoints.BoundingBoxes` of shape ``[N, 4]``:
@@ -105,7 +105,7 @@ built-in transformations (`new Transforms API <https://pytorch.org/vision/stable
 for the given object detection and segmentation task.
 Namely, image tensors will be wrapped by :class:`torchvision.datapoints.Image`, bounding boxes into
 :class:`torchvision.datapoints.BoundingBoxes` and masks into :class:`torchvision.datapoints.Mask`.
-As datapoints are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit plain
+As datapoints are :class:`torch.Tensor` subclasses, wrapped objects are also tensors and inherit the plain
 :class:`torch.Tensor` API. For more information about torchvision datapoints see
 `this documentation <https://pytorch.org/vision/main/auto_examples/v2_transforms/plot_transforms_v2.html#sphx-glr-auto-examples-v2-transforms-plot-transforms-v2-py>`_.
 
@@ -151,7 +151,7 @@ As datapoints are :class:`torch.Tensor` subclasses, wrapped objects are also ten
         # there is only one class
         labels = torch.ones((num_objs,), dtype=torch.int64)
 
-        image_id = torch.tensor([idx])
+        image_id = idx
         area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
         # suppose all instances are not crowd
         iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
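
For context, here is a minimal sketch of a ``__getitem__`` that matches the updated contract above. The toy dataset, the random image, and the single hard-coded box are illustrative assumptions only, and the last lines assume a torchvision build that ships the beta ``datapoints`` module referenced by the tutorial.

.. code-block:: python

    import torch
    from torch.utils.data import Dataset
    from torchvision import datapoints  # beta API referenced by the tutorial

    class ToyDetectionDataset(Dataset):
        def __len__(self):
            return 1

        def __getitem__(self, idx):
            img = torch.rand(3, 480, 640)                       # a pure tensor of shape [3, H, W] is now accepted
            boxes = torch.tensor([[10.0, 20.0, 110.0, 220.0]])  # [N, 4] in (x1, y1, x2, y2)
            num_objs = boxes.shape[0]
            target = {
                "boxes": boxes,
                "labels": torch.ones((num_objs,), dtype=torch.int64),  # there is only one class
                "image_id": idx,                                        # plain int, no longer torch.tensor([idx])
                "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
                "iscrowd": torch.zeros((num_objs,), dtype=torch.int64),
            }
            return img, target

    # datapoints are torch.Tensor subclasses, so a wrapped image keeps the plain Tensor API
    img, target = ToyDetectionDataset()[0]
    wrapped = datapoints.Image(img)
    assert isinstance(wrapped, torch.Tensor)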

prototype_source/pt2e_quant_ptq_static.rst

Lines changed: 56 additions & 26 deletions
@@ -22,27 +22,27 @@ this:
                  \                    /
                   \                  /
     —-------------------------------------------------------
-    |                  Dynamo Export                        |
+    |                      Export                           |
     —-------------------------------------------------------
                              |
                      FX Graph in ATen     XNNPACKQuantizer,
                              |            or X86InductorQuantizer,
                              |            or <Other Backend Quantizer>
                              |                /
     —--------------------------------------------------------
-    |                   prepare_pt2e                         |
+    |                     prepare_pt2e                       |
     —--------------------------------------------------------
                              |
                       Calibrate/Train
                              |
     —--------------------------------------------------------
-    |                   convert_pt2e                         |
+    |                     convert_pt2e                       |
     —--------------------------------------------------------
                              |
                    Reference Quantized Model
                              |
     —--------------------------------------------------------
-    |                     Lowering                           |
+    |                       Lowering                         |
     —--------------------------------------------------------
                              |
            Executorch, or Inductor, or <Other Backends>
@@ -53,6 +53,7 @@ The PyTorch 2.0 export quantization API looks like this:

 .. code:: python

     import torch
+    from torch._export import capture_pre_autograd_graph
     class M(torch.nn.Module):
        def __init__(self):
           super().__init__()
@@ -66,7 +67,9 @@ The PyTorch 2.0 export quantization API looks like this:
       m = M().eval()

       # Step 1. program capture
-      m = torch._dynamo.export(m, *example_inputs, aten_graph=True)
+      # NOTE: this API will be updated to the torch.export API in the future, but the captured
+      # result should mostly stay the same
+      m = capture_pre_autograd_graph(m, *example_inputs)
       # we get a model with aten ops

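To make the snippet above concrete, here is a minimal end-to-end sketch of the capture, prepare, calibrate, and convert steps from the diagram, using the ``XNNPACKQuantizer`` that appears later in this diff. The toy module and the random calibration data are illustrative assumptions, and the import paths reflect the PyTorch 2.1-era prototype and may move.

.. code-block:: python

    import torch
    from torch._export import capture_pre_autograd_graph
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    from torch.ao.quantization.quantizer.xnnpack_quantizer import (
        XNNPACKQuantizer,
        get_symmetric_quantization_config,
    )

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(5, 10)

        def forward(self, x):
            return self.linear(x)

    example_inputs = (torch.randn(1, 5),)
    m = M().eval()

    # Step 1. program capture (short-term API, see the note in the diff above)
    m = capture_pre_autograd_graph(m, example_inputs)

    # Step 2. pick a backend-specific quantizer and insert observers
    quantizer = XNNPACKQuantizer()
    quantizer.set_global(get_symmetric_quantization_config())
    m = prepare_pt2e(m, quantizer)

    # Step 3. calibrate with representative data (random here, real data in practice)
    for _ in range(4):
        m(*example_inputs)

    # Step 4. convert the calibrated model to the reference quantized model
    m = convert_pt2e(m)
    print(m)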
@@ -186,8 +189,6 @@ and rename it to ``data/resnet18_pretrained_float.pth``.
     import numpy as np

     import torch
-    from torch.ao.quantization import get_default_qconfig, QConfigMapping
-    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx, fuse_fx
     import torch.nn as nn
     from torch.utils.data import DataLoader

@@ -352,10 +353,16 @@ Here is how you can use ``torch.export`` to export the model:

 .. code-block:: python

-   import torch._dynamo as torchdynamo
+   from torch._export import capture_pre_autograd_graph

    example_inputs = (torch.rand(2, 3, 224, 224),)
-   exported_model, _ = torchdynamo.export(model_to_quantize, *example_inputs, aten_graph=True, tracing_mode="symbolic")
+   exported_model = capture_pre_autograd_graph(model_to_quantize, example_inputs)
+   # or capture with dynamic dimensions
+   # from torch._export import dynamic_dim
+   # exported_model = capture_pre_autograd_graph(model_to_quantize, example_inputs, constraints=[dynamic_dim(example_inputs[0], 0)])
+
+
+``capture_pre_autograd_graph`` is a short-term API; it will be updated to use the official ``torch.export`` API when that is ready.


 Import the Backend Specific Quantizer and Configure how to Quantize the Model
@@ -429,24 +436,47 @@ Convert the Calibrated Model to a Quantized Model

    quantized_model = convert_pt2e(prepared_model)
    print(quantized_model)

-.. note::
-   the model produced here also has some improvements over the previous
-   `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_ in FX graph mode quantization: previously, all quantized operators were represented as ``dequantize -> fp32_op -> quantize``; in the new flow, we choose to represent some of the operators with integer computation so that it is closer to the computation that happens in hardware.
-   For example, here is how we plan to represent a quantized linear operator:
+At this step, we currently have two representations that you can choose from, but the exact representation
+we offer in the long term might change based on feedback from PyTorch users.
+
+* Q/DQ Representation (default)
+
+  As in the previous `representations <https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md>`_, all quantized operators are represented as ``dequantize -> fp32_op -> quantize``.

-   .. code-block:: python
+  .. code-block:: python
+
+    def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_fp32, output_scale, output_zero_point):
+        x_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
+            x_int8, x_scale, x_zero_point, x_quant_min, x_quant_max, torch.int8)
+        weight_fp32 = torch.ops.quantized_decomposed.dequantize_per_tensor(
+            weight_int8, weight_scale, weight_zero_point, weight_quant_min, weight_quant_max, torch.int8)
+        weight_permuted = torch.ops.aten.permute_copy.default(weight_fp32, [1, 0])
+        out_fp32 = torch.ops.aten.addmm.default(bias_fp32, x_fp32, weight_permuted)
+        out_int8 = torch.ops.quantized_decomposed.quantize_per_tensor(
+            out_fp32, output_scale, output_zero_point, out_quant_min, out_quant_max, torch.int8)
+        return out_int8
+
+* Reference Quantized Model Representation (WIP, expected to be ready at the end of August): we have a special representation for selected ops (for example, quantized linear); other ops are represented as ``dq -> float32_op -> q``, and ``q/dq`` are decomposed into more primitive operators.
+
+  You can get this representation by using ``convert_pt2e(..., use_reference_representation=True)``.
+
+  .. code-block:: python
+
+    # Reference Quantized Pattern for quantized linear
+    def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_fp32, output_scale, output_zero_point):
+        x_int16 = x_int8.to(torch.int16)
+        weight_int16 = weight_int8.to(torch.int16)
+        acc_int32 = torch.ops.out_dtype(torch.mm, torch.int32, (x_int16 - x_zero_point), (weight_int16 - weight_zero_point))
+        bias_scale = x_scale * weight_scale
+        bias_int32 = torch.ops.out_dtype(torch.ops.aten.div.Tensor, torch.int32, bias_fp32, bias_scale)
+        acc_int32 = acc_int32 + bias_int32
+        acc_int32 = torch.ops.out_dtype(torch.ops.aten.mul.Scalar, torch.int32, acc_int32, x_scale * weight_scale / output_scale) + output_zero_point
+        out_int8 = torch.ops.aten.clamp(acc_int32, qmin, qmax).to(torch.int8)
+        return out_int8

-   def quantized_linear(x_int8, x_scale, x_zero_point, weight_int8, weight_scale, weight_zero_point, bias_int32, bias_scale, bias_zero_point, output_scale, output_zero_point):
-       x_int16 = x_int8.to(torch.int16)
-       weight_int16 = weight_int8.to(torch.int16)
-       acc_int32 = torch.ops.out_dtype(torch.mm, torch.int32, (x_int16 - x_zero_point), (weight_int16 - weight_zero_point))
-       acc_rescaled_int32 = torch.ops.out_dtype(torch.ops.aten.mul.Scalar, torch.int32, acc_int32, x_scale * weight_scale / output_scale)
-       bias_int32 = torch.ops.out_dtype(torch.ops.aten.mul.Scalar, bias_int32 - bias_zero_point, bias_scale / output_scale)
-       out_int8 = torch.ops.aten.clamp(acc_rescaled_int32 + bias_int32 + output_zero_point, qmin, qmax).to(torch.int8)
-       return out_int8

-   For more details, please see:
-   `Quantized Model Representation <https://docs.google.com/document/d/17h-OEtD4o_hoVuPqUFsdm5uo7psiNMY8ThN03F9ZZwg/edit>`_.
+See `here <https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/pt2e/representation/rewrite.py>`_ for the most up-to-date reference representations.


 Checking Model Size and Accuracy Evaluation
@@ -503,9 +533,9 @@ We'll show how to save and load the quantized model.
    # Rerun all steps to get a quantized model
    model_to_quantize = load_model(saved_model_dir + float_model_file).to("cpu")
    model_to_quantize.eval()
-   import torch._dynamo as torchdynamo
+   from torch._export import capture_pre_autograd_graph

-   exported_model, _ = torchdynamo.export(model_to_quantize, *copy.deepcopy(example_inputs), aten_graph=True, tracing_mode="symbolic")
+   exported_model = capture_pre_autograd_graph(model_to_quantize, example_inputs)
   from torch.ao.quantization.quantizer.xnnpack_quantizer import (
       XNNPACKQuantizer,
       get_symmetric_quantization_config,

prototype_source/pt2e_quantizer.rst

Lines changed: 38 additions & 1 deletion
@@ -9,13 +9,15 @@ Prerequisites:
 ^^^^^^^^^^^^^^^^
 
 Required:
+
 - `Torchdynamo concepts in PyTorch <https://pytorch.org/docs/stable/dynamo/index.html>`__
 
 - `Quantization concepts in PyTorch <https://pytorch.org/docs/master/quantization.html#quantization-api-summary>`__
 
 - `(prototype) PyTorch 2.0 Export Post Training Static Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html>`__
 
 Optional:
+
 - `FX Graph Mode post training static quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html>`__
 
 - `BackendConfig in PyTorch Quantization FX Graph Mode <https://pytorch.org/tutorials/prototype/backend_config_tutorial.html?highlight=backend>`__
@@ -141,7 +143,42 @@ parameters can be shared among some tensors explicitly. Two typical use cases ar
 
 ``SharedQuantizationSpec`` is designed for this use case to annotate tensors whose quantization
 parameters are shared with other tensors. Input of ``SharedQuantizationSpec`` is an ``EdgeOrNode`` object which
-can be an input edge or an output value.
+can be an input edge or an output value.
+
+.. note::
+
+   * Sharing is transitive
+
+     Some tensors might be effectively using shared quantization spec due to:
+
+     * Two nodes/edges are configured to use ``SharedQuantizationSpec``.
+     * There is existing sharing of some nodes.
+
+     For example, let's say we have two ``conv`` nodes ``conv1`` and ``conv2``, and both of them are fed into a ``cat``
+     node: ``cat([conv1_out, conv2_out], ...)``. Let's say the output of ``conv1``, ``conv2``, and the first input of ``cat`` are configured
+     with the same configurations of ``QuantizationSpec``. The second input of ``cat`` is configured to use ``SharedQuantizationSpec``
+     with the first input.
+
+     .. code-block::
+
+        conv1_out: qspec1(dtype=torch.int8, ...)
+        conv2_out: qspec1(dtype=torch.int8, ...)
+        cat_input0: qspec1(dtype=torch.int8, ...)
+        cat_input1: SharedQuantizationSpec((conv1, cat))  # conv1 node is the first input of cat
+
+     First of all, the output of ``conv1`` is implicitly sharing quantization parameters (and observer object)
+     with the first input of ``cat``, and the same is true for the output of ``conv2`` and the second input of ``cat``.
+     Therefore, since the user configures the two inputs of ``cat`` to share quantization parameters, by transitivity,
+     ``conv2_out`` and ``conv1_out`` will also be sharing quantization parameters. In the observed graph, you
+     will see the following:
+
+     .. code-block::
+
+        conv1 -> obs -> cat
+        conv2 -> obs /
+
+     and both ``obs`` will be the same observer instance.
 
 - Input edge is the connection between input node and the node consuming the input,
   so it's a ``Tuple[Node, Node]``.
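
As a rough illustration of how the ``cat`` example above could be expressed inside a quantizer's ``annotate`` method, here is a sketch only: ``conv1_node``, ``conv2_node``, and ``cat_node`` are assumed to have already been located in the captured FX graph, the ``HistogramObserver`` choice is arbitrary, and attaching a ``QuantizationAnnotation`` to ``node.meta`` follows the convention used by the prototype PT2E flow.

.. code-block:: python

    import torch
    from torch.ao.quantization.observer import HistogramObserver
    from torch.ao.quantization.quantizer import (
        QuantizationAnnotation,
        QuantizationSpec,
        SharedQuantizationSpec,
    )

    int8_qspec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=HistogramObserver,
    )

    # conv outputs (and therefore the cat inputs they feed) get ordinary int8 specs
    conv1_node.meta["quantization_annotation"] = QuantizationAnnotation(output_qspec=int8_qspec, _annotated=True)
    conv2_node.meta["quantization_annotation"] = QuantizationAnnotation(output_qspec=int8_qspec, _annotated=True)

    # the second input of cat shares quantization parameters with the first input edge (conv1 -> cat)
    cat_node.meta["quantization_annotation"] = QuantizationAnnotation(
        input_qspec_map={
            conv1_node: int8_qspec,
            conv2_node: SharedQuantizationSpec((conv1_node, cat_node)),
        },
        _annotated=True,
    )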

recipes_source/recipes/tuning_guide.py

Lines changed: 4 additions & 0 deletions
@@ -295,6 +295,10 @@ def fused_gelu(x):
 torch._C._jit_set_autocast_mode(False)
 
 with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False, dtype=torch.bfloat16):
+    # Conv-BatchNorm folding for CNN-based Vision Models should be done with ``torch.fx.experimental.optimization.fuse`` when AMP is used
+    import torch.fx.experimental.optimization as optimization
+    # Please note that optimization.fuse need not be called when AMP is not used
+    model = optimization.fuse(model)
     model = torch.jit.trace(model, (example_input))
     model = torch.jit.freeze(model)
     # a couple of warm-up runs
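
For reference, here is a self-contained sketch of the pattern added above. The ``resnet50`` model and the input shape are illustrative stand-ins, since the recipe itself does not fix a particular model.

.. code-block:: python

    import torch
    import torch.fx.experimental.optimization as optimization
    import torchvision.models as models

    model = models.resnet50(weights=None).eval()
    example_input = torch.rand(1, 3, 224, 224)

    with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False, dtype=torch.bfloat16):
        # fold Conv-BatchNorm pairs explicitly; per the note above, this is needed when AMP is used
        model = optimization.fuse(model)
        model = torch.jit.trace(model, (example_input,))
        model = torch.jit.freeze(model)
        # a couple of warm-up runs so the JIT can apply its optimizations
        for _ in range(3):
            model(example_input)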
