
Commit 3e664f3

Author: Svetlana Karslioglu

Merge branch 'main' into patch-ray-tune

2 parents 99763ca + 9614e02, commit 3e664f3

File tree: 1 file changed (+85, -213)

prototype_source/backend_config_tutorial.rst

Lines changed: 85 additions & 213 deletions
@@ -11,145 +11,6 @@ For more information on the motivation and implementation details behind
BackendConfig, please refer to this
`README <https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/backend_config>`__.

- BackendConfig API Specification
- -------------------------------
-
- On a high level, BackendConfig specifies the quantization behavior for
- each supported operator pattern (e.g. linear, conv-bn-relu). The API is
- broken down into the following class hierarchy:
-
- - `BackendConfig <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.BackendConfig.html>`__:
-   The main class to be passed to prepare and convert functions.
- - `BackendPatternConfig <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.BackendPatternConfig.html>`__:
-   Config object that specifies quantization behavior for a given
-   operator pattern. Each BackendConfig consists of many of these.
- - `DTypeConfig <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.DTypeConfig.html>`__:
-   Config object that specifies the supported data types passed as
-   arguments to quantize ops in the reference model spec, for input
-   and output activations, weights, and biases. This object also
-   optionally specifies constraints associated with the data types.
-   Each BackendPatternConfig consists of one or more of these.
- - `DTypeWithConstraints <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.DTypeWithConstraints.html>`__:
-   Constraints imposed by the backend on the quantization parameters
-   (scale and zero point) and ranges when quantizing to a given data
-   type. Each DTypeConfig consists of many of these.
-
- The pattern specified in BackendPatternConfig follows the format
- described `here <https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/backend_config/README.md#pattern-specification>`__.
-
- BackendPatternConfig Specification
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- set_observation_type
- ^^^^^^^^^^^^^^^^^^^^
-
- Observation type here refers to how observers (or quant-dequant ops)
- will be placed in the graph. This is used to produce the desired
- reference patterns understood by the backend. Weighted ops such as
- linear and conv require different observers (or quantization parameters
- passed to quantize ops in the reference model) for the input and the
- output (see `ObservationType <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.ObservationType.html>`__).
-
- Note: This will be renamed in the near future, since we will soon insert
- QuantDeQuantStubs with observers (and fake quantizes) attached instead
- of observers themselves.
-
- set_dtype_configs / add_dtype_config
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- Each operator pattern may support one or more sets of
- input/output/weight/bias data types, and each set may have its own
- constraints. These requirements are captured in DTypeConfigs, which will
- be described in more detail in the next section.
-
- set_root_module / set_reference_quantized_module
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- When we construct the reference quantized model during the convert
- phase, the root modules (e.g. ``torch.nn.Linear`` for
- ``torch.ao.nn.intrinsic.LinearReLU``) will be swapped to the
- corresponding reference quantized modules (e.g.
- ``torch.ao.nn.quantized.reference.Linear``). This allows custom backends
- to specify custom reference quantized module implementations to match
- the numerics of their lowered operators. Since this is a one-to-one
- mapping, both the root module and the reference quantized module must be
- specified in the same BackendPatternConfig in order for the conversion
- to take place.
-
- set_fuser_method
- ^^^^^^^^^^^^^^^^
-
- As an optimization, operator patterns such as (``torch.nn.Linear``,
- ``torch.nn.ReLU``) may be fused into ``nni.LinearReLU``.
- ``set_fuser_method`` specifies the function through which this is
- performed. The first argument of this function is ``is_qat``, and the
- rest of the arguments are the items in the tuple pattern, e.g. the fuser
- method for the above pattern will have three arguments, ``is_qat``,
- ``linear``, and ``relu``. See `this
- example <https://gist.github.com/jerryzh168/8bea7180a8ba3c279f2c9b050f2a69a6>`__
- for a slightly more complicated usage.
-
- set_fused_module
- ^^^^^^^^^^^^^^^^
-
- This is used to identify fused weighted modules (e.g.
- ``torch.ao.nn.intrinsic.LinearReLU``) that need to be converted to
- reference quantized modules.
-
- Data Type Restrictions
- ~~~~~~~~~~~~~~~~~~~~~~
-
- Each DTypeConfig attached to a BackendPatternConfig represents a set of
- supported data types passed as arguments to quantize ops in the reference
- model spec. For example, consider the following reference model::
-
-   quant1 - [dequant1 - fp32_linear - quant2] - dequant2
-
- The pattern in the square brackets refers to the reference pattern of
- statically quantized linear. Setting the input dtype as `torch.quint8`
- in the DTypeConfig means we pass in `torch.quint8` as the dtype argument
- to the first quantize op (quant1). Similarly, setting the output dtype as
- `torch.quint8` means we pass in `torch.quint8` as the dtype argument to
- the second quantize op (quant2).
-
- Note that the dtype here does not refer to the interface dtypes of the
- op. For example, the "input dtype" here is not the dtype of the input
- tensor passed to the quantized linear op. Though it can still be the
- same as the interface dtype, this is not always the case, e.g. the
- interface dtype is fp32 in dynamic quantization but the "input dtype"
- specified in the DTypeConfig would still be quint8. The semantics of
- dtypes here are the same as the semantics of the dtypes specified in
- the observers.
-
- These dtypes are matched against the ones specified in the user's
- QConfig. If there is a match, and the QConfig satisfies the constraints
- specified in the DTypeConfig (if any), then we will quantize the given
- pattern using this DTypeConfig. Otherwise, the QConfig is ignored and
- the pattern will not be quantized.
-
- There are two ways of specifying ``input_dtype``, ``output_dtype``, and
- ``weight_dtype``: as a simple ``torch.dtype`` or as
- ``DTypeWithConstraints``. The constraints currently supported are:
-
- - **quant_min_lower_bound** and **quant_max_upper_bound**: Lower and upper
-   bounds for the minimum and maximum quantized values respectively. If the
-   QConfig's ``quant_min`` and ``quant_max`` fall outside this range, then
-   the QConfig will be ignored.
- - **scale_min_lower_bound** and **scale_max_upper_bound**: Lower and
-   upper bounds for the minimum and maximum scale values respectively. If
-   the QConfig's minimum scale value (currently exposed as ``eps``) falls
-   below the lower bound, then the QConfig will be ignored. Note that the
-   upper bound is currently not enforced.
- - **scale_exact_match** and **zero_point_exact_match**: Exact match
-   requirements for scale and zero point, to be used for operators with
-   fixed quantization parameters such as sigmoid and tanh. If the observer
-   specified in the QConfig is neither ``FixedQParamsObserver`` nor
-   ``FixedQParamsFakeQuantize``, or if the quantization parameters don't
-   match, then the QConfig will be ignored.
-
- End-to-End Example
- ------------------
-
Suppose we are a backend developer and we wish to integrate our backend
with PyTorch's quantization APIs. Our backend consists of two ops only:
quantized linear and quantized conv-relu. In this section, we will walk
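The DTypeConfig and DTypeWithConstraints classes described in the removed specification above compose roughly as follows. This is an illustrative sketch only: the bound values are placeholders, and ``quint8_with_constraints`` is simply the name the tutorial's example uses for such an object.

.. code:: ipython3

    import torch
    from torch.ao.quantization.backend_config import DTypeConfig, DTypeWithConstraints

    # Placeholder constraints for illustration: quantized values must stay in
    # [0, 255] and the minimum scale (eps) must be at least 2**-12
    quint8_with_constraints = DTypeWithConstraints(
        dtype=torch.quint8,
        quant_min_lower_bound=0,
        quant_max_upper_bound=255,
        scale_min_lower_bound=2 ** -12,
    )

    # input/output/weight dtypes may be plain torch.dtype or DTypeWithConstraints
    example_dtype_config = DTypeConfig(
        input_dtype=quint8_with_constraints,
        output_dtype=quint8_with_constraints,
        weight_dtype=torch.qint8,
        bias_dtype=torch.float,
    )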
@@ -175,6 +36,9 @@ BackendConfig through `prepare_fx` and `convert_fx`.
    )
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

+ 1. Derive reference pattern for each quantized operator
+ --------------------------------------------------------
+
For quantized linear, suppose our backend expects the reference pattern
`[dequant - fp32_linear - quant]` and lowers it into a single quantized
linear op. The way to achieve this is to first insert quant-dequant ops
@@ -183,17 +47,21 @@ reference model::

  quant1 - [dequant1 - fp32_linear - quant2] - dequant2

- Here we specify using different observers (will be renamed) for the input
- and output for the linear op, so the quantization params passed to the two
- quantize ops (quant1 and quant2) will be different. This is commonly the
- case for weighted ops like linear and conv.
+ Similarly, for quantized conv-relu, we wish to produce the following
+ reference model, where the reference pattern in the square brackets will
+ be lowered into a single quantized conv-relu op::
+
+   quant1 - [dequant1 - fp32_conv_relu - quant2] - dequant2

- The input dtype specified in the DTypeConfig will be passed as the dtype
- argument to quant1, while the output dtype will be passed as the dtype
- argument to quant2. If the output dtype is fp32, as in the case of dynamic
- quantization, then the output quant-dequant pair will not be inserted.
- This example also shows how to specify restrictions on quantization and
- scale ranges on a particular dtype.
+ 2. Set DTypeConfigs with backend constraints
+ ---------------------------------------------
+
+ In the reference patterns above, the input dtype specified in the
+ DTypeConfig will be passed as the dtype argument to quant1, while the
+ output dtype will be passed as the dtype argument to quant2. If the output
+ dtype is fp32, as in the case of dynamic quantization, then the output
+ quant-dequant pair will not be inserted. This example also shows how to
+ specify restrictions on quantization and scale ranges on a particular dtype.

.. code:: ipython3
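As an aside on the fp32-output case mentioned above, a dynamic-quantization variant of a DTypeConfig would keep the output dtype as fp32 and mark the config as dynamic, so no output quant-dequant pair is inserted. The snippet below is a minimal sketch for illustration and is not part of the backend defined in this tutorial.

.. code:: ipython3

    import torch
    from torch.ao.quantization.backend_config import DTypeConfig

    # Illustrative dynamic-quantization DTypeConfig: the "input dtype" is still
    # quint8 (the dtype passed to the quantize op), but the interface and the
    # output stay fp32, so only input quant-dequant ops are inserted
    dynamic_int8_dtype_config = DTypeConfig(
        input_dtype=torch.quint8,
        output_dtype=torch.float,
        weight_dtype=torch.qint8,
        bias_dtype=torch.float,
        is_dynamic=True,
    )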
@@ -211,6 +79,38 @@ scale ranges on a particular dtype.
        weight_dtype=torch.qint8,
        bias_dtype=torch.float)

+ 3. Set up fusion for conv-relu
+ -------------------------------
+
+ Note that the original user model contains separate conv and relu ops,
+ so we need to first fuse the conv and relu ops into a single conv-relu
+ op (`fp32_conv_relu`), and then quantize this op similar to how the linear
+ op is quantized. We can set up fusion by defining a function that accepts
+ 3 arguments, where the first is whether or not this is for QAT, and the
+ remaining arguments refer to the individual items of the fused pattern.
+
+ .. code:: ipython3
+
+     def fuse_conv2d_relu(is_qat, conv, relu):
+         """Return a fused ConvReLU2d from individual conv and relu modules."""
+         return torch.ao.nn.intrinsic.ConvReLU2d(conv, relu)
+
+ 4. Define the BackendConfig
+ ----------------------------
+
+ Now we have all the necessary pieces, so we go ahead and define our
+ BackendConfig. Here we use different observers (will be renamed) for
+ the input and output for the linear op, so the quantization params
+ passed to the two quantize ops (quant1 and quant2) will be different.
+ This is commonly the case for weighted ops like linear and conv.
+
+ For the conv-relu op, the observation type is the same. However, we
+ need two BackendPatternConfigs to support this op, one for fusion
+ and one for quantization. For both conv-relu and linear, we use the
+ DTypeConfig defined above.
+
+ .. code:: ipython3
+
    linear_config = BackendPatternConfig() \
        .set_pattern(torch.nn.Linear) \
        .set_observation_type(ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT) \
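To make the fuser-method convention above concrete beyond the two-item case, a hypothetical fuser for a three-item (Conv2d, BatchNorm2d, ReLU) pattern might look like the sketch below. ``fuse_conv_bn_relu`` is not part of this tutorial; it is shown only to illustrate the ``is_qat`` plus one-argument-per-pattern-item signature.

.. code:: ipython3

    import torch.ao.nn.intrinsic as nni
    from torch.nn.utils.fusion import fuse_conv_bn_eval

    def fuse_conv_bn_relu(is_qat, conv, bn, relu):
        """Hypothetical fuser for a (Conv2d, BatchNorm2d, ReLU) pattern."""
        if is_qat:
            # For QAT, keep the batchnorm as a submodule of the fused module
            return nni.ConvBnReLU2d(conv, bn, relu)
        # For post-training quantization, fold the batchnorm stats into the conv
        return nni.ConvReLU2d(fuse_conv_bn_eval(conv, bn), relu)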
@@ -219,24 +119,6 @@ scale ranges on a particular dtype.
        .set_qat_module(torch.nn.qat.Linear) \
        .set_reference_quantized_module(torch.ao.nn.quantized.reference.Linear)

- For quantized conv-relu, the observation type and DTypeConfig settings
- are the same, since we wish to produce the following reference model,
- where the reference pattern in the square brackets will be lowered into
- a single quantized conv-relu op::
-
-   quant1 - [dequant1 - fp32_conv_relu - quant2] - dequant2
-
- However, first we need to fuse the conv and relu ops into a single
- conv-relu op (`fp32_conv_relu`), and then quantize this op similar to
- how the linear op is quantized. Thus, we need two BackendPatternConfigs
- to support this op, one for fusion and one for quantization:
-
- .. code:: ipython3
-
-     def fuse_conv2d_relu(is_qat, conv, relu):
-         """Return a fused ConvReLU2d from individual conv and relu modules."""
-         return torch.ao.nn.intrinsic.ConvReLU2d(conv, relu)
-
    # For fusing Conv2d + ReLU into ConvReLU2d
    # No need to set observation type and dtype config here, since we are not
    # inserting quant-dequant ops in this step yet
@@ -254,23 +136,43 @@ to support this op, one for fusion and one for quantization:
        .set_qat_module(torch.ao.nn.intrinsic.qat.ConvReLU2d) \
        .set_reference_quantized_module(torch.ao.nn.quantized.reference.Conv2d)

- Now we have all the necessary pieces, so we go ahead and define our
- BackendConfig and test it out on an example model. Here we see that
- both linear and conv-relu are quantized.
-
- .. code:: ipython3
-
    backend_config = BackendConfig("my_backend") \
        .set_backend_pattern_config(linear_config) \
        .set_backend_pattern_config(conv_relu_config) \
        .set_backend_pattern_config(fused_conv_relu_config)

+ 5. Set up QConfigMapping that satisfies the backend constraints
+ ----------------------------------------------------------------
+
+ In order to use the ops defined above, the user must define a QConfig
+ that satisfies the constraints specified in the DTypeConfig. For more
+ detail, see the documentation for `DTypeConfig <https://pytorch.org/docs/stable/generated/torch.ao.quantization.backend_config.DTypeConfig.html>`__.
+ We will then use this QConfig for all the modules used in the patterns
+ we wish to quantize.
+
+ .. code:: ipython3
+
+     # Note: Here we use a quant_max of 127, but this could be up to 255 (see `quint8_with_constraints`)
+     activation_observer = MinMaxObserver.with_args(quant_min=0, quant_max=127, eps=2 ** -12)
+     qconfig = QConfig(activation=activation_observer, weight=default_weight_observer)
+
+     # Note: All individual items of a fused pattern, e.g. Conv2d and ReLU in
+     # (Conv2d, ReLU), must have the same QConfig
+     qconfig_mapping = QConfigMapping() \
+         .set_object_type(torch.nn.Linear, qconfig) \
+         .set_object_type(torch.nn.Conv2d, qconfig) \
+         .set_object_type(torch.nn.BatchNorm2d, qconfig) \
+         .set_object_type(torch.nn.ReLU, qconfig)
+
+ 6. Quantize the model through prepare and convert
+ --------------------------------------------------
+
+ Finally, we quantize the model by passing the BackendConfig we defined
+ into prepare and convert. This produces a quantized linear module and
+ a fused quantized conv-relu module.
+
.. code:: ipython3

-     # ====================
-     # Example user model
-     # ====================
-
    class MyModel(torch.nn.Module):
        def __init__(self, use_bn: bool):
            super().__init__()
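If every quantizable module in the model is meant to share the QConfig from step 5 anyway, a similar mapping could use ``set_global`` instead of enumerating object types. This is only a sketch of a minor variation, assuming the ``qconfig`` defined in step 5.

.. code:: ipython3

    from torch.ao.quantization import QConfigMapping

    # Sketch: a global QConfig automatically gives all items of a fused
    # pattern (e.g. Conv2d and ReLU) the same QConfig
    global_qconfig_mapping = QConfigMapping().set_global(qconfig)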
@@ -280,7 +182,7 @@ both linear and conv-relu are quantized.
            self.relu = torch.nn.ReLU()
            self.sigmoid = torch.nn.Sigmoid()
            self.use_bn = use_bn
-
+
        def forward(self, x):
            x = self.linear(x)
            x = self.conv(x)
@@ -290,31 +192,6 @@ both linear and conv-relu are quantized.
            x = self.sigmoid(x)
            return x

- .. code:: ipython3
-
-     # =======================
-     # Custom QConfigMapping
-     # =======================
-
-     # Define a QConfig that satisfies the constraints specified in DTypeConfig
-     # Note: Here we use a quant_max of 127, but this could be up to 255 (see `quint8_with_constraints`)
-     activation_observer = MinMaxObserver.with_args(quant_min=0, quant_max=127, eps=2 ** -12)
-     qconfig = QConfig(activation=activation_observer, weight=default_weight_observer)
-
-     # Note: All individual items of a fused pattern, e.g. Conv2d and ReLU in
-     # (Conv2d, ReLU), must have the same QConfig
-     qconfig_mapping = QConfigMapping() \
-         .set_object_type(torch.nn.Linear, qconfig) \
-         .set_object_type(torch.nn.Conv2d, qconfig) \
-         .set_object_type(torch.nn.BatchNorm2d, qconfig) \
-         .set_object_type(torch.nn.ReLU, qconfig)
-
- .. code:: ipython3
-
-     # =====================
-     # Prepare and Convert
-     # =====================
-
    example_inputs = (torch.rand(1, 3, 10, 10, dtype=torch.float),)
    model = MyModel(use_bn=False)
    prepared = prepare_fx(model, qconfig_mapping, example_inputs, backend_config=backend_config)
@@ -341,17 +218,16 @@ both linear and conv-relu are quantized.
        sigmoid = self.sigmoid(dequantize_2); dequantize_2 = None
        return sigmoid

+ (7. Experiment with faulty BackendConfig setups)
+ -------------------------------------------------
+
As an experiment, here we modify the model to use conv-bn-relu
instead of conv-relu, but use the same BackendConfig, which doesn't
know how to quantize conv-bn-relu. As a result, only linear is
quantized, but conv-bn-relu is neither fused nor quantized.

.. code:: ipython3
-
-     # ================================================
-     # Prepare and Convert (only linear is quantized)
-     # ================================================
-
+     # Only linear is quantized, since there's no rule for fusing conv-bn-relu
    example_inputs = (torch.rand(1, 3, 10, 10, dtype=torch.float),)
    model = MyModel(use_bn=True)
    prepared = prepare_fx(model, qconfig_mapping, example_inputs, backend_config=backend_config)
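To actually support conv-bn-relu, the BackendConfig above would need additional pattern configs along the lines of the sketch below. This reuses the hypothetical ``fuse_conv_bn_relu`` sketched earlier, and the tuple pattern format is an assumption that should be checked against the pattern specification linked at the top of the tutorial.

.. code:: ipython3

    import torch
    import torch.ao.nn.intrinsic as nni
    from torch.ao.quantization.backend_config import BackendPatternConfig

    # Sketch of the missing fusion rule; fuse_conv_bn_relu is hypothetical
    fused_conv_bn_relu_config = BackendPatternConfig() \
        .set_pattern((torch.nn.Conv2d, torch.nn.BatchNorm2d, torch.nn.ReLU)) \
        .set_fuser_method(fuse_conv_bn_relu) \
        .set_fused_module(nni.ConvBnReLU2d)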
@@ -387,11 +263,7 @@ doesn't satisfy the dtype constraints specified in the backend. As
a result, nothing is quantized since the QConfigs are simply ignored.

.. code:: ipython3
-
-     # ============================================
-     # Prepare and Convert (nothing is quantized)
-     # ============================================
-
+     # Nothing is quantized or fused, since backend constraints are not satisfied
    example_inputs = (torch.rand(1, 3, 10, 10, dtype=torch.float),)
    model = MyModel(use_bn=True)
    prepared = prepare_fx(model, get_default_qconfig_mapping(), example_inputs, backend_config=backend_config)
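The same rejection applies to a hand-written QConfig whose minimum scale is too small. The sketch below assumes the backend's ``scale_min_lower_bound`` is ``2 ** -12`` (as in the observer used in step 5); an observer ``eps`` of ``2 ** -20`` falls below that bound, so this QConfig would also be ignored.

.. code:: ipython3

    from torch.ao.quantization import MinMaxObserver, QConfig, default_weight_observer

    # Sketch: eps below the backend's scale_min_lower_bound, so this QConfig
    # would be ignored and the pattern left unquantized
    too_small_scale_observer = MinMaxObserver.with_args(quant_min=0, quant_max=127, eps=2 ** -20)
    rejected_qconfig = QConfig(activation=too_small_scale_observer, weight=default_weight_observer)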
