[quant] Add IR related recommendations for Quantizer tutorial #2608
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2608
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 970227a with merge base 4e25f97.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary: att
Test Plan: CI
Some editorial nits.
Can you please fix the pyspelling job?
Can someone please take a look? @ezyang @andrewor14 @SherlockNoMad @kimishpatel
Looks good. Just a few questions about the ordering and how this relates to our existing XNNPACKQuantizer.
prototype_source/pt2e_quantizer.rst
Outdated
conv = torch.nn.functional.conv2d(x, weight, bias)
relu = torch.nn.functional.relu(conv)
# returns an additional dict that includes a map from name to node that we want to annotate
return relu, {"conv": conv, "relu": relu, "x": x, "weight": weight, "bias": bias}
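For context, here is a rough, illustrative sketch of how a pattern function like the one above could be consumed by SubgraphMatcherWithNameNodeMap (option 3 in the tutorial). The capture API (capture_pre_autograd_graph), the wrapper module, the example shapes, and the name_node_map field on the match result are assumptions based on this discussion, not a definitive interface:

# Illustrative sketch only; exact capture API and match-result fields may differ.
import torch
from torch._export import capture_pre_autograd_graph
from torch.fx.passes.utils.matcher_with_name_node_map_utils import (
    SubgraphMatcherWithNameNodeMap,
)

class ConvReluPattern(torch.nn.Module):
    def forward(self, x, weight, bias):
        conv = torch.nn.functional.conv2d(x, weight, bias)
        relu = torch.nn.functional.relu(conv)
        # extra dict mapping names to the nodes we want to annotate later
        return relu, {"conv": conv, "relu": relu, "x": x,
                      "weight": weight, "bias": bias}

example_inputs = (torch.randn(1, 3, 8, 8),   # x
                  torch.randn(4, 3, 3, 3),   # weight
                  torch.randn(4))            # bias
# capture the pattern with the same program capture used for the model
pattern_gm = capture_pre_autograd_graph(ConvReluPattern(), example_inputs)

matcher = SubgraphMatcherWithNameNodeMap(pattern_gm)
for match in matcher.match(model_gm.graph):  # model_gm: the captured model (assumed)
    conv_node = match.name_node_map["conv"]
    relu_node = match.name_node_map["relu"]
    # attach quantization annotations to conv_node / relu_node here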
Did we resolve the weight issue? Can we return an nn.Parameter here?
Not yet, nn.Parameter is not supported. Also, I haven't actually tested whether this works for linear modules yet.
prototype_source/pt2e_quantizer.rst
Outdated
# annotate the nodes
inputs, output = _find_input_and_output(match)
inputs[0].users[0].meta["quantization_annotation"] = ...
inputs[1].users[0].meta["quantization_annotation"] = ...
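For illustration, a hedged sketch of what the elided annotations above might contain, assuming the QuantizationSpec and QuantizationAnnotation classes from torch.ao.quantization.quantizer; the helper _find_input_and_output comes from the snippet above, and the exact spec fields a backend needs will differ:

# Hedged sketch; observer choice and quant ranges are placeholders.
import torch
from torch.ao.quantization.observer import HistogramObserver
from torch.ao.quantization.quantizer import (
    QuantizationAnnotation,
    QuantizationSpec,
)

act_qspec = QuantizationSpec(
    dtype=torch.int8,
    quant_min=-128,
    quant_max=127,
    qscheme=torch.per_tensor_affine,
    observer_or_fake_quant_ctr=HistogramObserver,
)

conv_node = inputs[0].users[0]  # the node selected in the snippet above
conv_node.meta["quantization_annotation"] = QuantizationAnnotation(
    input_qspec_map={inputs[0]: act_qspec},  # quantize the activation input
    output_qspec=act_qspec,                  # quantize the output
    _annotated=True,
)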
Actually, from reading the example, (2) doesn't seem super user friendly. Should we put (3) SubgraphMatcherWithNameNodeMap at the top as the recommended way? I feel it'll handle 90% of the patterns we want to match. For something like LSTM or MHA, maybe we need something more advanced like (1) and (2).
Yeah, I was thinking of having (3) as the recommended one, but there is actually a caveat: we are assuming some ops are not traced into multiple ops (e.g. F.linear, F.conv2d, etc.).
@svekars I'm not sure how to fix these actually: https://github.com/pytorch/tutorials/actions/runs/6539478971/job/17757612269?pr=2608
prototype_source/pt2e_quantizer.rst
Outdated
2. Using ``SubgraphMatcher``
--------------------------------
Because of this, we recommend people to recognize the pattern through ``SubgraphMatcher``, through capturing a ``torch`` IR pattern (with the same program capture used for capturing the floating point model), instead of using the ``aten`` IR pattern directly.
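As a rough illustration of the quoted approach (not the tutorial's exact code), a minimal sketch using SubgraphMatcher from torch.fx.passes.utils.matcher_utils; capture_pre_autograd_graph as the program capture, the example shapes, and a previously captured model model_gm are assumptions:

# Illustrative sketch of option (2); constructor arguments and match-result
# fields may differ from the shipped SubgraphMatcher interface.
import torch
from torch._export import capture_pre_autograd_graph
from torch.fx.passes.utils.matcher_utils import SubgraphMatcher

class ConvRelu(torch.nn.Module):
    def forward(self, x, weight, bias):
        return torch.nn.functional.relu(
            torch.nn.functional.conv2d(x, weight, bias))

example_inputs = (torch.randn(1, 3, 8, 8), torch.randn(4, 3, 3, 3), torch.randn(4))
# capture the pattern with the same program capture used for the model, so the
# pattern is expressed in the same IR as the captured floating point model
pattern_gm = capture_pre_autograd_graph(ConvRelu(), example_inputs)

matcher = SubgraphMatcher(pattern_gm.graph)
for internal_match in matcher.match(model_gm.graph):
    # internal_match.nodes_map maps pattern nodes to model nodes;
    # pick the matched nodes out of it and annotate them
    ...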
Who are the "people" here?
We should really just suggest alternatives, rather than a recommendation. When we say that quantization targets pre-dispatch aten IR, that is our contract. So the fact that, in some cases, pre-dispatch aten IR is not the same as torch IR is not really a quantization workflow problem but part of the export. That is the reason I feel we should really provide "examples of pattern matching". A recommendation feels too strong, given all the drawbacks listed below.
"people" means backend developers.
The motivation for giving a recommendation is introduced in L338, basically trying to make sure the pattern matching is robust against pytorch code changes. If we don't give the recommendation I think modeling users will still expect the same python model + same quantization code + same quantizer will give the same quantized model when they update their pytorch version, which may not be true if quantizer writers are not cautious about this
That's fair. My main concern is with "recommendation". I would have presented alternatives with pros and cons.
prototype_source/pt2e_quantizer.rst
Outdated
2. Using ``SubgraphMatcher``
--------------------------------
Because of this, we recommend people to recognize the pattern through ``SubgraphMatcher``, through capturing a ``torch`` IR pattern (with the same program capture used for capturing the floating point model), instead of using the ``aten`` IR pattern directly.
Also, is torch IR defined anywhere?
Probably not officially defined; L307 gives some examples.
prototype_source/pt2e_quantizer.rst
Outdated
If people are uncertain if some functional operator is going to be traced into multiple aten operators then they can pick option 2.

We would not recommend option 1. since that's the least stable in all three options.
Again, this is a very, very strong statement that does not hold true in practice. If anything, in practice 1 is a better option because torch IR for the most part matches pre-dispatch aten IR.
I don't understand why option 1 would be better than torch IR; what is the criteria here? Given the motivation in L338, I think option 1 is the least stable of all 3.
I understand the stability argument. My point is that in practice torch IR is very close to aten IR. And if you don't recommend it, I would just remove option 1 and have this doc be "how to pattern match for quantizers".
OK, I'll present this in a different way; this is also a motivation for why we want to have other ways of matching.
Left some comments.
Modulo disagreements with @kimishpatel, the doc seems fine, but you should workshop it with one of the potential users and check if the APIs actually are sufficient for them in practice. User feedback matters.
lol. Please do tell me where you disagree though
I think Ed meant the disagreement between you and me on what options to recommend. I plan to restructure this and use option 1 as motivation and remove option 2; please let me know if there are other concerns.
With this, the ``Quantizer`` will still be valid even when the implementation for nn modules and functionals changes, the ``aten`` IR for floating point model will change, but since we capture the pattern again instead of hardcoding the ``aten`` IR for the pattern, we'll get the updated ``aten`` IR as well and will still be able to match the pattern.

One caveat is that if inputs of the pattern has multiple users, we don't have a good way to identify which user node we want to annotate except for checking the aten op target.
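A small sketch of the workaround mentioned in the quoted caveat: when a matched input has multiple users, fall back to checking the aten op target to decide which user to annotate. The input_node variable and the specific conv target are illustrative assumptions, not the tutorial's code:

# Hedged sketch: pick the user of a matched input by its aten op target when
# the input has more than one user.
import torch

conv_users = [
    n for n in input_node.users
    if n.target == torch.ops.aten.conv2d.default
]
# only annotate when the target-based lookup is unambiguous
if len(conv_users) == 1:
    conv_node = conv_users[0]
    conv_node.meta["quantization_annotation"] = ...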
Not sure if I understand this issue clearly:
- One caveat is that if inputs of the pattern has multiple users,
Are we talking about the case where an input has been used by multiple different patterns, or the case where an input has been used by multiple nodes inside one pattern?
Yes, that's correct.