### Add Link
https://pytorch.org/tutorials/intermediate/realtime_rpi.html
### Describe the bug
I am getting 25-30 fps on my Raspberry Pi 4 with the snippet provided in the tutorial. However, after fine-tuning mobilenet_v2 and applying:
```python
import torch

# Dynamically quantize the fine-tuned model (int8 weights for nn.Linear layers)
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Convert the quantized model to TorchScript
script_model = torch.jit.script(quantized_model)
```
I am only getting 2.5 fps.
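For context, I measured fps with a simple time-based loop, roughly like the tutorial's own measurement code (a minimal sketch; `script_model` is the scripted model from above, and the dummy input shape matching the tutorial's 224x224 frames is my assumption):

```python
import time
import torch

# dummy 224x224 RGB frame, standing in for the tutorial's preprocessed camera input
dummy_input = torch.rand(1, 3, 224, 224)

with torch.no_grad():
    started = time.time()
    frames = 0
    while time.time() - started < 10:  # measure over ~10 seconds
        script_model(dummy_input)
        frames += 1

print(f"{frames / (time.time() - started):.1f} fps")
```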
The tutorial suggests:
> You can create your own model or fine tune an existing one. If you fine tune on one of the models from [torchvision.models.quantized](https://pytorch.org/vision/stable/models.html#quantized-models) most of the work to fuse and quantize has already been done for you so you can directly deploy with good performance on a Raspberry Pi.
But it provides no guidance on how to do this.
My attempts to do so failed:
```python
import torch
from torchvision import models

torch.backends.quantized.engine = 'qnnpack'
model = models.quantization.mobilenet_v2(pretrained=True, quantize=True)  # INT8

num_classes = 3
model.classifier[1] = torch.nn.Linear(model.last_channel, num_classes)
```
This fails during the training forward pass with:
```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-48-ddcd2d77aac5> in <cell line: 24>()
     39
     40         # Forward pass
---> 41         outputs = model(inputs)
     42         loss = criterion(outputs, labels)
     43

6 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 must have the same dtype
```
Multiple attempts to create a custom Linear layer that supports the int8 dtype also failed.
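For reference, this is the workflow I would have expected to work, based on the eager-mode quantization docs: load the quantizable model in float mode, replace the head, fine-tune with quantization-aware training, then convert. A rough, untested sketch (`num_classes`, `num_epochs`, and `train_one_epoch` are placeholders for my own fine-tuning setup):

```python
import torch
from torchvision import models

torch.backends.quantized.engine = 'qnnpack'

# Load the quantizable architecture in FLOAT mode for fine-tuning
model = models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.classifier[1] = torch.nn.Linear(model.last_channel, num_classes)

# Fuse conv/bn/relu modules, then prepare for quantization-aware training
model.train()
model.fuse_model()
model.qconfig = torch.quantization.get_default_qat_qconfig('qnnpack')
torch.quantization.prepare_qat(model, inplace=True)

# train_one_epoch(...) stands in for my fine-tuning loop
for epoch in range(num_epochs):
    train_one_epoch(model, criterion, optimizer, train_loader)

# Convert to a real INT8 model and script it for deployment
model.eval()
quantized_model = torch.quantization.convert(model, inplace=False)
script_model = torch.jit.script(quantized_model)
```

If this is roughly the intended approach, it would be great if the tutorial spelled it out; if not, guidance on the correct workflow would be appreciated.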
### Describe your environment

Not relevant.