Description
🐛 Describe the bug
I followed the tutorial to export a mobile model:
python3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte
That command comes from the torchchat page (torchchat uses ExecuTorch to export the mobile model; see https://github.com/pytorch/torchchat/blob/main/torchchat/export.py#L145). I changed the model name to stories110m:
python3 torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte
The export fails with this error:
torch._dynamo.exc.TorchRuntimeError: Failed running call_function quantized_decomposed.embedding_4bit.dtype(*(FakeTensor(..., size=(32000, 384), dtype=torch.uint8), FakeTensor(..., size=(32000, 24)), None, 0, 0, FakeTensor(..., size=(1, 1), dtype=torch.int64)), **{'dtype': torch.float32}):
embedding_4bit_dtype in ExecuTorch expects weight_quant_min == -8
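To make the failure easier to read, here is a minimal sketch of the arithmetic behind the logged shapes and the range check named in the message. It is not the ExecuTorch code. It assumes two 4-bit values are packed per uint8 byte, that the embedding dimension is 768 (inferred from the packed shape 384), and that the positional arguments `None, 0, 0` in the traceback are the zero points, `weight_quant_min`, and `weight_quant_max`. The helper names are hypothetical.

```python
# Illustrative sketch only; this is not the ExecuTorch implementation.

def packed_embedding_shapes(vocab_size, dim, bitwidth, groupsize):
    """Shapes of a uint8-packed weight and its per-group scales (assumed layout)."""
    values_per_byte = 8 // bitwidth  # 2 values per byte for 4-bit
    if dim % values_per_byte or dim % groupsize:
        raise ValueError("dim must divide evenly into bytes and groups")
    return (vocab_size, dim // values_per_byte), (vocab_size, dim // groupsize)


def signed_range(bitwidth):
    """Two's-complement range for a signed integer of the given width."""
    return -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1


def check_quant_min(weight_quant_min, bitwidth=4):
    """Hypothetical stand-in for the check named in the error message."""
    expected_min, _ = signed_range(bitwidth)
    if weight_quant_min != expected_min:
        raise ValueError(
            f"expected weight_quant_min == {expected_min}, got {weight_quant_min}"
        )


# Shapes for a 32000 x 768 table with bitwidth 4 and groupsize 32.
weight_shape, scales_shape = packed_embedding_shapes(32000, 768, 4, 32)
print(weight_shape, scales_shape)
print(signed_range(4))
```

Under these assumptions the shapes match the FakeTensor sizes in the traceback, and passing `weight_quant_min = 0` to `check_quant_min` fails in the same way the export does, since a signed 4-bit range starts at -8.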
Steps to reproduce:
1. git clone https://github.com/pytorch/torchchat
2. Compile and install torchchat.
3. Run the above command.
Please note: if "bitwidth": 4 is changed to "bitwidth": 8 in the config below, the error is not reproduced.
{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}
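For anyone who wants to try the 8-bit workaround without editing the file by hand, the change can be sketched as below. This only rewrites the JSON shown above. The function name is hypothetical, and where the resulting config is saved is left to the reader.

```python
import json

# Config text copied from this report; only "bitwidth" is changed below.
MOBILE_CONFIG = """
{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}
"""


def with_embedding_bitwidth(config_text, bitwidth):
    """Return the config as JSON text with the embedding bitwidth replaced."""
    config = json.loads(config_text)
    config["embedding"]["bitwidth"] = bitwidth
    return json.dumps(config, indent=2)


print(with_embedding_bitwidth(MOBILE_CONFIG, 8))
```

The output can be written to a separate file and passed to `--quantize` in place of torchchat/quant_config/mobile.json. Note that an 8-bit embedding table takes more space than a 4-bit one, so this sidesteps the error rather than fixing it.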
Thanks for your work.
Versions
Collecting environment information...
PyTorch version: 2.6.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.4
Libc version: glibc-2.35
Python version: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
Nvidia driver version: 560.94
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.7.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] executorch==0.6.0a0+791472d
[pip3] numpy==2.0.0
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pytorch-triton==3.2.0+gitb2684bf3
[pip3] torch==2.6.0+cpu
[pip3] torchao==0.8.0+git11333ba2
[pip3] torchaudio==2.6.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.0.dev20250131+cu124
[pip3] torchvision==0.21.0+cpu
[conda] Could not collect