Export of mobile model with 4-bit quantization failed, but 8-bit was OK. #1506

Closed
@TheBetterSolution

Description

🐛 Describe the bug

I followed the tutorial on the torchchat page to export a mobile model:
python3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte

torchchat uses ExecuTorch to export the mobile model (please refer to https://github.com/pytorch/torchchat/blob/main/torchchat/export.py#L145). I changed the model name to stories110m:
python3 torchchat.py export stories110m --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte

The export fails with this error:

torch._dynamo.exc.TorchRuntimeError: Failed running call_function quantized_decomposed.embedding_4bit.dtype(*(FakeTensor(..., size=(32000, 384), dtype=torch.uint8), FakeTensor(..., size=(32000, 24)), None, 0, 0, FakeTensor(..., size=(1, 1), dtype=torch.int64)), **{'dtype': torch.float32}):
embedding_4bit_dtype in ExecuTorch expects weight_quant_min == -8
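
The tensor shapes in the error look consistent with a 4-bit packed embedding table. Below is a minimal sketch of that arithmetic, assuming two 4-bit values are packed into each uint8 byte and one scale is stored per group of 32 values (the groupsize from mobile.json). Both helper names are hypothetical, and the embedding dimension of 768 is inferred from the shapes rather than stated in the report. The sketch also shows why a signed 4-bit range starts at -8, the value the check expects. The failing call appears to pass 0 for both quant-range arguments (the `None, 0, 0` in the argument list), but that reading of the positional arguments is an assumption.

```python
def signed_quant_range(bitwidth):
    """Return (quant_min, quant_max) for a signed integer of `bitwidth` bits."""
    return -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1


def packed_embedding_shapes(vocab_size, embedding_dim, bitwidth, groupsize):
    """Return (weight_shape, scales_shape) for a group-quantized embedding table.

    Assumption: quantized values are packed contiguously into uint8 bytes,
    and each group of `groupsize` values in a row shares one scale.
    """
    values_per_byte = 8 // bitwidth
    if embedding_dim % values_per_byte != 0 or embedding_dim % groupsize != 0:
        raise ValueError("embedding_dim must be divisible by values_per_byte and groupsize")
    weight_shape = (vocab_size, embedding_dim // values_per_byte)
    scales_shape = (vocab_size, embedding_dim // groupsize)
    return weight_shape, scales_shape


# embedding_dim=768 is inferred from the error: 384 packed bytes * 2 values per byte.
weight_shape, scales_shape = packed_embedding_shapes(32000, 768, bitwidth=4, groupsize=32)
print(weight_shape, scales_shape)  # (32000, 384) (32000, 24)
print(signed_quant_range(4))       # (-8, 7)
```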

Steps to reproduce:

1. git clone https://github.com/pytorch/torchchat
2. Compile and install torchchat.
3. Run the above command.

Please note: if I change "bitwidth": 4 to "bitwidth": 8 in the "embedding" entry of torchchat/quant_config/mobile.json, the error does not reproduce. The config that fails is:
{
    "embedding": {"bitwidth": 4, "groupsize" : 32},
    "linear:a8w4dq": {"groupsize" : 256}
}
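
The 8-bit variant described above can be produced without editing mobile.json by hand. This is a minimal sketch using only the standard library. The helper name `with_embedding_bitwidth` is hypothetical, and the config is inlined as a string so the example is self-contained.

```python
import copy
import json

# The mobile quantization config from this report (4-bit embedding).
MOBILE_CONFIG = """
{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}
"""


def with_embedding_bitwidth(config, bitwidth):
    """Return a copy of `config` with the embedding bitwidth replaced."""
    updated = copy.deepcopy(config)
    updated["embedding"]["bitwidth"] = bitwidth
    return updated


original = json.loads(MOBILE_CONFIG)
workaround = with_embedding_bitwidth(original, 8)
print(json.dumps(workaround, indent=2))
```

The printed JSON could be saved to a separate file (for example a hypothetical mobile_8bit.json) and passed to --quantize in place of mobile.json. This only sidesteps the 4-bit embedding path; it does not fix it.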

Thanks for your work.

Versions

Collecting environment information...
PyTorch version: 2.6.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
Nvidia driver version: 560.94
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.7.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+791472d
[pip3] numpy==2.0.0
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pytorch-triton==3.2.0+gitb2684bf3
[pip3] torch==2.6.0+cpu
[pip3] torchao==0.8.0+git11333ba2
[pip3] torchaudio==2.6.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.0.dev20250131+cu124
[pip3] torchvision==0.21.0+cpu
[conda] Could not collect

Metadata

Assignees

No one assigned

    Labels

    ExecuTorch: Issues related to ExecuTorch installation, export, or build. Mobile uses separate tags
    Quantization: Issues related to Quantization or torchao
    triaged: This issue has been looked at a team member, and triaged and prioritized into an appropriate module
