Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES #820
When GPU acceleration is not properly configured (or in certain other cases), attempting to split tensors across multiple GPUs causes model loading to exit with an exception:

Could not load Llama model from path: ../models/codellama-7b.Q4_K_M.gguf. Received error invalid index (type=value_error)
This error is caused by expanding a Python list into a smaller C array at

llama.py:312
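
For illustration, here is a minimal sketch of the underlying failure. The constant value and the copy loop are assumptions about what llama.py does at that line, not its exact code, but they reproduce the same `invalid index` error:

```python
import ctypes

# Suppose the build supports only one device, e.g. a CPU-only build
# (assumed value for illustration).
LLAMA_MAX_DEVICES = 1

# llama.py allocates a fixed-size C array of LLAMA_MAX_DEVICES floats ...
tensor_split_arr = (ctypes.c_float * LLAMA_MAX_DEVICES)()

# ... and copies the user-supplied list into it element by element.
tensor_split = [0.5, 0.5]  # the user asked for two GPUs
for i, value in enumerate(tensor_split):
    # Raises IndexError: invalid index once i >= LLAMA_MAX_DEVICES
    tensor_split_arr[i] = value
```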
However, the error message is misleading and does not contain enough detail to diagnose the problem.
This PR therefore detects this case and raises a human-readable exception that points out the situation (a sketch of the check follows below). Feel free to make any improvements.
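
A minimal sketch of the proposed validation, run before the list is copied into the fixed-size C array. The helper name, message wording, and exact placement in `Llama.__init__` are assumptions for illustration, not the merged code:

```python
def _validate_tensor_split(tensor_split, max_devices):
    """Fail early with a clear message instead of an opaque ctypes IndexError."""
    if tensor_split is not None and len(tensor_split) > max_devices:
        raise ValueError(
            f"Attempt to split tensors across {len(tensor_split)} devices, "
            f"but this build supports at most LLAMA_MAX_DEVICES={max_devices}."
        )


# Example usage with an assumed CPU-only build (LLAMA_MAX_DEVICES == 1):
_validate_tensor_split([0.5, 0.5], max_devices=1)
# ValueError: Attempt to split tensors across 2 devices, but this build
# supports at most LLAMA_MAX_DEVICES=1.
```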