Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES #820

Merged
merged 2 commits into abetlen:main
Oct 15, 2023

Conversation

eric1932
Contributor

When GPU acceleration is not properly configured (or in some other cases), attempting to split tensors across multiple GPUs makes model loading fail with the exception
Could not load Llama model from path: ../models/codellama-7b.Q4_K_M.gguf. Received error invalid index (type=value_error)

This error is caused by expanding a Python list into a smaller C array (llama.py:312).
However, the resulting error message is misleading and does not contain enough detail to diagnose the problem.

This PR therefore catches this case and raises a human-readable exception that points out the situation. Feel free to make any improvements.
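
A minimal sketch of the kind of check described above, assuming the limit comes from llama_cpp.LLAMA_MAX_DEVICES (shown here as a hypothetical constant) and that tensor_split is expanded into a fixed-size ctypes float array; the helper name and message are illustrative, not the exact merged code:

```python
import ctypes

# Hypothetical stand-in for llama_cpp.LLAMA_MAX_DEVICES, the size of the
# fixed C-side tensor_split array that the Python list is expanded into.
LLAMA_MAX_DEVICES = 1


def make_tensor_split_array(tensor_split):
    """Validate tensor_split before expanding it into the fixed-size C array."""
    if tensor_split is not None and len(tensor_split) > LLAMA_MAX_DEVICES:
        # Fail early with a readable message instead of the opaque
        # "invalid index (type=value_error)" raised during array expansion.
        raise ValueError(
            f"Attempt to split tensors across {len(tensor_split)} devices, "
            f"but LLAMA_MAX_DEVICES={LLAMA_MAX_DEVICES}"
        )
    FloatArray = ctypes.c_float * LLAMA_MAX_DEVICES
    return FloatArray(*(tensor_split or []))
```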

@abetlen abetlen merged commit b501665 into abetlen:main Oct 15, 2023
antoine-lizee pushed a commit to antoine-lizee/llama-cpp-python that referenced this pull request Oct 30, 2023