Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Finetuning Llama 2 70B should succeed.
Current Behavior
Finetuning Llama 2 70B fails with the assertion GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using, e.g. for Linux:
$ system_profiler SPHardwareDataType
Hardware:
Hardware Overview:
Model Name: Mac Studio
Model Identifier: Mac14,14
Model Number: G180LJ/A
Chip: Apple M2 Ultra
Total Number of Cores: 24 (16 performance and 8 efficiency)
Memory: 192 GB
System Firmware Version: 10151.41.12
OS Loader Version: 10151.41.12
Serial Number (system): xxxxxxxxxxxxxxxx
Hardware UUID: xxxxxxxxxxxxxxxxx
Provisioning UDID: xxxxxxxxxxxxxxxxxx
Activation Lock Status: Disabled
- Operating System, e.g. for Linux:
$ uname -a
Darwin xxxxxxxxx-MacStudio 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct 9 21:28:45 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6020 arm64
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.5
$ make --version
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for i386-apple-darwin11.3.0
$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Failure Information (for bugs)
Please help provide information about the failure / bug.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- I tried to finetune this model and ran the following command.
./finetune \
--model-base hf_downloads/japanese-stablelm-instruct-beta-70b.Q8_0.gguf \
--checkpoint-out finetuning-ITERATION.gguf \
--lora-out finetuning-LoRA-ITERATION.bin \
--train-data ./training/datasets/data.txt \
--save-every 10 --threads 32 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing
Running this command results in the following error:
GGML_ASSERT: ggml.c:16911: np < GGML_MAX_PARAMS
zsh: abort ./finetune --model-base --checkpoint-out emploee_list-ITERATION.gguf 10
It worked fine with the 7B model, so this seems to be an error that occurs as the model scale increases.
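From a quick read of ggml.c near the assertion, my (possibly wrong) understanding is that the optimizer collects every tensor marked as a trainable parameter into a fixed-size array, roughly along these lines (a simplified sketch, not the verbatim code):

```c
// Simplified sketch of what I believe happens around ggml.c:16911:
// before training starts, ggml walks the compute graph and stores every
// tensor flagged as a parameter into an array with a fixed capacity.
struct ggml_tensor * ps[GGML_MAX_PARAMS];   // fixed-size parameter list
int np = 0;
for (int i = 0; i < gf->n_nodes; ++i) {
    if (gf->nodes[i]->is_param) {
        GGML_ASSERT(np < GGML_MAX_PARAMS);  // the assertion that aborts here
        ps[np++] = gf->nodes[i];
    }
}
```

As far as I can tell, finetune adds a LoRA A/B tensor pair per weight matrix, and Llama 2 70B has 80 layers versus 32 for 7B, so np would be far larger for the 70B graph; please correct me if I am misreading the code.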
What does this limit mean, and what happens if I increase it?
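To test the second question myself, I am tempted to simply raise the limit locally and rebuild. A minimal sketch, assuming the limit really is just the fixed array size defined in ggml.h (the value 2048 below is an arbitrary guess on my part, not the upstream default):

```c
/* ggml.h: hypothetical local patch, purely as an experiment.
   2048 is an arbitrary value chosen to leave room for the number of
   LoRA parameter tensors a 70B graph appears to need. */
#define GGML_MAX_PARAMS 2048
```

I would then rebuild the finetune binary and rerun the command above, but I do not know whether other fixed-size structures would also need to grow, so any guidance would be appreciated.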
I thought this might be related to an existing issue, but it seems to be a different error.
Thank you!