
Llama 2 in Apple Silicon MacBook (2/3)

sunshout 2023. 10. 29. 15:17

To run Llama 2 comfortably on a MacBook, it is highly recommended to use a quantized model.

There is a C/C++ port of LLaMA, the llama.cpp repository, that makes this easy.

Download llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
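
A quick way to confirm the build worked is to check that the main and quantize binaries were produced in the repository root (where the Makefile of this era places them) and that main prints its help text:

ls -l main quantize
./main -h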
 

Convert model to GGML format

cd llama.cpp
python3 -m venv llama2
source llama2/bin/activate
python3 -m pip install -r requirements.txt
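
If the install went through, the packages convert.py depends on should import cleanly. A minimal sanity check, assuming the requirements.txt of this era pulls in numpy and sentencepiece (adjust the names if your checkout differs):

python3 -c "import numpy, sentencepiece; print('conversion dependencies OK')"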
 

The conversion process consists of two steps:

  1. convert the model to f16 format
  2. quantize the f16 model to GGML

Convert to f16 format

mkdir -p models/7B
python3 convert.py --outfile models/7B/ggml-model-f16.bin \
--outtype f16 \
../llama2/llama/llama-2-7b-chat \
--vocab-dir ../llama2/llama
 

Before running the conversion, create the output directory first (e.g. models/7B); the mkdir -p command above does this.

--outfile specifies the output file name
--outtype specifies the output type, f16 in this case
--vocab-dir specifies the directory containing the tokenizer.model file
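
Before running convert.py, it can also help to confirm that the paths line up. The checks below are a sketch that assumes the Meta weights layout downloaded in part 1/3 of this series; adjust them to wherever your weights actually live:

ls ../llama2/llama/tokenizer.model    # vocab file found via --vocab-dir
ls ../llama2/llama/llama-2-7b-chat    # original llama-2-7b-chat weights
ls -d models/7B                       # output directory created by mkdir -p above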

Convert the f16 model to GGML

This step is called quantizing the model.

./quantize ./models/7B/ggml-model-f16.bin \
./models/7B/ggml-model-q4_0.bin q4_0
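
q4_0 is a reasonable default, but the same quantize binary also supports other GGML quantization types that trade file size for quality. For example (a sketch; q5_0 and q8_0 are standard types in llama.cpp):

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q5_0.bin q5_0
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q8_0.bin q8_0

q5_0 is a bit larger than q4_0 with better quality, and q8_0 is close to f16 quality at roughly half its size.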
 

After quantizing, the model file is much smaller.

mzc01-choonhoson@MZC01-CHOONHOSON 7B % ls -alh
total 33831448
drwxr-xr-x@ 4 mzc01-choonhoson  staff   128B  9 12 17:23 .
drwxr-xr-x@ 5 mzc01-choonhoson  staff   160B  9 12 16:50 ..
-rw-r--r--@ 1 mzc01-choonhoson  staff    13G  9 12 17:23 ggml-model-f16.bin
-rw-r--r--@ 1 mzc01-choonhoson  staff   3.6G  9 12 17:23 ggml-model-q4_0.bin
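
The reduction is what the two formats predict: f16 stores each of the model's roughly 6.7B weights in 2 bytes (about 13 GB), while q4_0 packs 32 weights into an 18-byte block (sixteen bytes of 4-bit values plus a 2-byte f16 scale), which works out to about 4.5 bits per weight and matches the sizes listed above.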
 

Example

All done. Run the example binary!

./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
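
In this command, -m selects the model, -n caps the number of tokens to generate, --repeat_penalty 1.0 disables the repetition penalty, -i starts interactive mode, -r "User:" hands control back to you whenever the model emits that string, and -f seeds the conversation with the chat-with-bob.txt prompt. For a quick non-interactive test you can instead pass a prompt directly with the standard -p flag (a sketch; any prompt works):

./main -m ./models/7B/ggml-model-q4_0.bin -p "Explain model quantization in one sentence." -n 128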
 

References

GGML - Large Language Models for Everyone
https://github.com/rustformers/llm/blob/main/crates/ggml/README.md

Series

Llama 2 in Apple Silicon MacBook (1/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-13-54h

Llama 2 in Apple Silicon MacBook (2/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-23-2j51

Llama 2 in Apple Silicon MacBook (3/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-33-3hb7