To run Llama 2 easily, it is highly recommended to use a quantized model. llama.cpp is a C++ port of Llama that supports this.

Download and build llama.cpp:

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
```

Convert the model to GGML format. First set up a Python environment and install the requirements:

```
cd llama.cpp
python3 -m venv llama2
source llama2/bin/activate
python3 -m pip install -r requirements.txt
```

The conversion process consists of two steps: convert the model to ..
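For reference, a typical two-step flow in llama.cpp looks roughly like the sketch below. The script name, model directory, and quantization type (`convert.py`, `models/llama-2-7b/`, `q4_0`) are assumptions and vary between llama.cpp versions, so check the repository's README for the exact invocation in your checkout.

```
# Step 1 (sketch): convert the original weights to a GGML f16 file.
# The script name and model path are assumptions; adjust for your checkout.
python3 convert.py models/llama-2-7b/

# Step 2 (sketch): quantize the f16 model, e.g. to 4-bit (q4_0), using the
# quantize tool built by `make` above. Paths and type name are assumptions.
./quantize models/llama-2-7b/ggml-model-f16.bin models/llama-2-7b/ggml-model-q4_0.bin q4_0
```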