참조: https://github.com/samlhuillier/code-llama-fine-tune-notebook/blob/main/fine-tune-code-llama.ipynb
기본 환경: Python 3.10, cuda 11.8
EC2 Instance Type: g3s.xlarge
- GPU가 장착되어 있는 VM을 생성합니다.
PyTorch 설치
root@ip-172-31-10-59:~# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch
Downloading https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl (2325.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 GB ? eta 0:00:00
Collecting torchvision
Downloading https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp310-cp310-linux_x86_64.whl (6.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 19.4 MB/s eta 0:00:00
Collecting torchaudio
Downloading https://download.pytorch.org/whl/cu118/torchaudio-2.1.0%2Bcu118-cp310-cp310-linux_x86_64.whl (3.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 11.2 MB/s eta 0:00:00
Collecting networkx
Downloading https://download.pytorch.org/whl/networkx-3.0-py3-none-any.whl (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 16.7 MB/s eta 0:00:00
Requirement already satisfied: jinja2 in /usr/lib/python3/dist-packages (from torch) (3.0.3)
Collecting fsspec
Downloading https://download.pytorch.org/whl/fsspec-2023.4.0-py3-none-any.whl (153 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.0/154.0 KB 1.4 MB/s eta 0:00:00
Collecting typing-extensions
Downloading https://download.pytorch.org/whl/typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Collecting filelock
Downloading https://download.pytorch.org/whl/filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting triton==2.1.0
Downloading https://download.pytorch.org/whl/triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.2/89.2 MB 14.4 MB/s eta 0:00:00
Collecting sympy
Downloading https://download.pytorch.org/whl/sympy-1.12-py3-none-any.whl (5.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 79.6 MB/s eta 0:00:00
Collecting numpy
Downloading https://download.pytorch.org/whl/numpy-1.24.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 38.1 MB/s eta 0:00:00
Collecting pillow!=8.3.*,>=5.3.0
Downloading https://download.pytorch.org/whl/Pillow-9.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 18.2 MB/s eta 0:00:00
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from torchvision) (2.25.1)
Collecting mpmath>=0.19
Downloading https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 47.8 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, sympy, pillow, numpy, networkx, fsspec, filelock, triton, torch, torchvision, torchaudio
Successfully installed filelock-3.9.0 fsspec-2023.4.0 mpmath-1.3.0 networkx-3.0 numpy-1.24.1 pillow-9.3.0 sympy-1.12 torch-2.1.0+cu118 torchaudio-2.1.0+cu118 torchvision-0.16.0+cu118 triton-2.1.0 typing-extensions-4.4.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@ip-172-31-10-59:~#
PyTorch 와 NVidia CUDA가 동작하는지 체크 하려면
import torch
torch.cuda.is_available()
실행 결과가 False 가 나오면 Nvidia CUDA가 아직 설치되어 있지 않은 상태이다.
Nvidia Driver 설치
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update
sudo apt install ubuntu-drivers-common
sudo ubuntu-drivers autoinstall
예의상 서버를 리부팅하고
nvidia-smi 명령어를 실행해보며 어떤 GPU가 동작하는지 확인할 수 있다.
root@ip-172-31-10-59:~# nvidia-smi
Tue Nov 14 05:58:52 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02 Driver Version: 545.29.02 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla M60 Off | 00000000:00:1E.0 Off | 0 |
| N/A 25C P0 38W / 150W | 0MiB / 7680MiB | 99% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
root@ip-172-31-10-59:~#
ubuntu@ip-172-31-10-59:~$ python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True