Running on Multiple GPUs (tensorflow-gpu/keras/horovod)
1. Installing tensorflow-gpu/keras/horovod
[username@es1 ~]$ qrsh -g grpname -l rt_G.large=1
[username@g0001 ~]$ module load python/3.6/3.6.5 cuda/10.0/10.0.130.1 cudnn/7.6/7.6.5 nccl/2.5/2.5.6-1 openmpi/2.1.6 gcc/7.4.0
[username@g0001 ~]$ python3 -m venv ~/venv/tensorflow-keras+horovod # can be skipped from the second time onward
[username@g0001 ~]$ source ~/venv/tensorflow-keras+horovod/bin/activate
(tensorflow-keras+horovod) [username@g0001 ~]$ pip3 install --upgrade pip setuptools
(tensorflow-keras+horovod) [username@g0001 ~]$ pip3 install tensorflow-gpu==1.15 keras
(tensorflow-keras+horovod) [username@g0001 ~]$ HOROVOD_WITH_TENSORFLOW=1 HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_HOME=$NCCL_HOME pip3 install --no-cache-dir horovod
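Before moving on to the samples, it can help to confirm that the tools installed above are actually visible in the current shell. The following is a minimal sketch, not part of the original procedure: `check_cmds` is a hypothetical helper name, and it only checks that each command is on PATH, not that it works.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original steps):
# report which of the given commands are missing from PATH.
# Returns 0 only if every command was found.
check_cmds() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

With the modules loaded and the virtual environment active, `check_cmds python3 mpirun horovodrun && horovodrun --check-build` should list the frameworks and tensor operations Horovod was built with, assuming the installed Horovod version provides `--check-build`. If the build flags on the install line took effect, TensorFlow and NCCL would be expected to appear as enabled.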
2. Getting the sample scripts and verifying execution, example (1): GPU x4 (single node)
(tensorflow-keras+horovod) [username@g0001 ~]$ git clone -b v0.18.2 https://github.com/horovod/horovod.git
(tensorflow-keras+horovod) [username@g0001 ~]$ mpirun -np 4 -map-by ppr:4:node python3 horovod/examples/keras_mnist.py
3. Getting the sample scripts and verifying execution, example (2): GPU x8 (2 nodes)
(tensorflow-keras+horovod) [username@g0001 ~]$ git clone -b v0.18.2 https://github.com/horovod/horovod.git # not needed if the horovod directory was already cloned
(tensorflow-keras+horovod) [username@g0001 ~]$ NUM_PROCS=8
(tensorflow-keras+horovod) [username@g0001 ~]$ NUM_GPUS_PER_NODE=4
(tensorflow-keras+horovod) [username@g0001 ~]$ mpirun -np $NUM_PROCS -map-by ppr:${NUM_GPUS_PER_NODE}:node python3 horovod/examples/tensorflow_mnist.py
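The relationship between `NUM_PROCS`, `NUM_GPUS_PER_NODE`, and the number of nodes the job has to hold can be sketched as below. This is an illustration only: `nodes_needed` is a hypothetical helper, and it assumes one Horovod process per GPU, as the `ppr:${NUM_GPUS_PER_NODE}:node` mapping suggests.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the original steps):
# derive the node count implied by a total process count and a per-node
# GPU count, assuming one Horovod process per GPU.
nodes_needed() {
    num_procs=$1
    gpus_per_node=$2
    # Round up so that a partially filled node still counts as a whole node.
    echo $(( (num_procs + gpus_per_node - 1) / gpus_per_node ))
}

NUM_PROCS=8
NUM_GPUS_PER_NODE=4
NUM_NODES=$(nodes_needed "$NUM_PROCS" "$NUM_GPUS_PER_NODE")
echo "mpirun -np $NUM_PROCS -map-by ppr:${NUM_GPUS_PER_NODE}:node needs $NUM_NODES node(s)"
```

For the values used above this reports 2 nodes. If the current job holds fewer nodes than that (for example, a session started with a single-node request like the one in step 1), `mpirun` would likely fail for lack of slots, so the allocation should cover both nodes before running the 8-process example.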