GPU computation
To enable GPU acceleration of your code, two conditions need to be met:
1. You need to run your application on a GPU-enabled node. By default, applications on Nuvolos run on nodes that do not have a GPU card integrated, but with a single click you can scale your applications to run on GPU-accelerated nodes.
2. You need to make sure your applications are configured to use a GPU. The documentation below mostly addresses the configuration needed for applications to be able to use a GPU once one is available.
The NVIDIA device drivers will be loaded in all GPU-supported images once a GPU node is started on Nuvolos. However, depending on the image type, additional components (e.g. the CUDA toolkit) might need to be installed via conda.
If you launch a GPU-accelerated node on Nuvolos, the `nvidia-smi` tool will be available from the command line / terminal. You can use it to check the driver version and monitor the memory usage of the card:

```
$ nvidia-smi
Thu Jun  1 08:39:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 00000002:00:00.0 Off |                    0 |
| N/A   N/A  P0    N/A /  N/A   |    333MiB /  4096MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
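If you want to monitor the card programmatically rather than read the full table, `nvidia-smi` also supports machine-readable queries via its `--query-gpu` and `--format` flags. A minimal Python sketch (the choice of queried fields here is just an illustration; run `nvidia-smi --help-query-gpu` for the full list):

```python
import shutil
import subprocess

# Query a few fields in CSV form instead of parsing the full table.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.used,memory.total",
    "--format=csv,noheader",
]

def gpu_status():
    """Return one CSV line per visible GPU, or None when nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return None  # not on a GPU node (or drivers not loaded)
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return result.stdout.strip().splitlines()

print(gpu_status())
```

On a GPU node this prints one line per card, e.g. the name and memory figures shown in the table above; on a non-GPU node it prints `None`.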
Due to the underlying virtualization technology in Nuvolos, the `nvidia-smi` tool is currently unable to list the processes using the GPU.

Note that `nvidia-smi` reports the CUDA Driver API version in its output (11.6 above). However, most high-level machine learning frameworks also use the CUDA Runtime API, which is provided by the CUDA Runtime library. Most frameworks are able to automatically install the required version of the runtime, so if you start from scratch, this should not be difficult to set up.
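The driver version and the bundled runtime version can be checked independently. As a small sketch, PyTorch (if installed) records the CUDA runtime version it was built against:

```python
# The driver's CUDA version (reported by nvidia-smi) and the runtime a
# framework bundles are separate things. PyTorch, for example, records the
# runtime version it was built against; None indicates a CPU-only build.
try:
    import torch
    runtime_version = torch.version.cuda
except ImportError:  # PyTorch not installed in this image
    runtime_version = None

print("Bundled CUDA runtime:", runtime_version)
```

The bundled runtime (e.g. `11.6`) must be compatible with, but need not exactly match, the driver version `nvidia-smi` reports.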
Please find below some examples of how to get started with GPU computations on Nuvolos, or consult the relevant machine learning library documentation directly. If you require additional support, please reach out to our team.
Installing the right version of CUDA for Python packages can be an overwhelming task. We recommend always starting with a new, clean image and installing the high-level AI/ML Python libraries first; only afterwards install other libraries, if possible. This way, PyTorch or TensorFlow can install the exact CUDA libraries they need.
In our experience, installing PyTorch with pip works better than with conda, as pip won't try to overwrite system libraries:

```
pip3 install torch torchvision torchaudio
```
You don't need to scale your app to a GPU node to install PyTorch; it's enough to scale up after the installation is done. To test whether your installation was successful, execute the following code snippet while already scaled up to a GPU:
```python
import torch

dtype = torch.float
device = torch.device("cuda")  # raises an error if no CUDA device is visible
a = torch.randn((), device=device, dtype=dtype)
```
If it completes without an error, your configuration is correct.
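For a slightly more informative check, you can also report the device PyTorch sees. A minimal sketch that degrades gracefully when run off a GPU node:

```python
def cuda_report():
    """Summarize the visible CUDA setup; degrades gracefully off GPU nodes."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch installed, but no CUDA device is visible"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory // 2**20} MiB total memory"

print(cuda_report())
```

On a correctly configured GPU node this prints the card name and its total memory, matching what `nvidia-smi` reports.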
To install TensorFlow, we recommend using conda, as TensorFlow requires the cudatoolkit package:
```
conda install -c conda-forge tensorflow-gpu cudatoolkit=11.6
```
We recommend installing both tensorflow-gpu and cudatoolkit from the same conda channel, to make sure they are compatible. At the time of writing, conda-forge was the best channel for this.
You don't need to scale your app to a GPU node to install TensorFlow; it's enough to scale up after the installation is done. To test whether your installation was successful, execute the following code snippet while already scaled up to a GPU:
```python
import tensorflow as tf

a = tf.constant([1, 2, 3])
print(a.device)
```
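You can also ask TensorFlow directly which GPUs it has registered; a minimal sketch (assumes the conda install above, and returns an empty list when no GPU is visible or TensorFlow is absent):

```python
def visible_gpus():
    """Names of the GPUs TensorFlow registered; empty list off a GPU node."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [d.name for d in tf.config.list_physical_devices("GPU")]

print(visible_gpus())
```

The `a.device` check above remains the simplest end-to-end test, since it confirms a tensor was actually placed on the GPU.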
If you see an output similar to `/job:localhost/replica:0/task:0/device:GPU:0` that ends with `GPU:0`, your configuration is correct.

With Machine Learning (CUDA-enabled) RStudio images you can run GPU computations on GPU-accelerated nodes. These images have the CUDA runtime / toolkit installed as well.
We recommend using the pre-built experimental binary to get started with XGBoost and R. In a terminal on a GPU node:
```
# define version used - update if needed
XGBOOST_VERSION=1.4.1

# download binary
wget https://github.com/dmlc/xgboost/releases/download/v${XGBOOST_VERSION}/xgboost_r_gpu_linux_${XGBOOST_VERSION}.tar.gz

# install dependencies
R -q -e "install.packages(c('data.table', 'jsonlite'))"

# install XGBoost
R CMD INSTALL ./xgboost_r_gpu_linux_${XGBOOST_VERSION}.tar.gz
```
You can test the code via the following example program: https://rdrr.io/cran/xgboost/src/demo/gpu_accelerated.R
You can use TensorFlow with GPU acceleration by following our TensorFlow installation guide and selecting `version = "gpu"` when installing TensorFlow.