GPU computation
Introduction
To enable GPU acceleration of your code, two conditions need to be met:
You need to run your application on a GPU-enabled size. By default, applications on Nuvolos run on nodes that do not have an integrated GPU card; however, you can scale your applications to sizes with a GPU. Note that all sizes with GPUs are credit-based.
You need to make sure the application libraries are properly configured so that the application can actually use the available GPU. The documentation below mostly addresses this topic for various frameworks.
Library versions
The NVIDIA device drivers will be loaded automatically in all GPU-enabled sizes. However, depending on the software you use, additional components (e.g. CUDA toolkit) might need to be installed via conda.
If you launch an app in a GPU-enabled size on Nuvolos, the nvidia-smi tool will be available from the command line / terminal. You can use it to check the driver version and monitor the memory usage of the card.
Due to the underlying virtualization technology in Nuvolos, the nvidia-smi tool is currently unable to list the processes using the GPU.
Note that nvidia-smi reports the CUDA Driver API version in its output (e.g. 11.6). However, most high-level machine learning frameworks also utilize the CUDA Runtime API, which is provided by the CUDA Runtime library. Most frameworks are able to automatically install the required version of the runtime, so if you start from scratch, this should not be difficult to set up.
Please find below some examples of how to get started with GPU computations on Nuvolos, or consult the relevant machine learning library documentation. If you require additional support, please reach out to our team directly.
GPU monitoring
We recommend using the nvitop package to interactively monitor GPU usage. You can install it with pip:
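```bash
# nvitop is distributed on PyPI
pip install nvitop
```

Once installed, launch it by running nvitop in a terminal.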
Due to the underlying virtualization technology in Nuvolos, the nvitop tool cannot load the details of the processes using the GPU.
Large Language Models
A few useful guidelines for running LLMs on Nuvolos:
Always assess your VRAM requirements first. A helpful estimator can be found here: https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator
Try loading your models with quantized parameters first, which reduces the VRAM footprint. The HuggingFace transformers library has good built-in support for automatic weight quantization (see the sketch after this list): https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained
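As an illustration, the following sketch loads a model with 4-bit quantized weights. The model name is only an example, and it assumes the transformers, accelerate and bitsandbytes packages are installed:

```python
# Minimal sketch: load a model with 4-bit quantized weights to cut VRAM usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # example model, substitute your own

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit format
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the GPU automatically
)
```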
Python
Installing the right version of CUDA for Python packages can be an overwhelming task. We recommend always starting with a new, clean image and installing the high-level AI/ML Python libraries first; only install other libraries afterwards, if possible. This way, PyTorch or Tensorflow can install the exact CUDA libraries they need.
PyTorch
In our experience, installing PyTorch with pip works better than with conda, as it won't try to overwrite system libraries:
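For example (a sketch; the recommended command changes between releases, so check https://pytorch.org/get-started/locally/ for the current one):

```bash
# Installs PyTorch wheels bundled with the CUDA 12 runtime libraries
pip install torch torchvision torchaudio
```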
The above command will install PyTorch with the latest major CUDA Runtime version (12). On Nuvolos, currently all GPUs support version 12, except the A10 card. If you wish to run your computation on an A10, please install PyTorch with the older CUDA 11 Runtime:
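For example, using PyTorch's CUDA 11.8 wheel index (the minor version is illustrative):

```bash
# Installs PyTorch wheels built against the CUDA 11.8 runtime
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```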
You don't need to have a GPU available in your application to install PyTorch with GPU support; it's enough to scale up to a GPU-enabled size after you're done with the installation. To test whether your installation was successful, execute the following code snippet while already on a GPU-enabled size:
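A minimal check along these lines (a sketch, assuming a recent PyTorch):

```python
import torch

# Fails if no GPU is visible to PyTorch
assert torch.cuda.is_available()

# Run a small computation on the GPU to verify end-to-end functionality
x = torch.rand(1000, 1000, device="cuda")
y = x @ x
torch.cuda.synchronize()
print(torch.cuda.get_device_name(0))
```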
If it completes without an error, your configuration is correct.
Note that pip will install the runtime libraries needed by torch, but will not set up a complete developer environment that you could use from outside Python (see the official notes). To use tools like nvcc from the command line, please install the CUDA Toolkit via conda instead.
NVCC
If you wish to compile CUDA executables with nvcc, you'll need to install the following packages at a minimum:
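A minimal sketch, assuming the nvidia conda channel (pin the version to match your target CUDA major release):

```bash
# cuda-nvcc: compiler binaries; cuda-cudart-dev: runtime headers and libraries
conda install -c nvidia cuda-nvcc cuda-cudart-dev
```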
The cuda-nvcc package provides the compiler binaries, while cuda-cudart-dev provides the header and library files. Both packages are available in CUDA 11 and 12 versions.
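To verify the setup, you can check the compiler version and build any CUDA source file of your own (hello.cu is a placeholder name):

```bash
nvcc --version          # should report the installed CUDA compiler version
nvcc hello.cu -o hello  # compile a CUDA source file of your own
./hello
```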
CUDA Toolkit
If you need the entire CUDA toolkit, you can also install it with conda. NVIDIA has recently changed how it ships the package on conda:
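A sketch of both variants, assuming the nvidia channel (exact package names and pinning may differ):

```bash
# CUDA 12.x: the toolkit ships as the cuda-toolkit metapackage
conda install -c nvidia cuda-toolkit

# CUDA 11.x: older releases ship as the monolithic cudatoolkit package
conda install -c nvidia "cudatoolkit=11.*"
```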
The package is available in both CUDA 11 and 12 versions.
Tensorflow
To install Tensorflow, we recommend using conda, as Tensorflow requires the cudatoolkit package:
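For example (a sketch; package and channel availability vary between conda channels):

```bash
# CUDA_VERSION is a placeholder: set it to the version reported by nvidia-smi
conda install tensorflow-gpu cudatoolkit=CUDA_VERSION
```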
where CUDA_VERSION is the version reported by nvidia-smi. If you don't need the latest version of CUDA, it's recommended to start with an older version like 11.6 to stay compatible with older GPU cards.
We recommend installing both tensorflow and cudatoolkit from the same conda channel if possible; see the notes above on cudatoolkit.
You don't need to have a GPU available in your running app to install Tensorflow with GPU support; it's enough to scale up to a GPU-enabled size after you're done with the installation. To test whether your installation was successful, execute the following code snippet while already on a GPU-enabled size:
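A minimal check along these lines (a sketch, assuming Tensorflow 2.x):

```python
import tensorflow as tf

# Place a small computation explicitly on the first GPU and print its device
with tf.device("/GPU:0"):
    x = tf.random.uniform((1000, 1000))
    y = tf.matmul(x, x)
print(y.device)
```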
If you see an output that ends with GPU:0, your configuration is correct.
Rstudio
With the Machine Learning (CUDA enabled) Rstudio images you can run GPU computations on GPU-accelerated nodes. These images also have the CUDA runtime / toolkit installed.
XGBoost
We recommend using the pre-built experimental binary to get started with XGBoost and R. In a terminal on a GPU node:
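A sketch of the intended steps; the download URL and file name below are hypothetical, so check https://github.com/dmlc/xgboost/releases for the current GPU-enabled R package:

```bash
# Hypothetical release asset name; adjust to the actual file on the releases page
wget https://github.com/dmlc/xgboost/releases/download/v2.0.3/xgboost_r_gpu_linux.tar.gz
R CMD INSTALL xgboost_r_gpu_linux.tar.gz
```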
You can test the code via the following example program: https://rdrr.io/cran/xgboost/src/demo/gpu_accelerated.R
Tensorflow / Keras
You can use Tensorflow with GPU acceleration by following our Tensorflow installation guide and installing with version = "gpu".
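A minimal sketch using the R tensorflow package's installer (assumes the tensorflow R package is already installed):

```r
library(tensorflow)
# Install the GPU build of Tensorflow into the package's environment
install_tensorflow(version = "gpu")

# Quick check: should list at least one GPU device
tf$config$list_physical_devices("GPU")
```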