GPU computation


Last updated 1 year ago


Introduction

To enable GPU acceleration of your code, two conditions need to be met:

  1. You need to run your application on a GPU-enabled size. By default, applications on Nuvolos run on nodes without a GPU card, but you can scale your applications to sizes with a GPU. Note that all sizes with GPUs are credit-based.

  2. You need to make sure the application libraries are properly configured to use a GPU. Most of the documentation below addresses this for various frameworks, so that the application can actually use the available GPU.

Library versions

The NVIDIA device drivers will be loaded automatically in all GPU-enabled sizes. However, depending on the software you use, additional components (e.g. CUDA toolkit) might need to be installed via conda.

If you launch an app in a GPU-enabled size on Nuvolos, the nvidia-smi tool will be available from the command line / terminal. You can use it to check the driver version and monitor the memory usage of the card.

$ nvidia-smi
Thu Jun  1 08:39:06 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 00000002:00:00.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |    333MiB /  4096MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Due to the underlying virtualization technology in Nuvolos, the nvidia-smi tool is currently unable to list processes using the GPU.
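The same driver and memory information can also be read from Python. The following is a minimal sketch, assuming nvidia-smi is on the PATH (as on GPU-enabled sizes) and using its --query-gpu CSV output. The helper names parse_gpu_csv and query_gpus are illustrative and not part of any Nuvolos API.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; the order must match the parsing below.
QUERY_FIELDS = "name,driver_version,memory.used,memory.total"


def _to_int(value):
    """Convert a numeric field to int; non-numeric placeholders become None."""
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        name, driver, used, total = [part.strip() for part in line.split(",")]
        gpus.append({
            "name": name,
            "driver_version": driver,
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return gpus


def query_gpus():
    """Return one dict per visible GPU, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # most likely not running on a GPU-enabled size
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

On a size without a GPU the script prints nothing, which makes it usable as a quick check before starting a longer job.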

The sections below give examples of how to get started with GPU computations on Nuvolos; you can also consult the relevant machine learning library documentation directly. If you require additional support, please reach out to our team.

GPU monitoring

We recommend using the nvitop package to interactively monitor GPU usage. You can install it with:

conda install -c conda-forge nvitop

Due to the underlying virtualization technology in Nuvolos, the nvitop tool cannot load the details of the processes using the GPU.

Large Language Models

A few useful guidelines for running LLMs on Nuvolos:
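One practical check is whether a model's weights fit in the GPU memory that nvidia-smi reports. The sketch below is a rough, weights-only estimate. It ignores activations, KV cache and framework overhead, and the 0.8 headroom factor and the helper names are assumptions made for illustration, not Nuvolos recommendations.

```python
# Approximate storage per parameter for common weight formats.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_size_gib(num_params, dtype="float16"):
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits_in_vram(num_params, dtype, vram_mib, headroom=0.8):
    """True if the weights fit within a fraction `headroom` of the card's memory.

    vram_mib is the total memory in MiB, as shown by nvidia-smi.
    headroom is an assumed safety margin, not a measured value.
    """
    return weights_size_gib(num_params, dtype) <= headroom * vram_mib / 1024


if __name__ == "__main__":
    params = 7_000_000_000  # a hypothetical 7-billion-parameter model
    for dtype in BYTES_PER_PARAM:
        size = weights_size_gib(params, dtype)
        print(f"{dtype}: {size:.2f} GiB, fits in 4096 MiB: "
              f"{fits_in_vram(params, dtype, 4096)}")
```

The estimate shows why quantized weights matter on small cards: halving the bytes per parameter halves the weight footprint.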

Python

Installing the right version of CUDA for Python packages can be an overwhelming task. We recommend always starting with a new, clean image and installing the high-level AI/ML Python libraries first, and only afterwards installing other libraries, if possible. This way, PyTorch or TensorFlow can install the exact CUDA libraries they need.

PyTorch

In our experience, installing PyTorch with pip is better than with conda, as it won't try to overwrite system libraries:

pip3 install torch torchvision torchaudio

The above command will install PyTorch with the latest major CUDA Runtime version (12). On Nuvolos, all GPUs currently support version 12, except the A10 card. If you wish to run your computation on an A10, please install PyTorch with the older CUDA 11 Runtime:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
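The choice between the two commands can be derived from the CUDA version in the nvidia-smi header. This is a minimal sketch that only encodes the rule stated above (a driver that supports CUDA 12 gets the default wheels, anything older gets the cu118 index). The function names are hypothetical, and the header format is assumed to match the example in the Library versions section.

```python
import re

CU118_INDEX = "https://download.pytorch.org/whl/cu118"
BASE_COMMAND = "pip3 install torch torchvision torchaudio"


def driver_cuda_major(nvidia_smi_output):
    """Extract the major CUDA version from the nvidia-smi header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    return int(match.group(1)) if match else None


def pick_torch_install_command(nvidia_smi_output):
    """Return the pip command matching the CUDA version the driver supports.

    Falls back to the default command when no version can be found,
    for example when installing on a size without a GPU.
    """
    major = driver_cuda_major(nvidia_smi_output)
    if major is not None and major < 12:
        return f"{BASE_COMMAND} --index-url {CU118_INDEX}"
    return BASE_COMMAND


if __name__ == "__main__":
    header = "| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |"
    print(pick_torch_install_command(header))
```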

You don't need to have a GPU available in your application to install PyTorch with GPU support; it's enough to scale up to a GPU-enabled size once the installation is done. To test whether your installation was successful, execute the following code snippet while on a GPU-enabled size:

import torch

dtype = torch.float
device = torch.device("cuda")
a = torch.randn((), device=device, dtype=dtype)

If it completes without an error, your configuration is correct.
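When the snippet does fail, it helps to know which part is missing. The following sketch is one possible diagnostic, not an official procedure: it reports whether PyTorch is installed, which CUDA runtime it was built against, and whether a device is visible. The function name describe_torch_cuda is illustrative.

```python
def describe_torch_cuda():
    """Summarize whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "cuda_available": False}

    info = {
        "installed": True,
        "cuda_available": torch.cuda.is_available(),
        # CUDA runtime version PyTorch was built with; None for CPU-only builds.
        "cuda_runtime": torch.version.cuda,
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # Run a tiny computation on the GPU to confirm kernels can execute.
        x = torch.ones(1024, device="cuda")
        info["sum_check"] = float(x.sum().item())
    return info


if __name__ == "__main__":
    print(describe_torch_cuda())
```

A result with cuda_runtime set but cuda_available false usually means the installation is fine and the application is simply not running on a GPU-enabled size.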

NVCC

If you wish to compile CUDA executables with nvcc, you'll need to install the following packages as a minimum:

conda install -c nvidia cuda-nvcc cuda-cudart-dev

CUDA Toolkit

If you need the entire CUDA toolkit, you can also install it with conda. NVIDIA has recently changed how it ships the package via conda:

conda install -c nvidia cuda-toolkit

The package is available in both CUDA 11 and 12 versions.

TensorFlow

To install TensorFlow, we recommend using conda, as TensorFlow requires the cudatoolkit package.

conda install -c conda-forge tensorflow-gpu "cudatoolkit<=CUDA_VERSION"

where CUDA_VERSION is the version reported by nvidia-smi. If you don't need the latest version of CUDA, we recommend starting with an older version such as 11.6 to remain compatible with older GPU cards.

We recommend installing both tensorflow and cudatoolkit from the same conda channel if possible; see the notes above for cudatoolkit.

You don't need to have a GPU available in your running app to install TensorFlow with GPU support; it's enough to scale up to a GPU-enabled size once the installation is done. To test whether your installation was successful, execute the following code snippet while on a GPU-enabled size:

import tensorflow as tf

a = tf.constant([1, 2, 3])
print(a.device)

If you see an output similar to

/job:localhost/replica:0/task:0/device:GPU:0

that ends with GPU:0, your configuration is correct.
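An alternative check is to ask TensorFlow which GPUs it can see. The sketch below uses tf.config.list_physical_devices, available in TensorFlow 2.x; the wrapper name list_tensorflow_gpus is illustrative, and the import is guarded so the script also runs where TensorFlow is not installed.

```python
def list_tensorflow_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = list_tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU.")
    else:
        print("Visible GPUs:", gpus)
```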

RStudio

With the Machine Learning (CUDA-enabled) RStudio images, you can run GPU computations on GPU-accelerated nodes. These images also have the CUDA runtime / toolkit installed.

XGBoost

We recommend using the pre-built experimental binary to get started with XGBoost and R. In a terminal on a GPU node:

# define version used - update if needed
XGBOOST_VERSION=1.4.1
# download binary
wget https://github.com/dmlc/xgboost/releases/download/v${XGBOOST_VERSION}/xgboost_r_gpu_linux_${XGBOOST_VERSION}.tar.gz
# Install dependencies
R -q -e "install.packages(c('data.table', 'jsonlite'))"
# Install XGBoost
R CMD INSTALL ./xgboost_r_gpu_linux_${XGBOOST_VERSION}.tar.gz

TensorFlow / Keras

You can use TensorFlow with GPU acceleration by following our TensorFlow installation guide and specifying version = "gpu" when installing TensorFlow.

Additional notes

Library versions: Note that nvidia-smi reports the CUDA Driver API version in its output (11.6). However, most high-level machine learning frameworks also use the CUDA Runtime API, which is provided by the CUDA Runtime library. Most frameworks can automatically install the required version of the runtime, so if you start from scratch, this should not be difficult to set up.

GPU monitoring: [Figure: Monitoring with nvitop]

Large Language Models: Always assess your VRAM requirements first. A helpful estimator can be found here: https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator

Large Language Models: Try loading your models with quantized parameters first, as they require less VRAM. The HuggingFace transformers library has good built-in support for automatic weight quantization: https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.from_pretrained

PyTorch: Note that pip will install the runtime libraries needed by torch, but will not set up a complete developer environment that you could use from outside Python (see the official notes). To use tools like nvcc from the command line, please install the CUDA Toolkit via conda instead.

NVCC: The cuda-nvcc package provides the compiler binaries, and the cuda-cudart-dev package provides the header and library files. Both packages are available in CUDA 11 and 12 versions.

XGBoost: You can test the installation with the following example program: https://rdrr.io/cran/xgboost/src/demo/gpu_accelerated.R