GPU Usage
Introduction
In this example, we will explore how to submit a program to Multivac using the GPU to train a neural network.
Currently, only the all and dops-gpu partitions provide GPU access, on the machine dops-gpu1. In the configuration script, keep in mind that the whole machine must be reserved and the node is not multi-user, since the GPU cannot be shared under these conditions.
We will create a file named test_gpu.py for demonstration purposes.
Example Code
import tensorflow as tf
from tensorflow.keras import layers, models, datasets

# Requires: pip install "tensorflow[and-cuda]==2.15.1"

# Do we want to use the GPU?
volem_usar_gpu = True

# If we do not want to use the GPU, force CPU execution
if not volem_usar_gpu:
    tf.config.set_visible_devices([], 'GPU')

# Check and configure available GPUs
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
if gpus:
    try:
        # Configure incremental memory growth
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("GPUs configured for incremental memory usage.")
    except RuntimeError as e:
        print(e)
else:
    print("No GPU detected, CPU will be used.")

# Load and prepare the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
train_images = train_images[..., tf.newaxis] / 255.0  # Add a channel axis for grayscale images
test_images = test_images[..., tf.newaxis] / 255.0

# Define the model architecture
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # First convolution layer
    layers.MaxPooling2D((2, 2)),  # First pooling layer
    layers.Conv2D(64, (3, 3), activation='relu'),  # Second convolution layer
    layers.MaxPooling2D((2, 2)),  # Second pooling layer
    layers.Conv2D(64, (3, 3), activation='relu'),  # Third convolution layer
    layers.Flatten(),  # Flatten features
    layers.Dense(64, activation='relu'),  # Hidden layer with 64 neurons and ReLU activation
    layers.Dense(10)  # Output layer with 10 neurons (one per class), no activation
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'Test set accuracy: {test_acc}')
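The two MNIST preprocessing lines in the script can be illustrated with plain NumPy (a sketch with dummy MNIST-shaped data; np.newaxis behaves like tf.newaxis for this purpose):

```python
import numpy as np

# Add a trailing channel axis and scale pixel values to [0, 1],
# mirroring images[..., tf.newaxis] / 255.0 in the script above.
images = np.zeros((6, 28, 28), dtype=np.uint8)  # dummy MNIST-shaped batch
prepared = images[..., np.newaxis] / 255.0
print(prepared.shape)  # (6, 28, 28, 1)
```

The extra axis is needed because Conv2D expects inputs of shape (height, width, channels), and grayscale images have a single channel.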
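The final Dense(10) layer has no activation because the loss is built with from_logits=True: the softmax is applied inside the loss function itself. A minimal NumPy sketch of what the loss computes for a single sample:

```python
import numpy as np

# With from_logits=True, the loss turns raw logits into probabilities
# via softmax, then takes the negative log-probability of the true class.
logits = np.array([2.0, 1.0, 0.1])             # raw network output for one sample
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
true_class = 0                                 # sparse label: a class index
loss = -np.log(probs[true_class])              # cross-entropy for this sample
print(f"{loss:.4f}")  # ~0.4170
```

Computing softmax inside the loss is numerically more stable than adding a softmax activation to the output layer and using from_logits=False.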
Execution
We will create a Python virtual environment named, for example, venv. This step is only required the first time the venv is created; afterwards it can be reused.
Inside the virtual environment, you need to install TensorFlow (the extras specifier is quoted so the shell does not expand the brackets):
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install "tensorflow[and-cuda]==2.15.1"
$ deactivate
We will create a configuration file, for example test_gpu.slurm, containing the execution settings for this GPU script.
VERSION=1.3
JOB_NAME=test_gpu
NAME_OUTPUT=out
PARTITION=dops-gpu
N_TASKS=1
CPUS_PER_TASK=16
MAIL_TYPE=END,FAIL
MAIL_USER=nom.usuari@upc.edu
MEMORY=15000M
BEGIN=now
TIME_LIMIT=23:59:59 # In this example we set a limit of just under 24 h
LOG_OUTPUT=log
FORCED_NODES=dops-gpu1 # Very important: force this node
EXCLUDED_NODES=
ROUTE=~/tests/gpu_example
COMMANDS=(
"hostname" # To know which machine executed the job
"source $ROUTE/venv/bin/activate"
"python3 $ROUTE/test_gpu.py"
"deactivate"
)
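For reference, the settings above correspond roughly to the following raw Slurm batch script. This is a sketch, not the wrapper's actual output: the exact directives generated by multivac may differ.

```shell
#!/bin/bash
# Hypothetical raw Slurm equivalent of the configuration above.
#SBATCH --job-name=test_gpu
#SBATCH --partition=dops-gpu
#SBATCH --nodelist=dops-gpu1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=15000M
#SBATCH --time=23:59:59
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=nom.usuari@upc.edu
#SBATCH --output=out

hostname                                        # To know which machine executed the job
source ~/tests/gpu_example/venv/bin/activate
python3 ~/tests/gpu_example/test_gpu.py
deactivate
```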
We will submit this script from iocex using the following command:
multivac test_gpu.slurm
Once execution is complete, the output of our program will be visible in the same directory where the job was launched.