GPU Usage
Introduction
In this example, we will explore how to submit a program to Multivac using the GPU to train a neural network.
Currently, only the all and dops-gpu partitions provide GPU access, on the machine dops-gpu1. In the configuration script, keep in mind that the whole machine must be reserved and the node is not multi-user, since the GPU cannot be shared under these conditions.
We will create a file named test_gpu.py for demonstration purposes.
Example Code
import tensorflow as tf
from tensorflow.keras import layers, models, datasets

# Requires: pip install "tensorflow[and-cuda]==2.15.1"

# Do we want to use the GPU?
volem_usar_gpu = True

# If we do not want to use the GPU, force CPU execution
if not volem_usar_gpu:
    tf.config.set_visible_devices([], 'GPU')

# Check and configure available GPUs
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
if gpus:
    try:
        # Configure incremental memory growth
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("GPUs configured for incremental memory usage.")
    except RuntimeError as e:
        print(e)
else:
    print("No GPU detected, CPU will be used.")

# Load and prepare the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
train_images = train_images[..., tf.newaxis] / 255.0  # Add a channel axis for grayscale images
test_images = test_images[..., tf.newaxis] / 255.0

# Define the model architecture
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # First convolution layer
    layers.MaxPooling2D((2, 2)),  # First pooling layer
    layers.Conv2D(64, (3, 3), activation='relu'),  # Second convolution layer
    layers.MaxPooling2D((2, 2)),  # Second pooling layer
    layers.Conv2D(64, (3, 3), activation='relu'),  # Third convolution layer
    layers.Flatten(),  # Flatten features
    layers.Dense(64, activation='relu'),  # Hidden layer with 64 neurons and ReLU activation
    layers.Dense(10)  # Output layer with 10 neurons (one per class), no activation
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'Test set accuracy: {test_acc}')
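The two MNIST preprocessing lines in the script can be illustrated with plain NumPy (a sketch with dummy MNIST-shaped data; np.newaxis behaves like tf.newaxis for this purpose):

```python
import numpy as np

# Add a trailing channel axis and scale pixel values to [0, 1],
# mirroring images[..., tf.newaxis] / 255.0 in the script above.
images = np.zeros((6, 28, 28), dtype=np.uint8)  # dummy MNIST-shaped batch
prepared = images[..., np.newaxis] / 255.0
print(prepared.shape)  # (6, 28, 28, 1)
```

The extra axis is needed because Conv2D expects inputs of shape (height, width, channels), and grayscale images have a single channel.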
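The final Dense(10) layer has no activation because the loss is built with from_logits=True: the softmax is applied inside the loss function itself. A minimal NumPy sketch of what the loss computes for a single sample:

```python
import numpy as np

# With from_logits=True, the loss turns raw logits into probabilities
# via softmax, then takes the negative log-probability of the true class.
logits = np.array([2.0, 1.0, 0.1])             # raw network output for one sample
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
true_class = 0                                 # sparse label: a class index
loss = -np.log(probs[true_class])              # cross-entropy for this sample
print(f"{loss:.4f}")  # ~0.4170
```

Computing softmax inside the loss is numerically more stable than adding a softmax activation to the output layer and using from_logits=False.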
Execution
We will create a Python virtual environment named, for example, venv. This step is only required the first time the venv is created; afterwards it can be reused.
Inside the virtual environment, you need to install TensorFlow (the extras specifier is quoted so the shell does not expand the brackets):
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install "tensorflow[and-cuda]==2.15.1"
$ deactivate
We will create a configuration file, for example test_gpu.slurm, containing the execution settings for this GPU script.
VERSION=1.3
JOB_NAME=test_gpu
NAME_OUTPUT=out
PARTITION=dops-gpu
N_TASKS=1
CPUS_PER_TASK=16
MAIL_TYPE=END,FAIL
MAIL_USER=nom.usuari@upc.edu
MEMORY=15000M
BEGIN=now
TIME_LIMIT=23:59:59 # In this example we set a limit of just under 24 h
LOG_OUTPUT=log
FORCED_NODES=dops-gpu1 # Very important: force this node
EXCLUDED_NODES=
ROUTE=~/tests/gpu_example
COMMANDS=(
"hostname" # To know which machine executed the job
"source $ROUTE/venv/bin/activate"
"python3 $ROUTE/test_gpu.py"
"deactivate"
)
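For reference, the settings above correspond roughly to the following raw Slurm batch script. This is a sketch, not the wrapper's actual output: the exact directives generated by multivac may differ.

```shell
#!/bin/bash
# Hypothetical raw Slurm equivalent of the configuration above.
#SBATCH --job-name=test_gpu
#SBATCH --partition=dops-gpu
#SBATCH --nodelist=dops-gpu1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=15000M
#SBATCH --time=23:59:59
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=nom.usuari@upc.edu
#SBATCH --output=out

hostname                                        # To know which machine executed the job
source ~/tests/gpu_example/venv/bin/activate
python3 ~/tests/gpu_example/test_gpu.py
deactivate
```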
We will submit this script from iocex using the following command:
multivac test_gpu.slurm
Once execution is complete, the output of our program will be visible in the same directory where the job was launched.