C++
Introduction
In this example, we will explore how to submit a program to Multivac that uses OpenMP to run multiple threads in parallel, including how to compile it as part of the job.
We will create a file named test_openmp.c for demonstration purposes.
Example Code
#include <omp.h>
#include <stdio.h>
#include <unistd.h> // For sleep()

int main() {
    // Example 1: Threads run without a specific order
    printf("Example 1: Threads without specific order\n");

    // Shared variable
    int shared_data = 0;

    #pragma omp parallel
    {
        int id = omp_get_thread_num();

        // Each thread modifies the shared variable without synchronization
        ++shared_data;
        printf("Thread %d modified shared_data to %d\n", id, shared_data);

        // Simulate work
        sleep(1);

        // Each thread reads the shared variable
        printf("Thread %d reads shared_data: %d\n", id, shared_data);
    }

    // Example 2: Threads exchange information and synchronize
    printf("\nExample 2: Threads with communication and synchronization\n");

    // Reset shared variable
    shared_data = 0;

    #pragma omp parallel shared(shared_data)
    {
        int id = omp_get_thread_num();
        int n = omp_get_num_threads();

        for (int i = 0; i < n; ++i) {
            #pragma omp barrier // Wait until all threads reach this point
            if (id == i) {
                // Current thread updates the shared variable
                ++shared_data;
                printf("Thread %d updated shared_data to %d\n", id, shared_data);
                sleep(1);
            }
            #pragma omp barrier // Wait until the modification is visible to all threads

            // All threads take turns reading the shared variable
            for (int j = 0; j < n; ++j) {
                #pragma omp barrier
                if (id == j) {
                    printf("Thread %d reads shared_data: %d\n", id, shared_data);
                }
            }
        }
    }

    return 0;
}
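Note that the unsynchronized ++shared_data in Example 1 is a deliberate data race: increments from different threads can overlap and be lost, which is why the first block of output further below is inconsistent. As a minimal sketch of one way to remove the race, the increment can be protected with OpenMP's atomic directive (the function name here is ours, for illustration only):

```c
#include <omp.h>

// Increment a shared counter once per thread. The atomic directive makes
// each ++ an indivisible read-modify-write, so no increments are lost and
// the final value equals the number of threads in the team.
int count_threads_atomically(void) {
    int shared_data = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        ++shared_data;
    }
    return shared_data;
}
```

A critical section (#pragma omp critical) would also work here, but atomic is cheaper for a single scalar update.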
Execution
We will create a .slurm script named test_omp.slurm that tells Multivac how to run the code.
VERSION=1.3
JOB_NAME=test
NAME_OUTPUT=out
PARTITION=all
N_TASKS=4
CPUS_PER_TASK=2
MAIL_TYPE=END,FAIL
MAIL_USER=alexandre.gracia@upc.edu
MEMORY=15000M # [K|M|G|T] log-c1: 3715M, log-c2:3779M, log-c3:3715M, log-c4:3779M, dops-a1:15879M, dops-a2:15879M, dops-a3:15878M, dops-a4:15900M, dops-a5:15900M, cetus:64166M, psi:257379M
BEGIN=now # "hh:mm","now+1hour", "now+60" (seconds by default), "2010-01-20T12:34:00" YYYY-MM-DD[THH:MM[:SS]]
TIME_LIMIT=23:59:00 # "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"
LOG_OUTPUT=log
FORCED_NODES= # Nodes where execution is forced
EXCLUDED_NODES= # Nodes excluded from execution
ROUTE=~/tests/test_omp # Path to our scripts
COMMANDS=(
# Info commands
"hostname"
"whoami"
# Compile program
"g++ -fopenmp test_openmp.c -o test_openmp"
# Execute program
"./test_openmp"
)
We will submit this script from iocex using the following command:
multivac test_omp.slurm
Output
The output of our script should be similar to the following. In the first example, the threads run in no particular order; in the second, each thread reads the shared variable in a fixed sequence, waiting for the previous step to complete.
dops-a3
alexandre.gracia
Example 1: Threads without specific order
Thread 4 modified shared_data to 1
Thread 1 modified shared_data to 5
Thread 6 modified shared_data to 7
Thread 0 modified shared_data to 2
Thread 3 modified shared_data to 6
Thread 5 modified shared_data to 3
Thread 2 modified shared_data to 4
Thread 7 modified shared_data to 5
Thread 7 reads shared_data: 7
Thread 2 reads shared_data: 7
Thread 5 reads shared_data: 7
Thread 0 reads shared_data: 7
Thread 1 reads shared_data: 7
Thread 6 reads shared_data: 7
Thread 4 reads shared_data: 7
Thread 3 reads shared_data: 7
Example 2: Threads with communication and synchronization
Thread 0 updated shared_data to 1
Thread 0 reads shared_data: 1
Thread 1 reads shared_data: 1
Thread 2 reads shared_data: 1
Thread 3 reads shared_data: 1
Thread 4 reads shared_data: 1
Thread 5 reads shared_data: 1
Thread 6 reads shared_data: 1
Thread 7 reads shared_data: 1
Thread 1 updated shared_data to 2
Thread 0 reads shared_data: 2
Thread 1 reads shared_data: 2
Thread 2 reads shared_data: 2
Thread 3 reads shared_data: 2
Thread 4 reads shared_data: 2
Thread 5 reads shared_data: 2
Thread 6 reads shared_data: 2
Thread 7 reads shared_data: 2
Thread 2 updated shared_data to 3
Thread 0 reads shared_data: 3
Thread 1 reads shared_data: 3
Thread 2 reads shared_data: 3
Thread 3 reads shared_data: 3
Thread 4 reads shared_data: 3
Thread 5 reads shared_data: 3
Thread 6 reads shared_data: 3
Thread 7 reads shared_data: 3
Thread 3 updated shared_data to 4
Thread 0 reads shared_data: 4
Thread 1 reads shared_data: 4
Thread 2 reads shared_data: 4
Thread 3 reads shared_data: 4
Thread 4 reads shared_data: 4
Thread 5 reads shared_data: 4
Thread 6 reads shared_data: 4
Thread 7 reads shared_data: 4
Thread 4 updated shared_data to 5
Thread 0 reads shared_data: 5
Thread 1 reads shared_data: 5
Thread 2 reads shared_data: 5
Thread 3 reads shared_data: 5
Thread 4 reads shared_data: 5
Thread 5 reads shared_data: 5
Thread 6 reads shared_data: 5
Thread 7 reads shared_data: 5
Thread 5 updated shared_data to 6
Thread 0 reads shared_data: 6
Thread 1 reads shared_data: 6
Thread 2 reads shared_data: 6
Thread 3 reads shared_data: 6
Thread 4 reads shared_data: 6
Thread 5 reads shared_data: 6
Thread 6 reads shared_data: 6
Thread 7 reads shared_data: 6
Thread 6 updated shared_data to 7
Thread 0 reads shared_data: 7
Thread 1 reads shared_data: 7
Thread 2 reads shared_data: 7
Thread 3 reads shared_data: 7
Thread 4 reads shared_data: 7
Thread 5 reads shared_data: 7
Thread 6 reads shared_data: 7
Thread 7 reads shared_data: 7
Thread 7 updated shared_data to 8
Thread 0 reads shared_data: 8
Thread 1 reads shared_data: 8
Thread 2 reads shared_data: 8
Thread 3 reads shared_data: 8
Thread 4 reads shared_data: 8
Thread 5 reads shared_data: 8
Thread 6 reads shared_data: 8
Thread 7 reads shared_data: 8
Once execution is complete, the program output will be visible in the same directory where the job was launched.
As we can observe, in the first case the threads do not wait for each other, so the execution order is arbitrary: each thread reads the variable at its own pace, with no guarantee of seeing the expected shared value, which leads to unpredictable results. In the second case, the barriers ensure that every update completes and becomes visible before any thread reads, so the output is fully ordered.