Job Submission
This section explains how to prepare and submit jobs to the cluster using SLURM. Understanding this process is essential for running jobs efficiently and using cluster resources properly.
Connection to the execution environment
The execution environment is a virtual machine called iocex.
To connect to the iocex machine using SSH, follow these steps:
Open a Terminal: Open a terminal on your system, such as the Terminal application on Linux or Command Prompt on Windows.
Use the SSH Command: Use the ssh command to start an SSH connection through port 1022. The basic syntax is:
$ ssh -p 1022 username@iocex.upc.edu
Example using a configuration file:
To simplify and standardize job launching in Multivac, we created a custom command that takes a .slurm configuration file as input. This file contains all configurable Multivac parameters and simplifies job submission.
$ multivac my_conf.slurm
The available parameters are listed in the conf.slurm template.
Basic Job Submission
To submit a job to the cluster with SLURM, use the sbatch command. A basic example is shown below:
$ sbatch script.sh
In this example, script.sh is the Bash script for the job you want to run on the cluster. You can specify several parameters in the sbatch command, such as partition, number of CPUs, memory, execution time, and more.
Example with Parameters:
Below is a more detailed example with common parameters:
$ sbatch -p partition_name -c 4 --mem=8G -t 01:00:00 script.sh
In this example:
- -p partition_name specifies the partition where the job will run.
- -c 4 indicates that the job will use 4 CPUs.
- --mem=8G sets the memory requirement to 8 gigabytes.
- -t 01:00:00 sets the maximum job runtime to 1 hour.
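The same options can also be written inside the script itself as #SBATCH directives, so that a plain sbatch script.sh picks them up. The following is a minimal sketch of such a script, not a Multivac-specific template: the job name, the output file pattern and the final echo are illustrative placeholders, and partition_name must be replaced with a partition that exists on the cluster.

```shell
#!/bin/bash
#SBATCH --job-name=example_job      # hypothetical job name
#SBATCH --partition=partition_name  # same as -p partition_name
#SBATCH --cpus-per-task=4           # same as -c 4
#SBATCH --mem=8G                    # 8 gigabytes of memory
#SBATCH --time=01:00:00             # same as -t 01:00:00
#SBATCH --output=example_%j.out     # %j expands to the job ID

# SLURM sets SLURM_CPUS_PER_TASK inside the job; fall back to 1 elsewhere.
cpus="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on ${HOSTNAME:-unknown} with ${cpus} CPU(s)"
```

Options given on the sbatch command line take precedence over the matching #SBATCH directives in the script.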
Job Monitoring
All submitted jobs can be viewed with the following command:
$ squeue
You can add extra parameters to get more details about running jobs.
-u user: Shows jobs belonging to a specific user.
Example:
$ squeue -u username
-o format: Specifies the output format. You can customize the fields shown.
Example:
$ squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"
In this example, the fields displayed are job ID (%.18i), partition (%.9P), job name (%.8j), user (%.8u), job state (%.2t), elapsed time (%.10M), number of nodes (%.6D), and the node list for running jobs or the reason a job is still pending (%R).
--start: Shows the expected start time of pending jobs.
Example:
$ squeue --start
--sort: Sorts the output by one or more fields, using the same field letters as the -o option. A leading - reverses the order.
Example:
$ squeue --sort=-S
In this example, jobs are sorted by start time (S) from most recent to oldest.
These are just a few examples of how you can customize squeue output to get more details about running jobs. See the official SLURM documentation for more options and detailed information.
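To avoid retyping a long format string, squeue also reads its default format from the SQUEUE_FORMAT environment variable. The sketch below reuses the format string from the -o example; whether you put it in a startup file such as ~/.bashrc is an assumption about your own shell setup.

```shell
# Default output format for every later squeue call in this shell session.
# An explicit -o on the command line still overrides it.
export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"
```

With this variable set, a plain squeue -u username prints the custom columns.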
Job Cancellation
To cancel a job, use the scancel command followed by the job ID. For example:
$ scancel 1234
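When a job is submitted, sbatch reports its ID in a line of the form "Submitted batch job 1234". The sketch below shows one possible way to extract the ID from that line so it can be passed to scancel. The function name job_id_from_sbatch is a hypothetical helper, not a SLURM command, and it assumes the single-cluster message format in which the ID is the last word.

```shell
# Print the last whitespace-separated word of the sbatch message,
# which is the job ID in the single-cluster message format.
job_id_from_sbatch() {
    printf '%s\n' "${1##* }"
}

# Typical use on the cluster (commented out because it needs SLURM):
#   job_id=$(job_id_from_sbatch "$(sbatch script.sh)")
#   scancel "$job_id"
job_id_from_sbatch "Submitted batch job 1234"
```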