Job Submission

This section provides detailed instructions on how to prepare and submit jobs to the cluster using SLURM. Understanding this process is crucial for efficient execution and proper management of cluster resources.

Connection to the execution environment

The execution environment is a virtual machine called iocex.

To connect to the iocex machine using SSH, follow these steps:

  1. Open a Terminal: Open a terminal on your system, such as Terminal on Linux or macOS, or Command Prompt or PowerShell on Windows.

  2. Use the SSH Command: Use the ssh command to start an SSH connection through port 1022. The basic syntax is:

$ ssh -p 1022 username@iocex.upc.edu
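To avoid typing the port and host name every time, you can optionally add an entry to your ~/.ssh/config file. The alias name iocex below is just a suggestion; replace username with your own user:

```
Host iocex
    HostName iocex.upc.edu
    Port 1022
    User username
```

After saving this entry, running `ssh iocex` is enough to connect.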

How to configure a shared folder on Windows

The guide below explains how to configure a folder with direct access to the IOC NFS; we recommend option 2.

Windows SFTP connection guide.

Example using a configuration file:

To simplify and standardize job launching in Multivac, we created a custom command that takes a .slurm configuration file as input. This file contains all configurable Multivac parameters and simplifies job submission.

$ multivac my_conf.slurm

The available parameters are listed in the conf.slurm template.
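As an illustration only, a sketch of what such a file might look like is shown below, assuming it follows the usual SLURM batch-script convention. The parameter values and the payload command are hypothetical placeholders; the authoritative list of fields is the conf.slurm template itself.

```shell
#!/bin/bash
# Hypothetical my_conf.slurm sketch; the real parameter names come from
# the conf.slurm template, so treat these directives as placeholders.
#SBATCH --job-name=my_job        # job name
#SBATCH --cpus-per-task=4        # CPUs for the job
#SBATCH --mem=8G                 # memory limit
#SBATCH --time=01:00:00          # maximum runtime

echo "payload command goes here"  # placeholder for the actual workload
```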

Basic Job Submission

To submit a job to the cluster with SLURM, use the sbatch command. A basic example is shown below:

$ sbatch script.sh

In this example, script.sh is the Bash script for the job you want to run on the cluster. You can specify several parameters in the sbatch command, such as partition, number of CPUs, memory, execution time, and more.
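A minimal script.sh might look like the following sketch; the directive values are illustrative, so adjust them to your job:

```shell
#!/bin/bash
#SBATCH --job-name=demo          # job name shown in squeue
#SBATCH --output=demo_%j.out     # output file; %j expands to the job ID
#SBATCH --time=00:10:00          # maximum runtime

# Everything below runs on the compute node SLURM assigns.
echo "Running on host: $(hostname)"
```

Note that #SBATCH lines are ordinary comments to Bash, so the same script also runs unchanged outside SLURM.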

Example with Parameters:

Below is a more detailed example with common parameters:

$ sbatch -p partition_name -c 4 --mem=8G -t 01:00:00 script.sh

In this example:

  • -p partition_name specifies the partition where the job will run.

  • -c 4 indicates that the job will use 4 CPUs.

  • --mem=8G sets the memory requirement to 8 gigabytes.

  • -t 01:00:00 sets the maximum job runtime to 1 hour.
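The same parameters can also be embedded in the script itself as #SBATCH directives, so that a plain `sbatch script.sh` picks them up automatically. Here partition_name and the echo payload are placeholders:

```shell
#!/bin/bash
#SBATCH -p partition_name        # partition where the job will run
#SBATCH -c 4                     # 4 CPUs
#SBATCH --mem=8G                 # 8 gigabytes of memory
#SBATCH -t 01:00:00              # 1 hour maximum runtime

echo "job started"               # placeholder for the actual workload
```

Flags passed on the sbatch command line override the directives inside the script, which is convenient for one-off changes.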

Job Monitoring

All submitted jobs can be viewed with the following command:

$ squeue

You can add extra parameters to get more details about running jobs.

  • -u user: Shows jobs belonging to a specific user.

    Example:

$ squeue -u username

  • -o format: Specifies the output format. You can customize the fields shown.

    Example:

$ squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"

In this example, several fields are displayed: job ID (%.18i), partition (%.9P), job name (%.8j), user (%.8u), job state (%.2t), time used (%.10M), number of nodes (%.6D), and the node list or pending reason (%R).

  • --start: Reports the expected start time of pending jobs.

    Example:

$ squeue --start

  • --sort: Sorts the output by a specific field.

    Example:

$ squeue --sort=-start_time

In this example, jobs are sorted by start time in descending order; the leading - before the field name reverses the default ascending order.

These are just a few examples of how you can customize squeue output to get more details about running jobs. See the official SLURM documentation for more options and detailed information.
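If you use a long format string often, one option is to wrap it in a shell alias. The alias name sq below is just a suggestion; add the line to your ~/.bashrc to make it permanent:

```shell
# Hypothetical alias for the detailed squeue format shown earlier.
alias sq='squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R"'
```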

Job Cancellation

To cancel a job, use the scancel command followed by the job ID (shown in the first column of the squeue output). For example:

$ scancel 1234