Introduction to SLURM

Basic Concepts

SLURM uses several key concepts to manage cluster resources:

  • Node: A node is an individual computing unit in the cluster, which may have multiple CPUs and associated memory.

  • Job Queue: SLURM places submitted jobs in a queue, which orders them by priority and determines when each job gets access to cluster resources.

  • Partition: A partition is a logical division of the cluster that lets users specify the type of resources they need. Partitions are usually organized by associated resource type.
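The concepts above can be inspected from the command line. The following is a minimal sketch, assuming the standard SLURM client tools `sinfo` and `squeue` are installed on the machine where it runs; the `status` variable is only an illustrative name. Outside the cluster it prints a hint instead of failing.

```shell
#!/bin/bash
# Sketch: look at partitions, nodes and the job queue.
# Assumption: the standard SLURM client tools are on the PATH.
if command -v sinfo >/dev/null 2>&1; then
    # One line per partition, with availability, time limit and node counts.
    sinfo --summarize
    # Jobs belonging to the current user that are pending or running.
    squeue --user="$(id -un)"
    status="queried"
else
    status="sinfo not found; run this on the cluster login node"
fi
echo "$status"
```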

Why SLURM Matters

SLURM is essential for efficient use of a compute cluster. Its main features include:

  • Efficient Scheduling: SLURM provides a robust and flexible scheduler, allowing jobs to be submitted with specific parameters such as runtime, CPU count, and memory.

  • Resource Management: SLURM tracks which cluster resources are available and allocates them to jobs, so that jobs do not compete for the same CPUs or memory and node usage stays high.

  • Fair and Equitable Access: SLURM helps maintain a fair execution environment, ensuring jobs run in an ordered and prioritized way through the queue system.

Learning to use SLURM is essential to achieve optimal cluster performance and ensure efficient resource distribution among research group members.
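The scheduling parameters mentioned above (runtime, CPU count, memory) are normally declared at the top of a job script as `#SBATCH` directives. The script below is a minimal sketch, not a site-specific template: the job name, the limits and the output file name are illustrative values chosen for this example. Because `#SBATCH` lines are ordinary comments to bash, the script also runs outside the scheduler, which is handy for a quick local check.

```shell
#!/bin/bash
# Hypothetical job script; every value below is an illustrative assumption.

# Name shown in the queue.
#SBATCH --job-name=example
# Maximum runtime, as HH:MM:SS.
#SBATCH --time=00:10:00
# Number of CPU cores for the task.
#SBATCH --cpus-per-task=2
# Memory requested per node.
#SBATCH --mem=1G
# Output file; %j is replaced by the job ID.
#SBATCH --output=example_%j.out

# SLURM sets these variables inside a job. The fallbacks let the
# script run without the scheduler as well.
job_id="${SLURM_JOB_ID:-local}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

summary="job=${job_id} cpus=${cpus}"
echo "$summary"
```

In standard SLURM such a script is submitted with `sbatch`; a cluster may wrap submission in its own command, so check the local documentation for the preferred one.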

First Steps

When we connect to the cluster over SSH (see "How to connect via SSH"), the login banner lists the most frequent commands:

***********************************************************************************
*       Welcome to Multivac. The calculus cluster of DOPS, a group of IOC.       *
***********************************************************************************
                         ___________________
                        |  _                |
                        | / \ _____  _____  |
                        | | // / \ \/ / \ \ |
                        | |// /   \/ /   \/ |
                        | |\\ \   / /\   __ |
                        | |_\\_\_/_/\_\_/_/ |
                        |___________________|

                            UPDATED: 21/02/2025
                        Multivac most frequent commands
        squeue                          Show the pending jobs
        sinfo                           Show the status/information of the nodes
        scua                            Show the pending jobs and status/information
        sacct -S 2025-01-01             Get all the jobs from the user since 2025-01-01
        seff [job_id]                   Shows the efficency of your submitted script
        multivac [conf.slurm]           Send a job to be scheduled in the queue
        scancel [jobid]                 Cancel a job by jobid
        sjob [jobid]                    Shows the info about a jobid
        mvacinteract [node]             Creates a ssh with the name of the node specified.
        mvac_crear_venv                 Creates a python3 virtual environment with docplex
        mvac_jupyter                    Creates a jupyterlab session
        More information: https://iocnet.upc.edu/doc/multivac

Most Frequent Linux Commands

cd                  # Change directory (with no argument, go to your home directory)
cd ..               # Move one folder up
cd exemple/         # Move to folder "exemple"
ls                  # List directory contents
ls -la              # List contents including hidden files and attributes
mkdir exemple       # Create a folder
rmdir exemple       # Delete an empty folder (use rm -r exemple to delete a folder and its contents)
touch file1         # Create a file named file1
cp file1 file2      # Copy the contents of file1 to file2
mv file1 exemple/   # Move file1 into folder "exemple"
rm file1            # Delete file1
cat file1           # Display file contents in terminal
code &              # Open VS Code; the trailing & runs it in the background so the terminal stays usable
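
As a worked example, the commands above can be combined into a short session. This is a sketch that runs inside a temporary directory created with `mktemp -d` (an addition not in the list above), so it does not touch any existing files; the names `workdir` and `listing` are illustrative.

```shell
#!/bin/bash
set -e                        # stop at the first failing command

workdir="$(mktemp -d)"        # scratch directory for the exercise
cd "$workdir"

mkdir exemple                 # create a folder
touch file1                   # create an empty file
cp file1 file2                # copy file1 to file2
mv file1 exemple/             # move file1 into the folder

listing="$(ls)"               # the folder and file2 remain at this level
echo "$listing"

# Clean up: files first, then the now-empty folders.
rm file2
rm exemple/file1
rmdir exemple
cd ..
rmdir "$workdir"
```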