Introduction to SLURM
Basic Concepts
SLURM uses several key concepts to manage cluster resources:
Node: A node is an individual computing unit in the cluster, which may have multiple CPUs and associated memory.
Job Queue: SLURM places submitted jobs in a queue and uses it to decide the order in which they run and when they are given cluster resources.
Partition: A partition is a logical division of the cluster that lets users specify the type of resources they need. Partitions are usually organized by associated resource type.
Why SLURM Matters
SLURM is essential for efficient use of a compute cluster. Its main features include:
Efficient Scheduling: SLURM provides a robust and flexible scheduler, allowing jobs to be submitted with specific parameters such as runtime, CPU count, and memory.
Resource Management: SLURM tracks which cluster resources are free and which are allocated, so that jobs do not compete for the same resources and nodes are used as fully as possible.
Fair and Equitable Access: SLURM helps maintain a fair execution environment, ensuring jobs run in an ordered and prioritized way through the queue system.
Learning to use SLURM is essential to achieve optimal cluster performance and ensure efficient resource distribution among research group members.
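As an illustration of the job parameters mentioned above (runtime, CPU count, memory), here is a minimal sketch of a batch script. The #SBATCH options are standard sbatch directives; the job name, the limits, and the output file name are hypothetical values chosen for the example, not settings taken from this cluster.

```shell
#!/bin/bash
# Hypothetical job name, shown in the queue listing.
#SBATCH --job-name=example_job
# Runtime limit in HH:MM:SS; the job is stopped when it is exceeded.
#SBATCH --time=00:10:00
# Number of CPU cores requested for the task.
#SBATCH --cpus-per-task=2
# Memory requested per node.
#SBATCH --mem=4G
# Output file; %j is replaced by the job ID.
#SBATCH --output=example_%j.out

# SLURM exports SLURM_CPUS_PER_TASK when --cpus-per-task is set.
# Outside SLURM the variable is unset, so fall back to 1.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on $(hostname) with ${threads} thread(s)"
```

On a standard SLURM installation a file like this is submitted with sbatch. On this cluster the login banner lists multivac [conf.slurm] for submission; whether that wrapper accepts exactly this script is an assumption.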
First Steps
When we connect to the cluster over SSH (see How to connect via SSH), the login banner lists the most frequent commands:
***********************************************************************************
* Welcome to Multivac. The calculus cluster of DOPS, a group of IOC. *
***********************************************************************************
___________________
| _ |
| / \ _____ _____ |
| | // / \ \/ / \ \ |
| |// / \/ / \/ |
| |\\ \ / /\ __ |
| |_\\_\_/_/\_\_/_/ |
|___________________|
UPDATED: 21/02/2025
Multivac most frequent commands
squeue Show the pending jobs
sinfo Show the status/information of the nodes
scua Show the pending jobs and status/information
sacct -S 2025-01-01 Get all the jobs from the user since 2025-01-01
seff [job_id] Shows the efficiency of your submitted script
multivac [conf.slurm] Send a job to be scheduled in the queue
scancel [jobid] Cancel a job by jobid
sjob [jobid] Shows the info about a jobid
mvacinteract [node] Creates a ssh with the name of the node specified.
mvac_crear_venv Creates a python3 virtual environment with docplex
mvac_jupyter Creates a jupyterlab session
More information: https://iocnet.upc.edu/doc/multivac
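The standard SLURM commands from the banner can be combined into a quick status check. The following is a sketch only: it assumes the SLURM client tools are on the PATH, and the sacct column list is just one possible choice. The Multivac-specific helpers (scua, sjob, multivac) are left out because their options are not documented here.

```shell
# Run the checks only where the SLURM client tools are installed.
if command -v squeue >/dev/null 2>&1; then
    me="${USER:-$(id -un)}"
    squeue -u "$me"      # jobs of the current user only
    sinfo                # partitions and the state of their nodes
    # Accounting records since the given date, limited to a few columns.
    sacct -S 2025-01-01 --format=JobID,JobName,State,Elapsed
    slurm_available=yes
else
    echo "SLURM client tools not found; run this on the cluster."
    slurm_available=no
fi
```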
Most Frequent Linux Commands
cd # Change directory (with no argument, go to your home directory)
cd .. # Move one folder up
cd exemple/ # Move to folder "exemple"
ls # List directory contents
ls -la # List contents including hidden files and attributes
mkdir exemple # Create a folder
rmdir exemple # Delete an empty folder (use rm -r exemple to delete a folder and its contents)
touch file1 # Create a file named file1
cp file1 file2 # Copy the contents of file1 to file2
mv file1 exemple/ # Move file1 into folder "exemple"
rm file1 # Delete file1
cat file1 # Display file contents in terminal
code & # Open VS Code; the trailing & runs it in the background so the terminal stays usable
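The commands above can be tried together in a short practice session. This sketch works in a temporary directory created with mktemp, so nothing in the home folder is touched; the file contents and the variable names workdir and copied are made up for the example.

```shell
# Work in a throwaway directory.
workdir="$(mktemp -d)"
cd "$workdir"

mkdir exemple                 # create the folder "exemple"
touch file1                   # create an empty file
echo "hello cluster" > file1  # write some text into it
cp file1 file2                # file2 now has the same contents
mv file1 exemple/             # file1 now lives inside exemple/
ls -la exemple                # list the folder, including hidden entries
copied="$(cat file2)"         # read the copy back
echo "$copied"

rm file2                      # delete the copy
rm -r exemple                 # delete the folder and the file inside it
cd ..                         # leave the working directory
rmdir "$workdir"              # it is empty now, so rmdir succeeds
```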
Useful Links
SFTP connection guide: Manual of IOC Connectivity
Downloads: Windows files