Optimizing

Here is a presentation from Research Computing on terminology and optimizing workflows in the lab: Optimization_Lotterhos_labmeeting_Sekar-V2.pdf

To view your recent jobs, use sacct -S 2020-08-01 -u lotterhos, which will list all jobs since that date for user lotterhos.

To learn how much memory a job used, type seff jobID. This reports how much memory the job requested and how much it actually used.

Check job efficiency and performance:

a. The R code in part c of Workflow parallelization approaches shows an example of how to capture the run time of a particular job or a parallel segment by printing out the time difference for a particular calculation. You can also modify this to calculate the total time it took to run your code. To find the best performance, I recommend benchmarking the calculations on (for example) 1, 2, 4, 8, 16, and 32 cores and checking whether the run time levels off at some point; if it does, there may be no benefit to requesting more cores beyond that. A minimal benchmarking sketch is shown below. Research Computing can discuss performance testing; just schedule another consultation time.
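As a rough sketch of that benchmarking loop (in bash rather than R), you could time the same test workload on increasing core counts; run_analysis.sh is a hypothetical stand-in for your own program:

#!/bin/bash
# Time the same workload on increasing core counts.
# run_analysis.sh is a hypothetical script that accepts a core count.
for CORES in 1 2 4 8 16 32; do
    START=$(date +%s)
    # srun requests the cores and runs the test; add -p <partition> as needed.
    srun --ntasks=1 --cpus-per-task=$CORES ./run_analysis.sh $CORES
    END=$(date +%s)
    echo "cores=$CORES elapsed=$((END - START)) seconds"
done

If the elapsed time stops dropping between, say, 16 and 32 cores, requesting more than 16 is unlikely to help.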

b. You can also look at the efficiency and memory usage of your completed jobs from your Slurm history by typing: sacct -S 2020-08-10 -u albecker Then select a particular job and use the seff command to extract the efficiency. (It might be beneficial to have a small script that runs through all jobs, since you had over 100,000 jobs this month; a sketch is shown after the example output below, and I can help with that too if needed.) For example:

$ seff 13291706
Job ID: 13291706
Cluster: discovery
User/Group: albecker/users
State: COMPLETED (exit code 0)
Cores: 1
CPU Utilized: 00:01:58
CPU Efficiency: 99.16% of 00:01:59 core-walltime
Job Wall-clock time: 00:01:59
Memory Utilized: 284.50 MB
Memory Efficiency: 0.56% of 50.00 GB

Here, you can see that the CPU efficiency was very high at 99%, but the job used less than 1% of the memory it requested (284.50 MB out of 50.00 GB). This information is useful for allocating resources more accurately in the future.
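The small script mentioned above might look something like this sketch, which loops over your completed jobs since a given date and runs seff on each (the date and username are the examples from above):

#!/bin/bash
# Run seff on every completed job since a given date for one user.
# -X restricts sacct to whole-job allocations (no job steps).
for JOBID in $(sacct -S 2020-08-10 -u albecker --state=COMPLETED -X --noheader --format=JobID | awk '{print $1}'); do
    seff "$JOBID"
    echo "----"
done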

c. It is also recommended to run short tests before submitting long jobs to the cluster. You can use the "debug", "express", or "lotterhos" partition and run the job interactively. For example, this will give you resources on the debug partition (up to 20 minutes):

srun -p debug --nodes=1 --ntasks=16 --pty /bin/bash
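Once the interactive shell starts, you can time a quick test run before committing to a full submission; my_test_command and small_input.txt are hypothetical stand-ins for your own program and a reduced input:

# Inside the interactive session: time a short test run.
time ./my_test_command small_input.txt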

Make sure to optimize on the partition that you will actually use.

For example, the lotterhos partition runs jobs a lot faster than the short partition.

If you optimize on the lotterhos partition and then submit to the short partition, your jobs may run out of time.
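One way to compare partitions is to print each one's maximum walltime with sinfo (%P is the partition name, %l its time limit):

sinfo -p lotterhos,short -o "%P %l"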

Optimization and arrays

Here is an example header for an array script:

#SBATCH --job-name=SlimRun
#SBATCH --mail-user=k.lotterhos@northeastern.edu
#SBATCH --mail-type=FAIL
#SBATCH --partition=lotterhos
#SBATCH --mem=4G
#SBATCH --nodes=1
#SBATCH --cpus-per-task=2
#SBATCH --array=50-151%70
#SBATCH --output=/work/lotterhos/MVP-NonClinalAF/slurm_log/SlimRun20210826_%j.out
#SBATCH --error=/work/lotterhos/MVP-NonClinalAF/slurm_log/SlimRun20210826_%j.err

source ~/miniconda3/bin/activate MVP_env

The lotterhos partition has 72 cores, divided across two nodes. First, you need to know how the programs that your script will call behave. Can they use multiple cores (threads)? Or do they only use one core?
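You can confirm the node and core layout of a partition yourself; this sinfo command lists each node with its CPU count (-N prints one line per node, %N the node name, %c its CPUs):

sinfo -N -p lotterhos -o "%N %c"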

This is an example header from a submission script for an array that reads rows 50-151 from a file containing the parameters that I want to run (a sketch of the script body follows the term explanations below).

An explanation of the following terms:

#SBATCH --array=50-151%70

  • This will run 102 array tasks, one for each parameter row from 50 to 151 in the file, with at most 70 tasks running at a time.

  • If each task in the array requires 1 CPU, you could also run up to 72 jobs at a time with 50-151%72 to maximize the resources in the lotterhos partition. But it is good to leave a couple of cores open for srun tasks.

  • If each task requires 2 CPUs, set --cpus-per-task=2 and set --array=50-151%36 to maximize the resources in the lotterhos partition.

  • If programs can use multiple threads, do benchmarking to determine the CPUs per task: run 1-2 jobs at a time, test 1, 2, 4, 8, and 16 CPUs, and run seff on the tasks that completed (see below). Then set --cpus-per-task accordingly.

#SBATCH --nodes=1

  • Set this to 1 unless we know the program can communicate between the nodes.

  • If you set this to 1 and set --array=50-151%72, tasks will be spread across both nodes on lotterhos, each task using one CPU. If you instead set #SBATCH --nodes=2 with --array=50-151%72, only up to 36 tasks will run at a time, because each task sets aside 1 CPU on each of the two nodes; if the program does not communicate between nodes, one of those CPUs will sit idle.

#SBATCH --mem=4G

  • This is the amount of memory needed for each task.

  • If I set #SBATCH --array=50-151%70 and each task needs 3 GB of memory (see below), it's best to add a buffer in case one task requires more memory.

  • If your submission script doesn't specify anything, it will use the default amount (which I would have to look up). For some array tasks the default might work well, but it might not work well every time. The default memory is limited, so it's always recommended to set it explicitly, with some headroom, so you don't run out.
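Putting it together, the body of such an array script might look like this sketch; params.txt and run_sim.sh are hypothetical names standing in for the parameter file and the per-task command:

#!/bin/bash
# (preceded by the #SBATCH header shown above)
source ~/miniconda3/bin/activate MVP_env

# SLURM_ARRAY_TASK_ID takes each value in --array (50 through 151).
# Pull that row from the parameter file; params.txt is a hypothetical name.
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

# Run one simulation with those parameters; run_sim.sh is hypothetical.
./run_sim.sh $PARAMS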

Using seff on an array task, for example seff 20714872_151:

seff 20714872_151
Job ID: 20714872
Array Job ID: 20714872_151
Cluster: discovery
User/Group: lotterhos/users
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 1
CPU Utilized: 00:13:00
CPU Efficiency: 48.51% of 00:26:48 core-walltime
Job Wall-clock time: 00:13:24
Memory Utilized: 768.23 MB
Memory Efficiency: 19.21% of 3.91 GB

This report is for the specified array task (in this case, index 151 in the array).

This array task used 0.1921 x 3.91 GB ~ 0.75 GB of memory.

This task was allocated 2 cores (1 on each node), but the programs being run didn't communicate between the nodes, so it was not a well-formatted array.

It only used about half of the allocated CPU time, but the array will run each task on an individual CPU, so there is not much we can do about that.