SLURM/sbatch: pass array ID as input argument of an executable

I'm new to SLURM and I'm trying to do something very natural: I have a compiled C program, off.exe, which takes one variable as input, and I want to run it several times in parallel, each with a different value of the input parameter.
I thought I could use the %a array iterator as input:
#!/bin/bash
#SBATCH --partition=regular1,regular2
#SBATCH --time=12:00:00 # walltime
#SBATCH --ntasks=1 # number of processor cores (i.e. tasks)
#SBATCH --mem-per-cpu=512M # memory per CPU core
#SBATCH --job-name="ISM" # job name
#SBATCH --array=1-60 # job array. The item identifier is %a
#SBATCH --output=Polarization_%a_v0.4.txt # output file. %A is the job ID
srun ./off.exe %a
but it's not working (it's as if the input parameter were always zero!).
Can someone help me please?

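%a, %j, etc. are replacement symbols for file names, for example in the names of the output and error files written by Slurm. They are not expanded in the commands of the script, so off.exe receives the literal string %a, which is most likely why the parameter looks like zero. For job arrays, you need to use one of Slurm's output environment variables, probably $SLURM_ARRAY_TASK_ID, e.g. srun ./off.exe $SLURM_ARRAY_TASK_ID. You can find the full list in the sbatch manpage, under OUTPUT ENVIRONMENT VARIABLES.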
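A minimal sketch of the corrected job script, assuming off.exe reads its parameter from its first command-line argument. The default index of 1 and the dry-run branch (which checks whether SLURM_JOB_ID is set) are additions so the script can be tried outside Slurm; inside a job it simply calls srun with the task's index.

```shell
#!/bin/bash
#SBATCH --partition=regular1,regular2
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=512M
#SBATCH --job-name="ISM"
#SBATCH --array=1-60
# %a is replaced by Slurm here, in the file name only
#SBATCH --output=Polarization_%a_v0.4.txt

# In the script body, read the array index from the environment instead of %a.
# The default of 1 is only there so the script can be dry-run outside Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}

if [ -n "$SLURM_JOB_ID" ]; then
    srun ./off.exe "$TASK_ID"
else
    echo "dry run: ./off.exe $TASK_ID"
fi
```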

Related

slurm - use array and limit the number of jobs running at the same time until they finish

Let's suppose I have the following bash script (bash.sh) to be run on an HPC cluster using Slurm:
#!/bin/bash
#SBATCH --job-name test
#SBATCH --ntasks 4
#SBATCH --time 00-05:00
#SBATCH --output out
#SBATCH --error err
#SBATCH --array=0-24
readarray -t VARS < file.txt
VAR=${VARS[$SLURM_ARRAY_TASK_ID]}
export VAR
bash my_script.sh
This script will run my_script.sh 25 times, each time with a different variable taken from file.txt. In other words, if I submit bash.sh with the command sbatch bash.sh, 25 jobs will be launched all at once.
Is there a way to limit the number of jobs running at the same time (e.g. 5) until all 25 are completed?
And if so, how can I do the same with 24 jobs in total (i.e. a number not divisible by 5)?
Thanks
Extract from Slurm's sbatch documentation:
-a, --array=<indexes>
... A maximum number of simultaneously running tasks from the job array may be specified using a "%" separator. For example "--array=0-15%4" will limit the number of simultaneously running tasks from this job array to 4. ...
This should limit the number of running jobs to 5 in your array:
#SBATCH --array=0-24%5
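The number after % is a ceiling on concurrently running tasks, not a batch size, so 24 tasks need no special handling: --array=0-23%5 keeps at most five running and starts the next pending task whenever one finishes. As a small sketch, array_spec is a hypothetical helper that builds the spec from the task count and the limit; an --array value given on the sbatch command line overrides the #SBATCH directive in the script.

```shell
#!/bin/bash
# Hypothetical helper: build an --array spec for N tasks (indices 0..N-1)
# with at most LIMIT tasks running at the same time.
array_spec() {
    local n=$1 limit=$2
    echo "0-$(( n - 1 ))%${limit}"
}

SPEC=$(array_spec 24 5)   # 24 tasks, at most 5 at once
echo "sbatch --array=$SPEC bash.sh"
```

With 24 lines in file.txt, indices 0 to 23 map one-to-one onto the entries read by readarray.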

start a SBATCH array with big number?

Is it possible to start an SBATCH job array, i.e. #SBATCH --array=1-5, with a big number, e.g. #SBATCH --array=12-25?
You can start at a value larger than 1, but the values must remain below the MaxArraySize value configured in slurm.conf.
Otherwise, you will get an error:
$ scontrol show config | grep -i array
MaxArraySize = 1001
$ sbatch --array 1000-1005 --wrap hostname
sbatch: error: Batch job submission failed: Invalid job array specification
Should that be the case, you can use a Bash array to hold the values and then use SLURM_ARRAY_TASK_ID as index into that array:
...
#SBATCH --array=0-5
...
VALUES=({1000..1005})
THISJOBVALUE=${VALUES[$SLURM_ARRAY_TASK_ID]}
...
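When the values are contiguous, another sketch is to keep the array indices small and add a fixed offset in the script; OFFSET is a name introduced here for illustration, and the default index of 0 only lets the snippet run outside Slurm.

```shell
#!/bin/bash
#SBATCH --array=0-5
# Hypothetical offset: task 0 handles value 1000, task 5 handles value 1005.
OFFSET=1000
# Default of 0 only so the snippet can run outside Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}
THISJOBVALUE=$(( TASK_ID + OFFSET ))
echo "task $TASK_ID -> value $THISJOBVALUE"
```

The Bash-array approach remains the more general one, since it also handles values that are not contiguous.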
Yes. Have you tried already? See man sbatch under -a, --array.

Having a job depend on an array job in SLURM

I have two job scripts to submit to SLURM, jobA.sh and jobB.sh. jobA is an array job, and I want jobB to start only once all of jobA has completed. My script for jobA.sh is:
#!/bin/bash
#SBATCH -A TRIGWMS
#SBATCH --mail-type=FAIL
# cores per task
#SBATCH -c 11
#
#SBATCH --array=%#combo#%%100
#SBATCH -J %#profile#%_%#freq#%
#
# number of nodes
#SBATCH -N 1
#
#SBATCH -t 0-2:00:00
# Standard output is saved in this file
#SBATCH -o myjob_%A_%a.out
#
# Standard error messages are saved in this file
#SBATCH -e myjob_%A_%a.err
#
# set the $OMP_NUM_THREADS variable
export OMP_NUM_THREADS=12
./myjobA_$SLURM_ARRAY_TASK_ID
This job script runs fine, but I cannot seem to get jobB to run after it has. jobB has the following script:
#!/bin/bash
#SBATCH -A TRIGWMS
#SBATCH --mail-type=FAIL
# cores per task
#SBATCH -c 11
#
# number of nodes
#SBATCH -N 1
#SBATCH --ntasks=1
#SBATCH -J MESA
#SBATCH -t 0-2:00:00
# Standard output is saved in this file
#SBATCH -o myjob_%A_%a.out
#
# Standard error messages are saved in this file
#SBATCH -e myjob_%A_%a.err
#
# set the $OMP_NUM_THREADS variable
ompthreads=$SLURM_JOB_CPUS_PER_NODE
export OMP_NUM_THREADS=$ompthreads
./myjobB
This script also works fine, but only if jobA is run first. To submit both jobs, with jobB dependent on jobA, I used the following script:
#!/bin/bash
FIRST=$(sbatch -p bigmem --mail-user=$USER@something.ac.uk jobA.sh)
echo $FIRST
SECOND=$(sbatch --dependency=afterany:$FIRST jobB.sh)
echo $SECOND
exit 0
but this only submits the first job and fails with the error 'sbatch: error: Unable to open file batch' (I originally had -p bigmem --mail-user etc. in there but took them out just to check). The issue is with the --dependency part: once I remove it, all the jobs are submitted, but I need jobB to start after jobA has finished.
You should submit your first job with the --parsable option.
FIRST=$(sbatch -p bigmem --mail-user=$USER@something.ac.uk --parsable jobA.sh)
Otherwise, the FIRST variable contains a string similar to:
Submitted batch job 123456789
So your second line looks like this after variable expansion by Bash:
SECOND=$(sbatch --dependency=afterany:Submitted batch job 123456789 jobB.sh)
So sbatch is actually trying to find a script named batch and running it with arguments job 123456789 jobB.sh. With the --parsable option, sbatch will only respond with the job id and your line should work as is.
If your cluster runs a version of Slurm that is too old, the --parsable option might not be available, in which case you can follow this advice.
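A sketch of a submission script that tolerates either output format. parse_job_id is a hypothetical helper that keeps only the job ID, whether sbatch printed "Submitted batch job N" (the default format) or "N" / "N;cluster" (with --parsable); on a Slurm without --parsable, drop the flag and the helper still works. Because afterany receives the array's job ID, jobB waits for every task of jobA; afterok would additionally require all of them to succeed.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's output.
parse_job_id() {
    local out=$1
    out=${out%%;*}          # --parsable may append ";cluster_name"
    echo "${out##* }"       # default format: keep the last word
}

# Only submit when sbatch is available (i.e. on the cluster).
if command -v sbatch >/dev/null 2>&1; then
    FIRST=$(parse_job_id "$(sbatch --parsable jobA.sh)")
    SECOND=$(parse_job_id "$(sbatch --parsable --dependency=afterany:"$FIRST" jobB.sh)")
    echo "jobA: $FIRST, jobB: $SECOND"
fi
```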

SLURM sbatch job array for the same script but with different input string arguments run in parallel

My question is similar to this one; the difference is that my arguments are not numbers but strings.
If I have a script (myscript.R) that takes two strings as arguments: "text-a", "text-A". My shell script for sbatch would be:
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 12
#SBATCH -t 120:00:00
#SBATCH --partition=main
#SBATCH --export=ALL
srun ./myscript.R "text-a" "text-A"
Now I have a few different input strings that I'd like to run with:
first <- c("text-a","text-b","text-c","text-d")
second <- c("text-A","text-B","text-C","text-D")
and I want to run myscript.R with combinations of the texts, for example:
srun ./myscript.R "text-a" "text-A"
srun ./myscript.R "text-b" "text-B"
srun ./myscript.R "text-c" "text-C"
srun ./myscript.R "text-d" "text-D"
But if I put them in the same shell script, they'll run sequentially. I only know that I can use #SBATCH -a 0-10 when the arguments are indices. If I want to submit the four runs at the same time, each with exactly the same settings (in particular, each one needs to be assigned -c 12), how can I do that?
Thanks!
You can store the list of argument values in an array and use the SLURM_ARRAY_TASK_ID environment variable to index that array.
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 12
#SBATCH -t 120:00:00
#SBATCH --partition=main
#SBATCH --export=ALL
#SBATCH --array=0-3
A=(text-{a..d}) # This is equivalent to A=(text-a text-b ... text-d)
B=(text-{A..D})
srun ./myscript.R "${A[$SLURM_ARRAY_TASK_ID]}" "${B[$SLURM_ARRAY_TASK_ID]}"
and simply submit it with sbatch.
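If the two arguments belong together, another sketch is one array entry per task holding both strings, split with read. This assumes the individual strings contain no whitespace; the default index and the dry-run branch exist only so the script can be tried outside Slurm.

```shell
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 12
#SBATCH -t 120:00:00
#SBATCH --partition=main
#SBATCH --array=0-3
# One entry per array task: "first second".
PAIRS=(
    "text-a text-A"
    "text-b text-B"
    "text-c text-C"
    "text-d text-D"
)
# Default of 0 only so the script can be dry-run outside Slurm.
ID=${SLURM_ARRAY_TASK_ID:-0}
read -r first second <<< "${PAIRS[$ID]}"

if [ -n "$SLURM_JOB_ID" ]; then
    srun ./myscript.R "$first" "$second"
else
    echo "dry run: ./myscript.R $first $second"
fi
```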

SLURM sbatch job array for the same script but with different input arguments run in parallel

I have a problem where I need to launch the same script but with different input arguments.
Say I have a script myscript.py -p <par_Val> -i <num_trial>, where I need to consider N different par_values (between x0 and x1) and M trials for each value of par_values.
Each of the M trials takes almost the full time limit of the cluster I am working on (and I don't have the privileges to change this), so in practice I need to run NxM independent jobs.
Because each batch job has the same node/CPU configuration and invokes the same Python script, changing only the input parameters, in principle, in pseudo-language, I should have an sbatch script that does something like:
#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j.out
#SBATCH --error=cv_analysis_eis-%j.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
for p1 in 0.05 0.075 0.1 0.25 0.5
do
for i in {0..150..5}
do
python myscript.py -p $p1 -v $i
done
done
where every call of the script is itself a batch job.
Looking at the sbatch documentation, the -a, --array option seems promising, but in my case I need to change the input parameters for every one of the NxM runs. How can I do this? I would rather not write NxM batch scripts and then list them in a text file, as suggested by this post. Nor does the solution proposed here seem ideal, as this is, in my opinion, a case for a job array. Moreover, I would like to make sure that all NxM scripts are launched at the same time and that the invoking script above terminates right after, so that it does not hit the time limit and get my whole job killed by the system, leaving it incomplete (whereas, since each of the NxM jobs fits within that limit, this won't happen if they run in parallel but independently).
The best approach is to use job arrays.
One option is to pass the parameter p1 when submitting the job script, so you will only have one script, but will have to submit it multiple times, once for each p1 value.
The code will be like this (untested):
#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j-%a.out
#SBATCH --error=cv_analysis_eis-%j-%a.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH -a 0-150:5
python myscript.py -p $1 -v $SLURM_ARRAY_TASK_ID
and you will submit it with:
sbatch my_jobscript.sh 0.05
sbatch my_jobscript.sh 0.075
...
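The per-value submissions can themselves be scripted. A minimal sketch, assuming the job script above is saved as my_jobscript.sh: each sbatch call creates one array of 31 tasks (0, 5, ..., 150), so the five calls give 155 tasks. Outside Slurm it only prints the commands.

```shell
#!/bin/bash
# One job array per parameter value; each array covers -v 0,5,...,150.
P1_VALUES=(0.05 0.075 0.1 0.25 0.5)
for p1 in "${P1_VALUES[@]}"; do
    if command -v sbatch >/dev/null 2>&1; then
        sbatch my_jobscript.sh "$p1"
    else
        echo "dry run: sbatch my_jobscript.sh $p1"
    fi
done
```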
Another approach is to define all the p1 parameters in a bash array and submit NxM jobs (untested)
#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j-%a.out
#SBATCH --error=cv_analysis_eis-%j-%a.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#Make the array NxM = 5 * 31 = 155 tasks (indices 0-154)
#SBATCH -a 0-154
PARRAY=(0.05 0.075 0.1 0.25 0.5)
#p1 is the element of the array found with ARRAY_ID mod P_ARRAY_LENGTH
p1=${PARRAY[$(( SLURM_ARRAY_TASK_ID % ${#PARRAY[@]} ))]}
#v is the integer division of the ARRAY_ID by the length of PARRAY, times the step of 5 (0, 5, ..., 150)
v=$(( SLURM_ARRAY_TASK_ID / ${#PARRAY[@]} * 5 ))
python myscript.py -p $p1 -v $v
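The mapping can be checked without submitting anything. decode_task is a hypothetical helper following the same mod/div idea, with v scaled by the step of 5 so that it matches the question's -v values 0, 5, ..., 150; the loop confirms that the 5 x 31 = 155 task IDs (array indices 0-154) map onto 155 distinct (p1, v) pairs.

```shell
#!/bin/bash
PARRAY=(0.05 0.075 0.1 0.25 0.5)
STEP=5      # spacing of the -v values in the question
NV=31       # number of -v values: 0, 5, ..., 150

# Hypothetical helper: map an array task ID to "p1 v".
decode_task() {
    local id=$1 n=${#PARRAY[@]}
    echo "${PARRAY[id % n]} $(( id / n * STEP ))"
}

# Check that every task ID gives a distinct (p1, v) pair.
declare -A seen
for (( id = 0; id < ${#PARRAY[@]} * NV; id++ )); do
    seen["$(decode_task "$id")"]=1
done
echo "distinct pairs: ${#seen[@]}"
```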
If you use SLURM job arrays, you could linearise the index of your two for loops, and then do a comparison of the loop index and the array task id:
#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j.out
#SBATCH --error=cv_analysis_eis-%j.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH -a 0-154
# NxM = 5 * 31 = 155 tasks, indices 0-154
p1_arr=(0.05 0.075 0.1 0.25 0.5)
# SLURM_ARRAY_TASK_ID=154 # comment in for testing
for ip1 in {0..4} # 5 steps
do
for i in {0..150..5} # 31 steps
do
let task_id=$i/5+31*$ip1
# printf $task_id"\n" # comment in for testing
if [ "$task_id" -eq "$SLURM_ARRAY_TASK_ID" ]
then
p1=${p1_arr[ip1]}
# printf "python myscript.py -p $p1 -v $i\n" # comment in for testing
python myscript.py -p $p1 -v $i
fi
done
done
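The nested loops only search for the pair whose linearised index equals the task ID; the same pair can be computed directly by inverting task_id = i/5 + 31*ip1. A sketch, assuming the same p1_arr and 31 values of i per p1 as above; the default index lets it run outside Slurm, and it prints the command rather than running it.

```shell
#!/bin/bash
p1_arr=(0.05 0.075 0.1 0.25 0.5)
# Default of 0 only so the snippet can be tried outside Slurm.
TASK_ID=${SLURM_ARRAY_TASK_ID:-0}

# Invert task_id = i/5 + 31*ip1 instead of looping over all pairs.
ip1=$(( TASK_ID / 31 ))
i=$(( (TASK_ID % 31) * 5 ))
p1=${p1_arr[ip1]}
echo "python myscript.py -p $p1 -v $i"
```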
This answer is pretty similar to Carles's. I would have preferred to write it as a comment, but I do not have enough reputation.
According to this page, job arrays incur significant overhead:
If the running time of your program is small, say ten minutes or less, creating a job array will incur a lot of overhead and you should consider packing your jobs.
That page provides a few examples to run your kind of job, using both arrays and "packed jobs."
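As a rough illustration of packing (not taken from that page), a single job can run many short units of work itself while keeping a fixed number of them going at once. work is a hypothetical stand-in for one call such as python myscript.py -p ... -v ...; the temporary results file and MAX_PARALLEL are also assumptions, and wait -n needs Bash 4.3 or newer.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Hypothetical stand-in for one short unit of work; appends one line per unit.
OUT=$(mktemp)
work() { echo "p=$1 v=$2" >> "$OUT"; }

# Run at most this many units concurrently (defaults to 4 outside Slurm).
MAX_PARALLEL=${SLURM_CPUS_PER_TASK:-4}

for p1 in 0.05 0.075 0.1 0.25 0.5; do
    for v in 0 5 10; do
        # Block until a slot frees up before starting the next unit.
        while [ "$(jobs -rp | wc -l)" -ge "$MAX_PARALLEL" ]; do
            wait -n
        done
        work "$p1" "$v" &
    done
done
wait   # let the last units finish before the job ends
echo "completed $(wc -l < "$OUT") units"
```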
If you don't want or need to specify the resources for your job, here is another approach. I'm not sure it's a use case Slurm intended, but it appears to work, and the submission script looks a little nicer since we don't have to linearize the indices to fit them into the job-array paradigm. Plus, it works well with nested loops of arbitrary depth.
Run this directly as a shell script:
#!/bin/bash
FLAGS="--ntasks=1 --cpus-per-task=1"
for i in 1 2 3 4 5; do
for j in 1 2 3 4 5; do
for k in 1 2 3 4 5; do
sbatch $FLAGS testscript.py $i $j $k
done
done
done
where you need to make sure the first line of testscript.py points to the correct interpreter with a #! (shebang) line, e.g.:
#!/usr/bin/env python
import time
import sys
time.sleep(5)
print("This is my script")
print(" ".join(sys.argv[1:4]))
Alternatively (untested), you can use the --wrap flag like this
sbatch $FLAGS --wrap="python testscript.py $i $j $k"
and you won't need the #!/usr/bin/env python line in testscript.py
