Simultaneously running multiple jobs on same node using slurm - ubuntu-18.04

Most of our jobs are either (relatively) low on CPU and high on memory (data processing), or low on memory and high on CPU (simulations). The server we have (256 GB memory; 16 cores) is generally big enough to accommodate multiple jobs running at the same time, and we would like to use Slurm to schedule these jobs. However, testing on a small (4 CPU) Amazon server, I am unable to get this working. As far as I know, I have to use SelectType=select/cons_res and SelectTypeParameters=CR_CPU_Memory. However, when I start multiple jobs that each use a single CPU, they run sequentially rather than in parallel.
My slurm.conf
ControlMachine=ip-172-31-37-52
MpiDefault=none
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
# COMPUTE NODES
NodeName=ip-172-31-37-52 CPUs=4 RealMemory=7860 CoresPerSocket=2 ThreadsPerCore=2 State=UNKNOWN
PartitionName=test Nodes=ip-172-31-37-52 Default=YES MaxTime=INFINITE State=UP
job.sh
#!/bin/bash
sleep 30
env
Output when running jobs:
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 2
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 3
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 4
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 5
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 6
ubuntu@ip-172-31-37-52:~$ sbatch -n1 -N1 job.sh
Submitted batch job 7
ubuntu@ip-172-31-37-52:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3 test job.sh ubuntu PD 0:00 1 (Resources)
4 test job.sh ubuntu PD 0:00 1 (Priority)
5 test job.sh ubuntu PD 0:00 1 (Priority)
6 test job.sh ubuntu PD 0:00 1 (Priority)
7 test job.sh ubuntu PD 0:00 1 (Priority)
2 test job.sh ubuntu R 0:03 1 ip-172-31-37-52
The jobs are run sequentially, while in principle it should be possible to run 4 jobs in parallel.

You do not specify memory in your submission files, and you do not specify a default value for memory (DefMemPerNode or DefMemPerCPU). In that case, Slurm allocates the full memory of the node to each job, and is therefore unable to place multiple jobs on one node.
Try specifying the memory:
sbatch -n1 -N1 --mem-per-cpu=1G job.sh
You can check the resources consumed on a node with scontrol show node (look for the AllocTRES value).
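Alternatively, you can set a cluster-wide default so that jobs which do not request memory still get a bounded share. A minimal sketch, assuming the example value 1900 MB per CPU (roughly the node's 7860 MB RealMemory divided over 4 CPUs; pick a value that fits your node):
# in slurm.conf, next to the SelectType settings
DefMemPerCPU=1900
Or request the memory inside the batch script itself:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
sleep 30
env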

Related

Job distribution between nodes on HPC, instead of 1 CPU core

I am using PBS on an HPC cluster to submit serially written C codes. I have to run, say, 5 codes in 5 different directories. When I select 1 node and 5 cores (select=1:ncpus=5) and submit with ./submit &, it forks and runs all 5 jobs. The moment I choose 5 nodes and 1 core (select=5:ncpus=1) and submit with ./submit &, only 1 core of the first node runs all five jobs while the other 4 stay idle, and the speed drops to 1/5.
My question is: is it possible to fork the job between the nodes as well? Because when I request select=1:ncpus=24 on the HPC it sits in the queue, whereas select=4:ncpus=6 runs.
Thanks.
You should consider using job arrays (option #PBS -t 1-5) with 1 node and 1 CPU each. Then 5 independent jobs will start, and your jobs will wait less in the queue.
Within your script you can use the environment variable PBS_ARRAYID to identify the task and use it to select the appropriate directory and start the appropriate C code. Something like this:
#!/bin/bash -l
#PBS -N yourjobname
#PBS -q yourqueue
#PBS -l nodes=1:ppn=1
#PBS -t 1-5
# run the executable built for this array index (a .c source file is not directly executable)
./myprog-${PBS_ARRAYID}
This script will start 5 jobs, each running the corresponding executable myprog-* (compiled from myprog-*.c), where * is a number between 1 and 5.
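If each run lives in its own directory, the array index can also be used to pick the working directory. A minimal sketch, assuming hypothetical directories run1 ... run5, each containing an executable called myprog:
#!/bin/bash -l
#PBS -N yourjobname
#PBS -q yourqueue
#PBS -l nodes=1:ppn=1
#PBS -t 1-5
# PBS_O_WORKDIR is the directory from which qsub was called
cd "$PBS_O_WORKDIR/run${PBS_ARRAYID}"
./myprog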

SLURM: run jobs in parallel instead of as array?

I have a large file to analyze using "jellyfish query", which is not multithreaded. I have split the big file into 29 manageable fragments, to run as an array on SLURM. However, these are sitting in the workload queue for ages, whereas if I could request a whole node (32 cpus) they would get in a separate queue with quicker availability. Is there a way to tell SLURM to run the command on these fragments in parallel across all the cpus in a node, instead of as a serial array?
You could ask for 29 tasks with 1 CPU per task (you may get anything from 29 CPUs on one node to 1 CPU on each of 29 different nodes), and in the Slurm script you should launch each chunk with srun, telling srun to use one task/CPU per chunk.
.
.
.
#SBATCH --ntasks=29
#SBATCH --cpus-per-task=1
.
.
.
for n in {1..29}
do
srun -n 1 <your_script> $n &
done
wait
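Depending on the Slurm version, each job step may also need to be told not to claim the whole allocation; a commonly used variant is sketched below (the exact flag semantics, --exclusive versus the newer --exact, depend on your Slurm release, so treat this as a sketch):
for n in {1..29}
do
    # run each chunk as its own job step, restricted to a single task/CPU
    srun --exclusive -n 1 -c 1 <your_script> $n &
done
wait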
I suggest writing a Python script to parallelize this for you, then submitting a SLURM job that runs the Python script.
from multiprocessing import Pool
import subprocess

num_threads = 29

def sample_function(input_file):
    # capture_output=True is needed, otherwise .stdout is None
    return subprocess.run(["cat", input_file], check=True, capture_output=True).stdout

input_file_list = ['one', 'two', 'three']

pool = Pool(processes=num_threads)
[pool.apply_async(sample_function, args=(input_file,)) for input_file in input_file_list]
pool.close()
pool.join()
This assumes you have files "one", "two", and "three". Obviously you need to replace the input file list and the command you want to run with subprocess.
Thanks for the suggestions! I found a much less elegant but still functional way:
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
jellyfish query...fragment 1 &
jellyfish query...fragment 2 &
...
jellyfish query...fragment 29
wait
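The same idea can be written as a loop so the 29 command lines do not have to be listed by hand (the fragment naming scheme used here is an assumption):
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
# assumed naming: fragment_1 ... fragment_29
for n in {1..29}
do
    jellyfish query ... fragment_$n &
done
wait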

How to run multiple Gatling simulations in a sequence (sequence will be provided by us)

When I run Gatling from my command prompt I get a list of simulations like this:
Choose a simulation number: 1,2,3,4
When I type 3, the third simulation runs, but this sequence is auto-generated. Suppose I want to list them in my own order, like:
3,2,1,4
Is it possible to give a user-defined sequence for the simulations list? If yes, how?
As far as I know there is no built-in way in Gatling to provide a sequence of simulations. You can achieve this by writing, for example, a bash script. For running Gatling tests via Maven it could look like this:
#!/bin/bash
#params
SIMULATION_CLASSES=
#usage
function usage (){
    echo "usage: $0 options"
    echo "This script runs Gatling load tests"
    echo ""
    echo "OPTIONS:"
    echo "Run options:"
    echo "  -s [*] Simulation classes (comma separated)"
}
#INIT PARAMS
while getopts "s:" OPTION
do
    case $OPTION in
        s) SIMULATION_CLASSES=$OPTARG;;
        ?) usage
           exit 1;;
    esac
done
#checks
if [[ -z $SIMULATION_CLASSES ]]; then
    usage
    exit 1
fi
#run scenarios
SIMULATION_CLASSES_ARRAY=($(echo $SIMULATION_CLASSES | tr "," "\n"))
for SIMULATION_CLASS in "${SIMULATION_CLASSES_ARRAY[@]}"
do
    echo "Run scenario for $SIMULATION_CLASS"
    mvn gatling:execute -Dgatling.simulationClass=$SIMULATION_CLASS
done
And sample usage
./campaign.sh -s package.ScenarioClass1,package.ScenarioClass2
If you use the Gatling SBT Plugin (demo project here), you can do, in Bash:
sbt "gatling:testOnly sims.ReadProd02Simulation" "gatling:testOnly sims.ReadProd02Simulation
This first runs only the sceenario ReadProd02Simulation, and then runs ReadProd03Simulation. No Bash script needed.
The output will be first the output from ReadProd02Simulation and then ReadProd03Simulation, like so:
08:01:57 46 ~/dev/ed/gatling-sbt-plugin-demo[master*]$ sbt "gatling:testOnly sims.ReadProd02Simulation" "gatling:testOnly sims.ReadProd03Simulation"
[info] Loading project definition from /home/.../gatling-sbt-plugin-demo/project
[info] Set current project to gatling-sbt-plugin-demo...
Simulation sims.ReadProd02Simulation started...
...
Simulation sims.ReadProd02Simulation completed in 16 seconds
Parsing log file(s)...
Parsing log file(s) done
Generating reports...
======================================================================
- Global Information ----------------------------------------------
> request count 3 (OK=3 KO=0 )
...
...
Reports generated in 0s.
Please open the following file: /home/.../gatling-sbt-plugin-demo/target/gatling/readprod02simulation-1491631335723/index.html
[info] Simulation ReadProd02Simulation successful.
[info] Simulation(s) execution ended.
[success] Total time: 19 s, completed Apr 8, 2017 8:02:33 AM
08:02:36.911 [INFO ] i.g.h.a.HttpEngine - Start warm up
08:02:37.240 [INFO ] i.g.h.a.HttpEngine - Warm up done
Simulation sims.ReadProd03Simulation started...
...
Simulation sims.ReadProd03Simulation completed in 4 seconds
Parsing log file(s)...
Parsing log file(s) done
Generating reports...
======================================================================
---- Global Information ----------------------------------------------
> request count 3 (OK=3 KO=0 )
......
Reports generated in 0s.
Please open the following file: /home/.../gatling-sbt-plugin-demo/target/gatling/readprod03simulation-1491631356198/index.html
[info] Simulation ReadProd03Simulation successful.
[info] Simulation(s) execution ended.
[success] Total time: 9 s, completed Apr 8, 2017 8:02:42 AM
That is, first it runs one sim, then another, and concatenates all output.
But how do you make use of this? You could use Bash to grep the output for exactly two lines matching failed 0 ( 0%) (if you run two simulations), and also check the total request counts for both simulations, again via Bash + grep.
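A minimal sketch of that check, assuming the sbt output was redirected to a file called gatling-run.log (the file name is an assumption):
# count global-stats lines reporting zero failures; spacing in Gatling's
# output can vary, hence the flexible pattern
OK_COUNT=$(grep -Ec 'failed +0 +\( *0%\)' gatling-run.log)
if [ "$OK_COUNT" -ne 2 ]; then
    echo "At least one simulation reported failed requests" >&2
    exit 1
fi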

How to run an array job within a pipeline of several held jobs when the number of subjobs in the array depends on the result of a previous job

I am trying to write a bash script that sends several jobs to the cluster (SGE scheduler), where each job waits for the previous one to end, such as:
HOLD_ID=$(qsub JOB1.sh | cut -c 10-16)
HOLD_ID=$(qsub -hold_jid $HOLD_ID JOB2.sh | cut -c 10-16)
HOLD_ID=$(qsub -hold_jid $HOLD_ID JOB3.sh | cut -c 10-16)
This works perfectly. However, now I want to add a held array job to this pipeline, such as:
qsub -hold_jid $HOLD_ID -t 1-$NB_OF_SUBJOBS JOB4.sh
But here the number of sub-jobs ($NB_OF_SUBJOBS) depends on the result of JOB2.sh.
I want this to be a fast master script that just sends all the jobs. I would not like to use a while + sleep loop or something like that, which was my first attempt. The job that determines the number I need (JOB2.sh) takes a relatively long time. Since the last line is evaluated when it is submitted, any variable or file with the number of sub-jobs created later by JOB2.sh will not work. Any ideas?
Many thanks,
David
So, if I understand, the submission of job 4 is predicated on obtaining information from the completion of job 2. If this is the case, it is clear that you will need to submit job 4 after job 2 completes, which is separate from submitting job 4 and having execution hold on completion of job 2.
Why not use the -sync y option on job 2, so that the submission of job 4 only occurs after job 2 completes:
qsub -sync y -hold_jid $HOLD_ID JOB2.sh
Make sure job 2 writes the number of sub-jobs somewhere, such as a file (n_subjobs.txt in the example below), or parse its output into a variable as you did for the job ID. Then read this information when submitting job 4:
qsub -t 1-$(cat n_subjobs.txt) JOB4.sh
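Putting it together, the whole pipeline could look like the sketch below (the file name n_subjobs.txt and the cut-based ID parsing are carried over from the question; adjust them to your setup):
#!/bin/bash
HOLD_ID=$(qsub JOB1.sh | cut -c 10-16)
# -sync y blocks here until JOB2.sh has finished, so n_subjobs.txt exists afterwards
qsub -sync y -hold_jid $HOLD_ID JOB2.sh
# JOB2.sh is already done at this point, so JOB3.sh only needs a hold if further ordering is required
HOLD_ID=$(qsub JOB3.sh | cut -c 10-16)
qsub -hold_jid $HOLD_ID -t 1-$(cat n_subjobs.txt) JOB4.sh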

How to run a C program using multiple cores (i7 machine) in shell scripting

I have a C program, fextract, which takes a wav file as input and produces output in an fcc format. The syntax goes like this: 'fextract file.wav file.fcc'. Now I have 75,000 wav files which need to be converted into fcc format. To speed up the procedure I am planning to use all the cores of my i7 machine. First I saved all the input and output paths in a file, which I call the scp file,
e.g.: /mnt/disk1/file1.wav /mnt/disk2/file1.fcc
/mnt/disk1/file2.wav /mnt/disk2/file2.fcc
and so on.
Now, using the following shell script, I have divided the scp file into 8 files and stored them in a temp directory:
mkdir $tmpDir
cd $tmpDir
nCores=`cat /proc/cpuinfo | grep processor | wc -l`
nLines=`cat $scpFile|wc -l`
split -l $((nLines/nCores + 1)) $scpFile
Now my temp directory has eight sub-files. How can I use them to run the program fextract on multiple cores?
for i in `ls`
do
fextract &i
done
I need something of this kind. Please help me solve this; it's urgent.
Use GNU Parallel:
parallel -j $nCores fextract -- `ls`
Or you could use xargs with the -P option (useful in combination with find).
These commands launch multiple instances of your program in parallel, which allows them to be executed on multiple cores.
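A sketch of the xargs variant, reading the original two-column scp file directly (the variable $scpFile comes from the question's script; the process count of 8 is an assumption matching the split above):
# each line of the scp file holds "input.wav output.fcc";
# -n 2 hands fextract one pair at a time, -P 8 runs up to 8 processes at once
xargs -n 2 -P 8 fextract < "$scpFile"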
Using GNU Parallel:
cat filenames | parallel fextract {} {.}.fcc
As some time is spent on disk I/O, it may be faster to run a little more than one job per CPU core:
cat filenames | parallel -j150% fextract {} {.}.fcc
If you just want all files in the current dir:
parallel -j150% fextract {} {.}.fcc ::: *.wav
If you want to give both input and output filenames on a single line separated by a space, you can use:
cat filenames_2_per_line | parallel --colsep ' ' -j150% fextract {1} {2}
If the filenames are not on the same line but on consecutive lines, you need to read 2 lines at a time:
cat filenames_interleaved | parallel -N2 -j150% fextract {1} {2}
Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
