I am using Slurm scripts to run job arrays for MATLAB computing on a cluster. Each script uses an array to loop over a MATLAB parameter.
1) Is it possible to create a shell script to loop over another variable?
2) Can I pass variables to a slurm script?
For example, my slurm files currently look like
#!/bin/bash
#SBATCH --array=1-128
...
matlab -nodesktop -r "frame=[${SLURM_ARRAY_TASK_ID}]; filename=['Person24']; myfunction(frame, filename);";
I frequently need to run this array to process a number of different files. This means I will submit the job (sbatch exampleScript.slurm), edit the file, update 'Person24' to 'Person25', and then resubmit the job. This is pretty inefficient when I have a large number of files to process.
Could I make a shell script that would pass a variable to the slurm script? For example, something like this:
Shell Script (myshell.sh)
#!/bin/bash
for ((FNUM=24; FNUM<=30; FNUM+=1));
do
sbatch myscript.slurm >> SOMEHOW PASS ${FNUM} HERE (?)
done
Slurm script (myscript.slurm)
#!/bin/bash
#SBATCH --array=1-128
...
matlab -nodesktop -nodisplay -r "frame=[${SLURM_ARRAY_TASK_ID}]; filename=[${FNUM}]; myfunction(frame, filename);";
where I could efficiently submit all of the jobs using something like
sbatch myshell.sh
Thank you!
In order to avoid possible name collisions with shell and environment variables, it is a good habit to always use lowercase or mixed-case variables in your Bash scripts.
You were almost there. You just need to pass the variable as an argument to the second script and then pick it up there via the positional parameters. In this case you're only passing one argument, so $1 is fine. With a fixed number of additional parameters you could also use $2, $3, etc.; with a variable number of arguments, "$@" expands to all of them (and "$#" gives their count). Note that myshell.sh is an ordinary shell script that merely calls sbatch in a loop, so you run it directly (bash myshell.sh or ./myshell.sh), not with sbatch.
Shell Script (myshell.sh)
#!/bin/bash
for ((fnum=24; fnum<=30; fnum+=1))
do
sbatch myscript.slurm "$fnum"
done
Slurm script (myscript.slurm)
#!/bin/bash
#SBATCH --array=1-128
fnum=$1
...
matlab -nodesktop -nodisplay -r "frame=[${SLURM_ARRAY_TASK_ID}]; filename=[${fnum}]; myfunction(frame, filename);";
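For completeness, a minimal sketch of passing more than one argument the same way; the second argument here (an output directory) is purely hypothetical, just to illustrate $2:
# in myshell.sh: pass two positional arguments
sbatch myscript.slurm "$fnum" "/path/to/output"

# in myscript.slurm: pick them up by position
fnum=$1
outdir=$2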
For handling various timeout conditions this might work:
A=$(sbatch --parsable a.slurm)
case $? in
9|64|130|131|137|140)
echo "some sort of timeout occurred"
B=$(sbatch --parsable --dependency=afternotok:$A a.slurm)
;;
*)
echo "some other exit condition occurred"
;;
esac
You will just need to decide what conditions you want to handle and how you want to handle them. I have listed all the ones that seem to involve timeouts.
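If you combine --parsable with the loop from above, you can also record each submitted job ID per file for later inspection (a sketch; the log file name is arbitrary):
#!/bin/bash
# submit one array job per file and log the returned job IDs
for ((fnum=24; fnum<=30; fnum+=1))
do
    jobid=$(sbatch --parsable myscript.slurm "$fnum")
    echo "fnum=${fnum} job=${jobid}" >> submitted_jobs.log
done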
I have a C program that I want to run without having to manually type commands into it. I have 4 commands (5 if you count the one to exit the program) that I want fed to the program, and I don't know where to start. I have seen some stuff like
./a.out <<<'name'
to pass in a single string but that doesn't quite work for me.
Another issue that makes this more difficult is that one of the commands produces output, and that output needs to be part of a later command. If I had access to the source code I could just brute-force in some loops and counters, so I am trying to get hold of it, but for now I am stuck working without it. I was thinking there was a way to do this with bash scripts, but I don't know what that would be.
In simple cases, a bash script is a possibility: run the executable in a coproc (requires bash version 4). A short example:
#!/bin/bash
coproc ./parrot
echo aaa >&${COPROC[1]}
read result <&${COPROC[0]}
echo $result
echo exit >&${COPROC[1]}
with parrot (a test executable):
#!/bin/bash
while true; do
read var
if [ "$var" = "exit" ]; then exit 0; fi
echo $var
done
For more serious scenarios, use expect.
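To address the part of the question where one command's output has to become part of a later command, the reply read from the coproc can simply be reused when writing the next command (a sketch building on the parrot example above; the command strings are placeholders):
#!/bin/bash
coproc ./parrot
# send the first command and capture its output
echo "first command" >&${COPROC[1]}
read reply <&${COPROC[0]}
# feed the captured output into a later command
echo "second command using $reply" >&${COPROC[1]}
read final <&${COPROC[0]}
echo "$final"
echo exit >&${COPROC[1]}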
I have a script in unix that looks like this:
#!/bin/bash
gcc -osign sign.c
./sign < /usr/share/dict/words | sort | squash > out
Whenever I try to run this script it gives me an error saying that squash is not a valid command. squash is a shell script stored in the same directory as this script and looks like this:
#!/bin/bash
awk -f squash.awk
I have execute permissions set correctly but for some reason it doesn't run. Is there something else I have to do to make it able to run like shown? I am rather new to scripting so any help would be greatly appreciated!
As mentioned in @Biffen's comment, unless . is in your $PATH variable, you need to specify ./squash for the same reason you need to specify ./sign.
When parsing a bare word on the command line, bash checks all the directories listed in $PATH to see if said word is an executable file living inside any of them. Unless . is in $PATH, bash won't find squash.
To avoid this problem, you can tell bash not to go looking for squash by giving bash the complete path to it, namely ./squash.
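So the fixed script would look like this (note that squash itself runs awk -f squash.awk, which is also resolved relative to the current directory, so run the script from the directory containing both files):
#!/bin/bash
gcc -o sign sign.c
./sign < /usr/share/dict/words | sort | ./squash > out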
I'm not sure if this has been answered; I've looked and haven't found anything that looks like what I'm trying to do. I also posted this to Stack Exchange (https://unix.stackexchange.com/questions/189293/create-array-in-bash-with-variables-as-array-name).
I have a number of shell scripts that are capable of running against a ksh or bash shell, and they make use of arrays. I created a function named "setArray" that interrogates the running shell and determines what builtin to use to create the array - for ksh, set -A, for bash, typeset -a. However, I'm having some issues with the bash portion.
The function takes two arguments, the name of the array and the value to add. This then becomes ${ARRAY_NAME} and ${VARIABLE_VALUE}. Doing the following:
set -A $(eval echo \${ARRAY_NAME}) $(eval echo \${${ARRAY_NAME}[*]}) "${VARIABLE_VALUE}"
works perfectly in ksh. However,
typeset -a $(eval echo \${ARRAY_NAME})=( $(eval echo \${${ARRAY_NAME}[*]}) "${VARIABLE_VALUE}" )
does not. This provides
bash: syntax error near unexpected token '('
I know I can just make it a list of strings (e.g. MYARRAY="one two three") and just loop through it using the IFS, but I don't want to lose the ability to use an array either.
Any thoughts?
Given the assertion that the ksh portion of this function is working, only the bash portion needs to be created. For that, the following should work and, I believe, be safe and robust (though evidence to the contrary is welcome).
eval $ARRAY_NAME+=\(\"\$VARIABLE_VALUE\"\)
First expansion only expands $ARRAY_NAME to get
eval array+=("$VARIABLE_VALUE")
which eval then causes to be evaluated again normally.
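Putting both branches together, a sketch of what the complete setArray function might look like; detecting the shell via $BASH_VERSION is one common approach, not necessarily what the original function used:
setArray() {
    ARRAY_NAME=$1
    VARIABLE_VALUE=$2
    if [ -n "$BASH_VERSION" ]; then
        # bash: append to the named array via eval, as described above
        eval $ARRAY_NAME+=\(\"\$VARIABLE_VALUE\"\)
    else
        # ksh: rebuild the array with set -A, as in the question
        set -A $(eval echo \${ARRAY_NAME}) $(eval echo \${${ARRAY_NAME}[*]}) "${VARIABLE_VALUE}"
    fi
}

# usage
setArray MYARRAY "one"
setArray MYARRAY "two"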
I currently have an R script written to perform a population genetic simulation, which then writes a table with my results to a text file. I would like to somehow run multiple instances of this script in parallel using an array job (my university's cluster uses SGE), and when it's all done I will have generated results files corresponding to each job (Results_1.txt, Results_2.txt, etc.).
I spent the better part of the afternoon reading and trying to figure out how to do this, but haven't really found anything along the lines of what I am trying to do. I was wondering if someone could provide an example or perhaps point me in the direction of something I could read to help with this.
To boil down mithrado's answer to the bare essentials:
Create a job script, pop_gen.bash, that may or may not pass the SGE task ID as input to the R script, but stores results in a specific file identified by that same task ID:
#!/bin/bash
Rscript pop_gen.R ${SGE_TASK_ID} > Results_${SGE_TASK_ID}.txt
Submit this script as a job array, e.g. 1000 jobs:
qsub -t 1-1000 pop_gen.bash
Grid Engine will execute pop_gen.bash 1000 times, each time setting SGE_TASK_ID to a value in the range 1 to 1000.
Additionally, as mentioned above, by passing SGE_TASK_ID as a command-line argument to pop_gen.R, you can use it to construct the output file name:
args <- commandArgs(trailingOnly = TRUE)
out.file <- paste("Results_", args[1], ".txt", sep="")
# d <- "some data frame"
write.table(d, file=out.file)
HTH
I am not used to doing this in R, but I've been using the same approach in Python. Imagine that you have a script genetic_simulation.r and it has 3 parameters: --gene_id, --khmer_len, and --output_file.
You will have one CSV file, genetic_sim_parms.csv, with n rows:
first_gene,10,/result/first_gene.txt
...
nth_gene,6,/result/nth_gene.txt
An important detail is the first line of your genetic_simulation.r: it needs to tell the cluster which executable to use. You might need to tweak its path depending on your setup, but it will look like:
#!/path/to/Rscript --vanilla
And finally, you will need an array-job bash script:
#!/bin/bash
#$ -t 1-N  # change N to the number of rows in genetic_sim_parms.csv
#$ -N genetic_simulation.r
echo "Starting on : $(date)"
echo "Running on node : $(hostname)"
echo "Current directory : $(pwd)"
echo "Current job ID : $JOB_ID"
echo "Current job name : $JOB_NAME"
echo "Task index number : $SGE_TASK_ID"
ID=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $1}' genetic_sim_parms.csv)
LEN=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $2}' genetic_sim_parms.csv)
OUTPUT=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $3}' genetic_sim_parms.csv)
echo "id is: $ID"
Rscript genetic_simulation.r --gene_id $ID --khmer_len $LEN --output_file $OUTPUT
echo "Finished on : $(date)"
Hope this helps!
We know that exporting arrays in a canonical way is impossible, but I am interested in finding a workaround. I have the following scenario: a list of variables is loaded from a file into an array during startup, and I need that array to be visible to several bash scripts that may or may not be executed in the parent environment (. example.sh or just example.sh). I tried many things, but something like this seems most promising:
export j=1
export array$j=something
And then I tried to access the value using:
echo ${array[$j]} #doesn't work in child script
echo $(echo \$array$j) #displays the actual '$array1' instead of 'something'
Any suggestions?
You can look up values through indirection using parameter expansion:
j=1
array1=something
name="array$j"
echo "The value of $name is ${!name}"