Randomly generating invoice IDs - moving text database into script file?

I've come up with the following bash script to randomly generate invoice numbers, preventing duplications by logging all generated numbers to a text file "database".
To my surprise the script actually works, and it seems robust (although I'd be glad to have any flaws pointed out to me at this early stage rather than later on).
What I'm now wondering is whether it's at all possible to move the "database" of generated numbers into the script file itself. This would allow me to rely on and keep track of just the one file rather than two separate ones.
Is this at all possible, and if so, how? If it isn't a good idea, what valid reasons are there not to do so?
#!/usr/bin/env bash

generate_num() {
    #num=$(head /dev/urandom | tr -dc '[:digit:]' | cut -c 1-5) [Original method, no longer used]
    num=$(shuf -i 10000-99999 -n 1)
}

read -p "Are you sure you want to generate a new invoice ID? [Y/n] " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]
then
    generate_num && echo Generating a random invoice ID and checking it against the database...
    sleep 2
    while grep -xq "$num" "ID_database"
    do
        echo Invoice ID \#$num already exists in the database...
        sleep 2
        generate_num && echo Generating new random invoice ID and checking against database...
        sleep 2
    done
    while [[ ${#num} -gt 5 ]]
    do
        echo Invoice ID \#$num is more than 5 digits...
        sleep 2
        generate_num && echo Generating new random invoice ID and checking against database...
        sleep 2
    done
    echo Generated random invoice ID \#$num
    sleep 1
    echo Invoice ID \#$num does not exist in database...
    sleep 2
    echo $num >> "ID_database" && echo Successfully added Invoice ID \#$num to the database.
else
    echo "Exiting..."
fi

I do not recommend this because:
These things are fragile. One bad edit and your invoice database is corrupt.
It makes version control a pain. Each new version of the script should preferably be checked in. You could add logic to make sure that "$mydir" is an empty directory when you run the script (except for "$myname", .git and other git-related files), then run git -C "$mydir" init if "$mydir"/.git doesn't exist. Then, for each database update, run git -C "$mydir" add "$myname" and git -C "$mydir" commit -m "$num". It's just an idea to explore...
Locking - It's possible to do file locking to make sure that no two users run the script at the same time, but it adds to the complexity so I didn't bother. If you feel that's a risk, you need to add that (a sketch follows this list).
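For the locking point, a minimal sketch using flock(1) (from util-linux, so not POSIX); the lock file name here is just an example, kept next to the ID database, and it assumes everyone who runs the script can write there:
#!/usr/bin/env bash
# Take an exclusive lock so no two users update the database at the same time.
lockfile=./ID_database.lock        # example path, kept next to the ID database
exec 9>"$lockfile"                 # open file descriptor 9 on the lock file
if ! flock -n 9; then              # non-blocking: fail instead of waiting
    echo "Another instance is already running, try again later." >&2
    exit 1
fi
# ... generate and record the invoice ID here; the lock is released on exit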
... but you want a self-modifying script, so here goes.
This just adds a new invoice number to its internal database each time you run it. I've explained what goes on in the comments. The last line should read __INVOICES__ (+ a newline) if you copy the script.
As always when dealing with things like this, remember to make a backup before making changes :-)
As it's currently written, you can only add one invoice per run. It shouldn't be hard to move things around (you need a new tempfile) to get it to add more than one if you need that.
#!/bin/bash

set -e # exit on error - important for this type of script

#------------------------------------------------------------------------------

myname="$0"
mydir=$(dirname "$myname")

if [[ ! -w $myname ]]; then
    echo "ERROR: You don't have permission to update $myname" >&2
    exit 1
fi

# create a tempfile to be able to update the database in the file later
#
# set -e makes the script end if this fails:
temp=$(mktemp -p "$mydir")
trap '{ rm -f "$temp"; }' EXIT # remove the tempfile if we die for some reason

# read current database from the file
readarray -t ID_database <<< "$(sed '0,/^__INVOICES__$/d' "$0")"
#declare -p ID_database >&2 # debug

#------------------------------------------------------------------------------

# a function to check if a number is already in the db
is_it_taken() {
    local num=$1

    # exit status 1 ("taken") if the number is in the array, 0 ("free") otherwise
    [[ ! " ${ID_database[@]} " =~ " ${num} " ]]
}

generate_num() {
    local num

    (exit 1) # set $? to 1

    # loop until $? becomes 0
    while (( $? )); do
        num=$(shuf -i 10000-99999 -n 1)
        is_it_taken "$num"
    done

    # we found a free number
    echo "$num"
}

add_to_db() {
    local num=$1

    # add to db in memory
    ID_database+=("$num")

    # add to db in file:
    # copy the script to the tempfile
    cp -pf "$myname" "$temp"
    # add the new number
    echo "$num" >> "$temp"
    # move the tempfile into place
    mv "$temp" "$myname"
}

#------------------------------------------------------------------------------

num=$(generate_num)
add_to_db "$num"

# your business logic goes here:
echo "All current invoices:"
for invoice in "${ID_database[@]}"
do
    echo ">$invoice<"
done

#------------------------------------------------------------------------------

# leave the rest untouched:
exit
__INVOICES__
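After a couple of runs, the tail of the file would then look something like this (the numbers are just examples):
exit
__INVOICES__
48213
90177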

To answer the question you asked -
Make sure your file ends with an explicit exit statement.
Without some sort of branching it won't execute past that, so unless there is a gross parsing error, anything below it can be used as storage space. Just
echo $num >> $0
If you write your records directly onto the bottom of the script, the script grows, but relatively harmlessly. Just make sure your grep pattern doesn't match any lines of code; something like grep -E '^[0-9]{5}$' is pretty safe.
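Putting those pieces together, a minimal sketch of the self-appending variant (the layout is just an illustration, not the full script from the question):
#!/usr/bin/env bash
# Minimal sketch: IDs are stored as bare 5-digit lines after the final 'exit'.
num=$(shuf -i 10000-99999 -n 1)
# Re-roll while the candidate already exists as a whole line in this file;
# the ^...$ anchors keep the match away from lines of code.
while grep -Eq "^${num}$" "$0"; do
    num=$(shuf -i 10000-99999 -n 1)
done
echo "New invoice ID: $num"
echo "$num" >> "$0"   # append the record below the exit line
exit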
This is only ever going to give you a maximum of ~90k IDs, and it spends unneeded time and cycles on redundancy checking. Is there a limit on the length of the value?
If you can assure there won't be more than one invoice processed per second,
date +%s >> "ID_database" # the UNIX epoch, seconds since 00:00:00 01/01/1970
If you need more precision than that,
date +%Y%m%d%H%M%S%N
will output year, month, day, hour, minute, second and nanoseconds, which is both immediate and "pretty safe".
date +%s%N # epoch with nanoseconds
is shorter, but doesn't have the convenient side effect of automatically giving you the date and time of invoice creation.
If you absolutely need to guarantee uniqueness and nanoseconds isn't good enough, use a lock of some sort, and maybe a more fine-grained language.
On the other hand, if minutes are unique enough, you could use
date +%y%m%d%H%M
You get the idea.
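As a concrete example, swapping a timestamp into the question's generate_num function could look like this (a sketch; pick whichever granularity fits):
generate_num() {
    # Timestamp-based ID: year month day hour minute second nanoseconds.
    # Unique as long as no two invoices are created in the same nanosecond,
    # and it doubles as the creation time of the invoice.
    num=$(date +%Y%m%d%H%M%S%N)
}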

Related

Bash manipulate and sort file content with arrays via loop

Purpose
Create a bash script that loops through certain commands and saves the output of each command (they print only numbers) into a file (I guess the best way is to save them in a file?) with their dates (Unix time) next to each output, so we can use these stored values the next time we run the script and check whether any of the command outputs have changed within the last hour.
Example output
# ./script
command1 123123
command2 123123
Important notes
There are around 200 commands which the script will loop through.
There'll be new commands in the future, so the script will have to check whether each command exists in the saved file. If it is already present, only compare it within the last hour to see if the number has changed since the file was last saved. If it doesn't exist, save it into the file so we can use it for comparison next time.
The order of the commands the script runs might differ as commands are added, removed or changed. So if it's only like this for now:
# ./script
command1 123123
command2 123123
and you add a 3rd command in the future, the order might change (it is also not certain what kind of pattern it's following), for example;
# ./script
command1 123123
command3 123123
command2 123123
so we can't, for example, read it line by line and in this case, I believe the best way is to compare them with the command* names.
Structure for stored values
My presumed structure for stored values is like this (we don't have to stick with this one, though):
command1 123123 unixtime
command2 123123 unixtime
About the said commands
The things I called commands are basically applications living in /usr/local/bin/ that can be run directly by name in the shell, like command1 getnumber, and they will print the number.
Since the commands are located in /usr/local/bin/ and follow a similar pattern, I'm first looping through /usr/local/bin/ for command*. See below.
commands=`find /usr/local/bin/ -name 'command*'`
for i in $commands; do
echo "$i" "`$i getnumber`"
done
so this will loop through all files that start with command and run command* getnumber for each one, which will print out the numbers we need.
Now we need to store these values in a file to compare them next time we run the command.
Catch:
We may even run the script every few minutes, but we only need to report if the values (numbers) haven't changed in the last hour.
The script will list the numbers every time you run it, and we may add styling to those that haven't changed in the last hour to make them stand out, maybe by colouring them red?
Attempt number #1
So this is my first attempt building this script. Here's what it looks like;
#!/bin/bash
commands=`find /usr/local/bin/ -name 'command*'`
date=`date +%s`
while read -r command number unixtime; do
for i in $commands; do
current_block_count=`$i getnumber`
if [[ $command = $i ]]; then
echo "$i exists in the file, checking the number changes within last hour" # just for debugging, will be removed in production
if (( ($date-$unixtime)/60000 > 60 )); then
if (( $number >= $current_number_count )); then
echo "There isn't a change within the last hour, this is a problem!" # just for debugging, will be removed in production
echo -e "$i" "`$i getnumber`" "/" "$number" "\e[31m< No change within last hour."
else
echo "$i" "`$i getnumber`"
echo "There's a change within the last hour, we're good." # just for debugging, will be removed in production
# find the line number of $i so we can change it with the new output
line_number=`grep -Fn '$i' outputs.log`
new_output=`$i getnumber`
sed -i "$line_numbers/.*/$new_output/" outputs.log
fi
else
echo "$i" "`$i getnumber`"
# find the line number of $i so we can change it with the new output
line_number=`grep -Fn '$i' outputs.log`
output_check="$i getnumber; date +%s"
new_output=`eval ${output_check}`
sed -i "$line_numbers/.*/$new_output/" outputs.log
fi
else
echo "$i does not exists in the file, adding it now" # just for debugging, will be removed in production
echo "$i" "`$i getnumber`" "`date +%s`" >> outputs.log
fi
done
done < outputs.log
Which was quite the disaster; eventually, it did nothing when I ran it.
Attempt number #2
This time, I've tried another approach nesting for loop outside of the while loop.
#!/bin/bash
commands=`find /usr/local/bin/ -name 'command*'`
date=`date +%s`
for i in $commands; do
echo "${i}" "`$i getnumber`"
name=${i}
number=`$i getnumber`
unixtime=$date
echo "$name" "$number" "$unixtime" # just for debugging, will be removed in production
while read -r command number unixtime; do
if ! [ -z ${name+x} ]; then
echo "$name" "$number" "$unix" >> outputs.log
else
if [[ $name = $i ]]; then
if (( ($date-$unixtime)/60000 > 60 )); then
if (( $number >= $current_number_count )); then
echo "There isn't a change within the last hour, this is a problem!" # just for debugging, will be removed in production
echo -e "$i" "`$i getnumber`" "/" "$number" "\e[31m< No change within last hour."
else
echo "$i" "`$i getnumber`"
echo "There's a change within the last hour, we're good." # just for debugging, will be removed in production
# find the line number of $i so we can change it with the new output
line_number=`grep -Fn '$i' outputs.log`
new_output=`$i getnumber`
sed -i "$line_numbers/.*/$new_output/" outputs.log
fi
else
echo "$i" "`$i getnumber`"
# find the line number of $i so we can change it with the new output
line_number=`grep -Fn '$i' outputs.log`
output_check="$i getnumber; date +%s"
new_output=`eval ${output_check}`
sed -i "$line_numbers/.*/$new_output/" outputs.log
fi
else
echo "$i does not exists in the file, adding it now" # just for debugging, will be removed in production
echo "$i" "`$i getnumber`" "`date +%s`" >> outputs.log
fi
fi
done < outputs.log
done
Unfortunately, no luck for me, again.
Can someone give me a helping hand?
Additional notes #2
So basically: you run the script the first time, outputs.log is empty, so you write the outputs of the commands into outputs.log.
Then 10 minutes pass and you run the script again; since only 10 minutes have passed and not more than an hour, the script won't check whether the numbers have changed. It won't touch the stored values, but it will still display the output of every command on each run (their present outputs, not the stored values).
In this 10-minute timeframe new commands might have been added, so the script checks on every run whether each command's output is already stored, just to deal with new commands.
Now let's say 1.2 hours have passed and you decide to run the script again; this time the script will check whether the numbers have changed after more than an hour and report: Hey! More than an hour has passed and those numbers still haven't changed; there might be a problem!
Simple explanation
You have 100 commands to run; your script will loop through each of them and do the following for each:
Run the script whenever you want
On each run, check if outputs.log contains the command
If outputs.log contains the command, check its last stored date ($unixtime).
If the last stored date is more than an hour old, compare the number from the current run against the stored value.
If the number hasn't changed for more than an hour, show that command in red text.
If the number has changed, run the command as usual without any warning.
If the last stored date is less than an hour old, run the command as usual.
If outputs.log doesn't contain the command, simply store them in the file so it can be used for next runs to check.
The following uses a sqlite database to store results, instead of a flat file, which makes querying the history of previous runs easy:
#!/bin/sh
database=tracker.db
if [ ! -e "$database" ]; then
sqlite3 -batch "$database" <<EOF
CREATE TABLE IF NOT EXISTS outputs(command TEXT
, output INTEGER
, ts INTEGER NOT NULL DEFAULT (strftime('%s', 'now')));
CREATE INDEX IF NOT EXISTS outputs_idx ON outputs(command, ts);
EOF
fi
for cmd in /usr/local/bin/command*; do
f=$(basename "$cmd")
o=$("$cmd")
echo "$f $o"
sqlite3 -batch "$database" <<EOF
INSERT INTO outputs(command, output) VALUES ('$f', $o);
SELECT command || ' has unchanged output!'
FROM outputs
WHERE command = '$f' AND ts >= strftime('%s', 'now', '-1 hour')
GROUP BY command
HAVING count(DISTINCT output) = 1 AND count(*) > 1;
EOF
done
It lists commands that have had every run in the last hour produce the same output (and skips commands that have only run once). If instead you're interested in cases where the most recent output of each command is the same as the previous run in that hour timeframe, replace the sqlite3 invocation in the loop with:
sqlite3 -batch $database <<EOF
INSERT INTO outputs(command, output) VALUES ('$f', $o);
WITH prevs AS
(SELECT command
, output
, row_number() OVER w AS rn
, lead(output, 1) OVER w AS prev
FROM outputs
WHERE command = '$f' AND ts >= strftime('%s', 'now', '-1 hour')
WINDOW w AS (ORDER BY ts DESC))
SELECT command || ' has unchanged output!'
FROM prevs
WHERE output = prev AND rn = 1;
EOF
(This requires the sqlite3 shell from release 3.25 or newer because it uses features introduced then.)
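Since the history lives in SQLite, ad-hoc queries against it are easy too; for example, to see the last few recorded outputs of one command (command1 here is just a placeholder name):
sqlite3 -batch tracker.db "
  SELECT command, output, datetime(ts, 'unixepoch') AS recorded_at
  FROM outputs
  WHERE command = 'command1'
  ORDER BY ts DESC
  LIMIT 10;"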

qsub array job submission

I am currently trying to run an array job on the "big-computer" at my Uni.
I'm new to Unix and bash and I've been having a hard time getting this to work.
The folder setup is as follows:
model1
- model1.inp
- model1.num
model2
- model2.inp
- model2.num
startup.sh
runAModel.sh
modelArray.sh
Due to restrictions on how long I can run a single job, I was asked to break up my simulations. So I need to run each model 5 times over, each time the model reads the input file .inp and outputs another input file for the subsequent run.
The code below used to work until a week ago or so, but it doesn't seem to function anymore. I wonder if I messed something up in there.
I suspected the problem might be in the line qcmd="qsub -N $modelName -t 1:5 ../../modelArray.sh" of runAModel.sh and that I should replace 1:5 with 1-5, but that didn't seem to work.
I use qstat to see my jobs, and where I would expect to see a list of 5 queued jobs I only see one.
I was given three files to run:
startup.sh :
find . -mindepth 2 -type d -exec ./runAModel.sh {} \;
runAModel.sh :
#!/bin/bash
echo starting model in $1
cd $1 # go into the model directory
modelName=$(basename $PWD)
for f in *
do
dos2unix $f
done
qcmd="qsub -N $modelName -t 1:5 ../../modelArray.sh"
qq=`$qcmd` # runs a qsub command
# extract the job number
qt=`echo $qq | awk '{print $3}'`
jobid=${qt%%.*}
qrls $jobid.1
and modelArray.sh :
#!/bin/bash
# run program, invoke in model directory with input files.
# we want to run in the current working directory
#$ -cwd
# we want to run mpi with 4 cores on he same node:
#$ -pe sharedmem 4
# make a generous guess at the time we need
#$ -l h_rt=30:00:00
# force reservation
#$ -R y
# use 4G per process
#$ -l h_vmem=4G
# hold the array
#$ -h
echo I am task $SGE_TASK_ID in $JOB_ID with $SGE_TASK_LAST tasks in total
echo on $HOSTNAME
date
# run our model - set modules, then get the model name
echo "set modules"
. /etc/profile.d/modules.sh
PROGRAMBUILD=/exports/programlocation
. $PROGRAMBUILD/loadModules.sh
modelName=$(basename $PWD)
echo mpirun -np 4 $PROGRAMBUILD/bin/program $modelName
mpirun -np 4 $PROGRAMBUILD/bin/program $modelName
if [ $SGE_TASK_ID == $SGE_TASK_LAST ]
then
    echo I am last task
else
    # release the next task....
    # next task in this array:
    next=$((SGE_TASK_ID+1))
    echo insert a test that this task in the array job was successful
    echo if so, release next task
    echo releasing $next
    ssh login01.***.uk qrls $JOB_ID.$next
    if [[ "$?" -ne 0 ]]; then
        echo failed to qrls $pid
    fi
fi

How do I use a shell script to look through directories and alter files?

I've crafted a script that does 90% of what I'm looking to do. It goes into a directory (based on the entered date) and changes the files I feed into the array. However, I want to alter this script to also contain an array of dates (which are the directory names). It should cycle through the directories and, when it finds one of the files from the file-name array, correct it and move on, until all the files have been corrected. I've tried a few different versions of this, but I'm not sure how to implement a second array to continue looking through directories after a file has been corrected.
Currently, my script looks like this:
debug=false
## *****Put file name in quotes******
declare -a arr=("UF19905217" "UG19905218" )
##Put date in DDMMYYYY format for the date the message was original processed.
DATE="25082015"
## now loop through the above array
for i in "${arr[@]}"
do
#if "$debug; then
echo "Fix file named: Inbound_$i.msg"
MSG="Inbound_$i.msg"
#fi
if [ ! -d "$MSG" ]; then
# Enter what you would like changed here. You can copy and paste this command for multiple changes
#DATATYPE
printf "%s\n" ',s/<DataType>EDI<\/DataType>/<DataType>830<\/DataType>/g' wq | ed -s /data1/Inbound/$DATE/$MSG
echo "Complete"
else
echo "Message not found or errored!"
fi
done
I appreciate any help you can provide. Thank you.
I believe you just want to enclose the loop you have in a loop that iterates over the desired directories:
debug=false
## *****Put file name in quotes******
declare -a arr=("UF19905217" "UG19905218" )
##Put date in DDMMYYYY format for the date the message was original processed.
dates=( 25082015 26082015 )

for DATE in "${dates[@]}"; do
    for i in "${arr[@]}"; do
        MSG="Inbound_$i.msg"
        if $debug; then
            echo "Fix file named: $MSG"
        fi
        if [ ! -d "$MSG" ]; then
            printf "%s\n" ',s/<DataType>EDI<\/DataType>/<DataType>830<\/DataType>/g' wq | ed -s /data1/Inbound/$DATE/$MSG
            echo "Complete"
        else
            echo "Message not found or errored!"
        fi
    done
done
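As a follow-up, if the date directories already exist under /data1/Inbound, you could populate the dates array from the filesystem instead of hard-coding it; a sketch, assuming every entry there is a DDMMYYYY directory:
dates=()
for d in /data1/Inbound/*/; do       # every subdirectory of the inbound root
    dates+=( "$(basename "$d")" )    # keep just the DDMMYYYY name
done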

SGE array jobs and R

I currently have an R script written to perform a population genetic simulation and then write a table with my results to a text file. I would like to somehow run multiple instances of this script in parallel using an array job (my University's cluster uses SGE), and when it's all done I will have generated results files corresponding to each job (Results_1.txt, Results_2.txt, etc.).
I've spent the better part of the afternoon reading and trying to figure out how to do this, but haven't really found anything along the lines of what I'm trying to do. I was wondering if someone could provide an example or perhaps point me toward something I could read to help with this.
To boil down mithrado's answer to the bare essentials:
Create a job script, pop_gen.bash, that may or may not take the SGE task ID as an input argument, storing results in a specific file identified by that same SGE task ID:
#!/bin/bash
Rscript pop_gen.R ${SGE_TASK_ID} > Results_${SGE_TASK_ID}.txt
Submit this script as a job array, e.g. 1000 jobs:
qsub -t 1-1000 pop_gen.bash
Grid Engine will execute pop_gen.bash 1000 times, each time setting SGE_TASK_ID to a value ranging from 1 to 1000.
Additionally, as mentioned above, by passing SGE_TASK_ID as a command-line argument to pop_gen.R you can use it to name the output file:
args <- commandArgs(trailingOnly = TRUE)
out.file <- paste("Results_", args[1], ".txt", sep="")
# d <- "some data frame"
write.table(d, file=out.file)
HTH
I'm not used to doing this in R, but I've been using the same approach in Python. Imagine that you have a script genetic_simulation.r and it has 3 parameters:
--gene_id, --khmer_len and --output_file.
You will have one CSV file, genetic_sim_parms.csv, with n rows:
first_gene,10,/result/first_gene.txt
...
nth_gene,6,/result/nth_gene.txt
An important detail is the first line of your genetic_simulation.r: it needs to tell the cluster which executable to use. You might need to tweak its parameters as well; depending on your setup, it will look something like:
#!/path/to/Rscript --vanilla
And finally, you will need an array-job bash script:
#!/bin/bash
#$ -t 1:N < change to number of rows in genetic_sim_parms.csv
#$ -N genetic_simulation.r
echo "Starting on : $(date)"
echo "Running on node : $(hostname)"
echo "Current directory : $(pwd)"
echo "Current job ID : $JOB_ID"
echo "Current job name : $JOB_NAME"
echo "Task index number : $SGE_TASK_ID"
ID=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $1}' genetic_sim_parms.csv)
LEN=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $2}' genetic_sim_parms.csv)
OUTPUT=$(awk -F, -v "line=$SGE_TASK_ID" 'NR==line {print $3}' genetic_sim_parms.csv)
echo "id is: $ID"
Rscript genetic_simulation.r --gene_id $ID --khmer_len $LEN --output_file $OUTPUT
echo "Finished on : $(date)"
Hope this helps!

Bash script for deleting multiple users based upon date and UID

I am working on a bash script that deletes users based upon two things: the date the account was created and the user ID. If the date is before the date given in the terminal and the user ID is greater than 1000, the user should be deleted from the system. I have some code written out, and it has given me a lot of issues because the file that takes in the information (username, date created, and ID) needs to be split or "cut". Is there a better way to go about this than the way I did it, breaking the pieces into an array?
#!/bin/bash
if [ $# -eq 0 ] ; then
echo $0 "<date(year/month/day) filename>"
exit 1
fi
dateForDeletion=$1
listOfUsers=$2
for i in `cat $listOfUsers` | while IFS='/t' read username date; do
${array[0]}="$username"
${array[1]}="$date"
uid=`id -u ${array[0]}`
if [$dateForDeletion < $date] && [uid > 1000] ; then
`sudo userdel $username`
fi
done
You really should start smaller and build up, testing each statement individually before adding more. Here are some issues with your code:
You're using a strange amalgamation of a for and while loop.
IFS="/t" appears to try to set IFS to a tab. This should be IFS=$'\t'
${array[0]}="$username" is not a valid assignment. You should use array[0]="$username", though I'm not sure why you're assigning it to an array in the first place.
[$dateForDeletion < $date] is not a valid condition/command. It should be [[ $dateForDeletion < $date ]] (assuming the dates are yyyy-mm-dd format or something that can be compared as strings).
[uid > 1000] is not a valid condition/command. It should be [ "$uid" -gt 1000 ]
`sudo userdel $username` should not have backticks around it.
Here's how your loop should look:
while IFS=$'\t' read -r username date
do
uid=$(id -u "$username")
if [[ $dateForDeletion < $date && $uid -gt 1000 ]]
then
sudo userdel "$username"
fi
done < "$listOfUsers"
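For reference, a minimal sketch of the input file and invocation this loop expects, assuming tab-separated username and date columns with yyyy-mm-dd dates (so the string comparison works) and a hypothetical wrapper script name:
# users.tsv (fields separated by a literal tab):
# alice	2014-03-01
# bob	2016-07-12
./delete_users.sh 2015-08-25 users.tsv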
