Bash parameter expansion, indirect reference, and backgrounding - arrays

After struggling with this issue for several hours and searching here and failing to come up with a matching solution, it's time to ask:
In bash (4.3) I'm attempting to do a combination of the following:
Create an array
Loop through the values of the array with a command that isn't super fast (curl to a web server to get a value), backgrounding each iteration to parallelize everything and speed it up.
Use "read" to assign each name in the array, as a variable, the value redirected to it from a command.
Background each iteration and collect its PID into a regular array, and associate each PID with the related array value in an associative array, so I have key=value pairs mapping each PID to its value name.
Use "wait" to wait for each PID to exit 0, or throw an error telling us which value name(s) in the array failed to exit 0, by referencing the associative array.
I need to be able to export all of the VAR names in the original array and their now-associated values (from the curl results), because I'm sourcing this script from another bash script that will use the resulting exported VARs/values.
The reason I'm using "read" instead of just "export" with export var=$(command) or similar: when I background that and grab the PID to use with "wait" in the next for loop, I actually (incorrectly) get the PID of the "export" command, which always exits 0, so I never detect an error. When I instead use "read" with the redirect to set the VAR (named from the array) and background it, the PID really reflects the command, and I catch any errors in the next loop with "wait".
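For example, using false as a stand-in for a failing command, here is a minimal illustration of the difference I'm describing:
export VAR=$(false) &
wait $!; echo $?   # prints 0: the export itself succeeded, so the failure is masked
read VAR < <(false) &
wait $!; echo $?   # prints 1: read hit EOF without input, so wait sees the failure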
So, basically, this mostly appears to work, except I realized that "read" doesn't actually appear to substitute the array value into the VAR name properly, in a way that lets the redirected command's output set that substituted VAR. Or maybe the command is just entirely wrong, so I'm not correctly redirecting the result of my command into the VAR name I'm attempting to set.
For what it's worth, when I run the curl | python command by hand (to pull the value and then parse the JSON output), it definitely succeeds, so I know that part works; I just can't get the redirect to send the resulting output to the VAR name.
Here's an example of what I'm trying to do:
In parent script:
# Source the child script that has the functions I need
source functions.sh
# Create the array
VALUES=(
VALUE_A
VALUE_B
VALUE_C
)
# Call the function sourced from the script above, which will use the above defined array
function_getvalues
In child (sourced) script:
function_getvalues()
{
curl_pids=( )
declare -A value_pids
for value in "${VALUES[@]}"; do
read ${value} < <(curl -f -s -X GET http://path/to/json/value | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['value'])") & curl_pids+=( $! ) value_pids+=([$!]=${value})
done
for pid in "${curl_pids[@]}"; do
wait "$pid" && echo "Successfully retrieved value ${value_pids[$pid]} from Webserver." || { echo "Something went wrong retrieving value ${value_pids[$pid]}, so we couldn't get the output data needed from Webserver. Exiting." ; exit 1 ; }
done
}

The problem is that read, when run in the background, executes in a subshell, so even when it does read a value, the variable it sets dies with that subshell and never reaches the parent (and a backgrounded read isn't connected to your standard input anyway). Consider this simplified, working example, with a comment showing how to cripple it:
VALUES=( VALUE_A VALUE_B )
for value in "${VALUES[@]}"; do
read ${value} < <(echo ${RANDOM}) # add "&" and it stops working
done
echo "VALUE_A=${VALUE_A}"
echo "VALUE_B=${VALUE_B}"
You might be able to do this with coproc, or by using read -u with automatic file descriptor allocation.
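Here is a minimal sketch of that read -u idea (my own elaboration of the hint, not part of the original answer): start every command up front via process substitution, parking each one's output on its own automatically allocated file descriptor, then do all the reads in the parent shell so the variables survive:
VALUES=( VALUE_A VALUE_B )
declare -A fds
for value in "${VALUES[@]}"; do
    exec {fd}< <(sleep 1; echo ${RANDOM})   # command starts immediately; its output fd stays open
    fds[$value]=$fd
done
for value in "${VALUES[@]}"; do
    fd=${fds[$value]}
    read -r -u "$fd" "$value"               # blocks until that command has produced its output
    exec {fd}<&-                            # close the descriptor whose number is in $fd
done
Really, though, this is a job for temporary files: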
tmpdir=$(mktemp -d)
VALUES=( VALUE_A VALUE_B )
for value in "${VALUES[@]}"; do
(sleep 1; echo ${RANDOM} > "${tmpdir}"/"${value}") &
done
for value in "${VALUES[@]}"; do
wait_file "${tmpdir}"/"${value}" && {
read -r ${value} < "${tmpdir}"/"${value}";
}
done
echo "VALUE_A=${VALUE_A}"
echo "VALUE_B=${VALUE_B}"
rm -r "${tmpdir}"
This example uses a wait_file helper, but you might use inotifywait instead if you don't mind an extra dependency on the OS.
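wait_file isn't a builtin; a minimal polling sketch of such a helper might look like this (the default timeout and the tenth-of-a-second poll interval are arbitrary choices of mine, not from the original):
wait_file() {
    # wait_file <path> [timeout_seconds]: poll until <path> exists
    local file=$1 timeout=${2:-10} i=0
    until [ -e "$file" ]; do
        (( ++i > timeout * 10 )) && return 1   # give up after roughly timeout seconds
        sleep 0.1
    done
}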

Related

How can I read/store an array in bash in parallel?

I have already read posts like How can I store the “find” command results as an array in Bash or Creating an array from a text file in Bash or Store output of command into the array
Now my issue is the following: How to do this in parallel?
Background:
I have a script for processing a large git repository with a lot of submodules, performing certain actions within them. Some tasks take a while, so in the meantime I want to give some user feedback to indicate that something is still happening and the code isn't just stuck ^^
I have a function
function ShowSpinner()
{
pid=$!
while [ -d /proc/$pid ]
do
for x in '-' '/' '|' '\\'
do
echo -ne ${x}" \r"
sleep 0.1
done
done
}
for displaying a little spinner while doing long tasks. And so far currently I use this e.g. like
while IFS= read -r line
do
# Some further processing of the output lines here
done <<< $(git pull 2>&1) & ShowSpinner
which works fine and always displays the spinner until the task is finished.
In particular I use this also for finding submodules in a git repository like
function FindSubmodules()
{
# find all .git FILES and write the result to the temporary file .submodules
find -name ".git" -type f > .submodules & ShowSpinner
# read in the temporary file
SUBMODULES=$(cat .submodules)
# and delete the temporary file
rm .submodules
}
later I iterate the submodules using e.g.
function DoSomethingWith()
{
for submodule in ${SUBMODULES}
do
echo $submodule
done
}
FindSubmodules
DoSomethingWith
Of course I do more stuff in there, this is only a short example.
This works fine, but what I don't like here is that the file .submodules is created (even if only temporarily). I would prefer to store the result directly in an array and then iterate over that.
So after reading mentioned posts I tried to use something like simply
IFS=$'\n'
SUBMODULES=( $(find -name ".git" -type f)) & ShowSpinner
or from the links also
readarray SUBMODULES < <(find -name ".git" -type f) & ShowSpinner
or
readarray -t SUBMODULES "$(find -name ".git" -type f)" & ShowSpinner
and then iterate like
for submodule in ${SUBMODULES[@]}
do
echo $submodule
done
For all three options the result is basically the same: the spinner works fine, but all I get is one single entry containing the last character drawn by ShowSpinner instead of the results of find. Without the & ShowSpinner it works fine, but of course shows no feedback during long tasks.
What am I doing wrong? How can I get the readarray to work in parallel with the ShowSpinner function?
Update: as suggested, I have put it into a function (actually I already had functions, I just hadn't put the spinner behind the entire function so far)
function FindSubmodules()
{
echo ""
echo ${BOLD}"Scanning for Submodules ... "${NORMAL}
SUBMODULES=($(find -name ".git" -type f))
for submodule in "${SUBMODULES[@]}"
do
echo $submodule
done
}
function CheckAllReposForChanges()
{
# Check Submodules first
for submodule in "${SUBMODULES[@]}"
do
# remove prefixed '.'
local removedPrefix=${submodule#.}
# remove suffix '.git'
local removedSuffix=${removedPrefix%.git}
echo "${BASEPATH}${removedSuffix}"
done
# Check the main repo itself
echo "${BASEPATH}"
echo ""
}
FindSubmodules & ShowSpinner
CheckAllReposForChanges
the CheckAllReposForChanges function itself works just fine.
What I get now is the spinner and then the correct output from the first FindSubmodules, e.g.
./SomeFolder/.git
./SomeOtherFolder/.git
./SomeThirdFolder/.git
etc
However, when it comes to CheckAllReposForChanges (again, the echo is just an example for debugging), I don't get any output except the main repository path. It seems SUBMODULES is now empty, since it is being filled in the background. It worked with the solution I used originally.
Of course the array is empty; you have backgrounded the function that will eventually populate it (and anyway, there is really no way for a background process to populate a variable in its parent once it finishes).
Run both functions in the same background process and they will be able to communicate properly.
{ FindSubmodules
CheckAllReposForChanges
} & ShowSpinner
Maybe I'm misreading the question but it seems (to me) the requirement is to pass data 'up' from a backgrounded/child process to the calling/parent process.
A backgrounded script/function call spawns a new, asynchronous OS-level process; there is no easy way to pass data 'up' from the child process to the parent process.
While it may be possible to build some sort of inter-process shared memory structure to share data between parent and child processes, it's a bit easier if we can use some sort of intermediate storage (eg, fifo, file, database table, queuing system, etc) that the various processes can 'share'.
One idea:
parent process creates one or more temp directories (eg, one for each distinct array to be populated)
each child process writes data to a file (filename = ${BASHPID}) in a particular temp directory in a format that can be easily parsed (and loaded into an array) by the parent
the parent calls the child process, waits for the child process to complete, and then ...
the parent process reads the contents of all files in the temporary directory(s) and loads the appropriate array(s)
For sake of an example I'll assume we just need to populate a single array; I'm also going to use the same temp directory for capturing/storing modules for the functions regardless of whether each function is run in the background or foreground:
unset submodules # delete any variable with this name
submodules=() # init array
outdir=$(mktemp -d) # create temp directory
FindSubmodules()
{
... snip ...
echo "$submodule" >> "${outdir}/${BASHPID}" # write module to temp file
... snip ...
}
CheckAllReposForChanges()
{
... snip ...
echo "$submodule" >> "${outdir}/${BASHPID}" # write module to temp file
... snip ...
}
FindSubmodules & ShowSpinner
CheckAllReposForChanges
# now pull modules from temp file(s) into the array
# NOTE: assumes each temp file contains a single module name on each line, and no
# blank lines; otherwise OP can add some logic to address a different format
while read -r modname
do
submodules+=("${modname}")
done < <(cat "${outdir}"/[0-9]*)
# remove temp directory and file(s)
'rm' -rf "${outdir}"
If you can write your parallelism using GNU Parallel, you can use parset:
dostuff() {
# Do real stuff here
sleep 10;
}
export -f dostuff
function ShowSpinner()
{
while [ -d /proc/$pid ]
do
for x in '-' '/' '|' '\\'
do
echo -ne ${x}" \r"
sleep 0.1
done
done
}
sleep 1000000 &
pid=$!
ShowSpinner &
parset myout dostuff < <(find -name ".git" -type f)
kill $pid
echo
Or, if you are willing to change ShowSpinner:
dostuff() {
# Do real stuff here
sleep 10;
}
export -f dostuff
function ShowSpinner()
{
while true; do
for x in '-' '/' '|' '\\'
do
echo -ne ${x}" \r"
sleep 0.1
done
done
}
ShowSpinner &
pid=$!
parset myout dostuff < <(find -name ".git" -type f)
kill $pid
echo

Using declare for referencing variables from an array in bash

I am trying to loop through an array of directories using a bash script so I can list directories with their timestamp, ownership etc using ls -arlt. I am reviewing bash so would like some feedback.
It works with declare -a for those indirect references, but for each directory it also outputs an extra listing of /home/user.
I tried to use declare -n and declare -r for each directory and it doesn't work.
#!/bin/bash
# Bash variables
acpi=/etc/acpi
apm=/etc/apm
xml=/etc/xml
array=( acpi apm xml )
# Function to display timestamp, ownership ...
displayInfo()
{
for i in "${array[@]}"; do
declare -n curArray=$i
if [[ -d ${curArray} ]]; then
declare -a _acpi=${curArray[0]} _apm=${curArray[1]} _xml=${curArray[2]}
echo "Displaying folder apci: "
cd $_acpi
ls -alrt
read -p "Press enter to continue"
echo "Displaying folder apm: "
cd $_apm
ls -alrt
read -p "Press enter to continue"
echo "Displaying folder xml: "
cd $_xml
ls -alrt
read -p "Press enter to continue"
else
echo "Displayed Failed" >&2
exit 1
fi
done
}
displayInfo
exit 0
It outputs an extra directory listing of /home/user, and I don't want that output.
There are a lot of complex and powerful shell features being used here, but in ways that don't fit together or make sense. I'll go over the mistakes in a minute, first let me just give how I'd do it. One thing I will use that you might not be familiar with is indirect variable references with ${!var} -- this is like using a nameref variable, but IMO it's clearer what's going on.
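For instance, a two-line illustration of that indirection (my example, not from the original question):
dir=/etc/acpi
ref=dir
echo "${!ref}"   # prints /etc/acpi: the value of the variable whose name is stored in ref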
acpi=/etc/acpi
apm=/etc/apm
xml=/etc/xml
array=( acpi apm xml )
displayInfo()
{
for curDirectory in "${array[@]}"; do
if [[ -d ${!curDirectory} ]]; then
echo "Displaying folder $curDirectory:"
ls -alrt "${!curDirectory}"
read -p "Press enter to continue"
else
echo "Error: ${!curDirectory} does not exist or is not a directory" >&2
exit 1
fi
done
}
displayInfo
(One problem with this is that it does the "Press enter to continue" thing after each directory, rather than just between them. This can be fixed, but it's a little more work.)
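For instance, one sketch of such a fix (mine, not part of the original answer) prompts only between directories:
displayInfo()
{
    local first=1
    for curDirectory in "${array[@]}"; do
        if [[ -d ${!curDirectory} ]]; then
            (( first )) || read -p "Press enter to continue"
            first=0
            echo "Displaying folder $curDirectory:"
            ls -alrt "${!curDirectory}"
        else
            echo "Error: ${!curDirectory} does not exist or is not a directory" >&2
            exit 1
        fi
    done
}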
Ok, now for what went wrong with the original. My main recommendation for you would be to try mentally stepping through your code to see what it's doing. It can help to put set -x before it, so the shell will print its interpretation of what it's doing as it runs, and see how it compares to what you expected. Let's do a short walkthrough of the displayInfo function:
for i in "${array[@]}"; do
This will loop over the contents of array, so on the first pass through the loop i will be set to "acpi". Good so far.
declare -n curArray=$i
This creates a nameref variable pointing to the other variable acpi -- this is similar to what I did with ${!var}, and basically reasonable so far. Well, with one exception: the name suggests it's an array, but acpi is a plain variable, not an array.
if [[ -d ${curArray} ]]; then
This checks whether the contents of the acpi variable, "/etc/acpi" is the path of an existing directory (which it is). Still doing good.
declare -a _acpi=${curArray[0]} _apm=${curArray[1]} _xml=${curArray[2]}
Here's where things go completely off the rails. curArray points to the variable acpi, so ${curArray[0]} etc are equivalent to ${acpi[0]} etc. But acpi isn't an array, it's a plain variable, so ${acpi[0]} gets its value, and ${acpi[1]} and ${acpi[2]} get nothing. Furthermore, you're using declare -a (declare arrays), but you're just assigning single values to _acpi, _apm, and _xml. They're declared as arrays, but you're just using them as plain variables (basically the reverse of how you're using curArray -> acpi).
There's a deeper confusion here as well. The for loop above is iterating over "acpi", "apm", and "xml", and we're currently working on "acpi". During this pass through the loop, you should only be working on acpi, not also trying to work on apm and xml. That's the point of having a for loop there.
Ok, that's the main problem here, but let me just point out a couple of other things I'd consider bad practice:
cd $_apm
ls -alrt
Using a variable reference without double-quotes around it like this invites parsing confusion; you should almost always put double-quotes, like cd "$_apm". Also, using cd in a script is dangerous because if it fails the rest of the script will execute in the wrong place. In this case, _apm is empty, so without double-quotes it's equivalent to just cd, which moves to your home directory. This is why you're getting that result. If you used cd "$_apm" it would get an error instead... but since you don't check for that it'll go ahead and still list an irrelevant location.
It's almost always better to avoid cd and its complications entirely, and just use explicit paths, like ls -alrt "$_apm".
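To see that failure mode concretely (a quick sketch you can run anywhere):
unset _apm
cd $_apm    # the empty, unquoted expansion disappears entirely, so this is just "cd"
pwd         # now prints your home directory, e.g. /home/user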
echo "Displayed Failed" >&2
exit 1
Do you actually want to exit the entire script if one of the directories doesn't exist? It'd make more sense to me to just return 1 (which exits just the function, not the entire script), or better yet continue (which just goes on to the next iteration of the loop -- i.e. the next directory on the list). I left the exit in my version, but I'd recommend changing it.
One more similar thing:
acpi=/etc/acpi
apm=/etc/apm
xml=/etc/xml
array=( acpi apm xml )
Is there any actual reason to use this array -> variable name -> actual directory path system (and resulting indirect expansion or nameref complications), rather than just having an array of directory paths, like this?
array=( /etc/acpi /etc/apm /etc/xml )
I left the indirection in my version above, but really if there's no reason for it I'd remove the complication.

Adding value to an associative array named after a variable

I need your help with a bash >= 4 script I'm writing.
I am retrieving some files from remote hosts to back them up.
I have a for loop that iterate through the hosts and for each one tests connection and the start a function that retrieves the various files.
My problem is that I need to know what went wrong (and whether anything did), so I am trying to store OK or KO values in an array and parse it later.
This is the code:
...
for remote_host in $hosts ; do
short_host=$(echo "$remote_host" | grep -o '^[^.]\+')
declare -A cluster
printf "INFO: Testing connectivity to %s... " "$remote_host"
if ssh -q "$remote_host" exit ; then
printf "OK!\n"
cluster[$short_host]="Reacheable"
mkdir "$short_host"
echo "INFO: Collecting files ..."
declare -A ${short_host}
objects1="/etc/krb5.conf /etc/passwd /etc/group /etc/fstab /etc/sudoers /etc/shadow"
for obj in ${objects1} ; do
if file_retrieve "$user" "$remote_host" "$obj" ; then
-> ${short_host}=["$obj"]=OK
else
${short_host}=["$obj"]=KO
fi
done
...
So I'm using an array named cluster to record whether the nodes were reachable, and another array - named after the short name of the node - to record OK or KO for individual files.
On execution, I got the following error (line 130 is the line I marked with the arrow above):
./test.sh: line 130: ubuntu01=[/etc/krb5.conf]=OK: command not found
I think this is a syntax error for sure, but I can't fix it. I tried a bunch of combinations without success.
Thanks for your help.
Since the array name is contained in the variable short_host, you need eval to make the assignment work:
${short_host}=["$obj"]=OK
Change it to:
eval "${short_host}[$obj]=OK"
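(On bash 4.3 and later, a nameref avoids the eval entirely. A sketch of mine, not part of the original answer; the name results_ref is hypothetical:)
declare -n results_ref=${short_host}   # results_ref now aliases the array named by $short_host
if file_retrieve "$user" "$remote_host" "$obj" ; then
    results_ref[$obj]=OK
else
    results_ref[$obj]=KO
fi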
Similar posts:
Single line while loop updating array

How can I split bash CLI arguments into two separate arrays for later usage?

New to StackOverflow and new to bash scripting. I have a shell script that is attempting to do the following:
cd into a directory on a remote machine. Assume I have already established a successful SSH connection.
Save the email addresses from the command line input (these could range from 1 to X number of email addresses entered) into an array called 'emails'
Save the brand IDs (integers) from the command line input (these could range from 1 to X number of brand IDs entered) into an array called 'brands'
Use nested for loops to iterate over the 'emails' and 'brands' arrays and add each email address to each brand via add.py
I am running into trouble splitting up and saving data into each array, because I do not know where the command line indices of the emails will stop, and where the indices of the brands will begin. Is there any way I can accomplish this?
command line input I expect to look as follows:
me@some-remote-machine:~$ bash script.sh person1@gmail.com person2@gmail.com person3@gmail.com ... personX@gmail.com brand1 brand2 brand3 ... brandX
The contents of script.sh look like this:
#!/bin/bash
cd some/directory
emails= ???
brands= ???
for i in $emails
do
for a in $brands
do
python test.py add --email=$i --brand_id=$a --grant=manage
done
done
Thank you in advance, and please let me know if I can clarify or provide more information.
Use a sentinel argument that cannot possibly be a valid e-mail address. For example:
$ bash script.sh person1@gmail.com person2@gmail.com '***' brand1 brand2 brand3
Then in a loop, you can read arguments until you reach the non-email; everything after that is a brand.
#!/bin/bash
cd some/directory
while [[ $1 != '***' ]]; do
emails+=("$1")
shift
done
shift # Ignore the sentinel
brands=( "$#" ) # What's left
for i in "${emails[@]}"
do
for a in "${brands[@]}"
do
python test.py add --email="$i" --brand_id="$a" --grant=manage
done
done
If you can't modify the arguments that will be passed to script.sh, then perhaps you can distinguish between an address and a brand by the presence or absence of a @:
while [[ $1 = *@* ]]; do
emails+=("$1")
shift
done
brands=("$#")
I'm assuming that the numbers of addresses and brands are independent. If instead there are always equally many of each, you can simply look at the total number of arguments, $#. Say there are N of each; then
emails=( "${#:1:$#/2}" ) # First half
brands=( "${#:$#/2+1}" ) # Second half

Shell script for checking if array is empty and restarting program if so

I want to make a shell script that keeps running to check if my two light weight web servers are still running, and restart them if one is not.
I can use the command pgrep -f thin to get an array (?) of pids of my server called thin.
When this returned array has a count of zero I want to run a command which starts both servers:
cd [path_to_app] && bundle exec thin -C app_config.yml start
pgrep -f thin returns all the pids of the servers that are running. For example:
23542
23425
I am new to shell scripting and don't know how to store the results of pgrep -f thin in an array. E.g.,
#!/bin/sh
while true
do
arr=$(pgrep -f thin) # /edited and now THIS WORKS!
#Then I want to check the length of the array and when it is empty run the above
#command, e.g.,
if [ ${#arr[@]} == 0 ]; then
cd [path_to_app] && bundle exec thin -C app_config.yml start
fi
#wait a bit before checking again
sleep 30
done
The first problem I have is that I cannot store the pgrep values in an array, and I am not sure if I can check against zero values. After that I am not sure if there are problems with the other code. I hope someone can help me!
You forgot to execute the command:
arr=($(pgrep -f thin))
[...] when it is empty
If you only check for emptiness, you can directly use the exit status of grep.
-q, --quiet, --silent
Quiet; do not write anything to standard output.
Exit immediately with zero status
if any match is found, even if an error was detected.
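Applied to the monitoring loop, the array can be dropped entirely. A sketch (the [path_to_app] placeholder is the asker's; pgrep's output is simply discarded here, since only its exit status matters):
#!/bin/sh
while true
do
    if ! pgrep -f thin > /dev/null; then
        cd [path_to_app] && bundle exec thin -C app_config.yml start
    fi
    sleep 30
done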
