How can I read/store an array in bash in parallel? - arrays

I have already read posts like How can I store the “find” command results as an array in Bash or Creating an array from a text file in Bash or Store output of command into the array
Now my issue is the following: How to do this in parallel?
Background:
I have a script that processes a large git repository with a lot of submodules and performs certain actions within them. Some tasks take a while, so meanwhile I want to give some user feedback to indicate that something is still happening and the code isn't just stuck ^^
I have a function
function ShowSpinner()
{
# PID of the most recently started background job
pid=$!
# keep spinning while that process is still alive
while [ -d /proc/$pid ]
do
for x in '-' '/' '|' '\\'
do
echo -ne ${x}" \r"
sleep 0.1
done
done
}
for displaying a little spinner while doing long tasks. And so far currently I use this e.g. like
while IFS= read -r line
do
# Some further processing of the output lines here
done <<< $(git pull 2>&1) & ShowSpinner
which works fine and always displays the spinner until the task is finished.
In particular I use this also for finding submodules in a git repository like
function FindSubmodules()
{
# find all .git FILES and write the result to the temporary file .submodules
find -name ".git" -type f > .submodules & ShowSpinner
# read in the temporary file
SUBMODULES=$(cat .submodules)
# and delete the temporary file
rm .submodules
}
later I iterate the submodules using e.g.
function DoSomethingWith()
{
for submodule in ${SUBMODULES}
do
echo $submodule
done
}
FindSubmodules
DoSomethingWith
Of course I do more stuff in there, this is only a short example.
This works fine, but what I don't like here is that the file .submodules is created (even if only temporarily). I would prefer to store the result directly in an array and then iterate over that one.
So after reading mentioned posts I tried to use something like simply
IFS=$'\n'
SUBMODULES=( $(find -name ".git" -type f)) & ShowSpinner
or from the links also
readarray SUBMODULES < <(find -name ".git" -type f) & ShowSpinner
or
readarray -t SUBMODULES "$(find -name ".git" -type f)" & ShowSpinner
and then iterate like
for submodule in "${SUBMODULES[@]}"
do
echo $submodule
done
For all three options the result is basically the same: the spinner works fine, but all I get is one single entry containing the last character drawn by ShowSpinner instead of the results of find. Without & ShowSpinner it works fine, but of course it doesn't show any feedback during long tasks.
What am I doing wrong? How can I get the readarray to work in parallel with the ShowSpinner function?
Update: as suggested, I have put it into a function (actually I already had functions, I just hadn't put the spinner behind the entire function so far)
function FindSubmodules()
{
echo ""
echo ${BOLD}"Scanning for Submodules ... "${NORMAL}
SUBMODULES=($(find -name ".git" -type f))
for submodule in "${SUBMODULES[#]}"
do
echo $submodule
done
}
function CheckAllReposForChanges()
{
# Check Submodules first
for submodule in "${SUBMODULES[#]}"
do
# remove prefixed '.'
local removedPrefix=${submodule#.}
# remove suffix '.git'
local removedSuffix=${removedPrefix%.git}
echo "${BASEPATH}${removedSuffix}"
done
# Check the main repo itself
echo "${BASEPATH}"
echo ""
}
FindSubmodules & ShowSpinner
CheckAllReposForChanges
the CheckAllReposForChanges function itself works just fine.
What I get now is the spinner and then the correct output from the first FindSubmodules like e.g.
./SomeFolder/.git
./SomeOtherFolder/.git
./SomeThirdFolder/.git
etc
However, when it comes to CheckAllReposForChanges (again, the echo is just an example for debugging), I don't get any output except the main repository path. It seems that SUBMODULES is now empty since it is being filled in the background. It worked with the solution I used originally.

Of course the array is empty; you have backgrounded the function that will eventually populate it (and in any case, there is no way for the background process to populate a variable in its parent once it finishes).
Run both functions in the same background process and they will be able to communicate properly.
{ FindSubmodules
CheckAllReposForChanges
} & ShowSpinner
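To see why the original version lost the array, here is a minimal demonstration (the parentheses make the subshell explicit):
arr=()
( arr+=(one two) ) &   # the += happens in a backgrounded subshell
wait                   # let it finish
echo "${#arr[@]}"      # prints 0: the parent's arr was never touched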

Maybe I'm misreading the question but it seems (to me) the requirement is to pass data 'up' from a backgrounded/child process to the calling/parent process.
A backgrounded script/function call spawns a new, asynchronous OS-level process; there is no easy way to pass data 'up' from the child process to the parent process.
While it may be possible to build some sort of inter-process shared memory structure to share data between parent and child processes, it's a bit easier if we can use some sort of intermediate storage (eg, fifo, file, database table, queuing system, etc) that the various processes can 'share'.
One idea:
parent process creates one or more temp directories (eg, one for each distinct array to be populated)
each child process writes data to a file (filename = ${BASHPID}) in a particular temp directory in a format that can be easily parsed (and loaded into an array) by the parent
the parent calls the child process, waits for the child process to complete, and then ...
the parent process reads the contents of all files in the temporary directory(s) and loads the appropriate array(s)
For sake of an example I'll assume we just need to populate a single array; I'm also going to use the same temp directory for capturing/storing modules for the functions regardless of whether each function is run in the background or foreground:
unset submodules # delete any variable with this name
submodules=() # init array
outdir=$(mktemp -d) # create temp directory
FindSubmodules()
{
... snip ...
echo "$submodule" >> "${outdir}/${BASHPID}" # write module to temp file
... snip ...
}
CheckAllReposForChanges()
{
... snip ...
echo "$submodule" >> "${outdir}/${BASHPID}" # write module to temp file
... snip ...
}
FindSubmodules & ShowSpinner
CheckAllReposForChanges
# now pull modules from temp file(s) into array
# NOTE: assumes each temp file contains a single module name on each line and
# no blank lines; otherwise OP can add some logic to address a different format
while read -r modname
do
submodules+=("$modname")   # quoted so whitespace in names survives
done < <(cat "${outdir}"/[0-9]*)
# remove temp directory and file(s)
'rm' -rf "${outdir}"
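One refinement worth considering: registering the cleanup in a trap guarantees the temp directory is removed even if the script aborts early (a minimal sketch, assuming bash):
outdir=$(mktemp -d) || exit 1
trap 'rm -rf "$outdir"' EXIT   # runs on any exit, normal or not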

If you can write your parallelism using GNU Parallel, you can use parset:
dostuff() {
# Do real stuff here
sleep 10;
}
export -f dostuff
function ShowSpinner()
{
while [ -d /proc/$pid ]
do
for x in '-' '/' '|' '\\'
do
echo -ne ${x}" \r"
sleep 0.1
done
done
}
sleep 1000000 &
pid=$!
ShowSpinner &
parset myout dostuff < <(find -name ".git" -type f)
kill $pid
echo
Or, if you are willing to change ShowSpinner:
dostuff() {
# Do real stuff here
sleep 10;
}
export -f dostuff
function ShowSpinner()
{
while true; do
for x in '-' '/' '|' '\\'
do
echo -ne ${x}" \r"
sleep 0.1
done
done
}
ShowSpinner &
pid=$!
parset myout dostuff < <(find -name ".git" -type f)
kill $pid
echo
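parset fills the bash array myout with the stdout of each dostuff invocation, so after it returns the results can be read back in the parent shell, for example:
for out in "${myout[@]}"; do
    printf '%s\n' "$out"
done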

Related

Create file with same name as directory without typing the filename possibly with brace expansion or some other trick

Let's say I have a directory called Navigation and inside that I want to make a file called Navigation.jsx.
Instead of doing touch Navigation/Navigation.jsx I'm trying to figure out if there is a trick to not have to type Navigation twice, such as brace expansion.
I tried stuff like touch Navigation/{,.jsx} and touch Navigation/{/,.jsx} but rather than removing the slash it only produces a file called .jsx.
When doing this many times for multiple components it gets really monotonous and I'd love a streamlined way of doing it. Hey, maybe I'm thinking about this all wrong and there's a different flow I should use to create folders and files.
Here's what I did. Questions and comments welcome.
#!/usr/bin/env bash
if [ $# -gt 0 ]; then
for arg in "$#"; do
rfc="const $arg = () => {
return (
<div>
$arg
</div>
)
}
export default $arg
"
mkdir "$arg"/
touch "$arg"/"$arg".jsx
echo "$rfc" >>"$arg"/"$arg".jsx
echo "Component $arg created at $arg/$arg.jsx"
done
else
echo "Enter component names as arguments"
fi
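For illustration, assuming the script above is saved as mkcomponent (a hypothetical name) and made executable, a run might look like:
./mkcomponent Navigation Header Footer
# creates Navigation/Navigation.jsx, Header/Header.jsx and Footer/Footer.jsx,
# each pre-filled with the component boilerplate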
If you put the directory names in a file, you could use a read while loop.
#!/bin/sh -x
find . -type d | sed '1d' | sed 's#^..##g' > stack
while read n
do
mkdir "${n}"
touch "${n}"/"${n}"
done < stack
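If you'd rather skip the temporary stack file as well, the same loop can be fed by process substitution; a sketch assuming bash and GNU find (with BSD/macOS find, keep the sed pipeline from above):
#!/bin/bash
while IFS= read -r n; do
    mkdir -p "$n"
    touch "$n/$n"
done < <(find . -mindepth 1 -type d -printf '%P\n')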

Loop thru a filename list and iterate thru a variable/array removing all strings from filenames with bash

I have a list of strings that I have in a variable and would like to remove those strings from a list of filenames. I'm pulling that string from a file that I can add to and modify over time. Some of the strings in the variable may include part of the item needed to be removed while the other may be another line in the list. That's why I need to loop thru the entire variable list.
I'm familiar using a while loop to loop thru a list but not sure how I can loop thru each line to remove all strings from that filename.
Here's an example:
getstringstoremove=$(cat /text/from/some/file.txt)
echo "$getstringstoremove"
# Or the above can be an array
getstringstoremove=$(cat /text/from/some/file.txt)
declare -a arr=($getstringstoremove)
the above 2 should return the following lines
-SOMe.fil
(Ena)M-3_1
.So[Me].filEna)M-3_2
SOMe.fil(Ena)M-3_3
Here's the loop I was running to grab all filenames from a directory and remove anything other than the filenames
ls -l "/files/in/a/folder/" | awk -v N=9 '{sep=""; for (i=N; i<=NF; i++) {printf("%s%s",sep,$i); sep=OFS}; printf("\n")}' | while read line; do
echo "$line"
returns the following result after each loop
# 1st loop
ilikecoffee1-SOMe.fil(Ena)M-3_1.jpg
# iterate thru $getstringstoremove to remove all strings from the above file.
# 2nd loop
ilikecoffee2.So[Me].filEna)M-3_2.jpg
# iterate thru $getstringstoremove again
# 3rd loop
ilikecoffee3SOMe.fil(Ena)M-3_3.jpg
# iterate thru $getstringstoremove and again
done
the final desired output would be the following
ilikecoffee1.jpg
ilikecoffee2.jpg
ilikecoffee3.jpg
I'm running this in bash on Mac.
I hope this makes sense as I'm stuck and can use some help.
If someone has a better way of doing this by all means it doesn't have to be the way I have it listed above.
You can get the new filenames with this awk one-liner:
$ awk 'NR==FNR{a[$0];next} {for(i in a){n=index($0,i);if(n){$0=substr($0,1,n-1)substr($0,n+length(i))}}} 1' rem.txt files.lst
(awk strings are 1-indexed, so the prefix is substr($0,1,n-1); a 0 start is handled inconsistently across awk implementations.)
This assumes your exclusion strings are in rem.txt and there's a files list in files.lst.
Spaced out for easier commenting:
NR==FNR { # suck the first file into the indices of an array,
a[$0]
next
}
{
for (i in a) { # for each file we step through the array,
n=index($0,i) # search for an occurrence of this string,
if (n) { # and if found,
$0=substr($0,1,n-1)substr($0,n+length(i))
# rewrite the line with the string missing,
}
}
}
1 # and finally, print the line.
If you stow the above script in a file, say foo.awk, you could run it as:
$ awk -f foo.awk rem.txt files.lst
to see the resultant files.
Note that this just shows you how to build new filenames. If what you want is to do this for each file in a directory, it's best to avoid running your renames directly from awk, and use shell constructs designed for handling files, like a for loop:
for f in path/to/*.jpg; do
mv -v "$f" "$(awk -f foo.awk rem.txt - <<<"$f")"
done
This should be pretty obvious except perhaps for the awk options, which are:
-f foo.awk, use the awk script from this filename,
rem.txt, your list of removal strings,
-, a hyphen indicating that standard input should be used IN ADDITION to rem.txt, and
<<<"$f", a "here-string" to provide that input to awk.
Note that this awk script will work with both gawk and the non-GNU awk that is included in macOS.
I think I have understood what you mean, and I would do it with Perl, which comes built in to standard macOS - so nothing to install.
I assume you have a file called remove.txt with your list of stuff to remove, and that you want to run the script on all files in your current directory. If so, the script would be:
#!/usr/bin/perl -w
use strict;
# Load the strings to remove into array "strings"
my @strings = `cat remove.txt`;
for(my $i=0;$i<=$#strings;$i++){
# Strip carriage returns and quote metacharacters - e.g. *()[]
chomp($strings[$i]);
$strings[$i] = quotemeta($strings[$i]);
}
# Iterate over all filenames
my @files = glob('*');
foreach my $file (@files){
my $new = $file;
# Iterate over replacements
foreach my $string (@strings){
$new =~ s/$string//;
}
# Check if name would change
if($new ne $file){
if( -f $new){
printf("Cowardly refusing to rename %s as %s since it involves overwriting\n",$file,$new);
} else {
printf("Rename %s as %s\n",$file,$new);
# rename $file,$new;
}
}
}
Then save that in your HOME directory as renamer. Make it executable - only necessary once - with this command in Terminal:
chmod +x $HOME/renamer
Then you can go into any directory where your madly named files are and run the script like this:
cd path/to/mad/files
$HOME/renamer
As with all things you download off the Internet, make a backup first and just run on a small, copied, subset of your files till you get the idea of how it works.
If you use homebrew as your package manager, you could install rename using:
brew install rename
You could then take all the Perl from my other answer, condense it down to a couple of lines and embed it in a rename command, which would give you the added benefit of being able to do dry-runs etc. The code below does exactly the same as my other answer but is somewhat harder to read for non-Perl folk.
Your command would simply be:
rename --dry-run '
my @strings = map { s/\r|\n//g; $_=quotemeta($_) } `cat remove.txt`;
foreach my $string (@strings){ s/$string//; } ' *
Sample Output
'ilikecoffee(Ena)M-3_1' would be renamed to 'ilikecoffee'
'ilikecoffee-SOMe.fil' would be renamed to 'ilikecoffee'
'ilikecoffee.So[Me].filEna)M-3_2' would be renamed to 'ilikecoffee'
To try and understand it, remember:
the rename part applies the following Perl to each file because of the asterisk at the end
the @strings part reads all the strings from the file remove.txt, removes any carriage returns and linefeeds from them, and quotes any metacharacters
the foreach applies each of the deletions to the current filename which rename stores in $_ for you
Note that this method trades simplicity for performance somewhat. If you have millions of files to do, the other method will be quicker because here I read the remove.txt file for each and every file whose name is checked, but if you only have a few hundred/thousand files, I doubt you'll notice it.
This should be much the same, just shorter:
rename --dry-run '
my @strings = `cat remove.txt`; chomp @strings;
foreach my $string (@strings){ s/\Q$string\E//; } ' *

Bash parameter expansion, indirect reference, and backgrounding

After struggling with this issue for several hours and searching here and failing to come up with a matching solution, it's time to ask:
In bash (4.3) I'm attempting to do a combination of the following:
Create an array
For loop through the values of the array with a command that isn't super fast (curl to a web server to get a value), so we background each loop to parallelize everything to speed it up.
Set the names of the values in the array to variables assigned to values redirected to it from a command via "read"
Background each loop and get their PID into a regular array, and associate each PID with the related array value in an associative array so I have key=value pairs of array value name to PID
Use "wait" to wait for each PID to exit 0 or throw an error telling us which value name(s) in the array failed to exit with 0 by referencing the associative array
I need to be able to export all of the VAR names in the original array and their now-associated values (from the curl command results) because I'm sourcing this script from another bash script that will use the resulting exported VARs/values.
The reason I'm using "read" instead of just "export" with "export var=$(command)" or similar, is because when I background and get the PID to use "wait" with in the next for loop, I actually (incorrectly) get the PID of the "export" command which always exits 0, so I don't detect an error. When I use read with the redirect to set the value of the VAR (from name in the array) and background, it actually gets the PID of the command and I catch any errors in the next loop with the "wait" command.
So, basically, this mostly appears to work, except I realized the "read" command doesn't actually appear to be substituting the variable to the array name value properly in a way that the redirected command sends its output to that name in order to set the substituted VAR name to a value. Or, maybe the command is just entirely wrong so I'm not correctly redirecting the result of my command to a VAR name I'm attempting to set.
For what it's worth, when I run the curl | python command by hand (to pull the value and then parse the JSON output) it is definitely succeeding, so I know that's working, I just can't get the redirect to send the resulting output to the VAR name.
Here's a example of what I'm trying to do:
In parent script:
# Source the child script that has the functions I need
source functions.sh
# Create the array
VALUES=(
VALUE_A
VALUE_B
VALUE_C
)
# Call the function sourced from the script above, which will use the above defined array
function_getvalues
In child (sourced) script:
function_getvalues()
{
curl_pids=( )
declare -A value_pids
for value in "${VALUES[#]}"; do
read ${value} < <(curl -f -s -X GET http://path/to/json/value | python3 -c "import sys, json; print(json.load(sys.stdin)['data']['value'])") & curl_pids+=( $! ) value_pids+=([$!]=${value})
done
for pid in "${curl_pids[#]}"; do
wait "$pid" && echo "Successfully retrieved value ${value_pids[$pid]} from Webserver." || { echo "Something went wrong retrieving value ${value_pids[$pid]}, so we couldn't get the output data needed from Webserver. Exiting." ; exit 1 ; }
done
}
The problem is that read, when run in the background, executes in a subshell: it may read successfully, but the variable it sets dies with that subshell and never reaches the parent. Consider this simplified, working example with a comment showing how to cripple it:
VALUES=( VALUE_A VALUE_B )
for value in "${VALUES[#]}"; do
read ${value} < <(echo ${RANDOM}) # add "&" and it stops working
done
echo "VALUE_A=${VALUE_A}"
echo "VALUE_B=${VALUE_B}"
You might be able to do this with coproc, or using read -u with automatic file descriptor allocation, but really this is a job for temporary files:
tmpdir=$(mktemp -d)
VALUES=( VALUE_A VALUE_B )
for value in "${VALUES[#]}"; do
(sleep 1; echo ${RANDOM} > "${tmpdir}"/"${value}") &
done
for value in "${VALUES[#]}"; do
wait_file "${tmpdir}"/"${value}" && {
read -r ${value} < "${tmpdir}"/"${value}";
}
done
echo "VALUE_A=${VALUE_A}"
echo "VALUE_B=${VALUE_B}"
rm -r "${tmpdir}"
This example uses wait_file helper, but you might use inotifywait if you don't mind some dependencies on OS.
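For reference, a minimal polling implementation of such a wait_file helper might look like this (an assumption on my part; the actual helper referenced above may differ):
wait_file() {
    local file=$1 timeout=${2:-100}   # timeout counted in tenths of a second
    until [ -e "$file" ] || [ "$((timeout--))" -le 0 ]; do
        sleep 0.1
    done
    [ -e "$file" ]                    # exit status: did the file appear in time?
}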

Bash Array Script Exclude Duplicates

So I have written a bash script (named music.sh) for a Raspberry Pi to perform the following functions:
When executed, look into one single directory (Music folder) and select a random folder to look into. (Note: none of these folders here have subdirectories)
Once a folder within "Music" has been selected, then play all mp3 files IN ORDER until the last mp3 file has been reached
At this point, the script would go back to the folders in the "Music" directory and select another random folder
Then it would again play all mp3 files in that folder in order
Loop indefinitely until input from user
I have this code which does all of the above EXCEPT for the following items:
I would like to NOT play any other "album" that has been played before
Once all albums played once, then shutdown the system
Here is my code so far that is working (WITH duplicates allowed):
#!/bin/bash
folderarray=($(ls -d /home/alphekka/Music/*/))
for i in "${folderarray[#]}";
do
folderitems=(${folderarray[RANDOM % ${#folderarray[#]}]})
for j in "${folderitems[#]}";
do
echo `ls $j`
cvlc --play-and-exit "${j[#]}"
done
done
exit 0
Please note that there isn't a single folder or file that has a space in the name. If there is a space, then I face some issues with this code working.
Anyways, I'm getting close, but I'm not quite there with the entire functionality I'm looking for. Any help would be greatly appreciated! Thank you kindly! :)
Use an associative array as a set. Note that this will work for all valid folder and file names.
#!/bin/bash
declare -A folderarray
# Each folder name is a key mapped to an empty string
for d in /home/alphekka/Music/*/; do
folderarray["$d"]=
done
while [[ "${!folderarray[*]}" ]]; do
# Get a list of the remaining folder names
foldernames=( "${!folderarray[#]}" )
# Pick a folder at random
folder=${foldernames[RANDOM%${#foldernames[#]}]}
# Remove the folder from the set
# Must use single quotes; see below
unset folderarray['$folder']
for j in "$folder"/*; do
cvlc --play-and-exit "$j"
done
done
Dealing with keys that contain spaces (and possibly other special characters) is tricky. The quotes shown in the call to unset above are not syntactic quotes in the usual sense: after quote removal, unset receives the literal string folderarray[$folder] and evaluates the subscript itself, expanding $folder to the right key. Left unquoted, the word would also be exposed to pathname expansion.
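A tiny demonstration of that subtlety, using a hypothetical key that contains a space:
declare -A seen=( ['a b']= ['c']= )
key='a b'
unset seen['$key']    # unset gets the literal string seen[$key]
                      # and expands the subscript itself
echo "${!seen[@]}"    # -> c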
Here's another solution: randomize the list of directories first, save the result in an array and then play (my script just prints) the files from each element of the array
MUSIC=/home/alphekka/Music
OLDIFS=$IFS
IFS=$'\n'
folderarray=($(ls -d $MUSIC/*/|while read line; do echo $RANDOM $line; done| sort -n | cut -f2- -d' '))
for folder in ${folderarray[*]};
do
printf "Folder: %s\n" $folder
fileArray=($(find $folder -type f))
for j in ${fileArray[@]};
do
printf "play %s\n" $j
done
done
For the random shuffling I used this answer.
A one-liner solution with mpv, rl (randomlines), xargs and find:
find /home/alphekka/Music/ -maxdepth 1 -type d -print0 | rl -d \0 | xargs -0 -l1 mpv
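If rl isn't installed, GNU shuf can do the same shuffle (a sketch; -n1 replaces the deprecated -l1 spelling):
find /home/alphekka/Music/ -maxdepth 1 -type d -print0 | shuf -z | xargs -0 -n1 mpv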

Append to an array variable from a pipeline command

I am writing a bash function to get all git repositories, but I have run into a problem when I want to store all the git repository pathnames in the array patharray. Here is the code:
gitrepo() {
local opt
declare -a patharray
locate -b '\.git' | \
while read pathname
do
pathname="$(dirname ${pathname})"
if [[ "${pathname}" != *.* ]]; then
# Note: how to add an element to an existing Bash Array
patharray=("${patharray[#]}" '\n' "${pathname}")
# echo -e ${patharray[#]}
fi
done
echo -e ${patharray[@]}
}
I want to save all the repository paths to the patharray array, but I can't get at it outside the pipeline, which is made up of the locate and while commands.
But I can get the array inside the pipeline; the commented command # echo -e ${patharray[@]} works well if uncommented. So how can I solve the problem?
I have also tried the export command, but it seems it can't pass patharray to the pipeline.
Bash runs all commands of a pipeline in separate subshells. When the subshell containing a while loop ends, all changes you made to the patharray variable are lost.
You can simply group the while loop and the echo statement together so they are both contained within the same subshell:
gitrepo() {
local pathname dir
local -a patharray
locate -b '\.git' | { # the grouping begins here
while read pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done
printf "%s\n" "${patharray[#]}" # all those quotes are needed
} # the grouping ends here
}
Alternatively, you can structure your code so it doesn't need a pipe: use process substitution
(also see the Bash manual for details - man bash | less +/Process\ Substitution):
gitrepo() {
local pathname dir
local -a patharray
while read pathname; do
pathname=$(dirname "$pathname")
if [[ "$pathname" != *.* ]]; then
patharray+=( "$pathname" ) # add the element to the array
fi
done < <(locate -b '\.git')
printf "%s\n" "${patharray[#]}" # all those quotes are needed
}
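Either way the function prints one repository path per line, so a caller can use the output directly or capture it back into an array, for example:
gitrepo                          # list the repositories
mapfile -t repos < <(gitrepo)    # or capture them into an array (bash 4+)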
First of all, appending to an array variable is better done with array[${#array[*]}]="value" or array+=("value1" "value2" "etc") unless you wish to transform the entire array (which you don't).
Now, since pipeline commands are run in subprocesses, changes made to a variable inside a pipeline command will not propagate to outside it. There are a few options to get around this (most are listed in Greg's BashFAQ/024):
pass the result through stdout instead
the simplest; you'll need to do that anyway to get the value from the function (although there are ways to return a proper variable)
any special characters in paths can be handled reliably by using \0 as a separator (see Capturing output of find . -print0 into a bash array for reading \0-separated lists)
locate -b0 '\.git' | while read -r -d '' pathname; do dirname -z "$pathname"; done
or simply
locate -b0 '\.git' | xargs -0 dirname -z
avoid running the loop in a subprocess
avoid pipeline at all
temporary file/FIFO (bad: requires manual cleanup, accessible to others)
temporary variable (mediocre: unnecessary memory overhead)
process substitution (a special, syntax-supported case of FIFO, doesn't require manual cleanup; code adapted from Greg's BashFAQ/020):
i=0 # `unset i` will error on `i` usage if the `nounset` option is set
while IFS= read -r -d $'\0' file; do
patharray[i++]="$(dirname "$file")" # or however you want to process each file
done < <(locate -b0 '\.git')
use the lastpipe option (new in Bash 4.2) - doesn't run the last command of a pipeline in a subprocess (mediocre: has global effect); see the sketch below
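A minimal sketch of that last option (lastpipe needs job control off, which is the default in scripts):
shopt -s lastpipe
patharray=()
locate -b '\.git' | while read -r pathname; do
    patharray+=( "$(dirname "$pathname")" )
done
echo "${#patharray[@]}"   # the array survives: the loop ran in the current shell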
