Bash - fastest way to do whole string matching over array elements?

Bash - fastest way to do whole string matching over array elements? - arrays

I have bash array (called tenantlist_array below) populated with elements with the following format:
{3 characters}-{3-5 characters}{3-5 digits}-{2 chars}{1-2 digits}.
Example:
abc-hac101-bb0
xyz-b2blo97250-aa99
abc-b2b9912-xy00
fff-hac101-g3
Array elements are unique. Please notice the hyphen, it is part of every array element.
I need to check if the supplied string (used in the below example as a variable tenant) produces a full match with any array element - because array elements are unique, the first match is sufficient.
I am iterating over array elements using the simple code:
tenant="$1"
for k in "${tenantlist_array[#]}"; do
result=$(grep -x -- "$tenant" <<<"$k")
if [[ $result ]]; then
break
fi
done
Please note - I need to have a full string match - if, for example, the string I am searching is hac101 it must not match any array element even if can be a substring if an array element.
In other words, only the full string abc-hac101-bb0 must produce the match with the first element. Strings abc, abc-hac, b2b, 99, - must not produce the match. That's why -x parameter is with the grep call.
Now, the above code works, but I find it quite slow. I've run it with the array having 193 elements and on an ordinary notebook it takes almost 90 seconds to iterate over the array elements:
real 1m2.541s
user 0m0.500s
sys 0m24.063s
And with the 385 elements in the array, time is following:
real 2m8.618s
user 0m0.906s
sys 0m48.094s
So my question - is there a faster way to do it?

Without running any loop you can do this using glob:
tenant="$1"
[[ $(printf '\3%s\3' "${tenantlist_array[#]}") == *$'\3'"$tenant"$'\3'* ]] &&
echo "ok" || echo "no"
In printf we place a control character \3 around each element and while comparing we make sure to place \3 before & after search key.

Thanks to #arco444, the solution is astonishingly simple:
tenant="$1"
for k in "${tenantlist_array[#]}"; do
if [[ $k = "$tenant" ]]; then
result="$k"
break
fi
done
And the seed difference for the 385 member array:
real 0m0.007s
user 0m0.000s
sys 0m0.000s
Thousand times faster.
This gives an idea of how wasteful is calling grep, which needs to be avoided, if possible.

This is an alternative way of using grep that actually uses grep at most of its power.
The code to "format" the array could be completely removed just appending a \n at the end of each uuid string when creating the array the first time.
This code would also degrade much slower with the length of the strings that are compared and with the length of the array.
tenant="$1"
formatted_array=""
for k in "${tenantlist_array[#]}"; do
formatted_array="$formatted_array $i\n"
done
result=$(echo -e "$formatted_array" | grep $tenant)

Related

Set variable from item in array before the array is looped in a bash script

I am creating an array from the output of a command and then i am looping through the array and running commands that use each item in the array.
Before i loop through the array i want to create a variable that uses one of the values in my array. I will use this value when one of the items in the array contains a specific string.
I am not sure how to pick the value i need to set the variable from my array before i then loop through the array. I currently have this which is not working for me. I have also tried looping through to get my value but the value does not follow to the next loop, i dont think its being set and i cant keep the loop open as i am looping inside my loop.
readarray -t ARRAY < <( command that gets array of 5 hostnames )
if [[ $ARRAY[#] == *"FT-01"* ]]; then
FTP="$ARRAY"
fi
for server in "${ARRAY[#]}"; do
echo "Server: ${srv}"
echo "-------------------"
if [[ $server == *"ER-01"* ]]; then
echo " FTP server is ${FTP} and this is ${server}"
fi
done
I'm pretty sure the first if statement would never work but i am at a loss to how to pick out the the value i need from the array.

Sometimes difficulty expressing an idea is a sign that you're thinking like a C programmer rather than a shell scripter. Arrays and for loops aren't the most natural idioms in shell scripts. Consider streaming and pipes instead.
Let's say the command that gets hostnames is called list-of-hostnames. If it prints one host name per line you can filter the results with grep.
FTP=$(list-of-hostnames | grep FT-01)
If you really do want to work with an array you could use printf '%s\n' to turn it into a grep-able stream.
FTP=$(printf '%s\n' "${ARRAY[#]}" | grep FT-01)

Practical use of bash array

After reading up on how to initialize arrays in Bash, and seeing some basic examples put forward in blogs, there remains some uncertainties on its practical use. An interesting example perhaps would be to sort in ascending order -- list countries from A to Z in random order, one for each letter.
But in the real world, how is a Bash array applied? What is it applied to? What is the common use case for arrays? This is one area I am hoping to be familiar with. Any champions in the use of bash arrays? Please provide your example.

There are a few cases where I like to use arrays in Bash.
When I need to store a collections of strings that may contain spaces or $IFS characters.
declare -a MYARRAY=(
"This is a sentence."
"I like turtles."
"This is a test."
)
for item in "${MYARRAY[#]}"; do
echo "$item" $(echo "$item" | wc -w) words.
done
This is a sentence. 4 words.
I like turtles. 3 words.
This is a test. 4 words.
When I want to store key/value pairs, for example, short names mapped to long descriptions.
declare -A NEWARRAY=(
["sentence"]="This is a sentence."
["turtles"]="I like turtles."
["test"]="This is a test."
)
echo ${NEWARRAY["turtles"]}
echo ${NEWARRAY["test"]}
I like turtles.
This is a test.
Even if we're just storing single "word" items or numbers, arrays make it easy to count and slice our data.
# Count items in array.
$ echo "${#MYARRAY[#]}"
3
# Show indexes of array.
$ echo "${!MYARRAY[#]}"
0 1 2
# Show indexes/keys of associative array.
$ echo "${!NEWARRAY[#]}"
turtles test sentence
# Show only the second through third elements in the array.
$ echo "${MYARRAY[#]:1:2}"
I like turtles. This is a test.
Read more about Bash arrays here. Note that only Bash 4.0+ supports every operation I've listed (associative arrays, for example), but the link shows which versions introduced what.

how to enumerate multiple arrays using same base name

I am trying to create multiple arrays holding random lists of file names referencing the number of elements in another array. How can I append a $cntr var (beginning with cntr=0) to the end of the new array names so they are directly referenced with elements in other array?
Wow I hope that reads somewhat sensible. Here is what I got going on so far that I hope helps make better sense of what I mean:
function fGenRanList() {
cntr=0
while [[ "$cntr" -lt "${#mTypeAr[#]}" ]] ; do
n="${nAr[$cntr]}" ; echo "\$n: $n"
tracks${cntr}=() ; echo "\$tracks${cntr}: $tracks${cntr}"
while ((n > 0)) && IFS= read -rd $'\0' ; do
tracks${cntr}+=("$REPLY")
((n--))
done < <(sort -zuR <(find "${dirAr[$cntr]}" -type f \( -name '*.mp3' -o -name '*.ogg' \) -print0))
((cntr++))
done
}
error I get is:
/home/user/bin/ranSong_multDirs.sh: line 95: syntax error near unexpected token `"$REPLY"'
/home/user/bin/ranSong_multDirs.sh: line 95: ` tracks${cntr}+=("$REPLY")'
But I first commentted out the echo statements from the tracks${cntr}=() array initialization to get rid of a similar error, but unsure whether or not track${cntr} gets initialized in the first place.
By the end I should end up with as many track(n) arrays as there are elements in ${#mTypeAr[#]}, using the numeric var stored in array ${nAr[$cntr]} to determine how many elements each track array will contain.
Maybe I am making things more difficult than need be, trying to implement arrays into older scripts I have both in order to make them a little more efficient, but I guess am driven primarily to get a better handle on using BASH arrays to store vars for similar but multiple processes which I seem to do often in my scripts.

Change this line, which is not valid bash syntax,
tracks${cntr}+=("$REPLY")
to
declare "tracks${cntr}+=($REPLY)"
Rather than having a syntactic assignment, the declare command takes a string that *look*s like an assignment as an argument; that argument is processed by the shell first, so if cntr is currently 3 and $REPLY is foo, the actual assignment performed is
tracks3+=(foo)
The declare command gives you a level of indirection in making parameter assignments.

How to read lines from a file into an array?

I'm trying to read in a file as an array of lines and then iterate over it with zsh. The code I've got works most of the time, except if the input file contains certain characters (such as brackets). Here's a snippet of it:
#!/bin/zsh
LIST=$(cat /path/to/some/file.txt)
SIZE=${${(f)LIST}[(I)${${(f)LIST}[-1]}]}
POS=${${(f)LIST}[(I)${${(f)LIST}[-1]}]}
while [[ $POS -le $SIZE ]] ; do
ITEM=${${(f)LIST}[$POS]}
# Do stuff
((POS=POS+1))
done
What would I need to change to make it work properly?

I know it's been a lot of time since the question was answered but I think it's worth posting a simpler answer (which doesn't require the zsh/mapfile external module):
#!/bin/zsh
for line in "${(#f)"$(</path/to/some/file.txt)"}"
{
// do something with each $line
}

#!/bin/zsh
zmodload zsh/mapfile
FNAME=/path/to/some/file.txt
FLINES=( "${(f)mapfile[$FNAME]}" )
LIST="${mapfile[$FNAME]}" # Not required unless stuff uses it
integer POS=1 # Not required unless stuff uses it
integer SIZE=$#FLINES # Number of lines, not required unless stuff uses it
for ITEM in $FLINES
# Do stuff
(( POS++ ))
done
You have some strange things in your code:
Why are you splitting LIST each time instead of making it an array variable? It is just a waste of CPU time.
Why don’t you use for ITEM in ${(f)LIST}?
There is a possibility to directly ask zsh about array length: $#ARRAY. No need in determining the index of the last occurrence of the last element.
POS gets the same value as SIZE in your code. Hence it will iterate only once.
Brackets are problems likely because of 3.: (I) is matching against a pattern. Do read documentation.

Let's say, for the purpose of example, that file.txt contains the following text:
one
two
three
The solution depends on whether or not you'd like to elide the empty lines in file.txt:
Creating an array lines from file file.txt, eliding empty lines:
typeset -a lines=("${(f)"$(<file.txt)"}")
print ${#lines}
Expected output:
3
Creating an array lines from file file.txt, without eliding empty lines:
typeset -a lines=("${(#f)"$(<file.txt)"}")
print ${#lines}
Expected output:
5
In the end, the difference in the resulting array is a result of whether or not the parameter expansion flag (#) is provided during brace expansion.

while read -r line;
do ARRAY+=("$line");
done < file.txt

(bash) How to access chunks of command output as discrete array elements?

nmcli -t -f STATE,WIFI,WWAN
gives the output
connected:enabled:disabled
which I'd like to convert to something like
Networking: connected, Wifi: enabled, WWAN: disabled
The logical solution to me is to turn this into an array. Being quite new to bash scripting, I have read that arrays are just regular variables and the elements are separated by whitespace. Currently my code is
declare -a NMOUT=$(nmcli -t -f STATE,WIFI,WWAN nm | tr ":" "\n")
which seems to sort of work for a for loop, but not if i want to ask for a specific element, as in ${NMOUT[]}. Clearly I am missing out on some key concept here. How do I access specific elements in this array?

IFS=: read -a NMOUT < <(nmcli -t -f STATE,WIFI,WWAN)

Ignacio Vazquez-Abrams provided a much better solution for creating the array. I will address the posted question.
Array's in bash are indexed by integers starting at 0.
"${NMOUT[0]}" # first element of the array
"${NMOUT[2]}" # third element of the array
"${NMOUT[#]}" # All array elements
"${NMOUT[*]}" # All array elements as a string
The following provides good information on using arrays in bash: http://mywiki.wooledge.org/BashFAQ/005