After reading up on how to initialize arrays in Bash, and seeing some basic examples put forward in blogs, there remains some uncertainties on its practical use. An interesting example perhaps would be to sort in ascending order -- list countries from A to Z in random order, one for each letter.
But in the real world, how is a Bash array applied? What is it applied to? What is the common use case for arrays? This is one area I am hoping to be familiar with. Any champions in the use of bash arrays? Please provide your example.
There are a few cases where I like to use arrays in Bash.
When I need to store a collections of strings that may contain spaces or $IFS characters.
declare -a MYARRAY=(
"This is a sentence."
"I like turtles."
"This is a test."
)
for item in "${MYARRAY[#]}"; do
echo "$item" $(echo "$item" | wc -w) words.
done
This is a sentence. 4 words.
I like turtles. 3 words.
This is a test. 4 words.
When I want to store key/value pairs, for example, short names mapped to long descriptions.
declare -A NEWARRAY=(
["sentence"]="This is a sentence."
["turtles"]="I like turtles."
["test"]="This is a test."
)
echo ${NEWARRAY["turtles"]}
echo ${NEWARRAY["test"]}
I like turtles.
This is a test.
Even if we're just storing single "word" items or numbers, arrays make it easy to count and slice our data.
# Count items in array.
$ echo "${#MYARRAY[#]}"
3
# Show indexes of array.
$ echo "${!MYARRAY[#]}"
0 1 2
# Show indexes/keys of associative array.
$ echo "${!NEWARRAY[#]}"
turtles test sentence
# Show only the second through third elements in the array.
$ echo "${MYARRAY[#]:1:2}"
I like turtles. This is a test.
Read more about Bash arrays here. Note that only Bash 4.0+ supports every operation I've listed (associative arrays, for example), but the link shows which versions introduced what.
Related
As I am teaching myself Bash programming, I came across an interesting use case, where I want to take a list of variables that exist in the environment, and put them into an array. Then, I want to output a list of the variable names and their values, and store that output in an array, one entry per variable.
I'm only about 2 weeks into Bash shell scripting in any "real" way, and I am educating myself on arrays. A common function in other programming language is the ability to "zip" two arrays, e.g. as is done in Python. Another common feature in any programming language is indirection, e.g. via pointer indirection, etc. This is largely academic, to teach myself through a somewhat challenging example, but I think this has widespread use if for no other reason than debugging, keeping track of overall system state, etc.
What I want is for the following input... :
VAR_ONE="LIGHT RED"
VAR_TWO="DARK GREEN"
VAR_THREE="BLUE"
VARIABLE_ARRAY=(VAR_ONE VAR_TWO VAR_THREE)
... to be converted into the following output (as an array, one element per line):
VAR_ONE: LIGHT RED
VAR_TWO: DARK GREEN
VAR_THREE: BLUE
Constraints:
Assume that I do not have control of all of the variables, so I cannot just sidestep the problem e.g. by using an associative array from the get-go. (i.e. please do not recommend avoiding the need for indirect reference lookups altogether by never having a discrete variable named "VAR_ONE"). But a solution that stores the result in an associative array is fine.
Assume that variable names will never contain spaces, but their values might.
The final output should not contain separate elements just because the input variables had values containing spaces.
What I've read about so far:
I've read some StackOverflow posts like this one, that deal with using indirect references to arrays themselves (e.g. if you have three arrays and want to choose which one to pull from based on an "array choice" variable): How to iterate over an array using indirect reference?
I've also found one single post that deals with "zipping" arrays in Bash in the manner I'm talking about, where you pair-up e.g. the 1st element from array1 and array2, then pair up the 2nd elements, etc.: Iterate over two arrays simultaneously in bash
...but I haven't found anything that quite discusses this unique use-case...
QUESTION:
How should I make an array containing a list of variable names and their values (colon-separated), given an array containing a list of variable names only. I'm not "failing to come up with any way to do it" but I want to find the "preferred" way to do this in Bash, considering performance, security, and being concise/understandable.
EDIT: I'll post what I've come up with thus far as an answer to this post... but not mark it as answered, since I want to also hear some unbiased recommendations...
OP starts with:
VAR_ONE="LIGHT RED"
VAR_TWO="DARK GREEN"
VAR_THREE="BLUE"
VARIABLE_ARRAY=(VAR_ONE VAR_TWO VAR_THREE)
OP has provided an answer with 4 sets of code:
# first 3 sets of code generate:
$ typeset -p outputValues
declare -a outputValues=([0]="VAR_ONE: LIGHT RED" [1]="VAR_TWO: DARK GREEN" [2]="VAR_THREE: BLUE")
# the 4th set of code generates the following where the data values are truncated at the first space:
$ typeset -p outputValues
declare -a outputValues=([0]="VAR_ONE: LIGHT" [1]="VAR_TWO: DARK" [2]="VAR_THREE: BLUE")
NOTES:
I'm assuming the output from the 4th set of code is wrong so will be ignoring this one
OP's code samples touch on a couple ideas I'm going to make use of (below) ... nameref's (declare -n <variable_name>) and indirect variable references (${!<variable_name>})
For readability (and maintainability by others) I'd probably avoid the various eval and expansion ideas and instead opt for using bash namesref's (declare -n); a quick example:
$ x=5
$ echo "${x}"
5
$ y=x
$ echo "${y}"
x
$ declare -n y="x" # nameref => y=(value of x)
$ echo "${y}"
5
Pulling this into the original issue we get:
unset outputValues
declare -a outputValues # optional; declare 'normal' array
for var_name in "${VARIABLE_ARRAY[#]}"
do
declare -n data_value="${var_name}"
outputValues+=("${var_name}: ${data_value}")
done
Which gives us:
$ typeset -p outputValues
declare -a outputValues=([0]="VAR_ONE: LIGHT RED" [1]="VAR_TWO: DARK GREEN" [2]="VAR_THREE: BLUE")
While this generates the same results (as OP's first 3 sets of code) there is (for me) the nagging question of how is this new array going to be used?
If the sole objective is to print this pre-formatted data to stdout ... ok, though why bother with a new array when the same can be done with the current array and nameref's?
If the objective is to access this array as sets of variable name/value pairs for processing purposes, then the current structure is going to be hard(er) to work with, eg, each array 'value' will need to be parsed/split based on the delimiter :<space> in order to access the actual variable names and values.
In this scenario I'd opt for using an associative array, eg:
unset outputValues
declare -A outputValues # required; declare associative array
for var_name in "${VARIABLE_ARRAY[#]}"
do
declare -n data_value="${var_name}"
outputValues[${var_name}]="${data_value}"
done
Which gives us:
$ typeset -p outputValues
declare -A outputValues=([VAR_ONE]="LIGHT RED" [VAR_THREE]="BLUE" [VAR_TWO]="DARK GREEN" )
NOTES:
again, why bother with a new array when the same can be done with the current array and nameref's?
if the variable $data_value is to be re-used in follow-on code as a 'normal' variable it will be necessary to remove the nameref attribute (unset -n data_value)
With an associative array (index=variable name / array element=variable value) it becomes easier to reference the variable name/value pairs, eg:
$ myvar=VAR_ONE
$ echo "${myvar}: ${outputValues[${myvar}]}"
VAR_ONE: LIGHT RED
$ for var_name in "${!outputValues[#]}"; do echo "${var_name}: ${outputValues[${var_name}]}"; done
VAR_ONE: LIGHT RED
VAR_THREE: BLUE
VAR_TWO: DARK GREEN
In older versions of bash (before nameref's were available), and still available in newer versions of bash, there's the option of using indirect variable references;
$ x=5
$ echo "${x}"
5
$ unset -n y # make sure 'y' has not been previously defined as a nameref
$ y=x
$ echo "${y}"
x
$ echo "${!y}"
5
Pulling this into the associative array approach:
unset -n var_name # make sure var_name not previously defined as a nameref
unset outputValues
declare -A outputValues # required; declare associative array
for var_name in "${VARIABLE_ARRAY[#]}"
do
outputValues[${var_name}]="${!var_name}"
done
Which gives us:
$ typeset -p outputValues
declare -A outputValues=([VAR_ONE]="LIGHT RED" [VAR_THREE]="BLUE" [VAR_TWO]="DARK GREEN" )
NOTE: While this requires less coding in the for loop, if you forget to unset -n the variable (var_name in this case) then you'll end up with the wrong results if var_name was previously defined as a nameref; perhaps a minor issue but it requires the coder to know of, and code for, this particular issue ... a bit too esoteric (for my taste) so I prefer to stick with namerefs ... ymmv ...
I've come up with a handful of possible solutions in the last couple days, each one with their own pro's and con's. I won't mark this as the answer for awhile though, since I'm interested in hearing unbiased recommendations.
My brainstorming solutions thus far:
OPTION #1 - FOR-LOOP:
alias PrintCommandValues='unset outputValues
for var in ${VARIABLE_ARRAY[#]}
do outputValues+=("${var}: ${!var}")
done; printf "%s\n\n" "${outputValues[#]}"'
PrintCommandValues
Pro's: Traditional, easy to understand
Cons: A little verbose. I'm not sure about Bash, but I've been doing a lot of Mathematica programming (imperative-style), where such loops are notably slower. Anybody know if that's true for Bash?
OPTION #2 - EVAL:
i=0; outputValues=("${VARIABLE_ARRAY[#]}")
eval declare "${VARIABLE_ARRAY[#]/#/outputValues[i++]+=:\\ $}"
printf "%s\n\n" "${outputValues[#]}"
Pros: Shorter than the for-loop, and still easy to understand.
Cons: I'm no expert, but I've read a lot of warnings to avoid eval whenever possible, due to security issues. Probably not something I'll concern myself a ton over when I'm mostly writing scripts for "handy utility purposes" for my personal machine only, but...
OPTION #3 - QUOTED DECLARE WITH PARENTHESIS:
i=0; declare -a outputValues="(${VARIABLE_ARRAY[#]/%/'\:\ "${!VARIABLE_ARRAY[i++]}"'})"
printf "%s\n\n" "${outputValues[#]}"
Pros: Super-concise. I just plain stumbled onto this syntax -- I haven't found it mentioned anywhere on the web. Apparently, using declare in Bash (I use version 4.4.20(1)), if (and ONLY if) you place array-style (...) brackets after the equals-sign, and quote it, you get one more "round" of expansion/dereferencing, similar to eval. I happened to be toying with this post, and found the part about the "extra expansion" by accident.
For example, compare these two tests:
varName=varOne; varOne=something
declare test1=\$$varName
declare -a test2="(\$$varName)"
declare -p test1 test2
Output:
declare -- test1="\$varOne"
declare -a test2=([0]="something")
Pretty neat, I think...
Anyways, the cons for this method are... I've never seen it documented officially or unofficially anywhere, so... portability...?
Alternative for this option:
i=0; declare -a LABELED_VARIABLE_ARRAY="(${VARIABLE_ARRAY[#]/%/'\:\ \$"${VARIABLE_ARRAY[i++]}"'})"
declare -a outputValues=("${LABELED_VARIABLE_ARRAY[#]#P}")
printf "%s\n\n" "${outputValues[#]}"
JUST FOR FUN - BRACE EXPANSION:
unset outputValues; OLDIFS=$IFS; IFS=; i=0; j=0
declare -n nameCursor=outputValues[i++]; declare -n valueCursor=outputValues[j++]
declare {nameCursor+=,valueCursor+=": "$}{VAR_ONE,VAR_TWO,VAR_THREE}
printf "%s\n\n" "${outputValues[#]}"
IFS=$OLDIFS
Pros: ??? Maybe speed?
Cons: Pretty verbose, not very easy to understand
Anyways, those are all of my methods... Are any of them reasonable, or would you do something different altogether?
I have bash array (called tenantlist_array below) populated with elements with the following format:
{3 characters}-{3-5 characters}{3-5 digits}-{2 chars}{1-2 digits}.
Example:
abc-hac101-bb0
xyz-b2blo97250-aa99
abc-b2b9912-xy00
fff-hac101-g3
Array elements are unique. Please notice the hyphen, it is part of every array element.
I need to check if the supplied string (used in the below example as a variable tenant) produces a full match with any array element - because array elements are unique, the first match is sufficient.
I am iterating over array elements using the simple code:
tenant="$1"
for k in "${tenantlist_array[#]}"; do
result=$(grep -x -- "$tenant" <<<"$k")
if [[ $result ]]; then
break
fi
done
Please note - I need to have a full string match - if, for example, the string I am searching is hac101 it must not match any array element even if can be a substring if an array element.
In other words, only the full string abc-hac101-bb0 must produce the match with the first element. Strings abc, abc-hac, b2b, 99, - must not produce the match. That's why -x parameter is with the grep call.
Now, the above code works, but I find it quite slow. I've run it with the array having 193 elements and on an ordinary notebook it takes almost 90 seconds to iterate over the array elements:
real 1m2.541s
user 0m0.500s
sys 0m24.063s
And with the 385 elements in the array, time is following:
real 2m8.618s
user 0m0.906s
sys 0m48.094s
So my question - is there a faster way to do it?
Without running any loop you can do this using glob:
tenant="$1"
[[ $(printf '\3%s\3' "${tenantlist_array[#]}") == *$'\3'"$tenant"$'\3'* ]] &&
echo "ok" || echo "no"
In printf we place a control character \3 around each element and while comparing we make sure to place \3 before & after search key.
Thanks to #arco444, the solution is astonishingly simple:
tenant="$1"
for k in "${tenantlist_array[#]}"; do
if [[ $k = "$tenant" ]]; then
result="$k"
break
fi
done
And the seed difference for the 385 member array:
real 0m0.007s
user 0m0.000s
sys 0m0.000s
Thousand times faster.
This gives an idea of how wasteful is calling grep, which needs to be avoided, if possible.
This is an alternative way of using grep that actually uses grep at most of its power.
The code to "format" the array could be completely removed just appending a \n at the end of each uuid string when creating the array the first time.
This code would also degrade much slower with the length of the strings that are compared and with the length of the array.
tenant="$1"
formatted_array=""
for k in "${tenantlist_array[#]}"; do
formatted_array="$formatted_array $i\n"
done
result=$(echo -e "$formatted_array" | grep $tenant)
I have a bash script which breaks bash array into pairs, and match on either element;
declare -a arr=(
"apple" "fruit"
"cabbage" "vegetables"
)
for ((i=0; i<${#arr[#]}; i+=2)); do
echo "${arr[i]} ${arr[i+1]}"
done
So when you run this script, it prints out each 2 element from the array, like this;
# bash script
apple fruit
cabbage vegetables
and I can also choose any element I want with ${arr[i+#]}.
Now I'm trying to read this array from a separate text file, instead of inside the script since I'll be manipulating this array in the future.
I've tried this method so far, which looked pretty promising at first but didn't work at all;
filename='stuff.log'
filelines=`cat $filename`
for line in $filelines ; do
props=($line)
echo "${props[0]} ${props[1]}"
done
which should've print out the below content in the console (basically the same thing as the first script where the array is inside the script), supposedly but instead, it returned nothing.
# bash script
apple fruit
cabbage vegetables
And the inside of stuff.log is;
"apple" "fruit"
"cabbage" "vegetables"
How can I basically read the array from a separate file for the first script and also be able to manipulate the content of array file in the future?
I think, if you trust your input, you can do:
IFS=' \n' eval props=($(<stuff.log))
Eval is evil and it is there to remove leading and trailing ". And it will parse properly elements with spaces in them. We can do a little safer by reading the file into array and then removing leading and trailing ":
IFS=' \n' props=($(<stuff.log))
IFS='\n' props=($(printf "%s\n" "${props[#]}" | sed 's/^"//;s/"$//'))
Anyway I think I would hesitate to use such method in production code. Would be better to write a proper fully parser that takes " into account and reads input char by char.
If you want to read a file into an array, use mapfile or readarray commands (they are exactly the same command).
My closest most helpful matches when I searched for an answer ahead of posting:
Iterate over array in shell whose name is stored in a variable
How to use an argument/parameter name as a variable in a bash script
How to iterate over an array using indirect reference?
My attempt with partial success:
#!/bin/bash
declare -a large_furry_mammals
declare -a array_reference
# I tried both declaring array_reference as an array and
# not declaring it as an array. no change in behavior.
large_furry_mammals=(horse zebra gorilla)
size=large
category=mammals
tmp="${size}_furry_${category}"
eval array_reference='$'$tmp
echo tmp=$tmp
echo array_reference[0]=${array_reference[0]}
echo array_reference[1]=${array_reference[1]}
Output
tmp=large_furry_mammals
array_reference[0]=horse
array_reference[1]=
Expectation
I would have expected to get the output zebra when I echoed array_reference[1].
...but I'm missing some subtlety...
Why can I not access elements of the index array beyond index 0?
This suggests that array_reference is not actually being treated as an array.
I'm not looking to make a copy of the array. I want to reference (what will be) a static array based on a variable pointing to that array, i.e., ${size}_furry_${category} -> large_furry_mammals.
I've been successful with the general idea here using the links I've posted but only as long as its not an array. When it's an array, it's falling down for me.
Addendum Dec 5, 2018
bash 4.3 is not available in this case. #benjamin's answer does work on under 4.3.
I'll be needing to loop over the resulting array variable's contents. This kinda dumb example I gave involving mammals was just to describe the concept. There's actually a real world case around this. I have set of static reference arrays and an input string would be parsed to select which array was relevant and then I will loop over the array that was selected. I could do a case statement but with more than 100 reference arrays that would be the direct but overly verbose way to do it.
This pseudo code is probably better example of what I'm going after.
m1_array=(x a r d)
m2_array=(q 3 fg d)
m3_array=(c e p)
Based on some logic...select which array prefix you need.
x=m1
for each element in ${x}_array
do
some-task
done
I'm doing some testing with #eduardo's solution to see if I can adapt the way he references the variables to get to my endgame.
** Addendum #2 December 14, 2018 **
Solution
I found it! Working with #eduardo's example I came up with the following:
#!/bin/bash
declare -a large_furry_mammals
#declare -a array_reference
large_furry_mammals=(horse zebra gorilla)
size=large
category=mammals
tmp="${size}_furry_${category}[#]"
for element in "${!tmp}"
do
echo $element
done
Here is what execution looks like. We successfully iterate over the elements of the array string that was built dynamically.
./example3b.sh
horse
zebra
gorilla
Thank you everyone.
If you have Bash 4.3 or newer, you can use namerefs:
large_furry_mammals=(horse zebra gorilla)
size=large
category=mammals
declare -n array_reference=${size}_furry_$category
printf '%s\n' "${array_reference[#]}"
with output
horse
zebra
gorilla
This is a reference, so changes are reflected in both large_furry_mammals and array_reference:
$ array_reference[0]='donkey'
$ large_furry_mammals[3]='llama'
$ printf '%s\n' "${array_reference[#]}"
donkey
zebra
gorilla
llama
$ printf '%s\n' "${large_furry_mammals[#]}"
donkey
zebra
gorilla
llama
declare -a large_furry_mammals
declare -a array_reference
large_furry_mammals=(horse zebra gorilla)
size=large
category=mammals
echo ${large_furry_mammals[#]}
tmp="${size}_furry_${category}"
array_reference=${tmp}"[1]"
eval ${array_reference}='bear'
echo tmp=$tmp
echo ${large_furry_mammals[#]}
I'm trying build an sort of property set in ksh.
Thought the easiest way to do so was using arrays but the syntax is killing me.
What I want is to
Build an arbitrary sized array in a config file with a name and a property.
Iterate for each item in that list and get that property.
I theory what I wish I could do is something like
MONITORINGSYS={
SYS1={NAME="GENERATOR" MONITORFUNC="getGeneratorStatus"}
SYS2={NAME="COOLER" MONITORFUNC="getCoolerStatus"}
}
Then later on, be able to do something like:
for CURSYS in $MONITORINGSYS
do
CSYSNAME=$CURSYS.NAME
CSYSFUNC=$CURSYS.MONITORFUNC
REPORT="$REPORT\n$CSYSNAME"
CSYSSTATUS=CSYSFUNC $(date)
REPORT="$REPORT\t$CSYSSTATUS"
done
echo $REPORT
Well, that's not real programming, but I guess you got the point..
How do I do that?
[EDIT]
I do not mean I want to use associative arrays. I only put this way to make my question more clear... I.e. It would not be a problem if the loop was something like:
for CURSYS in $MONITORINGSYS
do
CSYSNAME=${CURSYS[0]}
CSYSFUNC=${CURSYS[1]}
REPORT="$REPORT\n$CSYSNAME"
CSYSSTATUS=CSYSFUNC $(date)
REPORT="$REPORT\t$CSYSSTATUS"
done
echo $REPORT
Same applies to the config file.. I'm just looking for a syntax that makes it minimally readable.
cheers
Not exactly sure what you want... Kornshell can handle both associative and indexed arrays.
However, Kornshell arrays are one dimensional. It might be possible to use indirection to emulate a two dimensional array via the use of $() and eval. I did this a couple of times in the older Perl 4.x and Perl 3.x, but it's a pain. If you want multidimensional arrays, use Python or Perl.
The only thing is that you must declare arrays via the typedef command:
$ typeset -A foohash #foohash is an associative array
$ typeset -a foolist #foolist is an integer indexed array.
Maybe your script can look something like this
typeset -a sysname
typeset -a sysfunct
sysname[1] = "GENERATOR"
sysname[2] = "COOLER"
sysfunc[1] = "getGeneratorStatus"
sysfunc[2] = "getCoolerStatus"
for CURSYS in {1..2}
do
CSYSNAME="${sysname[$CURSYS]}"
CSYSFUNC="${sysfunc[$CURSYS]}"
REPORT="$REPORT\n$CSYSNAME"
CSYSSTATUS=$(eval "CSYSFUNC $(date)")
REPORT="$REPORT\t$CSYSSTATUS"
done
echo $REPORT
ksh93 now has compound variables which can contain a mixture of indexed and associative arrays. No need to declare it as ksh will work it out itself.
#!/bin/ksh
MONITORINGSYS=(
[SYS1]=(NAME="GENERATOR" MONITORFUNC="getGeneratorStatus")
[SYS2]=(NAME="COOLER" MONITORFUNC="getCoolerStatus")
)
echo MONITORING REPORT
echo "-----------------"
for sys in ${!MONITORINGSYS[*]}; do
echo "System: $sys"
echo "Name: ${MONITORINGSYS[$sys].NAME}"
echo "Generator: ${MONITORINGSYS[$sys].MONITORFUNC}"
echo
done
Output:
MONITORING REPORT
-----------------
System: SYS1
Name: GENERATOR
Generator: getGeneratorStatus
System: SYS2
Name: COOLER
Generator: getCoolerStatus