bash string quoted multi-word args to array - arrays

The Question:
In bash scripting, what is the best way to convert a string, containing literal quotes surrounding multiple words, into an array with the same result of parsed arguments?
The Controversy:
Many existing questions apply evasive tactics to avoid the problem instead of solving it. This question raises the following arguments and encourages the reader to focus on them and, if you are up for it, to take part in the challenge of finding the optimum solution.
Arguments raised:
Although there are many scenarios where this pattern should be avoided because better-suited alternatives exist, the author is of the opinion that valid use cases still remain. This question attempts to produce one such use case; it makes no claim about its viability, only that it is a conceivable scenario which may present itself in a real-world situation.
You must find the optimum solution to satisfy the requirement. The use case was chosen specifically for its real world applications. You may not agree with the decisions that were made but are not tasked to give an opinion only to deliver the solution.
Satisfy the requirement without modifying the input or choice of transport. Both specifically chosen with a real world scenario to defend the narrative that those parts are out of your control.
No answers exist to this particular problem, and this question aims to address that. If you are inclined to avoid this pattern then simply avoid the question, but if you think you are up for the challenge, let's see how you would approach the problem.
The Valid use case:
Converting an existing script currently in use to receive parameters via named pipe or similar stream. In order to minimize the impact on the myriad of scripts outside of the developers control a decision was made to not change the interface. Existing scripts must be able to pass the same arguments via the new stream implementation as they did before.
Existing implementation:
$ ./string2array arg1 arg2 arg3
args=(
[0]="arg1"
[1]="arg2"
[2]="arg3"
)
Required change:
$ echo "arg1 arg2 arg3" | ./string2array
args=(
[0]="arg1"
[1]="arg2"
[2]="arg3"
)
The problem:
As pointed out in "Bash and Double-Quotes passing to argv", literal quotes are not parsed as would be expected.
This workbench script can be used to test various solutions, it handles the transport and formulates a measurable response. It is suggested that you focus on the solution script which gets sourced with the string as argument and you should populate the $args variable as an array.
The string2array workbench script:
#!/usr/bin/env bash
#string2array
args=()
function inspect() {
local inspct=$(declare -p args)
inspct=${inspct//\[/\\n\\t[}; inspct=${inspct//\'/}; inspct="${inspct:0:-1}\n)"
echo -e ${inspct#*-a }
}
while read -r; do
# source the solution to turn $REPLY into the $args array
source $1 "${REPLY}"
inspect
done
Standard solution - FAILS
The solution for turning a string into a space delimited array of words worked for our first example above:
#solution1
args=($@)
Undesired result
Unfortunately the standard solution produces an undesired result for quoted multi word arguments:
$ echo 'arg1 "multi arg 2" arg3' | ./string2array solution1
args=(
[0]="arg1"
[1]="\"multi"
[2]="arg"
[3]="2\""
[4]="arg3"
)
The Challenge:
Using the workbench script provide a solution snippet that will produce the following result for the arguments received.
Desired result:
$ echo 'arg1 "multi arg 2" arg3' | ./string2array solution-xyz
args=(
[0]="arg1"
[1]="multi arg 2"
[2]="arg3"
)
The solution should be compatible with standard argument parsing in every way. The following unit test should pass for the provided solution. If you can think of anything currently missing from the unit test, please leave a comment and we can update it.
Unit test for the requirements
Update: Test simplified and includes the Jonathan Leffler test
#!/usr/bin/env bash
#test_string2array
solution=$1
function test() {
cmd="echo \"${1}\" | ./string2array $solution"
echo "$ ${cmd}"
echo ${1} | ./string2array $solution > /tmp/t
cat /tmp/t
echo -n "Result : "
[[ $(cat /tmp/t|wc -l) -eq 7 ]] && echo "PASSED!" || echo "FAILED!"
}
echo 1. Testing single args
test 'arg1 arg2 arg3 arg4 arg5'
echo
echo 2. Testing multi args \" quoted
test 'arg1 "multi arg 2" arg3 "a r g 4" arg5'
echo
echo 3 Testing multi args \' quoted
test "arg1 'multi arg 2' arg3 'a r g 4' arg5"
echo
echo 4 Jonathan Leffler test
test "He said, \"Don't do that!\" but \"they didn't listen.\""

The declare built-in seems to do what you want; in my test, it's your inspect function that doesn't seem to work properly for all inputs:
# solution3
declare -a "args=($1)"
Then
$ echo "arg1 'arg2a arg2b' arg3" | while read -r; do
> source solution3 "${REPLY}"
> for arg in "${args[@]}"; do
> echo "Arg $((++i)): $arg"
> done
> done
Arg 1: arg1
Arg 2: arg2a arg2b
Arg 3: arg3

You may do it with declare instead of eval, for example:
Instead of:
string='"aString that may haveSpaces IN IT" bar foo "bamboo" "bam boo"'
echo "Initial string: $string"
eval 'for word in '$string'; do echo $word; done'
Do:
declare -a "array=($string)"
for item in "${array[@]}"; do echo "[$item]"; done
But please note that this is not much safer if the input comes from a user!
For example, if you try it with a string like:
string='"aString that may haveSpaces IN IT" bar foo "bamboo" "bam boo" `hostname`'
you get hostname evaluated (and it could of course be something like rm -rf / instead)!
A very simple attempt to guard against this is to replace characters like the backtick ` and $:
string='"aString that may haveSpaces IN IT" bar foo "bamboo" "bam boo" `hostname`'
declare -a "array=( $(echo $string | tr '`$<>' '????') )"
for item in "${array[@]}"; do echo "[$item]"; done
Now you get output like:
[aString that may haveSpaces IN IT]
[bar]
[foo]
[bamboo]
[bam boo]
[?hostname?]
More details on the pros and cons of the different methods can be found in this good answer: Why should eval be avoided in Bash, and what should I use instead?
See also https://superuser.com/questions/1066455/how-to-split-a-string-with-quotes-like-command-arguments-in-bash/1186997#1186997
But this still leaves a vector for attack.
I would very much like bash to have a way of quoting a string like double quotes (") do, but without interpreting the content.
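One way to avoid evaluation altogether is to split the string by hand. The following is a minimal sketch, not a complete shell parser: parse_quoted is a hypothetical helper, it fills a global array named args, it does not handle backslash escapes, and an unterminated quote simply runs to the end of the string.

```bash
#!/usr/bin/env bash
# parse_quoted STRING  (hypothetical helper, not taken from the answers above)
# Fills the global array "args". '...' and "..." only group words together;
# nothing in the string is ever expanded or executed.
parse_quoted() {
    local s=$1 c quote='' word='' started=0 i
    args=()
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        if [[ -n $quote ]]; then
            # Inside quotes: only the matching quote character ends the run.
            if [[ $c == "$quote" ]]; then quote=''; else word+=$c; fi
        elif [[ $c == "'" || $c == '"' ]]; then
            quote=$c; started=1      # an opening quote starts a word, even an empty one
        elif [[ $c == [[:space:]] ]]; then
            if (( started )); then args+=("$word"); fi
            word=''; started=0
        else
            word+=$c; started=1
        fi
    done
    if (( started )); then args+=("$word"); fi
}

parse_quoted 'arg1 "multi arg 2" arg3 `hostname` $HOME'
printf '[%s]\n' "${args[@]}"
```

With this input the backticks and $HOME come through as literal text, because the string is only ever inspected character by character.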

First attempt
Populate a variable with the combined words once the open quote was detected and only append to the array once the close quote arrives.
Solution
#solution2
j=''
for a in ${1}; do
if [ -n "$j" ]; then
[[ $a =~ ^(.*)[\"\']$ ]] && {
args+=("$j ${BASH_REMATCH[1]}")
j=''
} || j+=" $a"
elif [[ $a =~ ^[\"\'](.*)$ ]]; then
j=${BASH_REMATCH[1]}
else
args+=($a)
fi
done
Unit test results:
$ ./test_string2array solution2
1. Testing single args
$ echo "arg1 arg2 arg3 arg4 arg5" | ./string2array solution2
args=(
[0]="arg1"
[1]="arg2"
[2]="arg3"
[3]="arg4"
[4]="arg5"
)
Result : PASSED!
2. Testing multi args " quoted
$ echo 'arg1 "multi arg 2" arg3 "a r g 4" arg5' | ./string2array solution2
args=(
[0]="arg1"
[1]="multi arg 2"
[2]="arg3"
[3]="a r g 4"
[4]="arg5"
)
Result : PASSED!
3 Testing multi args ' quoted
$ echo "arg1 'multi arg 2' arg3 'a r g 4' arg5" | ./string2array solution2
args=(
[0]="arg1"
[1]="multi arg 2"
[2]="arg3"
[3]="a r g 4"
[4]="arg5"
)
Result : PASSED!

So I think xargs actually works for all your test cases, as long as it is run without -0 (with -0, quotes and backslashes lose their special meaning), e.g.:
echo 'arg1 "multi arg 2" arg3' | xargs ./string2array
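To get the words back into an array inside the receiving script, one option is to let xargs do the quote handling and read its output line by line. This is a sketch under a few assumptions: string2array_xargs is a hypothetical name, bash 4 or newer is available for mapfile, no word contains a newline, and the input is not empty.

```bash
#!/usr/bin/env bash
# string2array_xargs STRING  (hypothetical helper)
# xargs, run without -0, splits on blanks and honours '...', "..." and
# backslash, but performs no shell expansion. printf then emits one parsed
# word per line, which mapfile reads into the global array "args".
string2array_xargs() {
    args=()
    mapfile -t args < <(printf '%s\n' "$1" | xargs printf '%s\n')
}

string2array_xargs 'arg1 "multi arg 2" arg3'
printf '[%s]\n' "${args[@]}"
# prints:
# [arg1]
# [multi arg 2]
# [arg3]
```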

Second attempt
Append the element in place without the need for an additional variable.
#solution3
for i in $1; do
[[ $i =~ ^[\"\'] ]] && args+=(' ')
lst=$(( ${#args[@]}-1 ))
[[ "${args[*]}" =~ [[:space:]]$ ]] && args[$lst]+="${i/[\"\']/} " || args+=($i)
[[ $i =~ [\"\']$ ]] && args[$lst]=${args[$lst]:1:-1}
done

Modify in place
Let bash convert the string to array and then loop through to fix it.
args=($@) cnt=${#args[@]} idx=-1 chr=
for (( i=0; i<cnt; i++ )); do
[[ $idx -lt 0 ]] && {
[[ ${args[$i]:0:1} =~ [\'\"] ]] && \
idx=$i chr=${args[$idx]:0:1} args[$idx]="${args[$idx]:1}"
continue
}
args[$idx]+=" ${args[$i]}"
unset args[$i]
[[ ${args[$idx]: -1:1} == $chr ]] && args[$idx]=${args[$idx]:0:-1} idx=-1
done

Modify the delimiter
In this solution we turn the spaces into commas, remove the quotes and reset the spaces for the multi word arguments, to allow for the correct argument parsing.
#solution4
s=${*//[[:space:]]/\l}
while [[ $s =~ [\"\']([^\"\']*)[\"\'] ]]; do
s=${s/$BASH_REMATCH/${BASH_REMATCH[1]//\l/ }}
done
IFS=\l
args=(${s})
NEEDS WORK!!

Related

bash access associative array creates unbound variable [duplicate]

Using:
set -o nounset
Having an indexed array like:
myArray=( "red" "black" "blue" )
What is the shortest way to check if element 1 is set?
I sometimes use the following:
test "${#myArray[@]}" -gt "1" && echo "1 exists" || echo "1 doesn't exist"
I would like to know if there's a preferred one.
How to deal with non-consecutive indexes?
myArray=()
myArray[12]="red"
myArray[51]="black"
myArray[129]="blue"
How to quickly check that 51 is already set, for example?
How to deal with associative arrays?
declare -A myArray
myArray["key1"]="red"
myArray["key2"]="black"
myArray["key3"]="blue"
How to quickly check that key2 is already used, for example?
To check if the element is set (applies to both indexed and associative array)
[ "${array[key]+abc}" ] && echo "exists"
Basically what ${array[key]+abc} does is
if array[key] is set, return abc
if array[key] is not set, return nothing
References:
See Parameter Expansion in Bash manual and the little note
if the colon is omitted, the operator tests only for existence [of parameter]
This answer is actually adapted from the answers for this SO question: How to tell if a string is not defined in a bash shell script?
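As a small illustration of the ${array[key]+abc} test, the sketch below runs it under set -o nounset against a sparse indexed array and an associative array. The variable names sparse and colors are made up for the example.

```bash
#!/usr/bin/env bash
set -o nounset

sparse=([12]=red [51]=black [129]=blue)
declare -A colors=([key1]=red [key2]="")    # key2 is set, but empty

# ${name[key]+abc} expands to "abc" when the element is set (even if empty)
# and to nothing otherwise, so it never triggers an "unbound variable" error.
[[ ${sparse[51]+abc} ]]   && echo "sparse[51] is set"
[[ ${sparse[52]+abc} ]]   || echo "sparse[52] is not set"
[[ ${colors[key2]+abc} ]] && echo "colors[key2] is set (although empty)"
[[ ${colors[nope]+abc} ]] || echo "colors[nope] is not set"
```

All four messages are printed, which shows that the same test covers consecutive, non-consecutive and associative keys.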
A wrapper function:
exists(){
if [ "$2" != in ]; then
echo "Incorrect usage."
echo "Correct usage: exists {key} in {array}"
return
fi
eval '[ ${'$3'[$1]+muahaha} ]'
}
For example
if ! exists key in array; then echo "No such array element"; fi
From man bash, conditional expressions:
-v varname
True if the shell variable varname is set (has been assigned a value).
example:
declare -A foo
foo[bar]="this is bar"
foo[baz]=""
if [[ -v "foo[bar]" ]] ; then
echo "foo[bar] is set"
fi
if [[ -v "foo[baz]" ]] ; then
echo "foo[baz] is set"
fi
if [[ -v "foo[quux]" ]] ; then
echo "foo[quux] is set"
fi
This will show that both foo[bar] and foo[baz] are set (even though the latter is set to an empty value) and foo[quux] is not.
New answer
From version 4.2 of bash (and newer), there is a new -v option to the built-in test command.
From version 4.3, this test can address array elements.
array=([12]="red" [51]="black" [129]="blue")
for i in 10 12 30 {50..52} {128..131};do
if [ -v 'array[i]' ];then
echo "Variable 'array[$i]' is defined"
else
echo "Variable 'array[$i]' not exist"
fi
done
Variable 'array[10]' not exist
Variable 'array[12]' is defined
Variable 'array[30]' not exist
Variable 'array[50]' not exist
Variable 'array[51]' is defined
Variable 'array[52]' not exist
Variable 'array[128]' not exist
Variable 'array[129]' is defined
Variable 'array[130]' not exist
Variable 'array[131]' not exist
Note: regarding ssc's comment, I've single quoted 'array[i]' in the -v test in order to satisfy shellcheck's error SC2208. This does not seem strictly required here, because there is no glob character in array[i], anyway...
This works with associative arrays in the same way:
declare -A aArray=([foo]="bar" [bar]="baz" [baz]=$'Hello world\041')
for i in alpha bar baz dummy foo test;do
if [ -v 'aArray[$i]' ];then
echo "Variable 'aArray[$i]' is defined"
else
echo "Variable 'aArray[$i]' not exist"
fi
done
Variable 'aArray[alpha]' not exist
Variable 'aArray[bar]' is defined
Variable 'aArray[baz]' is defined
Variable 'aArray[dummy]' not exist
Variable 'aArray[foo]' is defined
Variable 'aArray[test]' not exist
With one small difference: in regular arrays, the variable between brackets ([i]) is an integer, so the dollar symbol ($) is not required, but for an associative array, as the key is a word, $ is required ([$i])!
Old answer for bash prior to V4.2
Unfortunately, bash gives no way to tell the difference between an empty and an undefined variable.
But there are some ways:
$ array=()
$ array[12]="red"
$ array[51]="black"
$ array[129]="blue"
$ echo ${array[@]}
red black blue
$ echo ${!array[@]}
12 51 129
$ echo "${#array[@]}"
3
$ printf "%s\n" ${!array[@]}|grep -q ^51$ && echo 51 exist
51 exist
$ printf "%s\n" ${!array[@]}|grep -q ^52$ && echo 52 exist
(gives no answer)
And for associative array, you could use the same:
$ unset array
$ declare -A array
$ array["key1"]="red"
$ array["key2"]="black"
$ array["key3"]="blue"
$ echo ${array[@]}
blue black red
$ echo ${!array[@]}
key3 key2 key1
$ echo ${#array[@]}
3
$ set | grep ^array=
array=([key3]="blue" [key2]="black" [key1]="red" )
$ printf "%s\n" ${!array[@]}|grep -q ^key2$ && echo key2 exist || echo key2 not exist
key2 exist
$ printf "%s\n" ${!array[@]}|grep -q ^key5$ && echo key5 exist || echo key5 not exist
key5 not exist
You can do the job without external tools (pure bash, no printf|grep) and, why not, build checkIfExist() as a new bash function:
$ checkIfExist() {
eval 'local keys=${!'$1'[@]}';
eval "case '$2' in
${keys// /|}) return 0 ;;
* ) return 1 ;;
esac";
}
$ checkIfExist array key2 && echo exist || echo don\'t
exist
$ checkIfExist array key5 && echo exist || echo don\'t
don't
or even create a new getIfExist bash function that prints the desired value and returns a false result code if the desired value does not exist:
$ getIfExist() {
eval 'local keys=${!'$1'[@]}';
eval "case '$2' in
${keys// /|}) echo \${$1[$2]};return 0 ;;
* ) return 1 ;;
esac";
}
$ getIfExist array key1
red
$ echo $?
0
$ # now with an empty defined value
$ array["key4"]=""
$ getIfExist array key4
$ echo $?
0
$ getIfExist array key5
$ echo $?
1
What about a -n test and the :- operator?
For example, this script:
#!/usr/bin/env bash
set -e
set -u
declare -A sample
sample["ABC"]=2
sample["DEF"]=3
if [[ -n "${sample['ABC']:-}" ]]; then
echo "ABC is set"
fi
if [[ -n "${sample['DEF']:-}" ]]; then
echo "DEF is set"
fi
if [[ -n "${sample['GHI']:-}" ]]; then
echo "GHI is set"
fi
Prints:
ABC is set
DEF is set
tested in bash 4.3.39(1)-release
declare -A fmap
fmap['foo']="boo"
key='foo'
# should echo foo is set to 'boo'
if [[ -z "${fmap[${key}]}" ]]; then echo "$key is unset in fmap"; else echo "${key} is set to '${fmap[${key}]}'"; fi
key='blah'
# should echo blah is unset in fmap
if [[ -z "${fmap[${key}]}" ]]; then echo "$key is unset in fmap"; else echo "${key} is set to '${fmap[${key}]}'"; fi
Reiterating this from Thamme:
[[ ${array[key]+Y} ]] && echo Y || echo N
This tests if the variable/array element exists, including if it is set to a null value. This works with a wider range of bash versions than -v and doesn't appear sensitive to things like set -u. If you see a "bad array subscript" using this method please post an example.
This is the easiest way I found for scripts.
<search> is the string you want to find, ASSOC_ARRAY the name of the variable holding your associative array.
Depending on what you want to achieve:
key exists:
if grep -qe "<search>" <(echo "${!ASSOC_ARRAY[@]}"); then echo key is present; fi
key does not exist:
if ! grep -qe "<search>" <(echo "${!ASSOC_ARRAY[@]}"); then echo key not present; fi
value exists:
if grep -qe "<search>" <(echo "${ASSOC_ARRAY[@]}"); then echo value is present; fi
value does not exist:
if ! grep -qe "<search>" <(echo "${ASSOC_ARRAY[@]}"); then echo value not present; fi
I wrote a function to check if a key exists in an array in Bash:
# Check if array key exists
# Usage: array_key_exists $array_name $key
# Returns: 0 = key exists, 1 = key does NOT exist
function array_key_exists() {
local _array_name="$1"
local _key="$2"
local _cmd='echo ${!'$_array_name'[@]}'
local _array_keys=($(eval $_cmd))
local _key_exists=$(echo " ${_array_keys[@]} " | grep " $_key " &>/dev/null; echo $?)
[[ "$_key_exists" = "0" ]] && return 0 || return 1
}
Example
declare -A my_array
my_array['foo']="bar"
if [[ "$(array_key_exists 'my_array' 'foo'; echo $?)" = "0" ]]; then
echo "OK"
else
echo "ERROR"
fi
Tested with GNU bash, version 4.1.5(1)-release (i486-pc-linux-gnu)
For all time people, once and for all.
There's a "clean code" long way, and there is a shorter, more concise, bash centered way.
$1 = The index or key you are looking for.
$2 = The array / map passed in by reference.
function hasKey ()
{
local -r needle="${1:?}"
local -nr haystack=${2:?}
for key in "${!haystack[@]}"; do
if [[ $key == "$needle" ]]; then
return 0
fi
done
return 1
}
A linear search can be replaced by a binary search, which would perform better with larger data sets. Simply count and sort the keys first, then do a classic binary halving of the haystack as you get closer and closer to the answer.
Now, for the purist out there who is like "No, I want the more performant version because I may have to deal with large arrays in bash," let's look at a more bash-centered solution, but one that maintains clean code and the flexibility to deal with arrays or maps.
function hasKey ()
{
local -r needle="${1:?}"
local -nr haystack=${2:?}
# Quote the expansion: an unquoted empty result would leave "[ -n ]", which is always true.
[ -n "${haystack["$needle"]+found}" ]
}
The line [ -n "${haystack["$needle"]+found}" ] uses the ${parameter+word} form of bash parameter expansion, not the ${parameter:+word} form, which also tests the value of the key; that is not the matter at hand.
Usage
declare -A person=([firstname]=Anthony [lastname]=Rutledge)
if hasKey "firstname" person; then
: # Do something
fi
When not performing substring expansion, using the form described
below (e.g., ‘:-’), Bash tests for a parameter that is unset or null.
Omitting the colon results in a test only for a parameter that is
unset. Put another way, if the colon is included, the operator tests
for both parameter’s existence and that its value is not null; if the
colon is omitted, the operator tests only for existence.
${parameter:-word}
If parameter is unset or null, the expansion of word is substituted. Otherwise, the value of parameter is substituted.
${parameter:=word}
If parameter is unset or null, the expansion of word is assigned to parameter. The value of parameter is then substituted. Positional parameters and special parameters may not be assigned to in this way.
${parameter:?word}
If parameter is null or unset, the expansion of word (or a message to that effect if word is not present) is written to the standard error and the shell, if it is not interactive, exits. Otherwise, the value of parameter is substituted.
${parameter:+word}
If parameter is null or unset, nothing is substituted, otherwise the expansion of word is substituted.
https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html#Shell-Parameter-Expansion
If $needle does not exist expand to nothing, otherwise expand to the non-zero length string, "found". This will make the -n test succeed if the $needle in fact does exist (as I say "found"), and fail otherwise.
Both in the case of arrays and hash maps I find the easiest and more straightforward solution is to use the matching operator =~.
For arrays:
myArray=("red" "black" "blue")
if [[ " ${myArray[@]} " =~ " blue " ]]; then
echo "blue exists in myArray"
else
echo "blue does not exist in myArray"
fi
NOTE: The spaces around the array guarantee the first and last element can match. The spaces around the value guarantee an exact match.
For hash maps, it's actually the same solution since printing a hash map as a string gives you a list of its values.
declare -A myMap
myMap=(
["key1"]="red"
["key2"]="black"
["key3"]="blue"
)
if [[ " ${myMap[@]} " =~ " blue " ]]; then
echo "blue exists in myMap"
else
echo "blue does not exist in myMap"
fi
But what if you would like to check whether a key exists in a hash map? In that case you can use the ! operator, which gives you the list of keys in a hash map.
if [[ " ${!myMap[@]} " =~ " key3 " ]]; then
echo "key3 exists in myMap"
else
echo "key3 does not exist in myMap"
fi
I get bad array subscript error when the key I'm checking is not set. So, I wrote a function that loops over the keys:
#!/usr/bin/env bash
declare -A helpList
function get_help(){
target="$1"
for key in "${!helpList[@]}";do
if [[ "$key" == "$target" ]];then
echo "${helpList["$target"]}"
return;
fi
done
}
targetValue="$(get_help command_name)"
if [[ -z "$targetValue" ]];then
echo "command_name is not set"
fi
It echos the value when it is found & echos nothing when not found. All the other solutions I tried gave me that error.

Does array 1 contain any of the strings in array 2

I am trying to make an if statement where if array1 contains any of the strings in array2 it should print "match" else print "no match"
So far I have the following. Not sure how to complete it. Both loops should break as soon as the first match is found.
#!/bin/bash
array1=(a b c 1 2 3)
array2=(b 1)
for a in "${array1[@]}"
do
for b in "${array2[@]}"
do
if [ "$a" == "$b" ]; then
echo "Match!"
break
fi
done
done
Maybe this isn't even the best way to do it?
This illustrates the desired result
if [ array1 contains strings in array2 ]
then
echo "match"
else
echo "no match"
fi
To check whether array1 contains any entry from array2 you can use grep. This will be way faster and shorter than loops in bash.
The following commands exit with status code 0 if and only if there is a match. Use them as ...
if COMMAND FROM BELOW; then
echo match
else
echo no match
fi
Single-Line Array Entries
The simple version for strings without linebreaks is
printf %s\\n "${array1[@]}" | grep -qFxf <(printf %s\\n "${array2[@]}")
Multiline Array Entries
Sadly there doesn't seem to be a straightforward way to make this work for array entries with linebreaks. GNU grep has the option -z to set the "line" delimiters in the input to null, but apparently no option to do the same for the file provided to -f. Listing the entries from array2 as -e arguments to grep is not working either -- grep -F seems to be unable to match multiline patterns. However, we can use the following hack:
printf %q\\n "${array1[@]}" | grep -qFxf <(printf %q\\n "${array2[@]}")
Here we assume that bash's built-in printf %q always prints a unique single line -- which it currently does. However, future implementations of bash may change this. The documentation (help printf) only states that the output has to be correctly quoted for bash.
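The single-line variant can be wrapped in a function so that it slots directly into the if statement. The following is only a sketch: arrays_share_entry is a hypothetical name, it assumes bash 4.3 or newer for namerefs, and it assumes entries without line breaks.

```bash
#!/usr/bin/env bash
# arrays_share_entry NAME1 NAME2  (hypothetical wrapper; array names without $)
# Returns 0 if any entry of the first array is also an entry of the second.
arrays_share_entry() {
    local -n _a1=$1 _a2=$2
    # An empty array can never share an entry; bail out before printf
    # turns "no arguments" into a single empty line.
    (( ${#_a1[@]} && ${#_a2[@]} )) || return 1
    # -F fixed strings, -x whole-line matches only, -f read patterns from file
    printf '%s\n' "${_a1[@]}" | grep -qFxf <(printf '%s\n' "${_a2[@]}")
}

array1=(a b c 1 2 3)
array2=(b 1)
if arrays_share_entry array1 array2; then
    echo match
else
    echo "no match"
fi
# prints: match
```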
For a fast solution, you're better off using an external tool that can process the entire array as a whole (such as the grep-based answers). Doing nested loops in pure bash is likely to be slower for any substantial amount of data (where the item-by-item processing in bash is likely to be more expensive than the external process start-up time).
However, if you do need a pure bash solution, I see that your current solution has no way to print out the "no match" scenario. In addition, it may print out "match" multiple times.
To fix that, you can just store the fact that a match has been found, and use that to both:
exit the outer loop early as well as the inner loop; and
print the correct string at the end.
To do this, you can use something like:
#!/bin/bash
# Test data.
array1=(a b c 1 2 3)
array2=(b 1)
# Default to not-found state.
foundMatch=false
for a in "${array1[@]}" ; do
for b in "${array2[@]}" ; do
# Any match switches to found state and exits inner loop.
[[ "$a" == "$b" ]] && foundMatch=true && break
done
# If found, exit outer loop as well.
${foundMatch} && break
done
# Output appropriate message for found/not-found state.
$foundMatch && echo "Match" || echo "No match"
For array elements which do not contain newlines, grep -qf with printf "%s\n" would be a good option. For comparing arrays with arbitrary elements, I ended up with this:
cmp -s /dev/null <(comm -z12 <(printf "%s\0" "${array1[@]}" | sort -z) <(printf "%s\0" "${array2[@]}" | sort -z))
The printf "%s\0" "${array[@]}" | sort -z prints a sorted list of zero-terminated array elements. comm -z12 then extracts the elements common to both lists. cmp -s /dev/null checks whether the output of comm is empty; it will not be empty if any element is in both lists. You could use [ -z "$(comm -z ...)" ] to check whether the output of comm is empty, but bash will complain that the output of a command captured with $(..) contains a null byte, so it's better to use cmp -s /dev/null.
I think | is faster than <(), so your if could be:
if ! printf "%s\0" "${array1[@]}" | sort -z |
comm -z12 - <(printf "%s\0" "${array2[@]}" | sort -z) |
cmp -s /dev/null -; then
echo "Some elements are in both array1 and array2"
fi
The following could work:
printf "%s\0" "${array1[@]}" | eval grep -qzFx "$(printf " -e %q" "${array2[@]}")"
But I believe I found a bug in grep v3.1 when matching a newline character with the -x flag. If you don't use the newline character, the above line works.
Would you try the following:
array1=(a b c 1 2 3)
array2=(b 1)
declare -A seen # marks the elements of array2
for b in "${array2[@]}"; do
(( seen[$b]++ ))
done
for a in "${array1[@]}"; do
(( ${seen[$a]} )) && echo "match" && exit
done
echo "no match"
It may be efficient by avoiding the double loop, although the discussion of efficiency may be meaningless as long as using bash :)
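The same idea can be packaged as a function that returns a status instead of calling exit. This is a sketch under some assumptions: any_common is a hypothetical name, bash 4.3 or newer is required for namerefs, and empty strings are skipped because bash rejects an empty associative-array key. It stores plain values rather than incrementing inside (( )), so the keys are never evaluated arithmetically.

```bash
#!/usr/bin/env bash
# any_common NAME1 NAME2  (hypothetical helper; array names without $)
# Returns 0 if the two arrays have at least one non-empty element in common.
any_common() {
    local -n _first=$1 _second=$2
    local -A seen=()
    local item
    # Pass 1: remember every element of the second array as a key.
    for item in "${_second[@]}"; do
        [[ -n $item ]] && seen[$item]=1
    done
    # Pass 2: stop at the first element of the first array that was remembered.
    for item in "${_first[@]}"; do
        [[ -n $item && ${seen[$item]+x} ]] && return 0
    done
    return 1
}

array1=(a b c 1 2 3)
array2=(b 1)
if any_common array1 array2; then echo "match"; else echo "no match"; fi
# prints: match
```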

How do you unset all empty array elements in bash? [duplicate]

I need to remove an element from an array in bash shell.
Generally I'd simply do:
array=("${(@)array:#<element to remove>}")
Unfortunately the element I want to remove is a variable so I can't use the previous command.
Down here an example:
array+=(pluto)
array+=(pippo)
delete=(pluto)
array( ${array[@]/$delete} ) -> but clearly doesn't work because of {}
Any idea?
The following works as you would like in bash and zsh:
$ array=(pluto pippo)
$ delete=pluto
$ echo ${array[@]/$delete}
pippo
$ array=( "${array[@]/$delete}" ) #Quotes when working with strings
If you need to delete more than one element:
...
$ delete=(pluto pippo)
for del in "${delete[@]}"
do
array=("${array[@]/$del}") #Quotes when working with strings
done
done
Caveat
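This technique actually removes the first substring matching $delete from every element, not necessarily whole elements, and an element that matches completely is left behind as an empty string rather than removed.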
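A short illustration of that caveat, using made-up element values: the pattern substitution edits every element in which the pattern occurs and leaves an empty string where the whole element matched.

```bash
#!/usr/bin/env bash
array=(pluto pippo plutonium "big pluto")
delete=pluto

# ${array[@]/$delete} removes the first occurrence of the pattern from each
# element; it never drops an element from the array.
result=("${array[@]/$delete}")

echo "${#result[@]} elements"
printf '[%s]\n' "${result[@]}"
# prints:
# 4 elements
# []
# [pippo]
# [nium]
# [big ]
```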
Update
To really remove an exact item, you need to walk through the array, comparing the target to each element, and using unset to delete an exact match.
array=(pluto pippo bob)
delete=(pippo)
for target in "${delete[@]}"; do
for i in "${!array[@]}"; do
if [[ ${array[i]} = "$target" ]]; then
unset 'array[i]'
fi
done
done
Note that if you do this, and one or more elements is removed, the indices will no longer be a continuous sequence of integers.
$ declare -p array
declare -a array=([0]="pluto" [2]="bob")
The simple fact is, arrays were not designed for use as mutable data structures. They are primarily used for storing lists of items in a single variable without needing to waste a character as a delimiter (e.g., to store a list of strings which can contain whitespace).
If gaps are a problem, then you need to rebuild the array to fill the gaps:
for i in "${!array[@]}"; do
new_array+=( "${array[i]}" )
done
array=("${new_array[@]}")
unset new_array
You could build up a new array without the undesired element, then assign it back to the old array. This works in bash:
array=(pluto pippo)
new_array=()
for value in "${array[@]}"
do
[[ $value != pluto ]] && new_array+=("$value")
done
array=("${new_array[@]}")
unset new_array
This yields:
echo "${array[@]}"
pippo
This is the most direct way to unset a value if you know its position.
$ array=(one two three)
$ echo ${#array[@]}
3
$ unset 'array[1]'
$ echo ${array[@]}
one three
$ echo ${#array[@]}
2
This answer is specific to the case of deleting multiple values from large arrays, where performance is important.
The most voted solutions are (1) pattern substitution on an array, or (2) iterating over the array elements. The first is fast, but can only deal with elements that have a distinct prefix; the second is O(n*k), where n is the array size and k the number of elements to remove. Associative arrays are a relatively new feature, and might not have been common when the question was originally posted.
For the exact match case, with large n and k, it is possible to improve performance from O(n*k) to O(n+k*log(k)). In practice this is O(n), assuming k is much lower than n. Most of the speed-up comes from using an associative array to identify the items to be removed.
Performance (n = array size, k = number of values to delete), measured in seconds of user time:
N K New(seconds) Current(seconds) Speedup
1000 10 0.005 0.033 6X
10000 10 0.070 0.348 5X
10000 20 0.070 0.656 9X
10000 1 0.043 0.050 -7%
As expected, the current solution is linear in N*K, and the fast solution is practically linear in N (doubling K from 10 to 20 leaves its time unchanged), with a much lower constant. The fast solution is slightly slower than the current solution when k=1, due to the additional setup.
The 'Fast' solution: array=list of input, delete=list of values to remove.
declare -A delk
for del in "${delete[@]}" ; do delk[$del]=1 ; done
# Tag items to remove, based on
for k in "${!array[@]}" ; do
[ "${delk[${array[$k]}]-}" ] && unset 'array[k]'
done
# Compaction
array=("${array[@]}")
Benchmarked against current solution, from the most-voted answer.
for target in "${delete[@]}"; do
for i in "${!array[@]}"; do
if [[ ${array[i]} = $target ]]; then
unset 'array[i]'
fi
done
done
array=("${array[@]}")
Here's a one-line solution with mapfile:
$ mapfile -d $'\0' -t arr < <(printf '%s\0' "${arr[@]}" | grep -Pzv "<regexp>")
Example:
$ arr=("Adam" "Bob" "Claire"$'\n'"Smith" "David" "Eve" "Fred")
$ echo "Size: ${#arr[*]} Contents: ${arr[*]}"
Size: 6 Contents: Adam Bob Claire
Smith David Eve Fred
$ mapfile -d $'\0' -t arr < <(printf '%s\0' "${arr[@]}" | grep -Pzv "^Claire\nSmith$")
$ echo "Size: ${#arr[*]} Contents: ${arr[*]}"
Size: 5 Contents: Adam Bob David Eve Fred
This method allows for great flexibility by modifying/exchanging the grep command and doesn't leave any empty strings in the array.
Partial answer only
To delete the first item in the array
unset 'array[0]'
To delete the last item in the array
unset 'array[-1]'
To expand on the above answers, the following can be used to remove multiple elements from an array, without partial matching:
ARRAY=(one two onetwo three four threefour "one six")
TO_REMOVE=(one four)
TEMP_ARRAY=()
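for pkg in "${ARRAY[@]}"; do
KEEP=true # reset once per element, before checking the removal list
for remove in "${TO_REMOVE[@]}"; do
if [[ ${pkg} == "${remove}" ]]; then
KEEP=false
break
fi
done
if ${KEEP}; then
TEMP_ARRAY+=("${pkg}") # quoted so that "one six" stays a single element
fi
done
ARRAY=("${TEMP_ARRAY[@]}")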
unset TEMP_ARRAY
This will result in an array containing:
(two onetwo three threefour "one six")
Here's a (probably very bash-specific) little function involving bash variable indirection and unset; it's a general solution that does not involve text substitution or discarding empty elements and has no problems with quoting/whitespace etc.
delete_ary_elmt() {
local word=$1 # the element to search for & delete
local aryref="$2[@]" # a necessary step since '${!$2[@]}' is a syntax error
local arycopy=("${!aryref}") # create a copy of the input array
local status=1
for (( i = ${#arycopy[@]} - 1; i >= 0; i-- )); do # iterate over indices backwards
elmt=${arycopy[$i]}
[[ $elmt == $word ]] && unset "$2[$i]" && status=0 # unset matching elmts in orig. ary
done
return $status # return 0 if something was deleted; 1 if not
}
array=(a 0 0 b 0 0 0 c 0 d e 0 0 0)
delete_ary_elmt 0 array
for e in "${array[@]}"; do
echo "$e"
done
# prints "a" "b" "c" "d" "e" on separate lines
Use it like delete_ary_elmt ELEMENT ARRAYNAME without any $ sigil. Switch the == $word for == $word* for prefix matches; use ${elmt,,} == ${word,,} for case-insensitive matches; etc., whatever bash [[ supports.
It works by determining the indices of the input array and iterating over them backwards (so deleting elements doesn't screw up iteration order). To get the indices you need to access the input array by name, which can be done via bash variable indirection x=1; varname=x; echo ${!varname} # prints "1".
You can't access arrays by name like aryname=a; echo "${$aryname[#]}, this gives you an error. You can't do aryname=a; echo "${!aryname[#]}", this gives you the indices of the variable aryname (although it is not an array). What DOES work is aryref="a[#]"; echo "${!aryref}", which will print the elements of the array a, preserving shell-word quoting and whitespace exactly like echo "${a[#]}". But this only works for printing the elements of an array, not for printing its length or indices (aryref="!a[#]" or aryref="#a[#]" or "${!!aryref}" or "${#!aryref}", they all fail).
So I copy the original array by its name via bash indirection and get the indices from the copy. To iterate over the indices in reverse I use a C-style for loop. I could also do it by accessing the indices via ${!arycopy[@]} and reversing them with tac, which is a cat that turns around the input line order.
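The indirection forms described above can be tried out directly. This is a minimal sketch; the array contents and variable names are invented for illustration:

```bash
#!/usr/bin/env bash
a=("first item" "" "third  item")

# Scalar indirection: varname holds the *name* of another variable.
x=1
varname=x
scalar="${!varname}"
echo "$scalar"

# Array indirection: the reference string must include the [@] subscript.
aryref="a[@]"
copy=("${!aryref}")

# Empty elements and inner whitespace survive the copy.
echo "copied ${#copy[@]} elements"
printf '<%s>\n' "${copy[@]}"
```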
A function solution without variable indirection would probably have to involve eval, which may or may not be safe to use in that situation (I can't tell).
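Bash 4.3 and later also offer namerefs (local -n), which avoid both the "${!aryref}" copy and eval. The following is only a sketch under that version assumption; delete_ary_elmt_ref and its underscore-prefixed locals are hypothetical names, and the caller's array must not itself be called _ary. Because it walks the real indices, it should also cope with arrays that already have gaps:

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete_ary_elmt_ref VALUE ARRAYNAME
# Assumes bash 4.3 or later (needed for 'local -n').
delete_ary_elmt_ref() {
    local _word=$1
    local -n _ary=$2            # _ary becomes an alias for the caller's array
    local _i _status=1
    # The index list is expanded once, before the loop starts,
    # so unsetting elements inside the loop is safe.
    for _i in "${!_ary[@]}"; do
        if [[ ${_ary[_i]} == "$_word" ]]; then   # quoted: exact match only
            unset '_ary[_i]'
            _status=0
        fi
    done
    return "$_status"           # 0 if something was deleted, 1 if not
}

array=(a 0 0 b 0 0 0 c 0 d e 0 0 0)
delete_ary_elmt_ref 0 array
echo "${array[*]}"
```

Dropping the quotes around "$_word" would switch this back to pattern matching, as in the original function.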
Using unset
To remove an element at a particular index, we can use unset and then copy the result to another array. unset alone is not enough in this case: it removes the element, but the remaining elements keep their original indices, so the array is left with a gap at that index (which expands to an empty string when accessed).
declare -a arr=('aa' 'bb' 'cc' 'dd' 'ee')
unset 'arr[1]'
declare -a arr2=()
i=0
for element in "${arr[@]}"
do
arr2[$i]=$element
((++i))
done
echo "${arr[@]}"
echo "1st val is ${arr[1]}, 2nd val is ${arr[2]}"
echo "${arr2[@]}"
echo "1st val is ${arr2[1]}, 2nd val is ${arr2[2]}"
Output is
aa cc dd ee
1st val is , 2nd val is cc
aa cc dd ee
1st val is cc, 2nd val is dd
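The gap can be made visible by listing the indices, and the copy loop can be replaced by a quoted re-assignment. A minimal sketch using the same sample array:

```bash
#!/usr/bin/env bash
declare -a arr=('aa' 'bb' 'cc' 'dd' 'ee')
unset 'arr[1]'

# The element is gone, but the remaining elements keep their old indices.
indices_before="${!arr[*]}"
echo "indices after unset: $indices_before"          # 0 2 3 4

# Re-assigning the quoted expansion packs the elements into 0..n-1.
arr=("${arr[@]}")
indices_after="${!arr[*]}"
echo "indices after re-assignment: $indices_after"   # 0 1 2 3
echo "element count: ${#arr[@]}"
```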
Using :<idx>
We can also drop a leading run of elements with the :<idx> slice syntax. For example, to remove the first element, use :1 as shown below.
declare -a arr=('aa' 'bb' 'cc' 'dd' 'ee')
arr2=("${arr[@]:1}")
echo "${arr2[@]}"
echo "1st val is ${arr2[1]}, 2nd val is ${arr2[2]}"
Output is
bb cc dd ee
1st val is cc, 2nd val is dd
http://wiki.bash-hackers.org/syntax/pe#substring_removal
${PARAMETER#PATTERN} # remove from beginning
${PARAMETER##PATTERN} # remove from the beginning, greedy match
${PARAMETER%PATTERN} # remove from the end
${PARAMETER%%PATTERN} # remove from the end, greedy match
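As a quick illustration of the four operators on a single string (the path is an arbitrary example value, not taken from the question):

```bash
#!/usr/bin/env bash
file="/home/user/archive.tar.gz"   # example value, chosen for illustration

short_prefix=${file#*/}     # home/user/archive.tar.gz
long_prefix=${file##*/}     # archive.tar.gz
short_suffix=${file%.*}     # /home/user/archive.tar
long_suffix=${file%%.*}     # /home/user/archive

printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
```

Applied to an array expansion, the same operators act on each element separately, which is why they can empty an element but never remove it.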
To remove an element completely, you have to use unset inside an if statement. If you don't care about removing prefixes from other elements or about supporting whitespace in the array, then you can just drop the quotes and forget about for loops.
See example below for a few different ways to clean up an array.
options=("foo" "bar" "foo" "foobar" "foo bar" "bars" "bar")
# remove bar from the start of each element
options=("${options[@]/#"bar"}")
# options=("foo" "" "foo" "foobar" "foo bar" "s" "")
# remove the complete string "foo" in a for loop
count=${#options[@]}
for ((i = 0; i < count; i++)); do
if [ "${options[i]}" = "foo" ] ; then
unset 'options[i]'
fi
done
# options=( "" "foobar" "foo bar" "s" "")
# remove empty options
# note the count variable can't be recalculated easily on a sparse array
for ((i = 0; i < count; i++)); do
# echo "Element $i: '${options[i]}'"
if [ -z "${options[i]}" ] ; then
unset 'options[i]'
fi
done
# options=("foobar" "foo bar" "s")
# list them with select
echo "Choose an option:"
PS3='Option? '
select i in "${options[@]}" Quit
do
case $i in
Quit) break ;;
*) echo "You selected \"$i\"" ;;
esac
done
Output
Choose an option:
1) foobar
2) foo bar
3) s
4) Quit
Option?
Hope that helps.
There is also this syntax, e.g. if you want to delete the 2nd element :
array=("${array[@]:0:1}" "${array[@]:2}")
which is in fact the concatenation of two array slices: the first from index 0 up to index 1 (exclusive), and the second from index 2 to the end.
POSIX shell script does not have arrays.
So most probably you are using a specific dialect such as bash, korn shells or zsh.
Therefore, your question as of now cannot be answered.
Maybe this works for you:
unset "array[$delete]"
What I do is:
array="$(echo $array | tr ' ' '\n' | sed "/itemtodelete/d")"
BAM, that item is removed.
This is a quick-and-dirty solution that will work in simple cases but will break if (a) there are regex special characters in $delete, or (b) there are any spaces at all in any items. Starting with:
array+=(pluto)
array+=(pippo)
delete=(pluto)
Delete all entries exactly matching $delete:
array=(`echo $array | fmt -1 | grep -v "^${delete}$" | fmt -999999`)
resulting in
echo $array -> pippo, and making sure it's an array:
echo $array[1] -> pippo
fmt is a little obscure: fmt -1 wraps at the first column, putting each item on its own line (which is where the problem with items containing spaces arises). fmt -999999 unwraps it back to one line, putting back the spaces between items. There are other ways to do that, such as xargs.
Addendum: If you want to delete just the first match, use sed, as described here:
array=(`echo $array | fmt -1 | sed "0,/^${delete}$/{//d;}" | fmt -999999`)
Actually, I just noticed that the shell syntax has built-in behavior that allows for easy reconstruction of the array when, as posed in the question, an item should be removed.
# let's set up an array of items to consume:
x=()
for (( i=0; i<10; i++ )); do
x+=("$i")
done
# here, we consume that array:
while (( ${#x[@]} )); do
i=$(( RANDOM % ${#x[@]} ))
echo "${x[i]} / ${x[@]}"
x=("${x[@]:0:i}" "${x[@]:i+1}")
done
Notice how we constructed the array using bash's x+=() syntax?
You could actually add more than one item with that, the content of a whole other array at once.
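A short sketch of both forms of appending; the array more and its contents are made up for the example:

```bash
#!/usr/bin/env bash
x=(0 1 2)
more=("three 3" 4)

# Append several literal items in one statement.
x+=(a b)

# Append the whole contents of another array; the quotes keep
# "three 3" as a single element.
x+=("${more[@]}")

echo "${#x[@]} elements"
printf '%s\n' "${x[@]}"
```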
In ZSH this is dead easy (note this uses more bash compatible syntax than necessary where possible for ease of understanding):
# I always include an edge case to make sure each element
# is not being word split.
start=(one two three 'four 4' five)
work=(${(@)start})
idx=2
val=${work[idx]}
# How to remove a single element easily.
# Also works for associative arrays (at least in zsh)
work[$idx]=()
echo "Array size went down by one: "
[[ $#work -eq $(($#start - 1)) ]] && echo "OK"
echo "Array item "$val" is now gone: "
[[ -z ${work[(r)$val]} ]] && echo OK
echo "Array contents are as expected: "
wanted=("${start[@]:0:1}" "${start[@]:2}")
[[ "${(j.:.)wanted[@]}" == "${(j.:.)work[@]}" ]] && echo "OK"
echo "-- array contents: start --"
print -l -r -- "-- $#start elements" ${(@)start}
echo "-- array contents: work --"
print -l -r -- "-- $#work elements" "${work[@]}"
Results:
Array size went down by one:
OK
Array item two is now gone:
OK
Array contents are as expected:
OK
-- array contents: start --
-- 5 elements
one
two
three
four 4
five
-- array contents: work --
-- 4 elements
one
three
four 4
five
To avoid the index gaps that unset leaves behind (see https://stackoverflow.com/a/49626928/3223785 and https://stackoverflow.com/a/47798640/3223785 for more information), reassign the array to itself: ARRAY_VAR=("${ARRAY_VAR[@]}"). The quotes keep elements that contain whitespace intact.
#!/bin/bash
ARRAY_VAR=(0 1 2 3 4 5 6 7 8 9)
unset 'ARRAY_VAR[5]'
unset 'ARRAY_VAR[4]'
ARRAY_VAR=("${ARRAY_VAR[@]}")
echo "${ARRAY_VAR[@]}"
A_LENGTH=${#ARRAY_VAR[*]}
for (( i=0; i<=$(( $A_LENGTH -1 )); i++ )) ; do
echo ""
echo "INDEX - $i"
echo "VALUE - ${ARRAY_VAR[$i]}"
done
exit 0
[Ref.: https://tecadmin.net/working-with-array-bash-script/ ]
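Whether the expansion in this reassignment is quoted only starts to matter once elements contain whitespace. A minimal sketch, with array contents invented to show the difference:

```bash
#!/usr/bin/env bash
ARRAY_VAR=("zero 0" "one 1" "two 2" "three 3")
unset 'ARRAY_VAR[1]'

# Unquoted: every element is word-split, so "zero 0" becomes two elements.
unquoted=(${ARRAY_VAR[@]})

# Quoted: every element is kept whole and the indices are packed.
quoted=("${ARRAY_VAR[@]}")

echo "unquoted: ${#unquoted[@]} elements"
echo "quoted:   ${#quoted[@]} elements, indices ${!quoted[*]}"
```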
How about something like:
array=(one two three)
array_t=" ${array[@]} "
delete=one
array=(${array_t// $delete / })
unset array_t
#!/bin/bash
echo "# define array with six elements"
arr=(zero one two three 'four 4' five)
echo "# unset by index: 0"
unset -v 'arr[0]'
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
arr_delete_by_content() { # value to delete
for i in ${!arr[*]}; do
[ "${arr[$i]}" = "$1" ] && unset -v 'arr[$i]'
done
}
echo "# unset in global variable where value: three"
arr_delete_by_content three
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
echo "# rearrange indices"
arr=( "${arr[@]}" )
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
delete_value() { # value arrayelements..., returns array decl.
local e val=$1; new=(); shift
for e in "${@}"; do [ "$val" != "$e" ] && new+=("$e"); done
declare -p new|sed 's,^[^=]*=,,'
}
echo "# new array without value: two"
declare -a arr="$(delete_value two "${arr[@]}")"
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
delete_values() { # arraydecl values..., returns array decl. (keeps indices)
declare -a arr="$1"; local i v; shift
for v in "${@}"; do
for i in ${!arr[*]}; do
[ "$v" = "${arr[$i]}" ] && unset -v 'arr[$i]'
done
done
declare -p arr|sed 's,^[^=]*=,,'
}
echo "# new array without values: one five (keep indices)"
declare -a arr="$(delete_values "$(declare -p arr|sed 's,^[^=]*=,,')" one five)"
for i in ${!arr[*]}; do echo "arr[$i]=${arr[$i]}"; done
# new array without multiple values and rearranged indices is left to the reader

Create associative array from grep output

I have a grep output and I'm trying to make an associative array from the output that I get.
Here is my grep output:
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
I want to use that output to define an associative array, like this:
array[123456789101]=devid1234
array[111213141516]=devid5678
Is that possible? I'm new at making arrays. I hope someone could help me in my problem.
Either pipe your grep output to a helper script with a while loop containing a simple "0/1" toggle to read two lines taking the last field of each to fill your array, e.g.
#!/bin/bash
declare -A array
declare -i n=0
arridx=
while read -r label value; do # read 2 fields
if [ "$n" -eq 0 ]
then
arridx="${value:1}" # strip the leading quote
arridx="${arridx:0:(-2)}" # strip the trailing quote and comma; this is the array index
((n++)) # increment toggle
else
arrval="${value:1}" # strip the leading quote
arrval="${arrval:0:(-2)}" # strip the trailing quote and comma; this is the array value
array[$arridx]="$arrval" # assign to associative array
n=0 # zero toggle
fi
done
for i in "${!array[@]}"; do # output array
echo "array[$i] ${array[$i]}"
done
Or you can use process substitution containing the grep command within the script to do the same thing, e.g.
done < <( your grep command )
You can also add a check under the else clause, such as if [[ $label =~ DeviceId ]], to validate that you are on the right line and catch any variation in the grep output content.
Example Input
$ cat dat/grepout.txt
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
Example Use/Output
$ cat dat/grepout.txt | bash parsegrep2array.sh
array[123456789101] devid1234
array[111213141516] devid5678
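A variation that keys off the label instead of a line-count toggle might look like the sketch below. It assumes bash 4.0 or later (for declare -A) and input lines shaped exactly like the sample; the here-document stands in for the real grep output, and serial is a name introduced only for this example:

```bash
#!/usr/bin/env bash
# Assumes bash 4.0+ and lines shaped like:
#   "HardwareSerialNumber": "123456789101",
declare -A array
serial=

while read -r label value; do
    value=${value#\"}        # drop the leading quote
    value=${value%\",}       # drop the trailing quote and comma
    case $label in
        *HardwareSerialNumber*)
            serial=$value    # remember the key until its DeviceId arrives
            ;;
        *DeviceId*)
            [[ -n $serial ]] && array[$serial]=$value
            serial=          # a stray DeviceId line is now ignored
            ;;
    esac
done <<'EOF'
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
EOF

for key in "${!array[@]}"; do
    echo "array[$key] ${array[$key]}"
done
```

In a real script the here-document would be replaced by done < <( your grep command ). The order in which an associative array lists its keys is not guaranteed.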
Parsing out the values is easy, and once you have them you can certainly use those values to build up an array. The trickiest part comes from the fact that you need to combine input from separate lines. Here is one approach; note that this script is verbose on purpose, to show what's going on; once you see what's happening, you can eliminate most of the output:
so.input
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
so.sh
#!/bin/bash
declare -a hardwareInfo
while [[ 1 ]]; do
# read in two lines of input
# if either line is the last one, we don't have enough input to proceed
read lineA < "${1:-/dev/stdin}"
# if EOF or empty line, exit
if [[ "$lineA" == "" ]]; then break; fi
read lineB < "${1:-/dev/stdin}"
# if EOF or empty line, exit
if [[ "$lineB" == "" ]]; then break; fi
echo "$lineA"
echo "$lineB"
hwsn=$lineA
hwsn=${hwsn//HardwareSerialNumber/}
hwsn=${hwsn//\"/}
hwsn=${hwsn//:/}
hwsn=${hwsn//,/}
echo $hwsn
# some checking could be done here to test that the value is numeric
devid=$lineB
devid=${devid//DeviceId/}
devid=${devid//\"/}
devid=${devid//:/}
devid=${devid//,/}
echo $devid
# some checking could be done here to make sure the value is valid
# populate the array
hardwareInfo[$hwsn]=$devid
done
# spacer, for readability of the output
echo
# display the array; in your script, you would do something different and useful
for key in "${!hardwareInfo[@]}"; do echo $key --- ${hardwareInfo[$key]}; done
cat so.input | ./so.sh
"HardwareSerialNumber": "123456789101",
"DeviceId": "devid1234",
123456789101
devid1234
"HardwareSerialNumber": "111213141516",
"DeviceId": "devid5678",
111213141516
devid5678
111213141516 --- devid5678
123456789101 --- devid1234
I created the input file so.input just for convenience. You would probably pipe your grep output into the bash script, like so:
grep-command | ./so.sh
EDIT #1: There are lots of choices for parsing out the key and value from the strings fed in by grep; the answer from @David C. Rankin shows another way. The best way depends on what you can rely on about the content and structure of the grep output.
There are also several choices for reading two separate lines that are related to each other; David's "toggle" approach is also good, and commonly used; I considered it myself, before going with "read two lines and stop if either is blank".
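EDIT #2: I see declare -A in David's answer and in examples on the web; I used declare -a because that's what my version of bash wants (I'm using a Mac; the bash 3.2 that ships with macOS predates associative arrays, which need bash 4.0 or later). With declare -a the serial numbers are used as plain integer indices, so this only works because the keys are numeric. So, just be aware that there can be differences.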

BASH: Best way to set variable from array

Bash 4 on Linux ~
I have an array of possible values. I must restrict user input to these values.
Arr=(hello kitty goodbye quick fox)
User supplies value as argument to script:
bash myscript.sh -b var
Currently, I'm trying the following:
function func_exists () {
_var="$1"
for i in ${Arr[@]}
do
if [ "$i" == "$_var" ]
then
echo hooray for "$_var"
return 1
fi
done
return 0
}
func_exists $var
if [ $? -ne 1 ];then
echo "Not a permitted value."
func_help
exit $E_OPTERROR
fi
Seems to work fine, are there better methods for testing user input against an array of allowed values?
UPDATE: I like John K's answer ...can someone clarify the use of $@? I understand that this represents all positional parameters -- so we shift the first param off the stack and $@ now represents all remaining params, those being the passed array ...is that correct? I hate blindly using code without understanding ...even if it works!
Your solution is what I'd do. Maybe using a few more shell-isms, such as returning 0 for success and non-0 for failure like UNIX commands do in general.
# Tests if $1 is in the array ($2 $3 $4 ...).
is_in() {
value=$1
shift
for i in "$@"; do
[[ $i == "$value" ]] && return 0
done
return 1
}
if ! is_in "$var" "${Arr[#]}"; then
echo "Not a permitted value." >&2
func_help
exit $E_OPTERROR
fi
Careful use of double quotes makes sure this will work even if the individual array entries contain spaces, which is allowed. This is a two element array: list=('hello world' 'foo bar').
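On the question in the update about shift and "$@": the sketch below uses a hypothetical show_args function, written only for this illustration, to print what the positional parameters look like before and after the shift. Note that $# is the parameter count, while "$@" expands to the parameters themselves, one word per parameter:

```bash
#!/usr/bin/env bash
# Hypothetical demo function, not part of the answer above.
show_args() {
    echo "before shift: $# parameters, first is '$1'"
    local value=$1
    shift                      # drop $1; the old $2 becomes $1, and so on
    echo "after shift:  $# parameters"

    local item
    for item in "$@"; do       # one iteration per remaining parameter
        echo "  item: '$item'"
    done
    remaining=$#               # global, so it can be inspected afterwards
}

list=('hello world' 'foo bar')
show_args needle "${list[@]}"
```

With the two-element list above, the loop runs twice: once for 'hello world' and once for 'foo bar'.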
Another solution. is_in is just a variable:
Arr=(hello kitty goodbye quick fox)
var='quick'
string=" ${Arr[*]} " # array to string, framed with blanks
is_in=1 # false
# try to delete the variable inside the string; true if length differ
[ "$string" != "${string/ ${var} /}" ] && is_in=0
echo -e "$is_in"
function func_exists () {
case "$1" in
hello|kitty|goodbye|quick|fox)
return 1;;
*)
return 0;;
esac
}
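A different technique from the ones above, offered only as a sketch: load the permitted values into an associative array once and test for key existence. This assumes bash 4.0 or later; allowed and is_allowed are hypothetical names, and unlike the function above, the helper returns 0 for a permitted value:

```bash
#!/usr/bin/env bash
# Assumes bash 4.0+ for associative arrays.
Arr=(hello kitty goodbye quick fox)

declare -A allowed=()
for item in "${Arr[@]}"; do
    allowed[$item]=1
done

# Hypothetical helper: returns 0 if $1 is a permitted value, 1 otherwise.
# The -n test comes first because an empty subscript is an error.
is_allowed() {
    [[ -n $1 && -n ${allowed[$1]+set} ]]
}

if is_allowed quick; then echo "quick: permitted"; fi
if ! is_allowed dog; then echo "dog: not permitted"; fi
```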

Resources