Surprising array expansion behaviour - arrays

I've been surprised with the line marked (!!) in the following example:
log1 () { echo $@; }
log2 () { echo "$@"; }
X=(a b)
IFS='|'
echo ${X[@]} # prints a b
echo "${X[@]}" # prints a b
echo ${X[*]} # prints a b
echo "${X[*]}" # prints a|b
echo "---"
log1 ${X[@]} # prints a b
log1 "${X[@]}" # prints a b
log1 ${X[*]} # prints a b
log1 "${X[*]}" # prints a b (!!)
echo "---"
log2 ${X[@]} # prints a b
log2 "${X[@]}" # prints a b
log2 ${X[*]} # prints a b
log2 "${X[*]}" # prints a|b
Here is my understanding of the behavior:
${X[*]} and ${X[@]} both expand to a b
"${X[*]}" expands to "a|b"
"${X[@]}" expands to "a" "b"
$* and $@ have the same behavior as ${X[*]} and ${X[@]}, except for their content being the parameters of the program or function
This seems to be confirmed by the bash manual.
In the line log1 "${X[*]}", I therefore expect the quoted expression to expand to "a|b", then to be passed to the log1 function. The function has a single string parameter which it displays. Why does something else happen?
It'd be cool if your answers were backed by manual/standard references!

IFS is used not just to join the elements of ${X[*]}, but also to split the unquoted expansion $@. For log1 "${X[*]}", the following happens:
"${X[*]}" expands to a|b as expected, so $1 is set to a|b inside log1.
When $@ (unquoted) is expanded, the resulting string is a|b.
The unquoted expansion undergoes word-splitting with | as the delimiter (due to the global value of IFS), so that echo receives two arguments, a and b.
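The join-then-split sequence can be reproduced in isolation. The following is a minimal sketch, not code from the original post: count_args, count_unquoted and count_quoted are hypothetical helper names, and IFS is restored afterwards so the rest of the script is unaffected.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count how many arguments survive re-expansion.
count_args()     { echo "$#"; }
count_unquoted() { count_args $@; }    # unquoted: split again on the current IFS
count_quoted()   { count_args "$@"; }  # quoted: each argument stays one word

X=(a b)
old_ifs=$IFS
IFS='|'
joined="${X[*]}"                         # joined on the first character of IFS
n_unquoted=$(count_unquoted "${X[*]}")   # one argument goes in, then is split on |
n_quoted=$(count_quoted "${X[*]}")       # one argument goes in and stays one
IFS=$old_ifs

printf '%s %s %s\n' "$joined" "$n_unquoted" "$n_quoted"
```

On bash this prints a|b 2 1: the function receives a single argument either way, and only the unquoted re-expansion inside it turns that argument into two words.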

That's because $IFS is set to |:
(X='a|b' ; IFS='|' ; echo $X)
Output:
a b
man bash says:
IFS The Internal Field Separator that is used for word splitting after expansion ...

In the POSIX spec section on [Special Parameters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_02) we find:
@
Expands to the positional parameters, starting from one. When the expansion occurs within double-quotes, and where field splitting (see Field Splitting) is performed, each positional parameter shall expand as a separate field, with the provision that the expansion of the first parameter shall still be joined with the beginning part of the original word (assuming that the expanded parameter was embedded within a word), and the expansion of the last parameter shall still be joined with the last part of the original word. If there are no positional parameters, the expansion of '@' shall generate zero fields, even when '@' is double-quoted.
*
Expands to the positional parameters, starting from one. When the expansion occurs within a double-quoted string (see Double-Quotes), it shall expand to a single field with the value of each parameter separated by the first character of the IFS variable, or by a <space> if IFS is unset. If IFS is set to a null string, this is not equivalent to unsetting it; its first character does not exist, so the parameter values are concatenated.
So starting with the quoted variants (they are simpler):
We see that the * expansion "expand[s] to a single field with the value of each parameter separated by the first character of the IFS variable". This is why you get a|b from echo "${X[*]}" and log2 "${X[*]}".
We also see that the @ expansion expands such that "each positional parameter shall expand as a separate field". This is why you get a b from echo "${X[@]}" and log2 "${X[@]}".
Did you see that note about field splitting in the spec text? "where field splitting (see Field Splitting) is performed"? That's the key to the mystery here.
Outside of quotes the behavior of the expansions is the same. The difference is what happens after that. Specifically, field/word splitting.
The simplest way to show the problem is to run your code with set -x enabled.
Which gets you this:
+ X=(a b)
+ IFS='|'
+ echo a b
a b
+ echo a b
a b
+ echo a b
a b
+ echo 'a|b'
a|b
+ echo ---
---
+ log1 a b
+ echo a b
a b
+ log1 a b
+ echo a b
a b
+ log1 a b
+ echo a b
a b
+ log1 'a|b'
+ echo a b
a b
+ echo ---
---
+ log2 a b
+ echo a b
a b
+ log2 a b
+ echo a b
a b
+ log2 a b
+ echo a b
a b
+ log2 'a|b'
+ echo 'a|b'
a|b
The thing to notice here is that, in all but the final case, the | is already gone by the time log1 is called.
It is gone because, without quotes, the result of the expansion (in this case the * expansion) is field/word split. Since IFS is used both to join the fields being expanded and then to split them again, the | gets swallowed by field splitting.
To finish the explanation for the case actually in question: log1 "${X[*]}" does expand to log1 "a|b" correctly, but log1 itself does not quote its expansion of @, so that expansion is word-split inside the function (as the echo a b line in that log1 trace shows, just as in all the other log1 cases).
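One way to avoid the surprise is to keep the IFS change local to the code that does the joining, so that later unquoted expansions are split on the default whitespace. A minimal sketch, assuming a hypothetical helper named join_by (it is not a bash builtin):

```shell
#!/usr/bin/env bash
# join_by is a hypothetical helper, not part of bash itself.
# "local IFS" limits the separator change to this function call.
join_by() {
  local IFS="$1"   # separator; "$*" joins on the first character of IFS
  shift
  printf '%s\n' "$*"
}

log1() { echo $@; }   # same unquoted expansion as in the question

X=(a b)
joined=$(join_by '|' "${X[@]}")   # a|b
result=$(log1 "$joined")          # global IFS is untouched, so no split on |
printf '%s\n' "$result"
```

Here log1 prints a|b even though it still uses an unquoted expansion, because the global IFS never contains |. Quoting "$@" inside the function remains the more robust fix.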

Related

bash array into string pattern

How do I change a bash array into a string with this pattern? Let's say I have arr=(a b c d) and I want to change to this pattern
'a' + 'b' + 'c' + 'd' with the white space in between.
PS-I figured out this pattern 'a'+'b'+'c'+'d' but not sure how to put " + " instead of just "+" in between.
In pure bash without resorting to an external command and without creating a subshell, using bash built-in printf's implicit loop with -v option (which assigns the output to the variable rather than printing it):
printf -v str " + '%s'" "${arr[@]}"
str=${str:3} # to strip off leading ' + '
echo "$str"
This will also work with array elements containing blank characters (try with arr=(a b c d "x y") ). This solution assumes arr is not an empty array.
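If the array may be empty, the same idea can be wrapped in a function that skips printf when there is nothing to join. This is only a sketch; join_quoted is a hypothetical name, not something from the answer above.

```shell
#!/usr/bin/env bash
# join_quoted is a hypothetical helper built on the printf -v approach.
join_quoted() {
  local str=''
  if (( $# > 0 )); then           # skip printf entirely for an empty list
    printf -v str " + '%s'" "$@"  # the format is reused once per argument
    str=${str:3}                  # strip the leading " + "
  fi
  printf '%s\n' "$str"
}

arr=(a b c d "x y")
join_quoted "${arr[@]}"

empty=()
join_quoted "${empty[@]}"
```

The first call prints 'a' + 'b' + 'c' + 'd' + 'x y'; the second prints an empty line instead of the stray '' that an unguarded printf would produce.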
Like this:
#!/bin/bash
arr=(a b c d)
str="${arr[@]}"
str=${str// / + } # Parameter Expansion
sed "s/[a-z]/'&'/g" <<< "$str"
Output
'a' + 'b' + 'c' + 'd'
Check
See: http://mywiki.wooledge.org/BashFAQ/073 and "Parameter Expansion" in man bash. Also see http://wiki.bash-hackers.org/syntax/pe.
Here’s a version in pure Bash that also tolerates whitespace in arr members.
arr=(a b c d 'e f' 'g h')
aux=("${arr[@]/#/ \'}") # add prefix " '"
aux=("${aux[@]/%/\' }") # add suffix "' "
str="$(IFS=+ # join with "+"
printf '%s' "${aux[*]}")"
str="${str:1:${#str} - 2}" # trim spaces
printf '%s\n' "$str"

Recursive parsing and arrays in shell script

I intend to accept a single argument for my shell script my_script.sh and parse the values from it using separators. For example,
./my_script.sh a-e,f/b-1/c-5,g/d
means my primary separator is / and secondary separator is - and tertiary separator is ,. The challenge here is the number of values separated by , or - is not fixed, but variable. Like in d, there is no - or , at all. I can always parse the values separated by / as:
IFS='/' read -ra list_l1 <<<$1
This way, I get the number of times I need to loop over. But I'm stuck trying a parsing within list_l1. Here,
I need to see if there is - and , or if they are there at all.
If there is - and ,, get the values after - and pass it/them as arguments to another script (eg. for a e,f will be passed as separate arguments to another script).
If there is no - and ,, just run another script without arguments (eg. for d, another script is run without any arguments).
How can I get this done?
UPDATE:
I managed to figure a way for level one:
IFS='/' read -ra list_l1 <<<$1
for i in "${!list_l1[@]}"; do
list_l2[$i]="${list_l1[$i]//,/$' '}"
# This section is a pseudocode of what I would like to do:
get 'type' from first part (before '-' as in example above)
if type == 'a':
pass the with parameters after '-' to another .sh script, discarding the separators '-', ','
elif type == 'b':
pass the with parameters after '-' to another .sh script, discarding the separators '-', ','
elif type == 'c':
pass the with parameters after '-' to another .sh script, discarding the separators '-', ','
elif type == 'd':
pass the with parameters after '-' to another .sh script, discarding the separators '-', ','
# This section is a pseudocode of what I would like to do:
done
Take a look at this:
#!/usr/bin/env bash
f() { printf 'I am called with %d arguments: %s\n' "$#" "$*"; }
param='a-e,f/b-1/c-5,g/d'
IFS=/ read -ra a <<< "$param"
for i in "${a[@]}"; do
IFS=- read -r _ b <<< "$i"
IFS=, read -ra c <<< "$b"
f "${c[@]}"
done
$ ./script
I am called with 2 arguments: e f
I am called with 1 arguments: 1
I am called with 2 arguments: 5 g
I am called with 0 arguments:
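The loop above throws away the part before the - (it is read into _). If that part is needed to choose which script to run, it can be kept in a variable. A minimal sketch, assuming a hypothetical function parse_spec that only prints what it would dispatch:

```shell
#!/usr/bin/env bash
# parse_spec is a hypothetical helper: it prints one "type: args..." line
# per /-separated segment instead of calling another script.
parse_spec() {
  local spec=$1 segment type rest
  local -a segments args
  IFS=/ read -ra segments <<< "$spec"     # primary separator
  for segment in "${segments[@]}"; do
    IFS=- read -r type rest <<< "$segment"  # secondary: type, then the rest
    IFS=, read -ra args <<< "$rest"         # tertiary: individual arguments
    printf '%s:' "$type"
    if (( ${#args[@]} > 0 )); then
      printf ' %s' "${args[@]}"
    fi
    printf '\n'
  done
}

parse_spec 'a-e,f/b-1/c-5,g/d'
```

Replacing the printf calls with a case "$type" in ... esac block would let each type run its own script with "${args[@]}" as arguments.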
Based on what I understood of your question, I produced this code:
** Edit no1, calling another script using that array**
#!/bin/bash
arg='a-e,f/b-1/c-5,g/d'
# Cuts it into [a-e,f] [b-1] [c-5,g] [d]
IFS='/' read -ra list_l1 <<<"$arg"
echo "First cut on /."
echo "Content of list_l1"
for K in "${!list_l1[@]}"
do
echo "list_l1[$K]: ${list_l1[$K]}"
done
echo ""
declare -A list_l2
echo "Then loop, cut on '-' and replace ',' by ' '."
for onearg in "${list_l1[@]}"
do
IFS='-' read -r part1 part2 <<<"$onearg"
list_l2[$part1]=${part2//,/ } # replace every ',' with a space
done
echo "Content of list_l2:"
for K in "${!list_l2[@]}"
do
echo "list_l2[$K]: ${list_l2[$K]}"
done
# Calling another script using these values
echo ""
for K in "${!list_l2[@]}"
do
echo "./another_script.sh ${list_l2[$K]}"
done
Which gives the following output:
$ ./t.bash
First cut on /.
Content of list_l1
list_l1[0]: a-e,f
list_l1[1]: b-1
list_l1[2]: c-5,g
list_l1[3]: d
Then loop, cut on '-' and replace ',' by ' '.
Content of list_l2:
list_l2[a]: e f
list_l2[b]: 1
list_l2[c]: 5 g
list_l2[d]:
./another_script.sh e f
./another_script.sh 1
./another_script.sh 5 g
./another_script.sh
Some details:
The first step is to cut on '/'. This creates list_l1.
Each element of list_l1 starts with a type letter ('a', 'b', 'c', 'd', ...): the first letter of each piece after the cut on '/'.
Then each of these is cut a second time on '-'.
The first part of that cut (left of the '-') becomes key.
The second part of that cut (right of the '-') becomes the value.
list_l2 is created as an associative array, using the key and value that were just calculated.
This way list_l2 contains everything you need, without having to reference list_l1 at all later. If you need the list of keys, use ${!list_l2[@]}. If you need the list of values, use ${list_l2[@]}.
Let me know if that meets your requirement.

bash: Why is the value of ${@:-1} different to ${@: -1}?

Why does the addition of the space between the : and the -1 change the behaviour shown below?
(ins)$ set a b c d
(ins)$ echo ${@:-1}
a b c d
(ins)$ echo ${@: -1}
d
(ins)$
The same behaviour also affects $*.
I'm running GNU bash, version 4.4.5(1)-release (x86_64-unknown-linux-gnu).
The space is needed to avoid ambiguity with the parameter expansion pattern:
${parameter:-word}
which substitutes the expansion of word if parameter is unset or null; otherwise the expansion of parameter is substituted.
So for the slicing operation with a negative offset, a space or parentheses is used:
$ set a b c d
$ echo "${@: -1}"
d
$ echo "${@:(-1)}"
d
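The two forms can be compared side by side by wrapping each in a function, which also shows when the default value actually kicks in. A small sketch; show_default, show_last and show_last_two are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, one per expansion form.
show_default()  { echo "${@:-1}"; }   # ${parameter:-word}: "1" only if there are no parameters
show_last()     { echo "${@: -1}"; }  # ${parameter:offset}: slice starting at the last parameter
show_last_two() { echo "${@: -2}"; }  # same slicing form, last two parameters

show_default a b c d   # a b c d
show_default           # 1
show_last a b c d      # d
show_last_two a b c d  # c d
```

Without the space, the -1 is never an offset: it is the fallback word, which only appears when no positional parameters are set.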

How to expand the elements of an array in zsh?

Say I have an array in zsh
a=(1 2 3)
I want to append .txt to each element
echo ${a}.txt # this doesn't work
So the output is
1.txt 2.txt 3.txt
UPDATE:
I guess I can do this, but I think there's a more idiomatic way:
for i in $a; do
echo $i.txt
done
You need to set RC_EXPAND_PARAM option:
$ setopt RC_EXPAND_PARAM
$ echo ${a}.txt
1.txt 2.txt 3.txt
From zsh manual:
RC_EXPAND_PARAM (-P)
Array expansions of the form `foo${xx}bar', where the parameter xx is set to
(a b c), are substituted with `fooabar foobbar foocbar' instead of the
default `fooa b cbar'. Note that an empty array will therefore cause all
arguments to be removed.
You can also set this option for just one array expansion using the ^ flag:
$ echo ${^a}.txt
1.txt 2.txt 3.txt
$ echo ${^^a}.txt
1 2 3.txt
Again citing zsh manual:
${^spec}
Turn on the RC_EXPAND_PARAM option for the evaluation of spec; if the `^' is
doubled, turn it off. When this option is set, array expansions of the form
foo${xx}bar, where the parameter xx is set to (a b c), are substituted with
`fooabar foobbar foocbar' instead of the default `fooa b cbar'. Note that an
empty array will therefore cause all arguments to be removed.
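For comparison only: bash has no RC_EXPAND_PARAM option or ^ flag, but a similar result can be had there with the pattern-substitution form of parameter expansion, the same ${name[@]/%/...} suffix form used in the array-to-string answer earlier on this page. A minimal bash sketch (with_suffix is an illustrative variable name):

```shell
#!/usr/bin/env bash
a=(1 2 3)
# ${a[@]/%/.txt}: the % anchors the (empty) pattern at the end of each
# element, so ".txt" is appended to every element separately.
with_suffix=("${a[@]/%/.txt}")
printf '%s\n' "${with_suffix[*]}"
```

This prints 1.txt 2.txt 3.txt in bash; in zsh itself the ${^a}.txt form shown above is the shorter choice.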

Shell script array from command line

I'm trying to write a shell script that can accept multiple elements on the command line to be treated as a single array. The command line argument format is:
exec trial.sh 1 2 {element1 element2} 4
I know that the first two arguments can be accessed with $1 and $2, but how can I access the array surrounded by the brackets, that is, the arguments surrounded by the {} symbols?
Thanks!
This tcl script uses regex parsing to extract pieces of the commandline, transforming your third argument into a list.
Splitting is done on whitespaces - depending on where you want to use this may or may not be sufficient.
#!/usr/bin/env tclsh
#
# Sample arguments: 1 2 {element1 element2} 4
# Split the commandline arguments:
# - tcl will represent the curly brackets as \{ which makes the regex a bit ugly as we have to escape this
# - we use '->' to catch the full regex match as we are not interested in the value and it looks good
# - we are splitting on white spaces here
# - the content between the curly braces is extracted
regexp {(.*?)\s(.*?)\s\\\{(.*?)\\\}\s(.*?)$} $::argv -> first second third fourth
puts "Argument extraction:"
puts "argv: $::argv"
puts "arg1: $first"
puts "arg2: $second"
puts "arg3: $third"
puts "arg4: $fourth"
# Third argument is to be treated as an array, again split on white space
set theArguments [regexp -all -inline {\S+} $third]
puts "\nArguments for parameter 3"
foreach arg $theArguments {
puts "arg: $arg"
}
You should always place variable-length arguments at the end. But if you can guarantee that the last argument is always provided, then something like this will suffice:
#!/bin/bash
arg1=$1 ; shift
arg2=$1 ; shift
# Get the array passed in.
arrArgs=()
while (( $# > 1 )) ; do
arrArgs=( "${arrArgs[@]}" "$1" )
shift
done
lastArg=$1 ; shift
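If the braces really have to stay on the command line, the group can also be picked out by looking for the words that carry them. This is a rough sketch under two assumptions: the calling shell passes {element1 and element2} through as two literal words (bash does this for a brace group that contains no comma), and there is at most one group. split_braced, plain and group are hypothetical names.

```shell
#!/usr/bin/env bash
# split_braced is a hypothetical helper. It fills two global arrays:
#   plain - arguments outside the braces
#   group - arguments between a word starting with { and a word ending with }
split_braced() {
  plain=()
  group=()
  local arg in_group=0
  for arg in "$@"; do
    if (( in_group == 0 )) && [[ $arg == \{* ]]; then
      in_group=1
      arg=${arg#\{}            # drop the opening brace
    fi
    if (( in_group == 1 )); then
      if [[ $arg == *\} ]]; then
        in_group=0
        arg=${arg%\}}          # drop the closing brace
      fi
      if [[ -n $arg ]]; then   # ignore a bare "{" or "}" word
        group+=("$arg")
      fi
    else
      plain+=("$arg")
    fi
  done
  return 0
}

split_braced 1 2 '{element1' 'element2}' 4
printf 'plain: %s\n' "${plain[*]}"
printf 'group: %s\n' "${group[*]}"
```

In a real script the call would be split_braced "$@". Elements that themselves contain spaces or braces are not handled by this sketch.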

Resources