Can I pass an array to awk using -v?

I would like to be able to pass an array variable to awk. I don't mean a shell array but a native awk one. I know I can pass scalar variables like this:
awk -vfoo="1" 'NR==foo' file
Can I use the same mechanism to define an awk array? Something like:
$ awk -v"foo[0]=1" 'NR==foo' file
awk: fatal: `foo[0]' is not a legal variable name
I've tried a few variations of the above but none of them work on GNU awk 4.1.1 on my Debian. So, is there any version of awk (gawk, mawk or anything else) that can accept an array from the -v switch?
I know I can work around this and can easily think of ways to do so, I am just wondering if any awk implementation supports this kind of functionality natively.

You can use the split() function (available in any POSIX awk, including mawk and gawk) to split the value passed with -v. From the gawk man page:
split(s, a [, r [, seps] ])
Split the string s into the array a and the separators array seps on
the regular expression r, and return the number of fields.
Here is an example in which I pass a comma-separated list of values as the variable ARRAYVAR with -v, split it into the internal array arrayval using split(), and then print the third value of the array:
echo 0 | gawk -v ARRAYVAR="a,b,c,d,e,f" '{ split(ARRAYVAR,arrayval,","); print(arrayval[3]) }'
c
Seems to work :)
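If you need the array before any input is read, for example in a BEGIN rule, the same split() call can go there instead. A minimal sketch, assuming gawk or mawk:
$ gawk -v ARRAYVAR="a,b,c,d,e,f" 'BEGIN { n = split(ARRAYVAR, arrayval, ","); print n, arrayval[3] }'
6 c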

It looks like it is impossible by definition.
From man awk we have that:
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the
program begins. Such variable values are available to the BEGIN rule
of an AWK program.
Then we read in Using Variables in a Program that:
The name of a variable must be a sequence of letters, digits, or
underscores, and it may not begin with a digit.
Variables in awk can be assigned either numeric or string values.
So the way -v is defined makes it impossible to pass an array as a variable: the variable name may contain only letters, digits, and underscores, so subscript syntax such as foo[0] is rejected outright. And subscripts are unavoidable, since arrays in awk are only associative.

If you don't insist on using -v, you could use gawk's -i (include) option instead to read an awk file that contains the variable settings.
Like this:
if F=$(mktemp inputXXXXXX); then
    cat >$F << 'END'
BEGIN {
foo[0]=1
}
END
    cat $F
    awk -i $F 'BEGIN { print foo[0] }' </dev/null
    rm $F
fi
Sample trace (using gawk-4.2.1):
bash -x /tmp/test.sh
++ mktemp inputXXXXXX
+ F=inputrpMsan
+ cat
+ cat inputrpMsan
BEGIN {
foo[0]=1
}
+ awk -i inputrpMsan 'BEGIN { print foo[0] }'
1
+ rm inputrpMsan
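If you would rather not manage a temporary file, the same idea appears to work with process substitution, since gawk reads -i paths containing a slash directly. A sketch, assuming gawk and a shell that supports <( ):
gawk -i <(printf 'BEGIN { foo[0]=1 }\n') 'BEGIN { print foo[0] }' </dev/null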

Unfortunately, this is not possible. However, you can convert a bash array to an awk array using a few clever methods.
I wanted to do this recently by passing a bash array to awk to use it for filtering, so here is what I did:
$ arr=( hello world this is bash array )
$ echo -e 'this\nmight\nnot\nshow\nup' | awk 'BEGIN {
    for (i = 1; i < ARGC; i++) {
        my_filter[ARGV[i]]=1
        ARGV[i]=""   # unset ARGV[i], otherwise awk tries to read it as a file
    }
} !my_filter[$0]' "${arr[@]}"
Output:
might
not
show
up
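The same trick works in the other direction, keeping only the lines that appear in the array; just drop the negation. A sketch under the same setup:
$ printf '%s\n' this hello up | awk 'BEGIN {
    for (i = 1; i < ARGC; i++) {
        my_filter[ARGV[i]]=1
        ARGV[i]=""   # clear it so awk does not treat it as a filename
    }
} my_filter[$0]' "${arr[@]}"
this
hello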

For associative arrays, you could pass it as a string of key-value pairs, and then reformat it in the BEGIN section.
$ echo | awk -v m="a,b;c,d" '
BEGIN {
    split(m, M, ";")
    for (i in M) {
        split(M[i], MM, ",")
        MA[MM[1]] = MM[2]
    }
}
{
    for (a in MA) {
        printf("MA[%s]=%s\n", a, MA[a])
    }
}'
Output:
MA[a]=b
MA[c]=d
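If the reformatting is needed in more than one place, it can be wrapped in a small helper function; a sketch, where kv2arr is a name of my choosing:
$ echo | awk -v m="a,b;c,d" '
function kv2arr(s, arr,    n, i, P, KV) {   # P and KV are local scratch arrays
    n = split(s, P, ";")
    for (i = 1; i <= n; i++) {
        split(P[i], KV, ",")
        arr[KV[1]] = KV[2]
    }
}
BEGIN { kv2arr(m, MA) }
{ for (a in MA) printf("MA[%s]=%s\n", a, MA[a]) }'
MA[a]=b
MA[c]=d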

Related

Parse variables from string and add them to an array with Bash

In Bash, how can I get the strings between braces (without the _value suffix) from, for example,
"\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*"
and put them into an array?
The result for the above example should be something like:
var_array=("host_name" "host_domain")
The string could also contain other stuff such as:
"${package_updates_count_value} ${package_updates_type_value} updates"
The result for the above example should be something like:
var_array=("package_updates_count" "package_updates_type")
All variables end with _value. There could be 1 or more variables in the string.
Not sure what would be the most efficient way and how I'd best handle this. Regex? Sed?
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
# would also work with cat input or the like.
myarray=($(echo "$input" | awk -F'$' \
'{for(i=1;i<=NF;i++) {match($i, /{([^}]*)_value}/, a); print a[1]}}'))
Split your line(s) on $. Check if a column contains { }. If it does, print what's after { and before _value}. (If not, it will print out the empty string, which bash array creation will ignore.)
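Note that the three-argument form of match() used above is a gawk extension. A sketch of a more portable variant, relying only on the POSIX RSTART/RLENGTH variables that match() sets:
echo "$input" | awk -F'$' '{
    for (i = 1; i <= NF; i++)
        if (match($i, /\{[^}]*_value\}/))
            # strip the leading "{" and the trailing "_value}" (8 characters in total)
            print substr($i, RSTART + 1, RLENGTH - 8)
}'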
If there are only two variables, this will work.
input='\\*\\* ${host_name_value}.${host_domain_value} \\*\\*'
first=$(echo $input | sed -r -e 's/[}].+//' -e 's/.+[{]//')
last=$(echo $input | sed -r -e 's/.+[{]//' -e 's/[}].+//')
output="var_array=(\"$first\" \"$last\")"
Maybe not very efficient and beautiful, but it works well.
Starting with a string variable:
$ str='\\*\\* ${host_name_value}.${host_domain_value} - ${host_ip_value}\\*\\*'
Use grep -o to print all matching words.
$ grep -o '\${\w*_value}' <<< "$str"
${host_name_value}
${host_domain_value}
${host_ip_value}
Then remove ${ and _value}.
$ grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//'
host_name
host_domain
host_ip
Finally, use readarray to safely read the results into an array.
$ readarray -t var_array < <(grep -o '\${\w*_value}' <<< "$str" | sed 's/^\${//; s/_value}$//')
$ declare -p var_array
declare -a var_array=([0]="host_name" [1]="host_domain" [2]="host_ip")
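If you prefer to avoid the extra sed process, the trimming can also be done in pure bash with parameter expansion; a sketch:
var_array=()
while read -r m; do
    m=${m#\$\{}       # strip the leading ${
    m=${m%_value\}}   # strip the trailing _value}
    var_array+=("$m")
done < <(grep -o '\${\w*_value}' <<< "$str")
declare -p var_array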

How do I convert CSV data into an associative array using Bash 4?

The file /tmp/file.csv contains the following:
name,age,gender
bob,21,m
jane,32,f
The CSV file will always have headers.. but might contain a different number of fields:
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
In either case, I want to convert my CSV data into an array of associative arrays..
What I need
So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:
/tmp/file.csv:
name,age,gender
bob,21,m
jane,32,f
It needs to be used in my templating system, like this:
{{foo | csv_to_array | foo2}}
^ this is a fixed API, I must use that syntax .. foo2 must receive the array as standard input.
The csv_to_array func must do its thing, so that afterwards I can do this:
$ declare -p row1; declare -p row2; declare -p new_array;
and it would give me this:
declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")
..Once I have this array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:
{{#new_array}}
Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}
But I'm struggling to generate the arrays I need..
Things I tried:
I have already tried using this as a starting point to get the array structure I need:
while IFS=',' read -r -a my_array; do
echo ${my_array[0]} ${my_array[1]} ${my_array[2]}
done <<< $(cat /tmp/file.csv)
(from Shell: CSV to array)
..and also this:
cat /tmp/file.csv | while read line; do
line=( ${line//,/ } )
echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[#]}"
done
(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)
but I didn't really make any progress in getting what I want out the other end...
EDIT:
Accepted the 2nd answer, but I had to hack the library I am using to make either solution work..
I'll be happy to look at other answers, which do not export the declare commands as strings, to be run in the current env, but instead somehow hoist the resultant arrays of the declare commands to the current env (the current env is wherever the function is run from).
Example:
$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data
So, to be clear, if the above ^ works in a terminal, it'll work in the library I'm using without the hacks I had to add (which involved grepping STDIN for ^declare -a and using source <(cat); eval $STDIN... in other functions)...
See my comments on the 2nd answer for more info.
The approach is straightforward:
Read the column headers into an array
Read the file line by line, in each line …
Create a new associative array and register its name in the array of array names
Read the fields and assign them according to the column headers
In the last step we cannot use read -a, mapfile, or things like these since they only create regular arrays with numbers as indices, but we want an associative array instead, so we have to create the array manually.
However, the implementation is a bit convoluted because of bash's quirks.
The following function parses stdin and creates arrays accordingly.
I took the liberty to rename your array new_array to rowNames.
#! /bin/bash
csvToArrays() {
    IFS=, read -ra header
    rowIndex=0
    while IFS= read -r line; do
        ((rowIndex++))
        rowName="row$rowIndex"
        declare -Ag "$rowName"
        IFS=, read -ra fields <<< "$line"
        fieldIndex=0
        for field in "${fields[@]}"; do
            printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
            printf -v "$rowName[$quotedFieldHeader]" %s "$field"
        done
        rowNames+=("$rowName")
    done
    declare -p "${rowNames[@]}" rowNames
}
Calling the function at the end of a pipe has no lasting effect: bash executes the commands in a pipeline in subshells, so you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function as either one of the following:
csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile # when input comes from a file
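If the function really has to sit at the end of a pipeline, a script can also enable lastpipe, which makes bash run the last pipeline segment in the current shell. A sketch; this needs bash 4.2+ and job control disabled, which is the default in non-interactive scripts:
#!/bin/bash
shopt -s lastpipe
someCommand | csvToArrays   # csvToArrays now runs in the current shell
declare -p rowNames         # the arrays are visible here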
Bash scripts like these tend to be very slow. That's the reason why I didn't bother to extract printf -v quotedFieldHeader … from the inner loop even though it will do the same work over and over again.
I think the whole templating thing and everything related would be way easier to program and faster to execute in languages like python, perl, or something like that.
The following script:
csv_to_array() {
    local -a values
    local -a headers
    local counter
    IFS=, read -r -a headers
    declare -a new_array=()
    counter=1
    while IFS=, read -r -a values; do
        new_array+=( row$counter )
        declare -A "row$counter=($(
            paste -d '' <(
                printf "[%s]=\n" "${headers[@]}"
            ) <(
                printf "%q\n" "${values[@]}"
            )
        ))"
        (( counter++ ))
    done
    declare -p new_array ${!row*}
}
foo2() {
    source <(cat)
    declare -p new_array ${!row*} |
        sed 's/^/foo2: /'
}
echo "==> TEST 1 <=="
cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2
echo "==> TEST 2 <=="
cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2
will output:
==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )
The output comes from the foo2 function.
The csv_to_array function first reads the headers. Then, for each line read, it appends a new element to the new_array array and creates a new associative array named row$index, with elements built by pairing the header names with the values read from the line. At the end, the function emits the output of declare -p.
The foo2 function sources its standard input, so the arrays come into scope for it. It then outputs those values again, prepending each line with foo2:.
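If you want the arrays in the calling shell itself, rather than inside another function, the emitted declare statements can be sourced directly; a sketch using the test file from the question:
$ source <(csv_to_array < /tmp/file.csv)
$ declare -p row1
declare -A row1=([gender]="m" [name]="bob" [age]="21" )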

Iterating over lines (w/ numbers) read from a file to an array in bash

I'm trying to write a small script that will take the 4th columns of a file and store it in an array then do a little comparison. If the element in the array is greater than 0 and less than 500 I have to increment the counter. However when I run the script the counter always shows 0. Here's my script
#!/bin/bash
mapfile -t my_array < <(cat file1.txt | awk '{ print $4 }' > test.txt)
COUNTER=0
for i in ${my_array[@]}; do
if [["${my_array[$i]}" -gt 0 -a "${my_array[$i]}" -lt 500 ]]
then
COUNTER=$((COUNTER + 1))
fi
printf "%s\t%s\n" "%i" "${my_array[$i]}"//just to test if the mapfile command is working
done
echo $COUNTER
output:
./script1.bash
0
#!/bin/bash
mapfile -t my_array < <(awk '{ print $4 }' file1.txt | tee test.txt)
COUNTER=0
for idx in "${!my_array[#]}"; do
value=${my_array[$idx]}
if (( value > 0 )) && (( value < 500 )); then
COUNTER=$((COUNTER + 1))
fi
printf "%s\t%s\n" "$idx" "$value"
done
echo "$COUNTER"
The use of cat here is needless: it added nothing but inefficiency (requiring an extra process to be started, and forcing awk to read from a pipe rather than directly from a file).
mapfile had nothing to read because the output of awk was redirected to test.txt. If you want it to go to both a file and stdout, then you need to use tee.
-a is not valid in [[ ]]; use && instead there. However, since you're doing only arithmetic, (( )) is more appropriate. Incidentally, -a is officially marked obsolescent even for [ ] and test; see the current POSIX standard.
${my_array[@]} iterates over values. If you want to iterate over indexes, you need ${!my_array[@]} instead; see the short demo after this list.
Whitespace is mandatory in separating command names. [["$foo" is a different command from [[, unless $foo is empty or starts with a character in $IFS.
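A quick illustration of the values-versus-indexes point:
arr=(10 20 30)
for v in "${arr[@]}";  do echo "value: $v"; done
for i in "${!arr[@]}"; do echo "index $i holds ${arr[$i]}"; done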
If you redirect the output to a file: > test.txt then there is no output in "standard output" because it is consumed by the file. So, first, you need to remove that redirection. You may use:
mapfile -t my_array < <(cat file1.txt | awk '{ print $4 }' )
But since awk could perfectly well read a file, this is better:
mapfile -t my_array < <(awk '{ print $4 }' file1.txt)
And since you are already using awk, it could do the comparison against 0 and 500 itself and output the final count:
counter=$(awk '{if($4>0 && $4<500){c++}}END{print c+0}' file1.txt)
echo "$counter"
Simpler, faster. (The c+0 ensures 0 is printed, rather than an empty string, when no line matches.)
That will also avoid some simple mistakes in your script, like the missing space in the [[ … ]] construct:
if [[ "${my … # NOT "if [["${my …"
And some missing quotes:
for i in "${my_array[#]}" # NOT for i in ${my_array[#]}
In general, it is a good idea to check your script with ShellCheck.net to remove some simple mistakes.

How to check if a variable is an array?

I was playing with PROCINFO and its sorted_in index to be able to control the array traversal.
Then I wondered what are the contents of PROCINFO, so I decided to go through it and print its values:
$ awk 'BEGIN {for (i in PROCINFO) print i, PROCINFO[i]}'
ppid 7571
pgrpid 14581
api_major 1
api_minor 1
group1 545
gid 545
group2 1000
egid 545
group3 10004
awk: cmd. line:1: fatal: attempt to use array `PROCINFO["identifiers"]' in a scalar context
As you see, it breaks because there is at least one item that is itself an array.
The fast workaround is to skip this one:
awk 'BEGIN {for (i in PROCINFO) {if (i!="identifiers") {print i, PROCINFO[i]}}}'
However it looks a bit hacky and would like to have something like
awk 'BEGIN {for (i in PROCINFO) {if (!(a[i] is array)) {print i, PROCINFO[i]}}}'
^^^^^^^^^^^^^^^^
Since there is no type() function to determine whether a variable is an array or a scalar, I wonder: is there any way to check if an element is an array?
I was thinking of something like looping through it with for and catching the possible error, but I don't know how.
$ awk 'BEGIN{a[1]=1; for (i in a) print i}'
1
$ awk 'BEGIN{a=1; for (i in a) print i}'
awk: cmd. line:1: fatal: attempt to use scalar `a' as an array
$ awk 'BEGIN{a[1]=1; print a}'
awk: cmd. line:1: fatal: attempt to use array `a' in a scalar context
In GNU Awk, there's an answer, but the recommended approach depends on what version you are running.
From GNU Awk 4.2, released in October 2017, there is a new function typeof() to check this, as indicated in the release notes from the beta release:
The new typeof() function can be used to indicate if a variable or array element is an array, regexp, string or number. The isarray() function is deprecated in favor of typeof().
So now you can say:
$ awk 'BEGIN { a[1] = "a"; print typeof(a) }'
array
And perform the check as follows:
$ awk 'BEGIN { a = "a"; if (typeof(a) == "array") print "yes" }'
$ awk 'BEGIN { a[1] = "a"; if (typeof(a) == "array") print "yes" }'
yes
In older versions, you can use isarray():
$ awk 'BEGIN { a = "a"; if (isarray(a)) print "yes" }'
$ awk 'BEGIN { a[1] = "a"; if (isarray(a)) print "yes" }'
yes
From the man page:
isarray(x)
Return true if x is an array, false otherwise.
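Applied to the original PROCINFO loop, the check lets you skip the subarrays instead of special-casing "identifiers". A sketch for gawk 4.2+; on older versions, replace the test with !isarray(PROCINFO[i]):
$ gawk 'BEGIN {
    for (i in PROCINFO)
        if (typeof(PROCINFO[i]) != "array")
            print i, PROCINFO[i]
}'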

Shell - Looping Array with command and increment command values

var1=$(echo $getDate | awk '{print $1} {print $2}')
var2=$(echo $getDate | awk '{print $3} {print $4}')
var3=$(echo $getDate | awk '{print $5} {print $6}')
Instead of repeating like the code above, I need to:
loop the same command
increment the values ({print $1} {print $2})
store the value in an array
I was doing something like below but I am stuck maybe someone can help me please:
COMMAND=`find $locationA -type f | wc -l`
getDate=$(find $locationA -type f | xargs ls -lrt | awk '{print $6} {print $7}')
a=1
b=2
for i in $COMMAND
do
i=$(echo $getDate | awk '{print $a} {print $b}')
myarray+=('$i')
a=$((a+1))
b=$((b+1))
done
PS - using ksh
Problem: $COMMAND stores the number of files found in $locationA. I need to loop through the amount of files found and store their dates in an array.
I don't get the meaning of your example code (what is the 'for' loop supposed to do? What is the content of the variable COMMAND?), but in your question you ask to store something in an array, while in the code you wish to simplify, you don't use an array, but simple variables (var1, var2, ....).
If I understand your requirement correctly, your variable getDate contains a string of several words, which are separated by spaces, and you want to assign the first two words to var1, the following two words to var2, and so on. Is this correct?
Now the edited code is at least a bit clearer, though I still don't understand why you use i as a loop variable and then overwrite it in the first statement inside the loop.
However, a few comments:
If you push '$i' into your array, you will get a literal '$' sign followed by the letter 'i'. To add the variable i containing the two numbers, you need double quotes ("$i").
I don't understand why you want to loop over the content of the variable COMMAND. This variable always holds a single number, which means the loop will execute exactly once.
You could use a counting loop, incrementing the loop variable by 2 on each iteration. You would have to precalculate the number of iterations beforehand.
Perhaps an easier alternative, which works in bash and in zsh (I did not try other shells), is to first turn your variable into an array,
tmparr=($(echo $getDate|fmt -w 1))
and then use a loop to collect pairs of its elements:
myarray=()
for ((i=0; i<${#tmparr[*]}; i+=2))
do
myarray+=("${tmparr[$i]} ${tmparr[$((i+1))]}")
done
${myarray[0]} will hold a string consisting of the first two words from getDate, and so on.
This one should work on zsh, at least with newer versions:
myarray=()
echo $getDate|fmt -w 1|paste -s -d " \n"|while read s; do myarray+=("$s"); done
This leaves the first pair in ${myarray[1]}, etc.
It doesn't work with bash (and old zsh versions), because these shells would execute the body of the loop in a subshell.
ADDED:
On a second thought, in zsh this one would be simpler:
myarray=("${(f)$(echo $g|fmt -w 1|paste -s -d ' \n')}")
