Access a bash array in awk loop - arrays

I have a bash array like
myarray=(1 2 3 4 5 ... n)
Also I am reading a file with an input of only one line for example:
1 2 3 4 5 ... n
I am reading it line by line into an array and printing it with:
awk 'BEGIN{FS=OFS="\t"}
NR>=1{for (i=1;i<=NF;i++) a[i]+=$i}
END{for (i=1;i<NF;i++) print OFS a[i]}' myfile.txt
myarray has the same size as a. Now myarray starts with the index 0 and a with index 1. My main problem though is how I can pass the bash array to my awk expression so that I can use it inside the print loop with the corresponding elements. So what I tried was this:
awk -v array="${myarray[*]}"
'BEGIN{FS=OFS="\t"}
NR>=1{for (i=1;i<=NF;i++) a[i]+=$i}
END{for (i=1;i<NF;i++) print OFS a[i] OFS array[i-1]}' myfile.txt
This doesn't work though. I don't get any output for myarray. My desired output in this example would be:
1 1
2 2
3 3
4 4
5 5
...
n n

To my understanding, you just need to feed awk with the bash array in a correct way. That is, by using split():
awk -v bash_array="${myarray[*]}" \
'BEGIN{split(bash_array,array); FS=OFS="\t"}
NR>=1{for (i=1;i<=NF;i++) a[i]+=$i}
END{for (i=1;i<=NF;i++) print a[i], array[i]}' file
split() fills the awk array array[] starting at index 1, so array[i] lines up with a[i] and you don't have to worry about the bash indices starting from 0.
Note also that print a,b is the same (and cleaner) as print a OFS b, since you already defined OFS in the BEGIN block.
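As a self-contained check of the split() approach, here is a minimal sketch. The sample values, the temporary file and the variable names result and tmpfile are made up for illustration, and it assumes the default IFS so that "${myarray[*]}" joins the elements with spaces.

```bash
#!/usr/bin/env bash
# Hypothetical sample data: a bash array and a one-line, tab-separated file.
myarray=(10 20 30)
tmpfile=$(mktemp)
printf '1\t2\t3\n' > "$tmpfile"

# "${myarray[*]}" joins the elements with spaces (default IFS assumed).
# split() rebuilds them as a 1-based awk array and returns the element count.
result=$(awk -v bash_array="${myarray[*]}" '
BEGIN { n = split(bash_array, array, " "); FS = OFS = "\t" }
{ for (i = 1; i <= NF; i++) a[i] += $i }
END { for (i = 1; i <= n; i++) print a[i], array[i] }' "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```

Looping up to the count returned by split() rather than NF in the END block is a design choice here: it does not depend on NF still holding the last record's value.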

Related

How do I split a text file into an array by blank lines?

I have a bash command that outputs text in the following format:
Header 1
- Point 1
- Point 2
Header 2
- Point 1
- Point 2
Header 3
-Point 1
- Point 2
...
I want to parse this text into an array, separating on the empty line so that array[0] for example contains:
Header 1
- Point 1
- Point 2
And then I want to edit some of the data in the array if it satisfies certain conditions.
I was looking at something like this Separate by blank lines in bash but I'm completely new to bash so I don't understand how to save the output from awk RS=null to an array instead of printing it out. Could someone please point me in the right direction?
You can use the readarray command to populate a bash array from the output of GNU awk. An empty RS makes awk split records on blank lines, and setting ORS to \0 (the NUL byte) lets readarray -d '' (bash 4.4 or later) tell the records apart:
IFS= readarray -d '' arr < <(awk -v RS= -v ORS='\0' '1' file)
Check output:
echo "${arr[0]}"
Header 1
- Point 1
- Point 2
echo "${arr[1]}"
Header 2
- Point 1
- Point 2
echo "${arr[2]}"
Header 3
-Point 1
- Point 2
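The answer above stops at filling the array. For the editing step the question asks about, a rough sketch might look like the following. It swaps in a plain bash read loop instead of readarray -d '', so that it also runs where bash is older than 4.4 or awk is not GNU awk. The sample input and the edit condition (append a line to the block whose header is "Header 2") are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical input standing in for the command's output.
tmpfile=$(mktemp)
printf 'Header 1\n- Point 1\n\nHeader 2\n- Point 1\n' > "$tmpfile"

# Collect blank-line-separated blocks into arr, one block per element.
arr=()
block=
while IFS= read -r line || [[ -n $line ]]; do
    if [[ -z $line ]]; then
        # A blank line closes the current block, if there is one.
        if [[ -n $block ]]; then arr+=("$block"); fi
        block=
    else
        # Append the line, adding a newline only between lines.
        block+=${block:+$'\n'}$line
    fi
done < "$tmpfile"
if [[ -n $block ]]; then arr+=("$block"); fi

# Edit elements in place by looping over the indices, not the values.
for i in "${!arr[@]}"; do
    if [[ ${arr[i]} == "Header 2"* ]]; then
        arr[i]+=$'\n'"- Point 3"
    fi
done

printf '%s\n---\n' "${arr[@]}"
rm -f "$tmpfile"
```

Iterating over "${!arr[@]}" matters because assigning to the loop variable of a for-in over values would not change the array itself.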

Bash function with array won't work

I am trying to write a function in bash but it won't work. The function is as follows, it gets a file in the format of:
1 2 first 3
4 5 second 6
...
I'm trying to access only the strings in the 3rd word in every line and to fill the array "arr" with them, without repeating identical strings.
When I activated the "echo" command right after the for loop, it printed only the first string in every iteration (in the above case "first").
Thank you!
function storeDevNames {
n=0
b=0
while read line; do
line=$line
tempArr=( $line )
name=${tempArr[2]}
for i in $arr ; do
#echo ${arr[i]}
if [ "${arr[i]}" == "$name" ]; then
b=1
break
fi
done
if [ "$b" -eq 0 ]; then
arr[n]=$name
n=$(($n+1))
fi
b=0
done < $1
}
The following line seems suspicious, because $arr expands only to the first element of the array (${arr[0]}):
for i in $arr ; do
I changed it as follows and it works for me:
#! /bin/bash
function storeDevNames {
n=0
b=0
while read line; do
# line=$line # ?!
tempArr=( $line )
name=${tempArr[2]}
for i in "${arr[@]}" ; do
if [ "$i" == "$name" ]; then
b=1
break
fi
done
if [ "$b" -eq 0 ]; then
arr[n]=$name
(( n++ ))
fi
b=0
done
}
storeDevNames < <(cat <<EOF
1 2 first 3
4 5 second 6
7 8 first 9
10 11 third 12
13 14 second 15
EOF
)
echo "${arr[@]}"
You can replace all of your read block with:
arr=( $(awk '{print $3}' <"$1" | sort | uniq) )
This will fill arr with only unique names from the 3rd word such as first, second, ... This will reduce the entire function to:
function storeDevNames {
arr=( $(awk '{print $3}' <"$1" | sort | uniq) )
}
Note: this will provide a list of all unique device names in sorted order, so the original order is lost. If you need to preserve the original order, with only the duplicates removed, see 4ae1e1's alternative.
You're using the wrong tool. awk is designed for this kind of job.
awk '{ if (!seen[$3]++) print $3 }' <"$1"
This one-liner prints the third column of each line, removing duplicates along the way while preserving the order of lines (only the first occurrence of each unique string is printed). sort | uniq, on the other hand, breaks the original order of lines. The one-liner is also faster than sort | uniq on large files (which doesn't seem to apply in OP's case), since it scans the file once in linear time, while sort has to order the entire input first.
As an example, for an input file with contents
1 2 first 3
4 5 second 6
7 8 third 9
10 11 second 12
13 14 fourth 15
the above awk one-liner gives you
first
second
third
fourth
To put the results in an array:
arr=( $(awk '{ if (!seen[$3]++) print $3 }' <"$1") )
Then echo ${arr[@]} will give you first second third fourth.
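One caveat with arr=( $(...) ) is that the command output goes through word splitting and filename expansion before it lands in the array. A possible variant, assuming bash 4 or later for mapfile, reads one output line per element instead. The temporary file below is made-up sample data; storeDevNames and arr are the names from the question.

```bash
#!/usr/bin/env bash
# Hypothetical sample input, matching the example above.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
1 2 first 3
4 5 second 6
7 8 third 9
10 11 second 12
13 14 fourth 15
EOF

storeDevNames() {
    # mapfile -t stores each output line as one element (newline stripped),
    # without word splitting or globbing; arr is a global array.
    mapfile -t arr < <(awk '!seen[$3]++ { print $3 }' "$1")
}

storeDevNames "$tmpfile"
printf '%s\n' "${arr[@]}"
rm -f "$tmpfile"
```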

How to parse only selected column values using awk

I have a sample flat file which contains the following block
test my array which array is better array huh got it?
INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN
I also have an array(ksh_arr2) which was defined like this
ksh_arr2=$(awk '{if(NR==1){for(i=1;i<=NF;i++){if($i~/^arr/){print i}}}}' testUnix.txt)
and contains the following integers
3 5 8
Now I want to parse only those column values which are at the respective numbered positions i.e. third fifth and eighth.
I also want the outputs from the 2nd line onwards.
So I tried the following
awk '{for(i=1;i<=NF;i++){if(NR >=1 && i=${ksh_arr2[i]}) do print$i ; done}}' testUnix.txt
but it is apparently not printing the desired outputs.
What am I missing ? Please help.
How I would approach it
awk -vA="${ksh_arr2[*]}" 'BEGIN{split(A,B," ")}{for(i in B)print $B[i]}' file
Explanation
-vA="${ksh_arr2[*]}" - Sets the awk variable A to the expanded ksh array
'BEGIN{split(A,B," ") - Splits the expanded array on spaces
(effectively recreating it in awk)
{for(i in B)print $B[i]} - For each index in the new array, prints the field whose number is
stored at that index
Edit
If you want to preserve the order of the fields when printing then this would be better
awk -vA="${ksh_arr2[*]}" 'BEGIN{n=split(A,B," ")}{for(i=1;i<=n;i++)print $B[i]}' file
Since no sample output is shown, I don't know if this output is what you want. It is the output one gets from the code provided with the minimal changes required to get it to run:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<=length(a);i++){print $a[i]}}' testUnix.txt
array
array
array
SA
AUS
ARZ
The above code prints out the selected columns in the same order supplied by the variable k.
Notes
The awk code never defined ksh_arr2. I presume that the value of this array was to be passed in from the shell. It is done here using the -v option to set the variable k to the value of ksh_arr2.
It is not possible to pass an array into awk directly. It is possible to pass in a string, as above, and then convert it to an array using the split function. Above, the string k is converted to the awk array a.
awk syntax is different from shell syntax. For instance, awk does not use do or done.
Details
-v k='3 5 8'
This defines an awk variable k. To do this programmatically, replace 3 5 8 with a string or array from the shell.
BEGIN{split(k,a," ");}
This converts the space-separated values in variable k into an array named a.
for(i=1;i<=length(a);i++){print $a[i]}
This prints out each column in array a in order.
Alternate Output
If you want to keep the output from each line on a single line:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<length(a);i++) printf "%s ",$a[i]; print $a[length(a)]}' testUnix.txt
array array array
SA AUS ARZ
awk 'NR>=1 { print $3 " " $5 " " $8 }' testUnix.txt
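Since the question asks for output from the second line onwards, the pieces above could be combined roughly as follows. This is only a sketch: the temporary file reproduces the sample from the question, and the names cols, result and tmpfile are made up for illustration.

```bash
#!/usr/bin/env bash
# Sample file from the question, written to a temporary location.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
test my array which array is better array huh got it?
INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN
EOF

# Column numbers of the header words starting with "arr", as one string.
cols=$(awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i ~ /^arr/) printf "%s ", i }' "$tmpfile")

# NR > 1 skips the header line. The count returned by split() drives the
# loop, so the columns are printed in the order they were supplied.
result=$(awk -v k="$cols" '
BEGIN { n = split(k, a, " ") }
NR > 1 { for (i = 1; i <= n; i++) printf "%s%s", $(a[i]), (i < n ? " " : "\n") }' "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```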

Picking out elements not in an 1-D array?

I have a 1-D array
x1 = [1, 2, 3, …, 10]
which is stored in the file x1.dat as one record (all on one line), separated by commas. x1.dat reads
1,2,3,4,5,..., 10
And there are two arrays
array1 = [1,3], and array2= [4,7]
(elements in array1 and array2 are some elements of the array x1).
I want to select all the elements which are in neither array1 nor array2.
The desired output will read
2,5,6,8,9,10
I tried with awk:
$awk 'BEGIN{array1 = (1,3); array2 = (4,7)} {for (i=1; i<=NF;i++) if ((!($i in a1)) && (!($i in a2))) {print $i }}' x1.dat
This does not work. Could you please help me to correct it or give a better way to do this selection?
You didn't give the text format of your data file. I assume it is one element per line.
You have a couple of problems in your code.
Variable assignment: you cannot assign an awk array like that.
The in operator checks the array's keys (an awk array is really a hash table), not its values.
It would be easier to put array1 and array2 in a file or an input string rather than in the code, but I am keeping them in the command to show how to solve the problem exactly as you described.
A more readable version:
awk -v arr1="<yourArray1Str>" -v arr2="<yourArray2Str>" \
'BEGIN{
split(arr1,a,",");
split(arr2,b,",");
for(x in a)k[a[x]]=1;
for(x in b)k[b[x]]=1}
!k[$0]' file
with your example:
kent$ cat f
1
2
3
4
5
kent$ awk -v arr1="2,4,3" -v arr2="1,3,4" 'BEGIN{split(arr1,a,",");split(arr2,b,",");for(x in a)k[a[x]]=1;for(x in b)k[b[x]]=1}!k[$0]' f
5
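The answer assumes one element per line. For the layout described in the question (a single comma-separated record), a sketch along the same lines could filter fields instead of whole lines. The sample file below has no spaces after the commas, which is an assumption, and the names excl, skip, result and tmpfile are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical x1.dat: one record, comma-separated, no spaces.
tmpfile=$(mktemp)
printf '1,2,3,4,5,6,7,8,9,10\n' > "$tmpfile"
array1=(1 3)
array2=(4 7)

# Pass both bash arrays as one space-separated string and turn the values
# into keys of skip[], because "in" tests keys rather than values.
result=$(awk -F, -v excl="${array1[*]} ${array2[*]}" '
BEGIN { n = split(excl, e, " "); for (j = 1; j <= n; j++) skip[e[j]] = 1 }
{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++)
        if (!($i in skip)) { out = out sep $i; sep = "," }
    print out
}' "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```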

How to read in csv file to array in bash script

I have written the following code to read in my csv file (which has a fixed number of columns but not a fixed number of rows) into my script as an array. I need it to be a shell script.
usernames x1 x2 x3 x4
username1, 5 5 4 2
username2, 6 3 2 0
username3, 8 4 9 3
My code
#!/bin/bash
set oldIFS = $IFS
set IFS=,
read -a line < something.csv
another option I have used is
#!/bin/bash
while IFS=$'\t' read -r -a line
do
echo $line
done < something.csv
For both I tried some test code to see what the size of the array line would be. With the first one I seem to be getting a size of 10, but the array only outputs username. With the second one I seem to be getting a size of 0, but the array outputs the whole csv.
Help is much appreciated!
You may consider using AWK with a regular expression in FS variable like this:
awk 'BEGIN { FS=",?[ \t]*"; } { print $1,"|",$2,"|",$3,"|",$4,"|",$5; }'
or this
awk 'BEGIN { FS=",?[ \t]*"; OFS="|"; } { $1=$1; print $0; }'
($1=$1 is required to rebuild $0 with new OFS)
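If the rows need to end up in bash arrays rather than in awk output, a read loop along these lines may be enough. It is a sketch that assumes the sample layout from the question (a comma after the username, spaces between the numbers). The arrays names and totals, and the row sum computed per user, are made up for illustration.

```bash
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
usernames x1 x2 x3 x4
username1, 5 5 4 2
username2, 6 3 2 0
username3, 8 4 9 3
EOF

names=()
totals=()
{
    read -r _    # discard the header line
    # IFS=', ' applies to this read only: a comma plus any adjacent spaces
    # counts as one delimiter, so each row yields five fields.
    while IFS=', ' read -r -a fields; do
        names+=("${fields[0]}")
        sum=0
        for v in "${fields[@]:1}"; do
            sum=$(( sum + v ))
        done
        totals+=("$sum")
    done
} < "$tmpfile"

printf '%s %s\n' "${names[0]}" "${totals[0]}"
rm -f "$tmpfile"
```

Setting IFS as a prefix to read keeps the change local to that command, so there is no need to save and restore the old IFS.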

Resources