How to get array dimension in 1 direction in awk multidimension array - arrays

Is there any way to get the length of only one dimension of an awk array, as in PHP?
Look at this simple example:
awk 'BEGIN{
a[1,1]=1;
a[1,2]=2;
a[2,1]=3;
a[2,3]=2;
print length(a)
}'
Here the length of the array is 4, because every element counts as a separate entry. What I want is the number of rows in the array. In my real code the array is filled from n fields like this:
for(i=1;i<=NF;i++)A[FNR,i]=$i
The problem is that the number of fields is not fixed in my file; it varies from row to row, so I cannot even calculate the row count as length(array)/NF.
Is there any solution?

Use GNU awk since it has true multi-dimensional arrays:
awk 'BEGIN{
a[1][1]=1;
a[1][2]=2;
a[1][3]=3;
a[2][1]=4;
a[2][2]=5;
print length(a)
print length(a[1])
print length(a[2])
}'
2
3
2

This can be achieved by counting the unique index values in one dimension of the array. Try something like this:
awk '
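# Count the distinct values found at position fnumber of the SUBSEP-joined keys.
# i, sep, t and c are local variables.
function _get_rowlength(Arr,fnumber, i,sep,t,c){
for(i in Arr){
split(i,sep,SUBSEP)
if(!(sep[fnumber] in t))
{
c++
t[sep[fnumber]]
}
}
return c+0; # 0 rather than an empty string for an empty array
}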
BEGIN{
a[1,1]=1;
a[1,2]=2;
a[2,1]=3;
a[2,3]=2;
print _get_rowlength(a,1)
}'
Resulting
$ ./tester
2
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
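For the fill loop from the question (A[FNR,i]=$i), another option that works in any POSIX awk is to record NF for each row while the array is being filled, so that no key splitting is needed afterwards. The following is only a sketch under that assumption; the helper name row_lengths and the arrays ncols and nrows are made up for illustration and do not come from the answers above.

```shell
# Hypothetical helper: reads lines on stdin, prints the row count and each row width.
row_lengths() {
  awk '
    {
      for (i = 1; i <= NF; i++) A[FNR, i] = $i   # same fill loop as in the question
      ncols[FNR] = NF                             # width of this row
      nrows = FNR                                 # rows read so far
    }
    END {
      print "rows", nrows + 0
      for (r = 1; r <= nrows; r++) print r, ncols[r]
    }'
}

printf '%s\n' 'a b c' 'd e' | row_lengths
```

With the two sample lines this prints "rows 2", then "1 3" and "2 2", i.e. the row count followed by the number of fields stored for each row.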

Related

Bash: how to extract longest directory paths from an array?

I put the output of find command into array like this:
pathList=($(find /foo/bar/ -type d))
How to extract the longest paths found in the array if the array contains several equal-length longest paths?:
echo ${pathList[@]}
/foo/bar/raw/
/foo/bar/raw/2020/
/foo/bar/raw/2020/02/
/foo/bar/logs/
/foo/bar/logs/2020/
/foo/bar/logs/2020/02/
After extraction, I would like to assign /foo/bar/raw/2020/02/ and /foo/bar/logs/2020/02/ to another array.
Thank you
Could you please try the following. It prints the longest paths (there may be several with the same maximum length), which you can then assign to an array. It uses printf '%s\n' rather than echo so that awk sees one path per line.
printf '%s\n' "${pathList[@]}" |
awk -F'/' '{max=max>NF?max:NF;a[NF]=(a[NF]?a[NF] ORS:"")$0} END{print a[max]}'
I just created a test array with values provided by you and tested it as follows:
arr1=($(printf '%s\n' "${pathList[@]}" |\
awk -F'/' '{max=max>NF?max:NF;a[NF]=(a[NF]?a[NF] ORS:"")$0} END{print a[max]}'))
When I print the new array's contents, they are as follows:
printf '%s\n' "${arr1[@]}"
/foo/bar/raw/2020/02/
/foo/bar/logs/2020/02/
Explanation of the awk code:
awk -F'/' ' ##Starting awk program from here and setting field separator as / for all lines.
{
max=max>NF?max:NF ##Keeping in max the largest NF (number of /-separated fields) seen so far.
a[NF]=(a[NF]?a[NF] ORS:"")$0 ##Appending the current line to a[NF], separating entries with a newline (ORS).
}
END{ ##Starting END section of this program.
print a[max] ##Printing value of array a with index of variable max.
}'
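The same idea can be written without keeping one array entry per depth. The sketch below is a variant, not the code from the answer: it only keeps the paths at the deepest level seen so far and starts the list again whenever a deeper path shows up. The function name longest_paths is hypothetical. As in the answer, "longest" means the largest number of /-separated components, not the largest number of characters.

```shell
# Hypothetical helper: reads one path per line, prints the deepest ones.
longest_paths() {
  awk -F'/' '
    NF > max  { max = NF; out = "" }                    # deeper path: restart the list
    NF == max { out = out (out == "" ? "" : ORS) $0 }   # same depth: append to the list
    END       { if (out != "") print out }'
}

printf '%s\n' /foo/bar/raw/ /foo/bar/raw/2020/ /foo/bar/raw/2020/02/ \
              /foo/bar/logs/ /foo/bar/logs/2020/ /foo/bar/logs/2020/02/ | longest_paths
```

For the sample paths from the question this prints /foo/bar/raw/2020/02/ and /foo/bar/logs/2020/02/, one per line.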

Split a string directly into array

Suppose I want to pass a string to awk so that once I split it (on a pattern) the substrings become the indexes (not the values) of an associative array.
Like so:
$ awk -v s="A:B:F:G" 'BEGIN{ # easy, but can these steps be combined?
split(s,temp,":") # temp[1]="A",temp[2]="B"...
for (e in temp) arr[temp[e]] #arr["A"], arr["B"]...
for (e in arr) print e
}'
A
B
F
G
Is there an awkism or gawkism that would allow the string s to be split directly into its components, with those components becoming the index entries in arr?
The reason (bigger picture) is that I want something like this (pseudo awk):
awk -v s="1,4,55" 'BEGIN{[arr to arr["1"],arr["4"],arr["55"]} $3 in arr {action}'
No, there is no better way to map separated substrings to array indices than:
split(str,tmp); for (i in tmp) arr[tmp[i]]
FWIW if you don't like that approach for doing what your final pseudo-code does:
awk -v s="1,4,55" 'BEGIN{split(s,tmp,/,/); for (i in tmp) arr[tmp[i]]} $3 in arr{action}'
then another way to get the same behavior is
awk -v s=",1,4,55," 'index(s,","$3","){action}'
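If the two-step split-then-loop is needed in several places, it can be hidden behind a small user-defined function. This is only a sketch of that idea in POSIX awk; split_keys and filter_by_third are hypothetical names, and the action is assumed to be the default one (print the line).

```shell
# Hypothetical wrapper: $1 is a comma-separated list of accepted values for field 3.
filter_by_third() {
  awk -v s="$1" '
    # Split str on sep and store the pieces as keys (not values) of set.
    function split_keys(str, set, sep,    tmp, n, i) {
      n = split(str, tmp, sep)
      for (i = 1; i <= n; i++) set[tmp[i]]
      return n
    }
    BEGIN { split_keys(s, arr, ",") }
    $3 in arr'
}

printf '%s\n' 'a b 1' 'c d 2' 'e f 55' | filter_by_third '1,4,55'
```

With this input only the lines "a b 1" and "e f 55" are printed, because 2 is not one of the keys.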
Probably useless and unnecessarily complex but I'll open the game with while, match and substr:
$ awk -v s="A:B:F:G" '
BEGIN {
while(match(s,/[^:]+/)) {
a[substr(s,RSTART,RLENGTH)]
s=substr(s,RSTART+RLENGTH)
}
for(i in a)
print i
}'
A
B
F
G
I'm eager to see some useful solutions, if there are any. I tried playing around with asort and such.
Another way, a kind of awkism:
cat file
1 hi
2 hello
3 bonjour
4 hola
5 konichiwa
Run it,
awk 'NR==FNR{d[$1]; next}$1 in d' RS="," <(echo "1,2,4") RS="\n" file
you get,
1 hi
2 hello
4 hola

Sorting array in shell using awk

I need to sort this file in descending order avoiding duplicates
Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404
So my result should be
Bob 102
Mike 96
Joe 7
What I have now is this
awk '{if($3 == 404) arr[$1]+=$2}END{for(i in arr)print i, arr[i]}' file
I know that there is sort -d, but how do I use it in awk?
UPDATE
awk 'BEGIN{FS=" "}{if($9 == 404) arr[$1]+=1}END{for(i in arr) print arr[i] | sort -k2nr }' input > output
I get this result
sh: 0: not found
And my output file is now empty.
Reuben L.'s answer contains the right pointers, but doesn't spell out the full solutions:
The POSIX-compliant solution spelled out:
You need to pipe the output from awk to the sort utility, outside of awk:
awk '{ if($3 == 404) arr[$1]+=$2 } END{ for (i in arr) print i, arr[i] }' input |
sort -rn -k2,2 > output
Note the specifics of the sort command:
-r performs reverse sorting
-n performs numeric sorting
-k2,2 sorts by the 2nd whitespace-separated field only
by contrast, only specifying -k2 would sort starting from the 2nd field through the remainder of the line - doesn't make a difference here, since the 2nd field is the last field, but it's an important distinction in general.
Note that there's really no benefit to using the nonstandard -V option to get numeric sorting, as -n will do just fine; -V's true purpose is to perform version-number sorting.
Note that you could include the sort command inside your awk script - for(i in arr)print i, arr[i] | "sort -nr -k2,2" - note the " around the sort command - but there's little benefit to doing so.
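A minimal sketch of the "sort inside awk" variant, using the sample data from the question. In the UPDATE attempt the sort command is not quoted, so awk most likely evaluates sort -k2nr as the variable sort minus the variable k2nr, which is 0, and tries to pipe to a command named 0; that would explain the "sh: 0: not found" message. The function name sum_404 and the variable cmd are made up for this example.

```shell
# Hypothetical helper: sums column 2 per name for lines whose third field is 404.
sum_404() {
  awk '
    $3 == 404 { total[$1] += $2 }
    END {
      cmd = "sort -k2,2nr"                 # the command must be a string
      for (name in total) print name, total[name] | cmd
      close(cmd)                           # wait for sort to finish
    }'
}

sum_404 <<'EOF'
Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404
EOF
```

This prints Bob 102, Mike 96 and Joe 7, in that order. Keeping the command in a variable makes it easy to pass exactly the same string to close().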
The GNU awk asort() solution spelled out:
gawk '
{ if ($3 == 404) arr[$1]+=$2 } # build array
END{
for (k in arr) { amap[arr[k]] = k } # create value-to-key(!) map
asort(arr, asorted, "@val_num_desc") # sort values numerically, in descending order
# print in sort order
for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i]
}
' input > output
As you can see, this complicates the solution, because 2 extra arrays must be created:
for (k in arr) { amap[arr[k]] = k } creates the "inverse" of the original array in amap: it uses the values of the original array as keys and the corresponding keys as the values. This only works as intended if the sums are unique; two names with the same sum end up sharing a single amap entry.
asort(arr, asorted, "@val_num_desc") then sorts the original array by its values in descending, numerical order ("@val_num_desc") and stores the result in new array asorted.
Note that the original keys are lost in the process: asorted keys are now numerical indices reflecting the sort order.
for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i] then enumerates asorted by sequential numerical index, which yields the desired sort order; amap[asorted[i]] returns the matching key (e.g., Bob) from the original array for the value at hand.
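GNU awk 4.0 and later also lets you control the traversal order of for (k in arr) through PROCINFO["sorted_in"], which avoids both extra arrays and does not depend on an inverse map. The following is a sketch of that alternative, not the code from the answer; it requires gawk, and sum_404_gawk is a hypothetical name.

```shell
# Requires GNU awk 4.0 or later; sum_404_gawk is a hypothetical name.
sum_404_gawk() {
  gawk '
    $3 == 404 { total[$1] += $2 }
    END {
      PROCINFO["sorted_in"] = "@val_num_desc"   # visit elements by value, largest first
      for (name in total) print name, total[name]
    }'
}

# Usage: sum_404_gawk < input > output
```

Because the loop walks the original array directly, every name is printed with its own sum even when several names have equal sums.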
Two possible solutions:
Use gawk and the built-in asort() and asorti() functions
Pipe the output of your awk command to sort -k2 -Vr. This will sort descending by the second column.
Note: the -V flag is non-standard and is available in GNU sort. Credit to Jonathan Leffler.

How to parse only selected column values using awk

I have a sample flat file which contains the following block
test my array which array is better array huh got it?
INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN
I also have an array(ksh_arr2) which was defined like this
ksh_arr2=$(awk '{if(NR==1){for(i=1;i<=NF;i++){if($i~/^arr/){print i}}}}' testUnix.txt)
and contains the following integers
3 5 8
Now I want to parse only those column values which are at the respective numbered positions i.e. third fifth and eighth.
I also want the output from the 2nd line onwards.
So I tried the following
awk '{for(i=1;i<=NF;i++){if(NR >=1 && i=${ksh_arr2[i]}) do print$i ; done}}' testUnix.txt
but it is apparently not printing the desired outputs.
What am I missing ? Please help.
How I would approach it:
awk -vA="${ksh_arr2[*]}" 'BEGIN{split(A,B," ")}{for(i in B)print $B[i]}' file
Explanation
-vA="${ksh_arr2[*]}" - Sets the awk variable A to the expanded ksh array
'BEGIN{split(A,B," ") - Splits the expanded string on whitespace
(effectively recreating the array in awk)
{for(i in B)print $B[i]} - For each index in the new array, prints the field whose number
is stored at that index
Edit
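If you want to preserve the order of the fields when printing, loop over the indices numerically instead, since for (i in B) does not guarantee any particular order. The counter has to restart at 1 for every input line:
awk -vA="${ksh_arr2[*]}" 'BEGIN{n=split(A,B," ")}{for(i=1;i<=n;i++)print $B[i]}' file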
Since no sample output is shown, I don't know if this output is what you want. It is the output one gets from the code provided with the minimal changes required to get it to run:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<=length(a);i++){print $a[i]}}' testUnix.txt
array
array
array
SA
AUS
ARZ
The above code prints out the selected columns in the same order supplied by the variable k.
Notes
The awk code never defined ksh_arr2. I presume that the value of this array was to be passed in from the shell. It is done here using the -v option to set the variable k to the value of ksh_arr2.
It is not possible to pass into awk an array directly. It is possible to pass in a string, as above, and then convert it to an array using the split function. Above the string k is converted to the awk array a.
awk syntax is different from shell syntax. For instance, awk does not use do or done.
Details
-v k='3 5 8'
This defines an awk variable k. To do this programmatically, replace 3 5 8 with a string or array from the shell.
BEGIN{split(k,a," ");}
This converts the space-separated values in variable k into an array named a.
for(i=1;i<=length(a);i++){print $a[i]}
This prints out each column in array a in order.
Alternate Output
If you want to keep the output from each line on a single line:
$ awk -v k='3 5 8' 'BEGIN{split(k,a," ");} {for(i=1;i<length(a);i++) printf "%s ",$a[i]; print $a[length(a)]}' testUnix.txt
array array array
SA AUS ARZ
awk 'NR>=1 { print $3 " " $5 " " $8 }' testUnix.txt
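The question also asks for output from the 2nd line onwards, which the commands above do not restrict. A sketch that combines the ordered column list with NR > 1 could look like this; pick_columns, col and line are names invented for the example, and the column list is assumed to be a whitespace-separated string such as the contents of ksh_arr2.

```shell
# Hypothetical helper: $1 is a whitespace-separated list of column numbers.
pick_columns() {
  awk -v k="$1" '
    BEGIN { n = split(k, col, " ") }        # col[1..n] keeps the requested order
    NR > 1 {                                # skip the header line
      line = ""
      for (i = 1; i <= n; i++) line = line (i > 1 ? OFS : "") $(col[i])
      print line
    }'
}

printf '%s\n' 'test my array which array is better array huh got it?' \
              'INDIA USA SA NZ AUS ARG ARM ARZ GER BRA SPN' | pick_columns '3 5 8'
```

For the sample file this prints the single line "SA AUS ARZ".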

Picking out elements not in an 1-D array?

I have a 1-D array
x1 = [1, 2, 3, …, 10]
which is stored in the file x1.dat as one record (all on one line), separated by commas. x1.dat reads
1,2,3,4,5,..., 10
And there are two arrays
array1 = [1,3], and array2= [4,7]
(elements in array1 and array2 are some elements of the array x1).
I want to select all the elements which are neither in array1 nor in array2.
The desired output will read
2,5,6,8,9,10
I tried with awk:
$awk 'BEGIN{array1 = (1,3); array2 = (4,7)} {for (i=1; i<=NF;i++) if ((!($i in a1)) && (!($i in a2))) {print $i }}' x1.dat
This does not work. Could you please help me to correct it or give a better way to do this selection?
You didn't give the text format of your data file. I assume it is one element per line.
You have a couple of problems in your code:
Variable assignment: you cannot assign an awk array like that.
The in operator checks the array's keys (an awk array is actually a hash table), not its values.
It would be easier to put array1 and array2 in a file or an input string rather than in the code, but I am keeping them there to show how to solve the problem exactly as you described it.
A more readable version:
awk -v arr1="<yourArray1Str>" -v arr2="<yourArray2Str>" \
'BEGIN{
split(arr1,a,",");
split(arr2,b,",");
for(x in a)k[a[x]]=1;
for(x in b)k[b[x]]=1}
!k[$0]' file
with your example:
kent$ cat f
1
2
3
4
5
kent$ awk -v arr1="2,4,3" -v arr2="1,3,4" 'BEGIN{split(arr1,a,",");split(arr2,b,",");for(x in a)k[a[x]]=1;for(x in b)k[b[x]]=1}!k[$0]' f
5
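The question states that x1.dat holds all values on one comma-separated line, so here is a sketch of the same key-lookup idea adapted to that layout. It assumes the elements are plain numbers and that both lists are non-empty; exclude_values, skip and out are hypothetical names. Adding 0 to each piece strips stray spaces such as the one in ", 10".

```shell
# Hypothetical helper: $1 and $2 are comma-separated lists of values to drop.
exclude_values() {
  awk -F',' -v arr1="$1" -v arr2="$2" '
    BEGIN {
      n = split(arr1 "," arr2, tmp, ",")
      for (i = 1; i <= n; i++) skip[tmp[i] + 0]     # values become keys
    }
    {
      out = ""
      for (i = 1; i <= NF; i++) {
        v = $i + 0                                  # numeric form, spaces removed
        if (!(v in skip)) out = out (out == "" ? "" : ",") v
      }
      print out
    }'
}

printf '%s\n' '1,2,3,4,5,6,7,8,9, 10' | exclude_values '1,3' '4,7'
```

This prints 2,5,6,8,9,10, the output requested in the question.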
