Sorting array in shell using awk - arrays

I need to sort this file in descending order avoiding duplicates
Bob 5 404
Mike 3 404
Bob 19 404
Bob 78 404
Mike 93 404
Joe 7 404
So my result should be
Bob 102
Mike 96
Joe 7
What I have now is this
awk '{if($3 == 404) arr[$1]+=$2}END{for(i in arr)print i, arr[i]}' file
I know there is sort -d, but how do I use it from within awk?
UPDATE
awk 'BEGIN{FS=" "}{if($9 == 404) arr[$1]+=1}END{for(i in arr) print arr[i] | sort -k2nr }' input > output
I get this result
sh: 0: not found
And my output file is now empty.

Reuben L.'s answer contains the right pointers, but doesn't spell out the full solutions:
The POSIX-compliant solution spelled out:
You need to pipe the output from awk to the sort utility, outside of awk:
awk '{ if($3 == 404) arr[$1]+=$2 } END{ for (i in arr) print i, arr[i] }' input |
sort -rn -k2,2 > output
Note the specifics of the sort command:
-r performs reverse sorting
-n performs numeric sorting
-k2,2 sorts by the 2nd whitespace-separated field only
by contrast, only specifying -k2 would sort starting from the 2nd field through the remainder of the line - doesn't make a difference here, since the 2nd field is the last field, but it's an important distinction in general.
Note that there's really no benefit to using the nonstandard -V option to get numeric sorting, as -n will do just fine; -V's true purpose is to perform version-number sorting.
Note that you could include the sort command inside your awk script - for(i in arr)print i, arr[i] | "sort -nr -k2,2" - note the " around the sort command - but there's little benefit to doing so.
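For completeness, here is a minimal sketch of that in-awk variant, fed with the sample data from the question. The function name sum_404 and the variable cmd are made up for illustration. The key points are that the sort command is a quoted string and that it is closed once all lines have been written to it.

```shell
# Sum field 2 per name for lines whose 3rd field is 404, sorted by total (descending).
# Reads from stdin; the sort runs as a child process started by awk.
sum_404() {
  awk '
    $3 == 404 { arr[$1] += $2 }          # sum field 2 per name
    END {
      cmd = "sort -nr -k2,2"             # the command must be a string
      for (i in arr) print i, arr[i] | cmd
      close(cmd)                         # flush and wait for sort to finish
    }
  '
}

printf '%s\n' 'Bob 5 404' 'Mike 3 404' 'Bob 19 404' 'Bob 78 404' 'Mike 93 404' 'Joe 7 404' | sum_404
# prints:
# Bob 102
# Mike 96
# Joe 7
```

The unquoted form in the question's UPDATE fails because awk evaluates sort -k2nr as an expression rather than as a command name, which is why the shell ended up being asked to run 0.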
The GNU awk asort() solution spelled out:
gawk '
{ if ($3 == 404) arr[$1]+=$2 } # build array
END{
for (k in arr) { amap[arr[k]] = k } # create value-to-key(!) map
asort(arr, asorted, "@val_num_desc") # sort values numerically, in descending order
# print in sort order
for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i]
}
' input > output
As you can see, this complicates the solution, because 2 extra arrays must be created:
for (k in arr) { amap[arr[k]] = k } creates the "inverse" of the original array in amap: it uses the values of the original array as keys and the corresponding keys as the values.
asort(arr, asorted, "@val_num_desc") then sorts the original array by its values in descending, numerical order ("@val_num_desc") and stores the result in new array asorted.
Note that the original keys are lost in the process: asorted keys are now numerical indices reflecting the sort order.
for (i=1; i<=length(asorted); ++i) print amap[asorted[i]], asorted[i] then enumerates asorted by sequential numerical index, which yields the desired sort order; amap[asorted[i]] returns the matching key (e.g., Bob) from the original array for the value at hand.
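One caveat with the value-to-key map: if two names end up with the same total, the later one overwrites the earlier one in amap, so one name is printed twice and the other is lost. A possible alternative, sketched here under the assumption that gawk 4.0 or later is available, is to set PROCINFO["sorted_in"] so that for (k in arr) itself walks the array in value order. The function name sum_404_sorted is made up for illustration.

```shell
# Requires GNU awk (gawk 4.0+); reads from stdin.
sum_404_sorted() {
  gawk '
    $3 == 404 { arr[$1] += $2 }                 # build array
    END {
      PROCINFO["sorted_in"] = "@val_num_desc"   # controls for-in traversal order
      for (k in arr) print k, arr[k]            # keys stay attached to their values
    }
  '
}

if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' 'Bob 5 404' 'Ann 5 404' 'Mike 96 404' | sum_404_sorted
else
  echo "gawk is required for this sketch" >&2
fi
```

With this input Mike 96 comes first, and both Bob and Ann are printed even though their totals are equal.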

Two possible solutions:
Use gawk and the built-in asort() and asorti() functions
Pipe the output of your awk command to sort -k2 -Vr. This will sort descending by the second column.
note: the -V flag is non-standard and is available for GNU sort. credits to Jonathan Leffler

Getting first index of bash array [duplicate]

Is there a bash way to get the index of the nth element of a sparse bash array?
printf "%s\t" ${!zArray[@]} | cut -f$N
Using cut to index the indexes of an array seems excessive, especially in reference to the first or last.
If getting the index is only a step towards getting the entry then there is an easy solution: Convert the array into a dense (= non-sparse) array, then access those entries …
sparse=([1]=I [5]=V [10]=X [50]=L)
dense=("${sparse[@]}")
printf %s "${dense[2]}"
# prints X
Or as a function …
nthEntry() {
shift "$1"
shift
printf %s "$1"
}
nthEntry 2 "${sparse[@]}"
# prints X
Assuming (just like you did) that the list of keys "${!sparse[@]}" expands in sorted order (I found neither guarantees nor warnings in bash's manual, therefore I opened another question) this approach can also be used to extract the nth index without external programs like cut.
indices=("${!sparse[@]}")
echo "${indices[2]}"
# prints 10 (the index of X)
nthEntry 2 "${!sparse[@]}"
# prints 10 (the index of X)
If I understood your question correctly, you may use it like this using read:
# sparse array
declare -a arr=([10]="10" [15]="20" [21]="30" [34]="40" [47]="50")
# desired index
n=2
# read all indices into an array
read -ra iarr < <(printf "%s\t" ${!arr[@]})
# find nth element
echo "${arr[${iarr[n]}]}"
30
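Since the question singles out the first and last index, here is a minimal sketch for those two cases without cut. It relies on the same assumption as above, namely that "${!sparse[@]}" expands in ascending index order. The variable names first_index and last_index are made up for illustration.

```shell
#!/bin/bash
sparse=([1]=I [5]=V [10]=X [50]=L)

# Copy the keys into a dense array so they can be addressed by position
indices=("${!sparse[@]}")

first_index=${indices[0]}
last_index=${indices[${#indices[@]}-1]}   # avoids relying on negative subscripts

echo "first: $first_index, last: $last_index"
# prints: first: 1, last: 50
```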

bash shell pull all values from array in random order. Pull each value only once

I need to pull the values from an array in a random order. It shouldn't pull the same value twice.
R=$(($RANDOM%5))
mixarray=("I" "like" "to" "play" "games")
echo ${mixarray[$R]}
I'm not sure what to do after the code above. I thought of putting the first pulled value into another array, and then nesting it all in a loop that checks that second array so it doesn't pull the same value twice from the first array. After many attempts, I just can't get the syntax right.
The output should be something like:
to
like
I
play
games
Thanks,
Would you please try the following:
#!/bin/bash
mixarray=("I" "like" "to" "play" "games")
mapfile -t result < <(for (( i = 0; i < ${#mixarray[@]}; i++ )); do
echo -e "${RANDOM}\t${i}"
done | sort -nk1,1 | cut -f2 | while IFS= read -r n; do
echo "${mixarray[$n]}"
done)
echo "${result[*]}"
First, the loop prints a random number and an index (starting at 0) side by side.
This is repeated once for each element of mixarray.
The output will look like:
13959 0
6416 1
6038 2
492 3
19893 4
Then the table is sorted with the 1st field:
492 3
6038 2
6416 1
13959 0
19893 4
Now the indices in the 2nd field are randomized. The field is extracted with
the cut command.
Next, the elements of mixarray are printed in the order given by the randomized indices.
Finally, mapfile reads that output into the array result, which is then printed.
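If you would rather avoid the external sort and cut, a different technique is an in-place Fisher-Yates shuffle in pure bash. This is only a sketch: the function name shuffle is made up, and RANDOM % (i + 1) has a slight modulo bias, which should not matter for small arrays like this one.

```shell
#!/bin/bash
# Fisher-Yates shuffle: walk from the last slot down to the second,
# swapping each slot with a randomly chosen slot at or below it.
shuffle() {
  local -a items=("$@")
  local i j tmp
  for (( i = ${#items[@]} - 1; i > 0; i-- )); do
    j=$(( RANDOM % (i + 1) ))   # random position in 0..i
    tmp=${items[i]}
    items[i]=${items[j]}
    items[j]=$tmp
  done
  printf '%s\n' "${items[@]}"   # one element per line, each exactly once
}

mixarray=("I" "like" "to" "play" "games")
shuffle "${mixarray[@]}"
```

Every element is printed exactly once because the loop only ever swaps elements and never copies or drops one.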

How does bash array slicing work when start index is not provided?

I'm looking at a script, and I'm having trouble determining what is going on.
Here is an example:
# Command to get the last 4 occurrences of a pattern in a file
lsCommand="ls /my/directory | grep -i my_pattern | tail -4"
# Load the results of that command into an array
dirArray=($(echo $(eval $lsCommand) | tr ' ' '\n'))
# What is this doing?
yesterdaysFileArray=($(echo ${x[@]::$((${#x[@]} / 2))} | tr ' ' '\n'))
There is a lot going on here. I understand how arrays work, but I don't know how $x is getting referenced if it was never declared.
I see that the $((${#x[@]} / 2)) is taking the number of elements and dividing it in half, and the tr is used to create the array. But what else is going on?
I think the last line is a bash array slice of the form ${array[@]:1:2}, where array[@] expands to the contents of the array and :1:2 takes a slice of length 2, starting at index 1.
In your case the start index is empty because none was specified, so the slice starts at index 0, and the length is half the number of elements in the array.
But there is a much better way to do this in bash, shown below. Don't use eval; use the shell's built-in globbing support instead.
cd /my/directory
fileArray=()
for file in *my_pattern*; do
[[ -f "$file" ]] || { printf '%s\n' 'no file found'; return 1; }
fileArray+=( "$file" )
done
and then print the first half of the array:
printf '%s\n' "${fileArray[@]::${#fileArray[@]}/2}"
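To see the slice syntax in isolation, here is a small sketch with a throwaway array. The names x, first_half and middle are only for illustration.

```shell
#!/bin/bash
x=(a b c d e f)

# General form: ${x[@]:offset:length}. An empty offset is evaluated as 0,
# and the length is an arithmetic expression.
first_half=("${x[@]::${#x[@]}/2}")   # 6/2 = 3 elements, starting at index 0
middle=("${x[@]:1:2}")               # 2 elements, starting at index 1

echo "${first_half[*]}"   # prints: a b c
echo "${middle[*]}"       # prints: b c
```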

How to get array dimension in 1 direction in awk multidimension array

Is there any way to get only one dimension length in awk array like in php
look at this simple example
awk 'BEGIN{
a[1,1]=1;
a[1,2]=2;
a[2,1]=3;
a[2,3]=2;
print length(a)
}'
Here the length of the array is 4, because each element counts as a separate entry. What I want is the number of rows in the array. In my real code I have n fields and populate the array like this:
for(i=1;i<=NF;i++)A[FNR,i]=$i
The problem is that the number of fields is not fixed in my file; it can vary from row to row, so I cannot even calculate it as length(array)/NF.
Is there any solution ?
Use GNU awk, since it has true multi-dimensional arrays:
awk 'BEGIN{
a[1][1]=1;
a[1][2]=2;
a[1][3]=3;
a[2][1]=4;
a[2][2]=5;
print length(a)
print length(a[1])
print length(a[2])
}'
2
3
2
This can be achieved by counting the unique values of one index component in the array. Try something like this:
awk '
function _get_rowlength(Arr,fnumber, i,t,c){
for(i in Arr){
split(i,sep,SUBSEP)
if(!(sep[fnumber] in t))
{
c++
t[sep[fnumber]]
}
}
return c;
}
BEGIN{
a[1,1]=1;
a[1,2]=2;
a[2,1]=3;
a[2,3]=2;
print _get_rowlength(a,1)
}'
Resulting
$ ./tester
2
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
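When the array is filled from input with A[FNR,i]=$i as in the question, another option is to record the row count and each row's width while reading, instead of recovering them from the keys afterwards. This is a sketch in plain POSIX awk; the names count_rows, ncols and nrows are made up for illustration.

```shell
# Reads rows from stdin and reports the row count and the width of each row.
count_rows() {
  awk '
    {
      for (i = 1; i <= NF; i++) A[FNR, i] = $i   # same layout as in the question
      ncols[FNR] = NF                            # width of this row
      nrows = FNR                                # last record number seen so far
    }
    END {
      print "rows:", nrows + 0
      for (r = 1; r <= nrows; r++) print "row", r, "has", ncols[r], "fields"
    }
  '
}

printf '%s\n' 'a b c' 'd e' 'f g h i' | count_rows
# prints:
# rows: 3
# row 1 has 3 fields
# row 2 has 2 fields
# row 3 has 4 fields
```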
