How to find maximum value in a column with awk

I have a file with two sets of data divided by a blank line:
a 3
b 2
c 1

e 5
d 8
f 1
Is there a way to find the maximum value of the second column in each set and print the corresponding line with awk? The result should be:
a 3
d 8
Thank you.

Could you please try the following, written and tested with your shown samples in GNU awk.
awk '
!NF{
if(max!=""){ print arr[max],max }
max=""
}
{
max=( (max<$2) || (max=="") ? $2 : max )
arr[$2]=$1
}
END{
if(max!=""){ print arr[max],max }
}
' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
!NF{ ##if NF is NULL then do following.
if(max!=""){ print arr[max],max } ##Checking if max is SET then print arr[max] and max.
max="" ##Nullifying max here.
}
{
max=( (max<$2) || (max=="") ? $2 : max ) ##If the 2nd field is greater than max (or max is unset), set max to the 2nd field; otherwise keep max.
arr[$2]=$1 ##Creating arr with 2nd field index and 1st field as value.
}
END{ ##Starting END block of this program from here.
if(max!=""){ print arr[max],max } ##Checking if max is SET then print arr[max] and max.
}
' Input_file ##mentioning Input_file name here.
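To see it run end to end, the same script can be fed the question's sample directly, with printf standing in for Input_file:

```shell
# Quick check: pipe the two blank-line-separated sets through the script above;
# it prints the per-set maxima: "a 3" and "d 8".
printf 'a 3\nb 2\nc 1\n\ne 5\nd 8\nf 1\n' | awk '
!NF{                                     # blank line: end of a set
  if (max != "") print arr[max], max
  max = ""
}
{
  max = ((max < $2) || (max == "") ? $2 : max)
  arr[$2] = $1                           # remember which name holds each value
}
END{
  if (max != "") print arr[max], max
}'
```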

You may use this alternative GNU awk:
awk -v RS= '{
max=""
split($0, a, /[^[:space:]]+/, m)
for (i=1; i in m; i+=2)
if (!max || m[i+1] > max) {
mi = i
max = m[i+1]
}
print m[mi], m[mi+1]
}' file
a 3
d 8

Another awk:
$ awk '
!$0 {
print n
m=n=""
}
$2>m {
m=$2
n=$0
}
END {
print n
}' file
Output:
a 3
d 8

Another awk:
$ awk '{cmd="sort -k2nr | head -1"} !NF{close(cmd)} {print | cmd}' file
a 3
d 8
This runs the command for each block to find the block max.

You could try to separate the data sets by doing:
awk -v RS= 'NR == 1 {print}' yourfile > anotherfile
This will return the first data set; then change NR == 1 to NR == 2 to get
the second data set,
and then find the maximum in each data set as suggested in the other answers here.
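Putting both ideas together, here is a minimal sketch: paragraph mode (RS set to the empty string) makes each blank-line-separated set one record, and in that mode newlines also separate fields, so the fields arrive as name,value,name,value pairs (the pairing is assumed from the sample layout):

```shell
# One record per set; walk the fields two at a time to find each set's max.
printf 'a 3\nb 2\nc 1\n\ne 5\nd 8\nf 1\n' | awk -v RS= '{
  max = ""
  for (i = 1; i < NF; i += 2)                 # $i is the name, $(i+1) the value
    if (max == "" || $(i+1) + 0 > max + 0) { name = $i; max = $(i+1) }
  print name, max
}'
```

This prints "a 3" and "d 8" for the sample above; the +0 coercions force a numeric comparison.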

Related

Getting all values of various rows which have the same value in one column with awk

I have a data set (test-file.csv) with three columns:
node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
I'd like to extend the header by the column count and rename node to nodes.
Furthermore, all entries should be sorted by the second column (mail).
In the column count I'd like to get the number of occurrences of the column mail;
in nodes, all the entries having the same value in the column mail should be printed (space-separated and alphabetically sorted).
This is what I try to achieve:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans#anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
I have this awk-command:
awk -F"," '
BEGIN{
FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
counts[$3]++; # Increment count of lines.
contact[$2]; # contact
}
END {
# Iterate over all third-column values.
for (x in counts) {
printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
}
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-(
Nothing but the count of occurrences works.
,dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes
Any help appreciated!
You may use this GNU awk:
awk '
BEGIN {
FS = OFS = ","
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
++counts[$3] # Increment count of lines.
name[$3] = $2
map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
# Iterate over all third-column values.
PROCINFO["sorted_in"]="@ind_str_asc";
for (k in counts)
print name[k], k, counts[k], map[k]
}
' test-file.csv
Output:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
With your shown samples, please try the following, written and tested in GNU awk.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
sub(/^[^,]*,/,"")
$1=$1
print $0,"count,nodes"
}
FNR>1{
nf=$2
mail[nf]=$NF
NF--
arr[nf]++
val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
for(i in arr){
print i,mail[i],arr[i],val[i] | "sort -t, -k1"
}
}
' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ FS=OFS="," } ##In BEGIN section setting FS, OFS as comma here.
FNR==1{ ##if this is first line then do following.
sub(/^[^,]*,/,"") ##Substituting everything till 1st comma here with NULL in current line.
$1=$1 ##Reassigning 1st field to itself.
print $0,"count,nodes" ##Printing headers as per need to terminal.
}
FNR>1{ ##If line is Greater than 1st line then do following.
nf=$2 ##Creating nf with 2nd field value here.
mail[nf]=$NF ##Creating mail with nf as index and value is last field value.
NF-- ##Decreasing value of current number of fields by 1 here.
arr[nf]++ ##Creating arr with index of nf and keep increasing its value with 1 here.
val[nf]=(val[nf]?val[nf] " ":"")$1 ##Creating val with index of nf and keep adding $1 value in it.
}
END{ ##Starting END block of this program from here.
for(i in arr){ ##Traversing through arr in here.
print i,mail[i],arr[i],val[i] | "sort -t, -k1" ##printing values to get expected output and sorting it also by pipe here as per requirement.
}
}
' Input_file ##Mentioning Input_file name here.
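As a quick check, the script above can be fed the sample CSV inline (printf standing in for Input_file; the inessential NF-- is dropped here since $0 is not reused after it):

```shell
# Expected: the rewritten header, then one grouped row per contact,
# sorted by contact via the sort pipe.
printf '%s\n' 'node,contact,mail' \
  'AAAA,Peter,peter@anything.com' \
  'BBBB,Hans,hans@anything.com' \
  'CCCC,Dieter,dieter@anything.com' \
  'ABABA,Peter,peter@anything.com' \
  'CCDDA,Hans,hans@anything.com' |
awk '
BEGIN{ FS=OFS="," }
FNR==1{
  sub(/^[^,]*,/,"")                      # drop the leading "node," from the header
  $1=$1
  print $0,"count,nodes"
}
FNR>1{
  nf=$2
  mail[nf]=$NF                           # mail address per contact
  arr[nf]++                              # occurrence count
  val[nf]=(val[nf]?val[nf] " ":"")$1     # space-separated node list
}
END{
  for(i in arr){
    print i,mail[i],arr[i],val[i] | "sort -t, -k1"
  }
}'
```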
2nd solution: in case you want to sort by the 2nd and 3rd fields, try the following.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
sub(/^[^,]*,/,"")
$1=$1
print $0,"count,nodes"
}
FNR>1{
nf=$2 OFS $3
NF--
arr[nf]++
val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
for(i in arr){
print i,arr[i],val[i] | "sort -t, -k1"
}
}
' Input_file

Counting by analyzing two columns in a difficult pattern in awk, probably by using arrays

I have a huge problem. I am trying to create a script which counts a specific sum (a sum of water bridges, but never mind that). This is a small part of my data file:
POP62 SOL11
KAR1 SOL24
KAR5 SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7 SOL42
KAR1 SOL11
KAR6 SOL31
In the first column I have POP or KAR with numbers, like KAR1, POP17, etc. In the second column I always have SOL with a number, but a given SOL appears at most twice (for example, I can have at most two of SOL42 or SOL11, etc.; KAR and POP can appear more than twice).
And now the thing that I want to do.
If I find that the same SOL is connected with both a KAR and a POP (whatever the numbers), I add 1. For example:
KAR6 SOL5
POP8 SOL5
I add one to the sum.
In my data
POP62 SOL11
KAR1 SOL24
KAR5 SOL31
POP17 SOL42
POP15 SOL2
POP17 SOL2
KAR7 SOL42
KAR1 SOL11
KAR6 SOL31
I should get sum = 2, because
POP17 SOL42
KAR7 SOL42
and
POP62 SOL11
KAR1 SOL11
Do you have any idea how to do that? I am thinking about using NR==FNR, going through the file twice, and checking for repetitions in $2, maybe with an array, but what next?
#!/bin/bash
awk 'NR==FNR ??
some condition {sum++}
END {print sum}' test1.txt{,} >> water_bridges_x2.txt
Edit: solved. Based on the answer below, I also print sum+0 in the END block so that 0 is printed instead of an empty value when there are no matches.
You may try this awk:
awk '
{
s = $1
sub(/[0-9]+$/, "", s) # strip digits from end in var s
if ($2 in map && map[$2] != s) # if existing entry is not same
++sum # increment sum
map[$2] = s
}
END {print sum+0}' file
2
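As a quick sanity check, the same logic can be fed the question's sample block inline (printf standing in for file):

```shell
# Two SOLs (SOL11 and SOL42) are shared between a POP and a KAR entry,
# so the expected sum is 2.
printf '%s\n' 'POP62 SOL11' 'KAR1 SOL24' 'KAR5 SOL31' \
  'POP17 SOL42' 'POP15 SOL2' 'POP17 SOL2' \
  'KAR7 SOL42' 'KAR1 SOL11' 'KAR6 SOL31' |
awk '
{
  s = $1
  sub(/[0-9]+$/, "", s)              # strip trailing digits: POP62 -> POP
  if ($2 in map && map[$2] != s)     # same SOL, different prefix
    ++sum
  map[$2] = s
}
END {print sum+0}'
```

Note that overwriting map[$2] is safe only because, as the question states, each SOL value appears at most twice.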
With your shown samples, here is another way of doing it. Written and tested in GNU awk; it should work in any awk.
awk '
{
match($1,/^[a-zA-Z]+/)
val=substr($1,RSTART,RLENGTH)
if(($2 in arr) && arr[$2]!=val){
sum++
}
arr[$2]=val
}
END{
print sum+0
}
' Input_file
A similar answer to @anubhava's: this uses GNU awk for the multi-dimensional array:
gawk '
{sols[$2][substr($1,1,3)] = 1}
END {
for (sol in sols)
if ("POP" in sols[sol] && "KAR" in sols[sol])
sum++
print sum+0
}
' file
Another solution:
$ sed -E 's/[0-9]+ +/ /' file | # cleanup data
sort -k2 | # sort by key
uniq | # remove dups
uniq -c -f1 | # count by key
egrep '^ +2 ' -c # report the sum where count is 2.
2

Bash: awk output to array

I'm trying to put the contents of an awk command into a bash array; however, I'm having a bit of trouble.
>>test.sh
f_checkuser() {
_l="/etc/login.defs"
_p="/etc/passwd"
## get mini UID limit ##
l=$(grep "^UID_MIN" $_l)
## get max UID limit ##
l1=$(grep "^UID_MAX" $_l)
awk -F':' -v "min=${l##UID_MIN}" -v "max=${l1##UID_MAX}" '{ if ( $3 >= min && $3 <= max && $7 != "/sbin/nologin" ) print $0 }' "$_p"
}
...
Used files:
Sample File: /etc/login.defs
>>/etc/login.defs
### Min/max values for automatic uid selection in useradd
UID_MIN 1000
UID_MAX 60000
Sample File: /etc/passwd
>>/etc/passwd
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
admin:x:1000:1000:Administrator,,,:/home/admin:/bin/bash
daniel:x:1001:1001:Daniel,,,:/home/daniel:/bin/bash
The output looks like:
admin:x:1000:1000:Administrator,,,:/home/admin:/bin/bash
daniel:x:1001:1001:User,,,:/home/user:/bin/bash
and, respectively, with awk ... print $1 }' "$_p":
admin
daniel
Now my problem is to save the awk output in an array to use it as a variable.
>>test.sh
...
f_checkuser
echo "Array items and indexes:"
for index in ${!LOKAL_USERS[*]}
do
printf "%4d: %s\n" $index ${array[$index]}
done
It could/should look like this example.
Array items and indexes:
0: admin
1: daniel
Specifically, I want to get all users of the system (not root, bin, sys, ssh, ...), excluding blocked users, into an array.
Perhaps someone has another idea to solve my problem?
Are you trying to set the output of one script to an array? Bash has a way of doing this. For example,
a=( $(seq 1 10) ); echo ${a[1]}
will populate the array a with elements 1 to 10 and will print 2, the second line generated by seq (array index starts at zero). Simply replace the contents of $(...) with your script.
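Applied to this question, a sketch (hypothetical inline data standing in for /etc/passwd and the f_checkuser pipeline; the unquoted $(...) is word-split, which is safe here because user names contain no whitespace):

```shell
#!/bin/bash
# Hypothetical stand-in for the filtered /etc/passwd lines f_checkuser prints.
passwd_sample='admin:x:1000:1000:Administrator,,,:/home/admin:/bin/bash
daniel:x:1001:1001:Daniel,,,:/home/daniel:/bin/bash'

# Unquoted $(...) splits the awk output on whitespace: one element per name.
users=( $(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 }') )

echo "${#users[@]}"   # 2
echo "${users[0]}"    # admin
```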
For those coming to this years later ...
bash 4 introduced readarray (aka mapfile) exactly for this purpose.
See also Bash capturing output of awk into array
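A minimal sketch of that approach for this question (bash 4+; printf stands in for the f_checkuser | awk pipeline, and process substitution avoids both the temp file and a subshell that would lose the array):

```shell
#!/bin/bash
# readarray (a.k.a. mapfile) stores one input line per array element;
# -t strips the trailing newline from each element.
readarray -t LOKAL_USERS < <(printf 'admin\ndaniel\n')

echo "Array items and indexes:"
for index in "${!LOKAL_USERS[@]}"; do
  printf "%4d: %s\n" "$index" "${LOKAL_USERS[$index]}"
done
```

Unlike piping into a while-read loop, nothing here runs in a subshell, so LOKAL_USERS is still set afterwards.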
One solution that works:
array=()
f_checkuser(){
...
...
tempfile="localuser.tmp"
touch ${tempfile}
awk -F':'...'{... print $1 }' "$_p" > ${HOME}/${tempfile}
getArrayfromFile "${tempfile}"
}
getArrayfromFile() {
i=0
while read line # Read a line
do
array[i]=$line # Put it into the array
i=$(($i + 1))
done < $1
}
f_checkuser
echo "Array items and indexes:"
for index in ${!array[*]}
do
printf "%4d: %s\n" $index ${array[$index]}
done
Output:
Array items and indexes:
0: daniel
1: admin
But I like more to observe without a new temp-file.
So, have someone any another idea without a temp-file?

How to read in csv file to array in bash script

I have written the following code to read my csv file (which has a fixed number of columns but not a fixed number of rows) into my script as an array. It needs to be a shell script.
usernames x1 x2 x3 x4
username1, 5 5 4 2
username2, 6 3 2 0
username3, 8 4 9 3
My code
#!/bin/bash
set oldIFS = $IFS
set IFS=,
read -a line < something.csv
another option I have used is
#!/bin/bash
while IFS=$'\t' read -r -a line
do
echo $line
done < something.csv
For both I tried some test code to see what the size of the array line would be. With the first one I seem to get a size of 10, but the array only outputs the username; with the second one I seem to get a size of 0, but it echoes the whole csv.
Help is much appreciated!
You may consider using awk with a regular expression in the FS variable, like this:
awk 'BEGIN { FS=",?[ \t]*"; } { print $1,"|",$2,"|",$3,"|",$4,"|",$5; }'
or this
awk 'BEGIN { FS=",?[ \t]*"; OFS="|"; } { $1=$1; print $0; }'
($1=$1 is required to rebuild $0 with the new OFS)

Search array for string and return position in unix

I have a two-dimensional array made up of letters in the first dimension and numbers in the second dimension, e.g.
a,1
b,3
c,9
d,8
What I would like to do is search the array for a character and return its corresponding number, e.g. if $var='c' then the return value would be 9.
Being unfamiliar with Unix arrays, I was wondering if anyone knew how to do this simply?
Thanks :)
Here is what I came up with
arr1=(a b c d)
arr2=(1 3 9 8)
for ((index=0; index<${#arr1[@]}; index++)); do
if [ "${arr1[$index]}" = "$myCharacter" ]; then
echo "${arr2[$index]}"
return
fi
done
echo 'Character not found'
Not sure if there is a shorter way to do this, but it works okay...
Assuming you have a file called array.txt with input like you show in the question,
$ var=c
$ awk -v key="$var" -F, '$1 ~ key {print $2; found=1} END { if (! found) { print "Key "key" not found";}}' array.txt
9
$ var=z
$ awk -v key="$var" -F, '$1 ~ key {print $2; found=1} END { if (! found) { print "Key "key" not found";}}' array.txt
Key z not found
You can use bash to prepare an associative array and lookup the value using the character:
declare -A ARR
ARR=( [a]=1 [b]=3 [c]=9 [d]=8 )
echo ${ARR[c]}
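If the pairs live in a file such as array.txt, the associative array can be filled from the file instead of being hardcoded; a sketch, assuming bash 4+ and the comma-separated format shown in the question (a heredoc stands in for the file here):

```shell
#!/bin/bash
# Build the letter -> number lookup table from comma-separated lines,
# then index it by character.
declare -A ARR
while IFS=, read -r key value; do
  ARR[$key]=$value
done <<'EOF'
a,1
b,3
c,9
d,8
EOF

var=c
echo "${ARR[$var]}"   # 9
```

To read from a real file, replace the heredoc with: done < array.txt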
