print records using awk with array

print records using awk with array - arrays

/stu/class/sub/med/science score: 100 name: A status: Pass Roll: 12
/stu/class/sub/med/hist score: 75 name: B status: Pass Roll: 13
/stu/class/sub/med/comp score: 96 name: C status: Pass Roll: 14
/stu/class/sub/med/geo score: 40 name: D status: Fail Roll:15
/stu/class/sub/med/mat score: 100 name: D status: Pass Roll:16
I have above details in a file say "input.txt" and i want to print the $1 if those of scored greater than 95. In this case, I should get the following output.
science score: 100
comp score: 96
mat score: 100
Can someone help me to code this. I am trying to use awk with array. So far i have done this, but did not yet complete.
declare -A list
i=1
while read line; do
list[$i]=$(echo $line | awk '{print $3}')
((i++));
done < input.txt
for i in "${list[#]}"; do
echo "$i";
if ...
done

It is as simple as this:
awk '$3 > 95 {sub(/.*\//,"",$1); print $1, $2, $3}' test.txt
awk separates lines by spaces by default. The third field is the actual score, so this is where you check for a value greater than 95. To get the output you stated, you have to print the first three fields, and not only the first (which would be the path only).
Update: Thanks to #jaypal for the comment on how to remove the path.
Note however, that this only works if you don't have any spaces in your path- and filenames.

$ awk -F'[/ ]+' '$8>95{print $6, $7, $8}' file
science score: 100
comp score: 96
mat score: 100

Try this:
awk '{if ($3 > 95) print $1, $2, $3;}' < input.txt
Redirection is optional therefore, the following will work as well.
awk '{if ($3 > 95) print $1, $2, $3}' input.txt

Related

Computing sum of specific field from array entries

I have an array trf. Would like to compute the sum of the second element in each array entry.
Example of array contents
trf=( "2 13 144" "3 21 256" "5 34 389" )
Here is the current implementation, but I do not find it robust enough. For instance, it fails with arbitrary number of elements (but considered constant from one array element to another) in each array entry.
cnt=0
m=${#trf[#]}
while (( cnt < m )); do
while read -r one two three
do
sum+="$two"+
done <<< $(echo ${array[$count]})
let count=$count+1
done
sum+=0
result=`echo "$sum" | /usr/bin/bc -l`

You're making it way too complicated. Something like
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )
declare -i sum=0 # Integer attribute; arithmetic evaluation happens when assigned
for (( n = 0; n < ${#trf[#]}; n++)); do
read -r _ val _ <<<"${trf[n]}"
sum+=$val
done
printf "%d\n" "$sum"
in pure bash, or just use awk (This is handy if you have floating point numbers in your real data):
printf "%s\n" "${trf[#]}" | awk '{ sum += $2 } END { print sum }'

You can use printf to print the entire array, one entry per line. On such an input, one loop (while read) would be sufficient. You can even skip the loop entirely using cut and tr to build the bc command. The echo 0 is there so that bc can handle empty arrays and the trailing + inserted by tr.
{ printf %s\\n "${trf[#]}" | cut -d' ' -f2 | tr \\n +; echo 0; } | bc -l
For your examples this generates prints 68 (= 13+21+34+0).

Try this printf + awk combo:
$ printf '%s\n' "${trf[#]}" | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
Oh, it's already suggested by Shawn. Then with loop:
$ for item in "${trf[#]}"; do
echo $item
done | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68

For relatively small arrays a for/while double loop should be ok re: performance; placing the final sum in the $result variable (as in OP's code):
result=0
for element in "${trf[#]}"
do
while read -r a b c
do
((result+=b))
done <<< "${element}"
done
echo "${result}"
This generates:
68
For larger data sets I'd probably opt for one of the awk-only solutions (for performance reasons).

calculate the average in bash? [duplicate]

This question already has answers here:
Scripts for computing the average of a list of numbers in a data file
(3 answers)
How can I read first n and last n lines from a file?
(10 answers)
Closed 2 years ago.
I have a fasta file with a sequences (file with text) like:
file.fasta
>seq_1
AGCTAATACTTGTCCACGTTGTACTTCTTCACGAGAAACACCACGTAATAAAGCACCGAT
GTTATCTCCAGCTTCAGCGTAATCTAATAATTTACGGAACATTTCTACACCTGTAACTGT
AGTTTTAGCTGGCTCTTCAGTTAAACCGATGATTTCAACTTCTTCACCAACTTTAACTTG
TCCACGCTCAACACGTCCAGTTGCAACTGTACCACGACCAGTGATTGAGAATACGTCCTC
AACTGGCATCATGAATGGTTTGTCAGAATCACGTTCTGGAGTTGGGATGTACTCATCAAC
TGCGTTCATTAATTCCATGATTTTTTCTTCGTACTCTTCAACGCCTTCTAATGCTTTTAA
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
ACGAACTTCC
>seq_2
AGCTAATACTTGTCCACGTTGTACTTCTTCACGAGAAACACCACGTAATAAAGCACCGAT
GTTATCTCCAGCTTCAGCGTAATCTAATAATTTACGGAACATTTCTACACCTGTAACTGT
AGTTTTAGATGGCTCTTCAGTTAAACCGATGATTTCAACTTCTTCACCAACTTTAACTTG
TCCACGCTCAACACGTCCAGTTGCAACTGTACCACGACCAGTGATTGAGAATACGTCCTC
AACTGGCATCATGAATGGTTTGTCAGAATCACGTTCTGGAGTTGGGATGTACTCATCAAC
TGCGTTCATTAATTCCATGATTTTATCTTCGTACTCTTCAACGCCTTCTAATGCTTTTAA
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
ACGAACTTCC
>seq_3
AGCTAATACTTGTCCACGTTGTACTTCTTCACGAGAAACACCACGTAATAAAGCACCGAT
GTTATCTCCAGCTTCAGCGTAATCTAATAATTTACGGAACATTTCTACACCTGTAACTGT
AGTTTTAGATGGCTCTTCAGTTAAACCGATGATTTCAACTTCTTCACCAACTTTAACTTG
TCCACGCTCAACACGTCCAGTTGCAACTGTACCACGACCAGTGATTGAGAATACGTCCTC
AACTGGCATCATGAATGGTTTGTCAGAATCACGTTCTGGAGTTGGGATGTACTCATCAAC
TGCATTCATTAATTCCATGATTTTATCTTCGTACTCTTCAACGCCTTCTAATGCTTTTAA
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
ACGAACTTCC
............
>seq_n
AGCAGATCCAGCGATTACAGGTACATCGTCACCAGGGAAGTCATATTCAGATAATAAGTC
..............
So I want to calculate the average length of the strings avoiding the lines with >seq_, my code to obtain the length of each line is:
array_length=$(awk '/^>/ {print n $0; n="\n"}; !/^>/ {printf "%s", $0} END {print ""}' My_file.fasta | awk '!/^>/ {print length(), $0}' | sort -n| awk '{print $1}')
until here everything is ok, I got the fist column that correspond to the length of each string:
echo "$array_length"
203
207
222
231
232
243
255
258
261
268
279
291
307
316
.....
161581
208146
242398
259601
288468
301866
427209
531340
557978
840257
well the length in the array could be variable, in this case I just show part of them.
my problem is that I want to calculate the average of the $array_length (sum of all numbers/length of the array)
A second question is how to take the fist element of the array and the last one; in order to do that, I just add a tail -1 and head -n 1 to the end of the code
awk '/^>/ {print n $0; n="\n"}; !/^>/ {printf "%s", $0} END {print ""}' My_file.fasta | awk '!/^>/ {print length(), $0}' | sort -n| awk '{print $1}' | tail -1
awk '/^>/ {print n $0; n="\n"}; !/^>/ {printf "%s", $0} END {print ""}' My_file.fasta | awk '!/^>/ {print length(), $0}' | sort -n| awk '{print $1}' | head -n 1
I know that, with a file I do it like
cat file.txt | tail -1
cat file.txt | head -n 1
But I dont want to use the same code twice to obtain the $small_one (203) and $big_one (840257), I just want to take the fist and last element of the variable $array_length like the one that I show here, how can I do it?

Picking input record fields with AWK

Let's say we have a shell variable $x containing a space separated list of numbers from 1 to 30:
$ x=$(for i in {1..30}; do echo -n "$i "; done)
$ echo $x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
We can print the first three input record fields with AWK like this:
$ echo $x | awk '{print $1 " " $2 " " $3}'
1 2 3
How can we print all the fields starting from the Nth field with AWK? E.g.
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
EDIT: I can use cut, sed etc. to do the same but in this case I'd like to know how to do this with AWK.

Converting my comment to answer so that solution is easy to find for future visitors.
You may use this awk:
awk '{for (i=3; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
or pass start position as argument:
awk -v n=3 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file

Version 4: Shortest is probably using sub to cut off the first three fields and their separators:
$ echo $x | awk 'sub(/^ *([^ ]+ +){3}/,"")'
Output:
4 5 6 7 8 9 ...
This will, however, preserve all space after $4:
$ echo "1 2 3 4 5" | awk 'sub(/^ *([^ ]+ +){3}/,"")'
4 5
so if you wanted the space squeezed, you'd need to, for example:
$ echo "1 2 3 4 5" | awk 'sub(/^ *([^ ]+ +){3}/,"") && $1=$1'
4 5
with the exception that if there are only 4 fields and the 4th field happens to be a 0:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"")&&$1=$1'
$ [no output]
in which case you'd need to:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"") && ($1=$1) || 1'
0
Version 1: cut is better suited for the job:
$ cut -d\ -f 4- <<<$x
Version 2: Using awk you could:
$ echo -n $x | awk -v RS=\ -v ORS=\ 'NR>=4;END{printf "\n"}'
Version 3: If you want to preserve those varying amounts of space, using GNU awk you could use split's fourth parameter seps:
$ echo "1 2 3 4 5 6 7" |
gawk '{
n=split($0,a,FS,seps) # actual separators goes to seps
for(i=4;i<=n;i++) # loop from 4th
printf "%s%s",a[i],(i==n?RS:seps[i]) # get fields from arrays
}'

Adding one more approach to add all value into a variable and once all fields values are done with reading just print the value of variable. Change the value of n= as per from which field onwards you want to get the data.
echo "$x" |
awk -v n=3 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'

With GNU awk, you can use the join function which has been a built-in include since gawk 4.1:
x=$(seq 30 | tr '\n' ' ')
echo "$x" | gawk '#include "join"
{split($0, arr)
print join(arr, 4, length(arr), "|")}
'
4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30
(Shown here with a '|' instead of a ' ' for clarity...)
Alternative way of including join:
echo "$x" | gawk -i join '{split($0, arr); print join(arr, 4, length(arr), "|")}'

Using gnu awk and gensub:
echo $x | awk '{ print gensub(/^([[:digit:]]+[[:space:]]){3}(.*$)/,"\\2",$0)}'
Using gensub, split the string into two sections based on regular expressions and print the second section only.

How to grep ranges of numeric sequences from a column that contain several sequences

I'm new writing bash scripts and have the following question; how can extract ranges (first and last value) from a column which contains several incremental and decremental numeric sequences that can increase or decrease by 3 and jump to the next sequence once it detects that the increment is >3 e.g.:
1
4
7
20
23
26
100
97
94
It is required to receive as an output:
1,7
20,26
100,94

Using awk:
$ awk 'NR==1||sqrt(($0-p)*($0-p))>3{print p; printf "%s", $0 ", "} {p=$0} END{print $0}' file
1, 7
20, 26
100, 94
Explained:
NR==1 || sqrt(($0-p)*($0-p))>3 { # if the abs($0-previous) > 3
print p # print previous to end a sequence and
printf "%s", $0 ", " # start a new sequence
}
{ p=$0 }
END { print $0 }

this awk script gives you expected output:
awk '{v=$NF}
NR==1{printf "%s,",v;p=v;next}
(p-v)*(p-v)==9{p=v;next}
{printf "%s\n%s,",p,v;p=v}
END{print v}' file

Bash Indented Output for Multiple Variables

I have a script that loops over every text file in a directory, and stores the content in variables. The content can be anywhere from 1-50 characters long. The amount of text files is unknown. I would like to print the content in such a way that each variable falls into a clean column.
for file in $LIBPATH/*.txt; do
name=$( awk 'FNR == 1 {print $0}' $file )
height=$( awk 'FNR == 2 {print $0}' $file )
weight=$( awk 'FNR == 3 {print $0}' $file )
echo $name $height $weight
done
This code produces the output:
Avril Stewart 99 54
Sally Kinghorn 170 60
John Young 195 120
While the desired output is:
Avril Stewart 99 54
Sally Kinghorn 170 60
John Young 195 120
Thanks!

Use printf:
printf '%-20s %3s %3s\n' "$name" "$height" "$weight"
%3s ensures that all fields use three characters, %-20s does the same for 20 characters, but the - in front makes the output left-aligned.
If you want to limit the output to e.g. 20 characters, you can use
printf '%-20.20s %3s %3s\n' "$name" "$height" "$weight"
This will give you a left aligned minimum width of 20 characters and a maximum width of 20 characters, in other words it will ensure that you always have exactly 20 characters.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

print records using awk with array - arrays

$ awk -F'[/ ]+' '$8>95{print $6, $7, $8}' file science score: 100 comp score: 96 mat score: 100

Try this: awk '{if ($3 > 95) print $1, $2, $3;}' < input.txt Redirection is optional therefore, the following will work as well. awk '{if ($3 > 95) print $1, $2, $3}' input.txt

Related

Computing sum of specific field from array entries

calculate the average in bash? [duplicate]

Picking input record fields with AWK

How to grep ranges of numeric sequences from a column that contain several sequences

Bash Indented Output for Multiple Variables

Categories

Resources