Awk multiply column by a specific value - arrays

I'm quite new to awk, which I am using more and more to process the output files from a model I am running. Right now, I am stuck on a multiplication issue.
I would like to calculate the relative change as a percentage.
Example:
A B
1 150 0
2 210 10
3 380 1000
...
I would like to calculate Ax = (Ax-A1)/A1 * 100.
Output:
New_A B
1 0 0
2 10 40
3 1000 153.33
...
I can multiply columns together, but I don't know how to fix a value to a specific position in the text file (i.e. row 1, column 1).
Thank you.

Assuming your actual file does not have the "A B" header and the row numbers in it:
$ cat file
150 0
210 10
380 1000
$ awk 'NR==1 {a1=$1} {printf "%s %.1f\n", $2, ($1-a1)/a1*100}' file | column -t
0 0.0
10 40.0
1000 153.3
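If the real file does keep the "A B" header and the leading row numbers, a variant along these lines should work (untested sketch; the column positions are taken from the example above):
$ awk 'NR==1{print "New_A","B"; next} NR==2{a1=$2} {printf "%s %s %.2f\n", $1, $3, ($2-a1)/a1*100}' file | column -t
Here the header is replaced on the first line, the reference value a1 is taken from column 2 of the second line, and every data row prints the row number, the old B value, and the percentage change of A.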

Related

SWI-Prolog, Read the file and sum the numbers in the file

I am learning Prolog and have the following problem:
Read an input file line by line, then write the sum of each line to an output file.
Given an input.txt input file of the following form:
1 2 7 4 5
3 1 0 7 9
Each line contains integers separated by a space.
?- calculate('input.txt', 'output.txt').
The output.txt file should then contain:
19
20
I have tried many ways but it is still not working; I hope someone can help me.
go :-
    setup_call_cleanup(
        open('output.txt', write, Out),
        % write one sum per input line; file_line_sum/2 backtracks over the lines
        forall(file_line_sum('input.txt', Sum), writeln(Out, Sum)),
        close(Out)
    ).

file_line_sum(File, Sum) :-
    file_line(File, Line),
    line_sum(Line, Sum).

% split the line on spaces, convert each field to a number, and add them up
line_sum(Line, Sum) :-
    split_string(Line, " ", "", NumsStr),
    maplist(string_number, NumsStr, Nums),
    sum_list(Nums, Sum).

string_number(Str, Num) :-
    number_string(Num, Str).

% yields the lines of File one by one on backtracking, closing the stream afterwards
file_line(File, Line) :-
    setup_call_cleanup(
        open(File, read, In),
        stream_line(In, Line),
        close(In)
    ).

stream_line(In, Line) :-
    repeat,
    read_line_to_string(In, Line1),
    (   Line1 == end_of_file -> !, fail
    ;   Line = Line1
    ).
Contents of input.txt:
1 2 7 4 5
3 1 0 7 9
123 456 7890
Result in SWI-Prolog:
?- time(go).
% 115 inferences, 0.001 CPU in 0.001 seconds (90% CPU, 195568 Lips)
Generated output.txt:
19
20
8469

In AWK: Count the number of occurrences in a column of a tab-separated file and write the data into a new TSV file

I have data stored in a large (20 GB) tab-separated text file, as in the sample below (input.txt):
1234 567 T 0
1267 890 Z 1
1269 908 T 1
3142 789 T 0
7896 678 Z 0
I would like to count the occurrences of each entry in column 4 and write the result automatically into a new tab-separated file.
I would like to see the following in output.txt:
0 3
1 2
Can anybody suggest a fast way to do this with AWK?
awk '{ count[$4]++ } END { for (i in count) printf "%s\t%d\n", i, count[i] }' \
big.file.txt
For each value in column 4, increment the counter for that value. At the end, print each value found and its count. This prints the values in an indeterminate order. If you want it in some order, either post-process the output with sort or sort the keys inside awk and print in the sorted key order.
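For example, a numeric sort of the output, redirected to a file, might look like this (the file names are just the ones from the question; treat it as a sketch):
awk '{ count[$4]++ } END { for (i in count) printf "%s\t%d\n", i, count[i] }' input.txt \
    | sort -n > output.txt
With GNU awk you could instead set PROCINFO["sorted_in"] = "@ind_num_asc" at the top of the END block to get the same ordering without the external sort.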

Issues with manipulating matrix transposed using bash script

I wrote a bash script that is supposed to calculate the average and median of each column of an input file. The input file format is shown below; the numbers are separated by tabs.
1 2 3
3 2 8
3 4 2
My approach is to first transpose the matrix, so that rows become columns and vice versa. The transposed matrix is stored in a temporary text file, and I then calculate the average and median for each row. However, the script gives me the wrong output. First, the arrays that should hold the average and median for each column end up with only a single value. Second, the median that is calculated is incorrect.
After a bit of code inspection and testing, I discovered that while the transposed matrix did get written to the text file, it is not read correctly by the script. Specifically, each line read only gives one number. Below is my script.
#if column is chosen instead
if [[ $initial == "-c" ]]
then
    echo "Calculating column stats"

    #transpose columns to row to make life easier
    WORD=$(head -n 1 $filename | wc -w); #counts the number of columns
    for((index=1; index<=$WORD; index++)) #loop it over the number of columns
    do
        awk '{print $'$index'}' $filename | tr '\n' ' ';echo; #compact way of performing a row-col transposition
        #prints the column as determined by $index, and then translates new-line with a tab
    done > tmp.txt

    array=()
    averageArray=()
    medianArray=()
    sortedArray=()

    #calculate average and median, just like the one used for rows
    while read -a cols
    do
        total=0
        sum=0

        for number in "${cols[@]}" #for every item in the transposed column
        do
            (( sum += $number )) #the total sum of the numbers in the column
            (( total++ ))        #the number of items in the column
            array+=( $number )
        done

        sortedArray=( $( printf "%s\n" "${array[@]}" | sort -n) )
        arrayLength=${#sortedArray[@]}
        #echo sorted array is $sortedArray
        #based on array length, construct the median array
        if [[ $(( arrayLength % 2 )) -eq 0 ]]
        then #even
            upper=$(( arrayLength / 2 ))
            lower=$(( (arrayLength/2) - 1 ))
            median=$(( (${sortedArray[lower]} + ${sortedArray[upper]}) / 2 ))
            #echo median is $median
            medianArray+=$index
        else #odd
            middle=$(( (arrayLength) / 2 ))
            median=${sortedArray[middle]}
            #echo median is $median
            medianArray+=$index
        fi
        averageArray+=( $((sum/total)) ) #the final row array of averages that is displayed

    done < tmp.txt
fi
Thanks for the help.
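Not a fix for the script itself, but for comparison, here is a minimal awk sketch that computes the per-column average and median in one pass, without the temporary file. It assumes whitespace-separated numeric input like the sample above and uses floating-point division, unlike the script's integer arithmetic:
awk '{
    for (c = 1; c <= NF; c++) cell[c, NR] = $c   # remember every cell
    cols = NF
}
END {
    rows = NR
    for (c = 1; c <= cols; c++) {
        sum = 0
        for (r = 1; r <= rows; r++) { col[r] = cell[c, r]; sum += col[r] }
        # insertion sort of this column so the median can be read off
        for (i = 2; i <= rows; i++) {
            v = col[i]
            for (j = i - 1; j >= 1 && col[j] > v; j--) col[j + 1] = col[j]
            col[j + 1] = v
        }
        if (rows % 2) median = col[(rows + 1) / 2]
        else          median = (col[rows / 2] + col[rows / 2 + 1]) / 2
        printf "column %d: average=%g median=%g\n", c, sum / rows, median
    }
}' input_file
For the sample input this prints an average of 2.33333 and a median of 3 for the first column, and so on for the others.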

How to split a file into arrays and find the maximum value in each of them

I have a file:
1 0.5
2 0.7
3 0.55
4 0.7
5 0.45
6 0.8
7 0.75
8 0.3
9 0.35
10 0.5
11 0.65
12 0.75
I want to split the file into 4 groups, each one ending on the next 3rd line, and then find the maximum value in the second column of every group. For this file the outcome would be:
3 0.7
6 0.8
9 0.75
12 0.75
So far I have managed to split the file into several files with
awk 'NR%3==1{x="L"++i;}{print > x}' filename
and then to find the maximum in every file with
awk 'BEGIN{max=0}{if(($2)>max) max=($2)}END {print $1,max}'
However, this creates additional files, which is fine for this example, but the real file contains 65 million lines, so I would be overwhelmed by the number of files; I am trying to avoid that by writing a short script that combines the two steps above.
I tried this one:
awk 'BEGIN {for (i=1; i<=12; i+=3) {max=0} {if(($2)>max) max=($2)}}END {print $1,max}' Filename
but it produces something irrelevant.
So if you can help me out it will be much appreciated!
You could go for something like this:
awk 'NR % 3 == 1 || $2 > max {max = $2} NR % 3 == 0 {print $1, max}' file
The value of max is reset at the start of every group of three rows (when NR % 3 == 1) and updated whenever the value in the second column is greater than it. At the end of every group of three, the first column and the maximum are printed.
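One caveat: if the total number of lines is not an exact multiple of 3 (quite possible with 65 million lines), the last partial group is silently dropped. A sketch of a variant that also flushes a trailing group, assuming the same two-column layout:
awk 'NR % 3 == 1 || $2 > max { max = $2 }
     { last = $1; pending = 1 }
     NR % 3 == 0 { print $1, max; pending = 0 }
     END { if (pending) print last, max }' file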

Nested for loops in awk to count the number of fields matching values

I have a file with two columns (1.4 million rows) that looks like:
CLM MXL
0 0
0 1
1 1
1 1
0 0
29 42
0 0
30 15
I would like to count the instances of each possible combination of values; for example, if there are x lines where column CLM equals 0 and column MXL equals 1, I would like to print:
0 1 x
Since the maximum value of column CLM is 188 and the maximum value of column MXL is 128, I am trying to use a nested for loop in awk that looks something like:
awk '{for (i=0; i<=188; i++) {for (j=0; j<=128; j++) {if($9==i && $10==j) {print$0}}}}' 1000Genomes.ALL.new.txt > test
But this only prints out the original file, which makes sense. I just don't know how to write a loop that either prints one file for each combination of values, which I could then wc, or prints a single file with the count of each combination. Any solution in awk, a bash script, or a perl script would be great.
1. A Pure awk Solution
$ awk 'NR>1{c[$0]++} END{for (k in c)print k,c[k]}' file | sort -n
0 0 3
0 1 1
1 1 2
29 42 1
30 15 1
How it works
The code uses a single variable c. c is an associative array whose keys are lines in the file and whose values are the number of occurrences.
NR>1{c[$0]++}
For every line except the first (which has the headings), this increments the count for the combination in that line.
END{for (k in c)print k,c[k]}
This prints out the final counts.
sort -n
This is just for aesthetics: it puts the output lines in a predictable order.
2. Alternative using uniq -c
$ tail -n+2 file | sort -n | uniq -c | awk '{print $2,$3,$1}'
0 0 3
0 1 1
1 1 2
29 42 1
30 15 1
How it works
tail -n+2 file
This prints all but the first line of the file. The purpose of this is to remove the column headings.
sort -n | uniq -c
This sorts the lines and then counts the duplicates.
awk '{print $2,$3,$1}'
uniq -c puts the counts first, and you wanted the counts to be last on each line. This just rearranges the columns into the format that you wanted.
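If you really did want one file per combination (the first idea mentioned in the question), awk can split the output directly; a rough sketch, with a made-up CLM_MXL.txt naming scheme:
awk 'NR > 1 { print > ($1 "_" $2 ".txt") }' file
With up to 189 × 129 combinations this creates a lot of small files and may hit the limit on simultaneously open files in some awk implementations, so counting in an array as shown above is usually the better approach.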
