How to grep ranges of numeric sequences from a column that contains several sequences - arrays

I'm new to writing bash scripts and have the following question: how can I extract ranges (first and last value) from a column that contains several increasing and decreasing numeric sequences? Within a sequence the values go up or down by 3; a new sequence starts as soon as the difference between consecutive values is greater than 3, e.g.:
1
4
7
20
23
26
100
97
94
The required output is:
1,7
20,26
100,94

Using awk:
$ awk 'NR==1||sqrt(($0-p)*($0-p))>3{if(NR>1)print p; printf "%s", $0 ", "} {p=$0} END{print p}' file
1, 7
20, 26
100, 94
Explained:
NR==1 || sqrt(($0-p)*($0-p))>3 { # on the first line, or if abs($0-previous) > 3
if (NR>1) print p # print previous to end a sequence (nothing to end on line 1) and
printf "%s", $0 ", " # start a new sequence
}
{ p=$0 } # remember the current value as previous
END { print p } # end the last sequence

This awk script gives the expected output:
awk '{v=$NF}
NR==1{printf "%s,",v;p=v;next}
(p-v)*(p-v)==9{p=v;next}
{printf "%s\n%s,",p,v;p=v}
END{print v}' file
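Both answers hard-code the step of 3. As a minimal sketch (not taken from either answer), the step can be passed in as a variable. The names step, first and prev are assumptions made for illustration, and a run is assumed to continue while the absolute difference to the previous value is at most step:

```shell
# Sample values from the question, fed through a pipe instead of a file.
ranges=$(printf '%s\n' 1 4 7 20 23 26 100 97 94 |
awk -v step=3 '
function abs(x) { return x < 0 ? -x : x }
NR == 1               { first = $1; prev = $1; next }       # open the first run
abs($1 - prev) > step { print first "," prev; first = $1 }  # close the run, open the next
                      { prev = $1 }                         # remember the previous value
END                   { if (NR) print first "," prev }      # close the last run
')

printf '%s\n' "$ranges"
```

For the sample column this prints 1,7 then 20,26 then 100,94. Note that it does not start a new run when the direction flips but the difference stays within step.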


Computing sum of specific field from array entries

I have an array trf. Would like to compute the sum of the second element in each array entry.
Example of array contents
trf=( "2 13 144" "3 21 256" "5 34 389" )
Here is the current implementation, but I do not find it robust enough. For instance, it fails with arbitrary number of elements (but considered constant from one array element to another) in each array entry.
cnt=0
m=${#trf[@]}
while (( cnt < m )); do
while read -r one two three
do
sum+="$two"+
done <<< $(echo ${array[$count]})
let count=$count+1
done
sum+=0
result=`echo "$sum" | /usr/bin/bc -l`
You're making it way too complicated. Something like
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )
declare -i sum=0 # Integer attribute; arithmetic evaluation happens when assigned
for (( n = 0; n < ${#trf[@]}; n++)); do
read -r _ val _ <<<"${trf[n]}"
sum+=$val
done
printf "%d\n" "$sum"
in pure bash, or just use awk (This is handy if you have floating point numbers in your real data):
printf "%s\n" "${trf[@]}" | awk '{ sum += $2 } END { print sum }'
You can use printf to print the entire array, one entry per line. On such an input, one loop (while read) would be sufficient. You can even skip the loop entirely using cut and tr to build the bc command. The echo 0 is there so that bc can handle empty arrays and the trailing + inserted by tr.
{ printf %s\\n "${trf[@]}" | cut -d' ' -f2 | tr \\n +; echo 0; } | bc -l
For your example this prints 68 (= 13+21+34+0).
Try this printf + awk combo:
$ printf '%s\n' "${trf[@]}" | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
Oh, it's already suggested by Shawn. Then with loop:
$ for item in "${trf[@]}"; do
echo "$item"
done | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
For relatively small arrays a for/while double loop should be ok re: performance; placing the final sum in the $result variable (as in OP's code):
result=0
for element in "${trf[@]}"
do
while read -r a b c
do
((result+=b))
done <<< "${element}"
done
echo "${result}"
This generates:
68
For larger data sets I'd probably opt for one of the awk-only solutions (for performance reasons).
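The question also mentions entries with an arbitrary number of elements. A minimal bash sketch for that case follows. The helper name sum_field and its argument order are assumptions made for illustration, and it handles integers only (for floating point, one of the awk variants is the better fit):

```bash
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )

# sum_field is a hypothetical helper: sum_field COLUMN ENTRY...
# COLUMN is 1-based; each entry may hold any number of whitespace-separated fields.
sum_field() {
    local col=$1 entry total=0
    local -a fields
    shift
    for entry in "$@"; do
        read -ra fields <<< "$entry"          # split the entry into words
        (( total += ${fields[col-1]:-0} ))    # a missing field counts as 0
    done
    printf '%d\n' "$total"
}

result=$(sum_field 2 "${trf[@]}")
echo "$result"
```

With the sample array this prints 68; calling sum_field 3 "${trf[@]}" sums the third field instead.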

How to initialize a 2D array in awk

I am using a 2D array to save the number of recurrences of certain patterns. For instance:
$4 == "Water" {s[$5]["w"]++}
$4 == "Fire" {s[$5]["f"]++}
$4 == "Air" {s[$5]["a"]++}
where $5 can be attack1, attack2 or attack3. In the END{ } block I print out these values. However, some of these combinations don't exist, so where the count should be 0 (e.g. s["attack1"]["a"]), my code prints whitespace. Hence I would like to know if there is a way to initialize the array in one line, instead of initializing each of the elements I need in the BEGIN{ } block.
awk -f script.awk data
This is the command I am using to run my script. I am not allowed to use any other flags.
EDIT 1:
Here's the current output
        Water Air Fire
attack1 554       12
attack2 14    24
attack3 6         3
Here's the output I desire:
        Water Air Fire
attack1 554   0   12
attack2 14    24  0
attack3 6     0   3
You don't need to initialise the array in this case. Awk already has a default empty value, so you just have to change the way you print the value.
Observe:
awk 'BEGIN {print "Blank:", a[1];
print "Zero: ", a[1] + 0;
printf("Blank: %s\n", a[1]);
printf("Zero: %i\n", a[1])}'
Output:
Blank:
Zero: 0
Blank:
Zero: 0
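Applied to the counting script from the question, that could look like the sketch below. The input lines are made-up sample data, and the array is indexed as s[$5, "w"] rather than s[$5]["w"] only so that the sketch does not depend on gawk's arrays of arrays; the "+ 0" idea is the same either way:

```shell
table=$(printf '%s\n' \
    'a b c Water attack1' \
    'a b c Water attack1' \
    'a b c Fire attack1' \
    'a b c Water attack2' \
    'a b c Air attack2' \
    'a b c Fire attack3' |
awk '
$4 == "Water" { s[$5, "w"]++ }
$4 == "Fire"  { s[$5, "f"]++ }
$4 == "Air"   { s[$5, "a"]++ }
END {
    print "Water Air Fire"
    n = split("attack1 attack2 attack3", name, " ")
    for (i = 1; i <= n; i++) {
        k = name[i]
        # "+ 0" forces a number, so a never-incremented cell prints as 0
        print k, s[k, "w"] + 0, s[k, "a"] + 0, s[k, "f"] + 0
    }
}')
printf '%s\n' "$table"
```

Here attack3 has no Water or Air rows, and those cells come out as 0 instead of blank.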

How to find and extract all words appearing between brackets?

How can I put all the words that appear between brackets in a text file into an array, and replace each bracketed group with a random word from that array?
cat math.txt
First: {736|172|201|109} {+|-|*|%|/} {21|62|9|1|0}
Second: John had {22|12|15} apples and lost {2|4|3}
I need output like:
First: 172-9
Second: John had 15 apples and lost 4
This is trivial in awk:
$ cat tst.awk
BEGIN{ srand() }
{
for (i=1; i<=NF; i++) {
if ( match($i,/{[^}]+}/) ) {
n = split(substr($i,RSTART+1,RLENGTH-2),arr,/\|/)
idx = int(rand() * n) + 1 # rand() is in [0,1), so idx is in 1..n
$i = arr[idx]
}
printf "%s%s", $i, (i<NF?OFS:ORS)
}
}
$ awk -f tst.awk file
First: 172 - 9
Second: John had 22 apples and lost 2
$ awk -f tst.awk file
First: 201 - 9
Second: John had 12 apples and lost 2
$ awk -f tst.awk file
First: 201 + 62
Second: John had 12 apples and lost 2
$ awk -f tst.awk file
First: 201 + 1
Second: John had 12 apples and lost 4
Just check the math on the line where idx is set - I think it's right but didn't put much thought into it.
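The index arithmetic can be checked empirically. The sketch below (variable names full and part are made up for illustration) draws 10000 random numbers and records which indices each formula produces for n = 4. Since rand() returns a value r with 0 <= r < 1, int(r * n) + 1 reaches every index from 1 to n, while int(r * (n - 1)) + 1 can never reach n:

```shell
check=$(awk 'BEGIN {
    srand(1)                              # fixed seed, only to make the run repeatable
    n = 4                                 # four alternatives, as in {736|172|201|109}
    for (t = 0; t < 10000; t++) {
        r = rand()                        # 0 <= r < 1
        full[int(r * n) + 1]++            # indices 1..n
        part[int(r * (n - 1)) + 1]++      # indices 1..n-1 only
    }
    printf "full:%d part:%d\n", (n in full), (n in part)
}')
echo "$check"
```

This reports full:1 part:0, meaning the last alternative is only ever picked by the rand() * n form.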
$ awk 'BEGIN{srand()} {for (i=1;i<=NF;i++) {if (substr($i,1,1)=="{") {split(substr($i,2,length($i)-2),a,"|"); j=1+int(rand()*length(a)); $i=a[j]}}; print}' math.txt
First: 172 + 1
Second: John had 12 apples and lost 3
How it works
BEGIN{srand()}
This initializes the random number generator.
for (i=1;i<=NF;i++) {if (substr($i,1,1)=="{") {split(substr($i,2,length($i)-2),a,"|"); j=1+int(rand()*length(a)); $i=a[j]}
This loops through each field. If any field starts with {, then substr is used to remove the first and last characters of the field and the remainder is split with | as the divider into array a. Then, a random index j into array a is chosen. Lastly, the field is replaced with a[j].
print
The line, as revised above, is printed.
The same code as above, but reformatted over multiple lines, is:
awk 'BEGIN{srand()}
{
for (i=1;i<=NF;i++) {
if (substr($i,1,1)=="{") {
split(substr($i,2,length($i)-2),a,"|")
j=1+int(rand()*length(a))
$i=a[j]
}
}
print
}' math.txt
Revised Problem with Spaces
Suppose that math.txt now looks like:
$ cat math.txt
First: {736|172|201|109} {+|-|*|%|/} {21|62|9|1|0}
Second: John had {22|12|15} apples and lost {2|4|3}
Third: John had {22 22|12 12|15 15} apples and lost {2 2|4 4|3 3}
The last line has spaces inside the {...}. This changes how awk divides up the fields. For this situation, we can use:
$ awk -F'[{}]' 'BEGIN{srand()} {for (i=2;i<=NF;i+=2) {n=split($i,a,"|"); j=1+int(n*rand()); $i=a[j]}; print}' math.txt
First: 736 + 62
Second: John had 12 apples and lost 3
Third: John had 15 15 apples and lost 2 2
How it works:
-F'[{}]'
This tells awk to use either } or { as field separators.
BEGIN{srand()}
This initializes the random number generator
{for (i=2;i<=NF;i+=2) {n=split($i,a,"|"); j=1+int(n*rand()); $i=a[j]}
With our new definition for the field separator, the even numbered fields are the ones inside braces. Thus, we split these fields on | and randomly select one piece and assign the field to that piece: $i=a[j].
print
Having modified the line as above, we now print it.
You can try this awk:
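awk -F'[{}]' 'BEGIN { srand() } # seed once; reseeding on every call within one second repeats the same value
function rand2(n) {
return 1 + int(rand() * n);
}
{
for (i=1; i<=NF; i++)
if (i%2)
printf "%s", $i; # literal text outside braces; "%s" keeps a % in the data from being read as a format
else {
n = split($i, arr, "|");
printf "%s", arr[rand2(n)]
};
printf ORS
}' math.txt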
First: 736 - 62
Second: John had 22 apples and lost 2
More runs of above may produce:
First: 172 * 9
Second: John had 12 apples and lost 4
First: 109 / 0
Second: John had 15 apples and lost 3
First: 201 % 1
Second: John had 12 apples and lost 3
...
...

awk, declare array embracing FNR and field, output

I would like to declare an array holding a certain range of lines, for example from line 10 to line 78. The numbers could be different; this is just an example.
My sample prints that range of lines on stdout but puts a "1" in between those lines. Can anybody tell me how to get rid of that "1"?
The sample below should print the named lines to stdout.
awk '
myarr["range-one"]=NR~/^2$/ , NR~/^8$/;
{print myarr["range-one"]};' /home/$USER/uplog.txt;
That is giving me this output:
0
12:33:49 up 3:57, 2 users, load average: 0,61, 0,37, 0,22 21.06.2014
1
12:42:02 up 4:06, 2 users, load average: 0,14, 0,18, 0,19 21.06.2014
1
12:42:29 up 4:06, 2 users, load average: 0,09, 0,17, 0,19 21.06.2014
1
12:43:09 up 4:07, 2 users, load average: 0,09, 0,16, 0,19 21.06.2014
1
Second question: how do I store one field of a given line (by FNR) in that array?
When I do it this way, the fields I want are printed:
awk ' NR~/^1$/ , NR~/^7$/ {print $3, $11; next} ; ' /home/$USER/uplog.txt;
But I need an array, that's why I'm asking. Any hints? Thanks in advance.
What the example script does
awk '
myarr["range-one"]=NR~/^2$/ , NR~/^8$/;
{print myarr["range-one"]};'
Your script is one of the more convoluted and decidedly less-than-obvious pieces of awk that I've ever seen. Let's take a simple input file:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Line 11
Line 12
The output from that is:
0
Line 2
1
Line 3
1
Line 4
1
Line 5
1
Line 6
1
Line 7
1
Line 8
1
0
0
0
0
Dissecting your script, it appears that the first line:
myarr["range-one"]=NR~/^2$/ , NR~/^8$/;
is equivalent to:
myarr["range-one"] = (NR ~ /^2$/, NR ~ /^8$/) { print }
That is, the value assigned to myarr["range-one"] is 1 from the line where NR equals 2 through the line where NR equals 8, and 0 outside that range; further, when the value is 1, the line is printed.
The second line:
{print myarr["range-one"]};
prints the value in myarr["range-one"] for each line of input. Thus, on the first line, the value 0 is printed. For lines 2 to 8, the line is printed followed by the value 1; for every line after that, the value 0 is printed again.
What the question asks for
The question is not clear. It appears that lines 10 to 78 should be printed. In awk, there are essentially no variable declarations (we can debate about function parameters, but functions don't seem to figure in this). Therefore, declaring an array is not an option.
awk -v lo=10 -v hi=78 'NR >= lo && NR <= hi { print }'
This would print the lines between line 10 and line 78. It would be feasible to save the values in an array (a in the examples below). Said array could be indexed by NR or with a separate index starting at 0 or 1:
awk -v lo=10 -v hi=78 'NR >= lo && NR <= hi { a[NR] = $0 }' # Indexed by line number
awk -v lo=10 -v hi=78 'NR >= lo && NR <= hi { a[i++] = $0 }' # Indexed from 0
awk -v lo=10 -v hi=78 'NR >= lo && NR <= hi { a[++i] = $0 }' # Indexed from 1
Presumably, you'd also have an END block to do something with the data.
The semicolons in the original are both unnecessary. The blank line is ignored, of course.
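For the second question (keeping one field per line in the array), a minimal sketch along the same lines is shown below. The generated input stands in for uplog.txt, the array name fld is made up, and field 3 is used only because the stand-in lines have three fields; in the question's file it would be $3, $11 or whichever fields are needed:

```shell
# Stand-in input: 12 lines of the form "Line N M", where M = 10 * N.
out=$(awk 'BEGIN { for (i = 1; i <= 12; i++) print "Line", i, i * 10 }' |
awk -v lo=2 -v hi=4 '
NR >= lo && NR <= hi { fld[NR] = $3 }    # keep only field 3 of each line in the range
END {
    for (n = lo; n <= hi; n++)           # walk the range in line order
        if (n in fld) print n ": " fld[n]
}')
echo "$out"
```

This prints 2: 20, 3: 30 and 4: 40, one per line. The "n in fld" test skips line numbers beyond the end of a short file.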

Identify overlapping ranges in AWK

I have a file with rows of 3 columns (tab separated) eg:
2 45 100
And a second file with rows of 3 columns (tab separated) eg:
2 10 200
I want an awk command that matches the lines if $1 matches in both files and the range $2-$3 in file 1 intersects at all with the range $2-$3 in file 2. It can be within the range of values in file 2, or the range in file 2 can be within the range in file 1, or there can just be a partial overlap. Any kind of intersection between the ranges counts as a match, and the matching row should then be printed to a third file.
My current code only matches if $1 and either $2 or $3 match, but doesn't work for when the ranges are within each other as in these cases the precise numbers don't match.
awk '
BEGIN {
FS = "\t";
}
FILENAME == ARGV[1] {
pair[ $1, $2, $3 ] = 1;
next;
}
{
if ( pair[ $1, $2, $3 ] == 1 ) {
print $1 $2 $3;
}
}' file1 file2
Example Input:
File1:
1 10 23
2 30 50
6 100 110
8 20 25
File2:
1 5 15
10 30 50
2 10 100
8 22 24
Here line 1 (file1) matches line 1 (file2) because the first column matches AND the range 10-15 is common to both ranges.
Line 2 (file1) matches line 3 (file2) because the first column matches and the range 30-50 is within the range 10-100.
Line 4 (file1) matches line 4 (file2) because the first column matches and the range 22-24 is common to both.
Therefore the output would be lines 1, 3 and 4 from file2, printed to a new output file.
Hope these examples help.
Your help is really appreciated.
Thank you in advance!
It is quite easy if you use the join command to merge both files on their first field ($1).
If you only want the file2 lines as output:
join --nocheck-order <(sort -n file1) <(sort -n file2) | awk '{if ($2 >= $4 && $2 <= $5 || $3 >= $4 && $3 <= $5 || $4 >= $2 && $4 <= $3 || $5 >= $2 && $5 <= $3) {print $1" "$4" "$5;}}' -
Using your input files I got this output:
1 5 15
2 10 100
8 22 24
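For comparison, here is a minimal awk-only sketch that avoids join and sort. It is not the answer's method: it assumes closed ranges with $2 <= $3, uses the fact that two such ranges overlap exactly when each one starts no later than the other ends, and writes the question's sample data to temporary files. The array names n, lo and hi are made up for illustration:

```shell
dir=$(mktemp -d)
printf '%s\t%s\t%s\n' 1 10 23  2 30 50  6 100 110  8 20 25 > "$dir/file1"
printf '%s\t%s\t%s\n' 1 5 15  10 30 50  2 10 100  8 22 24 > "$dir/file2"

matches=$(awk -F'\t' '
NR == FNR {                          # first file: remember every range per key
    n[$1]++
    lo[$1, n[$1]] = $2 + 0
    hi[$1, n[$1]] = $3 + 0
    next
}
{                                    # second file: compare against the stored ranges
    for (i = 1; i <= n[$1] + 0; i++)
        if ($2 + 0 <= hi[$1, i] && lo[$1, i] <= $3 + 0) { print; break }
}' "$dir/file1" "$dir/file2")

printf '%s\n' "$matches"
rm -r "$dir"
```

On the sample files this prints the same three file2 rows as above (tab-separated), in file2's original order.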
