Computing sum of specific field from array entries - arrays

I have an array trf. Would like to compute the sum of the second element in each array entry.
Example of array contents
trf=( "2 13 144" "3 21 256" "5 34 389" )
Here is the current implementation, but I do not find it robust enough. For instance, it fails with arbitrary number of elements (but considered constant from one array element to another) in each array entry.
cnt=0
m=${#trf[#]}
while (( cnt < m )); do
while read -r one two three
do
sum+="$two"+
done <<< $(echo ${array[$count]})
let count=$count+1
done
sum+=0
result=`echo "$sum" | /usr/bin/bc -l`

You're making it way too complicated. Something like
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )
declare -i sum=0 # Integer attribute; arithmetic evaluation happens when assigned
for (( n = 0; n < ${#trf[#]}; n++)); do
read -r _ val _ <<<"${trf[n]}"
sum+=$val
done
printf "%d\n" "$sum"
in pure bash, or just use awk (This is handy if you have floating point numbers in your real data):
printf "%s\n" "${trf[#]}" | awk '{ sum += $2 } END { print sum }'

You can use printf to print the entire array, one entry per line. On such an input, one loop (while read) would be sufficient. You can even skip the loop entirely using cut and tr to build the bc command. The echo 0 is there so that bc can handle empty arrays and the trailing + inserted by tr.
{ printf %s\\n "${trf[#]}" | cut -d' ' -f2 | tr \\n +; echo 0; } | bc -l
For your examples this generates prints 68 (= 13+21+34+0).

Try this printf + awk combo:
$ printf '%s\n' "${trf[#]}" | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
Oh, it's already suggested by Shawn. Then with loop:
$ for item in "${trf[#]}"; do
echo $item
done | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68

For relatively small arrays a for/while double loop should be ok re: performance; placing the final sum in the $result variable (as in OP's code):
result=0
for element in "${trf[#]}"
do
while read -r a b c
do
((result+=b))
done <<< "${element}"
done
echo "${result}"
This generates:
68
For larger data sets I'd probably opt for one of the awk-only solutions (for performance reasons).

Related

Find items common between two Bash arrays

I have below shell script in which I have two arrays number1 and number2. I have a variable range which has list of numbers.
Now I need to figure out what are all numbers which are in number1 array are also present in range variable. Similarly for number2 array as well. Below is my shell script and it is working fine.
number1=(1220 1374 415 1097 1219 557 401 1230 1363 1116 1109 1244 571 1347 1404)
number2=(411 1101 273 1217 547 1370 286 1224 1362 1091 567 561 1348 1247 1106 304 435 317)
range=90,197,521,540,552,554,562,569:570,573,576,579,583,594,597,601,608:609,611,628,637:638,640:641,644:648
range_f=" "$(eval echo $(echo $range | perl -pe 's/(\d+):(\d+)/{$1..$2}/g;s/,/ /g;'))" "
echo "$range_f"
for item in "${number1[#]}"; do
if [[ $range_f =~ " $item " ]] ; then
new_number1+=($item)
fi
done
echo "new list: ${new_number1[#]}"
for item in "${number2[#]}"; do
if [[ $range_f =~ " $item " ]] ; then
new_number2+=($item)
fi
done
echo "new list: ${new_number2[#]}"
Is there any better way to write above stuff? As of now I have two for loops iterating and then figuring out new_number1 and new_number2 arrays.
Note:
Numbers like 644:648 means, it starts with 644 and ends with 648. It is just short form.
You can use comm with process substitution instead of looping:
mapfile -t new_number1 < <(comm -12 <(printf '%s\n' "${number1[#]}" | sort) <(printf '%s\n' $range_f | sort))
mapfile -t new_number2 < <(comm -12 <(printf '%s\n' "${number2[#]}" | sort) <(printf '%s\n' $range_f | sort))
mapfile -t name reads from the nested process substitution into the named array
printf ... | sort pair provides the sorted input streams for comm
comm -12 emits the items common to the two streams
Aside from codeforester's answer, I can think of two other ways of doing this:
Load the values of $range as keys of an associative array. The
values will be 1. Loop through each member of ${number1[#]} and
${number2[#]}, testing them against the values in the associative
array.
Use codeforester's printf ... | sort trick, but pipe both the list
and the range through sort | uniq -c, then grep for the
duplicates.
I'm not sure if either one of these is an actual improvement on your code. ... I would create a 'find duplicates' shell function, but otherwise your code looks solid.

Multiply array elements

So I'm learning bash and need to do a simple script to multiply array elements by calling a function.
My code so far is this, but it ain't working at all. I believe there is a much simpler way than this (incrementing the pos variable so as to move to the next array element feels simply wrong).
array=(1 2 3 4 5 100)
sum=0
pos=1
function multiplicate {
for i in ${array[*]};do
sum=$(($i * $array[pos]))
let pos++
done
}
multiplicate
echo $sum
I did my best to google the solution but was unable to find any relevant information, I found how to sum by using bc, but it simply wouldn't work by replacing + with *.
Use this script:
#!/bin/bash
array=(1 2 3 4 5 100)
function multiplicate {
local mul=1
for i in "${array[#]}"; do
((mul *= i))
done
echo "$mul"
}
multiplicate
$ ./script
12000
Or better yet:
#!/bin/bash
multiplicate() {
local mul=1
for i
do ((mul *= i))
done
echo "$mul"
}
multiplicate 1 2 3 4 5 100
And if you like to play with string variables use this:
multiplicate() { local IFS=* ; echo $(( $* )); }
multiplicate 1 2 3 4 5 100
Here is a method using bc:
multiply ()
{
printf '%s\n' "$#" | paste -s -d '*' | bc
}
Used as follows:
$ multiply 1 2 3 4 5 100
12000
The first command in the pipeline prints each array elements on a separate line:
$ printf '%s\n' 1 2 3 4 5 100
1
2
3
4
5
100
The paste -s ("serial") then turns the output into a single line again, but the elements now separated by *:
$ printf '%s\n' 1 2 3 4 5 100 | paste -s -d '*'
1*2*3*4*5*100
And bc finally evaluates the expression.
Alternatively, we can save a subshell and skip bc:
multiply () {
echo $(( $(printf '%s\n' "$#" | paste -s -d '*') ))
}
This uses an arithmetic expression to evaluate the output of printf and paste (which now is in a command substitution), but readability suffers a bit.
Alternatively, in pure Bash (hat tip sorontar):
multiply () {
local IFS='*'
echo "$(( $* ))"
}
This sets the field separator IFS to * so the arguments, $*, expand to a string separated by *, which is then evaluated in the arithmetic expression $(()).

How to initialize a 2D array in awk

I am using a 2D array to save the number of recurrences of certain patterns. For instance:
$4 == "Water" {s[$5]["w"]++}
$4 == "Fire" {s[$5]["f"]++}
$4 == "Air" {s[$5]["a"]++}
where $5 can be attack1, attack2 or attack3. In the END{ }, I print out these values. However, some of these patterns don't exist. So for s["attack1"]["Air"] =0, my code prints whitespace. Hence I would like to know if there is a way to initialize the array in one line instead of initializing each of the elements I need, in the BEGIN{ }.
awk -f script.awk data
This is the command I am using to run my script. I am not allowed to use any other flags.
EDIT 1:
Here's the current output
Water Air Fire
attack1 554 12
attack2 14 24
attack3 6 3
Here's the output I desire:
Water Air Fire
attack1 554 0 12
attack2 14 24 0
attack3 6 0 3
You don't need to initialise the array in this case. Awk already has a default empty value, so you just have to change the way you print the value.
Observe:
awk 'BEGIN {print "Blank:", a[1];
print "Zero: ", a[1] + 0;
printf("Blank: %s\n", a[1]);
printf("Zero: %i\n", a[1])}'
Output:
Blank:
Zero: 0
Blank:
Zero: 0

Getting output of shell command in bash array

I have a uniq -c output, that outputs about 7-10 lines with the count of each pattern that was repeated for each unique line pattern. I want to store the output of my uniq -c file.txt into a bash array. Right now all I can do is store the output into a variable and print it. However, bash currently thinks the entire output is just one big string.
How does bash recognize delimiters? How do you store UNIX shell command output as Bash arrays?
Here is my current code:
proVar=`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`
echo $proVar
And current output I get:
587 chr1 578 chr2 359 chr3 412 chr4 495 chr5 362 chr6 287 chr7 408 chr8 285 chr9 287 chr10 305 chr11 446 chr12 247 chr13 307 chr14 308 chr15 365 chr16 342 chr17 245 chr18 252 chr19 210 chr20 193 chr21 173 chr22 145 chrX 58 chrY
Here is what I want:
proVar[1] = 2051
proVar[2] = 1243
proVar[3] = 1068
...
proVar[22] = 814
proVar[X] = 72
proVar[Y] = 13
In the long run, I'm hoping to make a barplot based on the counts for each index, where every 50 counts equals one "=" sign. It will hopefully look like the below
chr1 ===========
chr2 ===========
chr3 =======
chr4 =========
...
chrX ==
chrY =
Any help, guys?
To build the associative array, try this:
declare -A proVar
while read -r val key; do
proVar[${key#chr}]=$val
done < <(awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c)
Note: This assumes that your command's output is composed of multiple lines, each containing one key-value pair; the single-line output shown in your question comes from passing $proVar to echo without double quotes.
Uses a while loop to read each output line from a process substitution (<(...)).
The key for each assoc. array entry is formed by stripping prefix chr from each input line's first whitespace-separated token, whereas the value is the rest of the line (after the separating space).
To then create the bar plot, use:
while IFS= read -r key; do
echo "chr${key} $(printf '=%.s' $(seq $(( ${proVar[$key]} / 50 ))))"
done < <(printf '%s\n' "${!proVar[#]}" | sort -n)
Note: Using sort -n to sort the keys will put non-numeric keys such as X and Y before numeric ones in the output.
$(( ${proVar[$key]} / 50 )) calculates the number of = chars. to display, using integer division in an arithmetic expansion.
The purpose of $(seq ...) is to simply create as many tokens (arguments) as = chars. should be displayed (the tokens created are numbers, but their content doesn't matter).
printf '=%.s' ... is a trick that effectively prints as many = chars. as there are arguments following the format string.
printf '%s\n' "${!proVar[#]}" | sort -n sorts the keys of the assoc. array numerically, and its output is fed via a process substitution to the while loop, which therefore iterates over the keys in sorted order.
You can create an array in an assignment using parentheses:
proVar=(`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`)
There's no built-in way to create an associative array directly from input. For that you'll need an additional loop.

How can I find the sum of the elements of an array in Bash?

I am trying to add the elements of an array that is defined by user input from the read -a command. How can I do that?
read -a array
tot=0
for i in ${array[#]}; do
let tot+=$i
done
echo "Total: $tot"
Given an array (of integers), here's a funny way to add its elements (in bash):
sum=$(IFS=+; echo "$((${array[*]}))")
echo "Sum=$sum"
e.g.,
$ array=( 1337 -13 -666 -208 -408 )
$ sum=$(IFS=+; echo "$((${array[*]}))")
$ echo "$sum"
42
Pro: No loop, no subshell!
Con: Only works with integers
Edit (2012/12/26).
As this post got bumped up, I wanted to share with you another funny way, using dc, which is then not restricted to just integers:
$ dc <<< '[+]sa[z2!>az2!>b]sb1 2 3 4 5 6 6 5 4 3 2 1lbxp'
42
This wonderful line adds all the numbers. Neat, eh?
If your numbers are in an array array:
$ array=( 1 2 3 4 5 6 6 5 4 3 2 1 )
$ dc <<< '[+]sa[z2!>az2!>b]sb'"${array[*]}lbxp"
42
In fact there's a catch with negative numbers. The number '-42' should be given to dc as _42, so:
$ array=( -1.75 -2.75 -3.75 -4.75 -5.75 -6.75 -7.75 -8.75 )
$ dc <<< '[+]sa[z2!>az2!>b]sb'"${array[*]//-/_}lbxp"
-42.00
will do.
Pro: Works with floating points.
Con: Uses an external process (but there's no choice if you want to do non-integer arithmetic — but dc is probably the lightest for this task).
My code (which I actually utilize) is inspired by answer of gniourf_gniourf. I personally consider this more clear to read/comprehend, and to modify. Accepts also floating points, not just integers.
Sum values in array:
arr=( 1 2 3 4 5 6 7 8 9 10 )
IFS='+' sum=$(echo "scale=1;${arr[*]}"|bc)
echo $sum # 55
With small change, you can get the average of values:
arr=( 1 2 3 4 5 6 7 8 9 10 )
IFS='+' avg=$(echo "scale=1;(${arr[*]})/${#arr[#]}"|bc)
echo $avg # 5.5
gniourf_gniourf's answer is excellent since it doesn't require a loop or bc. For anyone interested in a real-world example, here's a function that totals all of the CPU cores reading from /proc/cpuinfo without messing with IFS:
# Insert each processor core count integer into array
cpuarray=($(grep cores /proc/cpuinfo | awk '{print $4}'))
# Read from the array and replace the delimiter with "+"
# also insert 0 on the end of the array so the syntax is correct and not ending on a "+"
read <<< "${cpuarray[#]/%/+}0"
# Add the integers together and assign output to $corecount variable
corecount="$((REPLY))"
# Echo total core count
echo "Total cores: $corecount"
I also found the arithmetic expansion works properly when calling the array from inside the double parentheses, removing the need for the read command:
cpuarray=($(grep cores /proc/cpuinfo | awk '{print $4}'))
corecount="$((${cpuarray[#]/%/+}0))"
echo "Total cores: $corecount"
Generic:
array=( 1 2 3 4 5 )
sum="$((${array[#]/%/+}0))"
echo "Total: $sum"
I'm a fan of brevity, so this is what I tend to use:
IFS="+";bc<<<"${array[*]}"
It essentially just lists the data of the array and passes it into BC which evaluates it. The "IFS" is the internal field separate, it essentially specifies how to separate arrays, and we said to separate them with plus signs, that means when we pass it into BC, it receives a list of numbers separated by plus signs, so naturally it adds them together.
Another dc & bash method:
arr=(1 3.88 7.1 -1)
dc -e "0 ${arr[*]/-/_} ${arr[*]/*/+} p"
Output:
10.98
The above runs the expression 0 1 3.88 7.1 _1 + + + + p with dc. Note the dummy value 0 because there's one too many +s, and also note the usual negative number prefix - must be changed to _ in dc.
arr=(1 2 3) //or use `read` to fill the array
echo Sum of array elements: $(( ${arr[#]/%/ +} 0))
Sum of array elements: 6
Explanation:
"${arr[#]/%/ +}" will return 1 + 2 + 3 +
By adding additional zero at the end we will get 1 + 2 + 3 + 0
By wrapping this string with BASH's math operation like this$(( "${arr[#]/%/ +} 0")), it will return the sum instead
This could be used for other math operations.
For subtracting just use - instead
For multiplication use * and 1 instead of 0
Can be used with logic operators too.
BOOL AND EXAMPLE - check if all items are true (1)
arr=(1 0 1)
if [[ $((${arr[#]/%/ &} 1)) -eq 1 ]]; then echo "yes"; else echo "no"; fi
This will print: no
BOOL OR EXAMPLE - check if any item is true (1)
arr=(1 0 0)
if [[ $((${arr[#]/%/ |} 0)) -eq 1 ]]; then echo "yes"; else echo "no"; fi
This will print: yes
A simple way
function arraySum
{
sum=0
for i in ${a[#]};
do
sum=`expr $sum + $i`
done
echo $sum
}
a=(7 2 3 9)
echo -n "Sum is = "
arraySum ${a[#]}
I find this very simple using an increasing variable:
result2=0
for i in ${lineCoffset[#]};
do
result2=$((result2+i))
done
echo $result2

Resources