Edit a string in a shell script and display it as an array

Input:
1234-A1;1235-A2;2345-B1;5678-C2;2346-D5
Expected Output:
1234
1235
2345
5678
2346
The input shown is entered by the user. I want to store it in an array and process it so that it displays as shown in 'Expected Output'.
I have done this in Perl, but want to achieve it in a shell script. Please help.

To split the input text into an array, you can use this technique:
IFS=';-' read -r -a arr <<< "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"
printf '%s\n' "${arr[@]}"
1234
A1
1235
A2
2345
B1
5678
C2
2346
D5
If you want to keep only 1234, 1235, etc., as in your expected output, you can either use the corresponding array elements (0, 2, 4, and so on) or do something like this:
a="1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"
IFS=';' read -r -a arr <<< "${a//-[A-Z][0-9]/}" # or, more generally, <<< "${a//-??/}"
declare -p arr #This asks bash to print the array for us
#Output
declare -a arr='([0]="1234" [1]="1235" [2]="2345" [3]="5678" [4]="2346")'
# Array can now be printed or used elsewhere in your script. Array counting starts from zero
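The first option mentioned above, picking every other element of the fully split array, could be sketched roughly as follows. The names parts and ids are hypothetical, and the sketch assumes every entry consists of exactly one ID and one suffix.

```shell
a="1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"

# Split on both ';' and '-': even indexes hold the IDs, odd indexes the suffixes.
IFS=';-' read -r -a parts <<< "$a"

ids=()   # hypothetical name for the filtered array
for (( i = 0; i < ${#parts[@]}; i += 2 )); do
    ids+=( "${parts[i]}" )
done

# One ID per line
printf '%s\n' "${ids[@]}"
```

Setting IFS as a prefix to read limits the change to that one command, so the rest of the script keeps the default word splitting.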

@Yash, try:
echo "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5" | awk '{gsub(/-[[:alnum:]]+/,"");gsub(/;/,RS);print}'
This removes each hyphen together with the alphanumeric characters that follow it (by substituting an empty string), then replaces every semicolon with RS (the record separator), which is a newline by default.

Thanks @George and @Vipin.
Based on your inputs, the solution that best suits my environment is as follows:
i=0
a="1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"
IFS=';' read -r -a arr <<< "${a//-??/}"
#declare -p arr
for var in "${arr[@]}"
do
echo "var $((i++)) is : $var"
done
Output:
var 0 is : 1234
var 1 is : 1235
var 2 is : 2345
var 3 is : 5678
var 4 is : 2346

Try this -
awk -F'[-;]' '{for(i=1;i<=NF;i++) if(i%2!=0) {print $i}}' f
1234
1235
2345
5678
2346
OR
echo "1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"|tr ';' '\n'|cut -d'-' -f1
OR
As @George Vasiliou suggested -
awk -F'[-;]' '{for(i=1;i<=NF;i+=2) {print $i}}' f
If the data needs to be stored in an array and you are using gawk, try the following -
awk -F'[;-]' -v k=1 '{for(i=1;i<=NF;i++) if($i !~ /[[:alpha:]]/) {a[k++]=$i}} END {
PROCINFO["sorted_in"] = "@ind_str_asc"
for(k in a) print k,a[k]}' f
1 1234
2 1235
3 2345
4 5678
5 2346
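PROCINFO["sorted_in"] = "@ind_str_asc" makes gawk traverse the array in ascending index order, so the data is printed sorted by index. The indexes are compared as strings here; with ten or more elements, "@ind_num_asc" gives numeric order instead.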
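Note that an awk array exists only inside the awk process. If the values are needed in a bash array, one possible sketch is to let the tr/cut pipeline shown earlier emit one value per line and read those lines with mapfile. This assumes bash 4 or later; input and ids are hypothetical names.

```shell
input="1234-A1;1235-A2;2345-B1;5678-C2;2346-D5"

# One record per line, keep the part before the first '-',
# then load the lines into a bash array (requires bash 4+ for mapfile).
mapfile -t ids < <(printf '%s\n' "$input" | tr ';' '\n' | cut -d'-' -f1)

# Print index and value, similar to the gawk output above (bash indexes start at 0).
for i in "${!ids[@]}"; do
    printf '%s %s\n' "$i" "${ids[i]}"
done
```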

Related

Computing sum of specific field from array entries

I have an array trf. Would like to compute the sum of the second element in each array entry.
Example of array contents
trf=( "2 13 144" "3 21 256" "5 34 389" )
Here is the current implementation, but I do not find it robust enough. For instance, it fails when the array entries have an arbitrary number of elements (the number is constant from one entry to the next).
cnt=0
m=${#trf[@]}
while (( cnt < m )); do
while read -r one two three
do
sum+="$two"+
done <<< "${trf[cnt]}"
let cnt=cnt+1
done
sum+=0
result=`echo "$sum" | /usr/bin/bc -l`
You're making it way too complicated. Something like
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )
declare -i sum=0 # Integer attribute; arithmetic evaluation happens when assigned
for (( n = 0; n < ${#trf[@]}; n++)); do
read -r _ val _ <<<"${trf[n]}"
sum+=$val
done
printf "%d\n" "$sum"
in pure bash, or just use awk (This is handy if you have floating point numbers in your real data):
printf "%s\n" "${trf[@]}" | awk '{ sum += $2 } END { print sum }'
You can use printf to print the entire array, one entry per line. On such an input, one loop (while read) would be sufficient. You can even skip the loop entirely using cut and tr to build the bc command. The echo 0 is there so that bc can handle empty arrays and the trailing + inserted by tr.
{ printf %s\\n "${trf[@]}" | cut -d' ' -f2 | tr \\n +; echo 0; } | bc -l
For your example this prints 68 (= 13+21+34+0).
Try this printf + awk combo:
$ printf '%s\n' "${trf[@]}" | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
Oh, Shawn already suggested that. Then, with a loop:
$ for item in "${trf[@]}"; do
echo "$item"
done | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
For relatively small arrays a for/while double loop should be ok re: performance; placing the final sum in the $result variable (as in OP's code):
result=0
for element in "${trf[@]}"
do
while read -r a b c
do
((result+=b))
done <<< "${element}"
done
echo "${result}"
This generates:
68
For larger data sets I'd probably opt for one of the awk-only solutions (for performance reasons).
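Since the question mentions entries with an arbitrary (but constant) number of fields, a bash-only variant that does not hard-code the field count could look roughly like this. It is only a sketch: the four-field sample data and the names col, fields and sum are hypothetical, and it assumes integer values.

```shell
# Hypothetical sample: four fields per entry instead of three
trf=( "2 13 144 7" "3 21 256 8" "5 34 389 9" )

col=1    # zero-based index of the field to add up (the second field)
sum=0
for entry in "${trf[@]}"; do
    # Split the entry into an array, however many fields it has
    read -r -a fields <<< "$entry"
    sum=$(( sum + fields[col] ))
done

echo "$sum"
```

Reading each entry into an array avoids having to name one variable per field, so the same loop works whether an entry has three fields or thirty.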

Use bash variable as array in awk and filter input file by comparing with array

I have bash variable like this:
val="abc jkl pqr"
And I have a file that looks something like this:
abc 4 5
abc 8 8
def 43 4
def 7 51
jkl 4 0
mno 32 2
mno 9 2
pqr 12 1
I want to discard the rows of the file whose first field isn't present in val:
abc 4 5
abc 8 8
jkl 4 0
pqr 12 1
My solution in awk doesn't work at all and I don't have any idea why:
awk -v var="${val}" 'BEGIN{split(var, arr)}$1 in arr{print $0}' file
Just slice the variable into array indexes:
awk -v var="${val}" 'BEGIN{split(var, arr)
for (i in arr)
names[arr[i]]
}
$1 in names' file
As commented in the linked question, split() stores the pieces as array values, whereas the in operator tests array indexes. The trick is to build a second array whose indexes are those values.
As you can see, $1 in names suffices; you don't need the action {print $0}, since printing the line is the default.
As a one-liner:
$ awk -v var="${val}" 'BEGIN{split(var, arr); for (i in arr) names[arr[i]]} $1 in names' file
abc 4 5
abc 8 8
jkl 4 0
pqr 12 1
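To see why the original `$1 in arr` test matches nothing, the following sketch prints what split() actually stores and what the in operator checks. The variable out is a hypothetical name used only to capture the awk output.

```shell
val="abc jkl pqr"

out=$(awk -v var="$val" 'BEGIN {
    n = split(var, arr)              # indexes are 1..n, values are the words
    for (i = 1; i <= n; i++)
        print "arr[" i "] = " arr[i]

    if ("abc" in arr)                # "in" looks at indexes, not values
        print "abc is an index"
    else
        print "abc is not an index"

    if (1 in arr)
        print "1 is an index"
    else
        print "1 is not an index"
}')

printf '%s\n' "$out"
```

The words end up as values under the indexes 1, 2 and 3, which is why a second array keyed by the words is needed.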
grep -E "$( echo "${val}"| sed 's/ /|/g' )" YourFile
# or
awk -v val="${val}" 'BEGIN{gsub(/ /, "|",val)} $1 ~ val' YourFile
Grep:
It uses a regex (the extended syntax, enabled with -E) to keep every line that contains one of the values. The regex is built on the fly in a command substitution, where sed replaces each space separator with |, meaning OR.
Awk:
It uses the same principle as the grep version, but everything happens inside awk (so no subshell).
The awk variable val is assigned the value of the shell variable of the same name.
At the start of the script (before the first line is read), the spaces in val are replaced by | with BEGIN{gsub(/ /, "|",val)}.
Then every line whose first field matches is printed (the default field separator in awk is whitespace, so the first field is the letter group; printing is the default action of a pattern such as $1 ~ val).
Note that both are unanchored regex matches: grep matches a value anywhere in the line, and awk matches it anywhere within the first field.
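Both commands above match substrings, so a first field such as abcd would also be kept. If exact matches are required, one possible variation of the awk version anchors the alternation. This is a sketch: the inline sample rows (including abcd) and the names re and filtered are hypothetical, and it assumes the values in val contain no regex metacharacters.

```shell
val="abc jkl pqr"

# Hypothetical sample rows; "abcd" would slip through an unanchored match.
filtered=$(printf '%s\n' "abc 4 5" "abcd 1 1" "def 43 4" "jkl 4 0" "pqr 12 1" |
    awk -v val="$val" '
        BEGIN { gsub(/ /, "|", val); re = "^(" val ")$" }   # ^(abc|jkl|pqr)$
        $1 ~ re')

printf '%s\n' "$filtered"
```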

Output looped array data to separate columns in bash

I have three loops which process array data and print to the same log file. I would like to sort the output of each loop into columns which are separated by tabs using bash code:
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
Notice: 1 stands for the content of loop 1, 2 stands for the content of loop 2 and 3 stands for the content of loop 3.
declare -a Array1
declare -a Array2
declare -a Array3
for (( i = 0 ; i < 9 ; i++))
do
echo "${Array1[$i]}"
done | tee -a log.txt
for (( i = 0 ; i < 9 ; i++))
do
echo "(( ${Array1[$i]}-${Array2[$i]} ))" | bc
done | tee -a log.txt
for (( i = 0 ; i < 9 ; i++))
do
echo "${Array3[$i]}"
done | tee -a log.txt
I tried some stuff with the column command, but it doesn't work out as outlined above.
The simplest option may be to use a single loop.
An alternative is to take the output format that you've got already and convert it into columns. This is one way of doing it:
# Read the concatenated results into an array, $results
IFS=$'\n' read -d '' -r -a results < log.txt
# Print the concatenated results in columns
for (( i=0 ; i<9; i++ )) ; do
printf '%s\t%s\t%s\n' "${results[i]}" "${results[i+9]}" "${results[i+18]}"
done
If you don't need the log.txt file, you could just put the results into an array as you calculate them (using as many loops as you like) and print them afterwards.
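The single-loop option mentioned at the start of this answer might look roughly like this. It is a sketch under assumptions: the array contents are made up, the values are integers (so shell arithmetic replaces bc), and a temporary file stands in for log.txt.

```shell
# Hypothetical sample data
Array1=(10 20 30)
Array2=(1 2 3)
Array3=(a b c)

log=$(mktemp)   # stand-in for log.txt

# One pass over the indexes: print all three columns of a row at once.
for (( i = 0; i < ${#Array1[@]}; i++ )); do
    printf '%s\t%s\t%s\n' "${Array1[i]}" "$(( Array1[i] - Array2[i] ))" "${Array3[i]}"
done | tee -a "$log"
```

Because each row is written in one printf call, no post-processing of the log is needed to line the columns up.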

Print duplicate entries in a file using linux commands

I have a file called foo.txt, which consists of:
abc
zaa
asd
dess
zaa
abc
aaa
zaa
I want the output to be stored in another file as:
this text abc appears 2 times
this text zaa appears 3 times
I have tried the following command, but this just writes duplicate entries and their number.
sort foo.txt | uniq --count --repeated > sample.txt
Example of output of above command:
      2 abc
      3 zaa
How do I add the line "this text appears x times" ?
Awk is your friend:
sort foo.txt | uniq --count --repeated | awk '{print "this text " $2 " appears " $1 " times"}' > sample.txt
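As an alternative, the counting can also be done inside awk itself, without uniq. The following is only a sketch of that idea: the input is fed inline instead of from foo.txt, and the names count and result are hypothetical. The trailing sort is there because awk's for-in loop visits keys in an unspecified order.

```shell
result=$(printf '%s\n' abc zaa asd dess zaa abc aaa zaa |
    awk '{ count[$0]++ }                      # tally every distinct line
         END {
             for (t in count)
                 if (count[t] > 1)            # keep only repeated lines
                     printf "this text %s appears %d times\n", t, count[t]
         }' |
    sort)

printf '%s\n' "$result"
```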

Getting output of shell command in bash array

I have a uniq -c output, that outputs about 7-10 lines with the count of each pattern that was repeated for each unique line pattern. I want to store the output of my uniq -c file.txt into a bash array. Right now all I can do is store the output into a variable and print it. However, bash currently thinks the entire output is just one big string.
How does bash recognize delimiters? How do you store UNIX shell command output as Bash arrays?
Here is my current code:
proVar=`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`
echo $proVar
And current output I get:
587 chr1 578 chr2 359 chr3 412 chr4 495 chr5 362 chr6 287 chr7 408 chr8 285 chr9 287 chr10 305 chr11 446 chr12 247 chr13 307 chr14 308 chr15 365 chr16 342 chr17 245 chr18 252 chr19 210 chr20 193 chr21 173 chr22 145 chrX 58 chrY
Here is what I want:
proVar[1] = 2051
proVar[2] = 1243
proVar[3] = 1068
...
proVar[22] = 814
proVar[X] = 72
proVar[Y] = 13
In the long run, I'm hoping to make a barplot based on the counts for each index, where every 50 counts equals one "=" sign. It will hopefully look like the below
chr1 ===========
chr2 ===========
chr3 =======
chr4 =========
...
chrX ==
chrY =
Any help, guys?
To build the associative array, try this:
declare -A proVar
while read -r val key; do
proVar[${key#chr}]=$val
done < <(awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c)
Note: This assumes that your command's output is composed of multiple lines, each containing one key-value pair; the single-line output shown in your question comes from passing $proVar to echo without double quotes.
This uses a while loop to read each output line from a process substitution (<(...)).
The value of each assoc. array entry is the line's first whitespace-separated token (the count), whereas the key is formed by stripping the prefix chr from the rest of the line (the chromosome name).
To then create the bar plot, use:
while IFS= read -r key; do
echo "chr${key} $(printf '=%.s' $(seq $(( ${proVar[$key]} / 50 ))))"
done < <(printf '%s\n' "${!proVar[@]}" | sort -n)
Note: Using sort -n to sort the keys will put non-numeric keys such as X and Y before numeric ones in the output.
$(( ${proVar[$key]} / 50 )) calculates the number of = chars. to display, using integer division in an arithmetic expansion.
The purpose of $(seq ...) is to simply create as many tokens (arguments) as = chars. should be displayed (the tokens created are numbers, but their content doesn't matter).
printf '=%.s' ... is a trick that effectively prints as many = chars. as there are arguments following the format string.
printf '%s\n' "${!proVar[@]}" | sort -n sorts the keys of the assoc. array numerically, and its output is fed via a process substitution to the while loop, which therefore iterates over the keys in sorted order.
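The bar-drawing trick can be tried in isolation. The sketch below uses a single made-up count (587, the chr1 figure from the question) and the hypothetical names count, n and bar. The guard for n equal to 0 is an addition: printf prints its format once even when it receives no arguments, so without the guard a count below 50 would still show one = sign.

```shell
count=587                 # sample count for chr1, taken from the question
n=$(( count / 50 ))       # integer division: one '=' per 50 counts

if (( n > 0 )); then
    # seq produces n tokens; '%.s' prints zero characters of each one,
    # so only the literal '=' is emitted once per token.
    bar=$(printf '=%.s' $(seq "$n"))
else
    bar=""
fi

echo "chr1 $bar"
```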
You can create an array in an assignment using parentheses:
proVar=(`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`)
There's no built-in way to create an associative array directly from input. For that you'll need an additional loop.
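The parenthesized assignment above splits on all whitespace, so each count and each name becomes a separate element. If one element per output line is preferred, mapfile (bash 4 or later) is one option. The sketch below uses made-up inline input in place of the real pipeline, and lines, count and name are hypothetical names.

```shell
# Made-up input standing in for the awk | grep part of the pipeline
mapfile -t lines < <(printf '%s\n' chr1 chr1 chr2 chr2 chr2 chrX | uniq -c)

echo "${#lines[@]} elements"

# Each element is one full line of uniq -c output, e.g. "      3 chr2";
# read strips the leading blanks and splits it into count and name.
read -r count name <<< "${lines[1]}"
echo "$name was seen $count times"
```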

Resources