Bash Indented Output for Multiple Variables - arrays

I have a script that loops over every text file in a directory and stores the content in variables. The content can be anywhere from 1 to 50 characters long. The number of text files is unknown. I would like to print the content in such a way that each variable falls into a clean column.
for file in $LIBPATH/*.txt; do
name=$( awk 'FNR == 1 {print $0}' $file )
height=$( awk 'FNR == 2 {print $0}' $file )
weight=$( awk 'FNR == 3 {print $0}' $file )
echo $name $height $weight
done
This code produces the output:
Avril Stewart 99 54
Sally Kinghorn 170 60
John Young 195 120
While the desired output is:
Avril Stewart    99    54
Sally Kinghorn   170   60
John Young       195   120
Thanks!

Use printf:
printf '%-20s %3s %3s\n' "$name" "$height" "$weight"
%3s pads each field to a minimum width of three characters (right-aligned by default); %-20s does the same for 20 characters, and the - in front makes that field left-aligned.
If you want to limit the output to e.g. 20 characters, you can use
printf '%-20.20s %3s %3s\n' "$name" "$height" "$weight"
This will give you a left aligned minimum width of 20 characters and a maximum width of 20 characters, in other words it will ensure that you always have exactly 20 characters.
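If the longest name is not known in advance, the width can be computed first and handed to printf through a * in the format. A minimal sketch, assuming the rows have already been collected into an array; the rows array, the | separator and the format_rows function are made up for illustration and are not part of the original script:

```bash
#!/usr/bin/env bash
# Hypothetical sample data: "name|height|weight" (assumed layout, for illustration only)
rows=( "Avril Stewart|99|54" "Sally Kinghorn|170|60" "John Young|195|120" )

# First pass: find the length of the longest name
width=0
for row in "${rows[@]}"; do
    IFS='|' read -r name _ _ <<< "$row"
    if (( ${#name} > width )); then
        width=${#name}
    fi
done

# Second pass: '*' takes the field width from the argument list
format_rows() {
    local row name height weight
    for row in "${rows[@]}"; do
        IFS='|' read -r name height weight <<< "$row"
        printf '%-*s %3s %3s\n' "$width" "$name" "$height" "$weight"
    done
}

format_rows
```

The price is a second loop over the data, so this only makes sense when the values are stored before printing rather than echoed as they are read.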


Computing sum of specific field from array entries

I have an array trf. Would like to compute the sum of the second element in each array entry.
Example of array contents
trf=( "2 13 144" "3 21 256" "5 34 389" )
Here is the current implementation, but I do not find it robust enough. For instance, it fails with arbitrary number of elements (but considered constant from one array element to another) in each array entry.
cnt=0
m=${#trf[@]}
while (( cnt < m )); do
while read -r one two three
do
sum+="$two"+
done <<< "${trf[$cnt]}"
let cnt=$cnt+1
done
sum+=0
result=`echo "$sum" | /usr/bin/bc -l`
You're making it way too complicated. Something like
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )
declare -i sum=0 # Integer attribute; arithmetic evaluation happens when assigned
for (( n = 0; n < ${#trf[@]}; n++)); do
read -r _ val _ <<<"${trf[n]}"
sum+=$val
done
printf "%d\n" "$sum"
in pure bash, or just use awk (This is handy if you have floating point numbers in your real data):
printf "%s\n" "${trf[@]}" | awk '{ sum += $2 } END { print sum }'
You can use printf to print the entire array, one entry per line. On such an input, one loop (while read) would be sufficient. You can even skip the loop entirely using cut and tr to build the bc command. The echo 0 is there so that bc can handle empty arrays and the trailing + inserted by tr.
{ printf %s\\n "${trf[@]}" | cut -d' ' -f2 | tr \\n +; echo 0; } | bc -l
For your example this prints 68 (= 13+21+34+0).
Try this printf + awk combo:
$ printf '%s\n' "${trf[@]}" | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
Oh, it's already suggested by Shawn. Then with loop:
$ for item in "${trf[@]}"; do
echo "$item"
done | awk '{print $2}{a+=$2}END{print "sum:", a}'
13
21
34
sum: 68
For relatively small arrays a for/while double loop should be ok re: performance; placing the final sum in the $result variable (as in OP's code):
result=0
for element in "${trf[@]}"
do
while read -r a b c
do
((result+=b))
done <<< "${element}"
done
echo "${result}"
This generates:
68
For larger data sets I'd probably opt for one of the awk-only solutions (for performance reasons).
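Since the question mentions entries with an arbitrary number of elements, here is a sketch that does not hard-code the number of fields: read -a splits each entry into an array, so any field can be picked by position. The function name sum_field is made up for illustration, and the sketch assumes integer values (for floating point, one of the awk solutions is the better fit):

```bash
#!/usr/bin/env bash
trf=( "2 13 144" "3 21 256" "5 34 389" )

# sum_field N: sum the N-th (1-based) whitespace-separated field of every entry in trf.
# Hypothetical helper; integers only, missing fields count as 0.
sum_field() {
    local n=$1 entry total=0
    local -a fields
    for entry in "${trf[@]}"; do
        read -ra fields <<< "$entry"          # split the entry into an array
        total=$(( total + ${fields[n-1]:-0} ))
    done
    printf '%d\n' "$total"
}

result=$(sum_field 2)
echo "$result"
```

With the sample array this gives 68 for field 2, the same value as the other answers.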

How to compare 2 files and returning matching values with awk [duplicate]

I want to keep only the lines in results.txt that matched the IDs in uniq.txt based on matches in column 3 of results.txt. Usually I would use grep -f uniq.txt results.txt, but this does not specify column 3.
uniq.txt
9606
234831
131
31313
results.txt
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
A00260:70:HJM2YDSXX:4:1111:15519:16720 NC_000011.10 9606 169 0 28 151 1
A00260:70:HJM2YDSXX:3:1536:9805:14841 NW_021160017.1 9606 81 0 24 151 1
A00260:70:HJM2YDSXX:3:1366:27181:24330 NC_014803.1 234831 121 121 26 151 3
A00260:70:HJM2YDSXX:3:1366:27181:24330 NC_014973.1 443143 121 121 26 151 3
With your shown samples, please try following code.
awk 'FNR==NR{arr[$0];next} ($3 in arr)' uniq.txt results.txt
Explanation:
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition which will be TRUE when uniq.txt is being read.
arr[$0] ##Creating array with index of current line.
next ##next will skip all further statements from here.
}
($3 in arr) ##If 3rd field is present in arr then print line from results.txt here.
' uniq.txt results.txt ##Mentioning Input_file names here.
2nd solution: In case your field number is not set in results.txt and you want to search values in whole line then try following.
awk 'FNR==NR{arr[$0];next} {for(key in arr){if(index($0,key)){print;next}}}' uniq.txt results.txt
You can use grep in combination with sed to manipulate the input patterns and achieve what you're looking for
grep -Ef <(sed -e 's/^/^(\\S+\\s+){2}/;s/$/(\\s|$)/' uniq.txt) results.txt
If you want to match nth column, replace 2 in above command with n-1
outputs
A00260:70:HJM2YDSXX:4:1111:15519:16720 NC_000011.10 9606 169 0 28 151 1
A00260:70:HJM2YDSXX:3:1536:9805:14841 NW_021160017.1 9606 81 0 24 151 1
A00260:70:HJM2YDSXX:3:1366:27181:24330 NC_014803.1 234831 121 121 26 151 3
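If the header line of results.txt should survive the filtering as well, the first awk solution can be extended with one extra condition. A sketch under stated assumptions: the temporary files hold a shortened version of the sample data (read IDs abbreviated to r1..r4, fewer columns), and the function name filter_by_taxid is made up for illustration:

```bash
#!/usr/bin/env bash
# Shortened stand-ins for uniq.txt and results.txt (assumed data, for illustration)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 9606 234831 131 31313 > "$tmp/uniq.txt"
cat > "$tmp/results.txt" <<'EOF'
readID seqID taxID score
r1 NC_000011.10 9606 169
r2 NW_021160017.1 9606 81
r3 NC_014803.1 234831 121
r4 NC_014973.1 443143 121
EOF

# $1 = file with one ID per line, $2 = file whose 3rd column is matched
filter_by_taxid() {
    awk 'FNR==NR { ids[$0]; next }   # first file: remember every ID
         FNR==1 || ($3 in ids)       # second file: keep the header and matching rows
    ' "$1" "$2"
}

filter_by_taxid "$tmp/uniq.txt" "$tmp/results.txt"
```

One caveat of the FNR==NR idiom: if the ID file is empty, the condition is also true for the second file, so nothing is printed.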

Picking input record fields with AWK

Let's say we have a shell variable $x containing a space separated list of numbers from 1 to 30:
$ x=$(for i in {1..30}; do echo -n "$i "; done)
$ echo $x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
We can print the first three input record fields with AWK like this:
$ echo $x | awk '{print $1 " " $2 " " $3}'
1 2 3
How can we print all the fields starting from the Nth field with AWK? E.g.
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
EDIT: I can use cut, sed etc. to do the same but in this case I'd like to know how to do this with AWK.
Converting my comment to answer so that solution is easy to find for future visitors.
You may use this awk:
awk '{for (i=3; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
or pass the start position as an argument (n=4 reproduces the output asked for in the question):
awk -v n=3 '{for (i=n; i<=NF; ++i) printf "%s", $i (i<NF?OFS:ORS)}' file
Version 4: Shortest is probably using sub to cut off the first three fields and their separators:
$ echo $x | awk 'sub(/^ *([^ ]+ +){3}/,"")'
Output:
4 5 6 7 8 9 ...
This will, however, preserve all space after $4:
$ echo "1 2 3 4     5" | awk 'sub(/^ *([^ ]+ +){3}/,"")'
4     5
so if you wanted the space squeezed, you'd need to, for example:
$ echo "1 2 3 4     5" | awk 'sub(/^ *([^ ]+ +){3}/,"") && $1=$1'
4 5
with the exception that if there are only 4 fields and the 4th field happens to be a 0:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"")&&$1=$1'
$ [no output]
in which case you'd need to:
$ echo "1 2 3 0" | awk 'sub(/^ *([^ ]+ +){3}/,"") && ($1=$1) || 1'
0
Version 1: cut is better suited for the job:
$ cut -d' ' -f 4- <<<"$x"
Version 2: Using awk you could:
$ echo -n $x | awk -v RS=' ' -v ORS=' ' 'NR>=4;END{printf "\n"}'
Version 3: If you want to preserve those varying amounts of space, using GNU awk you could use split's fourth parameter seps:
$ echo "1 2 3 4   5  6 7" |
gawk '{
n=split($0,a,FS,seps) # actual separators goes to seps
for(i=4;i<=n;i++) # loop from 4th
printf "%s%s",a[i],(i==n?RS:seps[i]) # get fields from arrays
}'
One more approach: append each field's value to a variable and, once all fields have been read, print that variable. Set n= to the field from which you want the output to start.
echo "$x" |
awk -v n=3 '{val="";for(i=n; i<=NF; i++){val=(val?val OFS:"")$i};print val}'
With GNU awk, you can use the join function which has been a built-in include since gawk 4.1:
x=$(seq 30 | tr '\n' ' ')
echo "$x" | gawk '@include "join"
{split($0, arr)
print join(arr, 4, length(arr), "|")}
'
4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30
(Shown here with a '|' instead of a ' ' for clarity...)
Alternative way of including join:
echo "$x" | gawk -i join '{split($0, arr); print join(arr, 4, length(arr), "|")}'
Using gnu awk and gensub:
echo "$x" | gawk '{ print gensub(/^([[:digit:]]+[[:space:]]){3}(.*$)/, "\\2", 1) }'
Using gensub, split the string into two sections based on regular expressions and print the second section only.
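For reuse in scripts, the loop-based approach can be wrapped in a small shell function that takes the start field as its argument. This is only a sketch; the name fields_from is made up for illustration:

```bash
#!/usr/bin/env bash
# fields_from N: print fields N..NF of every input line, joined by a single space.
# Hypothetical helper built on the awk loop approach; lines with fewer than
# N fields produce an empty line.
fields_from() {
    awk -v n="$1" '{
        line = ""
        for (i = n; i <= NF; i++)
            line = line (i > n ? OFS : "") $i
        print line
    }'
}

x=$(for i in {1..30}; do echo -n "$i "; done)
echo "$x" | fields_from 4
```

Because the fields are re-joined with OFS, runs of spaces in the input are squeezed; the sub and split variants above are the ones to use when the original spacing matters.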

Getting output of shell command in bash array

I have a uniq -c output, that outputs about 7-10 lines with the count of each pattern that was repeated for each unique line pattern. I want to store the output of my uniq -c file.txt into a bash array. Right now all I can do is store the output into a variable and print it. However, bash currently thinks the entire output is just one big string.
How does bash recognize delimiters? How do you store UNIX shell command output as Bash arrays?
Here is my current code:
proVar=`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`
echo $proVar
And current output I get:
587 chr1 578 chr2 359 chr3 412 chr4 495 chr5 362 chr6 287 chr7 408 chr8 285 chr9 287 chr10 305 chr11 446 chr12 247 chr13 307 chr14 308 chr15 365 chr16 342 chr17 245 chr18 252 chr19 210 chr20 193 chr21 173 chr22 145 chrX 58 chrY
Here is what I want:
proVar[1] = 2051
proVar[2] = 1243
proVar[3] = 1068
...
proVar[22] = 814
proVar[X] = 72
proVar[Y] = 13
In the long run, I'm hoping to make a barplot based on the counts for each index, where every 50 counts equals one "=" sign. It will hopefully look like the below
chr1 ===========
chr2 ===========
chr3 =======
chr4 =========
...
chrX ==
chrY =
Any help, guys?
To build the associative array, try this:
declare -A proVar
while read -r val key; do
proVar[${key#chr}]=$val
done < <(awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c)
Note: This assumes that your command's output is composed of multiple lines, each containing one key-value pair; the single-line output shown in your question comes from passing $proVar to echo without double quotes.
Uses a while loop to read each output line from a process substitution (<(...)).
read splits each line into the count (val, the first whitespace-separated token) and the name (key, the rest of the line). The key of each assoc. array entry is formed by stripping the prefix chr from the name, and the value is the count.
To then create the bar plot, use:
while IFS= read -r key; do
echo "chr${key} $(printf '=%.s' $(seq $(( ${proVar[$key]} / 50 ))))"
done < <(printf '%s\n' "${!proVar[@]}" | sort -n)
Note: Using sort -n to sort the keys will put non-numeric keys such as X and Y before numeric ones in the output.
$(( ${proVar[$key]} / 50 )) calculates the number of = chars. to display, using integer division in an arithmetic expansion.
The purpose of $(seq ...) is to simply create as many tokens (arguments) as = chars. should be displayed (the tokens created are numbers, but their content doesn't matter).
printf '=%.s' ... is a trick that effectively prints as many = chars. as there are arguments following the format string.
printf '%s\n' "${!proVar[@]}" | sort -n sorts the keys of the assoc. array numerically, and its output is fed via a process substitution to the while loop, which therefore iterates over the keys in sorted order.
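One edge case of the seq trick: for a count below 50 the division yields 0, seq 0 prints nothing, and printf with no arguments still emits its format once, so a single = is shown. If an empty bar is preferred in that case, the bar can be built by padding instead. A sketch, with the function name bar and its optional second argument made up for illustration:

```bash
#!/usr/bin/env bash
# bar COUNT [PER]: print one '=' per PER counts (default 50), nothing for COUNT < PER.
# Hypothetical helper, not part of the answer above.
bar() {
    local n=$(( $1 / ${2:-50} )) pad
    printf -v pad '%*s' "$n" ''     # a string of n spaces
    printf '%s\n' "${pad// /=}"     # replace every space with '='
}

printf 'chr1 %s\n' "$(bar 587)"
printf 'chrY %s\n' "$(bar 58)"
```

Inside the loop above this would be used as echo "chr${key} $(bar "${proVar[$key]}")".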
You can create an array in an assignment using parentheses:
proVar=(`awk '{printf ("%s\t\n"), $1}' file.txt | grep -P 'pattern' | uniq -c`)
There's no built-in way to create an associative array directly from input. For that you'll need an additional loop.
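Note that the unquoted assignment above splits on any whitespace, so each count and each name ends up in its own element. If one element per output line is wanted instead, mapfile (bash 4 or newer) reads lines directly. A sketch in which sample_counts is a made-up stand-in for the real awk | grep | uniq -c pipeline:

```bash
#!/usr/bin/env bash
# sample_counts is a hypothetical stand-in for the real pipeline
sample_counts() {
    printf '%s\n' chr1 chr1 chr1 chr2 chr2 chrX | uniq -c
}

# One array element per output line (requires bash 4 or newer)
mapfile -t lines < <(sample_counts)

printf 'read %d lines\n' "${#lines[@]}"
for line in "${lines[@]}"; do
    read -r count name <<< "$line"   # split each line into count and name
    printf '%s -> %s\n' "$name" "$count"
done
```

The second loop here is the "additional loop" mentioned above; it is also the place where an associative array could be filled.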

grep Trim txt file by certain line number

I have a txt file containing, let's say, 1000 lines. I would like to trim it obtaining a file with 100 lines, composed by lines 0, 10, 20, 30, etc of the original file.
Is that possible with grep or something? thanks
It can easily be done with an awk or sed one-liner:
awk
awk '!(NR%10)' file
sed (GNU sed only; the first~step address form is a GNU extension)
sed -n '0~10p' file
or
sed '0~10!d' file
See the example below (the sed one-liners give the same output).
print the first 10 lines:
kent$ seq 1000|awk '!(NR%10)'|head -10
10
20
30
40
50
60
70
80
90
100
total lines:
kent$ seq 1000|awk '!(NR%10)'|wc -l
100
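The question counts lines as 0, 10, 20, ..., which could also be read as "the first line of every block of ten". A sketch that makes both the step and the offset a parameter; the function name every_nth is made up for illustration:

```bash
#!/usr/bin/env bash
# every_nth N [R]: print the lines whose line number leaves remainder R (default 0)
# when divided by N. Hypothetical helper around the awk one-liner above.
every_nth() {
    awk -v n="$1" -v r="${2:-0}" 'NR % n == r'
}

seq 1000 | every_nth 10 | wc -l            # lines 10, 20, 30, ...
seq 30 | every_nth 10 1 | tr '\n' ' '      # lines 1, 11, 21
echo
```

Since it only relies on awk, this also works where GNU sed is not available.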
