I'm trying to write a loop that pulls sequencing metrics from column 2 of a txt file (ending in full_results.txt) and writes everything into a combined tsv (ideally with headers as well, but I haven't been able to do that).
The code below produces a tsv with all of the columns I want, but it has stopped looping. The loop was working before I added the last two columns, so I'm not sure if adding the fields with arithmetic changed anything. Let me know if there is a cleaner way to write this! Thanks for your input.
{
for i in *full_results.txt;
do
[ -d "$i" ] && continue
[ -s "$1" ] && continue
total=$(awk ' $1=="NORMALIZED_READ_COUNT" { print $2 } ' $i)
trimmed=$(awk ' $1=="RAW_FRAGMENT_TOTAL" { print $2 } ' $i)
aligned=$(awk ' $1=="RAW_FRAGMENT_FILTERED" { print $2 } ' $i)
molbin=$(awk ' $1=="AVERAGE_UNIQUE_DNA_READS_PER_GSP2" { print $2 } ' $i)
startsite=$(awk ' $1=="AVERAGE_UNIQUE_DNA_START_SITES_PER_GSP2" { print $2 } ' $i)
dedup=$(awk ' $1=="ON_TARGET_DEDUPLICATION_RATIO" { print $2 } ' $i)
printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" "$i" $total $trimmed $aligned $molbin $startsite $dedup $((total - trimmed)) "$(( ( (total - trimmed) * 100) / total))"
done;
} > sequencing_metrics.tsv
Output:
NR02_31_S31_merged_R1_001_full_results.txt 7095319 6207119 6206544 1224.43 391.65 2.74:1 888200 12
Intended output: the same as above but looped for all files in the folder
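One possible cause, offered as an assumption since the other input files are not shown: if any file has a missing or zero NORMALIZED_READ_COUNT, or a value that is not an integer, the $(( ... )) arithmetic fails, and as far as I know bash then abandons the rest of the loop. A minimal sketch that moves the arithmetic into awk, reads each file once, and adds a header row could look like the following. The function name summarize_metrics and the header labels removed and pct_removed are made up for illustration.

```shell
summarize_metrics() {
    # Header row first, then one row per input file.
    printf 'file\ttotal\ttrimmed\taligned\tmolbin\tstartsite\tdedup\tremoved\tpct_removed\n'
    for f in *full_results.txt; do
        # Skip anything that is not a regular, non-empty file.
        [ -f "$f" ] && [ -s "$f" ] || continue
        awk -v OFS='\t' -v name="$f" '
            $1 == "NORMALIZED_READ_COUNT"                   { total = $2 }
            $1 == "RAW_FRAGMENT_TOTAL"                      { trimmed = $2 }
            $1 == "RAW_FRAGMENT_FILTERED"                   { aligned = $2 }
            $1 == "AVERAGE_UNIQUE_DNA_READS_PER_GSP2"       { molbin = $2 }
            $1 == "AVERAGE_UNIQUE_DNA_START_SITES_PER_GSP2" { startsite = $2 }
            $1 == "ON_TARGET_DEDUPLICATION_RATIO"           { dedup = $2 }
            END {
                removed = total - trimmed
                # Guard the division so a missing or zero total cannot abort the run.
                pct = (total > 0) ? int(removed * 100 / total) : "NA"
                print name, total, trimmed, aligned, molbin, startsite, dedup, removed, pct
            }
        ' "$f"
    done
}
```

It would be called from the folder holding the files, for example as summarize_metrics > sequencing_metrics.tsv.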
I have a data set (test-file.csv) with three columns:
node,contact,mail
AAAA,Peter,peter#anything.com
BBBB,Hans,hans#anything.com
CCCC,Dieter,dieter#anything.com
ABABA,Peter,peter#anything.com
CCDDA,Hans,hans#anything.com
I would like to extend the header with a count column and rename node to nodes.
Furthermore, all entries should be sorted by the second column of the output (mail).
The count column should hold the number of occurrences of each mail value,
and nodes should list all node entries that share that mail value (space separated and alphabetically sorted).
This is what I try to achieve:
contact,mail,count,nodes
Dieter,dieter#anything.com,1,CCCC
Hans,hans#anything.com,2,BBBB CCDDA
Peter,peter#anything.com,2,AAAA ABABA
I have this awk-command:
awk -F"," '
BEGIN{
FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
counts[$3]++; # Increment count of lines.
contact[$2]; # contact
}
END {
# Iterate over all third-column values.
for (x in counts) {
printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
}
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-(
Nothing but the occurrence count works.
,dieter#anything.com,1,nodes
,hans#anything.com,2,nodes
,peter#anything.com,2,nodes
contact,mail,count,nodes
Any help appreciated!
You may use this GNU awk solution:
awk '
BEGIN {
FS = OFS = ","
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
++counts[$3] # Increment count of lines.
name[$3] = $2
map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
# Iterate over all third-column values.
PROCINFO["sorted_in"]="@ind_str_asc";
for (k in counts)
print name[k], k, counts[k], map[k]
}
' test-file.csv
Output:
contact,mail,count,nodes
Dieter,dieter#anything.com,1,CCCC
Hans,hans#anything.com,2,BBBB CCDDA
Peter,peter#anything.com,2,AAAA ABABA
With the samples you have shown, please try the following. Written and tested in GNU awk.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
sub(/^[^,]*,/,"")
$1=$1
print $0,"count,nodes"
}
FNR>1{
nf=$2
mail[nf]=$NF
NF--
arr[nf]++
val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
for(i in arr){
print i,mail[i],arr[i],val[i] | "sort -t, -k1"
}
}
' Input_file
Explanation: A detailed, line-by-line explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ FS=OFS="," } ##In BEGIN section setting FS, OFS as comma here.
FNR==1{ ##if this is first line then do following.
sub(/^[^,]*,/,"") ##Substituting everything till 1st comma here with NULL in current line.
$1=$1 ##Reassigning 1st field to itself.
print $0,"count,nodes" ##Printing headers as per need to terminal.
}
FNR>1{ ##If line is Greater than 1st line then do following.
nf=$2 ##Creating nf with 2nd field value here.
mail[nf]=$NF ##Creating mail with nf as index and value is last field value.
NF-- ##Decreasing value of current number of fields by 1 here.
arr[nf]++ ##Creating arr with index of nf and keep increasing its value with 1 here.
val[nf]=(val[nf]?val[nf] " ":"")$1 ##Creating val with index of nf and keep adding $1 value in it.
}
END{ ##Starting END block of this program from here.
for(i in arr){ ##Traversing through arr in here.
print i,mail[i],arr[i],val[i] | "sort -t, -k1" ##printing values to get expected output and sorting it also by pipe here as per requirement.
}
}
' Input_file ##Mentioning Input_file name here.
2nd solution: In case you want to group and sort by the 2nd and 3rd fields together, try the following.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
sub(/^[^,]*,/,"")
$1=$1
print $0,"count,nodes"
}
FNR>1{
nf=$2 OFS $3
NF--
arr[nf]++
val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
for(i in arr){
print i,arr[i],val[i] | "sort -t, -k1"
}
}
' Input_file
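Neither approach above guarantees that the nodes inside one group come out alphabetically, because they are appended in input order. A minimal sketch that sorts the rows first and then groups them in a single pass, without relying on GNU awk features, might look like this. The function name group_by_mail is made up for illustration, and the input is assumed to have the node,contact,mail header shown in the question.

```shell
group_by_mail() {
    # Sort data rows by mail (field 3), then node (field 1), so that each
    # group is contiguous and its nodes arrive in alphabetical order.
    tail -n +2 "$1" | LC_ALL=C sort -t, -k3,3 -k1,1 |
    awk -F, -v OFS=, '
        BEGIN { print "contact", "mail", "count", "nodes" }
        $3 != mail {                      # first row of a new group
            if (NR > 1) print contact, mail, count, nodes
            contact = $2; mail = $3; count = 0; nodes = ""
        }
        { count++; nodes = (nodes == "" ? "" : nodes " ") $1 }
        END { if (NR > 0) print contact, mail, count, nodes }   # flush last group
    '
}
```

Calling group_by_mail test-file.csv on the sample data should give the header followed by one line per mail address.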
I have a comma-delimited file where some values can be missing, like
1,f,12,f,t,18
2,t,17,t, ,17
3,t,15, ,f,16
I want to change some of the columns to numeric: f to 0 and t to 1. Here, I want to change only columns 2 and 5 and don't want to change column 4. My result file should look like
1,0,12,f,1,18
2,1,17,t, ,17
3,1,15, ,0,16
I can use the statement
awk -F, -v OFS=',' '{ if ( $2 ~ /t/ ) { $2 = 1 } else if ( $2 ~ /f/ ) { $2 = 0 }; print}' test.csv
To change individual columns
I can also use a loop like
awk -F, -v OFS=',' 'BEGIN {
FS = OFS = ","
}
{
for (column = 1; column <= 4; ++column) {
if ($column ~ /t/) {
$column = 1
}
else if($column ~ /f/) {
$column = 0
}
}
print
}
' test.csv
to replace multiple columns if they are adjacent. How do I change the for loop so that it covers only specific columns? I know there is a for-each loop that does this, but I couldn't get it to work. Also, how can I assign multiple values to an array in a single statement, like
a =[1, 2, 3, 4]
You can use this awk:
awk 'BEGIN{ FS=OFS=","; a[2]; a[5] }
{ for (i in a) if ($i=="f") $i=0; else if ($i=="t") $i=1 } 1' file
1,0,12,f,1,18
2,1,17,t, ,17
3,1,15, ,0,16
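awk has no array literal like a = [1, 2, 3, 4], but split() gets close to a single-statement assignment. The sketch below is one way to do it, assuming the column list is passed as a space-separated string; the function name recode_columns and the variable name cols are made up for illustration.

```shell
recode_columns() {
    # cols holds the column numbers to recode; split() turns the string
    # into the array c[1..n] in one statement.
    awk -v cols="2 5" '
        BEGIN { FS = OFS = ","; n = split(cols, c, " ") }
        {
            for (k = 1; k <= n; k++) {
                i = c[k]                      # current target column
                if ($i == "t") $i = 1
                else if ($i == "f") $i = 0
            }
            print
        }
    ' "$1"
}
```

Iterating with a counter from 1 to n, rather than for (i in a), also keeps the columns in the order they were listed.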
I have a text file having contents in format...
File=/opt/mgtservices/probes/logs is_SS_File=no is_Log=yes Output_File=probes_logs
This can have around 1k records. I am reading line by line from a file.
while read -r line
do
if [ $SS_SERVER -eq 0 ]
then
arr=$(echo "$line" | tr ' =' "\n")
echo $arr[1]
#do something
elif [[ $SS_SERVER -eq 1 && "$line" =~ "is_SS_File=\"no\"" ]]
then
#do something else
fi
done < "$filename"
I am expecting that arr should be an array, so that I can get output as:
arr[1]=File
arr[2]=/opt/mgtservices/probes/logs
arr[3]=is_SS_File
and so on...
That is not what I get: arr[1] gives me the complete line with the "=" characters removed.
I want to use 2 delimiters "space" and "=".
Depending on what you are trying to accomplish try this:
tr ' =' '\n ' <"$file" |
while read keyword value; do
: you get one keyword and its value at a time now
done
or maybe
while IFS=' =' read -r -a arr; do
: arr[0] is first keyword
: arr[1] is its value
: arr[2] is second keyword
: etc
done <"$file"
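To see what the second form puts into the array, here is a small bash sketch run on the sample line from the question. It assumes bash (read -a and <<< are not POSIX sh) and that neither keys nor values contain a space or an "=" themselves.

```shell
line='File=/opt/mgtservices/probes/logs is_SS_File=no is_Log=yes Output_File=probes_logs'

# Split on both space and "=" in one step; indices start at 0 in bash.
IFS=' =' read -r -a arr <<< "$line"

# Print every element with its index.
for idx in "${!arr[@]}"; do
    printf 'arr[%d]=%s\n' "$idx" "${arr[$idx]}"
done
```

Even indices hold the keywords and odd indices hold their values, so arr[0] is File and arr[1] is /opt/mgtservices/probes/logs.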
I have a simple section in a PowerShell script that goes through each array in a list and grabs the data (found at [3] of the current array), using it to determine if another part of the array (found at [0]) should be added to the end of a string.
$String = "There is"
$Objects | Foreach-Object{
if ($_[3] -match "YES")
{$String += ", a " + $_[0]}
}
This works fine and dandy, resulting in a $String of something like
"There is, a car, a airplane, a truck"
But unfortunately this doesn't really make sense grammatically for what I want. I'm aware that I could either fix the string after it's been created, or include lines in the foreach/if statement that determine which characters to add. This would need to be:
$String += " a " + $_[0] - for the first match.
$String += ", a " + $_[0] - for following matches.
$String += " and a " + $_[0] + " here." - for the last match.
Furthermore I need to determine whether to use " a " if $_[0] starts with a consonant, or " an " if $_[0] starts with a vowel. All in all, I'd like the output to be
"There is a car, an airplane and a truck here."
Thanks!
Try something like this:
$vehicles = @($Objects | ? { $_[3] -match 'yes' } | % { $_[0] })
$String = 'There is'
for ($i = 0; $i -lt $vehicles.Length; $i++) {
switch ($i) {
0 { $String += ' a'; break }
($vehicles.Length-1) { $String += ' and a' }
default { $String += ', a' }
}
if ($vehicles[$i] -match '^[aeiou]') { $String += 'n' }
$String += ' ' + $vehicles[$i]
}
$String += ' here.'
This is the exact code I am running on my system with sh lookup.sh. Nothing from inside the nawk block is printed or written to the file abc.txt; only "I am here 0" and "I am here 1" show up. Even a printf inside nawk does not work. Please help.
processbody() {
nawk '
NR == FNR {
split($0, x, "#")
country_code[x[2]] = x[1]
next
system(" echo " I am here ">>/tmp/abc.txt")
}
{
CITIZEN_COUNTRY_NAME = "INDIA"
system(" echo " I am here 1">>/tmp/abc.txt")
if (CITIZEN_COUNTRY_NAME in country_code) {
value = country_code[CITIZEN_COUNTRY_NAME]
system(" echo " I am here 2">>/tmp/abc.txt")
} else {
value = "null"
system(" echo " I am here 3">>/tmp/abc.txt")
}
system(" echo " I am here 4">>/tmp/abc.txt")
print "found " value " for country name " CITIZEN_COUNTRY_NAME >> "/tmp/standalone.txt"
} ' /tmp/country_codes.config
echo "I am here 5" >> /tmp/abc.txt
}
# Main program starts here
echo "I am here 0" >> /tmp/abc.txt
processbody
And my country_codes.config file:
$ cat country_codes.config
IND#INDIA
IND#INDIB
USA#USA
CAN#CANADA
That's some pretty interesting awk code. The problem is that your first condition, NR == FNR, is true for every record when there is only one input file (here country_codes.config). Its action contains next, so after each record is split and stored, awk moves straight on to the next record and never executes the second block. The system() call placed after next is unreachable for the same reason. When the input is exhausted there is nothing more to do, so nothing is ever printed.
This works sanely:
processbody()
{
awk '
{
split($0, x, "#")
country_code[x[2]] = x[1]
#next
}
END {
CITIZEN_COUNTRY_NAME = "INDIA"
if (CITIZEN_COUNTRY_NAME in country_code) {
value = country_code[CITIZEN_COUNTRY_NAME]
} else {
value = "null"
}
print "found " value " for country name " CITIZEN_COUNTRY_NAME
} ' /tmp/country_codes.config
}
# Main program starts here
processbody
It produces the output:
found IND for country name INDIA
As Hai Vu notes, you can use awk's built-in field splitting to simplify life:
processbody()
{
awk -F# '
{ country_code[$2] = $1 }
END {
CITIZEN_COUNTRY_NAME = "INDIA"
if (CITIZEN_COUNTRY_NAME in country_code) {
value = country_code[CITIZEN_COUNTRY_NAME]
} else {
value = "null"
}
print "found " value " for country name " CITIZEN_COUNTRY_NAME
} ' /tmp/country_codes.config
}
# Main program starts here
processbody
I don't know what you want to accomplish, but let me guess: if country is INDIA, then print the following output:
found IND for country name INDIA
If that is the case, the following code will accomplish that goal:
awk -F# '/INDIA/ {print "found " $1 " for country name " $2 }' /tmp/country_codes.config
The -F# flag tells awk (or nawk) to use # as the field separator.
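Note that /INDIA/ is a regular expression match against the whole line, so it would also match a name that merely contains INDIA. A minimal sketch of an exact match on the second field, with the country name passed in as a variable, could look like this. The function name lookup_country is made up for illustration, and printing "null" for a missing name follows the question's own fallback.

```shell
lookup_country() {
    # $1: country name to look up, $2: config file with CODE#NAME lines
    awk -F'#' -v name="$1" '
        $2 == name {                          # exact comparison, not a regex
            print "found " $1 " for country name " $2
            found = 1
            exit                              # stop at the first match
        }
        END { if (!found) print "found null for country name " name }
    ' "$2"
}
```

For example, lookup_country INDIA /tmp/country_codes.config prints the same line as the one-liner above.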
@user549432 I think you want one awk script that first reads the country codes file and builds the associative array, and then reads the input files (not # delimited) and does a substitution?
If so, let's assume that /tmp/country_codes.config has:
IND#INDIA
IND#INDIB
USA#USA
CAN#CANADA
and /tmp/input_file (not # delimited) has:
I am from INDIA
I am from INDIB
I am from CANADA
Then, we can have a nawk script like this:
nawk '
BEGIN {
while ((getline < "/tmp/country_codes.config") > 0)
{
split($0,x,"#")
country_code[x[2]] = x[1]
}
}
{ print $1,$2,$3,country_code[$4]}
' /tmp/input_file
The output will be:
I am from IND
I am from IND
I am from CAN
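The NR == FNR pattern from the question also works here, as long as awk is given two files: the condition is then true only while the first file is being read. A minimal sketch under that assumption follows. The function name annotate_countries is made up for illustration, and printing "null" for an unknown country is borrowed from the question's fallback.

```shell
annotate_countries() {
    # $1: CODE#NAME config file, $2: whitespace-separated input file
    awk '
        NR == FNR {                           # true only for the first file
            split($0, x, "#")
            country_code[x[2]] = x[1]
            next                              # skip the rule below for config lines
        }
        { print $1, $2, $3, ($4 in country_code ? country_code[$4] : "null") }
    ' "$1" "$2"
}
```

One caveat: if the first file is empty, NR == FNR stays true for the second file as well, so this form assumes a non-empty config file.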