Unix awk array not printing values

This is the exact code I am running on my system with sh lookup.sh. I don't see any of the output from the nawk block printed or written to the file abc.txt; only "I am here 0" and "I am here 1" are printed. Even the printf in nawk is not working. Please help.
processbody() {
    nawk '
    NR == FNR {
        split($0, x, "#")
        country_code[x[2]] = x[1]
        next
        system(" echo " I am here ">>/tmp/abc.txt")
    }
    {
        CITIZEN_COUNTRY_NAME = "INDIA"
        system(" echo " I am here 1">>/tmp/abc.txt")
        if (CITIZEN_COUNTRY_NAME in country_code) {
            value = country_code[CITIZEN_COUNTRY_NAME]
            system(" echo " I am here 2">>/tmp/abc.txt")
        } else {
            value = "null"
            system(" echo " I am here 3">>/tmp/abc.txt")
        }
        system(" echo " I am here 4">>/tmp/abc.txt")
        print "found " value " for country name " CITIZEN_COUNTRY_NAME >> "/tmp/standalone.txt"
    } ' /tmp/country_codes.config
    echo "I am here 5" >> /tmp/abc.txt
}
# Main program starts here
echo "I am here 0" >> /tmp/abc.txt
processbody
And my country_codes.config file:
$ cat country_codes.config
IND#INDIA
IND#INDIB
USA#USA
CAN#CANADA

That's some pretty interesting awk code. The problem is that your first condition, the NR == FNR one, is active for each record read from the first file (the country_codes.config file), and its action ends with next: after it reads a record, splits it, and saves it, awk goes straight on to the next record without ever executing the second block of the script. Since country_codes.config is the only file you pass it, every record takes that path; when the input runs out, awk is done, so it never prints anything.
This works sanely:
processbody()
{
    awk '
    {
        split($0, x, "#")
        country_code[x[2]] = x[1]
        #next
    }
    END {
        CITIZEN_COUNTRY_NAME = "INDIA"
        if (CITIZEN_COUNTRY_NAME in country_code) {
            value = country_code[CITIZEN_COUNTRY_NAME]
        } else {
            value = "null"
        }
        print "found " value " for country name " CITIZEN_COUNTRY_NAME
    } ' /tmp/country_codes.config
}
# Main program starts here
processbody
It produces the output:
found IND for country name INDIA
As Hai Vu notes, you can use awk's intrinsic field-splitting facilities to simplify life:
processbody()
{
    awk -F# '
    { country_code[$2] = $1 }
    END {
        CITIZEN_COUNTRY_NAME = "INDIA"
        if (CITIZEN_COUNTRY_NAME in country_code) {
            value = country_code[CITIZEN_COUNTRY_NAME]
        } else {
            value = "null"
        }
        print "found " value " for country name " CITIZEN_COUNTRY_NAME
    } ' /tmp/country_codes.config
}
# Main program starts here
processbody
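If the country name should come from the shell rather than being hard-coded, awk's -v option passes it in without quoting gymnastics. A minimal sketch, assuming the name arrives as the function's first argument:

processbody()
{
    awk -F# -v name="$1" '
    { country_code[$2] = $1 }
    END {
        value = (name in country_code) ? country_code[name] : "null"
        print "found " value " for country name " name
    } ' /tmp/country_codes.config
}

# Main program starts here
processbody INDIA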

I don't know what you want to accomplish, but let me guess: if the country is INDIA, then print the following output:
found IND for country name INDIA
If that is the case, the following code will accomplish that goal:
awk -F# '/INDIA/ {print "found " $1 " for country name " $2 }' /tmp/country_codes.config
The -F# flag tells awk (or nawk) to use # as the field separator.
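Equivalently, the separator can be set in a BEGIN block instead of on the command line:
awk 'BEGIN { FS = "#" } /INDIA/ { print "found " $1 " for country name " $2 }' /tmp/country_codes.config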

@user549432 I think you want one awk script that first reads the country codes file and builds the associative array, and then reads the input file (not #-delimited) and does the substitution?
If so, let's assume that /tmp/country_codes.config has:
IND#INDIA
IND#INDIB
USA#USA
CAN#CANADA
and /tmp/input_file (not # delimited) has:
I am from INDIA
I am from INDIB
I am from CANADA
Then, we can have a nawk script like this:
nawk '
BEGIN {
    # The "> 0" test avoids an infinite loop if the file is missing or unreadable.
    while ((getline line < "/tmp/country_codes.config") > 0)
    {
        split(line, x, "#")
        country_code[x[2]] = x[1]
    }
}
{ print $1, $2, $3, country_code[$4] }
' /tmp/input_file
The output will be:
I am from IND
I am from IND
I am from CAN
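The same lookup also works without getline, by naming both files on the command line and using the NR == FNR idiom to load the codes first; a sketch with the same files:

nawk '
NR == FNR {                 # still reading the first file: build the lookup table
    split($0, x, "#")
    country_code[x[2]] = x[1]
    next
}
{ print $1, $2, $3, country_code[$4] }   # second file: substitute the 4th field
' /tmp/country_codes.config /tmp/input_file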

Related

Troubleshoot for loop with arithmetic in shell

I'm trying to write a loop that pulls sequencing metrics from column 2 of a txt file (ending in full_results.txt) and writes everything into a combined tsv (ideally with headers as well, but I haven't been able to do that).
The code below makes a tsv with all of the columns I want, but it stopped looping. The loop was working before I added the last two columns, so I'm not sure whether adding the fields with arithmetic changed anything. Let me know if there is a cleaner way to write this! Thanks for your input.
{
    for i in *full_results.txt;
    do
        [ -d "$i" ] && continue
        [ -s "$1" ] && continue
        total=$(awk ' $1=="NORMALIZED_READ_COUNT" { print $2 } ' $i)
        trimmed=$(awk ' $1=="RAW_FRAGMENT_TOTAL" { print $2 } ' $i)
        aligned=$(awk ' $1=="RAW_FRAGMENT_FILTERED" { print $2 } ' $i)
        molbin=$(awk ' $1=="AVERAGE_UNIQUE_DNA_READS_PER_GSP2" { print $2 } ' $i)
        startsite=$(awk ' $1=="AVERAGE_UNIQUE_DNA_START_SITES_PER_GSP2" { print $2 } ' $i)
        dedup=$(awk ' $1=="ON_TARGET_DEDUPLICATION_RATIO" { print $2 } ' $i)
        printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n" "$i" $total $trimmed $aligned $molbin $startsite $dedup $((total - trimmed)) "$(( ( (total - trimmed) * 100) / total))"
    done;
} > sequencing_metrics.tsv
Output:
NR02_31_S31_merged_R1_001_full_results.txt 7095319 6207119 6206544 1224.43 391.65 2.74:1 888200 12
Intended output: the same as above but looped for all files in the folder
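For reference, a sketch of one way to tighten this: quote every expansion, test -s against "$i" rather than "$1" (presumably the intended variable), and collect all six metrics in a single awk pass per file. The metric names are taken from the question; the fixed column order mirrors the printf above, and each metric is assumed to appear exactly once per file:

{
    for i in *full_results.txt; do
        [ -d "$i" ] && continue
        [ -s "$i" ] || continue    # note "$i", not "$1"
        # One awk pass per file; prints the six values space-separated in a fixed order.
        metrics=$(awk '
            $1 == "NORMALIZED_READ_COUNT"                   { t = $2 }
            $1 == "RAW_FRAGMENT_TOTAL"                      { r = $2 }
            $1 == "RAW_FRAGMENT_FILTERED"                   { a = $2 }
            $1 == "AVERAGE_UNIQUE_DNA_READS_PER_GSP2"       { m = $2 }
            $1 == "AVERAGE_UNIQUE_DNA_START_SITES_PER_GSP2" { s = $2 }
            $1 == "ON_TARGET_DEDUPLICATION_RATIO"           { d = $2 }
            END { print t, r, a, m, s, d }' "$i")
        set -- $metrics            # split the six values into $1..$6 (clobbers the script's positional parameters)
        total=$1 trimmed=$2
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$i" $metrics \
            "$((total - trimmed))" "$(( (total - trimmed) * 100 / total ))"
    done
} > sequencing_metrics.tsv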

Getting all values of various rows which have the same value in one column with awk

I have a data set (test-file.csv) with three columns:
node,contact,mail
AAAA,Peter,peter@anything.com
BBBB,Hans,hans@anything.com
CCCC,Dieter,dieter@anything.com
ABABA,Peter,peter@anything.com
CCDDA,Hans,hans@anything.com
I'd like to extend the header by the column count and rename node to nodes.
Furthermore, all entries should be sorted by the second column (mail).
In the column count I'd like to get the number of occurrences of the column mail;
in nodes, all the entries having the same value in the column mail should be printed (space-separated and alphabetically sorted).
This is what I am trying to achieve:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
I have this awk command:
awk -F"," '
BEGIN{
    FS=OFS=",";
    printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
    counts[$3]++; # Increment count of lines.
    contact[$2]; # contact
}
END {
    # Iterate over all third-column values.
    for (x in counts) {
        printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
    }
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-(
Nothing but the count of occurrences works.
,Dieter@anything.com,1,nodes
,hans@anything.com,2,nodes
,peter@anything.com,2,nodes
contact,mail,count,nodes
Any help appreciated!
You may use this GNU awk solution:
awk '
BEGIN {
    FS = OFS = ","
    printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
    ++counts[$3]    # Increment count of lines.
    name[$3] = $2
    map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
    # Iterate over all third-column values in sorted index order.
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (k in counts)
        print name[k], k, counts[k], map[k]
}
' test-file.csv
Output:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,hans@anything.com,2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
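Note that PROCINFO["sorted_in"] is a GNU awk extension. On other awks, the same ordering can be had by piping the END output through sort (an approach the next answer uses as well); a minimal variant of the END block above:

END {
    for (k in counts)
        print name[k], k, counts[k], map[k] | "sort -t, -k2"
}

The header printed in BEGIN still comes out first, because the pipe to sort is only flushed when awk closes it at exit.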
With your shown samples, please try the following. Written and tested in GNU awk.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
    sub(/^[^,]*,/,"")
    $1=$1
    print $0,"count,nodes"
}
FNR>1{
    nf=$2
    mail[nf]=$NF
    NF--
    arr[nf]++
    val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
    for(i in arr){
        print i,mail[i],arr[i],val[i] | "sort -t, -k1"
    }
}
' Input_file
Explanation: a detailed explanation of the above.
awk '                    ##Starting awk program from here.
BEGIN{ FS=OFS="," }      ##In the BEGIN section, setting FS and OFS to comma here.
FNR==1{                  ##If this is the first line then do the following.
    sub(/^[^,]*,/,"")    ##Substituting everything up to the 1st comma with NULL in the current line.
    $1=$1                ##Reassigning 1st field to itself.
    print $0,"count,nodes"  ##Printing headers as per need.
}
FNR>1{                   ##If the line is greater than the 1st line then do the following.
    nf=$2                ##Creating nf with the 2nd field value here.
    mail[nf]=$NF         ##Creating mail with index nf; its value is the last field.
    NF--                 ##Decreasing the current number of fields by 1 here.
    arr[nf]++            ##Creating arr with index nf and increasing its value by 1 here.
    val[nf]=(val[nf]?val[nf] " ":"")$1  ##Creating val with index nf and appending $1 to its value.
}
END{                     ##Starting the END block of this program from here.
    for(i in arr){       ##Traversing through arr here.
        print i,mail[i],arr[i],val[i] | "sort -t, -k1"  ##Printing values and sorting the output through a pipe, as per the requirement.
    }
}
' Input_file             ##Mentioning the Input_file name here.
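One caveat on NF--: not all awks rebuild the record when NF is decremented (POSIX leaves the behavior unspecified), so the trick is dependable in GNU awk but not everywhere. Here it is not actually needed, since $0 is never printed after it; a portable version of the same block, assuming the three-column input shown, would be:

FNR>1{
    nf=$2
    mail[nf]=$3
    arr[nf]++
    val[nf]=(val[nf]?val[nf] " ":"")$1
}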
2nd solution: in case you want to sort by the 2nd and 3rd fields, try the following.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
    sub(/^[^,]*,/,"")
    $1=$1
    print $0,"count,nodes"
}
FNR>1{
    nf=$2 OFS $3
    NF--
    arr[nf]++
    val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
    for(i in arr){
        print i,arr[i],val[i] | "sort -t, -k1"
    }
}
' Input_file

Compare two files with awk and map congregated multiple values

I have two files I need to compare, and I need to map a value to which multiple rows match.
My mapping file (map.csv) looks like:
id,name
123,Hans
123,Britta
232,Peter
343,Siggi
343,Horst
The data file (data.csv) is
contact,id,names
m@a.de,123,
ad@23.com,343,
adf@er.org,123,
af@go.er,232,
llk@fh.com,343,
ad@wer.org,789,
The desired output should look like this:
contact,id,names
m@a.de,123,Hans Britta
ad@23.com,343,Siggi Horst
adf@er.org,123,Hans Britta
af@go.er,232,Peter
llk@fh.com,343,Siggi Horst
ad@wer.org,789,NO ENTRY
There are multiple values for one id in the mapping file, and they should be printed space-separated into the names column of the data file. If an id is not in the mapping file, "NO ENTRY" should be printed instead.
This is my awk command:
awk 'NR==FNR{a[$1];next}{print $0,($2 in a)? a[$2]:"NO ENTRY"}' map.csv data.csv
I clearly fail because I do not know how to collect multiple values for one id while looping through the mapping file (or, currently, any value at all).
With your shown samples, please try the following.
awk '
BEGIN{
    FS=OFS=","
}
FNR==NR{
    arr[$1]=(arr[$1]?arr[$1] " ":"")$2
    next
}
FNR==1{
    print
    next
}
{
    sub(/,$/,"")
    print $0,($2 in arr)?arr[$2]:"NO ENTRY"
}
' map.csv data.csv
Explanation: a detailed explanation of the above.
awk '                  ##Starting awk program from here.
BEGIN{                 ##Starting BEGIN section of this program from here.
    FS=OFS=","         ##Setting FS and OFS to comma here.
}
FNR==NR{               ##Condition which will be TRUE while map.csv is being read.
    arr[$1]=(arr[$1]?arr[$1] " ":"")$2  ##Creating arr indexed by $1, concatenating each $2 into its value.
    next               ##next will skip all further statements from here.
}
FNR==1{                ##Condition: if this is the first line of data.csv then do the following.
    print              ##Printing the current line here.
    next               ##next will skip all further statements from here.
}
{
    sub(/,$/,"")       ##Substituting the trailing comma with NULL here.
    print $0,($2 in arr)?arr[$2]:"NO ENTRY"  ##Printing the current line plus either arr[$2] or NO ENTRY, as per the requirement.
}
' map.csv data.csv     ##Mentioning the input file names here.
You can use two rules in your case. One to capture the data from map.csv and then a second rule to output the results, e.g.
(edit: updated to match the 1st row of output exactly)
awk -F, '
NR==FNR { if (FNR > 1) a[$1]=a[$1]" "$2; next }
FNR==1 { print; next }
{ printf "%s,%s,%s\n", $1, $2, a[$2]?a[$2]:"NO ENTRY" }
' map.csv data.csv
The first rule is qualified by NR==FNR (the current overall record number equals the current file's record number, which is only true for the first file). The FNR==1 rule runs only on the second file and outputs the heading row unchanged; the final rule outputs the aggregated data.
Example Use/Output
You can simply select-copy and middle-mouse-paste the command above into an xterm with the current directory holding map.csv and data.csv which results in the following:
$ awk -F, '
> NR==FNR { if (FNR > 1) a[$1]=a[$1]" "$2; next }
> FNR==1 { print; next }
> { printf "%s,%s,%s\n", $1, $2, a[$2]?a[$2]:"NO ENTRY" }
> ' map.csv data.csv
contact,id,names
m@a.de,123, Hans Britta
ad@23.com,343, Siggi Horst
adf@er.org,123, Hans Britta
af@go.er,232, Peter
llk@fh.com,343, Siggi Horst
ad@wer.org,789,NO ENTRY
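The stray space before "Hans Britta" comes from seeding a[$1] with " "$2 on the first hit for an id. If it is unwanted, the conditional-concatenation idiom from the answer above avoids it, making the output match the desired output exactly:

NR==FNR { if (FNR > 1) a[$1] = (a[$1] ? a[$1] " " : "") $2; next }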
Alternative
An alternative that does the exact same thing, but simplifies slightly by explicitly setting OFS="," before output begins (allowing the use of print instead of printf), would be:
awk -F, '
NR==FNR { if (FNR > 1) a[$1]=a[$1]" "$2; next }
FNR==1 { OFS=","; print; next }
{ print $1, $2, a[$2]?a[$2]:"NO ENTRY" }
' map.csv data.csv
(same output)

AWK searching records in one file for entries in another file

I have a results.csv file that contains names in the following layout:
name1, 2(random number)
name5, 3
and a sample.txt that is structured in the following way:
record_seperator
name1
foo
bar
record_seperator
name2
bla
bluh
I would like to search for each name in results.csv in the sample.txt file and, if it is found, output the record into a file.
I tried to generate an array out of the first file and search for that, but I couldn't get the syntax right.
It needs to run in a bash script. If anyone has a better idea than awk, that is also good, but I do not have admin rights on the machine it is supposed to run on.
The real csv file contains 10,000 names and the sample.txt 4.5 million records.
I am a bloody beginner in awk, so explanations would be much appreciated.
This is my current try, which does not work, and I don't know why:
#!/bin/bash
awk 'BEGIN{
    while (getline < "results.csv")
    {
        split($0,name,",");
        nameArr[k]=name[1];
    }
    {
        RS="record_seperator"
        FS="\n"
        for (key in nameArr)
        {
            print nameArr[key]
            print $2
            if ($2==nameArr[key])
            NR > 1
            {
                #extract file by Record separator and name from line2
                print RS $0 > $2 ".txt"
            }
        }
    }
}' sample.txt
edit:
my expected output would be two files:
name1.txt
record_seperator
name1
foo
bar
name2.txt
record_seperator
name2
bla
bluh
Here's one. As there was no expected output, it just outputs raw records:
$ awk '
NR==FNR { # process first file
a[$1]=RS $0 # hash the whole record with first field (name) as key
next # process next record in the first file
} # after this line second file processing
$1 in a { # if first field value (name) is found in hash a
f=$1 ".txt" # generate filename
print a[$1] > f # output the whole record
close(f) # preserving fds
}' RS="record_seperator\n" sample RS="\n" FS="," results # file order and related vars
Only one match:
$ cat name1.txt
record_seperator
name1
foo
bar
Tested on gawk and mawk, acts weird on original-awk.
Something like this (not tested):
$ awk -F, 'NR==FNR {a[$1]; next} # fill array with names from first file
$1 in a {print rt, $0 > ($1".txt")} # print the record from second file
{rt = RT}' results.csv RS="define_it_here" sample.txt
Since your record separator comes before the records, you need to delay it by one record; that is what saving RT into rt does.
Use the built-in line/record iterator instead of working around it.
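Note that RT (the text just matched by RS) is a GNU awk extension, and a multi-character RS itself needs gawk or mawk. Since the separator here is a fixed string, a sketch that re-emits it literally, and sets FS="\n" for the sample file so the name really is $1, could look like:

awk 'NR==FNR { a[$1]; next }
$1 in a {
    f = $1 ".txt"
    printf "record_seperator\n%s", $0 > f   # re-emit the fixed separator
    close(f)
}' FS="," results.csv FS="\n" RS="record_seperator\n" sample.txt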
Your code's errors:
#!/bin/bash
awk 'BEGIN{
    while (getline < "results.csv")
    {
        split($0,name,",");
        nameArr[k]=name[1]; ## <-- k does not exist, so you keep overwriting nameArr[""] again and again.
    }
    {
        RS="record_seperator"
        FS="\n"
        for (key in nameArr) ## <-- only one key "" exists; it is never going to equal $2
        {
            print nameArr[key]
            print $2
            if ($2==nameArr[key])
            NR > 1
            {
                #extract file by Record separator and name from line2
                print RS $0 > $2 ".txt"
            }
        }
    }
}' sample.txt
Also the sample you showed:
name1, 2(random number)
name5, 3 ## <-- name5 here, not name2!
I changed name5 to name2; here is your own code, updated:
#!/bin/bash
awk 'BEGIN{
    while ( (getline line < "results.csv") > 0 ) { # The "> 0" test avoids an infinite loop when a read error is encountered.
        split(line,name,",");
        nameArr[name[1]]; # No need to assign anything; referring to the element once establishes the key (name[1]).
    }
    RS="record_seperator";
    FS="\n";
}
$2 in nameArr {
    print RS $0; # You can add `> $2 ".txt"` later yourself.
}' sample.txt
Output:
record_seperator
name1
foo
bar
record_seperator
name2
bla
bluh
(Following @Tiw's lead, I also changed name5 to name2 in your results file in order to get the expected output.)
$ cat a.awk
# collect the result names into an array
NR == FNR {a[$1]; next}
# skip the first (empty) sample record caused by initial record separator
FNR == 1 { next }
# If found, output sample record into the appropriate file
$1 in a {
f = ($1 ".txt")
printf "record_seperator\n%s", $0 > f
}
Run with gawk for multi-character RS:
$ gawk -f a.awk FS="," results.csv FS="\n" RS="record_seperator\n" sample.txt
Check results:
$ cat name1.txt
record_seperator
name1
foo
bar
$ cat name2.txt
record_seperator
name2
bla
bluh

What is the proper way to search an array using Smart Match?

I'm new to programming, much less Perl; I'm having difficulty searching an array I've made from an external text file. I'm looking for a simple way to check whether the user's entry is located in the array. I've used the smartmatch operator before, but never in an if statement, and I can't seem to get it to work. Am I implementing it incorrectly, or is there an easier way to check if the user's string is in the array?
#!/usr/bin/perl
use 5.010;
#Inventory editing script - Jason Black
#-------------------------------------------------------------------------------
print "1. Add Items\n";
print "2. Search Items\n";
print "Please enter your choice: ";
chomp ($userChoice = <STDIN>);          #Stores user input in $userChoice
if ($userChoice == 1) {
    $message = "Please enter in format 'code|title|price|item-count'\n";
    &ChoiceOne;
}
elsif ($userChoice == 2) {
    $message = "Enter search terms\n";
    &ChoiceTwo;
}
sub ChoiceOne {
    print "$message\n";
    chomp($userAddition = <STDIN>);     #Stores input in $userAddition
    $string1 = "$userAddition";
    open (FILE, "FinalProjData.txt") or die ("File not found"); #"FILE" can be named anything
    @array = <FILE>;
    if ( /$string1/ ~~ @array ) {
        print "This entry already exists. Would you like to replace? Y/N \n";
        chomp($userDecision = <STDIN>); #Stores input in $userDecision
        if ($userDecision eq "Y") {
            $string1 =~ s/$userAddition/$userAddition/ig;
            print "Item has been overwritten\n";
        }
        elsif ($userDecision eq "N") {
            print FILE "$string1\n";
            print "Entry has been added to end of file.\n";
        }
        else {
            print "Invalid Input";
            exit;
        }
    }
    else {
        print FILE "$string1\n";
        print "Item has been added.\n";
    }
    close(FILE);
    exit;
}   #end sub ChoiceOne
sub ChoiceTwo {
    print "$message\n";
}
If you want to avoid using smartmatch altogether:
if ( grep { /$string1/ } @array ) {
To actually match $string1 literally, it needs to be escaped, so that | doesn't mean alternation:
if ( grep { /\Q$string1\E/ } @array ) {
or just use a simple string compare:
if ( grep { $_ eq $string1 } @array ) {
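A third option, if you'd rather not have grep scan the whole array, is any from the core List::Util module (version 1.33 or newer), which stops at the first match:

use List::Util qw(any);

if ( any { $_ eq $string1 } @array ) {
    print "This entry already exists.\n";
}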
