I have a data set (test-file.csv) with three columns:
node,contact,mail
AAAA,Peter,peter#anything.com
BBBB,Hans,hans#anything.com
CCCC,Dieter,dieter#anything.com
ABABA,Peter,peter#anything.com
CCDDA,Hans,hans#anything.com
I'd like to extend the header by the column count and rename node to nodes.
Furthermore, all entries should be sorted by the mail column (the second column of the output).
In the column count I'd like to get the number of occurrences of each mail value,
and in nodes all entries having the same value in the mail column should be printed (space-separated and alphabetically sorted).
This is what I am trying to achieve:
contact,mail,count,nodes
Dieter,dieter#anything.com,1,CCCC
Hans,hans#anything.com,2,BBBB CCDDA
Peter,peter#anything.com,2,AAAA ABABA
I have this awk-command:
awk -F"," '
BEGIN{
FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
counts[$3]++; # Increment count of lines.
contact[$2]; # contact
}
END {
# Iterate over all third-column values.
for (x in counts) {
printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
}
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-(
Nothing but the count of occurrences works.
,Dieter#anything.com,1,nodes
,hans#anything.com,2,nodes
,peter#anything.com,2,nodes
contact,mail,count,nodes
Any help appreciated!
You may use this GNU awk:
awk '
BEGIN {
FS = OFS = ","
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
++counts[$3] # Increment count of lines.
name[$3] = $2
map[$3] = ($3 in map ? map[$3] " " : "") $1
}
END {
# Iterate over all third-column values.
PROCINFO["sorted_in"]="#ind_str_asc";
for (k in counts)
print name[k], k, counts[k], map[k]
}
' test-file.csv
Output:
contact,mail,count,nodes
Dieter,dieter#anything.com,1,CCCC
Hans,hans#anything.com,2,BBBB CCDDA
Peter,peter#anything.com,2,AAAA ABABA
With your shown samples, please try the following; written and tested in GNU awk.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
sub(/^[^,]*,/,"")
$1=$1
print $0,"count,nodes"
}
FNR>1{
nf=$2
mail[nf]=$NF
NF--
arr[nf]++
val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
for(i in arr){
print i,mail[i],arr[i],val[i] | "sort -t, -k1"
}
}
' Input_file
Explanation: here is a detailed, line-by-line explanation of the above.
awk ' ##Starting awk program from here.
BEGIN{ FS=OFS="," } ##In BEGIN section setting FS, OFS as comma here.
FNR==1{ ##if this is first line then do following.
sub(/^[^,]*,/,"") ##Substituting everything till 1st comma here with NULL in current line.
$1=$1 ##Reassigning 1st field to itself.
print $0,"count,nodes" ##Printing headers as per need to terminal.
}
FNR>1{ ##If line is Greater than 1st line then do following.
nf=$2 ##Creating nf with 2nd field value here.
mail[nf]=$NF ##Creating mail with nf as index and value is last field value.
NF-- ##Decreasing value of current number of fields by 1 here.
arr[nf]++ ##Creating arr with index of nf and keep increasing its value with 1 here.
val[nf]=(val[nf]?val[nf] " ":"")$1 ##Creating val with index of nf and keep adding $1 value in it.
}
END{ ##Starting END block of this program from here.
for(i in arr){ ##Traversing through arr in here.
print i,mail[i],arr[i],val[i] | "sort -t, -k1" ##printing values to get expected output and sorting it also by pipe here as per requirement.
}
}
' Input_file ##Mentioning Input_file name here.
2nd solution: in case you want to sort by the 2nd and 3rd fields, try the following.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
sub(/^[^,]*,/,"")
$1=$1
print $0,"count,nodes"
}
FNR>1{
nf=$2 OFS $3
NF--
arr[nf]++
val[nf]=(val[nf]?val[nf] " ":"")$1
}
END{
for(i in arr){
print i,arr[i],val[i] | "sort -t, -k1"
}
}
' Input_file
I have a pipe-delimited file, and I want to check for the value 'America' in the 5th column of each record.
The word America can appear in other columns as well, so grep -o does not give the correct result.
Is there another way to check for the occurrence of a word at a specific position when the file is delimited?
A script like this can do the work (note the comparison operator ==; a single = would assign "America" to $5 and match every line):
awk '$5 == "America"' filename | wc -l
Of course you can do it with awk alone, like this (i+0 prints 0 when there are no matches):
awk '$5 == "America" { i++ } END { print i+0 }' filename
To use a different delimiter, such as the pipe in your file, set it with -F:
awk -F'|' '$5 == "America" { i++ } END { print i+0 }' filename
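A quick check with made-up sample data (records.txt and its contents are placeholders for the real file):

```shell
# Hypothetical sample: the 5th pipe-delimited column holds the country.
# "America" also appears in another column of record 2 on purpose.
cat > records.txt <<'EOF'
1|a|b|c|America
2|a|America|c|Europe
3|a|b|c|America
EOF

# Only records whose 5th field is exactly "America" are counted.
awk -F'|' '$5 == "America" { n++ } END { print n+0 }' records.txt
# prints 2 (record 2 is not counted despite containing "America")
```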
I am trying to convert the heading of the file content into a column using awk; below is my input file:
abc.txt
1234|43245
4325|65123
5432|12342
bcd.txt
865|432
324|543
123|654
cde.txt
12|321
21|123
32|123
output :
abc.txt|1234|43245
abc.txt|4325|65123
abc.txt|5432|12342
bcd.txt|865|432
bcd.txt|324|543
bcd.txt|123|654
cde.txt|12|321
cde.txt|21|123
cde.txt|32|123
Explanation:
Fetch the filename (abc.txt) where NR==1, put it into an array a or a variable, and print it with the file contents; when a file's contents are completed there is a blank line.
I was trying to create two arrays, a for lines with NF==1 and another array b for lines with NF>1, and to loop over array b to merge the file contents with array a, but I am still trying to figure out a solution.
In awk:
$ awk 'BEGIN{FS=OFS="|"} NF==1{h=$0} {print (NF==1?"": h OFS $0)}' file
abc.txt|1234|43245
abc.txt|4325|65123
abc.txt|5432|12342
bcd.txt|865|432
bcd.txt|324|543
bcd.txt|123|654
cde.txt|12|321
cde.txt|21|123
cde.txt|32|123
The downside is that it prints an empty line at the beginning. If you can't live with that, add NR==1{next} before the printing block - or better yet: see #EdMorton's comment below.
Explained:
BEGIN{ FS=OFS="|" } # set delimiters
NF==1{ h=$0 } # if NF==1 it's header time, store it to h
# NR==1{ next } # to remove the leading enter, apply this
{ print (NF==1 ? "" : h OFS $0) } # print an empty record or the record with header
So here is my solution:
awk 'BEGIN{ OFS = "|"} /[a-z].[a-z]/{ if ($0 != header && NR > 1){print ""}; header = $0 }/[0-9]\|[0-9]/{ numbers = $0; print header, numbers }' yourfile
Output:
abc.txt|1234|43245
abc.txt|4325|65123
abc.txt|5432|12342
bcd.txt|865|432
bcd.txt|324|543
bcd.txt|123|654
cde.txt|12|321
cde.txt|21|123
cde.txt|32|123
It works without any arrays.
Use the following awk approach:
awk '{ if ($1 ~ /[a-z]+\.txt/) { if (NR != 1) print ""; h=$1; next } print h "|" $1 }' testfile
The output:
abc.txt|1234|43245
abc.txt|4325|65123
abc.txt|5432|12342
bcd.txt|865|432
bcd.txt|324|543
bcd.txt|123|654
cde.txt|12|321
cde.txt|21|123
cde.txt|32|123
Explanation:
if ($1 ~ /[a-z]+\.txt/) - the condition checks whether the current field $1 matches the pattern /[a-z]+\.txt/ (a header line); the dot is escaped so it matches a literal dot
h=$1; next - if a matching line is found, saves the header value, e.g. abc.txt, into variable h and skips the header line via next
if (NR != 1) { print "" } - prints a line break if it's not the first occurrence of a header line
print h "|" $1 - prints the saved header value, a separator, and each subsequent data line
awk -F '|' '{if(NF==2)$0=F"|"$0;else{F=$1;$0=""}}NR>1' YourFile
Self-commented:
# use | as separator
awk -F '|' '
# for every lines
{
# line with "data" have 2 field
if(NF==2) {
# prepend the file name and "|" to the current line
$0 = F"|"$0
}
else {
# File name is field 1
F=$1
# change line to empty line
$0=""
}
}
# print the line (in its new state, after the 1st line); printing is the default action of a bare pattern
NR>1
' YourFile
How can I read input from a first file, say file1.txt, and print column $3 from file2.txt if $1 in the first file is equal to $2 in the second file?
Something like: if '$1 in file1.txt == $2 in file2.txt { print $3 from file2.txt }'
I couldn't find a simple and straightforward solution to this question.
It's pretty straightforward:
awk 'FNR == NR { a[FNR] = $1; next }
FNR != NR { if (a[FNR] == $2) print $3 }' file1.txt file2.txt
The first line saves the value of $1 for each line in file1.txt (and skips the rest of the script).
The second line doesn't formally need the FNR!=NR condition, but I think it makes it clearer. It processes file2.txt. If the value in $2 is equal to the corresponding saved value, print $3.
If the files are too big to save the $1 values from file1.txt in memory, you should have said so and you have to work harder. It can still be done with awk; it just isn't as neat and tidy and awk-ish.
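A minimal demonstration of the FNR==NR idiom (f1.txt and f2.txt are made-up placeholder files):

```shell
# Hypothetical lookup file: one key per line.
cat > f1.txt <<'EOF'
apple
pear
EOF
# Hypothetical data file: key in column 2, value in column 3.
cat > f2.txt <<'EOF'
x apple 10
y plum 20
EOF

# While reading the first file, FNR == NR; store $1 per line and skip on.
# While reading the second file, compare the saved value against $2.
awk 'FNR == NR { a[FNR] = $1; next }
     a[FNR] == $2 { print $3 }' f1.txt f2.txt
# prints 10 (line 1 matches: apple == apple; line 2 does not: pear != plum)
```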
I have two large files of 80,000-plus records that are identical in length. I need to compare the two files line by line, using the first 8 characters of each line: line one of file one is compared to line one of file two, line two of file one to line two of file two, and so on.
Sample file1
01234567blah blah1
11234567blah blah2
21234567blah blah3
31234567blah blah4
Sample file2
31234567blah nomatch
11234567matchme2
21234567matchme3
31234567matchme4
Lines 2 - 4 should match but line 1 should not. My script matches line 1 of file2 against line 4 of file1 (same prefix), but it should be compared to line 1 only.
awk '
FNR==NR {
a[substr($0,1,8)]=1;next
}
{if (a[substr($0,1,8)])print $0; else print "Not Found", $0;}
' $inputfile1 $inputfile2 > $outputfile1
Thank you.
For a line-by-line compare you need to use the FNR variable as the key. Try:
awk 'NR==FNR{a[FNR]=substr($1,1,8);next}{print (a[FNR]==substr($1,1,8)?$0:"Not Found")}' file1 file2
Not Found
11234567matchme2
21234567matchme3
31234567matchme4
awk 'BEGIN {
    while (1) {
        f = getline < "file1"
        if (f != 1) exit
        a = substr($0, 1, 8)
        f = getline < "file2"
        if (f != 1) exit
        b = substr($0, 1, 8)
        print (a == b ? $0 : "Not Found" FS $0)
    }
}'
This reads one line from file1 and, if successful, stores its 8-character prefix in a; then one line from file2 and, if successful, stores its prefix in b; it then checks whether a and b are equal and prints the result accordingly.
Output:
Not Found 31234567blah nomatch
11234567matchme2
21234567matchme3
31234567matchme4
If there's a single character not present in either file that you could use as a delimiter, like : in your example, you can use a paste/awk combo like:
paste -d: data data2 | awk -F: '{prefix=substr($1,1,8)!=substr($2,1,8) ? "Not Found"OFS : ""; print prefix $2}'
paste joins the corresponding lines from each file into one line, with a : separator
awk uses the : delimiter
awk tests for a match on the first 8 chars of each field and creates prefix
awk prints out every line with a prefix that's "Not Found" (+OFS) when they don't match.
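Run against the question's samples (saved here as data and data2), the pipeline behaves like this sketch:

```shell
# Recreate the question's two sample files.
cat > data <<'EOF'
01234567blah blah1
11234567blah blah2
21234567blah blah3
31234567blah blah4
EOF
cat > data2 <<'EOF'
31234567blah nomatch
11234567matchme2
21234567matchme3
31234567matchme4
EOF

# paste glues line N of data to line N of data2 with a ":" between them;
# awk then compares the 8-character prefixes of the two halves.
paste -d: data data2 |
awk -F: '{ prefix = substr($1,1,8) != substr($2,1,8) ? "Not Found" OFS : ""
           print prefix $2 }'
# first line prints: Not Found 31234567blah nomatch
# the remaining three lines print unprefixed, since their prefixes match
```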