matrix from data with awk - database

Warning, not an awk programmer.
I have a file, let's call it file.txt. It has a list of numbers which I will be using to find the information I need from the rest of the directory (which is full of files *.asc). The remaining files do not have the same lengths, but since I will be drawing data based on file.txt, the matrix I will be building will have the same number of rows. All files DO however contain the same number of columns, 3. The first column will be compared to file.txt, the second column of each *.asc file will be used to build the matrix. Here is what I have so far:
awk '
NR==FNR{
A[$1];
next}
$1 in A
{print $2 >> "data.txt";}' file.txt *.asc
This, however, prints the information from each file below that of the previous file. I want the information side by side, like a matrix. I looked up paste, but it seems to be called before awk, and all the examples I found used only a couple of files. I still tried it in place of print, and it did not work.
If anyone could help me out, this would be the last piece to my project. Thanks so much!

You could try:
awk -f ext.awk file.txt *.asc > data.txt
where ext.awk is
NR==FNR {
    A[$1]++
    next
}
FNR==1 {
    if (ARGIND > 2)
        print ""
}
$1 in A {
    printf "%s ", $2
}
END {
    print ""
}
Update
If you do not have GNU Awk, the ARGIND variable is not available. You could then count the files yourself:
NR==FNR {
    A[$1]++
    next
}
FNR==1 {
    if (++ai > 1)
        print ""
}
$1 in A {
    printf "%s ", $2
}
END {
    print ""
}
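Both scripts above print one row per *.asc file. If the goal is one column per file instead (values side by side, one row per number in file.txt), a minimal sketch could collect the values first and print them in the END block. The sample data, the names a.asc and b.asc, and the "NA" placeholder for a number missing from a file are all assumptions made for illustration; none of them come from the question.

```shell
# Work in a scratch directory with made-up sample data (hypothetical values).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '10\n20\n30\n' > file.txt
printf '10 1.1 x\n15 9.9 x\n20 1.2 x\n30 1.3 x\n' > a.asc
printf '20 2.2 y\n10 2.1 y\n30 2.3 y\n' > b.asc

matrix=$(awk '
NR == FNR {                 # first file: remember the keys in input order
    if (!($1 in seen)) { seen[$1] = 1; key[++nkeys] = $1 }
    next
}
FNR == 1 { ncols++ }        # each new *.asc file starts a new column
$1 in seen { cell[$1, ncols] = $2 }
END {
    for (r = 1; r <= nkeys; r++) {
        line = ""
        for (c = 1; c <= ncols; c++) {
            # "NA" is an assumed placeholder for a key missing from a file
            v = ((key[r], c) in cell) ? cell[key[r], c] : "NA"
            line = line (c > 1 ? " " : "") v
        }
        print line
    }
}' file.txt *.asc)

printf '%s\n' "$matrix" > data.txt
cat data.txt
```

This keeps every selected value in memory until the end, which is the price of printing columns rather than rows. It does not use ARGIND, so it should not depend on GNU Awk.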

Related

awk array that overtypes itself when printed

this is my first question so please let me know if I miss anything.
This is an awk script that uses arrays to make key-value pairs.
I have a file that has a header information separated by colons. The data is below it and separated by colons as well. My goal is to make key-value pairs that print out to a new file. I have everything set to be placed in arrays and it prints out almost perfectly.
Here is the input:
...:iscsi_name:iscsi_alias:panel_name:enclosure_id:canister_id:enclosure_serial_number
...:iqn.1111-00.com.abc:2222.blah01.blah01node00::11BLAH00:::
Here is the code:
#!/bin/awk -f
BEGIN {
    FS = ":"
}
{
    x = 1
    if (NR==1) {
        num_fields = NF ###This is done incase there are uneven head fields to data fields###
        while (x <= num_fields) {
            head[x] = $x
            x++
        }
    }
    y = 2
    while (y <= NR) {
        if (NR==y) {
            x = 1
            while (x <= num_fields) {
                data[x] = $x
                x++
            }
            x = 1
            while (x <= num_fields) {
                print head[x]"="data[x]
                x++
            }
        }
        y++
    }
}
END {
    print "This is the end of the arrays and the beginning of the test"
    print head[16]
    print "I am head[16]-"head[16]"- and now I'm going to overwrite everything"
    print "I am data[16]-"data[16]"- and I will not overwrite everything, also there isn't any data in data[16]"
}
Here is the output:
...
iscsi_name=iqn.1111-00.com.abc
iscsi_alias=2222.blah01.blah01node00
panel_name=
enclosure_id=11BLAH00
canister_id=
=nclosure_serial_number ### Here is my issue ###
This is the end of the arrays and the beginning of the test
enclosure_serial_number
- and now I'm going to overwrite everything
I am data[16]-- and I will not overwrite everything, also there isn't any data in data[16]
NOTE: data[16] is not at the end of a line, for some reason, there is an extra colon on the data lines, hence the num_fields note above
Why does head[16] overwrite itself? Is it that there is a newline (\n) at the end of the field? If so, how do I get rid of it? I have tried adding subtracting the last character, no luck. I have tried to limit the number of characters the array can take in on that field, no luck. I have tried many more ideas, no luck.
Full Disclosure: I am relatively new to all of this, I might have messed up these previous fixes!
Does anyone have any ideas as to why this is happening?
Thanks!
-cheezter88
Your script is unnecessarily complex. If you want the record size to follow the first row, take the field count from the header line and loop up to it.
(I replaced "..." prefix with "x")
awk -F: 'NR==1 {n=split($0,h); next} # populate header fields and record size
NR==2 {for(i=1;i<=n;i++) # do the assignment up to header size
print h[i]"="$i}' file
x=x
iscsi_name=iqn.1111-00.com.abc
iscsi_alias=2222.blah01.blah01node00
panel_name=
enclosure_id=11BLAH00
canister_id=
enclosure_serial_number=
If you want to do this for the rest of the records as well, remove the NR==2 condition.
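As for why the last header name appears to overwrite itself: output such as "=nclosure_serial_number" is what a terminal shows when a field ends in a carriage return, because the cursor jumps back to the start of the line before "=" is printed. The thread does not confirm it, but this would be the case if the input file has Windows-style CRLF line endings. Under that assumption, a minimal sketch is to strip the trailing \r from every record before using it. The shortened sample input below is made up for illustration.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
# Hypothetical two-line sample with CRLF line endings, as the symptom suggests.
printf 'iscsi_name:panel_name:enclosure_serial_number\r\n' > input.txt
printf 'iqn.1111-00.com.abc::\r\n' >> input.txt

pairs=$(awk -F: '
{ sub(/\r$/, "") }                      # drop a trailing carriage return, if any
NR == 1 { n = split($0, h); next }      # header names and field count
{ for (i = 1; i <= n; i++) print h[i] "=" $i }
' input.txt)

printf '%s\n' "$pairs"
```

Modifying $0 with sub() makes awk split the record again, so the last field no longer carries the carriage return.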

shellscript in C: using awk getting runaway string constant error

#define SHELLSCRIPT "\
#/bin/bash \n\
awk 'BEGIN { FS=\":\"; print \"User\t\tUID\n--------------------\"; } { print $1,\"\t\t\",$3;} END { print \"--------------------\nAll Users and UIDs Printed!\" }' /etc/passwd \n\
"
void displayusers()
{
system(SHELLSCRIPT);
}
The error message is:
awk: line 1: runaway string constant "User...
The bash cmd when run and works in the terminal is:
awk 'BEGIN { FS=":"; print "User\t\tUID\n--------------------"; } { print $1,"\t\t",$3;} END { print "--------------------\nAll Users and UIDs Printed!" }' /etc/passwd
I think somewhere when using \ to block out the various " for c it messed up my awk. But I'm not sure where. Ideas?
I simply took your string and used it in a printf() statement, and then analyzed the output:
#include <stdio.h>
#define SHELLSCRIPT "\
#/bin/bash \n\
awk 'BEGIN { FS=\":\"; print \"User\t\tUID\n--------------------\"; } { print $1,\"\t\t\",$3;} END { print \"--------------------\nAll Users and UIDs Printed!\" }' /etc/passwd \n\
"
int main(void)
{
printf("[[%s]]\n", SHELLSCRIPT);
return 0;
}
Example run:
$ ./runaway
[[#/bin/bash
awk 'BEGIN { FS=":"; print "User UID
--------------------"; } { print $1," ",$3;} END { print "--------------------
All Users and UIDs Printed!" }' /etc/passwd
]]
$
When I made the line ends visible (^J marks the end of line, ^I tabs), the problem is transparent:
[[#/bin/bash ^J
awk 'BEGIN { FS=":"; print "User^I^IUID^J
--------------------"; } { print $1,"^I^I",$3;} END { print "--------------------^J
All Users and UIDs Printed!" }' /etc/passwd ^J
]]^J
You have two occurrences of \n in the string which need to be \\n. It is up to you whether you change the appearances of \t to \\t; it works either way.
#define SHELLSCRIPT "\
#/bin/bash\n\
awk 'BEGIN { FS=\":\"; print \"User\t\tUID\\n--------------------\"; } { print $1,\"\t\t\",$3;} END { print \"--------------------\\nAll Users and UIDs Printed!\" }' /etc/passwd\n"
Using that in my program yields:
[[#/bin/bash^J
awk 'BEGIN { FS=":"; print "User^I^IUID\n--------------------"; } { print $1,"^I^I",$3;} END { print "--------------------\nAll Users and UIDs Printed!" }' /etc/passwd^J
]]^J
Note, in particular, the technique used to debug this. Print the data so you can see it precisely.
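The same "make it visible" idea can be tried on the shell side. As a small sketch, the l command of sed prints each line with tabs shown as \t and the end of the line marked by $, which makes an unwanted line break inside an awk string easy to spot. The sample string is a shortened, made-up version of the one in the question.

```shell
# A string with a real tab and a real newline embedded, similar to what the
# C compiler produces from \t and \n inside the macro.
visible=$(printf 'print "User\tUID\n----"\n' | sed -n l)
printf '%s\n' "$visible"
```

In the output, the awk string starts on one line and ends on the next, which is the condition awk reports as a runaway string constant.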
Haven't tested, but here's a useful-looking article just about this topic and it would appear that your choices are something along the following lines:
if you have different quotes surrounding the text, you don't need to escape the interior ones
you can use the same quotes surrounding and escape the interior ones
you can use octal sequences for the quotes, e.g. <\42>
if it gets too confusing, move the string into a separate file where quoting will not be an issue

Find and replace in AIX 5.3

I am running AIX 5.3.
I have two flat text files.
One is a "master" list of network devices, along with their communication settings(CLLIFile.tbl).
The other is a list of specific network devices that need to have one setting changed, within the main file(specifically, cn to le). The list file is called DDM2000-030215.txt.
I have gotten as far as looping through DDM2000-030215.txt, pulling the lines I need to change with grep from CLLIFile.tbl, changing cn to le with sed, and sending the output to a file.
The trouble is, all I get are the changed lines. I need to make the changes inside CLLIFile.tbl, because I cannot disturb the formatting or structure.
Here's what we tried, so far:
for i in 'DDM2000-030215.txt'
do
grep -p $ii CLLIFile.tbl| sed s/cn/le/g >> CLLIFileNew.tbl
done
Basically, I need to replace all instances of 'cn' with 'le', within 'CLLIFile.tbl', that are on lines that contain a network element name from 'DDM2000-030215.txt'.
Your sed (on AIX) will not have an -i option (edit the input file),
and you do not want to use a temporary file.
You can try a here-document with vi:
vi CLLIFile.tbl >/dev/null <<END
:1,$ s/cn/le/g
:wq
END
You don't want grep here, because, as you've observed, it only outputs the matching lines. You want to just use sed and have it do the replacement only on the lines that match while passing the other lines through unchanged.
So instead of this:
grep 'pattern' | sed 's/old/new/'
just do this:
sed '/pattern/s/old/new/'
You will have to send the output into a new file, and then move that new file into place to replace the old CLLIFile.tbl. Something like this:
cp CLLIFile.tbl CLLIFile.tbl.bak # make a backup in case something goes awry
sed '/pattern/s/old/new/' CLLIFile.tbl >newclli && mv newclli CLLIFile.tbl
EDIT: Entirely new question, I see. For this, I would use awk:
awk 'NR == FNR { a[++n] = $0; next } { for(i = 1; i <= n; ++i) { if($0 ~ a[i]) { gsub(/cn/, "le"); break } } print }' DDM2000-030215.txt CLLIFile.tbl
This works as follows:
NR == FNR {                      # when processing the first file
                                 # (DDM2000-030215.txt)
    a[++n] = $0                  # remember the tokens. This assumes that every
                                 # full line of the file is a search token.
    next                         # That is all.
}
{                                # when processing the second file (CLLIFile.tbl)
    for(i = 1; i <= n; ++i) {    # check all remembered tokens
        if($0 ~ a[i]) {          # if the line matches one
            gsub(/cn/, "le")     # replace cn with le
            break                # and break out of the loop, because that only
                                 # needs to be done once.
        }
    }
    print                        # print the line, whether it was changed or not.
}
Note that if the contents of DDM2000-030215.txt are to be interpreted as fixed strings rather than regexes, you should use index($0, a[i]) instead of $0 ~ a[i] in the check.
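To illustrate the difference, here is a minimal sketch of the fixed-string variant, run on made-up sample data (the real file layouts are not shown in the question, and the output name CLLIFile.tbl.new is an assumption). The token NODE.A1 contains a dot, which as a regex would also match NODExA1; with index() it only matches literally.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
# Hypothetical sample data; the real file layouts are not shown in the question.
printf 'NODE.A1\n' > DDM2000-030215.txt
printf 'NODE.A1 cn 9600\nNODExA1 cn 9600\nNODE.B2 cn 9600\n' > CLLIFile.tbl

awk 'NR == FNR { a[++n] = $0; next }
{
    for (i = 1; i <= n; ++i)
        if (index($0, a[i])) {      # literal substring test, no regex
            gsub(/cn/, "le")
            break
        }
    print
}' DDM2000-030215.txt CLLIFile.tbl > CLLIFile.tbl.new

cat CLLIFile.tbl.new
```

Only the first line is changed. As with the sed approach, the result goes to a new file that can be moved over the original once it has been checked.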

How to get index no. of the element of an array which matches regex in bash?

I have a file like this:
c
a
b<
d
f
I need to get the index no. of letter which has < as suffix in a bash script. I thought of reading the file into an array then matching it with the regex .<$. But how do I get the index no. of that element which matches this regex?
I need the index no. because I want to modify this file to get the letter which is pointed to, move the < to the next line, and if it is at the last line, shuffle the order of the lines and place < after the first line.
you need awk '/<$/ { print NR; }' <your-file>
Grep could be used also:
grep -n \< infile
Then:
grep -n \< infile|cut -d : -f 1
So I built the source file:
$ cat file
c
a
b<
d
f<
With the awk below, the < is moved to the next line; if it is on the last line, it is moved to the first line instead.
awk '{ if (/</) a[NR]
       sub(/</,"")
       b[NR]=$0 }
END { for (i in a)
        { if (i==NR) { b[1]=b[1] "<" }
          else { b[i+1]=b[i+1] "<" }
        }
      for (i=1;i<=NR;i++) print b[i]
    }' file
c<
a
b
d<
f

Awk command to get search records with number of occurence of search pattern

awk 'FNR==NR { ! a[$0]++ ; next }
{ b[$0]++ }
END {
    for (i in a) {
        for (k in b) {
            if (a[i]==1 && i ~ k ) { print i }
        }
    }
}' file1 file2
The awk script above takes the search patterns from one file and prints the matching records from the other file. However, it only prints unique records: if the same record appears twice in the file, it is still matched and printed only once. I also want the repeated records, together with the count of how many times each record occurs in the file.
From your post I gather that the array 'a' is storing all the records and array 'b' is storing all the regular expression search patterns.
Just change your if statement to:
if ( i ~ k ) { print i, a[i] } #a[i] prints the count of the record
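Put together, the changed script might look like the sketch below. The sample contents of file1 (records) and file2 (patterns) are made up for illustration, and the trailing sort is an addition, there only because awk does not guarantee the order of a for (i in a) loop.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
# Hypothetical sample data: file1 holds records, file2 holds search patterns.
printf 'apple pie\nbanana split\napple pie\ncherry tart\n' > file1
printf 'apple\ncherry\n' > file2

counts=$(awk 'FNR == NR { a[$0]++; next }   # count each record of file1
{ b[$0] }                                   # remember each pattern of file2
END {
    for (i in a)
        for (k in b)
            if (i ~ k) print i, a[i]        # record, then how often it occurs
}' file1 file2 | sort)

printf '%s\n' "$counts"
```

Note that a record matching more than one pattern is printed once per matching pattern.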
