Re-initializing awk array created by split

I'm trying to use split to reverse the order of characters in a string that appears as the second field in a file with many such lines. The command:
{
n=split($2,arr," ");
for(i=1;i<=n;i++)
s=arr[i] s
}
{ print s }
does this for one line. However, the arr array (and n) seem immortal, so that when I embed this code into an awk script to process multiple lines, the output corresponding to the field I want reversed accumulates (and reverses) all previous lines:
1_B.pdb
GGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
AARS_0001_B.pdb
GGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRGGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
AARS_0002_B.pdb
GLILYDGFLDKRDLEGLKYNDILNRTKDVTDVGNTTRTECPDVNRKGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRGGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
AARS_0003_B.pdb
DGCSLDGFTDDRDLKGALYNKILNKTLIVTDVGNTTRTEVCEKDRYGLILYDGFLDKRDLEGLKYNDILNRTKDVTDVGNTTRTECPDVNRKGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRGGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
This appears to me to be a problem with re-initialization. I've tried to delete all previous elements of arr[] and to reset n to 0, without any effect. What do I need to do?

It's not arr that's immortal, it's s since you never [re-]init it to "" outside of the loop. arr is getting re-inited on every call to split().
Try this:
{
n=split($2,arr,/ /)
s=""
for(i=1;i<=n;i++)
s=arr[i] s
print s
}
The 3rd arg for split(), by the way, is a field separator, not a string. A field separator is a regexp with a couple of extra properties, so the correct way to call split with a fixed "string" is with RE delimiters, split($2,arr,/ /), not string delimiters, split($2,arr," "). It doesn't make a functional difference in this case, but it does when the field separator gets more complicated, so it's best to get used to doing it the right way.
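One of those "extra properties" is that the string " " is special-cased to mean "split on runs of whitespace", while the regexp / / matches exactly one space. A minimal sketch of the difference, assuming a made-up test string with two spaces in it (the variable names are only for illustration):

```shell
# Hypothetical test string "a  b" (two spaces between a and b).
out=$(awk 'BEGIN {
    s = "a  b"
    n_str = split(s, x, " ")   # string " ": runs of whitespace act as one separator
    n_re  = split(s, y, / /)   # regexp / /: every single space is a separator
    print n_str, n_re
}')
printf '%s\n' "$out"
```

With the string form the two spaces count as one separator; with the regexp form an empty field appears between them, so the two counts differ.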
Bonus round: you would not need to explicitly re-init s if you put that code in a function:
function rev(str, arr,n,s,i) {
n=split(str,arr,/ /)
for(i=1;i<=n;i++)
s=arr[i] s
return s
}
...
{ print rev($2) }
Reason left as an exercise :-).
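For anyone who wants to check their answer to the exercise: the extra names in the function's parameter list (arr, n, s, i) act as local variables, and awk starts them out empty on every call. A rough side-by-side sketch, assuming a made-up two-line input of space-separated tokens so that split on / / has something to split:

```shell
# Hypothetical two-line input; each line holds space-separated tokens.
input=$(printf 'a b c\nx y z')

# Version 1: s is a global, so it survives from one record to the next.
global_out=$(printf '%s\n' "$input" | awk '
{
    n = split($0, arr, / /)
    for (i = 1; i <= n; i++)
        s = arr[i] s
    print s
}')

# Version 2: arr, n, s and i are extra parameters acting as locals.
local_out=$(printf '%s\n' "$input" | awk '
function rev(str,    arr, n, s, i) {
    n = split(str, arr, / /)
    for (i = 1; i <= n; i++)
        s = arr[i] s
    return s
}
{ print rev($0) }')

printf 'global:\n%s\nlocal:\n%s\n' "$global_out" "$local_out"
```

The global version carries the first line's result into the second line, which is the accumulation seen in the question; the function version starts from an empty s each time.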

Related

How do I get a string from an array element found with .select without brackets in Ruby?

Right now I'm having major difficulty finding a way to get a string from an array element without brackets.
I'm using .select to find a specific element in the array of strings I'm using, but when I try to print the variable I store the result to, it ends up also storing the brackets as well. I've tried numerous things, such as using .to_s and .join(''), but unfortunately none of them worked.
found=file_arr.select {|str| str=~/\A#{find_x} #{y}/}
if visited.include?("#{found}") == false
#Do this
end
What I want to get is
#String_here
But what I'm getting instead is
[\"#String_here\n\"]
Enumerable#select will return an array containing all elements of enum for which the given block returns a true value.
Enumerable#find will return the first for which block is not false.
So in your case, you can use find instead:
found = file_arr.find { |str| str =~ /\A#{find_x} #{y}/ }
Notice your condition can be more understandable with unless:
do_something unless visited.include?("#{found}")

Slicing Multidimensional Array by a variable

I am writing a method which accepts a two dimensional array of doubles and an int row number as parameters and returns the highest value of the elements in the given row.
It looks like this:
function getHighestInRow(A, i)
return(maximum(A[:i,:]))
end
The issue I am having is when I slice the array with
A[:i,:]
I get an argument error because the :i makes i get treated differently.
The code works in the other direction with
A[:,i,:]
Is there a way to escape the colon, so that i gets treated as a variable after a colon?
You're doing something strange with the colon. In this case you're using the symbol :i not the value of i. Just getHighestInRow(A,i) = maximum(A[i,:]) should work.
Edit: As Dan Getz said in the comment on the question, getHighestInRow(A,i) = maximum(@view A[i,:]) is more efficient, though, since plain slicing allocates an unnecessary temporary array.

perl: More concise way to branch based on whether split succeeded?

I know split returns the number of fields parsed if it is assigned to a scalar, and returns an array if assigned to an array.
Is there a way to check whether a line is successfully parsed without having to call split twice (once to check how many fields were parsed, and, if the correct number of fields were parsed, a second time to return the fields in an array)?
foreach (@lines) {
if ( split ) {
my ($ipaddr, $hostname) = split;
}
}
.. I need to check whether the split succeeded in order to avoid later uninitialized references to $ipaddr and $hostname. Just seems like I ought to be able to combine the two calls to split into a single call.
Sure:
foreach (@lines) {
if (2 == (my ($ipaddr, $hostname) = split)) {
# Got exactly two fields
}
}
So if you just want to skip bad lines, you can simply use:
foreach (@lines) {
2 == (my ($ipaddr, $hostname) = split)
or next;
# Got exactly two fields
}
Don't forget to remove trailing whitespace from your lines first (such as by using chomp to remove line feeds) or it will mess up your field count.
You can change the == to <= if there might be more fields.
I think I would prefer a regex match:
for ( @lines ) {
next unless my ($ipaddr, $hostname) = /(\S+)\s+(\S+)/;
# use $ipaddr & $hostname
}
This is different from the original in that it will succeed if more than two non-space substrings are found, but a fix is simple if it is necessary.

"For" loop that overwrites when using write.table

I am trying to extract every two consecutive columns from an array, writing a .dat with every pair. The problem is that, when using write.table() it overwrites the files. When I use print() instead of write.table() it shows the correct subsets, though.
I also need the file names to show the number of the pair of columns selected (a total of 6 pairs), as well as the number of the dimension (from 1 to 5). For this I have used an easier solution such as tagging 1:30.
for(i in 0:5) {
for (j in 1:5) {
for (k in 1:30) {
filename <- paste("Component",k, ".dat", sep="")
write.table(data[,c(2*i+1,2*i+2),j],col.names=F, row.names=F, sep= " ")
}
}
}
Any hints why it does not work?
I hope my goal is understandable. Thanks a lot for your time!
Set the argument append to TRUE:
write.table(data[,c(2*i+1,2*i+2),j],
file=filename,
append=TRUE,
col.names=F,
row.names=F,
sep= " ")
Also, as correctly pointed out by @Roland, you have forgotten to pass the file argument (already added in my example above).

problem with array elements in awk not being stored

I've been using awk to process hourly weather data, storing 10 arrays with as many as 8784 data elements. If an array is incomplete, i.e., stops at 8250, etc., after the "END" command I fill the remaining array elements with the last available value for the array. However, when I then print out the complete arrays, I get 0's for the filled values instead. What's causing this? Does awk have a limit on array size that's preventing it from filling the arrays? Following is a snippet of the awk program. In the two print statements, the first time the array elements are filled, but the second time they're empty.
Any help is appreciated because this problem is holding up my work.
Joe Huang
END{
if (lastpresstime < tothrs)
{
diffhr = tothrs - lastpresstime
for (i=lastpresstime+1;i<=tothrs+1;i++)
{
xpressinter[i]=diffhr
xpressrecords[i]=diffhr
xipress[i]=lastpress
xpressflag[i]="R"
printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i]) > "ncdcfm3.prs"
printf(" xipress[%4d] =%6.1f%1s\n",i,xipress[i],xpressflag[i])
}
for (i=1;i<=tothrs+1;i++) printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i])
}
I don't have the rep to edit your post, but here's the formatted code:
END {
if (lastpresstime < tothrs) {
diffhr = tothrs - lastpresstime
for (i=lastpresstime+1;i<=tothrs+1;i++) {
xpressinter[i]=diffhr
xpressrecords[i]=diffhr
xipress[i]=lastpress
xpressflag[i]="R"
printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i]) > "ncdcfm3.prs"
printf(" xipress[%4d] =%6.1f%1s\n",i,xipress[i],xpressflag[i])
}
for (i=1;i<=tothrs+1;i++)
printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i])
}
}
Note that I added a matching brace at the end.
I don't see any inherent problems in the code, so like jhartelt, I have to ask - are all of the variables properly defined? We can't tell from this sample how lastpresstime, tothrs, and lastpress get their values. In particular, if lastpress isn't, you'll get exactly the behavior you described. Note that if you have misspelled it, it will be an undefined variable and therefore use the default value of 0.
With respect to William Pursell's comment, there should also be no difference in the output of xipress[i] between the three printfs (for lastpresstime<i).
Since 0 is the default value for an unknown or unused numeric variable, are you sure there is no typo in the variable names used in the END block?
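To see how a misspelled name would produce exactly those zeros, here is a small sketch. The pressure value and the misspelling lastpres are made up for illustration; only the %6.1f format is taken from the question:

```shell
out=$(awk 'BEGIN {
    lastpress = 1013.2       # hypothetical last valid reading
    filled[1] = lastpress    # correct name: stores 1013.2
    filled[2] = lastpres     # typo: lastpres was never assigned, so it is empty
    printf("%6.1f|%6.1f\n", filled[1], filled[2])
}')
printf '%s\n' "$out"
```

awk raises no error for the unknown name; the empty value is simply formatted as 0.0, which matches the symptom described.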
