Problem with array elements in awk not being stored

I've been using awk to process hourly weather data, storing 10 arrays with as many as 8784 elements each. If an array is incomplete, i.e., it stops at 8250 or so, I fill the remaining elements in the END block with the last available value for that array. However, when I then print out the complete arrays, I get 0's for the filled values instead. What's causing this? Does awk have a limit on array size that's preventing it from filling the arrays? Following is a snippet of the awk program. In the print statements, the array elements are filled the first time they are printed, but empty the second time.
Any help is appreciated because this problem is holding up my work.
Joe Huang
END{
  if (lastpresstime < tothrs)
  {
    diffhr = tothrs - lastpresstime
    for (i=lastpresstime+1;i<=tothrs+1;i++)
    {
      xpressinter[i]=diffhr
      xpressrecords[i]=diffhr
      xipress[i]=lastpress
      xpressflag[i]="R"
      printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i]) > "ncdcfm3.prs"
      printf(" xipress[%4d] =%6.1f%1s\n",i,xipress[i],xpressflag[i])
    }
    for (i=1;i<=tothrs+1;i++) printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i])
  }

I don't have the rep to edit your post, but here's the formatted code:
END {
  if (lastpresstime < tothrs) {
    diffhr = tothrs - lastpresstime
    for (i=lastpresstime+1;i<=tothrs+1;i++) {
      xpressinter[i]=diffhr
      xpressrecords[i]=diffhr
      xipress[i]=lastpress
      xpressflag[i]="R"
      printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i]) > "ncdcfm3.prs"
      printf(" xipress[%4d] =%6.1f%1s\n",i,xipress[i],xpressflag[i])
    }
    for (i=1;i<=tothrs+1;i++)
      printf("PRS xipress[%4d] =%6.1f\n",i,xipress[i])
  }
}
Note that I added a matching brace at the end.
I don't see any inherent problems in the code, so like jhartelt, I have to ask: are all of the variables properly defined? We can't tell from this sample how lastpresstime, tothrs, and lastpress get their values. In particular, if lastpress is never assigned, you'll get exactly the behavior you described. Note that if you have misspelled it, the misspelled name is a separate, undefined variable and therefore uses the default value of 0.
With respect to William Pursell's comment, there should also be no difference in the output of xipress[i] between the three printfs (for lastpresstime<i).

As 0 is the default value of an unset variable used in a numeric context, are you sure there is no typo in the variable names used in the END block?
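For comparison, here is the fill step the question describes (padding the tail of an incomplete series with its last real value) as a minimal C sketch. It assumes 0-based indexing and a caller that knows how many slots already hold real readings; `fill_forward`, `series`, and `last_filled` are hypothetical names, not taken from the awk script. As the answers point out, the padding is only as good as the value being copied: in the awk version, an unset or misspelled variable in that position would pad every slot with 0.

```c
#include <assert.h>
#include <stddef.h>

/* Pad series[last_filled .. total-1] with the last real reading.
 * last_filled (hypothetical name): number of leading slots that hold
 * real data. Returns how many slots were padded. */
size_t fill_forward(double *series, size_t last_filled, size_t total)
{
    assert(last_filled > 0 && last_filled <= total);
    double last = series[last_filled - 1];  /* last real reading */
    size_t padded = 0;
    for (size_t i = last_filled; i < total; i++) {
        series[i] = last;                   /* copy, never a default 0 */
        padded++;
    }
    return padded;
}
```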

Related

How to change the count of a for loop during the loop

I'm trying to change the number of items in an array while a for loop is iterating over it, so that the number of iterations changes. In a very simplified version, the code would look something like this:
var loopArray: [Int] = []
loopArray.append(1)
loopArray.append(2)
loopArray.append(3)
loopArray.append(4)
loopArray.append(5)
for x in 0..<Int(loopArray.count) {
    print(x)
    if x == 4 {
        loopArray.append(6)
    }
}
When running this code, 5 numbers are printed, and while the number 6 is added to the array, loopArray.count does not seem to update. How can I make the .count dynamic?
This is a very simplified example; in the project I'm working on, appending numbers to the array depends on conditions that may or may not be met.
I have looked for examples online, but have not been able to find any similar cases. Any help or guidance is much appreciated.
sfung3 gives the correct way to do what you want, but I think a bit of explanation is needed as to why your solution doesn't work.
The line
for x in 0..<Int(loopArray.count)
only evaluates loopArray.count once, the first time it is hit. This is because of the way for works. Conceptually a for loop iterates through the elements of a sequence. The syntax is something like
for x in s
where
s is a sequence; call its type S
x is a let constant (you can also make it a var but that is not relevant to the current discussion) with type S.Element
So the bit after the in is a sequence: any sequence. There's nothing special about the use of ..< here; it's just a convenient way to construct a sequence of consecutive integers. In fact, it constructs a Range (by the way, you don't need the cast to Int; Array.count is already an Int).
The range is only constructed when you first hit the loop and it's effectively a constant because Range is a value type.
If you don't want to use Joakim's answer, you could create your own reference type (class) that conforms to Sequence and whose elements are Int and update the upper bound each time through the loop, but that seems like a lot of work to avoid a while loop.
You can use a while loop instead of a for loop.
var i = 0
while i < loopArray.count {
    print(i)
    if i == 4 {
        loopArray.append(6)
    }
    i += 1
}
which prints the numbers 0 through 5, one per line.
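The same distinction shows up in C, whose for statement re-evaluates its condition before every iteration: comparing against a live count behaves like the while loop above, while copying the count into a local first behaves like Swift's 0..<loopArray.count. A minimal sketch, assuming a fixed-capacity array; `count_iterations` and `CAP` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 16  /* hypothetical fixed capacity */

/* Walk an array that grows by one element during the loop and return
 * how many iterations ran. With snapshot != 0 the bound is copied once
 * before the loop (like Swift's 0..<loopArray.count); otherwise the
 * live count is re-read every time the condition is tested. */
size_t count_iterations(int snapshot)
{
    int items[CAP] = {1, 2, 3, 4, 5};
    size_t count = 5;
    const size_t bound = count;     /* fixed before the loop starts */
    size_t iterations = 0;

    for (size_t i = 0; i < (snapshot ? bound : count); i++) {
        assert(items[i] != 0);      /* every visited slot holds a value */
        iterations++;
        if (i == 4 && count < CAP)
            items[count++] = 6;     /* append while iterating */
    }
    return iterations;
}
```

With a snapshot bound the loop runs 5 times; with the live count it runs 6 times and also visits the appended 6.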

Dereferencing an array from an array of arrays in Perl

I have various subroutines that give me arrays of arrays. I have tested them separately, and somehow when I write my main routine, I fail to make the program recognize my arrays. I know it's a problem of dereferencing, or at least I strongly suspect it.
The code is a bit long but I'll try to explain it:
my @leaderboard=@arrarraa; #an array of arrays
my $parentmass=$spect[$#spect]; #scalar
while (scalar @leaderboard>0) {
    for my $i(0..(scalar @leaderboard-1)) {
        my $curref=$leaderboard[$i]; #the program says here that there is an uninitialized value. But I start with a list of 18 elements.
        my @currentarray=@$curref; #then I try to dereference the array
        my $w=sumaarray (@currentarray);
        if ($w==$parentmass) {
            if (defined $Leader[0]) {
                my $sc1=score (@currentarray);
                my $sc2=score (@Leader);
                if ($sc1>$sc2) {
                    @Leader=@currentarray;
                }
            }
            else {@Leader=@currentarray;}
        }
        elsif ($w>$parentmass) {splice @leaderboard,$i,1;} #here I delete the element if it doesn't work. I hope it's done correctly.
    }
    my $leadref= cut (@leaderboard); #here I take the first 10 scores of the AoAs
    @leaderboard = @$leadref;
    my $leaderef=expand (@leaderboard); #then I expand the AoAs by one term
    @leaderboard= @$leaderef; #and I should end with a completely different list to work with in the while loop
}
So I don't know how to dereference the AoAs correctly. The output of the program says:
"Use of uninitialized value $curref in concatenation (.) or string at C:\Algorithms\22cyclic\cyclospectrumsub.pl line 183.
Can't use an undefined value as an ARRAY reference at C:\Algorithms\22cyclic\cyclospectrumsub.pl line 184."
I would greatly appreciate any insight or recommendation.
The problem is with the splice that modifies the list while it is being processed. By using 0..(scalar @leaderboard-1) you set up the range of elements to process at the beginning, but when some elements are removed by the splice, the list ends up shorter than that, and once $i runs off the end of the modified list you get undefined references.
A quick fix would be to use
for (my $i = 0; $i < @leaderboard; $i++)
although that's neither very idiomatic nor efficient.
Note that doing something like $i < @leaderboard or @leaderboard-1 already provides scalar context for the array variable, so you don't need the scalar() call; it does nothing here.
I'd probably use something like
my @result;
while(my $elem = shift @leaderboard) {
    ...
    if ($w==$parentmass) {
        # do more stuff
        push @result, $elem;
    }
}
So instead of deleting from the original list, all elements would be taken off the original and only the successful (by whatever criterion) ones included in the result.
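The same trap appears in C when elements are deleted by shifting the tail of an array left inside an index loop. One common alternative, in the same spirit as the shift/push loop above, is a separate write index that copies forward only the elements that pass the test. A minimal sketch; `keep_not_above` and its integer `limit` are hypothetical stand-ins for the sumaarray/$parentmass comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Remove every element greater than limit, preserving order.
 * Each element is read exactly once through r, and kept elements are
 * written through w, so a deletion never makes the loop skip an element
 * or run past the new end. Returns the new length. */
size_t keep_not_above(int *a, size_t n, int limit)
{
    assert(a != NULL || n == 0);
    size_t w = 0;                 /* next slot for a kept element */
    for (size_t r = 0; r < n; r++) {
        if (a[r] <= limit)
            a[w++] = a[r];
    }
    return w;
}
```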
There seem to be two things going on here:
You're removing from @leaderboard all arrays whose sumaarray is greater than $parentmass
You're putting in @Leader the array with the highest score of all the arrays in @leaderboard whose sumaarray is equal to $parentmass
I'm unclear whether that's correct. You don't seem to handle the case where sumaarray is less than $parentmass at all. But that can be written very simply by using grep together with the max_by function from the List::UtilsBy module:
use List::UtilsBy 'max_by';
my $parentmass = $spect[-1];
my @leaderboard = grep { sumaarray(@$_) <= $parentmass } @arrarraa;
my $leader = max_by { score(@$_) }
             grep { sumaarray(@$_) == $parentmass }
             @leaderboard;
I'm sure this could be made a lot neater if I understood the intention of your algorithm, especially how elements with a sumaarray of less than $parentmass should be handled.
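The filter-then-maximum pattern in that answer (grep followed by max_by) can be written as a single pass in C. A minimal sketch, assuming the sums and scores have already been computed into parallel arrays; `max_by_score`, `sums`, and `scores` are hypothetical and stand in for sumaarray() and score(). In this sketch, ties keep the earliest candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the highest-scoring candidate among those whose
 * sum equals target, or -1 if none matches.
 * sums[i] and scores[i] are hypothetical precomputed values. */
long max_by_score(const int *sums, const int *scores, size_t n, int target)
{
    assert((sums != NULL && scores != NULL) || n == 0);
    long best = -1;
    for (size_t i = 0; i < n; i++) {
        if (sums[i] != target)
            continue;                       /* the grep step */
        if (best < 0 || scores[i] > scores[best])
            best = (long)i;                 /* the max_by step */
    }
    return best;
}
```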

re-initializing awk array created by split

I'm trying to use split to reverse the order of characters in a string that appears as the second field in a file with many such lines. The command:
{
  n=split($2,arr," ");
  for(i=1;i<=n;i++)
    s=arr[i] s
}
{ print s }
does this for one line. However, the arr array (and n) seem immortal, so that when I embed this code into an awk script to process multiple lines, the output corresponding to the field I want reversed accumulates (and reverses) all previous lines:
1_B.pdb
GGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
AARS_0001_B.pdb
GGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRGGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
AARS_0002_B.pdb
GLILYDGFLDKRDLEGLKYNDILNRTKDVTDVGNTTRTECPDVNRKGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRGGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
AARS_0003_B.pdb
DGCSLDGFTDDRDLKGALYNKILNKTLIVTDVGNTTRTEVCEKDRYGLILYDGFLDKRDLEGLKYNDILNRTKDVTDVGNTTRTECPDVNRKGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRGGGTGYPGLKDKDDNEGTKYNKLLNATLIVTDVGNTIRTECPDVNRG
This appears to me to be a problem with re-initialization. I've tried to delete all previous elements of arr[] and to reset n to 0, without any effect. What do I need to do?
It's not arr that's immortal, it's s since you never [re-]init it to "" outside of the loop. arr is getting re-inited on every call to split().
Try this:
{
  n=split($2,arr,/ /)
  s=""
  for(i=1;i<=n;i++)
    s=arr[i] s
  print s
}
The 3rd arg for split(), by the way, is a field separator, not a string, and a field separator is a regexp with a couple of extra properties, so the correct way to call split with a fixed "string" is using RE delimiters, split($2,arr,/ /), not string delimiters, split($2,arr," "). It doesn't make a functional difference in this case, but it does when the field separator gets more complicated, so it's best to get used to doing it the right way.
Bonus round: you would not need to explicitly re-init s if you put that code in a function:
function rev(str, arr,n,s,i) {
  n=split(str,arr,/ /)
  for(i=1;i<=n;i++)
    s=arr[i] s
  return s
}
...
{ print rev($2) }
Reason left as an exercise :-).
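For readers who want the answer to the exercise: awk has no local-variable declarations, so the idiom is to list extra parameters that callers never pass (arr, n, s, and i in rev), and those start out empty on every call. A C function that clears its output buffer on entry gives the same per-line reset. A minimal sketch, assuming tokens separated by spaces and a caller-supplied buffer of at least strlen(str)+1 bytes; `rev_tokens` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Concatenate the space-separated tokens of str in reverse order,
 * like the awk loop s = arr[i] s. out is reset on entry, so nothing
 * carries over from a previous line. out needs strlen(str)+1 bytes. */
void rev_tokens(const char *str, char *out)
{
    assert(str != NULL && out != NULL);
    size_t end = strlen(str);
    size_t w = 0;
    out[0] = '\0';                          /* the per-line reset of s */
    while (end > 0) {
        while (end > 0 && str[end - 1] == ' ')
            end--;                          /* skip separators */
        size_t start = end;
        while (start > 0 && str[start - 1] != ' ')
            start--;                        /* find the token's start */
        memcpy(out + w, str + start, end - start);
        w += end - start;
        end = start;
    }
    out[w] = '\0';
}
```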

C Storing Matrix in Array of Chars and Printing

Hey all, I am trying to store a matrix in an array of chars and then print it out.
Here is the code I have written:
#include<stdio.h>
#include<stdlib.h>
int main() {
    int i;
    int j;
    int row=0;
    int col=0;
    int temp=0;
    char c;
    int array[3][2] = {{}};
    while((c=getchar()) !=EOF && c!=10){
        if((c==getchar()) == '\n'){
            array[col++][row];
            break;
        }
        array[col][row++]=c;
    }
    for(i=0; i<=2; i++){
        for(j=0; j<=3; j++){
            printf("%c ", array[i][j]);
        }
        printf("\n");
    }
}
Using a text file such as:
1 2 3 4
5 6 7 8
9 1 2 3
I would like to be able to print that back out to the user, however what my code outputs is:
1 2 3 4
3 4 5 6
5 6 7 8
I cannot figure out what is wrong with my code; somehow I am off by one iteration in one of my loops, or it has something to do with not handling newlines properly. Thanks!
A few problems that I can see are:
1. As user3386109 mentioned in the comments, your array should be array[3][4] to match the input file.
2. The line array[col++][row]; does nothing but increment col; it then uselessly indexes the array and throws away the value. You can do the same thing with just col++;. However, you're not even using col at any later point in the code, so you don't really need that either. The break; by itself does what you need. Which leads me to...
3. You're not populating the array like you think you are. You're incrementing col and then immediately breaking out of the loop. So how does the entire array ever get populated? By pure luck. With your array declared as array[3][4], the access array[0][4] (which is out of bounds, so technically it isn't supposed to exist) lands on the same memory as array[1][0]. This is because multidimensional arrays in C (and many other languages) are laid out in memory as flat arrays, since memory itself uses linear addressing. C does this flattening in so-called row-major order, meaning that as you traverse the raw memory from first address to last, the corresponding multidimensional indices (i,j,k,...z, or in your case just i,j) increment in such a way that the last index changes the fastest. So not only does col never get incremented except right before you break out of the loop, but row never gets reset to 0, which means you're storing values in array[0][0], array[0][1], ... array[0][11], not array[0][0] .. array[0][3], array[1][0] .. array[1][3], array[2][0] .. array[2][3] as you were expecting. It was just luck that, thanks to row-major ordering, these two sets of indices refer to the same memory (and C doesn't do array bounds checking for you because it assumes you're doing it yourself).
4. This is just personal preference, but you will usually see arrays referenced as array[row][col], not array[col][row]. But like I said, that's just preference. If it's easier for you to visualize it as [col][row], then by all means do it that way. Just make sure you do it consistently and don't accidentally switch midway through your code to doing [row][col].
5. Your code will break and only print out part of the matrix if you accidentally put a trailing space at the end of one of your rows of numbers, because of the weird way you're checking for the end of input (doing a second getchar after each initial getchar and checking to see if the second character is \n). This method isn't wrong per se, in the sense that it will work, but it's not very robust and relies on your input data being precisely formatted and containing no trailing spaces. Anyone who has ever spent hours trying to figure out why their Makefile didn't work, only to find out that it was because they had leading spaces instead of tabs, can attest that those kinds of errors can be extremely time-consuming and frustrating to track down. Precisely formatted input data is always a good thing, but your code shouldn't break in unexpected and non-obvious ways (such as only printing out half of a matrix) when it doesn't get perfect input. Edit: It only occurred to me later on that you were actually intending to do two mutually exclusive things here: increment col for the next line of input, and break out of the loop after having (presumably) detected the end of input. You need to figure out which thing you're doing here, although thanks to item 3, your code actually (and oddly) works just by taking user3386109's advice and changing array[3][2] to array[3][4].
6. I can only assume you used <= 2 and <= 3 in your for loops instead of < 3 and < 4, respectively, because you prefer doing it that way. That's fine, but it generally makes for easier-to-read code if your for loop conditions match up with your array dimensions. Just speculating here, but perhaps that's why you had array[3][2] when you really meant array[3][4].
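Putting those points together, one possible corrected version reads with fscanf and %d, which skips spaces, tabs, and newlines, so trailing spaces and line endings no longer matter, and uses loop bounds that match the array dimensions. This is a minimal sketch that reads the values as int rather than as characters, which is an assumption about what the matrix should hold; `read_matrix`, `print_matrix`, `ROWS`, and `COLS` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define ROWS 3
#define COLS 4

/* Read ROWS x COLS whitespace-separated integers in row-major order.
 * %d skips any leading whitespace, including newlines, so row
 * boundaries need no special handling. Returns the number of values read. */
int read_matrix(FILE *in, int m[ROWS][COLS])
{
    assert(in != NULL);
    int count = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (fscanf(in, "%d", &m[r][c]) != 1)
                return count;           /* short or malformed input */
            count++;
        }
    }
    return count;
}

/* Print one row per line, values separated by single spaces. */
void print_matrix(FILE *out, int m[ROWS][COLS])
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            fprintf(out, c + 1 < COLS ? "%d " : "%d\n", m[r][c]);
}
```

In a main function, checking read_matrix(stdin, m) == ROWS * COLS and then calling print_matrix(stdout, m) would echo the 3x4 file from the question.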

"For" loop that overwrites when using write.table

I am trying to extract every two consecutive columns from an array, writing a .dat file for every pair. The problem is that when I use write.table(), it overwrites the files. When I use print() instead of write.table(), it shows the correct subsets, though.
I also need the file names to show the number of the pair of columns selected (a total of 6 pairs), as well as the number of the dimension (from 1 to 5). For now I have used a simpler solution: numbering the files 1:30.
for(i in 0:5) {
  for (j in 1:5) {
    for (k in 1:30) {
      filename <- paste("Component",k, ".dat", sep="")
      write.table(data[,c(2*i+1,2*i+2),j],col.names=F, row.names=F, sep= " ")
    }
  }
}
Any hints why it does not work?
I hope my goal is understandable. Thanks a lot for your time!
Set the argument append to TRUE:
write.table(data[,c(2*i+1,2*i+2),j],
            file=filename,
            append=TRUE,
            col.names=F,
            row.names=F,
            sep= " ")
Also, as @Roland correctly pointed out, you forgot to pass the file argument (already added in my example above).
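The overwrite-versus-append distinction is the same one C's fopen makes: mode "w" truncates the file every time it is opened, while mode "a" adds to the end. Opening the same name with "w" inside a loop therefore keeps only the last write, which mirrors the overwriting described in the question. A minimal sketch; `write_pair` and `count_lines` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Write one line "a b" to path. mode "w" truncates (each call replaces
 * the file); mode "a" appends (calls accumulate).
 * Returns 0 on success, -1 on failure. */
int write_pair(const char *path, const char *mode, int a, int b)
{
    assert(path != NULL && mode != NULL);
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    fprintf(f, "%d %d\n", a, b);
    return fclose(f) == 0 ? 0 : -1;
}

/* Count newline characters in path, or return -1 if it cannot be opened. */
long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long lines = 0;
    int ch;
    while ((ch = fgetc(f)) != EOF)
        if (ch == '\n')
            lines++;
    fclose(f);
    return lines;
}
```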
