Array subscript out of range at line Error in SAS - arrays

I am trying to create a new set of variables using arrays, but I am getting this error:
"ERROR: Array subscript out of range at line 581 column 23."
In my program I have a set of macro variables n1 to n15.
Here is my code. I can't work out how my arrays go out of range, since all of them have 15 elements:
data allsae1;
*length _a1 _a2 _a3 _a4 _a5 _a6 _a7 _a8 _a9 _a10 _a11 _a12 _a13 _a14 _a99 _b1 _b2 _b3 _b4 _b5 _b6 _b7 _b8 _b9 _b10 _b11 _b12 _b13 _b14 _b99 $10;
set allsae;
array _anum{15} a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a99;
array _bnum{15} b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b99;
array astat{15} _a1 _a2 _a3 _a4 _a5 _a6 _a7 _a8 _a9 _a10 _a11 _a12 _a13 _a14 _a99;
array bstat{15} _b1 _b2 _b3 _b4 _b5 _b6 _b7 _b8 _b9 _b10 _b11 _b12 _b13 _b14 _b99;
%macro stats;
%do i=1 %to 15;
%if _anum[i] !=. %then %do;
astat[i]=strip(put(_anum[i], best.))||" ("||strip(put(_anum[i]/(&&n&i) *100, 8.1))||"%)";
%end;
bstat[i] = strip(put(_bnum[i], best.));
%end;
%mend stats;
%stats;
run;

Why do you have macro code here? I don't see any place where you need to generate SAS code. The only place is the reference to &&n&i but I don't see where you have defined any macro variables named N1, N2, etc.
Inside %IF the macro processor compares text, not data values. The string _anum[i] is never equal to the string . so you always generate the SAS statement
astat[i]=strip(put(_anum[i], best.))||" ("||strip(put(_anum[i]/(<something>) *100, 8.1))||"%)";
But the generated code uses the data step variable I, which you never created or assigned, so it is missing and the index into the astat and _anum arrays is invalid.
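A rough analogy in Python (not SAS; the function name is hypothetical) of what the macro processor does here: it compares and emits pieces of program text before the data step runs, so the condition never looks at array values and the emitted statements contain the literal letter i.

```python
def generate_statements(n=15):
    """Mimic %do i=1 %to n: return the lines of SAS text the macro would emit."""
    emitted = []
    for i in range(1, n + 1):
        # %if _anum[i] != . compares two pieces of text, not data values,
        # so the test is true on every pass.
        if "_anum[i]" != ".":
            # The macro variable &i is never referenced, so the literal
            # letter i is emitted; the loop value is not substituted.
            emitted.append("astat[i] = ...;")
        emitted.append("bstat[i] = ...;")
    return emitted


statements = generate_statements()
print(len(statements))  # 30 statements, all indexing with an unassigned i
print(statements[0])
```

Every emitted statement indexes the arrays with a data step variable that has no value, which is consistent with the "subscript out of range" error.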
Most likely you just want a normal DO loop and don't need to define a macro at all. If you really have those 15 macro variables and they contain numeric strings you might just use the SYMGETN() function.
do i=1 to 15;
if not missing(_anum[i]) then do;
astat[i]=strip(put(_anum[i], best.))||" ("||strip(put(_anum[i]/(symgetn(cats('n',i))) *100, 8.1))||"%)";
end;
bstat[i] = strip(put(_bnum[i], best.));
end;
Or just make a temporary array to have those 15 values.
array _n[15] _temporary_ (&n1 &n2 &n3 &n4 .... &n15);
which you then index into with the I variable.
... _n[i] ...

You need arrays, not macros, here. You're trying to use your macro variables, but I would instead suggest you load the N values into an array. I would also recommend creating a single macro variable rather than N1 to N15, so you don't have to worry about indexes and macro loops.
Create your N using something like this:
proc sql noprint;
select n into :N_list_values separated by ", " from yourTable;
quit;
%put &n_list_values;
Then you can use it later on like this in the array.
array _anum{15} a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a99;
array _bnum{15} b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 b12 b13 b14 b99;
array astat{15} _a1 _a2 _a3 _a4 _a5 _a6 _a7 _a8 _a9 _a10 _a11 _a12 _a13 _a14 _a99;
array bstat{15} _b1 _b2 _b3 _b4 _b5 _b6 _b7 _b8 _b9 _b10 _b11 _b12 _b13 _b14 _b99;
array _n(15) _temporary_ (&N_list_values);
do i=1 to 15;
if not missing(_anum[i]) then do;
astat[i]=strip(put(_anum[i], best.))||" ("||strip(put(_anum[i]/(_n(i)) *100, 8.1))||"%)";
end;
bstat[i] = strip(put(_bnum[i], best.));
end;

Django ORM to get a nested tree of models from DB with minimal SQL queries

I have some nested models with foreign key relationships going 4 levels deep.
A <- B <- C <- D
A has a set of B models, which each have a set of C models, which each have a set of D models.
I'm iterating over each model (4 layers of looping from A down to D). This is producing lots of DB hits.
I don't need to do any filtering at the DB fetch level, as I need all the data from all the DB tables, so ideally I'd like to get all the data with ONE SQL query that hits the DB (if that's possible) and then somehow have the data organized/filtered into the correct sets for each model, i.e. it's all pre-fetched and structured, ready for use (e.g. in a web dashboard).
There seem to be a lot of Django-related pre-fetch helpers and packages, but none of them seem to work the way I expect, e.g. django-auto-prefetch (which seems ideal).
Is this a common use case (I thought it would be)?
How can I construct the ORM query to get all the data in one hit and then just use the bits I need?
NOTE: target system is raspberry pi class device (1GHz Arm processor) with eMMC storage (similar to SD card), and using SQLite as the DB backend.
NOTE: I'm also using this with django-polymorphic, which may or may not make a difference?
Thanks, Brendan.
Using one query would result in a huge amount of bandwidth, since the values for the columns of the A model will be repeated per B model per C model per D model.
Indeed, the response would look like:
a_col1 | a_col2 | b_col1 | b_col2 | c_col1 | d_col1
A1     | A1     | B1     | B1     | C1     | D1
A1     | A1     | B1     | B1     | C1     | D2
A1     | A1     | B1     | B1     | C1     | D3
A1     | A1     | B1     | B1     | C2     | D4
A1     | A1     | B1     | B1     | C2     | D5
A1     | A1     | B1     | B1     | C2     | D6
A1     | A1     | B2     | B2     | C3     | D7
A1     | A1     | B2     | B2     | C3     | D8
A1     | A1     | B2     | B2     | C3     | D9
A1     | A1     | B2     | B2     | C4     | D10
A1     | A1     | B2     | B2     | C4     | D11
A1     | A1     | B2     | B2     | C4     | D12
A2     | A2     | B3     | B3     | C5     | D13
A2     | A2     | B3     | B3     | C5     | D14
A2     | A2     | B3     | B3     | C5     | D15
A2     | A2     | B3     | B3     | C5     | D16
We thus would repeat the values for the a_columns, b_columns, etc. a large number of times resulting in a large amount of bandwidth going from the database to the Python/Django layer. This would not only result in large amounts of data being transferred, but also large amounts of memory being used by Django to deserialize this response.
Therefore .prefetch_related makes at most one (or two, depending on the type of relation) extra queries per level, so four to seven queries in total for the four models here, which keeps the bandwidth low.
You thus can fetch all objects in memory with:
for a in A.objects.prefetch_related('b_set', 'b_set__c_set', 'b_set__c_set__d_set'):
    print(a)
    for b in a.b_set.all():
        print(b)
        for c in b.c_set.all():
            print(c)
            for d in c.d_set.all():
                print(d)
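To make the trade-off concrete without Django, here is a minimal sketch using the standard-library sqlite3 module. The table and column names (a, b, c, d, a_id, ...) and the data are assumptions for illustration, and the helper children_by_parent is hypothetical. It is only a rough model of the per-level strategy: one query per level with an IN clause, stitched together in Python, compared with a single four-way join.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER);
    CREATE TABLE d (id INTEGER PRIMARY KEY, c_id INTEGER);
""")

# Assumed data: 2 A rows, 2 B per A, 2 C per B, 3 D per C.
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(1, 3)])
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [(i, (i - 1) // 2 + 1) for i in range(1, 5)])
conn.executemany("INSERT INTO c VALUES (?, ?)",
                 [(i, (i - 1) // 2 + 1) for i in range(1, 9)])
conn.executemany("INSERT INTO d VALUES (?, ?)",
                 [(i, (i - 1) // 3 + 1) for i in range(1, 25)])


def children_by_parent(table, fk, parent_ids):
    """Run one query for a whole level and group child ids by parent id."""
    marks = ",".join("?" * len(parent_ids))
    rows = conn.execute(
        f"SELECT id, {fk} FROM {table} WHERE {fk} IN ({marks}) ORDER BY id",
        list(parent_ids),
    )
    grouped = defaultdict(list)
    for child_id, parent_id in rows:
        grouped[parent_id].append(child_id)
    return grouped


# Four queries in total: one for A, then one per level below it.
a_ids = [row[0] for row in conn.execute("SELECT id FROM a ORDER BY id")]
b_of = children_by_parent("b", "a_id", a_ids)
c_of = children_by_parent("c", "b_id", [b for ids in b_of.values() for b in ids])
d_of = children_by_parent("d", "c_id", [c for ids in c_of.values() for c in ids])

# Stitch the levels into a nested structure in Python.
tree = {a: {b: {c: d_of[c] for c in c_of[b]} for b in b_of[a]} for a in a_ids}

# The single-query alternative repeats every A, B and C value once per D row.
join_rows = conn.execute("""
    SELECT a.id, b.id, c.id, d.id
    FROM a JOIN b ON b.a_id = a.id
           JOIN c ON c.b_id = b.id
           JOIN d ON d.c_id = c.id
""").fetchall()

print(len(join_rows), "joined rows versus", len(a_ids), "rows in table a")
print(tree[1])
```

With real models the repeated A, B and C columns would be much wider than a single id, which is where the extra bandwidth of the one-query approach comes from.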

Clever way of adding an array to longer array at particular indices in Fortran?

I have two (1d) arrays, a long one A (size m) and a shorter one B (size n). I want to update the long array by adding each element of the short array at a particular index.
Schematically the arrays are structured like this,
A = [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 ... am]
B = [ b1 b2 b3 b4 b5 b6 b7 b8 b9 ... bn ]
and I want to update A by adding the corresponding elements of B.
The most straightforward way is to have some index array indarray (same size as B) which tells us which index of A corresponds to B(i):
Option 1
do i = 1, size(B)
   A(indarray(i)) = A(indarray(i)) + B(i)
end do
However, there is an organization to this problem which I feel should allow for some better performance:
(1) There should be no barrier to doing this in a vectorized way, i.e. the updates for each i are independent and can be done in any order.
(2) There is no need to jump back and forth in array A. The machine should know to just loop once through the arrays, only updating A where necessary.
(3) There should be no need for any temporary arrays.
What is the best way to do this in Fortran?
Option 2
One way might be using PACK, UNPACK, and a boolean mask M (same size as A) that serves the same purpose as indarray:
A = [a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 ... am]
B = [ b1 b2 b3 b4 b5 b6 b7 b8 b9 ... bn ]
M = [. T T T . T T . . T . T T T T . ]
(where T represents .true. and . is .false.).
And the code would just be
A = UNPACK(PACK(A, M) + B, M, A)
This is very concise and maybe satisfies (1) and sort of (2) (seems to do two loops through the arrays instead of just one). But I fear the machine will be creating a few temporary arrays in the process which seems unnecessary.
Option 3
What about using where with UNPACK?
where (M)
   A = A + UNPACK(B, M, 0.0d0)
end where
This seems about the same as option 2 (two loops and maybe creates temporary arrays). It also has to fill the M=.false. elements of the UNPACK'd array with 0's which seems like a total waste.
Option 4
In my situation the .true. elements of the mask will usually be in contiguous blocks (i.e. a few trues in a row, then a bunch of falses, then another block of trues, etc). Maybe this could lead to something similar to option 1. Let's say there are K of these .true. blocks. I would need an array indstart (of size K) giving the index into A of the start of each true block, and an array blocksize (size K) with the length of each true block.
j = 1
do i = 1, size(indstart)
   i0 = indstart(i)
   i1 = i0 + blocksize(i) - 1
   A(i0:i1) = A(i0:i1) + B(j:j+blocksize(i)-1)
   j = j + blocksize(i)
end do
At least this only does one loop through. This code seems more explicit about the fact that there's no jumping back and forth within the arrays. But I don't think the compiler will be able to figure that out (blocksize could have negative values for example). So this option probably won't result in a vectorized result.
--
Any thoughts on a nice way to do this? In my situation the arrays indarray, M, indstart, and blocksize would be created once but the adding operation must be done many times for different arrays A and B (though these arrays will have constant sizes). The where statement seems like it could be relevant.
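As a cross-check of how the helper arrays relate to each other, here is a small sketch in Python rather than Fortran. It uses 0-based indices, and the function names are hypothetical. It derives indarray, indstart and blocksize from the example mask M and confirms that Option 1 and Option 4 update A identically.

```python
def blocks_from_mask(mask):
    """Return (indstart, blocksize) for each run of True values (0-based)."""
    indstart, blocksize = [], []
    for i, flag in enumerate(mask):
        if flag and (i == 0 or not mask[i - 1]):
            indstart.append(i)  # a new block of trues begins here
            blocksize.append(0)
        if flag:
            blocksize[-1] += 1
    return indstart, blocksize


def add_by_index(a, b, indarray):
    """Option 1: scatter-add through an explicit index array."""
    out = list(a)
    for i, idx in enumerate(indarray):
        out[idx] += b[i]
    return out


def add_by_blocks(a, b, indstart, blocksize):
    """Option 4: add contiguous slices of b onto contiguous slices of a."""
    out = list(a)
    j = 0
    for i0, n in zip(indstart, blocksize):
        for k in range(n):
            out[i0 + k] += b[j + k]
        j += n
    return out


T, F = True, False
# The mask from the question: . T T T . T T . . T . T T T T .
mask = [F, T, T, T, F, T, T, F, F, T, F, T, T, T, T, F]
indarray = [i for i, flag in enumerate(mask) if flag]
indstart, blocksize = blocks_from_mask(mask)

a = [0.0] * len(mask)
b = [float(v) for v in range(1, len(indarray) + 1)]

print(indstart, blocksize)
print(add_by_index(a, b, indarray) == add_by_blocks(a, b, indstart, blocksize))
```

Since the helper arrays are built once and reused, a consistency check like this only needs to run at setup time.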

Systems of inequations and counting constraint

I am playing with Gecode to solve a problem where you label the arcs of a multi-digraph such that the sums of the arc labels in each path from a single source to a single sink come from a multiset. So for example we might have a graph with 8 paths and want path arc sums to come from the set {5,5,5,4,3,2,1,0}. So we must have exactly 3 paths with a sum of 5 and 5 paths with a unique sum of 0..4.
You can reformulate this problem as asking if some permutation of {5,5,5,4,3,2,1,0} is in the column space of the path arc incidence matrix of the graph.
I model this multiset match with the "count" constraint. The arc sums are linear equations.
My graphs have many parallel edges that come in pairs. Using symmetry I use this to impose a partial order on the path sums. This also means that there are sets of pairs of path sums that have the same difference. So from my example if the paths are b0...b7 I have the following pair sets:
b0 - b1 = b3 - b4 = b5 - b6, b0 - b2 = b5 - b7, b0 - b3 = b1 - b4, b0 - b5 = b1 - b6 = b2 - b7
b1 - b2 = b6 - b7, b3 - b5 = b4 - b6
Including these differences into the model seems to cut down the search space by two orders of magnitude in Gecode. I am pleased with this because I think it's telling me something important about the graphs I am studying and it fits with some conjectures in the area I work in.
The partial order now tells us that only b0,b2,b3,b5 and b7 may take the value 5.
It's possible to now prove this system can not be satisfied. I am interested in techniques from constraint satisfaction etc that can be used to analyse a system of inequations (!=) along with "count". Obviously Gecode can prove this by assigning values and failing. I am interested in general techniques to both learn about constraint satisfaction, help improve the model and maybe gain some understanding of the things I am investigating.
To see the problem is not soluble we can show that each of the 6 sets of pair difference systems can not have a difference of zero. If they did they would generate duplicates of the wrong values or too many 5's.
For example b0 - b1 = b3 - b4 = b5 - b6 would have b0 = b1 which is impossible since b1 can not be 5 and that's the only value that can have duplicates.
Or b0 - b2 = b5 - b7 would mean that b0 = b2 and b5 = b7 requiring 4 5's.
So we end up with a set of inequations on the paths whose sum could be 5:
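b0 != b2, b0 != b3, b0 != b5, b2 != b7, b3 != b5, b5 != b7
We can see that b0 != 5, since if it were, b2, b3 and b5 could not be 5 and we could get at most 2 fives (b0 and b7). Among the remaining 4 candidates (b2, b3, b5, b7), b7 excludes both b2 and b5, and b3 excludes b5, so at most two of them can be 5 and the whole system is insoluble.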
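The counting argument can be checked by brute force. The sketch below is plain Python, not Gecode. It assumes the candidate set from the partial order (b0, b2, b3, b5, b7) and reads one "must differ" pair among the candidates from each relevant pair-difference set, including b3 != b5 from b3 - b5 = b4 - b6. It then looks for any three candidates that could all equal 5.

```python
from itertools import combinations

# Paths that the partial order allows to take the value 5.
candidates = ["b0", "b2", "b3", "b5", "b7"]

# Pairs that cannot be equal, one group per pair-difference set.
must_differ = {
    frozenset(("b0", "b2")), frozenset(("b5", "b7")),  # b0 - b2 = b5 - b7
    frozenset(("b0", "b3")),                           # b0 - b3 = b1 - b4
    frozenset(("b0", "b5")), frozenset(("b2", "b7")),  # b0 - b5 = b1 - b6 = b2 - b7
    frozenset(("b3", "b5")),                           # b3 - b5 = b4 - b6
}


def can_all_be_five(paths):
    """True if no two of the given paths are forced to differ."""
    return not any(frozenset(p) in must_differ for p in combinations(paths, 2))


triples = [t for t in combinations(candidates, 3) if can_all_be_five(t)]
pairs = [p for p in combinations(candidates, 2) if can_all_be_five(p)]

print(triples)  # no three candidates can all be 5
print(pairs)
```

No triple survives, so three paths with sum 5 cannot be placed. This is the same conclusion as the hand argument, reached by enumeration instead of propagation.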

SAS macro loop and dummy variable

I just want to make a new dummy variable when a certain value is present.
Here is an example of my original data.
ID A1 A2... A10
1 10 1 5
2 20 8 4
...
...
And I would like to add a dummy variable when a certain value appears in those attributes.
For example, the subject with ID 1 has the value 10, so a new variable Add10 would be 1.
ID A1 A2.. A10 Add1..Add4 Add5...Add20
1 10 1.. 5 1 ...0 1 ... 0
2 20 8.. 4 0 ...1 0 ... 1
...
Here is my code..
%MACRO DO_LIST;
%DO I=1 %TO 20;
data aaaa;
set aa33;
if A1 =i or
A2 =i or
A3 =i or
...
A10 =i then Add&I=I ;
RUN;
%END;
%MEND DO_LIST;
%DO_LIST;
However, my result has only Add20, which is the last variable.
I feel I made a mistake in the loop statement. Would you mind helping me out?
Thanks in advance.
Right now you're always using the same data set as the input to aaaa and you're not changing this dataset with each loop. Thus, you'll always get Add20 only as this is what the last iteration of the loop will do.
A simple fix to this would be:
data append;
set aa33;
run;
%MACRO DO_LIST;
%DO I=1 %TO 20;
data append;
set append;
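/* compare against the macro loop value &I; a plain i would be an unassigned data step variable */
if A1 =&I or
A2 =&I or
A3 =&I or
.....
A10 =&I then Add&I=1 ;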
RUN;
%END;
%MEND DO_LIST;
%DO_LIST;
You pretty much want to add a column to your dataset each time the loop runs, as opposed to replacing it entirely with the original dataset (aa33) plus the result of only the current iteration.
If you know the max # is 20, the following should work without a macro
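data test;
  set aa33;
  array add[20] add1-add20;
  /* list a1-a10 explicitly: a: would also pick up add1-add20 */
  array a[*] a1-a10;
  /* start every dummy at 0 for this row */
  do j = 1 to dim(add);
    add[j] = 0;
  end;
  do i = 1 to dim(a);
    value = a[i];
    /* skip missing or out-of-range values to avoid a subscript error */
    if 1 <= value <= dim(add) then add[value] = 1;
  end;
  drop i j value;
run;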
I think that's what you're looking for. It would help if you filled in at least the first two full rows of your example.
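The array logic can be sketched outside SAS as well. The following Python version is only an illustration: the function name is hypothetical, and the rows contain just the three values visible in the question's example (A1, A2 and A10). It sets one flag per value found in a row and ignores missing or out-of-range values.

```python
def add_dummies(values, max_value=20):
    """Return the flags [Add1, ..., Add<max_value>] for one row of values."""
    flags = [0] * max_value
    for value in values:
        # Skip missing (None) or out-of-range values instead of failing.
        if isinstance(value, int) and 1 <= value <= max_value:
            flags[value - 1] = 1
    return flags


# Only the values shown in the question's example rows.
rows = {1: [10, 1, 5], 2: [20, 8, 4]}
for row_id, values in rows.items():
    flags = add_dummies(values)
    set_names = [f"Add{k + 1}" for k, flag in enumerate(flags) if flag]
    print(row_id, set_names)
```

For ID 1 this sets Add1, Add5 and Add10. For ID 2 it sets Add4, Add8 and Add20, in line with the desired output in the question.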

Parallel iteration over lists in makefile or CMake file

Is there a way to loop over multiple lists in parallel in a makefile or CMake file?
I would like to do something like the following in CMake, except AFAICT this syntax isn't supported:
set(a_values a0 a1 a2)
set(b_values b0 b1 b2)
foreach(a in a_values b in b_values)
do_something_with(a b)
endforeach(a b)
This would execute:
do_something_with(a0 b0)
do_something_with(a1 b1)
do_something_with(a2 b2)
I would accept an answer in either CMake or Make, though CMake would be preferred. Thanks!
Here you go:
set(list1 1 2 3 4 5)
set(list2 6 7 8 9 0)
list(LENGTH list1 len1)
math(EXPR len2 "${len1} - 1")
foreach(val RANGE ${len2})
  list(GET list1 ${val} val1)
  list(GET list2 ${val} val2)
  message(STATUS "${val1} ${val2}")
endforeach()
As of CMake 3.17, the foreach() loop supports a ZIP_LISTS option to iterate through two (or more) lists simultaneously:
set(a_values a0 a1 a2)
set(b_values b0 b1 b2)
foreach(a b IN ZIP_LISTS a_values b_values)
  message("${a} ${b}")
endforeach()
This prints:
a0 b0
a1 b1
a2 b2
In make you can use the GNUmake table toolkit to achieve this by handling the two lists as 1-column tables:
include gmtt/gmtt.mk
# declare the lists as tables with one column
list_a := 1 a0 a1 a2 a3 a4 a5
list_b := 1 b0 b1 b2
# note: list_b is shorter and will be filled up with a default value
joined_list := $(call join-tbl,$(list_a),$(list_b), /*nil*/)
$(info $(joined_list))
# Apply a function (simply output "<tuple(,)>") on each table row, i.e. tuple
$(info $(call map-tbl,$(joined_list),<tuple($$1,$$2)>))
Output:
2 a0 b0 a1 b1 a2 b2 a3 /*nil*/ a4 /*nil*/ a5 /*nil*/
<tuple(a0,b0)><tuple(a1,b1)><tuple(a2,b2)><tuple(a3,/*nil*/)><tuple(a4,/*nil*/)><tuple(a5,/*nil*/)>
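For comparison, the three approaches above correspond to familiar idioms in a general-purpose language. This Python sketch is only an analogy and is not usable inside CMake or Make. It shows the index-based loop, lockstep iteration, and padding of the shorter list with a default value, mirroring the /*nil*/ fill in the join-tbl output above.

```python
from itertools import zip_longest

a_values = ["a0", "a1", "a2"]
b_values = ["b0", "b1", "b2"]

# Index-based loop, as in the list(LENGTH ...) / list(GET ...) answer.
by_index = [(a_values[i], b_values[i]) for i in range(len(a_values))]

# Lockstep iteration, the counterpart of foreach(... IN ZIP_LISTS ...).
zipped = list(zip(a_values, b_values))

# Unequal lengths: pad the shorter list with a default value.
list_a = ["a0", "a1", "a2", "a3", "a4", "a5"]
list_b = ["b0", "b1", "b2"]
padded = list(zip_longest(list_a, list_b, fillvalue="/*nil*/"))

for a, b in zipped:
    print(a, b)
print(padded[-1])
```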
