I want to calculate the maximum of a list of SAS variables, where the list is determined by another variable present in the dataset. That is,
| var_1 | var_2 | var_3 | var_4 | maximum till | formula used | var_output |
|-------|-------|-------|-------|--------------|----------------------|------------|
| 3 | 6 | 9 | 12 | 4 | =max(of var_1-var_4) | 12 |
| 1 | 10 | 100 | 1000 | 2 | =max(of var_1-var_2) | 10 |
| 5 | 15 | 25 | 35 | 3 | =max(of var_1-var_3) | 25 |
Appreciate any help. Thanks :)
Use a do loop and a rolling maximum:
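data want;
  set have;
  array vars{4} var_1-var_4;          /* the four candidate variables */
  do i = 1 to max_till;               /* max_till holds the "maximum till" value */
    var_out = max(vars{i}, var_out);  /* max() ignores the initial missing value */
  end;
  drop i;
run;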
Here is an FCMP solution. It is similar to user667489's solution, but implemented as a function. It will only work in SAS 9.4, and possibly only 9.4 TS1M0+.
data have; *some data;
input var1-var4 varmax;
datalines;
3 6 9 12 4
1 10 100 1000 2
5 15 25 35 3
;;;;
run;
proc fcmp outlib=work.funcs.func; *store functions here;
  function maxof(mxarr[*],maxlim); *returns numeric;
    do _i = 1 to maxlim;
      _max = max(mxarr[_i],_max);
    end;
    return(_max);
  endsub;
run;
options cmplib=work.funcs; *define where functions come from;
data want;
  set have;
  array vars var1-var4;
  varout = maxof(vars,varmax); *use function (pass array by reference);
run;
I am wondering whether it is possible to reshape the following have table in SAS, without using SAS/IML, to produce the want table.
Have:
+--------+------+------+------+
| NAME | var1 | var2 | var3 |
+--------+------+------+------+
| Q1_ID1 | 1 | 2 | 3 |
| Q1_ID2 | 4 | 5 | 6 |
| Q2_ID1 | 7 | 8 | 9 |
| Q2_ID2 | 10 | 11 | 12 |
| Q3_ID1 | 13 | 14 | 15 |
| Q3_ID2 | 16 | 17 | 18 |
+--------+------+------+------+
Want:
+----------+----+----+----+
| NAME | Q1 | Q2 | Q3 |
+----------+----+----+----+
| var1_ID1 | 1 | 7 | 13 |
| var1_ID2 | 4 | 10 | 16 |
| var2_ID1 | 2 | 8 | 14 |
| var2_ID2 | 5 | 11 | 17 |
| var3_ID1 | 3 | 9 | 15 |
| var3_ID2 | 6 | 12 | 18 |
+----------+----+----+----+
The code to reproduce the have table is the following:
data have;
infile datalines delimiter=",";
input NAME :$8. var1 :8. var2 :8. var3 :8.;
datalines;
Q1_ID1,1,2,3
Q1_ID2,4,5,6
Q2_ID1,7,8,9
Q2_ID2,10,11,12
Q3_ID1,13,14,15
Q3_ID2,16,17,18
;
run;
Two transposes are needed, with some tearing apart and putting together in between.
data have;
infile datalines delimiter=",";
input NAME :$8. var1 :8. var2 :8. var3 :8.;
datalines;
Q1_ID1,1,2,3
Q1_ID2,4,5,6
Q2_ID1,7,8,9
Q2_ID2,10,11,12
Q3_ID1,13,14,15
Q3_ID2,16,17,18
;
run;
proc transpose data=have out=stage;
  by name;
  var var:;
run;
data stage2(keep=name col1 qtr);
  set stage;
  qtr = scan(name,1,'_'); * tear apart;
  id = scan(name,2,'_');
  name = catx('_', _name_, id); * put together;
run;
proc sort data=stage2;
  by name qtr;
run;
proc transpose data=stage2 out=want(drop=_name_); * drop the leftover COL1 label;
  by name;
  id qtr;
run;
I have code similar to this:
data input;
input yr $ lob $ type $
allow_P25 los_P25 adm_P25;
cards;
2019 Com AMB 205.4 3.56 3444
2019 Med DME 34.4 1.11 533
;
run;
data results;
length perc_type $15 perc_value 8;
set input;
array change _numeric_;
do over change;
perc_type = vname(change);
perc_value = change;
output results;
end;
run;
This code creates an array of all numeric variables. However, I now need to create an array of only the variables whose names end in P25.
Is there a way to do this using wildcards? The solutions I found online all put the wildcard at the end of the variable name. What if I want the wildcard at the beginning? I tried this (obviously the wrong solution):
data results;
length perc_type $15 perc_value 8;
set input;
array change :P25;
do over change;
perc_type = vname(change);
perc_value = change;
output results;
end;
run;
As a workaround, you can loop over the _numeric_ array, test the end of each variable name with prxmatch(perl_regex, string), and act only when a match is found.
Code
data results;
length perc_type $15 perc_value 8;
set input;
array change _numeric_;
do over change;
/* match vnames ending with P25 */
if (prxmatch("/P25$/", vname(change)) > 0) then do;
/* do whatever you want */
perc_type = vname(change);
perc_value = change;
output results;
end;
end;
run;
Output
| Obs | perc_type | perc_value | yr | lob | type | allow_P25 | los_P25 | adm_P25 |
|-----|----------:|------------|------|-----|-----:|----------:|--------:|---------|
| 1 | allow_P25 | 205.40 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 2 | los_P25 | 3.56 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 3 | adm_P25 | 3444.00 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 4 | allow_P25 | 34.40 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
| 5 | los_P25 | 1.11 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
| 6 | adm_P25 | 533.00 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
Notes
A SAS array does not work like an array in other languages: it is a reference to a group of variables. Since no wildcard can match a name suffix, start from the built-in _numeric_ group and then filter by variable name.
See also the official SAS documentation on regular expressions.
There is no direct SAS syntax that allows this, though macros have been written to deal with problems like this one.
See a few on Roger's GitHub.
I have a table with a number of variables such as:
+-----------+------------+---------+-----------+--------+
| DateFrom | DateTo | Price | Discount | Cost |
+-----------+------------+---------+-----------+--------+
| 01jan17 | 01jul17 | 17 | 4 | 5 |
| 01aug17 | 01feb18 | 15 | 1 | 3 |
| 01mar18 | 01dec18 | 12 | 2 | 1 |
| ... | ... | ... | ... | ... |
+-----------+------------+---------+-----------+--------+
However I want to split this so I have:
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| DateFrom1 | DateTo1 | Price1 | Discount1 | Cost1 | DateFrom2 | DateTo2 | Price2 | Discount2 | Cost2 ... |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| 01jan17 | 01jul17 | 17 | 4 | 5 | 01aug17 | 01feb18 | 15 | 1 | 3 |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
There's a neat (not at all obvious) solution using proc summary and the idgroup option of its output statement that takes only a few lines of code. It runs in memory, so you're likely to run into problems if the dataset is large; otherwise it works very well.
Note that the 3 in out[3] is the number of rows in the source data. You could easily make this dynamic by adding a prior step that calculates the number of rows and stores it in a macro variable.
/* create initial dataset */
data have;
input (DateFrom DateTo) (:date7.) Price Discount Cost;
format DateFrom DateTo date7.;
datalines;
01jan17 01jul17 17 4 5
01aug17 01feb18 15 1 3
01mar18 01dec18 12 2 1
;
run;
/* transform data into 1 row */
proc summary data=have nway;
output out=want (drop=_:)
idgroup(out[3] (_all_)=) / autoname;
run;
I need to generate long (pseudo)random arrays (1,000 to 25,000,000 integers) in which no element is repeated. How can I do this, given that rand() does not generate numbers large enough?
I tried this idea: array[i] = (rand() << 14) | rand() % length; however, I suppose there is a much better way that I don't know of.
Thank you for your help.
You can use the Fisher-Yates shuffle for this.
Create an array of n elements and populate each element sequentially.
-------------------------
| 1 | 2 | 3 | 4 | 5 | 6 |
-------------------------
In this example n is 6. Now select a random index from 0 to n-1 (i.e. rand() % n) and swap the number at that index with the number at the top of the array. Let's say the random index is 2. So we swap the value at index 2 (3) with the one at index n-1 (6). Now we have:
          v
-------------------------
| 1 | 2 | 6 | 4 | 5 | 3 |
-------------------------
Now we do the same, this time with the upper bound of the index being n-2, and swap the value at the chosen index with the value at index n-2. Let's say this time we randomly get 0. So we swap index 0 (1) with index n-2 (5):
  v
-------------------------
| 5 | 2 | 6 | 4 | 1 | 3 |
-------------------------
Then repeat. Let's say the next random index is 3. This happens to be our upper limit, so there is no change:
              v
-------------------------
| 5 | 2 | 6 | 4 | 1 | 3 |
-------------------------
Next we get 0, so we swap index 0 (5) with index 2 (6):
  v
-------------------------
| 6 | 2 | 5 | 4 | 1 | 3 |
-------------------------
And finally 1, which is again the upper limit, so nothing changes:
      v
-------------------------
| 6 | 2 | 5 | 4 | 1 | 3 |
-------------------------
Every position has now been fixed. Because the array started with distinct values and we only ever swapped them, no value is repeated.
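The walk-through above could be sketched in C roughly as follows, for array sizes in the range the question mentions. The helper names rand30, rand_below and unique_random_array are made up for illustration and are not part of any library. Since RAND_MAX is only guaranteed to be at least 32767, rand() % n on its own cannot reach every index of a large array, so this sketch assumes it is acceptable to glue two 15-bit rand() draws together and to reject out-of-range draws to avoid modulo bias. It also assumes n is at most 2^30 and that the quality of rand() is good enough for your purposes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: 30 random bits built from two rand() calls.
   Only the low 15 bits of each call are used, because the C standard
   guarantees no more than RAND_MAX >= 32767. */
static uint32_t rand30(void)
{
    uint32_t hi = (uint32_t)rand() & 0x7FFFu;
    uint32_t lo = (uint32_t)rand() & 0x7FFFu;
    return (hi << 15) | lo;
}

/* Hypothetical helper: integer in [0, bound), with 0 < bound <= 2^30.
   Draws from the incomplete last bucket are rejected so that a plain
   modulo does not favour small results. */
static uint32_t rand_below(uint32_t bound)
{
    const uint32_t range = 1u << 30;
    assert(bound > 0 && bound <= range);
    const uint32_t limit = range - (range % bound);
    uint32_t r;
    do {
        r = rand30();
    } while (r >= limit);
    return r % bound;
}

/* Fill out[0..n-1] with 1..n, then shuffle in place from the top down,
   shrinking the upper bound by one after each swap. */
void unique_random_array(uint32_t *out, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        out[i] = i + 1;
    for (uint32_t i = n; i > 1; i--) {
        uint32_t j = rand_below(i);   /* random index in [0, i-1] */
        uint32_t tmp = out[j];
        out[j] = out[i - 1];
        out[i - 1] = tmp;
    }
}
```

Call srand() once (for example with time(NULL)) before calling unique_random_array(buf, n). Afterwards buf holds each value from 1 to n exactly once, in shuffled order.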
I have been using SAS off and on for a year and I'm finally getting into arrays, macros, and all that cool stuff.
What I want to do:
I have a merged dataset with data from students in different grades on a test. I need to create different files for each grade. I don't have a grade variable to easily sort the dataset by and create different files. I do have an index of variables specific to each grade.
Example - What I have:
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 16623 | 1 | 1 | 0 | . | . |
| 16624 | 1 | 0 | 0 | . | . |
| 16626 | 1 | 1 | 1 | . | . |
| 17221 | . | . | . | 1 | 0 |
| 17222 | . | . | . | 0 | 1 |
| 17225 | . | . | . | 0 | . |
+-------+--------+--------+--------+--------+--------+
Example - What I want:
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 16623 | 1 | 1 | 0 | . | . |
| 16624 | 1 | 0 | 0 | . | . |
| 16626 | 1 | 1 | 1 | . | . |
+-------+--------+--------+--------+--------+--------+
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 17221 | . | . | . | 1 | 0 |
| 17222 | . | . | . | 0 | 1 |
| 17225 | . | . | . | 0 | . |
+-------+--------+--------+--------+--------+--------+
Where I am:
I have a lot of variables specific to each grade, and some of the variables contain missing data, so to be thorough I should check all of the grade-specific variables and output any observations containing data in any of those fields. I could use a hideously long IF THEN statement...
DATA grade1 grade2 grade3 grade4;
SET gradeall;
IF sc_132 ^= . OR sc_139 ^= . OR (AND SO ON FOR ABOUT 34 VARIABLES) THEN OUTPUT grade1;
RUN;
But I thought this would be a good time to use an array. I can't find any easy-to-parse documentation about where and when you can use do loops. Drawing on my experience with other programming languages and what I've read about do loops, I've put together the following.
%let gr1_var = sc_132 sc_139 sc_142;
/*-GRADE SPECIFIC ARRAY REPEATED FOR OTHER GRADES -*/
DATA grade1 grade2 grade3 grade4;
SET gradeall;
PUT &gr1_var;
ARRAY grade1 [*] &gr1_var;
IF (
DO i= 1 TO (DIM(items5_all)-1);
items5_all(i) ^=. OR ;
END;
DO i= DIM(items5_all);
items5_all(i) ^=.;
END;
)
THEN OUTPUT grade1;
/*-IF THEN STATEMENT THEN REPEATED FOR OTHER GRADES-*/
run;
I was hoping this would give me the equivalent of the long IF THEN statement above without having to type it. But of course it is non-functional.
Can you even use do loops within If statements (I haven't found any examples of this)?
Does anyone have any recommendations for how to accomplish this task?
If you only want to output observations that contain data in any of the grade-specific fields, you can just take the sum over the array. SUM ignores missing values and returns missing only when every argument is missing, so an observation is skipped only if it has no value in any of that grade's variables. No loop is needed. For example:
%let gr1_var = sc_132--sc_142; /* "--" selects variables by their position in the dataset; a single "-" would mean the numbered range sc_132, sc_133, and so on */
%let gr2_var = sc_143 sc_151;
DATA grade1 grade2;
SET gradeall;
ARRAY grade1 [*] &gr1_var;
ARRAY grade2 [*] &gr2_var;
if sum(of grade1(*))^=. then output grade1;
if sum(of grade2(*))^=. then output grade2;
run;
By the way, if you wrap this in a macro, there is no need to write out the array definition and if..then statement for each grade.
And no, you cannot put a DO loop inside the condition of an IF statement as you tried here; the condition has to be a single expression.