I have been using SAS off and on for a year and I'm finally getting into arrays, macros, and all that cool stuff.
What I want to do:
I have a merged dataset with data from students in different grades on a test. I need to create different files for each grade. I don't have a grade variable to easily sort the dataset by and create different files. I do have an index of variables specific to each grade.
Example - What I have:
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 16623 | 1 | 1 | 0 | . | . |
| 16624 | 1 | 0 | 0 | . | . |
| 16626 | 1 | 1 | 1 | . | . |
| 17221 | . | . | . | 1 | 0 |
| 17222 | . | . | . | 0 | 1 |
| 17225 | . | . | . | 0 | . |
+-------+--------+--------+--------+--------+--------+
Example - What I want:
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 16623 | 1 | 1 | 0 | . | . |
| 16624 | 1 | 0 | 0 | . | . |
| 16626 | 1 | 1 | 1 | . | . |
+-------+--------+--------+--------+--------+--------+
+-------+--------+--------+--------+--------+--------+
| ID | sc_132 | sc_139 | sc_142 | sc_143 | sc_151 |
+-------+--------+--------+--------+--------+--------+
| 17221 | . | . | . | 1 | 0 |
| 17222 | . | . | . | 0 | 1 |
| 17225 | . | . | . | 0 | . |
+-------+--------+--------+--------+--------+--------+
Where I am:
I have a lot of variables specific to each grade, and some of the variables contain missing data, so to be thorough I should check all of the grade-specific variables and output any observations containing data in any of those fields. I could use a hideously long IF THEN statement...
DATA grade1 grade2 grade3 grade4;
SET gradeall;
IF sc_132 ^= . OR sc_139 ^= . OR (AND SO ON FOR ABOUT 34 VARIABLES) THEN OUTPUT grade1;
RUN;
But I thought this would be a good time to use an array. I can't find any easy to parse documentation about where and when you can use do loops. Using my logic of other programming languages and what I've browsed about do loops I've put together the following.
%let gr1_var = sc_132 sc_139 sc_142;
/*-GRADE SPECIFIC ARRAY REPEATED FOR OTHER GRADES -*/
DATA grade1 grade2 grade3 grade4;
SET gradeall;
PUT &gr1_var;
ARRAY grade1 [*] &gr1_var;
IF (
DO i= 1 TO (DIM(items5_all)-1);
items5_all(i) ^=. OR ;
END;
DO i= DIM(items5_all);
items5_all(i) ^=.;
END;
)
THEN OUTPUT grade1;
/*-IF THEN STATEMENT THEN REPEATED FOR OTHER GRADES-*/
run;
I was hoping this would give me the equivalent of the long IF THEN statement above without having to type it. But of course it is non-functional.
Can you even use do loops within If statements (I haven't found any examples of this)?
Does anyone have any recommendations for how to accomplish this task?
If all you need is to output every observation that has data in at least one of the grade-specific fields, you can take the sum over the array. SUM ignores missing values and returns missing only when every argument is missing, so an observation with no data in any of that grade's variables is not output. No loop is needed:
/* "--" selects every variable stored between sc_132 and sc_142 in the dataset; a single "-" expects consecutively numbered names (sc_132, sc_133, ...) */
%let gr1_var = sc_132--sc_142;
%let gr2_var = sc_143 sc_151;
DATA grade1 grade2;
SET gradeall;
ARRAY grade1 [*] &gr1_var;
ARRAY grade2 [*] &gr2_var;
if sum(of grade1(*))^=. then output grade1;
if sum(of grade2(*))^=. then output grade2;
run;
By the way, if you wrap this in a macro, you don't need to write out the array definition and the IF...THEN statement for every grade.
And no, I don't think you can put a DO loop inside the condition of an IF statement the way your code does.
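To make the macro remark concrete, here is a minimal sketch under a few assumptions: the macro name split_grades, its n_grades parameter, and the array names g1_vars, g2_vars are all made up for illustration, and the variable lists are the ones from the question. It also swaps sum() for n(), which counts the non-missing numeric arguments, so the test reads "at least one value present".

```sas
/* One list of grade-specific variables per grade (from the question) */
%let gr1_var = sc_132 sc_139 sc_142;
%let gr2_var = sc_143 sc_151;

/* Hypothetical macro: generates one ARRAY and one IF per grade */
%macro split_grades(n_grades=2);
    %local g;
    data %do g = 1 %to &n_grades; grade&g %end; ;
        set gradeall;
        %do g = 1 %to &n_grades;
            /* &&gr&g._var resolves to &gr1_var, &gr2_var, ... */
            array g&g._vars [*] &&gr&g._var;
            /* n() counts non-missing values; keep the row if any exist */
            if n(of g&g._vars[*]) > 0 then output grade&g;
        %end;
    run;
%mend split_grades;

%split_grades(n_grades=2)
```

Adding a grade then only means defining another gr<N>_var list and raising n_grades. The array names must not clash with variables that already exist in gradeall.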
I have code similar to this:
data input;
input yr $ lob $ type $
allow_P25 los_P25 adm_P25;
cards;
2019 Com AMB 205.4 3.56 3444
2019 Med DME 34.4 1.11 533
;
run;
data results;
length perc_type $15 perc_value 8;
set input;
array change _numeric_;
do over change;
perc_type = vname(change);
perc_value = change;
output results;
end;
run;
This code creates an array of all numeric variables. However, I now need an array of only the variables whose names end in P25.
Is there a way to do it using wildcards? I found some solutions on the internet that used wildcards, but the wildcard always seemed to be at the end of the variable name. What if I want to use the wildcard at the beginning of the variable name? I tried this (which obviously doesn't work):
data results;
length perc_type $15 perc_value 8;
set input;
array change :P25;
do over change;
perc_type = vname(change);
perc_value = change;
output results;
end;
run;
As a workaround, you can build the array from _numeric_, test the end of each variable name with prxmatch(perl_regex, string), and act only when a match is found.
Code
data results;
length perc_type $15 perc_value 8;
set input;
array change _numeric_;
do over change;
/* match vnames ending with P25 */
if (prxmatch("/P25$/", vname(change)) > 0) then do;
/* do whatever you want */
perc_type = vname(change);
perc_value = change;
output results;
end;
end;
run;
Output
| Obs | perc_type | perc_value | yr | lob | type | allow_P25 | los_P25 | adm_P25 |
|-----|----------:|------------|------|-----|-----:|----------:|--------:|---------|
| 1 | allow_P25 | 205.40 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 2 | los_P25 | 3.56 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 3 | adm_P25 | 3444.00 | 2019 | Com | AMB | 205.4 | 3.56 | 3444 |
| 4 | allow_P25 | 34.40 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
| 5 | los_P25 | 1.11 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
| 6 | adm_P25 | 533.00 | 2019 | Med | DME | 34.4 | 1.11 | 533 |
Notes
A SAS array does not work like arrays in other languages: it is just a reference to a group of variables. Since there is no built-in suffix wildcard, start from the built-in _numeric_ group and filter on the variable names afterwards.
See also the official SAS documentation on regular expressions.
There is no direct SAS syntax that allows this, though macros have been written to deal with problems like this one.
See a few at Roger's Github here.
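Another way to get a real array of only the P25 variables, sketched here as an alternative rather than as what either answer proposes, is to build the variable list from dictionary.columns and hand it to the ARRAY statement. The macro variable name p25_vars and the loop index i are assumptions for illustration; the dataset is assumed to be WORK.INPUT as in the question.

```sas
/* Collect the names of numeric variables ending in P25 */
proc sql noprint;
    select name
        into :p25_vars separated by ' '
        from dictionary.columns
        where libname = 'WORK'
          and memname = 'INPUT'
          and type = 'num'
          /* \s* tolerates trailing blanks in the name column */
          and prxmatch('/P25\s*$/i', name) > 0;
quit;

data results;
    length perc_type $15 perc_value 8;
    set input;
    /* The array now holds only the matching variables */
    array change [*] &p25_vars;
    do i = 1 to dim(change);
        perc_type = vname(change[i]);
        perc_value = change[i];
        output results;
    end;
    drop i;
run;
```

If no variable matches, p25_vars is never created and the ARRAY statement fails, so a check for that case may be worth adding.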
I have a table with a number of variables such as:
+-----------+------------+---------+-----------+--------+
| DateFrom | DateTo | Price | Discount | Cost |
+-----------+------------+---------+-----------+--------+
| 01jan17 | 01jul17 | 17 | 4 | 5 |
| 01aug17 | 01feb18 | 15 | 1 | 3 |
| 01mar18 | 01dec18 | 12 | 2 | 1 |
| ... | ... | ... | ... | ... |
+-----------+------------+---------+-----------+--------+
However I want to split this so I have:
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| DateFrom1 | DateTo1 | Price1 | Discount1 | Cost1 | DateFrom2 | DateTo2 | Price2 | Discount2 | Cost2 ... |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| 01jan17 | 01jul17 | 17 | 4 | 5 | 01aug17 | 01feb18 | 15 | 1 | 3 |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
There's a cool (not at all obvious) solution using proc summary and the idgroup option of its output statement that takes only a few lines of code. It runs in memory, so you are likely to run into problems if the dataset is large; otherwise it works very well.
Note that the 3 in out[3] is the number of rows in the source data. You could easily make this dynamic by adding a prior step that calculates the number of rows and stores it in a macro variable.
/* create initial dataset */
data have;
input (DateFrom DateTo) (:date7.) Price Discount Cost;
format DateFrom DateTo date7.;
datalines;
01jan17 01jul17 17 4 5
01aug17 01feb18 15 1 3
01mar18 01dec18 12 2 1
;
run;
/* transform data into 1 row */
proc summary data=have nway;
output out=want (drop=_:)
idgroup(out[3] (_all_)=) / autoname;
run;
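The dynamic variant mentioned above could look roughly like this. It is a sketch that assumes the dataset have from the example; the macro variable name nrows is made up. As far as I recall, the documentation limits n in out[n] to 100, so this only suits small tables.

```sas
/* Count the rows and store the count in a macro variable */
proc sql noprint;
    select count(*)
        into :nrows trimmed
        from have;
quit;

/* Same transformation, with the row count filled in */
proc summary data=have nway;
    output out=want (drop=_:)
        idgroup(out[&nrows] (_all_)=) / autoname;
run;
```

The trimmed keyword strips the leading blanks that into : would otherwise leave in the value.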
I have a table like this
MeterSizeGroup | WrenchTime | DriveTime
1,2,3          | 7.843      | 5.099
I want to separate the comma delimited string into three rows as
MeterSizeGroup | WrenchTime | DriveTime
1              | 2.614      | 1.699
2              | 2.614      | 1.699
3              | 2.614      | 1.699
Please help me write a query for this kind of split. WrenchTime and DriveTime also have to be divided by 3, the number of values in the group.
I was wondering if you can help me with the following problem in spss syntax.
My dataset has nested structure.
Data are nested in companies. Each company has 1 or 2 bosses, but in this case I care only about boss 1. At an earlier point in time the boss graded some (not all) of the workers. Currently, each worker's ID and grade sit on that worker's own row.
I would like to take the information obtained during the workers' assessment and create a new set of variables (worker ID and grade) for each graded worker on the row of the boss.
+---------+------+----------+----------------+-----------+----------+---------+----------+
| company | boss | workerID | worker's grade | N:workID1 | N:grade1 | N:work2 | N:grade2 |
+---------+------+----------+----------------+-----------+----------+---------+----------+
| A       | 1    | 1        |                | 3         | A        | 4       | A        |
| A       | 2    | 2        |                |           |          |         |          |
| A       | 0    | 3        | A              |           |          |         |          |
| A       | 0    | 4        | A              |           |          |         |          |
| A       | 0    | 5        |                |           |          |         |          |
| B       | 1    | 1        |                | 3         | B        | 4       | A        |
| B       | 0    | 2        |                |           |          |         |          |
| B       | 0    | 3        | B              |           |          |         |          |
| B       | 0    | 4        | A              |           |          |         |          |
| C       | 1    | 1        |                | 2         | D        | -1      | -1       |
| C       | 0    | 2        | D              |           |          |         |          |
+---------+------+----------+----------------+-----------+----------+---------+----------+
I would like to move each worker's ID and grade to the row of the boss in the NEW variables, without losing the existing workerID and worker's grade variables.
Basically, I need to feed the information forward into the new variables on the row where boss EQ 1, separately for each company.
I have no idea how to proceed with this. I assume that I need a loop that creates a new variable for each worker ID that has a valid grade and then feeds the information forward from the worker's row to the boss's newly generated variables.
Any suggestions are very welcome :-)
Take a look at CASESTOVARS (Data > Restructure), which restructures rows into variables; VARSTOCASES does the reverse.
I am having a logic issue in relation to querying an SQL database. I need to exclude 3 different categories and any item that is included in those categories; however, if an item under one of those categories meets the criteria for another category I need to keep said item.
This is an example output I will get after querying the database at its current version:
ExampleDB | item_num | pro_type | area | description
1 | 45KX-76Y | FLCM | Finished | coil8x
2 | 68WO-93H | FLCL | Similar | y45Kx
3 | 05RH-27N | FLDR | Finished | KH72n
4 | 84OH-95W | FLEP | Final | tar5x
5 | 81RS-67F | FLEP | Final | tar7x
6 | 48YU-40Q | FLCM | Final | bile6
7 | 19VB-89S | FLDR | Warranty | exp380
8 | 76CS-01U | FLCL | Gator | low5
9 | 28OC-08Z | FLCM | Redo | coil34Y
item_num and description are in a table together, and pro_type and area are in 2 separate tables--a total of 3 tables to pull data from.
I need to construct a query that does not return any item_num whose area is Finished, Final, or Redo, but that still returns any item_num whose pro_type is FLCM or FLEP. In the end my query result should look like this:
ExampleDB | item_num | pro_type | area | description
1 | 45KX-76Y | FLCM | Finished | coil8x
2 | 68WO-93H | FLCL | Similar | y45Kx
3 | 84OH-95W | FLEP | Final | tar5x
4 | 81RS-67F | FLEP | Final | tar7x
5 | 19VB-89S | FLDR | Warranty | exp380
6 | 76CS-01U | FLCL | Gator | low5
7 | 28OC-08Z | FLCM | Redo | coil34Y
Try this:
select *
from table
join...
where area not in ('Finished', 'Final', 'Redo') or pro_type in ('FLCM', 'FLEP')
Are you looking for something like
SELECT *
FROM Table_1
JOIN Table_ProType ON Table_1.whatnot = Table_ProType.whatnot
JOIN Table_Area ON Table_1.whatnot = Table_Area.whatnot
WHERE Table_Area.area NOT IN ('Finished','Final','Redo') OR Table_ProType.pro_type IN ('FLCM','FLEP')
Giving the names of the three tables and the joining criteria will help me improve the answer.