SAS: Using an array to transpose a table - arrays

I'm attempting to recreate some code which does the opposite of what is on pages 9-10 of http://support.sas.com/resources/papers/proceedings10/158-2010.pdf. So instead of transposing a table from wide to long, I'd like to transpose it from long to wide.
Id Col1 Col2
1 Val1 A
1 Val2 B
2 Val1 C
2 Val3 D
3 Val2 E
Transposes to:
Id X_Val1 X_Val2 X_Val3
1 A B .
2 C . D
3 . . E
Any ideas on how I should go about this? I know I should be using an array and creating a new column such as X_Val1, whose name is built like cat('X', Val1), where X is some string.

You need to first figure out how many variables you need. Then you can create the variables, use an array, and assign the values.
data test;
input id col1 $ col2 $;
datalines;
1 Val1 A
1 Val2 B
2 Val3 C
2 Val4 D
2 Val5 E
;
run;
/*Need to get the number of variables that need to be created*/
proc sql noprint;
select max(c)
into :arr_size
from
( select ID, count(*) as c
from test
group by id
);
quit;
/*Get rid of leading spaces*/
%let arr_size=%left(&arr_size);
%put &arr_size;
data test_t;
set test;
by id;
/*Create the variables*/
length SOME_X1 - SOME_X&arr_size $8;
/*Create an array*/
array SOME_X[&arr_size];
/*Retain the values*/
retain count SOME_X:;
if first.id then do;
count = 0;
do i=1 to &arr_size;
SOME_X[i] = "";
end;
end;
count = count + 1;
SOME_X[count] = col2;
if last.id then
output;
keep id SOME_X:;
run;
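The answer above fills SOME_X1, SOME_X2, ... by position within each Id. If the new columns should instead be keyed by the Col1 value, as in the question's X_Val1-X_Val3, a minimal sketch is below. It assumes the set of Col1 values (Val1-Val3) is known in advance, so the column names X_Val1-X_Val3 are hardcoded; the data set names HAVE and WANT are placeholders.

```sas
/* Question's sample data in long format */
data have;
  input Id Col1 $ Col2 $;
  datalines;
1 Val1 A
1 Val2 B
2 Val1 C
2 Val3 D
3 Val2 E
;
run;

data want;
  set have;
  by Id;
  /* Assumption: the possible Col1 values are exactly Val1-Val3 */
  length X_Val1-X_Val3 $8;
  array x[*] X_Val1-X_Val3;
  retain X_Val1-X_Val3;
  /* Start each Id with an empty row */
  if first.Id then call missing(of x[*]);
  /* VNAME returns e.g. X_Val1; characters 3 onward are compared with Col1 */
  do j = 1 to dim(x);
    if upcase(substr(vname(x[j]), 3)) = upcase(Col1) then x[j] = Col2;
  end;
  /* Write one row per Id once all its records have been read */
  if last.Id then output;
  keep Id X_Val1-X_Val3;
run;
```

If the Col1 values are not known ahead of time, PROC TRANSPOSE with an ID statement discovers them for you.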

I have no idea why you would want to do this with anything other than PROC TRANSPOSE.
proc transpose data=have out=want prefix=X_;
by id;
id col1;
var col2;
run;
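A self-contained run of the PROC TRANSPOSE answer against the question's sample data is sketched below. HAVE and WANT are placeholder names, the data must be sorted by Id for the BY statement (the sample already is), and _NAME_, which PROC TRANSPOSE adds to record the source variable (here Col2), is dropped so the layout matches the desired output.

```sas
/* Question's sample data in long format, already sorted by Id */
data have;
  input Id Col1 $ Col2 $;
  datalines;
1 Val1 A
1 Val2 B
2 Val1 C
2 Val3 D
3 Val2 E
;
run;

/* Each distinct Col1 value becomes a column named X_<value> */
proc transpose data=have out=want(drop=_name_) prefix=X_;
  by Id;
  id Col1;
  var Col2;
run;
```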

Related

SAS proc freq of multiple tables into a single one

I have the following dataset in SAS:
City grade1 grade2 grade3
NY A A A
CA B A C
CO A B B
I would like to "combine" the three grade variables and get a PROC FREQ that tells me the number of each grade for each city; the expected output should therefore be:
City A B C
NY 3 0 0
CA 1 1 1
CO 1 2 0
How could I do that in SAS?
It takes quite a few steps, but it gives the expected result.
*-- Creating sample data --*;
data have;
infile datalines delimiter="|";
input City $ grade1 $ grade2 $ grade3 $;
datalines;
NY|A|A|A
CA|B|A|C
CO|A|B|B
;
*-- Sorting in order to use the transpose procedure --*;
proc sort data=have; by city; run;
*-- Transposing from wide to tall format --*;
proc transpose data=have out=stage1(rename=(col1=grade) drop= _name_);
by city;
var grade:;
run;
*-- Assigning a value to 1 for each record for later sum --*;
data stage2;
set stage1;
val = 1;
run;
*-- Tabulate to create val_sum --*;
ods exclude all; *turn off default tabulate print;
proc tabulate data=stage2 out=stage3;
class city grade;
var val;
table city,grade*sum=''*val='';
run;
ods select all; *turn on;
*-- Transpose back using val_sum --*;
proc transpose data=stage3 out=stage4(drop=_name_);
by city;
id grade;
var val_sum;
run;
*-- Replace missing values by 0 to achieve desired output --*;
proc stdize data=stage4 out=want reponly missing=0;run;
City A B C
CA 1 1 1
CO 1 2 0
NY 3 0 0
In general:
Transpose data to a long format
Use PROC FREQ with the SPARSE option to generate the counts
Save the output from PROC FREQ to a data set
Transpose the output from PROC FREQ to the desired output format
*create sample data;
data have;
input City $ grade1 $ grade2 $ grade3 $;
cards;
NY A A A
CA B A C
CO A B B
;;;;
*sort;
proc sort data=have; by City;run;
*transpose to long format;
proc transpose data=have out=want1 prefix=Grade;
by City;
var grade1-grade3;
run;
*displayed output and counts;
proc freq data=want1;
table City*Grade1 / sparse out=freq norow nopercent nocol;
run;
*output table in desired format;
proc transpose data=freq out=want2;
by city;
id Grade1;
var count;
run;
Here is a way to do it in two steps: a sort step and a data step.
proc sort data=have; by city; run;
data count (drop=grade1-grade3 i);
set have;
by city;
* create an array of all your grades;
array grade(3) $ grade1-grade3;
*set the count to zero for each city;
if first.city then do;
A = 0;
B = 0;
C = 0;
end;
* use a do loop to count the grades;
do i = 1 to 3;
if grade(i) = 'A' then A + 1;
else if grade(i) = 'B' then B + 1;
else if grade(i) = 'C' then C + 1;
end;
* output one row per city, after its last record;
if last.city then output;
run;
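The IF/ELSE chain in the data step needs one line per grade. If the grades are single uppercase letters, a second array of counters indexed by letter position keeps the loop the same size however many grades there are. This is a sketch, not the answerer's code: the counter array CNT and the use of RANK() to map 'A' to 1, 'B' to 2 and 'C' to 3 are assumptions, and RANK relies on the letters being consecutive in the ASCII collating sequence.

```sas
data have;
  input City $ grade1 $ grade2 $ grade3 $;
  datalines;
NY A A A
CA B A C
CO A B B
;
run;

proc sort data=have; by City; run;

data counts;
  set have;
  by City;
  array grade[3] $ grade1-grade3;
  /* Hypothetical counters, one per possible grade, in letter order */
  array cnt[3] A B C;
  retain A B C;
  /* Reset the counters at the start of each city */
  if first.City then do k = 1 to dim(cnt);
    cnt[k] = 0;
  end;
  do i = 1 to dim(grade);
    /* RANK returns the ASCII code, so 'A' maps to 1, 'B' to 2, 'C' to 3 */
    k = rank(grade[i]) - rank('A') + 1;
    /* Ignore blanks and letters outside A-C */
    if 1 <= k <= dim(cnt) then cnt[k] = cnt[k] + 1;
  end;
  if last.City then output;
  keep City A B C;
run;
```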

Creating loop for proc freq in SAS

I have the following data
DATA HAVE;
input yr_2001 yr_2002 yr_2003 area;
cards;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;
I want to do the following proc freq for variable yr_2001 to yr_2003.
proc freq data=have;
table yr_2001*area;
where yr_2001=1;
run;
Is there a way I can do it without having to repeat it for each year, maybe using a loop for PROC FREQ?
Two ways:
1. Transpose it
Add a counter variable, n, to your data and transpose it by n and area, then keep only the rows where the year flag is equal to 1. Because we create an index on year in the transposed output, we do not need to re-sort it before doing BY-group processing.
data have2;
set have;
n = _N_;
run;
proc transpose data=have2
name=year
out=have2_tpose(rename = (COL1 = year_flag)
where = (year_flag = 1)
index = (year)
drop = n
);
by n area;
var yr_:;
run;
proc freq data=have2_tpose;
by year;
table area;
run;
2. Macro loop
Since they all start with yr_, it will be easy to get all the variable names from dictionary.columns and loop over all the variables. We'll use SQL to read the names into a |-separated list and loop over that list.
proc sql noprint;
select name
, count(*)
into :varnames separated by '|'
, :nVarnames
from dictionary.columns
where memname = 'HAVE'
AND libname = 'WORK'
AND name LIKE "yr_%"
;
quit;
/* Take a look at the variable names we found */
%put &varnames.;
/* Loop over all words in &varnames */
%macro freqLoop;
%do i = 1 %to &nVarnames.;
%let varname = %scan(&varnames., &i., |);
title "&varname.";
proc freq data=have;
where &varname. = 1;
table &varname.*area;
run;
title;
%end;
%mend;
%freqLoop;
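A macro is not strictly required. As an alternative technique (not from either answer), CALL EXECUTE can generate one PROC FREQ step per yr_ variable from a DATA _NULL_ step; the generated steps run after that DATA step finishes. This sketch assumes, as in the question, that every flag variable starts with yr_ and no other variable does.

```sas
data have;
  input yr_2001 yr_2002 yr_2003 area;
  cards;
1 1 1 3
0 1 0 4
0 0 1 3
1 0 1 6
0 0 1 4
;
run;

data _null_;
  set have(obs=1);
  /* Name prefix list: every variable whose name starts with yr_ */
  array yrs[*] yr_:;
  length code $200;
  do i = 1 to dim(yrs);
    /* Builds e.g.: proc freq data=have; where yr_2001 = 1; table yr_2001*area; run; */
    code = catx(' ', 'proc freq data=have; where', vname(yrs[i]), '= 1;',
                'table', cats(vname(yrs[i]), '*area;'), 'run;');
    call execute(code);
  end;
run;
```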

Lookup table using hash on multiple (>50) columns

I am working with a table with more than 50 columns. I am trying to replace the value of multiple columns using a lookup table.
Table:
data have;
infile datalines delimiter="," dsd truncover;
input ID :$1. SUB_ID :$2. COUNTRY :$2. A :$1. B :$1.;
datalines;
1,A,FR,A,B
2,B,CH,,B
3,C,DE,B,A
4,D,CZ,,B
5,E,GE,A,
6,F,EN,B,
7,G,US,,A
;
run;
Lookup table:
data lookup;
infile datalines delimiter=",";
input value_before $1. value_after :$2.;
datalines;
A,1
B,2
C,3
;
run;
Actual code:
data want;
if 0 then set lookup;
if _n_ = 1 then do;
declare hash lookup(dataset:'lookup');
lookup.defineKey('value_before');
lookup.defineData('value_after');
lookup.defineDone();
end;
set have;
if (lookup.find(key:A) = 0) then
A = value_after;
if (lookup.find(key:B) = 0) then
B = value_after;
/* ... */
/* if (lookup.find(key:Z) = 0) then
Z = value_after; */
drop value_before value_after;
run;
I guess this code would do the job if I hardcoded all 50 columns.
I wonder if there is a way to "apply" the hash.find() to all variables except the first three (ID, SUB_ID and COUNTRY) (maybe by indexing?) without having to hardcode them or use macros. For the sake of the example I only included 2 variables whose values are replaced (A and B), but there are more than 50 (with very different names and no pattern like var1, var2, ..., varn).
In cases like this, I like to use proc sql and the dictionary table to fill in the column names for me to create an array. The below code will pull the variable names from dictionary.columns and save them as space-delimited into the macro variable varnames. We can feed this into an array and then use array logic to do the rest.
proc sql noprint;
select name
into :varnames separated by ' '
from dictionary.columns
where libname = 'WORK'
AND memname = 'HAVE'
AND name NOT IN('ID', 'SUB_ID', 'COUNTRY')
;
quit;
data want;
if 0 then set lookup;
if _n_ = 1 then do;
declare hash lookup(dataset:'lookup');
lookup.defineKey('value_before');
lookup.defineData('value_after');
lookup.defineDone();
end;
set have;
array vars[*] &varnames.;
do i = 1 to dim(vars);
if lookup.Find(key:vars[i])=0 then vars[i] = value_after;
end;
drop value_before value_after i;
run;
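The question also asked for a way to avoid macros. If the columns to recode sit next to each other in HAVE, a name range list such as A--B (every variable from A through B, in data set order) can define the array directly, with no dictionary query. The sketch below assumes that layout; A and B stand in for the first and last of the real 50+ columns. It also reads the sample data with DSD and TRUNCOVER so that empty fields such as ",," come in as missing values.

```sas
data have;
  infile datalines dsd truncover;
  input ID :$1. SUB_ID :$2. COUNTRY :$2. A :$1. B :$1.;
  datalines;
1,A,FR,A,B
2,B,CH,,B
3,C,DE,B,A
4,D,CZ,,B
5,E,GE,A,
6,F,EN,B,
7,G,US,,A
;
run;

data lookup;
  infile datalines dsd truncover;
  input value_before :$1. value_after :$2.;
  datalines;
A,1
B,2
C,3
;
run;

data want;
  /* Define value_before/value_after in the PDV for the hash object */
  if 0 then set lookup;
  if _n_ = 1 then do;
    declare hash lookup(dataset:'lookup');
    lookup.defineKey('value_before');
    lookup.defineData('value_after');
    lookup.defineDone();
  end;
  set have;
  /* A--B: all variables from A through B in PDV order (assumed contiguous) */
  array vars[*] A--B;
  do i = 1 to dim(vars);
    if lookup.find(key:vars[i]) = 0 then vars[i] = value_after;
  end;
  drop value_before value_after i;
run;
```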

Transpose a correlation matrix into one long vector in SAS

I'm trying to turn a correlation matrix into one long column vector such that I have the following structure
data want;
input _name1_$ _name2_$ _corr_;
datalines;
var1 var2 0.54
;
run;
I have the following code, which outputs name1 and corr; however, I'm struggling to get name2!
DATA TEMP_1
(DROP=I J);
ARRAY VAR[*] VAR1-VAR10;
DO I = 1 TO 10;
DO J = 1 TO 10;
VAR(J) = RANUNI(0);
END;
OUTPUT;
END;
RUN;
PROC CORR
DATA=TEMP_1
OUT=TEMP_CORR
(WHERE=(_NAME_ NE " ")
DROP=_TYPE_)
;
RUN;
PROC SORT DATA=TEMP_CORR; BY _NAME_; RUN;
PROC TRANSPOSE
DATA=TEMP_CORR
OUT=TEMP_CORR_T
;
BY _NAME_;
RUN;
Help is appreciated
You're close. The problem is the _NAME_ variable: PROC TRANSPOSE also creates a variable called _NAME_ to hold the names of the transposed variables. If you rename the input _NAME_ first, you get what you want. I also list the variables explicitly and add some RENAME= data set options to get what you likely want.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr))
;
by name1;
var var1-var10;
RUN;
Edit: If you don't want duplicates (each pair appears twice, once in each order), you can add a WHERE= option to the OUT= data set.
PROC TRANSPOSE
DATA=TEMP_CORR (rename=(_name_ = Name1))
OUT=TEMP_CORR_T (rename = (_name_ = Name2 col1=corr) where = (name1 > name2))
;
by name1;
var var1-var10;
RUN;
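You can do this with just an ARRAY and the VNAME() function. To output only the upper triangle (including the diagonal), set the lower bound of the DO loop to _N_. Define the array before the LENGTH statement; otherwise _NUMERIC_ also picks up the new numeric variable _corr_.
data want ;
set corr;
where _type_ = 'CORR';
/*Define the array before LENGTH so _numeric_ does not include _corr_*/
array x _numeric_;
length _name1_ _name2_ $32 _corr_ 8 ;
keep _name1_ _name2_ _corr_;
_name1_=_name_;
do i=_n_ to dim(x);
_name2_ = vname(x(i));
_corr_ = x(i);
output;
end;
run;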
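An end-to-end sketch of the VNAME() approach is below. It makes a few assumptions not in the answer: PROC CORR writes its matrix with OUTP=, the correlated variables are listed explicitly as var1-var10 instead of _NUMERIC_, and the loop starts at _N_ + 1 so the diagonal (each variable with itself) is skipped, leaving each of the 10 x 9 / 2 = 45 pairs exactly once. The random data and the seed 123 are only for illustration.

```sas
/* Illustrative random data: 20 rows of var1-var10 */
data temp_1;
  call streaminit(123);
  array var[10];
  do obs = 1 to 20;
    do j = 1 to 10;
      var[j] = rand('uniform');
    end;
    output;
  end;
  drop obs j;
run;

proc corr data=temp_1 outp=corr noprint;
  var var1-var10;
run;

data want;
  set corr;
  /* Keep only the correlation rows; _N_ then counts matrix rows 1..10 */
  where _type_ = 'CORR';
  array x[*] var1-var10;
  length _name1_ _name2_ $32;
  _name1_ = _name_;
  /* Start one column past the diagonal to get the strict upper triangle */
  do i = _n_ + 1 to dim(x);
    _name2_ = vname(x[i]);
    _corr_ = x[i];
    output;
  end;
  keep _name1_ _name2_ _corr_;
run;
```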

Filling missing values for many variables from previous observation by group in SAS

My dataset looks like this:
Date ID Var1 Var2 ... Var5
200701 1 x .
200702 1 . a
200703 1 . .
200701 2 . b
200702 2 y b
200703 2 y .
200702 3 z .
200703 3 . .
I want my results to look like this:
Date ID Var1 Var2 ... Var5
200701 1 x .
200702 1 x a
200703 1 x a
200701 2 . b
200702 2 y b
200703 2 y b
200702 3 z .
200703 3 z .
I tried the code below, but it didn't work. What's wrong with it?
Am I better off using array? If so, how?
%macro a(variable);
length _&variable $10.;
retain _&variable;
if first.ID then _&variable = '';
if &variable ne '' then _&variable=&variable;
else if &variable = '' then &variable=_&variable;
drop _&variable;
%mend;
data want;
set have;
%a(Var1)
%a(Var2)
%a(Var3)
%a(Var4)
%a(Var5)
run;
Appreciate the help! Thanks!
The UPDATE statement can do that. It is intended to apply transactions to a master data set: when a transaction value is missing, the current value from the master is left unchanged. You can use your single data set as both the master and the transaction data by adding the OBS=0 data set option to the master. Normally UPDATE outputs only one observation per BY group, but if you add an OUTPUT statement it outputs all of the observations.
data want;
update have(obs=0) have ;
by id;
output;
run;
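To check the UPDATE approach on the question's layout, here is a small self-contained version with two of the five variables (Var1 and Var2 stand in for Var1-Var5). HAVE must be sorted by ID, which the sample already is. Character missing values are entered as a single period, which list input with the $ informat reads as blank.

```sas
data have;
  input Date ID Var1 $ Var2 $;
  datalines;
200701 1 x .
200702 1 . a
200703 1 . .
200701 2 . b
200702 2 y b
200703 2 y .
200702 3 z .
200703 3 . .
;
run;

data want;
  /* OBS=0 gives an empty master, so every record of HAVE is a transaction */
  update have(obs=0) have;
  by ID;
  /* Write every transaction, not just the last one per ID */
  output;
run;
```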
The full code works! Thanks
%macro a(variable);
length _&variable $10.;
retain _&variable;
if first.ID then _&variable = '';
if &variable ne '' then _&variable=&variable;
else if &variable = '' then &variable=_&variable;
drop _&variable;
%mend;
data want;
update have(obs=0) have;
by id;
output;
%a(Var1)
%a(Var2)
%a(Var3)
%a(Var4)
%a(Var5)
run;
