Trouble with reading datasets with read_array in proc fcmp - arrays

I'm trying to read dataset that has 4030 observations and 23 variables. I'm doing that in proc fcmp, using read_array (...) statement.
Most of the variables have character type, but when I'm trying to read the code:
proc fcmp;
array a[&Numobs., &Nvar.] / NOSYMBOLS ;
rcl = read_array ("input", a);
res = write_array ('output', a);
quit;
I get error for every character variable:
ERROR: Column "Variable2" in data set "WORK.input" is not numeric in
function READ_ARRAY.
Does read_arrray work only for numeric variables? What am I doing wrong?
(the rest of my code is simple, and I'm sure it's correct).
I am using SAS Enterprise Guide 4.3.

In SAS all variables in an array must be of the same data type. Your Variable1 is probably numeric, Variable2 is character.

Read_array and write_array are numeric only. By default you're reading in all columns, but you can specify which columns you're interested in using quoted strings.

Related

SAS Help: Using Index function to compare 2 columns

I want to compare string value of A and B by using the index function. I want to check if A contains B in its column. The only way I know how to do it is Index but the problem is index doesn't allow column name in its parameters. You have to enter a string value.
Tried this: index(Address, HouseNumber)>0 but it doesn't work.
Example:
Address HouseNumber
123 Road Road
So I want to see if Address column contains House number value in its field. It wont be a direct match but just want to check if A contains the string. I think using a macro variable or array is the solution but I do not know how to do it.
You need to account for the padding that SAS does since all variables are fixed length.
data have ;
length Address HouseNumber $50;
infile cards dsd dlm='|';
input address housenumber ;
cards;
123 Road|Road
;;;;
data want ;
set have ;
if index(address,strip(HouseNumber));
run;
This works - is it what you're trying to do??
data _null_;
a = '52 Festive Rd';
b = 'Festive';
if index(a,b) then put 'yes';
else put 'no';
run;

SAS / Using an array to convert multiple character variables to numeric

I am a SAS novice. I am trying to convert character variables to numeric. The code below works for one variable, but I need to convert more than 50 variables, hopefully simultaneously. Would an array solve this problem? If so, how would I write the syntax?
DATA conversion_subset;
SET have;
new_var = input(oldvar,4.);
drop oldvar;
rename newvar=oldvar;
RUN;
#Reeza
DATA conversion_subset;
SET have;
Array old_var(*) $ a_20040102--a_20040303 a_302000--a_302202;
* The first list contains 8 variables. The second list contains 7 variables;
Array new_var(15) var1-var15;
Do i=1 to dim(old_var);
new_var(i) = input(old_var(i),4.);
End;
*drop a_20040102--a_20040303 a_302000--a_302202;
*rename var1-var15 = a_20040102--a_20040303 a_302000--a_302202;
RUN;
NOTE: Invalid argument to function INPUT at line 64 column 19
(new_var(i) = input(old_var(i),4.)
#Reeza
I am still stuck on this array. Your help would be greatly appreciated. My code:
DATA conversion_subset (DROP= _20040101 _20040201 _20040301);
SET replace_nulls;
Array _char(*) $ _200100--_601600;
Array _num(*) var1-var90;
Do i=1 to dim(_char);
_num(i) = input(_char(i),4.);
End;
RUN;
I am receiving the following error: ERROR: Array subscript out of range at line 64 column 6. Line 64 contains the input statement.
Yes, an array solves this issue. You will want a simple way to list the variables so look into SAS variable lists as well. For example if your converting all character variables between first and last you could list them as first_var-character-last_var.
The rename/drop are illustrated in other questions across SO.
DATA conversion_subset;
SET have;
Array old_var(50) $ first-character-last;
Array new_var(50) var1-var50;
Do i=1 to 50;
new_var(i) = input(oldvar(i),4.);
End;
RUN;
As #Parfait suggests, it would be best to adjust it when you are getting it, rather than after it is already in a SAS data set. However, if you're given the data set and have to convert that, that's what you have to do. You can add a WHERE clause to the PROC SQL to exclude variables that should not be converted. If you do so, they won't be in the final data set unless you add them in the CREATE TABLE's SELECT clause.
PROC CONTENTS DATA=have OUT=havelist NOPRINT ;
RUN ; %* get variable names ;
PROC SQL ;
SELECT 'INPUT(' || name || ',4.) AS ' || name
INTO :convert SEPARATED BY ','
FROM havelist
; %* create the select statement ;
CREATE TABLE conversion_subset AS
SELECT &convert
FROM have
;
QUIT ;
If excluding variables is an issue and/or you want to use a DATA step, then use the PROC CONTENTS above and follow with:
PROC SQL ;
SELECT COMPRESS(name || '_n=INPUT(' || name || ',4.)'),
COMPRESS(name || '_n=' || name),
COMPRESS(name)
INTO :convertlst SEPARATED BY ';',
:renamelst SEPARATED BY ' ',
:droplst SEPARATED BY ' '
FROM havelist
;
QUIT ;
DATA conversion_subset (RENAME=(&renamelst)) ;
SET have ;
&convertlst ;
DROP &droplst ;
RUN ;
Again, add a where clause to exclude variables that should not be converted. This will automatically preserve any variables that you exclude from conversion with a WHERE in the PROC SQL SELECT.
If you have too many variables, or their names are very long, or adding _n to the end causes a name collision, things can go badly (too much data for a macro variable, illegal field name, one field overwriting another, respectively).

SAS how to create variable with corresponding values efficiently

I am trying to complete the following.
Variable Letter has three values (a, b, c). I would like to create a variable Letter_2 with values corresponding to the values of Letter, namely (1, 2, 3).
I know I can do this using three IF Then statements.
if Letter='a' then Letter_2='1';
if Letter='b' then Letter_2='2';
if Letter='c' then Letter_2='3';
Suppose I have 15 values for the variable Letter, and 15 corresponding values for the replacement. Is there a way to do it efficiently without typing the same If Then statement 15 times?
I am new to SAS. Any clue will be appreciated.
Lisa
Looks like an application for a FORMAT.
First define the format.
proc format ;
value $lookup 'a'='1' 'b'='2' 'c'='3' ;
run;
Then use it to re-code your variable.
data want;
set have;
letter2 = put(letter,$lookup.);
run;
Or perhaps you could use two temporary arrays and the WHICHC() function?
data have;
input letter $10. ;
cards;
car
apple
box
;;;;
data want ;
set have ;
array from (3) $10 _temporary_ ('apple','box','car');
array to (3) $10 _temporary_ ('first','second','third');
if whichc(letter,of from(*)) then
letter_2 = to(whichc(letter,of from(*)))
;
run;

SAS Put Certain Row as New Variable Names After Manipulation

After importing my CSV data with GETNAMES=NO, I have 59 columns with variable names VAR1, VAR2, . . . VAR59. My first row contains the names I need for the new variables, but they first needed manipulated by removing special characters and turning spaces into underscores since SAS doesn't like spaces in variable names. This is the array I used for that piece:
DATA DATA1; SET DATA (FIRSTOBS=7);
ARRAY VAR(59) VAR1-VAR59;
IF _N_ = 1 THEN DO;
DO I = 1 TO 59;
VAR[I] = COMPRESS(TRANSLATE(TRIM(VAR[I]),'_',' '),'?()');
PUT VAR[I]=;
END;
END;
DROP I;
RUN;
This worked perfectly, but now I need to get this first row up to the new variable names. I tried a similar array to perform this:
DATA DATA2; SET DATA1;
ARRAY V(59) VAR1-VAR59;
DO I = 1 TO 59;
IF _N_ = 1 AND V[I] NE "" THEN CALL SYMPUT("NEWNAME",V[I]);
RENAME VAR[I] = &NEWNAME;
END;
DROP I;
RUN;
This only puts the name of VAR59 since there is no [i] connected to the &NEWNAME, and it still isn't working quite right. Any suggestions to moving a row up to variable names AFTER manipulation?
Your primary problem is you are trying to use a macro variable in the data step it's created in. You can't. You're also trying to create rename statements in the data step; rename, as with other similar statements (keep, drop), must be defined before the data step is compiled.
You need to write code somewhere - either in a text file, a macro variable, whatever - with this information. For example:
filename renamef temp;
data _null_;
set myfile (obs=1);
file renamef;
array var[59];
do _i = 1 to dim(Var);
[your code to clean it out];
strput = cat("rename",vname(var[_i]),'=',var[_i],';');
put strput;
end;
run;
data want;
set myfile (firstobs=2);
%include renamef;
run;
There are lots of other examples to this on the site and on the web, "list processing" is the term for this.
Joe -- using your suggestions and another one of your posts, the following worked flawlessly:
Put the row of needed variables into long format (in my case, first row so n = 1)
DATA NEWVARS; SET DATA;
IF _N_ = 1 THEN OUTPUT NEWVARS;
RUN;
PROC TRANSPOSE DATA = NEWVARS OUT=NEWVARS1;
VAR _ALL_;
RUN;
Create a list of rename macro calls.
PROC SQL;
SELECT CATS('%RENAME(VAR=',_NAME_,',NEWVAR=',COL1,')')
INTO :RENAMELIST SEPARATED BY ' '
FROM NEWVARS1;
QUIT;
%MACRO RENAME(VAR=,NEWVAR=);
RENAME &VAR.=&NEWVAR.;
%MEND RENAME;
Call in the list created in Step 2 to rename all variables.
PROC DATASETS LIB=WORK NOLIST;
MODIFY DATA;
&RENAMELIST.;
QUIT;
I had to perform a few additional checks making sure that the variable names were not greater than 32 characters, and this was easy to check for when the data was in long format after transposing. If there are certain words that make the lengths too long, a TRANWRD statement can easily replace them with abbreviations.

SAS use value from one observation to overwrite different one

I have a data set with two main variables of interest now - Major and Major_Code. These should match up 1 to 1 but there are some errors I need to fix and what I've found is that for 14 Major_Code values, there are two different Majors. This is only due to a change in spelling or punctuation, like "ed." and "education". They are supposed to have the same value here but don't.
So I have a table with 7 pairs. Each pair has the same Major_Code and different a Major. How can I select one of the Major vales to use for each code? My only idea was through an if-then statement but that seems horribly inefficient.
I found the doubled values like this:
proc freq data=majorslist;
tables Major_Code/out=majorcodedups;
run;
proc print data=majorcodedups;
where COUNT > 1;
run;
So I can easily find these observations but can't extract certain values to overwrite onto another observation. I've looked into arrays, macros, sql and transpose but it's all a bit over my head right now.
Logically it would work like this:
from obs i to n, find value for variable x at obs i, output value onto variable y at obs i, go to obs(i+1) and repeat.
Assuming you have some rule for determining which MAJOR is correct for a MAJOR_CODE, you should do this:
This assumes majorslist is a dataset of every major/major_code pair whether unique or not - but only one per major/major_code pair.
proc sort data=majorslist;
by major_code major;
run;
data majorslist_unique;
set majorslist;
by major_code major;
if first.major_code and last.major_code then output;
else do;
*rule to determine whether to output it or not;
end;
run;
So, you now have the major_code/major relationship. Let's say you picked if first.major_code then output; as your rule (ie, take the major_code with the alphabetically first major value).
Now, you need to apply this to your larger dataset. There are a lot of ways to do that - merge this on is one, format is another, for starters. Format works like this:
Create a dataset with FMTNAME, START, LABEL defined. For each value of MAJOR_CODE, construct one row like that, where START is MAJOR_CODE and LABEL is MAJOR. We'll also add an extra line that says what to do with non-matches (in case you get new values of major_code).
data for_fmt;
set majorslist_unique;
fmtname='MAJORF'; *add a $ if MAJOR_CODE is a character variable;
start=major_code;
label=major;
output;
if _n_=1 then do;
hlo='o';
call missing(start);
label='NONMATCHED';
output;
end;
keep fmtname start label hlo;
run;
proc format cntlin=for_fmt;
quit;
Now you have a format, MAJORF. (or $MAJORF. if MAJOR_CODE is character), that you can use in a PUT statement.
data my_bigdata2;
set my_bigdata;
major = put(major_code,MAJORF.);
run;

Resources