Create a "key=value" pair file with a SAS DATA step

I have to create a file from a dataset in a JSON-like style, but without a line break (CR) between variables.
All variables have to be on the same line.
I would like something like this:
ID1 "key1"="value1" "key2"="value2" .....
Each key is a column of the dataset.
I work with SAS 9.3 on UNIX.
Sample:
I have
ID Name Sex Age
123 jerome M 30
345 william M 26
456 ingrid F 25
I would like
123 "Name"="jerome" "sex"="M" "age"="30"
345 "Name"="william" "sex"="M" "age"="26"
456 "Name"="ingrid" "sex"="F" "age"="25"
Thanks

If your data looked like this...
Obs Name _NAME_ COL1
1 Alfred Name Alfred
2 Alfred Sex M
3 Alfred Age 14
4 Alfred Height 69
5 Alfred Weight 112.5
6 Alice Name Alice
7 Alice Sex F
8 Alice Age 13
9 Alice Height 56.5
10 Alice Weight 84
11 Barbara Name Barbara
12 Barbara Sex F
13 Barbara Age 13
14 Barbara Height 65.3
15 Barbara Weight 98
16 Carol Name Carol
17 Carol Sex F
18 Carol Age 14
19 Carol Height 62.8
20 Carol Weight 102.5
21 Henry Name Henry
22 Henry Sex M
23 Henry Age 14
24 Henry Height 63.5
25 Henry Weight 102.5
You could use code like the following to write the value pairs, assuming this is the layout you're talking about.
data _null_;
   /* DOW loop: read all rows for one NAME and write them on one line */
   do until(last.name);
      set class;
      by name;
      col1 = left(col1);
      if first.name then put name @;   /* trailing @ holds the output line */
      put _name_ :$quote. +(-1) '=' col1 :$quote. @;
   end;
   put;   /* release the held line */
run;
Alfred "Name"="Alfred" "Sex"="M" "Age"="14" "Height"="69" "Weight"="112.5"
Alice "Name"="Alice" "Sex"="F" "Age"="13" "Height"="56.5" "Weight"="84"
Barbara "Name"="Barbara" "Sex"="F" "Age"="13" "Height"="65.3" "Weight"="98"
Carol "Name"="Carol" "Sex"="F" "Age"="14" "Height"="62.8" "Weight"="102.5"
Henry "Name"="Henry" "Sex"="M" "Age"="14" "Height"="63.5" "Weight"="102.5"
NOTE: There were 25 observations read from the data set WORK.CLASS.
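If the data start in the wide layout from the question, PROC TRANSPOSE is one way to get the long layout assumed above. The sketch below assumes the question's sample data in a data set named HAVE, already in ID order; the fileref PAIRS is a hypothetical temporary file. Because NAME and SEX are character, PROC TRANSPOSE converts AGE to character as well, so the step left-aligns COL1 before writing, as the answer above does.

```sas
/* Sample data from the question, already in ID order */
data have;
   input ID Name $ Sex $ Age;
   datalines;
123 jerome M 30
345 william M 26
456 ingrid F 25
;

/* Long layout: one row per ID and variable; _NAME_ holds the variable name */
proc transpose data=have out=long;
   by id;
   var name sex age;
run;

filename pairs temp;   /* hypothetical fileref; point it at a real path as needed */

/* Same DOW loop as above, grouped by ID instead of NAME */
data _null_;
   file pairs;
   do until(last.id);
      set long;
      by id;
      col1 = left(col1);
      if first.id then put id @;
      put _name_ :$quote. +(-1) '=' col1 :$quote. @;
   end;
   put;
run;
```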

Consider these non-transposing variations:
For actual JSON, use PROC JSON (available in SAS 9.4 and later, so not in the 9.3 release mentioned in the question):
data have;
input ID Name $ Sex $ Age;
datalines;
123 jerome M 30
345 william M 26
456 ingrid F 25
run;
filename out temp;
proc json out=out;
export have;
run;
* What hath been wrought ?;
data _null_; infile out; input; put _infile_; run;
----- LOG -----
{"SASJSONExport":"1.0","SASTableData+HAVE":[{"ID":123,"Name":"jerome","Sex":"M","Age":30},{"ID":345,"Name":"william","Sex":"M","Age":26},{"ID":456,"Name":"ingrid","Sex":"F","Age":25}]}
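If the SASJSONExport and SASTableData wrapper tags are not wanted, PROC JSON has statement options for that. A sketch, assuming SAS 9.4 with the PRETTY and NOSASTAGS options on the PROC JSON statement (check the PROC JSON documentation for your release); it reuses HAVE and writes to OUTJ, a temporary fileref introduced here for illustration.

```sas
filename outj temp;   /* hypothetical fileref for this example */

/* PRETTY indents the output; NOSASTAGS drops the SAS wrapper objects */
proc json out=outj pretty nosastags;
   export have;
run;

/* Echo the generated JSON file to the log */
data _null_;
   infile outj;
   input;
   put _infile_;
run;
```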
For concise name=value output, use the PUT statement's (variable-list) (format-list) specification, with _ALL_ as the variable list and = in the format list to request named output.
filename out2 temp;
data _null_;
set have;
file out2;
put (_all_) (=);
run;
data _null_;
infile out2; input; put _infile_;
run;
----- LOG -----
ID=123 Name=jerome Sex=M Age=30
ID=345 Name=william Sex=M Age=26
ID=456 Name=ingrid Sex=F Age=25
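The layout in the question writes ID as a bare value. A variation of the same PUT syntax writes ID with plain list output and only the remaining variables as name=value pairs. This sketch assumes the HAVE data set above and OUT2B, a temporary fileref named here for illustration; values are not quoted in this form.

```sas
filename out2b temp;   /* hypothetical fileref for this variation */

data _null_;
   set have;
   file out2b;
   /* ID as plain list output, then NAME, SEX, AGE as name=value */
   put id (name sex age) (=);
run;
```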
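Iterate over the variables with the CALL VNEXT routine, extract each formatted value with the VVALUEX function, and conditionally construct the quoted name and value parts. The loop stops when VNEXT reaches _NAME_, the first variable defined after those read by the SET statement, so only the variables of HAVE are written; the first of them (ID) is written unquoted.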
filename out3 temp;
data _null_;
set have;
file out3;
length _name_ $34 _value_ $32000;
do _n_ = 1 by 1;
call vnext(_name_);
if _name_ = "_name_" then leave;
if _n_ = 1
then _value_ = strip(vvaluex(_name_));
else _value_ = quote(strip(vvaluex(_name_)));
_name_ = quote(trim(_name_));
if _n_ = 1
then put _value_ @;
else put _name_ +(-1) '=' _value_ @;
end;
put;
run;
data _null_;
infile out3; input; put _infile_;
run;
----- LOG -----
123 "Name"="jerome" "Sex"="M" "Age"="30"
345 "Name"="william" "Sex"="M" "Age"="26"
456 "Name"="ingrid" "Sex"="F" "Age"="25"

Related

SAS: Adding value after last entry in column

I have the following data set:
Student TestDay Score
001 1 85
001 6 76
001 7 89
002 1 92
002 5 82
002 7 93
I'd like to add a '100' value after the last non-empty value in the column 'Score', as well as add one to the value of TestDay. So the new data would look like the following:
Student TestDay Score
001 1 85
001 6 76
001 7 89
001 8 100
002 1 92
002 5 82
002 7 93
002 8 100
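No need for arrays or loops: the BY statement provides LAST.STUDENT, so you can write every record and then write one extra record when the last record of each student is reached.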
data want;
set have;
by student;
output;
if last.student then do;
score=100;
testday=testday+1;
output;
end;
run;
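A self-contained version of the answer with the question's sample data typed in. STUDENT is read as character here (an assumption) so that 001 keeps its leading zeros. The BY statement requires HAVE to be sorted or grouped by STUDENT, so a PROC SORT is included in case your real data are not.

```sas
data have;
   input Student $ TestDay Score;
   datalines;
001 1 85
001 6 76
001 7 89
002 1 92
002 5 82
002 7 93
;

/* BY processing needs the data sorted (or grouped) by STUDENT */
proc sort data=have;
   by student;
run;

data want;
   set have;
   by student;
   output;                      /* write every original record */
   if last.student then do;     /* then append one extra record per student */
      score = 100;
      testday = testday + 1;
      output;
   end;
run;
```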

Modifying final column value in SAS by group

I have the following data set:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
002 1 4
002 5 9
002 10 14
I would like to make the last 'TestDayEnd' the final value for 'TestDayStart' for each Student.
So the data should look like this:
Student TestDayStart TestDayEnd
001 1 5
001 6 10
001 11 15
001 15 15
002 1 4
002 5 9
002 10 14
002 14 14
I'm not quite sure how I can do this in SAS. Any insight would be appreciated.
After sorting the dataset you can do this within a data step.
proc sort data=have;
by student testdaystart testdayend;
run;
Now you can use the BY and RETAIN statements in the DATA step. The BY statement lets you detect the last record for each student, and the RETAIN statement keeps the value from the previous record.
data want;
set have;
retain last_testdayend;
by student testdaystart testdayend;
output;
last_testdayend = testdayend;
if last.student then do;
if testdaystart ne testdayend then do;
testdaystart = last_testdayend;
testdayend = last_testdayend;
output; * this second output statement creates a new record in the dataset;
end;
end;
drop last_testdayend;
run;
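Because LAST_TESTDAYEND is copied from the current record immediately before it is used, the RETAIN is not strictly needed: the current TESTDAYEND already holds that value. A shorter sketch with the same logic, assuming the question's sample data with STUDENT read as character:

```sas
data have;
   input Student $ TestDayStart TestDayEnd;
   datalines;
001 1 5
001 6 10
001 11 15
002 1 4
002 5 9
002 10 14
;

proc sort data=have;
   by student testdaystart;
run;

data want;
   set have;
   by student;
   output;                                          /* keep every original record */
   if last.student and testdaystart ne testdayend then do;
      testdaystart = testdayend;                    /* closing record: start = end */
      output;
   end;
run;
```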

SAS use a lookup dataset like array in another dataset

I have one data set with content descriptions for a school:
contents:
num description
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
In another data set (students) I have the variables content1-content5, and I use a flag to indicate which contents each student has.
students
name age content1 content2 content3 content4 content5
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
If I use code like this:
data students1;
set students;
array content[5];
format allcontents $100.;
do i=1 to dim(content);
if content[i]=1 then do;
allcontents=cat(vname(content[i]),',',allcontents);
end;
end;
run;
The result is:
name age content1 content2 content3 content4 content5 allcontents
BOB 15 1 1 1 . 1 content1,content2,content3,content5,
BRYA 16 . . . . .
CARL 15 . 1 . . 1 content2,content5,
SUE 17 . . 1 1 1 content3,content4,content5,
LOU 15 . . . . 1 content5
1) I want allcontents to contain the content descriptions from the lookup table (data set contents) instead of the array element names content1-content5. How can I do that?
2) Later, I want the result by content description, not by student, like this:
description name age
math BOB 15
spanish BOB 15
geography BOB 15
history BOB 15
spanish CARL 15
history CARL 15
spanish SUE 17
chemistry SUE 17
history SUE 17
history LOU 15
Is it possible?
Thanks.
First, grab the %create_hash() macro from this post.
Use the hash table to look up the values.
data students1;
set students;
array content[5];
format num $32. description $16.;
if _n_ = 1 then do;
%create_hash(cnt,num,description,"contents");
end;
do i=1 to 5;
if content[i]=1 then do;
num = vname(content[i]);
rc = cnt.find();
output;
end;
end;
keep description name age;
run;
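If the %create_hash() macro is not at hand, the same lookup can be written with the DATA step hash object directly. A sketch, assuming data sets CONTENTS (NUM, DESCRIPTION) and STUDENTS (NAME, AGE, CONTENT1-CONTENT5) as in the question, typed in here with missing flags as periods. The NUM values must match the variable names exactly, since hash keys are case-sensitive.

```sas
/* Sample data from the question; missing flags written as periods */
data contents;
   input num :$8. description :$16.;
   datalines;
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
;

data students;
   input name $ age content1-content5;
   datalines;
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
;

data students1;
   if _n_ = 1 then do;
      /* Never executes at run time; at compile time it adds NUM and DESCRIPTION to the PDV */
      if 0 then set contents;
      declare hash cnt(dataset: "contents");
      cnt.defineKey("num");
      cnt.defineData("description");
      cnt.defineDone();
   end;
   set students;
   array content[5];
   do i = 1 to dim(content);
      if content[i] = 1 then do;
         num = vname(content[i]);
         if cnt.find() = 0 then output;   /* 0 means the key was found */
      end;
   end;
   keep description name age;
run;
```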
I find PROC TRANSPOSE suitable. Transposing once is enough for question 2), and transposing twice renames the variables content1-content5 (question 1). The key is the ID statement in PROC TRANSPOSE, which names the transposed variables after the values of the ID variable (here, description).
The code below should give you the desired answers (although the names are ordered alphabetically, which may not match your original ordering).
/* original data sets */
data names;
input num :$8. description :$16.; * plain $ input would truncate geography and chemistry to 8 characters;
cards;
content1 math
content2 spanish
content3 geography
content4 chemistry
content5 history
;run;
data students;
input name $ age content1 content2 content3 content4 content5;
cards;
BOB 15 1 1 1 . 1
BRYA 16 . . . . .
CARL 15 . 1 . . 1
SUE 17 . . 1 1 1
LOU 15 . . . . 1
;run;
/* transpose */
proc sort data=students out=tmp_sorted;
by name age;
run;
proc transpose data=tmp_sorted out=tmp_transposed;
by name age;
run;
/* merge the names of content1-5 */
* If you want to preserve ordering from contents1-contents5
* instead of alphabetical ordering of "description" column
* from a-z, do not drop the "num" column for further use.;
proc sql;
create table tmp_merged as
select B.description, A.name, A.age, B.num, A.COL1
from tmp_transposed as A
left join names as B
on A._NAME_=B.num
order by A.name, B.num;
quit;
/* transpose again */
proc transpose data=tmp_merged(drop=num) out=tmp_renamed(drop=_name_);
by name age;
ID description; *name the transposed variables;
run;
/* answer (1) */
data ans1;
set tmp_renamed;
array content[5] math--history;
format allcontents $100.;
do i=1 to dim(content);
* use CATS: CAT keeps the blank padding of allcontents, so the appended text falls beyond its 100 characters and is lost;
if content[i]=1 then allcontents=cats(allcontents,',',vname(content[i]));
end;
*kill the leading comma;
allcontents=substr(allcontents,2,99);
run;
/* answer (2) */
data ans2(drop=num col1);
set tmp_merged;
where col1=1;
run;
*cleanup;
proc datasets lib=work nolist;
delete tmp_:;
quit;
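Question 1 can also be answered without transposing: look up each description with a hash object and append it with CATX, which skips blank arguments, so there is no leading comma to remove. A sketch, assuming the NAMES and STUDENTS data sets created at the top of this answer (NAMES plays the role of the question's CONTENTS); the hash name LK is illustrative.

```sas
data ans1_alt;
   if _n_ = 1 then do;
      if 0 then set names;   /* compile-time only: defines NUM and DESCRIPTION */
      declare hash lk(dataset: "names");
      lk.defineKey("num");
      lk.defineData("description");
      lk.defineDone();
   end;
   set students;
   array content[5];
   length allcontents $100;   /* not retained, so it starts blank for each student */
   do i = 1 to dim(content);
      if content[i] = 1 then do;
         num = vname(content[i]);
         /* CATX ignores the initially blank ALLCONTENTS, so no leading comma */
         if lk.find() = 0 then allcontents = catx(',', allcontents, description);
      end;
   end;
   drop i num description;
run;
```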

SAS combining datasets, binary search, indices

In SAS, for the two test datasets below - for every value of "amount" that falls within "y" and "z", I need to extract the corresponding "x". There could be multiple values of "x" that fit into the criteria.
The final result should look something like this:
/*
4 banana eggs
15 .
31 .
7 banana
22 fig
1 eggs
11 coconut
17 date
41 apple
*/
I realize this relies on using indices or binary searches, but I can't figure out a workable solution! Any help would be appreciated! Thanks!
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
;
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
;
run;
Join the two datasets so amount falls between y and z.
proc sql;
create table join as
select a.amount
,b.*
from test2 a
left join
test1 b
on a.amount between b.y and b.z;
quit;
Sort the result by amount for transpose.
proc sort data=join; by amount; run;
Transpose it.
proc transpose data=join out=trans;
by amount;
var x;
run;
Now each fruit is in its own variable: COL1, COL2, and so on.
If you want them all in one variable separated by a blank, just concatenate them.
data trans2(keep= amount text);
set trans(drop=_name_);
array v{*} _character_;
text = catx(' ', of v{*});
run;
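Alternatively, the matches can be concatenated in a single DATA step instead of PROC TRANSPOSE plus CATX. A sketch, assuming the JOIN table sorted by AMOUNT from the steps above, and a TEXT length of 200 (an arbitrary choice; make it long enough for your data). Within each AMOUNT the fruits appear in the order of JOIN, so sort by AMOUNT and X if you want them alphabetical.

```sas
data trans2_alt(keep=amount text);
   length text $200;   /* assumed long enough to hold all matches */
   /* DOW loop: read all rows for one AMOUNT, then output one row */
   do until(last.amount);
      set join;
      by amount;
      text = catx(' ', text, x);   /* a blank X (no match) is skipped */
   end;
run;
```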
Here is a possible solution using "old-fashioned" data step code plus PROC TRANSPOSE:
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
run;
data want(keep=amount x);
set test2;
found = 0;
do _i_=1 to nobs;
set test1 point=_i_ nobs=nobs;
if y <= amount <= z then do;
found = 1;
output;
end;
end;
if not found then do;
x = ' ';
output;
end;
run;
proc transpose data=want out=want2(drop=_name_);
by amount notsorted;
var x;
run;
Note that my results do not match your example: amount 31 falls between 29 and 43, so it matches "apple".

parsing a text file in sas

So I have a rather messy text file I'm trying to convert to a sas data set. It looks something like this (though much bigger):
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
I'm trying to set 5 separate variables (ID, name, job, time, hours) but clearly only 3 of the variables appear after the first line. I tried this:
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover;
input ID $ name $ job $ time hours;
and didn't get the right output. Then I tried to parse it:
infile "C:\Users\Desktop\jobs.txt" dlm = ' ' dsd missover;
input allData $;
id = substr(allData, find(allData,"305")-2, 7);
but I'm still not getting the right output. Any ideas?
EDIT: I'm now trying to use SCAN() and SUBSTR() to break apart the larger data set. How do I subset a single line from the data?
Your data might not be all that messy; it just might be in a hierarchical format where the first row contains all five variables and subsequent rows contain values for variables 3-5. In other words, ID and NAME should be retained as you read through the file.
If that is correct (a hierarchical layout), here is a possible solution:
data have;
retain ID NAME;
informat ID 7. JOB $6. TIME 3. HOURS 1.;
input @1 test_string $7. @;
if notdigit(test_string) = 0
then input @1 ID NAME $12. JOB time hours;
else input @1 JOB time hours;
drop test_string;
datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
run;
The key thing is to really understand how your raw file is organized. Once you know the rules, using SAS to read it is a snap!
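Because ID is read with the numeric 7. informat, the leading zero in 0305679 is not kept. If the ID should stay exactly as written, one variation reads it as character. This sketch keeps the answer's assumption that NAME occupies columns 9 through 20 on header rows; the LENGTH values are guesses based on the sample lines.

```sas
data have;
   length ID $7 NAME $12 JOB $6;
   retain ID NAME;
   /* Look at the first 7 columns, holding the line with the trailing @ */
   input @1 test_string $7. @;
   if notdigit(test_string) = 0
      then input @1 ID $7. @9 NAME $12. JOB time hours;   /* header row */
      else input @1 JOB time hours;                      /* detail row */
   drop test_string;
   datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
;
```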
A list input solution could be the following:
data have;
array all(6) $20. ID LNAME FNAME JOB TIME HOURS;
retain Id Lname Fname;
drop i;
input @;
nitems = countw(_infile_,', ');
if notdigit(scan(_infile_,1)) = 0 then
do i = 1 to nitems;
all(i) = Scan(_infile_,i);
end;
else
do i = 1 to 3;
all(i+3) = Scan(_infile_,i);
* TIME and HOURS stay character, like every element of the character array ALL;
end;
datalines;
0305679 SMITH, JOHN ARCH05 001 2
ARCH05 005 3
ARCH05 001 7
0305680 JONES, MARY ARCH06 002 4
ARCH06 005 3
ARCH07 001 7
run;
proc print; run;
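In the list-input answer every element of the ALL array is character, so TIME and HOURS come out as text. If you need them as numbers, a follow-up step can convert them with the INPUT function. A sketch, assuming the HAVE data set produced by that answer; the 8. informat is an assumption that is wide enough for the sample values.

```sas
data have_num;
   set have(rename=(time=time_c hours=hours_c));
   time  = input(time_c, 8.);    /* e.g. "001" becomes 1 */
   hours = input(hours_c, 8.);
   drop time_c hours_c;
run;
```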
