Looping a PROC TRANSPOSE through multiple data ranges in SAS

I am trying to transpose a sequence of ranges from an excel file into SAS. The excel file looks something like this:
31 Dec 01Jan 02Jan 03Jan 04Jan
Book id1 23 24 35 43 98
Book id2 3 4 5 4 1
(few blank rows in between)
05Jan 06Jan 07Jan 08Jan 09Jan
Book id1 14 100 30 23 58
Book id2 2 7 3 8 6
(and it repeats..)
My final output should have a first column for the date and then two additional columns for the book Ids:
Date Book id1 Book id2
31 Dec 23 3
01Jan 24 4
02Jan 35 5
03Jan 43 4
04Jan 98 1
05Jan 14 2
06Jan 100 7
07Jan 30 3
08Jan 23 8
09Jan 58 6
In this particular case I am asking for a simpler method to:
Either import and transpose each range separately, using macro variables to supply each data range (a sketch of this approach appears just after the question)
Or import the whole data file first and then create a loop that transposes each range of data
Code I used for a simple import and transpose of a specific data range:
proc import datafile="&input./have.xlsx"
out=want
dbms=xlsx replace;
range="Data$A3:F5" ;
run;
proc transpose data=want
out=want_transposed
name=date;
id A;
run;
Thanks!
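For the first option above (macro variables supplying each data range), a minimal sketch could look like the following. The number of blocks, the starting row, and the row gap between blocks are assumptions, since the question doesn't say exactly how many blank rows separate the ranges; the PROC IMPORT and PROC TRANSPOSE calls simply mirror the ones shown in the question.
%macro import_blocks(nblocks=, firstrow=3, blockgap=5);
  %local i top bot;
  %do i = 1 %to &nblocks;
    /* hypothetical row arithmetic: each block is one header row plus two book rows */
    %let top = %eval(&firstrow + (&i - 1) * &blockgap);
    %let bot = %eval(&top + 2);

    proc import datafile="&input./have.xlsx"
      out=blk&i
      dbms=xlsx replace;
      range="Data$A&top.:F&bot.";
    run;

    proc transpose data=blk&i
      out=blk&i._t
      name=date;
      id A; /* follows the ID statement in the question's code */
    run;
  %end;

  /* stack the transposed blocks into one table */
  data want_transposed;
    set
    %do i = 1 %to &nblocks;
      blk&i._t
    %end;
    ;
  run;
%mend import_blocks;

%import_blocks(nblocks=2)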

Data that is split over several segments or blocks of rows in an Excel file can be imported raw into SAS and then processed into a categorical (long) form using a DATA step.
In this example, sample data is put into a text file and imported such that the column names are the generic VAR1 ... VARn. The generic import is then processed across each row, outputting one SAS data set row per imported cell.
The column names in each segment are retained within a temporary array and updated whenever a blank book id is encountered.
* mock data;
filename demo "%sysfunc(pathname(WORK))\demo.txt";
data _null_;
input;
file demo;
put _infile_;
datalines;
., 31Dec, 01Jan, 02Jan, 03Jan, 04Jan
Book_id1, 23 , 24 , 35 , 43 , 98
Book_id2, 3 , 4 , 5 , 4 , 1
., 05Jan, 06Jan, 07Jan, 08Jan, 09Jan
Book_id1, 14 , 100 , 30 , 23 , 58
Book_id2, 2 , 7 , 3 , 8 , 6
run;
* mock import;
proc import replace out=work.haveraw file=demo dbms=csv;
getnames = no;
datarow = 1;
run;
ods listing;
proc print data=haveraw;
run;
When the Excel import is made to look like this:
Obs VAR1 VAR2 VAR3 VAR4 VAR5 VAR6
1 31Dec 01Jan 02Jan 03Jan 04Jan
2 Book_id1 23 24 35 43 98
3 Book_id2 3 4 5 4 1
4
5 05Jan 06Jan 07Jan 08Jan 09Jan
6 Book_id1 14 100 30 23 58
7 Book_id2 2 7 3 8 6
It can be processed in a transposing way, outputting only the name-value pairs corresponding to an original cell.
data have (keep=bookid date value);
set haveraw;
array dates(1000) $12 _temporary_ ;
array vars var:;
if missing(var1) then do;
do index = 2 by 1 while (index <= dim(vars));
if not missing(vars(index)) then
dates(index) = put(index-1,z3.) || '_' || vars(index); * adjust as you see fit;
else
dates(index) = '';
end;
end;
else do;
bookid = var1;
do index = 2 by 1 while (index <= dim(vars));
date = dates(index);
value = input(vars(index),??best12.);
output;
end;
end;
run;
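To get from the long have dataset above to the final layout requested in the question (one row per date with a column for each book id), a PROC TRANSPOSE can pivot it back out. A minimal sketch, assuming the have dataset produced by the step above; the zero-padded prefix on date keeps the rows in chronological order when sorted, and want_wide is just a placeholder name:
proc sort data=have;
  by date;
run;

proc transpose data=have out=want_wide(drop=_name_);
  by date;
  id bookid;   /* columns named Book_id1, Book_id2 */
  var value;
run;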

Related

sum values across any 365 day period

I've got a dataset that has id, start date and a claim value (in dollars) in each row - most ids have more than one row - some span over 50 rows. The earliest date for each ID/claim varies, and the claim values are mostly different.
I'd like to do a rolling sum of the value of IDs that have claims within 365 days of each other, to report each ID that has claims that have exceeded a limiting value across each period. So for an ID that had a claim date on 1 January, I'd sum all claims to 31 December (inclusive). Most IDs have several years of data so for the example above, I'd also need to check that if they had a claim on 1 May that they hadn't exceeded the limit by 30 April the following year and so on. I normally see this referred to as a 'rolling sum'. My site has many SAS products including base, stat, ets, and others.
I'm currently testing code on a small mock dataset, and so far I've converted a thin file to a fat file with one column for each claim value and each date of the claim. The mock dataset is similar to the client dataset that I'll be using. Here's what I've done so far (noting that the mock data uses day numbers rather than dates - I'm not at the stage where I want to test on real data yet).
data original_data;
input ppt $1. day claim;
datalines;
a 1 7
a 2 12
a 4 12
a 6 18
a 7 11
a 8 10
a 9 14
a 10 17
b 1 27
b 2 12
b 3 14
b 4 12
b 6 18
b 7 11
b 8 10
b 9 14
b 10 17
c 4 2
c 6 4
c 8 8
;
run;
proc sql;
create table ppt_counts as
select ppt, count(*) as ppts
from work.original_data
group by ppt;
select cats('value_', max(ppts) ) into :cats
from work.ppt_counts;
select cats('dates_',max(ppts)) into :cnts
from work.ppt_counts;
quit;
%put &cats;
%put &cnts;
data flipped;
set original_data;
by ppt;
array vars(*) value_1 -&cats.;
array dates(*) dates_1 - &cnts.;
array m_vars value_1 - &cats.;
array m_dates dates_1 - &cnts.;
if first.ppt then do;
i=1;
do over m_vars;
m_vars="";
end;
do over m_dates;
m_dates="";
end;
end;
if first.ppt then do:
i=1;
vars(i) = claim;
dates(i)=day;
if last.ppt then output;
i+1;
retain value_1 - &cats dates_1 - &cnts. 0.;
run;
data output;
set work.flipped;
max_date =max(of dates_1 - &cnts.);
max_value =max(of value_1 - &cats.);
run;
This doesn't give me anything close to what I need - I'm not sure how to structure the code to make this correct.
What I need to end up with is one row per time that an ID exceeds the yearly limit of claim value (say in the mock data if a claim exceeds 75 across a seven day period), and to include the sum of the claims. So it's likely that there may be multiple lines per ID and the claims from one row may also be included in the claims for the same ID on another row.
type of output:
ID sum of claims
a $85
a $90
b $80
On separate rows.
Any help appreciated.
Thanks
If you need to perform a rolling sum, you can do this with proc expand. The code below will perform a rolling sum of 5 days for each group. First, expand your data to fill in any missing gaps:
proc expand data = original_data
out = original_data_expanded
from = day;
by ppt;
id day;
convert claim / method=none;
run;
Any days with gaps will have a missing value of claim. Now we can calculate a moving sum over the expanded data, ignoring those filled-in missing days:
proc expand data = original_data_expanded
out = want(where=(NOT missing(claim)));
by ppt;
id day;
convert claim = rolling_sum / transform=(movsum 5) method=none;
run;
Output:
ppt day rolling_sum claim
a 1 7 7
a 2 19 12
a 4 31 12
a 6 42 18
a 7 41 11
...
b 9 53 14
b 10 70 17
c 4 2 2
c 6 6 4
c 8 14 8
The reason we use two proc expand steps is that, in a single step, the rolling sum is calculated before the days are expanded; we need the rolling sum to occur after the expansion. You can test this by running the above code as a single step:
/* Performs moving sum, then expands */
proc expand data = original_data
out = test
from = day;
by ppt;
id day;
convert claim = rolling_sum / transform=(movsum 5) method=none;
run;
Use an SQL self join with the dates being within 365 days of each other. This is time/resource intensive if you have a very large data set.
Assuming you have an actual date variable, the INTNX function is probably a better way to calculate the date interval than adding 365, depending on how you want to account for leap years.
If you have a claim id to group on, that would also be better than using the group by clause in this example.
data have;
input ppt $1. day claim;
datalines;
a 1 7
a 2 12
a 4 12
a 6 18
a 7 11
a 8 10
a 9 14
a 10 17
b 1 27
b 2 12
b 3 14
b 4 12
b 6 18
b 7 11
b 8 10
b 9 14
b 10 17
c 4 2
c 6 4
c 8 8
;
run;
proc sql;
create table want as
select a.*, sum(b.claim) as total_claim
from have as a
left join have as b
on a.ppt=b.ppt and
b.day between a.day and a.day+365
group by 1, 2, 3;
/* alternative: b.day between a.day and intnx('year', a.day, 1, 's') */
quit;
Assuming that you have only one claim per day, you could just use a circular array to keep track of the previous N days of claims to generate the rolling sum. By circular array I mean one where the indexes wrap around back to the beginning when you increment past the end. You can use the MOD() function to convert any integer into an index into the array.
Then to get the running sum just add all of the elements in the array.
Add an extra DO loop to zero out the days skipped when there are days with no claims.
%let N=5;
data want;
set original_data;
by ppt ;
array claims[0:%eval(&n-1)] _temporary_;
lagday=lag(day);
if first.ppt then call missing(of lagday claims[*]);
do index=max(sum(lagday,1),day-&n+1) to day-1;
claims[mod(index,&n)]=0;
end;
claims[mod(day,&n)]=claim;
running_sum=sum(of claims[*]);
drop index lagday ;
run;
Results:
Obs ppt day claim running_sum
1 a 1 7 7
2 a 2 12 19
3 a 4 12 31
4 a 6 18 42
5 a 7 11 41
6 a 8 10 51
7 a 9 14 53
8 a 10 17 70
9 b 1 27 27
10 b 2 12 39
11 b 3 14 53
12 b 4 12 65
13 b 6 18 56
14 b 7 11 55
15 b 8 10 51
16 b 9 14 53
17 b 10 17 70
18 c 4 2 2
19 c 6 4 6
20 c 8 8 14
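To turn the rolling sum into the output the question ultimately asks for (one row per time an ID exceeds the limit, e.g. a total over 75 within the window), a subsetting IF is enough. A minimal follow-up sketch, assuming the want dataset and running_sum variable from the step above; set %let N=7; for a seven-day window:
data exceeded(keep=ppt day running_sum);
  set want;
  if running_sum > 75;   /* the limiting value from the question's example */
run;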
Working in a known domain of date integers, you can use a single large array to store the claims at each date and slice out the 365 days to be summed. None of the bookkeeping needed for the modular (circular array) approach is required.
Example:
data have;
call streaminit(20230202);
do id = 1 to 10;
do date = '01jan2012'd to '02feb2023'd;
date + rand('integer', 25);
claim = rand('integer', 5, 100);
output;
end;
end;
format date yymmdd10.;
run;
options fullstimer;
data want;
set have;
by id;
array claims(100000) _temporary_;
array slice (365) _temporary_;
if first.id then call missing(of claims(*));
claims(date) = claim;
call pokelong(
peekclong(
addrlong (claims(date-365))
, 8*365)
,
addrlong(slice(1))
);
rolling_sum_365 = sum(of slice(*));
if dif1(claim) < 365 then
claims_out_365 = lag(claim) - dif1(rolling_sum_365);
if first.id then claims_out_365 = .;
run;
Note: SAS Date 100,000 is 16OCT2233

Read excel and populate 2 dimensional array in typescript

I would like to read a simple table in an Excel sheet (x rows, y cols) and populate a 2-dimensional array in TypeScript. Something like below:
client_id orderid
1 : 12 3 45 78 97
2 : 67 89 12
3 : 7 90 23
Do you know a good Excel library? Thank you
regards,
Titi
You just need to specify the table name and then you can use getRange().getValues() to capture the values within the table as a 2D array. Example:
function main(workbook: ExcelScript.Workbook) {
const theTableName:string = "table1"; // or whatever the name is
var theTable = workbook.getTable(theTableName);
var allValues = theTable.getRange().getValues();
// allValues is now a 2D array holding the table's values
}

expanding a dataset with blanks

I have a dataset as follows:
data have;
input;
ID Base Adverse Fixed$ Date RepricingFrequency
1 38 50 FIXED 2016 2
2 40 60 FLOATING 2017 3
3 20 20 FIXED 2016 2
4 ...
5
6
I am looking to build an array such that each ID has four vintage years, 2017-2020, where the subsequent years are to be filled out with a piece of array code I already have that works. It should look like this:
ID Vintage Base Adverse Fixed$ Date RepricingFrequency
1 2017 38 50 FIXED 2016 2
1 2018
1 2019
1 2020
To begin with, I just need to duplicate the dataset with the blanks.
The code I've tried so far is
data want;
set have;
do I=1 to 4;
output;
end;
drop I;
run;
but of course that keeps the repeats of all the observations. So I tried an array.
data want;
set have;
array Base(2017:2020) Base2017-Base2020;
array Vintage(2017:2020) Vintage2017-Vintage2020;
But I'm not sure where to go from here with either approach.
The question is how I extrapolate my dataset with IDs 1-8 to one where each ID is repeated four times (1111, 2222, ..., 8888) with blanks.
Make a dummy dataset with all of the observations
data frame ;
set have(keep=id);
by id ;
if first.id then do date=2017 to 2020 ;
output;
end;
run;
and merge it back with the original.
data want ;
merge have frame ;
by id date ;
run;
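As a single-step alternative (this is only a sketch, not part of the answer above), the duplication with blanks can also be done in one DATA step; the variable names are taken from the question's listing, so treat them as assumptions about the real dataset:
data want_blanks;
  set have;
  do Vintage = 2017 to 2020;
    output;   /* the first vintage row keeps the original values */
    call missing(Base, Adverse, Fixed, Date, RepricingFrequency);   /* later rows come out blank */
  end;
run;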

SAS combining datasets, binary search, indices

In SAS, for the two test datasets below, for every value of "amount" that falls between "y" and "z", I need to extract the corresponding "x". There could be multiple values of "x" that fit the criteria.
The final result should look something like this:
/*
4 banana eggs
15 .
31 .
7 banana
22 fig
1 eggs
11 coconut
17 date
41 apple
*/
I realize this relies on using indices or binary searches but I can't figure out a workable solution! Any help would be appreciated! Thanks!
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
;
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
;
run;
Join the two datasets so amount falls between y and z.
proc sql;
create table join as
select a.amount
,b.*
from test2 a
left join
test1 b
on a.amount between b.y and b.z;
quit;
Sort the result by amount for transpose.
proc sort data=join; by amount; run;
Transpose it.
proc transpose data=join out=trans;
by amount;
var x;
run;
Now you have your fruits each in its own variable named col1, col2, ....
If you want them all in one variable separated by a blank, just concatenate them.
data trans2(keep= amount text);
set trans(drop=_name_);
array v{*} _character_;
text = catx(' ', of v{*});
run;
Here is a possible solution using "old-fashioned" data step code plus PROC TRANSPOSE:
data test1;
input x $ y z;
datalines;
apple 29 43
banana 2 7
coconut 9 13
date 17 20
eggs 1 5
fig 18 26
;
run;
data test2;
input amount;
datalines;
4
15
31
7
22
1
11
17
41
;
run;
data want(keep=amount x);
set test2;
found = 0;
do _i_=1 to nobs;
set test1 point=_i_ nobs=nobs;
if y <= amount <= z then do;
found = 1;
output;
end;
end;
if not found then do;
x = ' ';
output;
end;
run;
proc transpose data=want out=want2(drop=_name_);
by amount notsorted;
var x;
run;
Note that my results do not match those in your example; amount 31 is an "apple" (it falls between 29 and 43).

How to save data in .txt file in MATLAB

I have 3 txt files: s1.txt, s2.txt, s3.txt. Each has the same format and the same number of data points. I want to combine only the second column of each of the 3 files into one file. Before I combine the data, I sort it according to the 1st column:
UnSorted file:
s1.txt s2.txt s3.txt
1 23 2 33 3 22
4 32 4 32 2 11
5 22 1 10 5 28
2 55 8 11 7 11
Sorted file:
s1.txt s2.txt s3.txt
1 23 1 10 2 11
2 55 2 33 3 22
4 32 4 32 5 28
5 22 8 11 7 11
Here is the code I have so far:
BaseFile ='s'
n=3
fid=fopen('RT.txt','w');
for i=1:n
%Open each file consecutively
d(i)=fopen([BaseFile num2str(i)'.txt']);
%read data from file
A=textscan(d(i),'%f%f')
a=A{1}
b=A{2}
ab=[a,b];
%sort the data according to the 1st column
B=sortrows(ab,1);
%delete the 1st column after being sorted
B(:,1)=[]
%write to a new file
fprintf(fid,'%d\n',B');
%close (d(i));
end
fclose(fid);
How can I get the output in the new txt file in this format?
23 10 11
55 33 22
32 32 28
22 11 11
instead of this format?
23
55
32
22
10
33
32
11
11
22
28
11
Create the output matrix first, then write it to the file.
Here is the new code:
BaseFile ='s';
n=3;
for i=1:n % it's not recommended to use i or j as loop variables, since they are used as the imaginary unit in complex math, but I'll leave that up to you
% Open each file consecutively
d=fopen([BaseFile num2str(i) '.txt']);
% read data from file
A=textscan(d,'%f%f', 'CollectOutput',1);
% sort the data according to the 1st column
B=sortrows(A{:},1);
% Instead of deleting a column create new matrix
if(i==1)
C = zeros(size(B,1),n);
end
% Check input file and save the 2nd column
if size(B,1) ~= size(C,1)
error('Input files have different number of rows');
end
C(:,i) = B(:,2);
% don't write yet
fclose (d);
end
% write to a new file
fid=fopen('RT.txt','w');
for k=1:size(C,1)
fprintf(fid, [repmat('%d\t',1,n-1) '%d\n'], C(k,:));
end
fclose(fid);
EDIT:
Actually to write only numbers to a file you don't need FPRINTF. Use DLMWRITE instead:
dlmwrite('RT.txt',C,'\t')
