I need to use the day and month variables to create a date variable using the array function. For example, let's say the year is 2022, which needs to be included in the date.
I tried the following code, but it doesn't seem to work. SAS gave me a number, but I'm not sure it's the correct date. Thanks for your help.
array day{1} day_1;
array month{1} month_1;
array date{1} date_1;
do i=1 to dim(day);
date{i}=MDY(month{i},day{i},2022);
end;
format date_1 mmyydd10.;
run;
Your posted code has no DATA statement telling SAS what dataset to create, nor any way to find the existing DAY_1 or MONTH_1 variables. If the variables are in an existing SAS dataset, add a SET statement.
If there is only one DAY variable and one MONTH variable there is no need for the ARRAY or the DO loop.
data want;
set have;
date_1=MDY(month_1,day_1,2022);
format date_1 mmddyy10.;
run;
If you do have multiple variables then include them in the arrays and the FORMAT statement.
data want;
set have;
array day day_1-day_3;
array month month_1-month_3;
array date date_1-date_3;
do i=1 to dim(day);
date{i}=MDY(month{i},day{i},2022);
end;
format date_1-date_3 mmddyy10.;
drop i;
run;
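If you are unsure whether the number SAS shows is the right date, a quick check along these lines may help. This is only a sketch: the HAVE data and the RAW_VALUE variable are made up for illustration. A SAS date is stored as a count of days since 01JAN1960, so the unformatted number and the formatted date are the same value displayed two ways.

```sas
/* Hypothetical test data: one day/month pair */
data have;
  input day_1 month_1;
  datalines;
15 3
;
run;

data check;
  set have;
  date_1 = mdy(month_1, day_1, 2022);
  raw_value = date_1;            /* same value, left unformatted */
  format date_1 mmddyy10.;
  put raw_value= date_1=;        /* writes both to the log */
run;
```

The log line should show the raw day count next to the readable date, which makes it easy to confirm the FORMAT statement took effect.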
I want to create a data set where I only want to keep 5 specific dates.
So my &date is 31mar2020 and &enddate is 31mar2025 and I only want to keep 31mar every year until 2025.
My code below creates a date for every day up to 31mar2025, which is too much; I only want to keep the 5 specific dates.
How can I do that?
Thank you
DATA LOOP;
  FORMAT ROLL_BASE_DT DATE9.;
  DO ROLL_BASE_DT = &DATE TO &ENDdate;
    OUTPUT;
  END;
RUN;
You can use commas in the DO statement to list multiple values.
do date='31mar2021'd,'31mar2022'd,'31mar2023'd,'31mar2024'd,'31mar2025'd;
...
end;
You could loop over the YEAR value instead.
do year=2021 to 2025;
date=mdy(3,31,year);
...
end;
You could use INTNX() to increment the date by YEAR. You can use INTCK() to figure out how many times to run the loop.
do index=0 to intck('year',&DATE,&ENDdate);
date=intnx('year',&date,index,'s');
...
end;
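Put together as a full step, the INTNX() version might look like the sketch below. It assumes the two macro variables hold date literals (the %LET lines are my guess at how they were defined) and reuses the LOOP and ROLL_BASE_DT names from the question.

```sas
/* Assumed definitions; the question does not show them */
%let date    = '31mar2020'd;
%let enddate = '31mar2025'd;

data loop;
  format roll_base_dt date9.;
  /* INTCK counts the year boundaries between the two dates */
  do index = 0 to intck('year', &date, &enddate);
    /* 's' (same) keeps the month and day of the start date */
    roll_base_dt = intnx('year', &date, index, 's');
    output;
  end;
  drop index;
run;
```

Starting INDEX at 0 includes the start date itself; start it at 1 if you only want the years after &date.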
If it's just the 5 dates you want, you could use the cards input (I know of it but have never used it personally).
Alternatively, rather than using a loop just set the values individually with the output keyword after each time you set the value. That should do it.
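For what it's worth, both of those ideas could be sketched roughly as follows (untested against your setup; the dataset names LOOP and LOOP2 are just placeholders). The first step reads the dates from in-stream data with DATALINES, the modern name for CARDS; the second assigns each value and writes it with an explicit OUTPUT.

```sas
/* Option 1: in-stream data */
data loop;
  input roll_base_dt :date9.;
  format roll_base_dt date9.;
  datalines;
31MAR2021
31MAR2022
31MAR2023
31MAR2024
31MAR2025
;
run;

/* Option 2: assign each value and OUTPUT it */
data loop2;
  format roll_base_dt date9.;
  roll_base_dt = '31mar2021'd; output;
  roll_base_dt = '31mar2022'd; output;
  roll_base_dt = '31mar2023'd; output;
  roll_base_dt = '31mar2024'd; output;
  roll_base_dt = '31mar2025'd; output;
run;
```

Either way you get one observation per date, but both hard-code the values, so the loop-based answers scale better if the range changes.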
I am trying to add a Data step that creates the work.orders_fin_qtr_tot data set from the work.orders_fin_tot data set. This new data set should contain new variables for quarterly sales and profit. Use two arrays to create the new variables: QtrSales1-QtrSales4 and QtrProfit1-QtrProfit4. These represent total sales and total profit for the quarter (1-4). Use the quarter number of the year in which the order was placed to index into the correct variable to add either the TotalSales or TotalProfit to the new appropriate variable.
Add a Proc step that displays the first 10 observations of the work.orders_fin_qtr_tot data set.
My issue is that I can't seem to get the two different arrays to combine without gaps.
proc sort data=work.orders_fin_tot_qtr;
by workqtr;
run;
data work.orders_fin_tot_qtr;
set work.orders_fin_tot_qtr;
array QtrSales{4} quarter1-quarter4 ;
do i = 1 by 1 until (last.order_id);
if workqtr=i then QtrSales{i}=totalsales;
end;
drop totalsales totalprofit _TYPE_ _FREQ_;
run;
proc print data=work.orders_fin_tot_qtr;
run;
The syntax last.order_id is only appropriate if there is a BY statement in the DATA Step -- if not present, the last. reference is always missing and the loop will never end; so you have coded an infinite loop!
The step has drop totalsales totalprofit _TYPE_ _FREQ_. Those underscored variables indicate the incoming data set was probably created with a Proc SUMMARY.
Your orders_fin_tot data set should have columns order_id quarter (valid values 1,2,3,4), and totalsales. If the data is multi-year, it should have another column named year.
The missing BY and the presence of last.order_id suggest you are reshaping the data from a categorical vector going down a column to one that goes across a row. This is known as a pivot or transpose. The DO construct in your question is incorrect, but it resembles a technique known in SAS circles as a DOW loop, whose defining feature is that the SET and BY statements are coded inside the loop.
Try adjusting your code to the following pattern
data want;
do _n_ = 1 by 1 until (last.order_id);
SET work.orders_fin_tot; * <--- presumed to have data 'down' a column for each quarter of an order_id;
BY order_id; * <--- ensures data is sorted and makes automatic flag variable LAST.ORDER_ID available for the until test;
array QtrSales quarter1-quarter4 ; * <--- define array for step and creates four variables in the program data vector (PDV);
* this is where the pivot magic happens;
* the (presumed) quarter value (1,2,3,4) from data going down the input column becomes an
* index into an array connected to variables going across the PDV (the output row);
QtrSales{quarter} = totalsales;
end;
run;
Notice there is no OUTPUT statement inside or outside the loop. When the loop completes its iteration, the code flow reaches the bottom of the data step and does an implicit OUTPUT (because there is no explicit OUTPUT elsewhere in the step).
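Extending the same pattern to both arrays from the assignment might look like this. Treat it as a sketch under assumptions: it presumes the input has ORDER_ID, QUARTER (1-4), TOTALSALES and TOTALPROFIT columns and is sorted by ORDER_ID, none of which I can verify from the question.

```sas
data work.orders_fin_qtr_tot;
  do until (last.order_id);
    set work.orders_fin_tot;   /* assumed sorted by order_id */
    by order_id;
    array QtrSales{4}  QtrSales1-QtrSales4;
    array QtrProfit{4} QtrProfit1-QtrProfit4;
    /* SUM() ignores missing values, so repeated quarters accumulate */
    QtrSales{quarter}  = sum(QtrSales{quarter},  totalsales);
    QtrProfit{quarter} = sum(QtrProfit{quarter}, totalprofit);
  end;
  drop quarter totalsales totalprofit;
run;

proc print data=work.orders_fin_qtr_tot(obs=10);
run;
```

Because both arrays are filled inside the same loop, each ORDER_ID ends up on one row with the sales and profit columns side by side, and quarters with no data are simply left missing.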
Also, for any data set specified in code, you can use data set option OBS= to select which observation numbers are used.
proc print data=MyData(obs=10);
OBS is a tricky option name because it really means last observation number to use. FIRSTOBS is another data set option for specifying the row numbers to use, and when not present defaults to 1. So the above is equivalent to
proc print data=MyData(firstobs=1 obs=10);
OBS= should be thought of conceptually as LASTOBS=; there is no actual option name LASTOBS. The following would log an ERROR: because OBS < FIRSTOBS
proc print data=MyData(firstobs=50 obs=10);
I am a STATA user and am therefore not familiar with using SAS. However, all of the files that I require for my current project are stored in SAS format, so I would like to convert them from SAS to .dta format, using SAS code.
The files are stored as monthly sets like so:
1976 - x1976M1, x1976M2, x1976M3.... x1976M12
where 1976 is the folder, and each month, eg. x1976M1, is a file containing the observations for that month and year.
I would like to export those files to .dta format, with the same file structure so that I can easily read them into STATA.
I am not picky about whether or not I can loop over each folder, or will have to loop each folder individually--there are forty folders with 12 files in each.
Therefore, I will need to at least create a loop that goes from m1 to m12 and appends the month to the end of the filename, e.g. filename1976 + my, where y = [1, 12]. Ideally, I will be able to create a loop that goes from one folder to the next, executing this process via a nested loop.
I hope this is satisfactorily clear! If not, please comment and I will adjust my question accordingly.
Some code given to me by a coworker. Hope this helps anybody with the same issue. This will need to be updated for each individual folder, as it does not loop.
Cheers!
libname name 'G:\folder\';
%macro subset1976(month=);
data subset1976_&month;
set name.file1976_&month;
keep xyz /*varnames*/
;
if age>=15;
noc2011 = soc4+0;
run;
%mend;
%subset1976(month=jan);
%subset1976(month=feb);
....
%macro export1976(month=);
proc export data=subset1976_&month
  outfile="G:\lfs\subset1976_&month..dta"
  dbms=stata replace;
run;
%mend;
%export1976(month=jan);
%export1976(month=feb);
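A looping version could look roughly like the sketch below. The macro name EXPORT_ALL, the SRC libref, and the G:\folder\<year> layout are all assumptions on my part; only the x1976M1-style file names come from the question.

```sas
%macro export_all(first_year=, last_year=);
  %local y m;
  %do y = &first_year %to &last_year;
    /* Assumed layout: one folder per year under G:\folder */
    libname src "G:\folder\&y";
    %do m = 1 %to 12;
      proc export data=src.x&y.M&m
                  outfile="G:\folder\&y\x&y.M&m..dta"
                  dbms=stata replace;
      run;
    %end;
  %end;
  libname src clear;
%mend export_all;

%export_all(first_year=1976, last_year=1976);
```

Note the double quotes around the OUTFILE path (macro variables do not resolve inside single quotes) and the doubled period before dta, since the first period only ends the macro variable name.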
I'm working on a spreadsheet that will keep track of attendance violations for employees.
I have an array called BoxA and it will be filled with dates. I would like a report to be written where a cell from the array is called twice, but once to return the day of the week, and the second time to return the actual date that is in the cell. The goal is a line in the report that reads:
Sunday, 8/24/2014 John was absent.
Is there a way to format the array? I don't mind creating a second array based on the first that is formatted differently, but I can't seem to figure out how to make the array return the day of the week when it consists only of dates.
Thank you in advance.
motizer
With a date in A1, in another cell enter:
=TEXT(A1,"dddd, m/d/yyyy") & " John was absent."
I have a range of weekly variables describing a person's "status" (from week 1, 2010 to 2012 week 17).
The variables are given by:
y_1001, y_1002, ..., y_1052, y_1101, y_1102, ..., y_1217
I define the period of variables like this:
%let period = y_1001-y_1052 y_1101-y_1148;
I also have a treatment period given as a start date and an end date. My challenge is to find the status given by the y_ variables in the week after the person stops the treatment.
I am not too familiar with SAS, but my idea was to "pick" the correct y_ variable based on a week counter, say by counting the number of weeks since the beginning of the period (week 1 in 2010) and until the date where the treatment ends.
I get the weeks until end of treatment like this
week_count = 1 + intck( 'week.2', '1JAN2010'd, end_treatment_date, 'd');
But how can I retrieve the corresponding y_ variable based on this count?
After fruitless search on how to loop over the period variables and pick the number corresponding to the week_count variable for each person, I thought about going a different way... say something like this.
array weeks(*) &period;
do i = 1 to dim(weeks) by 1;
if week_count = i then end_status = y_10&i;
end;
...but with modifications to take into account that there is a mismatch between the dimension of the array and the number of weeks and years.
But then my challenge is to make the following part work...
if week_count = i then end_status = y_10&i;
How can I make SAS pick the right y_ variable based on the loop index? This seems like a really simple problem, but somehow I have not managed to find a solution. Is there no way to use the variable "i" as input in defining the correct y_ variable?
Would really appreciate if somebody could throw some hints.
I think you want:
if week_count = i then end_status = weeks{i};
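Since an array can be indexed by any numeric expression, you may not need the loop at all. A sketch, assuming &period is defined as a valid variable list, that the array positions line up with your week count, and that the input dataset is called HAVE (a name I made up):

```sas
data want;
  set have;
  array weeks{*} &period;
  week_count = 1 + intck('week.2', '1JAN2010'd, end_treatment_date, 'd');
  /* Guard against counts that fall outside the array */
  if 1 <= week_count <= dim(weeks) then end_status = weeks{week_count};
run;
```

If you want the week after treatment ends, index with week_count + 1 instead, and check that the position-to-week mapping still holds across the year boundaries.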
VVALUEX is a little-known SAS function that returns the value of the variable whose name is given by its argument. Here your problem is to construct the SAS variable name from the number of weeks between 1st Jan and the end of treatment. You can avoid running a DO loop for every observation by using the idea in the example below:
data _null_;
end_treatment_date = "09FEB2010"d;
y_1007 = 'D';
status = vvaluex(compress("y_" || substr(strip(year(end_treatment_date)), 3, 2) || put(1 + intck("week.2", "01JAN2010"d, end_treatment_date), z2.)));
put status;
run;
The variable name is constructed as follows: initially you take the string "y_", then append the last two digits of the year, followed by the week number computed with the same logic as your week_count variable. VVALUEX then returns the value of the variable with that name. Running the DO loop for every observation can be inefficient if you have millions of them.