I need to print the fuel consumption and mileage of a car in SAS, given that the mileage is 20 miles per gallon.
It should stop generating output when fuel reaches 10 gallons OR the car has travelled 250 miles.
My code:
data milage;
fuel=1;
do while (fuel<11);
miles = fuel*20;
fuel+1;
output;
end;
run;
My output:
[Screenshot: code output]
The fuel value needs to start from 1 for the first 20 miles, which is incorrect in my code. Any suggestions on what I am missing here?
Thanks!!
Add an explicit OUTPUT for the first line or start at 0 instead. If
you start at 0, make sure the order of the fuel and miles
calculations is correct.
Change your loop condition to be <10 and add in the MILES criteria
as well. In this case you're only looping if fuel<10 AND the miles
lt 250. Make sure the boundaries are what you want.
data milage;
fuel=0; miles=0;
do while (fuel<10 and miles lt 250);
fuel+1;
miles = fuel*20;
output;
end;
run;
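To sanity-check which rows the corrected loop should produce, the same logic can be sketched outside SAS. This is a minimal Python sketch, not a replacement for the DATA step; the constant names are made up for illustration.

```python
# Mirror of the DATA step loop: increment fuel first, derive miles, then emit the row.
MILES_PER_GALLON = 20   # given in the question
MAX_FUEL = 10           # stop once 10 gallons are used
MAX_MILES = 250         # or once 250 miles are travelled

def mileage_rows():
    rows = []
    fuel, miles = 0, 0
    # Loop only while BOTH limits are still unmet (same as fuel<10 and miles lt 250).
    while fuel < MAX_FUEL and miles < MAX_MILES:
        fuel += 1
        miles = fuel * MILES_PER_GALLON
        rows.append((fuel, miles))
    return rows

rows = mileage_rows()
for fuel, miles in rows:
    print(fuel, miles)
```

With these numbers the fuel limit is reached first (10 gallons is 200 miles), so the miles condition never stops the loop; it would only matter with a higher mileage or a lower miles limit.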
I am editing my original question to simplify the problem statement:
I need to create a dataset that contains the principal paydown schedule of a security, which is split into 3 tranches. For each period for the security, I need to calculate the ending balances of principal owed for each tranche. For period 0 (i.e. starting period), I already have the balances owed. For subsequent periods, I need to take the balances from the previous periods and subtract the principal paid down in the current period. The same logic should continue through the last period.
In my SAS code, I am able to get period 1 to do the calculations correctly, but the balances from period 1 don't correctly make it into period 2, causing the calculation to break from that point onwards. I know lag or its placement is what is not working correctly. I am not able to figure out where to place it, or how to use retain (if not lag), such that my balances go from one row to the next.
%let n_t=3;
data xyz;
INFILE DATALINES DLM='#';
input ID $6. period PrincipalPaid best12.2;
datalines;
ABC123#00#0.0
ABC123#01#4.0
ABC123#02#3.92
ABC123#03#3.84
ABC123#04#3.76
ABC123#05#3.69
ABC123#06#3.62
ABC123#07#3.54
;run;
data xyz2;
set xyz;
by id;
if period=0 then do;
Bal1= 120;
Bal2= 8;
Bal3= 2;
end;
/*Code to push all starting balances from period 0 to 1*/
array prev_bal{&N_t.} prev_bal1-prev_bal&n_t.;
array bal{&N_t.} bal1-bal&n_t.;
do i=1 to &N_t.;
prev_bal{i}=lag(bal{i});
end;
/*code to calculate balances for periods >=1*/
if period>=1 then do;
array PrincipalPayDown{&N_t.} PrincipalPayDown1-PrincipalPayDown&N_t.;
do i = 1 to &N_t. ;
PrincipalPayDown{i}=round(PrincipalPaid*prev_bal{i}/sum(of prev_bal:),0.01);
bal{i}=max(prev_bal{i}-PrincipalPayDown{i},0);
end;
end;
drop i ;
run;
proc sql;
create table final as
select
id,period,PrincipalPaid,prev_bal1,prev_bal2,prev_bal3,
PrincipalPayDown1,PrincipalPayDown2,PrincipalPayDown3,Bal1,Bal2,Bal3
from xyz2;
quit;
I am also adding a picture of the final dataset with the correct output calculated in Excel. I want SAS to give me the same output for periods >=2.
Screenshot showing correct output in Excel
Thanks for taking the time to help me out.
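The carry-forward logic described in the question can be sketched as follows. This is a Python illustration of the arithmetic only, under the assumption that each tranche pays down pro rata to its previous balance (as the posted formula suggests); the function name is hypothetical. In SAS, a value computed on one row is usually carried to the next with RETAIN, because LAG returns what was passed to the previous LAG call rather than the previous row's final value. Note also that Python's round can differ from SAS ROUND on exact ties.

```python
def paydown_schedule(start_balances, principal_paid):
    """Return one list of ending balances per period, starting with period 0."""
    balances = list(start_balances)
    schedule = [list(balances)]          # period 0: balances are given
    for paid in principal_paid:          # periods 1, 2, ...
        total_prev = sum(balances)
        new_balances = []
        for prev in balances:
            # Pro-rata share of this period's principal, rounded to cents.
            share = round(paid * prev / total_prev, 2) if total_prev > 0 else 0.0
            new_balances.append(max(prev - share, 0.0))
        balances = new_balances          # carried into the next period
        schedule.append(list(balances))
    return schedule

schedule = paydown_schedule([120, 8, 2], [4.0, 3.92, 3.84, 3.76, 3.69, 3.62, 3.54])
for period, row in enumerate(schedule):
    print(period, [round(b, 2) for b in row])
```

The key point is that `balances` is updated at the end of each period and read at the start of the next, which is the behaviour the LAG-based code is missing from period 2 onwards.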
Basically, I would like to generate a random number from 0 to 1, 15,000 times and if the generated value is below .25, then I would like it to display 0 in that spot in the table. If it is greater than .25, I would like to keep the original value.
Any tips on what I should use in the asterisk part? The code is pasted below in the format I used:
data rand_data;
call streaminit(123);
do i = 1 to 15000;
u = rand('Uniform');
***if u < .25 then do;
0
else***
output;
end;
run;
proc print data= rand_data;
run;
It sounds like you want to replace values that are less than 0.25 with 0.
if u < 0.25 then u=0;
output;
If instead you want to ignore the values that are less than 0.25 then make the OUTPUT statement conditional.
if u >= 0.25 then output;
Given your code structure this could result in fewer than 15,000 observations.
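The difference between the two variants is easy to see side by side. Here is a minimal Python sketch of both; Python's generator is not the same as SAS's, so the actual values will differ from RAND('Uniform') even with the same seed number.

```python
import random

random.seed(123)                      # fixed seed so the run is repeatable
draws = [random.random() for _ in range(15000)]

# Variant 1: keep every row, but replace values below 0.25 with 0.
replaced = [0 if u < 0.25 else u for u in draws]

# Variant 2: drop rows below 0.25 entirely (the conditional OUTPUT).
kept = [u for u in draws if u >= 0.25]

print(len(replaced), len(kept))
```

The first list always has 15,000 entries; the second has only as many as passed the threshold.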
I'm working on a payroll calculation and I have to get it to loop
while name= karen:
name = (input("enter employees name"))
if karen>0:
hours=float(input("enter hours worked"))
rate=float(input('enter hourly rate'))
total= (hours)*(rate)+(overtimetotal)
print (total)
# when over time
if hours>40:
overtimerate=1.50
overtimetotal= (total)* (overtimerate)
print (overtimetotal)
my instructor said "There should not be an input for overtimehours. You are calculating it by taking the hours-40
You need to calculate the total as total=hours*rate+overtimetotal
You need to have a while loop to checking the name !="0"
everything in the loop block will be indented."
I'm lost as to what to write to start the loop.
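One way to structure what the instructor describes is to read the name once before the loop and again at the end of each pass, so the `while name != "0"` test always sees a fresh value. The sketch below is only an illustration: the function names are made up, and it assumes overtime hours earn an extra half of the hourly rate on top of `hours*rate` (check the assignment for the intended overtime rule).

```python
OVERTIME_THRESHOLD = 40      # hours before overtime applies
OVERTIME_PREMIUM = 0.5       # assumption: overtime hours earn an extra half of the rate

def compute_total(hours, rate):
    """total = hours*rate + overtimetotal, with overtime hours derived as hours-40."""
    overtime_hours = max(hours - OVERTIME_THRESHOLD, 0)
    overtimetotal = overtime_hours * rate * OVERTIME_PREMIUM
    return hours * rate + overtimetotal

def run_payroll(read=input, write=print):
    """Keep asking for employees until the name "0" is entered."""
    name = read("enter employee's name (0 to quit): ")
    while name != "0":
        # Everything that belongs to one employee is indented inside the loop.
        hours = float(read("enter hours worked: "))
        rate = float(read("enter hourly rate: "))
        write(name, compute_total(hours, rate))
        name = read("enter employee's name (0 to quit): ")
```

Calling `run_payroll()` starts the interactive loop. The `read` and `write` parameters exist only so the loop can be exercised with scripted answers instead of the keyboard.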
I'm working in SAS and I'm trying to sum all observations, leaving out one each time.
For example, if I have:
Count Name Grade
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
I want to output a value for Sam that is the sum of all grades but his own, and a value for Adam that is a sum of all grades but his own - etc.
Any ideas? Thanks!
You can do it in a single PROC SQL step instead, using the CALCULATED keyword:
data have;
input Count Name $ Grade;
datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;;;;
run;
proc sql;
create table want as
select *, sum(grade) as all_grades, calculated all_grades-grade as minus_grade
from have;
quit;
Here's a nearly one pass solution (it will be about the same speed as a one pass solution if the dataset fits in the read buffer). I actually calculate the mean here instead of just the sum, as I feel that's a more interesting result (and the sum is of course the mean without the division).
data have;
input Count Name $ Grade;
datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;;;;
run;
data want;
retain grademean;
if _n_=1 then do;
do _n_ = 1 to nobs_have;
set have(keep=grade) point=_n_ nobs=nobs_have;
gradesum+grade;
end;
grademean=gradesum/nobs_have;
end;
set have;
grade_noti = ((grademean*nobs_have)-grade)/(nobs_have-1);
run;
Calculate the mean, then for each record subtract the portion that record contributed to the mean. This is a super useful technique for stat testing when you want to compare a record to the rest of the population, and you have a complicated class combination where you'd rather do the mean first. In those cases you use PROC MEANS first and then merge it on, then do this subtraction.
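The subtraction step can be checked by hand with the sample grades. A minimal Python sketch of the same formula (variable names chosen here for illustration):

```python
grades = [90, 100, 80, 60, 70]          # Sam, Adam, John, Max, Andrea
n = len(grades)
grade_mean = sum(grades) / n            # mean over everyone

# Remove each record's own contribution, then average over the remaining n-1.
mean_of_others = [(grade_mean * n - g) / (n - 1) for g in grades]
print(mean_of_others)
```

For Sam this gives (400 - 90) / 4 = 77.5, the mean of the other four grades.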
proc sql;
create table temp as select
sum(grade) as all_grades
from orig_data;
quit;
proc sql;
create table temp2 as select
a.count,
a.name,
a.grade,
(b.all_grades-a.grade) as sum_other_grades
from orig_data a
cross join temp b; /* temp has a single row, so every student row picks up all_grades */
quit;
I haven't tested it, but the above should work. It creates a new dataset temp that holds the sum of all grades and joins it back to create a new table with the sum of all grades less the current student's grade as sum_other_grades.
This solution takes each observation of your starting dataset and then loops through the same dataset, summing up grade values for any records with different names. So, beginning with 'Sam', we only add to the oth_g variable when we find names that are NOT 'Sam':
data want;
set have;
oth_g=0;
do i=1 to n;
set have
(keep=name grade rename=(name=name_loop grade=grade_loop))
nobs=n point=i;
if name^=name_loop then oth_g+grade_loop;
end;
drop grade_loop name_loop i n;
run;
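The nested loop compares by name, which has one consequence worth knowing: if two records share a name, both are left out of each other's sums. A small Python sketch of the same loop makes this visible (the extra "Sam" row is hypothetical, added only to show the effect):

```python
have = [("Sam", 90), ("Adam", 100), ("John", 80), ("Max", 60), ("Andrea", 70)]

def sum_others_by_name(rows):
    result = []
    for name, _ in rows:                      # outer pass: one result per record
        oth_g = 0
        for name_loop, grade_loop in rows:    # inner pass: re-read the whole dataset
            if name != name_loop:
                oth_g += grade_loop
        result.append((name, oth_g))
    return result

print(sum_others_by_name(have))

# Hypothetical duplicate name: the second Sam's 50 is excluded from the first Sam's sum too.
dup = have + [("Sam", 50)]
print(sum_others_by_name(dup)[0])
```

If names are not unique, comparing on a key such as Count, or subtracting each record's own grade from the overall total, avoids this.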
This is a slight modification to the answer @Reese provided above.
proc sql;
create table want as
select *,
(select sum(grade) from have) as all_grades,
calculated all_grades - grade as minus_grade
from have;
quit;
I've rearranged it this way to avoid the below message being printed to the log:
NOTE: The query requires remerging summary statistics back with the original data.
If you see the above message, it almost always means that you have made a mistake. If you actually did mean to remerge summary stats back with the original data, you should do so explicitly (like I have done above by refactoring @Reese's query).
Personally I think the refactored version is also easier to understand.
I am still new at SAS and I was wondering how I can do the following:
Say that I have a database with the following info:
Time_during_the day date prices volume_traded
930am sep02 42 300
10am sep02 41 200
..4pm sep02 40 200
930am sep03 40 500
10am sep03 41 100
..4pm sep03 40 350
.....
What I want is to take the average of the total daily volume and divide this number by 50 (always). So say avg. daily vol./50 = V; what I want is to record the price/time/date at every interval of size V. Now, say that V=500: I start by recording the first price, time, and date in my database and then record the same info 500 units of traded volume later. It is possible that on a given day the traded volume is, say, 300, and half of it completes the current interval of V=500 while the other 150 goes toward filling the following interval.
How can I get this information in one database?
Thank you!
Assume your input dataset is called tick_data, and that it is sorted by both date and time_during_the_day. Then here's what I got:
%LET n = 50;
/* Calculate V - the breakpoint size */
PROC SUMMARY DATA=tick_data;
BY date;
OUTPUT OUT = temp_1
SUM (volume_traded)= volume_traded_agg;
RUN;
DATA temp_2 ;
SET temp_1;
V = volume_traded_agg / &n;
RUN;
/* Merge it into original dataset so that it is available */
DATA temp_3;
MERGE tick_data temp_2;
BY date;
RUN;
/* Final walk through tick data to output at breakpoints */
DATA results
/* Comment out the KEEP to see what is happening under the hood */
(KEEP=date time_during_the_day price volume_traded)
;
SET temp_3;
/* The IF FIRST will not work without the BY below */
BY date;
/* Stateful counters */
RETAIN
volume_cumulative
breakpoint_next
breakpoint_counter
;
/* Reset stateful counters at the beginning of each day */
IF (FIRST.date) THEN DO;
volume_cumulative = 0;
breakpoint_next = V;
breakpoint_counter = 0;
END;
/* Breakpoint test */
volume_cumulative = volume_cumulative + volume_traded;
IF (breakpoint_counter <= &n AND volume_cumulative >= breakpoint_next) THEN DO;
OUTPUT;
breakpoint_next = breakpoint_next + V;
breakpoint_counter = breakpoint_counter + 1;
END;
RUN;
The key SAS language feature to keep in mind for the future is the use of BY, FIRST, and RETAIN together. This enables stateful walks through data like this one. Conditional OUTPUT also figures here.
Note that whenever you use BY <var>, the dataset must be sorted on a key that includes <var>. In the case of tick_data and all intermediate temporary tables, it is.
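For readers who want to trace the stateful walk by hand, here is a rough Python equivalent of the final DATA step. It is a sketch only: the function name and the tuple layout are made up, and `groupby` plays the role of BY/FIRST.date on data that is already sorted by date.

```python
from itertools import groupby

def breakpoint_rows(ticks, n):
    """ticks: list of (date, time, price, volume), sorted by date then time."""
    results = []
    for _date, day in groupby(ticks, key=lambda t: t[0]):
        day = list(day)
        v = sum(t[3] for t in day) / n      # per-day V, as computed in temp_2
        # Counters are reset at the start of each day (the FIRST.date block).
        volume_cumulative = 0
        breakpoint_next = v
        breakpoint_counter = 0
        for tick in day:
            volume_cumulative += tick[3]
            if breakpoint_counter <= n and volume_cumulative >= breakpoint_next:
                results.append(tick)        # conditional OUTPUT
                breakpoint_next += v
                breakpoint_counter += 1
    return results

ticks = [
    ("sep02", "0930", 42, 300), ("sep02", "1000", 41, 200), ("sep02", "1600", 40, 200),
    ("sep03", "0930", 40, 500), ("sep03", "1000", 41, 100), ("sep03", "1600", 40, 350),
]
picked = breakpoint_rows(ticks, n=2)
for row in picked:
    print(row)
```

With the six sample ticks and n=2 (a small value chosen here so the breakpoints are easy to follow), the rows picked are 10am and 4pm on sep02 and 930am and 4pm on sep03. As in the SAS version, at most one row is emitted per tick, even if a single large trade crosses more than one breakpoint.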
Additional: Alternative V
In order to make V equal the (average total daily volume / n), replace the matching code block above with this one:
. . . . . .
/* Calculate V - the breakpoint size */
PROC SUMMARY DATA=tick_data;
BY date;
OUTPUT OUT = temp_1
SUM (volume_traded)= volume_traded_agg;
RUN;
PROC SUMMARY DATA = temp_1;
OUTPUT OUT = temp_1a
MEAN (volume_traded_agg) =;
RUN;
DATA temp_2 ;
SET temp_1a;
V = volume_traded_agg / &n;
RUN;
/* Merge it into original dataset so that it is available */
DATA temp_3 . . . . . .
. . . . . .
Basically you just insert a second PROC SUMMARY to take the mean of the sums. Notice how there is no BY statement because we're averaging over the whole set, not by any groupings or buckets. Also notice the MEAN (...) = without a name after the =. That will make the output variable have the same name as the input variable.
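The two-stage aggregation (sum per day, then mean of those sums) can be illustrated with a short Python sketch. The function name and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict

def alternative_v(ticks, n):
    """ticks: iterable of (date, time, price, volume). Returns mean daily volume / n."""
    daily_totals = defaultdict(int)
    for date, _time, _price, volume in ticks:
        daily_totals[date] += volume                 # first PROC SUMMARY: sum BY date
    mean_daily_volume = sum(daily_totals.values()) / len(daily_totals)  # second: mean of sums
    return mean_daily_volume / n

ticks = [
    ("sep02", "0930", 42, 300), ("sep02", "1000", 41, 200), ("sep02", "1600", 40, 200),
    ("sep03", "0930", 40, 500), ("sep03", "1000", 41, 100), ("sep03", "1600", 40, 350),
]
v = alternative_v(ticks, n=50)
print(v)
```

Here the daily totals are 700 and 950, their mean is 825, and V = 825 / 50 = 16.5. Unlike the per-day version, this single V applies to every date.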