Could anyone help with the translation of the following Stata code? I need this code for further analysis in SPSS.
if year<1990 {
bysort country year ID: egen sum080=sum(PY080g)
gen hydisp=(HY020+sum080)*HY025
}
else gen hydisp=HY020*HY025
I tried to solve the problem with the following SPSS code:
DO IF year<1990.
SORT CASES BY country year ID.
COMPUTE sum080 = SUM(PY080g).
COMPUTE hydisp=(HY020+sum080)*HY025.
ELSE.
COMPUTE hydisp=HY020*HY025.
END IF.
EXECUTE.
But this code appears to be wrong. Do you have any idea how to resolve the problem?
This particular use of egen in Stata can be replicated in SPSS by using the AGGREGATE command. Using Nick Cox's revised Stata code:
bysort country year ID: egen sum080 = sum(PY080g)
gen hydisp = (HY020 + sum080) * HY025 if year < 1990
replace hydisp = HY020 * HY025 if year >= 1990
An equivalent set of code in SPSS would be:
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK = country year ID
/sum080 = SUM(PY080g).
DO IF Year < 1990.
COMPUTE hydisp = (HY020+sum080)*HY025.
ELSE.
COMPUTE hydisp = HY020*HY025.
END IF.
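For readers who want to check the logic outside either package, here is a minimal Python sketch of the same two steps: a group total attached to every row (what AGGREGATE with MODE=ADDVARIABLES does), followed by a row-wise conditional. The sample rows and the function name add_hydisp are hypothetical, and missing values are assumed not to occur.

```python
from collections import defaultdict

# Hypothetical sample rows; field names mirror the variables in the question.
rows = [
    {"country": "AT", "year": 1989, "ID": 1, "PY080g": 100.0, "HY020": 1000.0, "HY025": 1.0},
    {"country": "AT", "year": 1989, "ID": 1, "PY080g": 50.0, "HY020": 1000.0, "HY025": 1.0},
    {"country": "AT", "year": 1991, "ID": 1, "PY080g": 70.0, "HY020": 2000.0, "HY025": 0.5},
]

def add_hydisp(rows):
    """Attach the group total sum080 to each row, then compute hydisp per row."""
    # Step 1: total PY080g within each (country, year, ID) group.
    totals = defaultdict(float)
    for r in rows:
        totals[(r["country"], r["year"], r["ID"])] += r["PY080g"]

    # Step 2: evaluate the year condition separately for every row.
    out = []
    for r in rows:
        sum080 = totals[(r["country"], r["year"], r["ID"])]
        if r["year"] < 1990:
            hydisp = (r["HY020"] + sum080) * r["HY025"]
        else:
            hydisp = r["HY020"] * r["HY025"]
        out.append({**r, "sum080": sum080, "hydisp": hydisp})
    return out

result = add_hydisp(rows)
```

The key point is that the total is computed for all rows first, and only the final formula depends on year.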
This is in no sense an answer on SPSS code, but it makes a point that would not go well in a comment.
The Stata code
if year < 1990 {
bysort country year ID: egen sum080=sum(PY080g)
gen hydisp=(HY020+sum080)*HY025
}
else gen hydisp=HY020*HY025
would get interpreted as
if year[1] < 1990 {
bysort country year ID: egen sum080=sum(PY080g)
gen hydisp=(HY020+sum080)*HY025
}
else gen hydisp=HY020*HY025
i.e. the branching is on the value of year in the first observation (case, record). The if command and the if qualifier are quite different constructs. It seems much more likely that the code desired is something like
bysort country year ID: egen sum080 = sum(PY080g)
gen hydisp = (HY020 + sum080) * HY025 if year < 1990
replace hydisp = HY020 * HY025 if year >= 1990
or
bysort country year ID: egen sum080 = sum(PY080g)
gen hydisp = cond(year < 1990, (HY020 + sum080) * HY025, HY020 * HY025)
The OP's comment that the code appears to be wrong is a poor problem report. What is wrong precisely? It may be nothing more than inability to replicate the results gained in Stata, which would not be surprising as the Stata code is almost certainly not what is intended. It seems unlikely that the first observation is special, but rather that the calculation should be carried out for all observations according to the value of year.
Detail: sum() as an egen function is undocumented in favour of total(), but the syntax remains legal.
Detail: The Stata code here would not be considered a loop just because there is a tacit loop over observations.
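The difference between the if command and the if qualifier can also be sketched in Python. This is only an illustration under stated assumptions: the rows are hypothetical, sum080 is taken as already computed, and the two function names are made up for the example.

```python
# Hypothetical rows; sum080 is assumed to have been computed already.
rows = [
    {"year": 1985, "HY020": 100.0, "sum080": 10.0, "HY025": 1.0},
    {"year": 1995, "HY020": 200.0, "sum080": 20.0, "HY025": 1.0},
]

def like_if_command(rows):
    """Branch once, on the first row only, then apply one formula to all rows."""
    if rows[0]["year"] < 1990:
        return [(r["HY020"] + r["sum080"]) * r["HY025"] for r in rows]
    return [r["HY020"] * r["HY025"] for r in rows]

def like_if_qualifier(rows):
    """Evaluate the condition separately for every row."""
    return [
        (r["HY020"] + r["sum080"]) * r["HY025"] if r["year"] < 1990
        else r["HY020"] * r["HY025"]
        for r in rows
    ]
```

With these rows the two functions disagree on the second row, because the first version never looks at its year.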
I currently work in SAS and utilise arrays in this way:
Data Test;
input Payment2018-Payment2021;
datalines;
10 10 10 10
20 20 20 20
30 30 30 30
;
run;
As I understand it, this automatically assumes a boundary, either the start of the year or the end of the year (please correct me if I'm wrong).
So, if I wanted to say that this is June data and payments are set to increase by 50% every 9 months, I'm looking for a way for my code to recognise that my years run from the end of June to the end of the next June.
For example, if I wanted to say
Data Payment_Pct;
set test;
lastpayrise = "31Jul2018";
array payment:
array Pay_Inc(2018:2021) Pay_Inc: ;
Pay_Inc2018 = 0;
Pay_Inc2019 = 2; /*2 because there are two increments in 2019*/
Pay_Inc2020 = 1;
Pay_Inc2021 = 1;
do I = 2018 to 2021;
if i = year(pay_inc) then payrise(i) * 50% * Pay_Inc(i);
end;
run;
It's all well and good for me to do this manually for one entry, but for my uni project I'll need the algorithm to work these out by itself. I am currently reading up on intck, but any help would be appreciated!
P.s. It would be great to have an algorithm that creates the following
Pay_Inc2019 Pay_Inc2020 Pay_Inc2021
1 2 1
Alternatively, it would be great to know how SAS sets up the array for 2018:2021: does it assume end of year, or can you set it to mid-year?
Regarding input Payment2018-Payment2021; there is no automatic assumption of yearness or calendaring. The numbers 2018 and 2021 are the bounds of a numbered range list:
In a numbered range list, you can begin with any number and end with any number as long as you do not violate the rules for user-supplied names and the numbers are consecutive.
The meaning of the numbers 2018 to 2021 is up to the programmer. You state the variables correspond to the June payment in the numbered year.
You would have to iterate a date using 9-month steps and increment a counter based on the year in which the date falls.
Sample code
Dynamically adapts to the variable names that are arrayed.
data _null_;
array payments payment2018-payment2021;
array Pay_Incs pay_inc2018-pay_inc2021; * must be same range numbers as payments;
* obtain variable names of first and last element in the payments array;
lower_varname = vname(payments(1));
upper_varname = vname(payments(dim(payments)));
* determine position of the range name numbers in those variable names;
lower_year_position = prxmatch('/\d+\s*$/', lower_varname);
upper_year_position = prxmatch('/\d+\s*$/', upper_varname);
* extract range name numbers from the variable names;
lower_year = input(substr(lower_varname,lower_year_position),12.);
upper_year = input(substr(upper_varname,upper_year_position),12.);
* prepare iteration of a date over the years that should be the name range numbers;
date = mdy(06,01,lower_year); * june 1 of year corresponding to first variable in array;
format date yymmdd10.;
do _n_ = 1 by 1; * repurpose _n_ for an infinite do loop with interior leave;
* increment by 9-months;
date = intnx('month', date, 9);
year = year(date);
if year > upper_year then leave;
* increment counter for year in which iterating date falls within;
Pay_Incs( year - lower_year + 1 ) + 1;
end;
put Pay_Incs(*)=;
run;
Increment counter notes
There is a lot to unpack in this statement:
Pay_Incs( year - lower_year + 1 ) + 1;
+ 1 at the end of the statement increments the addressed array element by 1; this is the syntax of the SUM statement:
variable + expression
The sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:
retain variable 0;
variable=sum(variable,expression);
year - lower_year + 1 computes the array base-1 index, 1..N, that addresses the corresponding variable in the named range list pay_inc<lower_year>-pay_inc<upper_year>
Pay_Incs( <computed index> ) selects the variable of the SUM statement
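The counting logic of the SAS step (start on 1 June of the first year, step forward 9 months at a time, and count one increment for the year each stepped date falls in) can be sketched in Python as a cross-check. The function names add_months and count_increments are hypothetical, and the helper assumes the day of month is 1, so no end-of-month clamping is needed.

```python
from datetime import date

def add_months(d, months):
    """Shift a date by whole months; assumes d.day exists in the target month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)

def count_increments(first_year, last_year, step_months=9, start_month=6):
    """Count how many 9-month steps land in each calendar year of the range."""
    counts = {year: 0 for year in range(first_year, last_year + 1)}
    current = date(first_year, start_month, 1)
    while True:
        current = add_months(current, step_months)
        if current.year > last_year:
            break
        counts[current.year] += 1
    return counts

increments = count_increments(2018, 2021)
```

As in the SAS code, the starting date itself is not counted; only the dates reached by stepping are.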
This is a wonderful use case of the intnx() function. intnx() will be your best friend when it comes to aligning dates.
In the traditional calendar, the year starts on 01JAN. In your calendar, the year starts on 01JUN, five months later. The shifted interval 'year.6' describes exactly this: a yearly interval that begins in the sixth month. Aligning a date to that interval lets you take the year part of the result and determine which year you are in under the new calendar.
data want;
format current_cal_year
current_new_year year4.
;
current_cal_year = intnx('year', '01JUN2018'd, 0, 'B');
current_new_year = intnx('year.6', '01JUN2018'd, 1, 'B');
run;
Note that we shifted current_new_year by one year. To illustrate why, let's see what happens if we don't shift it by one year.
data want;
format current_cal_year
current_new_year year4.
;
current_cal_year = intnx('year', '01JUN2018'd, 0, 'B');
current_new_year = intnx('year.6', '01JUN2018'd, 0, 'B');
run;
current_new_year shows 2018, but under this convention we really are in 2019. Without the shift, intnx() returns the calendar year in which the new year began, so every date from 01JUN2018 through 31MAY2019 is labelled 2018. By shifting it one year, we will always have the correct year associated with the date value. Look at it with different months of the year and you will see that the year part remains correct throughout.
data want;
format cal_month date9.
cal_year
new_year year4.
;
do i = 0 to 24;
cal_month = intnx('month', '01JAN2016'd, i, 'B');
cal_year = intnx('year', cal_month, 0, 'B'); * align to start of calendar year;
new_year = intnx('year.6', cal_month, 1, 'B'); * align to start of June-based year, shifted one year;
year_not_same = (year(cal_year) NE year(new_year) );
output;
end;
drop i;
run;
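The labelling rule that the shifted intnx() call produces can be written out directly. Here is a small Python sketch, assuming the convention used above (a year that starts on 1 June and is named after the calendar year in which it ends); the function name fiscal_year is hypothetical.

```python
from datetime import date

def fiscal_year(d, start_month=6):
    """Label a June-based year by the calendar year in which it ends."""
    return d.year + 1 if d.month >= start_month else d.year

# Dates on either side of the 1 June boundary.
labels = {
    "2018-05-31": fiscal_year(date(2018, 5, 31)),
    "2018-06-01": fiscal_year(date(2018, 6, 1)),
    "2019-05-31": fiscal_year(date(2019, 5, 31)),
    "2019-06-01": fiscal_year(date(2019, 6, 1)),
}
```

Dates from January to May keep their calendar year, while dates from June to December are pushed into the next one.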
I have a table like this:
Year Month Code Amount
---------------------------------------
2017 11 a 7368
2017 11 b 3542
2017 12 a 4552
2017 12 b 7541
2018 1 a 6352
2018 1 b 8376
2018 2 a 1287
2018 2 b 3625
I made a slicer based on Year and Month (ignoring Code), and I want to show the SUM of Amount like this:
If I select Year 2017 and Month 12 on the slicer, the value shown should be the SUM of Amount for 2017-11; selecting Year 2018 and Month 1 should show the SUM of Amount for 2017-12.
I tried this as a test, but it is not allowed:
Last Month = CALCULATE(SUM(Table[Amount]); Table[Month] = SELECTEDVALUE(Table[Month]) - 1)
How to do it right?
I want something like this
NB: I use DirectQuery to SQL Server.
Update: So far, I have added a Last_Amount column to the SQL Server table using a subquery; maybe you have a better way to solve this.
The filters in a CALCULATE statement are only designed to take simple statements that don't have further calculations involved. There are a couple of possible remedies.
1. Use a variable to compute the previous month number before you use it in the CALCULATE function.
Last Month =
VAR PrevMonth = SELECTEDVALUE(Table[Month]) - 1
RETURN CALCULATE(SUM(Table[Amount]), Table[Month] = PrevMonth)
2. Use a FILTER() function. This is an iterator that allows more complex filtering.
Last Month = CALCULATE(SUM(Table[Amount]),
FILTER(ALL(Table),
Table[Month] = SELECTEDVALUE(Table[Month]) - 1))
Edit: Since you are using year and month, you need to have a special case for January.
Last Month =
VAR MonthFilter = MOD(SELECTEDVALUE(Table[Month]) - 2, 12) + 1
VAR YearFilter = IF(MonthFilter = 12,
SELECTEDVALUE(Table[Year]) - 1,
SELECTEDVALUE(Table[Year]))
RETURN CALCULATE(SUM(Table[Amount]),
Table[Month] = MonthFilter,
Table[Year] = YearFilter)
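The January special case is easy to check by hand. Below is a minimal Python sketch of the same arithmetic, applied to the table from the question; the function names previous_period and last_month_total are hypothetical, and the modulo step assumes (as in Python) that the result takes the sign of the divisor.

```python
# The table from the question as (year, month, code, amount) tuples.
table = [
    (2017, 11, "a", 7368), (2017, 11, "b", 3542),
    (2017, 12, "a", 4552), (2017, 12, "b", 7541),
    (2018, 1, "a", 6352), (2018, 1, "b", 8376),
    (2018, 2, "a", 1287), (2018, 2, "b", 3625),
]

def previous_period(year, month):
    """Return (year, month) of the month before the given one."""
    prev_month = (month - 2) % 12 + 1  # 1 -> 12, 2 -> 1, ..., 12 -> 11
    prev_year = year - 1 if prev_month == 12 else year
    return prev_year, prev_month

def last_month_total(year, month):
    """Sum Amount over all codes for the month before the selected one."""
    target = previous_period(year, month)
    return sum(amount for y, m, _code, amount in table if (y, m) == target)
```

Selecting 2018 and month 1 resolves to December 2017, which is the case the plain "month minus one" filter misses.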
I am trying to create a loop in Stata. I run a model for the data <= year and <= quarter, then predict one step ahead. That is, the model is run on all time points up to the loop's current quarter, while the prediction happens in the next quarter, out of sample. So my question is how to handle the year boundary, so that when yridx = 2000 and qtridx = 4, the look-ahead quarter inside the loop is year = 2001 and qtr = 1.
foreach yridx of numlist 2000/2012 {
forvalues qtridx = 1/4 {
regress Y X if year <= yridx and qtr <= qtridx
predict
}
}
It sounds as if it would be much easier to work in terms of quarterly dates. Here is one of several ways to do it.
gen qdate = yq(year, qtr)
forval m = `=yq(2000,1)'/`=yq(2012, 4)' {
regress Y X if qdate <= `m'
predict <whatever>
}
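The reason a single quarterly date removes the year-boundary problem is that quarters become consecutive integers. A short Python sketch of that encoding follows, assuming Stata's convention that quarterly dates count quarters since 1960q1; the Python function names yq and from_yq are hypothetical stand-ins written for this example.

```python
def yq(year, quarter):
    """Quarters elapsed since 1960q1, so 1960q1 maps to 0."""
    return (year - 1960) * 4 + (quarter - 1)

def from_yq(index):
    """Inverse mapping: quarterly index back to (year, quarter)."""
    return 1960 + index // 4, index % 4 + 1

# The quarter after 2000q4 is simply the next integer.
next_after_2000q4 = from_yq(yq(2000, 4) + 1)

# Every (fit-up-to, predict-for) pair in the range used by the loop.
pairs = [(from_yq(m), from_yq(m + 1)) for m in range(yq(2000, 1), yq(2012, 4) + 1)]
```

Each pair gives the last quarter included in the estimation sample and the out-of-sample quarter to predict.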
I am encountering some difficulty with a dataset that I am analyzing with Stata. The dataset I have is a repeated cross section of the following form:
Individual Year Age VarA VarB VarC
Variable C has been calculated for each individual by year, using the egen command. As a result, this variable is year specific. I now want to match the value of this variable corresponding to the year when each individual was x years old. (I create this new variable by the transform variableD=Year-Age+x).
I want to match the value of Variable C that was obtained in the year "variableD" for each individual.
Here's an example of how to do this with a user-written xfill:
net install xfill, from("http://www.sealedenvelope.com/")
webuse nlswork, clear
duplicates drop idcode age, force
gen x=20 if mod(idcode,2)==1
replace x=25 if mod(idcode,2)!=1
bys idcode year: egen var_c = mean(ln_wage)
bys idcode: gen var_c_at_x = var_c if age == x
xfill var_c_at_x, i(idcode)
edit idcode ln_wage year age x var_c*
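The idea behind this approach (pick the value of var_c in the row where age equals x, then copy it to all rows of the same individual) can be sketched in plain Python. The rows and the function name fill_value_at_age are hypothetical; individuals never observed at age x get None.

```python
# Hypothetical rows: one dict per individual-year observation.
rows = [
    {"id": 1, "age": 19, "x": 20, "var_c": 1.0},
    {"id": 1, "age": 20, "x": 20, "var_c": 1.5},
    {"id": 1, "age": 21, "x": 20, "var_c": 2.0},
    {"id": 2, "age": 24, "x": 25, "var_c": 3.0},
    {"id": 2, "age": 26, "x": 25, "var_c": 3.5},
]

def fill_value_at_age(rows):
    """Copy var_c from the row where age == x to every row of that individual."""
    # First pass: remember var_c for each individual at the target age.
    at_target = {r["id"]: r["var_c"] for r in rows if r["age"] == r["x"]}
    # Second pass: spread that value over all rows of the individual.
    return [{**r, "var_c_at_x": at_target.get(r["id"])} for r in rows]

filled = fill_value_at_age(rows)
```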
I feel like I'm making this harder than it should be. I'm trying to display a sum for month C but this sum must include totals for months A, B & C. Then I need to do the same thing for Month D which includes totals for months B, C & D. Once I have this figured out I need to break it down by individual accounts but that part shouldn't be too difficult.
I have a date table to call on but it doesn't have month start or end dates which seems to be causing my difficulty.
So the solution to the above issue is to use a CASE expression in the join condition, identifying the date range accepted for each time period.
Select *
FROM A
LEFT JOIN B ON CASE WHEN a.DateID >= b.PeriodStart AND a.DateID <= b.PeriodEnd THEN 1 ELSE 0 END = 1
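The effect of such a range join, where each month's figure includes the two preceding months, can be sketched in Python. The monthly totals below are hypothetical, as are the function names; months missing from the data are assumed to contribute zero.

```python
# Hypothetical monthly totals keyed by (year, month).
monthly_totals = {(2017, 11): 10, (2017, 12): 20, (2018, 1): 30, (2018, 2): 40}

def month_index(year, month):
    """Map (year, month) to a consecutive integer so windows can cross years."""
    return year * 12 + (month - 1)

def trailing_totals(monthly_totals, window=3):
    """For each month, sum its total with those of the preceding window-1 months."""
    by_index = {month_index(y, m): v for (y, m), v in monthly_totals.items()}
    result = {}
    for (y, m) in monthly_totals:
        end = month_index(y, m)
        result[(y, m)] = sum(by_index.get(i, 0) for i in range(end - window + 1, end + 1))
    return result

rolling = trailing_totals(monthly_totals)
```

Working on a consecutive month index plays the same role as PeriodStart and PeriodEnd in the join: it defines which months belong to each period.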