Calculating standard deviations of past entries for all entries for multiple variables - loops

My dataset looks like this:
date var1 var2 var3
01/01/2000 20 . .
02/01/2000 15 . .
03/01/2000 3 . .
. . . .
. . . .
. . . .
26/01/2023 3 . .
I want to generate variables which measure the standard deviation of a variable for a window 90 days before an entry. For example, I want a variable which tells me that in the 90 days before 26/01/2023 the standard deviation of entries of variable 1 was x. I want that for each observation for each variable. I don't need the sd for the first 90 entries.
Furthermore, I have a large set of variables, so I would like to do this using a foreach loop. I would appreciate any input.

Your data example leaves ambiguous whether your dates are Stata daily dates (which you need) or strings or indeed integers with value labels.
rangestat from SSC can do this. As far as it is concerned, you could specify all you want on the command line. A loop isn't essential to do this.
Here is a reproducible example.
webuse grunfeld, clear
foreach v in invest mvalue kstock {
    rangestat (count) count_`v' = `v' (sd) sd_`v' = `v', int(year -5 -1) by(company)
}
In your case, you want -90 not -5, clearly. You may not need an analogue of by(company).
Asking for a count too is always a good idea in practice.
This just takes a date variable literally. In the case of daily data you often need to think hard about definitions, especially if there are no values for weekends and holidays.
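A minimal sketch of the daily-date case, assuming the dates arrive as day/month/year strings. The data values and the names strdate, ddate, var1 and var2 are made up for illustration. It asks for all variables in a single rangestat call, so no loop is needed. rangestat is community-contributed and has to be installed from SSC first.

```stata
* Hypothetical data in the question's layout, with dates held as strings
clear
input str10 strdate var1 var2
"01/01/2000" 20 5
"02/01/2000" 15 7
"03/01/2000"  3 9
"04/01/2000"  8 4
end

* Convert day/month/year strings to a Stata daily date
generate ddate = date(strdate, "DMY")
format ddate %td

* rangestat comes from SSC; install it only if it is missing
capture which rangestat
if _rc ssc install rangestat

* SD and count over the 90 days before each date, all variables in one call
rangestat (sd) var1 var2 (count) var1 var2, interval(ddate -90 -1)
list
```

The interval runs from 90 days before to 1 day before each observation's own date, so the current entry is excluded. The count variables show how many entries each standard deviation is based on, which matters for the earliest dates.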

Reordering dataset in stata

We have a dataset of 1222x20 in Stata.
There are 611 individuals, such that each individual is in 2 rows of the dataset. There is only one variable of interest in each second row of each individual that we would like to use.
This means that we want a dataset of 611x21 that we need for our analysis.
It might also help if we could discard each odd/even row, and merge it later.
However, my Stata skills let me down at this point and I hope someone can help us.
Maybe someone knows any command or menu option that we might give a try.
In case someone knows suitable code: the individuals are identified by the variable rescode, and the variable of interest on the second row is called enterprise.
Below, the head of our dataset is given. There is a binary time variable followup. We want to regress enterprise (yes/no) at follow-up, as the dependent variable, on enterprise at baseline, as an independent variable.
We have tried something like this:
reg enterprise(if followup="Folowup") i.aimag group loan_baseline eduvoc edusec age16 under16 marr_cohab age age_sq buddhist hahl sep_f nov_f enterprise(if followup ="Baseline"), vce(cluster soum)
followup is a numeric variable with value labels, as its colouring in the Data Editor makes clear, so you can't test its values directly for equality or inequality with literal strings. (And if you could, the match needs to be exact, as Folowup would not be read as implying Followup.)
There is a syntax for identifying observations by value labels: see [U] 13.11 in the pdf documentation or https://www.stata-journal.com/article.html?article=dm0009.
However, it is usually easiest just to use the numeric value underneath the value label. So if the variable followup had numeric values 0 and 1, you would test for equality with 0 or 1.
You must use == not = for testing for equality here:
... if followup == 1
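A minimal sketch of how this could fit together, on made-up data with two rows per respondent. It assumes followup is coded 0 for baseline and 1 for follow-up, so baseline sorts first within each rescode. The label name fu and the new variable enterprise_base are hypothetical.

```stata
* Hypothetical miniature of the data: two rows per respondent
clear
input rescode followup enterprise
1 0 0
1 1 1
2 0 1
2 1 1
3 0 0
3 1 0
end
label define fu 0 "Baseline" 1 "Followup"
label values followup fu

* Copy each respondent's baseline value onto both of their rows
bysort rescode (followup): generate enterprise_base = enterprise[1]

* Test the numeric value underneath the label, using == ...
count if followup == 1

* ... or identify the observations by value label instead
count if followup == "Followup":fu

* The regression would then use the follow-up rows only, e.g.
* regress enterprise enterprise_base ... if followup == 1, vce(cluster soum)
```

With the baseline value carried onto the follow-up row, restricting to followup == 1 leaves one row per individual without dropping or merging anything.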
For any future Stata questions, please see the Stata tag wiki for detailed advice on how to present data. Screenshots are usually difficult to read and impossible to copy, and leave many details about the data obscure.

how to compute multiple variables using loop

In the dataset, there are two columns, "start_year" and "end_year", indicating the years in which a patient started and ended registration at the GP clinic. I want to know whether each patient was registered at the clinic in each year from 1990 to 2019, probably by computing 30 new variables (1 = yes, 0 = no), one for each year.
I used ifelse (R) to compute the variables one by one:
test$pt_1990<-ifelse(test$start_year<=1990 & 1990<=test$end_year,1,0)
I hope a loop offers a better solution than writing 30 lines of the same code. Thank you very much.

Cleaning up variable with highly similar observations

I have a dataset in Stata with a variable called "program description" whose values are often very similar but follow no pattern. My objective is to clean the variable so that values which are very similar get the same name.
Here is an example of what the variable looks like:
Variable Name
phys ed
physical education
phys ed k-12
learning disabilities
learn dis
learn disable
Therefore, I would like the first three to just be called "phys ed" (or some derivative of that) and the last three to just be called "learning disabilities"
I've been using the function strpos() to replace observations that contain certain phrases but because the variable has 100k observations and a lot of different names, this takes a while.
You can use strgroup from SSC, but it's unlikely to get you all the way there. For example, this seems to work:
. strgroup string , gen(group) threshold(.7) normalize(longer)
. list, clean noobs
string                   group
phys ed                      1
physical education           1
phys ed k-12                 1
learning disabilities        2
learn dis                    2
learn disable                2
However, "physics" would have been mapped to group 1 with these settings. Also, note that this command is case sensitive, so it might make sense to uppercase/lowercase everything first. The threshold is really a kind of tuning parameter.
I've also had some luck with Google/Open Refine with these problems. This is called reconciliation.
With all these approaches, some standardization goes a long way.
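As a rough illustration of that kind of standardization, here is a sketch using only built-in string functions, which could be run before any fuzzy matching. The variable names progdesc and clean and the example values are made up.

```stata
* Hypothetical messy values
clear
input str30 progdesc
"Phys Ed"
"  physical   education "
"PHYS ED K-12"
"Learn. Dis."
end

* Lowercase everything, since the matching is case sensitive
generate clean = lower(progdesc)

* Drop full stops, then remove leading, trailing and repeated blanks
replace clean = subinstr(clean, ".", "", .)
replace clean = stritrim(strtrim(clean))

list, clean noobs
```

Exact duplicates that differed only in case, punctuation or spacing now collapse to the same value, which leaves less work for the threshold to do.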

SPSS Identifying Different Lagged Values Through Loops

I have a dataset with two variables, week and brand_chosen, where brand_chosen designates which product (e.g. from a supermarket) was chosen. It looks like this:
Week brand_chosen
2 19
2 15
2 50
2 12
3 19
3 16
3 50
4 77
4 19
What I am trying to do is, for each line, to note the week in which the brand was purchased and check whether the same brand was also purchased in the week before. If it was, a variable dummy would take the value 1, otherwise 0.
Because each week appears multiple times, I cannot just take lag(week,1), so I probably need to loop back through the previous cases until I find the first different value of week.
This is what I tried:
loop i=1 to 70.
do if (week<>lag(week,i) and brand_chosen=lag(brand_chosen,i)).
compute dummy=1.
end loop.
else.
compute dummy=0.
end if.
end loop.
execute.
Where 70 is just an arbitrary number so that I am sure that it will check all the previous cases.
I run into two problems with that. First, as far as I understand, the lag function needs a literal number as its second argument, and "i" is not treated as a number here.
Second, I would like to exit the loop once the condition is satisfied and move on to the next case, but I get an error.
I am new to SPSS syntax and I am struggling with this one, so any help is greatly appreciated.
I assume that every combination of week--brand_chosen is unique. In this case the solution is quite simple. Just reorder your dataset by brand_chosen and then week, and then run a simple lag command.
This should do the trick:
SORT CASES BY brand_chosen week.
COMPUTE dummy=0.
IF (brand_chosen=LAG(brand_chosen) AND week=LAG(week)+1) dummy = 1.

getting values return by commands in stata

I am trying to run reg and get back the coefficient values in Stata. I did the following. Assume that y is the dependent variable, k, l, m, n are independent variables, and new is a new variable that I created.
loc vars k l m n
reg y `vars'
# I know that I can get back the coefficients using mat list e(b) but I need to
get coefficient of each variable and use it to compute the elasticity (one at a time).
# so, I run the following loop but it doesn't work.
foreach i in vars {
sca coeff`i' = _b[`i'] # main problem here
sca cons = _b[_cons] # main problem here
corr new `i' , c # correlation of new with each independent vars
sca cov_`i' = r(cov_12)
sum `i'
sca elas_`i' = (coeff`i'*r(mean))/10 # elasticity not working
}
Any help in this regard will be highly appreciated.
As Fr. says, you shouldn't need to do this, given margins. But why is your code not working? You are using the wrong syntax for foreach.
You should be typing not
foreach i in vars
but
foreach i of local vars
as otherwise Stata will use the literal text vars not the contents of local macro vars. The two syntaxes are explained in the help and at greater length in http://www.stata-journal.com/sjpdf.html?articlenum=pr0005
Smaller points:
The assignment sca cons = _b[_cons] should work but you don't need to repeat it every time round the loop.
You don't show us your code for generating new, so we have to assume that is all right.
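Putting these points together, here is a minimal sketch on the auto data shipped with Stata, where price, weight, length and displacement stand in for y, k, l and m. The elasticity formula is an assumption on my part (elasticity at the means), not the division by 10 in the question, and the scalar names are made up.

```stata
* Self-contained illustration with the auto data shipped with Stata
sysuse auto, clear
local vars weight length displacement
regress price `vars'

* The constant does not change, so store it once, outside the loop
scalar cons = _b[_cons]

* Mean of the dependent variable, needed for the assumed elasticity formula
quietly summarize price
scalar ybar = r(mean)

* "of local" makes foreach loop over the contents of the macro vars
foreach v of local vars {
    scalar coeff_`v' = _b[`v']
    quietly summarize `v'
    * Assumption: elasticity at the means, b * mean(x) / mean(y)
    scalar elas_`v' = scalar(coeff_`v') * r(mean) / scalar(ybar)
}
scalar list
```

summarize is r-class, so running it inside the loop leaves the estimation results alone and _b[] stays available on every pass.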
By the way, "doesn't work" doesn't mean much. I once compiled a list of about 20 meanings I have encountered, the most important including "is illegal" and "doesn't do what I want". So, giving detail on exactly what happened -- in this case exactly what Stata typed in response -- is always helpful.
