I'm having some problems with a loop that I'm trying to perform and probably with the syntax for generating the variable that I want.
Putting in words, what I am trying to do make is a sum of a particular set of observations and storing each sum in a cell for a new variable. Here is an example of syntax that I used:
forvalues j=1/50 {
replace x1 = sum(houses) if village==j'& year==2010
}
gen x2=.
forvalues j=1/50 {
replace x2 = sum(houses) if village==j' & year==2011
}
gen x3 =.
forvalues j=1/50 {
replace x3 = sum(houses) if village==j' & year==2012
}
This is from a dataset with more than 4000 observations. So, for each particular j, if I were successful with the code above, I would get an unique observation for each j (what I want to obtain), but I'm not obtaining this -- which is a sum of all houses, conditioned with the year and village; the total sum of houses per village in each year. I would greatly appreciate if someone could help me obtain one particular observation for each j in each variable.
sum() will return a running sum, so that is probably not what you want. This type of problem is usually much easier to solve with the by: prefix in combination with the egen command. The one line command below will give you the total number of houses per village and year:
bys village year: egen Nhouses = total(houses)
Related
read.csv("C:\Users\easy\Desktop\workbook.csv")
I need to estimate the structural breakpoint of regression over a list of countries in my dataset and I need to store these breakeven points for each country I have and display these breakeven points in a table form once the loop finishes. My dataset is panel data that is why I need to loop over the countries.
I estimate the regression for each country in my countrynum variable of countries' list. And I try to store the breakeven point for each country regression estimation as follows
foreach i in countrynum {
by countrynum, sort: reg y x1 x2 x3 if `i'== countrynum
est store `r'(breakdate)
}
Stata is returning the following error message:
( invalid name
) invalid name
r(7);
Any idea what is wrong with my code?
Assuming the syntax fixes that Nick Cox aptly laid out, what you are missing is sbsingle or some other structural break command before asking Stata for r(breakdate); see here for more. After that you could do something like this, assuming that your panels are identified by countrynum.
* EX DATA
webuse usmacro, clear
tempfile append
save `append', replace
append using `append', gen(countrynum)
* Run By program (ssc install runby)
capture program drop panel_breakdate
program panel_breakdate
tsset date
regress fedfunds L.fedfunds
estat sbsingle
gen breakdate = r(breakdate)
end
runby panel_breakdate, by(countrynum) verbose
* After this format your breakdate how you please.
There is a lot wrong with your code, unfortunately, although you haven't noticed various errors because they are errors of meaning, not errors of syntax.
For a start,
foreach i in countrynum {
does not trigger a loop over the distinct values of countrynum. It is a loop over one item, the variable name countrynum.
So your test becomes
if countrynum == countrynum
which is always true, and the loop is no loop, but equivalent to
by countrynum, sort: reg y x1 x2 x3
est store `r'(breakdate)
Now the next problem is that the first command runs through several regressions, but only results for the last regression (for the last country named) will remain in memory.
The error that Stata noticed is that it does not know what you mean by
`r'(breakdate)
You are, it seems, referring to a result that requires extra syntax to get
`r(breakdate)'
Positive suggestion. Using statsby is a much better idea.
General Solution
I have a solution to your problem I believe. This program needs to all be run at the same time due to the use of local variables. This worked for me on the usmacro test data where I made half the observations country 1 and the other half country 2. It should work for you as well as long as your data is tsset already.
levelsof countrynum
foreach lev in `r(levels)' {
reg y x1 x2 x3 if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
As long as you have no scalars previously made, it will return a list of all the breakdates for the countries with the syntax of (break)(countrynum) without the parentheses. Let me know if this doesn't work for you, it's difficult without any example data from you but it works in my test environment.
Example
If you want to see how this works before you run it on your dataset use the following commands at once,
clear all
webuse usmacro
gen countrynum = 01 if _n < 35
replace countrynum = 22 if countrynum == .
tsset date
levelsof countrynum
foreach lev in `r(levels)' {
reg fedfunds L.fedfunds inflation if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
which will return the following in the stata output,
. scalar list
break22 = 1980q4
break1 = 1958q1
Ultimately, I want to change scores of 0 to 1, scores of 1 to 2, and scores of 2 to 3. I thought one way to do that was using +1, but I realize I could also use a more complicated if then series.
Here is what I did so far:
I used the existing variable (x) to create a new variable (y=x+1) using SPSS syntax. I only want to do this for variables with values >=0 (this was my approach to excluding cells with missing data; the range for x is 0-2).
I can create x+1, but it overwrites the existing variables.
DO REPEAT x =var_1 TO var_86.
if (x>=0) x=(x+1).
end repeat.
exe.
I tried this modification, but it doesn't work:
DO REPEAT x = var_1 TO var_86 / y = var_1a TO var_86a.
IF (x >= 0) y=x +1.
END REPEAT.
EXE.
The error message is:
DO REPEAT The form VARX TO VARY to refer to a range of variables has
been used incorrectly. When using VARX TO VARY to create new
variables, X must be an integer less than or equal to the integer Y.
(Can't use A3 TO A1.)
I tried many other configurations including vectors and loops but haven't yet figured out how to do this computation across the range of variables without overwriting the existing ones. Thanks in advance for any recommendations.
The message you are getting is because SPSS doesn't understand the form var_1a TO var_86a.
For the x to y form to work the number has to be at the end of the name, so for example varA_1 to varA_86 should work.
While you're at it, here's a simple way to go about your task:
recode var_1 TO var_86 (0=1)(1=2)(2=3) into varA_1 TO varA_86.
I am trying to calculate the Gini coefficient as the average over five repetitions. My code doesn't correctly work, and I cannot find a way to do it.
inequal7 is a user-written command.
gen gini=.
forval i=1/5 {
mi xeq `i' : inequal7 income [aw=hw0010]
gen gini_`i'=.
scalar gini_`i' = r(gini)
replace gini_`i'= r(gini)
if `i' ==5 {
replace gini = sum(gini_1+gini_2+gini_3+gini_4+gini_5)/5
}
}
Can someone help me?
There is no context on or example of the dataset you're using. This may not work but it's likely to be closer to legal and correct than what you have.
scalar gini = 0
forval i=1/5 {
mi xeq `i' : inequal7 income [aw=hw0010]
scalar gini = scalar(gini) + r(gini)
}
scalar gini = scalar(gini) / 5
Notes:
Using variables to hold constants is legal, but not necessarily good style.
sum() gives the running or cumulative sum; applied to a variable that's a constant it does far more work than you need, and at best the correct answer will just be that in observation 1. As you're feeding it the sum of 5 values, it's redundant any way.
Watch out: names for scalars and variables occupy the same namespace.
If this is a long way off what you want, and you get no better answer, you're likely to need to give much more detail.
In Stata, I want to explore regressions with many combinations of different dependent and independent variables.
For this, I decided to use a loop that does all these regressions, and then saves the relevant results (coefficients, R2, etc.) in a matrix in a concise and convenient form.
For this matrix, I want to name rows and columns to make reading easier.
Here is my code so far:
clear
sysuse auto.dta
set more off
scalar i = 1
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
local result_`v'_`w'_b = _b[`w']
local result_`v'_`w'_t = ( _b[`w'] / _se[`w'] )
local result_`v'_`w'_r2 = e(r2)
if scalar(i) == 1 {
mat A = `result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2'
local rownms: var label `v'
}
if i > 1 {
mat A = A \ [`result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2']
*local rownms: `rownms' "var label `v'"
}
scalar i = i+1
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
It will give a resulting matrix A that looks like this:
. matrix list A
A[4,3]
b t r2
Price 3.3207368 8.3882744 .4989396
Price 90.212391 5.6974982 .31538316
Price -.00658789 -10.340218 .66270291
Price -.22001836 -9.7510366 .63866239
Clearly, there is something not quite finished yet. The row names of the matrix should be "price, price, mpg, mpg" because that is what the dependent variable is in the four regressions.
In the code above, consider the now-commented-out line
*local rownms: `rownms' "var label `v'"
It is commented out because in the current form, it gives an error.
I wish to append the local macro rownms with the label (or name) of the variable on every iteration, producing Price Price Mileage (MPG) Mileage (MPG).
But I cannot seem to get the quotes right to append the macro with the label of the current variable.
Matrix row and column names are limited in what they can hold. In general, variable labels won't be very suitable.
Here is some simpler code.
sysuse auto.dta, clear
matrix drop A
local rownms
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
mat A = nullmat(A) \ (_b[`w'], _b[`w']/_se[`w'], e(r2))
local rownms `rownms' `v':`w'
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
Notes:
The nullmat() trick removes the need for a branch of the code on first and later runs through.
Putting results into locals and then taking them out again is not needed. To get out of the habit, think of this analogy. You have a pen in your hand. You put it in a box. You take it out again. Now you have a pen in your hand. Why do the box thing if you don't need to?
This works with your example, but the results are not very good.
local rownms `rownms' "`: var label `v''"
I have a loop in Stata 12 that looks at each record in a file and if it finds a flag equal to 1 it generates five imputed values. My code looks like this:
forvalues i=1/5 {
gen y3`i' = y2
gen double h`i' = (uniform()*(1-a)+a) if flag==1
replace y3`i' = 1.6*(invibeta(7.2,2.6,h`i')/(1-invibeta(7.2,2.6,h`i')))^(1/1.7) if
flag==1
}
a is defined elsewhere. I want to check the individual imputations. Thus, I need to display the imputed variable preferably only for those cases where flag=1. I also would like to display another value, s, alongside. I need help in figuring out the syntax. I've tried every combination of quotes and subscripts that I can think of, but I keep getting error messages.
One other useful modification occurs to me. Suppose I have three concatenated files on which I want to perform this routine. Let them have a variable file equal to 1, 2 or 3. I'd like to set a separate seed for each and do it in my program so I have a record. I envision something like:
forvalues j=1/3 {
set seed=12345 if file=1
set seed=56789 if file=2
set seed=98765 if file=3
insert code above
}
Will this work?
No comment is possible on code you don't show, but the word "display" may be misleading you.
list y3`i' if flag == 1
or some variation may be what you seek. Note that display is geared to showing at most one line of output at a time.
P.S. As you are William Shakespeare, know that the mug http://www.stata.com/giftshop/much-ado-mug/ was inspired by your work.
A subsidiary question asks about choosing a different seed each time around a loop. That is easy:
forval j = 1/3 {
local seed : word `j' of 12345 56789 98765
set seed `seed'
...
}
or
tokenize 12345 56789 98765
forval j = 1/3 {
set seed ``j''
...
}