In Stata, I want to explore regressions with many combinations of different dependent and independent variables.
For this, I decided to use a loop that does all these regressions, and then saves the relevant results (coefficients, R2, etc.) in a matrix in a concise and convenient form.
For this matrix, I want to name rows and columns to make reading easier.
Here is my code so far:
clear
sysuse auto.dta
set more off
scalar i = 1
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
local result_`v'_`w'_b = _b[`w']
local result_`v'_`w'_t = ( _b[`w'] / _se[`w'] )
local result_`v'_`w'_r2 = e(r2)
if scalar(i) == 1 {
mat A = `result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2'
local rownms: var label `v'
}
if i > 1 {
mat A = A \ [`result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2']
*local rownms: `rownms' "var label `v'"
}
scalar i = i+1
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
It will give a resulting matrix A that looks like this:
. matrix list A

A[4,3]
                b           t          r2
Price   3.3207368   8.3882744    .4989396
Price   90.212391   5.6974982   .31538316
Price  -.00658789  -10.340218   .66270291
Price  -.22001836  -9.7510366   .63866239
Clearly, there is something not quite finished yet. The row names of the matrix should be "price, price, mpg, mpg" because that is what the dependent variable is in the four regressions.
In the code above, consider the now-commented-out line
*local rownms: `rownms' "var label `v'"
It is commented out because in the current form, it gives an error.
I wish to append the local macro rownms with the label (or name) of the variable on every iteration, producing Price Price Mileage (MPG) Mileage (MPG).
But I cannot seem to get the quotes right to append the macro with the label of the current variable.
Matrix row and column names are limited in what they can hold. In general, variable labels won't be very suitable.
Here is some simpler code.
sysuse auto.dta, clear
capture matrix drop A
local rownms
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
mat A = nullmat(A) \ (_b[`w'], _b[`w']/_se[`w'], e(r2))
local rownms `rownms' `v':`w'
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
Notes:
The nullmat() trick removes the need for a branch of the code on first and later runs through.
Putting results into locals and then taking them out again is not needed. To get out of the habit, think of this analogy. You have a pen in your hand. You put it in a box. You take it out again. Now you have a pen in your hand. Why do the box thing if you don't need to?
The following line, which appends the variable label on each pass, works with your example, but the results are not very good:
local rownms `rownms' "`: var label `v''"
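The nullmat() note above can be tried in isolation. This is a minimal sketch, not part of the original answer: the matrix name B and its column names are made up for illustration.

```stata
* Grow a matrix one row at a time without a separate first-pass branch.
capture matrix drop B              // make sure B does not exist yet
forvalues i = 1/3 {
    * nullmat(B) stands in for an empty matrix on the first pass
    matrix B = nullmat(B) \ (`i', `=`i'^2')
}
matrix colnames B = i i_squared
matrix list B
```

On the first pass B does not exist, so nullmat(B) lets the row join go ahead; on later passes it simply returns B.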
I want to aggregate my variables A1,.., A5 by creating a new variable A that is equal to A1,..,A5 whenever the corresponding r`k' is equal to 1. I have many variables that I want to aggregate this way, and I was wondering whether there is a way to write it more compactly than my code below. (I'm guessing that foreach can be used here, but I'm not sure how.)
gen A=.
gen B=.
forvalues k=1/5 {
replace A=A`k' if r`k'==1
replace B=B`k' if r`k'==1
}
This maps one loop into two:
foreach v in A B {
gen `v' = .
forvalues k=1/5 {
replace `v' = `v'`k' if r`k'==1
}
}
But perhaps your data structure needs revisiting, so that instead of looping twice you could get something else more simply. So, if it makes sense, stacking A B whatever into one variable would remove one loop.
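To check the nested loop on something small, here is a sketch on toy data. The data are invented for illustration: two versions of A and B instead of five, and indicators r1 and r2 that are 1 on odd and even rows respectively.

```stata
clear
set obs 4
gen id = _n
forvalues k = 1/2 {
    gen A`k' = 10*`k' + _n      // toy values for A1, A2
    gen B`k' = 100*`k' + _n     // toy values for B1, B2
}
gen r1 = mod(_n, 2) == 1        // 1 on odd rows
gen r2 = !r1                    // 1 on even rows

* Outer loop over the stubs, inner loop over the numeric suffixes
foreach v in A B {
    gen `v' = .
    forvalues k = 1/2 {
        replace `v' = `v'`k' if r`k' == 1
    }
}
list id A B r1 r2
```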
I need to estimate the structural break point of a regression for each country in my dataset, store these break points, and display them in a table once the loop finishes. My dataset is panel data, which is why I need to loop over the countries.
I estimate the regression for each country in my countrynum variable, and I try to store the break point from each country's regression as follows:
foreach i in countrynum {
by countrynum, sort: reg y x1 x2 x3 if `i'== countrynum
est store `r'(breakdate)
}
Stata is returning the following error message:
( invalid name
) invalid name
r(7);
Any idea what is wrong with my code?
Assuming the syntax fixes that Nick Cox aptly laid out, what you are missing is estat sbsingle or some other structural break command before asking Stata for r(breakdate); see the help for estat sbsingle for more. After that you could do something like this, assuming that your panels are identified by countrynum.
* EX DATA
webuse usmacro, clear
tempfile append
save `append', replace
append using `append', gen(countrynum)
* Run By program (ssc install runby)
capture program drop panel_breakdate
program panel_breakdate
tsset date
regress fedfunds L.fedfunds
estat sbsingle
gen breakdate = r(breakdate)
end
runby panel_breakdate, by(countrynum) verbose
* After this format your breakdate how you please.
There is a lot wrong with your code, unfortunately, although you haven't noticed various errors because they are errors of meaning, not errors of syntax.
For a start,
foreach i in countrynum {
does not trigger a loop over the distinct values of countrynum. It is a loop over one item, the variable name countrynum.
So your test becomes
if countrynum == countrynum
which is always true, and the loop is no loop, but equivalent to
by countrynum, sort: reg y x1 x2 x3
est store `r'(breakdate)
Now the next problem is that the first command runs through several regressions, but only results for the last regression (for the last country named) will remain in memory.
The error that Stata noticed is that it does not know what you mean by
`r'(breakdate)
You are, it seems, referring to a stored result, which requires different syntax to get:
`r(breakdate)'
Positive suggestion. Using statsby is a much better idea.
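The difference between looping over a name and looping over a variable's distinct values can be checked on a shipped dataset. This is a minimal sketch, assuming auto.dta and its variable foreign stand in for your data and countrynum; the counters n_in and n_levels are hypothetical names used only here.

```stata
sysuse auto, clear

* "foreach ... in" loops over the words typed, here the single word "foreign"
local n_in = 0
foreach i in foreign {
    local ++n_in
}

* levelsof collects the distinct values, which can then be looped over
quietly levelsof foreign, local(levels)
local n_levels = 0
foreach lev of local levels {
    local ++n_levels
}

display "passes with in: `n_in'; passes over levels: `n_levels'"
```

The first loop runs once however many countries there are; the second runs once per distinct value.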
General Solution
I believe I have a solution to your problem. This code needs to be run all at once because it relies on local macros. It worked for me on the usmacro test data, where I made half the observations country 1 and the other half country 2. It should work for you as well, as long as your data are already tsset.
levelsof countrynum
foreach lev in `r(levels)' {
reg y x1 x2 x3 if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
As long as you have no previously defined scalars, it will list the break dates for all countries, each named break followed by the country number (for example, break1). Let me know if this doesn't work for you. It's difficult without any example data from you, but it works in my test environment.
Example
If you want to see how this works before you run it on your dataset use the following commands at once,
clear all
webuse usmacro
gen countrynum = 01 if _n < 35
replace countrynum = 22 if countrynum == .
tsset date
levelsof countrynum
foreach lev in `r(levels)' {
reg fedfunds L.fedfunds inflation if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
which will return the following in the Stata output,
. scalar list
break22 = 1980q4
break1 = 1958q1
I have data in Stata with 3 variables: a string id and two numeric variables (GPS data: latitude and longitude). I would like to convert the variables into a matrix in the following way (the lower table) to calculate the distance between two id-spots for all combinations. So each newly created column (e.g., id_1) holds the next (i+1) value of the original variable (e.g., id), and so on. However, the following command fills in values only until the n-th row is reached; after that, the new rows are empty. Thus the bottom half of the matrix is missing (shown as /// in the upper table). For 2000 observations:
foreach num of numlist 1/2000 {
foreach var of varlist id num1 num2 {
gen `var'_`num'=`var'[_n+`num']
}
}
I am posting my answer in case anybody finds it useful.
//duplicate all observation to create all filled matrix
expand 2, gen(dupindex)
forvalue i = 1/1999 {
foreach var of varlist id num1 num2 {
gen `var'`i'=`var'[_n+`i']
}
}
//delete the duplicated rows (the loop above creates no columns beyond 1999, so none need dropping)
drop in 2001/l
drop dupindex
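The same idea can be checked on a dataset small enough to read by eye. This is a sketch with three invented observations; the values are placeholders, and the local macro n is a name introduced here to hold the original number of observations.

```stata
clear
input str1 id num1 num2
"a" 1 10
"b" 2 20
"c" 3 30
end

local n = _N
expand 2                          // copies are appended below the originals
forvalues i = 1/`=`n'-1' {
    foreach var of varlist id num1 num2 {
        * offset i reaches into the appended copies, so no row is left empty
        gen `var'`i' = `var'[_n+`i']
    }
}
drop in `=`n'+1'/l                // keep only the original rows
list
```

Each row now carries every other observation's id and coordinates, wrapping around past the last row.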
I want to generate variable names that depend on a given variable, games.
For example, if games is given as 3, the result is
game1 = list of uniformly distributed values
game2 = same, etc.
While there are multiple examples and answers for similar questions, I cannot see why my code does not produce the results I want.
Stata reports a syntax error for the following loop:
set obs 1000
forvalues i = 1(1)games {
generate game`i' = runiform()
}
Is games really a variable in Stata's sense? Holding the same constant again and again is unnecessary and inefficient.
The problem is that forvalues expects to see numbers; it won't perform evaluations on the fly. But other parts of Stata will do that.
If you know you want just 3 variables then you could do just this:
clear
set obs 1000
forvalues i = 1/3 {
generate game`i' = runiform()
}
Or you could do something like this:
clear
set obs 1000
local games = 3
forvalues i = 1/`games' {
generate game`i' = runiform()
}
That does not contradict my opening paragraph. All macro evaluations are performed before any command is executed; thus forvalues sees 3, not a local macro name.
If you really were holding a constant in a variable, then this would work:
clear
set obs 1000
gen games = 3
forvalues i = 1/`=games[1]' {
generate game`i' = runiform()
}
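If the constant lives in a scalar rather than a local macro or a variable, the same inline-evaluation syntax should apply. This is a sketch under that assumption; holding games in a scalar is a variation not taken from the question.

```stata
clear
set obs 1000
scalar games = 3

* `=exp' is evaluated before forvalues runs, so forvalues sees 1/3
forvalues i = 1/`=scalar(games)' {
    generate game`i' = runiform()
}
describe game*
```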
I'm having some problems with a loop that I'm trying to run, and probably with the syntax for generating the variable that I want.
In words, what I am trying to do is sum a particular set of observations and store each sum in a cell of a new variable. Here is an example of the syntax that I used:
forvalues j=1/50 {
replace x1 = sum(houses) if village==`j' & year==2010
}
gen x2=.
forvalues j=1/50 {
replace x2 = sum(houses) if village==`j' & year==2011
}
gen x3 =.
forvalues j=1/50 {
replace x3 = sum(houses) if village==`j' & year==2012
}
This is from a dataset with more than 4000 observations. If the code above worked, I would get a single value for each j, which is what I want: the total sum of houses per village in each year. But that is not what I am getting. I would greatly appreciate it if someone could help me obtain one value for each j in each variable.
sum() will return a running sum, so that is probably not what you want. This type of problem is usually much easier to solve with the by: prefix in combination with the egen command. The one line command below will give you the total number of houses per village and year:
bys village year: egen Nhouses = total(houses)
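The difference between the running sum and the group total can be seen on a few invented rows. This is a sketch with toy data; the variable name running is introduced here for comparison only.

```stata
clear
input village year houses
1 2010 5
1 2010 7
1 2011 3
2 2010 4
2 2010 6
end

* sum() accumulates down the rows of the dataset
gen running = sum(houses)

* egen total() gives one constant value per village-year group
bysort village year: egen Nhouses = total(houses)

list, sepby(village year)
```

If one observation per village and year is wanted rather than a repeated total, collapse (sum) houses, by(village year) is another option.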