Storing structural break points with "foreach" loop in stata - loops

read.csv("C:\Users\easy\Desktop\workbook.csv")
I need to estimate the structural break point of a regression for each country in my dataset, store the break point for each country, and display the break points in a table once the loop finishes. My dataset is panel data, which is why I need to loop over the countries.
I estimate the regression for each country listed in my countrynum variable, and I try to store the break point from each country's regression as follows:
foreach i in countrynum {
by countrynum, sort: reg y x1 x2 x3 if `i'== countrynum
est store `r'(breakdate)
}
Stata is returning the following error message:
( invalid name
) invalid name
r(7);
Any idea what is wrong with my code?

Assuming the syntax fixes that Nick Cox aptly laid out, what you are missing is sbsingle or some other structural break command before asking Stata for r(breakdate); see here for more. After that you could do something like this, assuming that your panels are identified by countrynum.
* EX DATA
webuse usmacro, clear
tempfile append
save `append', replace
append using `append', gen(countrynum)
* Run By program (ssc install runby)
capture program drop panel_breakdate
program panel_breakdate
tsset date
regress fedfunds L.fedfunds
estat sbsingle
gen breakdate = r(breakdate)
end
runby panel_breakdate, by(countrynum) verbose
* After this format your breakdate how you please.

There is a lot wrong with your code, unfortunately, although you haven't noticed various errors because they are errors of meaning, not errors of syntax.
For a start,
foreach i in countrynum {
does not trigger a loop over the distinct values of countrynum. It is a loop over one item, the variable name countrynum.
So your test becomes
if countrynum == countrynum
which is always true, and the loop is no loop, but equivalent to
by countrynum, sort: reg y x1 x2 x3
est store `r'(breakdate)
Now the next problem is that the first command runs through several regressions, but only results for the last regression (for the last country named) will remain in memory.
The error that Stata noticed is that it does not know what you mean by
`r'(breakdate)
You are, it seems, referring to a result that requires extra syntax to get
`r(breakdate)'
Positive suggestion. Using statsby is a much better idea.
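To illustrate the first point above (that foreach ... in loops over the literal word rather than over the values of the variable), here is a minimal sketch. It uses the auto dataset that ships with Stata as stand-in data; the variables foreign, price and weight are placeholders for countrynum, y and the regressors, not the asker's data.

```stata
* Sketch only: the auto dataset stands in for the asker's panel data.
sysuse auto, clear

* "in" treats what follows as a literal list, so this body runs exactly once
foreach i in foreign {
    display "list item: `i'"
}

* levelsof collects the distinct values, so this body runs once per value
levelsof foreign, local(levels)
foreach i of local levels {
    quietly regress price weight if foreign == `i'
    display "foreign == `i': N = " e(N)
}
```

The second loop is the pattern to adapt: collect the levels first, then loop with foreach ... of local.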

General Solution
I believe I have a solution to your problem. The code needs to be run all at once because it relies on local macros. It worked for me on the usmacro test data, where I split the observations between two countries. It should work for you as well, as long as your data are already tsset.
levelsof countrynum
foreach lev in `r(levels)' {
reg y x1 x2 x3 if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
As long as no scalars were defined beforehand, scalar list returns the break dates for all countries, each stored in a scalar named break followed by the country number (for example, break22). Let me know if this doesn't work for you; it's difficult to be sure without example data, but it works in my test environment.
Example
If you want to see how this works before you run it on your dataset use the following commands at once,
clear all
webuse usmacro
gen countrynum = 01 if _n < 35
replace countrynum = 22 if countrynum == .
tsset date
levelsof countrynum
foreach lev in `r(levels)' {
reg fedfunds L.fedfunds inflation if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
which will return the following in the Stata output:
. scalar list
break22 = 1980q4
break1 = 1958q1
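Since the question asks for a table once the loop finishes, one possible variation is to post each break date to a small results dataset instead of to scalars. This is a sketch under the same assumptions as the example above (usmacro test data, Stata 14 or later for estat sbsingle); the names memhold and results are arbitrary, and the break date is stored as text exactly as Stata returns it in r(breakdate).

```stata
* Sketch: one row per country, built with postfile. Run it in one go,
* because the temporary names and local macros vanish between runs.
clear all
webuse usmacro
gen countrynum = 1 if _n < 35
replace countrynum = 22 if missing(countrynum)
tsset date

tempname memhold
tempfile results
postfile `memhold' countrynum str20 breakdate using `results'

levelsof countrynum, local(levels)
foreach lev of local levels {
    quietly regress fedfunds L.fedfunds inflation if countrynum == `lev'
    quietly estat sbsingle
    * store the country number and the break date (as a string)
    post `memhold' (`lev') ("`r(breakdate)'")
}
postclose `memhold'

* Load the collected results and show them as a table
use `results', clear
list, noobs
```

Unlike scalar list, this does not depend on which scalars already exist in memory, and the results can be saved or merged like any other dataset.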

Related

Compute if loop SPSS

Ultimately, I want to change scores of 0 to 1, scores of 1 to 2, and scores of 2 to 3. I thought one way to do that was using +1, but I realize I could also use a more complicated if then series.
Here is what I did so far:
I used the existing variable (x) to create a new variable (y=x+1) using SPSS syntax. I only want to do this for variables with values >=0 (this was my approach to excluding cells with missing data; the range for x is 0-2).
I can create x+1, but it overwrites the existing variables.
DO REPEAT x =var_1 TO var_86.
if (x>=0) x=(x+1).
end repeat.
exe.
I tried this modification, but it doesn't work:
DO REPEAT x = var_1 TO var_86 / y = var_1a TO var_86a.
IF (x >= 0) y=x +1.
END REPEAT.
EXE.
The error message is:
DO REPEAT The form VARX TO VARY to refer to a range of variables has
been used incorrectly. When using VARX TO VARY to create new
variables, X must be an integer less than or equal to the integer Y.
(Can't use A3 TO A1.)
I tried many other configurations including vectors and loops but haven't yet figured out how to do this computation across the range of variables without overwriting the existing ones. Thanks in advance for any recommendations.
The message you are getting is because SPSS doesn't understand the form var_1a TO var_86a.
For the x to y form to work the number has to be at the end of the name, so for example varA_1 to varA_86 should work.
While you're at it, here's a simple way to go about your task:
recode var_1 TO var_86 (0=1)(1=2)(2=3) into varA_1 TO varA_86.

Structuring a for loop to output classifier predictions in python

I have an existing .py file that prints classifier.predict output for an SVC model. I would like to loop through each row of the feature set X and return a prediction for each.
I am trying to define the loop variable so that it can be used to select the test rows from the feature set X.
The test statistic feature set X is written in code as:
X_1 = xspace.iloc[testval-1:testval, 0:5]
testval is the element name used in the for loop in the above line:
for testval in X.T.iterrows():
    print(testval)
I am having trouble returning a basic set of index values for X (X is the pandas dataframe)
I have tested the following with no success.
for index in X.T.iterrows():
    print(index)
for index in X.T.iteritems():
    print(index)
I am looking for the set of positional index values, 1-based if possible: 1, 2, 3, ..., n.
This seems simple, but I haven't found an existing question via Stack Overflow or Google.
ALSO, the individual dataframes I used as the basis for X were refined with the line:
df1.set_index('Date', inplace = True)
Because dates were used as the basis for concatenating the individual dataframes, the loops as written above return date values rather than the row positions I need, hence:
X_1 = xspace.iloc[testval-1:testval, 0:5]
where iloc selects by integer position.
Please ask for additional code if you'd like to see more.
The loops I've written so far return date values; I would like the integer positions of the rows so that they work with the line:
X_1 = xspace.iloc[testval-1:testval, 0:5]
The loop structure below seems to be working for my application.
i = 1
j = list(range(1, len(X), 1))
for i in j:

Append local macro in Stata

In Stata, I want to explore regressions with many combinations of different dependent and independent variables.
For this, I decided to use a loop that does all these regressions, and then saves the relevant results (coefficients, R2, etc.) in a matrix in a concise and convenient form.
For this matrix, I want to name rows and columns to make reading easier.
Here is my code so far:
clear
sysuse auto.dta
set more off
scalar i = 1
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
local result_`v'_`w'_b = _b[`w']
local result_`v'_`w'_t = ( _b[`w'] / _se[`w'] )
local result_`v'_`w'_r2 = e(r2)
if scalar(i) == 1 {
mat A = `result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2'
local rownms: var label `v'
}
if i > 1 {
mat A = A \ [`result_`v'_`w'_b', `result_`v'_`w'_t', `result_`v'_`w'_r2']
*local rownms: `rownms' "var label `v'"
}
scalar i = i+1
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
It will give a resulting matrix A that looks like this:
. matrix list A
A[4,3]
b t r2
Price 3.3207368 8.3882744 .4989396
Price 90.212391 5.6974982 .31538316
Price -.00658789 -10.340218 .66270291
Price -.22001836 -9.7510366 .63866239
Clearly, there is something not quite finished yet. The row names of the matrix should be "price, price, mpg, mpg" because that is what the dependent variable is in the four regressions.
In the code above, consider the now-commented-out line
*local rownms: `rownms' "var label `v'"
It is commented out because in the current form, it gives an error.
I wish to append the local macro rownms with the label (or name) of the variable on every iteration, producing Price Price Mileage (MPG) Mileage (MPG).
But I cannot seem to get the quotes right to append the macro with the label of the current variable.
Matrix row and column names are limited in what they can hold. In general, variable labels won't be very suitable.
Here is some simpler code.
sysuse auto.dta, clear
capture matrix drop A
local rownms
foreach v in price mpg {
foreach w in weight length {
quietly: reg `v' `w' foreign
mat A = nullmat(A) \ (_b[`w'], _b[`w']/_se[`w'], e(r2))
local rownms `rownms' `v':`w'
}
}
mat coln A = b t r2
mat rown A = `rownms'
matrix list A
Notes:
The nullmat() trick removes the need for a branch of the code on first and later runs through.
Putting results into locals and then taking them out again is not needed. To get out of the habit, think of this analogy. You have a pen in your hand. You put it in a box. You take it out again. Now you have a pen in your hand. Why do the box thing if you don't need to?
This works with your example, but the results are not very good.
local rownms `rownms' "`: var label `v''"
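If labels are still wanted as row names, one option (my assumption, not part of the answer above) is to pass each label through strtoname() before appending it, so that spaces and parentheses do not end up in the matrix row names. A minimal sketch of the appending pattern:

```stata
* Sketch: append a name-safe version of each variable label to a local macro.
sysuse auto, clear
local rownms
foreach v in price mpg {
    foreach w in weight length {
        * extended macro function: fetch the variable label of `v'
        local lab : variable label `v'
        * strtoname() replaces characters that are not allowed in a name
        local safe = strtoname("`lab'")
        local rownms `rownms' `safe'
    }
}
display "`rownms'"
```

The resulting macro holds one entry per regression and can be passed to mat rown in place of the `v':`w' names, at the cost of less readable labels.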

Generating sums of variables with loops in Stata

I'm having some problems with a loop that I'm trying to perform and probably with the syntax for generating the variable that I want.
In words, what I am trying to do is sum a particular set of observations and store each sum in a cell of a new variable. Here is an example of the syntax that I used:
forvalues j=1/50 {
replace x1 = sum(houses) if village==`j' & year==2010
}
gen x2=.
forvalues j=1/50 {
replace x2 = sum(houses) if village==`j' & year==2011
}
gen x3 =.
forvalues j=1/50 {
replace x3 = sum(houses) if village==`j' & year==2012
}
This is from a dataset with more than 4,000 observations. If the code above worked, I would get one value for each j, namely the total number of houses in that village in that year, but that is not what I obtain. I would greatly appreciate help in getting one such value for each j in each variable.
sum() will return a running sum, so that is probably not what you want. This type of problem is usually much easier to solve with the by: prefix in combination with the egen command. The one line command below will give you the total number of houses per village and year:
bys village year: egen Nhouses = total(houses)
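The difference between the running sum and the group total is easy to see on a few made-up rows. The data below are invented purely for illustration; the variable names follow the question, and total_houses is a name chosen for this sketch.

```stata
* Sketch with invented data: running sum versus group total.
clear
input village year houses
1 2010 3
1 2010 4
1 2011 5
2 2010 2
2 2010 6
end

* sum() accumulates down the rows: 3, 7, 12, 14, 20
gen running = sum(houses)

* total() repeats the group total on every row of each village-year group
bysort village year: egen Nhouses = total(houses)
list, sepby(village year)

* One row per village and year, if that is the shape you need
collapse (sum) total_houses = houses, by(village year)
list, noobs
```

The egen version keeps all observations, while collapse replaces the dataset with one observation per village and year, which may be closer to what the question asks for.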

Display data in Stata loop

I have a loop in Stata 12 that looks at each record in a file and if it finds a flag equal to 1 it generates five imputed values. My code looks like this:
forvalues i=1/5 {
gen y3`i' = y2
gen double h`i' = (uniform()*(1-a)+a) if flag==1
replace y3`i' = 1.6*(invibeta(7.2,2.6,h`i')/(1-invibeta(7.2,2.6,h`i')))^(1/1.7) if flag==1
}
a is defined elsewhere. I want to check the individual imputations. Thus, I need to display the imputed variable preferably only for those cases where flag=1. I also would like to display another value, s, alongside. I need help in figuring out the syntax. I've tried every combination of quotes and subscripts that I can think of, but I keep getting error messages.
One other useful modification occurs to me. Suppose I have three concatenated files on which I want to perform this routine. Let them have a variable file equal to 1, 2 or 3. I'd like to set a separate seed for each and do it in my program so I have a record. I envision something like:
forvalues j=1/3 {
set seed=12345 if file=1
set seed=56789 if file=2
set seed=98765 if file=3
insert code above
}
Will this work?
No comment is possible on code you don't show, but the word "display" may be misleading you.
list y3`i' if flag == 1
or some variation may be what you seek. Note that display is geared to showing at most one line of output at a time.
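As a rough sketch of what that could look like inside and outside a loop, the following uses invented data, since the original dataset and the definition of a are not shown. The variables flag and s follow the question, and runiform() stands in for the imputation formula.

```stata
* Sketch with invented data: list imputed variables only where flag == 1.
clear
set obs 6
set seed 12345
gen flag = mod(_n, 2)
gen s = 10 * _n
forvalues i = 1/2 {
    * placeholder for the real imputation
    gen y3`i' = runiform() if flag == 1
}

* One listing per imputed variable, with s alongside
forvalues i = 1/2 {
    list s y3`i' if flag == 1, noobs
}

* Or all imputations side by side in a single listing
list s y31 y32 if flag == 1, noobs
```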
P.S. As you are William Shakespeare, know that the mug http://www.stata.com/giftshop/much-ado-mug/ was inspired by your work.
A subsidiary question asks about choosing a different seed each time around a loop. That is easy:
forval j = 1/3 {
local seed : word `j' of 12345 56789 98765
set seed `seed'
...
}
or
tokenize 12345 56789 98765
forval j = 1/3 {
set seed ``j''
...
}
