Stata: Generate Variable with foreach - loops

I want to aggregate my variables A1,.., A5 by creating a new variable A that is equal to A1,..,A5 whenever the corresponding r`k' is equal to 1. I have too many variables that I want to aggregate like this way and I was wondering maybe is there a way to write it more compactly than my code below. (I'm guessing that foreach can be used here but I'm not sure how)
gen A=.
gen B=.
forvalues k=1/5 {
replace A=A`k' if r`k'==1
replace B=B`k' if r`k'==1
}

This maps one loop into two:
foreach v in A B
gen `v' = .
forvalues k=1/5 {
replace `v' = `v'`k' if r`k'==1
}
}
But perhaps your data structure needs revisiting, so that instead of looping twice you could get something else more simply. So, if it makes sense, stacking A B whatever into one variable would remove one loop.

Related

Storing structural break points with "foreach" loop in stata

read.csv("C:\Users\easy\Desktop\workbook.csv")
I need to estimate the structural breakpoint of regression over a list of countries in my dataset and I need to store these breakeven points for each country I have and display these breakeven points in a table form once the loop finishes. My dataset is panel data that is why I need to loop over the countries.
I estimate the regression for each country in my countrynum variable of countries' list. And I try to store the breakeven point for each country regression estimation as follows
foreach i in countrynum {
by countrynum, sort: reg y x1 x2 x3 if `i'== countrynum
est store `r'(breakdate)
}
Stata is returning the following error message:
( invalid name
) invalid name
r(7);
Any idea what is wrong with my code?
Assuming the syntax fixes that Nick Cox aptly laid out, what you are missing is sbsingle or some other structural break command before asking Stata for r(breakdate); see here for more. After that you could do something like this, assuming that your panels are identified by countrynum.
* EX DATA
webuse usmacro, clear
tempfile append
save `append', replace
append using `append', gen(countrynum)
* Run By program (ssc install runby)
capture program drop panel_breakdate
program panel_breakdate
tsset date
regress fedfunds L.fedfunds
estat sbsingle
gen breakdate = r(breakdate)
end
runby panel_breakdate, by(countrynum) verbose
* After this format your breakdate how you please.
There is a lot wrong with your code, unfortunately, although you haven't noticed various errors because they are errors of meaning, not errors of syntax.
For a start,
foreach i in countrynum {
does not trigger a loop over the distinct values of countrynum. It is a loop over one item, the variable name countrynum.
So your test becomes
if countrynum == countrynum
which is always true, and the loop is no loop, but equivalent to
by countrynum, sort: reg y x1 x2 x3
est store `r'(breakdate)
Now the next problem is that the first command runs through several regressions, but only results for the last regression (for the last country named) will remain in memory.
The error that Stata noticed is that it does not know what you mean by
`r'(breakdate)
You are, it seems, referring to a result that requires extra syntax to get
`r(breakdate)'
Positive suggestion. Using statsby is a much better idea.
General Solution
I have a solution to your problem I believe. This program needs to all be run at the same time due to the use of local variables. This worked for me on the usmacro test data where I made half the observations country 1 and the other half country 2. It should work for you as well as long as your data is tsset already.
levelsof countrynum
foreach lev in `r(levels)' {
reg y x1 x2 x3 if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
As long as you have no scalars previously made, it will return a list of all the breakdates for the countries with the syntax of (break)(countrynum) without the parentheses. Let me know if this doesn't work for you, it's difficult without any example data from you but it works in my test environment.
Example
If you want to see how this works before you run it on your dataset use the following commands at once,
clear all
webuse usmacro
gen countrynum = 01 if _n < 35
replace countrynum = 22 if countrynum == .
tsset date
levelsof countrynum
foreach lev in `r(levels)' {
reg fedfunds L.fedfunds inflation if countrynum == `lev'
estat sbsingle
scalar break`lev' = r(breakdate)
}
scalar list
which will return the following in the stata output,
. scalar list
break22 = 1980q4
break1 = 1958q1

Optimizing custom fill of a 2d array in Julia

I'm a little new to Julia and am trying to use the fill! method to improve code performance on Julia. Currently, I read a 2d array from a file say read_array and perform row-operations on it to get a processed_array as follows:
function preprocess(matrix)
# Initialise
processed_array= Array{Float64,2}(undef, size(matrix));
#first row of processed_array is the difference of first two row of matrix
processed_array[1,:] = (matrix[2,:] .- matrix[1,:]) ;
#last row of processed_array is difference of last two rows of matrix
processed_array[end,:] = (matrix[end,:] .- matrix[end-1,:]);
#all other rows of processed_array is the mean-difference of other two rows
processed_array[2:end-1,:] = (matrix[3:end,:] .- matrix[1:end-2,:]) .*0.5 ;
return processed_array
end
However, when I try using the fill! method I get a MethodError.
processed_array = copy(matrix)
fill!(processed_array [1,:],d[2,:]-d[1,:])
MethodError: Cannot convert an object of type Matrix{Float64} to an object of type Float64
I'll be glad if someone can tell me what I'm missing and also suggest a method to optimize the code. Thanks in advance!
fill!(A, x) is used to fill the array A with a unique value x, so it's not what you want anyway.
What you could do for a little performance gain is to broadcast the assignments. That is, use .= instead of =. If you want, you can also use the #. macro to automatically add dots everywhere for you (for maybe cleaner/easier-to-read code):
function preprocess(matrix)
out = Array{Float64,2}(undef, size(matrix))
#views #. out[1,:] = matrix[2,:] - matrix[1,:]
#views #. out[end,:] = matrix[end,:] - matrix[end-1,:]
#views #. out[2:end-1,:] = 0.5 * (matrix[3:end,:] - matrix[1:end-2,:])
return out
end
For optimal performance, I think you probably want to write the loops explicitly and use multithreading with a package like LoopVectorization.jl for example.
PS: Note that in your code comments you wrote "cols" instead of "rows", and you wrote "mean" but take a difference. (Not sure it was intentional.)

Looping over array values in Lua

I have a variable as follows
local armies = {
[1] = "ARMY_1",
[2] = "ARMY_3",
[3] = "ARMY_6",
[4] = "ARMY_7",
}
Now I want to do an action for each value. What is the best way to loop over the values? The typical thing I'm finding on the internet is this:
for i, armyName in pairs(armies) do
doStuffWithArmyName(armyName)
end
I don't like that as it results in an unused variable i. The following approach avoids that and is what I am currently using:
for i in pairs(armies) do
doStuffWithArmyName(armies[i])
end
However this is still not as readable and simple as I'd like, since this is iterating over the keys and then getting the value using the key (rather imperatively). Another boon I have with both approaches is that pairs is needed. The value being looped over here is one I have control over, and I'd prefer that it can be looped over as easily as possible.
Is there a better way to do such a loop if I only care about the values? Is there a way to address the concerns I listed?
I'm using Lua 5.0 (and am quite new to the language)
The idiomatic way to iterate over an array is:
for _, armyName in ipairs(armies) do
doStuffWithArmyName(armyName)
end
Note that:
Use ipairs over pairs for arrays
If the key isn't what you are interested, use _ as placeholder.
If, for some reason, that _ placeholder still concerns you, make your own iterator. Programming in Lua provides it as an example:
function values(t)
local i = 0
return function() i = i + 1; return t[i] end
end
Usage:
for v in values(armies) do
print(v)
end

How to manipulate filenames with a macro

I want to save the results of my Stata forvalues loop into individual files. One component of the filename should be the value j assigned to the macro within a forvalues loop.
Apparently my code leads to an instruction always to save with 1995. As such, I get messages telling me that this file already exists.
I am using the following code:
local j = 1995
forvalues `j'= 1995 / 2012 {
clear
use "/Users/carl/Desktop/STATA/Neustart/eventdates.dta", clear
keep if eventyear == `j'
sort acq_cusip eventdate
compress
save "/Users/carl/Desktop/STATA/Neustart/eventdates_`j'.dta"
}
Does anyone have an answer to that?
In your original code Stata sees `j' inside the forvalues command (instead of the correct j), so it starts to evaluate that before it starts the loop. So what is eventually run is
forvalues 1995=1995/2012 {
This means that forvalues is changing the content of the local macro confusingly but legally called `1995' to 1995 in the first iteration, 1996 in the second iteration, etc. So when you refer to the local `j' inside the loop, it will not have changed and remains at the original value which you defined before the loop.
The way to solve this is to replace:
local j = 1995
forvalues `j' = 1995/2012 {
with:
forvalues j = 1995/2012 {
use replace
save "/Users/carl/Desktop/STATA/Neustart/eventdates_`j'.dta",replace
Updated
cd "C:\Users\Vista\Desktop\Stataproject"
forvalues j=1/5 {
sysuse auto,clear
keep if rep78== `j'
save "auto_`j'.dta",replace
}
Example with auto data in Stata . For details please see, Speaking Stata: How to face lists with fortitude

looping through variable names according to their prefix

I have to carry out a number of paired t-tests, and I was wondering how to automate the overall procedure.
Suppose I have only the following variables:
int_ma est_ma tot_ma int_pa est_pa tot_pa
What I need is to compute:
ttest int_ma=int_pa
ttest est_ma=est_pa
ttest tot_ma=tot_pa
Of course in some way it should be possible to exploit the fact that each pair has a unique prefix and the "_pa, _ma" suffixes, but unfortunately I cannot find an easy way to refer to only a part of each variable name..
Thank you very much for any help!
There are a few ways to do this. I would use a foreach loop with a general list. Here I loop over your three prefixes, which Stata passes to the loop as local macros, and append _ma and _pa to generate and ttest the variables.
* generate some data
clear
set obs 100
foreach x in int est tot {
foreach y in ma pa {
generate `x'_`y' = runiform()
}
}
* -ttest- in -foreach- loop
foreach x in int est tot {
ttest `x'_pa = `x'_ma
}
The foreach help file is a worth a few reads, as is the macro help file. The syntax is a little odd at first (more like bash scripting than R or Matlab), but it is very flexible.

Resources