I found the following question/answer that I think does what I would like to do: https://www.stata.com/statalist/archive/2009-09/msg00449.html
However, I am unclear what is going on in all of it, and would like to understand better. The code for the solution is as follows:
unab vars : var1-var30
local nvar : word count `vars'
forval i = 1/`nvar' {
forval j = 1/`=`i'-1' {
local x : word `i' of `vars'
local y : word `j' of `vars'
generate `x'X`y' = `x' * `y'
}
}
I do not understand what is going on in line 4 with the expression `=`i'-1'.
The i refers to the number in the set {1,...,n}, but I do not understand what the equals or the -1 are doing. My assumption is that the -1 is somehow removing the own observation, but I am unclear. Any explanation would be appreciated.
Suppose you have local macro i that varies over a range and you want its value minus 1. You can always do this
local j = `i' - 1
and then refer to j. You can also do this on the fly:
`= `i' - 1'
Within
`= '
Stata will evaluate the expression, here
`i' - 1
and substitute the result of that expression in a command line.
You can do this with scalars too:
scalar foo = 42
and then refer to
`= foo'
However, watch out. Scalar names and variable names occupy the same namespace.
`= scalar(foo)'
disambiguates and arguably is good style in any case.
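As a minimal sketch of the inline evaluation described above (the values 5 and 42 and the short loop ranges are illustrative assumptions, not taken from the question), the following can be run from a do-file:

```stata
* Sketch only: the numbers and the scalar name foo are illustrative
local i = 5
* the command Stata actually runs here is: display 4
display `= `i' - 1'

scalar foo = 42
* the command Stata actually runs here is: display 42
display `= scalar(foo)'

* When i is 1 the inner range is 1/0, which is empty,
* so the loop body is skipped for that value of i.
forvalues i = 1/3 {
    forvalues j = 1/`= `i' - 1' {
        display "pair: `i' `j'"
    }
}
```

In the nested loop the inner index only runs up to one less than the outer index, so each pair of distinct positions is visited once and no position is paired with itself.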
I am using gnuplot 5.2.4.
I am doing a for-loop, to analyse in each cycle a different column (or combination of columns) of a file. The column to be used is written in an array and is passed to the commands as a macro.
The code is written in a .plt file that I load in gnuplot.
It looks like this
(here, for example, I'm using the 1.dat file in \gnuplot\demo):
set macros
array label[2]
label[1]="First_Cycle"
label[2]="Second_Cycle"
array index[2]
index[1]="(column(2))"
index[2]="(column(2)-column(1))"
fileDat = "1.dat"
do for [k = 1 : 2] {
fileExport = sprintf("%s.gif",label[k])
print fileExport
set term gif size 1920,1280 enhanced
set output fileExport
indexk = index[k]
print k," ",index[k]," ", indexk
stats fileDat u @indexk name "DV" #noout
plot fileDat u @indexk ti label[k]
}
indexk is printed correctly at each cycle, however, I obtain the following warning:
warning: indexk is not a string variable
and the commands stats and plot always analyze the column in index[1] at each cycle.
However, if I comment out the do for loop and increase k by hand, the code works correctly, with no warning and no problem,
like this:
set macros
array label[2]
label[1]="First_Cycle"
label[2]="Second_Cycle"
array index[2]
index[1]="(column(2))"
index[2]="(column(2)-column(1))"
fileDat = "1.dat"
k=2
#do for [k = 1 : 2] {
fileExport = sprintf("%s.gif",label[k])
print fileExport
set term gif size 1920,1280 enhanced
set output fileExport
indexk = index[k]
print k," ",index[k]," ", indexk
stats fileDat u @indexk name "DV" #noout
plot fileDat u @indexk ti label[k]
#}
You can think of gnuplot macros (@ + string variable name) as roughly equivalent to C language preprocessor directives. They describe a substitution that is performed only once, the first time the line of code is encountered. So trying to change one inside a loop will not work.
In some contexts the gnuplot directive "evaluate( )" is equivalent to dynamic macro substitution, but in the case shown you need an expression rather than a statement. So something like:
Col(k) = (k == 1) ? column(2) : column(2) - column(1)
do for [k=1:2] {
...
stats fileDat using Col(k) name "DV"
plot fileDat using Col(k) ti label[k]
}
I am trying to calculate the Gini coefficient as the average over five repetitions. My code doesn't work correctly, and I cannot find a way to fix it.
inequal7 is a user-written command.
gen gini=.
forval i=1/5 {
mi xeq `i' : inequal7 income [aw=hw0010]
gen gini_`i'=.
scalar gini_`i' = r(gini)
replace gini_`i'= r(gini)
if `i' ==5 {
replace gini = sum(gini_1+gini_2+gini_3+gini_4+gini_5)/5
}
}
Can someone help me?
There is no context on or example of the dataset you're using. This may not work but it's likely to be closer to legal and correct than what you have.
scalar gini = 0
forval i=1/5 {
mi xeq `i' : inequal7 income [aw=hw0010]
scalar gini = scalar(gini) + r(gini)
}
scalar gini = scalar(gini) / 5
Notes:
Using variables to hold constants is legal, but not necessarily good style.
sum() gives the running or cumulative sum; applied to a variable that's a constant it does far more work than you need, and at best the correct answer will just be that in observation 1. As you're feeding it the sum of 5 values, it's redundant anyway.
Watch out: names for scalars and variables occupy the same namespace.
If this is a long way off what you want, and you get no better answer, you're likely to need to give much more detail.
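To illustrate the note on sum() above, here is a small sketch on made-up data (the variable names x and running are hypothetical):

```stata
* Sketch only: three observations of a constant, invented for illustration
clear
set obs 3
gen x = 2

* sum() is a running sum: running holds 2, 4, 6 down the observations
gen running = sum(x)
list

* If a single total is wanted, summarize leaves it in r(sum)
summarize x, meanonly
display r(sum)
```

The running sum differs in every observation, which is why it is a poor way to hold one constant result; a scalar holds the single number directly.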
I have a set of variables that are string variables. For each value in the string, I create a series of binary (0, 1) variables.
Let's say my variables are Engine1 Engine2 Engine3.
The possible values are BHM, BMN, HLC, or missing (coded as ".").
The values of the variables are mutually exclusive, except missing.
In a hypothetical example, to write the new variables, I would write the following code:
egen BHM=1 if Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
replace BHM=0 if BHM==.
gen BMN=1 if Engine1=="BMN"|Engine2=="BMN"|Engine3=="BMN"
replace BMN=0 if BMN==.
gen HLC=1 if Engine1=="HLC"|Engine2=="HLC"|Engine3=="HLC"
replace HLC=0 if HLC==.
How could I rewrite this code in a loop? I don't understand how to use the "or" operator | in a loop.
First note that egen is a typo for gen in your first line.
Second, note that
gen BHM=1 if Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
replace BHM=0 if BHM==.
can be rewritten in one line:
gen BHM = Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
Now learn about the handy inlist() function:
gen BHM = inlist("BHM", Engine1, Engine2, Engine3)
If that looks odd, it's because your mathematics education led you to write things like
if x = 1 or y = 1 or z = 1
but only convention stops you writing
if 1 = x or 1 = y or 1 = z
The final trick is to write a loop:
foreach v in BHM BMN HLC {
gen `v' = inlist("`v'", Engine1, Engine2, Engine3)
}
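The loop above can be tried on a toy dataset. This is only a sketch: the three observations are invented for illustration, using the variable names and codes from the question.

```stata
* Sketch only: invented data using the question's variable names
clear
input str3 Engine1 str3 Engine2 str3 Engine3
"BHM" "."   "HLC"
"."   "BMN" "."
"."   "."   "."
end

* Each indicator is 1 if any of the three variables equals that code, else 0
foreach v in BHM BMN HLC {
    gen byte `v' = inlist("`v'", Engine1, Engine2, Engine3)
}
list
```

In the first observation BHM and HLC are 1 and BMN is 0; in the second only BMN is 1; in the third all three indicators are 0.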
It's not clear what you are finding difficult about |. Your code was fine in that respect.
A bug often seen in learner code is like
gen y = 1 if x == 11|12|13
which is legal Stata but almost never what you want. Stata parses it as
gen y = 1 if (x == 11)|12|13
and uses its rule that non-zero arguments mean true in true-or-false evaluations. Thus y is 1 if
x == 11
or
12 // a non-zero argument, evaluates as true regardless of x
or
13 // same comment
The learner needs
gen y = 1 if (x == 11)|(x == 12)|(x == 13)
where the parentheses can be omitted. That's repetitive, so
gen y = 1 if inlist(x, 11, 12, 13)
can be used instead.
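The difference can be checked on the auto data shipped with Stata. This sketch uses rep78 and the values 3, 4, 5 purely as an illustration in place of x and 11, 12, 13:

```stata
* Sketch only: rep78 stands in for x
sysuse auto, clear

* Parsed as (rep78 == 3) | 4 | 5, which is true for every observation
count if rep78 == 3 | 4 | 5

* What was probably intended: rep78 is one of 3, 4, 5
count if inlist(rep78, 3, 4, 5)
```

The first count equals the number of observations in the dataset, whatever the values of rep78; the second counts only the matching observations.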
For more on inlist() see the articles here, here (Section 2.2), and here.
For more on true and false in Stata, see this FAQ.
use "locationdata.dta", clear
gen ring=.
* Philly City Hall
gen lat_center = 39.9525468
gen lon_center = -75.1638855
destring(INTPTLAT10), replace
destring(INTPTLON10), replace
vincenty INTPTLAT10 INTPTLON10 lat_center lon_center , hav(distance_km) inkm
quietly su distance_km
local min = r(min)
replace ring=0 if (`min' <= distance_km < 1)
local max = ceil(r(max))
* forval loop does not work
forval i=1/`max'{
local j = `i'+1
replace ring=`i' if (`i' <= distance_km < `j')
}
I am drawing rings at 1 km intervals around a point. The last part of the code (the forval loop) does not work. What is wrong here?
EDIT:
The result for the forval part is as follows:
. forval i=1/`max'{
2. local j = `i'+1
3. replace ring=`i' if `i' <= distance_km < `j'
4. }
(1746 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
....
So, replacing does not work for i = 2 and beyond.
A double inequality such as
(`min' <= distance_km < 1)
which makes sense according to mathematical conventions is clearly legal in Stata. Otherwise, your code would have triggered a syntax error. However, Stata does not hold evaluation until the entire expression is evaluated. The parentheses here are immaterial, as what is key is how their contents are evaluated. As it turns out, the result is unlikely to be what you want.
In more detail: Stata interprets this expression from left to right as follows. The first inequality
`min' <= distance_km
is true or false and thus evaluated as 0 or 1. Evidently you want to select values such that
distance_km >= `min'
and for such values the inequality above is true and returns 1. Stata would then take a result of 1 forward and turn to the second inequality, evaluating
1 < 1
(i.e. result of first inequality < 1), but that is false for such values. Conversely,
(`min' <= distance_km < 1)
will be evaluated as
0 < 1
-- which is true (returns 1) -- if and only if
`min' > distance_km
In short, what you intend would need to be expressed differently, namely by
(`min' <= distance_km) & (distance_km < 1)
I conjecture that this is at the root of your problem.
Note that Stata has an inrange() function but it is not quite what you want here.
But, all that said, from looking at your code the inequalities and your loop both appear quite unnecessary. You want your rings to be 1 km intervals, so that
gen ring = floor(distance_km)
can replace the entire block of code after your call to vincenty, as floor() rounds down with integer result. You appear to know of its twin ceil().
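A small sketch on made-up distances shows the three forms side by side (the variable names d, chained and intended, and the bounds 2 and 3, are assumptions for illustration):

```stata
* Sketch only: five invented distances 0.5, 1.5, 2.5, 3.5, 4.5
clear
set obs 5
gen d = _n - 0.5

* Evaluated as (2 <= d) < 3: the left part is 0 or 1, and both are below 3,
* so chained is 1 in every observation
gen byte chained = (2 <= d < 3)

* The intended test: 1 only where d lies in [2, 3)
gen byte intended = (2 <= d) & (d < 3)

* One ring per 1 km interval, with no loop
gen ring = floor(d)
list
```

Here chained is 1 everywhere, intended is 1 only for d equal to 2.5, and ring takes the values 0 to 4.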
Some other small points, incidental but worth noting:
You can destring several variables at once.
Putting constants in variables with generate is inefficient. Use scalars for this purpose. (However, if vincenty requires variables as input, that would override this point, but it points up that vincenty is too strict.)
summarize, meanonly is better for calculating just the minimum and maximum. The option name is admittedly misleading here. See http://www.stata-journal.com/sjpdf.html?articlenum=st0135 for discussion.
As a matter of general Stata practice, your post should explain where the user-written vincenty comes from, although it seems that is quite irrelevant in this instance.
For completeness, here is a rewrite, although you need to test it against your data.
use "locationdata.dta", clear
* Philly City Hall
scalar lat_center = 39.9525468
scalar lon_center = -75.1638855
destring INTPTLAT10 INTPTLON10, replace
vincenty INTPTLAT10 INTPTLON10 lat_center lon_center , hav(distance_km) inkm
gen ring = floor(distance_km)
I want to save the results of my Stata forvalues loop into individual files. One component of the filename should be the value j assigned to the macro within a forvalues loop.
Apparently my code always tries to save with 1995 in the filename, so I get messages telling me that this file already exists.
I am using the following code:
local j = 1995
forvalues `j'= 1995 / 2012 {
clear
use "/Users/carl/Desktop/STATA/Neustart/eventdates.dta", clear
keep if eventyear == `j'
sort acq_cusip eventdate
compress
save "/Users/carl/Desktop/STATA/Neustart/eventdates_`j'.dta"
}
Does anyone have an answer to that?
In your original code Stata sees `j' inside the forvalues command (instead of the correct j), so it starts to evaluate that before it starts the loop. So what is eventually run is
forvalues 1995=1995/2012 {
This means that forvalues is changing the content of the local macro confusingly but legally called `1995' to 1995 in the first iteration, 1996 in the second iteration, etc. So when you refer to the local `j' inside the loop, it will not have changed and remains at the original value which you defined before the loop.
The way to solve this is to replace:
local j = 1995
forvalues `j' = 1995/2012 {
with:
forvalues j = 1995/2012 {
Use the replace option:
save "/Users/carl/Desktop/STATA/Neustart/eventdates_`j'.dta",replace
Updated
cd "C:\Users\Vista\Desktop\Stataproject"
forvalues j=1/5 {
sysuse auto,clear
keep if rep78== `j'
save "auto_`j'.dta",replace
}
Example with the auto data in Stata. For details, please see Speaking Stata: How to face lists with fortitude.