Using an "or" operator between variables for a loop in Stata

Using an "or" operator between variables for a loop in Stata - loops

I have a set of variables that are string variables. For each value in the string, I create a series of binary (0, 1) variables.
Let's say my variables are Engine1 Engine2 Engine3.
The possible values are BHM, BMN, HLC, or missing (coded as ".").
The values of the variables are mutually exclusive, except missing.
In a hypothetical example, to write the new variables, I would write the following code:
egen BHM=1 if Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"`
replace BHM=0 if BHM==.
gen BMN=1 if Engine1=="BMN"|Engine2=="BMN"|Engine3=="BMN"`
replace BMN=0 if BMN==.
gen HLC=1 if Engine1=="HLC"|Engine2=="HLC"|Engine3=="HLC"
replace HLC=0 if HLC==.
How could I rewrite this code in a loop? I don't understand how to use the "or" operator | in a loop.

First note that egen is a typo for gen in your first line.
Second, note that
gen BHM=1 if Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
replace BHM=0 if BHM==.
can be rewritten in one line:
gen BHM = Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
Now learn about the handy inlist() function:
gen BHM = inlist("BHM", Engine1, Engine2, Engine3)
If that looks odd, it's because your mathematics education led you to write things like
if x = 1 or y = 1 or z = 1
but only convention stops you writing
if 1 = x or 1 = y or 1 = z
The final trick is to write a loop:
foreach v in BHM BMN HLC {
gen `v' = inlist("`v'", Engine1, Engine2, Engine3)
}
It's not clear what you are finding difficult about |. Your code was fine in that respect.
An bug often seen in learner code is like
gen y = 1 if x == 11|12|13
which is legal Stata but almost never what you want. Stata parses it as
gen y = 1 if (x == 11)|12|13
and uses its rule that non-zero arguments mean true in true-or-false evaluations. Thus y is 1 if
x == 11
or
12 // a non-zero argument, evaluates as true regardless of x
or
13 // same comment
The learner needs
gen y = 1 if (x == 11)|(x == 12)|(x == 13)
where the parentheses can be omitted. That's repetitive, so
gen y = 1 if inlist(x, 11, 12, 13)
can be used instead.
For more on inlist() see
articles here
and
here Section 2.2
and
here.
For more on true and false in Stata, see this FAQ

Related

What is true and false and & | used for?

I am very new to c programming and we are studying and/or truth tables. I get the general idea of how to do something like C & 5 would come out to 4, but I am very confused on the purpose of this? What is the end goal with truths and falses? What is the question you are seeking to answer when you evaluate such expressions? I'm sorry if this is dumb question but I feel like it is so stupid that no one has a clear answer for it

You need to understand the difference between & and && , they are very different things.
`&` is a bitwise operation. 0b1111 & 0b1001 = 0b1001
ie perform a logical and on each bit in a value. You give the example
0x0C & 0x05 = 0b00001100 & 0b00000101 = 0b00000100 = 4
&& is a way of combining logical tests, its close the the english word 'and'
if(a==4 && y ==9) xxx();
xxx will only be executed it x is 4 and y is 9
What creates confusion is that logical values in C map to 0 (false) and not 0 (true). So for example try this
int a = 6;
int j = (a == 4);
you will see that j is 0 (because the condition a==4 is false.
So you will see things like this
if(a & 0x01) yyy();
this performs the & operation and looks to see if the result is zero or not. So yyy will be executed if the last bit is one, ie if its odd

Clarification for a loop to create interaction terms in Stata

I found the following question/answer that I think does what I would like to do: https://www.stata.com/statalist/archive/2009-09/msg00449.html
However, I am unclear what is going on in all of it, and would like to understand better. The code for the solution is as follows:
unab vars : var1-var30
local nvar : word count `vars'
forval i = 1/`nvar' {
forval j = 1/`=`i'-1' {
local x : word `i' of `vars'
local y : word `j' of `vars'
generate `x'X`y' = `x' * `y'
}
}
I do not understand what is going on in line 4 with the statement: ``=i'-1'.
The i refers to the number in the set {1,...,n}, but I do not understand what the equals or the -1 are doing. My assumption is that the -1 is somehow removing the own observation, but I am unclear. Any explanation would be appreciated.

Suppose you have local macro i that varies over a range and you want its value minus 1. You can always do this
local j = `i' - 1
and then refer to j. You can also do this on the fly:
`= `i' - 1'
Within
`= '
Stata will evaluate the expression, here
`i' - 1
and substitute the result of that expression in a command line.
You can do this with scalars too:
scalar foo = 42
and then refer to
`= foo'
However, watch out. Scalar names and variable names occupy the same namespace.
`= scalar(foo)'
disambiguates and arguably is good style in any case.

Stata: replace, if, forvalues

use "locationdata.dta", clear
gen ring=.
* Philly City Hall
gen lat_center = 39.9525468
gen lon_center = -75.1638855
destring(INTPTLAT10), replace
destring(INTPTLON10), replace
vincenty INTPTLAT10 INTPTLON10 lat_center lon_center , hav(distance_km) inkm
quietly su distance_km
local min = r(min)
replace ring=0 if (`min' <= distance_km < 1)
local max = ceil(r(max))
* forval loop does not work
forval i=1/`max'{
local j = `i'+1
replace ring=`i' if (`i' <= distance_km < `j')
}
I am drawing rings by 1 km from a point. The last part of the code (forval) does not work. Something wrong here?
EDIT:
The result for the forval part is as follows:
. forval i=1/`max'{
2. local j = `i'+1
3. replace ring=`i' if `i' <= distance_km < `j'
4. }
(1746 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
....
So, replacing does not work for i = 2 and beyond.

A double inequality such as
(`min' <= distance_km < 1)
which makes sense according to mathematical conventions is clearly legal in Stata. Otherwise, your code would have triggered a syntax error. However, Stata does not hold evaluation until the entire expression is evaluated. The parentheses here are immaterial, as what is key is how their contents are evaluated. As it turns out, the result is unlikely to be what you want.
In more detail: Stata interprets this expression from left to right as follows. The first equality
`min' <= distance_km
is true or false and thus evaluated as 0 or 1. Evidently you want to select values such that
distance_km >= `min'
and for such values the inequality above is true and returns 1. Stata would then take a result of 1 forward and turn to the second inequality, evaluating
1 < 1
(i.e. result of first inequality < 1), but that is false for such values. Conversely,
(`min' <= distance_km < 1)
will be evaluated as
0 < 1
-- which is true (returns 1) -- if and only if
`min' > distance_km
In short, what you intend would need to be expressed differently, namely by
(`min' <= distance_km) & (distance_km < 1)
I conjecture that this is at the root of your problem.
Note that Stata has an inrange() function but it is not quite what you want here.
But, all that said, from looking at your code the inequalities and your loop both appear quite unnecessary. You want your rings to be 1 km intervals, so that
gen ring = floor(distance_km)
can replace the entire block of code after your call to vincenty, as floor() rounds down with integer result. You appear to know of its twin ceil().
Some other small points, incidental but worth noting:
You can destring several variables at once.
Putting constants in variables with generate is inefficient. Use scalars for this purpose. (However, if vincenty requires variables as input, that would override this point, but it points up that vincenty is too strict.)
summarize, meanonly is better for calculating just the minimum and maximum. The option name is admittedly misleading here. See http://www.stata-journal.com/sjpdf.html?articlenum=st0135 for discussion.
As a matter of general Stata practice, your post should explain where the user-written vincenty comes from, although it seems that is quite irrelevant in this instance.
For completeness, here is a rewrite, although you need to test it against your data.
use "locationdata.dta", clear
* Philly City Hall
scalar lat_center = 39.9525468
scalar lon_center = -75.1638855
destring INTPTLAT10 INTPTLON10, replace
vincenty INTPTLAT10 INTPTLON10 lat_center lon_center , hav(distance_km) inkm
gen ring = floor(distance_km)

How to remove repetitions using retract in this particular situation?

%Examples:
%days([saturday,sunday,monday,tuesday,wednesday,thursday]).
%slots([1,2,3,4,5]).
%course_meetings(csen402,tutorial,t07,nehal,'tutorial for t07').
%course_meetings(comm401,lecture,all_group_4,dr_amr_talaat,'lecture 1')
%tutorialrooms([c6301,b4108,c2201,c2301,c2202,c2203]).
day_tut(Day,Slot,Place,Course,Group,Instructor,Descr):-
days(X),member(Day,X),
tutorialrooms(X1),member(Place,X1),
course_meetings(Course,tutorial,Group,Instructor,Descr),
slots(X2),member(Slot,X2),
assert(day(Day,Slot,tutorial,Place,Course,Group,Instructor,Descr)).
I would like to find a way to remove certain facts after asserting for example
every (day) fact has to have only one room for each day and slot
example: we can have day(sat,1,_,c6301,_,_,_,_) and
day(sat,1,_,c6302,_,_,_,_) but we can't have
another occurrence of day(sat,1,_,c6301,_,_,_,_).

If you simply want to remove redundant solutions of a Goal – this is what you probably mean with removing repetitions – simply replace Goal by setof(t,Goal,_). This works as long as there are only ground solutions for Goal and as long as Goal terminates universally. Thus, there is no need for any data base manipulation to remove redundant solutions.
?- member(X, [a,b,a,c]).
X = a
; X = b
; X = a % redundant!
; X = c.
?- setof(t,member(X, [a,b,a,c]),_).
X = a
; X = b
; X = c.

inconsistent results using isreal

Take this simple example:
a = [1 2i];
x = zeros(1,length(a));
for n=1:length(a)
x(n) = isreal(a(n));
end
In an attempt to vectorize the code, I tried:
y = arrayfun(#isreal,a);
But the results are not the same:
x =
1 0
y =
0 0
What am I doing wrong?

This certainly appears to be a bug, but here's a workaround:
>> y = arrayfun(#(x) isreal(x(1)),a)
ans =
1 0
Why does this work? I'm not totally sure, but it appears that when you perform an indexing operation on the variable before calling ISREAL it removes the "complex" attribute from the array element if the imaginary component is zero. Try this in the Command Window:
>> a = [1 2i]; %# A complex array
>> b = a(1); %# Indexing element 1 removes the complex attribute...
>> c = complex(a(1)); %# ...but we can put that attribute back
>> whos
Name Size Bytes Class Attributes
a 1x2 32 double complex
b 1x1 8 double %# Not complex
c 1x1 16 double complex %# Still complex
Apparently, ARRAYFUN must internally maintain the "complex" attribute of the array elements it passes to ISREAL, thus treating them all as being complex numbers even if the imaginary component is zero.

It might help to know that MATLAB stores the real/complex parts of a matrix separately. Try the following:
>> format debug
>> a = [1 2i];
>> disp(a)
Structure address = 17bbc5b0
m = 1
n = 2
pr = 1c6f18a0
pi = 1c6f0420
1.0000 0 + 2.0000i
where pr is a pointer to the memory block containing the real part of all values, and pi pointer to the complex part of all values in the matrix. Since all elements are stored together, then in this case they all have a complex part.
Now compare these two approaches:
>> arrayfun(#(x)disp(x),a)
Structure address = 17bbcff8
m = 1
n = 1
pr = 1bb8a8d0
pi = 1bb874d0
1
Structure address = 17c19aa8
m = 1
n = 1
pr = 1c17b5d0
pi = 1c176470
0 + 2.0000i
versus
>> for n=1:2, disp(a(n)), end
Structure address = 17bbc930
m = 1
n = 1
pr = 1bb874d0
pi = 0
1
Structure address = 17bbd180
m = 1
n = 1
pr = 1bb874d0
pi = 1bb88310
0 + 2.0000i
So it seems that when you access a(1) in the for loop, the value returned (in the ans variable) has a zero complex-part (null pi), thus is considered real.
One the other hand, ARRAYFUN seems to be directly accessing the values of the matrix (without returning them in ANS variable), thus it has access to both pr and pi pointers which are not null, thus are all elements are considered non-real.
Please keep in mind this just my interpretation, and I could be mistaken...

Answering really late on this one... The MATLAB function ISREAL operates in a really rather counter-intuitive way for many purposes. It tells you if a given array taken as a whole has no complex part at all - it tells you about the storage, it doesn't really tell you anything about the values in the array. It's a bit like the ISSPARSE function in that regard. So, for example
isreal(complex(1)) % returns FALSE
What you'll find in MATLAB is that certain operations automatically trim any all-zero imaginary parts. So, for example
x = complex(1);
isreal(x); % FALSE, we just forced there to be an imaginary part
isreal(x(1)); % TRUE - indexing realised it could drop the zero imaginary part
isreal(x(:)); % FALSE - "(:)" indexing is just a reshape, not real indexing
In short, MATLAB really needs a function which answers the question "does this value have zero imaginary part", in an elementwise way on an array.