Stata: replace, if, forvalues - loops

use "locationdata.dta", clear
gen ring=.
* Philly City Hall
gen lat_center = 39.9525468
gen lon_center = -75.1638855
destring(INTPTLAT10), replace
destring(INTPTLON10), replace
vincenty INTPTLAT10 INTPTLON10 lat_center lon_center , hav(distance_km) inkm
quietly su distance_km
local min = r(min)
replace ring=0 if (`min' <= distance_km < 1)
local max = ceil(r(max))
* forval loop does not work
forval i=1/`max'{
local j = `i'+1
replace ring=`i' if (`i' <= distance_km < `j')
}
I am drawing rings by 1 km from a point. The last part of the code (forval) does not work. Something wrong here?
EDIT:
The result for the forval part is as follows:
. forval i=1/`max'{
2. local j = `i'+1
3. replace ring=`i' if `i' <= distance_km < `j'
4. }
(1746 real changes made)
(0 real changes made)
(0 real changes made)
(0 real changes made)
....
So, replacing does not work for i = 2 and beyond.

A double inequality such as
(`min' <= distance_km < 1)
which makes sense according to mathematical conventions is clearly legal in Stata. Otherwise, your code would have triggered a syntax error. However, Stata does not hold evaluation until the entire expression is evaluated. The parentheses here are immaterial, as what is key is how their contents are evaluated. As it turns out, the result is unlikely to be what you want.
In more detail: Stata interprets this expression from left to right as follows. The first equality
`min' <= distance_km
is true or false and thus evaluated as 0 or 1. Evidently you want to select values such that
distance_km >= `min'
and for such values the inequality above is true and returns 1. Stata would then take a result of 1 forward and turn to the second inequality, evaluating
1 < 1
(i.e. result of first inequality < 1), but that is false for such values. Conversely,
(`min' <= distance_km < 1)
will be evaluated as
0 < 1
-- which is true (returns 1) -- if and only if
`min' > distance_km
In short, what you intend would need to be expressed differently, namely by
(`min' <= distance_km) & (distance_km < 1)
I conjecture that this is at the root of your problem.
Note that Stata has an inrange() function but it is not quite what you want here.
But, all that said, from looking at your code the inequalities and your loop both appear quite unnecessary. You want your rings to be 1 km intervals, so that
gen ring = floor(distance_km)
can replace the entire block of code after your call to vincenty, as floor() rounds down with integer result. You appear to know of its twin ceil().
Some other small points, incidental but worth noting:
You can destring several variables at once.
Putting constants in variables with generate is inefficient. Use scalars for this purpose. (However, if vincenty requires variables as input, that would override this point, but it points up that vincenty is too strict.)
summarize, meanonly is better for calculating just the minimum and maximum. The option name is admittedly misleading here. See http://www.stata-journal.com/sjpdf.html?articlenum=st0135 for discussion.
As a matter of general Stata practice, your post should explain where the user-written vincenty comes from, although it seems that is quite irrelevant in this instance.
For completeness, here is a rewrite, although you need to test it against your data.
use "locationdata.dta", clear
* Philly City Hall
scalar lat_center = 39.9525468
scalar lon_center = -75.1638855
destring INTPTLAT10 INTPTLON10, replace
vincenty INTPTLAT10 INTPTLON10 lat_center lon_center , hav(distance_km) inkm
gen ring = floor(distance_km)

Related

Compute if loop SPSS

Ultimately, I want to change scores of 0 to 1, scores of 1 to 2, and scores of 2 to 3. I thought one way to do that was using +1, but I realize I could also use a more complicated if then series.
Here is what I did so far:
I used the existing variable (x) to create a new variable (y=x+1) using SPSS syntax. I only want to do this for variables with values >=0 (this was my approach to excluding cells with missing data; the range for x is 0-2).
I can create x+1, but it overwrites the existing variables.
DO REPEAT x =var_1 TO var_86.
if (x>=0) x=(x+1).
end repeat.
exe.
I tried this modification, but it doesn't work:
DO REPEAT x = var_1 TO var_86 / y = var_1a TO var_86a.
IF (x >= 0) y=x +1.
END REPEAT.
EXE.
The error message is:
DO REPEAT The form VARX TO VARY to refer to a range of variables has
been used incorrectly. When using VARX TO VARY to create new
variables, X must be an integer less than or equal to the integer Y.
(Can't use A3 TO A1.)
I tried many other configurations including vectors and loops but haven't yet figured out how to do this computation across the range of variables without overwriting the existing ones. Thanks in advance for any recommendations.
The message you are getting is because SPSS doesn't understand the form var_1a TO var_86a.
For the x to y form to work the number has to be at the end of the name, so for example varA_1 to varA_86 should work.
While you're at it, here's a simple way to go about your task:
recode var_1 TO var_86 (0=1)(1=2)(2=3) into varA_1 TO varA_86.

accessing boundaries of array without duplicating lots of code

When I try to compile my code using -fcheck=all I get a runtime error since it seems I step out of bounds of my array dimension size. It comes from the part of my code shown below. I think it is because my loops over i,j only run from -ny to ny, -nx to nx but I try to use points at i+1,j+1,i-1,j-1 which takes me out of bounds in my arrays. When the loop over j starts at -ny, it needs j-1, so it immediately takes me out of bounds since I'm trying to access -ny-1. Similarly when j=ny, i=-nx,nx.
My question is, how can I fix this problem efficiently using minimal code?
I need the array grad(1,i,j) correctly defined on the boundary, and it needs to be defined exactly as on the right hand side of the equality below, I just don't know an efficient way of doing this. I can explicitly define grad(1,nx,j), grad(1,-nx,j), etc, separately and only loop over i=-nx+1,nx-1,j=-ny+1,ny-1 but this causes lots of duplicated code and I have many of these arrays so I don't think this is the logical/efficient approach. If I do this, I just end up with hundreds of lines of duplicated code that makes it very hard to debug. Thanks.
integer :: i,j
integer, parameter :: nx = 50, ny = 50
complex, dimension (3,-nx:nx,-ny:ny) :: grad,psi
real, parameter :: h = 0.1
do j = -ny,ny
do i = -nx,nx
psi(1,i,j) = sin(i*h)+sin(j*h)
psi(2,i,j) = sin(i*h)+sin(j*h)
psi(3,i,j) = sin(i*h)+sin(j*h)
end do
end do
do j = -ny,ny
do i = -nx,nx
grad(1,i,j) = (psi(1,i+1,j)+psi(1,i-1,j)+psi(1,i,j+1)+psi(1,i,j-1)-4*psi(1,i,j))/h**2 &
- (psi(2,i+1,j)-psi(2,i,j))*psi(1,i,j)/h &
- (psi(3,i,j+1)-psi(3,i,j))*psi(1,i,j)/h &
- psi(2,i,j)*(psi(1,i+1,j)-psi(1,i,j))/h &
- psi(3,i,j)*(psi(1,i,j+1)-psi(1,i,j))/h
end do
end do
If I was to do this directly for grad(1,nx,j), grad(1,-nx,j), it would be given by
do j = -ny+1,ny-1
grad(1,nx,j) = (psi(1,nx,j)+psi(1,nx-2,j)+psi(1,nx,j+1)+psi(1,nx,j-1)-2*psi(1,nx-1,j)-2*psi(1,nx,j))/h**2 &
- (psi(2,nx,j)-psi(2,nx-1,j))*psi(1,nx,j)/h &
- (psi(3,nx,j+1)-psi(3,nx,j))*psi(1,nx,j)/h &
- psi(2,nx,j)*(psi(1,nx,j)-psi(1,nx-1,j))/h &
- psi(3,nx,j)*(psi(1,nx,j+1)-psi(1,nx,j))/h
grad(1,-nx,j) = (psi(1,-nx+2,j)+psi(1,-nx,j)+psi(1,-nx,j+1)+psi(1,-nx,j-1)-2*psi(1,-nx+1,j)-2*psi(1,-nx,j))/h**2 &
- (psi(2,-nx+1,j)-psi(2,-nx,j))*psi(1,-nx,j)/h &
- (psi(3,-nx,j+1)-psi(3,-nx,j))*psi(1,-nx,j)/h &
- psi(2,-nx,j)*(psi(1,-nx+1,j)-psi(1,-nx,j))/h &
- psi(3,-nx,j)*(psi(1,-nx,j+1)-psi(1,-nx,j))/h
end do
One possible way for you could be using an additional index variable for the boundaries, modified from the original index to avoid getting out-of-bounds. I mean something like this:
do j = -ny,ny
jj = max(min(j, ny-1), -ny+1)
do i = -nx,nx
ii = max(min(i, nx-1), -nx+1)
grad(1,i,j) = (psi(1,ii+1,j)+psi(1,ii-1,j)+psi(1,i,jj+1)+psi(1,i,jj-1)-4*psi(1,i,j))/h**2 &
- (psi(2,ii+1,j)-psi(2,ii,j))*psi(1,i,j)/h &
- (psi(3,i,jj+1)-psi(3,i,jj))*psi(1,i,j)/h &
- psi(2,i,j)*(psi(1,ii+1,j)-psi(1,ii,j))/h &
- psi(3,i,j)*(psi(1,i,jj+1)-psi(1,i,jj))/h
end do
end do
It's hard for me to write a proper code because it seems you trimmed part of the original expression in the code you presented in the question, but I hope you understand the idea and apply it correctly for your logic.
Opinions:
Even though this is what you are asking for (as far as I understand), I would not recommend doing this before profiling and checking if assigning the boundary conditions manually after a whole array operation wouldn't be more efficient, instead. Maybe those extra calculations on the indices on each iteration could impact on performance (arguably less than if conditionals or function calls). Using "ghost cells", as suggested by #evets, could be even more performant. You should profile and compare.
I'd recommend you declaring your arrays as dimension(-nx:nx,-ny:ny,3) instead. Fortran stores arrays in column-major order and, as you are accessing values on the neighborhood of the "x" and "y", they would be non-contiguous memory locations for a fixed "other" dimension is the leftest, and that could mean less cache-hits.
In somewhat pseudo-code, you can do
do j = -ny, ny
if (j == -ny) then
p1jm1 = XXXXX ! Some boundary condition
else
p1jm1 = psi(1,i,j-1)
end if
if (j == ny) then
p1jp1 = YYYYY ! Some other boundary condition
else
p1jp1 = psi(1,i,j+1)
end if
do i = -nx, ny
grad(1,i,j) = ... term involving p1jm1 ... term involving p1jp1 ...
...
end do
end do
The j-loop isn't bad in that you are adding 2*2*ny conditionals. The inner i-loop is adding 2*2*nx conditionals for each j iteration (or 2*2*ny * 2*2*nx conditional). Note, you need a temporary for each psi with the triplet indices are unique, ie., psi(1,i,j+1), psi(1,i,j-1), and psi(3,i,j+1).

Clarification for a loop to create interaction terms in Stata

I found the following question/answer that I think does what I would like to do: https://www.stata.com/statalist/archive/2009-09/msg00449.html
However, I am unclear what is going on in all of it, and would like to understand better. The code for the solution is as follows:
unab vars : var1-var30
local nvar : word count `vars'
forval i = 1/`nvar' {
forval j = 1/`=`i'-1' {
local x : word `i' of `vars'
local y : word `j' of `vars'
generate `x'X`y' = `x' * `y'
}
}
I do not understand what is going on in line 4 with the statement: ``=i'-1'.
The i refers to the number in the set {1,...,n}, but I do not understand what the equals or the -1 are doing. My assumption is that the -1 is somehow removing the own observation, but I am unclear. Any explanation would be appreciated.
Suppose you have local macro i that varies over a range and you want its value minus 1. You can always do this
local j = `i' - 1
and then refer to j. You can also do this on the fly:
`= `i' - 1'
Within
`= '
Stata will evaluate the expression, here
`i' - 1
and substitute the result of that expression in a command line.
You can do this with scalars too:
scalar foo = 42
and then refer to
`= foo'
However, watch out. Scalar names and variable names occupy the same namespace.
`= scalar(foo)'
disambiguates and arguably is good style in any case.

What is the advantage of linspace over the colon ":" operator?

Is there some advantage of writing
t = linspace(0,20,21)
over
t = 0:1:20
?
I understand the former produces a vector, as the first does.
Can anyone state me some situation where linspace is useful over t = 0:1:20?
It's not just the usability. Though the documentation says:
The linspace function generates linearly spaced vectors. It is
similar to the colon operator :, but gives direct control over the
number of points.
it is the same, the main difference and advantage of linspace is that it generates a vector of integers with the desired length (or default 100) and scales it afterwards to the desired range. The : colon creates the vector directly by increments.
Imagine you need to define bin edges for a histogram. And especially you need the certain bin edge 0.35 to be exactly on it's right place:
edges = [0.05:0.10:.55];
X = edges == 0.35
edges = 0.0500 0.1500 0.2500 0.3500 0.4500 0.5500
X = 0 0 0 0 0 0
does not define the right bin edge, but:
edges = linspace(0.05,0.55,6); %// 6 = (0.55-0.05)/0.1+1
X = edges == 0.35
edges = 0.0500 0.1500 0.2500 0.3500 0.4500 0.5500
X = 0 0 0 1 0 0
does.
Well, it's basically a floating point issue. Which can be avoided by linspace, as a single division of an integer is not that delicate, like the cumulative sum of floting point numbers. But as Mark Dickinson pointed out in the comments:
You shouldn't rely on any of the computed values being exactly what you expect. That is not what linspace is for. In my opinion it's a matter of how likely you will get floating point issues and how much you can reduce the probabilty for them or how small can you set the tolerances. Using linspace can reduce the probability of occurance of these issues, it's not a security.
That's the code of linspace:
n1 = n-1
c = (d2 - d1).*(n1-1) % opposite signs may cause overflow
if isinf(c)
y = d1 + (d2/n1).*(0:n1) - (d1/n1).*(0:n1)
else
y = d1 + (0:n1).*(d2 - d1)/n1
end
To sum up: linspace and colon are reliable at doing different tasks. linspace tries to ensure (as the name suggests) linear spacing, whereas colon tries to ensure symmetry
In your special case, as you create a vector of integers, there is no advantage of linspace (apart from usability), but when it comes to floating point delicate tasks, there may is.
The answer of Sam Roberts provides some additional information and clarifies further things, including some statements of MathWorks regarding the colon operator.
linspace and the colon operator do different things.
linspace creates a vector of integers of the specified length, and then scales it down to the specified interval with a division. In this way it ensures that the output vector is as linearly spaced as possible.
The colon operator adds increments to the starting point, and subtracts decrements from the end point to reach a middle point. In this way, it ensures that the output vector is as symmetric as possible.
The two methods thus have different aims, and will often give very slightly different answers, e.g.
>> a = 0:pi/1000:10*pi;
>> b = linspace(0,10*pi,10001);
>> all(a==b)
ans =
0
>> max(a-b)
ans =
3.5527e-15
In practice, however, the differences will often have little impact unless you are interested in tiny numerical details. I find linspace more convenient when the number of gaps is easy to express, whereas I find the colon operator more convenient when the increment is easy to express.
See this MathWorks technical note for more detail on the algorithm behind the colon operator. For more detail on linspace, you can just type edit linspace to see exactly what it does.
linspace is useful where you know the number of elements you want rather than the size of the "step" between them. So if I said make a vector with 360 elements between 0 and 2*pi as a contrived example it's either going to be
linspace(0, 2*pi, 360)
or if you just had the colon operator you would have to manually calculate the step size:
0:(2*pi - 0)/(360-1):2*pi
linspace is just more convenient
For a simple real world application, see this answer where linspace is helpful in creating a custom colour map

Are there any languages that have a do-until loop?

Is there any programming language that has a do-until loop?
Example:
do
{
<statements>
}
until (<condition>);
which is basically equivalent to:
do
{
<statements>
}
while (<negated condition>);
NOTE: I'm looking for post-test loops.
Ruby has until.
i=0
begin
puts i
i += 1
end until i==5
VBA!
Do-Until-Loop
Do-Loop-Until
Although I think quite a number of people here would doubt if it is a real language at all, but well, BASIC is how Microsoft started (quite weak argument for many, I know)...
It is possible in VB.Net
bExitFromLoop = False
Do
'Executes the following Statement
Loop Until bExitFromLoop
It is also possible in SDF-P on BS2000 (Fujitsu/Siemens Operating System)
/ DECLARE-VARIABLE A
/ DECLARE-VARIABLE SWITCH-1(TYPE=*BOOLEAN)
/ SET-VARIABLE A = 5
/ SET-VARIABLE SWITCH-1 = ON
/ REPEAT
/ A = A + 10
/ IF (A > 50)
/ SET-VARIABLE SWITCH-1 = OFF
/ END-IF
/ UNTIL (SWITCH-1 = OFF)
/ SHOW-VARIABLE A
A = 55
Is is also possible is C or C++ using a macro that define until
Example (definition):
#define until(cond) while(!(##cond))
Example (utilisation):
int i = 0;
do {
cout << i << "\n";
i++;
} until(i == 5);
In VB we can find something like:
Reponse = InputBox("Please Enter Pwd")
Do Until Reponse = "Bob-pwr148" ...
Eiffel offers you an until loop.
from
x := 1
until
x > 100
loop
...
end
There is also an "across" loop as well. Both are very powerful and expressive.
The design of this loop has more to offer. There are two more parts to its grammar that will help us resolve two important "correctness" problems.
Endless loop protection.
Iteration failure detection.
Endless Loop Protection
Let's modify our loop code a little by adding a loop variant.
from
x := 1
v := 1_000
until
x > 100
variant
v
loop
...
v := v - 1
end
The loop variant is (essentially) a count-down variable, but not just any old variable. By using the variant keyword, we are telling the compiler to pay attention to v. Specifically, the compiler is going to generate code that watchdogs the v variable for two conditions:
Does v decrease with each iteration of the loop (are we counting down). It does no good to try and use a count-down variable if it is (in fact) not counting down, right? If the loop variant is not counting down (decreasing by any amount), then we throw an exception.
Does v ever reach a condition of less than zero? If so, then we throw an exception.
Both of these work together through the compiler and variant variable to detect when and if our iterating loop fails to iterate or iterates too many times.
In the example above, our code is communicating to us a story that it expects to iterate zero to 1_000 times, but not more. If it is more, then we stop the loop, which leaves us to wonder: Do we really have cases were we iterate more than 1_000 times, or is there something wrong that our condition is failing to become True?
Loop Invariant
Now that we know what a loop variant is, we need to understand what a loop invariant is.
The invariant is a set of one or more Boolean conditions that must hold True after each iteration through the loop. Why do we want these?
Imagine you have 1_000_000 iterations and one of them fails. You don't have time to walk through each iteration, examining it to see it is okay or not. So, you create a set of one or more conditions that are tested upon completion of each iteration. If the one or all of the conditions fail, then you know precisely which iteration (and its deterministic state) is causing the problem!
The loop invariant might look something like:
from
x := 1
y := 0
v := 1_000
invariant
y = x - 1
until
x > 100
variant
v
loop
...
x := x + 1
y := y + 1
v := v - 1
end
In the example above, y is trailing x by 1. We expect that after each iteration, y will always be x - 1. So, we create a loop invariant using the invariant keyword that states our Boolean assertion. If y fails to be x - 1, the loop will immediately throw an exception and let us know precisely which iteration has failed to keep the assertion True.
CONCLUSION
Our loop is now quite tight and secure—well guarded against failure (bugs, errors).

Resources