Stata: assign numbers in a loop - loops

I have a problem creating a loop in Stata.
I have a dataset in Stata where I classified my observations into 6 categories via variable k10. So k10 takes on values 1,2,3,4,5,6.
Now I want to assign each observation one value according to its class:
value 15 for k10=1
value 10 for k10=2
value 8 for k10=3
value 5 for k10=4
value 4 for k10=5
value 2 for k10=6
It is easy if I create a new variable w10 and do it like the following:
gen w10 =.
replace w10 = 15 if k10==1
replace w10 = 10 if k10==2
replace w10 = 8 if k10==3
replace w10 = 5 if k10==4
replace w10 = 4 if k10==5
replace w10 = 2 if k10==6
Now I tried to simplify the code by using a loop, but unfortunately it does not do what I want to achieve.
My loop:
gen w10=.
local A "1 2 3 4 5 6"
local B "15 10 8 5 4 2"
foreach y of local A {
foreach x of local B {
replace w10 = `x' if k10= `y'
}
}
The loop assigns value 2 to each observation though. The reason is that the if-condition k10=`y' is always true and overwrites the replaced w10s each time until the end, right?
So how can I write the loop correctly?

It's really just one loop, not two nested loops. That's your main error, which is general programming logic. Only the last time you go through the inner loop has an effect that lasts. Try tracing the loops by hand to see this.
Specifically in Stata, looping over the integers 1/6 is much better done with forval; there is no need at all for the indirection of defining a local macro and then obliging foreach to look inside that macro. That can be coupled with assigning the other values to local macros with names 1 ... 6. Here tokenize is the dedicated command to use.
Try this:
gen w10 = .
tokenize "15 10 8 5 4 2"
quietly forval i = 1/6 {
replace w10 = ``i'' if k10 == `i'
}
Note incidentally that you need == not = when testing for equality.
See (e.g.) this discussion.
Many users of Stata would want to do it in one line with recode. Here I concentrate on the loop technique, which is perhaps of wider interest.
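To trace the two loop shapes by hand, a small sketch in Python (used here only as a neutral language, not as Stata code) may help. The toy column k10 and the helper names nested and paired are made up for illustration; the comparison is modelled as an equality test.

```python
# Hypothetical toy data: class codes for eight observations.
k10 = [1, 2, 3, 4, 5, 6, 3, 1]
classes = [1, 2, 3, 4, 5, 6]
values = [15, 10, 8, 5, 4, 2]

def nested(k10):
    """Two nested loops: every value is tried for every class."""
    w10 = [None] * len(k10)
    for y in classes:
        for x in values:
            # Mimics: replace w10 = x if k10 == y
            w10 = [x if k == y else w for k, w in zip(k10, w10)]
    return w10

def paired(k10):
    """One loop over (class, value) pairs, as in the forval solution."""
    w10 = [None] * len(k10)
    for y, x in zip(classes, values):
        w10 = [x if k == y else w for k, w in zip(k10, w10)]
    return w10

print(nested(k10))  # only the last inner value, 2, survives everywhere
print(paired(k10))  # each class gets its own value
```

In the nested version the inner loop overwrites each class with 15, 10, 8, 5, 4 and finally 2, so only the last pass has a lasting effect. Pairing the two lists is what the single loop with tokenize does.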

Related

SPSS recoding variables data from multiple variables into boolean variables

I have 26 variables and each of them contain numbers ranging from 1 to 61. I want for each case of 1, each case of 2 etc. the number 1 in a new variable. If there is no 1, the variable should contain 2.
So 26 variables with data like:
1 15 28 39 46 1 12 etc.
And I want 61 variables with:
1 2 1 2 2 1 etc.
I have been reading about creating vectors, loops, do if's etc but I can't find the right way to code it. What I have done is just creating 61 variables and writing
do if V1=1 or V2=1 or (etc until V26).
recode newV1=1.
end if.
exe.
**repeat this for all 61 variables.
recode newV1 to newV61(missing=2).
So this is a lot of code and quite a detour from what I imagine it could be.
Anyone who can help me out with this one? Your help is much appreciated!
noumenal is correct: you could do it with two loops. Another way, though, is to use the original value as an index into a VECTOR, set that element to 1, and then recode everything left missing to 2.
To illustrate, first I make some fake data (with 4 original variables instead of 26) named X1 to X4.
*Fake Data.
SET SEED 10.
INPUT PROGRAM.
LOOP Id = 1 TO 20.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
VECTOR X(4,F2.0).
LOOP #i = 1 TO 4.
COMPUTE X(#i) = TRUNC(RV.UNIFORM(1,62)).
END LOOP.
EXECUTE.
Now what this code does is create four vector sets to go along with each variable, then uses DO REPEAT to actually refer to the VECTOR stub. Then finishes up with RECODE - if it is missing it should be coded a 2.
VECTOR V1_ V2_ V3_ V4_ (61,F1.0).
DO REPEAT orig = X1 TO X4 /V = V1_ V2_ V3_ V4_.
COMPUTE V(orig) = 1.
END REPEAT.
RECODE V1_1 TO V4_61 (SYSMIS = 2).
It is a little painful, as for the original VECTOR command you need to write out all of the stubs, but then you can copy-paste that into the DO REPEAT subcommand (or make a macro to do it for you).
For a more simple illustration, if we have our original variable, say A, that can take on integer values from 1 to 61, and we want to expand to our 61 dummy variables, we would then make a vector and then access the location in that vector.
VECTOR DummyVec(61,F1.0).
COMPUTE DummyVec(A) = 1.
For a record with A = 10, DummyVec10 will equal 1, and all the other DummyVec variables will still be system missing by default. No need to use DO IF for 61 values.
The rest of the code is just extra to do it in one swoop for multiple original variables.
This should do it:
do repeat NewV=NewV1 to NewV61/vl=1 to 61.
compute NewV=any(vl,v1 to v26).
end repeat.
EXPLANATION:
This syntax will go through values 1 to 61, for each one checking whether any of the variables v1 to v26 has that value. If any of them do, the right NewV will receive the value of 1. If none of them do, the right NewV will receive the value of 0.
Just make sure v1 to v26 are consecutively ordered in the file. If not, then change to:
compute NewV=any(vl,v1, v2, v3, v4 ..... v26).
You need a nested loop: two loops - one outer and one inner.
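The nested-loop logic for a single case can be sketched as follows. This is Python rather than SPSS syntax, and the function name indicators, its parameters, and the sample row are assumptions made for illustration only.

```python
def indicators(row, n_values=61, absent=2):
    """Return one flag per value 1..n_values: 1 if present in row, else `absent`."""
    flags = []
    for value in range(1, n_values + 1):  # outer loop: the 61 possible values
        flag = absent
        for x in row:                     # inner loop: the original variables
            if x == value:
                flag = 1
                break
        flags.append(flag)
    return flags

# Hypothetical case with seven of the original variables filled in.
row = [1, 15, 28, 39, 46, 1, 12]
flags = indicators(row)
print(flags[:3])
```

The any() approach above collapses the inner loop into one function call per value, and the VECTOR approach removes the outer loop by indexing directly with the original value.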

foreach using numlist of numbers with leading 0s

In Stata, I am trying to use a foreach loop where I am looping over numbers from, say, 05-11. The problem is that I wish to keep the 0 as part of the value. I need to do this because the 0 appears in variable names. For example, I may have variables named Y2005, Y2006, Var05, Var06, etc. Here is an example of the code that I tried:
foreach year of numlist 05/09 {
...do stuff with Y20`year' or with Var`year'
}
This gives me an error that e.g. Y205 is not found. (I think that what is happening is that it is treating 05 as 5.)
Also note that I can't add a 0 in at the end of e.g. Y20 to get Y200 because of the 10 and 11 values.
Is there a work-around or an obvious thing I am not doing?
Another work-around is
forval y = 5/11 {
local Y : di %02.0f `y'
<code using local Y, which must be treated as a string>
}
The middle line could be based on
`: di %02.0f `y''
so that using another macro can be avoided, but at the cost of making the code more cryptic.
Here I've exploited the extra fact that foreach over such a simple numlist is replaceable with forvalues.
The main trick here is documented here. This trick avoids the very slight awkwardness of treating 5/9 differently from 10/11.
Note. To understand what is going on, it often helps to use display interactively on very simple examples. The detail here is that Stata is happily indifferent to leading zeros when presented with numbers. Usually this is immaterial to you, or indeed a feature as when you appreciate that Stata does not insist on a leading zero for numbers less than 1.
. di 05
5
. di 0.3
.3
. di .3
.3
Here we really need the leading zero, and the art is to see that the problem is one of string manipulation, the strings such as "08" just happening to contain numeric characters. Agreed that this is obvious only when understood.
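The same point, that the padded value is a string and not a number, can be shown with a short Python sketch (Python is used only for illustration; the variable stub Var is taken from the question).

```python
# A number carries no leading zero: parsing "05" gives 5.
print(int("05"))

# Formatting with a zero-padded width of 2 builds the strings needed for names.
names = [f"Var{y:02d}" for y in range(5, 12)]
print(names)
```

The format specification 02d plays the same role here as %02.0f does in the Stata display format above.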
There's probably a better solution but here's how this one goes:
clear
set more off
*----- example data -----
input ///
var2008 var2009 var2010 var2011 var2012
0 1 2 3 4
end
*----- what you want -----
numlist "10(1)12"
local nums 08 09 `r(numlist)'
foreach x of local nums {
display var20`x'
}
The 01...09 you can insert manually. The rest you build with numlist. Put all that in a local, and finally use it in the loop.
As you say, the problem with your code is that Stata will read 5 when given 05 if you've told it that the value is a number (which you do by using numlist in the loop).
Another solution would be to use an if command to count the number of characters in the looping value, and then if needed you can add a leading zero by reassigning the local.
clear
input var2008 var2009 var2010 var2011 var2012
0 1 2 3 4
end
foreach year of numlist 08/12 {
if length("`year'") == 1 local year 0`year'
di var20`year'
}

Stata Nested foreach loop substring comparison

I have just started learning Stata and I'm having a hard time.
My problem is this: I have two different variables, ATC and A, where A is potentially a substring of ATC.
Now I want to mark all the observations in which A is a substring of ATC with OK = 1.
I tried this using a simple nested loop:
foreach x in ATC {
foreach j in A {
replace OK = 1 if strpos(`x',`j')!=0
}
}
However, whenever I run this loop no changes are being made even though there should be plenty.
I feel like I should probably give an index specifying which OK is being changed (the one belonging to the ATC/x), but I have no idea how to do this. This is probably really simple but I've been struggling with it for some time.
I should have clarified: my A list is separate from the main list (simply appended to it) and only contains unique keys which I use to identify the ATCs which I want. So I have ~120 A-keys and a couple million ATC keys. What I wanted to do was iterate over every ATC key for every single A-key and mark those ATC-keys with A that qualify.
That means I don't have complete tuples of (ATC,A,OK) but instead separate lists of different sizes.
For example: I have
ATC   OK   A
ABCD  0    .
EFGH  0    .
...   ...  ...
.     .    AB
.     .    ET
and want "ABCD" to end up with OK marked as 1 while "EFGH" remains at 0.
We can separate your question into two parts. Your title implies a problem with loops, but your loops are just equivalent to
replace OK = 1 if strpos(ATC, A)!=0
so the use of looping appears irrelevant. That leaves the substring comparison.
Let's supply an example:
. set obs 3
obs was 0, now 3
. gen OK = 0
. gen A = cond(_n == 1, "42", "something else")
. gen ATC = "answer is 42"
. replace OK = 1 if strpos(ATC, A) != 0
(1 real change made)
. list
     +------------------------------------+
     | OK                A            ATC |
     |------------------------------------|
  1. |  1               42   answer is 42 |
  2. |  0   something else   answer is 42 |
  3. |  0   something else   answer is 42 |
     +------------------------------------+
So it works fine; and you really need to give a reproducible example if you think you have something different.
As for specifying where the variable should be changed: your code does precisely that, as again the example above shows.
The update makes the problem clear. Stata will only look in the same observation for a matching substring when you specify the syntax you gave. A variable in Stata is a field in a dataset. To cycle over a set of values, something like this should suffice
gen byte OK = 0
levelsof A, local(Avals)
quietly foreach A of local Avals {
replace OK = 1 if strpos(ATC, `"`A'"') > 0
}
Notes:
Specifying byte cuts down storage.
You may need an if or in restriction on levelsof.
quietly cuts out messages about changed values. When debugging, it is often better left out.
> 0 could be omitted as a positive result from strpos() is automatically treated as true in logical comparisons. See this FAQ.
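As a language-neutral check of the logic, here is a minimal Python sketch of the same idea: loop over the distinct keys and flag every code that contains at least one of them. The lists atc and keys reuse the toy values from the question; they are not real data.

```python
atc = ["ABCD", "EFGH"]   # the long list of codes
keys = ["AB", "ET"]      # the short list of distinct keys

ok = [0] * len(atc)
for key in keys:                  # plays the role of the levelsof loop
    for i, code in enumerate(atc):
        if key in code:           # substring test, like strpos() > 0
            ok[i] = 1

print(ok)
```

The outer loop runs once per key (about 120 in the question), while the inner pass over all codes is what Stata's replace does in a single vectorized step.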

Stata: Count a variable by another one?

My little Stata Problem:
I have a table like this:
I want to create a variable that counts the number of different cat values for each citing. For example, for citing A there are 2 distinct cat values (3 and 6), so I want another variable (dif_cat) that holds 2 in both of A's rows.
For this sample it would look something like this:
I have tried different methods I always feel like I am getting close but then I can't do it.
I tried bysort with preserve and restore but I don't seem to get there.
One attempt was:
egen tag = tag(cat citing)
egen distinct = total(tag), by(citing)
Can you help me?
PS: I know this has nothing to do with Stata (but it may inspire someone), but with an actual programming language I would try something such as:
Having a loop go down the citing column and check whether each value equals the one before
Having an auxiliary empty vector
Having a second loop within the first that would see whether the current cat was in the vector and, if not, put it there.
When the citing changed, I would count the length of the auxiliary vector, reset it, and do it again. The problem is that I need this in Stata code :S
One way (from Stata FAQ) is:
clear all
set more off
input ///
str1 citing cat
A 3
A 6
B 5
B 2
B 5
B 2
C 2
C 4
C 3
D 5
E 1
E 1
end
list, sepby(citing)
bysort citing cat: gen numvals = (_n == 1)
by citing: replace numvals = sum(numvals)
by citing: replace numvals = numvals[_N]
list, sepby(citing)
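To check the expected result independently of Stata, the count of distinct cat values per citing can be sketched in Python. The rows below copy the example data from the input block; the names rows, distinct and dif_cat are chosen for illustration.

```python
from collections import defaultdict

rows = [("A", 3), ("A", 6), ("B", 5), ("B", 2), ("B", 5), ("B", 2),
        ("C", 2), ("C", 4), ("C", 3), ("D", 5), ("E", 1), ("E", 1)]

# Collect the set of distinct cat values seen for each citing.
distinct = defaultdict(set)
for citing, cat in rows:
    distinct[citing].add(cat)

# Spread the group-level count back to every row, like numvals[_N] above.
dif_cat = [len(distinct[citing]) for citing, _ in rows]
print(dif_cat)
```

The set per group corresponds to tagging the first occurrence of each (citing, cat) pair, and the final list comprehension corresponds to copying the running total from the last observation to the whole group.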

Why does Lua have no "continue" statement?

I have been dealing a lot with Lua in the past few months, and I really like most of the features but I'm still missing something among those:
Why is there no continue?
What workarounds are there for it?
In Lua 5.2 the best workaround is to use goto:
-- prints odd numbers in [|1,10|]
for i=1,10 do
if i % 2 == 0 then goto continue end
print(i)
::continue::
end
This is supported in LuaJIT since version 2.0.1.
The way that the language manages lexical scope creates issues with including both goto and continue. For example,
local a=0
repeat
if f() then
a=1 --change outer a
end
local a=f() -- inner a
until a==0 -- test inner a
The declaration of local a inside the loop body masks the outer variable named a, and the scope of that local extends across the condition of the until statement so the condition is testing the innermost a.
If continue existed, it would have to be restricted semantically to be only valid after all of the variables used in the condition have come into scope. This is a difficult condition to document to the user and enforce in the compiler. Various proposals around this issue have been discussed, including the simple answer of disallowing continue with the repeat ... until style of loop. So far, none have had a sufficiently compelling use case to get them included in the language.
The work around is generally to invert the condition that would cause a continue to be executed, and collect the rest of the loop body under that condition. So, the following loop
-- not valid Lua 5.1 (or 5.2)
for k,v in pairs(t) do
if isstring(k) then continue end
-- do something to t[k] when k is not a string
end
could be written
-- valid Lua 5.1 (or 5.2)
for k,v in pairs(t) do
if not isstring(k) then
-- do something to t[k] when k is not a string
end
end
It is clear enough, and usually not a burden unless you have a series of elaborate culls that control the loop operation.
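That the inverted condition really is equivalent can be checked in a language that does have continue. The sketch below uses Python for that reason only; the function names and the sample table are hypothetical.

```python
def with_continue(t):
    """Skip string keys using continue."""
    out = []
    for k, v in t.items():
        if isinstance(k, str):
            continue
        out.append((k, v))
    return out

def with_inverted(t):
    """Same loop with the test inverted and the body nested under it."""
    out = []
    for k, v in t.items():
        if not isinstance(k, str):
            out.append((k, v))
    return out

t = {1: "a", "x": "b", 2: "c"}
print(with_continue(t) == with_inverted(t))
```

The cost of the inverted form is one extra level of nesting per skipped condition, which is why it becomes awkward once several culls are chained.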
You can wrap the loop body in an additional repeat ... until true and then use do break end inside it to get the effect of continue. Naturally, you'll need to set up additional flags if you also intend to really break out of the loop.
This will loop 5 times, printing 1, 2, and 3 each time.
for idx = 1, 5 do
repeat
print(1)
print(2)
print(3)
do break end -- goes to next iteration of for
print(4)
print(5)
until true
end
This construction even translates to a single literal JMP opcode in Lua bytecode!
$ luac -l continue.lua
main <continue.lua:0,0> (22 instructions, 88 bytes at 0x23c9530)
0+ params, 6 slots, 0 upvalues, 4 locals, 6 constants, 0 functions
1 [1] LOADK 0 -1 ; 1
2 [1] LOADK 1 -2 ; 3
3 [1] LOADK 2 -1 ; 1
4 [1] FORPREP 0 16 ; to 21
5 [3] GETGLOBAL 4 -3 ; print
6 [3] LOADK 5 -1 ; 1
7 [3] CALL 4 2 1
8 [4] GETGLOBAL 4 -3 ; print
9 [4] LOADK 5 -4 ; 2
10 [4] CALL 4 2 1
11 [5] GETGLOBAL 4 -3 ; print
12 [5] LOADK 5 -2 ; 3
13 [5] CALL 4 2 1
14 [6] JMP 6 ; to 21 -- Here it is! If you remove do break end from code, result will only differ by this single line.
15 [7] GETGLOBAL 4 -3 ; print
16 [7] LOADK 5 -5 ; 4
17 [7] CALL 4 2 1
18 [8] GETGLOBAL 4 -3 ; print
19 [8] LOADK 5 -6 ; 5
20 [8] CALL 4 2 1
21 [1] FORLOOP 0 -17 ; to 5
22 [10] RETURN 0 1
Straight from the designer of Lua himself:
Our main concern with "continue" is that there are several other control structures that (in our view) are more or less as important as "continue" and may even replace it. (E.g., break with labels [as in Java] or even a more generic goto.) "continue" does not seem more special than other control-structure mechanisms, except that it is present in more languages. (Perl actually has two "continue" statements, "next" and "redo". Both are useful.)
The first part is answered in the FAQ as slain pointed out.
As for a workaround, you can wrap the body of the loop in a function and return early from that, e.g.
-- Print the odd numbers from 1 to 99
for a = 1, 99 do
(function()
if a % 2 == 0 then
return
end
print(a)
end)()
end
Or if you want both break and continue functionality, have the local function perform the test, e.g.
local a = 1
while (function()
if a > 99 then
return false; -- break
end
if a % 2 == 0 then
return true; -- continue
end
print(a)
return true; -- continue
end)() do
a = a + 1
end
I've never used Lua before, but I Googled it and came up with this:
http://www.luafaq.org/
Check question 1.26.
This is a common complaint. The Lua authors felt that continue was only one of a number of possible new control flow mechanisms (the fact that it cannot work with the scope rules of repeat/until was a secondary factor.)
In Lua 5.2, there is a goto statement which can be easily used to do the same job.
Lua is a lightweight scripting language that aims to be as small as possible. For example, many unary operations such as pre/post increment are not available.
Instead of continue, you can use goto like
arr = {1,2,3,45,6,7,8}
for key,val in ipairs(arr) do
if val > 6 then
goto skip_to_next
end
-- perform some calculation
::skip_to_next::
end
We can achieve it as below, it will skip even numbers
local len = 5
for i = 1, len do
repeat
if i%2 == 0 then break end
print(" i = "..i)
break
until true
end
O/P:
i = 1
i = 3
i = 5
We encountered this scenario many times and we simply use a flag to simulate continue. We try to avoid the use of goto statements as well.
Example: The code is meant to print the values from i=1 to i=10 except i=3. It also prints "loop start", "loop end", "if start", and "if end" to simulate other nested statements that exist in your code.
size = 10
for i=1, size do
print("loop start")
if whatever then
print("if start")
if (i == 3) then
print("i is 3")
--continue
end
print(i)
print("if end")
end
print("loop end")
end
This is achieved by enclosing all remaining statements, up to the end of the loop's scope, in a test of a flag.
size = 10
for i=1, size do
print("loop start")
local continue = false; -- initialize flag at the start of the loop
if whatever then
print("if start")
if (i == 3) then
print("i is 3")
continue = true
end
if continue==false then -- test flag
print(i)
print("if end")
end
end
if (continue==false) then -- test flag
print("loop end")
end
end
I'm not saying that this is the best approach, but it works perfectly for us.
Again with the inverting, you could simply use the following code:
for k,v in pairs(t) do
if not isstring(k) then
-- do something to t[k] when k is not a string
end
end
Why is there no continue?
Because it's unnecessary¹. There are very few situations where a dev would need it.
A) When you have a very simple loop, say a 1- or 2-liner, then you can just turn the loop condition around and it's still plenty readable.
B) When you're writing simple procedural code (aka. how we wrote code in the last century), you should also be applying structured programming (aka. how we wrote better code in the last century)
C) If you're writing object-oriented code, your loop body should consist of no more than one or two method calls unless it can be expressed in a one- or two-liner (in which case, see A)
D) If you're writing functional code, just return a plain tail-call for the next iteration.
The only case when you'd want to use a continue keyword is if you want to code Lua like it's python, which it just isn't.²
What workarounds are there for it?
Unless A) applies, in which case there's no need for any workarounds, you should be doing Structured, Object-Oriented or Functional programming. Those are the paradigms that Lua was built for, so you'd be fighting against the language if you go out of your way to avoid their patterns.³
Some clarification:
¹ Lua is a very minimalistic language. It tries to have as few features as it can get away with, and a continue statement isn't an essential feature in that sense.
I think this philosophy of minimalism is captured well by Roberto Ierusalimschy in this 2019 interview:
add that and that and that, put that out, and in the end we understand the final conclusion will not satisfy most people and we will not put all the options everybody wants, so we don’t put anything. In the end, strict mode is a reasonable compromise.
² There seems to be a large number of programmers coming to Lua from other languages because whatever program they're trying to script for happens to use it, and many of them don't seem to want to write anything other than their language of choice, which leads to many questions like "Why doesn't Lua have X feature?"
Matz described a similar situation with Ruby in a recent interview:
The most popular question is: "I’m from the language X community; can’t you introduce a feature from the language X to Ruby?", or something like that. And my usual answer to these requests is… "no, I wouldn’t do that", because we have different language design and different language development policies.
³ There are a few ways to hack your way around this; some users have suggested using goto, which is a good enough approximation in most cases, but gets very ugly very quickly and breaks completely with nested loops. Using gotos also puts you in danger of having a copy of SICP thrown at you whenever you show your code to anybody else.
