Stata: Count a variable by another one? - loops

My little Stata Problem:
I have a table like this:
I want to create a variable that counts the number of distinct cat values for each citing. For example, citing A has 2 distinct cat values (3 and 6), so the new variable (dif_cat) should equal 2 in both of A's rows.
For this sample it would look something like this:
I have tried different methods. I always feel like I am getting close, but then I can't finish it.
I tried bysort with preserve and restore but I don't seem to get there.
One attempt was:
egen tag = tag(cat citing)
egen distinct = total(tag), by(citing)
Can you help me?
PS: I know this has nothing to do with Stata (but it may inspire someone), but with an actual programming language I would try something like this:
Have a loop go down the citing column, checking whether each value equals the one before.
Keep an auxiliary empty vector.
Have a second loop within the first that checks whether the current cat is in the vector and, if not, puts it there.
When the citing changes, count the length of the auxiliary vector, reset it, and start again. The problem is that I need this in Stata code :S
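The loop described in the PS could be sketched in Python roughly as follows. This is only an illustration of that idea, not Stata code: the function name `count_distinct_by_group` is hypothetical, the sample rows are the ones used in the answer's `input` block, and a `set` stands in for the auxiliary vector so that the inner membership loop is not needed.

```python
# Sample (citing, cat) rows, assumed to be already sorted by citing.
rows = [("A", 3), ("A", 6), ("B", 5), ("B", 2), ("B", 5), ("B", 2),
        ("C", 2), ("C", 4), ("C", 3), ("D", 5), ("E", 1), ("E", 1)]


def count_distinct_by_group(rows):
    """Return {citing: number of distinct cat values} for rows sorted by citing."""
    counts = {}
    seen = set()        # auxiliary container for the current citing
    previous = None
    for citing, cat in rows:
        if citing != previous:
            if previous is not None:
                counts[previous] = len(seen)  # citing changed: store the count
            seen = set()                      # reset for the new citing
            previous = citing
        seen.add(cat)   # a set silently ignores values it already holds
    if previous is not None:
        counts[previous] = len(seen)          # store the count of the last group
    return counts


dif_cat = count_distinct_by_group(rows)
print(dif_cat)
```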

One way (from Stata FAQ) is:
clear all
set more off
input ///
str1 citing cat
A 3
A 6
B 5
B 2
B 5
B 2
C 2
C 4
C 3
D 5
E 1
E 1
end
list, sepby(citing)
bysort citing cat: gen numvals = (_n == 1)
by citing: replace numvals = sum(numvals)
by citing: replace numvals = numvals[_N]
list, sepby(citing)
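For readers who do not use Stata, the three `by` steps above can be traced in Python. This is a rough sketch under the assumption that each step behaves as commented; `distinct_per_group` is a hypothetical name, and unlike `bysort` the sketch leaves the rows in their original order.

```python
rows = [("A", 3), ("A", 6), ("B", 5), ("B", 2), ("B", 5), ("B", 2),
        ("C", 2), ("C", 4), ("C", 3), ("D", 5), ("E", 1), ("E", 1)]


def distinct_per_group(rows):
    # bysort citing cat: visit the rows in (citing, cat) order
    order = sorted(range(len(rows)), key=lambda i: rows[i])
    numvals = [0] * len(rows)

    # gen numvals = (_n == 1): flag the first row of each (citing, cat) pair
    previous_pair = None
    for i in order:
        numvals[i] = int(rows[i] != previous_pair)
        previous_pair = rows[i]

    # replace numvals = sum(numvals): running total within each citing
    running, previous_citing = 0, None
    for i in order:
        if rows[i][0] != previous_citing:
            running, previous_citing = 0, rows[i][0]
        running += numvals[i]
        numvals[i] = running

    # replace numvals = numvals[_N]: copy each group's final total to all its rows
    group_total = {}
    for i in order:
        group_total[rows[i][0]] = numvals[i]
    return [group_total[citing] for citing, _ in rows]


numvals = distinct_per_group(rows)
print(numvals)
```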

Related

Stata: assign numbers in a loop

I have a problem creating a loop in Stata.
I have a dataset in Stata where I classified my observations into 6 categories via variable k10. So k10 takes on values 1,2,3,4,5,6.
Now I want to assign each observation one value according to its class:
value 15 for k10=1
value 10 for k10=2
value 8 for k10=3
value 5 for k10=4
value 4 for k10=5
value 2 for k10=6
It is easy if I create a new variable w10 and do it like the following:
gen w10 =.
replace w10 = 15 if k10==1
replace w10 = 10 if k10==2
replace w10 = 8 if k10==3
replace w10 = 5 if k10==4
replace w10 = 4 if k10==5
replace w10 = 2 if k10==6
Now I tried to simplify the code by using a loop, unfortunately it does not do what I want to achieve.
My loop:
gen w10=.
local A "1 2 3 4 5 6"
local B "15 10 8 5 4 2"
foreach y of local A {
foreach x of local B {
replace w10 = `x' if k10= `y'
}
}
The loop assigns value 2 to each observation though. The reason is that the if-condition k10=`y' is always true and overwrites the replaced w10s each time until the end, right?
So how can I write the loop correctly?
It's really just one loop, not two nested loops. That's your main error, and it is a matter of general programming logic rather than anything specific to Stata. For each value of k10, only the last pass through the inner loop has an effect that lasts. Try tracing the loops by hand to see this.
Specifically in Stata, looping over the integers 1/6 is much better done with forval; there is no need at all for the indirection of defining a local macro and then obliging foreach to look inside that macro. That can be coupled with assigning the other values to local macros with names 1 ... 6. Here tokenize is the dedicated command to use.
Try this:
gen w10 = .
tokenize "15 10 8 5 4 2"
quietly forval i = 1/6 {
    replace w10 = ``i'' if k10 == `i'
}
Note incidentally that you need == not = when testing for equality.
See (e.g.) this discussion.
Many users of Stata would want to do it in one line with recode. Here I concentrate on the loop technique, which is perhaps of wider interest.
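The difference between the nested loops and the single loop can be traced in Python. This is a sketch with made-up k10 values, not a translation of the Stata internals; `assign_nested` and `assign_paired` are hypothetical names.

```python
k10 = [1, 2, 3, 4, 5, 6, 3, 1]      # hypothetical class codes
classes = [1, 2, 3, 4, 5, 6]
values = [15, 10, 8, 5, 4, 2]


def assign_nested(k10):
    """Two nested loops: every class is overwritten by each value in turn."""
    w10 = [None] * len(k10)
    for y in classes:
        for x in values:            # the last x (2) always wins
            for obs, k in enumerate(k10):
                if k == y:
                    w10[obs] = x
    return w10


def assign_paired(k10):
    """One loop over (class, value) pairs, which is what was intended."""
    w10 = [None] * len(k10)
    for y, x in zip(classes, values):
        for obs, k in enumerate(k10):
            if k == y:
                w10[obs] = x
    return w10


print(assign_nested(k10))
print(assign_paired(k10))
```

The nested version ends with 2 everywhere, matching the behaviour reported in the question; the paired version gives each class its own value.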

Stata Nested foreach loop substring comparison

I have just started learning Stata and I'm having a hard time.
My problem is this: I have two different variables, ATC and A, where A is potentially a substring of ATC.
Now I want to mark all the observations in which A is a substring of ATC with OK = 1.
I tried this using a simple nested loop:
foreach x in ATC {
foreach j in A {
replace OK = 1 if strpos(`x',`j')!=0
}
}
However, whenever I run this loop no changes are being made even though there should be plenty.
I feel like I should probably give an index specifying which OK is being changed (the one belonging to the ATC/x), but I have no idea how to do this. This is probably really simple but I've been struggling with it for some time.
I should have clarified: my A list is separate from the main list (simply appended to it) and only contains unique keys which I use to identify the ATCs which I want. So I have ~120 A-keys and a couple million ATC keys. What I wanted to do was iterate over every ATC key for every single A-key and mark those ATC-keys with A that qualify.
That means I don't have complete tuples of (ATC,A,OK) but instead separate lists of different sizes.
For example: I have
ATC    OK    A
ABCD   0     .
EFGH   0     .
...    ...   ...
.      .     AB
.      .     ET
and I want "ABCD" to end up with OK marked as 1, while "EFGH" remains at 0.
We can separate your question into two parts. Your title implies a problem with loops, but your loops are just equivalent to
replace OK = 1 if strpos(ATC, A)!=0
so the use of looping appears irrelevant. That leaves the substring comparison.
Let's supply an example:
. set obs 3
obs was 0, now 3
. gen OK = 0
. gen A = cond(_n == 1, "42", "something else")
. gen ATC = "answer is 42"
. replace OK = 1 if strpos(ATC, A) != 0
(1 real change made)
. list
+------------------------------------+
| OK A ATC |
|------------------------------------|
1. | 1 42 answer is 42 |
2. | 0 something else answer is 42 |
3. | 0 something else answer is 42 |
+------------------------------------+
So it works fine; and you really need to give a reproducible example if you think you have something different.
As for specifying where the variable should be changed: your code does precisely that, as again the example above shows.
The update makes the problem clear. Stata will only look in the same observation for a matching substring when you specify the syntax you gave. A variable in Stata is a field in a dataset. To cycle over a set of values, something like this should suffice
gen byte OK = 0
levelsof A, local(Avals)
quietly foreach A of local Avals {
    replace OK = 1 if strpos(ATC, `"`A'"') > 0
}
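The same logic, looping over the distinct keys and testing every ATC value against each one, might look like this in Python. The data are hypothetical, and `in` plays the role of `strpos() > 0`; it is a sketch of the idea rather than of how Stata evaluates the command.

```python
atc = ["ABCD", "EFGH", "ABXY", "QETZ"]   # hypothetical ATC values
a_keys = ["AB", "ET", "AB"]              # hypothetical keys, possibly repeated

ok = [0] * len(atc)
for key in sorted(set(a_keys)):          # distinct keys, like levelsof
    for i, code in enumerate(atc):
        if key in code:                  # substring test on every ATC value
            ok[i] = 1

print(ok)
```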
Notes:
Specifying byte cuts down storage.
You may need an if or in restriction on levelsof.
quietly cuts out messages about changed values. When debugging, it is often better left out.
> 0 could be omitted as a positive result from strpos() is automatically treated as true in logical comparisons. See this FAQ.

Basic Python loop

How does Python know what "i" is when it is not defined, shouldn't there be an error? Probably a simple explanation, but I am new to learning Python.
def doubles (sum):
    return sum * 2
myNum = 2
for i in range (0,3):
    myNum = doubles(myNum)
print (myNum)
Haha :-) People are marking down your question, but I know this is a question that must have come to every person's mind, especially those who learned Python through online courses and not from a teacher in person.
Well, let me explain it in layman's terms.
The for loop you used works on any sequence; it is most often used with 1) lists and 2) lists within lists.
For example,
example1 = ['a','b','c'] # This is a simple list
example2 = [['a','b','c'],['a','b','c'],['a','b','c']] # This is a list within lists.
Now, 'a', 'b' and 'c' are items in the list.
So by saying,
for i in example1:
    print(i)
we are actually saying,
for item in example1:
    print(item)
-------------------------
People use 'i', probably as an abbreviation of item, or of something else.
I don't know the history.
But the fact is that we can use any name instead of 'i', and Python will still treat it as an item in the list.
Let me give you examples again.
example1 = ['a','b','c'] # This is a simple list
example2 = [['a','b','c'],['a','b','c'],['a','b','c']] # This is a list within lists.
for i in example1:
    print(i)
[out]: a
       b
       c
Now in example2, the items are themselves lists. Also, I will now use the word 'item' instead of 'i'; the results would be the same either way.
for item in example2:
    print(item)
[out]: ['a', 'b', 'c']
       ['a', 'b', 'c']
       ['a', 'b', 'c']
People also use singulars and plurals to remember things,
so let's say we have a list of letters.
letters = ['a','b','c','d']
for letter in letters:
    print(letter)
[out]: a
       b
       c
       d
Hope that helps. There is much more to explain.
Keep researching and keep learning.
Regards,
Md. Mohsin
Using a variable as a loop control variable does assign to it each time through the loop.
As to "what it is"... Python is dynamically typed. The only thing it "is" is a name, just like any other variable.
i is assigned its value by the loop itself; it has no value (it is not defined) before the Python interpreter reaches the for line.
It's similar to how for loops in other languages define variables. In C++, for example:
for(int i=0; i<5; i++){
    cout << i << endl;
}
Here the variable i only exists once the for loop starts.
i is assigned a value when the for loop runs, so the Python interpreter will not raise an error when the loop is run
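A small experiment makes this visible. The function name `trace_loop_variable` is hypothetical; the sketch only checks whether `i` exists before the loop and what it holds afterwards.

```python
def trace_loop_variable():
    """Show that i is created by the for statement itself."""
    try:
        i                       # i has not been assigned yet
        defined_before = True
    except NameError:           # inside a function this is UnboundLocalError,
        defined_before = False  # which is a subclass of NameError

    seen = []
    for i in range(0, 3):       # each pass assigns the next value to i
        seen.append(i)

    return defined_before, seen, i   # i keeps its last value after the loop


print(trace_loop_variable())
```

Unlike the C++ example, where `i` is scoped to the loop, the Python name is still bound once the loop has finished.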
Long story short, the for loop creates a new variable without it having to be defined beforehand, and its value is whatever number the loop is currently on. For example, if you had written:
num = 0
for i in range(3):
    print(num)
    num = num + 1
then the first time through the loop 'i' would equal 0 (because range() starts at 0, not 1), the second time it would equal 1, and so on. You can ignore 'num'; it is just an example of code you could have in a loop, here printing numbers in ascending order.
Levin

Display data in Stata loop

I have a loop in Stata 12 that looks at each record in a file and if it finds a flag equal to 1 it generates five imputed values. My code looks like this:
forvalues i=1/5 {
    gen y3`i' = y2
    gen double h`i' = (uniform()*(1-a)+a) if flag==1
    replace y3`i' = 1.6*(invibeta(7.2,2.6,h`i')/(1-invibeta(7.2,2.6,h`i')))^(1/1.7) if flag==1
}
a is defined elsewhere. I want to check the individual imputations. Thus, I need to display the imputed variable preferably only for those cases where flag=1. I also would like to display another value, s, alongside. I need help in figuring out the syntax. I've tried every combination of quotes and subscripts that I can think of, but I keep getting error messages.
One other useful modification occurs to me. Suppose I have three concatenated files on which I want to perform this routine. Let them have a variable file equal to 1, 2 or 3. I'd like to set a separate seed for each and do it in my program so I have a record. I envision something like:
forvalues j=1/3 {
set seed=12345 if file=1
set seed=56789 if file=2
set seed=98765 if file=3
insert code above
}
Will this work?
No comment is possible on code you don't show, but the word "display" may be misleading you.
list y3`i' if flag == 1
or some variation may be what you seek. Note that display is geared to showing at most one line of output at a time.
P.S. As you are William Shakespeare, know that the mug http://www.stata.com/giftshop/much-ado-mug/ was inspired by your work.
A subsidiary question asks about choosing a different seed each time around a loop. That is easy:
forval j = 1/3 {
local seed : word `j' of 12345 56789 98765
set seed `seed'
...
}
or
tokenize 12345 56789 98765
forval j = 1/3 {
set seed ``j''
...
}
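The idea of picking the j-th seed from a fixed list on each pass is not specific to Stata. A minimal Python sketch follows, using the standard `random` module in place of Stata's generator; `draws_for_file` is a hypothetical helper and the draws are only placeholders for the imputation code.

```python
import random

seeds = [12345, 56789, 98765]        # one seed per file, as in the question


def draws_for_file(j, n=5):
    """Return n uniform draws for file j (1-based), seeded from the list."""
    rng = random.Random(seeds[j - 1])    # like: set seed ``j''
    return [rng.random() for _ in range(n)]


results = {j: draws_for_file(j) for j in range(1, 4)}
for j, draws in results.items():
    print(j, draws)
```

Because each file gets its own recorded seed, rerunning the loop reproduces the same draws for that file.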

Fixed Place Permutation/Combination

I am looking for a way to generate the different combinations of the elements of 4 sets, such that each set's element has a fixed place in the final combination.
To explain my requirement better, let me give a sample of those 4 sets and then what I am looking for:
Set#1(Street Pre Direction) { N, S }
Set#2(Street Name) {Frankford, Baily}
Set#3(Street Type) {Ave, St}
Set#4(Street Post Direction) {S}
Let me list a few of the expected combinations:
N Baily Ave S
S Frankford St S
S Baily Ave S
.
.
.
As you can see, every set's element falls into its own place:
Pre Direction is in Place 1
Street Name is in Place 2
Street Type is in Place 3
Post Direction is in Place 4
I am looking for the most efficient way to carry out this task. One way to do it is to work on 2 sets at a time, like this:
Make Combination of Set 1 and Set 2 --> create a new set 5 of resulting combinations
Make Combination of Set 5 and Set 3 --> create a new set 6 of resulting combinations
Make Combination of Set 6 and Set 4 --> This will give me the final combinations
Is there a better way to do this? Kindly help. I would prefer C# or Java.
Thanks
Here's some LINQ (C#) that gives you all combinations. It is not "the most efficient way".
var query =
    from d in predirections
    from n in names
    from t in types
    from s in postdirections
    select new {d, n, t, s};
It sounds like you're looking for the cartesian product of some sets. You can do it using nested for loops. Here's the Haskell code, which you didn't ask for.
Prelude> [[x,y] | x <- ['1'..'3'], y <- ['A'..'C']]
["1A","1B","1C","2A","2B","2C","3A","3B","3C"]
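The question asked for C# or Java; purely as an illustration, the same Cartesian product can be sketched in Python with the standard library's `itertools.product`. The set contents are taken from the question, and the variable names are assumptions.

```python
from itertools import product

predirections = ["N", "S"]
names = ["Frankford", "Baily"]
types = ["Ave", "St"]
postdirections = ["S"]

# product() yields one tuple per combination, keeping each set in its place.
addresses = [" ".join(parts)
             for parts in product(predirections, names, types, postdirections)]

for address in addresses:
    print(address)
```

This produces 2 x 2 x 2 x 1 = 8 combinations, in the same order that four nested for loops would.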
@David B
What if the predirections list is empty? Is there a way to still get the combinations? With your approach, an empty list means no Cartesian product is returned at all.
David B here:
var query =
    from d in predirections.DefaultIfEmpty()
    from n in names.DefaultIfEmpty()
    from t in types.DefaultIfEmpty()
    from s in postdirections.DefaultIfEmpty()
    select new {d, n, t, s};
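The same trick, substituting a single placeholder for an empty set so the product does not collapse to nothing, could be sketched in Python like this. `fixed_place_combinations` is a hypothetical helper, and `None` is used as the placeholder.

```python
from itertools import product


def fixed_place_combinations(*sets):
    """Cartesian product in which an empty set contributes a single None."""
    padded = [list(s) or [None] for s in sets]   # empty -> [None]
    return list(product(*padded))


combos = fixed_place_combinations(
    [],                        # no pre-directions available
    ["Frankford", "Baily"],
    ["Ave", "St"],
    ["S"],
)
for combo in combos:
    print(combo)
```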
