Loop Postestimation Tests after Regression - loops

A loop for a number of regressions is performed. For each regression we need to conduct some heteroscedasticity tests. The following code unfortunately does not work:
gen p_hettest = .
quietly forvalues i = 1/10 {
reg y x if id == `i'
estat hettest if id == `i'
replace p_hettest=r(p) if id == `i'
}
Here is a data sample:
clear
input float(y x id)
-.006994963 -7.015742e-06 1
.002128173 2.7695405e-06 1
.01837084 .000015578877 1
-.018459747 -.000017552491 1
-.008869853 -8.115663e-06 1
0 0 1
.00081374 1.039456e-06 1
.0192536 .00001801726 1
-.004777103 -2.800596e-06 1
.006691461 4.95152e-06 1
-.015235436 -.000015264517 1
.03523033 -.00001293428 2
.037114896 .00001956828 2
.0041321944 -6.849998e-06 2
-.000645176 .000012979223 2
-.015742416 -4.716876e-06 2
.005813865 -2.943401e-06 2
.00220989 -4.920239e-06 2
.003843212 8.216926e-06 2
.013684767 -4.7989766e-07 2
.02013146 3.841124e-07 2
.0714285 2.9144696e-06 3
.02564108 6.107174e-06 3
-.01336905 -7.19949e-06 3
0 .000031617565 3
.034420278 3.418627e-06 3
-.04042552 .00004654335 3
.03571425 .000024398614 3
-.002500042 -3.514139e-06 3
-.04651165 -.00004515287 3
.05263159 -7.449272e-06 3
.08727269 -7.16101e-06 3
end
An r(101) error occurs, indicating: "if not allowed".
Is there an alternative way to loop regress-postestimation tests?

The issue is that estat hettest does not accept if qualifiers. It is a postestimation command, so it runs on the estimation sample of the most recent regression; the if qualifier on reg already restricts the test to the current id.
If you modify your code to look like:
gen p_hettest = .
quietly forvalues i = 1/10 {
reg y x if id == `i'
estat hettest
replace p_hettest=r(p) if id == `i'
}
you should be all set.
If you remove quietly, you can see that the value of r(p) changes with each call of estat hettest.

Related

how to create a loop for a macro in stata?

I have a very large dataset; to keep it short, the following example shows its structure:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(patid death dateofdeath)
1 0 .
2 0 .
3 0 .
4 0 .
5 1 15007
6 0 .
7 0 .
8 1 15526
9 0 .
10 0 .
end
format %d dateofdeath
I am trying to sample for a case-control study based on date of death. At this stage, I need to first create a variable with each date of death repeated for all the participants (hence we end up with a dataset with 20 participants) and a pairid equivalent to the patient id patid of the corresponding case.
I created a macro for one case (which works) but I am finding it difficult to have it repeated for all cases (where death==1) in a loop.
The successful macro is as follows:
local i "5" //patient id who died
gen pairid= `i'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
format matchedindexdate %d
save temp`i'
and the loop I attempted is:
* (min and max patid id)
forval j = 1/10 {
count if patid == `j' & death==1
if r(N)=1 {
gen pairid= `j'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
save temp/matched`j'
}
}
use temp/matched1, clear
forval i=2/10 {
capture append using temp/matched`i'
save matched, replace
}
but I get:
invalid syntax
How can I do the loop?
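I finally had it solved, please check:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1591811-how-to-create-a-loop-for-a-macro
One thing worth checking in the loop above: equality in Stata is tested with ==, so the condition should read if r(N)==1; the single = on that line is a likely source of the error. Separately, gen pairid and gen matchedindexdate will fail on the second matching case, because both variables already exist by then.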

SPSS: using IF function with REPEAT when each case has multiple linked instances

I have a dataset as such:
Case #|DateA |Drug.1|Drug.2|Drug.3|DateB.1 |DateB.2 |DateB.3 |IV.1|IV.2|IV.3
------|------|------|------|------|--------|---------|--------|----|----|----
1 |DateA1| X | Y | X |DateB1.1|DateB1.2 |DateB1.3| 1 | 0 | 1
2 |DateA2| X | Y | X |DateB2.1|DateB2.2 |DateB2.3| 1 | 0 | 1
3 |DateA3| Y | Z | X |DateB3.1|DateB3.2 |DateB3.3| 0 | 0 | 1
4 |DateA4| Z | Z | Z |DateB4.1|DateB4.2 |DateB4.3| 0 | 0 | 0
For each case, there are linked variables i.e. Drug.1 is linked with DateB.1 and IV.1 (Indicator Variable.1); Drug.2 is linked with DateB.2 and IV.2, etc.
The variable IV.1 only = 1 if Drug.1 is the case that I want to analyze (in this example, I want to analyze each receipt of Drug "X"), and so on for the other IV variables. Otherwise, IV = 0 if the drug for that scenario is not "X".
I want to calculate the difference between DateA and DateB for each instance where Drug "X" is received.
e.g. In the example above I want to calculate a new variable:
DateDiffA1_B1.1 = DateA1 - DateB1.1
DateDiffA1_B2.1 = DateA1 - DateB2.1
DateDiffA1_B1.3 = DateA1 - DateB1.3
DateDiffA1_B2.3 = DateA1 - DateB2.3
DateDiffA1_B3.3 = DateA1 - DateB3.3
I'm not sure if this new variable would need to be linked to each instance of Drug "X" as for the other variables, or if it could be a single variable that COUNTS all the instances for each case.
The end goal is to COUNT how many times each case had a date difference of <= 2 weeks when they received Drug "X". If they did not receive Drug "X", I do not want to COUNT the date difference.
I will eventually want to compare those who did receive Drug "X" with a date difference <= 2 weeks to those who did not, so having another indicator variable to help separate out these specific patients would be beneficial.
I am unsure about the best way to go about this; I suspect it will require a combination of IF and REPEAT functions using the IV variable, but I am relatively new with SPSS and syntax and am not sure how this should be coded to avoid errors.
Thanks for your help!
EDIT: It seems like I may need to use IV as a vector variable to loop through the linked variables in each case. I've tried the syntax below to no avail:
DATASET ACTIVATE DataSet1.
vector IV = IV.1 to IV.3.
loop #i = .1 to .3.
do repeat DateB = DateB.1 to DateB.3
/ DrugDateDiff = DateDiff.1 to DateDiff.3.
if IV(#i) = 1
/ DrugDateDiff = datediff(DateA, DateB, "days").
end repeat.
end loop.
execute.
There is actually no need for the vector and the loop; everything can be done within one DO REPEAT:
compute N2W=0.
do repeat DateB = DateB.1 to DateB.3 /IV=IV.1 to IV.3 .
if IV=1 and datediff(DateA, DateB, "days")<=14 N2W = N2W + 1.
end repeat.
execute.
This syntax first sets the count variable N2W to zero. It then loops through all the dates and, only where the matching IV is 1, compares each one to DateA, adding 1 to the count if the difference is <= 2 weeks.
If you prefer to keep the count variable missing when none of the IVs is 1, replace the opening compute N2W=0. with:
If any(1, IV.1 to IV.3) N2W=0.
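The counting logic described above can also be sketched outside SPSS. The Python below is only an illustration under assumed names (count_within_two_weeks, max_days and the sample dates are all hypothetical); it mirrors the DO REPEAT by walking the DateB/IV pairs together.

```python
from datetime import date

def count_within_two_weeks(date_a, dates_b, indicators, max_days=14):
    """Count linked slots where the indicator is 1 and DateA - DateB <= max_days."""
    n2w = 0  # mirrors COMPUTE N2W=0
    for date_b, iv in zip(dates_b, indicators):
        # Only slots flagged IV == 1 (Drug "X") are compared against DateA.
        if iv == 1 and (date_a - date_b).days <= max_days:
            n2w += 1
    return n2w

# Hypothetical case: three linked slots, the second one is not Drug "X".
date_a = date(2020, 3, 20)
dates_b = [date(2020, 3, 10), date(2020, 3, 15), date(2020, 1, 1)]
indicators = [1, 0, 1]
print(count_within_two_weeks(date_a, dates_b, indicators))  # 1
```

As with the SPSS condition, a negative difference (DateB later than DateA) also satisfies <= 14; add a lower bound if those cases should not be counted.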

Check 2d array for same values

I am trying to make a game and I have a 2D array.
It looks like this:
Grid[x][y]
Let's pretend these values are in it:
Column 1 Column 2 Column 3 Column 4 Column 5
1 2 5 2 5
2 2 3 1 1
1 4 3 4 5
1 3 3 3 5 <-- match this row
3 5 3 4 5
2 4 3 4 5
2 4 4 4 5
In the middle row (index 4) I want to check whether the same number appears at least 3 times, and also handle the cases where it appears 4 or even 5 times.
How do you check this? What would be a good way to find the matching values and delete them? I am stuck on the logic for something like this.
This is what I tried:
grid = {}
for x = 1, 5 do
grid[x] = {finish = false}
for y = 1, 7 do
grid[x][y] = {key= math.random(1,4)}
end
end
function check(t)
local tmpArray = {}
local object
for i = 1,5 do
object = t[i][1].key
if object == t[i+1][1].key then
table.insert( tmpArray, object )
else
break
end
end
end
print_r(grid)
check(grid)
print_r(grid)
where print_r prints the grid:
function print_r ( t )
local print_r_cache={}
local function sub_print_r(t,indent)
if (print_r_cache[tostring(t)]) then
print(indent.."*"..tostring(t))
else
print_r_cache[tostring(t)]=true
if (type(t)=="table") then
for pos,val in pairs(t) do
if (type(val)=="table") then
print(indent.."["..pos.."] => "..tostring(t).." {")
sub_print_r(val,indent..string.rep(" ",string.len(pos)+8))
print(indent..string.rep(" ",string.len(pos)+6).."}")
else
print(indent.."["..pos.."] => "..tostring(val))
end
end
else
print(indent..tostring(t))
end
end
end
sub_print_r(t," ")
end
It doesn't work that well: I compare each entry with the one after it, and if they differ, the last matching entry is not added.
I don't know if this is the best way to go.
Once I "delete" the matched entries, my plan is to move the row above or beneath into row 4, but first things first.
You should compare along the second index, not the first. In the table
g = {{1,2,3}, {4,5,6}}
g[1] is the first row, i.e. {1,2,3}, not the first column {1,4} (the first elements of the first and second rows). You made the same mistake in your previous post, so it is worth rereading the Lua docs on tables. You should do something like
for i = 1,#t do
object = t[i][1].key
if object == t[i][2].key then
This only compares the first two items in a row. To check whether a row has any identical consecutive items, loop over the second index from 1 to #t[i]-1.
You might find the following print function more useful, as it prints the table as a grid, which makes before/after comparisons easier:
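function printGrid(g)
for i, row in ipairs(g) do
local cells = {}
-- cells may be plain values or tables like {key = 3}, as in the question's grid
for j, cell in ipairs(row) do
cells[j] = type(cell) == "table" and cell.key or cell
end
print('{' .. table.concat(cells, ',') .. '}')
end
end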
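The consecutive-item check mentioned above generalises to runs of 3, 4 or 5. A minimal sketch of that logic follows, written in Python rather than Lua purely for illustration; find_runs and min_len are made-up names, and the indices are 0-based, so add 1 when porting to Lua tables.

```python
def find_runs(row, min_len=3):
    """Return (start_index, length, value) for each run of equal neighbours."""
    runs = []
    start = 0
    for i in range(1, len(row) + 1):
        # A run ends at the end of the row or when the value changes.
        if i == len(row) or row[i] != row[start]:
            if i - start >= min_len:
                runs.append((start, i - start, row[start]))
            start = i
    return runs

row = [1, 3, 3, 3, 5]  # the row marked "match this row" in the question
print(find_runs(row))  # [(1, 3, 3)]
```

Each tuple gives the positions to clear, so a run of 4 or 5 is reported as a single entry with the longer length instead of needing separate checks.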

Setting Up a Dynamic Stopping Point for a Loop

The data are set up with information corresponding to an ID, which can appear more than once.
ID Data
1 X
1 Y
2 A
2 B
2 Z
3 X
I want a loop that indicates which instance of the ID I am looking at: is it the first time, the second time, etc.? I want it as a string in the form _#, so, to my knowledge, I have to go beyond simply using _n in Stata. If someone knows a way to do this without the loop, let me know, but I would still like the answer.
I have the following loop in Stata:
by ID: gen count_one = _n
gen count_two = ""
quietly forval j = 1/3 {
replace count_two = "_`j'" if count_one == `j'
}
The output now looks like this:
ID Data count_one count_two
1 X 1 _1
1 Y 2 _2
2 A 1 _1
2 B 2 _2
2 Z 3 _3
3 X 1 _1
The question is how can I replace the 16 above with to tell Stata to take the max of the count_one column because I need to run this weekly and that max will change and I want to reduce errors.
It's hard to understand why you want this, but it is one line whether you want numeric or string:
bysort ID : gen nummax = _N
bysort ID : gen strmax = "_" + string(_N)
Note that the sort order within ID is irrelevant to the number of observations for each.
Some parts of your question aren't clear ("...replace the 16 above with to tell Stata...") but:
Why don't you just use _n with tostring?
gsort +ID +data
bys ID: g count_one=_n
tostring count_one, gen(count_two)
replace count_two="_"+count_two
Then to generate the max (answering the partial question at the end there) -- although note this value will be repeated across instances of each ID value:
bys ID: egen maxcount1=max(count_one)
or more elegantly:
bys ID: g maxcount2=_N
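For comparison, the same per-ID numbering can be sketched in Python, where a running tally per ID plays the role of _n within by ID, so no upper bound has to be known in advance. The function name label_occurrences is made up for this illustration.

```python
from collections import defaultdict

def label_occurrences(ids):
    """Return '_1', '_2', ... marking which occurrence of its ID each row is."""
    seen = defaultdict(int)  # running count per ID
    labels = []
    for id_ in ids:
        seen[id_] += 1
        labels.append(f"_{seen[id_]}")
    return labels

ids = [1, 1, 2, 2, 2, 3]  # the ID column from the question
print(label_occurrences(ids))  # ['_1', '_2', '_1', '_2', '_3', '_1']
```

Because the tally grows with the data, nothing needs to change when next week's file has an ID that appears more often.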

dtrace - aggregation of variables passed to a function from the different places where the function was called

Sorry for the complicated title, but here is the dtrace output of my script, which I think will help explain what I'm talking about:
16384 1
38048 1
38050 1
38051 1
38055 1
-58632623 1
-5180681 1
-4576706 1
-4498881 1
-4472021 1
-4464140 1
<...>
mymodule.so`FuncXXX
mymodule.so`FirstFunc+0x23c
8
mymodule.so`FuncXXX
mymodule.so`SecondFunc+0x4bc
9
mymodule.so`FuncXXX
mymodule.so`ThirdFunc+0x1e1
35
mymodule.so`FuncXXX
mymodule.so`FourthFunc+0x70
39
The dtrace script is:
pid$1:mymodule:FuncXXX:entry{
@a[arg1] = count();
@b[arg2] = count();
@c[ustack()] = count();
}
FuncXXX has the signature: void FuncXXX(void *arg, long int p, int q);
Now I want to aggregate the variables p and q, but grouped by where FuncXXX was called from, e.g.:
mymodule.so`FuncXXX'
mymodule.so`FirstFunc'+0x23c
8
16384 1
38048 1
38050 1
38051 1
38055 1
-58632623 1
-5180681 1
-4576706 1
-4498881 1
-4472021 1
-4464140 1
mymodule.so`FuncXXX'
mymodule.so`SecondFunc'+0x4bc
9
49599 1
51533 1
51535 1
52149 1
52152 1
-148909 1
-135530 1
-121514 1
-117860 1
-97633 1
and so on.
Is it possible to do this? Or should I trace FirstFunc, SecondFunc, ThirdFunc and FourthFunc independently? The problem is that FuncXXX is not necessarily called in each of those functions.
Best regards and thanks for all the answers.
Aggregations can be indexed by multiple keys, i.e. you can write:
@a[ustack(2), arg1, arg2] = count();
If you are only interested in the caller of your probe function, you can either use the DTrace variable "caller" (which unfortunately gives the address the probe function was called from, not the symbol name) or pass ustack() the number of stack frames to record; ustack(2) gives you just the probe function (FuncXXX in your case) and the place it was called from.
The above aggregation therefore tells you "how many times has FuncXXX been called from ... with a given combination of arg1/arg2".
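To illustrate what such a multi-key aggregation collects, here is a small Python sketch (not DTrace) that counts (caller, p, q) records the way a count() keyed on three values would. The call records are invented for the example and are not taken from the output in the question.

```python
from collections import Counter

# Hypothetical call records: (caller, p, q) for each entry into FuncXXX.
calls = [
    ("FirstFunc+0x23c", 16384, -58632623),
    ("FirstFunc+0x23c", 16384, -58632623),
    ("SecondFunc+0x4bc", 49599, -148909),
]

# One counter keyed on the whole tuple, like a single multi-key aggregation.
agg = Counter(calls)
for (caller, p, q), n in sorted(agg.items()):
    print(caller, p, q, n)
```

Each distinct combination gets its own count, which is why one aggregation keyed on the caller and both arguments is enough and the calling functions do not have to be traced separately.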

Resources