Drawing a forest plot in Stata - loops

I would like to ask whether it is possible to create a forest plot of the effect sizes after running a regression loop over multiple outcomes. I used the code below, which I found in this question: https://www.statalist.org/forums/forum/general-stata-discussion/general/1701936-extract-coefficient-and-p-value-for-certain-variable-from-regression-loop
frame create results_NProt_Pulse_Velocity
frame results_NProt_Pulse_Velocity {
    set obs 1000
    gen outcome = ""
    gen coef = .
    gen SE = .
    gen pvalue = .
    gen ci_l = .
    gen ci_u = .
}
local counter 0
foreach outcome of varlist CHIP-C34_HIV_F {
    regress `outcome' PulseWaveVelocity Age sq_age
    if r(table)[4, 1] < 0.05 {
        local ++counter
        frame results_NProt_Pulse_Velocity {
            replace outcome = "`outcome'" in `counter'
            replace coef = `=r(table)[1, 1]' in `counter'
            replace SE = `=r(table)[2, 1]' in `counter'
            replace pvalue = `=r(table)[4, 1]' in `counter'
            replace ci_l = `=r(table)[5, 1]' in `counter'
            replace ci_u = `=r(table)[6, 1]' in `counter'
        }
    }
}
frame change results_NProt_Pulse_Velocity
drop if missing(outcome)
browse
After running this code I got a results dataset with one row per significant outcome (image not shown).
So how can I create a forest plot after getting this result?
I want the forest plot to include the names of the outcomes and the coefficient value on the y-axis.
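One possible approach, sketched here under assumptions rather than taken from an answer in the thread: give each row of the results dataset a numeric position, build the y-axis labels from the outcome name and coefficient, and overlay twoway rcap (for the intervals) with scatter (for the point estimates). The demo data come from the auto dataset and the file name forest_demo.dta is made up for illustration. With the frame from the question, the part from "gen row" onwards should apply to the variables outcome, coef, ci_l and ci_u.

```stata
* Build a small demo results dataset (a stand-in for the frame in the question)
sysuse auto, clear
tempname RES
postfile `RES' str32 outcome coef ci_l ci_u using forest_demo.dta, replace
foreach v in mpg headroom trunk turn {
    quietly regress `v' weight
    matrix M = r(table)
    * rows 1, 5, 6 of r(table) hold b, lower and upper CI; column 1 is weight
    post `RES' ("`v'") (M[1,1]) (M[5,1]) (M[6,1])
}
postclose `RES'
use forest_demo, clear

* One row position per outcome
gen row = _n

* Build y-axis labels of the form: position "outcome (coefficient)"
local ylabs
forvalues i = 1/`=_N' {
    local name = outcome[`i']
    local est = strofreal(coef[`i'], "%9.4f")
    local ylabs `"`ylabs' `i' "`name' (`est')""'
}

* Horizontal CI bars plus point estimates, with a reference line at zero
twoway (rcap ci_l ci_u row, horizontal) ///
       (scatter row coef), ///
       ylabel(`ylabs', angle(horizontal) labsize(small)) ///
       yscale(reverse) xline(0, lpattern(dash)) ///
       ytitle("") xtitle("Coefficient (95% CI)") legend(off)
```

With many outcomes the labels may become crowded; sorting by coef before generating row, or plotting in batches, may help.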

Related

Extract coefficient and p-value for certain variable from regression loop

In Stata, I have applied a regression loop to 1000 metabolites (outcomes), and the exposure variable is BMI. I also have other variables in the model. I would like to know how I can extract only the coefficient, p-value, and 95% CI for BMI if and only if BMI is significant. I then want to export them to an Excel file.
This is the code I have used. It informed me that there were, for example, 100 significant results. So I'm trying to figure out which 100 are those and extract them for BMI only, without other variables in the model.
local counter = 0
local counter_pos = 0
local counter_neg = 0
foreach outcome of varlist B - Z {
    regress `outcome' bmi Age i.sex i.smoking i.lpa2c i.cholestrol
    matrix M = r(table)
    if M[4, 1] < 0.05 {
        local ++counter
        if _b[bmi] < 0 {
            local ++counter_neg
        }
        else {
            local ++counter_pos
        }
    }
}
display as text "Total of significant results: " as result `counter'
Here is a reproducible example showing how to send a variable name and some results to a new file. In your case, posting is conditional on a conventionally significant result; here it is unconditional.
sysuse auto, clear
local counter = 0
local negative = 0
local positive = 0
tempname RESULTS
postfile `RESULTS' str32 varname coefficient using myresults.dta, replace
foreach v in price mpg rep78 headroom trunk length turn displacement gear_ratio {
    quietly regress `v' weight
    local ++counter
    if _b[weight] < 0 local ++negative
    else local ++positive
    post `RESULTS' ("`v'") (_b[weight])
}
di "variables tried: " `counter'
di "negative relation: " `negative'
di "positive relation: " `positive'
postclose `RESULTS'
use myresults, clear
compress
list
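To make the posting conditional, as the question asks, the post command can sit inside the significance check. The following is a minimal sketch along those lines, not part of the original answer: it also stores the p-value and confidence limits from r(table) and writes the result to Excel. The file names sigresults.dta and sigresults.xlsx are made up for illustration, and weight stands in for bmi.

```stata
sysuse auto, clear
tempname RESULTS
postfile `RESULTS' str32 outcome coef pvalue ci_l ci_u using sigresults.dta, replace
foreach v in price mpg rep78 headroom trunk length turn displacement gear_ratio {
    quietly regress `v' weight
    matrix M = r(table)
    * column 1 of r(table) is the first regressor (weight);
    * rows 1, 4, 5, 6 hold the coefficient, p-value, lower and upper CI
    if M[4, 1] < 0.05 {
        post `RESULTS' ("`v'") (M[1, 1]) (M[4, 1]) (M[5, 1]) (M[6, 1])
    }
}
postclose `RESULTS'
use sigresults, clear
list
* Write the significant results to an Excel sheet with variable names as headers
export excel using sigresults.xlsx, firstrow(variables) replace
```

Because only the first column of r(table) is posted, the other covariates in the model never reach the output file.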

How to create a loop for a macro in Stata?

I have a very large dataset but to cut it short I demonstrated the data with the following example:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(patid death dateofdeath)
1 0 .
2 0 .
3 0 .
4 0 .
5 1 15007
6 0 .
7 0 .
8 1 15526
9 0 .
10 0 .
end
format %d dateofdeath
I am trying to sample for a case-control study based on date of death. At this stage, I need to first create a variable with each date of death repeated for all the participants (hence we end up with a dataset with 20 participants) and a pairid equivalent to the patient id patid of the corresponding case.
I created a macro for one case (which works) but I am finding it difficult to have it repeated for all cases (where death==1) in a loop.
The successful macro is as follows:
local i "5" //patient id who died
gen pairid= `i'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
format matchedindexdate %d
save temp`i'
and the loop I attempted is:
* (min and max patid id)
forval j = 1/10 {
count if patid == `j' & death==1
if r(N)=1 {
gen pairid= `j'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
save temp/matched`j'
}
}
use temp/matched1, clear
forval i=2/10 {
capture append using temp/matched`i'
save matched, replace
}
but I get:
invalid syntax
How can I do the loop?
I finally had it solved, please check:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1591811-how-to-create-a-loop-for-a-macro
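For readers who do not follow the link, here is one way the loop could be written. It is a sketch under assumptions, not necessarily the solution reached in the linked thread. Two things in the attempted loop look likely to cause trouble: r(N)=1 uses = where a comparison needs ==, and gen pairid would fail on the second case because the variable already exists. The sketch reloads the original data for each case, and uses levelsof to loop only over patients with death == 1.

```stata
clear
input float(patid death dateofdeath)
1 0 .
2 0 .
3 0 .
4 0 .
5 1 15007
6 0 .
7 0 .
8 1 15526
9 0 .
10 0 .
end
format %d dateofdeath

* Collect the ids of the cases (patients who died)
levelsof patid if death == 1, local(cases)

* Keep a copy of the original data and start an empty stacking file
tempfile base matched
save `base'
clear
save `matched', emptyok

foreach j of local cases {
    use `base', clear
    gen pairid = `j'
    * Fetch the date of death of case `j' and copy it to every participant
    summarize dateofdeath if patid == `j', meanonly
    gen matchedindexdate = r(mean)
    format matchedindexdate %d
    append using `matched'
    save `matched', replace
}
use `matched', clear
sort pairid patid
```

With the example data this stacks one copy of the ten participants per case, so the final dataset has 20 observations.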

Lag over columns/variables in SPSS

I want to do something I thought was really simple.
My (mock) data looks like this:
data list free/totalscore.1 to totalscore.5.
begin data.
1 2 6 7 10 1 4 9 11 12 0 2 4 6 9
end data.
These are total scores accumulating over a number of trials (in this mock data, from 1 to 5). Now I want to know the score earned in each trial. In other words, I want to subtract the value in trial n from the value in trial n+1.
The simplest syntax would look like this:
COMPUTE trialscore.1 = totalscore.2 - totalscore.1.
EXECUTE.
COMPUTE trialscore.2 = totalscore.3 - totalscore.2.
EXECUTE.
COMPUTE trialscore.3 = totalscore.4 - totalscore.3.
EXECUTE.
And so on...
So that the result would look like this (image not shown):
But of course it is neither practical nor fun to do this for 200+ variables.
I attempted to write a syntax using VECTOR and DO REPEAT as follows:
COMPUTE #y = 1.
VECTOR totalscore = totalscore.1 to totalscore.5.
DO REPEAT trialscore = trialscore.1 to trialscore.5.
COMPUTE #y = #x + 1.
END REPEAT.
COMPUTE trialscore(#i) = totalscore(#y) - totalscore(#i).
EXECUTE.
But it doesn't work.
Any help is appreciated.
Ps. I've looked into using LAG but that loops over rows while I need it to go over 1 column at a time.
I am assuming respid is your original (unique) record identifier.
EDIT:
If you do not have a record identifier, you can very easily create a dummy one:
compute respid=$casenum.
exe.
end of EDIT
You could try re-structuring the data, so that each score is a distinct record:
varstocases
/make totalscore from totalscore.1 to totalscore.5
/index=scorenumber
/NULL=keep.
exe.
Then sort your cases so that scores are in descending order (in order to be able to use the lag function):
sort cases by respid (a) scorenumber (d).
Then actually do the lag-based computations
* After the descending sort, lag(totalscore) is the next trial's total.
do if respid=lag(respid).
compute trialscore=lag(totalscore)-totalscore.
end if.
exe.
In the end, un-do the restructuring:
casestovars
/id=respid
/index=scorenumber.
exe.
You should end up with a set of trialscore variables (the last one will be empty), which will hold what you need.
Alternatively, you can use DO REPEAT this way:
do repeat
before=totalscore.1 to totalscore.4
/after=totalscore.2 to totalscore.5
/diff=trialscore.1 to trialscore.4 .
compute diff=after-before.
end repeat.

SPSS for loop based on a variable

I'm just learning SPSS and I want to do simple subgroup analysis based on a variable "status" I created which can take values from 0 to 8. I would like to print outputs in one go.
this is the pseudocode for what I want to do:
for( i = 1, i = 8, i++)
{
filter by (status = i)
display analysis
remove filter
}
That way I can do it all in one go, and I can also add to the analysis code and easily apply it to the 8 subgroups.
I don't know if it's relevant but here is the code I want to iterate over currently:
USE ALL.
COMPUTE filter_$=(Workforce EQ 1 AND SurveySample = 1 AND State = 1).
VARIABLE LABELS filter_$ 'Workforce EQ 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
FREQUENCIES VARIABLES = Q86 Q33 Q34 Q88 FSEScore
  /BARCHART FREQ
  /ORDER=ANALYSIS.
CROSSTABS
  /TABLES=FSEScore BY Q86
  /FORMAT=AVALUE TABLES
  /CELLS=ROW
  /COUNT ROUND CELL.
FILTER OFF.
USE ALL.
Thanks guys.
The SPLIT FILE command may solve the problem - it causes your analysis reports to show results for each category of your split variable separately:
*run your transformations.
sort cases by status.
split file by status.
FREQUENCIES .....
CROSSTABS ....
split file off.
If this is not enough, you can use a macro to run through "status" categories:
first define the macro:
define MyMacro ()
!do !ST=1 !to 8
* filter commands using **status = !ST**
* transformations using **status = !ST**
FREQUENCIES .....
CROSSTABS ....
!doend
!enddefine.
now call your macro:
MyMacro .
This is probably a rather crude way of doing this; the suggestion above is probably more sensible.
You can run Python inside SPSS. The following code works:
begin program.
import spss
# Loop over status values 1 to 8 (range excludes its upper bound)
for i in range(1, 9):
    spss.Submit("""
USE ALL.
COMPUTE filter_$=(Workforce EQ 1 AND SurveySample = 1 AND Status = %d).
VARIABLE LABELS filter_$ 'Workforce EQ 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
* analysis as required.
FREQUENCIES VARIABLES = Q86
  /BARCHART FREQ
  /ORDER=ANALYSIS.
""" % i)
end program.
Many thanks to eli-k. I probably should have just used SPLIT FILE.

Tabulate multiple variables with common prefix using a local macro

I have a number of variables whose name begins with the prefix indoor. What comes after indoor is not numeric (that would make everything simpler).
I would like a tabulation for each of these variables.
My code is the following:
local indoor indoor*
foreach i of local indoor {
tab `i' group, col freq exact chi2
}
The problem is that indoor in the foreach command resolves to indoor* and not to the list of the indoor questions, as I hoped. For this reason, the tab command is followed by too many variables (it can only handle two) and this results in an error.
The simple fix is to substitute the first command with:
local indoor <full list of indoor questions>
But this is what I would like to avoid, that is to have to find all the names for these variables and then paste them in the code. It seems there is a quicker fix for this but I can't think of any.
The trick is to use ds or unab to create the varlist expansion before asking Stata to loop over values in the foreach loop.
Here's an example of each:
******************! BEGIN EXAMPLE
** THIS FIRST SECTION SIMPLY CREATES SOME FAKE DATA & INDOOR VARS **
clear
set obs 10000
local suffix `c(ALPHA)'
token `"`suffix'"'
while "`1'" != "" {
    g indoor`1'`2'`3' = 1+int((5-1+1)*runiform())
    lab var indoor`1'`2'`3' "Indoor Values for `1'`2'`3'"
    mac shift 1
}
g group = rbinomial(1,.5)
lab var group "GROUP TYPE"
** NOW, YOU SHOULD HAVE A BUNCH OF FAKE INDOOR
**VARS WITH ALPHA, NOT NUMERIC SUFFIXES
desc indoor*
**USE ds TO CREATE YOUR VARLIST FOR THE foreach LOOP:
ds indoor*
di "`r(varlist)'"
local indoorvars `r(varlist)'
local n 0
foreach i of local indoorvars {
    **LET'S CLEAN UP YOUR TABLES A BIT WITH SOME HEADERS VIA display
    local ++n
    di in red "--------------------------------------------"
    di in red "Table `n': `:var l `i'' by `:var l group'"
    di in red "--------------------------------------------"
    **YOUR tab TABLES
    tab `i' group, col freq chi2 exact nolog nokey
}
******************! END EXAMPLE
OR using unab instead:
******************! BEGIN EXAMPLE
unab indoorvars: indoor*
di "`indoorvars'"
local n 0
foreach i of local indoorvars {
    local ++n
    di in red "--------------------------------------------"
    di in red "Table `n': `:var l `i'' by `:var l group'"
    di in red "--------------------------------------------"
    tab `i' group, col freq chi2 nokey //I turned off exact to speed things up
}
******************! END EXAMPLE
The advantages of ds come into play if you want to select your indoor vars using a tricky selection rule, like selecting indoor vars based on information in the variable label or some other characteristic.
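As a small illustration of that point, the sketch below selects variables by their labels rather than by their names. It uses the auto dataset instead of the indoor variables, and the pattern *in.* (labels mentioning inches) is only an example of such a rule.

```stata
sysuse auto, clear
* Select variables whose variable label matches a pattern
ds, has(varlabel "*in.*")
local inchvars `r(varlist)'
di "`inchvars'"
foreach v of local inchvars {
    * Show each selected variable's label, then summarize it
    di as text "`v': `:variable label `v''"
    summarize `v'
}
```

The same loop body could hold the tab command from the question once the list has been built from the indoor variables.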
You could do this with
foreach i of var `indoor' {
    tab `i' group, col freq exact chi2
}
This would work. It is almost identical to the code in the question.
unab indoor : indoor*
foreach i of local indoor {
    tab `i' group, col freq exact chi2
}
Or loop over the wildcard directly:
foreach v of varlist indoor* {
    * do something with `v', for example:
    tab `v' group, col freq exact chi2
}

Resources