Extract coefficient and p-value for certain variable from regression loop - loops

In Stata, I have applied a regression loop to 1000 metabolites (outcomes), with BMI as the exposure variable. I also have other variables in the model. I would like to know how to extract only the coefficient, p-value, and 95% CI for BMI, if and only if BMI is significant, and then export them to an Excel file.
This is the code I have used. It informed me that there were, for example, 100 significant results. So I'm trying to figure out which 100 are those and extract them for BMI only, without other variables in the model.
local counter = 0
local counter_pos = 0
local counter_neg = 0
foreach outcome of varlist B - Z {
    regress `outcome' bmi Age i.sex i.smoking i.lpa2c i.cholestrol
    matrix M = r(table)
    if M[4, 1] < 0.05 {
        local ++counter
        if _b[bmi] < 0 {
            local ++counter_neg
        }
        else {
            local ++counter_pos
        }
    }
}
display as text "Total of significant results: " as result `counter'

Here is a reproducible example showing how to send a variable name and some results to a new file. In your case, posting is conditional on a conventionally significant result; here it is unconditional.
sysuse auto, clear
local counter = 0
local negative = 0
local positive = 0
tempname RESULTS
postfile `RESULTS' str32 varname coefficient using myresults.dta, replace
foreach v in price mpg rep78 headroom trunk length turn displacement gear_ratio {
    quietly regress `v' weight
    local ++counter
    if _b[weight] < 0 local ++negative
    else local ++positive
    post `RESULTS' ("`v'") (_b[weight])
}
di "variables tried: " `counter'
di "negative relation: " `negative'
di "positive relation: " `positive'
postclose `RESULTS'
use myresults, clear
compress
list

Related

Python-constraint variables that take their name instead of the value

I want to multiply each variable by a coefficient: I can't use more than 5 kg or spend more than 50 euros, so I multiply each product's quantity by its weight and by its price. Instead, the program returns an error, because it takes the name of the variable instead of a value from its range.
from constraint import *
problem = Problem()
problem.addVariable("a",range(0,51))
problem.addVariable("b",range(0,51))
problem.addVariable("c",range(0,11))
problem.addVariable("d",range(0,6))
problem.addConstraint(MaxSumConstraint(5000),['a'*340, 'b'*120,'c'*105,'d'*300])
problem.addConstraint(MaxSumConstraint(50),['a'*2,'b','c'*4,'d'*5])
soluciones = problem.getSolutions()
for solucion in soluciones:
    solucion_string = ""
    for i in range(4):
        solucion_string += "("+str(i)+","+str(solucion[i])+")"
    print(solucion_string)
print(len(soluciones))
I want to use the value of the range of each variable and multiply it.
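One way to read the intent: each variable keeps its own range, and the weights are applied inside the constraint rather than to the variable names. In Python, 'a'*340 repeats the string 'a' 340 times, so it no longer names any variable. The sketch below is plain Python without the python-constraint library. It brute-forces the same four ranges and keeps the combinations whose weighted sums stay within 5000 g and 50 euros. The names RANGES, WEIGHTS, PRICES, weighted_sum and feasible are hypothetical, and the per-item weights and prices are read off the multipliers in the question.

```python
from itertools import product

# Same domains as in the question.
RANGES = {"a": range(0, 51), "b": range(0, 51), "c": range(0, 11), "d": range(0, 6)}
# Assumed meaning of the multipliers in the question: grams and euros per item.
WEIGHTS = {"a": 340, "b": 120, "c": 105, "d": 300}
PRICES = {"a": 2, "b": 1, "c": 4, "d": 5}
MAX_WEIGHT = 5000
MAX_PRICE = 50


def weighted_sum(solution, multipliers):
    """Sum of value * multiplier over all variables in the solution."""
    return sum(solution[name] * multipliers[name] for name in solution)


def feasible():
    """Yield every assignment that satisfies both weighted-sum limits."""
    names = list(RANGES)
    for values in product(*(RANGES[name] for name in names)):
        solution = dict(zip(names, values))
        if (weighted_sum(solution, WEIGHTS) <= MAX_WEIGHT
                and weighted_sum(solution, PRICES) <= MAX_PRICE):
            yield solution


solutions = list(feasible())
print(len(solutions))
for s in solutions[:3]:
    # Solutions are keyed by variable name, not by position.
    print(" ".join(f"({name},{value})" for name, value in s.items()))
```

Within python-constraint itself, MaxSumConstraint appears to accept an optional multipliers argument for this purpose; that is an assumption worth checking against the library documentation before relying on it.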

how to create a loop for a macro in stata?

I have a very large dataset but to cut it short I demonstrated the data with the following example:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(patid death dateofdeath)
1 0 .
2 0 .
3 0 .
4 0 .
5 1 15007
6 0 .
7 0 .
8 1 15526
9 0 .
10 0 .
end
format %d dateofdeath
I am trying to sample for a case-control study based on date of death. At this stage, I need to first create a variable with each date of death repeated for all the participants (hence we end up with a dataset with 20 participants) and a pairid equivalent to the patient id patid of the corresponding case.
I created a macro for one case (which works) but I am finding it difficult to have it repeated for all cases (where death==1) in a loop.
The successful macro is as follows:
local i "5" //patient id who died
gen pairid= `i'
gen matchedindexdate = dateofdeath
replace matchedindexdate=0 if pairid != patid
gsort matchedindexdate
replace matchedindexdate= matchedindexdate[_N]
format matchedindexdate %d
save temp`i'
and the loop I attempted is:
* (min and max patid id)
forval j = 1/10 {
    count if patid == `j' & death==1
    if r(N)=1 {
        gen pairid= `j'
        gen matchedindexdate = dateofdeath
        replace matchedindexdate=0 if pairid != patid
        gsort matchedindexdate
        replace matchedindexdate= matchedindexdate[_N]
        save temp/matched`j'
    }
}
use temp/matched1, clear
forval i=2/10 {
    capture append using temp/matched`i'
    save matched, replace
}
but I get:
invalid syntax
How can I do the loop?
I finally had it solved, please check:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1591811-how-to-create-a-loop-for-a-macro

what is the difference between SAS ARRAY and SAS IF-THEN

I have a table with students' exam scores.
Variables: name, score1, score2, score3, and gender.
Wherever a score is missing, it is set to 999.
I want to transform all 999s to missing (.) values.
I realized there are two main ways to do this, and I would like to know the main difference between them.
Both of the following give the same output:
first:
data try ;
    set mis_999 ;
    if score1 = 999 then score1 = . ;
    if score2 = 999 then score2 = . ;
    if score3 = 999 then score3 = . ;
run ;
second (with array):
data array_try ;
    set mis_999 ;
    array try2{*} score1-score3 ;
    do i=1 to dim(try2) ;
        if try2(i) = 999 then try2(i) = . ;
    end ;
run ;
For that example the main difference is that the code using an array is easier to expand to more variables.
In your first example you have what is referred to as wallpaper code, a lot of code that repeats the same pattern. If you have 500 variables instead of 3 you would need to write 500 statements. But with the array method you would just need to change the list of variables in the array definition. The DO loop would be the same.
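The same contrast can be sketched outside SAS. The Python below is only an analogy, not SAS code: None stands in for the SAS missing value, the sample row is invented for illustration, and the function names recode_repeated and recode_looped are hypothetical. The repeated version needs one statement per variable, while the looped version only needs a longer list of column names.

```python
MISSING_CODE = 999


def recode_repeated(row):
    """'Wallpaper' style: one statement per score variable."""
    out = dict(row)
    if out["score1"] == MISSING_CODE:
        out["score1"] = None
    if out["score2"] == MISSING_CODE:
        out["score2"] = None
    if out["score3"] == MISSING_CODE:
        out["score3"] = None
    return out


def recode_looped(row, columns):
    """Array style: the loop stays the same however many columns are listed."""
    out = dict(row)
    for column in columns:
        if out[column] == MISSING_CODE:
            out[column] = None
    return out


# Invented example row, for illustration only.
row = {"name": "Ann", "score1": 90, "score2": 999, "score3": 75, "gender": "F"}
score_columns = ["score1", "score2", "score3"]

# Both approaches produce the same recoded row.
print(recode_repeated(row) == recode_looped(row, score_columns))
```

Going from 3 to 500 variables would mean 497 more statements in the first function, but only a longer score_columns list for the second.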

matlab complex for-loop correlation calcul

This is the script that I have. It works up to the ------ separator. Below it, I do not get any error from Matlab, but I do not get a value for bestDx or bestDy either. Please help. (The first part is given only for context.)
%%
% Variables after running script Read_eA3_file.m
%date_time_UTC
%reflectivity
%clutter_mask
%Convert units
dBZ = reflectivity * 0.375 - 30;
dBZ_Mask = clutter_mask * 0.375 - 30;
%Replace clutter values with NaN
weather = NaN(size(dBZ)); %initialise to constant
weather(dBZ>=dBZ_Mask) = dBZ(dBZ>=dBZ_Mask); %copy values when A >= B
%Reduce to range -- those are 384x384 arrays
dBZ_range = dBZ(:,:,1:16); %16:18 to 16:23 included
weather_range = weather(:,:,1:16); %16:18 to 16:23 included
weather1618 = weather(:,:,1); %16:18 map only
weather1623 = weather(:,:,16); %16:23 map only
% Plot maps
image(imrotate(-weather1618,90)); %of 16:18
image(imrotate(-weather1623,90)); %of 16:23
%Find x,y of strongest dBZ
%Since the values are all negative, I look for their minimum
[M,I] = min(weather1618(:)); %for 16:18
[I_row, I_col] = ind2sub(size(weather1618),I); %values are 255 and 143
[M2,I2] = min(weather1623(:)); %for 16:23
[I2_row, I2_col] = ind2sub(size(weather1623),I2); %values are 223 and 7
%Calc displacement
%I get a value of 139.7140
max_displ=sqrt((I2_row-I_row)^2+(I2_col-I_col)^2); %between 1618 and 1623
%%
% -----Section below does not work; ONLY RUN the section ABOVE---------
%% Find Dx Dy for max_corr between two maps
maxCoeff=0;
weather1618Modified = zeros(384,384); %create weather array for time range
%weather1618Modified(:) = {NaN}; % Matlab cannot mix cell & double
%%
for x = 1:384
    for y = 1:384
        %30 pixel appx.
        for Dx = -max_displ:30: max_displ
            for Dy = -max_displ:30: max_displ
                %Limit range of x+Dx and y+Dy to 1:384
                if x+Dx<1 | y+Dy<1 | x+Dx>384 | y+Dy>384
                    continue
                    %weather1618Modified is the forecasted weather1823
                    weather1618Modified(x+Dx,y+Dy) = weather1618(x,y)
                    %Find the best correlation; Is corrcoef the right formula?
                    newCoeff=corrcoef(weather1623,weather1618Modified);
                    if newCoeff>maxCoeff
                        maxCoeff=newCoeff;
                        bestDx=Dx;
                        bestDy=Dy;
                    end
                end
            end
        end
    end
end
%% Calc displacement
bestDispl = sqrt(bestDx^2+bestDy^2); %bestDispl for a 5 min frame
%Calc speed
speed = bestDispl/time;
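You have to delete the continue statement after the first if, or place it somewhere else, for example by closing that if with its own end directly after continue.
The continue statement makes the program skip the remaining part of the for-loop body and go directly to the next iteration. As written, the rest of the loop body also sits inside that if block, because the if is not closed until the bottom of the loop. Therefore bestDx and bestDy will never be set.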
Documentation: https://se.mathworks.com/help/matlab/ref/continue.html
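The effect is easy to reproduce in a few lines. The sketch below uses Python rather than MATLAB, because continue behaves the same way in both; the scores dictionary and the function names best_shift_buggy and best_shift_fixed are invented for illustration and are not part of the original script. In the first function the update sits after continue inside the same if block, so it never runs; in the second, the if only guards the continue.

```python
def best_shift_buggy(scores):
    """Update is nested after continue in the same block, so it is never reached."""
    best = None
    max_score = 0
    for shift, score in scores.items():
        if shift < 0:
            continue
            # Unreachable: continue has already jumped to the next iteration.
            if score > max_score:
                max_score, best = score, shift
    return best


def best_shift_fixed(scores):
    """The if block only skips invalid shifts; the update follows it."""
    best = None
    max_score = 0
    for shift, score in scores.items():
        if shift < 0:
            continue  # skip out-of-range shifts only
        if score > max_score:
            max_score, best = score, shift
    return best


# Invented scores keyed by shift, for illustration only.
scores = {-1: 0.9, 0: 0.2, 1: 0.7, 2: 0.4}
print(best_shift_buggy(scores))  # None: best is never assigned
print(best_shift_fixed(scores))  # 1: the best valid shift
```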

SPSS: using IF function with REPEAT when each case has multiple linked instances

I have a dataset as such:
Case #|DateA |Drug.1|Drug.2|Drug.3|DateB.1 |DateB.2 |DateB.3 |IV.1|IV.2|IV.3
------|------|------|------|------|--------|---------|--------|----|----|----
1 |DateA1| X | Y | X |DateB1.1|DateB1.2 |DateB1.3| 1 | 0 | 1
2 |DateA2| X | Y | X |DateB2.1|DateB2.2 |DateB2.3| 1 | 0 | 1
3 |DateA3| Y | Z | X |DateB3.1|DateB3.2 |DateB3.3| 0 | 0 | 1
4 |DateA4| Z | Z | Z |DateB4.1|DateB4.2 |DateB4.3| 0 | 0 | 0
For each case, there are linked variables i.e. Drug.1 is linked with DateB.1 and IV.1 (Indicator Variable.1); Drug.2 is linked with DateB.2 and IV.2, etc.
The variable IV.1 only = 1 if Drug.1 is the case that I want to analyze (in this example, I want to analyze each receipt of Drug "X"), and so on for the other IV variables. Otherwise, IV = 0 if the drug for that scenario is not "X".
I want to calculate the difference between DateA and DateB for each instance where Drug "X" is received.
e.g. In the example above I want to calculate a new variable:
DateDiffA1_B1.1 = DateA1 - DateB1.1
DateDiffA1_B2.1 = DateA1 - DateB2.1
DateDiffA1_B1.3 = DateA1 - DateB1.3
DateDiffA1_B2.3 = DateA1 - DateB2.3
DateDiffA1_B3.3 = DateA1 - DateB3.3
I'm not sure if this new variable would need to be linked to each instance of Drug "X" as for the other variables, or if it could be a single variable that COUNTS all the instances for each case.
The end goal is to COUNT how many times each case had a date difference of <= 2 weeks when they received Drug "X". If they did not receive Drug "X", I do not want to COUNT the date difference.
I will eventually want to compare those who did receive Drug "X" with a date difference <= 2 weeks to those who did not, so having another indicator variable to help separate out these specific patients would be beneficial.
I am unsure about the best way to go about this; I suspect it will require a combination of IF and REPEAT functions using the IV variable, but I am relatively new with SPSS and syntax and am not sure how this should be coded to avoid errors.
Thanks for your help!
EDIT: It seems like I may need to use IV as a vector variable to loop through the linked variables in each case. I've tried the syntax below to no avail:
DATASET ACTIVATE DataSet1.
vector IV = IV.1 to IV.3.
loop #i = .1 to .3.
do repeat DateB = DateB.1 to DateB.3
/ DrugDateDiff = DateDiff.1 to DateDiff.3.
if IV(#i) = 1
/ DrugDateDiff = datediff(DateA, DateB, "days").
end repeat.
end loop.
execute.
Actually, there is no need for the vector and the loop. Everything you need can be done within one DO REPEAT:
compute N2W=0.
do repeat DateB = DateB.1 to DateB.3 /IV=IV.1 to IV.3 .
if IV=1 and datediff(DateA, DateB, "days")<=14 N2W = N2W + 1.
end repeat.
execute.
This syntax will first put a zero in the count variable N2W. Then it will loop through all the dates, and only if the matching IV is 1, the syntax will compare them to dateA, and add 1 to the count if the difference is <=2 weeks.
If you prefer to keep the count variable as missing when none of the IV variables is 1, then instead of compute N2W=0. start the syntax with:
If any(1, IV.1 to IV.3) N2W=0.
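For readers who want to check the counting logic outside SPSS, here is a minimal Python sketch of the same rule, including the variant that leaves the count missing. It is an illustration under assumptions, not SPSS syntax: the function name count_within_two_weeks and the example dates are invented, None stands in for a missing value, and all dates are assumed to be present.

```python
from datetime import date


def count_within_two_weeks(date_a, dates_b, indicators, max_days=14):
    """Count linked instances where IV is 1 and DateA - DateB <= max_days.

    Returns None (missing) when no indicator equals 1.
    """
    if 1 not in indicators:
        return None
    count = 0
    # Walk the linked DateB / IV pairs together, as DO REPEAT does.
    for date_b, iv in zip(dates_b, indicators):
        if iv == 1 and (date_a - date_b).days <= max_days:
            count += 1
    return count


# Invented dates for one case, for illustration only.
date_a = date(2020, 1, 20)
dates_b = [date(2020, 1, 10), date(2020, 1, 1), date(2019, 12, 1)]

print(count_within_two_weeks(date_a, dates_b, [1, 0, 1]))  # 1
print(count_within_two_weeks(date_a, dates_b, [0, 0, 0]))  # None
```

As in the SPSS syntax, a DateB later than DateA gives a negative difference and is therefore also counted; add a lower bound to the condition if that is not wanted.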

Resources