Binomial logistic regression: Interpreting multiple categorical interaction results (Reference group) - logistic-regression

I have a question about interpreting the results of binomial logistic regression.
I added interaction terms between two categorical variables. Each variable had 3 categories so the interaction means 9 categories to examine.
Let's say:
Variable 1 categories: A, B, C
Variable 2 categories: X, Y, Z
The results are as below:
Reference group: A & X
B & Y: Odds-ratio = 4.2 (p < .05)
B & Z: Odds-ratio = 4.4 (p < .05)
C & Y: Odds-ratio = 5.9 (p < .05)
C & Z: Odds-ratio = 5.2 (p < .05)
In this case, is it correct to say that the outcome is 4.2 times more likely for B & Y than for B & X?
Or is it that the outcome is 4.2 times more likely for B & Y than for A & X?
Thanks a lot in advance!
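One way to check which comparison a coefficient makes is to compute it from cell odds by hand. The sketch below assumes the model also contains the main effects of both variables with dummy (treatment) coding, which the question does not state. Under that assumption an interaction term such as B & Y is a ratio of two odds ratios rather than a direct comparison with B & X or with A & X. The cell odds and the function name are hypothetical, chosen only so that the three quantities differ.

```python
def interaction_odds_ratio(odds, row, col, ref_row="A", ref_col="X"):
    """Ratio of odds ratios estimated by a dummy-coded interaction term.

    Assumes main effects for both variables are in the model.
    """
    or_within_row = odds[(row, col)] / odds[(row, ref_col)]          # e.g. B&Y vs B&X
    or_within_ref = odds[(ref_row, col)] / odds[(ref_row, ref_col)]  # e.g. A&Y vs A&X
    return or_within_row / or_within_ref

# Hypothetical cell odds, for illustration only (not from the question)
odds = {("A", "X"): 0.10, ("A", "Y"): 0.20,
        ("B", "X"): 0.30, ("B", "Y"): 2.52}

print(interaction_odds_ratio(odds, "B", "Y"))  # about 4.2: the interaction term
print(odds[("B", "Y")] / odds[("B", "X")])     # about 8.4: B&Y compared with B&X
print(odds[("B", "Y")] / odds[("A", "X")])     # about 25.2: B&Y compared with A&X
```

With these made-up odds, the value 4.2 matches neither direct comparison, which is why the coding of the model matters before reading the number as "times more likely than" any single cell.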

Related

recursive brute force implementation in 2d array

I am working on a problem and have gotten stuck, so I decided to ask here. The problem: given n teams in a World Cup group and their respective points, determine whether that set of points is possible. Each team plays every other team in the group once, so each team plays n-1 matches, with 1 <= n <= 5. A win earns 3 points, a loss 0 points, and a tie 1 point. My idea is to use a 2D (n x n) array that acts like a scoreboard.
    A  B  C  D  E    // columns
A   X  1  3  0  1    // rows
B   1  X  0  1  0
C   0  3  X  0  3
D   3  1  3  X  1
E   1  3  0  1  X
Every column and every row represents one distinct team, in multiplication-table fashion (the team in column 1 (A) is the same as the team in row 1 (A), and so on). Note that the letters above and beside the array (A, B, ...) are not part of it; they are there only for clarity. Every intersection of a row and a column represents a match, except where the row and the column are the same team. For example, column 1, row 2 means team A tied against team B, and column 2, row 1 means team B tied against team A.
My idea is to use a recursive brute-force algorithm to check every possibility. I have developed one; it works well enough with 4 teams, but not so well with 5. The algorithm starts from column 2, row 1, tries one of the 3 possibilities, then crawls to the cell below and the cell to the right, and repeats up to the second-to-last column and the last row.
You may have noticed that the X diagonal acts like a mirror: when we change column 1, row 3 (A against C) to a win, we must change column 3, row 1 (C against A) to a loss at the same time. Here is part of my code:
/*
 * scoreBoard[][] array <- the array which I have described above
 * scores[] array <- stores the given scores
 * x <- current column
 * y <- current row
 * n <- number of teams
 */
bool Solve(int x, int y, int scoreBoard[][5], int scores[], int n)
{
    bool con1, con2, con3;
    if ((x < y) && (y < n)) {
        scoreBoard[x][y] = 3; // win-lose - possibility 1
        scoreBoard[y][x] = 0;
        // crawl to the right and bottom side of the array
        con1 = (Solve(x + 1, y, scoreBoard, scores, n)) || (Solve(x, y + 1, scoreBoard, scores, n));
        scoreBoard[x][y] = 0; // lose-win - possibility 2
        scoreBoard[y][x] = 3;
        con2 = (Solve(x + 1, y, scoreBoard, scores, n)) || (Solve(x, y + 1, scoreBoard, scores, n));
        scoreBoard[x][y] = 1; // tied - possibility 3
        scoreBoard[y][x] = 1;
        // crawl to the right and bottom side of the array
        con3 = (Solve(x + 1, y, scoreBoard, scores, n)) || (Solve(x, y + 1, scoreBoard, scores, n));
        return con1 || con2 || con3;
    } else {
        if ((x == y) && (y == n - 1))
            return CheckArr(scoreBoard, scores, n); // check whether the current array matches the given scores
        else
            return false;
    }
}
I presume the problem is that this algorithm does not cover every possibility, because it gives the expected output for some 5-team inputs and not for others, but I haven't figured out how to fix it.
Thanks in advance for every suggestion and helpful link; I also welcome any other strategy. I hope this is clear enough.
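One way to be sure that every match gets assigned exactly once is to walk a flat list of team pairs instead of crawling right and down through the grid. The sketch below is in Python rather than C++ and is not the asker's algorithm: `is_possible` is a hypothetical name, and subtracting points from a running `remaining` list (with pruning when a team would go negative) is a design choice of this sketch. For n = 5 there are 10 matches, so at most 3^10 = 59049 assignments.

```python
from itertools import combinations

# (points for first team, points for second team) for win, loss, tie
OUTCOMES = ((3, 0), (0, 3), (1, 1))

def is_possible(points):
    """Return True if some set of match results produces exactly these points."""
    n = len(points)
    matches = list(combinations(range(n), 2))  # every pair plays once
    remaining = list(points)                   # points still to be explained

    def solve(k):
        if k == len(matches):
            return all(r == 0 for r in remaining)
        i, j = matches[k]
        for pi, pj in OUTCOMES:
            if remaining[i] >= pi and remaining[j] >= pj:
                remaining[i] -= pi
                remaining[j] -= pj
                found = solve(k + 1)
                remaining[i] += pi             # undo before trying the next outcome
                remaining[j] += pj
                if found:
                    return True
        return False

    return solve(0)

print(is_possible([12, 9, 6, 3, 0]))  # each team beats every team listed after it
print(is_possible([4, 4, 4]))         # 12 points from 3 matches cannot happen
```

Because the recursion index `k` only moves forward through `matches`, no cell of the scoreboard can be skipped or visited twice.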

Fisher test for a given large set of 'p' values using MATLAB?

The following problem is an extension of this problem.
I have written the following code:
load y; P = y; k = length(P);
% the following matrix is used to sum each 'n' elements in a row:
n = 2; % for the sum of n elements in a row
summer = diag(ones(k,1));
for d = 1:n-1
    summer = summer + diag(ones(k-d,1),-d);
end
X = -2.*log(P(:).')*summer;
The X I get is NaN for every element when I use the given P array (of size 200x1), but when I test this with, say, 10 values of P it works fine and gives no NaN values. Can anyone explain why I get NaN for the large dataset while it works for a small one?
There are probably -Inf, Inf, or NaN values in your P vector to begin with. Given the arithmetic operations involved, this seems to be the only possible source of the NaN values in X.
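The point about non-finite inputs can be checked without MATLAB. The sketch below is a Python rendering, not the asker's code: `bad_p_indices` and `windowed_fisher` are hypothetical names, and the window sum mirrors what the `summer` matrix computes (each entry plus the next n-1 entries, with shorter sums at the tail). It also flags zeros and negative values, on the assumption that a p-value of 0 is a likely culprit, since log(0) is -Inf and Inf multiplied by one of the zeros in `summer` is NaN.

```python
import math

def bad_p_indices(p_values):
    """Indices whose value would poison -2*log(p): NaN, +/-Inf, zero or negative."""
    return [i for i, p in enumerate(p_values)
            if math.isnan(p) or math.isinf(p) or p <= 0.0]

def windowed_fisher(p_values, n=2):
    """-2 * sum(log(p)) over each entry and the next n-1 entries."""
    terms = [-2.0 * math.log(p) for p in p_values]
    return [sum(terms[start:start + n]) for start in range(len(terms))]

p = [0.5, 0.0, float("nan"), 0.2]
print(bad_p_indices(p))                  # positions to inspect before taking logs
print(math.isnan(math.inf * 0.0))        # Inf * 0 is NaN under IEEE 754 arithmetic
print(windowed_fisher([0.5, 0.5, 0.5]))  # clean input gives finite statistics
```

Running the check on the full 200-element vector before calling log would show whether any entries need to be removed or replaced.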

Using an "or" operator between variables for a loop in Stata

I have a set of variables that are string variables. For each value in the string, I create a series of binary (0, 1) variables.
Let's say my variables are Engine1 Engine2 Engine3.
The possible values are BHM, BMN, HLC, or missing (coded as ".").
The values of the variables are mutually exclusive, except missing.
In a hypothetical example, to write the new variables, I would write the following code:
egen BHM=1 if Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
replace BHM=0 if BHM==.
gen BMN=1 if Engine1=="BMN"|Engine2=="BMN"|Engine3=="BMN"
replace BMN=0 if BMN==.
gen HLC=1 if Engine1=="HLC"|Engine2=="HLC"|Engine3=="HLC"
replace HLC=0 if HLC==.
How could I rewrite this code in a loop? I don't understand how to use the "or" operator | in a loop.
First note that egen is a typo for gen in your first line.
Second, note that
gen BHM=1 if Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
replace BHM=0 if BHM==.
can be rewritten in one line:
gen BHM = Engine1=="BHM"|Engine2=="BHM"|Engine3=="BHM"
Now learn about the handy inlist() function:
gen BHM = inlist("BHM", Engine1, Engine2, Engine3)
If that looks odd, it's because your mathematics education led you to write things like
if x = 1 or y = 1 or z = 1
but only convention stops you writing
if 1 = x or 1 = y or 1 = z
The final trick is to write a loop:
foreach v in BHM BMN HLC {
gen `v' = inlist("`v'", Engine1, Engine2, Engine3)
}
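For readers more at home outside Stata, the same idea can be sketched in Python. This is an analogy only, with a hypothetical function name and made-up rows: looping over the codes and testing whether each code appears among the three engine values is what the foreach loop with inlist() does.

```python
def engine_indicators(rows, codes=("BHM", "BMN", "HLC")):
    """Build one 0/1 list per code from rows of (Engine1, Engine2, Engine3)."""
    return {code: [int(code in row) for row in rows] for code in codes}

# Made-up data; "." stands for missing, as in the question
rows = [("BHM", ".", "."), (".", "HLC", "BMN"), (".", ".", ".")]
print(engine_indicators(rows))
```

Writing `code in row` puts the constant on the left and the variables on the right, just as inlist("BHM", Engine1, Engine2, Engine3) does.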
It's not clear what you are finding difficult about |. Your code was fine in that respect.
A bug often seen in learner code looks like
gen y = 1 if x == 11|12|13
which is legal Stata but almost never what you want. Stata parses it as
gen y = 1 if (x == 11)|12|13
and uses its rule that non-zero arguments mean true in true-or-false evaluations. Thus y is 1 if
x == 11
or
12 // a non-zero argument, evaluates as true regardless of x
or
13 // same comment
The learner needs
gen y = 1 if (x == 11)|(x == 12)|(x == 13)
where the parentheses can be omitted. That's repetitive, so
gen y = 1 if inlist(x, 11, 12, 13)
can be used instead.
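The same precedence trap exists in other languages. As a hedged Python analogy (not Stata; the function names are made up), `x == 11 or 12 or 13` is parsed as `(x == 11) or 12 or 13`, and the nonzero literals count as true:

```python
def wrong(x):
    # Parsed as (x == 11) or 12 or 13, so it is truthy for every x
    return bool(x == 11 or 12 or 13)

def right(x):
    # Membership test, the counterpart of inlist(x, 11, 12, 13)
    return x in (11, 12, 13)

print(wrong(99), right(99))
print(wrong(12), right(12))
```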
For more on inlist() see the articles here, here (Section 2.2), and here.
For more on true and false in Stata, see this FAQ

Optimize parameters of a pairwise distance function in Matlab

This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have a 1000 by 10000 matrix that contains the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1 - (1/(2p)) * sum(a/c + b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition and p = 10000. c and d are probabilities such that c + d = 1.
An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j; then I look at rows i and j.
If the kth entries of rows i and j have values 2 and 2, then a=2, b=0, c=1, d=0 (I guess I will assign 0/0=0 in this case).
If the kth entries of rows i and j have values 2 and 1 or vice versa, then a=1, b=0, c=3/4, d=1/4.
Similar assignments apply to the remaining cases: 2,0 (a=0, b=0, c=1/2, d=1/2); 1,1 (a=1, b=1, c=1/2, d=1/2); 1,0 (a=0, b=1, c=1/4, d=3/4); 0,0 (a=0, b=2, c=0, d=1).
The MATLAB code I have so far uses for loops over i and j, finds the cases above using find, and then creates two arrays for a/c and b/d. This is extremely slow. Is there a way to improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, then I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
% m is the data
[n, p] = size(m);
distance = zeros(n);
for ii=1:n
for jj=ii+1:n
a = min(m(ii,:), m(jj,:));
b = 2 - max(m(ii,:), m(jj,:));
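c = 4 ./ (m(ii,:) + m(jj,:)); % 1/c for the question's c = (m(ii,:) + m(jj,:))/4
c(c == Inf) = 0;
d = 4 ./ (4 - m(ii,:) - m(jj,:)); % 1/d for the question's d = 1 - c, so that b.*d equals b/d
d(d == Inf) = 0;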
distance(ii,jj) = sum(a.*c + b.*d);
% distance(jj,ii) = distance(ii,jj); % optional for the full matrix
end
end
distance = 1 - (1 / (2 * p)) * distance;
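To check a vectorised formula against the case table in the question, a direct lookup is a useful baseline. The sketch below is in Python rather than MATLAB, uses hypothetical names, and simply encodes the six (a, b, c, d) cases as listed, with division by zero taken as 0 as the question suggests. It is slow by design and is meant for verifying small inputs, not for the full 1000 by 10000 matrix.

```python
# (a, b, c, d) for each unordered pair of entries, copied from the question
CASES = {
    (2, 2): (2, 0, 1.0, 0.0),
    (1, 2): (1, 0, 0.75, 0.25),
    (0, 2): (0, 0, 0.5, 0.5),
    (1, 1): (1, 1, 0.5, 0.5),
    (0, 1): (0, 1, 0.25, 0.75),
    (0, 0): (0, 2, 0.0, 1.0),
}

def ratio(num, den):
    """num/den, with division by zero treated as 0 (the question's convention)."""
    return num / den if den != 0 else 0.0

def pair_distance(row_i, row_j):
    """d = 1 - (1/(2p)) * sum(a/c + b/d) for two rows of equal length p."""
    total = 0.0
    for x, y in zip(row_i, row_j):
        a, b, c, d = CASES[(min(x, y), max(x, y))]
        total += ratio(a, c) + ratio(b, d)
    return 1.0 - total / (2.0 * len(row_i))

print(pair_distance([2, 0], [2, 2]))
print(pair_distance([1, 0, 2], [0, 0, 1]))
```

Comparing a handful of rows against the loop version is a quick way to catch a wrong coefficient before running the full computation.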

inconsistent results using isreal

Take this simple example:
a = [1 2i];
x = zeros(1,length(a));
for n=1:length(a)
x(n) = isreal(a(n));
end
In an attempt to vectorize the code, I tried:
y = arrayfun(@isreal,a);
But the results are not the same:
x =
1 0
y =
0 0
What am I doing wrong?
This certainly appears to be a bug, but here's a workaround:
>> y = arrayfun(@(x) isreal(x(1)),a)
ans =
1 0
Why does this work? I'm not totally sure, but it appears that when you perform an indexing operation on the variable before calling ISREAL it removes the "complex" attribute from the array element if the imaginary component is zero. Try this in the Command Window:
>> a = [1 2i]; %# A complex array
>> b = a(1); %# Indexing element 1 removes the complex attribute...
>> c = complex(a(1)); %# ...but we can put that attribute back
>> whos
Name Size Bytes Class Attributes
a 1x2 32 double complex
b 1x1 8 double %# Not complex
c 1x1 16 double complex %# Still complex
Apparently, ARRAYFUN must internally maintain the "complex" attribute of the array elements it passes to ISREAL, thus treating them all as being complex numbers even if the imaginary component is zero.
It might help to know that MATLAB stores the real/complex parts of a matrix separately. Try the following:
>> format debug
>> a = [1 2i];
>> disp(a)
Structure address = 17bbc5b0
m = 1
n = 2
pr = 1c6f18a0
pi = 1c6f0420
1.0000 0 + 2.0000i
where pr is a pointer to the memory block containing the real part of all values, and pi is a pointer to the imaginary part of all values in the matrix. Since all elements are stored together, in this case they all have an imaginary part.
Now compare these two approaches:
>> arrayfun(@(x)disp(x),a)
Structure address = 17bbcff8
m = 1
n = 1
pr = 1bb8a8d0
pi = 1bb874d0
1
Structure address = 17c19aa8
m = 1
n = 1
pr = 1c17b5d0
pi = 1c176470
0 + 2.0000i
versus
>> for n=1:2, disp(a(n)), end
Structure address = 17bbc930
m = 1
n = 1
pr = 1bb874d0
pi = 0
1
Structure address = 17bbd180
m = 1
n = 1
pr = 1bb874d0
pi = 1bb88310
0 + 2.0000i
So it seems that when you access a(1) in the for loop, the value returned (in the ans variable) has a zero imaginary part (null pi), and thus is considered real.
On the other hand, ARRAYFUN seems to access the values of the matrix directly (without returning them in the ANS variable), so it has access to both the pr and pi pointers, which are not null; thus all elements are considered non-real.
Please keep in mind this is just my interpretation, and I could be mistaken...
Answering really late on this one... The MATLAB function ISREAL operates in a really rather counter-intuitive way for many purposes. It tells you if a given array taken as a whole has no complex part at all - it tells you about the storage, it doesn't really tell you anything about the values in the array. It's a bit like the ISSPARSE function in that regard. So, for example
isreal(complex(1)) % returns FALSE
What you'll find in MATLAB is that certain operations automatically trim any all-zero imaginary parts. So, for example
x = complex(1);
isreal(x); % FALSE, we just forced there to be an imaginary part
isreal(x(1)); % TRUE - indexing realised it could drop the zero imaginary part
isreal(x(:)); % FALSE - "(:)" indexing is just a reshape, not real indexing
In short, MATLAB really needs a function which answers the question "does this value have zero imaginary part", in an elementwise way on an array.
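As an illustration of the distinction between storage and value, here is a small Python sketch (Python, not MATLAB; the function names are hypothetical). `stored_as_complex` is a storage-level check, roughly the negation of what ISREAL reports above, while `has_zero_imag` answers the elementwise question about values.

```python
def stored_as_complex(value):
    """Storage-level check: is the value held in a complex type at all?"""
    return isinstance(value, complex)

def has_zero_imag(values):
    """Value-level check, element by element: is the imaginary part zero?"""
    return [complex(v).imag == 0 for v in values]

a = [1, 2j]
print(has_zero_imag(a))               # the elementwise answer the question wanted
print(stored_as_complex(complex(1)))  # complex storage despite a zero imaginary part
```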

Resources