I have the following Bayesian network:
I was asked to find:
The value of P(b)
The solution:
P(b) = Σ_{A ∈ {a, ¬a}} P(A) P(b|A)
     = 0.1 × 0.5 + 0.9 × 0.8 = 0.77
and the value of P(d|a)
The solution:
P(d|a) = Σ_{B ∈ {b, ¬b}} P(d|B) P(B|a)
       = 0.9 × 0.5 + 0.2 × 0.5 = 0.55
How did they come up with the above formulas?
What rule did they use to find a marginal probability from the Bayesian network graph?
I understand the basic joint probability distribution formula, which is just the product of each variable's probability given its parents.
Some explanation and resources relating to this would be helpful.
Thank you.
I guess I found my answer.
It uses marginalization (the law of total probability, here conditioned on Y).
The formula is:
P(X|Y) = Σ_{Z ∈ {all possible values of Z}} P(X|Y, Z) P(Z|Y)
Now you can easily find the above two probabilities.
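As a quick sanity check, here is the same marginalization written out in a few lines of Julia. The CPT values are read off from the arithmetic in the worked solution above (the network figure itself isn't reproduced here), and the variable names are my own.

# Marginalize over the parent, using the values from the worked solution above.
P_a, P_not_a     = 0.1, 0.9      # P(a), P(¬a)
P_b_a, P_b_not_a = 0.5, 0.8      # P(b|a), P(b|¬a)
P_d_b, P_d_not_b = 0.9, 0.2      # P(d|b), P(d|¬b)

P_b   = P_a * P_b_a + P_not_a * P_b_not_a         # 0.77
P_d_a = P_d_b * P_b_a + P_d_not_b * (1 - P_b_a)   # 0.55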
Suppose I have two arrays representing a probabilistic graph:
2
/ \
1 -> 4 -> 5 -> 6 -> 7
\ /
3
Where the probability of going to state 2 is 0.81 and the probability of going to state 3 is (1-0.81) = 0.19. My arrays represent the estimated values of the states as well as the rewards. (Note: Each index of the array represents its respective state)
V = [0, 3, 8, 2, 1, 2, 0]
R = [0, 0, 0, 4, 1, 1, 1]
The context doesn't matter so much; it's just to give an idea of where I'm coming from. I need to write a k-step look-ahead function where I sum the discounted rewards and add them to the estimated value of the k-th state.
I have been able to do this so far by creating separate functions for each step of look-ahead. My goal in asking this question is to figure out how to refactor this code so that I don't repeat myself and use idiomatic Julia.
Here is an example of what I am talking about:
function E₁(R::Array{Float64,1}, V::Array{Float64,1}, P::Float64)
    V[1] + 0.81*(R[1] + V[2]) + 0.19*(R[2] + V[3])
end

function E₂(R::Array{Float64,1}, V::Array{Float64,1}, P::Float64)
    V[1] + 0.81*(R[1] + R[3]) + 0.19*(R[2] + R[4]) + V[4]
end

function E₃(R::Array{Float64,1}, V::Array{Float64,1}, P::Float64)
    V[1] + 0.81*(R[1] + R[3]) + 0.19*(R[2] + R[4]) + R[5] + V[5]
end
.
.
.
So on and so forth. It seems that if I were to ignore E₁() this would be exceptionally easy to refactor, but because I have to discount the value estimate at two different states, I'm having trouble thinking of a way to generalize this for k steps.
Obviously I could write a single function that takes an integer argument and then uses a bunch of if-statements, but that doesn't seem in the spirit of Julia. Any ideas on how I could refactor this? A closure of some sort? A different data type to store R and V?
It seems like you essentially have a discrete Markov chain. So the standard way would be to store the graph as its transition matrix:
T = zeros(7,7)
T[1,2] = 0.81
T[1,3] = 0.19
T[2,4] = 1
T[3,4] = 1
T[4,5] = 1
T[5,6] = 1
T[6,7] = 1
Then you can calculate the probability of ending up in each state, given an initial distribution, by multiplying by T' from the left (the transpose is needed because T is defined here with T[i,j] as the probability of going from state i to state j, while we act on a column vector of state probabilities):
julia> T' * [1,0,0,0,0,0,0] # starting from (1)
7-element Array{Float64,1}:
0.0
0.81
0.19
0.0
0.0
0.0
0.0
Likewise, the probability of ending up at each state after k steps can be calculated by using powers of T':
julia> T' * T' * [1,0,0,0,0,0,0]
7-element Array{Float64,1}:
0.0
0.0
0.0
1.0
0.0
0.0
0.0
Now that you have all the probabilities after k steps, you can easily calculate expectations as well. Maybe it also pays off to define T as a sparse matrix.
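Building on that, here is a sketch of a generalized k-step look-ahead using the transition matrix. It assumes rewards are indexed by the state you land in and adds a (hypothetical) discount factor γ, which differs slightly from the indexing in E₁, E₂, ... above, so treat it as a template rather than a drop-in replacement.

using LinearAlgebra

# Propagate the state distribution with T', accumulate expected rewards along the
# way, and finish with the expected value estimate of the step-k state.
function lookahead(R::AbstractVector, V::AbstractVector, T::AbstractMatrix, k::Integer; γ = 1.0)
    p = zeros(length(R)); p[1] = 1.0    # start in state 1 with certainty
    total = V[1]                        # mirrors the leading V[1] term in E₁, E₂, ...
    for step in 1:k
        p = T' * p                      # distribution over states after `step` steps
        total += γ^step * dot(R, p)     # expected (discounted) reward collected at this step
    end
    total + γ^k * dot(V, p)             # plus the expected value of the k-th state
end

lookahead(R, V, T, 2)                   # e.g. a 2-step look-ahead from state 1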
I'm running a simulation study on the effects of adding fractional numbers of successes and failures, which I'll call C, to mixed-effects logistic regressions. I've simulated 2000 datasets and modeled each with 5 logistic regressions (adding a C of either 1, .5, .25, .1 or .05). The models converge on the majority of the datasets, but ~200 fail to converge when I add a C of .25 and ~50 fail to converge when I add a C of .5 (sometimes I get a warning message and sometimes I get implausible standard errors). I very rarely see any evidence of non-convergence with the other values (I've looked at warning messages, standard errors and the ratio of the highest to lowest eigenvalues in the random-effects matrix). Even in the datasets that fail to converge when C = .25, slightly changing C often solves the problem, as in this example (datasets available here: https://www.dropbox.com/sh/ro92mtjkpqwlnws/AADSVzcNvl0nnnzCEF5QGM6qa?oref=e&n=19939135):
m7 <- glmer(cbind(Data + .25, (10+.5- (Data + .25))) ~ Group*Condition + (1 + Condition |ID), family="binomial", data=df2)
Warning messages:
1: In eval(expr, envir, enclos) : non-integer counts in a binomial glm!
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
summary(m7)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(Data + 0.25, (10 + 0.5 - (Data + 0.25))) ~ Group * Condition + (1 + Condition | ID)
Data: df2
AIC BIC logLik deviance df.resid
7001.1 7040.0 -3493.6 6987.1 1913
Scaled residuals:
Min 1Q Median 3Q Max
-3.5444 -0.6387 0.0143 0.6945 2.9802
Random effects:
Groups Name Variance Std.Dev. Corr
ID (Intercept) 0.26598 0.5157
Condition 0.06413 0.2532 0.66
Number of obs: 1920, groups: ID, 120
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.760461 0.001226 1436.5 <2e-16 ***
Group -1.816952 0.001225 -1483.0 <2e-16 ***
Condition -0.383383 0.001226 -312.7 <2e-16 ***
Group:Condition -0.567517 0.001225 -463.2 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Group Condtn
Group 0.000
Condition 0.000 0.000
Group:Cndtn 0.000 0.000 0.000
m8 <- glmer(cbind(Data + .2, (10+.4- (Data + .2))) ~ Group*Condition + (1 + Condition |ID), family="binomial", data=df2)
Warning message:
In eval(expr, envir, enclos) : non-integer counts in a binomial glm!
summary(m8)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(Data + 0.2, (10 + 0.4 - (Data + 0.2))) ~ Group * Condition + (1 + Condition | ID)
Data: df2
AIC BIC logLik deviance df.resid
6929.3 6968.2 -3457.6 6915.3 1913
Scaled residuals:
Min 1Q Median 3Q Max
-3.5724 -0.6329 0.0158 0.6945 2.9976
Random effects:
Groups Name Variance Std.Dev. Corr
ID (Intercept) 0.2698 0.5194
Condition 0.0652 0.2553 0.66
Number of obs: 1920, groups: ID, 120
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.76065 0.07850 22.429 < 2e-16 ***
Group -1.81762 0.10734 -16.933 < 2e-16 ***
Condition -0.38111 0.06377 -5.977 2.28e-09 ***
Group:Condition -0.57033 0.08523 -6.692 2.21e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) Group Condtn
Group -0.732
Condition -0.033 0.025
Group:Cndtn 0.029 0.045 -0.758
As this is a simulation study, I'm not especially interested in making those models converge, but I'd like to understand why they're not converging. Does anybody have any ideas?
I've been searching the web for a while, but possibly (or probably) I am missing the right terminology.
I have arbitrarily sized arrays of scalars ...
array = [n_0, n_1, n_2, ..., n_m]
I also have a function f: x -> y, with 0 <= x <= 1, where y is an interpolated value from array. Examples:
array = [1,2,9]
f(0) = 1
f(0.5) = 2
f(1) = 9
f(0.75) = 5.5
My problem is that I want to compute the average value over some interval r = [a..b], where a ∈ [0..1] and b ∈ [0..1], i.e. I want to generalize my interpolation function f: x -> y to compute the average along r.
My mind boggles slightly w.r.t. finding the right weighting. Imagine I want to compute f([0.2, 0.8]):
array --> 1 | 2 | 9
[0..1] --> 0.00 0.25 0.50 0.75 1.00
[0.2,0.8] --> ^___________________^
The latter being the range of values I want to compute the average of.
Would it be mathematically correct to compute the average like this?
          1 * (1-0.8)   <- 0.2 'translated' to [0..0.25]
        + 2 * 1
        + 9 * 0.2       <- 0.8 'translated' to [0.75..1]
avg =   -----------------
          1.4           <-- the sum of weights
This looks correct.
In your example, your interval's length is 0.6. In that interval, your number 2 takes up (0.75-0.25)/0.6 = 0.5/0.6 = 10/12 of the space. Your number 1 takes up (0.25-0.2)/0.6 = 0.05/0.6 = 1/12 of the space, and likewise your number 9.
This sums up to 10/12 + 1/12 + 1/12 = 1.
For better intuition, think about it like this: the problem is to determine how much space each array element covers within the interval. The rest is just feeding those weights into the machinery described in http://en.wikipedia.org/wiki/Weighted_average#Mathematical_definition.
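Here is a small sketch of that weighting in Julia. It follows the coverage view described above (element i sits at position (i-1)/(n-1) in [0,1] and contributes in proportion to how much of [a, b] lies closest to it, treating each element as constant over its region); the function name and signature are my own.

# Average of `array` over the sub-interval [a, b] of [0, 1], weighting each element
# by the length of [a, b] that falls inside the region it covers.
function interval_average(array::AbstractVector{<:Real}, a::Real, b::Real)
    n = length(array)
    h = 1 / (n - 1)                                        # spacing between element positions
    total = 0.0
    for i in 1:n
        pos = (i - 1) * h
        lo, hi = max(pos - h/2, 0.0), min(pos + h/2, 1.0)  # region covered by element i
        overlap = max(0.0, min(hi, b) - max(lo, a))        # its share of [a, b]
        total += array[i] * overlap
    end
    total / (b - a)
end

interval_average([1, 2, 9], 0.2, 0.8)   # (1*0.05 + 2*0.5 + 9*0.05) / 0.6 = 2.5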
In his answer to this question, John Feminella says:
It's possible to do this sub-quadratically if you get really fancy, by
representing each integer as a bit vector and performing a fast
Fourier transform, but that's beyond the scope of this answer.
What is the asymptotically optimal way of solving the problem described in that question?
Suppose we have an array 1 2 4. We represent this array as a polynomial f(x) = x^1 + x^2 + x^4. Let's look at f(x)^2, which is
x^2 + 2 x^3 + x^4 + 2 x^5 + 2 x^6 + x^8
The number of ways to write n as the sum of two elements of the array is the coefficient of x^n, and this is true in general. FFT gives us a way to multiply polynomials efficiently, so basically what we do is compute f(x)^3 and look at the coefficient of x^S, where S is the target number.
The reason this algorithm doesn't solve the 3SUM problem in general is that the efficiency of an FFT multiply depends on the degree of the resulting polynomial, and thus it requires the array values to lie in a small range.
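For concreteness, here is a sketch of the polynomial-cubing idea in Julia using FFTW.jl; the package choice, function names and padding size are my own, since the answer above only describes the idea. It counts ordered triples of array elements (repetition allowed) that sum to S.

using FFTW

# Build the indicator polynomial f(x), cube it via FFT, and read off the
# coefficient of x^S.
function count_triples(a::Vector{Int}, S::Int)
    N = 3 * maximum(a) + 1              # f(x)^3 has degree at most 3*maximum(a)
    f = zeros(2 * N)                    # zero-padded so the cyclic convolution never wraps
    for v in a
        f[v + 1] += 1                   # coefficient of x^v (Julia arrays are 1-based)
    end
    coeffs = real.(ifft(fft(f) .^ 3))   # coefficients of f(x)^3
    S + 1 <= N ? round(Int, coeffs[S + 1]) : 0
end

count_triples([1, 2, 4], 7)             # 6: the orderings of 1 + 2 + 4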
I'm reading this example, but could you explain a little more? I don't get the part where it says "then we normalize"...
I know
P(sun)  * P(F=bad|sun)  = 0.7 * 0.2 = 0.14
P(rain) * P(F=bad|rain) = 0.3 * 0.9 = 0.27
But where do they get
W      P(W | F=bad)
--------------------
sun    0.34
rain   0.66
Example from
To normalize a list of numbers, you divide each by the sum of the list. Here the sum 0.14 + 0.27 = 0.41 is P(F=bad), so dividing by it turns the joint probabilities P(W, F=bad) into P(W | F=bad).
For example, in Python:
>>> v = [0.14, 0.27]
>>> s = sum(v)
>>> print s
0.41000000000000003
>>> vnorm = [n/s for n in v]
>>> print vnorm
[0.34146341463414637, 0.65853658536585369]