How to call a customized proposal function in AdaptiveMCMC.jl - arrays

I am trying to use a custom proposal distribution that generates the proposed arrays, so that they can be accepted or rejected in a Metropolis-Hastings algorithm with a log_target function. I wrote the Metropolis-Hastings code by hand and it works, though the results are not as satisfactory as those I got with Klara.jl, which is no longer maintained. The proposal distribution is written like this:
using Distributions
# true if every entry of v is non-negative
nonneg(v) = all(v .>= 0)
struct OrthoNNDist <: DiscreteMultivariateDistribution
    x0::Vector{Float64}
    oc::Array{Float64,2}
    x1s::Array
    prob::Float64
    # return a new uniform distribution with all vectors in x1s orthogonal to oc
    function OrthoNNDist(x0::Vector{Float64}, oc::Array{Float64,2})
        x1s = []
        for i = 1:size(oc)[2]
            x1 = x0 + oc[:, i]
            if nonneg(x1)
                push!(x1s, x1)
            end
            x1 = x0 - oc[:, i]
            if nonneg(x1)
                push!(x1s, x1)
            end
        end
        new(x0, oc, x1s, 1.0/length(x1s))
    end
end
Base.length(d::OrthoNNDist) = length(d.x0)
Distributions.rand(d::OrthoNNDist) = rand(d.x1s)
Distributions.pdf(d::OrthoNNDist, x::Vector) = x in d.x1s ? d.prob : 0.0
Distributions.pdf(d::OrthoNNDist) = fill(d.prob, size(d.x1s))
Distributions.logpdf(d::OrthoNNDist, x::Vector) = log(pdf(d, x))
You can test it with, for example, x0 = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0] and mat = [0 1 0 0 1 1 0 0 1 0 0 0; 1 0 0 0 1 0 0 0 0 0 0 1; 1 0 1 0 0 0 0 1 0 0 1 0; 0 1 0 0 0 0 1 0 0 0 1 0; 0 0 0 1 0 0 0 1 1 0 0 0; 0 0 0 1 0 0 0 0 0 0 0 1; 0 0 1 0 0 1 0 0 0 1 0 0; 0 0 0 0 0 0 1 0 0 1 0 0]. We can fix mat and let x0 be the variable by writing, for example, proposal(x::Vector) = OrthoNNDist(x, mat).
Suppose we have a general log-target function logtarget(x::Vector) and the proposal distribution above. I would like a minimal example of using them in the AdaptiveMCMC package, or in any other package that can handle this case. I tried AdvancedMH, but part of my target function uses JuMP, so I can't get the gradient of the target. I have read the Mamba documentation but could not fully understand it; a minimal example with a custom proposal function and a target function would help. AdaptiveMCMC looks simpler, but I keep getting a MethodError about the distribution: it is supposed to be a function, but mine is not, as you can see in the code above. Here is an example of the code using Klara, with comments.
function qrelay(alpha, delta, name)
    n = 2
    chi = fill(sqrt(0.06), n)
    phi = im * tanh(chi)
    omega = 1.0 / prod(cosh(chi))^2
    syms, op = qrelay_op(n, phi, alpha, delta) # returns an array
    op_a, op_ab, mat, coef = op_mat(op) # array, array, matrix, array of coefficients
    op_q2 = [syms.apH[1], syms.apV[1], syms.bpH[end], syms.bpV[end]] # array
    op_q1 = [syms.apH[2:end]..., syms.apV[2:end]..., syms.bpH[1:end-1]..., syms.bpV[1:end-1]...] # array
    mask_q1 = [op in op_q1 for op in op_a]; # array
    mask_q2 = [op in op_q2 for op in op_a]; # array
    qq = [x in syms.apH || x in syms.bpV ? 1 : 0 for x in op_a] # array
    pdet0 = pdet_maker(0.04, 1e-5) # returns a probability
    qrs = QSampler(mat, coef, omega, pdet0) # calls the QSampler module
    targetcache = Dict{Vector{Int}, Float64}()
    # log target, cached so each state is evaluated only once
    function plogtarget(na::Vector{Int})
        get!(targetcache, na) do
            log(qrs.prob(qq, na, mask_q1) * qrs.prob(na))
        end
    end
    # plogtarget(na::Vector{Int}) = log(qrs.prob(qq, na, mask_q1) * qrs.prob(na))
    p = BasicDiscMuvParameter(:p, logtarget=plogtarget)
    model = likelihood_model([p], isindexed=false)
    # this is where the proposal function is passed in:
    # qrs.psetproposal(x::Vector) = qrs.OrthoNNDist(x, mat)
    sampler = MH(qrs.psetproposal, symmetric=false)
    mcrange = BasicMCRange(nsteps=2^20 + 2^10, burnin=2^10, thinning=2^5)
    v0 = Dict(:p=>zeros(qq))
    outopts = Dict{Symbol, Any}(
        :monitor=>[:value, :logtarget],
        :diagnostics=>[:accept],
        :destination=>:iostream,
        :filepath=>"$dataname/"*name
    )
    job = BasicMCJob(model, sampler, mcrange, v0, outopts=outopts)
    funcQ(v) = qrs.prob(qq, v, mask_q2) # returns a probability
    return qrs, job, funcQ
end
This code no longer runs, of course, because Klara is no longer maintained; I include it only to give a clearer view of how the code used to work.
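Independently of any sampler package, the role the proposal plays in the acceptance step can be sketched by hand. The block below is a minimal, language-neutral illustration in Python (the thread itself mixes Julia and MATLAB), not the API of AdaptiveMCMC, AdvancedMH, Mamba or Klara. The names neighbors and metropolis_hastings, the toy columns and the toy log target are all hypothetical. It assumes the columns are distinct and that no column is the negative of another, so that each candidate is proposed with probability 1/len(neighbors(x)), mirroring the prob field of OrthoNNDist. Because the number of admissible neighbours changes from state to state, the proposal is not symmetric and the Hastings correction is needed.

```python
import math
import random

def neighbors(x, columns):
    """All x + c and x - c (c a column) whose entries stay non-negative."""
    out = []
    for c in columns:
        for sign in (1, -1):
            cand = tuple(xi + sign * ci for xi, ci in zip(x, c))
            if all(v >= 0 for v in cand):
                out.append(cand)
    return out

def metropolis_hastings(log_target, x0, columns, n_steps, rng):
    """Hand-written MH with a uniform proposal over the admissible neighbours.

    Assumes x0 has at least one admissible neighbour.
    """
    x = tuple(x0)
    nx = neighbors(x, columns)
    lx = log_target(x)
    chain = [x]
    for _ in range(n_steps):
        y = rng.choice(nx)            # draw from q(. | x), uniform on nx
        ny = neighbors(y, columns)    # always contains x, so q(x | y) > 0
        ly = log_target(y)
        # log of target ratio times Hastings ratio q(x|y)/q(y|x) = len(nx)/len(ny)
        log_alpha = ly - lx + math.log(len(nx)) - math.log(len(ny))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, nx, lx = y, ny, ly
        chain.append(x)
    return chain

# Toy usage: three hypothetical columns and a toy target proportional to exp(-sum(x)).
columns = [(1, 0), (0, 1), (1, 1)]
chain = metropolis_hastings(lambda x: -float(sum(x)), (0, 0), columns, 1000, random.Random(0))
```

The same structure carries over to Julia: draw with rand(proposal(x)), then add logpdf(proposal(y), x) - logpdf(proposal(x), y) to the log-target difference before comparing with a uniform draw.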

Related

Creating Indicator matrix based on vector with group IDs

I have a vector of group IDs:
groups = [ 1 ; 1; 2; 2; 3];
which I want to use to create a matrix consisting of 1's in case the i-th and the j-th element are in the same group, and 0 otherwise. Currently I do this as follows:
n = size(groups, 1);
indMatrix = zeros(n,n);
for i = 1:n
    for j = 1:n
        indMatrix(i,j) = groups(i) == groups(j);
    end
end
indMatrix
indMatrix =
1 1 0 0 0
1 1 0 0 0
0 0 1 1 0
0 0 1 1 0
0 0 0 0 1
Is there a better solution avoiding the nasty double for-loop? Thanks!
This can be done quite easily using implicit singleton expansion, for R2016b or later:
indMatrix = groups==groups.';
For MATLAB versions before R2016b you need bsxfun to achieve singleton expansion:
indMatrix = bsxfun(@eq, groups, groups.');
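What the singleton expansion computes can be spelled out element by element. The following is only an illustrative Python sketch (the function name indicator_matrix is made up); it compares every entry of the column with every entry of the row, which is exactly the pairwise test the double loop performs.

```python
def indicator_matrix(groups):
    """Entry (i, j) is 1 when groups[i] and groups[j] are equal, else 0."""
    return [[int(gi == gj) for gj in groups] for gi in groups]

ind = indicator_matrix([1, 1, 2, 2, 3])
for row in ind:
    print(row)
```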

Storing values from a loop in a function in Matlab

I am writing a function in Matlab to model the length of stay in hospital of stroke patients. I am having difficulty in storing my output values.
Here is my function:
function [] = losdf(age, strokeType, dest)
    % function to determine length of stay in hospital of stroke patients
    % t = time since admission (days);
    % age = age of patient;
    % strokeType = 1. Haemorrhagic, 2. Cerebral Infarction, 3. TIA;
    % dest = 5. Death 6. Nursing Home 7. Usual Residence;
    alpha1 = 6.63570;
    beta1 = -0.03652;
    alpha2 = -3.06931;
    beta2 = 0.07153;
    theta0 = -8.66118;
    theta1 = 0.08801;
    mu1 = 22.10156;
    mu2 = 2.48820;
    mu3 = 1.56162;
    mu4 = 0;
    nu1 = 0;
    nu2 = 0;
    nu3 = 1.27849;
    nu4 = 0;
    rho1 = 0;
    rho2 = 11.76860;
    rho3 = 3.41989;
    rho4 = 63.92514;
    for t = 1:1:365
        p = (exp(-exp(theta0 + (theta1.*age))));
        if strokeType == 1
            initialstatevec = [1 0 0 0 0 0 0];
        elseif strokeType == 2
            initialstatevec = [0 1 0 0 0 0 0];
        else
            initialstatevec = [0 0 (1-p) p 0 0 0];
        end
        lambda1 = exp(alpha1 + (beta1.*age));
        lambda2 = exp(alpha2 + (beta2.*age));
        Q = [ -(lambda1+mu1+nu1+rho1) lambda1 0 0 mu1 nu1 rho1;
              0 -(lambda2+mu2+nu2+rho2) lambda2 0 mu2 nu2 rho2;
              0 0 -(mu3+nu3+rho3) 0 mu3 nu3 rho3;
              0 0 0 -(mu4+nu4+rho4) mu4 nu4 rho4;
              0 0 0 0 0 0 0;
              0 0 0 0 0 0 0;
              0 0 0 0 0 0 0];
        Pt = expm(t./365.*Q);
        Pt = Pt(strokeType, dest);
        Ft = sum(initialstatevec.*Pt);
        Ft
    end
end
Then to run my function I use:
losdf(75,3,7)
I want to plot my values of Ft on a graph from 0 to 365 days. What is the best way to do this?
Do I need to store the values in an array first and if so what is the best way to do this?
There are many ways to do this. One straightforward way is to save each data point to a vector inside the loop and plot that vector after the loop finishes.
...
Ft = zeros(365,1); % Preallocate Ft as a vector of 365 zeros
for t = 1:365
...
Ft(t) = sum(initialstatevec.*Pt); % At index "t", store your output
...
end
plot(1:365,Ft);

Libsvm Index exceeds matrix dimensions

The following libsvm MATLAB code keeps giving me an "Index exceeds matrix dimensions" error after a few loop iterations. Can anyone help me find where the error might be coming from?
testlabel = [1 1 1 1 1 1 1 1 1 1; 0 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 0];
model = cell(3,1);
for n=1:3
    model{n} = svmtrain(completelabel{n}, completefeatureVector{n}, '-b 1 -t 0');
end
numTest=10;
pr = zeros(numTest,2);
for k=1:numTest
    for m=1:3
        [~,~,p] = svmpredict(testlabel(m, k), featureVectortest{k}, model{m}, '-b 1');
        pr(:,k) = p(:,model{m}.Label==m); %# probability of class==k
    end
    [~,predictedLabel] = max(pr,[],2);
end

In matlab, find the frequency at which unique rows appear in a matrix

In Matlab, say I have the following matrix, which represents a population of 10 individuals:
pop = [0 0 0 0 0; 1 1 1 0 0; 1 1 1 1 1; 1 1 1 0 0; 0 0 0 0 0; 0 0 0 0 0; 1 0 0 0 0; 1 1 1 1 1; 0 0 0 0 0; 0 0 0 0 0];
Where rows of ones and zeros define 6 different 'types' of individuals.
a = [0 0 0 0 0];
b = [1 0 0 0 0];
c = [1 1 0 0 0];
d = [1 1 1 0 0];
e = [1 1 1 1 0];
f = [1 1 1 1 1];
I want to define the proportion/frequency of a, b, c, d, e and f in pop.
I want to end up with the following list:
a = 0.5;
b = 0.1;
c = 0;
d = 0.2;
e = 0;
f = 0.2;
One way I can think of is by summing the rows, then counting the number of times each appears, and then sorting and indexing
sum_pop = sum(pop')';
x = unique(sum_pop);
N = numel(x);
count = zeros(N,1);
for l = 1:N
    count(l) = sum(sum_pop==x(l));
end
pop_frequency = [x(:) count/10];
But this doesn't quite get me what I want (i.e. when frequency = 0) and it seems there must be a faster way?
You can use pdist2 (Statistics Toolbox) to get all frequencies:
indiv = [a;b;c;d;e;f]; %// matrix with all individuals
result = mean(pdist2(pop, indiv)==0, 1);
This gives, in your example,
result =
0.5000 0.1000 0 0.2000 0 0.2000
Equivalently, you can use bsxfun to manually compute pdist2(pop, indiv)==0, as in Divakar's answer.
For the specific individuals in your example (that can be identified by the number of ones) you could also do
result = histc(sum(pop, 2), 0:size(pop,2)) / size(pop,1);
There is some functionality in unique that can be used for this. If
[q,w,e] = unique(pop,'rows');
q is the matrix of unique rows, and w is the index at which each row first appears in the matrix. The third output e contains indices into q so that pop = q(e,:). Armed with this, the rest of the problem should be straightforward. The probability of a value in e is the probability that the corresponding row appears in pop.
The counting can be done with histc
histc(e,1:max(e))/length(e)
and the non-occurring rows can be found with
ismember(a,q,'rows')
There are of course other ways as well, maybe (probably) faster ones, or one-liners. I post this because it is easy to understand, readable, and does not require any special toolboxes.
EDIT
This example gives expected output
a = [0,0,0,0,0;1,0,0,0,0;1,1,0,0,0;1,1,1,0,0;1,1,1,1,0;1,1,1,1,1]; % catenated a-f
[q,w,e] = unique(pop,'rows');
prob = histc(e,1:max(e))/length(e);
out = zeros(size(a,1),1);
out(ismember(a,q,'rows')) = prob;
Approach #1
With bsxfun -
A = cat(1,a,b,c,d,e,f)
out = squeeze(sum(all(bsxfun(@eq,pop,permute(A,[3 2 1])),2),1))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
Approach #2
If those elements are binary numbers, you can convert them into decimal format.
Thus, decimal format for pop becomes -
>> bi2de(pop)
ans =
0
7
31
7
0
0
1
31
0
0
And that of the concatenated array, A becomes -
>> bi2de(A)
ans =
0
1
3
7
15
31
Finally, you need to count the decimal formatted numbers from A in that of pop, which you can do with histc. Here's the code -
A = cat(1,a,b,c,d,e,f)
out = histc(bi2de(pop),bi2de(A))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
I think ismember is the most direct and general way to do this. If your groups were more complicated, this would be the way to go:
population = [0,0,0,0,0; 1,1,1,0,0; 1,1,1,1,1; 1,1,1,0,0; 0,0,0,0,0; 0,0,0,0,0; 1,0,0,0,0; 1,1,1,1,1; 0,0,0,0,0; 0,0,0,0,0];
groups = [0,0,0,0,0; 1,0,0,0,0; 1,1,0,0,0; 1,1,1,0,0; 1,1,1,1,0; 1,1,1,1,1];
[~, whichGroup] = ismember(population, groups, 'rows');
freqOfGroup = accumarray(whichGroup, 1, [size(groups, 1), 1])/size(population, 1); % one count per group, divided by the population size
In your special case the groups can be represented by their sums, so if this generic solution is not fast enough, use the sum-histc simplification Luis used.
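The counting idea shared by these answers (map each row to a key, count the keys, then look up every type, including those that never occur) can be sketched without any toolbox. The block below is an illustrative Python version using only the standard library; the function name row_frequencies is made up, and the data are the pop and a-f values from the question.

```python
from collections import Counter

def row_frequencies(population, types):
    """Fraction of rows in population equal to each row in types."""
    counts = Counter(tuple(row) for row in population)
    # A Counter returns 0 for keys it has never seen, so absent types get 0.0.
    return [counts[tuple(t)] / len(population) for t in types]

pop = [[0, 0, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [1, 1, 1, 0, 0],
       [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
types = [[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
freq = row_frequencies(pop, types)
```

Dividing by the number of rows in the population, not by the number of types, is what turns the counts into frequencies that sum to at most 1.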

How can I generate this matrix (containing only 0s and ±1s)?

I would like to generate a matrix of size (n(n-1)/2, n) that looks like this (n=5 in this case):
-1 1 0 0 0
-1 0 1 0 0
-1 0 0 1 0
-1 0 0 0 1
0 -1 1 0 0
0 -1 0 1 0
0 -1 0 0 1
0 0 -1 1 0
0 0 -1 0 1
0 0 0 -1 1
This is what I, quickly, came up with:
G = [];
for i = 1:n-1;
    for j = i+1:n
        v = sparse(1,i,-1,1,n);
        w = sparse(1,j,1,1,n);
        vw = v+w;
        G = [G; vw];
    end
end
G = full(G);
It works, but is there a faster/cleaner way of doing it?
Use nchoosek to generate the indices of the columns that will be nonzero:
n = 5; %// number of columns
ind = nchoosek(1:n,2); %// ind(:,1): columns with "-1". ind(:,2): with "1".
m = size(ind,1);
rows = (1:m).'; %'// row indices
G = zeros(m,n);
G(rows + m*(ind(:,1)-1)) = -1;
G(rows + m*(ind(:,2)-1)) = 1;
You have two nested loops, which leads to O(N^2) complexity of non-vectorized operations, which is too much for this task. Notice that your matrix actually has a recursive pattern:
G(n+1) = [ -1 I(n)]
[ 0 G(n)];
where I(n) is the identity matrix of size n. This is how you can express the pattern in MATLAB:
function G = mat(n)
    % Treat original call as G(n+1)
    n = n - 1;
    % Non-recursive branch for trivial case
    if n == 1
        G = [-1 1];
        return;
    end
    RT = eye(n); % Right-top: I(n)
    LT = repmat(-1, n, 1); % Left-top: -1
    RB = mat(n); % Right-bottom: G(n), recursive
    LB = zeros(size(RB, 1), 1); % Left-bottom: 0
    G = [LT RT; LB RB];
end
This gives us O(N) complexity of non-vectorized operations. It will probably waste some memory during recursion and matrix composition if MATLAB is not smart enough to factor these out. If that is critical, you can unroll the recursion into a loop and iteratively fill in the corresponding places in a pre-allocated matrix.
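As a cross-check of the block pattern, the recursive construction can be compared against a direct enumeration of column pairs. This is a small Python sketch for illustration only; pairs_matrix and pairs_matrix_recursive are made-up names, and the matrices are plain lists of rows.

```python
from itertools import combinations

def pairs_matrix(n):
    """One row per pair i < j, with -1 in column i and 1 in column j."""
    rows = []
    for i, j in combinations(range(n), 2):
        row = [0] * n
        row[i], row[j] = -1, 1
        rows.append(row)
    return rows

def pairs_matrix_recursive(n):
    """Same matrix built from the block pattern [-1 I; 0 G] for n >= 2."""
    if n == 2:
        return [[-1, 1]]
    # Top block: a column of -1 next to the (n-1) x (n-1) identity.
    top = [[-1] + [1 if k == r else 0 for k in range(n - 1)] for r in range(n - 1)]
    # Bottom block: a column of 0 next to the matrix for n-1 columns.
    bottom = [[0] + row for row in pairs_matrix_recursive(n - 1)]
    return top + bottom

G = pairs_matrix_recursive(5)
```

Both functions produce the rows in the same order, because the pairs are enumerated with the first index varying slowest, which is also the order nchoosek uses in the answer above.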
