Fisher test for a given large set of 'p' values using MATLAB?

The following problem is an extension of this problem.
I have written the following code:
load y; P = y; k = length(P);
% the following matrix is used to sum each 'n' elements in a row:
n = 2; % for the sum of n elements in a row
summer = diag(ones(k,1));
for d = 1:n-1
summer = summer + diag(ones(k-d,1),-d);
end
X = -2.*log(P(:).')*summer;
X comes out as NaN in every element when P is my full data set (an array of size 200x1), but when I test with, say, 10 values of P it works fine and gives no NaN values. Can anyone explain why the large data set fails while a small one works?

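There are probably -Inf, Inf, or NaN values in your P vector to begin with (or exact zeros, since log(0) is -Inf). Given the arithmetic being done, this seems to be the only possible source of the NaN values in X: the matrix product multiplies every element of the log vector by the zeros in summer, and both Inf*0 and NaN*0 are NaN.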
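One way to test this hypothesis is to look for entries of P that cannot be valid p-values before taking the log. The sketch below uses a made-up five-element P (not the asker's data) and flags anything that is non-finite or outside (0, 1]:

```matlab
% Hypothetical data standing in for the asker's P (not from the question).
P = [0.5; 0.2; 0; NaN; 0.9];

% A usable p-value for -2*log(P) is finite and lies in (0, 1].
bad = ~isfinite(P) | P <= 0 | P > 1;
badIdx = find(bad);
disp(badIdx.')              % positions of the offending entries

% A single non-finite term is enough to contaminate a matrix product,
% because multiplying it by a zero entry gives NaN.
poisoned = Inf * 0;
disp(poisoned)
```

If badIdx is non-empty for the real data, those entries need to be cleaned or excluded before building X.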

Related

Problem with creating a single row matrix

I have this for loop that creates a row vector of 1825 columns. Its values range from 0 to 1 in increments of 0.013888888889, with each value repeated 25 times (25 zeros, then 25 copies of 0.013888888889, and so on). My problem is that the last 25 columns, containing the 1s, are not created. Instead of a 1x1825 matrix, I get a 1x1800 matrix without the 1s. This is the code:
coutput=repmat([0], 1,25);
for n=0.013888888889:0.013888888889:1
coutput = [coutput repmat([n],1,25)];
end
The problem comes from your loop variable:
n = 0.013888888889:0.013888888889:1
Let's compare:
temp = 0:0.013888888889:1;
temp([1 2 end-1 end])
n = 72;
temp = linspace(0, 1, n+1);
temp([1 2 end-1 end])
We get:
ans =
0.00000 0.01389 0.97222 0.98611
ans =
0.00000 0.01389 0.98611 1.00000
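Using your solution, we get a vector with dimensions 1x72, whereas the "more exact" version gives a vector with dimensions 1x73. So, in the end, it's a rounding issue: 0.013888888889 is slightly larger than 1/72, so 72 steps overshoot 1 and the range stops one step early.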
The remaining part can be simplified by using your repmat and reshape:
k = 25;
out = reshape(repmat(temp, k, 1), 1, (n+1) * k)
(Output omitted here.)
Hope that helps!
As HansHirse indicated above, this is a rounding issue. You should not use floating-point values as loop indices.
The simple and correct way of implementing your loop is this:
for ii=1:72
n = ii/72;
%...
end
But of course this code would be a lot more efficient using Hans’ repmat+reshape solution because it avoids repeatedly reallocating the output array inside the loop.
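The difference between the two approaches can be checked directly. This is a minimal sketch; the variable names a, b, and out are chosen here for illustration:

```matlab
step = 0.013888888889;      % the rounded step from the question
a = 0:step:1;               % colon range with a floating-point step
b = (0:72) / 72;            % integer counter, scaled afterwards

fprintf('%d %d\n', numel(a), numel(b));    % a is one element shorter than b
fprintf('%.5f %.5f\n', a(end), b(end));    % only b reaches 1

k = 25;                                    % repetitions per value
out = reshape(repmat(b, k, 1), 1, []);     % each value repeated k times
```

Because b is built from integers, its last element is exactly 1 and out has the full 1825 columns.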

Matlab: Help in implementing quantized time series

I am having trouble implementing this code due to the variable s_k being logical 0/1. In what way can I implement this statement?
s_k is a random sequence of 0/1 generated by calling rand() and quantizing its output at the mean (0.5), as shown below. I don't know how to proceed after this. Please help.
N = 1000;
input = rand(N, 1); % uniform on (0,1), so the mean is 0.5
s = (input >= 0.5); % converting into logical 0/1
UPDATE
N = 3;
tmax = 5;
y(1) = 0.1;
for i =1 : tmax+N-1 %// Change here
y(i+1) = 4*y(i)*(1-y(i)); %nonlinear model for generating the input to Autoregressive model
end
s = (y>=0.5);
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
x = sum(s(ind+1).*(2.^(-ind+N+1))); % The output of this conversion should be real numbers
% Autoregressive model of order 1
z(1) =0;
for j =2 : N
z(j) = 0.195 *z(j-1) + x(j);
end
You've generated the random logical sequence, which is great. You also need to know N, which is the total number of points to collect at one time, as well as a list of time values t. Because this is a discrete summation, I'm going to assume the values of t are discrete. What you need to do first is generate a sliding window matrix. Each column of this matrix represents a set of time values for each value of t for the output. This can easily be achieved with bsxfun. Assuming a maximum time of tmax, a starting time of 0 and a neighbourhood size N (like in your equation), we can do:
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
For example, assuming tmax = 5 and N = 3, we get:
ind =
0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
Each column represents a time that we want to calculate the output at and every row in a column shows a list of time values we want to calculate for the desired output.
Finally, to calculate the output x, you simply take your s_k vector, make it a column vector, use ind to index into it, do a point-by-point multiplication with 2^(-k+N+1) by substituting k with what we got from ind, and sum over the rows of each column. So:
s = rand(max(ind(:))+1, 1) >= 0.5;
x = sum(s(ind+1).*(2.^(-ind+N+1)));
The first statement generates a random vector that is as long as the maximum time value that we have. Once we have this, we use ind to index into this random vector so that we can generate a sliding window of logical values. We need to offset this by 1 as MATLAB starts indexing at 1.
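To make the windowing concrete, here is the same computation with a fixed, made-up s_k instead of a random one, so the result can be verified by hand. The weight matrix w is a name introduced here for readability:

```matlab
N = 3; tmax = 5;
ind = bsxfun(@plus, (0:tmax), (0:N-1).');   % N-by-(tmax+1) matrix of k values
s = logical([1 0 1 1 0 0 1 0]).';           % made-up s_k for k = 0..7
w = 2.^(-ind + N + 1);                      % weight 2^(-k+N+1) for each k
x = sum(s(ind+1) .* w, 1);                  % one output per column (per t)
disp(x)
```

For t = 0 the window covers k = 0, 1, 2 with weights 16, 8, 4, so x(1) = 16 + 4 = 20. The full result is [20 6 6 2 0.25 0.25].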

Matlab Error A(I) = B

I am currently looking at Binomial Option Pricing. I have written the code below, which works fine when you enter the variables one at a time. However, entering each set of values by hand is very tedious, and I need to be able to analyse a large set of data. I have created arrays for each of the variables, but I keep getting the error "A(I) = B, the number of elements in B must equal I". The function is shown below.
function C = BinC(S0,K,r,sig,T,N);
% PURPOSE:
% To return the value of a European call option using the Binomial method
%-------------------------------------------------------------------------
% INPUTS:
% S0 - The initial price of the underlying asset
% K - The strike price
% r - The risk free rate of return, expressed as a decimal
% sig - The volatility of the underlying asset, expressed as a decimal
% T - The time to maturity, expressed as a decimal
% N - The number of steps
%-------------------------------------------------------------------------
dt = T/N;
u = exp(sig*sqrt(dt));
d = 1/u;
p = (exp(r*dt) - d)/(u - d);
S = zeros(N+1,1);
% Price of underlying asset at time T
for n = 1:N+1
S(n) = S0*(d^(N+1-n))*(u^(n-1));
end
% Price of Option at time T
for n = 1:N+1
C(n) = max(S(n)- K, 0);
end
% Backtrack to get option price at time 0
for i = N:-1:1
for n = 1:i
C(n) = exp(-r*dt)*(p*C(n+1) + (1-p)*C(n));
end
end
disp(C(1))
After importing my data, I entered this in to the command window.
for i=1:20
w(i)= BinC(S0(i),K(i),r(i),sig(i),T(i),N(i));
end
When I enter w, all I get back is w = []. I have no idea how I can make A(I) = B. I apologise, if this is a very silly question, but I am new to Matlab and in need of help. Thanks
Your function computes an entire vector C, but displays only C(1). This display is deceptive: it makes you think the function is returning a scalar, but it's not: it's returning the entire vector C, which you try to store into a scalar location.
The solution is simple: Change your function definition to this (rename the output variable):
function out = BinC(S0,K,r,sig,T,N);
Then at the last line of the function, remove the disp, and replace it with
out = C(1);
To verify all of this (compare with your non-working example), try calling it by itself at the command line, and examine the output.
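Putting the fix together, a minimal sketch of a scalar-returning version might look like the following. The backward induction is vectorised here, which is a rewrite and not the asker's original loops, and the input arrays are made-up stand-ins for the imported data. A local function at the end of a script needs R2016b or later; otherwise save BinC in its own file, BinC.m.

```matlab
% Hypothetical inputs standing in for the imported arrays.
S0 = [100 100]; K = [100 0]; r = [0.05 0.05];
sig = [0.2 0.2]; T = [1 1]; N = [50 50];

w = zeros(size(S0));                  % preallocate one slot per option
for i = 1:numel(S0)
    w(i) = BinC(S0(i), K(i), r(i), sig(i), T(i), N(i));
end
disp(w)

function out = BinC(S0, K, r, sig, T, N)
% European call by the binomial method; returns a scalar price.
dt = T/N;
u = exp(sig*sqrt(dt));
d = 1/u;
p = (exp(r*dt) - d)/(u - d);
n = (0:N).';                                  % number of up moves at time T
C = max(S0 * u.^n .* d.^(N-n) - K, 0);        % option payoff at time T
for i = N:-1:1
    % step back one period; C shrinks by one element each pass
    C = exp(-r*dt) * (p*C(2:i+1) + (1-p)*C(1:i));
end
out = C;                                      % a single value remains
end
```

The second option has a strike of 0, so its price should come out equal to S0, which gives a quick sanity check on the tree.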

Optimize parameters of a pairwise distance function in Matlab

This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have a 1000-by-10000 matrix that contains the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1 - (1/(2p)) * sum(a/c + b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition and p = 10000. c and d are probabilities such that c + d = 1.
An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j; then I look at rows i and j.
If the kth entries of rows i and j have values 2 and 2, then a=2, b=0, c=1, d=0 (I guess I will assign 0/0=0 in this case).
If the kth entries of rows i and j have values 2 and 1, or vice versa, then a=1, b=0, c=3/4, d=1/4.
The remaining cases are assigned similarly: 2,0 (a=0, b=0, c=1/2, d=1/2); 1,1 (a=1, b=1, c=1/2, d=1/2); 1,0 (a=0, b=1, c=1/4, d=3/4); 0,0 (a=0, b=2, c=0, d=1).
The MATLAB code I have so far uses for loops over i and j, finds the cases above using find, and then creates two arrays for a/c and b/d. This is extremely slow. Is there a way to improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, then I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
% m is the data
[n, p] = size(m);
distance = zeros(n);
for ii=1:n
for jj=ii+1:n
a = min(m(ii,:), m(jj,:));
b = 2 - max(m(ii,:), m(jj,:));
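c = 4 ./ (m(ii,:) + m(jj,:)); % reciprocal of the question's c = (m_i + m_j)/4
c(c == Inf) = 0; % 0/0 := 0 when both entries are 0
d = 4 ./ (4 - (m(ii,:) + m(jj,:))); % reciprocal of the question's d = 1 - c
d(d == Inf) = 0; % 0/0 := 0 when both entries are 2
distance(ii,jj) = sum(a.*c + b.*d);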
% distance(jj,ii) = distance(ii,jj); % optional for the full matrix
end
end
distance = 1 - (1 / (2 * p)) * distance;
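The closed forms for a and b used in the loop (min and 2 - max), together with c = (m_i + m_j)/4 and d = 1 - c in the question's own units, can be checked against the six cases listed in the question. This is only a verification sketch; pairs, x, and y are names introduced here:

```matlab
% The six value pairs from the question, one pair per row.
pairs = [2 2; 2 1; 2 0; 1 1; 1 0; 0 0];
x = pairs(:, 1);
y = pairs(:, 2);

a = min(x, y);          % matches the a column of the question's cases
b = 2 - max(x, y);      % matches the b column
c = (x + y) / 4;        % probability c, before taking any reciprocal
d = 1 - c;              % probability d, so that c + d = 1

disp([pairs a b c d])   % one row per case: values, then a, b, c, d
```

Since the formula divides by c and d, the loop can equivalently multiply by 4/(m_i + m_j) and 4/(4 - m_i - m_j), with the 0/0 cases set to 0.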

Matlab Assigning Elements to Array in loop

I have this loop which generates a vector "Diff". How do I place the values of Diff in an array that records every Diff generated? Diff should have a fixed length (36), which is the width of the table "CleanPrice". But because col_set varies in length (according to the number of NaNs in the row being read), Diff also varies in length. What I need is for each value to be stored under its own column number, i.e. element i of Diff should go to column col(i), and every other column in that row should be set to 0 or NaN. Basically I need DiffArray to be an (nTrials x 36) array where each row is the Diff generated on that pass, padded to 36 entries. At the moment, each time the length of col changes, I get the following error:
??? Subscripted assignment dimension mismatch.
Error in ==> NSSmodel
at 41 DiffMatrix(end+1,:)=Diff
This is my code:
DiffArray=[];
StartRow=2935;
EndRow=2940;
nTrials=EndRow-StartRow+1;
for row=StartRow:EndRow;
col_set=find(~isnan(gcm3.data.CleanPrice(row,1:end)));
col=col_set(:,2:end);
CleanPrices=transpose(gcm3.data.CleanPrice(row,col));
Maturity=gcm3.data.CouponandMaturity(col-1,2);
SettleDate=gcm3.data.CouponandMaturity(row,3);
Settle = repmat(SettleDate,[length(Maturity) 1]);
CleanPrices =transpose(gcm3.data.CleanPrice(row,col));
CouponRate = gcm3.data.CouponandMaturity(col-1,1);
Instruments = [Settle Maturity CleanPrices CouponRate];
PlottingPoints = gcm3.data.CouponandMaturity(1,2):gcm3.data.CouponandMaturity(36,2);
Yield = bndyield(CleanPrices,CouponRate,Settle,Maturity);
SvenssonModel = IRFunctionCurve.fitSvensson('Zero',SettleDate,Instruments)
ParYield=SvenssonModel.getParYields(Maturity);
[PriceActual, AccruedIntActual] = bndprice(Yield, CouponRate, Settle, Maturity);
[PriceNSS, AccruedIntNSS] = bndprice(ParYield, CouponRate, Settle, Maturity);
Diff=PriceActual-PriceNSS
DiffArray(end+1,:)=Diff
end
I looked at num2cell in this post but wasn't sure how to apply it correctly and started getting errors relating to that instead.
Is it correct to say you want to add an 'incomplete' row to DiffArray? If you know exactly where each element should go you could maybe do something like this:
indices = [1:7; 2:8; 3:9; [1 2 3 6 7 8 10]];
Diff = rand(4, 7);
DiffArray = NaN(4, 10);
for row = 1:4
DiffArray(row, indices(row, :)) = Diff(row,:);
end
Of course, in your case you would be calculating Diff and its index vector (a row vector) inside the loop rather than using preassigned arrays. The above is just to illustrate how to use an indexing vector to position a short row in a matrix.
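Applied to the shape of the original problem, the same idea might look like the sketch below. The data matrix and the Diff calculation are made-up placeholders, since the real values come from bndprice and the Svensson fit; only the preallocation and the indexed assignment are the point:

```matlab
nCols = 36;                               % fixed width, as in the question
data = rand(3, nCols);                    % made-up stand-in for CleanPrice rows
data(1, [5 9]) = NaN;                     % row 1 has two gaps
data(3, 20:36) = NaN;                     % row 3 is missing its tail

DiffArray = NaN(size(data, 1), nCols);    % preallocate; gaps stay NaN
for r = 1:size(data, 1)
    col = find(~isnan(data(r, :)));       % columns with data in this row
    Diff = data(r, col) - 0.5;            % placeholder for PriceActual - PriceNSS
    DiffArray(r, col) = Diff;             % place the short row by column number
end
```

Every row of DiffArray keeps its full width of 36, and the columns that had no price remain NaN.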
