In matlab, say I have the following data:
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];
I want to place the 1st column into bins according to elements in the 2nd column (i.e. 0-1, 1-2, 2-3...), and calculate the mean and 95% confidence interval of the elements in column 1 within that bin . So I'd have a matrix something like this:
mean lower_95% upper_95% bin
4.33 0
7.5 1
3.67 2
You can use accumarray with the appropriate function for the mean (mean) or the quantiles (quantile):
m = accumarray(floor(data(:,2))+1, data(:,1), [], #mean);
l = accumarray(floor(data(:,2))+1, data(:,1), [], #(x) quantile(x,.05));
u = accumarray(floor(data(:,2))+1, data(:,1), [], #(x) quantile(x,.95));
result = [m l u (0:numel(m)-1).'];
This can also be done calling accumarray once with cell array output:
result = accumarray(floor(data(:,2))+1, data(:,1), [],...
#(x) {[mean(x) quantile(x,.05) quantile(x,.95)]});
result = cell2mat(result);
For your example data,
result =
4.3333 3.0000 6.0000 0
7.5000 2.0000 12.0000 1.0000
3.6667 1.0000 5.0000 2.0000
This outputs a matrix with the labelled columns. Note that for your example data, 2 standard deviations from the mean (for the 95% confidence interval) gives values outside of the bands. With a larger (normally distributed) data set, you wouldn't see this.
Your data:
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];
Binning for output table:
% Initialise output matrix. Columns:
% Mean, lower 95%, upper 95%, bin left, bin right
bins = [0 1; 1 2; 2 3];
out = zeros(size(bins,1),5);
% Cycle through bins
for ii = 1:size(bins,1)
% Store logical array of which elements fit in given bin
% You may want to include edge case for "greater than or equal to" leftmost bin.
% Alternatively you could make the left bin equal to "left bin - eps" = -eps
bin = data(:,2) > bins(ii,1) & data(:,2) <= bins(ii,2);
% Calculate mean, and mean +- 2*std deviation for confidence intervals
out(ii,1) = mean(data(bin,2));
out(ii,2) = out(ii,1) - 2*std(data(bin,2));
out(ii,3) = out(ii,1) + 2*std(data(bin,2));
end
% Append bins to the matrix
out(:,4:5) = bins;
Output:
out =
0.4667 -0.2357 1.1690 0 1.0000
1.6750 1.2315 2.1185 1.0000 2.0000
2.4667 2.1612 2.7722 2.0000 3.0000
I'm a new programmer, working in MATLAB, and I am hoping to substitute some values in one of my matrices with information from another matrix.
So, for example, supposing I have one matrix [A]:
A = [0 0 0 0 0
.5 0 .2 .8 0
1 .3 1 .1 .1
1 1 .4 1 1
1 1 1 1 1]
And another matrix B:
B = [.4 .3 .2 .1 .2]
I would like to replace the first nonzero value in A with the one in the same column in matrix B such that:
A_new = [0 0 0 0 0
.4 0 .2 .1 0
1 .3 1 .1 .2
1 1 .4 1 1
1 1 1 1 1]
There are some values between 0 and 1 that I want to keep untouched which precludes just changing everything between 0 and 1 (exclusive). I'm sure the solution will involve an if statement and possibly a for loop, but I'm not sure how to set it up. Any advice is appreciated!
Try this:
[~, row] = max(A~=0);
A(row + (0:size(A,1):numel(A)-1)) = B
How it works:
The first line produces a vector row containing the index of the first nonzero row in each column. This use of max is a "column-wise find" of sorts. Namely, max works down each column and tells you the row index of the maximum value in that column. Since each column only contains 0 or 1, there may be several maximizing (1) values; max gives the (index of) the first one.
In the second line that vector is transformed into a linear index to replace those elements of A.
Another way, though probably inefficient, is to use find and search for row and column locations that are non-zero. Once you find this, because MATLAB searches for non-zero entries column-wise, you can apply diff to the column indices and find transitions. The elements on the right-hand of the transition denote the first non-zero value for each column. Once you find these locations, get the corresponding row locations for these entries, create a new matrix A_new that is a copy of A, then change the corresponding entries to B. You'll need to use sub2ind for the assignment.
Something like this:
%// Define matrix
A = [0 0 0 0 0
.5 0 .2 .8 0
1 .3 1 .1 .1
1 1 .4 1 1
1 1 1 1 1];
%// Find non-zero entries
[row,col] = find(A);
%// Figure out the column locations that are the first non-zero for each column
ind = diff([Inf; col]) ~= 0;
%// Create a new matrix that is a copy of the old one
A_new = A;
%// Create the B vector
B = [.4 .3 .2 .1 .2];
%// Do the assignment
A_new(sub2ind(size(A), row(ind), (1:size(A,2)).')) = B;
Doing sub2ind is perhaps inefficient because there is a lot error checking done and that can slow things down. You can compute the linear indices manually by:
columns = size(A,2);
A_new((0:columns-1).'*columns + row(ind)) = B;
Running the above code, we get:
>> A
A =
0 0 0 0 0
0.5000 0 0.2000 0.8000 0
1.0000 0.3000 1.0000 0.1000 0.1000
1.0000 1.0000 0.4000 1.0000 1.0000
1.0000 1.0000 1.0000 1.0000 1.0000
>> A_new
A_new =
0 0 0 0 0
0.4000 0 0.2000 0.1000 0
1.0000 0.3000 1.0000 0.1000 0.2000
1.0000 1.0000 0.4000 1.0000 1.0000
1.0000 1.0000 1.0000 1.0000 1.0000
In Matlab, say I have the following matrix, which represents a population of 10 individuals:
pop = [0 0 0 0 0; 1 1 1 0 0; 1 1 1 1 1; 1 1 1 0 0; 0 0 0 0 0; 0 0 0 0 0; 1 0 0 0 0; 1 1 1 1 1; 0 0 0 0 0; 0 0 0 0 0];
Where rows of ones and zeros define 6 different 'types' of individuals.
a = [0 0 0 0 0];
b = [1 0 0 0 0];
c = [1 1 0 0 0];
d = [1 1 1 0 0];
e = [1 1 1 1 0];
f = [1 1 1 1 1];
I want to define the proportion/frequency of a, b, c, d, e and f in pop.
I want to end up with the following list:
a = 0.5;
b = 0.1;
c = 0;
d = 0.2;
e = 0;
f = 0.2;
One way I can think of is by summing the rows, then counting the number of times each appears, and then sorting and indexing
sum_pop = sum(pop')';
x = unique(sum_pop);
N = numel(x);
count = zeros(N,1);
for l = 1:N
count(l) = sum(sum_pop==x(l));
end
pop_frequency = [x(:) count/10];
But this doesn't quite get me what I want (i.e. when frequency = 0) and it seems there must be a faster way?
You can use pdist2 (Statistics Toolbox) to get all frequencies:
indiv = [a;b;c;d;e;f]; %// matrix with all individuals
result = mean(pdist2(pop, indiv)==0, 1);
This gives, in your example,
result =
0.5000 0.1000 0 0.2000 0 0.2000
Equivalently, you can use bsxfun to manually compute pdist2(pop, indiv)==0, as in Divakar's answer.
For the specific individuals in your example (that can be identified by the number of ones) you could also do
result = histc(sum(pop, 2), 0:size(pop,2)) / size(pop,1);
There is some functionality in unique that can be used for this. If
[q,w,e] = unique(pop,'rows');
q is the matrix of unique rows, w is the index of the row first appears in the matrix. The third element e contains indices of q so that pop = q(e,:). Armed with this, the rest of the problem should be straight forward. The probability of a value in e should be the probability that this row appears in pop.
The counting can be done with histc
histc(e,1:max(e))/length(e)
and the non occuring rows can be found with
ismember(a,q,'rows')
There is of course other ways as well, maybe (probably) faster ways, or oneliners. Why I post this is because it provides a way that is easy to understand, readable and that does not require any special toolboxes.
EDIT
This example gives expected output
a = [0,0,0,0,0;1,0,0,0,0;1,1,0,0,0;1,1,1,0,0;1,1,1,1,0;1,1,1,1,1]; % catenated a-f
[q,w,e] = unique(pop,'rows');
prob = histc(e,1:max(e))/length(e);
out = zeros(size(a,1),1);
out(ismember(a,q,'rows')) = prob;
Approach #1
With bsxfun -
A = cat(1,a,b,c,d,e,f)
out = squeeze(sum(all(bsxfun(#eq,pop,permute(A,[3 2 1])),2),1))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
Approach #2
If those elements are binary numbers, you can convert them into decimal format.
Thus, decimal format for pop becomes -
>> bi2de(pop)
ans =
0
7
31
7
0
0
1
31
0
0
And that of the concatenated array, A becomes -
>> bi2de(A)
ans =
0
1
3
7
15
31
Finally, you need to count the decimal formatted numbers from A in that of pop, which you can do with histc. Here's the code -
A = cat(1,a,b,c,d,e,f)
out = histc(bi2de(pop),bi2de(A))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
I think ismember is the most direct and general way to do this. If your groups were more complicated, this would be the way to go:
population = [0,0,0,0,0; 1,1,1,0,0; 1,1,1,1,1; 1,1,1,0,0; 0,0,0,0,0; 0,0,0,0,0; 1,0,0,0,0; 1,1,1,1,1; 0,0,0,0,0; 0,0,0,0,0];
groups = [0,0,0,0,0; 1,0,0,0,0; 1,1,0,0,0; 1,1,1,0,0; 1,1,1,1,0; 1,1,1,1,1];
[~, whichGroup] = ismember(population, groups, 'rows');
freqOfGroup = accumarray(whichGroup, 1)/size(groups, 1);
In your special case the groups can be represented by their sums, so if this generic solution is not fast enough, use the sum-histc simplification Luis used.
Hi guys i need your help, so i have an array
a b c n
1 1 2 4
1 3 2 6
1 6 0 7
and i want to create another array form each rows of my array, see picture below.
I tried using this code:
assuming that my data is located at array M so,
for x=1:10
d = M(:,4)/(M(:,1) + M(:,2) + M(:,3) + x)
end
but it doesn't give my desired output
in excel you just only write the equation and drag it down, in you will have the answer but i don't know how to do it in matlab, i think we could use for loop. thanks.
PLEASE SEE THE RED BOX THAT'S MY DESIRED OUTPUT
The equivalent in Matlab would be:
data = [...
1 1 2 4;
1 3 2 6;
1 6 0 7]
x = (1:10).';
f = #(t) data(t,4)./(data(t,1) + data(t,2) + data(t,3) + x )
y = [ x f(1) x f(2) x f(3) ]
or even simpler:
N = 10;
f = #(t) [(1:N).' data(t,4)./(data(t,1) + data(t,2) + data(t,3) + (1:N).' )]
y = [ f(1) f(2) f(3) ]
the number in f(...) always indicates which row, respectively which y e.g. y1, y2, etc. you are calculating for each column of the output. The brackets [...] are concatenating the result.
Be aware that you need to use the element-wise division operator ./
Generalized for an n x m sized input array, but assuming that the n-column is always the last one of your input Matrix:
N = 10;
f = #(t) [(1:N).' data(t,end)./(sum( data(t,(1:end-1))) + (1:N).' )]
y = cell2mat(arrayfun(f, 1:size(data,1),'uni',0))
But in this case you should think about, if a more vectorized approach like Divakar's answer might be more appropriate.
result:
y =
1 0.8 1 0.85714 1 0.875
2 0.66667 2 0.75 2 0.77778
3 0.57143 3 0.66667 3 0.7
4 0.5 4 0.6 4 0.63636
5 0.44444 5 0.54545 5 0.58333
6 0.4 6 0.5 6 0.53846
7 0.36364 7 0.46154 7 0.5
8 0.33333 8 0.42857 8 0.46667
9 0.30769 9 0.4 9 0.4375
10 0.28571 10 0.375 10 0.41176
Vectorized approach to get the desired output with another good case for bsxfun to have the desired output for a generic m x n sized input array -
N = 10; %// Number of rows in the output
[m,n] = size(M) %// Get size
sum_cols = sum(M(:,1:n-1),2) %// sum along dim-2 until the second last column
sum_firstN = bsxfun(#plus,sum_cols,1:N) %// For each column-sum, add 1:N
out1 = bsxfun(#ldivide,sum_firstN,M(:,n)).'%//'# elementwise divide by last col
out = [repmat([1:N]',1,n); out1] %//'# Concatenate with starting columns of 1:N
out = reshape(out,N,[]) %// Reshape into desired shape
Code run for given 3 x 4 sized input array -
out =
1.0000 0.8000 1.0000 0.8571 1.0000 0.8750
2.0000 0.6667 2.0000 0.7500 2.0000 0.7778
3.0000 0.5714 3.0000 0.6667 3.0000 0.7000
4.0000 0.5000 4.0000 0.6000 4.0000 0.6364
5.0000 0.4444 5.0000 0.5455 5.0000 0.5833
6.0000 0.4000 6.0000 0.5000 6.0000 0.5385
7.0000 0.3636 7.0000 0.4615 7.0000 0.5000
8.0000 0.3333 8.0000 0.4286 8.0000 0.4667
9.0000 0.3077 9.0000 0.4000 9.0000 0.4375
10.0000 0.2857 10.0000 0.3750 10.0000 0.4118
H = [1 1; 1 2; 2 -1; 2 0; -1 2; -2 1; -1 -1; -2 -2;]';
I need to threshold each value such that
H(I,j) = 0 if H(I,j) > =1,
else H(I,j) = 1 if H(I,j) <=0
I applied this code
a = H(1,1)
a(a<=0) = 1
a(a>=1) = 0
But this means that the already affected value in the first step may get changed again. What is the correct way of thresholding? The above code is giving incorrect answers. I should be getting
a = [0 0; 0 0; 0 1; 0 1; 1 0; 1 0; 1 1; 1 1]
Please help
EDIT
Based upon the answer now I am getting
0 0
0 0
1.0000 0.3443
0.8138 0.9919
0 0.7993
0.1386 1.0000
1.0000 1.0000
1.0000 1.0000
As can be seen, rows 3-6 are all incorrect. Please help
ind1 = H>=1; %// get indices before doing any change
ind2 = H<=0;
H(ind1) = 0; %// then do the changes
H(ind2) = 1;
If dealing with non-integer values, you should apply a certain tolerance in the comparisons:
tol = 1e-6; %// example tolerance
ind1 = H>=1-tol; %// get indices before doing any change
ind2 = H<=0+tol;
H(ind1) = 0; %// then do the changes
H(ind2) = 1;