in matlab put data into bins and calculate mean - arrays

In matlab, say I have the following data:
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];
I want to place the 1st column into bins according to elements in the 2nd column (i.e. 0-1, 1-2, 2-3...), and calculate the mean and 95% confidence interval of the elements in column 1 within each bin. So I'd have a matrix something like this:
mean lower_95% upper_95% bin
4.33 0
7.5 1
3.67 2

You can use accumarray with the appropriate function for the mean (mean) or the quantiles (quantile):
m = accumarray(floor(data(:,2))+1, data(:,1), [], @mean);
l = accumarray(floor(data(:,2))+1, data(:,1), [], @(x) quantile(x,.05));
u = accumarray(floor(data(:,2))+1, data(:,1), [], @(x) quantile(x,.95));
result = [m l u (0:numel(m)-1).'];
This can also be done calling accumarray once with cell array output:
result = accumarray(floor(data(:,2))+1, data(:,1), [],...
    @(x) {[mean(x) quantile(x,.05) quantile(x,.95)]});
result = cell2mat(result);
For your example data,
result =
4.3333 3.0000 6.0000 0
7.5000 2.0000 12.0000 1.0000
3.6667 1.0000 5.0000 2.0000
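The floor-based index above assumes unit-width bins starting at 0. As a minimal sketch for arbitrary bin edges, and assuming a release that provides discretize, the same accumarray call can be driven by explicit edges. The names edges, idx, binMean and binCount are illustrative only, not part of the original answer:

```matlab
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];
edges = 0:3;                          % assumed bin edges: [0,1), [1,2), [2,3]
idx = discretize(data(:,2), edges);   % bin index per row (NaN if outside the edges)
nBins = numel(edges) - 1;
% Mean of column 1 and number of samples in each bin
binMean  = accumarray(idx, data(:,1), [nBins 1], @mean);
binCount = accumarray(idx, 1, [nBins 1]);
result = [binMean binCount edges(1:end-1).'];
```

Rows whose column-2 value falls outside the edges get a NaN index, which accumarray rejects, so they would need to be filtered out first.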

This outputs a matrix with the labelled columns. Note that for your example data, 2 standard deviations from the mean (for the 95% confidence interval) gives values outside of the bands. With a larger (normally distributed) data set, you wouldn't see this.
Your data:
data = [4 0.1; 6 0.5; 3 0.8; 2 1.4; 7 1.6; 12 1.8; 9 1.9; 1 2.3; 5 2.5; 5 2.6];
Binning for output table:
% Initialise output matrix. Columns:
% Mean, lower 95%, upper 95%, bin left, bin right
bins = [0 1; 1 2; 2 3];
out = zeros(size(bins,1),5);
% Cycle through bins
for ii = 1:size(bins,1)
% Store logical array of which elements fit in given bin
% You may want to include edge case for "greater than or equal to" leftmost bin.
% Alternatively you could make the left bin equal to "left bin - eps" = -eps
bin = data(:,2) > bins(ii,1) & data(:,2) <= bins(ii,2);
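% Calculate mean, and mean +- 2*std deviation for confidence intervals.
% Note: this takes the statistics of column 2 (the binning variable); for the
% column-1 values asked about in the question, use data(bin,1) instead.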
out(ii,1) = mean(data(bin,2));
out(ii,2) = out(ii,1) - 2*std(data(bin,2));
out(ii,3) = out(ii,1) + 2*std(data(bin,2));
end
% Append bins to the matrix
out(:,4:5) = bins;
Output:
out =
0.4667 -0.2357 1.1690 0 1.0000
1.6750 1.2315 2.1185 1.0000 2.0000
2.4667 2.1612 2.7722 2.0000 3.0000

Related

Selecting elements from a vector based on condition on another vector

I want to know how to select those numbers which correspond (i.e. same position) to my pre-defined numbers.
For example, I have these vectors:
a = [ 1 0.1 2 3 0.1 0.5 4 0.1];
b = [100 200 300 400 500 600 700 800]
I need to select elements from b which correspond to the positions of the whole numbers in a (1, 2, 3 and 4), so the output must be:
output = [1 100
2 300
3 400
4 700]
How can this be done?
Create a logical index based on a, and apply it to both a and b to get the desired result:
ind = ~mod(a,1); % true for integer numbers
output = [a(ind); b(ind)].'; % build result
round(x) == x ----> x is a whole number
round(x) ~= x ----> x is not a whole number
round(2.4) = 2 ------> round(2.4) ~= 2.4 --> 2.4 is not a whole number
round(2) = 2 --------> round(2) == 2 ----> 2 is a whole number
Following same logic
a = [ 1 0.1 2 3 0.1 0.5 4 0.1];
b = [100 200 300 400 500 600 700 800];
iswhole = (round(a) == a);
output = [a(iswhole); b(iswhole)]
Result:
output =
1 2 3 4
100 300 400 700
We can also generate the logical index based on a using the fix() function:
ind = (a==fix(a));
output= [a(ind); b(ind)]'
Although the intent of the question is not entirely clear, the solution is to build a logical index into the vectors.
My solution is
checkint = @(x) ~isinf(x) & floor(x) == x; % It's very fast in a big array
[a(checkint(a))' b(checkint(a))']
The key is to create a logical vector that marks the integer values in a, and to use it to index both a and b. The function checkint does a good job of checking for integers.
Other ways to check for integers could be
checkint = @(x) double(uint64(x)) == x; % Slower, but it works fine for non-negative values
or
checkint = @(x) mod(x,1) == 0; % Slowest, but it's robust and better for understanding what's going on
or
checkint = @(x) ~mod(x,1); % Slowest, relies on 0 being treated as false
It's been discussed in many other threads.
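All of the checks above test for exact integers. If a comes out of floating-point arithmetic, values can miss an exact integer by a rounding error, so a tolerance-based variant may be safer. This is a minimal sketch; tol is an assumed value that has to be tuned to the data:

```matlab
a = [1 0.1 2 3 0.1 0.5 4 0.1];
b = [100 200 300 400 500 600 700 800];
tol = 1e-9;                           % assumed tolerance, not from the original answers
isWhole = abs(a - round(a)) < tol;    % true where a is within tol of an integer
output = [a(isWhole); b(isWhole)].';  % pair each whole number with its partner in b
```

For the example vectors this selects the same four positions as the exact checks.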

In matlab, find the frequency at which unique rows appear in a matrix

In Matlab, say I have the following matrix, which represents a population of 10 individuals:
pop = [0 0 0 0 0; 1 1 1 0 0; 1 1 1 1 1; 1 1 1 0 0; 0 0 0 0 0; 0 0 0 0 0; 1 0 0 0 0; 1 1 1 1 1; 0 0 0 0 0; 0 0 0 0 0];
Where rows of ones and zeros define 6 different 'types' of individuals.
a = [0 0 0 0 0];
b = [1 0 0 0 0];
c = [1 1 0 0 0];
d = [1 1 1 0 0];
e = [1 1 1 1 0];
f = [1 1 1 1 1];
I want to define the proportion/frequency of a, b, c, d, e and f in pop.
I want to end up with the following list:
a = 0.5;
b = 0.1;
c = 0;
d = 0.2;
e = 0;
f = 0.2;
One way I can think of is by summing the rows, then counting the number of times each appears, and then sorting and indexing
sum_pop = sum(pop')';
x = unique(sum_pop);
N = numel(x);
count = zeros(N,1);
for l = 1:N
count(l) = sum(sum_pop==x(l));
end
pop_frequency = [x(:) count/10];
But this doesn't quite get me what I want (i.e. when frequency = 0) and it seems there must be a faster way?
You can use pdist2 (Statistics Toolbox) to get all frequencies:
indiv = [a;b;c;d;e;f]; %// matrix with all individuals
result = mean(pdist2(pop, indiv)==0, 1);
This gives, in your example,
result =
0.5000 0.1000 0 0.2000 0 0.2000
Equivalently, you can use bsxfun to manually compute pdist2(pop, indiv)==0, as in Divakar's answer.
For the specific individuals in your example (that can be identified by the number of ones) you could also do
result = histc(sum(pop, 2), 0:size(pop,2)) / size(pop,1);
There is some functionality in unique that can be used for this. If
[q,w,e] = unique(pop,'rows');
q is the matrix of unique rows, and w holds, for each unique row, an index of where it occurs in pop. The third output e contains indices into q such that pop = q(e,:). Armed with this, the rest of the problem is straightforward: the relative frequency of a value in e is the frequency with which the corresponding row appears in pop.
The counting can be done with histc
histc(e,1:max(e))/length(e)
and the rows that do not occur can be found with
ismember(a,q,'rows')
There are of course other ways as well, probably faster ones, or one-liners. I post this because it is easy to understand, readable, and does not require any special toolboxes.
EDIT
This example gives expected output
a = [0,0,0,0,0;1,0,0,0,0;1,1,0,0,0;1,1,1,0,0;1,1,1,1,0;1,1,1,1,1]; % catenated a-f
[q,w,e] = unique(pop,'rows');
prob = histc(e,1:max(e))/length(e);
out = zeros(size(a,1),1);
out(ismember(a,q,'rows')) = prob;
Approach #1
With bsxfun -
A = cat(1,a,b,c,d,e,f)
out = squeeze(sum(all(bsxfun(@eq,pop,permute(A,[3 2 1])),2),1))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
Approach #2
If those elements are binary numbers, you can convert them into decimal format.
Thus, decimal format for pop becomes -
>> bi2de(pop)
ans =
0
7
31
7
0
0
1
31
0
0
And that of the concatenated array, A becomes -
>> bi2de(A)
ans =
0
1
3
7
15
31
Finally, you need to count the decimal formatted numbers from A in that of pop, which you can do with histc. Here's the code -
A = cat(1,a,b,c,d,e,f)
out = histc(bi2de(pop),bi2de(A))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
I think ismember is the most direct and general way to do this. If your groups were more complicated, this would be the way to go:
population = [0,0,0,0,0; 1,1,1,0,0; 1,1,1,1,1; 1,1,1,0,0; 0,0,0,0,0; 0,0,0,0,0; 1,0,0,0,0; 1,1,1,1,1; 0,0,0,0,0; 0,0,0,0,0];
groups = [0,0,0,0,0; 1,0,0,0,0; 1,1,0,0,0; 1,1,1,0,0; 1,1,1,1,0; 1,1,1,1,1];
[~, whichGroup] = ismember(population, groups, 'rows');
freqOfGroup = accumarray(whichGroup, 1, [size(groups, 1) 1])/size(population, 1);
In your special case the groups can be represented by their sums, so if this generic solution is not fast enough, use the sum-histc simplification Luis used.
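One caveat with the ismember approach: a population row that matches no group gets index 0, which accumarray does not accept. A minimal sketch that filters such rows, assuming unmatched rows should still count towards the population size (tf and counts are illustrative names):

```matlab
population = [0,0,0,0,0; 1,1,1,0,0; 1,1,1,1,1; 1,1,1,0,0; 0,0,0,0,0; ...
              0,0,0,0,0; 1,0,0,0,0; 1,1,1,1,1; 0,0,0,0,0; 0,0,0,0,0];
groups = [0,0,0,0,0; 1,0,0,0,0; 1,1,0,0,0; 1,1,1,0,0; 1,1,1,1,0; 1,1,1,1,1];
% tf marks population rows that match some group; whichGroup is the group index
[tf, whichGroup] = ismember(population, groups, 'rows');
% Count matches per group, forcing one output entry per group (zeros included)
counts = accumarray(whichGroup(tf), 1, [size(groups, 1) 1]);
freqOfGroup = counts / size(population, 1);
```

Passing the output size explicitly also keeps trailing groups with zero members in the result.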

how to create an arrays from rows using Matlab

Hi guys, I need your help. I have an array
a b c n
1 1 2 4
1 3 2 6
1 6 0 7
and I want to create another array from each row of my array (see picture below).
I tried using this code,
assuming that my data is located in the array M:
for x=1:10
d = M(:,4)/(M(:,1) + M(:,2) + M(:,3) + x)
end
but it doesn't give my desired output.
In Excel you only have to write the equation and drag it down to get the answer, but I don't know how to do it in Matlab. I think we could use a for loop. Thanks.
Please see the red box; that's my desired output.
The equivalent in Matlab would be:
data = [...
1 1 2 4;
1 3 2 6;
1 6 0 7]
x = (1:10).';
f = @(t) data(t,4)./(data(t,1) + data(t,2) + data(t,3) + x )
y = [ x f(1) x f(2) x f(3) ]
or even simpler:
N = 10;
f = @(t) [(1:N).' data(t,4)./(data(t,1) + data(t,2) + data(t,3) + (1:N).' )]
y = [ f(1) f(2) f(3) ]
The number in f(...) always indicates which row of data is used, i.e. which of y1, y2, etc. is calculated for that pair of output columns. The brackets [...] concatenate the results.
Be aware that you need to use the element-wise division operator ./
Generalized for an n x m sized input array, assuming that the n column is always the last column of your input matrix:
N = 10;
f = @(t) [(1:N).' data(t,end)./(sum( data(t,(1:end-1))) + (1:N).' )]
y = cell2mat(arrayfun(f, 1:size(data,1),'uni',0))
But in this case you should consider whether a more vectorized approach, like Divakar's answer, might be more appropriate.
result:
y =
1 0.8 1 0.85714 1 0.875
2 0.66667 2 0.75 2 0.77778
3 0.57143 3 0.66667 3 0.7
4 0.5 4 0.6 4 0.63636
5 0.44444 5 0.54545 5 0.58333
6 0.4 6 0.5 6 0.53846
7 0.36364 7 0.46154 7 0.5
8 0.33333 8 0.42857 8 0.46667
9 0.30769 9 0.4 9 0.4375
10 0.28571 10 0.375 10 0.41176
A vectorized approach, and another good use case for bsxfun, that produces the desired output for a generic m x n sized input array -
N = 10; %// Number of rows in the output
[m,n] = size(M) %// Get size
sum_cols = sum(M(:,1:n-1),2) %// Sum along dim-2 up to the second-last column
sum_firstN = bsxfun(@plus,sum_cols,1:N) %// For each row sum, add 1:N
out1 = bsxfun(@ldivide,sum_firstN,M(:,n)).' %// Elementwise divide the last column by the sums
out = [repmat([1:N]',1,m); out1] %// Stack m copies of 1:N (one per input row) on top
out = reshape(out,N,[]) %// Reshape into desired shape
Code run for given 3 x 4 sized input array -
out =
1.0000 0.8000 1.0000 0.8571 1.0000 0.8750
2.0000 0.6667 2.0000 0.7500 2.0000 0.7778
3.0000 0.5714 3.0000 0.6667 3.0000 0.7000
4.0000 0.5000 4.0000 0.6000 4.0000 0.6364
5.0000 0.4444 5.0000 0.5455 5.0000 0.5833
6.0000 0.4000 6.0000 0.5000 6.0000 0.5385
7.0000 0.3636 7.0000 0.4615 7.0000 0.5000
8.0000 0.3333 8.0000 0.4286 8.0000 0.4667
9.0000 0.3077 9.0000 0.4000 9.0000 0.4375
10.0000 0.2857 10.0000 0.3750 10.0000 0.4118
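In MATLAB R2016b and later, implicit expansion lets the arithmetic operators broadcast directly, so the bsxfun calls can be dropped. A minimal sketch of the same computation under that assumption (denom and vals are illustrative names):

```matlab
M = [1 1 2 4; 1 3 2 6; 1 6 0 7];
N = 10;                                   % number of rows in the output
m = size(M, 1);
% Row sums of all but the last column, plus 1:N, expanded to m-by-N
denom = sum(M(:, 1:end-1), 2) + (1:N);
% Divide the last column by each denominator, then transpose to N-by-m
vals = (M(:, end) ./ denom).';
% Interleave a 1:N column with each result column
out = reshape([repmat((1:N).', 1, m); vals], N, []);
```

The interleaving works because reshape reads the stacked 2N-by-m matrix column by column, so each 1:N block is followed by its matching result column.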

Multiply one part of Cell Array with a Scalar Matlab

I have a cell array that consists of a set of tracks like this:
<TL1x3> double
<TL1x3> double
<TL3x3> double
...
where TL stands for the track length. This value is different for each element, but there are always three columns: time, x coord, y coord.
From the tracking algorithm I get the x and y coord in pixels. However, I need them in nm, so I have to multiply them with a value, but only the second and third, not the first column of each element, e.g.:
0 5 6 x2 0 10 12
0.5 7 2 ---> 0.5 14 4
1 8 1 1 16 2
... ...
and this for every element of the array.
With cellfun, I have managed to change every cell of the array, but I don't know how to change only one part. Do you have any idea how to do this...?
You can do this by creating an anonymous function that calls bsxfun() and passing that to cellfun(). Assuming your input data is in the cell array inputData and the scale factor to apply is in the scalar variable scaleFactor:
scaledData = cellfun(@(X) bsxfun(@times, X, [1 scaleFactor scaleFactor]), inputData, 'UniformOutput', false);
I think this gives the results you want
Given sample input:
c={[1 2 3]; [4 5 6]; [7 8 9; 10 11 12; 13 14 15]};
Then:
xf = sparse([1 0 0; 0 2 0; 0 0 2]);
d=cellfun(@(x) x * xf, c, 'uniformoutput', false);
It might not be the most elegant nor efficient way, but converting your cell array to a matrix would simplify things for you:
A = {[0 5 6] ;
[0.5 7 2];
[1 8 1 ]}
B = cell2mat(A)
B(:,2:end) = 2*B(:,2:end)
Gives this in the command window:
A =
[1x3 double]
[1x3 double]
[1x3 double]
Before:
B =
0 5.0000 6.0000
0.5000 7.0000 2.0000
1.0000 8.0000 1.0000
After:
B =
0 10.0000 12.0000
0.5000 14.0000 4.0000
1.0000 16.0000 2.0000
You could also create a temporary cell array containing the last 2 columns of your original cell array, apply cellfun to it, and put the result back into the original. Is speed/performance an issue for you?
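A variant that keeps the cell structure and touches only columns 2 and 3 can be sketched by rebuilding each track inside the anonymous function. The sample tracks and the names tracks, scaleFactor and scaled are hypothetical:

```matlab
tracks = {[0 5 6; 0.5 7 2; 1 8 1]; [0 1 2]};   % hypothetical tracks: time, x, y
scaleFactor = 2;                               % assumed pixel-to-nm factor
% Keep column 1 (time) as is and scale columns 2:3 (x and y) in each cell
scaled = cellfun(@(X) [X(:,1), scaleFactor * X(:,2:3)], tracks, ...
                 'UniformOutput', false);
```

Unlike the cell2mat route, this preserves the boundaries between tracks of different lengths.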

First N values of a function with two inputs

I have a function with two integer inputs like this:
function f = func(n, m)
a = 2;
b = 1;
f = sqrt((n/a)^2 + (m/b)^2);
end
m and n are integers greater than or equal to zero. The first few values of f and the inputs at which they occur are as follows:
n ----- m ----- f
0 ----- 0 ----- 0
1 ----- 0 ----- 0.5
2 ----- 0 ----- 1
0 ----- 1 ----- 1
1 ----- 1 ----- 1.118
and so on. I want to get the first N values of f and their respective n and m. Is there an easy way to do that in matlab?
Code
%// Parameters
N = 5
a = 2;
b = 1;
%// Extents of n and m would be from 0 to N-1 to account for all possible
%// minimum values of f results resulting from their use
len1 = N-1
%// Create n and m for maximum possible combinations scenario, but save
%// them as n1 and m1 for now, as the final ones would be chopped versions
%// of them.
[n1,m1] = ndgrid(0:len1,0:len1)
%// Get corresponding f values, but store as f1, for the same chopping reason
f1 = sqrt((n1(:)./a).^2 + (m1(:)./b).^2);
%// Sort f1 so that the smallest N values from it can be chosen and also
%// get the selected row indices based on the sorting as row1
[f1,row1] = sort(f1)
%// Choose n and m based on the sorted indices and also chop off at N.
%// Use these n and m values to finally get f
n = n1(row1(1:N))
m = m1(row1(1:N))
f = f1(1:N)
Output
With N = 5, you would get -
n =
0
1
2
0
1
m =
0
0
0
1
1
f =
0
0.5000
1.0000
1.0000
1.1180
With N = 9, you would get -
n =
0
1
2
0
1
2
3
3
4
m =
0
0
0
1
1
1
0
1
0
f =
0
0.5000
1.0000
1.0000
1.1180
1.4142
1.5000
1.8028
2.0000
meshgrid and arrayfun can be used to generate an array of outputs for ranges of inputs as such
Code
nValues = 0:2
mValues = 0:3
[ii,jj] = meshgrid(mValues,nValues)
output = arrayfun(@func,ii,jj)
The two value vectors can be modified to take the range(s) of values required
Output
output =
0 0.5000 1.0000 1.5000
1.0000 1.1180 1.4142 1.8028
2.0000 2.0616 2.2361 2.5000
To give a result like the matrix in the question the following can be used (thanks @Divakar)
[jj(:),ii(:),arrayfun(@func,jj(:),ii(:))]
ans =
0 0 0
1.0000 0 0.5000
2.0000 0 1.0000
0 1.0000 1.0000
1.0000 1.0000 1.1180
2.0000 1.0000 1.4142
0 2.0000 2.0000
1.0000 2.0000 2.0616
2.0000 2.0000 2.2361
0 3.0000 3.0000
1.0000 3.0000 3.0414
2.0000 3.0000 3.1623
Something (probably rather inefficient) like this?
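N = 100; % number of steps to take
i = 0;
n = 0;
m = 0;
nout = n;
mout = m;
fout = func(n,m); % func is the function from the question
while i ~= N
    fn = func(n+1,m); % candidate: step in n
    fm = func(n,m+1); % candidate: step in m
    if fn > fm
        m = m + 1;
        fout = [fout fm];
    else
        n = n + 1;
        fout = [fout fn];
    end
    nout = [nout n];
    mout = [mout m];
    i = i + 1;
end
% Note: this walks a single path through the (n,m) grid, so it can skip pairs
% such as (0,1) and does not, in general, return the N smallest values.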
