Variation of indices of an array or matrix - arrays

If I use this syntax:
mX=[1:5];
A=rand(5,1);
C(mX)=sum(A(1:mX));
Why doesn't the content of C(mX) vary with varying mX?
Instead of doing
C(1)=A(1)
C(2)=A(1)+A(2), etc
it does:
C(1)=A(1)
C(2)=A(1)
C(3)=A(1), etc
Is there any way to vary C(mX) without resorting to a loop?

To answer your first question:
mX=1:5;
A=rand(5,1);
C(mX)=sum(A(1:mX));
sums over A(1:[1 2 3 4 5]). When an operand of the colon operator is a vector, MATLAB uses only its first element, so the range collapses to A(1:1) and every C(mX) is filled with the single element A(1).
What you want is a cumulative sum, which, as @leanderMoesinger mentioned, can be done with cumsum:
A=rand(5,1);
C = cumsum(A)
C =
0.0975
0.3760
0.9229
1.8804
2.8453
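To see both behaviours side by side, here is a minimal sketch that uses fixed values in place of rand(5,1) so the numbers are reproducible. The names C_wrong and C_loop are only illustrative, and 1:mX(1) is written out explicitly to mimic what 1:mX evaluates to.

```matlab
A  = [2; 4; 6; 8; 10];   % fixed stand-in for rand(5,1)
mX = 1:5;

% What the original line effectively computes: the colon range uses only
% mX(1), so the right-hand side is one scalar that is copied into every C(mX).
C_wrong = zeros(1, 5);
C_wrong(mX) = sum(A(1:mX(1)));

% Explicit loop producing the intended running totals.
C_loop = zeros(5, 1);
for k = mX
    C_loop(k) = sum(A(1:k));
end

% cumsum gives the same running totals without a loop.
C = cumsum(A);
```

Here C_wrong holds A(1) in every position, while C and C_loop agree element by element.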
If you want to learn more about indexing I can highly recommend the following post: Linear indexing, logical indexing, and all that
If you do not want all elements of A but, for example, only those up to element three, you can do:
mX = 1:3;
A = rand(5,1);
C = cumsum(A(mX)); % calculate only up to mX
mX = [1 3 5];
C = cumsum(A(mX)) % Also works if you only want elements 1 3 and 5 to appear
% If you want elements of C 1 3 and 5 use
tmp = cumsum(A);
C = tmp(mX);
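The two variants with mX = [1 3 5] give different results, which is easy to miss. A small sketch with fixed values (the names C_subset and C_picked are made up for this illustration):

```matlab
A  = [2; 4; 6; 8; 10];   % fixed stand-in for rand(5,1)
mX = [1 3 5];

% Running totals over A(1), A(3), A(5) only: 2, 2+6, 2+6+10.
C_subset = cumsum(A(mX));

% Entries 1, 3 and 5 of the running total over all of A: 2, 2+4+6, 2+4+6+8+10.
tmp = cumsum(A);
C_picked = tmp(mX);
```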

You can do this with cumsum like so:
mX=[1:5];
A=rand(5,1);
C = cumsum(A(mX));

Related

Efficient way to generate histogram from very large dataset in MATLAB?

I have two 2D arrays of size up to 35,000*35,000 each: indices and dotPs. From this, I want to create two 1D arrays such that pop contains the number of times each number appears in indices and nn contains the sum of elements in dotPs that correspond to those numbers. I have come up with the following (really dumb) way:
dotPs = [81.4285 9.2648 46.3184 5.7974 4.5016 2.6779 16.0092 41.1426;
9.2648 24.3525 11.4308 14.6598 17.9558 23.4246 19.4837 14.1173;
46.3184 11.4308 92.9264 9.2036 2.9957 0.1164 26.5770 26.0243;
5.7974 14.6598 9.2036 34.9984 16.2352 19.4568 31.8712 5.0732;
4.5016 17.9558 2.9957 16.2352 19.6595 16.0678 3.5750 16.7702;
2.6779 23.4246 0.1164 19.4568 16.0678 25.1084 6.6237 15.6188;
16.0092 19.4837 26.5770 31.8712 3.5750 6.6237 61.6045 16.6102;
41.1426 14.1173 26.0243 5.0732 16.7702 15.6188 16.6102 47.3289];
indices = [3 2 1 1 2 1 2 1;
2 2 1 2 2 1 2 2;
1 1 3 3 2 2 2 2;
1 2 3 4 3 3 4 2;
2 2 2 3 3 1 3 2;
1 1 2 3 1 8 2 2;
2 2 2 4 3 2 4 2;
1 2 2 2 2 2 2 2];
nn = zeros(1,8);
pop = zeros(1,8);
uniqueInd = unique(indices);
for k=1:numel(uniqueInd)
j = uniqueInd(k);
[I,J]=find(indices==j);
if j == 0 || numel(I) == 0
continue
end
pop(j) = pop(j) + numel(I);
nn(j) = nn(j) + sum(sum(dotPs(I,J)));
end
Because of the find function, this is very slow. How can I do this more smartly so that it runs in a few seconds rather than several minutes?
Edit: added small dummy matrices for testing the code.
Both tasks can be done with the accumarray function:
pop = accumarray(indices(:), 1, [max(indices(:)) 1]).';
nn = accumarray(indices(:), dotPs(:), [max(indices(:)) 1]).';
This assumes that indices only contains positive integers.
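As a quick sanity check, here is a sketch on a tiny hand-made input (the 2-by-2 matrices below are invented for illustration, not taken from the question), where the expected counts and sums can be read off directly:

```matlab
indices = [1 2; 2 3];      % value 1 once, value 2 twice, value 3 once
dotPs   = [10 20; 30 40];

n   = max(indices(:));
pop = accumarray(indices(:), 1,        [n 1]).';  % occurrences of each value
nn  = accumarray(indices(:), dotPs(:), [n 1]).';  % sum of dotPs per value
```

Value 2 appears at positions (2,1) and (1,2), so its sum is 30 + 20 = 50.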
EDIT:
From comments, only the lower part of the indices matrix without the diagonal should be used, and it is guaranteed to contain positive integers. In that case:
mask = tril(true(size(indices)), -1);
indices_masked = indices(mask);
dotPs_masked = dotPs(mask);
pop = accumarray(indices_masked, 1, [max(indices_masked) 1]).';
nn = accumarray(indices_masked, dotPs_masked, [max(indices_masked) 1]).';
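A small sketch of the masked variant on an invented 3-by-3 input, to show which entries survive the mask:

```matlab
indices = [1 2 3; 2 1 3; 3 3 1];
dotPs   = [1 2 3; 4 5 6; 7 8 9];

% Strictly lower triangle: positions (2,1), (3,1) and (3,2).
mask = tril(true(size(indices)), -1);
indices_masked = indices(mask);   % [2; 3; 3]
dotPs_masked   = dotPs(mask);     % [4; 7; 8]

n   = max(indices_masked);
pop = accumarray(indices_masked, 1,            [n 1]).';
nn  = accumarray(indices_masked, dotPs_masked, [n 1]).';
```

Value 1 does not occur below the diagonal here, so its count and sum are both 0.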
First of all, note that the dimension of indices does not matter (e.g. if both indices and dotPs were 1D arrays or 3D arrays the result will be the same).
pop can be calculated with the histcounts function, but since you also need the sum of the corresponding elements of the dotPs array, the problem becomes harder.
Here is a possible solution with a for loop. Its advantage is that it does not call find inside the loop, so it should be faster:
%Example input
indices=randi(5,3,3);
dotPs=rand(3,3);
%Solution
[C,ia,ic]=unique(indices);
nn=zeros(size(C));
pop=zeros(size(C));
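for i=1:numel(indices)
pop(ic(i))=pop(ic(i))+1; % count occurrences
nn(ic(i))=nn(ic(i))+dotPs(i); % accumulate the matching dotPs values
end
This solution uses the vector ic to categorize each of the input values. After that, I go through each element and update pop(ic(i)) and nn(ic(i)). Note that entry k of each result corresponds to the value C(k), not to the value k itself.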
For computing pop, you can use hist, for computing nn, I couldn't find a smart solution (but I found a solution without using find):
pop = hist(indices(:), max(indices(:)));
nn = zeros(1,8);
uniqueInd = unique(indices);
for k=1:numel(uniqueInd)
j = uniqueInd(k);
nn(j) = sum(dotPs(indices == j));
end
There must be a better solution for computing nn.
I found a smarter solution that uses sorting.
I am not sure it is faster, because sorting 35,000*35,000 elements might take a long time.
Sort indices, only to get the permutation needed to sort dotPs by indices.
Sort dotPs according to the permutation returned by the previous sort.
cumsumPop: compute the cumulative sum of pop (the cumulative sum of the histogram of indices).
cumsumPs: compute the cumulative sum of the sorted dotPs.
Now the values of cumsumPop can be used as indices into cumsumPs.
Because cumsumPs is a cumulative sum, we need to apply diff to get the solution.
Here is the "smart" solution:
pop = hist(indices(:), max(indices(:)));
[sortedIndices, I] = sort(indices(:));
sortedDotPs = dotPs(I);
cumsumPop = cumsum(pop);
cumsumPs = cumsum(sortedDotPs);
nn = diff([0; cumsumPs(cumsumPop)]);
nn = nn';
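To check the sort-and-diff idea on something small, here is a sketch with an invented 2-by-3 input. It swaps hist for accumarray when computing pop (my substitution, only to keep the example independent of binning details) and assumes the value 1 occurs at least once, since otherwise the first index into cumsumPs would be 0.

```matlab
indices = [3 1 2; 1 2 2];
dotPs   = [1 2 3; 4 5 6];

pop = accumarray(indices(:), 1).';      % counts per value

[~, I]      = sort(indices(:));         % permutation that groups equal values
sortedDotPs = dotPs(I);                 % dotPs reordered into those groups
cumsumPs    = cumsum(sortedDotPs);

ends = cumsum(pop(:));                  % last position of each group
nn   = diff([0; cumsumPs(ends)]).';     % per-group sums from the running total

% Reference result computed directly.
nn_check = accumarray(indices(:), dotPs(:)).';
```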

Count values in a vector less than each one of the elements in another vector

I have two vectors r and d and I'd like to know the number of times r<d(i) where i=1:length(d).
r=rand(1,1E7);
d=linspace(0,1,10);
So far I've got the following, but it's not very elegant:
for i=1:length(d)
sum(r<d(i))
end
This is an example in R but I'm not really sure this would work for matlab:
Finding number of elements in one vector that are less than an element in another vector
You can use singleton expansion with bsxfun: faster, more elegant than the loop, but also more memory-intensive:
result = sum(bsxfun(@lt, r(:), d(:).'), 1);
In recent Matlab versions bsxfun can be dropped thanks to implicit singleton expansion:
result = sum(r(:)<d(:).', 1);
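A small sketch comparing the singleton-expansion version against the original loop on invented values. On the memory point: for r with 1E7 elements and d with 10, the intermediate logical matrix has 1E8 entries, roughly 100 MB at one byte per logical.

```matlab
r = [0.1 0.4 0.4 0.9];
d = [0 0.5 1];

% One column per element of d; each column flags the r values below it.
result = sum(bsxfun(@lt, r(:), d(:).'), 1);

% Loop version from the question, storing the counts instead of printing them.
loop_result = zeros(1, numel(d));
for i = 1:numel(d)
    loop_result(i) = sum(r < d(i));
end
```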
An alternative approach is to use the histcounts function with the 'cumcount' option:
result = histcounts(r(:), [-inf; d(:); inf], 'Normalization', 'cumcount');
result = result(1:end-1);
You can build, in one step with bsxfun, a matrix that flags which values of r are smaller than each value of d, and then sum the flags:
flag=bsxfun(@lt,r',d);
result=sum(flag,1);
For each element in d, count how many times this element is bigger than the elements in r, which is equivalent to your problem.
r=rand(1,10);
d=linspace(0,1,10);
result = sum(d>r(:))
Output:
result =
0 0 1 2 7 8 8 8 9 10

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in function's input.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
in this example n=3 but can take any positive integer value.
Depending whether n is odd or even or length(v) is odd or even, I get sometimes right answers but sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers is equal to the exponential of the sum of their logarithms: $\prod_{i=1}^{n} x_i = \exp\left(\sum_{i=1}^{n} \log x_i\right)$.
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
Update
Inspired by the well-thought-out answer of Dev-iL comes this handy solution, which does not require Matlab R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication into a sum, so that a moving sum can be used, which in turn can be realised by a convolution with a vector of ones.
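A short sketch of the log/conv approach on the example from the question, plus an invented input with negative entries. Because the result goes through log and exp, it is only accurate up to floating-point error, so integer inputs may need rounding.

```matlab
a = [1 2 2 1 3 1];
n = 3;

% Sum of logs over each window of n elements, then back through exp.
out = real(exp(conv(log(a), ones(1, n), 'valid')));
out_rounded = round(out);          % only appropriate for integer-valued data

% Negative entries make log complex; real() drops the tiny imaginary residue.
b = [-1 2 -3 4];
out_b = real(exp(conv(log(b), ones(1, 2), 'valid')));
```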
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
More memory efficient alternative in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in @thewaywewalk's answer.
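A minimal sketch of the corrected loop, written as a plain script rather than as the max_product function from the question. The variable numWindows is my own name, and each product is stored in p(ii) so that earlier results are kept.

```matlab
v = [1 2 2 1 3 1];
n = 3;

numWindows = length(v) - n + 1;    % number of full windows of n elements
p = zeros(1, numWindows);
for ii = 1:numWindows
    p(ii) = prod(v(ii:ii+n-1));    % last window ends exactly at length(v)
end
```

With the 23-element vector and n = 7 from the question, this bound gives 17 windows and the last one covers elements 17 to 23, so the index never exceeds the vector length.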
I think the problem is in your indexing. The loop header for ii = 1:(length(v)-2) hard-codes n = 3, so it does not give the correct range of ii for other window sizes.
Try this:
function out = max_product(in,n)
w = n-1; % offset from the first to the last element of a window
out = zeros(length(in)-w,1); % one product per full window, as a column vector
for i = 1:length(in)-w
out(i) = prod(in(i:i+w));
end
end
Your code works when restated like so (storing each product in p(ii) so that earlier results are not overwritten):
for ii = 1:(length(v)-(n-1))
p(ii) = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.
Using bsxfun, you can create a matrix in which each row contains n consecutive elements (3 in the example), then take the product along the second dimension of that matrix. I think this is the most efficient way:
max_product = @(v, n) prod(v(bsxfun(@plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
Update:
Some of the other solutions have since been updated, and some, such as @Dev-iL's answer, outperform the others. I can also suggest fftconv, which in Octave outperforms conv.
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.

Reordering a vector in Matlab?

I have a vector in Matlab B of dimension nx1 that contains the integers from 1 to n in a certain order, e.g. n=6 B=(2;4;5;1;6;3).
I have a vector A of dimension mx1 with m>1 that contains the same integers in ascending order each one repeated an arbitrary number of times, e.g. m=13 A=(1;1;1;2;3;3;3;4;5;5;5;5;6).
I want to get C of dimension mx1 in which the integers in A are reordered following the order in B. In the example, C=(2;4;5;5;5;5;1;1;1;6;3;3;3)
One approach with ismember and sort -
[~,idx] = ismember(A,B)
[~,sorted_idx] = sort(idx)
C = B(idx(sorted_idx))
If you are into one-liners, then another with bsxfun -
C = B(nonzeros(bsxfun(@times,bsxfun(@eq,A,B.'),1:numel(B))))
This requires just one sort and indexing:
ind = 1:numel(B);
ind(B) = ind;
C = B(sort(ind(A)));
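The trick in the three lines above is that ind(B) = ind builds the inverse permutation of B, so ind(v) is the position of the value v inside B. A sketch on the example from the question:

```matlab
B = [2;4;5;1;6;3];
A = [1;1;1;2;3;3;3;4;5;5;5;5;6];

ind = 1:numel(B);
ind(B) = ind;            % inverse permutation: ind(v) = position of v in B

% Map each element of A to its position in B, sort those positions,
% then map the sorted positions back to values.
C = B(sort(ind(A)));
```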
Another approach using repelem, accumarray, unique
B=[2;4;5;1;6;3];
A=[1;1;1;2;3;3;3;4;5;5;5;5;6];
counts = accumarray(A,A)./unique(A);
C = repelem(B,counts(B));
%// or as suggested by Divakar
%// counts = accumarray(A,1);
%// C = repelem(B,counts(B));
PS: repelem was introduced in R2015a. If you are using a prior version, refer here
Another solution using hist, but with a loop and expanding memory :(
y = hist(A, max(A))
reps = y(B);
C = [];
for nn = 1:numel(reps)
C = [C; repmat(B(nn), reps(nn), 1)];
end

Is there a better/faster way of randomly shuffling a matrix in MATLAB?

In MATLAB, I am using the shake.m function (http://www.mathworks.com/matlabcentral/fileexchange/10067-shake) to randomly shuffle each column. For example:
a = [1 2 3; 4 5 6; 7 8 9]
a =
1 2 3
4 5 6
7 8 9
b = shake(a)
b =
7 8 6
1 5 9
4 2 3
This function does exactly what I want, however my columns are very long (>10,000,000) and so this takes a long time to run. Does anyone know of a faster way of achieving this? I have tried shaking each column vector separately but this isn't faster. Thanks!
You can use randperm like this, but I don't know if it will be any faster than shake:
[m,n]=size(a)
for c = 1:n
a(randperm(m),c) = a(:,c);
end
Or you can try switching the randperm to the right-hand side to see which is faster (both forms shuffle each column equally well):
[m,n]=size(a)
for c = 1:n
a(:,c) = a(randperm(m),c);
end
Otherwise how many rows do you have? If you have far fewer rows than columns, it's possible that we can assume each permutation will be repeated, so what about something like this:
[m,n]=size(a)
cols = randperm(n);
k = 5; %//This is a parameter you'll need to tweak...
set_size = floor(n/k);
for set = 1:set_size:n
set_cols = cols(set:min(set+set_size-1, n)); %// clamp the last set so it never runs past column n
a(:,set_cols) = a(randperm(m), set_cols);
end
which would massively reduce the number of calls to randperm. Breaking it up into k equal sized sets might not be optimal though, you might want to add some randomness to that as well. The basic idea here though is that there will only be factorial(m) different orderings, and if m is much smaller than n (e.g. m=5, n=100000 like your data), then these orderings will be repeated naturally. So instead of letting that occur by itself, rather manage the process and reduce the calls to randperm which would be producing the same result anyway.
Here's a simple vectorized approach. Note that it creates an auxiliary matrix (ind) the same size as a, so depending on your memory it may be usable or not.
[~, ind] = sort(rand(size(a))); %// create a random sorting for each column
b = a(bsxfun(@plus, ind, 0:size(a,1):numel(a)-1)); %// convert to linear index
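A small sketch that applies this to the 3-by-3 example from the question and checks the one property that must hold whatever the random draw: every column of b contains exactly the elements of the matching column of a.

```matlab
a = [1 2 3; 4 5 6; 7 8 9];

% Sorting random numbers column by column yields an independent random
% permutation of the row numbers 1..m for each column.
[~, ind] = sort(rand(size(a)));

% Add the column offsets 0, m, 2m, ... to turn row numbers into linear indices.
b = a(bsxfun(@plus, ind, 0:size(a,1):numel(a)-1));

% Each column keeps its own elements, only in a different order.
same_columns = isequal(sort(b, 1), sort(a, 1));
```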
Obtain shuffled row indices using randperm:
idx = randperm(size(a,1));
Use a fresh set of such indices to shuffle each column:
[m,n] = size(a);
b = zeros(m,n);
for i=1:n
b(:,i) = a(randperm(m),i);
end
Look at this answer: Matlab: How to random shuffle columns of matrix
Here's a no-loop approach as it processes all indices at once and I believe this is as random as one could get given the requirements of shuffling among each column only.
Code
%// Get sizes
[m,n] = size(a);
%// Create an array of randomly placed sequential indices from 1 to numel(a)
rand_idx = randperm(m*n);
%// segregate those indices into rows and cols for the size of input data, a
col = ceil(rand_idx/m);
row = rem(rand_idx,m);
row(row==0)=m;
%// Sort both these row and col indices based on col, such that we have col
%// as 1,1,1,1 ...2,2,2,....3,3,3,3 and so on, which would represent per col
%// indices for the input data. Use these indices to linearly index into a
[scol,ind1] = sort(col);
a(1:m*n) = a((scol-1)*m + row(ind1))
Final output is obtained in a itself.
