MATLAB: Defining n subsets of a matrix - arrays

I have a 1974x1 vector, Upper, and I am trying to break the information up into individual arrays of 36 items each. So, I used length to find that there are 1974 items and then divided by 36 and used the floor function. I cannot figure out how to do it all with n.
Here is my logic: I am defining n in an attempt to find the number of subsets that need to be defined. Then, I am trying to have subsetn become subset1, subset2,...,subset36. However, MATLAB only defines the matrix subsetn as a single 1x36 matrix, and this matrix contains what subset1 is supposed to contain (1...36). Do you guys have any advice for a newbie? What am I doing wrong?
binSize = 36;
nData = length(Upper);
nBins = floor(nData/36);
nDiscarded = nData - binSize*nBins;
n=1:binSize;
subsetn= [(n-1)*binSize+1:n*binSize];

You can create a 36x54 array (binSize rows, nBins columns) where the nth column is your nth subset.
subsetArray=reshape(Upper(1:binSize*nBins),[],nBins);
You can access the nth subset as subsetArray(:,n)
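As a minimal sketch of the reshape approach, assuming a stand-in column vector in place of the real Upper data (the values 1..1974 are made up so the result is easy to inspect):

```matlab
% Stand-in data: the real Upper is not shown, so use 1..1974 for illustration
Upper = (1:1974).';
binSize = 36;
nBins = floor(numel(Upper)/binSize);        % 54 complete bins
nDiscarded = numel(Upper) - binSize*nBins;  % 30 trailing items are dropped

% Each column of subsetArray holds one block of binSize consecutive items
subsetArray = reshape(Upper(1:binSize*nBins), binSize, nBins);

subset1 = subsetArray(:,1);   % items 1..36
subset2 = subsetArray(:,2);   % items 37..72
disp(size(subsetArray))       % 36 54
```

Keeping all subsets in one array avoids creating 54 separately named variables; a loop over n can simply read subsetArray(:,n).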

Sorry in advance if I misunderstood what you want to do.
I think the following little trick might do what you want (it's hacky, but I'm no Matlab expert):
[a, b] = meshgrid(0:nBins-1, 0:binSize-1)
inds = a*binSize + b + 1
Now inds is a binSize-by-nBins matrix of indices. You can index Upper with it like
Upper(inds)
which should give you the subsets as the columns in the resulting matrix.
Edit: on seeing Yoda's answer, his is better ;)

Related

Using hist in Matlab to compute occurrences

I am using hist to compute the number of occurrences of values in a matrix in Matlab.
I think I am using it wrong because it gives me completely weird results. Could you help me to understand what is going on?
When I run this piece of code I get countsB as desired
rng default;
B=randi([0,3],10,1);
idxB=unique(B);
countsB=(hist(B,idxB))';
i.e.
B=[3;3;0;3;2;0;1;2;3;3];
idxB=[0;1;2;3];
countsB=[2;1;2;5];
When I run this other piece of code I get wrong results for countsA
A=ones(524288,1)*3418;
idxA=unique(A);
countsA=(hist(A,idxA))';
i.e.
idxA=3418;
countsA=[zeros(1709,1); 524288; zeros(1708,1)];
What am I doing wrong?
To add to the other answers: you can replace hist by the explicit sum:
idxA = unique(A);
countsA = sum(bsxfun(@eq, A(:), idxA(:).'), 1);
idxA is a scalar, and hist interprets a scalar second argument as the number of bins.
Setting idxA to a vector instead, e.g. [0,3418], will give you a histogram with bins centered at 0 and 3418, similar to what you got with idxB, which was also a vector.
I think it has to do with:
N = HIST(Y,M), where M is a scalar, uses M bins.
and I think you are assuming it would do:
N = HIST(Y,X), where X is a vector, returns the distribution of Y
among bins with centers specified by X.
In other words, in the first case MATLAB assumes that you are asking for 3418 bins.
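One way to sidestep the scalar-versus-vector ambiguity altogether is to count via the index output of unique together with accumarray. This is a sketch of a swapped-in technique rather than something the answers above propose, and the variable names j and jB are illustrative:

```matlab
A = ones(524288,1)*3418;

% j(k) is the position of A(k) within the sorted unique values idxA
[idxA, ~, j] = unique(A);

% Add 1 into slot j(k) for every element: one count per unique value
countsA = accumarray(j(:), 1);

% Same idea on the small example from the question
B = [3;3;0;3;2;0;1;2;3;3];
[idxB, ~, jB] = unique(B);
countsB = accumarray(jB(:), 1);   % [2;1;2;5]
```

Because the counts are indexed by position in idxA rather than by bin centers, a single unique value yields a single count instead of thousands of mostly empty bins.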

Conditional Sum in Array

I have 2 arrays, A and B. I want to form a new array C with same dimension as B where each element will show SUM(A) for A > B
Below is my working code
A = [1:1:1000]
B=[1:1:100]
for n = 1:numel(B)
C(n) = sum(A(A>B(n)));
end
However, when A has millions of rows and B has thousands, and I have to do similar calculations for 20 pairs of arrays, it takes an insane amount of time.
Is there any faster way?
For example, histcounts is pretty fast, but it counts, rather than summing.
Thanks
Depending on the size of your arrays (and your memory limitations), the following code might be slightly faster:
C = A*bsxfun(@gt,A',B);
Although it's vectorized, it seems to be bottlenecked (perhaps) by memory allocation. I'm looking to see if I can get a further speedup. Depending on the input vector size, I've seen up to a factor of 2 speedup for large vectors.
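To make the one-liner easier to follow, here is a small worked sketch on made-up inputs (A = 1:5 and B = [2 4] are chosen only so the mask can be read by eye), checked against the loop from the question:

```matlab
A = 1:5;        % small row vectors, for illustration only
B = [2 4];

% mask(i,j) is true when A(i) > B(j); its size is numel(A)-by-numel(B)
mask = bsxfun(@gt, A', B);

% Row vector times mask: for each column j, sum the elements of A exceeding B(j)
C = A * mask;               % [12 5]

% Reference loop from the question
Cloop = zeros(1, numel(B));
for n = 1:numel(B)
    Cloop(n) = sum(A(A > B(n)));
end
```

The mask has numel(A)*numel(B) entries, so with millions of elements in A and thousands in B it runs to billions of entries, which is consistent with the memory bottleneck suspected above.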
Here's a method that is a bit quicker, but I'm sure there is a better way to solve this problem.
a=sort(A); %// If A and B are already sorted then this isn't necessary!
b=sort(B);
c(numel(B))=0; %// Initialise c
s=cumsum(a,2,'reverse'); %// Get the partial sums of a
for n=1:numel(B)
%// Pull out the sum for elements in a larger than b(n)
k=find(a>b(n),1,'first');
if ~isempty(k), c(n)=s(k); end %// c(n) stays 0 if no element of a exceeds b(n)
end
According to some very rough tests, this seems to run a bit better than twice as fast as the original method.
You had the right idea with histcounts, as you are basically "accumulating" certain elements of A based on binning. This binning can be done with histc. The solution below starts off with steps similar to those in @David's answer and then uses histc to bin and sum selected elements from A to get the desired output, all in a vectorized manner. Here's the implementation:
%// Sort A and B and also get sorted B indices
sA = sort(A);
[sB,sortedB_idx] = sort(B);
[~,bin] = histc(sB,sA); %// Bin sorted B onto sorted A
C_out = zeros(1,numel(B)); %// Setup output array
%// Take care of the case when all elements in A are greater than all elements in B
if sA(1) > sB(end)
C_out(:) = sum(A);
end
%// Only do further processing if at least one element in B falls within the range of A
if any(bin)
csA = cumsum(sA,'reverse'); %// Reverse cumsum on sorted A
%// Get sum(A(A>B(n))) for every n, but for sorted versions
valid_mask = cummax(bin) - bin ==0;
valid_mask2 = bin(valid_mask)+1 <= numel(A);
valid_mask(1:numel(valid_mask2)) = valid_mask2;
C_out(valid_mask) = csA(bin(valid_mask)+1);
%// Rearrange C_out to get back in original unsorted version
[~,idx] = sort(sortedB_idx);
C_out = C_out(idx);
end
Also, please remember when comparing the result from this method with the one from the original for-loop version that there will be slight variations in the output. This vectorized solution uses cumsum, which computes a running summation, so large cumulatively summed numbers get added to individual elements that are comparatively very small, whereas the for-loop version sums only the selected elements. So, floating-point precision issues will come up there.

Minimum Complexity of two lists element summation comparison

I have a question in algorithm design about arrays, which should be implemented in C.
Suppose we have an array of n elements. For simplicity, n is a power of 2 (1, 2, 4, 8, 16, etc.). I want to split it into two parts of n/2 elements each, such that the absolute difference between the sums of the two parts is as small as possible. For example, the array (9,2,5,3,6,1,4,7) can be split into (9,5,1,3) and (6,7,4,2). The sum of the first part is 18 and the sum of the second is 19, so the difference is 1, and these two arrays are the answer. A split like (9,5,4,2) and (7,6,3,1) is not the answer, because its difference is 3 and we have already found 1, so 3 isn't the minimum difference. How do I solve this?
Thank you.
This is the Partition Problem, which is unfortunately NP-Hard.
However, since your numbers are integers, if they are relatively small, there is a pseudo-polynomial O(W*n^2) solution using Dynamic Programming (where W is the sum of all elements).
The idea is to create the DP matrix of size (W/2+1)*(n+1)*(n/2+1), based on the following recursive formula:
D(0,i,0) = true
D(0,i,k) = false    if k != 0
D(x,i,k) = false    if x < 0
D(x,0,k) = false    if x > 0
D(x,i,0) = false    if x > 0
D(x,i,k) = D(x,i-1,k) OR D(x-arr[i], i-1, k-1)    otherwise
The above gives a 3d matrix, where each entry D(x,i,k) says if there is a subset containing exactly k elements, that sums to x, and uses the first i elements as candidates.
Once you have this matrix, you just need to find the highest x (not exceeding W/2) such that D(x,n,n/2) = true.
Later, you can get the relevant subset by going back on the table and "retracing" your choices at each step. This thread deals with how it is done on a very similar problem.
For small sets, there is also the alternative of a naive brute-force solution, which basically tries every way of splitting the array into two halves (there are n!/((n/2)!*(n/2)!) ways to choose the first half) and picks the best one out of them.
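The recurrence above can be sketched in MATLAB as follows. This is an illustrative sketch, not the answerer's code: it assumes positive integers and an even n, and it folds the i dimension away by updating a 2-D table in place. The names reach, half, best and minDiff are made up for the example.

```matlab
arr = [9 2 5 3 6 1 4 7];   % example array from the question
n = numel(arr);            % assumed even
W = sum(arr);
half = floor(W/2);

% reach(x+1,k+1) is true if some k of the elements processed so far sum to x
% (the +1 offsets are there because MATLAB indices start at 1)
reach = false(half+1, n/2+1);
reach(1,1) = true;         % the empty subset: 0 elements, sum 0

for i = 1:n
    v = arr(i);
    % Walk k downwards so arr(i) is added to a subset at most once
    for k = min(i, n/2):-1:1
        % D(x,i,k) = D(x,i-1,k) OR D(x-arr(i),i-1,k-1), for all x >= v at once
        reach(v+1:end, k+1) = reach(v+1:end, k+1) | reach(1:end-v, k);
    end
end

% Largest sum not exceeding W/2 that exactly n/2 elements can reach
best = find(reach(:, n/2+1), 1, 'last') - 1;
minDiff = W - 2*best;      % 1 for this example (sums 18 and 19)
```

Dropping the i dimension saves memory but also discards the information needed to retrace which elements were chosen; recovering the actual subset requires keeping the full 3-D table or recording the choices separately.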

Cross products of elements of 3D array and matrix columns without loop in R

I'm working on a fishery stock assessment model and want to speed it up by removing a loop (actually two loops of the same form).
I have an array, A, dim(A)=[L,L,Y], and a matrix, M, dim(M)=[L,Y].
These are used to make a matrix, mat, dim(mat)=[L,Y], by calculating matrix products. My loop looks like:
for(i in 1:Y){
mat[,i]<-(A[,,i]%*%M[,i])[,1]}
Can anyone help me out? I really need a speed gain.
Also, (don't know if it'll make a difference but) each A[,,i] matrix is lower triangular.
I'm pretty sure this will give you the results you want. Since there is no reproducible example, I can't be absolutely sure. Had to trace some of the linear algebra logic to see what you are trying to accomplish.
library(plyr) # We need this to split the array into a list of 9 matrices
B = lapply(alply(A, 3), function(x) (x%*%M)) # Perform 9 linear algebra multiplications
sapply(1:9, function(i) (B[[i]])[,i]) # Extract the 9 columns you actually want.
I used the following test data:
A = array(rnorm(225), dim = c(5,5,9))
M = matrix(rnorm(45), nrow = 5, ncol = 9)

How can I combine these two arrays into a matrix?

In MATLAB, if I define 2 matrices like:
A = [1:10];
B = [1:11];
How do I make matrix C with column 1 equal to A and column 2 equal to B? I cannot find any answers online. Sorry if I used the wrong MATLAB terminology for this scenario.
Well, to accomplish this you first need to make sure that A and B are the same length. In your example, A has 10 elements and B has 11, so that won't work.
However, assuming A and B have the same number of elements, this will do the trick:
C = [A(:) B(:)];
This first reshapes A and B into column vectors using single-colon indexing, then concatenates them horizontally.
If A and B are row vectors of the same length, you can just type
C=[A' B']
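A short sketch comparing the two forms, assuming B is shortened to ten elements (11:20 is a made-up replacement for the 1:11 in the question) so that the lengths match:

```matlab
A = 1:10;
B = 11:20;          % same length as A, unlike the 1:11 in the question

C1 = [A(:) B(:)];   % (:) yields a column whatever the input orientation
C2 = [A' B'];       % transposing works here because A and B are row vectors

disp(size(C1))      % 10 2
```

The (:) form is the more forgiving of the two: it still produces a 10x2 result if A or B happens to be a column vector already, whereas the transposed form would then try to stack rows.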
