Arrange rows according to common values shared between the rows of an array in MATLAB - arrays

I have an array of 800 rows and 3 columns. I want to arrange it so that each row shares at least 2 values with the row before it: the second row has at least 2 values in common with the first, the third has at least 2 values in common with the second, and so on.
Example: row 1 = 2 4 5; a row with common values is row 30 = 2 5 13
So the first arrangement is:
2 4 5
2 5 13
The next row must have values in common with the new row 2.
Example: row 4 = 13 45 5
The arrangement is therefore now:
2 4 5
2 5 13
13 45 5
etc
Currently, I have code that groups the rows with common values together in cells and then displays them one after another. The problem is that a row can have more than one match. For example, row 1 could have 2 rows that share values with it; this code brings all of those rows together into one array and then does the same for the second row. How can I make the code do what I explained in the first paragraph?
Here is the code:
% Data
A = connections;
% Engine
[m, n] = size(A);
groups =[];
ng = 0;
for k=1:m-1
u = unique(A(k,:)); % representation of kth row
[in, J] = ismember(A(k:end,:),u);
l = m-k+1;
r = repmat((1:l).', n, 1);
c = accumarray([r(in) J(in)],1,[l n]); % count
c = bsxfun(@min,c,c(1,:)); % clip
rows = sum(c,2)>=2; % check if at least 2 elements are common
rows(1) = false;
if any(rows)
ng = ng+1;
rows(1) = true;
groups = (k-1) + find(rows);
end
end
Please note that the rest of the code below can only be used if groups is a cell array, but it has been changed to a normal array as written above. So to test the code, there is no need to add the code below.
% Remove the tail
groups(ng+1:end) = [];
% Display
for k=1:1:length(groups)
gk = groups{k};
for r = gk'
fprintf('%d %d %d %d\n', r, A(r,:));
end
end

This is the Hamiltonian path problem. A Hamiltonian path (or traceable path) is a path that visits each vertex exactly once. Here you have 800 vertices (rows), each with one or more neighbors. Two vertices are neighbors if the corresponding rows have 2 or more values in common.
To solve the problem, you should create an adjacency matrix corresponding to the graph structure and then find a Hamiltonian path on the graph.
Here is a way to create the adjacency matrix:
[m, n] = size(A);
% generate all 2-element subsets of the column set [1 2 3] to check whether
% a row has two elements in common with another row
x = nchoosek(1:n,2);
% generate indices for building the two-element rows
col = repmat(x,m,1);
M = (1:m).';
row = kron(M,ones(size(x)));
IDX = sub2ind([m n],row,col);
% convert the [m, 3] matrix A to a [(m * 3), 2] matrix of pairs,
% then sort each row so that it is comparable with the other rows
B = sort(A(IDX),2);
% find the rows of B that contain the same pair of elements
[~,I,II] = unique(B,'rows','first');
% map each row of the [(m * 3), 2] matrix back to its row in the [m, 3] matrix
% (row(:,1) is used because integer division rounds to nearest in MATLAB)
rowOfB = row(:,1);
% create the adjacency matrix: each row is linked to the first row that
% contains the same pair
adjacency = sparse(rowOfB(I(II)), rowOfB, 1, m, m);
% make the matrix symmetric
adjacency = adjacency | adjacency.';
% a vertex cannot be a neighbor of itself
adjacency(logical(speye(m))) = false;
You should implement your own version of a Hamiltonian path algorithm or find one on the web.
However, the Hamiltonian path problem can be converted to a special form of the travelling salesman problem by adding a vertex to the graph and making all other vertices neighbors of it.
So some changes should be applied to the adjacency matrix:
adjacency = [adjacency; ones(1,m)];
adjacency = [adjacency, ones(m+1,1)];
adjacency(end) = 0;
On the MathWorks site you can find an example that shows how to solve the travelling salesman problem using binary integer programming. However, the method provided doesn't seem straightforward, so you may want to find another one or implement your own.
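As a rough illustration of the graph view described above, the following sketch builds the adjacency matrix with a plain double loop and then walks the graph greedily. The toy array A and the variable names are assumptions for illustration. The greedy walk is only a heuristic: it is not a Hamiltonian path solver and may stop before every row has been visited.

```matlab
% Toy data (hypothetical); replace with the real 800-by-3 array.
A = [2 4 5; 7 8 9; 2 5 13; 13 45 5; 9 8 1];
m = size(A, 1);

% adjacency(i,j) is true when rows i and j share at least 2 distinct values.
adjacency = false(m);
for i = 1:m-1
    for j = i+1:m
        shared = numel(intersect(A(i,:), A(j,:)));
        adjacency(i,j) = shared >= 2;
        adjacency(j,i) = adjacency(i,j);
    end
end

% Greedy walk (heuristic): from the current row, move to the first
% unvisited neighbor and stop when there is none left.
order = zeros(1, m);
order(1) = 1;
visited = false(1, m);
visited(1) = true;
len = 1;
while true
    next = find(adjacency(order(len), :) & ~visited, 1);
    if isempty(next)
        break
    end
    len = len + 1;
    order(len) = next;
    visited(next) = true;
end
order = order(1:len);
disp(A(order, :))
```

On this toy input the walk visits rows 1, 3 and 4 and then stops, because rows 2 and 5 are only connected to each other. A backtracking search or the travelling salesman formulation would be needed to decide whether a full path exists.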

Related

Efficient way to generate histogram from very large dataset in MATLAB?

I have two 2D arrays of size up to 35,000*35,000 each: indices and dotPs. From this, I want to create two 1D arrays such that pop contains the number of times each number appears in indices and nn contains the sum of elements in dotPs that correspond to those numbers. I have come up with the following (really dumb) way:
dotPs = [81.4285 9.2648 46.3184 5.7974 4.5016 2.6779 16.0092 41.1426;
9.2648 24.3525 11.4308 14.6598 17.9558 23.4246 19.4837 14.1173;
46.3184 11.4308 92.9264 9.2036 2.9957 0.1164 26.5770 26.0243;
5.7974 14.6598 9.2036 34.9984 16.2352 19.4568 31.8712 5.0732;
4.5016 17.9558 2.9957 16.2352 19.6595 16.0678 3.5750 16.7702;
2.6779 23.4246 0.1164 19.4568 16.0678 25.1084 6.6237 15.6188;
16.0092 19.4837 26.5770 31.8712 3.5750 6.6237 61.6045 16.6102;
41.1426 14.1173 26.0243 5.0732 16.7702 15.6188 16.6102 47.3289];
indices = [3 2 1 1 2 1 2 1;
2 2 1 2 2 1 2 2;
1 1 3 3 2 2 2 2;
1 2 3 4 3 3 4 2;
2 2 2 3 3 1 3 2;
1 1 2 3 1 8 2 2;
2 2 2 4 3 2 4 2;
1 2 2 2 2 2 2 2];
nn = zeros(1,8);
pop = zeros(1,8);
uniqueInd = unique(indices);
for k=1:numel(uniqueInd)
j = uniqueInd(k);
[I,J]=find(indices==j);
if j == 0 || numel(I) == 0
continue
end
pop(j) = pop(j) + numel(I);
nn(j) = nn(j) + sum(sum(dotPs(I,J)));
end
Because of the find function, this is very slow. How can I do this more smartly so that it runs in a few seconds rather than several minutes?
Edit: added small dummy matrices for testing the code.
Both tasks can be done with the accumarray function:
pop = accumarray(indices(:), 1, [max(indices(:)) 1]).';
nn = accumarray(indices(:), dotPs(:), [max(indices(:)) 1]).';
This assumes that indices only contains positive integers.
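To make the grouping behaviour of accumarray concrete, here is a minimal sketch on a tiny input. The 2-by-2 matrices are made up for illustration and the loop is only there as a cross-check.

```matlab
% Tiny hypothetical inputs.
indices = [1 2; 2 3];
dotPs   = [10 20; 30 40];

numBins = max(indices(:));
pop = accumarray(indices(:), 1,        [numBins 1]).';  % count per index value
nn  = accumarray(indices(:), dotPs(:), [numBins 1]).';  % sum of dotPs per index value

% Cross-check with a straightforward loop over the index values.
popLoop = zeros(1, numBins);
nnLoop  = zeros(1, numBins);
for j = 1:numBins
    sel = (indices == j);
    popLoop(j) = nnz(sel);
    nnLoop(j)  = sum(dotPs(sel));
end
disp([pop; popLoop; nn; nnLoop])
```

Here the value 2 appears twice (paired with 30 and 20), so pop is [1 2 1] and nn is [10 50 40].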
EDIT:
From comments, only the lower part of the indices matrix without the diagonal should be used, and it is guaranteed to contain positive integers. In that case:
mask = tril(true(size(indices)), -1);
indices_masked = indices(mask);
dotPs_masked = dotPs(mask);
pop = accumarray(indices_masked, 1, [max(indices_masked) 1]).';
nn = accumarray(indices_masked, dotPs_masked, [max(indices_masked) 1]).';
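The effect of the mask can be seen on a small example. This is only a sketch with made-up 3-by-3 inputs. It shows that tril(..., -1) keeps the three entries strictly below the diagonal, in column-major order.

```matlab
% Hypothetical 3-by-3 inputs.
indices = [1 2 3; 2 1 3; 3 3 1];
dotPs   = [1 2 3; 4 5 6; 7 8 9];

mask = tril(true(size(indices)), -1);   % true strictly below the diagonal
indices_masked = indices(mask);         % entries (2,1), (3,1), (3,2)
dotPs_masked   = dotPs(mask);

pop = accumarray(indices_masked, 1,            [max(indices_masked) 1]).';
nn  = accumarray(indices_masked, dotPs_masked, [max(indices_masked) 1]).';
disp(pop)
disp(nn)
```

Index value 1 does not occur below the diagonal here, so its count and sum are both 0.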
First of all, note that the dimension of indices does not matter (e.g. if both indices and dotPs were 1D arrays or 3D arrays the result will be the same).
pop can be calculated with the histcounts function, but since you also need the sum of the corresponding elements of the dotPs array, the problem becomes harder.
Here is a possible solution with a for loop. The advantage of this solution is that it does not call find inside the loop, so it should be faster:
%Example input
indices=randi(5,3,3);
dotPs=rand(3,3);
%Solution
[C,ia,ic]=unique(indices);
nn=zeros(size(C));
pop=zeros(size(C));
for i=1:numel(indices)
pop(ic(i))=pop(ic(i))+1;
nn(ic(i))=nn(ic(i))+dotPs(i);
end
This solution uses a vector ic to categorize each of the input values. After that, I go through each element and update nn(ic) and pop(ic).
For computing pop, you can use hist. For computing nn, I couldn't find a smart solution (but I found a solution that avoids find):
pop = hist(indices(:), max(indices(:)));
nn = zeros(1, max(indices(:)));
uniqueInd = unique(indices);
for k=1:numel(uniqueInd)
j = uniqueInd(k);
nn(j) = sum(dotPs(indices == j));
end
There must be a better solution for computing nn.
I found a smarter solution that uses sorting.
I am not sure it's faster, because sorting 35,000*35,000 elements might take a long time.
Sort indices, just to get the permutation that sorts dotPs by indices.
Sort dotPs according to the permutation returned by the previous sort.
cumsumPop = cumulative sum of pop (the cumulative sum of the histogram of indices).
cumsumPs = cumulative sum of the sorted dotPs.
The values of cumsumPop can now be used as indices into cumsumPs.
Because cumsumPs is a cumulative sum, we need to use diff to get the per-index sums.
Here is the "smart" solution:
pop = hist(indices(:), max(indices(:)));
[sortedIndices, I] = sort(indices(:));
sortedDotPs = dotPs(I);
cumsumPop = cumsum(pop);
cumsumPs = cumsum(sortedDotPs);
nn = diff([0; cumsumPs(cumsumPop)]);
nn = nn';
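The sort-and-cumsum steps above can be traced on a small input. This is a sketch only: the inputs are made up, and accumarray is swapped in for hist to obtain the counts so that the example does not depend on how hist places its bins. It also assumes that every index value from 1 to the maximum occurs at least once, because a leading zero count would produce an invalid index of 0.

```matlab
% Hypothetical inputs; every value 1..3 occurs at least once.
indices = [3 1 2; 1 2 2];
dotPs   = [1 2 3; 4 5 6];

pop = accumarray(indices(:), 1).';       % counts per index value (stand-in for hist)
[~, I] = sort(indices(:));               % permutation that sorts the indices
sortedDotPs = dotPs(I);                  % dotPs reordered to match (column vector)
cumsumPop = cumsum(pop);                 % position of the last element of each group
cumsumPs  = cumsum(sortedDotPs);         % running total of the sorted values
nn = diff([0; cumsumPs(cumsumPop)]).';   % group sums = differences of running totals

% Reference result for comparison.
nnRef = accumarray(indices(:), dotPs(:)).';
disp([nn; nnRef])
```

For these inputs the groups are {4, 2}, {5, 3, 6} and {1}, so nn comes out as [6 14 1].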

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in function's input.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
in this example n=3 but can take any positive integer value.
Depending whether n is odd or even or length(v) is odd or even, I get sometimes right answers but sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers is equal to the exponential of the sum of their logarithms:
$\prod_{i} x_i = \exp\left(\sum_{i} \log x_i\right)$
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
Update
Inspired by the nicely thought-out answer of Dev-iL comes this handy solution, which does not require MATLAB R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication into a sum, so that a moving sum can be used, which in turn can be realised by convolution.
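A quick way to convince yourself that the log/convolution trick gives the same numbers as a direct product is to compare both on small vectors. The sketch below does that. The second test vector and the tolerance are assumptions chosen for illustration, and inputs containing zeros are deliberately left out.

```matlab
n = 3;
a1 = [1 2 2 1 3 1];          % vector from the question
a2 = [1 -2 2 -1 3 0.5];      % hypothetical vector with negative entries

for a = {a1, a2}
    v = a{1};
    % Moving product via logs: the log of a negative number is complex,
    % so the (tiny) imaginary part is dropped with real().
    viaLogs = real(exp(conv(log(v), ones(1, n), 'valid')));

    % Direct moving product for comparison.
    direct = zeros(1, numel(v) - n + 1);
    for ii = 1:numel(direct)
        direct(ii) = prod(v(ii:ii+n-1));
    end

    % Results agree up to floating-point rounding.
    assert(max(abs(viaLogs - direct)) < 1e-9);
    disp(direct)
end
```

For the question's vector the direct products are [4 4 6 3]. The log-based values match only up to rounding, which is why the answer above notes that outputs may require rounding.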
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
More memory efficient alternative in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in @thewaywewalk's answer.
I think the problem may be in your indexing. The line for ii = 1:(length(v)-2) does not give the correct range for ii.
Try this:
function out = max_product(in, n)
% returns the product of every n consecutive elements of the vector in
numWindows = length(in) - n + 1;
out = zeros(numWindows, 1); % one product per window, as a column vector
for i = 1:numWindows
out(i) = prod(in(i:i+n-1));
end
end
Your code works when restated like so:
p = zeros(1, length(v)-(n-1)); % one product per window
for ii = 1:(length(v)-(n-1))
p(ii) = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.
Using bsxfun, you can create an index matrix in which each row contains the indices of n consecutive elements, and then take prod along the 2nd dimension of the indexed matrix. I think this is the most efficient way:
max_product = @(v, n) prod(v(bsxfun(@plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
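To see what the bsxfun call produces, it helps to look at the intermediate index matrix. The sketch below splits the one-liner into steps; the variable names idx and windows are introduced here for illustration only.

```matlab
v = [1 2 2 1 3 1];
n = 3;

% Row r of idx holds the positions r, r+1, ..., r+n-1 of one window.
idx = bsxfun(@plus, 1:n, (0:numel(v)-n).');
% idx =
%      1 2 3
%      2 3 4
%      3 4 5
%      4 5 6

windows = v(idx);          % same shape as idx, filled with the values of v
p = prod(windows, 2).';    % product of each row, returned as a row vector
disp(p)
```

The index matrix has numel(v)-n+1 rows and n columns, so memory use grows with both the vector length and the window size.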
Update:
Some of the other solutions have since been updated, and some, such as @Dev-iL's answer, outperform the others. I can also suggest fftconv, which in Octave outperforms conv.
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.

Matlab: Help in implementing quantized time series

I am having trouble implementing this code because the variable s_k is logical 0/1. How can I implement this statement?
s_k is a random sequence of 0/1 generated using rand() and quantizing the output of rand() by its mean, as given below. After this, I don't know how to proceed. Please help.
N =1000;
input = rand(N,1); % uniform samples in (0,1), whose mean is 0.5
s = (input>=0.5); %converting into logical 0/1;
UPDATE
N = 3;
tmax = 5;
y(1) = 0.1;
for i =1 : tmax+N-1 %// Change here
y(i+1) = 4*y(i)*(1-y(i)); %nonlinear model for generating the input to Autoregressive model
end
s = (y>=0.5);
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
x = sum(s(ind+1).*(2.^(-ind+N+1))); % The output of this conversion should be real numbers
% Autoregressive model of order 1
z(1) =0;
for j =2 : N
z(j) = 0.195 *z(j-1) + x(j);
end
You've generated the random logical sequence, which is great. You also need to know N, which is the total number of points to collect at one time, as well as a list of time values t. Because this is a discrete summation, I'm going to assume the values of t are discrete. What you need to do first is generate a sliding window matrix. Each column of this matrix represents a set of time values for each value of t for the output. This can easily be achieved with bsxfun. Assuming a maximum time of tmax, a starting time of 0 and a neighbourhood size N (like in your equation), we can do:
ind = bsxfun(@plus, (0:tmax), (0:N-1).');
For example, assuming tmax = 5 and N = 3, we get:
ind =
0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
Each column represents a time that we want to calculate the output at and every row in a column shows a list of time values we want to calculate for the desired output.
Finally, to calculate the output x, you simply take your s_k vector, make it a column vector, use ind to index into it, do a point-by-point multiplication with 2^(-k+N+1) by substituting k with what we got from ind, and sum down each column (one column per output time). So:
s = rand(max(ind(:))+1, 1) >= 0.5;
x = sum(s(ind+1).*(2.^(-ind+N+1)));
The first statement generates a random vector that is as long as the maximum time value that we have. Once we have this, we use ind to index into this random vector so that we can generate a sliding window of logical values. We need to offset this by 1 as MATLAB starts indexing at 1.
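The sliding-window computation can be checked against an explicit double loop. The sketch below uses a fixed 0/1 sequence instead of rand (an assumption, so that the numbers are reproducible) and passes the dimension to sum explicitly so that it still sums down the columns if N is 1.

```matlab
% Fixed toy sequence s_0..s_7 (hypothetical, used instead of rand).
s = logical([1 0 1 1 0 0 1 0]).';
N = 3;
tmax = 5;

% N-by-(tmax+1) matrix of time indices k; column t+1 holds t, t+1, ..., t+N-1.
ind = bsxfun(@plus, (0:tmax), (0:N-1).');

% Vectorised form: weight each s_k by 2^(-k+N+1) and sum down each column.
x = sum(s(ind+1) .* (2.^(-ind+N+1)), 1);

% Explicit double loop for comparison.
xLoop = zeros(1, tmax+1);
for t = 0:tmax
    for k = t:t+N-1
        xLoop(t+1) = xLoop(t+1) + s(k+1) * 2^(-k+N+1);
    end
end
disp([x; xLoop])
```

Both rows of the displayed matrix are [20 6 6 2 0.25 0.25], so the output is real-valued even though s itself is logical.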

Select same elements matrix in R

I would like to select 5 elements, not 25, as fast as possible. The loop below takes a long time to run on large vectors:
a = c(1,2,5,2,3)
b = c(2,4,1,4,5)
d = matrix(1:25,nrow=5,ncol=5)
result = array(NA,dim=length(a))
for (i in 1:length(a)) { result[i] = d[a[i],b[i]] }
OR (slower)
result<-sapply(1:length(a), function(x) d[a[x],b[x]] )
Just use matrix indexing:
d[cbind(a, b)]
# [1] 6 17 5 17 23
For more details, see ?Extract, where you will find the following lines:
A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector.
There are also a few examples in the "Examples" section at the same help page.

Is there a better/faster way of randomly shuffling a matrix in MATLAB?

In MATLAB, I am using the shake.m function (http://www.mathworks.com/matlabcentral/fileexchange/10067-shake) to randomly shuffle each column. For example:
a = [1 2 3; 4 5 6; 7 8 9]
a =
1 2 3
4 5 6
7 8 9
b = shake(a)
b =
7 8 6
1 5 9
4 2 3
This function does exactly what I want, however my columns are very long (>10,000,000) and so this takes a long time to run. Does anyone know of a faster way of achieving this? I have tried shaking each column vector separately but this isn't faster. Thanks!
You can use randperm like this, but I don't know if it will be any faster than shake:
[m,n]=size(a)
for c = 1:n
a(randperm(m),c) = a(:,c);
end
Or you can try switch the randperm around to see which is faster (should produce the same result):
[m,n]=size(a)
for c = 1:n
a(:,c) = a(randperm(m),c);
end
Otherwise how many rows do you have? If you have far fewer rows than columns, it's possible that we can assume each permutation will be repeated, so what about something like this:
[m,n]=size(a)
cols = randperm(n);
k = 5; %//This is a parameter you'll need to tweak...
set_size = max(1, floor(n/k)); %// at least one column per set
for set = 1:set_size:n
set_cols = cols(set:min(set+set_size-1, n)); %// the last set may be shorter
a(:,set_cols) = a(randperm(m), set_cols);
end
which would massively reduce the number of calls to randperm. Breaking it up into k equal sized sets might not be optimal though, you might want to add some randomness to that as well. The basic idea here though is that there will only be factorial(m) different orderings, and if m is much smaller than n (e.g. m=5, n=100000 like your data), then these orderings will be repeated naturally. So instead of letting that occur by itself, rather manage the process and reduce the calls to randperm which would be producing the same result anyway.
Here's a simple vectorized approach. Note that it creates an auxiliary matrix (ind) the same size as a, so depending on your memory it may be usable or not.
[~, ind] = sort(rand(size(a))); %// create a random sorting for each column
b = a(bsxfun(@plus, ind, 0:size(a,1):numel(a)-1)); %// convert to linear index
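A small sanity check for the vectorized shuffle: every column of the result should contain exactly the same values as the corresponding column of the input, only reordered. The sketch below verifies this on a made-up 4-by-3 matrix whose columns are already sorted, which makes the comparison a one-liner.

```matlab
% Hypothetical input whose columns are sorted in ascending order.
a = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
[m, n] = size(a);

[~, ind] = sort(rand(m, n));          % an independent random row order per column
offsets = 0:m:(m*n - 1);              % linear-index offset of each column
b = a(bsxfun(@plus, ind, offsets));   % pick rows ind(:,c) from column c

% Sorting each column of b must give back a, because shuffling within a
% column never moves a value to another column.
assert(isequal(sort(b), a));
disp(b)
```

The auxiliary matrices rand(m, n) and ind are each as large as a, which is the memory cost mentioned above.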
Obtain shuffled indices using randperm:
idx = randperm(size(a,1));
Use such indices to shuffle each column separately:
[m, n] = size(a);
b = zeros(m, n);
for i=1:n
b(:,i) = a(randperm(m), i);
end
Look at this answer: Matlab: How to random shuffle columns of matrix
Here's a no-loop approach as it processes all indices at once and I believe this is as random as one could get given the requirements of shuffling among each column only.
Code
%// Get sizes
[m,n] = size(a);
%// Create an array of randomly placed sequential indices from 1 to numel(a)
rand_idx = randperm(m*n);
%// segregate those indices into rows and cols for the size of input data, a
col = ceil(rand_idx/m);
row = rem(rand_idx,m);
row(row==0)=m;
%// Sort both these row and col indices based on col, such that we have col
%// as 1,1,1,1 ...2,2,2,....3,3,3,3 and so on, which would represent per col
%// indices for the input data. Use these indices to linearly index into a
[scol,ind1] = sort(col);
a(1:m*n) = a((scol-1)*m + row(ind1))
Final output is obtained in a itself.
