Efficient technique to interleave data sets by classes in MATLAB - arrays

The data set is in the following format: an input sample matrix X and an output class vector Y, such that each row of X is a sample and each column corresponds to a feature. Each entry of Y is the output class of the corresponding sample in X. X can contain real numbers, while Y contains positive integers.
My aim is to order the data set by class, interleaving the classes. For example,
X =        Y =
1 8 3      2
4 2 6      1
7 8 9      2
2 3 4      3
1 4 6      1
should be ordered and interleaved as
X =        Y =
4 2 6      1
1 8 3      2
2 3 4      3
1 4 6      1
7 8 9      2
The code I've attempted seems to take a long time to run as it is based on serial execution. It is the following.
X = csvread('X.csv');
Y = csvread('Y.csv');
n = size(unique(Y),1);
m = size(X,1);
for i = 1:n
Dataset(i).X = X(Y==i,:);
Dataset(i).Y = Y(Y==i);
end
[num, ~] = hist(Y,n);
maxfreq = max(num);
NewX = [];
NewY = [];
for j = 1:maxfreq
for i = 1:n
if(j <= size(Dataset(i).X,1))
NewX = [NewX; Dataset(i).X(j,:)];
NewY = [NewY; i];
end
end
end
X = NewX;
Y = NewY;
clear NewX;
clear NewY;
csvwrite('OrderedX.csv', X);
csvwrite('OrderedY.csv', Y);
Is it possible to parallelize the above code?

You're resizing the matrices on every iteration, which is expensive. A quick speedup for your algorithm would be to preallocate NewX and NewY at their final size and just copy the data in:
NewX = zeros(size(X));
NewY = zeros(size(Y));
k = 1;
for j = 1:maxfreq
for i = 1:n
if(j <= size(Dataset(i).X,1))
NewX(k,:) = Dataset(i).X(j,:);
NewY(k) = i;
k=k+1;
end
end
end

Approach #1 Using cumsum and diff following the same philosophy as the one listed in this solution -
function [outX,outY] = interleave_cumsum_diff(X,Y)
Y = Y(:);
[R,C] = find(bsxfun(@eq,Y,unique(Y).'));
lens = accumarray(C,1);
out = ones(1,numel(R));
shifts = cumsum(lens(1:end-1));
out(shifts+1) = 1- diff([0 ; shifts]);
[~,idx] = sort(cumsum(out));
sort_idx = R(idx)';
outX = X(sort_idx,:);
outY = Y(sort_idx,:);
Approach #2 Using bsxfun -
function [outX,outY] = interleave_bsxfuns(X,Y)
Y = Y(:);
[R,C] = find(bsxfun(@eq,Y,unique(Y).'));
lens = accumarray(C,1);
mask = bsxfun(@le,[1:max(lens)]',lens.');
V = zeros(size(mask));
V(mask) = R;
Vt = V.';
sort_idx = Vt(mask.');
outX = X(sort_idx,:);
outY = Y(sort_idx,:);
Sample run -
1) Inputs :
>> X
X =
1 8 3
4 2 6
7 8 9
2 3 4
1 4 6
>> Y
Y =
2
1
2
3
1
2) Outputs from the two approaches :
>> [NewX,NewY] = interleave_cumsum_diff(X,Y)
NewX =
4 2 6
1 8 3
2 3 4
1 4 6
7 8 9
NewY =
1
2
3
1
2
>> [NewX,NewY] = interleave_bsxfuns(X,Y)
NewX =
4 2 6
1 8 3
2 3 4
1 4 6
7 8 9
NewY =
1
2
3
1
2
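For comparison, the same interleaving can be sketched by ranking each sample within its class and then sorting by that rank, with ties broken by class label. This is not taken from the answers above; it is a minimal sketch on the sample data, the names order, firstIdx and rankInClass are illustrative, and it assumes Y is a vector of class labels.

```matlab
X = [1 8 3; 4 2 6; 7 8 9; 2 3 4; 1 4 6];
Y = [2; 1; 2; 3; 1];

Y = Y(:);
[Ys, order] = sort(Y);                  % stable sort: groups samples by class
starts = [true; diff(Ys) ~= 0];         % marks the first sample of each class
pos = (1:numel(Ys)).';
firstIdx = pos(starts);                 % start position of each class block
rankInClass = pos - firstIdx(cumsum(starts)) + 1;  % 1, 2, ... within each class

[~, p] = sortrows([rankInClass, Ys]);   % first by rank, then by class label
idx = order(p);                         % map back to the original row numbers

NewX = X(idx, :);
NewY = Y(idx);
```

On the sample data this selects rows 2, 1, 4, 5, 3 of X, which is the ordering shown in the sample run.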

Related

Finding number(s) that is(are) repeated consecutively most often

Given this array for example:
a = [1 2 2 2 1 3 2 1 4 4 4 5 1]
I want to find a way to check which numbers are repeated consecutively most often. In this example, the output should be [2 4] since both 2 and 4 are repeated three times consecutively.
Another example:
a = [1 1 2 3 1 1 5]
This should return [1 1] because there are separate instances of 1 being repeated twice.
This is my simple code. I know there is a better way to do this:
function val=longrun(a)
b = a(:)';
b = [b, max(b)+1];
val = [];
sum = 1;
max_occ = 0;
for i = 1:max(size(b))
q = b(i);
for j = i:size(b,2)
if (q == b(j))
sum = sum + 1;
else
if (sum > max_occ)
max_occ = sum;
val = [];
val = [val, q];
elseif (max_occ == sum)
val = [val, q];
end
sum = 1;
break;
end
end
end
if (size(a,2) == 1)
val = val'
end
end
Here's a vectorized way:
a = [1 2 2 2 1 3 2 1 4 4 4 5 1]; % input data
t = cumsum([true logical(diff(a))]); % assign a label to each run of equal values
[~, n, z] = mode(t); % maximum run length and corresponding labels
result = a(ismember(t,z{1})); % build result with repeated values
result = result(1:n:end); % remove repetitions
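The run lengths can also be computed explicitly with diff and find, which makes the intermediate quantities easier to inspect. A minimal sketch, assuming a is a non-empty numeric vector; runStart, runLen and runVal are illustrative names:

```matlab
a = [1 2 2 2 1 3 2 1 4 4 4 5 1];
a = a(:).';                                  % force a row vector

runStart = find([true, diff(a) ~= 0]);       % index where each run begins
runLen = diff([runStart, numel(a) + 1]);     % length of each run
runVal = a(runStart);                        % value carried by each run

result = runVal(runLen == max(runLen));      % values of the longest runs
```

Because each run is kept as a separate entry, a value that has several runs of maximal length appears once per run, which is the behaviour asked for in the second example.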
One solution could be:
%Dummy data
a = [1 2 2 2 1 3 2 1 4 4 4 5 5]
%Preallocation
x = ones(1,numel(a));
%Loop
for ii = 2:numel(a)
if a(ii-1) == a(ii)
x(ii) = x(ii-1)+1;
end
end
%Get the result
a(find(x==max(x)))
This uses a simple for loop: the goal is to increment x whenever the previous value in the vector a is identical to the current one.
Or you could also vectorize the process (note that circshift wraps around, so the first element is compared with the last one):
x = a(find(a-circshift(a,1,2)==0)); %compare a with a + a shift of 1 and get only the repeated element.
u = unique(x); %get the unique value of x
h = histc(x,u);
res = u(h==max(h)) %get the result

Generate square matrix for vector with diagonals in MATLAB

I have a vector, where each value corresponds to a diagonal. I want to create a matrix from this vector. I have a code:
x = [1:5];
N = numel(x);
diagM = diag(repmat(x(1),N,1),0);
for iD = 2:N
d = repmat(x(iD),N-iD+1,1);
d_pos = diag(d,iD-1);
d_neg = diag(d,-iD+1);
d_join = d_pos+d_neg;
diagM = diagM+d_join;
end
It gives me what I want:
diagM =
1 2 3 4 5
2 1 2 3 4
3 2 1 2 3
4 3 2 1 2
5 4 3 2 1
But it becomes really slow, for example for x=[1:10^4].
Could you help me with a faster way to generate such a matrix?
Use toeplitz:
x = 1:5;
diagM = toeplitz(x);
Or do it manually, vectorized:
x = 1:5;
t = 1:numel(x);
diagM = x(abs(t-t.')+1); % x(abs(bsxfun(@minus, t, t.'))+1) in old Matlab versions
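Both answers rely on the fact that entry (i,j) of the result is x(|i-j|+1). A small sketch of that indexing rule, using ndgrid so that it does not depend on implicit expansion; the test vector and the names I, J and M are illustrative:

```matlab
x = [10 20 30 40];                 % illustrative values, one per diagonal
[I, J] = ndgrid(1:numel(x));       % row and column index of every entry
M = x(abs(I - J) + 1);             % entry (i,j) takes x(|i-j|+1)
sameAsToeplitz = isequal(M, toeplitz(x));
```

For a real vector x, this should agree with toeplitz(x), so sameAsToeplitz is expected to be true.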

Extract pattern and subsequent n elements from array and count number of occurences

I have an array of doubles like this:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30]
I want to find the pattern [1 2 3 4] within the array and then store the 2 values that follow it together with the pattern, like:
A = [1 2 3 4 0 3]
B = [1 2 3 4 150 30]
I can find the pattern with the line below, but I don't know how to get and store the 2 values after it together with the pattern.
And after finding A and B, if I want to count the occurrences of each of these arrays within C, how can I do that?
indices = cellfun(@(c) strfind(c,pattern), C, 'UniformOutput', false);
Thanks!
Assuming you're fine with a cell array output, this works fine:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30 42 1 2 3 4 0 3]
p = [1 2 3 4]
n = 2
% full pattern length - 1
dn = numel(p) + n - 1
%// find indices
ind = strfind(C,p)
%// pre check if pattern at end of array
if ind(end)+ dn > numel(C), k = -1; else k = 0; end
%// extracting
temp = arrayfun(@(x) C(x:x+dn), ind(1:end+k) , 'uni', 0)
%// post processing
[out, ~, idx] = unique(vertcat(temp{:}),'rows','stable')
occ = histcounts(idx).'
If the array C ends with at least n elements after the last occurrence of the pattern p, you can use the short form:
out = arrayfun(@(x) C(x:x+n+numel(p)-1), strfind(C,p) , 'uni', 0)
out =
1 2 3 4 0 3
1 2 3 4 150 30
occ =
2
1
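The extraction and counting steps can also be sketched without arrayfun, by building an index matrix and counting with accumarray. This is a variant under stated assumptions rather than the answer's own code: it relies on strfind accepting numeric vectors, as the code above does, and the names w, starts and rows are illustrative.

```matlab
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30 42 1 2 3 4 0 3];
p = [1 2 3 4];
n = 2;

w = numel(p) + n;                              % pattern plus trailing values
starts = strfind(C, p);                        % start index of every match
starts = starts(starts + w - 1 <= numel(C));   % drop matches cut off at the end

rows = C(bsxfun(@plus, starts(:), 0:w-1));     % one extracted window per row
[out, ~, idx] = unique(rows, 'rows', 'stable');
occ = accumarray(idx(:), 1);                   % occurrences of each unique row
```

Filtering starts against numel(C) handles every truncated match, not only the last one, so no separate end-of-array check is needed.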
A simple solution can be:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30];
pattern = [1 2 3 4];
numberOfAddition = 2;
windowLength = length(pattern) + numberOfAddition;
outputs = zeros(length(C), windowLength); % preallocation (upper bound on matches)
numberOfFoundPattern = 1;
lengthOfConsider = length(C) - windowLength + 1; % last valid start index
for i = 1:lengthOfConsider
if isequal(C(i:i+length(pattern)-1), pattern) % find pattern
outputs(numberOfFoundPattern,:) = C(i:i+windowLength-1);
numberOfFoundPattern = numberOfFoundPattern + 1;
end
end
outputs = outputs(1:numberOfFoundPattern - 1,:);

How to pre-allocate arrays in multi-loop iteration

I need to pre-allocate arrays in my code below.
I don't quite understand how to pre-allocate arrays in multi-loop iteration.
a=0:1:2;
b=0:1:2;
c=0:1:2;
xx1=[];yy1=[];zz1=[];xx2=[];yy2=[];zz2=[];
for k=1:length(c)-1;
z1=c(k); z2=c(k+1);
for w=1:length(b)-1;
y1=b(w); y2=b(w+1);
for q=1:length(a)-1;
x1=a(q); x2=a(q+1);
xx1=[xx1;x1]; xx2=[xx2;x2];
yy1=[yy1;y1]; yy2=[yy2;y2];
zz1=[zz1;z1]; zz2=[zz2;z2];
end
end
end
The expected results are:
[xx1 xx2 yy1 yy2 zz1 zz2]
ans =
0 1 0 1 0 1
1 2 0 1 0 1
0 1 1 2 0 1
1 2 1 2 0 1
0 1 0 1 1 2
1 2 0 1 1 2
0 1 1 2 1 2
1 2 1 2 1 2
Increase a counter in the innermost loop to keep track of which entry of xx1 etc you should fill.
a = 0:1:2;
b = 0:1:2;
c = 0:1:2;
xx1 = NaN((length(a)-1)*(length(b)-1)*(length(c)-1),1); %// preallocate
xx2 = NaN((length(a)-1)*(length(b)-1)*(length(c)-1),1);
yy1 = NaN((length(a)-1)*(length(b)-1)*(length(c)-1),1);
yy2 = NaN((length(a)-1)*(length(b)-1)*(length(c)-1),1);
zz1 = NaN((length(a)-1)*(length(b)-1)*(length(c)-1),1);
zz2 = NaN((length(a)-1)*(length(b)-1)*(length(c)-1),1);
n = 0; %// initialize counter
for k=1:length(c)-1;
z1=c(k); z2=c(k+1);
for w=1:length(b)-1;
y1=b(w); y2=b(w+1);
for q=1:length(a)-1;
n = n + 1; %// increase counter;
x1 = a(q);
x2 = a(q+1);
xx1(n) = x1; %// fill values
xx2(n) = x2;
yy1(n) = y1;
yy2(n) = y2;
zz1(n) = z1;
zz2(n) = z2;
end
end
end
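The counter can also be replaced by a row index computed directly from the loop indices, which lets all six columns share one preallocated matrix. A minimal sketch of that idea; the names na, nb, nc and result are illustrative and not part of the original code:

```matlab
a = 0:1:2;
b = 0:1:2;
c = 0:1:2;
na = length(a) - 1;
nb = length(b) - 1;
nc = length(c) - 1;

% one preallocated matrix with columns [xx1 xx2 yy1 yy2 zz1 zz2]
result = NaN(na*nb*nc, 6);
for k = 1:nc
    for w = 1:nb
        for q = 1:na
            n = q + (w-1)*na + (k-1)*na*nb;   % row index from loop indices
            result(n,:) = [a(q) a(q+1) b(w) b(w+1) c(k) c(k+1)];
        end
    end
end
```

Because q is the innermost index, it varies fastest in n, which matches the row order of the expected results.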
Anyway, it can be done without loops, adapting the procedure given in this answer. This has two advantages:
It may be faster if a, b, c are large.
The same code works for any number of vectors, not just 3. Simply define vectors1 and vectors2 accordingly in the code below.
Code without loops:
a = 0:1:2;
b = 0:1:2;
c = 0:1:2;
vectors1 = { a(1:end-1), b(1:end-1), c(1:end-1) };
vectors2 = { a(2:end), b(2:end), c(2:end) };
n = numel(vectors1);
combs1 = cell(1,n);
[combs1{:}] = ndgrid(vectors1{:}); %// first vector varies fastest, as in the loops
combs1 = reshape(cat(n+1, combs1{:}),[],n);
combs2 = cell(1,n);
[combs2{:}] = ndgrid(vectors2{:});
combs2 = reshape(cat(n+1, combs2{:}),[],n);
result(:,2:2:2*n) = combs2;
result(:,1:2:2*n) = combs1;

How to replicate an array

I want to make a function like this
>> matdup([1 2],3,4) %or any other input that user wish to enter
ans=
1 2 1 2 1 2 1 2
1 2 1 2 1 2 1 2
1 2 1 2 1 2 1 2
I am stuck in my code. My logic:
m = matdup(input,row,col)
for i = 1:row
for j = 1:col
m(i, j)= input;
This is producing this:
>> matdup(1,2,2)
ans=
1 1
1 1
But failed at this:
>> matdup([1 2],3,4)
error at console:
Subscripted assignment dimension mismatch.
Error in ==> matdup at 6
m(i, j)= input
Any idea?
Method 1: Are you allowed to use ones? Try this -
A = [1 2]
rowIdx = [1 : size(A,1)]';
colIdx = [1 : size(A,2)]';
out = A(rowIdx(:, ones(3,1)), colIdx(:, ones(4,1)))
Output
out =
1 2 1 2 1 2 1 2
1 2 1 2 1 2 1 2
1 2 1 2 1 2 1 2
Method 2: Are you allowed to use bsxfun and permute? Try this for the same result -
A = [1 2]
row_mapped = bsxfun(@plus,A,zeros(3,1))
out = reshape(bsxfun(@plus,row_mapped,permute(zeros(4,1),[3 2 1])),[3 8])
MATLAB has a function called repmat that does the same.
If you want to create a similar function, you could do something like this:
function B = matdup(A, M, N)
[nr, nc] = size(A);
B = zeros([nr nc] .* [M N]);
for r = 1:M
for c = 1:N
rr = (r - 1) * nr + 1;
cc = (c - 1) * nc + 1;
B(rr:rr + nr - 1, cc:cc + nc - 1) = A;
end
end
end
Note this function is restricted to 2D matrices.
Try kron:
matdup = @(x,m,n) kron(ones(m,n),x)
Demonstration:
>> A = [5 6 7];
>> out = matdup(A,3,2)
out =
5 6 7 5 6 7
5 6 7 5 6 7
5 6 7 5 6 7
Note that you can switch the inputs to kron to effectively replicate elements rather than the whole matrix:
repel = @(x,m,n) kron(x,ones(m,n));
Demonstration:
>> A = [5 6 7];
>> out = repel(A,3,2)
out =
5 5 6 6 7 7
5 5 6 6 7 7
5 5 6 6 7 7
The replication can be done easily using mod:
function R = matdup(A, M, N)
[m n]= size(A);
R = A(mod(0:m*M-1,m)+1, mod(0:n*N-1,n)+1)
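A quick way to check replication code of this kind is to compare it against the built-in repmat on a small matrix. The sketch below does that for the mod-based indexing and the kron variant; the test matrix and the names R, K and ref are illustrative:

```matlab
A = [1 2; 3 4];          % small illustrative input
M = 2;                   % number of vertical copies
N = 3;                   % number of horizontal copies

[m, n] = size(A);
R = A(mod(0:m*M-1, m) + 1, mod(0:n*N-1, n) + 1);   % cyclic row/column indices
K = kron(ones(M, N), A);                            % block replication via kron
ref = repmat(A, M, N);                              % built-in reference

sameMod = isequal(R, ref);
sameKron = isequal(K, ref);
```

Both flags should come out true for a numeric 2D input. Note that kron performs multiplications, so it only applies to numeric data, while the indexing approach also works for other array types.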
