Find unique elements of multiple arrays

Let's say I have 3 arrays:
X = [ 1 3 9 10 ];
Y = [ 1 9 11 20];
Z = [ 1 3 9 11 ];
Now I would like to find the values that appear only once, and which array each of them belongs to.

I generalized EBH's answer to handle any number of arrays, arrays of different sizes, and multidimensional arrays. Like the original, this method only works with integer-valued arrays:
function [uniq, id] = uniQ(varargin)
% uniQ  Values that occur exactly once across all inputs, and the input each came from.
% Works only for integer-valued arrays (it bins with unit-width histcounts edges).
combo = [];
idx = [];
for ii = 1:nargin
combo = [combo; varargin{ii}(:)]; % merge the arrays into one column
idx = [idx; ii*ones(numel(varargin{ii}), 1)]; % remember which input each element came from
end
counts = histcounts(combo, min(combo):max(combo)+1); % one bin per integer value
ids = find(counts == 1); % bins (values) that occur exactly once
uniq = min(combo) - 1 + ids(:); % convert bin numbers back to values
id = zeros(size(uniq));
for ii = 1:numel(uniq)
ids = find(combo == uniq(ii), 1); % position of this value in 'combo'
id(ii) = idx(ids); % the input array it belongs to
end
end
And this is how it works (the third input comes from randi, so your output will differ):
[uniq, id] = uniQ([9, 4], 15, randi(12,3,3), magic(3))
uniq =
1
7
11
12
15
id =
4
4
3
3
2
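As a sketch of a variant that is not part of the original answers, both the histcounts binning (and with it the integer-only restriction) and the per-element find loop can be replaced by the index outputs of unique combined with accumarray. The cell array arrays and the variable src are illustrative names; the inputs are the three vectors from the question:

```matlab
% Inputs: the three vectors from the question, collected in a cell array
X = [1 3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
arrays = {X, Y, Z};

% Merge all arrays into one column and record which array each element came from
combo = cell2mat(cellfun(@(v) v(:), arrays(:), 'UniformOutput', false));
src = repelem((1:numel(arrays)).', cellfun(@numel, arrays(:)));

% vals = sorted distinct values, ia = a position of each in combo,
% ic = position in vals of every element of combo
[vals, ia, ic] = unique(combo);
counts = accumarray(ic, 1);   % number of occurrences of each distinct value

once = counts == 1;
uniq = vals(once)             % values that appear exactly once
id = src(ia(once))            % array each of those values belongs to
```

For these inputs it gives uniq = [10; 20] and id = [1; 2]. Since unique compares the values themselves, this sketch should also work for non-integer data.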

If you are only dealing with integers and your vectors are equally sized (all with the same number of elements), you can use histcounts for a quick search for unique elements:
X = [1 -3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
XYZ = [X(:) Y(:) Z(:)]; % one matrix with all vectors as columns
counts = histcounts(XYZ,min(XYZ(:)):max(XYZ(:))+1);
R = min(XYZ(:)):max(XYZ(:)); % range of the data
unkelem = R(counts==1);
and then locate them using a loop with find:
pos = zeros(size(unkelem));
counter = 1;
for k = unkelem
[~,pos(counter)] = find(XYZ==k);
counter = counter+1;
end
result = [unkelem;pos]
and you get:
result =
-3 3 10 20
1 3 1 2
so -3, 3, 10 and 20 are unique, and they appear in vectors 1, 3, 1 and 2, respectively.
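The find loop can also be replaced by a single ismember call: ismember returns a linear index into XYZ, and the column of that index is the vector number. A minimal sketch, assuming the same X, Y and Z; it counts with accumarray instead of histcounts, which likewise gives one count per integer in the range R:

```matlab
X = [1 -3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
XYZ = [X(:) Y(:) Z(:)];                    % one matrix with all vectors as columns
R = min(XYZ(:)):max(XYZ(:));               % range of the data
counts = accumarray(XYZ(:) - min(XYZ(:)) + 1, 1).';  % one count per integer in R
unkelem = R(counts == 1);                  % values that occur exactly once

[~, loc] = ismember(unkelem, XYZ);         % linear index of each value in XYZ
[~, pos] = ind2sub(size(XYZ), loc);        % its column = the vector it belongs to
result = [unkelem; pos]
```

This reproduces the result above, [-3 3 10 20; 1 3 1 2], without a loop.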

Related

Finding number(s) that is(are) repeated consecutively most often

Given this array for example:
a = [1 2 2 2 1 3 2 1 4 4 4 5 1]
I want to find a way to check which numbers are repeated consecutively most often. In this example, the output should be [2 4] since both 2 and 4 are repeated three times consecutively.
Another example:
a = [1 1 2 3 1 1 5]
This should return [1 1] because there are separate instances of 1 being repeated twice.
This is my simple code. I know there is a better way to do this:
function val=longrun(a)
b = a(:)';
b = [b, max(b)+1];
val = [];
sum = 1;
max_occ = 0;
for i = 1:max(size(b))
q = b(i);
for j = i:size(b,2)
if (q == b(j))
sum = sum + 1;
else
if (sum > max_occ)
max_occ = sum;
val = [];
val = [val, q];
elseif (max_occ == sum)
val = [val, q];
end
sum = 1;
break;
end
end
end
if (size(a,2) == 1)
val = val'
end
end
Here's a vectorized way:
a = [1 2 2 2 1 3 2 1 4 4 4 5 1]; % input data
t = cumsum([true logical(diff(a))]); % assign a label to each run of equal values
[~, n, z] = mode(t); % maximum run length and corresponding labels
result = a(ismember(t,z{1})); % build result with repeated values
result = result(1:n:end); % remove repetitions
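For comparison, here is a sketch (not from the original answer) that computes explicit run starts and run lengths instead of calling mode. It assumes a is a row vector:

```matlab
a = [1 2 2 2 1 3 2 1 4 4 4 5 1];      % input data from the question
starts = find([true, diff(a) ~= 0]);  % index where each run of equal values begins
lens = diff([starts, numel(a) + 1]);  % length of each run
result = a(starts(lens == max(lens))) % value of every longest run
```

For this input it returns [2 4]; with a = [1 1 2 3 1 1 5] the same lines return [1 1], one entry per longest run.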
One solution could be:
%Dummy data
a = [1 2 2 2 1 3 2 1 4 4 4 5 5]
%Preallocation
x = ones(1,numel(a));
%Loop
for ii = 2:numel(a)
if a(ii-1) == a(ii)
x(ii) = x(ii-1)+1;
end
end
%Get the result
a(x == max(x)) % logical indexing, find is not needed
This uses a simple for loop: x(ii) holds the length of the run of equal values ending at position ii, so it is increased by one whenever the previous value in the vector a is identical.
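Or you could vectorize the process:
x = a([false, diff(a) == 0]); % elements equal to their predecessor (diff, unlike circshift, does not compare a(1) with a(end))
u = unique(x); % get the unique values of x
h = histc(x,u); % count how many times each value repeats its predecessor
res = u(h==max(h)) % get the result
Note that this version counts the total number of repetitions of each value over all of its runs, not the length of its longest run, so for a = [2 2 5 2 2 4 4 4] it returns [2 4] while the loop returns 4.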

Generate a square matrix from a vector with diagonals in MATLAB

I have a vector where each value corresponds to a diagonal, and I want to create a matrix from it. This is my code:
x = [1:5];
N = numel(x);
diagM = diag(repmat(x(1),N,1),0);
for iD = 2:N
d = repmat(x(iD),N-iD+1,1);
d_pos = diag(d,iD-1);
d_neg = diag(d,-iD+1);
d_join = d_pos+d_neg;
diagM = diagM+d_join;
end
It gives me what I want:
diagM =
1 2 3 4 5
2 1 2 3 4
3 2 1 2 3
4 3 2 1 2
5 4 3 2 1
But it becomes really slow for large vectors, for example x = 1:10^4.
Could you help me find a faster way to generate such a matrix?
Use toeplitz:
x = 1:5;
diagM = toeplitz(x);
Or do it manually, vectorized:
x = 1:5;
t = 1:numel(x);
diagM = x(abs(t-t.')+1); % x(abs(bsxfun(@minus, t, t.'))+1) in MATLAB versions before R2016b
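Why the indexing version works: element (i,j) lies on the diagonal with offset |i-j|, so it should take the value x(|i-j|+1). A small check against toeplitz, using arbitrary example values (implicit expansion in t - t.' needs R2016b or later):

```matlab
x = [7 3 9];               % arbitrary example values
t = 1:numel(x);
D = abs(t - t.');          % D(i,j) = |i - j|, the offset of the diagonal that (i,j) lies on
B = x(D + 1);              % diagonal k (k = 0 is the main one) gets x(k+1)
A = toeplitz(x);           % built-in equivalent
disp(B)
```

Both give [7 3 9; 3 7 3; 9 3 7]. Indexing a vector with a matrix returns a matrix of the index's shape, which is what makes the one-liner possible.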

Shuffle array while spacing repeating elements

I'm trying to write a function that shuffles an array, which contains repeating elements, but ensures that repeating elements are not too close to one another.
This code works but seems inefficient to me:
function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating
% elements are at least myDist elements away from one another
% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps
% set to 0 to break the while loop; it is set back to 1 if the condition is not met
reps = 0;
% randomly shuffle array
shuffledArr = Shuffle(myArr);
% loop through each unique value, find its positions, and calculate the distance to the next occurrence
for x = unique(myArr(:)).'
% check if there are any repetitions that are separated by myDist or less
if any(diff(find(shuffledArr == x)) <= myDist)
reps = 1;
break;
end
end
end
This seems suboptimal to me for three reasons:
1) It may not be necessary to repeatedly shuffle until a solution has been found.
2) This while loop will go on forever if there is no possible solution (i.e. setting myDist to be too high to find a configuration that fits). Any ideas on how to catch this in advance?
3) There must be an easier way to determine the distance between repeating elements in an array than what I did by looping through each unique value.
I would be grateful for answers to points 2 and 3, even if point 1 is correct and it is possible to do this in a single shuffle.
I think it is sufficient to check the following condition to prevent infinite loops:
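[~,num, C] = mode(myArr); % num: occurrences of the mode; C{1}: all values with that count
N = numel(C{1}); % number of modes (C is a 1x1 cell array for vector input)
assert( (myDist < N) || (myDist-N+1) * (num-1) +N*num <= numel(myArr),...
'Shuffling impossible!');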
Assume that myDist is 2 and we have the following data:
[4 6 5 1 6 7 4 6]
We find the mode, 6, and its number of occurrences, 3. We arrange the 6s, separating them by myDist = 2 blanks:
6 _ _ 6 _ _ 6
We need (3-1) * myDist = 4 numbers to fill the blanks. There are five other numbers, so the array can be shuffled.
The problem becomes more complicated if there are multiple modes. For example, the array [4 6 5 1 6 7 4 6 4] has N = 2 modes, 6 and 4, which can be arranged as:
6 4 _ 6 4 _ 6 4
There are 2 blanks and three other numbers, [5 1 7], that can fill them. If, for example, there were only one other number, [5], the blanks could not be filled and the array could not be shuffled.
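The counting argument above can be written out for the second example. This sketch uses max(..., 0) so that the number of blanks per gap never goes negative, and relies on the third output of mode being a cell array that holds all modes:

```matlab
myArr = [4 6 5 1 6 7 4 6 4];   % second example from the text
myDist = 2;
[~, num, C] = mode(myArr);     % num = occurrences of a mode; C{1} lists all modes
N = numel(C{1});               % number of modes (here 4 and 6)
blanksPerGap = max(myDist - N + 1, 0);       % blanks needed between two copies of a mode
needed = blanksPerGap * (num - 1) + N * num  % minimum length of a valid arrangement
feasible = needed <= numel(myArr)
```

Here num = 3 and N = 2, so needed = 1*2 + 2*3 = 8, which does not exceed the 9 elements available.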
For the third point, you can use a sparse matrix to speed up the computation (my initial testing in Octave shows that it is more efficient):
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
reps = any (diff(find(shuffledBin)) <= myDist);
end
shuffledArr = U(S);
end
Alternatively, you can use sub2ind and sort instead of a sparse matrix:
function shuffledArr = distShuffleSort(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
f = sub2ind ( [numel(idx) + myDist, numel(U)] , (1:numel(idx)).', S(:) ); % subscripts must have the same size
reps = any (diff(sort(f)) <= myDist);
end
shuffledArr = U(S);
end
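For point 3, another vectorized check is sketched below; it is not from the original answers. Sorting by value and then by position puts the copies of each value next to each other, so the distances between copies come from a single diff. Unlike the loop in the question, it does not assume the values are 1, 2, ..., K. The example data is the array used above:

```matlab
arr = [4 6 5 1 6 7 4 6];
myDist = 2;
[~, order] = sortrows([arr(:), (1:numel(arr)).']);  % sort by value, then by position
order = order.';                              % row vector of positions grouped by value
sortedVals = arr(order);                      % equal values are now adjacent
sameAsPrev = [false, diff(sortedVals) == 0];  % true where a value repeats the previous one
gaps = [0, diff(order)];                      % position gap to the previous copy
reps = any(sameAsPrev & gaps <= myDist)       % true if some copies are too close
```

For this array the copies of 6 are 3 positions apart, so reps is false for myDist = 2 but would be true for myDist = 3.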
If you just want to find one possible solution, you could use something like this:
x = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
n = numel(x);
dist = 3; %minimal distance
uni = unique(x); %get the unique value
his = histc(x,uni); %count the occurence of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];
xr = []; %the vector that will contains the solution
%the for loop that will maximize the distance of each element
for ii = 1:n
s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
s(1,3) = s(1,3)-dist;
s(1,2) = s(1,2)-1;
xr = [xr s(1,1)];
s = sortrows(s,[3,2],{'descend','descend'}); % least penalized first, then most remaining occurrences
end
if any(s(:,2)~=0)
fprintf('failed, dist is too big\n')
end
Result:
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3]
Explanation:
I create a matrix s, which at the beginning is equal to:
s =
3 5 0
1 3 0
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
% col1 = unique element; col2 = remaining occurrences of each element; col3 = penalty
At each iteration of the for loop, we choose the element with the most remaining occurrences among those without a penalty, since this element will be the hardest to place in the array.
Then after the first iteration s is equal to:
s =
1 3 0 %1 is the next element that will be placed in our array.
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
3 4 -3 % 3 now has 5-1 = 4 occurrences and a penalty of -3, so it won't be chosen in the next 3 iterations.
At the end, every number in the second column should be 0; if not, the minimal distance was too big.
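A quick way to verify such a result is to confirm that xr is a permutation of x and that every pair of consecutive copies is more than dist positions apart. A minimal sketch using the input and the result printed above:

```matlab
x  = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];  % input from the answer
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3];  % result printed above
dist = 3;
sameElements = isequal(sort(xr), sort(x));   % xr uses exactly the elements of x
spacedOk = true;
for v = unique(xr)
    gaps = diff(find(xr == v));              % distances between consecutive copies of v
    spacedOk = spacedOk && all(gaps > dist); % at least dist other elements in between
end
disp([sameElements, spacedOk])
```

Both flags come out true for this result; for example, the five copies of 3 sit at positions 1, 5, 9, 13 and 18.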

Extract pattern and subsequent n elements from array and count number of occurences

I have an array of doubles like this:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30]
I want to find the pattern [1 2 3 4] within the array and then store it together with the 2 values that follow it, like:
A = [1 2 3 4 0 3]
B = [1 2 3 4 150 30]
I can find the pattern like this, but I don't know how to get the 2 values after it and store them together with the pattern.
And after finding A and B, how can I count the number of occurrences of each of them within array C?
indices = cellfun(@(c) strfind(c,pattern), C, 'UniformOutput', false);
Thanks!
Assuming a cell array output is acceptable, this works:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30 42 1 2 3 4 0 3]
p = [1 2 3 4]
n = 2
% full pattern length - 1
dn = numel(p) + n - 1
%// find indices
ind = strfind(C,p)
%// pre check if pattern at end of array
if ind(end)+ dn > numel(C), k = -1; else k = 0; end
%// extracting
temp = arrayfun(@(x) C(x:x+dn), ind(1:end+k) , 'uni', 0)
%// post processing
[out, ~, idx] = unique(vertcat(temp{:}),'rows','stable')
occ = accumarray(idx, 1) %// number of occurrences of each row of out
If the array C ends with at least n elements after the last occurrence of the pattern p, you can use the short form:
out = arrayfun(@(x) C(x:x+n+numel(p)-1), strfind(C,p) , 'uni', 0)
out =
1 2 3 4 0 3
1 2 3 4 150 30
occ =
2
1
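The arrayfun call can also be replaced by building all the block indices at once. A sketch, assuming MATLAB R2016b or later for implicit expansion; instead of the k = -1 pre-check, it keeps only the matches that leave room for n extra values:

```matlab
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30 42 1 2 3 4 0 3];
p = [1 2 3 4];
n = 2;
dn = numel(p) + n - 1;                 % full block length - 1
ind = strfind(C, p);                   % start index of every match
ind = ind(ind + dn <= numel(C));       % keep matches followed by n more values
blocks = C(ind(:) + (0:dn));           % one row per match (implicit expansion)
[out, ~, idx] = unique(blocks, 'rows', 'stable');
occ = accumarray(idx, 1)               % occurrences of each distinct row of out
```

This gives the same out and occ as above.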
A simple solution can be:
C = [1 2 3 4 0 3 2 5 6 7 1 2 3 4 150 30];
pattern = [1 2 3 4];
numberOfAddition = 2;
blockLength = length(pattern) + numberOfAddition; % pattern plus the values that follow it
lengthOfConsider = length(C) - blockLength + 1; % last start index that leaves room for the extra values
outputs = zeros(lengthOfConsider, blockLength); % preallocation (upper bound on the number of matches)
numberOfFoundPattern = 1;
for i = 1:lengthOfConsider
if isequal(C(i:i+length(pattern)-1), pattern) % find pattern
outputs(numberOfFoundPattern,:) = C(i:i+blockLength-1);
numberOfFoundPattern = numberOfFoundPattern + 1;
end
end
outputs = outputs(1:numberOfFoundPattern - 1,:);

Find consecutive values in 3D array

Say I have an array of size 100x150x30 (a 100x150 geographical grid with 30 values for each grid point), and I want to find runs of consecutive values along the third dimension with a length of at least 3.
I would like to find the maximum length of consecutive elements blocks, as well as the number of occurrences.
I have tried this on a simple vector:
var=[20 21 50 70 90 91 92 93];
a=diff(var);
q = diff([0 a 0] == 1);
v = find(q == -1) - find(q == 1);
v = v+1;
v2 = v(v>3);
v3 = max(v2); % maximum length: 4
z = numel(v2); % number: 1
Now I'd like to apply this to the 3rd dimension of my array.
With A being my 100x150x30 array, I've come this far:
aa = diff(A, 1, 3);
b1 = diff((aa == 1),1,3);
b2 = zeros(100,150,1);
qq = cat(3,b2,b1,b2);
But I'm stuck on the next step, which would be find(qq == -1) - find(qq == 1); I can't make it work.
Is there a way to put it in a loop, or do I have to find the consecutive values another way?
Thanks for any help!
A = randi(25,100,150,30); %// generate random array
tmpsize = size(A); %// get its size
B = diff(A,1,3); %// difference
v3 = zeros(tmpsize([1 2])); %//initialise
z = zeros(tmpsize([1 2]));
for ii = 1:tmpsize(1) %// double loop over all grid points
for jj = 1:tmpsize(2)
q = diff([0 squeeze(B(ii,jj,:)).' 0] == 1);%'//
v = find(q == -1) - find(q == 1);
v=v+1;
v2=v(v>3);
try %// if v2 is empty, set to nan
v3(ii,jj)=max(v2);
catch
v3(ii,jj)=nan;
end
z(ii,jj)=numel(v2);
end
end
The above seems to work. It simply loops over both grid dimensions and applies your vector code along the third dimension at each grid point.
I think the part you were stuck on was using squeeze to turn B(ii,jj,:) into a vector for q.
The try/catch is there solely because max(v2) returns an empty array when v2 is empty, and assigning that to v3(ii,jj) throws an error. In that case the entry is set to NaN, though you can of course use 0 instead.
Here's one vectorized approach -
%// Parameters
[m,n,r] = size(var);
max_occ_thresh = 2 %// Threshold for consecutive occurrences
% Get indices of start and stop of consecutive number islands
df = diff(var,[],3)==1;
A = reshape(df,[],size(df,3));
dfA = diff([zeros(size(A,1),1) A zeros(size(A,1),1)],[],2).'; %//'
[R1,C1] = find(dfA==1);
[R2,C2] = find(dfA==-1);
%// Get interval lengths
interval_lens = R2 - R1+1;
%// Get max consecutive occurrences across dim-3
max_len = zeros(m,n);
maxIDs = accumarray(C1,interval_lens,[],@max);
max_len(1:numel(maxIDs)) = maxIDs
%// Get number of consecutive occurrences that are above max_occ_thresh
num_occ = zeros(m,n);
counts = accumarray(C1,interval_lens>max_occ_thresh);
num_occ(1:numel(counts)) = counts
Sample run -
var(:,:,1) =
2 3 1 4 1
1 4 1 5 2
var(:,:,2) =
2 2 3 1 2
1 3 5 1 4
var(:,:,3) =
5 2 4 1 2
1 5 1 5 1
var(:,:,4) =
3 5 5 1 5
5 1 3 4 3
var(:,:,5) =
5 5 4 4 4
3 4 5 2 2
var(:,:,6) =
3 4 4 5 3
2 5 4 2 2
max_occ_thresh =
2
max_len =
0 0 3 2 2
0 2 0 0 0
num_occ =
0 0 1 0 0
0 0 0 0 0
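As a usage check (not part of the original answer), the vectorized code can be run on the question's example vector placed along the third dimension of a 1x1 grid. With max_occ_thresh = 3, matching v(v>3) in the question, it should reproduce the question's maximum length 4 and count 1. The variable is named V here to avoid shadowing MATLAB's var function, and double() is applied before accumarray as a precaution:

```matlab
% The question's example vector, placed along the third dimension of a 1x1 grid
V = reshape([20 21 50 70 90 91 92 93], 1, 1, []);
[m, n, ~] = size(V);
max_occ_thresh = 3;                        % same as v(v>3) in the question

df = diff(V, [], 3) == 1;                  % true where the next value is +1
A = reshape(df, [], size(df, 3));          % one row per grid point
dfA = diff([zeros(size(A,1),1) A zeros(size(A,1),1)], [], 2).';
[R1, C1] = find(dfA == 1);                 % run starts (row = position, col = grid point)
[R2, ~] = find(dfA == -1);                 % run ends
interval_lens = R2 - R1 + 1;               % number of consecutive values per run

max_len = zeros(m, n);
maxIDs = accumarray(C1, interval_lens, [], @max);
max_len(1:numel(maxIDs)) = maxIDs          % maximum run length

num_occ = zeros(m, n);
counts = accumarray(C1, double(interval_lens > max_occ_thresh));
num_occ(1:numel(counts)) = counts          % number of runs longer than the threshold
```

The two runs found are 20 21 (length 2) and 90 91 92 93 (length 4), so max_len is 4 and num_occ is 1.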
