Generate an array with specific duplicate elements in MATLAB - arrays

I have an array, for example B = [2,5,7], and a number C = 10, where C is always greater than or equal to the largest number in B.
I want to generate an array A from B and C. In this specific example, I have
A = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 7, 8, 9, 10]
that is, I generate the array 1:C, but with each element of B repeated 3 times. Is there a good way to generate A without using a for loop?
Thank you!

You can use repelem (introduced in Matlab R2015a):
B = [2 5 7]
C = 10;
n = 3;
r = ones(1,C);
r(B) = n;
A = repelem(1:C, r)

How about...
B = [2,5,7];
C = 10;
A = sort([1:C,B,B])
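The sort-based one-liner hard-codes three copies by appending B twice. A minimal sketch of the same idea for an arbitrary repeat count, assuming a variable n (not in the answer above) holds the number of copies wanted:

```matlab
B = [2 5 7];
C = 10;
n = 3;                                   % assumed name: total copies of each element of B
% Append n-1 extra copies of B to 1:C, then sort to put them next to the originals.
A = sort([1:C, repmat(B, 1, n-1)]);
disp(A)
```

With n = 3 this reduces to sort([1:C,B,B]).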

I think the answer of @RPM could be faster. But because you specifically asked for a solution without sort:
B = [2,5,7];
C = 10;
D = setdiff(1:C,B)-1;
A = reshape(repmat(1:C,3,1),1,3*C);
A([3*D+1,3*D+2]) = [];
which will also return the correct result. I'm not too sure about the computational complexity of setdiff(). It might be worse than sort() in all cases, especially since the input to
A = sort([1:C,B,B]) is already almost in order.

Following the same philosophy as this solution to Repeat copies of array elements: Run-length decoding in MATLAB, you can do something similar here, like this -
%// Get increment array (increments used after each index being repeated in B)
inc = zeros(1,C);
inc(B+1) = N-1
%// Calculate ID array (places in output array where shifts occur)
id = zeros(1,C+(N-1)*numel(B));
id(cumsum(inc) + (1:C)) = 1
%// Finally get cumulative summation for final output
A = cumsum(id)
Sample run -
B =
2 5 7
C =
10
N =
3
inc =
0 0 2 0 0 2 0 2 0 0
id =
1 1 0 0 1 1 1 0 0 1 1 0 0 1 1 1
A =
1 2 2 2 3 4 5 5 5 6 7 7 7 8 9 10
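One edge case is worth noting: the snippet writes to inc(B+1), so if B contains C itself, the index C+1 falls outside inc, the vector grows by one element, and the later addition with 1:C no longer lines up. A hedged sketch of one way around this, assuming it is acceptable to allocate one spare slot and trim it afterwards:

```matlab
B = [2 5 10];   % note: the last element equals C
C = 10;
N = 3;
% Allocate one spare slot so that B+1 is always a valid index, then drop it.
inc = zeros(1, C+1);
inc(B+1) = N-1;
inc = inc(1:C);
% The rest follows the answer above unchanged.
id = zeros(1, C + (N-1)*numel(B));
id(cumsum(inc) + (1:C)) = 1;
A = cumsum(id);
disp(A)
```

The increment stored at position C+1 would only affect positions after C, so discarding it does not change the output.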

Related

Shuffle array while spacing repeating elements

I'm trying to write a function that shuffles an array, which contains repeating elements, but ensures that repeating elements are not too close to one another.
This code works but seems inefficient to me:
function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating
% elements are at least myDist elements away from one another
% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps
% set to 0 to break while-loop, will be set to 1 if it doesn't meet condition
reps = 0;
% randomly shuffle array
shuffledArr = Shuffle(myArr);
% loop through each unique value, find its positions, and calculate the distance to the next occurrence
for x = 1:length(unique(myArr))
% check if there are any repetitions that are separated by myDist or less
if any(diff(find(shuffledArr == x)) <= myDist)
reps = 1;
break;
end
end
end
This seems suboptimal to me for three reasons:
1) It may not be necessary to repeatedly shuffle until a solution has been found.
2) This while loop will go on forever if there is no possible solution (i.e. setting myDist to be too high to find a configuration that fits). Any ideas on how to catch this in advance?
3) There must be an easier way to determine the distance between repeating elements in an array than what I did by looping through each unique value.
I would be grateful for answers to points 2 and 3, even if point 1 is correct and it is possible to do this in a single shuffle.
I think it is sufficient to check the following condition to prevent infinite loops:
[~, num, C] = mode(myArr); % num: count of the most frequent value; C{1}: all values with that count
N = numel(C{1}); % number of modes
assert( (myDist<N) || (myDist-N+1) * (num-1) + N*num <= numel(myArr),...
'Shuffling impossible!');
Assume that myDist is 2 and we have the following data:
[4 6 5 1 6 7 4 6]
We can find the mode, 6, and its number of occurrences, 3. We arrange the 6s, separating them by myDist = 2 blanks:
6 _ _ 6 _ _ 6
There must be (3-1) * myDist = 4 numbers to fill the blanks. We have five more numbers, so the array can be shuffled.
The problem becomes more complicated if we have multiple modes. For example for this array [4 6 5 1 6 7 4 6 4] we have N=2 modes: 6 and 4. They can be arranged as:
6 4 _ 6 4 _ 6 4
We have 2 blanks and three more numbers [5 1 7] that can be used to fill them. If, for example, we had only one number [5], it would be impossible to fill the blanks and we couldn't shuffle the array.
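The counting argument above can be checked numerically. The sketch below is only an illustration. It assumes that the third output of mode is a cell whose first element lists all modes, and that no blanks are needed only when myDist is strictly smaller than the number of modes. The third test array (two modes plus a single extra number) is made up to match the "only [5]" case described in the text.

```matlab
myDist = 2;
examples = {[4 6 5 1 6 7 4 6], ...      % one mode (6), five other numbers
            [4 6 5 1 6 7 4 6 4], ...    % two modes (4 and 6), three other numbers
            [6 4 5 6 4 6 4]};           % two modes, only one other number
feasible = false(1, numel(examples));
for e = 1:numel(examples)
    arr = examples{e};
    [~, num, modes] = mode(arr);        % num: occurrences of the most frequent value
    N = numel(modes{1});                % number of values sharing that count
    blanksPerGap = myDist - N + 1;      % gap slots not already covered by the other modes
    feasible(e) = (myDist < N) || (blanksPerGap*(num-1) + N*num <= numel(arr));
end
disp(feasible)
```

The first two arrays pass the check and the third does not, in line with the worked examples.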
For the third point you can use sparse matrix to accelerate the computation (My initial testing in Octave shows that it is more efficient):
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
reps = any (diff(find(shuffledBin)) <= myDist);
end
shuffledArr = U(S);
end
Alternatively you can use sub2ind and sort instead of sparse matrix:
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
f = sub2ind ( [numel(idx) + myDist, numel(U)] , 1:numel(idx), S );
reps = any (diff(sort(f)) <= myDist);
end
shuffledArr = U(S);
end
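For the third point, another option that avoids both the loop over unique values and the sparse matrix is to sort once and compare neighbouring positions. This is a sketch, not taken from the answers above: it relies on sort being stable, so that equal values keep their original relative order and pos is increasing within each group. The array names are made up for the illustration.

```matlab
myDist = 2;
okArr  = [6 4 5 6 4 1 6 7];   % repeats are 3 positions apart
badArr = [6 4 6 5 4 1 6 7];   % the first two 6s are only 2 positions apart
arrays = {okArr, badArr};
tooClose = false(1, numel(arrays));
for a = 1:numel(arrays)
    [sortedVals, pos] = sort(arrays{a});   % pos: original index of each sorted value
    sameAsPrev = diff(sortedVals) == 0;    % neighbours in sorted order with equal value
    gaps = diff(pos);                      % index distance between those neighbours
    tooClose(a) = any(gaps(sameAsPrev) <= myDist);
end
disp(tooClose)
```

The test any(gaps(sameAsPrev) <= myDist) mirrors the condition used in the question's loop, so it could replace the inner for loop there.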
If you just want to find one possible solution you could use something like that:
x = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
n = numel(x);
dist = 3; %minimal distance
uni = unique(x); %get the unique value
his = histc(x,uni); %count the occurence of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];
xr = []; %the vector that will contains the solution
%the for loop that will maximize the distance of each element
for ii = 1:n
s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
s(1,3) = s(1,3)-dist;
s(1,2) = s(1,2)-1;
xr = [xr s(1,1)];
s = sortrows(s,[3,2],{'descend','descend'}); % available elements first, most frequent on top
end
if any(s(:,2)~=0)
fprintf('failed, dist is too big\n')
end
Result:
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3]
Explanation:
I create a matrix s, and at the beginning s is equal to:
s =
3 5 0
1 3 0
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
%col1 = unique element; col2 = number of occurrences of each element; col3 = penalties
At each iteration of the for loop we choose the element with the most remaining occurrences, since this element will be the hardest to place in our array.
Then after the first iteration s is equal to:
s =
1 3 0 %1 is the next element that will be placed in our array.
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
3 4 -3 %3 now has 5-1 = 4 occurrences left and a penalty of -3, so it won't show up for the next 3 iterations.
At the end every number in the second column should be equal to 0; if not, the minimal distance was too big.

Find unique elements of multiple arrays

Let's say I have 3 arrays
X = [ 1 3 9 10 ];
Y = [ 1 9 11 20];
Z = [ 1 3 9 11 ];
Now I would like to find the values that appear only once, and which array they belong to.
I generalized EBH's answer to cover a flexible number of arrays, arrays with different sizes, and multidimensional arrays. This method can also only deal with integer-valued arrays:
function [uniq, id] = uniQ(varargin)
combo = [];
idx = [];
for ii = 1:nargin
combo = [combo; varargin{ii}(:)]; % merge the arrays
idx = [idx; ii*ones(numel(varargin{ii}), 1)];
end
counts = histcounts(combo, min(combo):max(combo)+1);
ids = find(counts == 1); % finding index of unique elements in combo
uniq = min(combo) - 1 + ids(:); % constructing array of unique elements in 'counts'
id = zeros(size(uniq));
for ii = 1:numel(uniq)
ids = find(combo == uniq(ii), 1); % finding index of unique elements in 'combo'
id(ii) = idx(ids); % assigning the corresponding index
end
And this is how it works:
[uniq, id] = uniQ([9, 4], 15, randi(12,3,3), magic(3))
uniq =
1
7
11
12
15
id =
4
4
3
3
2
If you are only dealing with integers and your vectors are equally sized (all with the same number of elements), you can use histcounts for a quick search for unique elements:
X = [1 -3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
XYZ = [X(:) Y(:) Z(:)]; % one matrix with all vectors as columns
counts = histcounts(XYZ,min(XYZ(:)):max(XYZ(:))+1);
R = min(XYZ(:)):max(XYZ(:)); % range of the data
unkelem = R(counts==1);
and then locate them using a loop with find:
pos = zeros(size(unkelem));
counter = 1;
for k = unkelem
[~,pos(counter)] = find(XYZ==k);
counter = counter+1;
end
result = [unkelem;pos]
and you get:
result =
-3 3 10 20
1 3 1 2
so -3 3 10 20 are unique, and they appear at the 1 3 1 2 vectors, respectively.
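Both answers above rely on histcounts with integer bin edges. A possible alternative, sketched here under the assumption that exact equality is the right notion of "same value", is to count occurrences with unique and accumarray. This does not require integers or equally sized vectors. The variable names vals, src and once are made up for the illustration.

```matlab
X = [1 -3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
vals = [X(:); Y(:); Z(:)];                       % all values in one column
src  = [1*ones(numel(X),1); 2*ones(numel(Y),1); 3*ones(numel(Z),1)];  % source array of each value
[u, ~, j] = unique(vals);                        % j maps each value to its row in u
counts = accumarray(j(:), 1);                    % occurrences of each unique value
once = find(counts == 1);                        % values that appear exactly once
[~, where] = ismember(u(once), vals);            % position of each such value in vals
result = [u(once).'; src(where).'];
disp(result)
```

For the sample data this gives the same two rows as the histcounts version: the values -3, 3, 10, 20 and their source arrays 1, 3, 1, 2.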

Create all possible Mx1 vectors from an Nx1 vector in MATLAB

I am trying to create all possible 1xM vectors (words) from a 1xN vector (alphabet) in MATLAB, where N > M. For example, I want to create all possible 1x2 "words" from the 1x4 "alphabet" alphabet = [1 2 3 4];
I expect a result like:
[1 1]
[1 2]
[1 3]
[1 4]
[2 1]
[2 2]
...
I want to make M an input to my routine and I do not know it beforehand. Otherwise, I could easily do this using nested for-loops. Is there any way to do this?
Try
[d1 d2] = ndgrid(alphabet);
[d2(:) d1(:)]
To parameterize on M:
d = cell(M, 1);
[d{:}] = ndgrid(alphabet);
for i = 1:M
d{i} = d{i}(:);
end
[d{end:-1:1}]
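As a quick check of the parameterized version, here is the same code run for M = 3 on the alphabet from the question. The result is stored in a variable named words, which is an assumed name, so that its size and a few rows can be inspected.

```matlab
alphabet = [1 2 3 4];
M = 3;
d = cell(M, 1);
[d{:}] = ndgrid(alphabet);        % one M-dimensional grid per word position
for i = 1:M
    d{i} = d{i}(:);               % flatten each grid into a column
end
words = [d{end:-1:1}];            % reversed so that the last letter varies fastest
disp(size(words))
disp(words(1:5, :))
```

Each row of words is one word, and there are numel(alphabet)^M rows in total.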
In general, and in languages that don't have ndgrid in their library, the way to parameterize for-loop nesting is using recursion.
function result = cartesian(alphabet, M)
% Returns an M-by-numel(alphabet)^M matrix; each column is one word.
if M <= 1
result = alphabet(:).'; % base case: one-letter words, one per column
else
recursed = cartesian(alphabet, M-1);
N = size(recursed,2); % number of (M-1)-letter words
result = zeros(M, N * numel(alphabet));
for i=1:numel(alphabet)
result(1,1+(i-1)*N:i*N) = alphabet(i); % fix the first letter
result(2:M,1+(i-1)*N:i*N) = recursed; % in MATLAB, this line can be vectorized with repmat... but in MATLAB you'd use ndgrid anyway
end
end
end
To get all k-letter combinations from an arbitrary alphabet, use
n = length(alphabet);
aux = dec2base(0:n^k-1,n)
aux2 = aux-'A';
ind = aux2<0;
aux2(ind) = aux(ind)-'0'
aux2(~ind) = aux2(~ind)+10;
words = alphabet(aux2+1)
The alphabet may consist of up to 36 elements (as per dec2base). Those elements may be numbers or characters.
How this works:
The numbers 0, 1, ..., n^k-1, when expressed in base n, give all groups of k digits taken from 0, ..., n-1. dec2base does the conversion to base n, but returns the result as strings, so we need to convert each character to the corresponding number (that's the part with aux and aux2). We then add 1 to get numbers in the range 1, ..., n. Finally, we index alphabet with those numbers to obtain the actual letters or numbers of the alphabet.
Example with letters:
>> alphabet = 'abc';
>> k = 2;
>> words
words =
aa
ab
ac
ba
bb
bc
ca
cb
cc
Example with numbers:
>> alphabet = [1 3 5 7];
>> k = 2;
>> words
words =
1 1
1 3
1 5
1 7
3 1
3 3
3 5
3 7
5 1
5 3
5 5
5 7
7 1
7 3
7 5
7 7
Use the ndgrid function in MATLAB:
[a,b] = ndgrid(alphabet)

How to remove elements of one array from another?

I have two arrays:
A=[1 1 2 2 3 3 3];
B=[1 3];
Is there any function that can remove elements which are contained in B from A?
The result should be
C=[1 2 2 3 3];
The order is not important, but if A contains several copies of an element, such as two 1s, then I need an operation that removes from A only as many copies as appear in B (in this case only one 1 and one 3, so the other 1 and the remaining 3s should stay in the final result C). This function should be analogous to setdiff, with the difference that it takes multiple instances of array elements into account. The analogy holds because my B only contains elements that are in A.
For loop solution:
C = A;
for ii = 1:length(B)
C(find(C == B(ii), 1,'first')) = [];
end
Result
C =
1 2 2 3 3
Here's a vectorized solution using accumarray and repelem:
maxValue = max([A B]);
counts = accumarray(A(:), 1, [maxValue 1])-accumarray(B(:), 1, [maxValue 1]);
C = repelem(1:maxValue, max(counts, 0));
And the result for your sample data A = [1 1 2 2 3 3 3]; B = [1 3];:
C =
1 2 2 3 3
This will even work for cases where there are values in B not in A (like B = [1 4];) or more of a given value in B than in A (like B = [1 1 1];).
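The two edge cases mentioned above can be tried directly. This sketch wraps the three lines from the answer in a loop over several choices of B. The cell arrays cases and results are names introduced only for this illustration, and counts is transposed so that it has the same orientation as 1:maxValue.

```matlab
A = [1 1 2 2 3 3 3];
cases = {[1 3], [1 4], [1 1 1]};   % original B, a value not in A, more copies than in A
results = cell(size(cases));
for k = 1:numel(cases)
    B = cases{k};
    maxValue = max([A B]);
    counts = accumarray(A(:), 1, [maxValue 1]) - accumarray(B(:), 1, [maxValue 1]);
    % Negative counts (more copies in B than in A) are clamped to zero.
    results{k} = repelem(1:maxValue, max(counts, 0).');
    disp(results{k})
end
```

A value of B that is missing from A, or that occurs more often in B than in A, simply ends up with a count of zero after clamping.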
Note: The above works since A and B contain positive integers, which accumarray can use directly as indices. If they were to contain floating-point values, you could map the unique values to integers first using unique and ismember. Let's say we had the following sample data:
A = [0 0 pi pi 2*pi 2*pi 2*pi];
B = [0 2*pi];
Here's a variant of the above code that can handle this:
uniqueValues = unique([A B]);
[~, A] = ismember(A, uniqueValues);
[~, B] = ismember(B, uniqueValues);
maxValue = max([A B]);
counts = accumarray(A(:), 1, [maxValue 1])-accumarray(B(:), 1, [maxValue 1]);
C = uniqueValues(repelem(1:maxValue, max(counts, 0)));
And the results:
C =
0 3.1416 3.1416 6.2832 6.2832 % [0 pi pi 2*pi 2*pi]

Element-wise array replication according to a count [duplicate]

My question is similar to this one, but I would like to replicate each element according to a count specified in a second array of the same size.
An example of this, say I had an array v = [3 1 9 4], I want to use rep = [2 3 1 5] to replicate the first element 2 times, the second three times, and so on to get [3 3 1 1 1 9 4 4 4 4 4].
So far I'm using a simple loop to get the job done. This is what I started with:
vv = [];
for i=1:numel(v)
vv = [vv repmat(v(i),1,rep(i))];
end
I managed to improve by preallocating space:
vv = zeros(1,sum(rep));
c = cumsum([1 rep]);
for i=1:numel(v)
vv(c(i):c(i)+rep(i)-1) = repmat(v(i),1,rep(i));
end
However I still feel there has to be a more clever way to do this... Thanks
Here's one way I like to accomplish this:
>> index = zeros(1,sum(rep));
>> index(cumsum([1 rep(1:end-1)])) = 1
index =
1 0 1 0 0 1 1 0 0 0 0
>> index = cumsum(index)
index =
1 1 2 2 2 3 4 4 4 4 4
>> vv = v(index)
vv =
3 3 1 1 1 9 4 4 4 4 4
This works by first creating an index vector of zeroes the same length as the final count of all the values. By performing a cumulative sum of the rep vector with the last element removed and a 1 placed at the start, I get a vector of indices into index showing where the groups of replicated values will begin. These points are marked with ones. When a cumulative sum is performed on index, I get a final index vector that I can use to index into v to create the vector of heterogeneously-replicated values.
To add to the list of possible solutions, consider this one:
vv = cellfun(@(a,b)repmat(a,1,b), num2cell(v), num2cell(rep), 'UniformOutput',0);
vv = [vv{:}];
This is much slower than the one by gnovice.
What you are trying to do is to run-length decode. A high level reliable/vectorized utility is the FEX submission rude():
% example inputs
counts = [2, 3, 1];
values = [24,3,30];
the result
rude(counts, values)
ans =
24 24 3 3 3 30
Note that this function performs the opposite operation as well, i.e. run-length encodes a vector or in other words returns values and the corresponding counts.
The accumarray function can be used to make the code work if zeros exist in the rep array:
function vv = repeatElements(v, rep)
index = accumarray(cumsum(rep)'+1, 1);
vv = v(cumsum(index(1:end-1))+1);
end
This works similarly to gnovice's solution, except that the indices are accumulated instead of being assigned 1. This allows some indices to be skipped (3 and 6 in the example below), which removes the corresponding elements from the output.
>> v = [3 1 42 9 4 42];
>> rep = [2 3 0 1 5 0];
>> index = accumarray(cumsum(rep)'+1, 1)'
index =
0 0 1 0 0 2 1 0 0 0 0 2
>> cumsum(index(1:end-1))+1
ans =
1 1 2 2 2 4 5 5 5 5 5
>> vv = v(cumsum(index(1:end-1))+1)
vv =
3 3 1 1 1 9 4 4 4 4 4
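On MATLAB R2015a or later, repelem also accepts a vector of per-element counts, so the same run-length decoding can be written as a single call. The sketch below compares it with the accumarray version on the example above. The variable name vvAccum is introduced only for the comparison.

```matlab
v   = [3 1 42 9 4 42];
rep = [2 3 0 1 5 0];
% Built-in run-length decoding: element k of v is repeated rep(k) times.
vv = repelem(v, rep);
% accumarray-based version from the answer above, for comparison.
index = accumarray(cumsum(rep)'+1, 1);
vvAccum = v(cumsum(index(1:end-1))+1);
disp(vv)
disp(isequal(vv, vvAccum))
```

Zero counts are accepted by repelem as well, so no special handling is needed for the skipped elements.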
