Finding number(s) that is(are) repeated consecutively most often - arrays

Given this array for example:
a = [1 2 2 2 1 3 2 1 4 4 4 5 1]
I want to find a way to check which numbers are repeated consecutively most often. In this example, the output should be [2 4] since both 2 and 4 are repeated three times consecutively.
Another example:
a = [1 1 2 3 1 1 5]
This should return [1 1] because there are separate instances of 1 being repeated twice.
This is my simple code. I know there is a better way to do this:
function val=longrun(a)
b = a(:)';
b = [b, max(b)+1];
val = [];
sum = 1;
max_occ = 0;
for i = 1:max(size(b))
q = b(i);
for j = i:size(b,2)
if (q == b(j))
sum = sum + 1;
else
if (sum > max_occ)
max_occ = sum;
val = [];
val = [val, q];
elseif (max_occ == sum)
val = [val, q];
end
sum = 1;
break;
end
end
end
if (size(a,2) == 1)
val = val'
end
end

Here's a vectorized way:
a = [1 2 2 2 1 3 2 1 4 4 4 5 1]; % input data
t = cumsum([true logical(diff(a))]); % assign a label to each run of equal values
[~, n, z] = mode(t); % maximum run length and corresponding labels
result = a(ismember(t,z{1})); % build result with repeated values
result = result(1:n:end); % remove repetitions
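The same result can also be reached with an explicit run-length computation instead of mode. The following is a minimal sketch, not part of the answer above; the variable names starts and lens are illustrative assumptions, and the input is assumed to be a row vector:

```matlab
a = [1 2 2 2 1 3 2 1 4 4 4 5 1];          % input data (row vector)
starts = find([true, diff(a) ~= 0]);       % index where each run begins
lens = diff([starts, numel(a) + 1]);       % length of each run
result = a(starts(lens == max(lens)));     % values of the longest runs
```

For this input, result is [2 4]; for a = [1 1 2 3 1 1 5] it is [1 1], matching the two examples in the question.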

One solution could be:
%Dummy data
a = [1 2 2 2 1 3 2 1 4 4 4 5 5]
%Preallocation
x = ones(1,numel(a));
%Loop
for ii = 2:numel(a)
if a(ii-1) == a(ii)
x(ii) = x(ii-1)+1;
end
end
%Get the result
a(find(x==max(x)))
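This uses a simple for loop. The goal is to increment x whenever the current value in the vector a is identical to the previous one, so x holds the running length of each run.
Or you could also vectorize the process. Note that this version counts every adjacent repeat of a value across the whole vector rather than the length of a single run, and circshift wraps around so the first element is compared with the last; it gives the same result as the loop for this example, but not in general: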
x = a(find(a-circshift(a,1,2)==0)); %compare a with a + a shift of 1 and get only the repeated element.
u = unique(x); %get the unique value of x
h = histc(x,u);
res = u(h==max(h)) %get the result
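The two approaches in this answer do not always agree. As a hedged illustration (the input below is made up for this check and is not from the original post), the loop reports the longest single run, while the circshift version counts adjacent repeats per value over the whole vector:

```matlab
a = [1 1 2 1 1 3 3 3];                     % hypothetical test input

% Loop version: x(ii) is the length of the run ending at position ii
x = ones(1, numel(a));
for ii = 2:numel(a)
    if a(ii-1) == a(ii)
        x(ii) = x(ii-1) + 1;
    end
end
resLoop = a(x == max(x));                  % value(s) ending the longest run

% circshift version: keep elements equal to their (circular) predecessor
r = a(a - circshift(a, 1, 2) == 0);
u = unique(r);
h = arrayfun(@(v) sum(r == v), u);         % number of adjacent repeats per value
resShift = u(h == max(h));
```

Here resLoop is 3, whereas resShift is [1 3], because the two separate pairs of 1 add up to as many adjacent repeats as the single run of three 3s.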


Get the first 2 non-zero elements from every row of matrix

I have a matrix A like this:
A = [ 1 0 2 4; 2 3 1 0; 0 0 3 4 ]
A has only unique row elements except zero, and each row has at least 2 non-zero elements.
I want to create a new matrix B from A, where each row in B contains the first two non-zero elements of the corresponding row in A.
B = [ 1 2 ; 2 3 ; 3 4 ]
It is easy with loops, but I need a vectorized solution.
Here's a vectorized approach:
A = [1 0 2 4; 2 3 1 0; 0 0 3 4]; % example input
N = 2; % number of wanted nonzeros per row
[~, ind] = sort(~A, 2); % sort each row of A by the logical negation of its values.
% Get the indices of the sorting
ind = ind(:, 1:N); % keep first N columns
B = A((1:size(A,1)).' + (ind-1)*size(A,1)); % generate linear index and use into A
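To see why sorting ~A works: sort is stable, so within each row the columns holding nonzeros (where ~A is 0) come first, in their original left-to-right order. Below is a small sketch of the intermediate values. It uses sub2ind instead of manual index arithmetic, which is a choice made here for readability and not taken from the answer; the names rows and cols are illustrative:

```matlab
A = [1 0 2 4; 2 3 1 0; 0 0 3 4];           % example input
N = 2;                                     % number of wanted nonzeros per row
[~, ind] = sort(~A, 2);                    % ind = [1 3 4 2; 1 2 3 4; 3 4 1 2]
cols = ind(:, 1:N);                        % columns of the first N nonzeros
rows = repmat((1:size(A,1)).', 1, N);      % matching row subscripts
B = A(sub2ind(size(A), rows, cols));       % B = [1 2; 2 3; 3 4]
```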
Here is another vectorised approach.
A_bool = A ~= 0; A_size = size(A); A_rows = A_size(1); % ~= 0 also catches negative entries
A_boolsum = cumsum( A_bool, 2 ) .* A_bool; % for each row, and at each column,
% count how many nonzero instances
% have occurred up to that column
% (inclusive), and then 'zero' back
% all original zero locations.
[~, ColumnsOfFirsts ] = max( A_boolsum == 1, [], 2 );
[~, ColumnsOfSeconds ] = max( A_boolsum == 2, [], 2 );
LinearIndicesOfFirsts = sub2ind( A_size, [1 : A_rows].', ColumnsOfFirsts );
LinearIndicesOfSeconds = sub2ind( A_size, [1 : A_rows].', ColumnsOfSeconds );
Firsts = A(LinearIndicesOfFirsts );
Seconds = A(LinearIndicesOfSeconds);
Result = horzcat( Firsts, Seconds )
% Result =
% 1 2
% 2 3
% 3 4
PS. Matlab / Octave common subset compatible code.
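The cumulative-count idea extends to the first N nonzeros per row. The following is only a sketch under the assumption that every row has at least N nonzero entries; it uses a short loop over N (not over rows), and the names nzCount and col are illustrative rather than taken from the answer:

```matlab
A = [1 0 2 4; 2 3 1 0; 0 0 3 4];           % example input
N = 2;                                     % number of wanted nonzeros per row
nz = A ~= 0;                               % nonzero mask
nzCount = cumsum(nz, 2) .* nz;             % k at the k-th nonzero of a row, 0 elsewhere
B = zeros(size(A,1), N);
for k = 1:N
    [~, col] = max(nzCount == k, [], 2);   % column of the k-th nonzero in each row
    B(:, k) = A(sub2ind(size(A), (1:size(A,1)).', col));
end
```

For the example matrix this gives B = [1 2; 2 3; 3 4].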

Generate square matrix for vector with diagonals in MATLAB

I have a vector, where each value corresponds to a diagonal. I want to create a matrix from this vector. I have a code:
x = [1:5];
N = numel(x);
diagM = diag(repmat(x(1),N,1),0);
for iD = 2:N
d = repmat(x(iD),N-iD+1,1);
d_pos = diag(d,iD-1);
d_neg = diag(d,-iD+1);
d_join = d_pos+d_neg;
diagM = diagM+d_join;
end
It gives me what I want:
diagM =
1 2 3 4 5
2 1 2 3 4
3 2 1 2 3
4 3 2 1 2
5 4 3 2 1
But it becomes really slow, for example for x=[1:10^4].
Could you help me with a faster way to generate such a matrix?
Use toeplitz:
x = 1:5;
diagM = toeplitz(x);
Or do it manually, vectorized:
x = 1:5;
t = 1:numel(x);
diagM = x(abs(t-t.')+1); % x(abs(bsxfun(@minus, t, t.'))+1) in old Matlab versions
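The manual version works because entry (i,j) of the result only depends on the distance |i-j| from the main diagonal. A small sketch with made-up values in x, assuming implicit expansion is available (MATLAB R2016b or later, or Octave):

```matlab
x = [10 20 30];                            % hypothetical diagonal values
t = 1:numel(x);
idx = abs(t - t.') + 1;                    % idx = [1 2 3; 2 1 2; 3 2 1]
diagM = x(idx);                            % pick x(|i-j|+1) for every (i,j)
sameAsToeplitz = isequal(diagM, toeplitz(x));
```

Here sameAsToeplitz is true, and diagM is [10 20 30; 20 10 20; 30 20 10].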

Find unique elements of multiple arrays

Let's say I have 3 arrays:
X = [ 1 3 9 10 ];
Y = [ 1 9 11 20];
Z = [ 1 3 9 11 ];
Now I would like to find the values that appear only once, and which array each of them belongs to.
I generalized EBH's answer to cover a flexible number of arrays, arrays with different sizes, and multidimensional arrays. Like the original, this method can only deal with integer-valued arrays:
function [uniq, id] = uniQ(varargin)
combo = [];
idx = [];
for ii = 1:nargin
combo = [combo; varargin{ii}(:)]; % merge the arrays
idx = [idx; ii*ones(numel(varargin{ii}), 1)];
end
counts = histcounts(combo, min(combo):max(combo)+1);
ids = find(counts == 1); % finding index of unique elements in combo
uniq = min(combo) - 1 + ids(:); % constructing array of unique elements in 'counts'
id = zeros(size(uniq));
for ii = 1:numel(uniq)
ids = find(combo == uniq(ii), 1); % finding index of unique elements in 'combo'
id(ii) = idx(ids); % assigning the corresponding index
end
And this is how it works:
[uniq, id] = uniQ([9, 4], 15, randi(12,3,3), magic(3))
uniq =
1
7
11
12
15
id =
4
4
3
3
2
If you are only dealing with integers and your vectors are equally sized (all with the same number of elements), you can use histcounts for a quick search for unique elements:
X = [1 -3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
XYZ = [X(:) Y(:) Z(:)]; % one matrix with all vectors as columns
counts = histcounts(XYZ,min(XYZ(:)):max(XYZ(:))+1);
R = min(XYZ(:)):max(XYZ(:)); % range of the data
unkelem = R(counts==1);
and then locate them using a loop with find:
pos = zeros(size(unkelem));
counter = 1;
for k = unkelem
[~,pos(counter)] = find(XYZ==k);
counter = counter+1;
end
result = [unkelem;pos]
and you get:
result =
-3 3 10 20
1 3 1 2
So -3, 3, 10 and 20 are unique, and they appear in vectors 1, 3, 1 and 2, respectively.
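If the values are not guaranteed to be integers, one possible alternative is to count occurrences with unique and accumarray instead of histcounts. This is a sketch of a different technique, not the method from either answer; the names vals, src and once are illustrative, and it uses the arrays from the question:

```matlab
X = [1 3 9 10];
Y = [1 9 11 20];
Z = [1 3 9 11];
vals = [X(:); Y(:); Z(:)];                 % all values in one column
src  = [1*ones(numel(X),1);                % which array each value came from
        2*ones(numel(Y),1);
        3*ones(numel(Z),1)];
[u, ~, j] = unique(vals);                  % j maps each value to its entry in u
counts = accumarray(j(:), 1);              % occurrences of each distinct value
once = u(counts == 1);                     % values that appear exactly once
[~, loc] = ismember(once, vals);           % position of each such value in vals
result = [once.'; src(loc).'];             % first row: values, second row: array number
```

For these inputs, result is [10 20; 1 2]: the value 10 appears only in X and 20 only in Y.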

Matlab: How do I re-insert elements in an array?

My original array was A = [1 0 2 3 0 7]. I deleted the indexes with a zero in them, and got A = [1 2 3 7]. I stored the indexes of the elements I deleted in an array called DEL = [2 5].
How can I re-insert the zeros in the array to get back the original array?
This will do it for you:
A = [1 2 3 7];
DEL = [2 5];
n = numel(A) + numel(DEL);
B = zeros(1,n);
mask = true(1,n);
mask(DEL) = false;
B(mask) = A;
Alternatively, you can set the mask in one line using:
mask = setdiff(1:n, DEL);
Result:
B =
1 0 2 3 0 7
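Another option, if the original array (or the positions of its non-zero elements) is still available:
A = [1 0 2 3 0 7] ;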
A_ = [1 2 3 7] ;
[~,i] = find (A) ;
B = zeros (1,length(A)) ;
B(i) = A_ ;
A further variant that removes the deleted positions from an index vector:
A = [1 2 3 7];
DEL = [2 5];
array_indices = 1:6; % the indices of the original array
array_indices(DEL) = []; % indices of numbers that were not deleted
B = zeros(1,6); % created zeros of the same length as original array
B(array_indices) = A; % replace zeros with non-zero #s at their locations
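A quick round-trip check of the idea, as a sketch: delete the zeros, remember where they were, and put them back. The variable orig is introduced here only for the comparison:

```matlab
orig = [1 0 2 3 0 7];                      % original array
DEL = find(orig == 0);                     % indices of the deleted zeros: [2 5]
A = orig(orig ~= 0);                       % array after deletion: [1 2 3 7]

n = numel(A) + numel(DEL);                 % length of the restored array
B = zeros(1, n);
B(setdiff(1:n, DEL)) = A;                  % fill every position that was not deleted
roundTripOk = isequal(B, orig);            % true if the original was recovered
```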

mean of parts of an array in octave

I have two arrays. One is a list of lengths within the other. For example
zarray = [1 2 3 4 5 6 7 8 9 10]
and
lengths = [1 3 2 1 3]
I want to average (mean) over parts of the first array, with lengths given by the second. For this example, resulting in:
[mean([1]),mean([2,3,4]),mean([5,6]),mean([7]),mean([8,9,10])]
I am trying to avoid looping, for the sake of speed. I tried using mat2cell and cellfun as follows
zcell = mat2cell(zarray,[1],lengths);
zcellsum = cellfun('mean',zcell);
But the cellfun part is very slow. Is there a way to do this without looping or cellfun?
Here is a fully vectorized solution (no explicit for-loops, or hidden loops with ARRAYFUN, CELLFUN, ..). The idea is to use the extremely fast ACCUMARRAY function:
%# data
zarray = [1 2 3 4 5 6 7 8 9 10];
lengths = [1 3 2 1 3];
%# generate subscripts: 1 2 2 2 3 3 4 5 5 5
endLocs = cumsum(lengths(:));
subs = zeros(endLocs(end),1);
subs([1;endLocs(1:end-1)+1]) = 1;
subs = cumsum(subs);
%# mean of each part
means = accumarray(subs, zarray) ./ lengths(:)
The result in this case:
means =
1
3
5.5
7
9
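As a cross-check, the same means can be obtained from prefix sums. This is a different technique from the ACCUMARRAY approach above, shown only as a sketch; the names csum, ends and sums are illustrative. Note that for very long arrays of floating-point data, differences of cumulative sums may accumulate slightly more round-off:

```matlab
zarray = [1 2 3 4 5 6 7 8 9 10];
lengths = [1 3 2 1 3];
csum = cumsum(zarray);                     % running total of zarray
ends = cumsum(lengths);                    % last index of each part: [1 4 6 7 10]
sums = diff([0, csum(ends)]);              % sum of each part: [1 9 11 7 27]
means = sums ./ lengths;                   % [1 3 5.5 7 9]
```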
Speed test:
Consider the following comparison of the different methods. I am using the TIMEIT function by Steve Eddins:
function [t,v] = testMeans()
%# generate test data
[arr,len] = genData();
%# define functions
f1 = @() func1(arr,len);
f2 = @() func2(arr,len);
f3 = @() func3(arr,len);
f4 = @() func4(arr,len);
%# timeit
t(1) = timeit( f1 );
t(2) = timeit( f2 );
t(3) = timeit( f3 );
t(4) = timeit( f4 );
%# return results to check their validity
v{1} = f1();
v{2} = f2();
v{3} = f3();
v{4} = f4();
end
function [arr,len] = genData()
%#arr = [1 2 3 4 5 6 7 8 9 10];
%#len = [1 3 2 1 3];
numArr = 10000; %# number of elements in array
numParts = 500; %# number of parts/regions
arr = rand(1,numArr);
len = zeros(1,numParts);
len(1:end-1) = diff(sort( randperm(numArr,numParts) ));
len(end) = numArr - sum(len);
end
function m = func1(arr, len)
%# @Drodbar: for-loop
idx = 1;
N = length(len);
m = zeros(1,N);
for i=1:N
m(i) = mean( arr(idx+(0:len(i)-1)) );
idx = idx + len(i);
end
end
function m = func2(arr, len)
%# @user1073959: MAT2CELL+CELLFUN
m = cellfun(@mean, mat2cell(arr, 1, len));
end
function m = func3(arr, len)
%# @Drodbar: ARRAYFUN+CELLFUN
idx = arrayfun(@(a,b) a-(0:b-1), cumsum(len), len, 'UniformOutput',false);
m = cellfun(@(a) mean(arr(a)), idx);
end
function m = func4(arr, len)
%# @Amro: ACCUMARRAY
endLocs = cumsum(len(:));
subs = zeros(endLocs(end),1);
subs([1;endLocs(1:end-1)+1]) = 1;
subs = cumsum(subs);
m = accumarray(subs, arr) ./ len(:);
if isrow(len)
m = m';
end
end
Below are the timings. Tests were performed on a WinXP 32-bit machine with MATLAB R2012a. My method is roughly 40 times faster than any of the other methods. The for-loop and MAT2CELL+CELLFUN are effectively tied for second place.
>> [t,v] = testMeans();
>> t
t =
     0.013098     0.013074     0.022407   0.00031807
        |            |            |            \_________ @Amro: ACCUMARRAY (!)
        |            |            \___________________ @Drodbar: ARRAYFUN+CELLFUN
        |            \______________________________ @user1073959: MAT2CELL+CELLFUN
        \__________________________________________ @Drodbar: FOR-loop
Furthermore, all results are correct and equal up to round-off: the differences are on the order of eps, the machine precision (caused by different orders of accumulating round-off errors), and can safely be ignored:
%#assert( isequal(v{:}) )
>> maxErr = max(max( diff(vertcat(v{:})) ))
maxErr =
3.3307e-16
Here is a solution using arrayfun and cellfun
zarray = [1 2 3 4 5 6 7 8 9 10];
lengths = [1 3 2 1 3];
% Generate the indexes for the elements contained within each length specified
% subset. idx would be {[1], [4, 3, 2], [6, 5], [7], [10, 9, 8]} in this case
idx = arrayfun(@(a,b) a-(0:b-1), cumsum(lengths), lengths,'UniformOutput',false);
means = cellfun( @(a) mean(zarray(a)), idx);
Your desired output result:
means =
1.0000 3.0000 5.5000 7.0000 9.0000
Following @tmpearce's comment, I did a quick performance comparison between the solution above, which I wrapped in a function called subsetMeans1:
function means = subsetMeans1( zarray, lengths)
% Generate the indexes for the elements contained within each length specified
% subset. idx would be {[1], [4, 3, 2], [6, 5], [7], [10, 9, 8]} in this case
idx = arrayfun(@(a,b) a-(0:b-1), cumsum(lengths), lengths,'UniformOutput',false);
means = cellfun( @(a) mean(zarray(a)), idx);
and a simple for loop alternative, function subsetMeans2.
function means = subsetMeans2( zarray, lengths)
% Method based on single loop
idx = 1;
N = length(lengths);
means = zeros( 1, N);
for i = 1:N
means(i) = mean( zarray(idx+(0:lengths(i)-1)) );
idx = idx+lengths(i);
end
I used the following test script, based on TIMEIT, which allows checking performance while varying the number of elements in the input vector and the maximum number of elements per subset:
% Generate some data for the performance test
% Total of elements on the vector to test
nVec = 100000;
% Max of elements per subset
nSubset = 5;
% Data generation aux variables
lenghtsGen = randi( nSubset, 1, nVec);
accumLen = cumsum(lenghtsGen);
maxIdx = find( accumLen < nVec, 1, 'last' );
% % Original test data
% zarray = [1 2 3 4 5 6 7 8 9 10];
% lengths = [1 3 2 1 3];
% Vector to test
zarray = 1:nVec;
lengths = [ lenghtsGen(1:maxIdx) nVec-accumLen(maxIdx)] ;
% Double check that the subset lengths add up to nVec
assert ( sum(lengths) == nVec)
t1(1) = timeit(@() subsetMeans1( zarray, lengths));
t1(2) = timeit(@() subsetMeans2( zarray, lengths));
fprintf('Time spent subsetMeans1: %f\n',t1(1));
fprintf('Time spent subsetMeans2: %f\n',t1(2));
It turns out that the non-vectorised version without arrayfun and cellfun is faster, presumably due to the extra overhead of those functions:
Time spent subsetMeans1: 2.082457
Time spent subsetMeans2: 1.278473
