How to balance unique values in an array in MATLAB

I have a vector
Y = [1 1 0 0 0 1 1 0 1 0 1 1 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 0 0 0]
1 occurs 17 times
0 occurs 21 times
How can I randomly remove 0s so that both values have equal amounts, such as 1 (17 times) and 0 (17 times)?
This should also work on a much bigger matrix.

Starting with your example
Y = [1 1 0 0 0 1 1 0 1 0 1 1 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 0 0 0]
You can do the following:
% Get the indices of the value which is more common (`0` here)
zeroIdx = find(~Y); % equivalent to find(Y==0)
% Get random indices to remove
remIdx = randperm(nnz(~Y), nnz(~Y) - nnz(Y));
% Remove elements
Y(zeroIdx(remIdx)) = [];
You could combine the last two lines, but I think it would be less clear.
The randperm line picks the required number of indices to remove (the surplus of zeros) at random from between 1 and the number of zeros; these are positions within zeroIdx, not within Y.
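The snippet above hard-codes 0 as the more common value. A minimal sketch that first checks which of the two values is in the majority could look like this (the names n0, n1, majorIdx, nRemove and Ybal are illustrative, and the data is assumed to contain only 0 and 1):

```matlab
Y = [1 1 0 0 0 1 1 0 1 0 1 1 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 0 0 0];

n0 = nnz(Y == 0);                 % number of zeros
n1 = nnz(Y == 1);                 % number of ones

% Indices of whichever value occurs more often
if n0 > n1
    majorIdx = find(Y == 0);
else
    majorIdx = find(Y == 1);
end

nRemove = abs(n0 - n1);           % surplus of the majority value
remIdx  = randperm(numel(majorIdx), nRemove);  % random positions within majorIdx

Ybal = Y;
Ybal(majorIdx(remIdx)) = [];      % drop the surplus elements
```

Since only find, nnz and randperm are involved, this should scale to much larger vectors without a loop.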

If the data can only have two values
Values are assumed to be 0 and 1. Elements of the more common value are randomly removed until both counts are equal:
Y = [1 1 0 0 0 1 1 0 1 0 1 1 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 0 1 1 1 0 0 0 1 0 0 0]; % data
ind0 = find(Y==0); % indices of zeros
ind1 = find(Y==1); % indices of ones
t(1,1:numel(ind0)) = ind0(randperm(numel(ind0))); % random permutation of indices of zeros
t(2,1:numel(ind1)) = ind1(randperm(numel(ind1))); % same for ones. Pads shorter row with 0
t = t(:, all(t,1)); % keep only columns that don't have padding
result = Y(sort(t(:))); % linearize, sort and use those indices into the data
Generalization for more than two values
Values are arbitrary. Elements of every value except the least common one are randomly removed until all counts are equal:
Y = [0 1 2 0 2 1 1 2 0 2 1 2 2 0 0]; % data
vals = [0 1 2]; % or use vals = unique(Y), but absent values will not be detected
t = [];
for k = 1:numel(vals) % loop over values
ind_k = find(Y==vals(k));
t(k, 1:numel(ind_k)) = ind_k(randperm(numel(ind_k)));
end
t = t(:, all(t,1));
result = Y(sort(t(:)));
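The zero-padding trick above relies on valid indices never being 0. As an alternative sketch of the same idea (not the answer's own code), one can compute the smallest count explicitly and build a logical keep-mask; the names grp, counts, target and keep are illustrative:

```matlab
Y = [0 1 2 0 2 1 1 2 0 2 1 2 2 0 0];    % data

[vals, ~, grp] = unique(Y);             % grp(i) is the index into vals of Y(i)
counts = accumarray(grp(:), 1).';       % occurrences of each value
target = min(counts);                   % every value is cut down to this count

keep = false(size(Y));
for k = 1:numel(vals)
    ind_k = find(grp(:).' == k);                      % positions of the k-th value
    pick  = ind_k(randperm(numel(ind_k), target));    % random subset of size target
    keep(pick) = true;
end

result = Y(keep);                       % original order is preserved
```

Because the mask is applied to Y directly, no sorting step is needed to keep the surviving elements in their original order.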

Related

Find regions of contiguous zeros in a binary array

I have
x=[ 1 1 1 1 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 1]
I want to find all the regions that have more than 5 zeros in a row. I want to find the index where it starts and where it stops.
In this case I want this: c=[12 18]. I can do it using for loops but I wonder if there is any better way, at least to find if there are some regions where this 'mask' ( mask=[0 0 0 0 0] ) appears.
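A convolution-based approach (n is the minimum run length: with n = 5 this finds runs of at least 5 zeros, so set n = 6 to require more than 5):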
n = 5;
x = [0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0];
end_idx = find(diff(conv(~x, ones(1,n))==n)==-1)
start_idx = find(diff(conv(~x, ones(1,n))==n)==1) - n + 2
returning
end_idx =
6 14 25
start_idx =
1 9 20
Note that diff(conv(~x, ones(1,n))==n) is common to both lines, so it is more efficient to compute it once:
kernel = ones(1,n);
convolved = diff(conv(~x, kernel)==n);
end_idx = find(convolved==-1)
start_idx = find(convolved==1) - n + 2
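For comparison, here is a sketch of a diff-based run-length variant. It is a different technique from the convolution above: it finds every run of zeros first and then filters by length. The names minLen, runStart, runEnd and longRuns are illustrative.

```matlab
x = [1 1 1 1 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 1];
minLen = 6;                          % "more than 5 zeros in a row"

d = diff([0, x == 0, 0]);            % +1 where a zero run starts, -1 just after it ends
runStart = find(d == 1);             % first index of each zero run
runEnd   = find(d == -1) - 1;        % last index of each zero run
runLen   = runEnd - runStart + 1;

longRuns  = runLen >= minLen;        % keep only the sufficiently long runs
start_idx = runStart(longRuns);
end_idx   = runEnd(longRuns);
```

Padding with a 0 on both sides ensures that runs touching either end of x are still detected.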
Alternatively, you can use regexp: convert the array into a string, remove the blanks, then use regexp to find the runs of 0.
A possible implementation could be:
x=[ 1 1 1 1 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 1]
% Convert the array to string and remove the blanks
str=strrep(num2str(x),' ','')
% Find the occurrences
[start_idx,end_idx]=regexp(str,'0{6,}')
This gives:
start_idx = 12
end_idx = 17
where x(start_idx) is the first element of the sequence and x(end_idx) is the last one.
Applied to a longer sequence, start_idx and end_idx are arrays:
x=[0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0]
start_idx =
1 9 20
end_idx =
6 14 25
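As a small variation on the string conversion, the sketch below builds the character string with char(x + '0') instead of num2str plus strrep. This assumes x contains only the digits 0 and 1, so each element maps to exactly one character.

```matlab
x = [0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0];

str = char(x + '0');                            % one character per element, no blanks
[start_idx, end_idx] = regexp(str, '0{6,}');    % runs of six or more zeros
```

Because each element becomes exactly one character, the indices returned by regexp map directly onto positions in x.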

MATLAB find first elements in columns of array

Within the context of writing a certain function, I have the following example matrix:
temp =
1 2 0 0 1 0
1 0 0 0 0 0
0 1 0 0 0 1
I want to obtain an array in which each element gives the position, among all non-zero elements (counted column by column), of the first non-zero element in that column. If a column is empty, the element should correspond to the next non-empty column. For the matrix temp, the result would be:
result = [1 3 5 5 5 6]
Because the first non-zero element starts the first column, the third starts the second column, the fifth starts the fifth column and the sixth starts the sixth column.
How can I do this operation for any general matrix (one which may or may not contain empty columns) in a vectorized way?
Code:
temp = [1 2 0 0 1 0; 1 0 0 0 0 0; 0 1 0 0 0 1]
t10 = temp~=0
l2 = cumsum(t10(end:-1:1))
temp2 = reshape(l2(end)-l2(end:-1:1)+1, size(temp))
result = temp2(1,:)
Output:
temp =
1 2 0 0 1 0
1 0 0 0 0 0
0 1 0 0 0 1
t10 =
1 1 0 0 1 0
1 0 0 0 0 0
0 1 0 0 0 1
l2 =
1 1 1 1 1 2 2 2 2 2 2 2 3 3 4 4 5 6
temp2 =
1 3 5 5 5 6
2 4 5 5 6 6
3 4 5 5 6 6
result =
1 3 5 5 5 6
Printing values of each step may be clearer than my explanation. Basically we use cumsum to get the IDs of the non-zero elements. As you need to know the ID before reaching the element, a reversed cumsum will do. Then the only thing left is to reverse the ID numbers back.
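An equivalent way to look at it, sketched below under the same conventions, is that the index of the first non-zero element of a column is one plus the number of non-zero elements in all earlier columns. For an empty column this automatically gives the index belonging to the next non-empty column. The name perCol is illustrative.

```matlab
temp = [1 2 0 0 1 0; 1 0 0 0 0 0; 0 1 0 0 0 1];

perCol = sum(temp ~= 0, 1);                   % non-zero count per column
result = cumsum([0, perCol(1:end-1)]) + 1;    % one plus the count in earlier columns
```

Like the reversed cumsum, this yields numel(nonzeros(temp)) + 1 for trailing empty columns, which have no following non-zero element.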
Here's another way:
temp = [1 2 0 0 1 0; 1 0 0 0 0 0; 0 1 0 0 0 1]; % data
[~, c] = find(temp); % col indices of nonzero elements
result = accumarray(c, 1:numel(c), [], @min, NaN).'; % index, among all nonzero
% values, of the first nonzero value of each col; or NaN if none exists
result = cummin(result, 'reverse'); % fill NaN's using backwards cumulative minimum

Count and sum consecutive duplicate numbers

How can I sum only consecutive duplicate numbers? For example, I have a vector
A = [0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 1]
then when summing only consecutive duplicate numbers, I will have
B = [0 1 0 1 0 2 0 1 0 1]
Finally, I want the number of values other than 0 and 1:
sum(B>1)
I know one way to solve the problem:
sum(diff(find(A==1))==1)
but it seems it is not a good method.
An alternative solution:
A = [0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 1]
%// get Islands
a = cumsum(~A)
b = a(logical(A))
%// count occurrences
c = histc(b,unique(b))
%// count number of occurrences > 1
d = sum(c > 1)
%// or sum of occurrences > 1
e = sum(c(c > 1))
c =
1 1 2 1 1
d =
1
e =
2
This will give you the total number of repeated values in the array including "0" values.
sum(A(1:end-1)-A(2:end)==0)
ans =
7
If you are interested only in the repeated "1" values, you can change it to
sum(A(1:end-1)+A(2:end)==2)
ans =
1
Note that this counts duplicates, not runs: for [1 1 1] you'll get 2, not 1.
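To reproduce the vector B from the question directly, a sketch using run labels and accumarray may help. Each run of equal values gets its own label, and the elements of each run are summed. The names runId and nRepeated are illustrative.

```matlab
A = [0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 0 1];

runId = cumsum([1, diff(A) ~= 0]);      % label of the run each element belongs to
B = accumarray(runId(:), A(:)).';       % sum of the elements within each run
nRepeated = sum(B > 1);                 % runs of ones longer than one element
```

Unlike the pairwise comparison, this counts a run such as [1 1 1] once, since it contributes a single entry of 3 to B.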

Fill odd sequences between ones in binary vector with value

I'm looking for a vectorized solution to this problem:
Let A be a large vector (more than 10000 elements) of 0s and 1s.
Ex:
A = [0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 etc]
I want to replace the 0s between the 1s (for the sequences of odd rank) with 2,
i.e. to produce:
B = [0 0 0 1 2 2 2 2 2 1 0 0 0 1 2 2 1 0 0 1 2 1 etc]
Thanks for your help
It can be done quite easily with cumsum and mod:
A = [0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1]
Short answer
A( mod(cumsum(A),2) & ~A ) = 2
A =
0 0 0 1 2 2 2 2 2 1 0 0 0 1 2 2 1 0 0 1 2 1
You asked to fill the islands of odd rank, but by changing mod(... to ~mod(... you can just as easily fill the islands of even rank instead. Note that this also fills any zeros before the first 1, where cumsum(A) is still 0.
Explanation/Old answer:
mask1 = logical(A);
mask2 = logical(mod(cumsum(A),2))
out = zeros(size(A));
out(mask2) = 2
out(mask1) = 1
Try using cumsum:
cs = cumsum( A );
B = 2*( mod(cs,2)== 1 );
B(A==1) = 1;
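To see why the parity of the running sum works, it can help to trace the intermediate values on a short vector. The sketch below uses made-up inputs (A and A2 are illustrative, not from the question) and also shows what happens when the last 1 has no closing partner.

```matlab
A  = [0 1 0 0 1 0 1 0 1];
cs = cumsum(A);               % number of ones seen so far: 0 1 1 1 2 2 3 3 4
odd  = mod(cs, 2) == 1;       % true from an "opening" 1 up to just before the "closing" 1
fill = odd & ~A;              % keep only the zeros inside those spans
B = A;
B(fill) = 2;                  % B is 0 1 2 2 1 0 1 2 1

% Caveat: an unmatched opening 1 causes all zeros after it to be filled
A2 = [0 1 0 0];
B2 = A2;
B2(mod(cumsum(A2), 2) == 1 & ~A2) = 2;    % B2 is 0 1 2 2
```

If trailing zeros after an unmatched 1 should stay 0, that case needs separate handling.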

Select n elements in matrix left-wise based on certain value

I have a logical matrix A, and I would like to select all the elements to the left of each of my 1 values within a fixed distance. Let's say my distance is 4: I would like to (for instance) replace all 4 cells to the left of each 1 in A with a fixed value (say 2).
A= [0 0 0 0 0 1 0
0 1 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 1 0 1]
B= [0 2 2 2 2 1 0
2 1 0 0 0 0 0
0 0 0 0 0 0 0
2 2 2 2 2 2 1]
B is what I would like to have, including overwriting (last row of B) and cases where fewer than 4 cells exist to the left of a 1 (second row, where there is only one).
How about this lovely one-liner?
n = 3;
const = 5;
A = [0 0 0 0 0 1 0;
0 1 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 1 0 1]
A(bsxfun(@ne,fliplr(filter(ones(1,1+n),1,fliplr(A),[],2)),A)) = const
results in:
A =
0 0 5 5 5 1 0
5 1 0 0 0 0 0
0 0 0 0 0 0 0
0 5 5 5 5 5 1
Here is some explanation:
Am = fliplr(A); %// mirrored input required
Bm = filter(ones(1,1+n),1,Am,[],2); %// moving sum filter along the 2nd dimension
B = fliplr(Bm); %// back mirrored
mask = bsxfun(@ne,B,A) %// mask for constants
A(mask) = const
Here is a simple solution you could have come up with:
w=4; % Window size
v=2; % Desired value
B = A;
for r=1:size(A,1) % Go over all rows
for c=2:size(A,2) % Go over all columns
if A(r,c)==1 % If we encounter a 1
B(r,max(1,c-w):c-1)=v; % Set up to w spots before this point to your value (fewer near the left edge)
end
end
end
Another vectorized approach, based on cumsum:
d = 4; %// distance
v = 2; %// value
A = fliplr(A).'; %'// flip matrix, and transpose to work along rows.
ind = logical( cumsum(A) ...
- [ zeros(d+1,size(A,2)); cumsum(A(1:end-d-1,:)) ] - A ); %// d+1 zero rows keep the height equal to size(A,1)
A(ind) = v;
A = fliplr(A.');
Result:
A =
0 2 2 2 2 1 0
2 1 0 0 0 0 0
0 0 0 0 0 0 0
2 2 2 2 2 2 1
Approach #1 One-liner using imdilate available with Image Processing Toolbox -
A(imdilate(A,[ones(1,4) zeros(1,4+1)])==1)=2
Explanation
Step #1: Create a morphological structuring element to be used with imdilate -
morph_strel = [ones(1,4) zeros(1,4+1)]
This basically represents a window with ones in the 4 places to the left of the origin, and zeros at the origin and in the 4 places to its right.
Step #2: Use imdilate that will modify A such that we would have 1 at all four places to the left of each 1 in A -
imdilate_result = imdilate(A,morph_strel)
Step #3: Select all four indices for each 1 of A and set them to 2 -
A(imdilate_result==1)=2
Thus, one can write a general form for this approach as -
A(imdilate(A,[ones(1,window_length) zeros(1,window_length+1)])==1)=new_value
where window_length would be 4 and new_value would be 2 for the given data.
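If the Image Processing Toolbox is not available, the same window can be applied with conv2 from base MATLAB. This is a sketch of that substitution, not part of the original answer. It assumes A is stored as double, since assigning 2 into a logical array would be converted to true. The names kernel, mask and B are illustrative.

```matlab
window_length = 4;
new_value = 2;
A = [0 0 0 0 0 1 0;
     0 1 0 0 0 0 0;
     0 0 0 0 0 0 0;
     0 0 0 0 1 0 1];

% Same layout as the structuring element: ones on the left, zeros at the origin and to the right
kernel = [ones(1,window_length) zeros(1,window_length+1)];

% With 'same', each output cell sums the window_length cells to its right in A
mask = conv2(A, kernel, 'same') > 0;

B = A;
B(mask) = new_value;
```

A cell is marked whenever at least one 1 lies within window_length cells to its right, so a 1 that sits in the window of another 1 is overwritten, as in the last row of the desired output.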
Approach #2 Using bsxfun-
%// Parameters
window_length = 4;
new_value = 2;
B = A' %//'
[r,c] = find(B)
extents = bsxfun(@plus,r,-window_length:-1)
valid_ind1 = extents>0
jump_factor = (c-1)*size(B,1)
extents_valid = extents.*valid_ind1
B(nonzeros(bsxfun(@plus,extents_valid,jump_factor).*valid_ind1))=new_value
B = B' %// B is the desired output

Resources