graph representing the randomization of each column in a binary matrix - arrays

Imagine the following binary image exemplified by the matrix below. This is a simplified version of the images I'll be working with:
0 1 0 1
0 1 1 1
0 0 0 1
0 1 1 1
I want to construct a graph that will represent the randomness of each column. My thought is to develop a random index = the total transitions between each value in the column / by the total possible transitions. In the matrix above, each column could have a total possible of 3 transitions.
For the example above:
Column 1 would have a random index of 0% (0/3)
Column 2 would have a random index of 66.7% (2/3)
Column 3 = 100% (3/3)
Column 4 = 0% (0/3) even though they are 1's and not 0's. Doesn't matter, I just want the transitions.
Can I draw a boundary around all the 1 values and then have MATLAB sum all of the boundaries?

To calculate what you are suggesting you can just do:
sum( diff(A) ~= 0 )
The diff(A) will take the forward difference down the columns and the sum will count the number of non-zero changes. So if you do this you will get:
ans =
0 2 3 0

Let your image be defined as
im = [ 0 1 0 1
0 1 1 1
0 0 0 1
0 1 1 1 ];
The random index you want can be computed as
result = sum(diff(im)~=0) / (size(im,1)-1);
Explanation: diff computes the difference between consecutive elemtents down each column. The result is compared against zero (~=0), and all nonzero values within each row are added (with sum). Finally, the result is divided by the maximum number os transitions, which is the number of rows minus 1 (size(im,1)-1)
Equivalently, you could use xor between consecutive rows:
result = sum(xor(im(1:end-1,:), im(2:end,:))) / (size(im,1)-1)

Related

Find the middle value in array that meets condition

I've got logical array(zeros and ones) 1500x700
I want to find "1" in every column and when there are more than one "1" in column i should choose the middle one.
Is that possible to do it? I know how to find "1", but don't know how to extract the middle "1" if there's couple of "1" in one column.
The find function returns the indices of your ones.
>> example=[1,0,0,1,0,1,1];
>> indices=find(example)
indices =
1 4 6 7
>> indices(floor(numel(indices)/2))
ans =
4
Do this for each column and you have a solution.
You can
Get the row and column indices of ones with find;
Apply accumarray with a custom function to get the middle row index for each column.
x = [1 0 0 0 0; 0 0 1 0 0; 1 0 1 0 0; 1 0 0 1 0]; % example
[ii, jj] = find(x); % step 1
result = accumarray(jj, ii, [size(x,2) 1], #(x) x(ceil(end/2)), NaN); % step 2
Note that:
For an even number of ones this gives the first of the two middle indices. If you prefer the average of the two middle indices replace #(x) x(ceil(end/2)) by #median.
For a column without ones this gives NaN as result. If you prefer a different value, replace the input fifth argument of accumarray by that.
Example:
x =
1 0 0 0 0
0 0 1 0 0
1 0 1 0 0
1 0 0 1 0
result =
3
NaN
2
4
NaN

How to delete rows from a matrix that contain more than 50% zeros MATLAB

I want to remove the rows in an array that contain more than 50% of null elements.
eg:
if the input is
1 0 0 0 5 0
2 3 5 4 3 1
3 0 0 4 3 0
2 0 9 8 2 1
0 0 4 0 1 0
I want to remove rows 1 and 5, but retain the rest. The output should look like:
2 3 5 4 3 1
3 0 0 4 3 0
2 0 9 8 2 1
I want to do this using matlab
Use logical indexing into the rows, based on the mean of the rows of A negated:
t = .5; % threshold
A(mean(A==0,2) > t, :) = [];
What this does:
Compare A with 0: turns zeros into true, and nonzeros into false.
Compute the mean of each row.
Compare that to the desired threshold.
Use the result as a logical index to delete unwanted rows.
Equivalently, you can keep the wanted rows instead of removing the unwanted ones. This may be faster depending on the proportion of rows:
A = A(mean(A~=0,2) >= 1-t, :);
You can also use the standardizeMissing function and rmmissing function together to achieve this:
>> [~,tf] = rmmissing(standardizeMissing(A,0),'MinNumMissing',floor(0.5*size(A,2))+1);
>> A(~tf,:)
The call to standardizeMissing replaces the 0 values with NaN (the standard missing indicator for double), then the rmmissing call identifies in the logical vector tf the rows that have more than 50% of their entries as 0 (i.e., those rows that have more than floor(0.5*size(A,2))+1 0-valued entries. Then you can just negate the tf output and use it as an indexer. You can adapt the minimum number missing easily to satisfy whatever percentage criteria you want.
Also note that tf is a logical vector here that is only the size of the number of rows of A.
As I mentioned on Luis' answer, one downside to his approach is that it requires an intermediate logical array of the same size as A to be created, which can potentially incur a significant memory/performance penalty when working with large arrays.
An explicit looped approach with nnz (overly verbose, for clarity):
[nrows, ncols] = size(A);
maximum_ratio_of_zeros = 0.5;
minimum_ratio_of_nonzeros = 1 - maximum_ratio_of_zeros;
todelete = false(nrows, 1);
for ii = 1:nrows
if nnz(A(ii,:))/ncols < minimum_ratio_of_nonzeros
todelete(ii) = true;
end
end
A(todelete,:) = [];
Which returns the desired answer.

MATLAB removing rows which has duplicates in sequence

I'm trying to remove the rows which has duplicates in sequence. I have only 2 possible values which are 0 and 1. I have nXm which n shows possible number of bits and m is not important for my question. My goal is to find an matrix which is nX(m-a). The rows a which has the property which includes duplicates in sequence. For example:
My matrix is :
A=[0 1 0 1 0 1;
0 0 0 1 1 1;
0 0 1 0 0 1;
0 1 0 0 1 0;
1 0 0 0 1 0]
I want to remove the rows has t duplicates in sequence for 0. In this question let's assume t is 3. So I want the matrix which:
B=[0 1 0 1 0 1;
0 0 1 0 0 1;
0 1 0 0 1 0]
2nd and 5th rows are removed.
I probably need to use diff.
So you want to remove rows of A that contain at least t zeros in sequence.
How about a single line?
B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==-t, 2),:);
How this works:
Transform A to bipolar form (2*A-1)
Convolve each row with a sequence of t ones (conv2(...))
Keep only rows for which the convolution does not contain -t (~any(...)). The presence of -t indicates a sequence of t zeros in the corresponding row of A.
To remove rows that contain at least t ones, just change -t to t:
B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==t, 2),:);
Here is a generalized approach which removes any rows which has given number of consecutive duplicates (not just zero. could be any number).
t = 3;
row_mask = ~any(all(~diff(reshape(im2col(A,[1 t],'sliding'),t,size(A,1),[]))),3);
out = A(row_mask,:)
Sample Run:
>> A
A =
0 1 0 1 0 1
0 0 1 5 5 5 %// consecutive 3 5's
0 0 1 0 0 1
0 1 0 0 1 0
1 1 1 0 0 1 %// consecutive 3 1's
>> out
out =
0 1 0 1 0 1
0 0 1 0 0 1
0 1 0 0 1 0
How about an approach using strings? This is certainly not as fast as Luis Mendo's method where you work directly with the numerical array, but it's thinking a bit outside of the box. The basis of this approach is that I consider each row of A to be a unique string, and I can search each string for occurrences of a string of 0s by regular expressions.
A=[0 1 0 1 0 1;
0 0 0 1 1 1;
0 0 1 0 0 1;
0 1 0 0 1 0;
1 0 0 0 1 0];
t = 3;
B = sprintfc('%s', char('0' + A));
ind = cellfun('isempty', regexp(B, repmat('0', [1 t])));
B(~ind) = [];
B = double(char(B) - '0');
We get:
B =
0 1 0 1 0 1
0 0 1 0 0 1
0 1 0 0 1 0
Explanation
Line 1: Convert each line of the matrix A into a string consisting of 0s and 1s. Each line becomes a cell in a cell array. This uses the undocumented function sprintfc to facilitate this cell array conversion.
Line 2: I use regular expressions to find any occurrences of a string of 0s that is t long. I first use repmat to create a search string that is full of 0s and is t long. After, I determine if each line in this cell array contains this sequence of characters (i.e. 000....). The function regexp helps us perform regular expressions and returns the locations of any matches for each cell in the cell array. Alternatively, you can use the function strfind for more recent versions of MATLAB to speed up the computation, but I chose regexp so that the solution is compatible with most MATLAB distributions out there.
Continuing on, the output of regexp/strfind is a cell array of elements where each cell reports the locations of where we found the particular string. If we have a match, there should be at least one location that is reported at the output, so I check to see if any matches are empty, meaning that these are the rows we don't want to remove. I want to turn this into a logical array for the purposes of removing rows from A, and so this is wrapped with a cellfun call to determine the cells that are empty. Therefore, this line returns a logical array where a 0 means that remove this row and a 1 means that we don't.
Line 3: I take the logical array from Line 2 and invert it because that's what we really want. We use this inverted array to index into the cell array and remove those strings.
Line 4: The output is still a cell array, so I convert it back into a character array, and finally back into a numerical array.

Matlab all() functions row numbers

I have a matrix and I wanted to find all non-zero rows in the matrix and the all(A, 2) function did this but I was wondering if there is a way to list the corresponding row number alongside the value?
Use find(all(A,2). all(A,2) gives you a vector with a 1 where there is a row of ones, and a 0 otherwise. find gives you the indices of non-zeros elements of an array. Putting them together gives the required result:
A=[0 0 1 0
1 1 1 1
0 1 1 0
0 1 0 1]
find(all(A,2))=2

Counting the occurance of a unique number in an array - MATLAB

I have an array that looks something like...
1 0 0 1 2 2 1 1 2 1 0
2 1 0 0 0 1 1 0 0 2 1
1 2 2 1 1 1 2 0 0 1 0
0 0 0 1 2 1 1 2 0 1 2
however my real array is (50x50).
I am relatively new to MATLAB and need to be able to count the amount of unique values in each row and column, for example there is four '1's in row-2 and three '0's in column-3. I need to be able to do this with my real array.
It would help even more if these quantities of unique values were in arrays of their own also.
PLEASE use simple language, or else i will get lost, for example if representing an array, don't call it x, but perhaps column_occurances_array... for me please :)
What I would do is iterate over each row of your matrix and calculate a histogram of occurrences for each row. Use histc to calculate the occurrences of each row. The thing that is nice about histc is that you are able to specify where the bins are to start accumulating. These correspond to the unique entries for each row of your matrix. As such, use unique to compute these unique entries.
Now, I would use arrayfun to iterate over all of your rows in your matrix, and this will produce a cell array. Each element in this cell array will give you the counts for each unique value for each row. Therefore, assuming your matrix of values is stored in A, you would simply do:
vals = arrayfun(#(x) [unique(A(x,:)); histc(A(x,:), unique(A(x,:)))], 1:size(A,1), 'uni', 0);
Now, if we want to display all of our counts, use celldisp. Using your example, and with the above code combined with celldisp, this is what I get:
vals{1} =
0 1 2
3 5 3
vals{2} =
0 1 2
5 4 2
vals{3} =
0 1 2
3 5 3
vals{4} =
0 1 2
4 4 3
What the above display is saying is that for the first row, you have 3 zeros, 5 ones and 3 twos. The second row has 5 zeros, 4 ones and 2 twos and so on. These are just for the rows. If you want to do these for columns, you have to modify your code slightly to operate along columns:
vals = arrayfun(#(x) [unique(A(:,x)) histc(A(:,x), unique(A(:,x)))].', 1:size(A,2), 'uni', 0);
By using celldisp, this is what we get:
vals{1} =
0 1 2
1 2 1
vals{2} =
0 1 2
2 1 1
vals{3} =
0 2
3 1
vals{4} =
0 1
1 3
vals{5} =
0 1 2
1 1 2
vals{6} =
1 2
3 1
vals{7} =
1 2
3 1
vals{8} =
0 1 2
2 1 1
vals{9} =
0 2
3 1
vals{10} =
1 2
3 1
vals{11} =
0 1 2
2 1 1
This means that in the first column, we see 1 zero, 2 ones and 1 two, etc. etc.
I absolutely agree with rayryeng! However, here is some code which might be easier to understand for you as a beginner. It is without cell arrays or arrayfuns and quite self-explanatory:
%% initialize your array randomly for demonstration:
numRows = 50;
numCols = 50;
yourArray = round(10*rand(numRows,numCols));
%% do some stuff of what you are asking for
% find all occuring numbers in yourArray
occVals = unique(yourArray(:));
% now you could sort them just for convinience
occVals = sort(occVals);
% now we could create a matrix occMat_row of dimension |occVals| x numRows
% where occMat_row(i,j) represents how often the ith value occurs in the
% jth row, analoguesly occMat_col:
occMat_row = zeros(length(occVals),numRows);
occMat_col = zeros(length(occVals),numCols);
for k = 1:length(occVals)
occMat_row(k,:) = sum(yourArray == occVals(k),2)';
occMat_col(k,:) = sum(yourArray == occVals(k),1);
end

Resources