Look at each row separately in a matrix (Matlab) - arrays

I have a matrix in Matlab(2012) with 3 columns and X number of rows, X is defined by the user, so varies each time. For this example though I will use a fixed 5x3 matrix.
So I would like to perform an iterative function on each row within the matrix, while the value in the third column is below a certain value. Then store the new values within the same matrix, so overwrite the original values.
The code below is a simplified version of the problem.
M=[-2 -5 -3 -2 -4]; %Vector containing random values
Vf_X=M+1; %Defining the first column of the matrix
Vf_Y=M+2; %Defining the secound column of the matrix
Vf_Z=M; %Defining the third column of the matrix
Vf=[Vf_X',Vf_Y',Vf_Z']; %Creating the matrix
while Vf(:,3)<0
Vf=Vf+1;
end
disp(Vf)
The result I get is
1 2 0
-2 -1 -3
0 1 -1
1 2 0
-1 0 -2
Ideally I would like to get this result instead
1 2 0
1 2 0
1 2 0
1 2 0
1 2 0
The while will not start if any value is above zero to begin with and stops as soon as one value goes above zero.
I hope this makes sense and I have supplied enough information
Thank you for your time and help.

Your current problem is that you stop iterating the very moment any of the values in the third row break the condition. Correct me if I'm wrong, but what I think you want is to continue doing iterations on the remaining rows, until the conditions are broken by all third columns.
You could do that like this:
inds = true(size(Vf,1),1);
while any(inds)
Vf(inds,:) = Vf(inds,:)+1;
inds = Vf(:,3) < 0;
end
Of course, for the simple addition you provide, there is a better and faster way:
inds = Vf(:,3)<0;
Vf(inds,:) = bsxfun(#minus, Vf(inds,:), Vf(inds,3));
But for general functions, the while above will do the trick.

Related

Efficient algorithm for merging consistent rows of multiple 2d arrays

I have an arbitrary number of 2d arrays of equal width but possibly non-equal height. They can consist of 1s, 0s, or a wildcard * which can match either a 1 or a 0. The wildcards are always in the same columns. I want to return all possible rows that are consistent with at least one row in every array simultaneously, and contain no wild cards.
For a concrete example, consider the three 2d arrays
1 0 1 * * 0 1 0 * 1 * 1
a = 1 1 1 * b = * 1 1 0 c = * 0 * 0
0 1 0 * * 0 0 1
A possible row in the solution might be 1 0 1 0. It is consistent with the top row of a, the top row of b, and the second row of c. By contrast, a row that should not be in the solution is 0 1 0 1, since it is not consistent with any row of b despite being consistent with the bottom row of a and the top row of c.
Beyond doing an inefficient brute-force check I'm rather stuck. It seems like there should be a faster way. Are there are any tricks that might help solve this problem efficiently?

SUMPRODUCT() handling arrays in excel

I have three arrays A, B, and A - B = C. They are broken into columns and formatted in excel like:
A, B, C, A, B, C, A, B, C... D, E
I want to sum all C>0 = D, and sum of all C<0 = E. The problem is that C is broken up for easy human readability, so I only want to call every third column.
My solution:
Following a variation on the method given here and here, and a simple test array of data:
1 0 1 1
1 1 0 1
-1 -1 -1 1
0 -1 -1 0
-1 -1 1 0
1 1 0 1
I will pull out the even columns and do the conditional sums:
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)=0)*(A1:D1>0),A1:D1)
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)=0)*(A1:D1<0),A1:D1)
Which produces the correct result:
1 0
2 0
1 -1
0 -1
0 -1
2 0
But I am absolutely baffled as to why this works. For one thing, I didn't put in the double negative (--), so upon getting a "TRUE" or "FALSE" value, the formula should have spit an error at me. For another, this works just fine even though I'm not running it as a CSE array function in excel. And the part I get least of all is the arguments for SUMPRODUCT().
MOD() is just acting as a filter for the conditional, that I get, but I don't understand how it's handling A1:D1 when all it gets from COLUMN is a single number. COLUMN(A1:D1) just returns a single scalar value in excel, the first column in the range, in this case 1. How is that being turned into the needed array [1, 3], especially since I'm not using CSE?
SUMPRODUCT is naturally an Array type formula which is why you do not need the CSE:
=SUMPRODUCT(A1:A6,B1:B6)
This will iterate and do A1*B1+A2*B2+...
So using this functionality we can do:
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)))
Which will iterate through the columns and return: 2 MOD(1,2)+MOD(2,2)+...
The reason you do not need the double unary(--) is because you have the * in the expression. Any math done on a Boolean(TRUE/FALSE) will turn it to its Bit (1/0).
The -- is short hand for -1 * -1 *
You would need the -- if you did this:
=SUMPRODUCT(--(MOD(COLUMN(A1:D1),2)=0),--(A1:D1>0),A1:D1)
So with this you would end up multiplying a series of 1's and 0's against the series in A1:D1.
When either is false it will return 0 and 0 times anything is 0.
So only when both are TRUE or 1 does the value in the corresponding cell get added to the iterating sum.

How to delete rows from a matrix that contain more than 50% zeros MATLAB

I want to remove the rows in an array that contain more than 50% of null elements.
eg:
if the input is
1 0 0 0 5 0
2 3 5 4 3 1
3 0 0 4 3 0
2 0 9 8 2 1
0 0 4 0 1 0
I want to remove rows 1 and 5, but retain the rest. The output should look like:
2 3 5 4 3 1
3 0 0 4 3 0
2 0 9 8 2 1
I want to do this using matlab
Use logical indexing into the rows, based on the mean of the rows of A negated:
t = .5; % threshold
A(mean(A==0,2) > t, :) = [];
What this does:
Compare A with 0: turns zeros into true, and nonzeros into false.
Compute the mean of each row.
Compare that to the desired threshold.
Use the result as a logical index to delete unwanted rows.
Equivalently, you can keep the wanted rows instead of removing the unwanted ones. This may be faster depending on the proportion of rows:
A = A(mean(A~=0,2) >= 1-t, :);
You can also use the standardizeMissing function and rmmissing function together to achieve this:
>> [~,tf] = rmmissing(standardizeMissing(A,0),'MinNumMissing',floor(0.5*size(A,2))+1);
>> A(~tf,:)
The call to standardizeMissing replaces the 0 values with NaN (the standard missing indicator for double), then the rmmissing call identifies in the logical vector tf the rows that have more than 50% of their entries as 0 (i.e., those rows that have more than floor(0.5*size(A,2))+1 0-valued entries. Then you can just negate the tf output and use it as an indexer. You can adapt the minimum number missing easily to satisfy whatever percentage criteria you want.
Also note that tf is a logical vector here that is only the size of the number of rows of A.
As I mentioned on Luis' answer, one downside to his approach is that it requires an intermediate logical array of the same size as A to be created, which can potentially incur a significant memory/performance penalty when working with large arrays.
An explicit looped approach with nnz (overly verbose, for clarity):
[nrows, ncols] = size(A);
maximum_ratio_of_zeros = 0.5;
minimum_ratio_of_nonzeros = 1 - maximum_ratio_of_zeros;
todelete = false(nrows, 1);
for ii = 1:nrows
if nnz(A(ii,:))/ncols < minimum_ratio_of_nonzeros
todelete(ii) = true;
end
end
A(todelete,:) = [];
Which returns the desired answer.

Generating a matrix to describe a two-dimensional feature

Let's say I have a vector A = [-1,2];
Each element in A is described by the actual number and sign. So each element has a 2 dimensional feature-set.
I would like to generate a matrix, in this case 2x2 where the columns correspond to the element, and rows correspond to the presence of a feature. The presence of a feature is described by 1's and 0's. So, if an element is positive, it is 1, if the element is the number 1, then the result is 1 as well. In the case above I would get:
Element 1 Element 2
Is this a 1? 1 0
Is this a positive number? 0 1
What is the smartest way to go about accomplishing this? Obviously if statements would work, but I feel that there should be a faster, much smarter way of going about this. I am coding this in matlab by the way, and I would appreciate any help.
#Benoit_11's solution is a fine one. Here's a similar but maybe simpler solution. You could try both and see which is faster if you care about speed.
features = [abs(A) == 1; A > 0];
this assumes A is a row vector in order to get the output in the format you specified.
Simple way using ismember for the first condition and logical operation for the 2nd condition. ismember outputs a logical array which you can plug into the output you need (here called DescribeA; and likewise when you check for values greater than 0 using the > operator.
%// Test array
A = [-1,2,1,-10,5,-3,1]
%// Initialize output
DescribeA = zeros(2,numel(A));
%// 1st condition. Check if values are 1 or -1
DescribeA(1,:) = ismember(A,1)|ismember(A,-1);
%// Check if they are > 0
DescribeA(2,:) = A>0;
Output in Command Window:
A =
-1 2 1 -10 5 -3 1
DescribeA =
1 0 1 0 0 0 1
0 1 1 0 1 0 1
I feel there is a smarter way for the 1st condition but I can't seem to find it.

Using bsxfun with an anonymous function

after trying to understand the bsxfun function I have tried to implement it in a script to avoid looping. I am trying to check if each individual element in an array is contained in one matrix, returning a matrix the same size as the initial array containing 1 and 0's respectively. The anonymous function I have created is:
myfunction = #(x,y) (sum(any(x == y)));
x is the matrix which will contain the 'accepted values' per say. y is the input array. So far I have tried using the bsxfun function in this way:
dummyvar = bsxfun(myfunction,dxcp,X)
I understand that myfunction is equal to the handle of the anonymous function and that bsxfun can be used to accomplish this I just do not understand the reason for the following error:
Non-singleton dimensions of the two input arrays must match each other.
I am using the following test data:
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
and hope for the output to be:
dummyvar = [1,0,0,0]
Cheers, NZBRU.
EDIT: Reached 15 rep so I have updated the answer
Thanks again guys, I thought I would update this as I now understand how the solution provided from Divakar works. This might deter confusion from others who have read my initial question and are confused to how bsxfun() works, I think writing it out helps me understand it better too.
Note: The following may be incorrect, I have just tried to understand how the function operates by looking at this one case.
The input into the bsxfun function was dxcp and X transposed. The function handle used was #eq so each element was compared.
%%// Given data
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
The following code:
bsxfun(#eq,dxcp,X')
compared every value of dxcp, the first input variable, to every row of X'. The following matrix is the output of this:
dummyvar =
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
The first element was found by comparing 1 and 2 dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The next along the first row was found by comparing 2 and 2 dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
This was repeated until all of the values of dxcp where compared to the first row of X'. Following this logic, the first element in the second row was calculating using the comparison between: dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The final solution provided was any(bsxfun(#eq,dxcp,X'),2) which is equivalent to: any(dummyvar,2). http://nf.nci.org.au/facilities/software/Matlab/techdoc/ref/any.html seems to explain the any function in detail well. Basically, say:
A = [1,2;0,0;0,1]
If the following code is run:
result = any(A,2)
Then the function any will check if each row contains one or several non-zero elements and return 1 if so. The result of this example would be:
result = [1;0;1];
Because the second input parameter is equal to 2. If the above line was changed to result = any(A,1) then it would check for each column.
Using this logic,
result = any(A,2)
was used to obtain the final result.
1
0
0
0
which if needed could be transposed to equal
[1,0,0,0]
Performance- After running the following code:
tic
dummyvar = ~any(bsxfun(#eq,dxcp,X'),2)'
toc
It was found that the duration was:
Elapsed time is 0.000085 seconds.
The alternative below:
tic
arrayfun(#(el) any(el == dxcp),X)
toc
using the arrayfun() function (which applies a function to each element of an array) resulted in a runtime of:
Elapsed time is 0.000260 seconds.
^The above run times are averages over 5 runs of each meaning that in this case bsxfun() is faster (on average).
You don't want every combination of elements thrown into your any(x == y) test, you want each element from dxcp tested to see if it exists in X. So here is the short version, which also needs no transposes. Vectorization should also be a bit faster than bsxfun.
arrayfun(#(el) any(el == X), dxcp)
The result is
ans =
0 1 0 0 0 0

Resources