Generating a matrix to describe a two-dimensional feature - arrays

Let's say I have a vector A = [-1,2];
Each element in A is described by the actual number and sign. So each element has a 2 dimensional feature-set.
I would like to generate a matrix, in this case 2x2, where the columns correspond to the elements and the rows correspond to the presence of a feature. The presence of a feature is described by 1's and 0's. So, if an element is positive, it is 1; if the element is the number 1 (ignoring its sign, as in the example), then the result is 1 as well. In the case above I would get:
                              Element 1   Element 2
Is this a 1?                      1           0
Is this a positive number?        0           1
What is the smartest way to go about accomplishing this? Obviously if statements would work, but I feel that there should be a faster, much smarter way of going about this. I am coding this in matlab by the way, and I would appreciate any help.

@Benoit_11's solution is a fine one. Here's a similar but maybe simpler solution. You could try both and see which is faster if you care about speed.
features = [abs(A) == 1; A > 0];
This assumes A is a row vector, so that the output comes out in the format you specified.
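For readers who want to trace the logic outside MATLAB, here is a minimal sketch of the same two comparisons in plain Python. The function name `describe` is hypothetical and only used for illustration; it assumes, as the answer above does, that "is this a 1?" means the magnitude equals 1.

```python
def describe(values):
    """Return two rows of 0/1 flags: magnitude equals 1, and value is positive."""
    is_one = [int(abs(v) == 1) for v in values]   # mirrors abs(A) == 1
    is_positive = [int(v > 0) for v in values]    # mirrors A > 0
    return [is_one, is_positive]


features = describe([-1, 2])
print(features)
```

Each inner list corresponds to one row of the feature matrix, with one entry per element of the input.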

A simple way is to use ismember for the first condition and a logical operation for the second condition. ismember outputs a logical array which you can plug into the output you need (here called DescribeA); likewise, checking for values greater than 0 with the > operator also gives a logical array.
%// Test array
A = [-1,2,1,-10,5,-3,1]
%// Initialize output
DescribeA = zeros(2,numel(A));
%// 1st condition. Check if values are 1 or -1
DescribeA(1,:) = ismember(A,1)|ismember(A,-1);
%// Check if they are > 0
DescribeA(2,:) = A>0;
Output in Command Window:
A =
-1 2 1 -10 5 -3 1
DescribeA =
1 0 1 0 0 0 1
0 1 1 0 1 0 1
I feel there is a smarter way for the 1st condition but I can't seem to find it.


How do you calculate the number of elements in a jagged array in F#?

I am new to F# and haven't found the answer to this anywhere. I am creating a jagged array with 10 rows, where each row holds one more element than the previous one (1 element in the first row, 10 in the last). The code I used for the array creation and printing is as follows:
let jagged = [| for a in 1 .. 10 do yield [| for a in 1 .. a do yield 0 |] |]
let mutable len = 0;
for arr in jagged do
    for col in arr do
        len <- (len + 1)
        printf "%i " col
    printfn "";
printfn "%i" len
The above code gives the following output
0
0 0
0 0 0
0 0 0 0
0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
55
Currently, I am calculating the number of elements manually but would like to know if there is a better way to do so.
If you want to calculate the length of a single array, you could use Array.length. But what you have is an array of arrays of different lengths, and you want to calculate the sum of their sizes. Rather than just give you the answer, I'll show you how you could use https://fsharpforfunandprofit.com/posts/list-module-functions/ (a site by Scott Wlaschin that's a really terrific resource, BTW) to find the answer yourself. This page presents a series of questions to help you find the functions you're looking for: starting from question 1, you move to other questions and eventually to a list of useful functions.
Question 1 on that page is, "What kind of collection do you have?" The choices are "I don't have a collection and I want to create one", or "I have one collection I want to work with", or several other choices where you have two or three or more collections. Here, we have one collection we want to work with, so the page directs us to question 9.
Question 9 on that page has a bunch of choices I won't repeat here, but one of them is "If you want to aggregate or summarize the collection into a single value". That sounds like what we want: we want the sum of the lengths of the sub-arrays. So we go to section 14, which has a bunch of functions we could use. And halfway down the list is sum and sumBy. Those sound intriguing. The sum function "returns the sum of the elements in the collection"... well, no, that won't work, because our array contains arrays, not numbers. But the sumBy function "returns the sum of the results generated by applying the function to each element of the collection." And we know there's a function for finding the length of a single array: Array.length. (The page talks about functions that work on lists, but pretty much any function that works on lists has a corresponding function that works on arrays and a similar corresponding function that works on sequences. The few exceptions are for things like how you can have infinite sequences, but not infinite arrays or lists, so there's a Seq.initInfinite function but there's no Array.initInfinite or List.initInfinite function).
So now that we've found that, we just need to write it.
let lengthOfJaggedArray arr = arr |> Array.sumBy Array.length
And that's it. Instead of calculating the length by hand via two nested for loops, there's a one-line solution that's quite simple and uses built-in functions. All you needed to do was know what functions are available — and since the entire list of available array/list/seq functions can be a little daunting when you're new to F#, Scott Wlaschin has made a very useful resource to help make it a bit less daunting.

How to delete rows from a matrix that contain more than 50% zeros MATLAB

I want to remove the rows in an array that contain more than 50% of null elements.
eg:
if the input is
1 0 0 0 5 0
2 3 5 4 3 1
3 0 0 4 3 0
2 0 9 8 2 1
0 0 4 0 1 0
I want to remove rows 1 and 5, but retain the rest. The output should look like:
2 3 5 4 3 1
3 0 0 4 3 0
2 0 9 8 2 1
I want to do this using matlab
Use logical indexing into the rows, based on the mean of the rows of A negated:
t = .5; % threshold
A(mean(A==0,2) > t, :) = [];
What this does:
Compare A with 0: turns zeros into true, and nonzeros into false.
Compute the mean of each row.
Compare that to the desired threshold.
Use the result as a logical index to delete unwanted rows.
Equivalently, you can keep the wanted rows instead of removing the unwanted ones. This may be faster depending on the proportion of rows:
A = A(mean(A~=0,2) >= 1-t, :);
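The steps above can also be traced in a language-neutral way. The following is a minimal Python sketch, not the MATLAB answer itself; the name `drop_sparse_rows` and its `threshold` parameter are hypothetical. It keeps a row when its fraction of zeros does not exceed the threshold, which matches the "keep the wanted rows" variant.

```python
def drop_sparse_rows(rows, threshold=0.5):
    """Keep rows whose fraction of zero entries is at most `threshold`."""
    kept = []
    for row in rows:
        # Fraction of zeros in this row (the per-row mean of A == 0).
        zero_fraction = sum(1 for value in row if value == 0) / len(row)
        if zero_fraction <= threshold:
            kept.append(row)
    return kept


A = [
    [1, 0, 0, 0, 5, 0],
    [2, 3, 5, 4, 3, 1],
    [3, 0, 0, 4, 3, 0],
    [2, 0, 9, 8, 2, 1],
    [0, 0, 4, 0, 1, 0],
]
result = drop_sparse_rows(A)
print(result)
```

Note that the third row has exactly 50% zeros and is kept, because the question asks to remove only rows with more than 50% zeros.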
You can also use the standardizeMissing function and rmmissing function together to achieve this:
>> [~,tf] = rmmissing(standardizeMissing(A,0),'MinNumMissing',floor(0.5*size(A,2))+1);
>> A(~tf,:)
The call to standardizeMissing replaces the 0 values with NaN (the standard missing indicator for double), then the rmmissing call identifies in the logical vector tf the rows that have more than 50% of their entries as 0 (i.e., those rows that have at least floor(0.5*size(A,2))+1 0-valued entries). Then you can just negate the tf output and use it as an index. You can easily adapt the minimum number missing to satisfy whatever percentage criterion you want.
Also note that tf is a logical vector here that is only the size of the number of rows of A.
As I mentioned on Luis' answer, one downside to his approach is that it requires an intermediate logical array of the same size as A to be created, which can potentially incur a significant memory/performance penalty when working with large arrays.
An explicit looped approach with nnz (overly verbose, for clarity):
[nrows, ncols] = size(A);
maximum_ratio_of_zeros = 0.5;
minimum_ratio_of_nonzeros = 1 - maximum_ratio_of_zeros;
todelete = false(nrows, 1);
for ii = 1:nrows
    if nnz(A(ii,:))/ncols < minimum_ratio_of_nonzeros
        todelete(ii) = true;
    end
end
A(todelete,:) = [];
Which returns the desired answer.

Using bsxfun with an anonymous function

After trying to understand the bsxfun function, I have tried to use it in a script to avoid looping. I am trying to check whether each individual element in an array is contained in a matrix, returning an array the same size as the input array containing 1's and 0's respectively. The anonymous function I have created is:
myfunction = @(x,y) (sum(any(x == y)));
x is the matrix which will contain the 'accepted values', per se. y is the input array. So far I have tried using the bsxfun function in this way:
dummyvar = bsxfun(myfunction,dxcp,X)
I understand that myfunction holds the handle of the anonymous function and that bsxfun can be used to accomplish this. I just do not understand the reason for the following error:
Non-singleton dimensions of the two input arrays must match each other.
I am using the following test data:
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
and hope for the output to be:
dummyvar = [1,0,0,0]
Cheers, NZBRU.
EDIT: Reached 15 rep so I have updated the answer
Thanks again guys, I thought I would update this as I now understand how the solution provided by Divakar works. This might prevent confusion for others who have read my initial question and are confused as to how bsxfun() works. I think writing it out helps me understand it better too.
Note: The following may be incorrect, I have just tried to understand how the function operates by looking at this one case.
The inputs to the bsxfun function were dxcp and X transposed. The function handle used was @eq, so each pair of elements was compared for equality.
%%// Given data
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
The following code:
bsxfun(@eq,dxcp,X')
compared every value of dxcp, the first input variable, to every row of X'. The following matrix is the output of this:
dummyvar =
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
The first element was found by comparing 1 (the first element of dxcp) with 2 (the first row of X'): dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The next element along the first row was found by comparing 2 (the second element of dxcp) with 2 (the first row of X'): dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
This was repeated until all of the values of dxcp were compared to the first row of X'. Following this logic, the first element in the second row was calculated by comparing 1 (the first element of dxcp) with 5 (the second row of X'): dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The final solution provided was any(bsxfun(@eq,dxcp,X'),2), which is equivalent to any(dummyvar,2). http://nf.nci.org.au/facilities/software/Matlab/techdoc/ref/any.html seems to explain the any function well and in detail. Basically, say:
A = [1,2;0,0;0,1]
If the following code is run:
result = any(A,2)
Then the function any will check if each row contains one or several non-zero elements and return 1 if so. The result of this example would be:
result = [1;0;1];
Because the second input parameter is equal to 2. If the above line was changed to result = any(A,1) then it would check for each column.
Using this logic,
result = any(dummyvar,2)
was used to obtain the final result:
1
0
0
0
which if needed could be transposed to equal
[1,0,0,0]
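The walkthrough above can be reproduced step by step outside MATLAB. This is a minimal Python sketch under the same test data; the names `comparison` and `row_any` are hypothetical. The nested list plays the role of the 4x6 matrix produced by comparing every element of dxcp with every row of X', and the second step mirrors any(...,2).

```python
dxcp = [1, 2, 3, 6, 10, 20]
X = [2, 5, 9, 18]

# One row per element of X, one column per element of dxcp:
# entry is 1 where the two values are equal (the "expanded" comparison).
comparison = [[int(d == x) for d in dxcp] for x in X]

# Collapse each row to a single flag: 1 if any entry in the row is nonzero.
row_any = [int(any(row)) for row in comparison]

for row in comparison:
    print(row)
print(row_any)
```

Only the first row contains a match, because 2 is the only value of X that also appears in dxcp.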
Performance- After running the following code:
tic
dummyvar = ~any(bsxfun(@eq,dxcp,X'),2)'
toc
It was found that the duration was:
Elapsed time is 0.000085 seconds.
The alternative below:
tic
arrayfun(@(el) any(el == dxcp),X)
toc
using the arrayfun() function (which applies a function to each element of an array) resulted in a runtime of:
Elapsed time is 0.000260 seconds.
^The above run times are averages over 5 runs of each, meaning that in this case bsxfun() is faster (on average).
You don't want every combination of elements thrown into your any(x == y) test; you want each element from dxcp tested to see if it exists in X. So here is the short version, which also needs no transposes. Vectorization should also be a bit faster than bsxfun.
arrayfun(@(el) any(el == X), dxcp)
The result is
ans =
0 1 0 0 0 0

Finding row with maximum no. of 1s if each row is sorted using logicalOR approach

Question similar to this may have been discussed before but I want to discuss a different approach to this.
Given a boolean 2D array where each row is sorted, find the row with the maximum number of 1s.
Input Matrix :
0 1 1 1
0 0 1 1
1 1 1 1
0 0 0 0
Output : 2
How about this approach: take the logical OR of column 0 across all rows, and if the answer is 1, return the index of the row that has the 1 and stop. In this case, if I do (0 | 0 | 1 | 0), the answer would be one, so I would return that row index. If the input matrix is something like:
Input matrix:
0 1 1 1
0 0 1 1
0 0 0 1
0 0 0 0
Output : 0
When I take the logical OR of column 0 across all rows, the answer would be zero, so I would move to column 1 of each row; the procedure is repeated until the logical OR is 1. I know other approaches to solve this problem, but I would like to have views on this approach.
If it's:
0 ... 0 1
0 ... 0 0
0 ... 0 0
0 ... 0 0
0 ... 0 0
You'd have to search many columns.
The maximum amount of work involved would be linear in the number of cells (O(mn)), and the other approaches outperform this here.
Specifically, the approach where:
  You start at the top right and
  Repeatedly:
    Search left until you find a 0 and
    Search down until you find a 1
  And return the last row where you found a 1
is linear in the number of rows plus columns (O(m + n)).
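The top-right walk described above can be sketched as follows. This is a minimal Python illustration, assuming each row is sorted with all 0s before all 1s and that row indices are 0-based as in the question; the function name `row_with_most_ones` is hypothetical.

```python
def row_with_most_ones(matrix):
    """Return the 0-based index of the row with the most 1s, or -1 if there are none.

    Assumes every row is sorted (0s first, then 1s) and all rows have equal length.
    """
    if not matrix or not matrix[0]:
        return -1
    col = len(matrix[0]) - 1   # start at the top-right corner
    best_row = -1
    for row_index, row in enumerate(matrix):
        # Search left while this row still has a 1 at the current column.
        while col >= 0 and row[col] == 1:
            col -= 1
            best_row = row_index
        if col < 0:
            break              # a row of all 1s cannot be beaten
    return best_row


print(row_with_most_ones([[0, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]))
print(row_with_most_ones([[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]))
```

The column pointer only ever moves left and the row pointer only ever moves down, which is where the O(m + n) bound comes from. On ties this sketch returns the first row that reaches the maximum.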
That would work since it's equivalent to finding the row for which the leftmost 1 is before (or at the same point as) any other row's leftmost 1. It would still be O(m * n) in the worst case:
Input Matrix :
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 1
Given that your rows are sorted, I would binary search for the position of the first one for each row, and return the row with the minimum position. This would be O(m * logn), although you might be able to do better.
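A minimal Python sketch of the binary-search idea, assuming each row is sorted with 0s before 1s; the function name `row_with_most_ones_bisect` is hypothetical. It uses the standard library's `bisect_left` to find the position of the first 1 in each row and keeps the row where that position is smallest.

```python
from bisect import bisect_left


def row_with_most_ones_bisect(matrix):
    """Return the 0-based index of the row whose first 1 is leftmost, or -1 if no row has a 1."""
    best_row = -1
    best_pos = None
    for row_index, row in enumerate(matrix):
        pos = bisect_left(row, 1)          # index of the first 1, or len(row) if none
        has_one = pos < len(row)
        if has_one and (best_pos is None or pos < best_pos):
            best_row, best_pos = row_index, pos
    return best_row


print(row_with_most_ones_bisect([[0, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]))
```

Each row costs O(log n), giving O(m log n) overall. Ties are resolved in favor of the earliest row.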
Your approach is likely to be orders of magnitude slower than the naive "go through the rows, and count the zeros, and remember the row with the fewest zeros." The reason is that, assuming your bits are stored one-row-at-a-time, with the bools packed tightly, then memory for the row will be in cache all at once, and bit-counting will cache beautifully.
Contrast this to your proposed approach, where for each row, the cache line will be loaded, and a single bit will be read from it. By the time you've cycled through all the rows in your array, the memory for the first row will (probably, if you've got any reasonable number of rows), be out of the cache, and the row will have to be loaded again.
Approximately, assuming a 64B cache line, the first approach is going to need 1/(64*8) memory accesses per bit in the array, compared to 1 memory access per bit in the array for yours. Since counting the bits and remembering the max is just a few cycles, it's reasonable to think that the memory accesses are going to dominate the running cost, which means the first approach will run approximately 64 * 8 = 512 times faster. Of course, you'll get some of that time back because your approach can terminate early, but the 512 times speed hit is a large cost to overcome.
If your rows are super-long, you may find that a hybrid between these two approaches works excellently: count the number of bits in the first cache-line's worth of data in each row (being careful to cache-line-align each row of your data in memory), and if every row has no bits set in the first cache-line, go to the second and so forth. This combines the cache-efficiency of the first approach with the early termination of the second approach.
As with all optimisations, you should measure results, and be sure that it's important that the code is fast. The efficient solution is likely to impose annoying restrictions (like 64-byte memory alignment for rows), and the code will be harder to read than a straightforward solution.

Look at each row separately in a matrix (Matlab)

I have a matrix in Matlab(2012) with 3 columns and X number of rows, X is defined by the user, so varies each time. For this example though I will use a fixed 5x3 matrix.
So I would like to perform an iterative function on each row within the matrix, while the value in the third column is below a certain value. Then store the new values within the same matrix, so overwrite the original values.
The code below is a simplified version of the problem.
M=[-2 -5 -3 -2 -4]; %Vector containing random values
Vf_X=M+1; %Defining the first column of the matrix
Vf_Y=M+2; %Defining the second column of the matrix
Vf_Z=M; %Defining the third column of the matrix
Vf=[Vf_X',Vf_Y',Vf_Z']; %Creating the matrix
while Vf(:,3)<0
    Vf=Vf+1;
end
disp(Vf)
The result I get is
1 2 0
-2 -1 -3
0 1 -1
1 2 0
-1 0 -2
Ideally I would like to get this result instead
1 2 0
1 2 0
1 2 0
1 2 0
1 2 0
The while will not start if any value is above zero to begin with and stops as soon as one value goes above zero.
I hope this makes sense and I have supplied enough information
Thank you for your time and help.
Your current problem is that you stop iterating the very moment any of the values in the third column breaks the condition. Correct me if I'm wrong, but what I think you want is to continue iterating on the remaining rows, until the condition is broken by the third-column value of every row.
You could do that like this:
inds = Vf(:,3) < 0; % rows that still satisfy the condition
while any(inds)
    Vf(inds,:) = Vf(inds,:)+1;
    inds = Vf(:,3) < 0;
end
Of course, for the simple addition you provide, there is a better and faster way:
inds = Vf(:,3)<0;
Vf(inds,:) = bsxfun(@minus, Vf(inds,:), Vf(inds,3));
But for general functions, the while above will do the trick.
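To make the per-row stopping rule concrete, here is a minimal Python sketch of the same idea: keep a mask of rows whose third column is still negative, update only those rows, then recompute the mask. The function name `raise_rows_until_nonnegative` and the `step` parameter are hypothetical, and the "+1" update stands in for whatever iterative function is actually applied.

```python
def raise_rows_until_nonnegative(matrix, step=1):
    """Add `step` to every entry of each row until that row's third column is >= 0."""
    rows = [list(row) for row in matrix]          # work on a copy
    active = [row[2] < 0 for row in rows]         # rows still below the limit
    while any(active):
        for row, is_active in zip(rows, active):
            if is_active:
                for k in range(len(row)):
                    row[k] += step
        active = [row[2] < 0 for row in rows]     # re-evaluate per row
    return rows


M = [-2, -5, -3, -2, -4]
Vf = [[m + 1, m + 2, m] for m in M]
print(raise_rows_until_nonnegative(Vf))
```

Rows that already satisfy the stopping condition are left untouched, so each row finishes independently of the others.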
