Counting rows without any NaNs in a struct - arrays

I need to count how many structs do not have any NaNs across all fields in an array of structures. The sample struct looks like this:
a(1).b = 11;
a(2).b = NaN;
a(3).b = 22;
a(4).b = 33;
a(1).c = 44;
a(2).c = 55;
a(3).c = 66;
a(4).c = NaN;
The output looks like this
Fields b c
1 44 11
2 55 NaN
3 66 22
4 NaN 33
The structs without NaNs are 1 and 3, so there should be 2 in total here.
I tried using size(a, 2), but it just tells me the total number of structs in the array. I need it to calculate N (the number of observations in the sample). NaNs don't count as observations as they are omitted in the analysis.
What is the simplest way to count structs without any NaNs in a struct array?

I would suggest using the following one-line command:
nnz(~any(cellfun(#isnan,struct2cell(a))))
struct2cell(a) converts your struct into the 3D cell array
cellfun(#isnan,___) applies isnan to each element of cell array
~any(__) works along first dimension and returns arrays that have no NaNs
nnz(__) counts how many rows have no NaNs
The result is just a number, 2 in this case.
The following:
find(~any(cellfun(#isnan,struct2cell(a))))
Would tell you which rows are without NaNs

This will tell you which ones have no NaNs
for ii=1:size(a,2)
hasNoNaNs(ii)=~any(structfun(#isnan,a(ii)));
end
The way it works is iterates trhoug each of the structures, and use structfun to call isnan in each of the elements of it, then checks if any of them is a NaN and negates the result, thus giving 1 in the ones that have no NaNs

Because bsxfun is never the wrong approach!
sum(all(bsxfun(#le,cell2mat(struct2cell(a)),inf)))
How this works:
This converts the struct to a cell, and then to a matrix:
cell2mat(struct2cell(a))
ans(:,:,1) =
11
44
ans(:,:,2) =
NaN
55
ans(:,:,3) =
22
66
ans(:,:,4) =
33
NaN
Then it uses bsxfun to check which of those elements are less than, or equal to zero. The only value that doesn't satisfy this condition is NaN.
bsxfun(#le,cell2mat(struct2cell(a)),inf)
ans(:,:,1) =
1
1
ans(:,:,2) =
0
1
ans(:,:,3) =
1
1
ans(:,:,4) =
1
0
Then, we check if all the values in each of those slices are true:
all(bsxfun(#le,cell2mat(struct2cell(a)),inf))
ans(:,:,1) =
1
ans(:,:,2) =
0
ans(:,:,3) =
1
ans(:,:,4) =
0
And finally, we sum it up:
sum(all(bsxfun(#le,cell2mat(struct2cell(a)),inf)))
ans =
2
(By the way: It's possible to just skip the bsxfun, but where's the fun in that)
sum(all(cell2mat(struct2cell(a))<=inf))

Use arrayfun to iterate over a and structfun to iterate over fields and you get a logical array of elements that do not have NaNs:
>> arrayfun(#(x) ~any(structfun(#isnan, x)), a)
ans =
1 0 1 0
Now you can just sum it
>> sum(arrayfun(#(x) ~any(structfun(#isnan, x)), a))
ans =
2

Taking the idea from this not working answer to use comma separated lists:
s=sum(~any(isnan([[a.b];[a.c]])));
It may look very dumb to hard-code the field names but it leads to fast code because it avoids both iterating and cell arrays.
Generalizing this approach to arbitriary field names, you end up with this solution:
n=true(size(a));
for f = fieldnames(a).'
n(isnan([a.(f{1})]))=false;
end
n=sum(n(:));
Assuming that you have a large struct with only few fieldnames this is very efficient, because it is only iterating the fieldnames.

Third solution maybe - may not be elegant depending on your data:
A = [[a.b];[a.c]]; %//EDIT -- Fixed based on #Daniel's correct solution
IndNotNaN = find (~isnan(A));
Depends if you have lots of structs you will have to concatenate a.b, a.c ....a.n

Related

Loading arrays of different sizes into a single array

I have 100 arrays with the dimension of nx1. n varies from one array to the next (e.g, n1 = 50, n2 = 52, n3 = 48 etc.). I would like to combine all these arrays into a single one with the dimension of 100 x m with m being the max of n's.
The issue I am running into is that as n varies, Matlab will throw out an error says that the dimensions mismatch. Is there a way to get around this so I can pad "missing" cell with N/A? For instance, if the first array contains 50 elements (i.e., n1 = 50) like this:
23
31
6
...
22
the second array contains 52 elements (i.e., n2 = 52) like this:
25
85
41
...
8
12
66
The result should be:
23 25
31 85
6 41
... ...
22 8
N/A 12
N/A 66
Thanks to the community in advance!
Here is another approach without eval.
array_lengths = cellfun(#numel, arrays);
max_length = max(array_lengths);
result = nan(max_rows, num_arrays);
for r=1:num_arrays
result(1:array_lengths(r),r) = arrays{r}(1:array_lengths(r));
end
Some explanation: I'm assuming your arrays are stored in a cell to begin with. Here is some code to generate fictitious data with the dimensions you gave:
% some dummy data for your arrays.
num_arrays = 100;
primerArrayCell = num2cell(ones(1,num_arrays)); % , 1, ones(1, num_arrays));
arrays = cellfun(#(c) rand(randi(50, 1),1), primerArrayCell, 'uniformoutput',false);
You can use cellfun with an anonymous function to get the lengths of each individual array:
% Assume your arrays are in a cell of arrays with the variable name arrays
array_lengths = cellfun(#numel, arrays);
max_length = max(array_lengths);
Allocate an array of nan values to store your result
% initialize your data to nan's.
result = nan(max_rows, num_arrays);
Then fill in the non-nan values based on the length of the arrays calculated previously.
for r=1:num_arrays
result(1:array_lengths(r),r) = arrays{r}(1:array_lengths(r));
end
You may want to consider using structure arrays for storing such datasets as it makes everything easier when merging them into a single array.
But to answer your question, if you have arrays like this:
a1 = 1:20; % array of size 1 x 20
n1 = numel(a1); % 20
a2 = 50:60; % array of size 1 x 11
n2 = numel(a2); % 11
... say you have nArrs arrays
Given nArrs arrays for example, you can create the desired matrix res like this:
m = max([n1, n2, .... ]);
res = ones(m,nArrs) * nan; % initialize the result matrix w/ nan
% Manually
res(1:n1,1) = a1.';
res(1:n2,2) = a2.';
% ... so on
% Or use eval instead like this
for i = 1:nArrs
eval(['res(1:n' int2str(i) ', i) = a' int2str(i) '.'';'])
end
Now bear in mind that using eval is NOT recommended but hopefully that just gives you an idea as to what to do. If you did use structures, you can replace eval with something more efficient and robust like arrayfun for instance.

Accumulating different sized column vectors stored as a cell array into a matrix padded with NaNs

Imagine I have a series of different sized column vectors inside an array and want to group them into a matrix by padding the empty spaces with NaN. How can I do this?
There is already an answer to a very similar problem (accumulate cells of different lengths into a matrix in MATLAB?) but that solution deals with row vectors and my problem is with column vectors. One possible solution could be transposing each of the array components and then applying the above mentioned solution. However, I have no idea how to do this.
Also, speed is a bit of an issue so if possible take that into consideration.
You can just slightly tweak that answer you found to work for columns:
tcell = {[1,2,3]', [1,2,3,4,5]', [1,2,3,4,5,6]', [1]', []'}; %\\ ignore this comment, it's just for formatting in SO
maxSize = max(cellfun(#numel,tcell));
fcn = #(x) [x; nan(maxSize-numel(x),1)];
cmat = cellfun(fcn,tcell,'UniformOutput',false);
cmat = horzcat(cmat{:})
cmat =
1 1 1 1 NaN
2 2 2 NaN NaN
3 3 3 NaN NaN
NaN 4 4 NaN NaN
NaN 5 5 NaN NaN
NaN NaN 6 NaN NaN
Or you could tweak this as an alternative:
cell2mat(cellfun(#(x)cat(1,x,NaN(maxSize-length(x),1)),tcell,'UniformOutput',false))
If you want speed the cell data structure is your enemy. For this example I will assume you have this vectors stored in a structure called vector_holder:
elements = fieldnames(vector_holder);
% Per Dan request
maximum_size = max(structfun(#max, vector_holder));
% maximum_size is the maximum length of all your separate arrays
matrix = NaN(length(elements), maximum_size);
for i = 1:length(elements)
current_length = length(vector.holder(element{i}));
matrix(i, 1:current_length) = vector.holder(element{i});
end
Many Matlab functions are slower when dealing with cell variables. In addition, a cell matrix with N double-precision elements requires more memory than a double-precision matrix with N elements.

Using bsxfun with an anonymous function

after trying to understand the bsxfun function I have tried to implement it in a script to avoid looping. I am trying to check if each individual element in an array is contained in one matrix, returning a matrix the same size as the initial array containing 1 and 0's respectively. The anonymous function I have created is:
myfunction = #(x,y) (sum(any(x == y)));
x is the matrix which will contain the 'accepted values' per say. y is the input array. So far I have tried using the bsxfun function in this way:
dummyvar = bsxfun(myfunction,dxcp,X)
I understand that myfunction is equal to the handle of the anonymous function and that bsxfun can be used to accomplish this I just do not understand the reason for the following error:
Non-singleton dimensions of the two input arrays must match each other.
I am using the following test data:
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
and hope for the output to be:
dummyvar = [1,0,0,0]
Cheers, NZBRU.
EDIT: Reached 15 rep so I have updated the answer
Thanks again guys, I thought I would update this as I now understand how the solution provided from Divakar works. This might deter confusion from others who have read my initial question and are confused to how bsxfun() works, I think writing it out helps me understand it better too.
Note: The following may be incorrect, I have just tried to understand how the function operates by looking at this one case.
The input into the bsxfun function was dxcp and X transposed. The function handle used was #eq so each element was compared.
%%// Given data
dxcp = [1 2 3 6 10 20];
X = [2 5 9 18];
The following code:
bsxfun(#eq,dxcp,X')
compared every value of dxcp, the first input variable, to every row of X'. The following matrix is the output of this:
dummyvar =
0 1 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
The first element was found by comparing 1 and 2 dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The next along the first row was found by comparing 2 and 2 dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
This was repeated until all of the values of dxcp where compared to the first row of X'. Following this logic, the first element in the second row was calculating using the comparison between: dxcp = [1 2 3 6 10 20]; X' = [2;5;9;18];
The final solution provided was any(bsxfun(#eq,dxcp,X'),2) which is equivalent to: any(dummyvar,2). http://nf.nci.org.au/facilities/software/Matlab/techdoc/ref/any.html seems to explain the any function in detail well. Basically, say:
A = [1,2;0,0;0,1]
If the following code is run:
result = any(A,2)
Then the function any will check if each row contains one or several non-zero elements and return 1 if so. The result of this example would be:
result = [1;0;1];
Because the second input parameter is equal to 2. If the above line was changed to result = any(A,1) then it would check for each column.
Using this logic,
result = any(A,2)
was used to obtain the final result.
1
0
0
0
which if needed could be transposed to equal
[1,0,0,0]
Performance- After running the following code:
tic
dummyvar = ~any(bsxfun(#eq,dxcp,X'),2)'
toc
It was found that the duration was:
Elapsed time is 0.000085 seconds.
The alternative below:
tic
arrayfun(#(el) any(el == dxcp),X)
toc
using the arrayfun() function (which applies a function to each element of an array) resulted in a runtime of:
Elapsed time is 0.000260 seconds.
^The above run times are averages over 5 runs of each meaning that in this case bsxfun() is faster (on average).
You don't want every combination of elements thrown into your any(x == y) test, you want each element from dxcp tested to see if it exists in X. So here is the short version, which also needs no transposes. Vectorization should also be a bit faster than bsxfun.
arrayfun(#(el) any(el == X), dxcp)
The result is
ans =
0 1 0 0 0 0

Building array from conditions on another array

I am new to matlab. I want to do the following:
Generate an array of a thousand replications of a random draws between three alternatives A,B and C, where at every draw, each alternative has the same probability to be picked.
So eventually I need something like P = [ A A B C B B B C A C A C C ... ] where each element in the array was chosen randomly among the three possible outcomes.
I came up with a solution which gives me exactly what I want, namely
% Generating random pick among doors 1,2,3, where 1 stands for A, 2 for B,
% 3 for B.
I = rand(1);
if I < 1/3
PP = 1;
elseif 1/3 <= I & I < 2/3
PP = 2;
else
PP = 3;
end
% Generating a thousand random picks among dors A,B,C
I = rand(999);
for i=1:999
if I(i) < 1/3
P = 1;
elseif 1/3 <= I(i) & I(i) < 2/3
P = 2;
else
P = 3;
end
PP = [PP P]
end
As I said, it works, but when I run the procedure, it takes a while for what appears to me as a simple task. At the same time, I long such a task is "supposed" to take in matlab. So I have three question:
Is this really a slow procedure to generate the desired outcome?
If it is, why is this procedure particularly slow?
What would be a more effective way to produce the desired outcome?
This can be done much easier with randi
>> PP = randi(3,1,10)
PP =
2 1 3 3 2 2 2 3 2 1
If you actually want to choose between 3 alternatives, you use the output of randi directly to index into another matrix.
>> options = [13,22,77]
options =
13 22 77
>> options(randi(3,1,10))
ans =
22 13 77 13 77 13 22 22 77 13
As to the reason why your solution is slow, you do something similar to this:
x = [];
for i=1:10
x = [x i^2]; %size of x grows on every iteration
end
This is not very good, since on every iteration, Matlab needs to allocate space for a larger vector x. In old versions of Matlab, this lead to quadratic behavior (if you double the size of the problem, it takes 4 times longer). In newer versions, Matlab is smart enough to avoid this problem. It is however still considered nice to preallocate space for your array if you know beforehand how big it will be:
x = zeros(1,10); % space for x is preallocated. can also use nan() or ones()
for i = 1:length(x)
x(i) = i^2;
end
But in many cases, it is even faster to use vectorized code that does not use any for-loops like so:
x = (1:10).^2;
All 3 solutions give the same result:
x = 1 4 9 16 25 36 49 64 81 100
cnt=10;
option={'a','b','c'}
x=option([randi(numel(option),cnt,1)])

Matlab removing unwanted numbers from array

I have got a matlab script from net which generates even numbers from an inital value. this is the code.
n = [1 2 3 4 5 6];
iseven = [];
for i = 1: length(n);
if rem(n(i),2) == 0
iseven(i) = i;
else iseven(i) = 0;
end
end
iseven
and its results is this
iseven =
0 2 0 4 0 6
in the result i am getting both even numbers and zeros, is there any way i can remove the zeros and get a result like this
iseven =
2 4 6
To display only non-zero results, you can use nonzeros
iseven = [0 2 0 4 0 6]
nonzeros(iseven)
ans =
2 4 6
You can obtain such vector without the loop you have:
n(rem(n, 2)==0)
ans =
2 4 6
However, if you already have a vector with zeros and non-zeroz, uou can easily remove the zero entries using find:
iseven = iseven(find(iseven));
find is probably one of the most frequently used matlab functions. It returns the indices of non-zero entries in vectors and matrices:
% indices of non-zeros in the vector
idx = find(iseven);
You can use it for obtaining row/column indices for matrices if you use two output arguments:
% row/column indices of non-zero matrix entries
[i,j] = find(eye(10));
The code you downloaded seems to be a long-winded way of computing the range
2:2:length(n)
If you want to return only the even values in a vector called iseven try this expression:
iseven(rem(iseven,2)==0)
Finally, if you really want to remove 0s from an array, try this:
iseven = iseven(iseven~=0)
Add to the end of the vector whenever you find what you're looking for.
n = [1 2 3 4 5 6];
iseven = []; % has length 0
for i = 1: length(n);
if rem(n(i),2) == 0
iseven = [iseven i]; % append to the vector
end
end
iseven
To remove the all Zeros from the program we can use the following command,and the command is-
iseven(iseven==0)=[]

Resources