Matlab, find common elements of two cell arrays - arrays

I have two cell arrays, the sizes are 1x20033 and 1x19. Let's call these two cell arrays as A and B. I want to compare each cell of A with each cell of B to see if there is any common element.
Finally, I need to build a binary matrix and put one when there is a match.
I tried this:
BinaryMatrix=zeros(20033,19);
for i=1:1:20033
for j=1:1:19
match=find(ismember(A{i},B{j}));
if match==1
BinaryMatrix(i,j)= 1;
end
end
end
but I faced this error: "Input A of class double and input B of class cell must be
cell arrays of strings, unless one is a string."
Please tell me what I should do to solve it.

The code that you have almost works. What I would recommend you do is split up the strings found in A and B by spaces. As such, A and B would then be cell arrays of elements where each element in A or B is a single word. The spaces will serve as delimiters for separating out the words.
Once you do this, use intersect to see if there are any common words between the words in A and the words in B. intersect works by considering two arrays (these can be numeric arrays, cell arrays, etc.) C and D as sets, and it returns the set intersection between these two arrays.
In our case, C and D would be a cell array of words separated by spaces from A and B. intersect(C,D) will return a cell array of strings where each element in the output is a string found in both C and D. As such, should this cell array be non-empty, we have found at least one common word between C and D. If this is the case, then set your binary flag at the location of your matrix to 1. In other words:
BinaryMatrix = false(20033,19);
for i=1:1:20033
    for j=1:1:19
        Asplit = strsplit(A{i});
        Bsplit = strsplit(B{j});
        if (~isempty(intersect(Asplit, Bsplit)))
            BinaryMatrix(i,j)= true;
        end
    end
end
You'll notice that I have changed your matrix from zeros(20033,19) to false(20033,19). zeros creates the matrix in double precision, which allocates 8 bytes per element. false creates a logical matrix instead, which allocates 1 byte per element. Since BinaryMatrix only needs to hold true or false, don't use double, use logical. Whatever the sizes of your cell arrays, this cuts the memory consumption of the matrix by a factor of 8.
Minor Note
strsplit is only available from R2013a and onwards. If you have a version of MATLAB that is R2012b and lower, replace strsplit with regexp. As such, you would replace the two lines in the for loop with:
Asplit = regexp(A{i}, ' ', 'split');
Bsplit = regexp(B{j}, ' ', 'split');
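As a quick sanity check, the approach can be tried on toy data first. This is only a sketch: the contents of A and B below are made up, and it assumes every cell holds a character string of space-separated words. It uses regexp so that it also runs on older releases, and it splits B once up front rather than inside the inner loop, which is a small change from the loop above.

```matlab
% Toy stand-ins for the real 1x20033 and 1x19 cell arrays (hypothetical values)
A = {'red green blue', 'yellow', 'green apple'};
B = {'blue sky', 'apple pie'};

% Split every string in B into words once, before the loops
Bwords = cellfun(@(s) regexp(s, ' ', 'split'), B, 'UniformOutput', false);

BinaryMatrix = false(numel(A), numel(B));
for i = 1:numel(A)
    Awords = regexp(A{i}, ' ', 'split');   % split A{i} once per row
    for j = 1:numel(B)
        % true when the two word lists share at least one word
        BinaryMatrix(i,j) = ~isempty(intersect(Awords, Bwords{j}));
    end
end
disp(BinaryMatrix)
```

Here rows correspond to the cells of A and columns to the cells of B, so only the (1,1) and (3,2) entries end up true.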

Related

Calculating standard deviation on data stored in a cell array of cell arrays

I have a cell array A Mx3 in size where each entry contains a further cell-array Nx1 in size, for example when M=9 and N=5:
All data contained within any given cell array is in vector format and of equal length. For example, A{1,1} contains 5 vectors 1x93 in size whilst A{1,2} contains 5 vectors 1x100 in size:
I wish to carry out this procedure on each of the 27 cells:
B = transpose(cell2mat(A{1,1}));
B = sort(B);
C = std(B,0,2); %Calculate standard deviation
Ultimately, the desired outcome would be, for the above example, 27 columns (9x3) containing the standard deviation results (padded with 0 or NaNs to handle differing lengths) printed in the order A{1,1}, A{1,2}, A{1,3}, A{2,1}, A{2,2}, A{2,3} and so forth.
I can do this by wrapping the above code in a loop that iterates over each of the 27 cells in the correct order. However, I was wondering whether there is a clever cellfun call or a more succinct method to accomplish this, particularly without the use of a loop?
You should probably realize that cellfun is essentially a glorified for loop over cells. There's simply extra error checking and all that to ensure that the whole thing works. In any case, yes it's possible to do what you're asking in a single cellfun call. Note that I am simply going to apply the same logic as you would have in a for loop with cellfun. Also note that because you're using cell arrays, you have no choice but to iterate over the entire master cell array. However, what you'll want to do is pad each resulting column vector in each output in the final cell array so that they all share the same length. We can do that with another two cellfun calls - one to determine the largest vector length and another to perform the padding operation.
Something like this could work:
% Step #1 - Sort the vectors in each cell array, then find row-wise std
B = cellfun(@(x) std(sort(cell2mat(x).'), 0, 2), A, 'un', 0);
% Step #2 - Determine the largest length vector and pad
sizes = cellfun(@numel, B);
B = cellfun(@(x) [x; nan(max(sizes(:)) - numel(x), 1)], B, 'un', 0);
The first line of code takes each element in A and converts it into a matrix with one column per vector (i.e. cell2mat(x).', which is 93 x 5 for A{1,1} in your example). We then sort each column individually with sort and take the standard deviation row-wise. Because the output for each cell is a vector rather than a scalar, we must make sure that the 'UniformOutput' flag is 0 (abbreviated as 'un', 0 in the code). Once the standard deviation calculation is complete, we count the number of elements in each resulting column vector, find the largest count, then use another cellfun call to pad the vectors with NaN so they all share the same length.
To finally get your desired output, you need to transpose the cell array, then unroll the elements in column-major order. Remember that MATLAB accesses things in column-major order, so a common trick to get a row-major readout (what you want) is to transpose first and then unroll in column-major fashion. To do this in one line, transpose the cell array, use reshape to place the cells in a single row (which reads them out in row-major order with respect to the original array), then call cell2mat to piece the vectors together. The final result should be a 27 column matrix in which the columns follow the order A{1,1}, A{1,2}, A{1,3}, A{2,1} and so on:
C = cell2mat(reshape(B.', 1, []));
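To see the three steps working together, here is a small sketch on a made-up 2x2 master cell (the values and sizes are hypothetical and much smaller than the 9x3 case). It spells out 'UniformOutput' in full and uses a helper variable maxLen, but is otherwise the same logic as above.

```matlab
% Hypothetical 2x2 master cell; each entry is a 2x1 cell of equal-length row vectors
A = cell(2, 2);
A{1,1} = {[1 2 3]; [3 4 5]};
A{1,2} = {[1 2]; [1 2]};
A{2,1} = {[0 0 0 0]; [2 2 2 2]};
A{2,2} = {[5 6]; [7 8]};

% Step 1: one column per vector, sort each column, then row-wise std
B = cellfun(@(x) std(sort(cell2mat(x).'), 0, 2), A, 'UniformOutput', false);

% Step 2: pad every result with NaN up to the longest length
maxLen = max(cellfun(@numel, B(:)));
B = cellfun(@(x) [x; nan(maxLen - numel(x), 1)], B, 'UniformOutput', false);

% Step 3: row-major readout, one column per original cell
C = cell2mat(reshape(B.', 1, []));
disp(C)
```

C is 4x4 here: the columns correspond to A{1,1}, A{1,2}, A{2,1}, A{2,2}, and the shorter results are padded with NaN at the bottom.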

MATLAB: use strcmp(s1,s2) for variable length vector with strings

I have a query which I am trying to solve
I know that one can use strcmp(s1,s2) to compare two different strings to see whether they are the same. It gives 1 if that is the case.
However, how would one tackle this problem if you have a variable-length array full of strings and you want to test whether all strings in the array are the same?
For example: ['NACA64A010' 'NACA64A010' 'NACA64A010' 'NACA64A010'] we can see that all the strings are the same in this array. However, how would one go about with using strcmp(s1,s2).
Thanks guys!
If you want all pairwise comparisons between strings: call ndgrid to generate indices of all combinations, and then index into your cell array of strings and call strcmp:
x = {'NACA64A010' 'NACA64A010' 'NACA64A010' 'NACA64A010'};
[ii, jj] = ndgrid(1:numel(x));
result = strcmp(x(ii), x(jj));
In this case
result =
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
because all strings are the same.
You probably had a pairwise comparison using strcmp in mind, but you can use it directly on cell arrays:
x={'NACA64A010' 'NACA64A010' 'NACA64A010' 'NACA64A010'}
result=all(strcmpi(x{1},x(2:end)))
This compares the first element to the remaining elements and returns true only if all elements are equal. Note that strcmpi ignores case; use strcmp instead if the match must be case-sensitive. For a pairwise comparison you could use:
[~,~,c]=unique(x);
result=bsxfun(@eq,c,c.')
If you're solving the problem with a matrix (i.e. every row is a string) there are no particularly nice solutions in my opinion, but if your strings are stored in a cell array, things get easier and nicer.
So we start by creating such cell array:
myStrings={'NACA64A010' 'NACA64A010' 'NACA64A010' 'NACA64A010'};
where each cell contains a string. This will make your code more robust as well since every string can have a different length (this is not true if you concatenate all your strings in a matrix).
Then you specify which string you want to find inside such cell array:
stringThatMustBeTested='NACA64A010';
Now you can use cellfun(), which is a function that applies another function to every cell of a given cell array as follows:
results=cellfun(@(x) strcmp(x,stringThatMustBeTested),myStrings);
Such line simply means "apply strcmp() to every generic cell x inside myStrings and compare the cell with stringThatMustBeTested".
The variable results will be a logical array in which element j is true if the j-th cell in your cell array is equal to the string you want to test. If results consists entirely of 1s (which you can check with sum(results)==length(results)), then all the strings in myStrings are the same (given that stringThatMustBeTested is one of the strings in your cell array; in any case, this solution can be extended to a broader string search inside a cell array).
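For the specific question of whether every string is identical, a shorter check is possible because strcmp accepts a cell array as one of its arguments. The sketch below assumes a non-empty cell array of character strings; the variable names and the second test array y are made up for illustration.

```matlab
x = {'NACA64A010' 'NACA64A010' 'NACA64A010' 'NACA64A010'};
y = {'NACA64A010' 'NACA0012' 'NACA64A010'};   % hypothetical counter-example

% Compare every cell against the first one; strcmp returns one logical per cell
allSameX = all(strcmp(x, x{1}));
allSameY = all(strcmp(y, y{1}));

% Equivalent idea: a set of identical strings has exactly one unique member
oneUniqueX = numel(unique(x)) == 1;
```

allSameX and oneUniqueX are true, while allSameY is false because of the second entry in y.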

Cell array to matrix conversion in matlab

I would like to convert three <1xN cell> (A, B and C) into a single Nx3 matrix. Could someone help me with this?
C={{1xN}; {1xN}; {1xN}};
where each N is a number in single quotes, e.g.
C = {{'123123' ,'12324', ....N times}; {'123123', '12324', ....N times}; {'123123', '12324' ,....N times}}
Since a couple of them mentioned about the ridiculous input, this is the reason for having it in the above form.
The three nested array of cells are the results of a regexp where my string and expression are both strings. Therefore I have the output of regexp as three cell arrays of row vectors.
For e.g.
node_ids=regexp(nodes,'(?<=node id=")\d*','match');
I can use cat function and then use a str2double for all three cell arrays and finally form a matrix by cell2mat.
For e.g.
node_ids=cat(1,node_ids{:});node_ids=str2double(node_ids);
But this takes more time and has more LOC.
My question is can it be done with fewer lines of code?
I tried using the cat function but keep getting this error:
Cannot support cell arrays containing cell arrays or objects.
Your input data is pretty bad.... why are you using a nested array of cells where each element is a string?
In any case, assuming C is your original input data, do this:
C = {{'123123' '12324'}; {'123123' '12324'}; {'123123' '12324'}};
out = cellfun(@(x) cellfun(@str2num, x, 'uni', 0), C, 'uni', 0);
out = cell2mat(cellfun(@cell2mat, out, 'uni', 0));
First line is some dummy data. Next line first goes through every nested cell element over your cell array and converts the strings into numbers. However, these are still in cell arrays. As such, the next line converts each cell array in the nested cell into a matrix, then we merge all of the cells together into one final matrix.
We get:
>> out
out =
123123 12324
123123 12324
123123 12324
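If fewer lines are the goal, one possible alternative is to stack the nested cells first and convert them in a single call, since str2double accepts a cell array of strings. This is a sketch that assumes every inner cell has the same length; the names flat and outNx3 are only illustrative.

```matlab
C = {{'123123' '12324'}; {'123123' '12324'}; {'123123' '12324'}};

flat = vertcat(C{:});      % 3xN cell array of strings, one row per inner cell
out = str2double(flat);    % 3xN numeric matrix, same layout as the result above
outNx3 = out.';            % transpose to get the Nx3 layout asked for
```

Unlike str2num, str2double does not evaluate its input as code and returns NaN for anything it cannot parse.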

Inserting zeros into an array and looping using a for loop

I have several arrays that are calculated, for example a, b and c (there are more than three). Please note this is just an example; the real numbers are much larger and not so basic.
a=[1,2,3,4,5]
b=[10,20,30,40,50]
c=[100,200,300,400,500]
I want a for loop that inserts zeros so that the new_abc array looks like this at each step:
1st for loop step new_abc=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
2nd for loop step new_abc=[1,0,0,2,0,0,3,0,0,4,0,0,5,0,0]
3rd for loop step new_abc=[1,10,0,2,20,0,3,30,0,4,40,0,5,50,0]
4th for loop step new_abc=[1,10,100,2,20,200,3,30,300,4,40,400,5,50,500]
how can I do this with a for loop?
I started with the code below which gives me the zeros
a=[1,2,3,4,5]
new_abc=zeros(1,length(a)*(3));
But I'm not sure how to place the values of the arrays a, b and c into the correct locations of new_abc using a for loop.
I know I could place all the arrays into one large array and do a reshape, but the calculated arrays I use become too large and I run out of RAM, so reading / calculating each array and inserting it into one common array new_abc using a for loop works best.
I'm running octave 3.8.1 which is like matlab.
This should do it. You can put a,b,c into a cell array. (you can also put them in a matrix...)
new_abc = zeros(1, 3*numel(a));
in = {a, b, c};
for k = 1:3
new_abc(k:3:end) = in{k};
end
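Since memory is the concern, the same strided assignment can also be used without keeping all source arrays around at once. The sketch below is an assumption about how that might look: each array is produced inside the loop, and the expression (1:n) * 10^(k-1) is only a stand-in for whatever calculation really yields the k-th array.

```matlab
n = 5;           % length of each source array
numArrays = 3;   % how many arrays get interleaved

new_abc = zeros(1, numArrays * n);
for k = 1:numArrays
    % Hypothetical stand-in for reading or calculating the k-th array
    current = (1:n) * 10^(k-1);
    % Write it into every numArrays-th slot, starting at position k
    new_abc(k:numArrays:end) = current;
end
disp(new_abc)
```

Only new_abc and one source array are held in memory at any time, and after the last iteration new_abc matches the 4th step listed in the question.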

Matlab: Delete the item in an N-dimensional array whose Nth dimension is 1, where N is unknown?

I have an N-dimensional array of items whose last dimension is the index of the array.
For example, if the array A contained images, then A(:,:,:,1) would be the first image, A(:,:,:,2) would be the second image, and so forth.
Similarly, if the array just contained integers, then A(:,1) would be the first integer, A(:,2) would be the second integer, and so forth.
-=-=-=-
What I'm trying to do is delete the first item from A when I do not know ahead of time what dimensionality it is.
If A contains images, I want to do this:
A(:,:,:,1) = [];
If A contains integers, I want to do this:
A(:,1) = [];
The problem is since I don't know what dimensionality it is, I don't know how many colons to put, and I don't know how to denote "N-1 colons here" in Matlab.
I'm hoping there is a programmatic way to do this, but I frankly have no idea what to search for if this is possible.
You can either use cell to comma-separated list expansion:
%// Build cell: {':', ':', ..., ':', [1]}
I(1:ndims(A)-1) = {':'};
I{ndims(A)} = 1;
%// Expand cell to comma separated list and delete:
A(I{:}) = [];
Or convert to cell using num2cell and then convert back using cell2mat:
C = num2cell(A,1:ndims(A)-1);
A = cell2mat(C(2:end));
I guess that unless you really need n-dimensional arrays, doing this with a cell array of n-1 dimensional arrays instead (as is C in the above code) should be a smart move in terms of simplicity of notation.
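To illustrate the comma-separated list approach on both cases from the question, here is a short sketch with made-up arrays A2 and A4. It builds the index cell with repmat instead of indexed assignment, which is a small variation that avoids reusing a leftover variable from an earlier call.

```matlab
% 2-D case: each column is one item
A2 = [1 2 3; 4 5 6];
idx = repmat({':'}, 1, ndims(A2));   % {':', ':'}
idx{end} = 1;                        % {':', 1}
A2(idx{:}) = [];                     % same as A2(:,1) = []

% 4-D case: each A4(:,:,:,k) is one item (random data as a stand-in for images)
A4 = rand(2, 3, 3, 4);
idx = repmat({':'}, 1, ndims(A4));   % {':', ':', ':', ':'}
idx{end} = 1;                        % {':', ':', ':', 1}
A4(idx{:}) = [];                     % same as A4(:,:,:,1) = []
```

One caveat: ndims ignores trailing singleton dimensions, so if the array holds only a single item, the last dimension reported by ndims is no longer the item index and the item dimension has to be supplied explicitly.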
