MATLAB Search Within a Cell Array of Cells - arrays

Setup:
I have a 21 x 3 cell array.
The first 2 columns are USUALLY strings or char arrays, but could be 1xn cells of strings or char arrays (if there are multiple alternate strings that mean the same thing in the context of my script). The 3rd element is a number.
I'm looking to return the index of any EXACT match of with a string or char array (but type doesn't have to match) contained in this cell array in column 1, and if column 1 doesn't match, then column 2.
I can use the following:
find(strcmp( 'example', celllist(:,1) ))
find(strcmp( 'example', celllist(:,2) ))
And these will match the corresponding indices with any strings / char arrays in the top level cell array. This won't, of course, match any strings that are inside of cells of strings inside the top level cell array.
Is there an elegant way to match those strings (that is, without using a for, while, or similar loop)? I want it to return the index of the main cell array (1 through 21) if the cells contains the match OR the cell within the cell contains an exact match in ANY of its cells.

The cellstr function is your friend, since it converts all of the following to a cell array of chars:
chars e.g. cellstr( 'abc' ) => {'abc'}
cells of chars e.g. cellstr( {'abc','def'} ) => {'abc','def'}
strings e.g. cellstr( "abc" ) => {'abc'}
string arrays e.g. cellstr( ["abc", "def"] ) => {'abc','def'}
Then you don't have to care about variable types, and can just do an ismember check on every element, which we can assume is a cell of chars.
We can set up a test:
testStr = 'example';
arr = { 'abc', 'def', {'example','ghi'}, "jkl", "example" };
% Expected output is [0,0,1,0,1]
Doing this with a loop to better understand the logic would look like this:
isMatch = false(1,numel(arr)); % initialise output
for ii = 1:numel(arr) % loop over main array
x = cellstr(arr{ii}); % convert to cellstr
isMatch(ii) = any( ismember( testStr, x ) ); % check if any sub-element is match
end
If you want to avoid loops* then you can do this one-liner instead using cellfun
isMatch = cellfun( #(x) any( ismember( testStr, cellstr(x) ) ), arr );
% >> isMatch = [0 0 1 0 1]
So for your case, you could run this on both columns and apply some simple logic to select the one you want
isMatchCol1 = cellfun( #(x) any( ismember( testStr, cellstr(x) ) ), arr(:,1) );
isMatchCol2 = cellfun( #(x) any( ismember( testStr, cellstr(x) ) ), arr(:,2) );
If you want the row index instead of a logical array, you can wrap the output with the find function, i.e. isMatchIdx = find(isMatch);.
*This only avoids loops visually, cellfun is basically a looping device in disguise, but it does save us initialising the output at least.

Related

how to compare two 1d arrays and output the indices where array 1 has the same scalar value as array 2

I'm trying to figure out how to take two 1d arrays, put it in a function so that every number in array 1 is compared to every number in array 2, and then the element numbers of the duplicate numbers found is displayed with respect to array 1
for example
array 1 = [12,16,36,72,82]
array 2= [16,53,72,12,40,71]
and the output would be
elements= 1 2 4
I'm new to Matlab so I don't currently have all the skills to make This work I'm trying to figure it out but I don't know what exactly to do.
it won't let me post the code because it doesn't make sense.
im not sure how to post is on her otherwise.
[Edit2]
Best approach
The unreliable approach below works for the example given by the OP, with the restriction discussed below. The best approach is using MATLAB's ismember and find functions:
array1 = [12,16,36,72,82];
array2 = [16,53,72,12,40,71];
idc = find( ismember(array1, array2 ) )
ismember(array1, array2) returns a logical array indicating which elements of array1 are contained in array2.
find(...) converts the logical array to indices.
[Edit1]
Alternative approach
An approach without ismember would be:
array1 = [12,16,36,72,82];
array2 = [16,53,72,12,40,71];
findFcn = #(X) find( array1(:) == X )';
idcs = arrayfun(findFcn, array2(:), 'UniformOutput', false );
idcs = unique([idcs{:}])
Explanation:
findFun(X) gives the index of the value of X in array1.
findFun(X) is called by arrayfun with X being equal to each element of array2 and arrayfun stores the return values of the calls to the cell array idcs.
Finally, idcs = unique([idcs{:}]); converts the cell array idc to a double array and removes repetitions.
[First answer]
Unreliable approach
You can use the intersect function:
array1 = [12,16,36,72,82];
array2 = [16,53,72,12,40,71];
[ ~, x1 ] = intersect( array1, array2 );
~ means that the first return value, which would be the intersection of array1 and array2, is discarded. x1 is then equal to the indices that the intersection's values have in array1.
If you also want to have the indices in array2 you could do
[ ~, x1, x2 ] = intersect( array1, array2 );
to store them in x2.
Note:
x1 and x2 only contain the indices of the first occurrence in array1 and array2, respectively.

Filter MATLAB non-numerical array data based on criteria

Two questions, one fairly simple question (at least it seems it should be simple) and one that may take a bit more work. Feel free to contribute to either or both.
First, I'd like to create a string array based off of an existing string array based on a criteria. Take for example a similar operation with a double array:
>> nums = [ 1 2 1 2]
nums =
1 2 1 2
>> big_nums = (nums == 2) .* nums
big_nums =
0 2 0 2
I'd like to do something similar with a string array, however I don't know what function to use:
>> sizes = ["XL" "L" "XL" "L"]
sizes =
1×4 string array
"XL" "L" "XL" "L"
>> large_sizes = (sizes == "L") .* sizes
Undefined operator '.*' for input arguments of type 'string'.
I'd like the output to be
large_sizes =
1×4 string array
"" "L" "" "L"
Second question. Suppose I have a 2 dimensional cell array. I'd like to filter data based on criteria:
>> data = {"winter", 1; "spring", 2; "summer", 3; "fall", 4}
data =
4×2 cell array
["winter"] [1]
["spring"] [2]
["summer"] [3]
["fall" ] [4]
>> nice_weather = ( (data(1,:) == "fall") + (data(1,:) == "spring") ) .* data
Error using ==
Cell must be a cell array of character vectors.
I'd like a code that results in one of two arrays:
nice_weather =
4×2 cell array
[""] [1]
["spring"] [2]
[""] [3]
["fall"] [4]
----- OR -----
nice_weather =
2×2 cell array
["spring"] [2]
["fall"] [4]
For this question, I am also open to separating data into multiple arrays (for example, one array for strings and one array for numbers).
Thanks!
This solution uses the strcmpi function from MATLAB (no toolbox required) to compare two strings (insensitive to the case).
1D Cell Array:
sizes = {'XL' 'L' 'XL' 'L'}; % Changed " to ' & used cell array
idx = strcmpi(sizes,'L'); % Logical index
sizelist = {sizes{idx}}
Or you could try something like
sizes(~idx) = {"" ""} % manual just for example
For this to automatically adjust the number of blanks "", you could use repmat like this
sizes(~idx) = repmat({""},1,sum(~idx))
Output:
sizes = 1×4 cell array
{[""]} {'L'} {[""]} {'L'}
2D Cell Array:
data = {'winter', 1; 'spring', 2; 'summer', 3; 'fall', 4}; % Changed " to '
nicemo1 = 'spring';
nicemo2 = 'fall';
idx = strcmpi(data(:,1),nicemo1) | strcmp(data(:,1),nicemo2); % Obtain logical index
data(idx,:)
Output:
ans = 2×2 cell array
{'spring'} {[2]}
{'fall' } {[4]}
Tested with MATLAB R2018b.
Also beware variables like sizes as dropping a letter masks a useful function, size.

Matlab- split cell array column by delimiter

I have many 33213168x1 cell arrays, where each cell contains an 85 x 1 column.
Each cell in the column is in the form
[0.55;0.25;0.75]
[0.33;0.66;0.99]
I want to split up this single column by the semi-colon delimiter so that each cell in the cell array is 85x3, like:
[0.55][0.25][0.75]
[0.33][0.66][0.99]
I've tried numerous techniques to solve this, but most commonly get the errors 'cell elements must be character arrays' or 'input must be a string.'
Some of the approaches I've tried:
splitcells = strsplit(regress_original_053108,';');
splitcells = cellfun(#(x) strsplit(regress_original_053108, ';'),regress_original_053108 , 'UniformOutput',0);
splitcells = regexp(regress_original_053108, ';', 'split');
splitcells = textscan(regress_original_053108, 'delimiter', ';');
Etc. Any feedback about how to do this would be appreciated.
Hope this solves your problem:
% Example input
input = {[0.55;0.25;0.75]};
cellArray(1:85,1) = input;
% Create array
doubleArray = zeros(85,3);
% Fill array
for i=1:85
doubleArray(i,:) = cellArray{i,1}';
end
Each cell you have is not a string, hence you can't use strsplit. Use this approach:
for ii = length(X) % Here X denotes your 33213168x1 cell array
X{ii} = cell2mat(cellfun(#(y) y.', X{ii}, 'UniformOutput', false));
end

Using matrix operations instead of FOR loop for boolean comparison of cell array to index value

I have a function that takes one vector as its input, uses another function to create a derivative vector from the input, and then compares the two vectors to produce its output vector. I currently have it working with a for loop as follows:
The original array, nameVec, is used as the input to the following functions:
% INPUT: nameVec = '' 'a' 'b' 'aa' 'ab' 'ba' 'aba' 'abb'
First, a function called computeParentName removes the last character from each array element of nameVec and produces this cell array:
% OUTPUT: parentNameVec = '' '' '' 'a' 'a' 'b' 'ab' 'ab
Next, the function computeParentIndex finds the indices of where each element in parentNameVec appears in nameVec:
function [parentIndexVec] = computeParentIndex(nameVec)
parentNameVec = computeParentName(nameVec);
[~,parentIndexVec] = ismember(parentNameVec, nameVec);
end
% OUTPUT: parentIndexVec = 1 1 1 2 2 3 5 5
I am now trying to develop a function that essentially acts in reverse, as it takes nameVec and outputs a cell array, which contains at each index, an array of all indices in parentNameVec where the value is that of the output array's ('daughterIndexVec`) current index
function [daughterIndexVec] = computeDaughterIndex(nameVec)
parentIndexVec = computeParentIndex(nameVec);
for i=1:length(parentIndexVec)
daughterIndexVec{i} = find(parentIndexVec==i);
end
end
% OUTPUT: daughterIndexVec = {[1,2,3] [4,5] [6] [] [7,8] [] [] []}
Is there a simpler (more efficient) way to accomplish this without use of for loops?
Any assistance is greatly appreciated!
You can use the second output of ismember to get the locations of each value in parentNameVec in nameVec and then use accumarray to group all indices which share the same index in nameVec together in a cell array.
[~, ind] = ismember(parentNameVec, nameVec);
daughterIndexVec = accumarray(ind(:), 1:numel(ind), [numel(ind) 1], #(x){x.'});
% {[1,2,3] [4,5] [6] [] [7,8] [] [] []}

Get indices of string occurrences in cell-array

I have a cell array that contains a long list of strings. Most of the strings are in duplicates. I need the indices of instances of a string within the cell array.
I tried the following:
[bool,ind] = ismember(string,var);
Which consistently returns scalar ind while there are clearly more than one index for which the contents in the cell array matches string.
How can I have a list of indices that points to the locations in the cell array that contains string?
As an alternative to Divakar's comment, you could use strcmp. This works even if some cell doesn't contain a string:
>> strcmp('aaa', {'aaa', 'bb', 'aaa', 'c', 25, [1 2 3]})
ans =
1 0 1 0 0 0
Alternatively, you can ID each string and thus have representative numeric arrays corresponding to the input cell array and string. For IDing, you can use unique and then use find as you would with numeric arrays. Here's how you can achieve that -
var_ext = [var string]
[~,~,idx] = unique(var_ext)
out = find(idx(1:end-1)==idx(end))
Breakdown of the code:
var_ext = [var string]: Concatenate everything (string and var) into a single cell array, with the string ending up at the end (last element) of it.
[~,~,idx] = unique(var_ext): ID everything in that concatenated cell array.
find(idx(1:end-1)==idx(end)): idx(1:end-1) represents the numeric IDs for the cell array elements and idx(end) would be the ID for the string. Compare these IDs and use find to pick up the matching indices to give us the final output.
Sample run -
Inputs:
var = {'er','meh','nop','meh','ya','meh'}
string = 'meh'
Output:
out =
2
4
6
regexp would solve this problem better and the easy way.
string = ['my' 'bat' 'my' 'ball' 'my' 'score']
expression = ['my']
regexp(string,expresssion)
ans = 1 6 12

Resources