Difference of two cell arrays of strings in Matlab

I have a cell array like
a={'potential'; 'impact'; 'of'; 'stranded'; 'assets'; 'and'; 'other'; 'necessary'; 'regulatory'; 'approvals'; 'assets'}
and want to subtract from it an array like b={'a'; 'and'; 'of'; 'my'; '#'}.
Using setdiff(a,b) sorts my array after the difference is computed. What I want is to eliminate from a all the elements present in b without sorting a. Repetitions should also be preserved; e.g., 'assets' appears twice in a and should appear at both locations in the final array.
The following code I am using does the job:
for i = 1:length(b)
    tf = ~strcmp(b(i),a)
    a = a(tf,:)
end
But the problem is that array b contains more than 200 string elements which slows down my code considerably. Is there a better way to do this?

tf = ismember(a,b);
a = a(~tf)
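As a quick check, here is a minimal sketch that applies the ismember approach to the sample data from the question. The variable name result is introduced here only for illustration; the answer above overwrites a in place.

```matlab
% Sample data copied from the question.
a = {'potential'; 'impact'; 'of'; 'stranded'; 'assets'; 'and'; ...
     'other'; 'necessary'; 'regulatory'; 'approvals'; 'assets'};
b = {'a'; 'and'; 'of'; 'my'; '#'};

% tf(k) is true when a{k} occurs anywhere in b.
tf = ismember(a, b);

% Logical indexing keeps the remaining entries in their original order,
% so both copies of 'assets' survive.
result = a(~tf);
disp(result)
```

Because logical indexing only drops the flagged positions, order and repeats are preserved, and ismember tests against all of b in one call instead of one strcmp pass per element of b.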

EDU>> a
a =
'potential'
'impact'
'of'
'stranded'
'assets'
'and'
'other'
'necessary'
'regulatory'
'approvals'
'assets'
EDU>> b
b =
'a'
'and'
'of'
'my'
'#'
[I,J]=setdiff(a,b);
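Now index a with the sorted indices. Note that setdiff returns unique values, so the duplicate 'assets' appears only once in the result: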
EDU>> a(sort(J),:)
ans =
'potential'
'impact'
'stranded'
'other'
'necessary'
'regulatory'
'approvals'
'assets'

Related

MATLAB cellfun() to map contains() to cell array

a={'hello','world','friends'};
I want to check whether each word in the cell array contains the letter 'o'. How can I use cellfun() to achieve the following in a compact expression?
b = [ contains(a(1),'o') contains(a(2),'o') contains(a(3),'o')]
You don't need cellfun. According to the documentation, contains works natively on cell arrays of character vectors:
a = {'hello', 'world', 'friends'};
b = contains(a, 'o');
Which returns:
b =
1×3 logical array
1 1 0
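If a cellfun-based version is still wanted, for instance on an older release that lacks contains, one possible sketch combines strfind with cellfun. The variable name hits is hypothetical and used only for illustration.

```matlab
a = {'hello', 'world', 'friends'};

% strfind accepts a cell array of character vectors and returns, for each
% cell, the positions where the pattern starts (empty if there is no match).
hits = strfind(a, 'o');

% A word contains 'o' exactly when its list of match positions is non-empty.
b = ~cellfun(@isempty, hits);
disp(b)
```

This produces the same logical row vector as the contains call above for this input.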

How to extract different values/elements of matrix or array without repeating?

I have a vector (or it could be an array):
A = [1,2,3,4,5,1,2,3,4,5,1,2,3]
I want to extract the distinct values from this vector, without repeats:
1,2,3,4,5
B= [1,2,3,4,5]
How can I extract them?
I would appreciate any help.
Try this,
A = [1,2,3,4,5,1,2,3,4,5,1,2,3]
y = unique(A)
B = unique(A) returns the same values as in A but with no repetitions. The resulting vector is sorted in ascending order. A can be a cell array of strings.
B = unique(A,'stable') does the same as above, but without sorting.
B = unique(A,'rows') returns the unique rows of A.
[B,i,j] = unique(...) also returns index vectors i and j such that B = A(i) and A = B(j) (or B = A(i,:) and A = B(j,:)).
Reference: http://cens.ioc.ee/local/man/matlab/techdoc/ref/unique.html
Documentation: https://uk.mathworks.com/help/matlab/ref/unique.html
The other answers are correct, but if you do not want the data sorted, you can use unique with the 'stable' option:
A = [1,2,3,4,5,1,2,3,4,5,1,2,3]
B = unique(A,'stable')
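For the vector in the question, the sorted and 'stable' results coincide, because the first occurrences already appear in ascending order. A minimal sketch with a hypothetical unsorted input shows the difference, and also checks the index relations B = A(i) and A = B(j) described above (named ia and ic here to avoid shadowing the imaginary unit):

```matlab
% Hypothetical input whose first occurrences are not in ascending order.
A = [3, 1, 2, 3, 1];

B_sorted = unique(A);            % distinct values, ascending
B_stable = unique(A, 'stable');  % distinct values, order of first appearance

% Index outputs: B equals A(ia), and A can be rebuilt as B(ic).
[B, ia, ic] = unique(A);

disp(B_sorted)
disp(B_stable)
disp(isequal(A(ia), B) && isequal(B(ic), A))
```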

Apply a function to three parallel array of arrays

I have three arrays of arrays like this:
from itertools import repeat  # assumed source of repeat(); the import is not shown in the post
catLabels = [catA, catB, catC]
binaryLabels = [binA, binB, binC]
trueLabels = []
trueLabels.extend(repeat(y_true_categories, len(binaryLabels)))
def binaryConversion(trueLabel, evalLabel, binaryLabel):
    for true, eval, binary in zip(trueLabel, evalLabel, binaryLabel):
        if eval == true:
            binary.append(1)
        else:
            binary.append(0)
for x, y, z in zip(trueLabels, catLabels, binaryLabels):
    binaryConversion(x, y, z)
Each of the values in catLabels and binaryLabels is an array. binaryLabels contains empty arrays, each of which I want to fill with 1s and 0s. For example, say catA = [A B C A B D] and binA = []. trueLabels contains multiple arrays, each of which is the same (y_true_categories, i.e. my true categorical labels [A C C B B D]). In this case, my binaryConversion function should fill in [1 0 1 0 1 1] for the array binA.
For some reason my current function is not achieving this and leaves each of binA, binB, binC empty.
What am I doing wrong?
I have figured out the answer. The inner zip statement will not work because I start with empty binary labels: zip stops at its shortest input, so zipping with an empty list yields no iterations at all. So I removed binaryLabel from the zip call within binaryConversion(trueLabel, evalLabel, binaryLabel) and appended to the empty binary array directly inside the loop. In addition, I had been appending 1s and 0s to the element-wise binary instead of the actual empty array binaryLabel.
New code:
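def binaryConversion(trueLabel, evalLabel, emptyBinaryArray):
    # Compare the labels pairwise and append the result to the (initially empty) output list.
    for true, eval in zip(trueLabel, evalLabel):
        if eval == true:
            emptyBinaryArray.append(1)
        else:
            emptyBinaryArray.append(0)
# Loop variables are named differently from trueLabels so the outer list is not rebound.
for trueLabel, predictionLabel, emptyBinaryArray in zip(trueLabels, catLabels, binaryLabels):
    binaryConversion(trueLabel, predictionLabel, emptyBinaryArray)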

How do I convert a cell array w/ different data formats to a matrix in Matlab?

So my main objective is to take a matrix of form
matrix = [a, 1; b, 2; c, 3]
and a list of identifiers in matrix(:,1)
list = [a; c]
and generate a new matrix
new_matrix = [a, 1;c, 3]
My problem is I need to import the data that would be used in 'matrix' from a tab-delimited text file. To get this data into Matlab I use the code:
matrix_open = fopen(fn_matrix, 'r');
matrix = textscan(matrix_open, '%c %d', 'Delimiter', '\t');
which outputs a cell array of two 3x1 arrays. I want to get this into one 3x2 matrix where the first column is a character, and the second column an integer (these data formats will be different in my implementation).
So far I've tried the code:
matrix_1 = cell2mat(matrix(1,1));
matrix_2 = cell2mat(matrix(1,2));
matrix = horzcat(matrix_1, matrix_2)
but this is returning a 3x2 matrix where the second column is empty.
If I just use
cell2mat(matrix)
it says it can't do it because of the different data formats.
Thanks!
This is the help of matlab for the cell2mat function:
cell2mat Convert the contents of a cell array into a single matrix.
M = cell2mat(C) converts a multidimensional cell array with contents of
the same data type into a single matrix. The contents of C must be able
to concatenate into a hyperrectangle. Moreover, for each pair of
neighboring cells, the dimensions of the cell's contents must match,
excluding the dimension in which the cells are neighbors. This constraint
must hold true for neighboring cells along all of the cell array's
dimensions.
From what I understand, the contents you want to put in a matrix should all be of the same type; otherwise, why do you want a matrix? You could simply create a new cell array.
It's not possible to have a normal matrix with characters and numbers. That's why cell2mat won't work here. But you can store different datatypes in a cell-array. Use cellstr for the strings/characters and num2cell for the integers to convert the contents of matrix. If you have other datatypes, use an appropriate function for this step. Then assign them to the columns of an empty cell-array.
Here is the code:
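fn_matrix = 'data.txt';
matrix_open = fopen(fn_matrix, 'r');
matrix = textscan(matrix_open, '%c %d', 'Delimiter', '\t');
fclose(matrix_open);              % release the file handle once the data is read
X = cell(size(matrix{1},1),2);    % one row per record, two columns
X(:,1) = cellstr(matrix{1});      % characters -> cell array of strings
X(:,2) = num2cell(matrix{2});     % integers -> one cell per value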
The result:
X =
'a' [1]
'b' [2]
'c' [3]
Now we can do the second part of the question: extracting the entries whose letter matches one of those in list. For this you can use ismember and logical indexing like this:
list = ['a'; 'c'];
sel = ismember(X(:,1),list);
Y(:,1) = X(sel,1);
Y(:,2) = X(sel,2);
The result here:
Y =
'a' [1]
'c' [3]
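The two steps above can also be sketched without a data file. The in-memory variables letters and numbers below are hypothetical stand-ins for the two columns that textscan returns, and the selection is written as a single row-indexing step rather than column by column:

```matlab
% Hypothetical stand-ins for matrix{1} and matrix{2} from textscan.
letters = ['a'; 'b'; 'c'];        % 3x1 char column
numbers = int32([1; 2; 3]);       % 3x1 integer column

% Build a 3x2 cell array: strings in column 1, numbers in column 2.
X = [cellstr(letters), num2cell(numbers)];

% Keep only the rows whose identifier appears in the list.
list = {'a'; 'c'};
sel = ismember(X(:,1), list);
Y = X(sel, :);
disp(Y)
```

Indexing whole rows with X(sel, :) keeps each identifier paired with its value, so it also works unchanged if more columns are added later.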

Get indices of string occurrences in cell-array

I have a cell array that contains a long list of strings. Most of the strings are duplicated. I need the indices of all instances of a given string within the cell array.
I tried the following:
[bool,ind] = ismember(string,var);
This consistently returns a scalar ind, although there is clearly more than one index at which the contents of the cell array match string.
How can I get a list of indices pointing to the locations in the cell array that contain string?
As an alternative to Divakar's comment, you could use strcmp. This works even if some cell doesn't contain a string:
>> strcmp('aaa', {'aaa', 'bb', 'aaa', 'c', 25, [1 2 3]})
ans =
1 0 1 0 0 0
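The logical mask returned by strcmp can be turned into the requested indices with find. A minimal sketch, using the sample data that appears elsewhere on this page under the hypothetical names words and target (chosen to avoid shadowing MATLAB's var and string):

```matlab
words = {'er', 'meh', 'nop', 'meh', 'ya', 'meh'};
target = 'meh';

% mask(k) is true where words{k} equals target exactly.
mask = strcmp(target, words);

% find converts the logical mask into the positions of the matches.
out = find(mask);
disp(out)
```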
Alternatively, you can ID each string and thus have representative numeric arrays corresponding to the input cell array and string. For IDing, you can use unique and then use find as you would with numeric arrays. Here's how you can achieve that -
var_ext = [var string]
[~,~,idx] = unique(var_ext)
out = find(idx(1:end-1)==idx(end))
Breakdown of the code:
var_ext = [var string]: Concatenate everything (string and var) into a single cell array, with the string ending up at the end (last element) of it.
[~,~,idx] = unique(var_ext): ID everything in that concatenated cell array.
find(idx(1:end-1)==idx(end)): idx(1:end-1) represents the numeric IDs for the cell array elements and idx(end) would be the ID for the string. Compare these IDs and use find to pick up the matching indices to give us the final output.
Sample run -
Inputs:
var = {'er','meh','nop','meh','ya','meh'}
string = 'meh'
Output:
out =
2
4
6
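regexp offers another easy way to solve this. Note that the square brackets here concatenate the pieces into one character vector, so the returned indices are character positions:
string = ['my' 'bat' 'my' 'ball' 'my' 'score']
expression = ['my']
regexp(string,expression)
ans = 1 6 12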
