Count items in one cell array in another cell array matlab - arrays

I have 2 cell arrays which are "celldata" and "data" . Both of them store strings inside. Now I would like to check each element in "celldata" whether in "data" or not? For example, celldata = {'AB'; 'BE'; 'BC'} and data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '}. I would like the expected output will be s=3 and v= 1 for AB, s=2 and v=2 for BE, s=2 and v=2 for BC, because I just need to count the sequence of the string in 'celldata'
The code I wrote is shown below. Any help would be certainly appreciated.
My code:
s=0; support counter
v=0; violate counter
SV=[]; % array to store the support
VV=[]; % array to store the violate
pairs = ['AB'; 'BE'; 'BC']
%celldata = cellstr(pairs)
celldata = {'AB'; 'BE'; 'BC'}
data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '} % 3 AB, 2 BE, 2 BC
for jj=1:length(data)
for kk=1:length(celldata)
res = regexp( data(jj),celldata(kk) )
m = cell2mat(res);
e=isempty(m) % check res array is empty or not
if e == 0
s = s + 1;
SV(jj)=s;
v=v;
else
s=s;
v= v+1;
VV(jj)=v;
end
end
end

If I am understanding your variables correctly, s is the number of cells which the substring AB, AE and, BC does not appear and v is the number of times it does. If this is accurate then
v = cellfun(#(x) length(cell2mat(strfind(data, x))), celldata);
s = numel(data) - v;
gives
v = [1;1;3];
s = [3;3;1];

Related

Generate a matrix of combinations (permutation) without repetition (array exceeds maximum array size preference)

I am trying to generate a matrix, that has all unique combinations of [0 0 1 1], I wrote this code for this:
v1 = [0 0 1 1];
M1 = unique(perms([0 0 1 1]),'rows');
• This isn't ideal, because perms() is seeing each vector element as unique and doing:
4! = 4 * 3 * 2 * 1 = 24 combinations.
• With unique() I tried to delete all the repetitive entries so I end up with the combination matrix M1 →
only [4!/ 2! * (4-2)!] = 6 combinations!
Now, when I try to do something very simple like:
n = 15;
i = 1;
v1 = [zeros(1,n-i) ones(1,i)];
M = unique(perms(vec_1),'rows');
• Instead of getting [15!/ 1! * (15-1)!] = 15 combinations, the perms() function is trying to do
15! = 1.3077e+12 combinations and it's interrupted.
• How would you go about doing in a much better way? Thanks in advance!
You can use nchoosek to return the indicies which should be 1, I think in your heart you knew this must be possible because you were using the definition of nchoosek to determine the expected final number of permutations! So we can use:
idx = nchoosek( 1:N, k );
Where N is the number of elements in your array v1, and k is the number of elements which have the value 1. Then it's simply a case of creating the zeros array and populating the ones.
v1 = [0, 0, 1, 1];
N = numel(v1); % number of elements in array
k = nnz(v1); % number of non-zero elements in array
colidx = nchoosek( 1:N, k ); % column index for ones
rowidx = repmat( 1:size(colidx,1), k, 1 ).'; % row index for ones
M = zeros( size(colidx,1), N ); % create output
M( rowidx(:) + size(M,1) * (colidx(:)-1) ) = 1;
This works for both of your examples without the need for a huge intermediate matrix.
Aside: since you'd have the indicies using this approach, you could instead create a sparse matrix, but whether that's a good idea or not would depend what you're doing after this point.

How to extract different values/elements of matrix or array without repeating?

I have a vector/ or it could be array :
A = [1,2,3,4,5,1,2,3,4,5,1,2,3]
I want to extract existing different values/elements from this vector without repeating:
1,2,3,4,5
B= [1,2,3,4,5]
How can I extract it ?
I would appreciate for any help please
Try this,
A = [1,2,3,4,5,1,2,3,4,5,1,2,3]
y = unique(A)
B = unique(A) returns the same values as in a but with no repetitions. The resulting vector is sorted in ascending order. A can be a cell array of strings.
B = unique(A,'stable') does the same as above, but without sorting.
B = unique(A,'rows') returns the unique rows ofA`.
[B,i,j] = unique(...) also returns index vectors i and j such that B = A(i) and A = B(j) (or B = A(i,:) and A = B(j,:)).
Reference: http://cens.ioc.ee/local/man/matlab/techdoc/ref/unique.html
Documentation: https://uk.mathworks.com/help/matlab/ref/unique.html
The answers below are correct but if the user does not want to sort the data, you can use unique with the parameter stable
A = [1,2,3,4,5,1,2,3,4,5,1,2,3]
B = unique(A,'stable')

Given two arrays A and B, how to get B values which are the closest to A

Suppose I have two arrays ordered in an ascending order, i.e.:
A = [1 5 7], B = [1 2 3 6 9 10]
I would like to create from B a new vector B', which contains only the closest values to A values (one for each).
I also need the indexes. So, in my example I would like to get:
B' = [1 6 9], Idx = [1 4 5]
Note that the third value is 9. Indeed 6 is closer to 7 but it is already 'taken' since it is close to 4.
Any idea for a suitable code?
Note: my true arrays are much larger and contain real (not int) values
Also, it is given that B is longer then A
Thanks!
Assuming you want to minimize the overall discrepancies between elements of A and matched elements in B, the problem can be written as an assignment problem of assigning to every row (element of A) a column (element of B) given a cost matrix C. The Hungarian (or Munkres') algorithm solves the assignment problem.
I assume that you want to minimize cumulative squared distance between A and matched elements in B, and use the function [assignment,cost] = munkres(costMat) by Yi Cao from https://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems--v2-3-:
A = [1 5 7];
B = [1 2 3 6 9 10];
[Bprime,matches] = matching(A,B)
function [Bprime,matches] = matching(A,B)
C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
[matches,~] = munkres(C);
Bprime = B(matches);
end
Assuming instead you want to find matches recursively, as suggested by your question, you could either walk through A, for each element in A find the closest remaining element in B and discard it (sortedmatching below); or you could iteratively form and discard the distance-minimizing match between remaining elements in A and B until all elements in A are matched (greedymatching):
A = [1 5 7];
B = [1 2 3 6 9 10];
[~,~,Bprime,matches] = sortedmatching(A,B,[],[])
[~,~,Bprime,matches] = greedymatching(A,B,[],[])
function [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches)
[~,ix] = min((A(1) - B).^2);
matches = [matches ix];
Bprime = [Bprime B(ix)];
A = A(2:end);
B(ix) = Inf;
if(not(isempty(A)))
[A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches);
end
end
function [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches)
C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
[minrows,ixrows] = min(C);
[~,ixcol] = min(minrows);
ixrow = ixrows(ixcol);
matches(ixrow) = ixcol;
Bprime(ixrow) = B(ixcol);
A(ixrow) = -Inf;
B(ixcol) = Inf;
if(max(A) > -Inf)
[A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches);
end
end
While producing the same results in your example, all three methods potentially give different answers on the same data.
Normally I would run screaming from for and while loops in Matlab, but in this case I cannot see how the solution could be vectorized. At least it is O(N) (or near enough, depending on how many equally-close matches to each A(i) there are in B). It would be pretty simple to code the following in C and compile it into a mex file, to make it run at optimal speed, but here's a pure-Matlab solution:
function [out, ind] = greedy_nearest(A, B)
if nargin < 1, A = [1 5 7]; end
if nargin < 2, B = [1 2 3 6 9 10]; end
ind = A * 0;
walk = 1;
for i = 1:numel(A)
match = 0;
lastDelta = inf;
while walk < numel(B)
delta = abs(B(walk) - A(i));
if delta < lastDelta, match = walk; end
if delta > lastDelta, break, end
lastDelta = delta;
walk = walk + 1;
end
ind(i) = match;
walk = match + 1;
end
out = B(ind);
You could first get the absolute distance from each value in A to each value in B, sort them and then get the first unique value to a sequence when looking down in each column.
% Get distance from each value in A to each value in B
[~, minIdx] = sort(abs(bsxfun(#minus, A,B.')));
% Get first unique sequence looking down each column
idx = zeros(size(A));
for iCol = 1:numel(A)
for iRow = 1:iCol
if ~ismember(idx, minIdx(iRow,iCol))
idx(iCol) = minIdx(iRow,iCol);
break
end
end
end
The result when applying idx to B
>> idx
1 4 5
>> B(idx)
1 6 9

Select entries from matrices according to indices in another matrix in MATLAB

I have matrices:
a= 0.8147 0.1270 0.6324
0.9058 0.9134 0.0975
b= 0.2785 0.9649 0.9572
0.5469 0.1576 0.4854
0.9575 0.9706 0.8003
c = 0.1419 0.7922
0.4218 0.9595
0.9157 0.6557
and also I have another matrix
I= 1 3 1 1
2 1 3 2
I want to get d matrix such that
d= a(1,3) b(3,1) c(1,1)
a(2,1) b(1,3) c(3,2)
where indices come as two consecutive entries of I matrix.
This is one example I get. However, I get different size matrices for a,b,c,.. and I.
Added: I is m x (n+3) which includes indices, and other (n+2) matrices which have corresponding entries are X,A1,A2,...,An,Y. When n is given, A1,A2,...,An matrices are generated.
Can someone please help me to write Matlab code for this task?
You can do it with varargin. Assuming that your matrices are constructed such that you can form your desired output in the way you want (Updated according to Carmine's answer):
function out = IDcombiner(I, varargin)
out = zeros(size(I, 1), nargin-1);
idx = #(m, I, ii) (sub2ind(size(m), I(:, ii), I(:, ii+1)));
for ii = 1:1:nargin-1
out(:, ii) = varargin{ii}(idx(varargin{ii}, I, ii));
end
Now using this function you can make your selection on a flexible number of inputs:
out = IDcombiner(I, a, b, c)
out =
0.6324 0.9575 0.1419
0.9058 0.9572 0.6557
There is also a one-liner solution, which I do not recommend, since it dramatically decreases the readability of the code and doesn't help you gain much:
IDcombiner = #(I,varargin) ...
cell2mat(arrayfun(#(x) varargin{x}(sub2ind(size(varargin{x}), ...
I(:,x), I(:,x+1))), 1:nargin-1, 'UniformOutput', false));
Normally a matrix is not interpreted as a list of indices, but you can have this if you use sub2ind. To use it you need the size of the matrix you are addressing. Let's make an example starting with a:
a(sub2ind(size(a), I(:,1), I(:,2)))
The code does not change if you first assign the newly generated matrices to a variable name.
will use the column I(:,1) as rows and I(:,2) as columns.
To make the code more readable you can define an anonymous function that does this, let's call it idx:
idx = #(m,I,i)(sub2ind(size(m), I(:,i), I(:,i+1)))
So finally the code will be
d = [a(idx(a,I,1)), b(idx(b,I,2)), c(idx(c,I,3))]
The code does not change if you first assign the newly generated matrices to a variable name.
Other details
Let's make an example with 2 central matrices:
a = rand(3,1) % 3 rows, 1 column
b = rand(3,3) % 3 rows, 3 columns
c = rand(3,3) % another squared matrix
d = rand(3,1) % 3 rows, 1 column
The definition of the anonymous function is the same, you just change the definition of the output vector:
output = [a(idx(a,I,1)), b(idx(b,I,2)), c(idx(c,I,3)), d(idx(d,I,3))]
Keep in mind that following that pattern you always need a I matrix with (n_matrices + 1) columns.
Generalization
Let's generalize this code for a number n of central matrices of size rxr and for "side matrices" of size rxc. I will use some values of those parameters for this example, but you can use what you want.
Let me generate an example to use:
r = 3;
c = 4;
n = 3;
a = rand(r,c); % 2D array
b = rand(r,r,n); % 3D array, along z = 1:n you have 2D matrices of size rxr
c = rand(r,c);
I = [1 3 1 2 1 3; 2 1 3 1 1 1];
The code I wrote can easily be extended using cat to append matrices (note the 2 in the function tells MATLAB to append on the direction of the columns) and a for cycle:
idx = #(m,I,i)(sub2ind(size(m), I(:,i), I(:,i+1)))
d = a(idx(a,I,1));
for i = 1:n
temp = b(:,:,i);
d = cat(2,d,temp(idx(tmp,I,i+1)));
end
d = cat(2,d,c(idx(c,I,n+1)));
If you really don't want to address anything "by hand", you can use cell arrays to put all the matrices together and then cyclically apply the anonymous function to each matrix in the cell array.

Finding molar mass with MATLAB

Alrighty, so I have a pretty 'simple' problem on my hands. I am given two inputs for my function: a string that gives the formula of the equation and a structure that contains the information I need and looks like this:
Name
Symbol
AtomicNumber
AtomicWeight
To find the molecular weight, I have to take all of the elements in the formula, find their total mass and add them all together. For example, let's say that I have to find the molecular weight of oxygen. The formula would look like:
H2,O
The molecular weight will thus be
2*(Hydrogen's weight) + (Oxygen's weight), which evaluates to 18.015.
There will always be a comma separating the different elements in a formula. What I am having trouble with right now, is taking the number out of the string(the formula). I feel like I'm over-complicating how I am going about extracting it. If there's a number, I know it can be in positions 2 or 3 (depending on the element name). I tried to use isnumeric, I tried to do some really weird, coding stuff (which you'll see below), but I am having difficulties.
test case:
mass5 = molarMass('C,H2,Br,C,H2,Br', table)
mass5 => 187.862
table:
Name Symbol AtomicNumber AtomicWeight
'Carbon' 'C' 6 12.0110000000000
'Hydrogen' 'H' 1 1.00800000000000
'Nitrogen' 'N' 7 14.0070000000000
'Oxygen' 'O' 8 15.9990000000000
'Phosphorus''P' 15 30.9737619980000
'Sulfur' 'S' 16 32.0600000000000
'Chlorine' 'Cl' 17 35.4500000000000
'Bromine' 'Br' 35 79.9040000000000
'Sodium' 'Na' 11 22.9897692800000
'Magnesium' 'Mg' 12 24.3050000000000
My code so far is:
function[molar_mass] = molarMass(formula, information)
Names = []; %// Creates a Name array
[~,c] = size(information); %Finds the rows and columns of the table
for i = 1:c %Reads through the columns
Molecules = getfield(information(:,i), 'Name'); %Finds the numbers in the 'Name' area
Names = [Names {Molecules}];
end
Symbols = [];
[~, c2] = size(information);
for i = 1:c2 %Reads through the columns
Symbs = getfield(information(:,i), 'Symbol'); %Finds the numbers in the 'Symbol'
Symbols = [Symbols {Symbs}];
end
AN = [];
[~, c3] = size(information);
for i = 1:c3 %Reads through the columns
Atom = getfield(information(:,i), 'AtomicNumber'); %Finds the numbers in the 'AtomicWeight' area
AN = [AN {Atom}];
end
Wt = [information(:).AtomicWeight];
formula_parts = strsplit(formula, ','); % cell array of strings
total_mass = 0;
multi = [];
atoms = [];
Indices = [];
for ipart = 1:length(formula_parts)
part = formula_parts{ipart}; % Takes in the string
isdigit = (part >= '0') & (part <= '9'); % A boolean array
atom = part(~isdigit); % Select all chars that are not digits
Indixes = find(strcmp(Symbols, atom));
Indices = [Indices {Indixes}];
mole = atom;
atoms = [atoms {mole}];
natoms = part(isdigit); % Select all chars that are digits
% Convert natoms string to numbers, default to 1 if missing
if length(natoms) == 0
natoms = '1';
multi = [multi {natoms}];
else
natoms = num2str(natoms);
multi = [multi {natoms}];
end
end
multi = char(multi);
multi = str2num(multi); %Creates a number array with my multipliers
f=56;
Molecule_Wt = Wt{Indices};
duck = 62;
total_mass = total_mass + atom_weight * multi;
end
Thanks to Bas Swinckels I can now extract the numbers from the formulas, but what I'm struggling with now is how to pull out the weights associated with the symbols. I created my own weight_chart, but strcmp won't work there. Neither will strfind or strmatch. What I want to do is find the formulas in my input, in the chart. Then index it from that index, to the column (so add 1 I believe). How do I find the indices though? I'd prefer to find them in the order the strings appear in my input, since I can then apply my 'multi' array to it.
Any help/suggestions would be appreciated :)
Given the string, you can pull out the part that is a digit character with the isstrprop function. Then use that to address your string to get just those characters, then cast that as a double with str2double.
PartialString = 'H12';
Subscript = str2double (PartialString (isstrprop (PartialString, 'digit')));
This should get you started, there is still some parts that need to be filled in:
formula_parts = strsplit(formula, ','); % cell array of strings
total_mass = 0;
for ipart = 1:length(formula_parts)
part = formula_parts{ipart}; % string like 'H2'
isdigit = isstrprop(part, 'digit'); % boolean array
atom = part(~isdigit); % select all chars that are not digits
natoms = part(isdigit); % select all chars that are digits
% convert natoms string to int, default to 1 if missing
if length(natoms) == 0
natoms = 1;
else
natoms = num2str(natoms);
end
% calculate weight
atom_weight = lookup_weight(atom); % somehow look up value in table
total_mass = total_mass + atom_weight * natoms;
end
See this old question about how to extract letters or digits from a string.

Resources