Alrighty, so I have a pretty 'simple' problem on my hands. I am given two inputs for my function: a string that gives the formula of the equation and a structure that contains the information I need and looks like this:
Name
Symbol
AtomicNumber
AtomicWeight
To find the molecular weight, I have to take all of the elements in the formula, find their total mass and add them all together. For example, let's say that I have to find the molecular weight of water. The formula would look like:
H2,O
The molecular weight will thus be
2*(Hydrogen's weight) + (Oxygen's weight), which evaluates to 18.015.
There will always be a comma separating the different elements in a formula. What I am having trouble with right now is taking the number out of the string (the formula). I feel like I'm over-complicating how I go about extracting it. If there's a number, I know it can be in position 2 or 3 (depending on the length of the element symbol). I tried to use isnumeric, and I tried some really weird coding (which you'll see below), but I am having difficulties.
test case:
mass5 = molarMass('C,H2,Br,C,H2,Br', table)
mass5 => 187.862
table:
Name Symbol AtomicNumber AtomicWeight
'Carbon' 'C' 6 12.0110000000000
'Hydrogen' 'H' 1 1.00800000000000
'Nitrogen' 'N' 7 14.0070000000000
'Oxygen' 'O' 8 15.9990000000000
'Phosphorus' 'P' 15 30.9737619980000
'Sulfur' 'S' 16 32.0600000000000
'Chlorine' 'Cl' 17 35.4500000000000
'Bromine' 'Br' 35 79.9040000000000
'Sodium' 'Na' 11 22.9897692800000
'Magnesium' 'Mg' 12 24.3050000000000
My code so far is:
function[molar_mass] = molarMass(formula, information)
Names = []; %// Creates a Name array
[~,c] = size(information); %Finds the rows and columns of the table
for i = 1:c %Reads through the columns
Molecules = getfield(information(:,i), 'Name'); %Finds the numbers in the 'Name' area
Names = [Names {Molecules}];
end
Symbols = [];
[~, c2] = size(information);
for i = 1:c2 %Reads through the columns
Symbs = getfield(information(:,i), 'Symbol'); %Finds the numbers in the 'Symbol'
Symbols = [Symbols {Symbs}];
end
AN = [];
[~, c3] = size(information);
for i = 1:c3 %Reads through the columns
Atom = getfield(information(:,i), 'AtomicNumber'); %Finds the numbers in the 'AtomicWeight' area
AN = [AN {Atom}];
end
Wt = [information(:).AtomicWeight];
formula_parts = strsplit(formula, ','); % cell array of strings
total_mass = 0;
multi = [];
atoms = [];
Indices = [];
for ipart = 1:length(formula_parts)
part = formula_parts{ipart}; % Takes in the string
isdigit = (part >= '0') & (part <= '9'); % A boolean array
atom = part(~isdigit); % Select all chars that are not digits
Indixes = find(strcmp(Symbols, atom));
Indices = [Indices {Indixes}];
mole = atom;
atoms = [atoms {mole}];
natoms = part(isdigit); % Select all chars that are digits
% Convert natoms string to numbers, default to 1 if missing
if length(natoms) == 0
natoms = '1';
multi = [multi {natoms}];
else
natoms = num2str(natoms);
multi = [multi {natoms}];
end
end
multi = char(multi);
multi = str2num(multi); %Creates a number array with my multipliers
f=56;
Molecule_Wt = Wt{Indices};
duck = 62;
total_mass = total_mass + atom_weight * multi;
end
Thanks to Bas Swinckels I can now extract the numbers from the formulas, but what I'm struggling with now is how to pull out the weights associated with the symbols. I created my own weight_chart, but strcmp won't work there. Neither will strfind or strmatch. What I want to do is find the formulas in my input, in the chart. Then index it from that index, to the column (so add 1 I believe). How do I find the indices though? I'd prefer to find them in the order the strings appear in my input, since I can then apply my 'multi' array to it.
Any help/suggestions would be appreciated :)
Given the string, you can pull out the part that is a digit character with the isstrprop function. Then use that to address your string to get just those characters, then cast that as a double with str2double.
PartialString = 'H12';
Subscript = str2double (PartialString (isstrprop (PartialString, 'digit')));
This should get you started; there are still some parts that need to be filled in:
formula_parts = strsplit(formula, ','); % cell array of strings
total_mass = 0;
for ipart = 1:length(formula_parts)
part = formula_parts{ipart}; % string like 'H2'
isdigit = isstrprop(part, 'digit'); % boolean array
atom = part(~isdigit); % select all chars that are not digits
natoms = part(isdigit); % select all chars that are digits
% convert natoms string to a number, default to 1 if missing
if isempty(natoms)
natoms = 1;
else
natoms = str2double(natoms); % str2double parses text; num2str goes the other way
end
% calculate weight
atom_weight = lookup_weight(atom); % somehow look up value in table
total_mass = total_mass + atom_weight * natoms;
end
See this old question about how to extract letters or digits from a string.
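One way the missing lookup step could be filled in is sketched below. It assumes the table is a struct array with Symbol and AtomicWeight fields, as in the question; the variable name elements and the four-element table are made up for illustration, and the lookup by strcmp is only one possible approach.

```matlab
% Hypothetical stand-in for the struct array passed to molarMass
elements = struct('Symbol',       {'C',    'H',   'O',    'Br'}, ...
                  'AtomicWeight', {12.011, 1.008, 15.999, 79.904});

symbols = {elements.Symbol};        % cell array of symbol strings
weights = [elements.AtomicWeight];  % numeric row vector, same order as symbols

formula = 'C,H2,Br,C,H2,Br';
formula_parts = strsplit(formula, ',');
total_mass = 0;
for ipart = 1:numel(formula_parts)
    part = formula_parts{ipart};             % string like 'H2'
    isdig = isstrprop(part, 'digit');        % true where the char is a digit
    atom = part(~isdig);                     % element symbol, e.g. 'H'
    natoms = str2double(part(isdig));        % NaN when there is no digit
    if isnan(natoms)
        natoms = 1;                          % no subscript means one atom
    end
    idx = find(strcmp(symbols, atom), 1);    % position of the symbol in the table
    if isempty(idx)
        error('Unknown element symbol: %s', atom);
    end
    total_mass = total_mass + natoms * weights(idx);
end
```

Because the loop visits the formula parts in order, the multiplier and the looked-up weight always belong to the same part, so no separate index array is needed.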
I have 60 different character arrays loaded in my workspace (Book01, Book02, ..., Book60). For example, Book01 is a 1x202040 char.
I'm working in a script file, and trying to separate the last sentence ("RandomInfoAtEnd") of Book45 through Book58:
WholeBook = Book50; % Call Array for test
for i = 1:60
book = eval(['Book' num2str(i)]);
if i >= 45 && i <= 58
% Procedure to separate last sentence.
Chr = convertStringsToChars(WholeBook);
SearchedUnit = '.!?' ; % Sentence end punctuation
idx = ismember (Chr, SearchedUnit);
Loc = find (idx, 2, 'last'); % Find second last sentence-ending-punctuation
if numel (Loc) < 2
error ('the requested character cannot be found')
end
SecondLastLocation = Loc (1);
AllLocations = find (idx);
RandomInfoAtEnd = extractAfter(WholeBook,SecondLastLocation);
else
RandomInfoAtEnd = ''; % No sentence separated
end
end
Right now I have a problem only with the IF-statement or FOR-loop logic, in such a way that RandomInfoAtEnd = '' for any array that is called.
My procedure is working fine, as it separated the last sentence perfectly from any array, but what am I doing wrong with the FOR-loop/IF-statement?
Thanks.
My code:
B = zeros(height(A),1);
col_names = A.Properties.VariableNames; % Replicate header names
for k = 1:height(A)
% the following 'cellfun' compares each column to the values in A.L{k},
% and returns a cell array of the result for each of them, then
% 'cell2mat' converts it to logical array, and 'any' combines the
% results for all elements in A.L{k} to one logical vector:
C = any(cell2mat(...
cellfun(@(x) strcmp(col_names,x),A.L{k},...
'UniformOutput', false).'),1);
% then a logical indexing is used to define the columns for summation:
B(k) = sum(A{k,C});
end
generates the following error message.
Error using cellfun
Input #2 expected to be a cell array, was double instead.
How do I solve this error?
This is what table 'A' looks like:
A.L{1,1} contains:
C = any(cell2mat(...
cellfun(@(x) strcmp(col_names,x),A.L{k},...
'UniformOutput', false).'),1);
Here A.L{k} gets the contents of the cell at the k-th position of A.L. With A.L(k) you get the cell itself:
tmp = A.L(k);
C = any(cell2mat(...
cellfun(@(x) strcmp(col_names,x),tmp{1},...
'UniformOutput', false).'),1);
This is a bit hacky: you first take the cell at A.L(k) and then its contents, so you need a temporary variable.
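The error message says that the second input to cellfun was a double, which is what happens when the contents of a cell are not themselves a cell (for example an empty []). Whether that is the case in the asker's table is an assumption; the column L and the names below are fabricated to show the difference between brace and parenthesis indexing and one way to guard against non-cell entries.

```matlab
% Hypothetical column like A.L: the first row holds a cell of names, the second holds []
L = {{'xx', 'yy'}; []};
colNames = {'xx', 'yy', 'zz'};

wrapped = L(1);    % parentheses: a 1x1 cell that still wraps the inner cell
contents = L{1};   % braces: the inner 1x2 cell of names itself

C = false(numel(L), numel(colNames));
for k = 1:numel(L)
    entry = L{k};
    if iscell(entry)
        % mark the columns whose names are listed in this row
        C(k, :) = ismember(colNames, entry);
    end
    % a non-cell entry (such as []) leaves the row all false
end
```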
I'm not entirely sure quite what's going on here, but here's a fabricated example that I think is similar to what you're trying to achieve.
%% Setup - fabricate some data
colNames = {'xx', 'yy', 'zz', 'qq'};
h = 20;
% It looks like 'L' contains something related to the column names
% so I'm going to build something like that.
L = repmat(colNames, h, 1);
% Empty some rows out completely
L(rand(h,1) > 0.7, :) = {''};
% Empty some other cells out at random
L(rand(numel(L), 1) > 0.8) = {''};
A = table(L, rand(h,1), rand(h, 1), rand(h, 1), rand(h, 1), ...
'VariableNames', ['L', colNames]);
%% Attempt to process each row
varNames = A.Properties.VariableNames;
B = zeros(height(A), 1);
for k = 1:height(A)
% I think this is what's required - work out which columns are
% named in "A.L(k,:)". This can be done simply by using ISMEMBER
% on the row of A.L.
C = ismember(varNames, A.L(k,:));
B(k) = sum(A{k, C});
end
If I'm completely off-course here, then perhaps you could give us an executable example.
I'm doing a set of problems from MATLAB's introductory course at MIT OCW. You can see it here; it's problem number 9, part g.iii.
I have one matrix with the final grades of a course, all of them range from 1 to 5. And I have another array with only letters from 'F' to 'A' (in a 'decreasing' order).
I know how to change elements in a matrix, I suppose I could do something like this for each number:
totalGrades(find(totalGrades==1)) = 'F';
totalGrades(find(totalGrades==2)) = 'D';
totalGrades(find(totalGrades==3)) = 'C';
totalGrades(find(totalGrades==4)) = 'B';
totalGrades(find(totalGrades==5)) = 'A';
But then, what's the purpose of creating the string array "letters"?
I thought about using a loop, but we're supposed to solve the problem without one at that point of the course.
Is there a way? I'll be glad to know. Here's my code for the whole problem, but I got stuck in that last question.
load('classGrades.mat');
disp(namesAndGrades(1:5,1:8));
grades = namesAndGrades(1:15,2:size(namesAndGrades,2));
mean(grades);
meanGrades = nanmean(grades);
meanMatrix = ones(15,1)*meanGrades;
curvedGrades = 3.5*(grades./meanMatrix);
% Verifying
nanmean(curvedGrades)
mean(curvedGrades)
curvedGrades(curvedGrades>=5) = 5;
totalGrades = nanmean(curvedGrades,2);
letters = 'FDCBA';
Thanks a lot!
Try:
letters=['F','D','C','B','A'];
tg = [1 2 1 3 3 1];
letters(tg)
Result:
ans = FDFCCF
This works even when tg (total grade) is a matrix:
letters=['F','D','C','B','A'];
tg = [1 2 1 ; 3 3 1];
result = letters(tg);
result
result =
FDF
CCF
Edit (brief explanation):
It is easy to understand that when you do letters(2) you get the second element of letters (D).
But you can also select several elements from letters by giving it an array: letters([1 2]) will return the first and second elements (FD).
So, letters(indexesArray) results in a new array with as many elements as indexesArray (and, when indexesArray is a matrix, the same shape). The indices must be integers between 1 and the length of letters, or an error will pop up.
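Applied to the question, the averaged grades are generally not integers, so they would need rounding before they can serve as indices. A minimal sketch, assuming rounding to the nearest integer is acceptable and using made-up values in place of the real totalGrades:

```matlab
letters = 'FDCBA';
totalGrades = [1.2 3.6 4.9 2.4];      % hypothetical averages between 1 and 5
roundedGrades = round(totalGrades);   % nearest integers: 1 4 5 2
letterGrades = letters(roundedGrades);
disp(letterGrades)
```

Each rounded grade picks the character at that position in letters, so no loop and no repeated find calls are needed.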
I'm still confused about why I can't get the results I expect from this small algorithm on my array. The array is 1-D and has almost 1000 numbers. I'm trying to find each peak and the index of each peak. I did find the peaks, but I can't find their indices. Could you please help me out? I want to plot all my values regardless of the indices.
%clear all
%close all
%clc
%// not generally appreciated
%-----------------------------------
%message1.txt.
%-----------------------------------
% t=linspace(0,tmax,length(x)); %get all numbers
% t1_n=0:0.05:tmax;
x=load('ww.txt');
tmax= length(x) ;
tt= 0:tmax -1;
x4 = x(1:5:end);
t1_n = 1:5:tt;
x1_n_ref=0;
k=0;
for i=1:length(x4)
if x4(i)>170
if x1_n_ref-x4(i)<0
x1_n_ref=x4(i);
alpha=1;
elseif alpha==1 && x1_n_ref-x4(i)>0
k=k+1;
peak(k)=x1_n_ref; % This is my peak value, but I also want to know its index, which will represent the time.
%peak_time(k) = t1_n(i); % this is my issue.
alpha=2;
end
else
x1_n_ref=0;
end
end
%----------------------
figure(1)
% plot(t,x,'k','linewidth',2)
hold on
% subplot(2,1,1)
grid
plot( x4,'b'); % ,tt,x,'k'
legend('down-sampling by 5');
Here is your error:
tmax= length(x) ;
tt= 0:tmax -1;
x4 = x(1:5:end);
t1_n = 1:5:tt; % <---
tt is an array containing numbers 0 through tmax-1. Defining t1_n as t1_n = 1:5:tt will not create an array, but an empty matrix. Why? Expression t1_n = 1:5:tt will use only the first value of array tt, hence reduce to t1_n = 1:5:tt = 1:5:0 = <empty matrix>. Naturally, when you later on try to access t1_n as if it were an array (peak_time(k) = t1_n(i)), you'll get an error.
You probably want to exchange t1_n = 1:5:tt with
t1_n = 1:5:tmax;
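A quick check, with fabricated data standing in for ww.txt, that the corrected t1_n has one entry per downsampled value and points back at the right samples of x:

```matlab
x = (101:123)';          % hypothetical stand-in for load('ww.txt')
tmax = length(x);
x4 = x(1:5:end);         % every fifth sample
t1_n = 1:5:tmax;         % positions of those samples in x

sameLength = numel(t1_n) == numel(x4);   % one time stamp per downsampled value
sameSamples = isequal(x(t1_n), x4);      % t1_n(i) is where x4(i) came from
```

With this, peak_time(k) = t1_n(i) inside the loop is a valid index operation for every i up to length(x4).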
You need to index the tt array correctly. You can use
t1_n = tt(1:5:end); % zero-based times, because tt starts at 0; define tt = 1:tmax instead if you want one-based (MATLAB-style) indices
You can also cut the code down a little. Some variables don't seem to be used or may not be necessary, including the t1_n variable:
x = load('ww.txt');
tmax = length(x);
x4 = x(1:5:end);
xmin = 170; % only samples above this threshold count as peaks
% now change the code
maxnopeaks = round(tmax/2);
peaks(maxnopeaks) = 0; % preallocate the peak values for speed
index(maxnopeaks) = 0; % preallocate the peak indices for speed
i = 0;
for n = 2 : tmax-1
    if x(n) > xmin
        % a local maximum is at least as large as both neighbours
        if x(n) >= x(n-1) && x(n) >= x(n+1)
            i = i+1;
            peaks(i) = x(n); % peak value
            index(i) = n; % sample index of the peak, i.e. its time
        end
    end
end
% now trim the excess values (if any)
peaks = peaks(1:i);
index = index(1:i);
I have two cell arrays, "celldata" and "data". Both of them store strings. Now I would like to check whether each element in "celldata" occurs in "data". For example, celldata = {'AB'; 'BE'; 'BC'} and data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '}. The expected output is s=3 and v=1 for AB, s=2 and v=2 for BE, and s=2 and v=2 for BC, because I just need to count the sequence of the string in 'celldata'.
The code I wrote is shown below. Any help would be certainly appreciated.
My code:
s=0; % support counter
v=0; % violate counter
SV=[]; % array to store the support
VV=[]; % array to store the violate
pairs = ['AB'; 'BE'; 'BC']
%celldata = cellstr(pairs)
celldata = {'AB'; 'BE'; 'BC'}
data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '} % 3 AB, 2 BE, 2 BC
for jj=1:length(data)
for kk=1:length(celldata)
res = regexp( data(jj),celldata(kk) )
m = cell2mat(res);
e=isempty(m) % check res array is empty or not
if e == 0
s = s + 1;
SV(jj)=s;
v=v;
else
s=s;
v= v+1;
VV(jj)=v;
end
end
end
If I am understanding your variables correctly, s is the number of cells in which each substring (AB, BE and BC) does not appear, and v is the number of times it does. If this is accurate, then
v = cellfun(@(x) length(cell2mat(strfind(data, x))), celldata);
s = numel(data) - v;
s = numel(data) - v;
gives
v = [1;1;3];
s = [3;3;1];
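A variant that counts the cells containing each pair, rather than the total number of matches, is sketched below. It keeps the naming used in this answer (v for cells where the pair appears, s for cells where it does not), which is itself a guess at what the asker meant. The two versions differ only when a pair occurs more than once inside a single string.

```matlab
celldata = {'AB'; 'BE'; 'BC'};
data = {'ABCD', 'BCDE', 'ACBE', 'ADEBC '};

% For each pair, strfind returns one cell per string in data;
% a non-empty cell means the pair occurs in that string.
v = cellfun(@(p) sum(~cellfun(@isempty, strfind(data, p))), celldata);
s = numel(data) - v;   % strings in which the pair does not occur
```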