Use an array item as replacement in a MATLAB regexpr - arrays

I have a string containing a math function like this:
sin(x[1]) + cos(x[2]) + tan(x[3]) + x[1]
Now I want to replace each x[number] with a letter of the alphabet using regexprep. The result should look like this:
sin(a) + cos(b) + tan(c) + a
So I defined an alphabet array like this:
alphabet = ('a':'z')
This is my first regexprep call, which just replaces every x[number] with an 'a':
regexprep(functionString,'x\[(\d+)\]','${alphabet(1)}');
To make it pick the right letter, I tried using $1 instead of 1. I expected this to index alphabet dynamically with the captured number instead of always using alphabet(1).
regexprep(functionString,'x\[(\d+)\]','${alphabet($1)}');
Instead I am getting an error that the index exceeds the matrix dimensions.
Does anybody know what I am doing wrong? How do I get the right letter?
Thanks in advance!

MATLAB passes $1 to the expression as text, not as a number. Since int32('1') = 49, alphabet($1) is evaluated as alphabet(49), which is why you get the error "Index exceeds matrix dimensions".
To solve your issue, use str2num:
regexprep(functionString,'x\[(\d+)\]','${alphabet(str2num($1))}')

In case you're interested, you can actually do this without having to create an alphabet variable. Here's how:
regexprep(functionString,'x\[(\d+)\]','${char($1+48)}')
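Here $1 is the character '1', whose character code is 49, so adding 48 gives 97, the ASCII code for 'a'. Converting the result with char therefore yields letters starting at 'a'. Note that this only works for single-digit indices (1 to 9).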

Have you tried regexprep(functionString,'x\[(\d+)\]','${alphabet($0)}');?
From what I see here : https://de.mathworks.com/help/matlab/ref/regexprep.html regex matches are 0 based, so the first one should be $0.

Related

Python: Manipulate an array

I have an array that looks just like this one:
array(['00:00;1;5950;6\r', '00:10;1;2115;6\r', '00:10;2;4130;6\r',
'00:10;3;5675;6\r', '00:20;1;1785;6\r'],
dtype='|S15')
For my evaluation I need only the value after the second semicolon at each array entry. Here in this example, I need the values:
5950, 2115, 4130, 5675, 1785.
Is it possible to manipulate the array in a way that gives me the entries I want? And how can that be done? I know how to remove the symbols, so in the end I get the array:
['0000159506\r', '0010121156\r', '0010241306\r', '0010356756\r', '0020117856\r']
But I don't know if this is the right way to handle this problem. Does anyone know what to do? Thank you very much!
Try this:
A = array(['00:00;1;5950;6\r', '00:10;1;2115;6\r', '00:10;2;4130;6\r',
'00:10;3;5675;6\r', '00:20;1;1785;6\r'],
dtype='|S15')
list_ = [str(x.split(";")[2]) for x in A]
The best way to get these values is to use the python split function.
You can choose the delimiter. So in this case the delimiter can be a ;.
result = line.split(';')
and then just find the third value of the resulting list.
num = result[2]
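Putting the two steps together, here is a minimal sketch. It assumes the entries are plain Python strings rather than NumPy byte strings, and the helper name third_field is made up for illustration:

```python
# Sample entries copied from the question, as plain strings (an assumption).
entries = ['00:00;1;5950;6\r', '00:10;1;2115;6\r', '00:10;2;4130;6\r',
           '00:10;3;5675;6\r', '00:20;1;1785;6\r']

def third_field(entry):
    """Return the value after the second semicolon as an int."""
    # strip() drops the trailing '\r' before splitting on ';'
    return int(entry.strip().split(';')[2])

values = [third_field(entry) for entry in entries]
print(values)
```

Converting to int is optional; drop the int() call if you want to keep the values as strings.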

How to substring of a string in matlab array

I have a MATLAB cell array of 20x1 elements, and all the elements are strings like 'a12345.567'.
I want to substitute part of the string (start to 9th index) in all the cells,
so that each element in the array will be like 'a12345.3'.
How can I do that?
You can use cellfun:
M = { 'a12345.567'; 'b12345.567' }; %// you have 20 entries like these
MM = cellfun( @(x) [x(1:7),'3'], M, 'uni', 0 )
Resulting with
ans =
a12345.3
b12345.3
For a more advanced string replacement functionality in Matlab, you might want to explore strrep, and regexprep.
Another method that you can use is regexprep. Use regular expressions and find the positions of those numbers that appear after the . character, and replace them with whatever you wish. In this case:
M = { 'a12345.567'; 'b12345.567' }; %// you have 20 entries like these - Taken from Shai
MM = regexprep(M, '\d+$', '3');
MM =
'a12345.3'
'b12345.3'
Regular expressions is a framework that finds substrings within a larger string that match a particular pattern. In our case, \d is the regular expression for a single digit (0-9). The + character means that we want to find at least one or more digits chained together. Finally the $ character means that this pattern should appear at the end of the string. In other words, we want to find a pattern in each string such that there is a number that appears at the end of the string. regexprep will find these patterns if they exist, and replace them with whatever string you want. In this case, we chose 3 as per your example.

Sum of cell in matlab

I am trying to find the number of occurrences of "the" in a file that I have read into MATLAB. I have the following code n=strfind(z,'the') where z is the cell that all my lines are stored into. It finds all the occurrences but I am unsure how to sum them up to get a number. I tried using sum but it doesn't work. Any help would be greatly appreciated.
strfind will return [] if the supplied string is not found.
cell2mat will remove empty values from a cell array and just return the indices of the found string.
Therefore, you just need the length of the returned vector
z = {'Testing','Another','the', 'And the'};
n=length(cell2mat(strfind(z,'the')))
n =
3
Consider using cellfun to operate on the output of strfind so that you can use sum as you would like to do:
sum(cellfun(@numel,strfind(z,'the')))

Algorithm - check if any string in an array of strings is a prefix of any other string in the same array

I want to check if any string in an array of strings is a prefix of any other string in the same array. I'm thinking radix sort, then single pass through the array.
Anyone have a better idea?
I think radix sort can be modified to retrieve prefixes on the fly. All we have to do is sort the lines by their first letter, storing copies of them without the first letter in each cell. Then, if a cell contains an empty line, that line corresponds to a prefix. And if a cell contains only one entry, then of course there are no possible prefix lines in it.
Here, this might be clearer than my English:
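lines = [
    "qwerty",
    "qwe",
    "asddsa",
    "zxcvb",
    "zxcvbn",
    "zxcvbnm"
]
# Pair each remaining suffix with the original line it came from.
line_lines = [(line, line) for line in lines]
def find_sub(line_lines):
    # One bucket per lowercase letter (assumes the input uses only a-z).
    cells = [[] for i in range(26)]
    for (ine, line) in line_lines:
        if ine == "":
            # Suffix exhausted: every other entry passed in with it starts with this line.
            print(line)
        else:
            index = ord(ine[0]) - ord('a')
            cells[index] += [(ine[1:], line)]
    for cell in cells:
        # A bucket with a single entry cannot contain a prefix pair.
        if len(cell) > 1:
            find_sub(cell)
find_sub(line_lines)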
If you sort them, you only need to check each string if it is a prefix of the next.
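A minimal sketch of the sorting idea, assuming plain Python strings (the function name has_prefix_pair is made up for illustration). It relies on the fact that, in lexicographic order, every string that sorts between a string and one of its extensions also starts with that string, so a prefix always ends up directly in front of a string it is a prefix of:

```python
def has_prefix_pair(strings):
    """Return True if some string is a prefix of another string in the list."""
    ordered = sorted(strings)
    # After sorting, it is enough to compare each string with its successor.
    for current, following in zip(ordered, ordered[1:]):
        if following.startswith(current):
            return True
    return False

print(has_prefix_pair(["qwerty", "qwe", "asddsa", "zxcvb"]))
print(has_prefix_pair(["abc", "bcd", "cde"]))
```

Note that duplicates count as prefixes of each other here; remove them first (for example with set) if that is not what you want.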
To achieve a time complexity close to O(N^2), compute hash values for each string.
Come up with a good hash function that looks something like:
A mapping from [a-z] -> [1,26]
A modulo operation (use a large prime) to prevent integer overflow
So something like "ab" gets computed as "12" = 1*27 + 2 = 29
A point to note:
Be careful which base you compute the hash value in. For example, if you take a base less than 27, two different strings can give the same hash value, and we don't want that.
Steps:
Compute the hash value for each string
Compare the hash value of the current string with those of the other strings: I'll let you figure out how you would do that comparison. Once two hash values match, you still can't be sure that one string really is a prefix of the other (because of the modulo operation), so do an extra check to confirm it.
Report answer
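The comparison step is left open above, so the following is only one possible way to fill it in, not necessarily what the answer had in mind. It assumes lowercase a-z input, uses 1000000007 as the large prime, and the function names are made up. It hashes every full string, then walks the running prefix hash of each string and looks it up among the full-string hashes, confirming every hit with a real comparison:

```python
BASE = 27
MOD = 1000000007  # a large prime; this particular value is an assumption

def letter_value(ch):
    """Map 'a'..'z' to 1..26."""
    return ord(ch) - ord('a') + 1

def find_prefix_pairs(strings):
    """Return (prefix, string) pairs found via hashing plus verification."""
    # Hash of each full string -> indices of the strings with that hash.
    by_hash = {}
    for i, s in enumerate(strings):
        h = 0
        for ch in s:
            h = (h * BASE + letter_value(ch)) % MOD
        by_hash.setdefault(h, []).append(i)
    pairs = []
    for j, s in enumerate(strings):
        h = 0
        for k, ch in enumerate(s):
            # h is now the hash of the prefix s[:k+1].
            h = (h * BASE + letter_value(ch)) % MOD
            for i in by_hash.get(h, []):
                candidate = strings[i]
                # Extra check: the modulo can make different strings collide.
                if i != j and len(candidate) == k + 1 and s.startswith(candidate):
                    pairs.append((candidate, s))
    return pairs

print(find_prefix_pairs(["qwerty", "qwe", "asddsa", "zxcvb", "zxcvbn"]))
```

With this hash, "ab" maps to 1*27 + 2 = 29, matching the example above.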

DNA extraction python

Now, I need to find a way in which Python can find the codon position number 5 of the above code and extract that sequence until position 12 (ATGG*CTTTACCTCGTC*TCACAGGAG). So the output should be something like this:
>CCODE1112_5..11
CTTTACCTCGTC
How can I tell python to get the begin value after the first "_" and the end value after ".." so it can do it automatically? ? THANKS!!!
def extractseq(queryseq, begin=5, end=12):
    queryseq = queryseq.split('\n')  # transform the string into a list of the lines it contains
    return queryseq[1][begin-1:end-1]
I think this function should work. Beware that indices begin at 0 in Python.
After adding it to your script, you just have to call the function: subs=extractseq(seq,5,12)
OK, sorry. If you want to extract the 5 and the 12 from the header itself, one easy way to do that is:
substring=queryseq.split('\n')[0].split('_')[1].split('..')#extraction of the substring
begin=int(substring[0])
end = int(substring[1])
I'd probably (sigh) use a regex to extract 5 and 12 from CCODE1112_5..12_ABC.
Then convert the extracted strings to int's.
Then use the int's as indexes in a string slice on the DNA data.
For the regex:
import re
regex = re.compile(r'^[^_]*_(\d+)\.\.(\d+)_.*$')
regex.match('CCODE1112_5..12_ABC')
match = regex.match('CCODE1112_5..12_ABC')
match.group(1)
'5'
match.group(2)
'12'
To convert those to int's, use int(match.group(1)), for example.
Then your indices are 1-based, while python's are 0-based. Also, python's starting point for a slice is at the value you want, and python's ending point for a slice is one past the value you want. So subtract one from group(1) and leave group(2) alone.
So something like:
substring = dna_data[left_point-1:right_point]
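Putting the pieces together, a minimal sketch might look like the following. It assumes a two-line record with the header on the first line and the sequence on the second; the function name extract_region and the sample record are made up for illustration, and the pattern is a slightly looser variant of the regex above that does not require anything after the range:

```python
import re

def extract_region(record):
    """Return (begin, end, subsequence) for a two-line FASTA-like record."""
    header, sequence = record.strip().split('\n')[:2]
    # Look for "_<begin>..<end>" anywhere in the header line.
    match = re.search(r'_(\d+)\.\.(\d+)', header)
    if match is None:
        raise ValueError('no begin..end range found in header: ' + header)
    begin, end = int(match.group(1)), int(match.group(2))
    # Positions are 1-based and inclusive; Python slices are 0-based
    # and exclude the end index, so only the start is shifted by one.
    return begin, end, sequence[begin - 1:end]

record = ">CCODE1112_5..12_ABC\nATGGCTTTACCTCGTCTCACAGGAG"
begin, end, region = extract_region(record)
print(begin, end, region)
```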
