Trying to create a regex for a string starting with a number and then matching a certain string inside? - c

I am trying to create a regular expression that starts with a number but then matches de or ba followed by 3 to 5 digits right after it. What it should match is:
1Ghde345c
22Zui2ba777#
What I have so far is:
/^[0-9]?(be|de)*\d{3,5}/gm
But it doesn't seem to work. Is there something I am missing?

The provided regular expression currently requires the 3-5 digits, optionally preceded by repetitions of (be|de), to immediately follow the optional [0-9] at the beginning of the string:
^[0-9]?(be|de)*\d{3,5}
Based on the provided test strings, you instead want to match (be|de) followed by 3-5 digits at any position in the string, regardless of whether it immediately follows the digit at the start of the string. One way to achieve this is with the following regular expression:
^[0-9].*(be|de)\d{3,5}
^[0-9].* is used instead of ^[0-9]? for two reasons:
[0-9] is now a requirement instead of optional.
.* matches any character zero or more times, following the initial [0-9].
(be|de)* has been replaced with (be|de) to enforce only a single match, but can be reverted if sequential matches of be or de are allowed.
Example using regex101
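As a quick check of the explanation above, here is a minimal Python sketch that runs the question's pattern and the answer's pattern against the two sample strings. The `variant` pattern is an assumption, not part of the answer: it swaps in `ba` to follow the question's wording ("de or ba"), since the second sample contains `ba777` rather than `be` or `de`.

```python
import re

# Pattern from the question: everything must sit at the very start of the string.
original = re.compile(r"^[0-9]?(be|de)*\d{3,5}")

# Pattern from the answer: a leading digit, anything, then be/de plus 3-5 digits.
fixed = re.compile(r"^[0-9].*(be|de)\d{3,5}")

# Assumption: variant that follows the question's wording ("de or ba").
variant = re.compile(r"^[0-9].*(ba|de)\d{3,5}")

samples = ["1Ghde345c", "22Zui2ba777#"]

for s in samples:
    print(
        s,
        "original:", bool(original.search(s)),
        "fixed:", bool(fixed.search(s)),
        "variant:", bool(variant.search(s)),
    )
```

The answer's pattern matches the first sample only; the second sample needs `ba` in the alternation.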

Related

How can I modify my regular expression below to block the repetition of characters (not to be repeated more than two times in continuation)?

Below is the regular expression I want to modify to check the repetition of characters for not more than 2 times in continuation, see the below examples for more details. I have the regex for avoiding the repetition but I need both of them in the same expression. (/(.)\1{2}/)
For example:
Nameee : Invalid,
Naaame : Invalid,
Name : Valid,
Naammee: Valid,
Nnname : Invalid.
I have
^(?!.*[AaEeIiOoUu]{5}).*[AaEeIiOoUu].*[a-zA-Z\u00BF-\u1FFF\u2C00-\uD7FF]*(?:-[a-zA-Z\u00BF-\u1FFF\u2C00-\uD7FF]*)?$
Since the chars cannot appear three times in immediate succession, you may add (?!.*(.)\1{2}) to your pattern right after ^. You must use the i modifier to make sure letters are treated in a case insensitive way.
Updated regex will look like
/^(?!.*(.)\1{2})(?!.*[AEIOU]{5}).*[AEIOU].*[A-Z\u00BF-\u1FFF\u2C00-\uD7FF]*(?:-[A-Z\u00BF-\u1FFF\u2C00-\uD7FF]*)?$/i
  ^^^^^^^^^^^^^^
See the regex demo
The (?!.*(.)\1{2}) is a negative lookahead that fails the match if, immediately to the right of the current location (here, right after the string start) there are 0+ chars, followed with a tripled char: (.) captures a char into Group 1 and the \1{2} is the same value as in Group 1 (due to the \1 backreference) that occurs 2 times (due to the {2} limited quantifier).
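To see the lookahead in isolation, here is a small Python sketch that applies only the `(?!.*(.)\1{2})` part to the five names from the question. The rest of the original pattern is deliberately left out, and the helper name `is_valid` is made up for illustration.

```python
import re

# Only the added lookahead, anchored at the start; the rest of the
# original pattern is replaced by ".*" to keep the example focused.
no_triple = re.compile(r"^(?!.*(.)\1{2}).*$", re.IGNORECASE)

# Same pattern without the case-insensitive flag, for comparison.
no_triple_case_sensitive = re.compile(r"^(?!.*(.)\1{2}).*$")


def is_valid(name: str) -> bool:
    """Return True when no character occurs three times in a row."""
    return no_triple.match(name) is not None


for name in ["Nameee", "Naaame", "Name", "Naammee", "Nnname"]:
    print(name, "Valid" if is_valid(name) else "Invalid")
```

Without the case-insensitive flag, `Nnname` would slip through, because `N` and `n` are then treated as different characters.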

Sorting an array of URLs

I have an array with quasar URLs stored in it
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0329/spec-0329-52056-0059.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2881/spec-2881-54502-0508.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0302/spec-0302-51616-0435.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2947/spec-2947-54533-0371.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2962/spec-2962-54774-0461.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2974/spec-2974-54592-0185.fits
I want to sort the URL array on the basis of the number next to spec- rather than in alphabetical order. I sorted the array with sort, but it didn't help: it always moved the 3rd row and the 2nd-to-last row to the top because they have a 1.
I'd like to have an output like this
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0301/spec-0301-51942-0460.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0302/spec-0302-51616-0435.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0329/spec-0329-52056-0059.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0342/spec-0342-51691-0089.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2881/spec-2881-54502-0508.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2947/spec-2947-54533-0371.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2957/spec-2957-54807-0164.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/104/spectra/2962/spec-2962-54774-0461.fits
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/2974/spec-2974-54592-0185.fits
If you will always have this pattern, you can try:
parts = strsplit(myUrl, '/');               % cell array of the pieces between slashes
fileName = parts{end};                      % last piece: the file name
numParts = strsplit(fileName(6:end), '.');  % drop 'spec-', then split off the extension
number = numParts{1};                       % MATLAB indices start at 1
I'm going to walk you through this, because understanding is everything...
We start with
http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux/26/spectra/0269/spec-0269-51581-0467.fits
First we split the URL on the / characters. strsplit returns a cell array of the strings between those characters. Since the number to sort on resides after the final /, we can index with {end} to grab the last one. Now we have
spec-0269-51581-0467.fits
Next, let's remove that pesky spec- from the number. This step isn't actually necessary, since it's constant across all the URLs, but let's just do it for fun. We can use MATLAB's indexing to grab the characters after the -, using fileName(6:end). This will create a string starting with the 6th character (in this case, a 0) and continuing to the end. Great, now we have
0269-51581-0467.fits
Looking good! Again, this part isn't completely necessary either, but I've included it in case you need it. We can use the strsplit function again, but this time split on the ., and grab the first element with {1}, since MATLAB indices start at 1. Now, we have
0269-51581-0467
Go ahead and sort on that little guy and you're good to go!
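The answer stops short of the sort itself. As a rough illustration of the same idea in Python (not MATLAB, and not the answerer's code), the sketch below extracts the three numbers after `spec-` with a regular expression and sorts on them as integers. The helper name `spec_key` is made up, and only four of the URLs from the question are used.

```python
import re


def spec_key(url: str) -> tuple:
    """Sort key: the numeric fields after 'spec-' in the file name."""
    m = re.search(r"spec-(\d+)-(\d+)-(\d+)\.fits$", url)
    if m is None:
        raise ValueError(f"unexpected URL format: {url}")
    return tuple(int(part) for part in m.groups())


base = "http://dr12.sdss3.org/sas/dr12/sdss/spectro/redux"
urls = [
    f"{base}/26/spectra/0269/spec-0269-51581-0467.fits",
    f"{base}/26/spectra/0329/spec-0329-52056-0059.fits",
    f"{base}/104/spectra/2957/spec-2957-54807-0164.fits",
    f"{base}/26/spectra/0342/spec-0342-51691-0089.fits",
]

# Sorting on the extracted numbers ignores the differing redux/26 vs redux/104 prefix.
sorted_urls = sorted(urls, key=spec_key)

for url in sorted_urls:
    print(url)
```

Sorting on the extracted key rather than the whole URL is what keeps the `redux/104` entries from jumping to the top.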

How to use Lex Regex to check for a double underscore

I'm trying to find the correct lex regex for matching a string literal that can consist of digits, letters, and underscores, as long as there are no two or more underscores in a row, i.e.
This__Doesn'tw0rk
This_W1LL_work
So far, I tried using
{Letters}(({Letters}|{Digit})*)(_?)({Letters}|{Digit}+)
but that doesn't work, because it allows at most one underscore, whereas more than one underscore should be possible as long as they are not in a row.
{Letters}(({Letters}|{Digit}|_?)*)({Letters}|{Digit})+
This doesn't work because it allows more than one underscore in a row. I'm going insane reading this (http://dinosaur.compilertools.net/lex/) over and over and trying to resolve it.
I have tried using the {m,n} as noted on the website, but that didn't quite work out well either.
Any pointers would be nice; I'm trying to figure out this one last issue.
Simply try
{Letters}(_?({Letters}|{Digit}))*
This is for tokens that begin with a letter, then contain zero or more instances of an optional underscore followed by a letter or a digit. It should match
a
abc
a_b_c
aa_bbbb834758_9zz
There is no way to accept two consecutive underscores since every underscore must be followed by a letter or a digit.
Bonus: you cannot end with an underscore. Add _? to the very end if you would like to allow such a thing.
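The lex definitions of `{Letters}` and `{Digit}` are not shown in the question, so the following Python sketch assumes they mean `[A-Za-z]` and `[0-9]`. Under that assumption the suggested pattern translates directly, and it can be checked against the examples above. The names `identifier` and `is_identifier` are made up for illustration.

```python
import re

# Assumption: {Letters} is [A-Za-z] and {Digit} is [0-9].
# Translation of {Letters}(_?({Letters}|{Digit}))*
identifier = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")


def is_identifier(text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return identifier.fullmatch(text) is not None


for text in ["a", "abc", "a_b_c", "aa_bbbb834758_9zz",
             "This_W1LL_work", "This__Doesntw0rk", "trailing_"]:
    print(text, is_identifier(text))
```

`fullmatch` is used here because a lex rule matches a whole token, whereas `search` would happily find a valid prefix inside an invalid string.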

Regex negation pattern match

I have three string samples below:
day/Mon/done
day/Tue/done
day/Wed/done
How do I extract day/Wed/done using negation for the other two? Below doesn't work.
/day/[^(Mon|Tue)]/done
It's not how negated character classes work -- they still interpret each character inside the [..] as a single character. And there is no match for
day/?/done
where ? is only one character. Either use any of the techniques in Regular expression to match a line that doesn't contain a word? (thanks, Peter!), or make good use of the fact that the first character of the day you want differs from the first characters of the two you want to exclude:
day/[^MT]../done
You might try lookahead.
'day/(?!(Mon|Tue)).*/done'
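A short Python sketch comparing the three patterns on the sample strings may make the difference clearer. The extra string `day/X/done` is not from the question; it is added only to show what the negated character class actually matches.

```python
import re

samples = ["day/Mon/done", "day/Tue/done", "day/Wed/done"]

# The attempt from the question: [^(Mon|Tue)] is ONE character that is
# none of "(", "M", "o", "n", "|", "T", "u", "e", ")".
char_class = re.compile(r"day/[^(Mon|Tue)]/done")

# First suggestion: exclude by first letter, then two more characters.
first_letter = re.compile(r"day/[^MT]../done")

# Second suggestion: negative lookahead right after "day/".
lookahead = re.compile(r"day/(?!(Mon|Tue)).*/done")

for name, pattern in [("char_class", char_class),
                      ("first_letter", first_letter),
                      ("lookahead", lookahead)]:
    print(name, [s for s in samples if pattern.search(s)])

# The character-class version only matches a single-character day field.
print(bool(char_class.search("day/X/done")))
```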

Use only one big regular expression in C

I'm writing an engine in C with libpcre that filters (in my case, a filter accepts or refuses a string) some log lines of the form:
tok1=foo tok2=bar ... tok3="value with spaces in it" ...
So, my way to filter them is to receive from the user a filter string of the form:
"tok1=regex1 tok2=regex2 tok3!=regex3 ..."
At the beginning, my engine parses this pattern, compiles all the regexes it finds with pcre_compile/pcre_study, and stores them in a hashtable (or a radix tree).
("tok1"->pcre_regex1, "tok2"->pcre_regex2, "tok3"->pcre_regex3)
(By the way, the "!=" operator is used to filter lines which do NOT contain the following regex.)
Then, during the filtering phase itself, for every log line, I walk the line from beginning to end, char by char, I get token/value couples and if there is a regex relative to the token in my hashtable, the value must match the regex (pcre_exec) otherwise the line is rejected.
It works fine.
My question is: I am very skeptical, but I was wondering whether it is possible to write a single big regular expression which combines all my regexes to filter lines, taking into account possible double quotes, so that I could do:
pcre_exec(my_big_re, NULL, my_whole_log_line, len, 0, 0, NULL, 0)
Oh, and a subsidiary question: is it possible to write the negation of ANY regular expression of any form? (I am skeptical too.)
Yes it's possible. Regular expressions are exactly equivalent to finite state machines, and it's trivial to negate a finite state machine (make all accepting states non-accepting, and vice-versa). However, you'll find that doing this causes an exponentially bigger regular expression in the worst case.
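The state-flipping step can be sketched in a few lines of Python. This is a toy illustration, not something libpcre exposes: the `DFA` class, its field names, and the example automaton (strings over `a` and `b` that contain `ab`) are all made up. The flip is only valid on a complete deterministic automaton, one with a transition for every state and symbol; getting there from a regex may require the conversion that causes the blow-up mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    states: frozenset
    alphabet: frozenset
    transitions: dict      # (state, symbol) -> state; must cover every pair
    start: str
    accepting: frozenset

    def accepts(self, text: str) -> bool:
        """Run the automaton over text and report whether it ends in an accepting state."""
        state = self.start
        for symbol in text:
            state = self.transitions[(state, symbol)]
        return state in self.accepting

    def complement(self) -> "DFA":
        """Swap accepting and non-accepting states; everything else is unchanged."""
        return DFA(self.states, self.alphabet, self.transitions,
                   self.start, self.states - self.accepting)


# Hypothetical example: accepts strings over {a, b} that contain "ab".
contains_ab = DFA(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"a", "b"}),
    transitions={
        ("q0", "a"): "q1", ("q0", "b"): "q0",   # nothing useful seen yet
        ("q1", "a"): "q1", ("q1", "b"): "q2",   # just saw an "a"
        ("q2", "a"): "q2", ("q2", "b"): "q2",   # "ab" already seen
    },
    start="q0",
    accepting=frozenset({"q2"}),
)

lacks_ab = contains_ab.complement()

for text in ["", "bab", "bba", "aab"]:
    print(repr(text), contains_ab.accepts(text), lacks_ab.accepts(text))
```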

Resources