Regex in C, check if string contains specific characters - c

I want a regex that matches a string consisting of one and only one "-", followed by one or more "s" or "h" characters.
So "-shshshshs" would match, and so would "-ssssss", but "-so" would not, and neither would "sh".
So far I have only managed to match "string contains "-" followed by "s" or "h"", so input such as "-sho" is still accepted.
/* Compile regular expression */
reti = regcomp(&regex, "-[sh]", 0);
if (reti) {
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
/* Execute regular expression */
reti = regexec(&regex, "--sh", 0, NULL, 0);
if (!reti) {
puts("Match");
} else {
puts("No match");
}
Thanks in advance.

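If your regex engine supports it (with POSIX regcomp(), pass REG_EXTENDED so that + is an operator):
"^-[sh]+$"
Otherwise:
"^-[sh][sh]*$"
The leading ^ is needed as well as the trailing $; without it, "--sh" still matches, because the match can start at the second "-".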

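A minimal sketch of the anchored pattern in use, assuming the goal is to match the whole string (hence a leading ^ as well as the trailing $) and that REG_EXTENDED is passed so + works. The helper name is_dash_sh is hypothetical, not from the original post.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is exactly one '-' followed by one
 * or more 's'/'h' characters, 0 if not, -1 if the pattern fails to compile. */
int is_dash_sh(const char *s)
{
    regex_t regex;
    int rc;

    /* REG_EXTENDED makes '+' an operator; REG_NOSUB because only a
     * yes/no answer is needed. */
    if (regcomp(&regex, "^-[sh]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, s, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```

With this, is_dash_sh("-shshshshs") returns 1, while "--sh" and "-sho" both return 0.
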
Related

Issues with regular expression rejecting string

regex_t regex;
int reti;
char msgbuf[100];
/* Compile regular expression, if two vowels it should be ok */
reti = regcomp(&regex, "[aoueiy].{2}", 0);
if (reti){
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
/* Execute regular expression */
reti = regexec(&regex, "ao", 0, NULL, 0);
if (!reti) {
puts("Match");
}
else if (reti == REG_NOMATCH) {
puts("No match");
}
I am trying to write an expression that accepts a string containing at least two vowels. Here is my code so far; the string "ao" gives me "No match". I am new to regex and find the manual hard to use. I would be very thankful for any help or tips.
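Your regular expression matches a vowel followed by 2 other characters. [aoueiy] matches a vowel, . matches any character, and {2} after it makes it match two characters. ao has only 1 character after the vowel, so it doesn't match. (That reading assumes extended syntax; with the flags argument 0 used here, basic syntax treats {2} as literal text, which ao does not contain either.)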
The correct regexp is [aoueiy].*[aoueiy]. This matches two vowels with any number of characters (including 0) between them.
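As a rough sketch of the corrected pattern, the helper below (has_two_vowels is a hypothetical name) wraps the compile and match steps. It assumes the same vowel set as the question, including y, and uses basic syntax since the pattern needs nothing beyond brackets, . and *.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s contains at least two vowels
 * (a, o, u, e, i, y as in the question), 0 if not, -1 on compile failure. */
int has_two_vowels(const char *s)
{
    regex_t regex;
    int rc;

    /* No anchors: the two vowels may appear anywhere in the string. */
    if (regcomp(&regex, "[aoueiy].*[aoueiy]", REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, s, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```
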

Start (^) and end ($) anchors not working

Basically I'm using the following pattern in my C program (see Regular expression matching an infinite pattern):
^[0-9]( [0-9])*$
with the following code:
char *pattern = "^[0-9]( [0-9])*$";
regex_t regexCompiled;
int rc = regcomp(&regexCompiled, pattern, REG_EXTENDED);
if (rc != 0) {
char msgbuf[100];
regerror(rc, &regexCompiled, msgbuf, sizeof (msgbuf));
fprintf(stderr, "Could not compile regex: %s\n", msgbuf);
exit(EXIT_FAILURE);
}
if (regexec(&regexCompiled, "0 1", 0, NULL, REG_EXTENDED) == 0) {
printf("Valid\n");
} else {
printf("Invalid\n");
}
I run it against the string "0 1", which is valid for the pattern, but it is reported as invalid. The '^' and '$' anchors do not seem to work. Why is that, and how can I make it work?
You are passing REG_EXTENDED to regexec(), which is not a valid flag for that call.
The manual page says:
eflags may be the bitwise-or of one or both of REG_NOTBOL and REG_NOTEOL which cause changes in matching behavior described below.
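Probably the actual value of REG_EXTENDED matches one of those. In glibc, for example, REG_EXTENDED and REG_NOTBOL are both defined as 1, so regexec() is told that the start of the string is not the beginning of a line and ^ cannot match there.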
Changing the code to pass 0 as the final argument to regexec() makes it match.
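The fix can be sketched as below: REG_EXTENDED goes to regcomp() only, and regexec() gets 0 as its eflags. The function name is_digit_list is hypothetical and only used for illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is a list of single digits separated
 * by single spaces (e.g. "0 1"), 0 if not, -1 on compile failure. */
int is_digit_list(const char *s)
{
    regex_t regex;
    int rc;

    /* Compile-time flags (cflags) belong here. */
    if (regcomp(&regex, "^[0-9]( [0-9])*$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* Match-time flags (eflags) are only REG_NOTBOL / REG_NOTEOL; pass 0. */
    rc = regexec(&regex, s, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```
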

Provisioning code regular expression

I'm trying to write C code that checks the validity of a "provisioning code" string using regular expressions.
A "provisioning code" must respect the following rule:
If not an empty string, this argument SHOULD be in the form of a hierarchical descriptor with one or more nodes specified. Each node in the hierarchy is represented as a 4-character sub-string, containing only numerals or upper-case letters. If there is more than one node indicated, each node is separated by a "." (dot). Examples: "TLCO" or "TLCO.GRP2".
I started from the code at this link: http://web.archive.org/web/20160308115653/http://peope.net/old/regex.html
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h> /* exit() */
int main(int argc, char *argv[]){
regex_t regex;
int reti;
char msgbuf[100];
/* Compile regular expression */
reti = regcomp(&regex, "^a[[:alnum:]]", 0);
if( reti ){ fprintf(stderr, "Could not compile regex\n"); exit(1); }
/* Execute regular expression */
reti = regexec(&regex, "abc", 0, NULL, 0);
if( !reti ){
puts("Match");
}
else if( reti == REG_NOMATCH ){
puts("No match");
}
else{
regerror(reti, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
exit(1);
}
/* Free compiled regular expression if you want to use the regex_t again */
regfree(&regex);
return 0;
}
This code works fine, but my problem is finding the right regular expression to pass to regcomp.
I started with a regular expression that should match a string of exactly 4 uppercase letters or digits, such as TLCO or TLC2. I tried "[A-Z0-9]{4}", but I get "No match" even for matching examples like TLC2.
Is there a suggestion for a regular expression that can be passed to regcomp and matches a "provisioning code"?
You may use the following regex, which works if you also pass the REG_EXTENDED flag to regcomp (so that the {m,n} interval and the (...) grouping are treated as operators):
^[A-Z0-9]{4}([.][A-Z0-9]{4})*$
C code:
reti = regcomp(&regex, "^[A-Z0-9]{4}([.][A-Z0-9]{4})*$", REG_EXTENDED);
Details
^ - start of string
[A-Z0-9]{4} - 4 uppercase ASCII letters or digits
([.][A-Z0-9]{4})* - zero or more sequences of:
[.] - a literal . char
[A-Z0-9]{4} - 4 uppercase ASCII letters or digits
$ - end of string.
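Put together, a validator might look like the sketch below. The name is_provisioning_code is hypothetical, and the sketch assumes the default "C" locale for the A-Z range. It reports an empty string as not matching, so a caller that wants to accept the empty case allowed by the rule would need to check for it separately.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s is one or more 4-character nodes of
 * uppercase letters/digits separated by dots, 0 if not, -1 on compile
 * failure. An empty string yields 0. */
int is_provisioning_code(const char *s)
{
    regex_t regex;
    int rc;

    if (regcomp(&regex, "^[A-Z0-9]{4}([.][A-Z0-9]{4})*$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, s, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```
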

Posix regular expression not working in C

This is my first time working with regex in C and I am having some trouble. I am trying to replicate a syntax that is used in sed, namely the s/findthisstring/replacewiththis/g where findthisstring has to be present and replacewiththis does not.
The regex I came up with is ^s/(.+)/(.*)/g$
Here it is in my code:
int verifyPattern(char *pattern) {
regex_t regex;
int reti = regcomp(&regex, "^s/(.+)/(.*)/g$", 0);
if (reti) {
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
reti = regexec(&regex, pattern, 0, NULL, 0);
if (!reti) {
puts("Match");
} else if (reti == REG_NOMATCH) {
puts("No match");
} else {
puts("Regex error");
}
return 1;
}
I believe the part that is messing up is the .+. If I replace it with .* everything is fine. Does anyone know a workaround for this?
Thanks!
You forgot to pass the REG_EXTENDED flag, which is needed for +, | and the (...) groups to act as operators. Also, your regex would allow too many /.../ sections. What you need is to match either an escape sequence or a char other than / and \:
int reti = regcomp(&regex, "^s/(\\\\.|[^\\\\/]+)+/(\\\\.|[^\\\\/]+)*/g$", REG_EXTENDED);
Basically, I replaced . with (\\.|[^\\/]+) pattern matching either an escape sequence (\\.) or (|) one or more characters other than \ and /.
Here is a list of tests:
verifyPattern("s/s/s/g");                  // Match
verifyPattern("s/s//g");                   // Match
verifyPattern("s//s/g");                   // No match
verifyPattern("s/s\\/s/g");                // No match
verifyPattern("s/s\\/s/text/text/text/g"); // No match
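For checking such cases programmatically, here is a small sketch of a variant that returns the result instead of printing it. The name is_sed_subst is hypothetical; the pattern is the one from the answer, compiled with REG_EXTENDED.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical variant of verifyPattern: returns 1 if pattern has the form
 * s/find/replace/g with a non-empty find part, 0 if not, -1 on compile
 * failure. */
int is_sed_subst(const char *pattern)
{
    regex_t regex;
    int rc;

    /* Each section is a run of escape sequences (backslash + any char)
     * or characters other than '\' and '/'. */
    if (regcomp(&regex, "^s/(\\\\.|[^\\\\/]+)+/(\\\\.|[^\\\\/]+)*/g$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, pattern, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```
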

How to get Ultraedit regex search to work in C/C++ code

I am having a tough time getting an Ultraedit regex to work in C/C++ code. I am aware of adding an additional \ for \w, but it still does not work.
#include <regex.h>
#include <stdio.h>
int main()
{
int reti;
regex_t regex;
reti = regcomp(&regex, "^\w+\.c", 0);
if(!reti)
{
printf("compile success\n");
}
reti = regexec(&regex, "test.c", 0, NULL, 0);
if(!reti)
{
printf("match\n");
}
else
{
printf("mis match\n");
}
}
The regular expression above works properly on Ultraedit but why does it not work if put in C code as shown here?
I expect "match" to be printed out but when I run the above code, I get:
compile success
mis match
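You need to escape the backslash one more time; otherwise, the C compiler reads it as a string escape sequence. You also need to pass REG_EXTENDED, because in basic syntax + is an ordinary character. Note that \w is a GNU extension rather than part of POSIX.
reti = regcomp(&regex, "^\\w+\\.c", REG_EXTENDED);
I also think you're trying to match all file names with the extension .c; in that case, you must use the end-of-line anchor.
reti = regcomp(&regex, "^\\w+\\.c$", REG_EXTENDED);
Or, portably:
reti = regcomp(&regex, "^[[:alnum:]_]+\\.c$", REG_EXTENDED);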
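A short sketch of the last form as a reusable check, assuming extended syntax (REG_EXTENDED) so that + is an operator. The helper name is_c_source is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if name is one or more letters, digits or
 * underscores followed by ".c", 0 if not, -1 on compile failure. */
int is_c_source(const char *name)
{
    regex_t regex;
    int rc;

    /* "\\." in the C string literal becomes \. in the regex, i.e. a
     * literal dot. */
    if (regcomp(&regex, "^[[:alnum:]_]+\\.c$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, name, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```
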
