Unable to match regex in C

I have a problem with the following regex:
prefix:\w+,\w+,\s*-?[0-9]{1,4}\s*,\s*-?[0-9]{1,4}\s*,\s*-?[0-9]{1,4}\s*,(?:\w+)
The match string is the following:
prefix:string,string,-100,100,0,string
I cannot match this string in my C code, although an online tool where I generated the regex reports a successful match. There were also compilation warnings about the "\" characters in the regex, so I doubled them as explained in c-compiler-warning-unknown-escape-sequence-using-regex-for-c-program. The regex after fixing the compilation warnings:
prefix:\\w+,\\w+,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,(?:\\w+)
Here's the testing code:
#include <stdio.h>
#include <regex.h>
#include <stdlib.h>
#define REGEX "prefix:\\w+,\\w+,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,(?:\\w+)"
const char *input = "prefix:string,string,-100,100,0,string";
int main(){
int rc;
regex_t regex;
rc = regcomp(&regex, REGEX, 0);
if (rc != 0) {
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
rc = regexec(&regex, input, 0, NULL, 0);
if (rc == 0) {
printf("Match!\n");
return 0;
}
else if (rc == REG_NOMATCH) {
printf("No match\n");
return -1;
}
else {
perror("Error\n");
exit(1);
}
return 0;
}
I use gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12)

You need to do two things:
Pass the REG_EXTENDED flag when compiling the regex so that the Extended Regular Expression (ERE) flavor is enabled. Otherwise the pattern is parsed as a Basic Regular Expression (BRE), in which the interval quantifier {1,4} must be written as \{1,4\} and an unescaped + is matched literally.
Remove the non-capturing group (?:...), as POSIX regular expressions do not support this construct.
To make the final \w+ optional, just replace (?:\w+) with \w*.
Use
#define REGEX "prefix:\\w+,\\w+,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\w*"
and then
rc = regcomp(&regex, REGEX, REG_EXTENDED);
See the C demo.
Also, read more about the Extended Regular Expressions that REG_EXTENDED enables. POSIX Bracket Expressions is also a good resource for learning the differences between the BRE (Basic Regular Expressions) and ERE flavors.

Related

How to use regex to verify that keyboard input is a number in C?

I am researching regex in C and trying to understand it, but I am having trouble with the pattern for this kind of string.
In this program I want to verify that the input string is a number (digits only: no letters, spaces, or special characters).
#include<stdio.h>
#include <regex.h>
void print_result(int return_value){
if (return_value == 0){
printf("Pattern found.\n");
}
else if (return_value == REG_NOMATCH){
printf("Pattern not found.\n");
}
else{
printf("An error occured.\n");
}
}
int main() {
regex_t regex;
int return_value;
int return_value2;
return_value = regcomp(&regex,"[^a-fA-F_][0-9]+",0);
return_value = regexec(&regex, "4324", 0, NULL, 0);
return_value2 = regcomp(&regex,"\d+",0);
return_value = regexec(&regex, "4324", 0, NULL, 0);
print_result(return_value); //not found
print_result(return_value); //no found
print_result(return_value2);
return 0;
}
Can you give me some ideas for verifying the input? I want to find another way, without using ASCII values.
If you specify the flags as 0 in regcomp:
return_value = regcomp(&regex,"[^a-fA-F_][0-9]+",0);
then you are accepting the default regex syntax, which is a so-called Basic Regular Expression (BRE). The only sensible thing that can be said about BREs is "don't use them." Always specify the REG_EXTENDED flag (at least), and then you will be working with a regular expression syntax that at least bears a passing resemblance to what you expect. (Otherwise, your strings will be dominated by what's technically called "leaning timber": \ characters which enable metacharacters in the regex, and more \ characters so that the \ characters you need are not treated as escape characters in the character string.)
Take a look at man regexec and man 7 regex for more details. Make sure you read the second link thoroughly (although you can ignore basic regular expression syntax :-) ) because there are many commonly-used syntaxes in more modern regex libraries which are not present in Posix regexes, not even extended ones. (That includes \d, used in your second regex. Posix has named character classes, such as [[:digit:]].)

Provisioning code regular expression

I'm trying to write C code that checks the validity of a "provisioning code" string using regular expressions.
A "provisioning code" must follow this rule:
If not an empty string, this argument SHOULD be in the form of a hierarchical descriptor with one or more nodes specified. Each node in the hierarchy is represented as a 4-character sub-string, containing only numerals or upper-case letters. If there is more than one node indicated, each node is separated by a "." (dot). Examples: "TLCO" or "TLCO.GRP2".
I started development using the code in this link http://web.archive.org/web/20160308115653/http://peope.net/old/regex.html
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h> /* for exit() */
int main(int argc, char *argv[]){
regex_t regex;
int reti;
char msgbuf[100];
/* Compile regular expression */
reti = regcomp(&regex, "^a[[:alnum:]]", 0);
if( reti ){ fprintf(stderr, "Could not compile regex\n"); exit(1); }
/* Execute regular expression */
reti = regexec(&regex, "abc", 0, NULL, 0);
if( !reti ){
puts("Match");
}
else if( reti == REG_NOMATCH ){
puts("No match");
}
else{
regerror(reti, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
exit(1);
}
/* Free compiled regular expression if you want to use the regex_t again */
regfree(&regex);
return 0;
}
This code works fine, but my problem is finding the best regular expression to pass to regcomp.
I started with a regular expression that should match a string containing exactly 4 uppercase letters or digits, such as TLCO or TLC2. I tried "[A-Z0-9]{4}", but I get "No match" even for strings that should match, like TLC2.
Is there a suggestion for the right regular expression to pass to regcomp so that it matches a "provisioning code"?
You may use the following regex, which works if you also pass the REG_EXTENDED flag to regcomp (so that the {m,n} interval quantifier and the (...) group are recognized without backslashes):
^[A-Z0-9]{4}([.][A-Z0-9]{4})*$
C code:
reti = regcomp(&regex, "^[A-Z0-9]{4}([.][A-Z0-9]{4})*$", REG_EXTENDED);
Details
^ - start of string
[A-Z0-9]{4} - 4 uppercase ASCII letters or digits
([.][A-Z0-9]{4})* - zero or more sequences of:
[.] - a literal . char
[A-Z0-9]{4} - 4 uppercase ASCII letters or digits
$ - end of string.

Posix regular expression not working in C

This is my first time working with regex in C and I am having some trouble. I am trying to replicate a syntax that is used in sed, namely the s/findthisstring/replacewiththis/g where findthisstring has to be present and replacewiththis does not.
The regex I came up with is ^s/(.*)/(.*)/g$
Here it is in my code
int verifyPattern(char *pattern) {
regex_t regex;
int reti = regcomp(&regex, "^s/(.*)/(.*)/g$", 0);
if (reti) {
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
reti = regexec(&regex, pattern, 0, NULL, 0);
if (!reti) {
puts("Match");
} else if (reti == REG_NOMATCH) {
puts("No match");
} else {
puts("Regex error");
}
return 1;
}
I believe the part that is messing up is the .+. If I replace it with .* everything is fine. Anyone know a work around for this??
Thanks!
You forgot to pass the REG_EXTENDED flag; without it, the unescaped ( and ) are matched as literal characters rather than treated as a group. Also, your regex would allow too many /.../ sections. What you need is to match either an escape sequence or a char other than / and \:
int reti = regcomp(&regex, "^s/(\\\\.|[^\\\\/]+)+/(\\\\.|[^\\\\/]+)*/g$", REG_EXTENDED);
See the C demo
Basically, I replaced . with the (\\.|[^\\/]+) pattern, which matches either an escape sequence (\\.) or (|) one or more characters other than \ and /.
Here is a list of tests:
verifyPattern("s/s/s/g");//Match
verifyPattern("s/s//g");//Match
verifyPattern("s//s/g");//No Match
verifyPattern("s/s\\/s/g");//No match
verifyPattern("s/s\\/s/text/text/text/g"); // No match

Regular Expression in C

I'm trying to match the string "123,1234" using regex.h. The following pattern does the job:
"^[0-9]\{1,\},[0-9]\{1,\}$"
If I give it as a command-line argument, it works fine. But when I use it inside C code, it does not work, probably because the backslashes are treated as escape characters.
Sample Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>
int main (int argc, char * argv[]){
regex_t regex;
int reti;
char msgbuf[100];
char * string, * pattern;
string = "123,1234";
pattern = "^[0-9]\{1,\},[0-9]\{1,\}$";
if(regcomp(&regex, pattern, 0))
{
fprintf(stderr, "Could not compile regex\n");
exit(107);
}
if(!(reti = regexec(&regex, string, 0, NULL, 0)))
{
printf("MATCH\n");
}
else if(reti == REG_NOMATCH)
{
printf("NO MATCH\n");
}
else
{
regerror(reti, &regex, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
exit(107);
}
regfree(&regex);
return 0;
}
How can I solve this?
Your regular expression is an ERE not a BRE, so you need to pass the REG_EXTENDED flag to regcomp. Then, as others have said, remove the backslashes too.
^[0-9]{1,},[0-9]{1,}$
Take out the \ backslashes. They escape the character immediately following.
To get a literal backslash into a C string, one uses a backslash to escape a backslash, so \ becomes \\.
However, since you are not passing the string on the command line, the curly braces { and } are not going to be caught by the shell's parser, so you could just try it without all of the backslashes, as long as you also pass REG_EXTENDED to regcomp (in a BRE, an unescaped {1,} is matched literally).
"^[0-9]{1,},[0-9]{1,}$"
If the backslashes are being treated as escape characters, have you tried doubling them up so that they are treated as escaped backslashes? This keeps the pattern a BRE, so the flags passed to regcomp can stay 0.
i.e.
"^[0-9]\\{1,\\},[0-9]\\{1,\\}$"

How to get Ultraedit regex search to work in C/C++ code

I am having a tough time getting an Ultraedit regex to work in C/C++ code. I am aware of adding an additional \ for \w, but it still does not work.
#include<regex.h>
#include <stdio.h>
int main()
{
int reti;
regex_t regex;
reti = regcomp(&regex, "^\w+\.c", 0);
if(!reti)
{
printf("compile success\n");
}
reti = regexec(&regex, "test.c", 0, NULL, 0);
if(!reti)
{
printf("match\n");
}
else
{
printf("mis match\n");
}
}
The regular expression above works properly on Ultraedit but why does it not work if put in C code as shown here?
I expect "match" to be printed out but when I run the above code, I get:
compile success
mis match
You need to escape the backslash one more time; otherwise, it is read as an escape sequence in the C string literal. You also need to pass REG_EXTENDED, because in a BRE (flags 0) an unescaped + is matched literally.
reti = regcomp(&regex, "^\\w+\\.c", REG_EXTENDED);
Also, I think you're trying to match all file names with the extension .c; in that case, you must use the end-of-line anchor.
reti = regcomp(&regex, "^\\w+\\.c$", REG_EXTENDED);
Or
reti = regcomp(&regex, "^[[:alnum:]_]+\\.c$", REG_EXTENDED);
