How do I get C to successfully match a regex? - c

So, I am trying to check the format of a key using the regex.h library in C. This is my code:
#include <stdio.h>
#include <regex.h>
int match(char *reg, char *string)
{
regex_t regex;
int res;
res = regcomp(&regex, reg, 0);
if (res)
{
fprintf(stderr, "Could not compile regex\n");
return 1;
}
res = regexec(&regex, string, 0, NULL, 0);
return res;
}
int main(void)
{
char *regex = "[\\w-]{24}\\.[\\w-]{6}\\.[\\w-]{27}|mfa\\.[\\w-]{84}";
char *key = "xxxxxxxxxxxxxxxxxxxxxxxx.xxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx";
if (match(regex, key) == 0) printf("Valid key!\n");
else printf("Invalid key!\n");
return 0;
}
When I run this code, I get the output:
Invalid key!
Why is this happening? If I try to test the same key with the same regex in Node.JS, I get that the key does match the regex:
> const regex = new RegExp("[\\w-]{24}\\.[\\w-]{6}\\.[\\w-]{27}|mfa\\.[\\w-]{84}");
undefined
> const key = "xxxxxxxxxxxxxxxxxxxxxxxx.xxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx";
undefined
> regex.test(key)
true
How could I get the right result using C?
Thanks in advance,
Robin

There are at least two issues here and one extra potential problem:
Interval quantifiers such as {24} and the | alternation operator only work unescaped in the POSIX ERE flavor, so, as pointed out in the comments, you need to compile the pattern with the REG_EXTENDED flag (i.e. res = regcomp(&regex, reg, REG_EXTENDED)).
Inside a POSIX bracket expression, \w is not a word-character shorthand (the backslash is taken literally), so you need to replace it with [:alnum:]_, i.e. [\w-] must be replaced with [[:alnum:]_-]. The solution will be:
char *regex = "[[:alnum:]_-]{24}\\.[[:alnum:]_-]{6}\\.[[:alnum:]_-]{27}|mfa\\.[[:alnum:]_-]{84}";
Besides, if the whole key must match one of the two alternatives exactly (rather than merely contain a match), you need to put a group around the whole pattern and add the ^ and $ anchors at both ends. The solution will be:
char *regex = "^([[:alnum:]_-]{24}\\.[[:alnum:]_-]{6}\\.[[:alnum:]_-]{27}|mfa\\.[[:alnum:]_-]{84})$";
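To see why the group matters: alternation has the lowest precedence, so without parentheses the ^ applies only to the first alternative and the $ only to the second. A minimal sketch, assuming a hypothetical helper named ere_matches and short stand-in patterns rather than the real key format:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of regex.h): returns 1 on match,
 * 0 on no match, -1 if the pattern does not compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, ere_matches("^ab|cd$", "abXYZ") reports a match because the first alternative "^ab" is satisfied on its own, while ere_matches("^(ab|cd)$", "abXYZ") does not.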
See this C demo:
#include <stdio.h>
#include <regex.h>
int match(char *reg, char *string)
{
regex_t regex;
int res;
res = regcomp(&regex, reg, REG_EXTENDED);
if (res)
{
fprintf(stderr, "Could not compile regex\n");
return 1;
}
res = regexec(&regex, string, 0, NULL, 0);
regfree(&regex); /* release the compiled pattern */
return res;
}
int main(void)
{
char *regex = "^([[:alnum:]_-]{24}\\.[[:alnum:]_-]{6}\\.[[:alnum:]_-]{27}|mfa\\.[[:alnum:]_-]{84})$";
char *key = "xxxxxxxxxxxxxxxxxxxxxxxx.xxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxx";
if (match(regex, key) == 0) printf("Valid key!\n");
else printf("Invalid key!\n");
return 0;
}
// => Valid key!
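Both causes can also be checked in isolation. The sketch below assumes a hypothetical helper named try_match that takes the regcomp flags as a parameter and reports compile errors through regerror(); the short patterns are stand-ins for the real key pattern.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 on match, 0 on no match, -1 on a
 * compile error (after printing the regerror() message). */
int try_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    char msg[128];
    int rc = regcomp(&re, pattern, cflags | REG_NOSUB);

    if (rc != 0) {
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp(\"%s\"): %s\n", pattern, msg);
        return -1;
    }
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, try_match("^a{2}$", "aa", 0) returns 0 because basic syntax treats the braces as literal characters, while the same call with REG_EXTENDED returns 1. Likewise, try_match("^[\\w-]$", "x", REG_EXTENDED) returns 0 because the bracket expression contains only a backslash, a w and a hyphen, whereas "^[[:alnum:]_-]$" matches "x".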

Related

Using pcre_get_substring_list, match all pattern inside of string and return in array in C with PCRE?

I need to use C in Linux with PCRE to match in this string "<test>a</test> <test>b</test> <test>c</Test>" to get the letters a, b, and c.
I found this script on Stack Overflow. It works, but it only returns the first match instead of all of them. Why?
/*
* gcc pcre1.c -lpcre
*/
#include <pcre.h>
#include <stdio.h>
#include <string.h>
int main()
{
pcre* compile;
pcre_extra* extra;;
int res;
int ovector[30];
const char* pattern="(?i)<test>(.*?)</test>";
const char* errptr;
const char* match[30];
const char** match_list = match;
int erroffset;
char* test_str = "<test>a</test> <test>b</test> <test>c</Test>";
compile = pcre_compile(pattern, PCRE_MULTILINE,&errptr,&erroffset,NULL);
if ( compile == NULL ) {
fprintf(stderr, "ERROR: Could not compile '%s' : %s\n", pattern, errptr);
exit(1);
}
extra = pcre_study(compile, 0, &errptr);
if ( errptr != NULL ) {
fprintf(stderr, "ERROR: Could not study '%s' : %s\n", pattern, errptr);
exit(1);
}
res = pcre_exec(compile,extra,test_str,strlen(test_str),0,0,ovector,sizeof(ovector));
if ( res == 0 ) {
res = 30/3;
}
if ( res > 0 ) {
pcre_get_substring_list(test_str, ovector, res, &match_list);
printf("buffer : %s\n", test_str);
printf("match :\n");
for ( int i = 0; match_list[i]; ++ i ) {
printf("%9s%s\n", " ", match_list[i]);
printf("\n");
}
if ( match_list )
pcre_free_substring_list(match_list);
}
printf("\n");
if (compile)
pcre_free(compile);
if (extra)
pcre_free(extra);
}
thanks
I changed your code slightly, but this works as you expect now:
% ./pcre1
a
b
c
I'll list the changes and why I made them:
We will be reading ovector[1] before pcre_exec() has set it, so zero the array out first.
int ovector[30] = {0};
The pcre_get_substring() will be easier for you to use for this purpose, so I switched away from pcre_get_substring_list().
We didn't need match[], as pcre_get_substring() calls pcre_malloc().
The variable match_list must be a const char*, as we are passing it as &match_list (pcre_get_substring() expects a const char**).
const char* match_list;
The function pcre_exec() expects ovecsize to be the number of int elements in ovector (not its size in bytes), and it should be a multiple of 3.
3*((sizeof(ovector)/sizeof(ovector[0]))/3)
I wrapped the pcre_exec() call in a while loop.
I used pcre_get_substring(), printf(), and pcre_free_substring() instead.
// gcc pcre1.c -lpcre
#include <pcre.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
pcre* compile;
pcre_extra* extra;
int res;
int ovector[30] = {0};
const char* pattern="(?i)<test>(.*?)</test>";
const char* errptr;
const char* match_list;
int erroffset;
char* test_str = "<test>a</test> <test>b</test> <test>c</Test>";
compile = pcre_compile(pattern, PCRE_MULTILINE,&errptr,&erroffset,NULL);
if ( compile == NULL ) {
fprintf(stderr, "ERROR: Could not compile '%s' : %s\n", pattern, errptr);
exit(1);
}
extra = pcre_study(compile, 0, &errptr);
if ( errptr != NULL ) {
fprintf(stderr, "ERROR: Could not study '%s' : %s\n", pattern, errptr);
exit(1);
}
/* ovector[1] is the end of the previous match, so each search resumes there */
while ((res = pcre_exec(compile, extra, test_str, strlen(test_str), ovector[1], 0, ovector, 3*((sizeof(ovector)/sizeof(ovector[0]))/3))) >= 0) {
if (pcre_get_substring(test_str, ovector, res, 1, &match_list) >= 0) {
printf("%s\n", match_list);
pcre_free_substring(match_list);
}
}
if (compile)
pcre_free(compile);
if (extra)
pcre_free(extra);
}
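One detail worth spelling out: the PCRE documentation describes the ovecsize argument of pcre_exec() as the number of int elements in ovector, not its size in bytes. A small sketch of computing that count, using a hypothetical OVECSIZE macro that is not part of the PCRE API:

```c
#include <stddef.h>

/* Hypothetical helper macro: the number of elements in an int array,
 * rounded down to a multiple of 3, for use as pcre_exec()'s ovecsize.
 * It only works on a real array, not on a pointer. */
#define OVECSIZE(vec) ((int)(3 * ((sizeof(vec) / sizeof((vec)[0])) / 3)))
```

For int ovector[30], OVECSIZE(ovector) evaluates to 30, whereas sizeof(ovector) alone would be 120 on a platform with 4-byte int.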

how to add if statement while verifying string with Regular Expressions in C

I need to verify a String(string:89dree01) with regular expression ([a-zA-Z0-9]*) using if condition in C like so:
if(string=regex) {}
Could someone help me with this?
Below is the code snippet:
#include <regex.h>
#include <stdio.h>
int main()
{
regex_t * regex = "[a-zA-Z0-9]*";
int value;
value = regcomp(regex,"89dree01", 0);
if (value == 0) {
LOG("RegEx compiled successfully.");
}
else {
LOG("Compilation error.");
}
return 0;
}
You're not using the POSIX regexp library quite correctly.
Here's an example that checks whether arguments given on the command line match that regexp (slightly modified).
#include <regex.h>
#include <stdio.h>
int main(int argc, char **argv) {
regex_t regex;
if (regcomp(&regex, "^[a-zA-Z0-9]+$", REG_NOSUB | REG_EXTENDED)) {
return 1;
}
for (int i = 1; i < argc; i++) {
int status = regexec(&regex, argv[i], 0, NULL, 0);
printf("%s: %s\n", argv[i], status == REG_NOMATCH ? "no match" : "matched");
}
regfree(&regex);
return 0;
}
~/Desktop $ gcc -o s s.c
~/Desktop $ ./s aaa bb0 00a11 ..--
aaa: matched
bb0: matched
00a11: matched
..--: no match
Edit:
Simply (if inefficiently) put, you can wrap this in a function:
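int does_regexp_match(const char *string, const char *regexp) {
regex_t r;
int status;
if (regcomp(&r, regexp, REG_NOSUB | REG_EXTENDED)) {
return -1;
}
status = regexec(&r, string, 0, NULL, 0);
regfree(&r); /* free the compiled pattern so repeated calls do not leak */
return status == 0 ? 1 : 0;
}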
if(does_regexp_match("89dree01", "^[a-zA-Z0-9]+$") == 1) {
// it's a match
}
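If the same pattern is checked against many strings, the recompilation cost mentioned above can be avoided by compiling once and reusing the result. A minimal sketch, assuming a hypothetical matcher type and matcher_init / matcher_test / matcher_free functions that are not part of regex.h:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper around a compiled pattern. */
typedef struct {
    regex_t re;
    int ready;      /* 1 once regcomp() has succeeded */
} matcher;

/* Compile the pattern once; returns 0 on success, -1 on failure. */
int matcher_init(matcher *m, const char *pattern)
{
    m->ready = (regcomp(&m->re, pattern, REG_NOSUB | REG_EXTENDED) == 0);
    return m->ready ? 0 : -1;
}

/* Returns 1 if the string matches, 0 otherwise. */
int matcher_test(const matcher *m, const char *string)
{
    return m->ready && regexec(&m->re, string, 0, NULL, 0) == 0;
}

/* Release the compiled pattern. */
void matcher_free(matcher *m)
{
    if (m->ready) {
        regfree(&m->re);
        m->ready = 0;
    }
}
```

A caller would run matcher_init(&m, "^[a-zA-Z0-9]+$") once, call matcher_test(&m, "89dree01") inside the if condition as often as needed, and finish with matcher_free(&m).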

Generate Email address Regex [duplicate]

Today I'm trying to build a regex that matches email addresses.
I've made one, but it doesn't work in all the cases I want.
I would like a regex that matches every email address ending with either exactly 2 characters after the dot or .com.
I hope that is clear enough:
aaaaaa#bbbb.uk --> should work
aaaaaa#bbbb.com --> should work
aaaaaa#bbbb.cc --> should work
aaaaaa#bbbb.ukk --> should not work
aaaaaa#bbbb. --> should not work
this is my code:
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int main (void)
{
int match;
int err;
regex_t preg;
regmatch_t pmatch[5];
size_t nmatch = 5;
const char *str_request = "1aaaak#aaaa.ukk";
const char *str_regex = "[a-zA-Z0-9][a-zA-Z0-9_.]+#[a-zA-Z0-9_]+.[a-zA-Z0-9_.]+[a-zA-Z0-9]{2}";
err = regcomp(&preg, str_regex, REG_EXTENDED);
if (err == 0)
{
match = regexec(&preg, str_request, nmatch, pmatch, 0);
nmatch = preg.re_nsub;
regfree(&preg);
if (match == 0)
{
printf ("match\n");
int start = pmatch[0].rm_so;
int end = pmatch[0].rm_eo;
printf("%d - %d\n", start, end);
}
else if (match == REG_NOMATCH)
{
printf("unmatch\n");
}
}
puts ("\nPress any key\n");
getchar ();
return (EXIT_SUCCESS);
}
"[a-zA-Z0-9][a-zA-Z0-9_.]+#[a-zA-Z0-9_]+\\.(com|[a-zA-Z]{2})$"
https://regex101.com/ is a very good tool for testing this kind of pattern.
\. means a literal dot (written \\. inside a C string literal);
(|) means an alternative;
$ means the end of the line, as we do not want any trailing characters after the match.
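The suggested pattern can be checked against the examples from the question with a small POSIX regex.h helper. This is a sketch only: address_ok is a hypothetical name, and the pattern is copied as given above (with # as the separator, as in the question).

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the address matches the suggested
 * pattern, 0 if it does not, -1 if the pattern fails to compile. */
int address_ok(const char *address)
{
    static const char *pattern =
        "[a-zA-Z0-9][a-zA-Z0-9_.]+#[a-zA-Z0-9_]+\\.(com|[a-zA-Z]{2})$";
    regex_t preg;
    int rc;

    /* REG_EXTENDED is needed for +, {2} and the (com|...) alternative */
    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&preg, address, 0, NULL, 0);
    regfree(&preg);
    return rc == 0 ? 1 : 0;
}
```

With this helper, the .uk, .com and .cc addresses from the question are accepted, while "aaaaaa#bbbb.ukk" and "aaaaaa#bbbb." are rejected.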

regular expressions in C match and print

I have lines from file like this:
{123} {12.3.2015 moday} {THIS IS A TEST}
Is it possible to get every value between the braces {} and insert it into an array?
I would also like to know if there is some other solution for this problem.
The result should look like this:
array( 123,
'12.3.2015 moday',
'THIS IS A TEST'
)
My try:
int r;
regex_t reg;
regmatch_t match[2];
char *line = "{123} {12.3.2015 moday} {THIS IS A TEST}";
regcomp(&reg, "[{](.*?)*[}]", REG_ICASE | REG_EXTENDED);
r = regexec(&reg, line, 2, match, 0);
if (r == 0) {
printf("Match!\n");
printf("0: [%.*s]\n", match[0].rm_eo - match[0].rm_so, line + match[0].rm_so);
printf("1: %.*s\n", match[1].rm_eo - match[1].rm_so, line + match[1].rm_so);
} else {
printf("NO match!\n");
}
This results in:
123} {12.3.2015 moday} {THIS IS A TEST
Anyone know how to improve this?
The regex101 website is really useful for testing patterns like this.
I suggest you use this regex:
/(?<=\{).*?(?=\})/g
Or any of these ones:
/\{\K.*?(?=\})/g
/\{\K[^\}]+/g
/\{(.*?)\}/g
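The first one is also available here:
https://regex101.com/r/bB6sE8/1
In C with POSIX regex.h, lookarounds, \K and lazy quantifiers are not available, so use a capture group with a negated bracket expression instead. You could start with this example: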
#include <stdio.h>
#include <string.h>
#include <regex.h>
int main ()
{
char * source = "{123} {12.3.2015 moday} {THIS IS A TEST}";
char * regexString = "[{]([^}]*)[}]"; /* braces in brackets: an unescaped leading { is not portable in ERE */
size_t maxGroups = 10;
regex_t regexCompiled;
regmatch_t groupArray[10];
unsigned int m;
char * cursor;
if (regcomp(&regexCompiled, regexString, REG_EXTENDED))
{
printf("Could not compile regular expression.\n");
return 1;
};
cursor = source;
while (!regexec(&regexCompiled, cursor, 10, groupArray, 0))
{
unsigned int offset = 0;
if (groupArray[1].rm_so == -1)
break; // No more groups
offset = groupArray[1].rm_eo;
char cursorCopy[strlen(cursor) + 1];
strcpy(cursorCopy, cursor);
cursorCopy[groupArray[1].rm_eo] = 0;
printf("%s\n", cursorCopy + groupArray[1].rm_so);
cursor += offset;
}
regfree(&regexCompiled);
return 0;
}
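Since the question asks for the values in an array, the same loop can copy each captured group into a caller-supplied buffer instead of printing it. A minimal sketch, assuming a hypothetical function extract_braced and a fixed field width FIELD_MAX (longer values are truncated):

```c
#include <regex.h>
#include <string.h>

#define FIELD_MAX 64   /* assumed maximum field length, including the NUL */

/* Hypothetical helper: copies the text of each {...} group in line into
 * out[0], out[1], ... and returns how many fields were stored. */
size_t extract_braced(const char *line, char out[][FIELD_MAX], size_t max_fields)
{
    regex_t re;
    regmatch_t m[2];            /* m[0] = whole match, m[1] = inside braces */
    size_t count = 0;
    const char *cursor = line;

    if (regcomp(&re, "[{]([^}]*)[}]", REG_EXTENDED) != 0)
        return 0;

    while (count < max_fields && regexec(&re, cursor, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= FIELD_MAX)
            len = FIELD_MAX - 1;            /* truncate oversized fields */
        memcpy(out[count], cursor + m[1].rm_so, len);
        out[count][len] = '\0';
        count++;
        cursor += m[0].rm_eo;               /* resume after the closing brace */
    }
    regfree(&re);
    return count;
}
```

Called with the line from the question and a char fields[8][FIELD_MAX] buffer, it stores "123", "12.3.2015 moday" and "THIS IS A TEST" and returns 3.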

How do you capture a group with regex?

I'm trying to extract a string from another using regex.
I'm using the POSIX regex functions (regcomp, regexec ...), and I fail at capturing a group ...
For instance, let the pattern be something as simple as "MAIL FROM:<(.*)>"
(with REG_EXTENDED cflags)
I want to capture everything between '<' and '>'
My problem is that regmatch_t gives me the boundaries of the whole pattern (MAIL FROM:<...>) instead of just what's between the parenthesis ...
What am I missing ?
Thanks in advance,
edit: some code
#define SENDER_REGEX "MAIL FROM:<(.*)>"
int main(int ac, char **av)
{
regex_t regex;
int status;
regmatch_t pmatch[1];
if (regcomp(&regex, SENDER_REGEX, REG_ICASE|REG_EXTENDED) != 0)
printf("regcomp error\n");
status = regexec(&regex, av[1], 1, pmatch, 0);
regfree(&regex);
if (!status)
printf( "matched from %d (%c) to %d (%c)\n"
, pmatch[0].rm_so
, av[1][pmatch[0].rm_so]
, pmatch[0].rm_eo
, av[1][pmatch[0].rm_eo]
);
return (0);
}
outputs:
$./a.out "012345MAIL FROM:<abcd>$"
matched from 6 (M) to 22 ($)
solution:
as RarrRarrRarr said, the indices are indeed in pmatch[1].rm_so and pmatch[1].rm_eo
hence regmatch_t pmatch[1]; becomes regmatch_t pmatch[2];
and regexec(&regex, av[1], 1, pmatch, 0); becomes regexec(&regex, av[1], 2, pmatch, 0);
Thanks :)
Here's a code example that demonstrates capturing multiple groups.
You can see that group '0' is the whole match, and subsequent groups are the parts within parentheses.
Note that this will only capture the first match in the source string. Here's a version that captures multiple groups in multiple matches.
#include <stdio.h>
#include <string.h>
#include <regex.h>
int main ()
{
char * source = "___ abc123def ___ ghi456 ___";
char * regexString = "[a-z]*([0-9]+)([a-z]*)";
size_t maxGroups = 3;
regex_t regexCompiled;
regmatch_t groupArray[maxGroups];
if (regcomp(&regexCompiled, regexString, REG_EXTENDED))
{
printf("Could not compile regular expression.\n");
return 1;
};
if (regexec(&regexCompiled, source, maxGroups, groupArray, 0) == 0)
{
unsigned int g = 0;
for (g = 0; g < maxGroups; g++)
{
if (groupArray[g].rm_so == -1)
break; // No more groups
char sourceCopy[strlen(source) + 1];
strcpy(sourceCopy, source);
sourceCopy[groupArray[g].rm_eo] = 0;
/* regoff_t is a signed type, so cast to int to match %d */
printf("Group %u: [%2d-%2d]: %s\n",
g, (int)groupArray[g].rm_so, (int)groupArray[g].rm_eo,
sourceCopy + groupArray[g].rm_so);
}
}
regfree(&regexCompiled);
return 0;
}
Output:
Group 0: [ 4-13]: abc123def
Group 1: [ 7-10]: 123
Group 2: [10-13]: def
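To visit every match rather than only the first, one common approach is to call regexec() repeatedly and move a cursor past each match. The sketch below is one way to do that, not the linked version: print_all_matches is a hypothetical name, the group limit of 3 is an assumption, and REG_NOTBOL is passed on later searches so that ^ does not match at the moved cursor.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints every group of every match and returns the
 * number of matches found, or -1 if the pattern does not compile. */
int print_all_matches(const char *pattern, const char *source)
{
    enum { MAX_GROUPS = 3 };
    regex_t re;
    regmatch_t groups[MAX_GROUPS];
    const char *cursor = source;
    int matches = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* After the first search the cursor is no longer at the true start of
     * the string, so REG_NOTBOL keeps ^ from matching there. */
    while (regexec(&re, cursor, MAX_GROUPS, groups,
                   cursor == source ? 0 : REG_NOTBOL) == 0) {
        for (int g = 0; g < MAX_GROUPS && groups[g].rm_so != -1; g++) {
            printf("Match %d, group %d: %.*s\n", matches, g,
                   (int)(groups[g].rm_eo - groups[g].rm_so),
                   cursor + groups[g].rm_so);
        }
        matches++;
        if (groups[0].rm_eo > groups[0].rm_so)
            cursor += groups[0].rm_eo;          /* continue after the match */
        else if (cursor[groups[0].rm_eo] != '\0')
            cursor += groups[0].rm_eo + 1;      /* empty match: skip one char */
        else
            break;                              /* empty match at end of string */
    }
    regfree(&re);
    return matches;
}
```

For the source string and pattern used above, this finds two matches, "abc123def" and "ghi456".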
The 0th element of the pmatch array of regmatch_t structs will contain the boundaries of the whole string matched, as you have noticed. In your example, you are interested in the regmatch_t at index 1, not at index 0, in order to get information about the string matched by the subexpression.
If you need more help, try editing your question to include an actual small code sample so that people can more easily spot the problem.
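For the MAIL FROM pattern from the question, reading index 1 could look like the sketch below. The function name extract_sender and its return convention are assumptions made for illustration.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the text between < and > into out.
 * Returns 0 on success, -1 on no match, compile error, or if out is
 * too small to hold the captured text. */
int extract_sender(const char *line, char *out, size_t out_size)
{
    regex_t regex;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first group */
    int rc = -1;

    if (regcomp(&regex, "MAIL FROM:<(.*)>", REG_ICASE | REG_EXTENDED) != 0)
        return -1;
    /* nmatch is 2 so that regexec() also fills in pmatch[1] */
    if (regexec(&regex, line, 2, pmatch, 0) == 0) {
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len < out_size) {
            memcpy(out, line + pmatch[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&regex);
    return rc;
}
```

For the input "012345MAIL FROM:<abcd>$" from the question, the buffer receives "abcd".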
