I am trying to build a regular expression with the regex.h library.
I checked my expression at https://regex101.com/ with the input
"00001206 ffffff00 00200800 00001044" and in Python as well; both gave the expected result.
When I ran the code below in C (on Unix), it printed "No match".
Does anyone have a suggestion?
regex_t regex;
int reti;
reti = regcomp(&regex, "([0-9a-fA-F]{8}( |$))+$", 0);
if (reti)
{
    fprintf(stderr, "Could not compile regex\n");
    exit(1);
}
reti = regexec(&regex, "00001206 ffffff00 00200800 00001044", 0, NULL, 0);
if (!reti)
{
    printf("Match");
}
else if (reti == REG_NOMATCH) {
    printf("No match bla bla\n");
}
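Your pattern uses capturing groups (...), the alternation operator |, the + quantifier and the interval quantifier {8} without backslashes. These are Extended Regular Expression (ERE) syntax; under the default Basic Regular Expression (BRE) syntax they are treated as literal characters, so you need to pass REG_EXTENDED to regcomp: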
regex_t regex;
int reti;
reti = regcomp(&regex, "([0-9a-fA-F]{8}( |$))+$", REG_EXTENDED); // <-- See here
if (reti)
{
    fprintf(stderr, "Could not compile regex\n");
    exit(1);
}
reti = regexec(&regex, "00001206 ffffff00 00200800 00001044", 0, NULL, 0);
if (!reti)
{
    printf("Match");
}
else if (reti == REG_NOMATCH) {
    printf("No match bla bla\n");
}
See the online C demo printing Match.
However, I believe you need to match the entire string, and disallow whitespace at the end, so probably
reti = regcomp(&regex, "^[0-9a-fA-F]{8}( [0-9a-fA-F]{8})*$", REG_EXTENDED);
will be more precise as it will not allow any arbitrary text in front and won't allow a trailing space.
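As a minimal sketch of the stricter pattern above, the check can be wrapped in a small function. The name is_hex_word_list is hypothetical (it does not appear in the original post), and REG_NOSUB is added on the assumption that only a yes/no answer is needed.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): returns 1 if `text`
 * consists solely of 8-digit hex words separated by single spaces,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int is_hex_word_list(const char *text)
{
    regex_t regex;
    int rc;

    /* REG_EXTENDED enables unescaped (...), {8} and *.
     * REG_NOSUB skips capture bookkeeping; only match/no match is reported. */
    if (regcomp(&regex, "^[0-9a-fA-F]{8}( [0-9a-fA-F]{8})*$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, text, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```

With this helper, the input from the question is accepted, while the same input with a trailing space or with extra text in front is rejected.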
Related
I've tested the following regular expression at http://www.regexpal.com/
([A-Z]{1}[a-z]+)+([_]{1}([A-Z]{1}[a-z]+)+)+[.][a-z]+
Which successfully matches the following file names:
Butter_Butter.jpg
JavaPiebald_Java_Piebald.jpg
LowWhitePied_Pied.jpg
Piebald_Piebald.jpg
SpinnerBlast_Spider_Pinstripe_Pastel.jpg
Caramel_Caramel.jpg
LightningPied_Pied_Axanthic.jpg
Pastel_Pastel.jpg
Spider_Spider.jpg
Spinner_Spider_Pinstripe.jpg
When I implement the regular expression in the following C code, I receive no matches:
#define COLLECTION_REGEX "([A-Z]{1}[a-z]+)+([_]{1}([A-Z]{1}[a-z]+)+)+[.][a-z]+"
int is_valid_filename(char *filename)
{
    regex_t regex;
    int i, match;
    char msgbuf[100];
    match = 1;
    i = regcomp(&regex, COLLECTION_REGEX, 0);
    if (i)
    {
        perror("Could not compile regex");
    }
    else
    {
        match = regexec(&regex, filename, 0, NULL, 0);
        if (!match)
        {
            puts("Match");
        }
        else if (match == REG_NOMATCH)
        {
            puts("No match");
        }
        else
        {
            regerror(match, &regex, msgbuf, sizeof(msgbuf));
            puts(msgbuf);
        }
    }
    regfree(&regex);
    return match;
}
Subsequent execution:
./a.out
No match
No match
No match
No match
No match
No match
No match
No match
No match
No match
The regular expression appears correct, I am uncertain as to why I am obtaining these results.
Output from GDB:
Breakpoint 1, is_valid_filename (filename=0x609050 "Piebald_Piebald.jpg") at crp-web-builder.c:76
76 match = regexec(&regex, filename, 0, NULL, 0);
(gdb) continue
Continuing.
No match
Breakpoint 1, is_valid_filename (filename=0x609460 "LightningPied_Pied_Axanthic.jpg") at crp-web-builder.c:76
76 match = regexec(&regex, filename, 0, NULL, 0);
(gdb) continue
Continuing.
No match
Breakpoint 1, is_valid_filename (filename=0x609870 "SpinnerBlast_Spider_Pinstripe_Pastel.jpg") at crp-web-builder.c:76
76 match = regexec(&regex, filename, 0, NULL, 0);
(gdb) continue
Continuing.
No match
Thanks to Jonathan Leffler. The line:
i = regcomp(&regex, COLLECTION_REGEX, 0);
Should be:
i = regcomp(&regex, COLLECTION_REGEX, REG_EXTENDED);
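For reference, here is a hedged sketch of the same check with REG_EXTENDED applied. The function name check_filename and its 1/0/-1 return convention are assumptions made for illustration. It also reports compile failures through regerror(), because regcomp() signals errors through its return value and perror() only reflects errno, and it calls regfree() only after a successful compile.

```c
#include <regex.h>
#include <stdio.h>

#define COLLECTION_REGEX \
    "([A-Z]{1}[a-z]+)+([_]{1}([A-Z]{1}[a-z]+)+)+[.][a-z]+"

/* Hypothetical variant of the question's function.
 * Returns 1 on match, 0 on no match, -1 on a compile or execution error. */
int check_filename(const char *filename)
{
    regex_t regex;
    char msgbuf[128];
    int rc;

    rc = regcomp(&regex, COLLECTION_REGEX, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        /* regerror() turns the regcomp() return code into readable text. */
        regerror(rc, &regex, msgbuf, sizeof msgbuf);
        fprintf(stderr, "Could not compile regex: %s\n", msgbuf);
        return -1;
    }

    rc = regexec(&regex, filename, 0, NULL, 0);
    if (rc != 0 && rc != REG_NOMATCH) {
        regerror(rc, &regex, msgbuf, sizeof msgbuf);
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
    }
    regfree(&regex);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Note that the pattern is not anchored, so it matches anywhere inside the file name; adding ^ and $ would be needed to validate the whole name.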
I am trying to match strings like 'sdb-iof-pool 1008.56M 884K' using this regular expression: ^(.*)([\s]+)([-+]?[0-9]*\.?[0-9]+)([K|M|G|T|P]{1})([\s]+)([-+]?[0-9]*\.?[0-9]+)([K|M|G|T|P]{1})(.*)$
My C code is the following:
int reti;
regex_t regex;
size_t maxGroups = 8;
regmatch_t groupArray[maxGroups];
const char * pattern = "^(.*)([\\s]+)([-+]?[0-9]*\\.?[0-9]+)([K|M|G|T|P]{1})([\\s]+)([-+]?[0-9]*\\.?[0-9]+)([K|M|G|T|P]{1})(.*)$";
reti = regcomp(&regex, pattern, REG_EXTENDED);
if (reti) {
    regerror(reti, &regex, log_buffer, IOF_MAX_MSG);
    snprintf(error, IOF_MAX_MSG, "%s: Failed to compile regex '%s': (%d) '%s'", __FUNCTION__, pattern, reti, log_buffer);
    return FAIL;
}
reti = regexec(&regex, cmd_output, maxGroups, groupArray, 0);
if (reti == REG_NOMATCH) {
    regerror(reti, &regex, log_buffer, IOF_MAX_MSG);
    regfree(&regex);
    snprintf(error, IOF_MAX_MSG, "Failed to match regex '%s' on '%s': %s", pattern, cmd_output, log_buffer);
    return FAIL;
}
regfree(&regex);
Even though tools like this seem to confirm that the regular expression works fine, my program returns:
"Failed to match regex '^(.*)([\s]+)([-+]?[0-9]*\.?[0-9]+)([K|M|G|T|P]{1})([\s]+)([-+]?[0-9]*\.?[0-9]+)([K|M|G|T|P]{1})(.*)$' on 'sdb-iof-pool 1008.56M 884K': No match"
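After some trial and error using grep with the -E option (extended regular expressions) and reading its manual, I suspected that the character class [\s] was the culprit. POSIX syntax does not recognize \s as a whitespace shorthand; inside a bracket expression the backslash is an ordinary character, so [\s] matches a backslash or the letter s. The POSIX character class [:space:] must be used instead, written inside a bracket expression as [[:space:]].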
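A minimal sketch of the fix, assuming a simplified version of the pattern: [\s] is replaced by [[:space:]], the unit classes are written as [KMGTP] (the | inside the original brackets is a literal character, not alternation), and the trailing (.*) group is dropped. The helper name match_pool_line and the POOL_GROUPS constant are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

#define POOL_GROUPS 6 /* whole match + 5 capture groups */

/* Hypothetical helper: on success returns 0 and fills `groups` with the
 * byte offsets of: [1] name, [2] first number, [3] first unit,
 * [4] second number, [5] second unit. Otherwise returns a nonzero
 * regcomp()/regexec() code. */
int match_pool_line(const char *line, regmatch_t groups[POOL_GROUPS])
{
    static const char *pattern =
        "^(.*)[[:space:]]+"
        "([-+]?[0-9]*\\.?[0-9]+)([KMGTP])[[:space:]]+"
        "([-+]?[0-9]*\\.?[0-9]+)([KMGTP])$";
    regex_t regex;
    int rc;

    rc = regcomp(&regex, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&regex, line, POOL_GROUPS, groups, 0);
    regfree(&regex);
    return rc;
}
```

For the line 'sdb-iof-pool 1008.56M 884K', groups[1] covers "sdb-iof-pool", groups[2] and groups[3] cover "1008.56" and "M", and groups[4] and groups[5] cover "884" and "K". Each substring can be copied out using rm_so and rm_eo as start and end offsets.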
Curly braces {} are not working in C regular expressions: the output is always "No match", even when I give correct input such as "ab" or "ac". Could you help with this case?
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
int main(int argc, char *argv[]){
    regex_t regex;
    int reti;
    char msgbuf[100];
    /* Compile regular expression */
    reti = regcomp(&regex, "[a-c]{2}", 0);
    if( reti ){ fprintf(stderr, "Could not compile regex\n"); return(1); }
    /* Execute regular expression */
    reti = regexec(&regex, "ab", 0, NULL, 0);
    if( !reti ){
        puts("Match");
    }
    else if( reti == REG_NOMATCH ){
        puts("No match");
    }
    else{
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        return 1;
    }
    /* Free compiled regular expression if you want to use the regex_t again */
    regfree(&regex);
    return 0;
}
You are using the Basic Regular Expressions dialect, which does not treat an unescaped {n} as a quantifier.
One solution is to pass REG_EXTENDED instead of 0 as the last argument when compiling your regex_t object.
reti = regcomp(&regex, "[a-c]{2}", REG_EXTENDED);
See http://ideone.com/oIBXxu for a Demo of your code with my modification.
As Casimir et Hippolyte notes in the comments, Basic Regular Expressions support the {} quantifier as well, but the curly braces must be escaped with a \ in the regex, which in turn has to be escaped in the C string as \\. So you can use the line
reti = regcomp(&regex, "[a-c]\\{2\\}", 0);
as an alternative to the solution above (running Demo with this line modified under http://ideone.com/x7vlIO).
You can check http://www.regular-expressions.info/posix.html for more information about the difference between Basic and Extended Regular Expressions.
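The two dialects can be compared side by side with a small wrapper. This is only a sketch; the helper name regex_matches and its return convention are assumptions, not part of regex.h.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compiles `pattern` with the given cflags and tests
 * it against `text`. Returns 1 on match, 0 on no match, -1 if the pattern
 * does not compile. */
int regex_matches(const char *pattern, const char *text, int cflags)
{
    regex_t regex;
    int rc;

    /* REG_NOSUB: only match/no match is needed, not capture offsets. */
    if (regcomp(&regex, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, text, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```

With this helper, regex_matches("[a-c]{2}", "ab", 0) reports no match because the braces are literal in the basic dialect, while both regex_matches("[a-c]{2}", "ab", REG_EXTENDED) and regex_matches("[a-c]\\{2\\}", "ab", 0) report a match.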
I am trying to write a program to find whether a given string is hex or not. The string must contain only characters in the ranges 0-9, A-F and a-f. How can I accomplish this in C?
The program I tried is given below, but the regex pattern is not working. What is wrong with this pattern?
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
int main(int argc, char *argv[]){
    regex_t regex;
    int reti;
    char msgbuf[100];
    /* Compile regular expression */
    reti = regcomp(&regex, "^[a-fA-F0-9]+$", 0);
    if( reti )
    {
        fprintf(stderr, "Could not compile regex\n");
        //exit(1);
    }
    /* Execute regular expression */
    reti = regexec(&regex, "ABC123defG", 0, NULL, 0);
    if( !reti ){
        puts("Match");
    }
    else if( reti == REG_NOMATCH ){
        puts("No match");
    }
    else{
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        //exit(1);
    }
    /* Free compiled regular expression if you want to use the regex_t again */
    regfree(&regex);
    return 0;
}
You need to specify REG_EXTENDED in the flags argument to regcomp. If you don't, you end up with "basic" regular expression syntax, which doesn't include the + operator, amongst other things.
It's slightly surprising that "basic" regular expressions still exist, never mind being the default. But that's backwards-compatibility for you.
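If a regex is not strictly required, the same check can be done without regex.h. The sketch below is an alternative technique, not what the answer above proposes: it uses isxdigit() from <ctype.h>, and the function name is_hex_string is hypothetical. It assumes an empty string should not count as hex, mirroring the + in the pattern.

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if `s` is non-empty and every character
 * is a hexadecimal digit (0-9, a-f, A-F), otherwise 0. */
int is_hex_string(const char *s)
{
    if (*s == '\0')
        return 0; /* assumption: an empty string is not a hex string */

    for (; *s != '\0'; s++) {
        /* The cast avoids undefined behavior for negative char values. */
        if (!isxdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

For the string from the question, "ABC123defG", this returns 0 because of the trailing G; "ABC123def" returns 1.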
I have this code for matching an IP address pattern. But it doesn't seem to work and I don't know why. It always prints on the terminal "No match"
regex_t regex;
int reti;
char msgbuf[100];
reti = regcomp(&regex, "^([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3})$", 0);
if (reti) {
    fprintf(stderr, "Could not compile regex\n");
    exit(1);
}
reti = regexec(&regex, "124.168.21.3", 0, NULL, 0);
if (!reti) {
    puts("Match");
} else if (reti == REG_NOMATCH) {
    puts("No match");
} else {
    regerror(reti, &regex, msgbuf, sizeof(msgbuf));
    fprintf(stderr, "Regex match failed: %s\n", msgbuf);
    exit(1);
}
regfree(&regex);
Any idea?
I found it: I should set the cflags argument of regcomp to REG_EXTENDED instead of 0.
You should escape the dots. And you probably don't need the capturing groups. Replace
"^([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3}).([0-9]{1,3})$"
with
"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}$"
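To see why escaping the dots matters, here is a small sketch that compiles both patterns as extended regular expressions. The helper name matches_ere is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the ERE `pattern`,
 * 0 if it does not, and -1 if the pattern does not compile. */
int matches_ere(const char *pattern, const char *text)
{
    regex_t regex;
    int rc;

    if (regcomp(&regex, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&regex, text, 0, NULL, 0);
    regfree(&regex);
    return rc == 0;
}
```

With the unescaped dots, a string such as "124x168y21z3" is accepted because . matches any character; with \\. it is rejected while "124.168.21.3" still matches. Neither pattern checks that each number is at most 255, so "999.999.999.999" is also accepted.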