regmatch_t how can i get match only? - c

I don't think I understand how to return only the matched text of a regular expression. I have a file that is a webpage, and I'm trying to get all the links in the page. The regex works fine, but if I printf the result it prints the whole line in which the match occurs. I want to display only the match. I see you can do grouping, so I tried that, and I am getting back an int value from my second printf call. According to the docs it is an offset. But an offset to what? It doesn't seem to be accurate either, because it would say 32 when character 32 on that line has nothing to do with the regex. I put in an exit just to see the first match. Where am I going wrong?
char line[1000];
FILE *fp_original;
fp_original = fopen (file_original_page, "r");
regex_t re_links;
regmatch_t group[2];
regcomp (&re_links, "(href|src)=[\"|'][^\"']*[\"|']", REG_EXTENDED);
while (fgets (line, sizeof line, fp_original) != NULL) {
if (regexec (&re_links, line, 2, group, 0) == 0) {
printf ("%s", line);
printf ("%u\n", line[group[1].rm_so]);
exit (1);
}
}
fclose (fp_original);

regmatch_t array
regmatch_t is the element type of the match array that you pass to regexec. If we pass 2 as nmatch, we obtain the whole match in pmatch[0] and the first submatch in pmatch[1].
For instance:
size_t nmatch = 2;
regmatch_t pmatch[2];
rc = regexec(&re_links, line, nmatch, pmatch, 0);
If this succeeded you can get the subexpression as follows:
printf("a matched substring \"%.*s\" is found at position %d to %d.\n",
(int)(pmatch[1].rm_eo - pmatch[1].rm_so), &line[pmatch[1].rm_so],
(int)pmatch[1].rm_so, (int)pmatch[1].rm_eo - 1);
Here is an example on how to apply the above:
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
regex_t preg;
char *string = "I'm a link to somewhere";
char *pattern = ".*\\(link\\).*";
size_t nmatch = 2;
regmatch_t pmatch[2];
if (regcomp(&preg, pattern, 0) != 0)
return EXIT_FAILURE;
/* pmatch is only filled in when regexec reports a match */
if (regexec(&preg, string, nmatch, pmatch, 0) == 0)
printf("a matched substring \"%.*s\" is found at position %d to %d.\n",
(int)(pmatch[1].rm_eo - pmatch[1].rm_so), &string[pmatch[1].rm_so],
(int)pmatch[1].rm_so, (int)pmatch[1].rm_eo - 1);
regfree(&preg);
return 0;
}
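The code above is certainly not safe; it serves only as an example. If you replace pmatch with your group it should work. Also don't forget to parenthesize the part of your regex you want to capture in your group --> \\(.*\\) (that is the basic-regex spelling used above; with REG_EXTENDED, as in your code, write plain parentheses).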
Edit
In order to avoid the warning by the compiler concerning the field precision, you can replace the whole printf part with this:
size_t len = pmatch[1].rm_eo - pmatch[1].rm_so;
char *result = malloc(len + 1); /* +1 for the terminating '\0' */
if (result != NULL) {
memcpy(result, &string[pmatch[1].rm_so], len); /* needs <string.h> */
result[len] = '\0';
printf("a matched substring \"%s\" is found at position %lld to %lld.\n",
result, (long long)pmatch[1].rm_so, (long long)pmatch[1].rm_eo - 1);
}
// later on ...
free(result);

The resulting match (your group) gives you a start index and an end index. You need to print just the characters between those two indices.
group[0] will be the entire regex match. The subsequent groups will be any captures you have in your regex.
/* re_nsub counts the capture groups, so group[] needs re_nsub + 1 elements */
for (size_t i = 0; i <= re_links.re_nsub; ++i) {
printf("match %zu from index %d to %d: ", i, (int)group[i].rm_so, (int)group[i].rm_eo);
for (int j = (int)group[i].rm_so; j < (int)group[i].rm_eo; ++j) {
printf("%c", line[j]);
}
printf("\n");
}

Related

Count number of matches using regex.h in C

I'm using the POSIX regular expressions regex.h in C to count the number of appearances of a phrase in an English-language text fragment.
But the return value of regexec(...) only tells if a match was found or not. So I tried to use the nmatch and matchptr to find distinct appearances, but when I printed out the matches from matchptr, I just received the first index of first phrase appear in my text.
Here is my code:
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#define MAX_MATCHES 20 //The maximum number of matches allowed in a single string
void match(regex_t *pexp, char *sz) {
regmatch_t matches[MAX_MATCHES];
if (regexec(pexp, sz, MAX_MATCHES, matches, 0) == 0) {
for(int i = 0; i < MAX_MATCHES; i++)
printf("\"%s\" matches characters %d - %d\n", sz, matches[i].rm_so, matches[i].rm_eo);
}
else {
printf("\"%s\" does not match\n", sz);
}
}
int main(int argc, char* argv[]) {
int rv;
regex_t exp;
rv = regcomp(&exp, "(the)", REG_EXTENDED | REG_ICASE);
if (rv != 0) {
printf("regcomp failed\n");
}
match(&exp, "the cat is in the bathroom.");
regfree(&exp);
return 0;
}
How can I make this code to report both of the two distinct matches of regular expression (the) in the string the cat is in the bathroom?
You've understood the meaning of pmatch incorrectly. It is not used for getting repeated pattern matches. It is used to get the location of the one match and its possible subgroups. As Linux manual for regcomp(3) says:
The offsets of the subexpression starting at the ith open
parenthesis are stored in pmatch[i]. The entire regular expression's match addresses are stored in
pmatch[0]. (Note that to return the offsets of N subexpression matches, nmatch must be at least N+1.)
Any unused structure elements will contain the value -1.
If you have the regular expression this (\w+) costs (\d+) USD, there are 2 capturing groups in parentheses (\w+) and (\d+); now if nmatch was set to at least 3, pmatch[0] would contain the start and end indices of the whole match, pmatch[1] start and end for the (\w+) group and pmatch[2] for the (\d+) group.
The following code should print the ranges of consecutive matches, if any, or the string "<the input string>" does not contain a match if the pattern never matches.
It is carefully constructed so that it works for a zero-length regular expression as well (an empty regular expression, or say regular expression #? will match at each character position including after the last character; 28 matches of that regular expression would be reported for input the cat is in the bathroom.)
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>
void match(regex_t *pexp, char *sz) {
// we just need the whole string match in this example
regmatch_t whole_match;
// we store the eflags in a variable, so that we can make
// ^ match the first time, but not for subsequent regexecs
int eflags = 0;
int match = 0;
size_t offset = 0;
size_t length = strlen(sz);
while (regexec(pexp, sz + offset, 1, &whole_match, eflags) == 0) {
// do not let ^ match again.
eflags = REG_NOTBOL;
match = 1;
printf("range %zu - %zu matches\n",
offset + (size_t)whole_match.rm_so,
offset + (size_t)whole_match.rm_eo);
// increase the starting offset
offset += whole_match.rm_eo;
// a match can be a zero-length match, we must not fail
// to advance the pointer, or we'd have an infinite loop!
if (whole_match.rm_so == whole_match.rm_eo) {
offset += 1;
}
// break the loop if we've consumed all characters. Note
// that we run once for terminating null, to let
// a zero-length match occur at the end of the string.
if (offset > length) {
break;
}
}
if (! match) {
printf("\"%s\" does not contain a match\n", sz);
}
}
int main(int argc, char* argv[]) {
int rv;
regex_t exp;
rv = regcomp(&exp, "(the)", REG_EXTENDED | REG_ICASE);
if (rv != 0) {
printf("regcomp failed\n");
}
match(&exp, "the cat is in the bathroom.");
regfree(&exp);
return 0;
}
P.S. The parentheses in your regex (the) are unnecessary in this case; you could just write the. Your initial confusion at getting 2 matches at the same position arose because you got one match for the whole expression and one submatch for the group (the). Without the parentheses, your code would have printed the location of the first match only once.

Search and replace within a file using PCRE in C

I want to parse a shell style key-value config file with C and replace values as needed.
An example file could look like
FOO="test"
SOME_KEY="some value here"
ANOTHER_KEY="here.we.go"
SOMETHING="0"
FOO_BAR_BAZ="2"
To find the value, I want to use regular expressions. I'm a beginner with the PCRE library so I created some code to test around. This application takes two arguments: the first one is the key to search for. The second one is the value to fill into the double quotes.
#include <pcre.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#define OVECCOUNT 30
int main(int argc, char **argv){
const char *error;
int erroffset;
pcre *re;
int rc;
int i;
int ovector[OVECCOUNT];
char regex[64];
sprintf(regex,"(?<=^%s=\\\").+(?<!\\\")", argv[1]);
char *str;
FILE *conf;
conf = fopen("test.conf", "rw");
fseek(conf, 0, SEEK_END);
int confSize = ftell(conf)+1;
rewind(conf);
str = malloc(confSize);
fread(str, 1, confSize, conf);
fclose(conf);
str[confSize-1] = '\n';
re = pcre_compile (
regex, /* the pattern */
PCRE_CASELESS | PCRE_MULTILINE, /* default options */
&error, /* for error message */
&erroffset, /* for error offset */
0); /* use default character tables */
if (!re) {
printf("pcre_compile failed (offset: %d), %s\n", erroffset, error);
return -1;
}
rc = pcre_exec (
re, /* the compiled pattern */
0, /* no extra data - pattern was not studied */
str, /* the string to match */
confSize, /* the length of the string */
0, /* start at offset 0 in the subject */
0, /* default options */
ovector, /* output vector for substring information */
OVECCOUNT); /* number of elements in the output vector */
if (rc < 0) {
switch (rc) {
case PCRE_ERROR_NOMATCH:
printf("String didn't match");
break;
default:
printf("Error while matching: %d\n", rc);
break;
}
free(re);
return -1;
}
for (i = 0; i < rc; i++) {
printf("========\nlength of vector: %d\nvector[0..1]: %d %d\nchars at start/end: %c %c\n", ovector[2*i+1] - ovector[2*i], ovector[0], ovector[1], str[ovector[0]], str[ovector[1]]);
printf("file content length is %d\n========\n", strlen(str));
}
int newContentLen = strlen(argv[2])+1;
char *newContent = calloc(newContentLen,1);
memcpy(newContent, argv[2], newContentLen);
char *before = malloc(ovector[0]);
memcpy(before, str, ovector[0]);
int afterLen = confSize-ovector[1];
char *after = malloc(afterLen);
memcpy(after, str+ovector[1],afterLen);
int newFileLen = newContentLen+ovector[0]+afterLen;
char *newFile = calloc(newFileLen,1);
sprintf(newFile,"%s%s%s", before,newContent, after);
printf("%s\n", newFile);
return 0;
}
This code is working in some cases but if I want to replace FOO or ANOTHER_KEY theres something fishy.
$ ./search_replace.out FOO baz
========
length of vector: 5
vector[0..1]: 5 10
chars at start/end: b "
file content length is 94
========
FOO="9#baz"
SOME_KEY="some value here"
ANOTHER_KEY="here.we.go"
SOMETHING="0"
FOO_BAR_BAZ="2"
$ ./search_replace.out ANOTHER_KEY insert
========
length of vector: 10
vector[0..1]: 52 62
chars at start/end: h "
file content length is 94
========
FOO="baaar"
SOME_KEY="some value here"
ANOTHER_KEY=")insert"
SOMETHING="0"
FOO_BAR_BAZ="2"
Now if I change the format of the input file slightly to
TEST="new inserted"
FOO="test"
SOME_KEY="some value here"
ANOTHER_KEY="here.we.go"
SOMETHING="0"
FOO_BAR_BAZ="2"
the code is working fine.
I don't get it why the code is behaves differently here.
The extra characters before the substituted text come from not properly null-terminating your before string. (Just as you hadn't null-terminated the whole buffer str, as Paul R has pointed out.) So:
char *before = malloc(ovector[0] + 1);
memcpy(before, str, ovector[0]);
before[ovector[0]] = '\0';
Anyway, the business of allocating substrings and copying the contents seems needlessly complicated and prone to errors. For example, do the somethingLen variables count the terminating null character or not? Sometimes they do, sometimes they don't. I'd recommend to pick one representation and use it consistently. (And you should really free all allocated buffers after no longer using them and probably also clean up the compiled regex.)
You could do the replacement with just one allocation for the target buffer by using the precision field of the %s format specifier on the "before" part:
int cutLen = ovector[1] - ovector[0];
int newFileLen = confSize + strlen(argv[2]) - cutLen;
char *newFile = malloc(newFileLen + 1);
snprintf(newFile, newFileLen + 1, "%.*s%s%s",
ovector[0], str, argv[2], str + ovector[1]);
Or you could just use fprintf to write to the target file if you don't need the temporary buffer.
You forgot to terminate str, so subsequently calling strlen(str) will give unpredictable results. Either change:
str = malloc(confSize);
fread(str, 1, confSize, conf);
to:
str = malloc(confSize + 1); // note: extra char for '\0' terminator
fread(str, 1, confSize, conf);
str[confSize] = '\0'; // terminate string!
and/or pass confSize instead of strlen(str) to pcre_exec.
Your string is allocated confSize bytes of memory. Let's say that confSize is 10 as an example.
str = malloc(confSize);
So valid indexes for your string are 0-9. But this line assigns '\n' to the 10th index, which is the 11th byte:
str[confSize] = '\n';
If you're wanting the last character to be '\n', it should be:
str[confSize - 1] = '\n';

Executing RE Using regex.h in C. Match & Count any numbers in String

I am trying to extract numbers from a file. I have limitation that I need to use only open(), read() and close().
I have read my data successfully and saved it in a buffer. Now I need to match it against a regular expression.
I am using RE = ^[0-9]*
This is my code for this
char buffer[1024] = { NULL };
int count = 0;
int fRead;
fRead = open("file.in", O_RDONLY);
read(fRead, buffer, sizeof(buffer));
printf("\nFile Opened %s, %lu", buffer, sizeof(buffer));
/* Compile regular expression */
regex_t re;
int reti = regcomp(&re, "^[1-9]*", 0);
if (reti) {
fprintf(stderr, "Could not compile regex\n");
exit(1);
}
/* Execute regular expression */
reti = regexec(&re, buffer, 0, NULL, 0);
if (!reti) {
printf("\nMatch: %d", reti);
} else if (reti == REG_NOMATCH) {
puts("No match");
} else {
char msgbuf[100];
regerror(reti, &re, msgbuf, sizeof(msgbuf));
fprintf(stderr, "Regex match failed: %s\n", msgbuf);
exit(1);
}
close(fRead);
Now the problem is I want to count and display the digits I found in my file.
For example my file may have text some thing 2 to 3 makes 5, in such case my out put must be
OUTPUT:
2,3,5
count = 3
Take a look at the man page for regexec. The return value of regexec is, as you are using it, 0 for success or a positive error code. However, the other parameters to regexec are how you get more information about matches.
For convenience, here's the definition of regexec:
int regexec(const regex_t *preg, const char *string, size_t nmatch,
regmatch_t pmatch[], int eflags);
The pmatch parameter is where the function puts its matches if it finds them, and the nmatch parameter tells the function how many elements the pmatch parameter has so it doesn't overflow. This works similarly to the "match" functions of other languages: the first index of pmatch will have the full regex match, while the following indexes will have subgroup matches. That means that you'll need to use a subgroup match to get the number out of the string, and then you'll need to loop over the string to find subsequent subgroup matches.
First, declare a regmatch_t array on the stack to hold the results. This just needs to be size 2 so you can store the full match in the 0 index and the subgroup match in the 1 index. You also need to change your regex so it matches the whole string until it gets to a number. We will pass the array through to the regexec function along with its size for nmatch.
Each time a match is found you will need to move the start of the string forward so that the next time you call regexec on it you will get the next number and not the same one.
First update the regex string.
/* if we use .* in the regex it will be greedy and match the last number, not the first.
We need to use a + instead of a * for the number so we know there is at least 1. */
int reti = regcomp(&re, "[^0-9]*([0-9]+)", REG_EXTENDED);
Then loop to find all the matches.
/* regmatch_t is defined in regex.h */
regmatch_t matches[2];
/* buffer is an array and cannot be reassigned, so walk it with a pointer */
char *cursor = buffer;
int start;
int end;
while (1) {
reti = regexec(&re, cursor, 2, matches, 0);
/* break if we didn't find a match; matches[] is only valid on success */
if (reti) break;
/* rm_so is the start index of the match */
start = (int)matches[1].rm_so;
/* rm_eo is the index just past the end of the match */
end = (int)matches[1].rm_eo;
/* print the substring that contains the match */
printf("%.*s, ", (end - start), (cursor + start));
/* increment the count of matches */
count = count + 1;
/* This is really important!
Move the start of the string forward so we don't keep matching the same number! */
cursor = cursor + end;
}
/* print the count */
printf("count = %d", count);

why regexec() in posix c always return the first match,how can it return all match positions only run once?

Now when I want to return all match positions in str, such as:
abcd123abcd123abcd
Suppose I want to get all "abcd", I must use regexec(),get the first position:0, 3, then I will use:
123abcd123abcd
as the new string to use regexec() again, and so on.
I read the manual about regexec(), it says:
int regexec(const regex_t *preg, const char *string, size_t nmatch,
regmatch_t pmatch[], int eflags);
nmatch and pmatch are used to provide information regarding the location of any
matches.
but why doesn't this work?
This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>
int main(int argc, char **argv)
{
int i = 0;
int res;
int len;
char result[BUFSIZ];
char err_buf[BUFSIZ];
char* src = argv[1];
const char* pattern = "\\<[^,;]+\\>";
regex_t preg;
regmatch_t pmatch[10];
if( (res = regcomp(&preg, pattern, REG_EXTENDED)) != 0)
{
regerror(res, &preg, err_buf, BUFSIZ);
printf("regcomp: %s\n", err_buf);
exit(res);
}
res = regexec(&preg, src, 10, pmatch, REG_NOTBOL);
//~ res = regexec(&preg, src, 10, pmatch, 0);
//~ res = regexec(&preg, src, 10, pmatch, REG_NOTEOL);
if(res == REG_NOMATCH)
{
printf("NO match\n");
exit(0);
}
for (i = 0; pmatch[i].rm_so != -1; i++)
{
len = pmatch[i].rm_eo - pmatch[i].rm_so;
memcpy(result, src + pmatch[i].rm_so, len);
result[len] = 0;
printf("num %d: '%s'\n", i, result);
}
regfree(&preg);
return 0;
}
./regex 'hello, world'
the output:
num 0: 'hello'
This is my expected output:
num 0: 'hello'
num 1: 'world'
regexec performs a regex match. Once a match has been found regexec will return zero (i.e. successful match). The parameter pmatch will contain information about that one match. The first array index (i.e. zero) will contain the entire match, subsequent array indices contain information about capture groups/sub-expressions.
To demonstrate:
const char* pattern = "(\\w+) (\\w+)";
matched on "hello world" will output:
num 0: 'hello world' - entire match
num 1: 'hello' - capture group 1
num 2: 'world' - capture group 2
In most regex environments the behaviour you seek could be had by using the global modifier: /g. regexec does not provide this modifier as a flag, nor does it support modifiers. You will therefore have to loop while regexec returns zero, each time starting just past the end of the previous match, to get all matches.
The global modifier is also not available using the PCRE library (famous regex C library). The PCRE man pages have this to say about it:
By calling pcre_exec() multiple times with appropriate arguments, you
can mimic Perl's /g option

C Regular Expressions: Extracting the Actual Matches

I am using regular expressions in C (using the "regex.h" library). After setting up the standard calls (and checks) for regcomp(...) and regexec(...), I can only manage to print the actual substrings that match my compiled regular expression.
Using regexec, according to the manual pages, means you store the substring matches in a structure known as "regmatch_t". The struct only contains rm_so and rm_eo to reference what I understand to be the addresses of the characters of the matched substring in memory, but my question is how can I use just these two offsets and two pointers to extract the actual substring and store it into an array (ideally a 2D array of strings)?
It works when you just print to standard out, but whenever you try to use the same setup but store it in a string/character array, it stores the entire string that was originally used to match against the expression.
Further, what is the "%.*s" inside the print statement? I imagine it's a regular expression in of itself to read in the pointers to a character array correctly. I just want to store the matched substrings inside a collection so I can work with them elsewhere in my software.
Background: p and p2 are both pointers set to point to the start of string to match before entering the while loop in the code below:
[EDIT: "matches" is a 2D array meant to ultimately store the substring matches and was preallocated/initalized before the main loop you see below]
int ind = 0;
while(1){
regExErr1 = regexec(&r, p, 10, m, 0);
//printf("Did match regular expr, value %i\n", regExErr1);
if( regExErr1 != 0 ){
fprintf(stderr, "No more matches with the inherent regular expression!\n");
break;
}
printf("What was found was: ");
int i = 0;
while(1){
if(m[i].rm_so == -1){
break;
}
int start = m[i].rm_so + (p - p2);
int finish = m[i].rm_eo + (p - p2);
strcpy(matches[ind], ("%.*s\n", (finish - start), p2 + start));
printf("Storing: %.*s", matches[ind]);
ind++;
printf("%.*s\n", (finish - start), p2 + start);
i++;
}
p += m[0].rm_eo; // this will move the pointer p to the end of last matched pattern and on to the start of a new one
}
printf("We have in [0]: %s\n", temp);
There are quite a lot of regular expression packages, but yours seems to match the one in POSIX: regcomp() etc.
The two structures it defines in <regex.h> are:
regex_t containing at least size_t re_nsub, the number of parenthesized subexpressions.
regmatch_t containing at least regoff_t rm_so, the byte offset from start of string to start of substring, and regoff_t rm_eo, the byte offset from start of string of the first character after the end of substring.
Note that 'offsets' are not pointers but indexes into the character array.
The execution function is:
int regexec(const regex_t *restrict preg, const char *restrict string,
size_t nmatch, regmatch_t pmatch[restrict], int eflags);
Your printing code should be:
for (int i = 0; i <= r.re_nsub; i++)
{
int start = m[i].rm_so;
int finish = m[i].rm_eo;
// strcpy(matches[ind], ("%.*s\n", (finish - start), p + start)); // Based on question
sprintf(matches[ind], "%.*s\n", (finish - start), p + start); // More plausible code
printf("Storing: %.*s\n", (finish - start), matches[ind]); // Print once
ind++;
printf("%.*s\n", (finish - start), p + start); // Why print twice?
}
Note that the code should be upgraded to ensure that the string copy (via sprintf()) does not overflow the target string — maybe by using snprintf() instead of sprintf(). It is also a good idea to mark the start and end of a string in the printing. For example:
printf("<<%.*s>>\n", (finish - start), p + start);
This makes it a whole heap easier to see spaces etc.
[In future, please attempt to provide an MCVE (Minimal, Complete, Verifiable Example) or SSCCE (Short, Self-Contained, Correct Example) so that people can help more easily.]
This is an SSCCE that I created, probably in response to another SO question in 2010. It is one of a number of programs I keep that I call 'vignettes'; little programs that show the essence of some feature (such as POSIX regexes, in this case). I find them useful as memory joggers.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <regex.h>
#define tofind "^DAEMONS=\\(([^)]*)\\)[ \t]*$"
int main(int argc, char **argv)
{
FILE *fp;
char line[1024];
int retval = 0;
regex_t re;
regmatch_t rm[2];
//this file has this line "DAEMONS=(sysklogd network sshd !netfs !crond)"
const char *filename = "/etc/rc.conf";
if (argc > 1)
filename = argv[1];
if (regcomp(&re, tofind, REG_EXTENDED) != 0)
{
fprintf(stderr, "Failed to compile regex '%s'\n", tofind);
return EXIT_FAILURE;
}
printf("Regex: %s\n", tofind);
printf("Number of captured expressions: %zu\n", re.re_nsub);
fp = fopen(filename, "r");
if (fp == 0)
{
fprintf(stderr, "Failed to open file %s (%d: %s)\n", filename, errno, strerror(errno));
return EXIT_FAILURE;
}
while ((fgets(line, 1024, fp)) != NULL)
{
line[strcspn(line, "\n")] = '\0';
if ((retval = regexec(&re, line, 2, rm, 0)) == 0)
{
printf("<<%s>>\n", line);
// Complete match
printf("Line: <<%.*s>>\n", (int)(rm[0].rm_eo - rm[0].rm_so), line + rm[0].rm_so);
// Match captured in (...) - the \( and \) match literal parenthesis
printf("Text: <<%.*s>>\n", (int)(rm[1].rm_eo - rm[1].rm_so), line + rm[1].rm_so);
char *src = line + rm[1].rm_so;
char *end = line + rm[1].rm_eo;
while (src < end)
{
size_t len = strcspn(src, " ");
if (src + len > end)
len = end - src;
printf("Name: <<%.*s>>\n", (int)len, src);
src += len;
src += strspn(src, " ");
}
}
}
fclose(fp);
regfree(&re);
return EXIT_SUCCESS;
}
This was designed to find a particular line starting DAEMONS= in a file /etc/rc.conf (but you can specify an alternative file name on the command line). You can adapt it to your purposes easily enough.
Since g++ regex is bugged until who knows when, you can use my code instead (License: AGPL, no warranty, your own risk, ...)
/**
* regexp (License: AGPL3 or higher)
* @param re extended POSIX regular expression
* @param nmatch maximum number of matches
* @param str string to match
* @return An array of nmatch + 1 char pointers, or NULL on failure. You have to free() the first element (string storage) and then the array itself. The second element is the string matching the full regex, then come the submatches.
* Note: the substrings are NUL-terminated in place inside one shared copy of str, so a submatch that ends inside an enclosing match truncates the enclosing string.
*/
char **regexp(char *re, int nmatch, char *str) {
char **result;
char *string;
regex_t regex;
regmatch_t *match;
int i;
match=malloc(nmatch*sizeof(*match));
if (!match) {
fprintf(stderr, "Out of memory !");
return NULL;
}
if (regcomp(&regex, re, REG_EXTENDED)!=0) {
fprintf(stderr, "Failed to compile regex '%s'\n", re);
free(match);
return NULL;
}
string=strdup(str);
if (!string || regexec(&regex,string,nmatch,match,0)) {
#ifdef DEBUG
fprintf(stderr, "String '%s' does not match regex '%s'\n",str,re);
#endif
free(string);
free(match);
regfree(&regex);
return NULL;
}
/* one extra slot: result[0] holds the storage pointer */
result=malloc((nmatch+1)*sizeof(*result));
if (!result) {
fprintf(stderr, "Out of memory !");
free(string);
free(match);
regfree(&regex);
return NULL;
}
for (i=0; i<nmatch; ++i) {
if (match[i].rm_so>=0) {
string[match[i].rm_eo]=0;
result[i+1]=string+match[i].rm_so;
#ifdef DEBUG
printf("%s\n",string+match[i].rm_so);
#endif
} else {
result[i+1]="";
}
}
result[0]=string;
free(match);
regfree(&regex);
return result;
}

Resources