C regular expression to validate hostname, right expression but wrong output [duplicate] - c

I'd like to match all strings that begin with a set of characters a-z, then exactly one : and another set of characters a-z right after that.
So as an example, the string "an:example" would be a correct match.
And another example, "another:ex:ample" needs to be a mismatch.
I have tried to set it up like that, but it matches everything, even when I give it a bad string as input :(
My regular expression is "[a-z]:[a-z]", but it evaluates the string "1an:example" as a match :/
How can I do this correctly?
#include <stdio.h>
#include <regex.h>
int main() {
    regex_t regex;
    int retis;
    char* str = "1an:example";
    retis = regcomp(&regex, "[a-z]:[a-z]", 0);
    retis = regexec(&regex, str, 0, NULL, 0);
    if(!retis) {
        puts("Match");
    }
    else if(retis == REG_NOMATCH) {
        puts("No match");
    }
    regfree(&regex);
    return 0;
}

You need
retis = regcomp(&regex, "^[a-z]+:[a-z]+$", REG_EXTENDED);
See the C online demo.
That is:
^ (start of string) and $ (end of string) are anchors that require the regex to match the whole string
[a-z]+ matches one or more lowercase letters
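REG_EXTENDED enables extended regex syntax. With regex.h it is required here for the + quantifier, which is an ordinary character in a basic regular expression (the ^ and $ anchors work in both syntaxes).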

Related

In C, How to get capturing group RegEx?

This is the C function I am having problems with:
char get_access_token(char *client_credentials)
{
    regex_t regex;
    int reti;
    char msgbuf[100];
    reti = regcomp(&regex, "\\\"access_token\\\".\\\"(.*?)\\\"", 0);
    regmatch_t pmatch[1];
    if (reti) {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }
    reti = regexec(&regex, client_credentials, 1, pmatch, 0);
    if (!reti) {
        puts("Match");
    } else if (reti == REG_NOMATCH) {
        puts("No match");
    } else {
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        exit(1);
    }
    return (char) "";
}
The string that I'm trying to parse is a JSON string. I don't care about the actual structure; I only care about the access token.
It should look like this:
{"access_token": "blablablabal"}
I want my function to return just "blablablabal".
The RegEx that I'm trying to use is this one:
\"access_token"."(.*?)"
but I can't find the token in the variable pmatch. I only find two numbers in that array, and I don't really know what those numbers mean.
What am I doing wrong?
P.S. I'm a C noob, I'm just learning.
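There are several problems. You have a typo in your regex, and you're trying to use extended regex features with a basic POSIX regex.
First the typo. The . between the key and the value matches exactly one character, but the input has two characters there (a colon and a space):
reti = regcomp(&regex, "\\\"access_token\\\".\\\"(.*?)\\\"", 0);
That should be: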
reti = regcomp(&regex, "\\\"access_token\\\": \\\"(.*?)\\\"", 0);
Next, quotes don't need to be escaped in a regex, only in the C string literal. Dropping the extra backslashes makes it easier to read.
reti = regcomp(&regex, "\"access_token\": \"(.*?)\"", 0);
This still doesn't work because it uses features that basic POSIX regexes do not have. Capture groups must be escaped, as \( and \), in a basic POSIX regex; REG_EXTENDED lets you write them unescaped. The *? non-greedy operator is a non-POSIX enhancement borrowed from Perl. On macOS you get it with REG_ENHANCED, but that flag is an Apple extension that glibc, for example, does not define, so the following call is not portable.
reti = regcomp(&regex, "\"access_token\": \"(.*?)\"", REG_ENHANCED|REG_EXTENDED);
But don't try to parse JSON with a regex for all the same reasons we don't parse HTML with a regex. Use a JSON library such as json-glib.
Your pmatch array must have at least two elements. Group 0 is the match of the whole regexp (as if the entire expression were wrapped in a pair of parentheses), and you want group 1, so pmatch[1] will be filled with the information for the first subexpression group.
If you look in the doc, each pmatch element has two fields: rm_so, the index in the original buffer where the group's match begins, and rm_eo, the index one past where it ends. As with pmatch[0], they tell you where the regular (sub)expression begins and ends, respectively.
Once you know that they are valid (see doc), you can print the matched elements with:
#define SIZEOF(arr) (sizeof (arr) / sizeof (arr)[0])
...
regmatch_t pmatch[2]; /* for global regexp and group 1 */
...
/* you don't need to escape " chars, they are not special for regcomp,
 * they are, however, for C, so only one \ must be used.
 * REG_EXTENDED makes the unescaped ( ) a capture group. */
reti = regcomp(&regex, "\"access_token\": *\"([^\"]*)\"", REG_EXTENDED);
...
reti = regexec(&regex, client_credentials, SIZEOF(pmatch), pmatch, 0);
/* re_nsub counts only the parenthesized groups, so use <= to include group 0 */
for (i = 0; reti == 0 && i <= regex.re_nsub && i < SIZEOF(pmatch); i++) {
    const char *p = client_credentials + pmatch[i].rm_so; /* p points to beginning of match */
    int l = (int)(pmatch[i].rm_eo - pmatch[i].rm_so); /* match length */
    printf("Group #%d: %.*s\n", (int)i, l, p);
}
My apologies for submitting a snippet of code instead of a complete and verifiable example, but as you didn't provide one in the question (so we could not test your sample code), I won't provide one in the answer. The code is therefore untested and may contain errors on my side. Beware of this.
Testing a sample response takes time, and more so if we first have to make your sample code testable at all. (This is a complaint about beginners, and some non-beginners, not posting a Minimal, Complete, and Verifiable example.)

Regex in C For Matching

I need to make a regex that can match any alphanumeric string of a length < 99 enclosed by two #. The first character after the '#' can also be '_' which I'm not sure how to account for.
Ex. #U001# would be valid. #_A111# would also be valid. However, #____ABC# would not be valid, and neither would #ABC.
I'm relatively new to regex and noticed that the \z is an unrecognized escape sequence. I'm trying to write it in C11 if that matters.
#include <regex.h>
regex_t regex;
int reti;
char msgbuf[100];
/* Compile regular expression */
reti = regcomp(&regex, "^#[[:alnum:]]#\z", 0);
if (reti) {
    fprintf(stderr, "Could not compile regex\n");
    exit(1);
}
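Try using the following pattern:
^#[_[:alnum:]][[:alnum:]]{0,97}#$
Here is a brief explanation of the pattern:
^ from the start of the string
# match #
[_[:alnum:]] match an underscore or an alphanumeric character
[[:alnum:]]{0,97} then match zero to 97 alphanumeric characters
# match #
$ up to the end of the string
Code:
reti = regcomp(&regex, "^#[_[:alnum:]][[:alnum:]]{0,97}#$", REG_EXTENDED);
Note that [:alnum:] is only a character class inside a bracket expression, hence [[:alnum:]], and that the {0,97} interval needs REG_EXTENDED (in a basic regex it would be written \{0,97\}).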

pattern matching / extracting in c using regex.h

I need help extracting a substring from a string using regex.h in C.
In this example, I am trying to extract all occurrences of character 'e' from a string 'telephone'. Unfortunately, I get stuck identifying the offsets of those characters. I am listing code below:
#include <stdio.h>
#include <regex.h>
int main(void) {
    const int size=10;
    regex_t regex;
    regmatch_t matchStruct[size];
    char pattern[] = "(e)";
    char str[] = "telephone";
    int failure = regcomp(&regex, pattern, REG_EXTENDED);
    if (failure) {
        printf("Cannot compile");
    }
    int matchFailure = regexec(&regex, pattern, size, matchStruct, 0);
    if (!matchFailure) {
        printf("\nMatch!!");
    } else {
        printf("NO Match!!");
    }
    return 0;
}
So per GNU's manual, I should get all of the occurrences of 'e' when a character is parenthesized. However, I always get only the first occurrence.
Essentially, I want to be able to see something like:
matchStruct[1].rm_so = 1;
matchStruct[1].rm_eo = 2;
matchStruct[2].rm_so = 3;
matchStruct[2].rm_eo = 4;
matchStruct[3].rm_so = 8;
matchStruct[3].rm_eo = 9;
or something along these lines. Any advice?
Please note that you are in fact not matching your compiled regex against str ("telephone") but against your plain-text pattern. Check the second argument to regexec. Also, a parenthesized group does not make regexec return every occurrence: each call reports only the first match, so you have to call it again starting after the previous match. With that fixed, see for instance "regex in C language using functions regcomp and regexec toggles between first and second match", where the answer to your question is already given.

parsing/matching string occurrence in C

I have the following string:
const char *str = "\"This is just some random text\" 130 28194 \"Some other string\" \"String 3\"";
I would like to get the integer 28194. Of course the integer varies, so I can't do strstr("28194").
So I was wondering what would be a good way to get that part of the string?
I was thinking of using #include <regex.h>, for which I already have a procedure to match regexps, but I'm not sure what the regexp would look like in C using the POSIX-style notation, e.g. [:alpha:]+[:digit:], and whether performance will be an issue. Or would it be better to use strchr/strstr?
Any ideas will be appreciated.
If you want to use regex, you can use:
const char *str = "\"This is just some random text\" 130 28194 \"Some other string\" \"String 3\"";
regex_t re;
regmatch_t matches[2];
int comp_ret = regcomp(&re, "([[:digit:]]+) \"", REG_EXTENDED);
if(comp_ret)
{
    // Error occurred. See regex.h
}
if(!regexec(&re, str, 2, matches, 0))
{
    long long result = strtoll(str + matches[1].rm_so, NULL, 10);
    printf("%lld\n", result);
}
else
{
    // Didn't match
}
regfree(&re);
You're correct that there are other approaches.
EDIT: Changed to use non-optional repetition and show more error checking.
