Regular expression matching using regcomp() and regexec() functions in C

I am not familiar with using the regex library in C. Currently I am trying to use the regexec() and regcomp() functions to search for a string that matches my pattern or regular expression, but I can't get my matched string. Am I missing something in my code, or am I using the functions incorrectly?
My sample code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <regex.h>
int main(int argc, char ** argv)
{
    regex_t r;
    const char * my_regex = "(\\d+.\\d+.\\d+.\\d+)";
    const char * my_string = "Am trying to match any ip like, 23.54.67.89 , in this string and 123.232.123.33 is possible";
    const int no_of_matches = 10;
    regmatch_t m[no_of_matches];
    printf ("Trying to match '%s' in '%s'\n", my_regex, my_string);
    int status = regcomp (&r, my_regex, REG_EXTENDED|REG_NEWLINE);
    printf("status: %d\n",status);
    if(status!=0)
    {
        printf ("Regex error compiling \n");
    }
    int match_size = regexec (&r, my_string, no_of_matches, m, 0);
    printf("Number of Matches : %d\n",match_size);
    int i = 0;
    for (i = 0; i < match_size; i++)
    {
        // Now I want to print all matches here
        int start = m[i].rm_so;
        int finish = m[i].rm_eo;
        printf("%.*s\n", (finish - start), my_string + start);
    }
    regfree (& r);
    return 0;
}
Here is the problem: I can't print my matches. Any suggestions? I am on Linux.
I have edited my for loop; now it prints:
Trying to match '(\d+.\d+.\d+.\d+)' in 'Am trying to match any ip like, 23.54.67.89 , in this string and 123.232.123.33 is possible'
status: 0
Number of Matches : 1
m trying to match any ip like, 23.54.67.89 , in this string and 123.232.123.33 is possible
But I am expecting my output to be:
Trying to match '(\d+.\d+.\d+.\d+)' in 'Am trying to match any ip like, 23.54.67.89 , in this string and 123.232.123.33 is possible'
status: 0
Number of Matches : 2
23.54.67.89
123.232.123.33

Your regular expression is not a POSIX regular expression. You're using the Perl/Tcl/Vim flavour, which won't work the way you hope.
regcomp() and regexec() implement POSIX regular expressions, and as such are part of POSIX-compliant (or just POSIX-y) C libraries. They are not just part of some third-party regex library; they are standard POSIX functionality.
In particular, POSIX regular expressions do not recognize \d, or any other backslash-character classes. You should use [[:digit:]] instead. (The character classes are enclosed in brackets, so to match any digit or lowercase letter you could use [[:digit:][:lower:]]. For anything except a control character, you could use [^[:cntrl:]].)
In general, you can check out the Character classes table in the Regular expressions Wikipedia article, which contains a concise summary of the equivalent classes with descriptions.
Do you need a locale-aware example to demonstrate this?

Related

Extracting numbers from the string using regex

I am trying to extract the numbers 4 and 3 from the string /ab/cd__my__sep__4__some__sep__3. I am trying to use a regex but am not sure how to do this. I wrote the following code, but it just prints out __my__sep__4__some__sep__3
#include <stdio.h>
#include <regex.h>
#include <string.h>
#include <stdlib.h>
int main() {
    char* s = "/ab/cd__my__sep__4__some__sep__3";
    regex_t regex;
    int reti = regcomp(&regex,"__my__sep__([0-9]+)",REG_EXTENDED);
    if(reti!=0) {
        exit(-1);
    }else {
        regmatch_t match[2];
        reti = regexec(&regex, s, 2, match, 0);
        if(reti == 0) {
            char *v = &s[match[1].rm_so];
            ssize_t fl;
            sscanf(v, "%zu", &fl);
            printf("%s",v);
        }else {
            printf("else");
        }
    }
}
How could I extract the numbers 4 and 3 ?
match[0] refers to the part of the text matched by the entire pattern. match[1] is the match corresponding to the first capture (parenthesized subpattern).
Note that &s[match[1].rm_so] gives you a pointer to the start of the capture, but if you print the string at that point, you will get everything from the beginning of the capture to the end of the string. In this case, that doesn't really matter. Since you're using sscanf to extract the integer value of the captured text, the fact that the substring isn't terminated right after the capture is harmless: the capture is not followed by a digit, and sscanf will stop at the first non-digit.
But in the general case, the end of the capture may not be so easy to identify, and you can use one of these techniques:
If you want to print the capture, you can use a computed string width format: (See Note 1.)
printf("%.*s\n", (int)(match[1].rm_eo - match[1].rm_so), &s[match[1].rm_so]);
If you have strndup, you can easily create a dynamically-allocated copy of the capture: (See Note 2.)
char* capture = strndup(&s[match[1].rm_so], match[1].rm_eo - match[1].rm_so);
As a quick-and-dirty hack, it is also possible to just insert a NUL terminator (assuming that the searched string is not immutable, which means that it cannot be a string literal). You'll probably want to save the old value of the following character so that you can restore the string to its original state:
char* capture = &s[match[1].rm_so];
char* rest = &s[match[1].rm_eo];
char saved_char = *rest;
*rest = 0;
/* capture now points to a NUL-terminated string. */
/* ... */
/* restore s */
*rest = saved_char;
None of the above is really necessary in the context of the original question, since the sscanf as written will work perfectly if you change the start of the string to scan from match[0] to match[1].
Notes:
1. In the general case, you should check that a capture was actually found before trying to use its offset. The rm_so member will be -1 if the capture was not found during the regex search. That doesn't necessarily mean that the search failed, because the capture could be part of an alternative not used in the match.
2. Don't forget to free the copy when you no longer need it. If you don't have strndup, it's pretty easy to implement, but watch out for the corner cases.
Since you are using sscanf(), there is no need to use a regex. You can parse the two numbers from your string with sscanf() alone, using the format string "%*[^0-9]%d%*[^0-9]%d". Here "%*[^0-9]" uses the assignment-suppression '*' to read and discard all non-digit characters, and "%d" then extracts the integer value. The full format string simply repeats that pair twice.
A short example using your input could be:
#include <stdio.h>
int main (void) {
    char *s = "/ab/cd__my__sep__4__some__sep__3";
    int a, b;
    if (sscanf (s, "%*[^0-9]%d%*[^0-9]%d", &a, &b) == 2)
        printf ("a: %d\nb: %d\n", a, b);
    else {
        fputs ("error: parse of integers failed.\n", stderr);
        return 1;
    }
}
Example Use/Output
$ ./bin/parse2ints
a: 4
b: 3
If you find yourself attempting to parse something that sscanf() cannot handle, then a regex is appropriate. Here, sscanf() is more than capable of handling your needs alone.
Create a regex that matches only [0-9]. Then write a separate boolean function that checks whether a single character matches that regex, apply the function to each character of your string, and, whenever it returns true, append the character to the string you want to output.

Detect substring in specific format

What is the simplest way to detect a substring in a specific format?
For example, consider the string in C
"[random characters/symbols] a-b-c [random characters/symbols]"
Is there a function in C that allows me to detect the substring in the format "%s-%s-%s"?
Try starting at various points within the string until the scan succeeds.
"%*[^- ]" looks for a substring that contains neither a '-' nor a space.
"%n" records the offset reached in the scan.
#include<stdio.h>
int main(void) {
    char *s = "[random characters/symbols] a-b-c [random characters/symbols]";
    while (*s) {
        int n = 0;
        sscanf(s, "%*[^- ]-%*[^- ]-%*[^- ]%n", &n);
        if (n) {
            printf("Success '%.*s'\n", n, s);
            break;
        }
        s++;
    }
    return 0;
}
Output
Success 'a-b-c'
Use strstr(), or strnstr() if you have it, to detect a literal substring (no pattern matching). strnstr() is better because you can specify a maximum length to protect against a string with a missing null terminator, but it is not ANSI C, so not all C libraries have it. If you use strstr(), make sure the string is properly null-terminated.
You can use regcomp() and regexec() to do a regular-expression search of the string.
See the question "regex in C language using functions regcomp and regexec toggles between first and second match".

How to check if input is valid - by comparing strings in C

I'm making a calc function which is meant to check if the input is valid. So I'll have 2 strings: one with what the user inputs (e.g. 3+2-1, or maybe dog, which would be invalid), and one with the ALLOWED characters, e.g. '123456789/*-+.^'.
I'm not sure how to do this and have trouble getting started. I know a few functions such as strcmp, and the popular ones from the string.h header, but I have no idea how to use them to check every input.
What is the simplest way to do this?
One way of proceeding is the following.
A string is an array of character codes (ASCII on most systems). So if your string is
char formula[50];   /* holds the user's input */
then you can loop over it:
int n = 0;
while (formula[n] != 0)
{
    /* example test (adapt to your allowed set): a digit or one of the
       operator characters; strchr() is declared in <string.h> */
    if ((formula[n] < '0' || formula[n] > '9') && strchr("/*-+.^", formula[n]) == NULL)
    {
        printf("invalid entry\n\n");
        return -1; /* -1 = error code */
    }
    n++;
}
You put whatever logic you need into that condition; the loop lets you test the character code of each character.
There may be a more elegant way of solving this, but this will work as long as the condition correctly checks the code of each character.
The while statement checks whether you have reached the end of the string (the terminating 0).
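Here's a demonstration of how to use strspn() to check that all characters in a string are in your chosen set. strspn() returns the length of the initial run of characters that all belong to the set, so the whole string is valid exactly when that run reaches the terminating '\0'. (strpbrk() would only tell you whether any character is in the set.)
#include <string.h>
#include <stdio.h>
const char alphabet[] = "123456789/*+-=.^";
int main(void) {
    const char a[] = "3+2-1";
    const char b[] = "dog";
    /* length of the prefix made only of characters from alphabet */
    size_t len = strspn(a, alphabet);
    printf("%s %s\n", a, (a[len] == '\0') ? "true" : "false");
    len = strspn(b, alphabet);
    printf("%s %s\n", b, (b[len] == '\0') ? "true" : "false");
    return 0;
}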
That's not the fastest way to do this, but it's very easy to use.
However, if you are writing a calculator function, you really want to parse the string at the same time. A typical strategy would be to have two types of entity: operators (+-/*^) and operands (numbers such as -0.1, .0002, 42, etc.). You would extract these from the string as you parse it, and simply fail if you hit an invalid character. (If you need to handle parentheses, you'll need a stack for the parsing, and you'll likely need to work with a stack anyway to process and evaluate the expression overall.)

Parse a version number given as a string into 4 distinct integers in C

I have a version number returned as a string which looks something like "6.4.12.9", four numbers, each separated by a "."
What I would like to do is to parse the string into 4 distinct integers, giving me:
int1 = 6
int2 = 4
int3 = 12
int4 = 9
I'd normally use a regex for this but that option isn't available to me using C.
You can use sscanf
int a,b,c,d;
const char *version = "1.6.3.1";
if(sscanf(version,"%d.%d.%d.%d",&a,&b,&c,&d) != 4) {
    //error parsing
} else {
    //ok, use the integers a,b,c,d
}
If you're on a POSIX system, and limiting yourself to POSIX is okay, you can use the POSIX standard regular expression library by doing:
#include <regex.h>
then read the relevant manual page for the API. I would not recommend a regexp-solution for this problem to begin with, but I wanted to point out for clarity that regular expressions are often available in C. Do note that this is not "standard C", so you can't use it everywhere, only on POSIX (i.e. "Unix-like") systems.
You could use strtok() for this (followed by strtol()); just make sure you're aware of the semantics of strtok(), which are slightly unusual.
You could also use sscanf().
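One solution using strtoul():
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char const* argv[])
{
    char ver[] = "6.4.12.9";
    char *next = ver;
    int v[4], i;
    /* strtoul stops at each '.', and next++ then skips over it */
    for(i = 0; i < 4; i++, next++)
        v[i] = (int)strtoul(next, &next, 10);
    printf("%d %d %d %d\n", v[0], v[1], v[2], v[3]);
    return 0;
}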
You can use strtoul() to parse the string and get a pointer to the first non-numeric character. Another solution would be tokenizing the string using strtok() and then using strtoul() or atoi() to get an integer.
If none of them will exceed 255, inet_pton will parse it nicely for you. :-)

Extract integer from char buffer

I have a very simple problem in C. I am reading a file line by line and storing each line in a buffer:
char line[80];
Each line has the following structure:
Timings results : 2215543
Timings results : 22155431
Timings results : 221554332
Timings results : 2215543
What I am trying to do, is to extract the integer value from this line. Does C here provide any simple function that allows me to do that?
Thanks
You can use sscanf on each line, like this:
#include <stdio.h>
int main(void) {
    int time = -1;
    char *str = "Timings results : 120012";
    int n = sscanf(str, "Timings results : %d", &time);
    printf("n = %d, time = %d\n", n, time);
    return 0;
}
In this case, n == 1 means success.
Yes - try atoi
int n=atoi(str);
In your example, you have a fixed prefix before the integer, so you could simply add an offset to szLine (your line buffer) before passing it to atoi, e.g.
int offset=strlen("Timings results : ");
int timing=atoi(szLine + offset);
This is pretty efficient, but it doesn't cope well with lines that aren't in the expected format. You could check each line first, though:
const char * prefix="Timings results : ";
int offset=strlen(prefix);
char * start=strstr(szLine, prefix);
if (start)
{
    int timing=atoi(start+offset);
    //do whatever you need to do
}
else
{
    //line didn't match
}
You can also use sscanf for parsing lines like this, which makes for more concise code:
int timing;
sscanf(szLine, "Timings results : %d", &timing);
Finally, see also Parsing Integer to String C for further ideas.
