Find a series of numbers in C on Linux using regular expressions

I used the code below, which uses "{}", but it does not seem to work as expected in C.
#include <regex.h>
#include <stdio.h>
int basic_regx(char *format, char *name)
{
    regex_t regex;
    /* flags == 0: the pattern is compiled as a POSIX basic regex */
    if( regcomp(&regex, format, 0) )
        return -1;
    if( !regexec(&regex, name, 0, NULL, 0) )
        printf ("Succeeded\n");
    else
        printf ("Not Succeeded\n");
    regfree(&regex);
    return 0;
}
If I call the function with the following:
Success - basic_regx("^[0-9]$","0");
Not Success - basic_regx("^[0-9]{1,4}$","0");
Success - basic_regx("^[0-9]{1,4}$","0{1,4}");
That means the {} is not interpreted as expected by the regex implementation.

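regcomp and regexec implement POSIX regular expressions, so only the features described in POSIX are available. With the flags argument set to 0, the pattern is compiled as a basic regular expression (BRE), in which unescaped { and } are ordinary characters; that is why "0{1,4}" matched. If you want Perl-style regular expressions, you may need an external library such as PCRE.
However, with POSIX basic regular expressions, you can write the equivalent interval with escaped braces:
basic_regx("^[0-9]\\{1,4\\}$","0");
Alternatively, pass REG_EXTENDED instead of 0 as the last argument of regcomp, and the unescaped {1,4} is treated as an interval.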
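To see the difference between the two POSIX flavors side by side, here is a minimal sketch. The helper name matches and its parameters are made up for illustration; only regcomp, regexec, regfree, REG_EXTENDED and REG_NOSUB come from <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of any library): returns 1 on a match,
 * 0 on no match, and -1 if the pattern does not compile.
 * cflags is passed to regcomp(): 0 selects BRE, REG_EXTENDED selects ERE. */
static int matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, matches("^[0-9]{1,4}$", "0", 0) reports no match because the braces are literal in a BRE, while matches("^[0-9]\\{1,4\\}$", "0", 0) and matches("^[0-9]{1,4}$", "0", REG_EXTENDED) both report a match.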

Related

How to use regex in C to verify that keyboard input is a number?

I am trying to learn how to use regular expressions in C, but I have trouble with the pattern syntax.
In this program I want to verify that the input string is a number (digits only; no letters, spaces, or special characters).
#include <stdio.h>
#include <regex.h>
void print_result(int return_value){
    if (return_value == 0){
        printf("Pattern found.\n");
    }
    else if (return_value == REG_NOMATCH){
        printf("Pattern not found.\n");
    }
    else{
        printf("An error occurred.\n");
    }
}
int main() {
    regex_t regex;
    int return_value;
    int return_value2;
    return_value = regcomp(&regex,"[^a-fA-F_][0-9]+",0);
    return_value = regexec(&regex, "4324", 0, NULL, 0);
    return_value2 = regcomp(&regex,"\d+",0);
    return_value = regexec(&regex, "4324", 0, NULL, 0);
    print_result(return_value); // not found
    print_result(return_value); // not found
    print_result(return_value2);
    return 0;
}
Can you give me some ideas for verifying the input? I would like a way that does not rely on ASCII values.
If you specify the flags as 0 in regcomp:
return_value = regcomp(&regex,"[^a-fA-F_][0-9]+",0);
then you are accepting the default regex syntax, which is a so-called Basic Regular Expression (BRE). The only sensible thing that can be said about BREs is "don't use them." Always specify the REG_EXTENDED flag (at least), and then you will be working with a regular expression syntax that at least bears a passing resemblance to what you expect. (Otherwise, your strings will be dominated by what's technically called "leaning timber": \ characters which enable metacharacters in the regex, and more \ characters so that the \ characters you need are not treated as escape characters in the character string.)
Take a look at man regexec and man 7 regex for more details. Make sure you read the second of these, regex(7), thoroughly (although you can ignore basic regular expression syntax :-) ) because many syntaxes that are common in more modern regex libraries are not present in POSIX regexes, not even extended ones. (That includes \d, used in your second regex. POSIX has named character classes instead, such as [[:digit:]].)
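As a minimal sketch of that advice, the check could look like the following. The function name is_all_digits is made up for illustration; the pattern uses only REG_EXTENDED syntax and the POSIX [[:digit:]] class.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if s consists of one or more decimal
 * digits and nothing else, 0 otherwise, -1 if the pattern fails to compile. */
static int is_all_digits(const char *s)
{
    regex_t re;
    int rc;

    /* ^ and $ force the pattern to cover the whole string. */
    if (regcomp(&re, "^[[:digit:]]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

If the input comes from fgets, strip the trailing newline before calling the helper; without REG_NEWLINE the $ anchor only matches at the very end of the string.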

Provisioning code regular expression

I'm trying to write C code that checks the validity of a "provisioning code" string using regular expressions.
A "provisioning code" should follow this rule:
If not an empty string, this argument SHOULD be in the form of a hierarchical descriptor with one or more nodes specified. Each node in the hierarchy is represented as a 4-character sub-string, containing only numerals or upper-case letters. If there is more than one node indicated, each node is separated by a "." (dot). Examples: "TLCO" or "TLCO.GRP2".
I started from the code at this link: http://web.archive.org/web/20160308115653/http://peope.net/old/regex.html
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h> /* exit */
int main(int argc, char *argv[]){
    regex_t regex;
    int reti;
    char msgbuf[100];
    /* Compile regular expression */
    reti = regcomp(&regex, "^a[[:alnum:]]", 0);
    if( reti ){ fprintf(stderr, "Could not compile regex\n"); exit(1); }
    /* Execute regular expression */
    reti = regexec(&regex, "abc", 0, NULL, 0);
    if( !reti ){
        puts("Match");
    }
    else if( reti == REG_NOMATCH ){
        puts("No match");
    }
    else{
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        exit(1);
    }
    /* Free compiled regular expression if you want to use the regex_t again */
    regfree(&regex);
    return 0;
}
This code works fine, but my problem is finding the right regular expression to pass to regcomp.
I started with a regular expression meant to match a string of exactly 4 uppercase letters or digits, such as TLCO or TLC2. With the regular expression "[A-Z0-9]{4}" I get "No match" even for matching examples like TLC2.
Can anyone suggest a regular expression to pass to regcomp that matches a "provisioning code"?
You may use the following regex, which works if you also pass the REG_EXTENDED flag to regcomp (needed for the (...) grouping and the {m,n} quantifier to work without backslash escapes):
^[A-Z0-9]{4}([.][A-Z0-9]{4})*$
C code:
reti = regcomp(&regex, "^[A-Z0-9]{4}([.][A-Z0-9]{4})*$", REG_EXTENDED);
Details:
^ - start of string
[A-Z0-9]{4} - 4 uppercase ASCII letters or digits
([.][A-Z0-9]{4})* - zero or more sequences of:
    [.] - a literal . char
    [A-Z0-9]{4} - 4 uppercase ASCII letters or digits
$ - end of string
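Put together, a validity check along these lines might look like the sketch below. The function name is_valid_provisioning_code is made up for illustration, and accepting the empty string is an assumption based on the "If not an empty string" wording of the rule quoted in the question.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if code is a valid provisioning code,
 * 0 if it is not, and -1 if the pattern fails to compile. */
static int is_valid_provisioning_code(const char *code)
{
    regex_t re;
    int rc;

    /* Assumption: an empty string is allowed by the rule. */
    if (code[0] == '\0')
        return 1;

    /* One 4-character node, then zero or more ".NODE" groups. */
    if (regcomp(&re, "^[A-Z0-9]{4}([.][A-Z0-9]{4})*$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, code, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The range [A-Z] is interpreted according to the current locale; in the default "C" locale it covers exactly the ASCII uppercase letters.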

regexec in C does not match when \b is used in the expression

I am trying to use regular expressions in my C code to find a string in each line of a text file that I am reading, and the \b word boundary does not seem to work. The string must not be part of a bigger string.
After that failure I also tried the following hand-written boundary expression, and could not make it work in my code either:
(?i)(?<=^|[^a-z])MYWORDHERE(?=$|[^a-z])
But when I try something simple like a as the regular expression, it finds what is expected.
Here is my shortened snippet:
#include <regex.h>
#include <stdio.h>
void readFromFile(char arr[], char * wordToSearch) {
    regex_t regex;
    int regexi;
    char regexStr [100];
    /* Build "\b(word)\b"; snprintf cannot overflow regexStr */
    snprintf(regexStr, sizeof(regexStr), "\\b(%s)\\b", wordToSearch);
    regexi = regcomp(&regex, regexStr, 0);
    printf("regexi while compiling: %d\n", regexi);
    if (regexi) {
        fprintf(stderr, "compile error\n");
        return;
    }
    FILE* file = fopen(arr, "r");
    if (file == NULL) {
        perror(arr);
        regfree(&regex);
        return;
    }
    char line[256];
    while (fgets(line, sizeof(line), file)) {
        regexi = regexec(&regex, line, 0, NULL, 0);
        printf("%s\n", line);
        printf("regexi while execing: %d\n", regexi);
        if (!regexi) {
            printf("there is a match.");
        }
    }
    fclose(file);
    regfree(&regex);
}
In the regcomp function, I also tried to pass the REG_EXTENDED as the flag and it also did not work.
The regular expressions supported by POSIX are documented in the Linux regex(7) manual page, and in re_format(7) on Mac OS X.
Unfortunately the POSIX standard regular expressions (which come in 2 standard flavours: obsolete basic, and the REG_EXTENDED) support neither \b nor any of the (?...) formats, both of which I believe originated in Perl.
Mac OS X (and possibly other BSD derived systems) additionally has the REG_ENHANCED format, which is not portable.
Your best choice would be to use some other regular expression library such as PCRE. While word boundaries themselves can be expressed in a regular language, the use of capturing groups makes this harder, as POSIX doesn't even support non-capturing grouping. Otherwise you could use something like (^|[^[:alpha:]])(WORD)($|[^[:alpha:]]), with WORD standing for the word you are searching for, but it surely would get really messy.
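If staying with POSIX is a requirement, the hand-written boundary idea can be sketched as follows. The function name contains_word is made up for illustration, and the sketch assumes the search word contains no regex metacharacters (it is pasted into the pattern unescaped). REG_ICASE stands in for the (?i) of the original attempt.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if word occurs in line as a whole word
 * (case-insensitively), 0 if it does not, -1 on error.
 * Assumption: word contains no regex metacharacters. */
static int contains_word(const char *line, const char *word)
{
    char pattern[256];
    regex_t re;
    int n, rc;

    /* Emulate \b: the word must be preceded by the start of the string
     * or a non-letter, and followed by the end or a non-letter. */
    n = snprintf(pattern, sizeof pattern,
                 "(^|[^[:alpha:]])%s($|[^[:alpha:]])", word);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A line read with fgets keeps its trailing newline, which counts as a non-letter here, so a word at the end of a line is still found. In a loop over many lines it would be cheaper to compile the pattern once and reuse the regex_t.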

Unable to match regex in C

I have a problem with following regex:
prefix:\w+,\w+,\s*-?[0-9]{1,4}\s*,\s*-?[0-9]{1,4}\s*,\s*-?[0-9]{1,4}\s*,(?:\w+)
The match string is the following:
prefix:string,string,-100,100,0,string
I cannot match this string in my C code, although I do get a successful match in the online tool where I built this regex. There were also compilation warnings about the "\" characters in the regex, so I doubled them as explained in the question about the C compiler warning "unknown escape sequence" when using a regex in a C program. The regex after fixing the compilation warnings:
prefix:\\w+,\\w+,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,(?:\\w+)
Here's the testing code:
#include <stdio.h>
#include <regex.h>
#include <stdlib.h>
#define REGEX "prefix:\\w+,\\w+,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,(?:\\w+)"
const char *input = "prefix:string,string,-100,100,0,string";
int main(){
    int rc;
    regex_t regex;
    rc = regcomp(&regex, REGEX, 0);
    if (rc != 0) {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }
    rc = regexec(&regex, input, 0, NULL, 0);
    if (rc == 0) {
        printf("Match!\n");
        return 0;
    }
    else if (rc == REG_NOMATCH) {
        printf("No match\n");
        return -1;
    }
    else {
        /* regexec does not set errno, so report via regerror, not perror */
        char msgbuf[100];
        regerror(rc, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Error: %s\n", msgbuf);
        exit(1);
    }
    return 0;
}
I use gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12)
You need to do two things:
1. Use the REG_EXTENDED flag to compile the regex so that the extended regular expression (ERE) flavor is enabled; otherwise the limiting quantifier {1,4} would need escaping, and + and ? would be treated as literal characters.
2. Remove the non-capturing group (?:...), as POSIX does not support this construct.
The group (?:\w+) can simply be written as \w+; to make that last field optional, use \w* instead.
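Use the following definition, which ends in \\w* instead of the (?:\\w+) group:
#define REGEX "prefix:\\w+,\\w+,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\s*-?[0-9]{1,4}\\s*,\\w*"
and then compile it with REG_EXTENDED as the flags argument:
rc = regcomp(&regex, REGEX, REG_EXTENDED);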
For more background, read about the Extended Regular Expressions that REG_EXTENDED enables; material on POSIX Bracket Expressions is also a good resource for learning the differences between the BRE (Basic Regular Expressions) and ERE flavors.
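One portability note: as far as I know, \w and \s are GNU extensions rather than part of POSIX, so the pattern above may not behave the same with other regcomp implementations. Below is a sketch of the same check written only with POSIX bracket expressions. The function name matches_prefix_record is made up for illustration, and anchoring the pattern with ^ and $ and requiring a non-empty last field are choices of this sketch, not part of the original regex.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if input has the form
 * prefix:word,word,int,int,int,word (ints of 1 to 4 digits, optional
 * minus sign and surrounding whitespace), 0 if not, -1 on compile error. */
static int matches_prefix_record(const char *input)
{
    /* [[:alnum:]_] replaces \w and [[:space:]] replaces \s. */
    static const char pattern[] =
        "^prefix:[[:alnum:]_]+,[[:alnum:]_]+,"
        "[[:space:]]*-?[0-9]{1,4}[[:space:]]*,"
        "[[:space:]]*-?[0-9]{1,4}[[:space:]]*,"
        "[[:space:]]*-?[0-9]{1,4}[[:space:]]*,"
        "[[:alnum:]_]+$";
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, input, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```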

How to use regular expressions in C?

I need to write a little program in C that parses a string. I wanted to use regular expressions since I've been using them for years, but I have no idea how to do that in C. I can't find any straightforward examples (i.e., "use this library", "this is the methodology").
Can someone give me a simple example?
You can use PCRE:
The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. PCRE has its own native API, as well as a set of wrapper functions that correspond to the POSIX regular expression API. The PCRE library is free, even for building commercial software.
See pcredemo.c for a PCRE example.
If you cannot use PCRE, POSIX regular expression support is probably available on your system (as @tinkertim pointed out). For Windows, you can use the gnuwin Regex for Windows package.
The regcomp documentation includes the following example:
#include <regex.h>
/*
 * Match string against the extended regular expression in
 * pattern, treating errors as no match.
 *
 * Return 1 for match, 0 for no match.
 */
int
match(const char *string, char *pattern)
{
    int status;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) {
        return(0); /* Report error. */
    }
    status = regexec(&re, string, (size_t) 0, NULL, 0);
    regfree(&re);
    if (status != 0) {
        return(0); /* Report error. */
    }
    return(1);
}
If forced into POSIX only (no pcre), here's a tidbit of fall back:
#include <regex.h>
#include <stdbool.h>
#include <stddef.h> /* NULL */
bool reg_matches(const char *str, const char *pattern)
{
    regex_t re;
    int ret;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;
    ret = regexec(&re, str, (size_t) 0, NULL, 0);
    regfree(&re);
    if (ret == 0)
        return true;
    return false;
}
You might call it like this:
int main(void)
{
    static const char *pattern = "/foo/[0-9]+$";
    /* Going to return 1 always, since pattern wants the last part of the
     * path to be an unsigned integer */
    if (! reg_matches("/foo/abc", pattern))
        return 1;
    return 0;
}
I highly recommend making use of PCRE if it's available. But it's nice to check for it and have some sort of fallback.
I pulled the snippets from a project currently in my editor. It's just a very basic example, but it gives you types and functions to look up should you need them. This answer more or less augments Sinan's answer.
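Going one step past a yes/no match, the same POSIX API can report where a parenthesized group matched. The sketch below is only an illustration: first_capture and its parameters are made-up names, while regmatch_t, rm_so, rm_eo and regerror are the standard <regex.h> facilities.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the text matched by the first (...) group
 * of pattern into out. Returns 1 on success, 0 if there is no match (or
 * the group did not take part in it), -1 on a compile error or if out
 * is too small. */
static int first_capture(const char *str, const char *pattern,
                         char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];        /* m[0]: whole match, m[1]: first group */
    char errbuf[128];
    size_t len;
    int rc;

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) {
        /* regerror turns the error code into a readable message. */
        regerror(rc, &re, errbuf, sizeof errbuf);
        fprintf(stderr, "regcomp: %s\n", errbuf);
        return -1;
    }
    rc = regexec(&re, str, 2, m, 0);
    regfree(&re);
    if (rc != 0 || m[1].rm_so == -1)
        return 0;

    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= outsz)
        return -1;
    memcpy(out, str + m[1].rm_so, len);
    out[len] = '\0';
    return 1;
}
```

For example, calling first_capture("/foo/123", "/foo/([0-9]+)$", buf, sizeof buf) with a char buf[16] leaves "123" in buf. Note that REG_NOSUB must not be passed to regcomp here, since it disables the reporting of match offsets.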
Another option besides a native C library is to use an interface to another language like Python or Perl. Not having to deal with C's string handling, together with the better language support for regexes, should make things much easier for you. You can also use a tool like SWIG to generate wrappers for calling the code from C.
