I've been trying to use regular expressions (<regex.h>) in a C project I am developing.
According to regex101 the regex is well written and identifies what I'm trying to identify, but it doesn't work when I run it in C.
#include <stdio.h>
#include <regex.h>
int main() {
char pattern[] = "#include.*";
char line[] = "#include <stdio.h>";
regex_t string;
int regex_return = -1;
regex_return = regcomp(&string, line, 0);
regex_return += regexec(&string, pattern, 0, NULL, 0);
printf("%d", regex_return);
return 0;
}
This is sample code I wrote to test the expression after I found out it didn't work.
It prints 1, when I expected 0.
It prints 0 if I change the line to "#include", which is just strange to me, because it's ignoring the .* at the end.
line and pattern are swapped.
regcomp takes the pattern and regexec takes the string to check.
Please consider this C code:
#include <stdio.h>
#include <regex.h>
#include <string.h>
int main(){
char * our_string = "/var/www/html/cameras/cam7/2020-01/15/cam7-2020-01-15-17-45-20-1037-03.h264";
regex_t re;
//int regex_int = regcomp(&re, "cam[:digit:]", 0);
int regex_int = regcomp(&re, "cam", 0);
if (regex_int) {
fprintf(stderr, "regex failed to compile!");
return 1;
}
regmatch_t rm[2];
if ((regexec(&re, our_string, 2, rm,0)) ){
fprintf(stderr, "regex failed to exec!");
return 1;
}
char temp[8192] = {0};
memcpy(temp, our_string + rm[1].rm_so, rm[1].rm_eo - rm[1].rm_so);
printf("We got: %s\n", temp);
puts("Bye!");
return 0;
}
I am trying to extract camX out of our_string, and need help. In its current form, the above code prints a blank result:
$ ./a.out
We got:
Bye!
C regex is not my forte. Please help!
You have a couple of problems:
//int regex_int = regcomp(&re, "cam[:digit:]", 0)
If you want to match cam followed by a digit, uncomment this line, comment out the one beneath it, and put [:digit:] inside a bracket expression:
int regex_int = regcomp(&re, "cam[[:digit:]]", 0);
The second issue:
memcpy(temp, our_string + rm[1].rm_so, rm[1].rm_eo - rm[1].rm_so);
Neither of your regular expressions has any groups, so the second element of the rm array is not going to have anything useful in it. You need to use the first element, which has the offsets of the complete match:
memcpy(temp, our_string + rm[0].rm_so, rm[0].rm_eo - rm[0].rm_so);
You also have a memory leak because you don't have a regfree(&re); to free up memory allocated for the regular expression. Not a big deal in a simple demo program like this, but in something bigger or longer running or that does the matching in a loop, it'll become an issue.
I have a regex that I have tested with https://regexr.com/, where it works correctly. But in C it doesn't find any match.
My code is below; I have removed everything unnecessary.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
int main ()
{
char * str = "<sql db=../serverTcp/Testing.db query=SELECT * From BuyMarsians;\>";
char * regex = "<sql\s+db=(.+)\s+query=(.+;)\s*\\>";
regex_t regexCompiled;
if (regcomp(&regexCompiled,regex,REG_EXTENDED))
{
printf("Could not compile regular expression.\n");
fflush(stdout);
};
if (!regexec(&regexCompiled,str, 0, NULL, 0)) {
printf("matched");
fflush(stdout);
}
regfree(&regexCompiled);
return 0;
}
You need to escape your backslashes. Change
char * regex = "<sql\s+db=(.+)\s+query=(.+;)\s*\\>";
to
char * regex = "<sql\\s+db=(.+)\\s+query=(.+;)\\s*\\\\>";
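Note that the greedy (.+) can be inefficient. In engines that support non-greedy quantification with ? (PCRE and similar), a more efficient regex is the one below. Be aware that POSIX extended regular expressions leave a ? placed directly after + undefined, so regcomp may not treat it as non-greedy: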
<sql\s+db=(.+?)\s+query=(.+;)\s*\\>
// ^ key change
That becomes:
char * regex = "<sql\\s+db=(.+?)\\s+query=(.+;)\\s*\\\\>";
Also note: Your string to be matched also includes \. You need to escape it there, too:
char * str = "<sql db=../serverTcp/Testing.db query=SELECT * From BuyMarsians;\\>";
Here's a working demo of your corrected code.
I'm writing a C program that uses a regular expression to determine whether certain words read from a file are valid or invalid. I've attached the code that does my regular expression check. I used an online regex checker, and according to it my regex is correct. I'm not sure why else it would be wrong.
The regex should accept a string in the format AB1234, ABC1234, or ABCD1234.
//compile the regular expression
reti1 = regcomp(&regex1, "[A-Z]{2,4}\\d{4}", 0);
// does the actual regex test
status = regexec(&regex1,inputString,(size_t)0,NULL,0);
if (status==0)
printf("Matched (0 => Yes): %d\n\n",status);
else
printf(">>>NO MATCH<< \n\n");
You are using POSIX regular expressions, from regex.h. These don't support the syntax you are using, which is PCRE format, and is much more common these days. You are better off trying to use a library that will give you PCRE support. If you have to use POSIX expressions, I think this will work:
#include <regex.h>
#include <stdio.h>
int main(void) {
int status;
int reti1;
regex_t regex1;
char * inputString = "ABCD1234";
//compile the regular expression
reti1 = regcomp(&regex1, "^[[:upper:]]{2,4}[[:digit:]]{4}$", REG_EXTENDED);
// does the actual regex test
status = regexec(&regex1,inputString,(size_t)0,NULL,0);
if (status==0)
printf("Matched (0 => Yes): %d\n\n",status);
else
printf(">>>NO MATCH<< \n\n");
regfree(&regex1);
return 0;
}
(Note that my C is extremely rusty, so this code is probably horrible.)
I found some good resources on this answer.
I need help extracting a substring from a string using regex.h in C.
In this example, I am trying to extract all occurrences of character 'e' from a string 'telephone'. Unfortunately, I get stuck identifying the offsets of those characters. I am listing code below:
#include <stdio.h>
#include <regex.h>
int main(void) {
const int size=10;
regex_t regex;
regmatch_t matchStruct[size];
char pattern[] = "(e)";
char str[] = "telephone";
int failure = regcomp(&regex, pattern, REG_EXTENDED);
if (failure) {
printf("Cannot compile");
}
int matchFailure = regexec(&regex, pattern, size, matchStruct, 0);
if (!matchFailure) {
printf("\nMatch!!");
} else {
printf("NO Match!!");
}
return 0;
}
So per GNU's manual, I should get all of the occurrences of 'e' when a character is parenthesized. However, I always get only the first occurrence.
Essentially, I want to be able to see something like:
matchStruct[1].rm_so = 1;
matchStruct[1].rm_eo = 2;
matchStruct[2].rm_so = 4;
matchStruct[2].rm_eo = 5;
matchStruct[3].rm_so = 7;
matchStruct[3].rm_eo = 8;
or something along these lines. Any advice?
Please note that you are not comparing your compiled regex against str ("telephone") but against your plain-text pattern. Check the second argument you pass to regexec. Also note that the extra regmatch_t entries hold the offsets of parenthesized groups within a single match, not of repeated matches, so one call only ever reports the first occurrence. With that fixed, see "regex in C language using functions regcomp and regexec toggles between first and second match", where the answer to your question is already given.
I have written a simple lex scanner in the file myscanner.l, where testlex.h is just a bunch of #defines as integers (MATCH_0 == 0, etc)
%{
#include "testlex.h"
%}
%%
"dinky" return MATCH_0;
"pinky" return MATCH_1;
"stinky" return MATCH_2;
[ \t\n] ;
. printf("unexpected character\n");
%%
int yywrap(void)
{
return 1;
}
After using lex to create the lex.yy.c file, I implement the code using this C file
#include <stdio.h>
#include "myscanner.h"
extern int yylex();
extern int yylineno;
extern char* yytext;
int main(void)
{
int l = yylex();
while (l)
{
printf("%d\n", l);
l = yylex();
}
return 0;
}
When I pass it this input stream: dinky pinky stinky stinky pinky dinky, there is absolutely no output. The output I am expecting looks like this:
0
1
2
2
1
0
Not even "unexpected character". I know my stack is set up right because I've compiled others' examples and they all scan correctly, but for some inconceivable reason my code _will_not_scan_!
What am I missing?
Looking at your expected output, what you see is the simple result of defining "dinky" -> MATCH_0 as 0.
The first value of l now becomes 0, after having scanned dinky. So while(l) is while(0) and the block is not even executed once. Subsequently your main immediately returns 0.
So don't define any tokens as 0, and then write:
int main(void)
{
int token;
while (token = yylex())
{
printf("%d\n", token);
}
return 0;
}
To be honest, I'm surprised you did not find this yourself. Simply trying other input would immediately have given you a clue, and it is easy to find out that yylex() returns 0 at EOF.
BTW, I think it's better to not use l as variable name as it's almost the same as 1.
The reason why your code does not print anything is that your first input happens to be "dinky", which returns MATCH_0. According to your expected output, MATCH_0 is zero. Therefore, the code will exit right away, before entering the loop even once.
Re-defining MATCH_0 to 1, MATCH_1 to 2, and so on will fix this problem.