No matches for a working regex in C

I want to match the regex (?<=SEARCH_THIS=").+(?<!"\n) in C with PCRE.
However, the following code doesn't work as expected.
#include <pcreposix.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(void){
    regex_t re;
    regmatch_t matches[2];
    char *regex = "(?<=SEARCH_THIS=\").+(?<!\"\n)";
    char *file = "NO_MATCH=\"0\"\nSOMETHING_ELSE=\"1\"\nSOME_STUFF=\"1\"\nSEARCH_THIS=\"gimme that\"\nNOT_THIS=\"foobar\"\nTHIS_NEITHER=\"test\"\n";
    puts("compiling regex");
    int compErr = regcomp(&re, regex, REG_NOSUB | REG_EXTENDED);
    if(compErr != 0){
        char buffer[128];
        regerror(compErr, &re, buffer, 100);
        printf("regcomp failed: %s\n", buffer);
        return 0;
    }
    puts("executing regex");
    int err = regexec(&re, file, 2, matches, 0);
    if(err == 0){
        puts("no error");
        printf("heres the match: [.%*s]",matches[0].rm_eo-matches[0].rm_so,file+matches[0].rm_so);
    } else {
        puts("some error here!");
        char buffer[128];
        regerror(err, &re, buffer, 100);
        printf("regexec failed: %s\n", buffer);
    }
    return 0;
}
The console output is:
compiling regex
executing regex
some error here!
regexec failed: No match
I verified the functionality of this regex here
Any idea what is going wrong here?
EDIT #1
Compiler Version
$ arm-merlin-linux-uclibc-gcc --version
arm-merlin-linux-uclibc-gcc (GCC) 4.2.1
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Compile Command
$ arm-merlin-linux-uclibc-gcc -lpcre ./re_test.c -o re_test.o

There are actually a few issues with your code.
First, you use %*s in an attempt to restrict the length of the printed string. However, the integer width before the s formatter is the minimum length of what gets printed; if the corresponding string's length is less than what's given, it'll be padded with spaces. If the length is greater than what's given, it'll just output the whole string. You'll need some other method of restricting the length of the outputted string (just avoid modifying *file, because file points to a constant string).
Second, you specify the REG_NOSUB option in your regcomp call, but according to the man page, this means that no substring positions are stored in the pmatch argument - thus, even if your regexec did work, the following printf would be using uninitialized values (which is undefined behavior).
Finally, I suspect the problem is that the \" and \n characters need to be doubly-escaped; i.e. you need to use \\\" and \\n in your regex string. While the code you gave worked for me (Ubuntu 14.04 x64), the doubly-escaped version also works.
Taking all of this into account, this is the output I get:
compiling regex
executing regex
no error
heres the match: [.gimme that"]

Related

Can not reallocate memory in a very peculiar circumstance

Almost minimal reproducible example:
prog.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char *buffer;
    int c;
    size_t bufsiz = 1024, i = 0;
    if (!(buffer = malloc(bufsiz))) {
        fputs("malloc() failed!\n", stderr);
        return 1;
    }
    while (EOF != (c = fgetc(stdin))) {
        buffer[i] = c;
        if (++i == bufsiz && !(buffer = realloc(buffer, bufsiz *= 2))) {
            fputs("realloc() failed! (loop)\n", stderr);
            return 1;
        }
    }
    buffer[i] = '\0';
    if (!(buffer = realloc(buffer, i))) {
        fputs("realloc() failed! ", stderr);
        fprintf(stderr, "%d\n", i);
        return 1;
    }
    fputs(buffer, stdout);
    return 0;
}
I use this command to compile and run:
gcc prog.c -o prog
This command copies the content of prog.c to exp as expected:
cat prog.c | ./prog > exp
This command prints the error message “realloc() failed! 0”:
cat prog.c | ./prog > prog.c
I have yet to find out the reason behind this peculiar behavior...
P.S.: I am using GNU cat and bash
Congratulations, you've (re-)discovered a bug in your system's implementation of realloc, whereby "success" resizing to 0 is indistinguishable from an error. As specified, if realloc returns a null pointer, it has failed and the old object still exists. But some historical implementations treat realloc(p,0) as free(p). Future versions of the C standard allow for this behavior, and deprecate the use of realloc with a zero size, so if you want to use realloc like this you should make sure you are not passing a zero size.
As noted by Eric Postpischil in a comment:
Where does your program put a null character at the end of the string in the buffer?
the fact that 0 is a possible size for your buffer is indicative of a problem - you forgot to reserve space for terminating the string - and if you fix this, even a zero-length string takes a nonzero number of bytes.
In cat prog.c | ./prog > prog.c, the shell parses the command, sees there is a redirection into prog.c, and opens prog.c for writing, which erases any previous contents of the file. Then cat prog.c sees an empty file and copies it to standard output. ./prog faithfully reproduces this empty stream.

undefined reference to getdelim() error (Windows) (C Language)

I am trying to work with the getdelim() function, which apparently is the preferred method, along with getline(), over fgets(). However, when I try to compile the code below, I get an undefined reference to getdelim() error.
I don't think it has anything to do with the code but rather with the gcc version that I am using. So I typed gcc -v into cmd, and apparently I have gcc version 8.1.0 (x86_64-win32-seh-rev0, Built by MinGW-W64 project).
I am not sure how old it is or if that's the problem. If so, what version of gcc should I use, and can I solve this by adding some fancy macros?
Code:
#include <stdio.h>
#include <stdlib.h> /* for malloc */
int main()
{
    size_t size = 10; /* getdelim expects a size_t *, not an int * */
    char *string;
    printf ("Please enter a string: ");
    string = (char*)malloc(size);
    getdelim (&string, &size, '-', stdin);
    printf( "%s\n", string );
    return 0;
}

Beginner C programmer having problems with string functions

I'm a C noob, going back to school for my master's in CS, so I'm taking some time to ramp up my skills. I wanted to see if anybody could lend some assistance on why I'm having problems compiling the following code. I've been following the videos on WiBit.net and develop in a 64-bit Linux environment (Ubuntu 13.10). I am using gedit and the gcc compiler, no IDE.
This code runs on my Win 7 VM without errors, however when I try to execute it on my host Linux environment I'm getting errors:
Source Code: This example calls the strcmp and strcmpi functions
#include <stdio.h>
#include <string.h>
int main()
{
    char str1[255];
    char str2[255];
    printf("str1: "); gets(str1);
    printf("str2: "); gets(str2);
    if(strcmp(str1, str2) == 0)
        printf("Strings match exactly!");
    else if(strcmpi(str1, str2) == 0)
        printf("Strings match when ignoring case!");
    return 0;
}
Error Message (Linux ONLY):
$gcc main.c -o demo -lm -pthread -lgmp -lreadline 2>&1
/tmp/ccwqdQMN.o: In function `main':
main.c:(.text+0x25): warning: the `gets' function is dangerous and should not be used.
main.c:(.text+0x8f): undefined reference to `strcmpi'
collect2: error: ld returned 1 exit status
Source Code 2: This example uses the strupr and strlwr functions
#include <stdio.h>
#include <string.h>
int main()
{
    char str1[255];
    char str2[255];
    printf("str1: "); gets(str1);
    printf("str2: "); gets(str2);
    strlwr(str1);
    strupr(str2);
    puts (str1);
    puts (str2);
    return 0;
}
Error Message (Linux ONLY):
$gcc main.c -o demo -lm -pthread -lgmp -lreadline 2>&1
/tmp/ccWnIfnz.o: In function `main':
main.c:(.text+0x25): warning: the `gets' function is dangerous and should not be used.
main.c:(.text+0x57): undefined reference to `strlwr'
main.c:(.text+0x6b): undefined reference to `strupr'
collect2: error: ld returned 1 exit status
I would love a detailed explanation if someone is willing to help and not tear me apart haha. I know that for best practices we shouldn't use gets due to buffer overflow (for example, if the user enters a 750-character string). Best practices would use fgets instead, but my question is whether I'm getting these errors because these functions aren't part of ANSI C or what. They do show up in the man files on my machine, which is throwing me for a loop.
Thanks in advance!
UPDATE:
You guys are awesome. Took all of your advice and comments and was able to revise and make a sample program for string comparison as well as conversion to upper/lower. Glad I was able to get it running on both OSes error free as well.
Sample code:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
int main()
{
    char str[255];
    printf("Enter a string: "); fgets(str,255, stdin);
    printf("Here is your original string, my master: %s\n", str);
    //Now let's loop through and convert this to all lowercase
    int i;
    for(i = 0; str[i]; i++)
    {
        str[i] = tolower(str[i]);
    }
    printf("Here is a lowercase version of your string, my master: %s\n", str);
    //Now we'll loop through and convert the string to uppercase
    int j;
    for(j = 0; str[j]; j++)
    {
        str[j] = toupper(str[j]);
    }
    printf("Here is a uppercase version of your string, my master: %s\n", str);
    return 0;
}
The strcmpi problem: strcmpi is not a standard function. The POSIX equivalent is strcasecmp(), declared in <strings.h>, and that is what Linux provides.
strupr and strlwr don't exist in glibc either, although each can be implemented in a line or two of code, as shown here:
c - convert a mixed-case string to all lower case
During compilation, gcc may first emit a warning, because it can't find declarations for these functions in the included headers. In that case it assumes an implicit declaration that returns int. Later, at link time, it can't find the symbols for these nonexistent functions, so it can't create the executable. This second error, from the linker, is what stops the build.
The C library APIs differ a lot between platforms. POSIX standardizes many of these functions, but Microsoft doesn't follow it.
As you noted, the gets function is unsafe because it does not perform any boundary checking: you have called it with a 255-character string buffer, but if another program wrote a line longer than 255 characters, it could write data into your process's stack, and thereby cause your process to execute malicious code (or at the very least produce a segmentation fault).
Use fgets instead:
printf("str1: "); fgets(str1, 255, stdin);
printf("str2: "); fgets(str2, 255, stdin);
If you read the error output from the compiler carefully, you'll note that it's not issuing an error on your use of gets but a warning. Your code should still compile and execute if you fix the strcmpi call.

Odd strtok behavior

I've been trying to use strtok in order to write a polynomial differentiation program, but it seems to be behaving oddly. At this point I've told it to stop at the characters ' ', [, ], (, and ). But for some reason, when passed input such as "Hello[]" it returns "Hello\n"
Is there anything wrong with my code here? All the polynomial string is is the text "Hello[]"
void differentiate(char* polynomial)
{
char current[10];
char output[100];
strncpy(current, strtok(polynomial, " []()/\n"), 10);
printf("%s", current);
} // differentiate()
EDIT : It appears to be an issue related to the shell, and it would also appear to not be a newline after all, as when I use bash it does not occur, but when I use fish, I get the following:
I've never seen this kind of thing before, does anyone have any advice? Is this just a quirk of fish?
I converted your code into this SSCCE (Short, Self-Contained, Correct Example):
#include <string.h>
#include <stdio.h>
static
void differentiate(char* polynomial)
{
    char current[10];
    strncpy(current, strtok(polynomial, " []()/\n"), 10);
    printf("<<%s>>\n", current);
}
int main(void)
{
    char string[] = "Hello[]";
    printf("Before: <<%s>>\n", string);
    differentiate(string);
    printf("After: <<%s>>\n", string);
    return 0;
}
Actual output:
Before: <<Hello[]>>
<<Hello>>
After: <<Hello>>
I was testing with GCC 4.8.1 on Mac OS X 10.8.4, but I got the same result with the Apple-supplied GCC (i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)) and clang (Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)).
You should justify your assertion that you got a newline out of strtok() by adapting this test and showing the output. Note how the code uses the << and >> to surround the string it is printing; if there's a newline in there, it will show up inside the double angle brackets.

cppcheck : Buffer is accessed out of bounds

I have the code below. After running the cppcheck tool, it reports the error "Buffer is accessed out of bounds". The error is reported on the line with the snprintf call.
#include <stdio.h>
int main(int argc, char * argv[])
{
    if (argc > 1) {
        char testref[8] = "";
        snprintf(testref, sizeof(testref), "Ref:%s", argv[1]);
        printf("===>testref=%s\n", testref);
    }
}
Below is the command-line interaction:
amin#ubuntu:$ gcc test.c -o test
amin#ubuntu:$
amin#ubuntu:$ ./test hello_world
===>testref=Ref:hel
amin#ubuntu:$ cppcheck test.c
Checking test.c...
[test.c:7]: (error) Buffer is accessed out of bounds.
amin#ubuntu:$
Is cppcheck correct to report this error?
I think, generally speaking, cppcheck is correct to report this error. C99 requires snprintf to write a terminating null character whenever the size argument is nonzero, but the behavior of older implementations varies, and some of them do not guarantee a null character if the string is too large for the buffer. In that case, the subsequent call to printf() would read outside the bounds of the buffer.
I could find at least one example of a snprintf implementation that would result in out-of-bounds errors for your code. And according to this comment it was also the case for Tru64/Digital UNIX before C99.
It would be interesting to see if cppcheck also reports an error for the following code (it should not report an error):
#include <stdio.h>
int main(int argc, char * argv[])
{
    if (argc > 1) {
        char testref[8] = "";
        int ret = snprintf(testref, sizeof(testref), "Ref:%s", argv[1]);
        if (ret >= 0) {
            printf("===>testref=%s\n", testref);
        }
    }
}
Also note that Cppcheck version 1.82 does not report the error for your code. I'm not sure why version 1.72 does report the error and version 1.82 doesn't.