Minimal bison/flex-generated code has memory leak - c

In debugging a memory leak on a large project, I found that the source of the leak seemed to be some flex/bison-generated code. I was able to recreate the leak with the following minimal example consisting of two files, sand.l and sand.y:
in sand.l:
%{
#include <stdlib.h>
#include "sand.tab.h"
%}
%%
[0-9]+ { return INT; }
. ;
%%
in sand.y:
%{
#include <stdio.h>
#include <stdlib.h>
int yylex();
int yyparse();
FILE* yyin;
void yyerror(const char* s);
%}
%token INT
%%
program:
program INT { puts("Found integer"); }
|
;
%%
int main(int argc, char* argv[]) {
yyin = stdin;
do {
yyparse();
} while (!feof(yyin));
return 0;
}
void yyerror(const char* s) {
puts(s);
}
The code was compiled with
$ bison -d sand.y
$ flex sand.l
$ gcc -g lex.yy.c sand.tab.c -o main -lfl
Running the program with valgrind gave the following error:
8 bytes in 1 blocks are still reachable in loss record 1 of 3
at 0x4C2AC3D: malloc (vg_replace_malloc.c:299)
by 0x40260F: yyalloc (lex.yy.c:1723)
by 0x402126: yyensure_buffer_stack (lex.yy.c:1423)
by 0x400B89: yylex (lex.yy.c:669)
by 0x402975: yyparse (sand.tab.c:1114)
by 0x402EC4: main (sand.y:24)
64 bytes in 1 blocks are still reachable in loss record 2 of 3
at 0x4C2AC3D: malloc (vg_replace_malloc.c:299)
by 0x40260F: yyalloc (lex.yy.c:1723)
by 0x401CBF: yy_create_buffer (lex.yy.c:1258)
by 0x400BB3: yylex (lex.yy.c:671)
by 0x402975: yyparse (sand.tab.c:1114)
by 0x402EC4: main (sand.y:24)
16,386 bytes in 1 blocks are still reachable in loss record 3 of 3
at 0x4C2AC3D: malloc (vg_replace_malloc.c:299)
by 0x40260F: yyalloc (lex.yy.c:1723)
by 0x401CF6: yy_create_buffer (lex.yy.c:1267)
by 0x400BB3: yylex (lex.yy.c:671)
by 0x402975: yyparse (sand.tab.c:1114)
by 0x402EC4: main (sand.y:24)
It seems that bison and/or flex is holding on to a substantial amount of memory. Is there anyway to force them to free it?

The default flex skeleton allocates an input buffer and a small buffer stack, which it never frees. You could free the input buffer manually with yy_delete_buffer(YY_CURRENT_BUFFER); but there is no documented way to delete the buffer stack. If you have a sufficiently non-ancient version of flex [see Note 1], you can call yylex_destroy() to remove the last vestiges of the buffer stack. (If you don't, it's only 8 bytes in your application, so it's not a disaster.)
If you want to write a clean application, you should generate a reentrant scanner, which puts all persistent data into a scanner context object. Your code must allocate and free this object, and freeing it will free all memory allocations. (You might also want to generate a pure parser, which works roughly the same way.)
However, the reentrant scanner has a very different API, so you will need to get your parser to pass through the scanner context object. If you use a reentrant (pure) parser as well, you'll need to modify your scanner actions because with the reentrant parser, yylval is a YYSTYPE* instead of YYSTYPE.
Notes:
In fact, you can delete the buffer stack using yylex_destroy(), as recently pointed out in a comment, as long as your flex version is at least 2.5.9. Since that version was released almost two decades ago, you'd think this note would be unnecessary, but unfortunately v2.5.4 continues to be the default MinGW installation and it had a surprisingly long life on various Linux distros as well (although I think these days you're not so likely to find it installed).

Related

How do I make dynamic allocations for a program run fail

I have a C program which uses malloc (it could also have been C++ with new). I would like to test my program and simulate an "out of memory" scenario.
I would strongly prefer running my program from within a bash or sh shell environment without modifying the core code.
How do I make dynamic memory allocations fail for a program run?
Seems like it could be possible using ulimit but I can't seem to find the right parameters:
$ ulimit -d 50
$ ./program_which_heap_allocates
./program_which_heap_allocates: error while loading shared libraries: libc.so.6: cannot map zero-fill pages
$ ulimit -d 51
bash: ulimit: data seg size: cannot modify limit: Operation not permitted
I'm having trouble running the program in such a way that dynamic linking can occur (such as stdlib) but not the allocations from my program.
If you are under Linux and using glibc then there are Hooks for Malloc. The hooks allow you to catch calls to malloc and make them randomly fail.
Your test suite could use an environment variable to tell the code to insert the malloc hook and which call of malloc to fail. E.g. if you set FOOBAR_FAIL_MALLOC=10 then your malloc hook would count down and let the 10th use of malloc return 0.
FOOBAR_FAIL_MALLOC=0 could simply report the numbers of mallocs in a testcase. You would then run the test once with FOOBAR_FAIL_MALLOC=0 and capture the number of mallocs involved. Then repeat for FOOBAR_FAIL_MALLOC=1 to N to test every single malloc.
Unless after a failure of malloc you have more mallocs. Then you have to think of something more complex to specify which mallocs should fail.
You could also just make the hook fail randomly. Given enough runs every malloc call would fail at some point.
Note: a C++ new should also hot the malloc hook
You can have your test program include the .c under test and use a #define to override calls to malloc.
For example:
prog.c:
#include <stdio.h>
#include <stdlib.h>
void *foo(int x)
{
return malloc(x);
}
test.c:
#include <stdio.h>
#include <stdlib.h>
static char buf[100];
static int malloc_fail;
void *test_malloc(size_t n)
{
if (malloc_fail) {
return NULL;
} else {
return buf;
}
}
#define malloc(x) test_malloc(x)
#include "prog.c"
#undef malloc
int main()
{
void *p;
malloc_fail=0;
p = foo(5);
printf("buf=%p, p=%p\n", (void *)buf, p); // prints same value both times
malloc_fail=1;
p = foo(4);
if (p) {
printf("buf=%p, p=%p\n", (void *)buf, p);
} else {
printf("p is NULL\n"); // this prints
}
return 0;
}

How do I make LeakSanitizer ignore end of program leaks

I want to use LeakSanitizer to detect leaked memory, but the style of the program I am using does not free memory before exit. This is fairly common in my experience.
I want to detect this leak:
int main(int argc, char const *argv[])
{
char *p = malloc(5);
p = 0;
return 0;
}
And ignore this leak:
int main(int argc, char const *argv[])
{
char *p = malloc(5);
return 0;
}
You want LSan to report only unreachable leaks i.e. pointers which are guaranteed to be leaked by the program. Problem is that by default LeakSanitizer runs it's checks at the end of the program, often after global C++ dtors have completed and their contents is no longer considered accessible. So when LSan finally runs, it has to assume that lots of stuff is no longer reachable. To work around this issue you can add
#include <lsan_interface.h>
...
#ifdef __SANITIZE_ADDRESS__
__lsan_do_leak_check();
__lsan_disable();
#endif
before returning from main (inspired by Issue 719 and llvm discussion).
PS: Be careful with extremely simple examples like the ones you post above. GCC will often remove unused assignments and allocations even at -O0 so always check that assembler matches your expectations.

Auto release of stack variables in C

Unfortunately, in C there aren't any smart pointers.. but is it possible to build a macro that wrap variable declaration and invoke function call with that variable as an input variable upon leaving the scope where the variable was declared ?
Sorry for the long phrase, but I'm working on xnu kernel where you have many elements that have built-in reference counters, and one must not forget to unref this element when done using it to avoid memory leaks.
For example, if I have the following type of proc_t:
struct proc;
typedef struct proc * proc_t;
I want to declare a stack variable based on this type within a scope, for example :
{
proc_t_release_upon_exit proc_t proc_iter = proc_find(mypid);
//the rest of the code in this scope
}
After preprocessor analyze the macro and before compilation, the following code I expect to be generated is :
{
proc_t myproc = proc_find(mypid)
//the rest of the code in scope
proc_rele(myproc);
}
Is there any way to define such macro as in C ?
You could use the cleanup variable attribute in GCC. Please take a look at this:
http://echorand.me/site/notes/articles/c_cleanup/cleanup_attribute_c.html
Sample code:
#include <stdio.h>
#include <stdlib.h>
void free_memory(void **ptr)
{
printf("Free memory: %p\n", *ptr);
free(*ptr);
}
int main(void)
{
// Define variable and allocate 1 byte, the memory will be free at
// the end of the scope by the free_memory function. The free_memory
// function will get the pointer to the variable *ptr (double pointer
// **ptr).
void *ptr __attribute__ ((__cleanup__(free_memory))) = malloc(1);
return 0;
}
If you save the source code in a file named main.c, you could compile it with this command:
gcc main.c -o main
and verify if there are any memory leaks by:
valgrind ./main
Example output from valgrind:
==1026== Memcheck, a memory error detector
==1026== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==1026== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==1026== Command: ./main
==1026==
Free memory: 0x51ff040
==1026==
==1026== HEAP SUMMARY:
==1026== in use at exit: 0 bytes in 0 blocks
==1026== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
==1026==
==1026== All heap blocks were freed -- no leaks are possible
==1026==
==1026== For counts of detected and suppressed errors, rerun with: -v
==1026== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
C does provide a way to place code syntactically before other code that will execute first: the for block. Remember that clause 3 of a for structure can contain an arbitrary expression, and always runs after the execution of the main block.
You can therefore create a macro that makes a predetermined call after a given chunk of following code by wrapping a for block in a macro:
#define M_GEN_DONE_FLAG() _done_ ## __LINE__
#define M_AROUND_BLOCK2(FLAG, DECL, BEFORE, AFTER) \
for (int FLAG = (BEFORE, 0); !FLAG; ) \
for (DECL; !FLAG; FLAG = (AFTER, 1))
#define M_AROUND_BLOCK(DECL, BEFORE, AFTER) M_AROUND_BLOCK2(M_GEN_DONE_FLAG(), DECL, BEFORE, AFTER)
#define M_CLEANUP_VAR(DECL, CLEANUP_CALL) M_AROUND_BLOCK(DECL, (void)0, CLEANUP_CALL)
...and you can use it like this:
#include <stdio.h>
struct proc;
typedef struct proc * proc_t;
proc_t proc_find(int);
void proc_rele(proc_t);
void fun(int mypid) {
M_CLEANUP_VAR (proc_t myproc = proc_find(mypid), proc_rele(myproc))
{
printf("%p\n", &myproc); // just to prove it's in scope
}
}
The trick here is that a for block accepts a following statement, but if we don't actually put that statement in the macro definition, we can follow the macro invocation with a normal code block and it will "magically" belong to our new scoped-control-structure syntax, simply by virtue of following the expanded for.
Any optimizer worth using will remove the loop flag at its lowest optimization settings. Note that name clashing with the flag is not a huge concern (i.e. you don't really need a gensym for this) because the flag is scoped to the loop body, and any nested loops will safely hide it if they use the same flag name.
The bonus here is that the scope of the variable to cleanup is restricted (it cannot be used outside of the compound immediately following its declaration) and visually explicit (because of said compound).
Pros:
this is standard C with no extensions
the control flow is straightforward
it's actually (somehow) less verbose than __attribute__ __cleanup__
Cons:
it doesn't provide "full" RAII (i.e. won't protect against goto or C++ exceptions: __cleanup__ is usually implemented with C++ machinery under the hood so it's more complete). More seriously, it doesn't protect against early return (thanks #Voo). (You can at least guard against a misplaced break - if you want to - by adding a third line, switch (0) default: to the end of M_AROUND_BLOCK2.)
not everyone agrees with syntax-extending macros (but consider that you are extending C's semantics here, so...)
I know this isn't what you want to hear, but I urge you not to do this.
It is perfectly acceptable C style to have a single point of return, before which everything gets cleaned up. Since there are no exceptions, this is easy to do, and easy to verify by looking at the function.
Using macro-hackery or compiler "features" to do this is not accepted C style. It will be a burden for everyone after you to read and understand. And in the end it doesn't actually gain you much.

Trying to suppress clang false positive leak warning

I am using clang static analysis under Xcode 6.4 (6E35b), and getting a false positive warning about a potential memory leak. I do explicitly free the memory in question, but the freeing happens in a different compilation unit. Here is my MWE:
file2.c: Performs the actual freeing.
#include <stdlib.h>
void my_free(const void* p) {
free((void*) p);
}
file1.c: Allocates memory and explicitly frees it through an external function.
#include <stdlib.h>
void my_free(const void* p);
int main(int argc, char* argv[]) {
void* data = malloc(1);
if(data) my_free(data);
return 0; /* <-- "Potential leak of memory pointed to by 'data'" */
}
When I define my_free() in the same compilation unit as its invocation, no warning is generated, but of course I need to invoke my_free() from a large number of different source files.
I have read through FAQ and How to Deal with Common False Positives, but it does not address my situation. What can I do to assure clang that I really am freeing the memory in question?
In case the version information is relevant:
% clang --version
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
One way to fix that would be to add code specific for the analyser, in your header file:
#ifdef __clang_analyzer__
#define my_free free
#endif
This will make the static analyser think you're using the classic free function and stop complaining.

How do I use valgrind to find memory leaks?

How do I use valgrind to find the memory leaks in a program?
Please someone help me and describe the steps to carryout the procedure?
I am using Ubuntu 10.04 and I have a program a.c, please help me out.
How to Run Valgrind
Not to insult the OP, but for those who come to this question and are still new to Linux—you might have to install Valgrind on your system.
sudo apt install valgrind # Ubuntu, Debian, etc.
sudo yum install valgrind # RHEL, CentOS, Fedora, etc.
sudo pacman -Syu valgrind # Arch, Manjaro, Garuda, etc
Valgrind is readily usable for C/C++ code, but can even be used for other
languages when configured properly (see this for Python).
To run Valgrind, pass the executable as an argument (along with any
parameters to the program).
valgrind --leak-check=full \
--show-leak-kinds=all \
--track-origins=yes \
--verbose \
--log-file=valgrind-out.txt \
./executable exampleParam1
The flags are, in short:
--leak-check=full: "each individual leak will be shown in detail"
--show-leak-kinds=all: Show all of "definite, indirect, possible, reachable" leak kinds in the "full" report.
--track-origins=yes: Favor useful output over speed. This tracks the origins of uninitialized values, which could be very useful for memory errors. Consider turning off if Valgrind is unacceptably slow.
--verbose: Can tell you about unusual behavior of your program. Repeat for more verbosity.
--log-file: Write to a file. Useful when output exceeds terminal space.
Finally, you would like to see a Valgrind report that looks like this:
HEAP SUMMARY:
in use at exit: 0 bytes in 0 blocks
total heap usage: 636 allocs, 636 frees, 25,393 bytes allocated
All heap blocks were freed -- no leaks are possible
ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I have a leak, but WHERE?
So, you have a memory leak, and Valgrind isn't saying anything meaningful.
Perhaps, something like this:
5 bytes in 1 blocks are definitely lost in loss record 1 of 1
at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
by 0x40053E: main (in /home/Peri461/Documents/executable)
Let's take a look at the C code I wrote too:
#include <stdlib.h>
int main() {
char* string = malloc(5 * sizeof(char)); //LEAK: not freed!
return 0;
}
Well, there were 5 bytes lost. How did it happen? The error report just says
main and malloc. In a larger program, that would be seriously troublesome to
hunt down. This is because of how the executable was compiled. We can
actually get line-by-line details on what went wrong. Recompile your program
with a debug flag (I'm using gcc here):
gcc -o executable -std=c11 -Wall main.c # suppose it was this at first
gcc -o executable -std=c11 -Wall -ggdb3 main.c # add -ggdb3 to it
Now with this debug build, Valgrind points to the exact line of code
allocating the memory that got leaked! (The wording is important: it might not
be exactly where your leak is, but what got leaked. The trace helps you find
where.)
5 bytes in 1 blocks are definitely lost in loss record 1 of 1
at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
by 0x40053E: main (main.c:4)
Techniques for Debugging Memory Leaks & Errors
Make use of www.cplusplus.com! It has great documentation on C/C++ functions.
General advice for memory leaks:
Make sure your dynamically allocated memory does in fact get freed.
Don't allocate memory and forget to assign the pointer.
Don't overwrite a pointer with a new one unless the old memory is freed.
General advice for memory errors:
Access and write to addresses and indices you're sure belong to you. Memory
errors are different from leaks; they're often just IndexOutOfBoundsException
type problems.
Don't access or write to memory after freeing it.
Sometimes your leaks/errors can be linked to one another, much like an IDE discovering that you haven't typed a closing bracket yet. Resolving one issue can resolve others, so look for one that looks a good culprit and apply some of these ideas:
List out the functions in your code that depend on/are dependent on the
"offending" code that has the memory error. Follow the program's execution
(maybe even in gdb perhaps), and look for precondition/postcondition errors. The idea is to trace your program's execution while focusing on the lifetime of allocated memory.
Try commenting out the "offending" block of code (within reason, so your code
still compiles). If the Valgrind error goes away, you've found where it is.
If all else fails, try looking it up. Valgrind has documentation too!
A Look at Common Leaks and Errors
Watch your pointers
60 bytes in 1 blocks are definitely lost in loss record 1 of 1
at 0x4C2BB78: realloc (vg_replace_malloc.c:785)
by 0x4005E4: resizeArray (main.c:12)
by 0x40062E: main (main.c:19)
And the code:
#include <stdlib.h>
#include <stdint.h>
struct _List {
int32_t* data;
int32_t length;
};
typedef struct _List List;
List* resizeArray(List* array) {
int32_t* dPtr = array->data;
dPtr = realloc(dPtr, 15 * sizeof(int32_t)); //doesn't update array->data
return array;
}
int main() {
List* array = calloc(1, sizeof(List));
array->data = calloc(10, sizeof(int32_t));
array = resizeArray(array);
free(array->data);
free(array);
return 0;
}
As a teaching assistant, I've seen this mistake often. The student makes use of
a local variable and forgets to update the original pointer. The error here is
noticing that realloc can actually move the allocated memory somewhere else
and change the pointer's location. We then leave resizeArray without telling
array->data where the array was moved to.
Invalid write
1 errors in context 1 of 1:
Invalid write of size 1
at 0x4005CA: main (main.c:10)
Address 0x51f905a is 0 bytes after a block of size 26 alloc'd
at 0x4C2B975: calloc (vg_replace_malloc.c:711)
by 0x400593: main (main.c:5)
And the code:
#include <stdlib.h>
#include <stdint.h>
int main() {
char* alphabet = calloc(26, sizeof(char));
for(uint8_t i = 0; i < 26; i++) {
*(alphabet + i) = 'A' + i;
}
*(alphabet + 26) = '\0'; //null-terminate the string?
free(alphabet);
return 0;
}
Notice that Valgrind points us to the commented line of code above. The array
of size 26 is indexed [0,25] which is why *(alphabet + 26) is an invalid
write—it's out of bounds. An invalid write is a common result of
off-by-one errors. Look at the left side of your assignment operation.
Invalid read
1 errors in context 1 of 1:
Invalid read of size 1
at 0x400602: main (main.c:9)
Address 0x51f90ba is 0 bytes after a block of size 26 alloc'd
at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
by 0x4005E1: main (main.c:6)
And the code:
#include <stdlib.h>
#include <stdint.h>
int main() {
char* destination = calloc(27, sizeof(char));
char* source = malloc(26 * sizeof(char));
for(uint8_t i = 0; i < 27; i++) {
*(destination + i) = *(source + i); //Look at the last iteration.
}
free(destination);
free(source);
return 0;
}
Valgrind points us to the commented line above. Look at the last iteration here,
which is *(destination + 26) = *(source + 26);. However, *(source + 26) is
out of bounds again, similarly to the invalid write. Invalid reads are also a
common result of off-by-one errors. Look at the right side of your assignment
operation.
The Open Source (U/Dys)topia
How do I know when the leak is mine? How do I find my leak when I'm using
someone else's code? I found a leak that isn't mine; should I do something? All
are legitimate questions. First, 2 real-world examples that show 2 classes of
common encounters.
Jansson: a JSON library
#include <jansson.h>
#include <stdio.h>
int main() {
char* string = "{ \"key\": \"value\" }";
json_error_t error;
json_t* root = json_loads(string, 0, &error); //obtaining a pointer
json_t* value = json_object_get(root, "key"); //obtaining a pointer
printf("\"%s\" is the value field.\n", json_string_value(value)); //use value
json_decref(value); //Do I free this pointer?
json_decref(root); //What about this one? Does the order matter?
return 0;
}
This is a simple program: it reads a JSON string and parses it. In the making,
we use library calls to do the parsing for us. Jansson makes the necessary
allocations dynamically since JSON can contain nested structures of itself.
However, this doesn't mean we decref or "free" the memory given to us from
every function. In fact, this code I wrote above throws both an "Invalid read"
and an "Invalid write". Those errors go away when you take out the decref line
for value.
Why? The variable value is considered a "borrowed reference" in the Jansson
API. Jansson keeps track of its memory for you, and you simply have to decref
JSON structures independent of each other. The lesson here:
read the documentation. Really. It's sometimes hard to understand, but
they're telling you why these things happen. Instead, we have
existing questions about this memory error.
SDL: a graphics and gaming library
#include "SDL2/SDL.h"
int main(int argc, char* argv[]) {
if (SDL_Init(SDL_INIT_VIDEO|SDL_INIT_AUDIO) != 0) {
SDL_Log("Unable to initialize SDL: %s", SDL_GetError());
return 1;
}
SDL_Quit();
return 0;
}
What's wrong with this code? It consistently leaks ~212 KiB of memory for me. Take a moment to think about it. We turn SDL on and then off. Answer? There is nothing wrong.
That might sound bizarre at first. Truth be told, graphics are messy and sometimes you have to accept some leaks as being part of the standard library. The lesson here: you need not quell every memory leak. Sometimes you just need to suppress the leaks because they're known issues you can't do anything about. (This is not my permission to ignore your own leaks!)
Answers unto the void
How do I know when the leak is mine?
It is. (99% sure, anyway)
How do I find my leak when I'm using someone else's code?
Chances are someone else already found it. Try Google! If that fails, use the skills I gave you above. If that fails and you mostly see API calls and little of your own stack trace, see the next question.
I found a leak that isn't mine; should I do something?
Yes! Most APIs have ways to report bugs and issues. Use them! Help give back to the tools you're using in your project!
Further Reading
Thanks for staying with me this long. I hope you've learned something, as I tried to tend to the broad spectrum of people arriving at this answer. Some things I hope you've asked along the way: How does C's memory allocator work? What actually is a memory leak and a memory error? How are they different from segfaults? How does Valgrind work? If you had any of these, please do feed your curiousity:
More about malloc, C's memory allocator
Definition of a segmentation fault
Definition of a memory leak
Definition of a memory access error
How does Valgrind work?
Try this:
valgrind --leak-check=full -v ./your_program
As long as valgrind is installed it will go through your program and tell you what's wrong. It can give you pointers and approximate places where your leaks may be found. If you're segfault'ing, try running it through gdb.
You can run:
valgrind --leak-check=full --log-file="logfile.out" -v [your_program(and its arguments)]
You can create an alias in .bashrc file as follows
alias vg='valgrind --leak-check=full -v --track-origins=yes --log-file=vg_logfile.out'
So whenever you want to check memory leaks, just do simply
vg ./<name of your executable> <command line parameters to your executable>
This will generate a Valgrind log file in the current directory.

Resources