Bison - symbol table - matching free for malloc - c

I am going through the mfcalc example in the Bison manual and I have a question about the symbol table.
Specifically, in the routine putsym() we have calls to malloc, but I don't see the corresponding calls to free. Do we need to deallocate the symbol table (sym_table in the following code) manually, or does the tool take care of this automatically?
symrec *
putsym (char const *sym_name, int sym_type)
{
    symrec *ptr = (symrec *) malloc (sizeof (symrec));
    ptr->name = (char *) malloc (strlen (sym_name) + 1);
    strcpy (ptr->name,sym_name);
    ptr->type = sym_type;
    ptr->value.var = 0; /* Set value to 0 even if fctn. */
    ptr->next = (struct symrec *)sym_table;
    sym_table = ptr;
    return ptr;
}

"The tool" knows nothing about what your actions do.
I quoted "the tool" because in reality, there are at least two code generation tools involved in most parsing projects: a parser generator (bison, in this case) and a scanner generator ((f)lex, perhaps). The mfcalc example uses a hand-built lexer to avoid depending on lex, although it would probably have been simpler to use (f)lex. In any event, the only calls to the symbol table library are in the scanner and have absolutely nothing to do with the bison-generated code.
Of course, there are other tools at play. For example, the entire project is built with a C compiler and runs inside some kind of hosted environment (to use the words of the C standard); in other words, an operating system and runtime support library which includes implementations of malloc and free (although, as you point out, free is nowhere called by the example code).
I mention these last because they are relevant to your question. When a process terminates, all process resources are released, including its memory image. (This is not required by the C standard but almost all hosted environments work that way.) So you don't really need to free() memory allocated if it is going to be in use up to program termination.
Like global variables, unreleased memory allocations were pretty common at one time. These days, such things are considered poor practice (at best) and most programmers will avoid them, but it wasn't always the case. There was a time when many programmers considered it wasteful to track resources only in order to release them just before program termination, or to jump through the hoops necessary to ensure that pre-termination cleanup was guaranteed to execute. (Even today, many programmers will just insert a call to exit(1) when an unrecoverable error occurs, rather than going to the bother of tracking down and manually freeing every allocated memory block. Particularly in non-production code.)
Whether you approve of this coding style or not, the examples in the bison manual (and many other code examples of all kinds) date back to that innocent time.
So, it's true that the symbol table entries in this example are never freed. Your production code should probably do better, but it also should probably use a more efficient data structure and avoid depending on a (single) global context. But none of that has anything to do with the bison features that mfcalc is attempting to illustrate.
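If you do want a matching cleanup, a minimal sketch could walk the list and release each entry. The symrec below is a simplified stand-in for the one in the manual, and free_symtab is a hypothetical helper name, not something mfcalc or Bison provides.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for mfcalc's symrec; only the fields needed here. */
typedef struct symrec {
    char *name;
    int type;
    double value;
    struct symrec *next;
} symrec;

static symrec *sym_table = NULL;

/* Same shape as putsym() above, plus allocation checks (an addition). */
symrec *putsym(char const *sym_name, int sym_type)
{
    symrec *ptr = malloc(sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    ptr->name = malloc(strlen(sym_name) + 1);
    if (ptr->name == NULL) {
        free(ptr);
        return NULL;
    }
    strcpy(ptr->name, sym_name);
    ptr->type = sym_type;
    ptr->value = 0;
    ptr->next = sym_table;
    sym_table = ptr;
    return ptr;
}

/* Hypothetical helper: free every entry and return how many were freed. */
size_t free_symtab(void)
{
    size_t freed = 0;
    while (sym_table != NULL) {
        symrec *next = sym_table->next;
        free(sym_table->name);   /* the name was allocated separately */
        free(sym_table);
        sym_table = next;
        freed++;
    }
    return freed;
}
```

A driver could call free_symtab() once yyparse() has returned and the table is no longer needed.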

Related

Why do so many standard C functions tamper with parameters instead of returning values?

Many functions like strcat, strcpy and alike don't return the actual value but change one of the parameters (usually a buffer). This of course creates a boatload of side effects.
Wouldn't it be far more elegant to just return a new string? Why isn't this done?
Example:
char *copy_string(char *text, size_t length) {
    char *result = malloc(sizeof(char) * length);
    for (int i = 0; i < length; ++i) {
        result[i] = text[i];
    }
    return result;
}
int main() {
    char *copy = copy_string("Hello World", 12);
    // *result now lingers in memory and can not be freed?
}
I can only guess it has something to do with memory leaking since there is dynamic memory being allocated inside of the function which you can not free internally (since you need to return a pointer to it).
Edit: From the answers it seems that it is good practice in C to work with parameters rather than creating new variables. So I should aim for building my functions like that?
Edit 2: Will my example code lead to a memory leak? Or can *result be free'd?
To answer your original question: C, at the time it was designed, was tailored to be a language of maximum efficiency. It was, basically, just a nicer way of writing assembly code (the guy who designed it, wrote his own compiler for it).
What you say (that parameters are often used rather than return codes) is mainly true for string handling. Most other functions (those that deal with numbers for example) work through return codes as expected. Or they only modify values for parameters if they have to return more than one value.
String handling in C today is considered one of the major (if not THE major) weaknesses of C. But those functions were written with performance in mind, and with the machines available in those days (and the intent of performance), working on the caller's buffers was the way of choice.
Re your edit 1: Today other intents may apply. Performance usually isn't the limiting factor. Equally or more important are readability, robustness, and proneness to error. And, as said, string handling in C is today generally considered a horrible relic of the past. So it's basically your choice, depending on your intent.
Re your edit 2: Yes, the memory will leak. You need to call free(copy). Which ties into edit 1, proneness to error: it's easy to forget the free and create leaks that way (or attempt to free it twice, or access it after it was freed). So returning a new string may be more readable, but it is also more prone to error (even more than the clunky original C approach of modifying the caller's buffer).
Generally, I'd suggest, whenever you have the choice, to work with a newer dialect that supports std-string or something similar.
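For edit 2, a minimal sketch of the question's copy_string with the ownership rule spelled out might look like this. The NULL check and the use of memcpy are additions, not part of the original example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the first `length` bytes of `text`,
   or NULL if the allocation fails.
   The caller owns the result and must pass it to free(). */
char *copy_string(const char *text, size_t length)
{
    char *result = malloc(length);
    if (result == NULL)
        return NULL;
    memcpy(result, text, length);
    return result;
}
```

The returned pointer is the same address malloc produced, so the caller releases it with free(copy) once it is done with the string.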
Why do so many standard C functions tamper with parameters instead of returning values?
Because that's often what the users of the C library want.
Many functions like strcat, strcpy and alike don't return the actual value but change one of the parameters (usually a buffer). This of course creates a boatload of side effects. Wouldn't it be far more elegant to just return a new string? Why isn't this done?
It's not very efficient to allocate memory, and it would require the user to free() it later, which is an unnecessary burden on the user. Efficiency and letting users do what they want (even if they want to shoot themselves in the foot) are part of C's philosophy.
Besides, there are syntax/implementation issues. For example, how can the following be done if the strcpy() function actually returns a newly allocated string?
char arr[256] = "Hello";
strcpy(arr, "world");
Because C doesn't allow you to assign something to an array (arr).
Basically, you are questioning why C is the way it is. For that question, the common answer is "historical reasons".
Two reasons:
Properly designed functions should only concern themselves with their designated purpose, and not unrelated things such as memory allocation.
Making a hard copy of the string would make the function far slower.
So for your example, if there is a need for a hard copy, the caller should malloc the buffer and afterwards call strcpy. That separates memory allocation from the algorithm.
On top of that, good design practice dictates that the module that allocated memory should also be responsible for freeing it. Otherwise the caller might not even realize that the function is allocating memory, and there would be a memory leak. If the caller instead is responsible for the allocation, then it is obvious that the caller is also responsible for clean-up.
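As a rough illustration of that separation, here is a sketch of a function written in the standard-library style: it only fills a buffer that the caller owns. The name make_greeting and its return convention are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function in the strcpy style: it never allocates.
   Writes "Hello, <name>" into dst, which the caller owns.
   Returns 0 on success, -1 if dst_size is too small. */
int make_greeting(char *dst, size_t dst_size, const char *name)
{
    static const char prefix[] = "Hello, ";
    /* sizeof prefix already counts the terminating '\0'. */
    size_t need = sizeof prefix + strlen(name);
    if (need > dst_size)
        return -1;
    strcpy(dst, prefix);
    strcat(dst, name);
    return 0;
}
```

Because the function does not allocate, the caller decides where the buffer lives (a local array, a malloc'd block, a static buffer) and it is obvious who has to clean it up.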
Overall, C standard library functions are designed to be as fast as possible, meaning they will strive to meet the case where the caller has minimal requirements. A typical example of such a function is malloc, which doesn't even set the allocated data to zero, because that would take extra time. Instead they added an additional function calloc for that purpose.
Other languages have different philosophies, where they would for example force a hard copy for all string handling functions ("immutable objects"). This makes the function easier to work with and perhaps also the code easier to read, but it comes at the expense of a slower program, which needs more memory.
This is one of the main reasons why C is still widely used for development. It tends to be much faster and more efficient than any other language (except raw assembler).

How can I free automatically multiple malloc in C?

I'd like to free automatically multiple malloced memory at the end of a program in C.
For example :
str1 = malloc(sizeof(char) * 10);
str2 = malloc(sizeof(char) * 10);
str3 = malloc(sizeof(char) * 10);
I don't want a function like this :
void my_free()
{
    free(str1);
    free(str2);
    free(str3);
}
but a function which free all the memory allocated during the program.
There's no such function in C. C doesn't do memory management automatically. So it's your responsibility to track the memory allocations and free them appropriately.
Most modern operating systems (perhaps, not embedded systems) would reclaim the memory allocated during execution at process termination. So you can skip calling free(). However, if your application is a long running one then this will become a problem if it keeps allocating memory. Depending on your application, you may devise a strategy to free memory appropriately.
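One possible tracking strategy, sketched under the assumption that every allocation goes through a wrapper: each block carries a small hidden header that links it into a list, and a single call frees the whole list. The names tracked_malloc, free_all and tracking_init are hypothetical; max_align_t requires C11.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hidden header placed in front of every tracked block. The union with
   max_align_t keeps the payload that follows suitably aligned. */
typedef union header {
    union header *next;
    max_align_t align;
} header;

static header *all_blocks = NULL;

/* Like malloc(), but remembers the block. Do not pass the result to free(). */
void *tracked_malloc(size_t size)
{
    header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->next = all_blocks;
    all_blocks = h;
    return h + 1;               /* payload starts right after the header */
}

/* Frees every tracked block and returns how many were released. */
size_t free_all(void)
{
    size_t n = 0;
    while (all_blocks != NULL) {
        header *next = all_blocks->next;
        free(all_blocks);
        all_blocks = next;
        n++;
    }
    return n;
}

static void free_all_at_exit(void)
{
    (void)free_all();
}

/* Optional: arrange for free_all() to run at normal program termination.
   Returns 0 on success, as atexit() does. */
int tracking_init(void)
{
    return atexit(free_all_at_exit);
}
```

With this, str1, str2 and str3 from the question would be obtained from tracked_malloc(10) and released together by one free_all() call.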
As Blue Moon pointed out in his answer, one of the main features of C compared to other languages is the missing memory management. While this gives you a lot of freedom it can on the other hand lead to severe bugs in your code.
Technically the detection of memory leaks is not possible with a confidence level of 100%, but there are quite powerful static code analyzers out there to guide you.
In the last embedded project I worked on we used FlexeLint. It is costly for non-commercial products, but the benefit is enormous. A lot of potential bugs and leaks could be detected with such a static analyzer without even executing the code.
There is another static analyzer, free of cost for open source projects called Coverity Scan. I did not try it myself but it is probably worth a shot.
After witnessing what a good analyzer like FlexeLint is able to detect beyond mere compilation errors, I personally would not launch another C project without such analysis tools in place.
While this is not a direct answer to your question, it can be an improvement to your workflow because such errors as forgetting a free call will be detected in most cases.

Building a C immutable string library, how to deal with leftover const char*'s?

Dealing with strings in C definitely makes one wish for a simple class-based language, but I'm trying to instead build a convenient string library. My idea is to use immutable strings, with a struct called Rstring (for "robust string") that has an internal const char* s and int length, such that operations like rstring_concat and rstring_substring return new Rstring objects with their own newly malloc'd character pointers.
Writing up an initial draft of the library, I am pleased with the simplicity and cleanliness of using my library instead of char *. However, I realized that returning a newly allocated pointer is somewhat of a PITA without a destructor. Each time some operation is done, say via concatenation or substrings, you have some newly allocated memory, and then whichever strings you had before are now hanging around, and probably are useless, so they need to be free'd, and C has no destructors, so the user is stuck having to manually go around freeing everything.
Therefore my question is, are there any clever ways to avoid having to make a whole bunch of manual calls to free? Possibly, for example, having internal start and end indices in order to have strings which act like small strings, but really contain quite a bit more? I don't know if there's any generally accepted method of doing this, or if people are simply stuck with tedious memory management in C.
Perhaps best, is there any widely-used library for convenient string manipulation in C?
If you need a better string library for C I would recommend The Better String Library.
C does not have any way of simplifying the memory management. Any memory you allocate using malloc must be freed. If you are working with a lot of strings in one function you could use a special registry where you register strings to. The registry could then destroy all the strings that were registered to it.
e.g. (only the interfaces, no implementation)
void rstring_reg_init(rstring_reg*);
void rstring_reg_destroy(rstring_reg*);
rstring rstring_reg_create(rstring_reg*, const char*);
void rstring_reg_register(rstring_reg*, rstring);
void rstring_reg_detach(rstring_reg*, rstring);
If your strings are mutable you could even create the strings using the registry (I'd rather call it a pool then). If the strings were to remember their pool you could even let them register themselves at creation time. This could lead to rather "beautiful code" like:
rstring f(void) {
    rstring_reg reg;
    rstring_reg_init(&reg);
    rstring a = rstring_reg_create(&reg, "foo");
    rstring b = rstring_reg_create(&reg, "bar");
    rstring ab = rstring_concat(a, b);
    rstring s = rstring_substr(ab, 1, 4);
    rstring_reg_detach(&reg, s);
    rstring_reg_destroy(&reg);
    return s;
}
What this code would do is this:
Create registry
Create a and b strings which both know the registry
Create a new ab string. It is automatically added to the registry.
Create a new s string. It is also added to the registry.
Detach s from the registry as we want to return it.
Destroy registry. This automatically destroys a, b and ab
Return s - The caller of f is now responsible to manage its memory
In the end I'd rather recommend using C++ than using such a beast.
What you really want is RAII and this is only possible using C++ or a proprietary GCC extension.
The "slightly clever" way of declaring an immutable string struct would be something like:
struct Rstring {
size_t length;
char s[0];
};
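It's the zero-length array hack in action (char s[0] is a GNU extension; since C99 the standard spelling is a flexible array member, char s[]). You're able to allocate Rstring objects as below: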
struct Rstring* alloc_rstring(const char* text) {
    size_t len = strlen(text);
    struct Rstring* ret = malloc(sizeof(struct Rstring) + len + 1);
    if (ret == NULL)
        return NULL; /* allocation failed */
    ret->length = len;
    memcpy(ret->s, text, len + 1); /* copies the terminating '\0' too */
    return ret;
}
and free such Rstring objects with just a simple free(), since the string data resides on the same allocation.
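To show how operations on such a single-allocation string could look, here is a small sketch of a concatenation that returns a new object. It uses the C99 flexible array member form, and rstring_new and this rstring_concat are illustrative implementations, not the asker's library.

```c
#include <stdlib.h>
#include <string.h>

struct Rstring {
    size_t length;
    char s[];                   /* C99 flexible array member */
};

/* Builds a new Rstring from the first `len` bytes of `text`. */
struct Rstring *rstring_new(const char *text, size_t len)
{
    struct Rstring *r = malloc(sizeof *r + len + 1);
    if (r == NULL)
        return NULL;
    r->length = len;
    memcpy(r->s, text, len);
    r->s[len] = '\0';
    return r;
}

/* Returns a new Rstring holding a followed by b; inputs are untouched. */
struct Rstring *rstring_concat(const struct Rstring *a, const struct Rstring *b)
{
    size_t len = a->length + b->length;
    struct Rstring *r = malloc(sizeof *r + len + 1);
    if (r == NULL)
        return NULL;
    r->length = len;
    memcpy(r->s, a->s, a->length);
    memcpy(r->s + a->length, b->s, b->length + 1);  /* includes the '\0' */
    return r;
}
```

Each result is still released with one free(), but the caller remains responsible for every intermediate string it creates.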
What you really need is a garbage collector for C.
LISP and Java programmers take garbage collection for granted. With
the Boehm-Demers-Weiser library, you easily can use it in C and C++
projects, too.
The whole point of immutable strings, at least as I understand it, is to be able to avoid copies by sharing storage. (You can also avoid some locking issues if that's important.) But in that case, you really can't allow the client to free a string, unless you force them to maintain reference counts, which is really a pain.
In many applications, though, you can do what the Apache Runtime (APR) does, and use memory pools. Objects created in a memory pool are not freed individually; the entire pool is freed. This requires some lifetime analysis, and it can lead to obscure bugs because the compiler doesn't do the bookkeeping for you, but it's generally less work than reference counts, and the other payoff is that both allocation and deallocation are really fast. This might counterbalance the other disadvantage, which is that releasing storage is somewhat imprecise.
The memory pool architecture works best if you've got some sort of request-based control flow, typical of servers (but also of certain kinds of UI applications). In the request-based structure, each request initializes a memory pool, and frees it when request processing finishes. If the request makes some kind of permanent change to the state of the server, you may have to move some data from the temporary pool to a (more) permanent one, but such changes are relatively rare. For requests which can be gracefully divided into stages, some of which are temporary-memory-hungry, it's possible to nest memory pools; here, again, you may need to explicitly flag certain strings for an enclosing memory pool, or move them before their memory pool is deleted.
Since all objects in a memory pool are deleted at the same time, it is possible to implement finalizers by attaching the finalizer to the pool itself. The pool can keep a simple linked list of finalizers, which are executed sequentially (in reverse creation order) before the pool is actually freed. This allows finalizers to refer to arbitrary other objects in the same pool, provided that they don't invalidate the state of objects; the restriction is not too draconian since by and large finalizers are about managing non-memory resources like file descriptors.
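A toy version of such a pool, to make the finalizer ordering concrete. This is not APR's API: pool_alloc, pool_add_finalizer and pool_destroy are hypothetical names, and every object is a separate malloc, which a real pool would avoid. log_finalizer exists only to make the order observable.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*finalizer_fn)(void *);

/* One list node per allocation or per registered finalizer. */
typedef struct pool_node {
    struct pool_node *next;
    void *object;       /* the allocation, or the finalizer's argument */
    finalizer_fn fin;   /* NULL for plain allocations */
} pool_node;

typedef struct pool {
    pool_node *head;    /* newest entry first */
} pool;

static int pool_push(pool *p, void *object, finalizer_fn fin)
{
    pool_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->object = object;
    n->fin = fin;
    n->next = p->head;
    p->head = n;
    return 0;
}

void *pool_alloc(pool *p, size_t size)
{
    void *obj = malloc(size);
    if (obj != NULL && pool_push(p, obj, NULL) != 0) {
        free(obj);
        obj = NULL;
    }
    return obj;
}

int pool_add_finalizer(pool *p, finalizer_fn fin, void *arg)
{
    return pool_push(p, arg, fin);
}

void pool_destroy(pool *p)
{
    /* Pass 1: run finalizers newest first; all pool memory is still valid. */
    for (pool_node *n = p->head; n != NULL; n = n->next)
        if (n->fin != NULL)
            n->fin(n->object);
    /* Pass 2: release the objects and the bookkeeping nodes. */
    while (p->head != NULL) {
        pool_node *n = p->head;
        p->head = n->next;
        if (n->fin == NULL)
            free(n->object);
        free(n);
    }
}

/* Demo finalizer: records the character that arg points to. */
char fin_log[8];
size_t fin_count;

void log_finalizer(void *arg)
{
    if (fin_count < sizeof fin_log - 1)
        fin_log[fin_count++] = *(char *)arg;
}
```

Running the finalizers in a separate first pass is what lets them refer to other objects in the same pool, as described above.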
APR does not have immutable strings (or, at least, it didn't last time I looked at it), so Apache ends up doing a lot of unnecessary string copying. That's a design choice; I'm not voting one way or the other on it here.
You may like to look at GLib, which has many useful data structures and services, which include the type GString. There is a printf that puts its result directly into a GString. Very handy.
At heart your question is more about memory allocation than about strings per se. You get little help from C for memory allocation of strings or anything else. There are a few general approaches:
1. Manual: Call malloc() and free() for every object.
2. Conservative garbage collection: The Boehm collector is very mature. Though it has some problems, they're well understood at this point. Certain quirky coding methods and aggressive optimizations must be avoided.
3. Reference counting garbage collection: This requires you to call reference and dereference routines that increment/decrement a counter in the string at the creation and destruction of each pointer to it. When a decrement takes the count to zero, the string is freed. This is just as tedious and error prone as malloc() and free(), but it lets you handle complicated cases where the string's lifetime may end in any of several separate chunks of code.
4. Pool allocation: In this scheme, you create any number of allocation pools. When creating a string, you specify the pool where it should be allocated. Pools are freed all at once.
5. Stack allocation: Strings are allocated from a single memory pool that has Mark and Release operations. Calling Release frees all strings back to the point where Mark was last called.
Combinations of the above are common. For example, GNU Obstacks combine 4 and 5.
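The Mark/Release scheme is perhaps the least familiar of these, so here is a minimal sketch of it over a fixed-size buffer. The names stack_pool, sp_mark, sp_release and sp_alloc are invented for illustration (this is not the Obstacks API), and _Alignas and max_align_t assume C11.

```c
#include <stddef.h>

#define STACK_POOL_SIZE 4096

typedef struct stack_pool {
    _Alignas(max_align_t) unsigned char bytes[STACK_POOL_SIZE];
    size_t used;                /* offset of the first free byte */
} stack_pool;

/* Mark: remember the current top of the stack. */
size_t sp_mark(const stack_pool *p)
{
    return p->used;
}

/* Release: drop everything allocated since the matching sp_mark(). */
void sp_release(stack_pool *p, size_t mark)
{
    if (mark <= p->used)
        p->used = mark;
}

/* Bump allocation; returns NULL when the pool is exhausted. */
void *sp_alloc(stack_pool *p, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (p->used + align - 1) / align * align;
    if (start > STACK_POOL_SIZE || size > STACK_POOL_SIZE - start)
        return NULL;
    p->used = start + size;
    return p->bytes + start;
}
```

A function that builds many temporary strings would take a mark on entry, allocate freely, and release back to the mark before returning.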

Is there a way to test that a pointer's memory allocation has been freed?

Is there a way to tell that a pointer's memory assignment has been deallocated? I am just starting out in C and I think I am finally starting to understand the intricacies of memory management in C.
So for example:
char* pointer;
pointer = malloc(1024);
/* do stuff */
free(pointer);
/* test memory allocation after this point */
I know that the pointer will still store the memory address until I set pointer = NULL - but is there a way to test that the pointer no longer references memory I can use, without having to set it to NULL first?
The reason I want to do this is that I have a bunch of unit tests for my C program, and one of them ensures that there are no orphaned pointers after I call a special function which performs a cleanup of a couple of linked lists. Looking at the debugger I can see that my cleanup function is working, but I would like a way to test the pointers so I can wrap them in a unit test assertion.
There is no standardized memory management that tells you whether or not any given address is part of a currently allocated memory block.
For the purposes of a unit test, you could create a global map that keeps track of every allocation, so you make every allocation go through your custom malloc function that adds an entry to the map in debug builds and removes it again in free.
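A very small version of such a map could be a fixed table of live addresses. The names test_malloc, test_free and live_count are hypothetical, and the fixed table size is an arbitrary choice for the sketch.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 1024

/* Addresses of blocks that have been allocated and not yet freed. */
static void *live[MAX_TRACKED];

void *test_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (live[i] == NULL) {
            live[i] = p;
            return p;
        }
    }
    free(p);            /* table full: report it as an allocation failure */
    return NULL;
}

/* Returns 0 if p was tracked (and is now freed), -1 for NULL or unknown. */
int test_free(void *p)
{
    for (size_t i = 0; p != NULL && i < MAX_TRACKED; i++) {
        if (live[i] == p) {
            live[i] = NULL;
            free(p);
            return 0;
        }
    }
    return -1;
}

/* Number of blocks currently allocated through test_malloc(). */
size_t live_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (live[i] != NULL)
            n++;
    return n;
}
```

A unit test can then assert that live_count() is zero after the cleanup function has run. Counting is used here rather than looking up a specific freed pointer, because the value of a pointer is formally indeterminate once the block it pointed to has been freed.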
You could always link against an instrumented version of the libraries (say electric fence) for the purposes of unit testing.
Not ideal because you introduce a difference in the production and testing environments.
And some systems may provide sufficient instrumentation in their version of the standard library. For instance, the Mac OS X 10.5 library supports calling abort(3) on double frees, so if your unit tester can trap signals you are home free.
Shameless and pointless self promotion: my little toy C unit testing framework can trap signals.
Neither standard C nor POSIX (I think) provides a way to check that. Your specific operating system might have some sort of elaborate black magic for doing this that is only revealed to the Inner Circle of System Programmers, though.
Use good C practices. Example:
char* pointer = NULL;
/* do stuff */
pointer = malloc(1024); /* malloc does not always work, check it. */
if(pointer == NULL) {
    /*Help me, warn or exit*/
}
/* do stuff */
if(pointer) {
    free(pointer);
    pointer = NULL;
}
/* do stuff */
if(pointer) {
    /* tested memory allocation stuff */
}
Longer, yes, but if you always set a freed pointer to NULL, it's easy to test.
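To make the habit harder to forget, the free and the reset can be bundled into one step. FREE_AND_NULL is a made-up macro name for this sketch.

```c
#include <stdlib.h>

/* Frees the block and resets the pointer variable in one statement.
   free(NULL) is a no-op, so using it twice on the same variable is safe. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)
```

Note that this only resets the one variable passed in; any other copies of the address elsewhere in the program still hold the stale value.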

Patterns for freeing memory in C?

I'm currently working on a C based application and am a bit stuck on freeing memory in a non-antipattern fashion. I am a memory-management amateur.
My main problem is I declare memory structures in various different scopes, and these structures get passed around by reference to other functions. Some of those functions may throw errors and exit().
How do I go about freeing my structures if I exit() in one scope, but not all my data structures are in that scope?
I get the feeling I need to wrap it all up in a pseudo exception handler and have the handler deal with freeing, but that still seems ugly because it would have to know about everything I may or may not need to free...
Consider wrappers around malloc and use them in a disciplined way. Track the memory that you allocate (in a linked list, maybe) and use a wrapper around exit that walks the list and frees each block. You could also name each block with an additional parameter stored as a member of your linked list structure. In applications where allocated memory is highly scope dependent you will find yourself leaking memory, and this can be a good method to dump the memory and analyze it.
UPDATE:
Threading in your application will make this very complex. See other answers regarding threading issues.
You don't need to worry about freeing memory when exit() is called. When the process exits, the operating system will free all of the associated memory.
I think to answer this question appropriately, we would need to know about the architecture of your entire program (or system, or whatever the case may be).
The answer is: it depends. There are a number of strategies you can use.
As others have pointed out, on a modern desktop or server operating system, you can exit() and not worry about the memory your program has allocated.
This strategy changes, for example, if you are developing on an embedded operating system where exit() might not clean everything up. Typically what I see is when individual functions return due to an error, they make sure to clean up anything they themselves have allocated. You wouldn't see any exit() calls after calling, say, 10 functions. Each function would in turn indicate an error when it returns, and each function would clean up after itself. The original main() function (if you will - it might not be called main()) would detect the error, clean up any memory it had allocated, and take the appropriate actions.
When you just have scopes-within-scopes, it's not rocket science. Where it gets difficult is if you have multiple threads of execution, and shared data structures. Then you might need a garbage collector or a way to count references and free the memory when the last user of the structure is done with it. For example, if you look at the source to the BSD networking stack, you'll see that it uses a refcnt (reference count) value in some structures that need to be kept "alive" for an extended period of time and shared among different users. (This is basically what garbage collectors do, as well.)
You can create a simple memory manager for malloc'd memory that is shared between scopes/functions.
Register it when you malloc it, de-register it when you free it. Have a function that frees all registered memory before you call exit.
It adds a bit of overhead, but it helps keep track of memory. It can also help you hunt down pesky memory leaks.
Michael's advice is sound - if you are exiting, you don't need to worry about freeing the memory since the system will reclaim it anyway.
One exception to that is shared memory segments - at least under System V Shared Memory. Those segments can persist longer than the program that creates them.
One option not mentioned so far is to use an arena-based memory allocation scheme, built on top of standard malloc(). If the entire application uses a single arena, your cleanup code can release that arena, and all is freed at once. (APR - Apache Portable Runtime - provides a pools feature which I believe is similar; David Hanson's "C Interfaces and Implementations" provides an arena-based memory allocation system; I've written one that you could use if you wanted to.) You can think of this as "poor man's garbage collection".
As a general memory discipline, every time you allocate memory dynamically, you should understand which code is going to release it and when it can be released. There are a few standard patterns. The simplest is "allocated in this function; released before this function returns". This keeps the memory largely under control (if you don't run too many iterations on the loop that contains the memory allocation), and scopes it so that it can be made available to the current function and the functions it calls. Obviously, you have to be reasonably sure that the functions you call are not going to squirrel away (cache) pointers to the data and try to reuse them later after you've released and reused the memory.
The next standard pattern is exemplified by fopen() and fclose(); there's a function that allocates a pointer to some memory, which can be used by the calling code, and then released when the program has finished with it. However, this often becomes very similar to the first case - it is usually a good idea to call fclose() in the function that called fopen() too.
Most of the remaining 'patterns' are somewhat ad hoc.
People have already pointed out that you probably don't need to worry about freeing memory if you're just exiting (or aborting) your code in case of error. But just in case, here's a pattern I developed and use a lot for creating and tearing down resources in case of error. NOTE: I'm showing a pattern here to make a point, not writing real code!
int foo_create(foo_t *foo_out) {
    int res;
    foo_t foo;
    bar_t bar;
    baz_t baz;
    res = bar_create(&bar);
    if (res != 0)
        goto fail_bar;
    res = baz_create(&baz);
    if (res != 0)
        goto fail_baz;
    foo = malloc(sizeof(foo_s));
    if (foo == NULL)
        goto fail_alloc;
    foo->bar = bar;
    foo->baz = baz;
    /* etc. etc. you get the idea */
    *foo_out = foo;
    return 0; /* meaning OK */
    /* tear down stuff */
fail_alloc:
    baz_destroy(baz);
fail_baz:
    bar_destroy(bar);
fail_bar:
    return res; /* propagate error code */
}
I can bet I'm going to get some comments saying "this is bad because you use goto". But this is a disciplined and structured use of goto that makes code clearer, simpler, and easier to maintain if applied consistently. You can't achieve a simple, documented tear-down path through the code without it.
If you want to see this in real in-use commercial code, take a look at, say, arena.c from the MPS (which is coincidentally a memory management system).
It's a kind of poor-man's try...finish handler, and gives you something a bit like destructors.
I'm going to sound like a greybeard now, but in my many years of working on other people's C code, lack of clear error paths is often a very serious problem, especially in network code and other unreliable situations. Introducing them has occasionally made me quite a bit of consultancy income.
There are plenty of other things to say about your question -- I'm just going to leave it with this pattern in case that's useful.
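For readers who want something they can compile, here is one concrete instance of the same tear-down pattern. The conn type and its two buffers are invented for the example; only the shape (acquire in order, release in reverse order through labels) follows the pattern above.

```c
#include <stdlib.h>

/* Hypothetical object that owns two buffers. */
typedef struct conn {
    char *rx;
    char *tx;
} conn;

/* Returns 0 and stores the new object in *out, or -1 on failure,
   in which case nothing is leaked. */
int conn_create(conn **out, size_t rx_size, size_t tx_size)
{
    int res = -1;
    char *rx;
    char *tx;
    conn *c;

    rx = malloc(rx_size);
    if (rx == NULL)
        goto fail_rx;
    tx = malloc(tx_size);
    if (tx == NULL)
        goto fail_tx;
    c = malloc(sizeof *c);
    if (c == NULL)
        goto fail_alloc;

    c->rx = rx;
    c->tx = tx;
    *out = c;
    return 0;           /* meaning OK */

    /* tear down in reverse order of acquisition */
fail_alloc:
    free(tx);
fail_tx:
    free(rx);
fail_rx:
    return res;         /* propagate error code */
}

void conn_destroy(conn *c)
{
    if (c != NULL) {
        free(c->tx);
        free(c->rx);
        free(c);
    }
}
```

Each label undoes exactly the steps that succeeded before the failure, so adding a new resource means adding one acquisition and one label.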
Very simply, why not have a reference counted implementation, so that when you create an object and pass it around you increment and decrement the reference count (remember to be atomic if you have more than one thread)?
That way, when an object is no longer used (zero references) you can safely delete it, or automatically delete it in the reference count decrement call.
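A minimal sketch of that idea, assuming a C11 compiler with <stdatomic.h> for the atomic counter. The shared_buf type and the retain/release names are illustrative, not from any particular library.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef struct shared_buf {
    atomic_int refs;            /* number of owners */
    size_t len;
    char data[];                /* string bytes follow the header */
} shared_buf;

/* Creates a buffer holding a copy of text, with one reference. */
shared_buf *shared_buf_new(const char *text)
{
    size_t len = strlen(text);
    shared_buf *b = malloc(sizeof *b + len + 1);
    if (b == NULL)
        return NULL;
    atomic_init(&b->refs, 1);
    b->len = len;
    memcpy(b->data, text, len + 1);
    return b;
}

/* Call whenever another owner starts holding the pointer. */
shared_buf *shared_buf_retain(shared_buf *b)
{
    atomic_fetch_add(&b->refs, 1);
    return b;
}

/* Call when an owner is done. Returns 1 if this call freed the object. */
int shared_buf_release(shared_buf *b)
{
    /* atomic_fetch_sub returns the value before the decrement. */
    if (atomic_fetch_sub(&b->refs, 1) == 1) {
        free(b);
        return 1;
    }
    return 0;
}
```

The object is deleted inside the decrement call, so no owner needs to know whether it happens to be the last one.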
This sounds like a task for a Boehm garbage collector.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
Depends on the system of course whether you can or should afford to use it.

Resources