The role of "free functions" in C generic data structure implementations - c

I'm writing generic data structure implementations in C for learning purposes (using void pointers), but I'm a bit confused about the role of the "free function" that virtually all generic implementations let the user pass to the initializer function.
Let's say I have a stack. Should I call the user-provided free function when the client calls the "pop" operation, for instance? On the one hand, if we free the element before returning it to the caller, then any access to that memory by the caller is undefined behaviour. On the other hand, if we don't free it, the user is responsible for doing so, which seems to defeat the purpose of passing a custom free function in the first place.
What is the best practice here?

Let's say I have a stack. Should I call the user provided free
function when the client calls the "pop" operation, for instance?
No, you should not free the element when it is popped from the stack. The user expects to get back the object that they pushed earlier, and it must of course still be valid.
During a pop operation you should free only your own bookkeeping data, i.e. whatever you allocated in order to store the user's object.
I'm a bit confused about the role of the "free function" that
virtually all generic implementations allow the user to pass to the
initializer function.
The custom free functions I am aware of are used only when the whole container is destroyed. At that point, in addition to your own management data (list nodes and so on), the user data still stored in it also has to be freed.
As you do not know whether that user data contains dynamically allocated objects that need additional handling, you have to leave this to the user.
For this purpose the user has to pass a free function.
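A minimal sketch of that division of labour, assuming a linked-list stack. All names here (stack, stack_init, stack_push, stack_pop, stack_destroy, free_fn) are hypothetical and not taken from any particular library: pop frees only the node and hands the element back to the caller, while destroy passes every element still stored to the user's free function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; not taken from any particular library. */
typedef void (*free_fn)(void *);

typedef struct node {
    void *data;              /* user's element, owned by the stack while stored */
    struct node *next;
} node;

typedef struct {
    node *top;
    free_fn free_elem;       /* may be NULL if elements need no cleanup */
} stack;

void stack_init(stack *s, free_fn free_elem) {
    assert(s != NULL);
    s->top = NULL;
    s->free_elem = free_elem;
}

/* Returns 0 on success, -1 if the node could not be allocated. */
int stack_push(stack *s, void *data) {
    node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->data = data;
    n->next = s->top;
    s->top = n;
    return 0;
}

/* Frees only the node; ownership of the element passes back to the caller. */
void *stack_pop(stack *s) {
    node *n = s->top;
    if (n == NULL) return NULL;
    void *data = n->data;
    s->top = n->next;
    free(n);
    return data;
}

/* Frees every remaining node and hands each leftover element to free_elem. */
void stack_destroy(stack *s) {
    while (s->top != NULL) {
        void *data = stack_pop(s);
        if (s->free_elem != NULL) s->free_elem(data);
    }
}
```

With this layout, a caller that pops an element is responsible for freeing it, and the free function passed to stack_init only ever sees elements that were never popped.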

You always need to know how to destroy externally created objects to avoid memory leaks.
For your stack example, what happens if a user pushes an object and then destroys the stack? The stack destructor should call the destructors of all the elements it still contains.

To be safe, you should always have a function dedicated to freeing everything you malloc'ed during your program. You can call it at the end, or as soon as you're sure you won't need the memory again.

Related

Should I release data after the use of get_user_pages_fast?

I'm using get_user_pages_fast: I allocate a memory buffer in user space and get the corresponding pages in kernel space.
Should I free the struct page ** after I'm done with this memory, or call a specific release function?
Thanks!
From documentation on get_user_pages() (which has similar functionality, but with more parameters, and needs a semaphore held):
Each page returned must be released with a put_page call when it is finished with. vmas will only remain valid while mmap_sem is held.
As a side note, if you were freeing something, it would be the struct page * that was passed (freeing a struct page), not the struct page **, since the pointer to struct page * is passed only so that it can be used as a return value.
However, you typically shouldn't be assuming that you should free arbitrary things in the kernel without knowing that you're supposed to. In general, the kernel provides functions to create and destroy whatever objects you're working with.
Often, when you're given a pointer, there's a lot more going on behind the scenes. There could be semaphores, reference counts, etc. The pointer may also refer to a "real" object in use by the kernel, not just some structure made for you, so freeing it could pull the rug out from under other pieces of code.

What is the lifetime of the string returned by Tk_PathName()?

I am updating a piece of Tk-based third-party software for use with Tcl/Tk 8.6, and I have run across a statement of this form:
interp->result = Tk_PathName(tkwin);
, where interp is a Tcl_Interp * and tkwin is a Tk_Window. It no longer being allowed (by default) in Tcl 8.6 to access the members of a Tcl_Interp directly, I want to convert that to a call to Tcl_SetResult(), along these lines:
Tcl_SetResult(interp, Tk_PathName(tkwin), /* what goes here? */);
But I'm having trouble finding any documentation of the lifetime of the string returned by Tk_PathName(), and I need to know that to specify the correct free function.
I suspect that the right thing to do would be to specify TCL_VOLATILE as the free function, so that Tcl makes and subsequently manages its own copy of the string, but that will produce a memory leak if the caller is responsible for freeing the string returned by Tk_PathName().
If the caller has responsibility to free, then I suppose that TCL_DYNAMIC should be specified, though that does assume (I think reasonably) that Tk would have allocated the path name via Tcl_Alloc.
So which is it? Or do I need something else, instead?
These days, Tcl always immediately copies the string you pass into Tcl_SetResult(), regardless of the third argument, so that it can adopt the string into the managed Tcl_Obj infrastructure it uses internally. TCL_VOLATILE would be right… except that it simply doesn't matter any more!
There was a point where Tcl internally tried to support both ways of doing things; the code to support it was horrible and weird and it was genuinely hard to be sure by inspection that it was bug free so we switched to early copy, which was absolutely definitely the right semantics and obviously so. It probably also only meant that the copy was brought forward a tiny bit anyway, since nothing in the rest of Tcl could make use of non-Tcl_Obj results at that point.
That said, the lifetime of the string result from Tk_PathName is the lifetime of the widget and always has been for as far back as I've used Tk; the string is allocated when the widget is created, stored inside the widget's internal data structure, and deleted when the widget is destroyed. As such, TCL_STATIC is another candidate if you're not about to destroy the widget.

How do I know it is OK to free a pointer after passing it to a function?

For example, https://developer.gnome.org/gdk3/stable/gdk3-Windows.html#gdk-window-begin-draw-frame takes a pointer to a region as a parameter. How do I know it does not store the pointer somewhere to use later? Or may I free the pointer right after calling the function? Is there a general rule in GTK?
Thanks.
Generally, this problem is solved with the concept of "ownership" of the pointer or, more precisely, of the memory it points to.
Essentially, the function has to define whether it takes ownership of the memory (in which case you hand it over and no longer have to care about it, but you must allocate it in a way the function is fine with) or whether it just "borrows" the pointer. In the latter case the memory remains yours, and the function only uses it temporarily.
A third alternative is a mixed case: the function borrows the pointer, but requires you to keep the memory allocated (i.e. usable) until a certain action occurs (e.g. freeing a given resource). In this case you can choose where the memory comes from (heap, stack, static memory, etc.), but it is your responsibility to keep it usable long enough.
What the function does should be documented somewhere.
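The three contracts can be illustrated with a small sketch. The label type and the functions count_spaces, label_adopt_name, label_set_icon and label_destroy are hypothetical, invented only for this illustration; they are not GTK or GDK API. The comments play the role of the documentation mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical API; the names are illustrative only. */
typedef struct {
    char *name;          /* owned: freed by label_destroy */
    const char *icon;    /* borrowed long-term: caller keeps it alive */
} label;

/* Borrows text: reads it during the call and keeps no pointer to it. */
size_t count_spaces(const char *text) {
    size_t n = 0;
    for (; *text != '\0'; ++text)
        if (*text == ' ') ++n;
    return n;
}

/* Takes ownership of name, which must come from malloc (or be NULL).
   The caller must not free it afterwards. */
void label_adopt_name(label *l, char *name) {
    assert(l != NULL);
    free(l->name);
    l->name = name;
}

/* Mixed case: stores the pointer without copying or freeing it.
   The caller must keep icon valid until label_destroy is called. */
void label_set_icon(label *l, const char *icon) {
    assert(l != NULL);
    l->icon = icon;
}

/* Releases what the label owns and forgets what it merely borrowed. */
void label_destroy(label *l) {
    assert(l != NULL);
    free(l->name);
    l->name = NULL;
    l->icon = NULL;
}
```

Nothing in the function signatures distinguishes the three cases, which is why the contract has to be stated in the documentation.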

Freeing other variable types in C

C does not have garbage collection, hence whenever we allocate memory using malloc/calloc/realloc, we need to manually free it after its use is over. How are variables of other data types like int, char etc. handled by C? How is the memory allocated to those variables released?
That depends. If you allocate any of those data types with malloc/calloc/realloc you will still need to free them.
On the other hand, variables declared inside a function (without static) are called automatic variables, and their storage is released automatically when the function returns.
The point here is not the data type per se, but the storage location. malloc/calloc/realloc allocate memory on the heap, whereas automatic variables are typically allocated on the stack.
The heap is managed entirely by the programmer, while the stack works so that when a function returns, its stack frame is popped and the space its variables occupied can be overwritten when another function is called.
To get a better feel for this, take a look at the memory layout of a C program. Other useful references are the free(3) man page and the Wikipedia page on automatic variables.
Hope this helps!
Resources (such as memory) have nothing to do with variables. You never have to think about variables. You only have to think about the resource itself, and you need to manage the lifetime of the resource. There are function calls that acquire a resource (such as malloc) and give you a handle for the resource (such as a void pointer), and you have to call another function (such as free) later on with that handle to release the resource.
Memory is only one example; C standard I/O files work the same way, as do mutexes, sockets, window handles, etc. (In C++, add "dynamically allocated object" to the list.) But the central concept is that of the resource, the thing that needs acquiring and releasing. Variables have nothing to do with it except for the trivial fact that you can use variables to store the resource handles.
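As a small illustration of the acquire/release pairing, here is a sketch with two hypothetical helpers, sum_to and file_size. The automatic variables (total, size, and the pointers themselves) need no cleanup; only the resources obtained from malloc and fopen have to be released explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: sums 1..n using a temporary heap buffer. */
long sum_to(int n) {
    assert(n >= 0);
    long total = 0;                               /* automatic: no cleanup needed */
    int *buf = malloc((size_t)n * sizeof *buf);   /* acquire memory */
    if (buf == NULL) return n == 0 ? 0 : -1;      /* malloc(0) may return NULL */
    for (int i = 0; i < n; ++i) buf[i] = i + 1;
    for (int i = 0; i < n; ++i) total += buf[i];
    free(buf);                                    /* release memory */
    return total;
}

/* Same pattern with a different resource: fopen acquires, fclose releases.
   Returns the file size in bytes, or -1 on failure. */
long file_size(const char *path) {
    FILE *f = fopen(path, "rb");                  /* acquire file handle */
    if (f == NULL) return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);
    fclose(f);                                    /* release file handle */
    return size;
}
```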

Verifying that memory has been initialized in C

I've written an API that requires a context to be initialized and thereafter passed into every API call. The caller allocates the memory for the context, and then passes it to the init function with other parameters that describe how they want later API calls to behave. The context is opaque, so the client can't really muck around in there; it's only intended for the internal use of the API functions.
The problem I'm running into is that callers are allocating the context, but not initializing it. As a result, subsequent API functions are referring to meaningless garbage as if it was a real context.
I'm looking for a way to verify that the context passed into an API function has actually been initialized. I'm not sure if this is possible. Two ideas I've thought of are:
Use a pre-defined constant and store it in a "magic" field of the context to be verified at API invocation time.
Use a checksum of the contents of the context, storing this in the "magic" field and verifying it at invocation time.
Unfortunately I know that either one of these options could result in a false positive verification, either because random crap in memory matches the "magic" number, or because the context happens to occupy the same space as a previously initialized context. I think the latter scenario is more likely.
Does this simply boil down to a question of probability? That I can avoid false positives in most cases, but not all? Is it worth using a system that merely gives me a reasonable probability of accuracy, or would this just make debugging other problems more difficult?
The best solution, I think, is to add create()/delete() functions to your API and use create() to allocate and initialize the structure. You can put a signature at the start of the structure to verify that the pointer you are passed points to memory allocated with create(), and use delete() to overwrite the signature (or the entire buffer) before freeing the memory.
You can't completely avoid false positives in C, because the caller may have malloc'd memory that happens to start with your signature; but make your signature reasonably long (say 8 bytes) and the odds are low. Taking allocation out of the hands of the caller by providing a create() function will go a long way, though.
And, yeah, your biggest risk is that an initialized buffer is freed without using delete(), and a subsequent malloc happens to reuse that memory block.
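A minimal sketch of the create()/delete() idea with an 8-byte signature. The names ctx, ctx_create, ctx_is_valid, ctx_set_verbose and ctx_destroy, the verbose field, and the signature value are all assumptions made up for this example. The check remains probabilistic, as described above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names; the signature value is arbitrary. */
#define CTX_SIGNATURE UINT64_C(0x4354585F4C495645)

typedef struct {
    uint64_t signature;   /* first member, set only by ctx_create */
    int verbose;          /* stand-in for the real settings */
} ctx;

/* Allocates and initializes a context; returns NULL on allocation failure. */
ctx *ctx_create(int verbose) {
    ctx *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->signature = CTX_SIGNATURE;
    c->verbose = verbose;
    return c;
}

/* Probabilistic check: garbage memory could still match by chance. */
int ctx_is_valid(const ctx *c) {
    return c != NULL && c->signature == CTX_SIGNATURE;
}

/* Example API call: returns 0 on success, -1 if the context is not recognised. */
int ctx_set_verbose(ctx *c, int verbose) {
    if (!ctx_is_valid(c)) return -1;
    c->verbose = verbose;
    return 0;
}

/* Wipes the signature before freeing, so a reused block fails the check.
   The volatile write keeps the compiler from dropping the store as dead. */
void ctx_destroy(ctx *c) {
    if (!ctx_is_valid(c)) return;
    *(volatile uint64_t *)&c->signature = 0;
    free(c);
}
```

In a real library the struct definition would live in the implementation file, so that clients only see an opaque pointer and cannot allocate a context themselves.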
Your context variable is probably at the moment some kind of pointer to allocated memory. Instead of this, make it a token or handle that can be explicitly verified. Every time a context is initialised, you return a new token (not the actual context object) and store that token in an internal list. Then, when a client gives you a context later on, you check it is valid by looking in the list. If it is, the token can then be converted to the actual context and used, otherwise an error is returned.
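#include <map>
#include <stdexcept>
#include <utility>

// Placeholder for the library's private context type.
struct InternalContext { /* internal state */ };

typedef long Context;
// Maps each issued token to the context object it stands for.
typedef std::map<Context, InternalContext*> Contexts;
Contexts _contexts;

Context nextContext()
{
    static Context next = 0;
    return next++;
}

Context initialise()
{
    Context c = nextContext();
    _contexts.insert(std::make_pair(c, new InternalContext));
    return c;
}

void doSomethingWithContext(Context c)
{
    Contexts::iterator it = _contexts.find(c);
    if (it == _contexts.end())
        throw std::invalid_argument("invalid context");
    // otherwise do stuff with the valid context variable
    InternalContext *internalContext = it->second;
    (void)internalContext;
}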
With this method, there is no risk of an invalid memory access as you will only correctly use valid context references.
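Since the question is about C, the same token idea can be sketched without std::map, assuming a small fixed-size table. Everything here (context_handle, MAX_CONTEXTS, context_init, context_set_value, context_release, and the value field) is hypothetical. Handles are never reissued, so a released handle is rejected even if its slot is reused.

```c
#include <stddef.h>

#define MAX_CONTEXTS 16                  /* arbitrary limit for this sketch */

typedef long context_handle;             /* opaque token given to the client */

typedef struct {
    int in_use;
    context_handle handle;
    int value;                           /* stand-in for real internal state */
} slot;

static slot slots[MAX_CONTEXTS];
static context_handle next_handle = 1;   /* 0 is reserved as "invalid" */

/* Looks up a live slot by handle; returns NULL if the handle is unknown. */
static slot *find_slot(context_handle h) {
    for (size_t i = 0; i < MAX_CONTEXTS; ++i)
        if (slots[i].in_use && slots[i].handle == h) return &slots[i];
    return NULL;
}

/* Returns a fresh handle, or 0 if the table is full. */
context_handle context_init(void) {
    for (size_t i = 0; i < MAX_CONTEXTS; ++i) {
        if (!slots[i].in_use) {
            slots[i].in_use = 1;
            slots[i].handle = next_handle++;
            slots[i].value = 0;
            return slots[i].handle;
        }
    }
    return 0;
}

/* Returns 0 on success, -1 for an unknown or released handle. */
int context_set_value(context_handle h, int value) {
    slot *s = find_slot(h);
    if (s == NULL) return -1;
    s->value = value;
    return 0;
}

/* Returns 0 on success, -1 for an unknown or released handle. */
int context_release(context_handle h) {
    slot *s = find_slot(h);
    if (s == NULL) return -1;
    s->in_use = 0;
    return 0;
}
```

The trade-off is a lookup on every call; a linear scan is fine for a handful of contexts, and a hash table could replace it if there are many.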
Look at the paper by Matt Bishop on Robust Programming. The use of tickets or tokens (similar to file handles in some respects, but also including a nonce, a number used once) allows your library code to ensure that the token it is given is valid. In effect, you allocate the data structure on behalf of the user, and pass back a ticket that must be provided on each call to the API you define.
I have some code based closely on that system. The header includes the comments:
/*
** Based on the tickets in qlib.c by Matt Bishop (bishop#ucdavis.edu) in
** Robust Programming. Google terms: Bishop Robust Nonce.
** http://nob.cs.ucdavis.edu/~bishop/secprog/robust.pdf
** http://nob.cs.ucdavis.edu/classes/ecs153-1998-04/robust.html
*/
I also built an arena-based memory allocation system using tickets to identify different arenas.
You could define a new API call that takes uninitialised memory and initialises it in whatever way you need. Then, part of the client API is that the client must call the context initialisation function, otherwise undefined behaviour will result.
To sidestep the issue of a memory location of a previous context being reused, you could, in addition to freeing the context, reset it and remove the "magic" number, assuming of course that the user frees the context using your API. That way when the system returns that same block of memory for the next context request, the magic number check will fail.
See what your system does with uninitialized memory. For Microsoft's behaviour, see: Uninitialized memory blocks in VC++

Resources