Design Pattern to free an object in C

When programming in C, we usually create data structures that we initialize and then free when they are no longer needed. For instance, if we want to create a dynamic array of doubles, it is common to declare:
#include <stdlib.h>
struct vector {
    double *data;
    int size;
    int capacity;
};
typedef struct vector vector;
vector *v_new(int n) {
    vector *v = malloc(sizeof(vector));
    v->data = malloc(n * sizeof(double));
    v->size = n;
    v->capacity = n;
    return v;
}
The question is about the common patterns for a free function. In C, the function free accepts a NULL pointer and does nothing in that case. Is it a common pattern to design v_free functions the same way, or do they usually expect a non-NULL pointer? To make it clear, would you expect this implementation
void v_free(vector *v) {
    if (v != NULL) {
        free(v->data);
    }
    free(v);
}
or this one?
void v_free(vector *v) {
    free(v->data);
    free(v);
}
I am asking because we have started teaching C to undergraduate students in a prep school in France, and we don't have much experience with "C design patterns".
Thanks for your advice.

You can't access v->data if v is NULL. So if v might be NULL, you must use the version that checks for it, which is better written as:
void v_free(vector *v) {
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}
If v should never be NULL here, it's perhaps better to add an assert to make the assumption explicit:
void v_free(vector *v) {
    assert(v != NULL);
    free(v->data);
    free(v);
}
That way the programmer will notice they are doing something wrong.
Note that neither version detects a dangling pointer, i.e. a pointer to an object that has already been destroyed. This includes a pointer to memory that has already been freed (you'd get a double free here) and a pointer to a local variable that is no longer in scope.
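One practical argument for the NULL-tolerant version is that it behaves like free(), which simplifies cleanup paths. The following is a minimal sketch, not code from either answer: v_new checks both allocations, and a hypothetical helper v_new_pair can release whatever it managed to allocate without testing each pointer first. It assumes n > 0.

```c
#include <assert.h>
#include <stdlib.h>

struct vector {
    double *data;
    int size;
    int capacity;
};
typedef struct vector vector;

/* Like free(), accepts NULL and does nothing in that case. */
void v_free(vector *v) {
    if (v != NULL) {
        free(v->data); /* free(NULL) is also a no-op */
        free(v);
    }
}

/* Returns NULL if either allocation fails. Assumes n > 0. */
vector *v_new(int n) {
    vector *v = malloc(sizeof *v);
    if (v == NULL) {
        return NULL;
    }
    v->data = malloc((size_t)n * sizeof *v->data);
    if (v->data == NULL) {
        v_free(v); /* safe: v->data is NULL here */
        return NULL;
    }
    v->size = n;
    v->capacity = n;
    return v;
}

/* Hypothetical helper: allocates two vectors, or neither. */
int v_new_pair(int n, vector **a, vector **b) {
    *a = v_new(n);
    *b = v_new(n);
    if (*a == NULL || *b == NULL) {
        v_free(*a); /* either pointer may be NULL; v_free copes */
        v_free(*b);
        *a = NULL;
        *b = NULL;
        return -1;
    }
    return 0;
}
```

With an asserting v_free, the failure branch of v_new_pair would instead need an explicit NULL test before each call.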

This is going to be a matter of opinion.
My opinion is that, unless you have a good reason otherwise, you should program defensively. Do whatever will make a mistake easier to debug. Make v_free raise an error on a null pointer; something as simple as an assert will do.
void v_free(vector *v) {
    assert(v != NULL);
    free(v->data);
    free(v);
}
Consider if we quietly ignore the null case. Did the caller intend to pass a null pointer, or was it a mistake? We don't know. If it was a mistake, the program continues merrily along and probably mysteriously crashes elsewhere. This makes debugging more difficult.
Consider if we assume v_free will always receive a non-null pointer. If it doesn't, v->data dereferences a null pointer, which is undefined behavior. At best a messy error; at worst the program continues merrily along and probably mysteriously crashes elsewhere. This makes debugging more difficult.
But if we raise an error, the mistake is stopped and revealed. Do the same thing for all your vector functions.
"But what if I want to pass a null pointer?" That should be infrequent; don't optimize for it. Make the caller do the check. If they really need to do it frequently, they can write a little wrapper function.
void v_free_null(vector *v) {
    if (v == NULL) {
        return;
    }
    v_free(v);
}

Related

BOOLEAN allocate_items(struct item * items, size_t howmany): function to allocate an array of struct item

I'm learning C, and I recently found this question on the internet:
What is the problem with this function in terms of memory allocation? What is a good solution? You may assume that a struct item type has been declared. The purpose of this function is to allocate an array of struct item, which you may assume has been declared prior to this function.
BOOLEAN allocate_items(struct item * items, size_t howmany)
{
    size_t count;
    items = malloc(sizeof(struct item) * howmany);
    if(!items) {
        perror("failed to allocate memory");
        return FALSE;
    }
    return TRUE;
}
So I think that the 4th line is wrong. It should be:
items = malloc(sizeof(struct item));
Also, the 6th line is wrong. It should be:
if(items == NULL){
Am I right?
First of all, lines 4 and 6, which you mentioned, are fine as written.
That said, the basic problem with this function is that you are assigning the allocated memory to a variable local to the function (the parameter items). Since you don't return the pointer to the allocated memory, there is no way to access it after the function returns, and because it can never be freed, you have a memory leak.
If a function has to allocate memory for a caller's pointer, you need to pass the address of that pointer to the function and store the result through it. Alternatively, you can return the pointer, but then you need to change the function signature.
Finally, arrays are not pointers and vice versa. They may sometimes look or behave similarly, but they are not the same.
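For comparison, here is a minimal sketch of the "return the pointer" alternative mentioned above. It changes the signature, so it is no longer the quiz's BOOLEAN interface, and the placeholder definition of struct item is an assumption, since the question only says the type has been declared. It also keeps the original if (!items) test, which in C means exactly the same as if (items == NULL).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder definition: the question only says struct item exists. */
struct item {
    int id;
};

/* Returns a pointer to an array of howmany items, or NULL on failure.
   The caller owns the memory and must free() it. */
struct item *allocate_items(size_t howmany)
{
    struct item *items = malloc(howmany * sizeof *items);
    if (!items) { /* same test as items == NULL */
        perror("failed to allocate memory");
    }
    return items;
}
```

The caller writes struct item *items = allocate_items(20); and checks the result for NULL before using it.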
The 4th line is not wrong, as they are trying to allocate an array of the structs.
You should add a line inside the function that declares a new pointer, temp, to hold the current value of items; then, after allocating the memory,
the 6th line should be
if(items == temp)
to check whether the value has changed (because that is the closest we can get to checking whether malloc worked).
This is because the ! operator is used to check whether a condition is true or not (at least at a basic level in most languages), and as a pointer isn't a condition or an int that can be used as true or false, the operator won't work.
Here is a fixed version, as it would probably be written in "industry".
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
bool allocate_items(struct item ** pitems, size_t howmany)
{
    // argument validation
    assert(NULL != pitems); // some also add release version checks...
    if(NULL == pitems ) return false;
    // We can also spot memory leak sources here.
    // If *pitems != NULL - does that mean we have to free first to prevent
    // a leak? What if it is just some random value and not something we can
    // free? So the contract usually is: *pitems has to be NULL...
    assert(NULL == *pitems);
    if(NULL != *pitems) return false;
    // implementation
    *pitems = malloc(sizeof(struct item) * howmany);
    if(NULL == *pitems) {
        perror("failed to allocate memory");
    }
    return NULL != *pitems;
}
While the bool defined in stdbool.h sometimes causes trouble with C++ interop (same symbols on both sides, but sometimes sizeof(bool) differs), it is still the better option compared to inventing yet another bool type.
pitems is a pointer to the location where the pointer to the new chunk of memory will be written. A caller of this function might write:
int main(int argc, char *argv[]) {
    struct item *myBunchOfStuff = NULL;
    if(false != allocate_items( &myBunchOfStuff, 20) ) {
        // ...
        free(myBunchOfStuff);
        myBunchOfStuff = NULL;
    }
    return 0;
}
Defensive programming says that your function cannot claim "Heh - my function only crashed because I was given a bad value!" Instead, it has to verify its input itself; it is responsible for not crashing. The pointer could still be non-NULL but otherwise invalid, which the function usually cannot detect.
In C, everyone is proud of not requiring the cast of malloc()'s return value. You can be proud of that until you compile your code with a C++ compiler. Then you have to change your code and fix that. Well, I guess it is a matter of preference...
While parameter checking is often seen as a separate part of the function's implementation, after that you should try to stick to a "single point of exit". The main reason for that is maintainability: with multiple exit points, as the function grows later on, it gets harder to spot an early exit that forgets to free some memory or clean up other state.
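One common C idiom for keeping a single point of exit after the parameter checks is a cleanup label reached with goto. The following is a minimal sketch of that idiom, not code from the answer above. The pair_buf type and pair_buf_init function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical type owning two arrays. */
struct pair_buf {
    int *a;
    int *b;
};

/* Allocates both arrays or neither. After the parameter check, every
   path goes through the single cleanup label, so nothing can leak. */
bool pair_buf_init(struct pair_buf *p, size_t n)
{
    bool ok = false;
    int *a = NULL;
    int *b = NULL;

    if (p == NULL) {
        return false; /* parameter check: early exit is fine here */
    }

    a = malloc(n * sizeof *a);
    if (a == NULL) {
        goto cleanup;
    }
    b = malloc(n * sizeof *b);
    if (b == NULL) {
        goto cleanup;
    }

    p->a = a;
    p->b = b;
    a = NULL; /* ownership moved to *p, so cleanup must not free them */
    b = NULL;
    ok = true;

cleanup:
    free(a); /* free(NULL) is a no-op */
    free(b);
    return ok;
}
```

Adding a third allocation later only means adding one more free() under the label, rather than auditing every early return.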

What's the best C implementation of the C++ vector?

I've been looking into using C over C++, as I find it cleaner, and the main thing I find it lacks is a vector-like array.
What is the best implementation of this?
I want to just be able to call something like vector_create, vector_at, vector_add, etc.
EDIT
This answer is from a million years ago, but at some point, I actually implemented a macro-based, efficient, type-safe vector work-alike in C that covers all the typical features and needs. You can find it here:
https://github.com/eteran/c-vector
Original answer below.
What about a vector are you looking to replicate? I mean in the end, it all boils down to something like this:
int *create_vector(size_t n) {
    return malloc(n * sizeof(int));
}
void delete_vector(int *v) {
    free(v);
}
int *resize_vector(int *v, size_t n) {
    /* returns NULL on failure; v is then still valid, so don't
       assign the result straight back to v without checking */
    return realloc(v, n * sizeof(int));
}
You could wrap this all up in a struct, so it "knows its size" too, but you'd have to do it for every type (macros here?), and that seems a little unnecessary... Perhaps something like this:
typedef struct {
    size_t size;
    int *data;
} int_vector;
int_vector *create_vector(size_t n) {
    int_vector *p = malloc(sizeof(int_vector));
    if(p) {
        p->data = malloc(n * sizeof(int));
        if(!p->data && n > 0) { /* don't return a half-built vector */
            free(p);
            return NULL;
        }
        p->size = n;
    }
    return p;
}
void delete_vector(int_vector *v) {
    if(v) {
        free(v->data);
        free(v);
    }
}
size_t resize_vector(int_vector *v, size_t n) {
    if(v) {
        int *p = realloc(v->data, n * sizeof(int));
        if(p) {
            v->data = p;
            v->size = n;
        }
        return v->size;
    }
    return 0;
}
int get_vector(int_vector *v, size_t n) {
    if(v && n < v->size) {
        return v->data[n];
    }
    /* return some error value, I'm doing -1 here,
     * std::vector would throw an exception if using at()
     * or have UB if using [] */
    return -1;
}
void set_vector(int_vector *v, size_t n, int x) {
    if(v) {
        if(n >= v->size) {
            /* index n needs n + 1 elements; give up if growing fails */
            if(resize_vector(v, n + 1) <= n) {
                return;
            }
        }
        v->data[n] = x;
    }
}
After which, you could do:
int_vector *v = create_vector(10);
set_vector(v, 0, 123);
I dunno, it just doesn't seem worth the effort.
The most complete effort I know of to create a comprehensive set of utility types in C is GLib. For your specific needs it provides g_array_new, g_array_append_val and so on. See GLib Array Documentation.
Rather than going off on a tangent in the comments to Evan Teran's answer, I figured I'd submit a longer reply here.
As various comments allude to there's really not much point in trying to replicate the exact behavior of std::vector since C lacks templates and RAII.
What can however be useful is a dynamic array implementation that just works with bytes. This can obviously be used directly for char* strings, but can also easily be adapted for usage with any other types as long as you're careful to multiply the size parameter by sizeof(the_type).
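Here is a minimal sketch of such a byte-oriented dynamic array, assuming a hypothetical bytearray type with a doubling growth policy (neither comes from the answer). To store ints, the caller passes sizeof(int) as the byte count, exactly as described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte-oriented dynamic array: it only knows about bytes. */
typedef struct {
    unsigned char *data;
    size_t len; /* bytes in use */
    size_t cap; /* bytes allocated */
} bytearray;

void ba_init(bytearray *ba) {
    ba->data = NULL;
    ba->len = 0;
    ba->cap = 0;
}

/* Appends nbytes bytes from src; returns 0 on success, -1 on failure. */
int ba_append(bytearray *ba, const void *src, size_t nbytes) {
    if (nbytes == 0) {
        return 0;
    }
    if (nbytes > SIZE_MAX - ba->len) {
        return -1; /* total length would overflow */
    }
    if (ba->len + nbytes > ba->cap) {
        size_t needed = ba->len + nbytes;
        size_t newcap = ba->cap ? ba->cap : 16;
        while (newcap < needed) {
            if (newcap > SIZE_MAX / 2) { /* doubling would overflow */
                newcap = needed;
                break;
            }
            newcap *= 2;
        }
        void *p = realloc(ba->data, newcap);
        if (p == NULL) {
            return -1; /* ba->data is still valid */
        }
        ba->data = p;
        ba->cap = newcap;
    }
    memcpy(ba->data + ba->len, src, nbytes);
    ba->len += nbytes;
    return 0;
}

void ba_free(bytearray *ba) {
    free(ba->data);
    ba_init(ba);
}
```

Reading an element back means multiplying the index by sizeof(the_type) and copying it out with memcpy(), which avoids alignment problems.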
Apache Portable Runtime has a decent set of array functions and is all C.
See the tutorial for a quick intro.
If you can multiply, there's really no need for a vector_create() function when you have malloc() or even calloc(). You just have to keep track of two values, the pointer and the allocated size, and pass two values instead of one to whatever function you pass the "vector" to (if the function actually needs both the pointer and the size, that is). malloc() guarantees that the memory chunk is suitably aligned for any type, so assign its void * return value to e.g. a struct car * and index it with []. Most processors access array[index] almost as fast as a plain variable, while a vector_at() function can be many times slower. If you store the pointer and size together in a struct, only do it in non-time-critical code, or you'll have to index with vector.ptr[index]. Release the space with free().
Focus on writing a good wrapper around realloc() instead, one that grows the capacity geometrically (e.g. by a factor of 2 or 1.5) so that it only reallocates occasionally. See user786653's Wikipedia link.
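Here is a minimal sketch of such a realloc() wrapper that keeps a plain pointer and a capacity side by side, as suggested above. The name grow_array, the 1.5 growth factor, the minimum capacity of 8, and the placeholder struct car are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder element type. */
struct car {
    int year;
};

/* Makes room for at least `needed` (> 0) elements of elem_size bytes.
   Grows the capacity by about 1.5x so that realloc() runs rarely.
   Returns the (possibly moved) block, or NULL on failure, in which case
   ptr is still valid and *cap is unchanged. */
void *grow_array(void *ptr, size_t *cap, size_t elem_size, size_t needed)
{
    size_t newcap;
    void *p;

    if (needed <= *cap) {
        return ptr; /* already big enough */
    }
    newcap = *cap < 8 ? 8 : *cap;
    while (newcap < needed) {
        if (newcap > SIZE_MAX / 3 * 2) { /* newcap * 1.5 would overflow */
            newcap = needed;
            break;
        }
        newcap += newcap / 2;
    }
    if (elem_size != 0 && newcap > SIZE_MAX / elem_size) {
        return NULL; /* byte count would overflow */
    }
    p = realloc(ptr, newcap * elem_size);
    if (p == NULL) {
        return NULL;
    }
    *cap = newcap;
    return p;
}
```

Returning the new pointer, rather than writing through a void **, avoids converting a struct car ** to void **, which C does not guarantee to work.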
Of course, calloc(), malloc() and realloc() can fail if you run out of memory, and that's another possible reason for wanting a vector type. C++ has exceptions that automatically terminate the program if you don't catch them; C doesn't. But that's another discussion.
Lack of template functionality in C makes it impossible to support a vector like structure. The best you can do is to define a 'generic' structure with some help of the preprocessor, and then 'instantiate' for each of the types you want to support.
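Here is a minimal sketch of that preprocessor approach: a hypothetical DEFINE_VECTOR macro that "instantiates" a vector type and its functions once per element type. The macro and the generated names are assumptions, not an existing library.

```c
#include <assert.h>
#include <stdlib.h>

/* Generates a vector type NAME holding elements of type T, plus
   NAME_push, NAME_at and NAME_free. Each use is one "instantiation". */
#define DEFINE_VECTOR(T, NAME)                                        \
    typedef struct {                                                  \
        T *data;                                                      \
        size_t size;                                                  \
        size_t capacity;                                              \
    } NAME;                                                           \
                                                                      \
    /* Appends x; returns 0 on success, -1 if allocation fails. */    \
    static int NAME##_push(NAME *v, T x) {                            \
        if (v->size == v->capacity) {                                 \
            size_t cap = v->capacity ? 2 * v->capacity : 8;           \
            T *p = realloc(v->data, cap * sizeof *p);                 \
            if (p == NULL) return -1;                                 \
            v->data = p;                                              \
            v->capacity = cap;                                        \
        }                                                             \
        v->data[v->size++] = x;                                       \
        return 0;                                                     \
    }                                                                 \
                                                                      \
    /* Bounds-checked (in debug builds) element access. */            \
    static T NAME##_at(const NAME *v, size_t i) {                     \
        assert(i < v->size);                                          \
        return v->data[i];                                            \
    }                                                                 \
                                                                      \
    static void NAME##_free(NAME *v) {                                \
        free(v->data);                                                \
        v->data = NULL;                                               \
        v->size = 0;                                                  \
        v->capacity = 0;                                              \
    }

DEFINE_VECTOR(int, int_vec)
DEFINE_VECTOR(double, double_vec)
```

Each instantiation produces ordinary, type-checked C, so int_vec_push(&v, 1.5) gets the usual implicit conversion to int rather than being rejected, which is one of the ways this falls short of templates.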

How to realloc an array inside a function with no lost data? (in C)

I have a dynamic array of structures, so I thought I could store the information about the array in the first structure.
So one attribute represents the amount of memory allocated for the array, and another represents the number of structures actually stored in the array.
The trouble is that when I put this inside a function that fills the array with these structures and tries to allocate more memory if needed, the original array somehow gets corrupted.
Can someone explain why this happens and how to get past it?
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#define INIT 3
typedef struct point{
    int x;
    int y;
    int c;
    int d;
}Point;
Point empty(){
    Point p;
    p.x=1;
    p.y=10;
    p.c=100;
    p.d=1000; //if you put different values it will act differently - weird
    return p;
}
void printArray(Point * r){
    int i;
    int total = r[0].y+1;
    for(i=0;i<total;i++){
        printf("%2d | P [%2d,%2d][%4d,%4d]\n",i,r[i].x,r[i].y,r[i].c,r[i].d);
    }
}
void reallocFunction(Point * r){
    r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
    r[0].x*=2;
}
void enter(Point* r,int c){
    int i;
    for(i=1;i<c;i++){
        r[r[0].y+1]=empty();
        r[0].y++;
        if( (r[0].y+2) >= r[0].x ){ /*when the number of Points is near
                                     *the end of the allocated memory,
                                     reallocate the array*/
            reallocFunction(r);
        }
    }
}
int main(int argc, char** argv) {
    Point * r=(Point *) malloc ( sizeof ( Point ) * INIT );
    r[0]=empty();
    r[0].x=INIT; /*r[0].x stores how many Points there is memory for;
                   r[0].y stores how many Points there are.*/
    enter(r,5);
    printArray(r);
    return (0);
}
Your code does not look clean to me for other reasons, but...
void reallocFunction(Point * r){
    r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
    r[0].x*=2;
    r[0].y++;
}
The problem here is that r in this function is the parameter, hence any modifications to it are lost when the function returns. You need some way to change the caller's version of r. I suggest:
Point * // Note new return type...
reallocFunction(Point * r){
    r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
    r[0].x*=2;
    r[0].y++;
    return r; // Note: now we return r back to the caller..
}
Then later:
r = reallocFunction(r);
Now... Another thing to consider is that realloc can fail. A common pattern for realloc that accounts for this is:
Point *reallocFunction(Point * r){
    void *new_buffer = realloc(r, r[0].x*2*sizeof(Point));
    if (!new_buffer)
    {
        // realloc failed, pass the error up to the caller..
        return NULL;
    }
    r = new_buffer;
    r[0].x*=2;
    r[0].y++;
    return r;
}
This ensures that you don't leak r when the memory allocation fails, and the caller then has to decide what happens when your function returns NULL...
But, some other things I'd point out about this code (I don't mean to sound like I'm nitpicking about things and trying to tear them apart; this is meant as constructive design feedback):
The names of variables and members don't make it very clear what you're doing.
You've got a lot of magic constants. There's no explanation for what they mean or why they exist.
reallocFunction doesn't seem to really make sense. Perhaps the name and interface can be clearer. When do you need to realloc? Why do you double the X member? Why do you increment Y? Can the caller make these decisions instead? I would make that clearer.
Similarly it's not clear what enter() is supposed to be doing. Maybe the names could be clearer.
It's a good thing to do your allocations and manipulation of member variables in a consistent place, so it's easy to spot (and later, potentially change) how you're supposed to create, destroy and manipulate one of these objects. Here it seems in particular like main() has a lot of knowledge of your structure's internals. That seems bad.
Using the multiplication operator in the arguments to realloc the way you do is sometimes a red flag... It's a corner case, but the multiplication can overflow, and you can end up shrinking the buffer instead of growing it. This would likely make you crash, and in production code it would be important to avoid this for security reasons.
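Here is a minimal sketch of guarding that multiplication, using a hypothetical helper named realloc_array (some C libraries, such as OpenBSD's, provide a similar reallocarray() function). It refuses the request instead of letting count * size wrap around to a small number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Like realloc(ptr, count * size), but returns NULL instead of silently
   allocating a wrapped-around, too-small block when count * size would
   overflow. On failure, ptr is untouched and must still be freed. */
void *realloc_array(void *ptr, size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL; /* count * size does not fit in size_t */
    }
    return realloc(ptr, count * size);
}
```

In the question's code this would read void *new_buffer = realloc_array(r, r[0].x * 2, sizeof(Point)); although the int arithmetic in r[0].x * 2 can itself overflow and would need its own check.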
You also do not seem to initialize r[0].y. As far as I understand, you should have r[0].y=0 somewhere.
Anyway, using the first element of the array for something different is definitely a bad idea. It makes your code horribly complex to understand. Just create a new structure holding the array size, the capacity, and the pointer.
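Here is a minimal sketch of that suggestion using the question's Point type. The bookkeeping lives in a separate PointArray struct (a hypothetical name), and pa_push takes a pointer to that struct, so when realloc() moves the buffer, the new address is stored where the caller can see it. The initial capacity of 3 mirrors the question's INIT.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct point {
    int x;
    int y;
    int c;
    int d;
} Point;

/* Bookkeeping lives here instead of in element 0. */
typedef struct {
    Point *items;
    size_t count;    /* Points stored */
    size_t capacity; /* Points allocated */
} PointArray;

void pa_init(PointArray *pa) {
    pa->items = NULL;
    pa->count = 0;
    pa->capacity = 0;
}

/* Appends p; returns 0 on success, -1 if realloc fails (array unchanged). */
int pa_push(PointArray *pa, Point p) {
    if (pa->count == pa->capacity) {
        size_t newcap = pa->capacity ? pa->capacity * 2 : 3;
        Point *tmp = realloc(pa->items, newcap * sizeof *tmp);
        if (tmp == NULL) {
            return -1;
        }
        pa->items = tmp; /* stored through pa, so the caller sees it */
        pa->capacity = newcap;
    }
    pa->items[pa->count++] = p;
    return 0;
}

void pa_free(PointArray *pa) {
    free(pa->items);
    pa_init(pa);
}
```

Every Point slot now holds real data, so printing or iterating no longer has to skip or special-case element 0.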

How to return an integer from a function

Which is considered better style?
int set_int (int *source) {
    *source = 5;
    return 0;
}
int main(){
    int x;
    set_int (&x);
}
OR
int *set_int (void) {
    int *temp = NULL;
    temp = malloc(sizeof (int));
    *temp = 5;
    return temp;
}
int main (void) {
    int *x = set_int ();
}
Coming from a higher-level programming background, I have to say I like the second version more. Any tips would be very helpful. I'm still learning C.
Neither.
// "best" style for a function which sets an integer taken by pointer
void set_int(int *p) { *p = 5; }
int i;
set_int(&i);
Or:
// then again, minimise indirection
int an_interesting_int() { return 5; /* well, in real life more work */ }
int i = an_interesting_int();
Just because higher-level programming languages do a lot of allocation under the covers, does not mean that your C code will become easier to write/read/debug if you keep adding more unnecessary allocation :-)
If you do actually need an int allocated with malloc, and to use a pointer to that int, then I'd go with the first one (but bugfixed):
void set_int(int *p) { *p = 5; }
int *x = malloc(sizeof(*x));
if (x == 0) { /* do something about the error */ }
set_int(x);
Note that the function set_int is the same either way. It doesn't care where the integer it's setting came from, whether it's on the stack or the heap, who owns it, whether it has existed for a long time or whether it's brand new. So it's flexible. If you then want to also write a function which does two things (allocates something and sets the value) then of course you can, using set_int as a building block, perhaps like this:
int *allocate_and_set_int() {
    int *x = malloc(sizeof(*x));
    if (x != 0) set_int(x);
    return x;
}
In the context of a real app, you can probably think of a better name than allocate_and_set_int...
Some errors:
int main(){
    int x*; //should be int* x; or int *x;
    set_int(x);
}
Also, you are not allocating any memory in the first code example.
int *x = malloc(sizeof(int));
About the style:
I prefer the first one, because there is less chance of forgetting to free the memory held by the pointer.
The first one is incorrect (apart from the syntax error) - you're passing an uninitialised pointer to set_int(). The correct call would be:
int main()
{
    int x;
    set_int(&x);
}
If they're just ints, and it can't fail, then the usual answer would be "neither" - you would usually write that like:
int get_int(void)
{
    return 5;
}
int main()
{
    int x;
    x = get_int();
}
If, however, it's a more complicated aggregate type, then the second version is quite common:
struct somestruct *new_somestruct(int p1, const char *p2)
{
    struct somestruct *s = malloc(sizeof *s);
    if (s)
    {
        s->x = 0;
        s->j = p1;
        s->abc = p2;
    }
    return s;
}
int main()
{
    struct somestruct *foo = new_somestruct(10, "Phil Collins");
    free(foo);
    return 0;
}
This allows struct somestruct * to be an "opaque pointer", where the complete definition of type struct somestruct isn't known to the calling code. The standard library uses this convention - for example, FILE *.
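Here is a minimal sketch of the opaque-pointer convention, using a hypothetical counter type. In a real project the declarations would live in a header (say counter.h) and the struct definition only in counter.c, so callers can hold a struct counter * but never touch its fields. Here both parts are in one file so it compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* --- would be counter.h: callers only see an incomplete type --- */
struct counter;
struct counter *counter_new(int start);
void counter_increment(struct counter *c);
int counter_value(const struct counter *c);
void counter_free(struct counter *c);

/* --- would be counter.c: the only place that knows the layout --- */
struct counter {
    int value;
};

struct counter *counter_new(int start) {
    struct counter *c = malloc(sizeof *c);
    if (c) {
        c->value = start;
    }
    return c;
}

void counter_increment(struct counter *c) { c->value++; }

int counter_value(const struct counter *c) { return c->value; }

/* Accepts NULL, like free(). */
void counter_free(struct counter *c) { free(c); }
```

Because callers never see the layout, counter.c can add or rearrange fields without breaking them, which is the same reason FILE * works the way it does.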
Definitely go with the first version. Notice that it allows you to avoid a dynamic memory allocation, which is slow and can be a source of bugs if you later forget to free that memory.
Also, if you decide for some reason to use the second style, notice that you don't need to initialize the pointer to NULL. That value will be overwritten by whatever malloc() returns anyway. And if you're out of memory, malloc() will return NULL by itself, without your help :-).
So int *temp = malloc(sizeof(int)); is sufficient.
Memory management rules usually state that the code that allocates a memory block should also deallocate it. This is impossible when you return allocated memory. Therefore, the first should be better.
For a more complex type like a struct, you'll usually end up with a function to initialize it and maybe a function to dispose of it. Allocation and deallocation should be done separately, by you.
C gives you the freedom to allocate memory dynamically or statically, and having a function work only with one of the two modes (which would be the case if you had a function that returned dynamically allocated memory) limits you.
typedef struct
{
    int x;
    float y;
} foo;
void foo_init(foo* object, int x, float y)
{
    object->x = x;
    object->y = y;
}
int main()
{
    foo myFoo;                    /* could equally be static or malloc'ed */
    foo_init(&myFoo, 1, 3.1416f); /* pass the variable, not the type name */
    return 0;
}
In the second one you would need a pointer to a pointer for it to work, and in the first you are not using the return value, though you should.
I tend to prefer the first one, in C, but that depends on what you are actually doing, as I doubt you are doing something this simple.
Keep your code as simple as it needs to be to get the job done; the KISS principle is still valid.
It is best not to return a piece of allocated memory from a function: if somebody does not know how it works, they might not deallocate the memory.
The memory deallocation should be the responsibility of the code allocating the memory.
The first is preferred (assuming the simple syntax bugs are fixed) because it is how you simulate an Out Parameter. However, it's only usable where the caller can arrange for all the space to be allocated to write the value into before the call; when the caller lacks that information, you've got to return a pointer to memory (maybe malloced, maybe from a pool, etc.)
What you are asking more generally is how to return values from a function. It's a great question because it's so hard to get right. What you can learn are some rules of thumb that will stop you making horrid code. Then, read good code until you internalize the different patterns.
Here is my advice:
In general, any function that returns a new value should do so via its return statement. This applies to structures, obviously, but also to arrays, strings, and integers. Since integers are simple types (they fit into one machine word), you can pass them around directly, not with pointers.
Never pass pointers to integers; it's an anti-pattern. Always pass integers by value.
Learn to group functions by type so that you don't have to learn (or explain) every case separately. A good model is a simple OO one: a _new function that creates an opaque struct and returns a pointer to it; a set of functions that take the pointer to that struct and do stuff with it (set properties, do work); a set of functions that return properties of that struct; a destructor that takes a pointer to the struct and frees it. Hey presto, C becomes much nicer like this.
When you do modify arguments (only structs or arrays), stick to conventions: e.g. the standard C library always copies from right to left (the destination comes first, as in memcpy(dest, src, n)); the OO model I explained would always put the structure pointer first.
Avoid modifying more than one argument in one function. Otherwise you get complex interfaces that you can't remember and eventually get wrong.
Return 0 for success, -1 for errors, when the function does something which might go wrong. In some cases you may have to return -1 for errors, 0 or greater for success.
The standard POSIX APIs are a good template but don't use any kind of class pattern.
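Here is a short sketch of the two return conventions from the advice above, with hypothetical function names: find_index returns an index (0 or greater) on success and -1 for "not found", while stack_push takes the struct pointer first and returns 0 for success and -1 for an error.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of key in a[0..n-1] (0 or greater), or -1 if absent. */
long find_index(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] == key) {
            return (long)i;
        }
    }
    return -1;
}

typedef struct {
    int items[4];
    size_t count;
} small_stack;

/* Struct pointer first; returns 0 on success, -1 if the stack is full. */
int stack_push(small_stack *s, int value)
{
    if (s->count == sizeof s->items / sizeof s->items[0]) {
        return -1;
    }
    s->items[s->count++] = value;
    return 0;
}
```

Both functions take and return plain integers by value, in line with the advice to avoid pointers to integers.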

Dangling pointers and double free

After some painful experiences, I understand the problem of dangling pointers and double free. I am seeking proper solutions.
aStruct has a number of fields including other arrays.
aStruct *A = NULL, *B = NULL;
A = (aStruct*) calloc(1, sizeof(aStruct));
B = A;
free_aStruct(A);
...
// Bunch of other code in various places.
...
free_aStruct(B);
Is there any way to write free_aStruct(X) so that free_aStruct(B) exits gracefully?
void free_aStruct(aStruct *X) {
    if (X ! = NULL) {
        if (X->a != NULL) { free(X->a); x->a = NULL; }
        free(X); X = NULL;
    }
}
Doing the above only sets A = NULL when free_aStruct(A); is called. B is now dangling.
How can this situation be avoided / remedied? Is reference counting the only viable solution? Or, are there other "defensive" approaches to freeing memory, to prevent free_aStruct(B); from exploding?
In plain C, the most important solution to this problem is discipline, because the root of the problem is here:
B = A;
This makes a copy of the pointer without changing anything within your struct, circumventing whatever tracking you use, and without any warning from the compiler. You have to use something like this:
B = getref_aStruct(A);
The next important thing is to keep track of the allocations. Some things that help are clean modularization, information hiding and DRY (Don't Repeat Yourself). You call calloc() directly to allocate the memory, while you use a free_aStruct() function to free it. It is better to use a create_aStruct() to allocate it. This keeps things centralized in one place, instead of scattering memory allocations all over your codebase.
This is a much better base for whatever memory tracking system you build on top of this.
I do not think you can do this automatically, as C places the onus of managing memory on you; it is therefore your responsibility to ensure that references, and of course dangling pointers, are looked after!
void free_aStruct(aStruct *X){
    if (X ! = NULL){
        if (X->a != NULL){free(X->a); x->a = NULL;}
        free(X); X = NULL;
    }
}
By the way, there's a typo in the code above: lowercase 'x' instead of 'X' in the inner if (and '! =' should be '!=').
My thinking when I was looking at the above code is that you are calling free on a copy of a pointer variable of type aStruct *, so X = NULL only clears that local copy, not A or B. I would change it to pass the pointer by reference instead...
void free_aStruct(aStruct **X){
    if (*X != NULL){
        if ((*X)->a != NULL){ /* -> binds tighter than *, so parenthesize */
            free((*X)->a);
            (*X)->a = NULL;
        }
        free(*X);
        *X = NULL;
    }
}
And call it like this:
free_aStruct(&A);
Other than that, you are ultimately responsible for the 'dangling pointers' yourself, whether it's an unintentional coding error or a design fault...
Even if you could prevent the free_aStruct(B) from blowing up, if there's any reference to B in the code behind your comment, that's going to be using memory that's been freed, and so might be overwritten with new data at any point. Just "fixing" the free call will only mask the underlying error.
There are techniques you can use, but the bottom line is that nothing you do can be strictly enforced in C. Instead, I recommend incorporating valgrind (or Purify) into your development process. Also, some static code analyzers may be able to detect some of these problems.
Reference counting's really not that hard:
/* aStruct needs an int member named refs */
aStruct *astruct_getref(aStruct *m)
{
    m->refs++;
    return m;
}
aStruct *astruct_new(void)
{
    aStruct *new = calloc(1, sizeof *new); /* refs starts at 0 */
    if (new == NULL)
        return NULL;
    return astruct_getref(new);
}
void astruct_free(aStruct *m)
{
    if (--m->refs == 0)
        free(m);
}
(In a multithreaded environment you will also potentially need to add locking).
Then your code would be:
aStruct *A = NULL, *B = NULL;
A = astruct_new();
B = astruct_getref(A);
astruct_free(A);
...
//bunch of other code in various places.
...
astruct_free(B);
You've asked about locking. Unfortunately there's no one-size-fits-all answer when it comes to locking - it all depends on what access patterns you have in your application. There's no substitute for careful design and deep thoughts. (For example, if you can guarantee that no thread will be calling astruct_getref() or astruct_free() on another thread's aStruct, then the reference count doesn't need to be protected at all - the simple implementation above will suffice).
That said, the above primitives can easily be extended to support concurrent access to the astruct_getref() and astruct_free() functions:
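/* mutex_init/mutex_lock/mutex_unlock stand in for your threading
   library's primitives, e.g. the pthread_mutex_* functions */
aStruct *astruct_getref(aStruct *m)
{
    mutex_lock(m->reflock);
    m->refs++;
    mutex_unlock(m->reflock);
    return m;
}
aStruct *astruct_new(void)
{
    aStruct *new = calloc(1, sizeof *new);
    if (new == NULL)
        return NULL;
    mutex_init(new->reflock);
    return astruct_getref(new);
}
void astruct_free(aStruct *m)
{
    int refs;
    mutex_lock(m->reflock);
    refs = --m->refs;
    mutex_unlock(m->reflock);
    if (refs == 0)
        free(m); /* destroy m->reflock first if your mutex API requires it */
}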
...but note that any variables containing the pointers to the structs that are subject to concurrent access will need their own locking too (for example, if you have a global aStruct *foo that is concurrently accessed, it will need an accompanying foo_lock).
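As an alternative to a per-object mutex (not what the answer above uses), C11's <stdatomic.h> can make the reference count itself thread-safe. This is a minimal sketch assuming a simplified aStruct with a single owned array member a. The caveat above still applies: only the count is protected, not the pointer variables that hold the structs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int refs; /* number of live references */
    int *a;          /* example owned array */
} aStruct;

aStruct *astruct_getref(aStruct *m)
{
    atomic_fetch_add(&m->refs, 1);
    return m;
}

aStruct *astruct_new(void)
{
    aStruct *m = calloc(1, sizeof *m);
    if (m == NULL) {
        return NULL;
    }
    atomic_init(&m->refs, 1); /* the creator holds the first reference */
    return m;
}

void astruct_free(aStruct *m)
{
    if (m == NULL) {
        return;
    }
    /* atomic_fetch_sub returns the old value: 1 means this was the last */
    if (atomic_fetch_sub(&m->refs, 1) == 1) {
        free(m->a);
        free(m);
    }
}
```

Because the decrement and the "was I last?" test happen in one atomic operation, two threads releasing their references at the same time cannot both free the struct.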

Resources