Should custom deallocation functions consider compatibility with generic containers? - c

In C, if you want to have generic containers, one popular approach is to use void*. If a generic container holds some custom struct that has its own deallocation function, the container is likely going to ask for that function:
typedef struct Foo {...} Foo;
Foo *Foo_Allocate(...);
void Foo_Deallocate(const Foo*);
int main(void)
{
/* Let's assume that when you create the list you have to
specify the deallocator of the type you want to hold */
List *list = List_Allocate(Foo_Deallocate);
/* Here we allocate a new Foo and push it into the list.
The list now has possession of the pointer. */
List_PushBack(list, Foo_Allocate());
/* When we deallocate the list, it will also deallocate all the
items we inserted, using the deallocator specified at the beginning */
List_Deallocate(list);
}
But most likely the type of the deallocator function will be something that takes a void*:
typedef void (*List_FnItemDeallocator)(const void*);
The problem is that Foo_Deallocate takes a const Foo*, not a const void*. Is it still safe to pass the function, even though their signatures do not match? Probably not, since pointer types are not necessarily the same size in C.
If that's not possible, would it be a good idea to have all deallocator functions take a const void* instead of a pointer to the type they are related to, so that they would be compatible with generic containers?

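As you suspected, this is not valid: converting a function pointer to a different function pointer type is allowed, but calling the function through a pointer of an incompatible type is undefined behavior.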
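A common way to stay within the rules while keeping the typed Foo_Deallocate is a small adapter whose type matches the typedef exactly and which converts the pointer before forwarding. This is a sketch assuming the container calls a List_FnItemDeallocator; Foo_DeallocateAdapter, the value field and the release counter are hypothetical, added only for illustration.

```c
#include <stdlib.h>

typedef void (*List_FnItemDeallocator)(const void *);

typedef struct Foo {
    int value; /* hypothetical payload */
} Foo;

/* Counts released Foo objects (demonstration only). */
static int foo_release_count = 0;

Foo *Foo_Allocate(int value)
{
    Foo *foo = malloc(sizeof *foo);
    if (foo != NULL)
        foo->value = value;
    return foo;
}

void Foo_Deallocate(const Foo *foo)
{
    /* free() takes a non-const void *, so the const has to be cast away. */
    free((void *)foo);
    foo_release_count++;
}

/* Adapter with exactly the type the container expects, so calling it through
   a List_FnItemDeallocator is well-defined. The const void * argument is
   implicitly converted back to const Foo * before forwarding. */
static void Foo_DeallocateAdapter(const void *item)
{
    Foo_Deallocate(item);
}
```

With this, List_Allocate(Foo_DeallocateAdapter) would be well-defined, while Foo_Deallocate keeps its type-safe signature for direct callers.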
You should take a void* as a parameter, and perform some check inside each function to see if the given pointer matches the expected type (like checking for a magic number at the beginning of the struct).
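A minimal sketch of that check, assuming each tagged type reserves its first member for a tag; FOO_MAGIC, its value and Foo_Check are illustrative assumptions. The check is only a heuristic for catching mistakes: it cannot prove that an arbitrary pointer really points to a Foo.

```c
#include <stdint.h>
#include <stdlib.h>

#define FOO_MAGIC UINT32_C(0xF00DF00D) /* arbitrary tag chosen for Foo */

typedef struct Foo {
    uint32_t magic; /* first member, so every tagged type stores its tag at offset 0 */
    int value;
} Foo;

Foo *Foo_Allocate(int value)
{
    Foo *foo = malloc(sizeof *foo);
    if (foo != NULL) {
        foo->magic = FOO_MAGIC;
        foo->value = value;
    }
    return foo;
}

/* Returns nonzero if item appears to point to a Foo. */
static int Foo_Check(const void *item)
{
    const Foo *foo = item;
    return foo != NULL && foo->magic == FOO_MAGIC;
}

/* Deallocator with the generic signature: refuses to free untagged objects. */
void Foo_Deallocate(void *item)
{
    if (!Foo_Check(item))
        return; /* a real implementation might log or abort here */
    free(item);
}
```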

As mentioned, you can use a magic number or 'header' to specify the destructor function. You can go quite far with this header: you could even select a 'well-known, registered' deallocator (in which case you don't actually need to store a function pointer, just possibly an integer index into an array), or have a 'flags' section within your header which specifies that the object contains an 'extended' deallocator. The possibilities are many, and quite fun.
So your list 'headers' would look something like this:
#define LIST_HEAD struct list *next; struct list *prev; short flags;
struct list { LIST_HEAD };
struct list_with_custom_deallocator { LIST_HEAD void (*dealloc)(void*); };
Now, actually answering your question: why not just define a common header type, and have your deallocators take a pointer to that type (in my example, a struct list*) and then cast it to whatever specific type is relevant? Or, even better, maybe heuristically determine the actual structure and deallocator from the flags (as hinted by Binyamin).
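A minimal sketch of the common-header idea, loosely following the LIST_HEAD example above. One deliberate variation: instead of repeating the header fields via a macro, the header struct is embedded as the first member, because the standard guarantees that a pointer to a struct and a pointer to its first member convert to each other. LIST_FLAG_CUSTOM_DEALLOC, struct item and the helper names are hypothetical.

```c
#include <stdlib.h>

#define LIST_FLAG_CUSTOM_DEALLOC 0x1 /* element carries its own deallocator */

/* Common header shared by every element type. */
struct list {
    struct list *next;
    struct list *prev;
    short flags;
};

/* Header embedded as the first member (C11 6.7.2.1p15 makes the casts valid). */
struct list_with_custom_deallocator {
    struct list head;
    void (*dealloc)(struct list *);
};

/* A hypothetical concrete element type. */
struct item {
    struct list_with_custom_deallocator base;
    int value;
};

static int items_freed = 0; /* demonstration counter */

/* Deallocator takes the common header type and converts to the concrete type. */
static void item_dealloc(struct list *node)
{
    struct item *it = (struct item *)node; /* header sits at offset 0 */
    free(it);
    items_freed++;
}

static struct item *item_create(int value)
{
    struct item *it = calloc(1, sizeof *it);
    if (it != NULL) {
        it->base.head.flags = LIST_FLAG_CUSTOM_DEALLOC;
        it->base.dealloc = item_dealloc;
        it->value = value;
    }
    return it;
}

/* Generic release: dispatch on the flags stored in the header. */
static void list_node_release(struct list *node)
{
    if (node->flags & LIST_FLAG_CUSTOM_DEALLOC)
        ((struct list_with_custom_deallocator *)node)->dealloc(node);
    else
        free(node);
}
```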

Related

C, generalize functions argument's type

I was writing a C library for linked lists and trees, and I was looking for a way to generalize the data type handled by these libraries without writing a separate list/tree library for each type I need to handle.
For example, my list library has these functions:
/* list.h */
typedef int list_element; // <-- need to generalize that
struct list_node {
list_element value;
struct list_node* next;
};
typedef struct list_node list_node;
typedef struct list_node* list;
extern list list_cons(list_element d, list l);
And then my list.c:
/* list.c */
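#include <stdlib.h> /* malloc */
#include "list.h"   /* project header: quotes, not angle brackets */
list list_cons(list_element d, list l){
list m = malloc(sizeof(list_node));
if (m == NULL)
return NULL; /* allocation failed */
m->value = element_copy(d); /* element_copy: the per-type copy function I still need */
m->next = l;
return m;
}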
Now suppose that in my main program I have to use a list of int and a list of double: I would have to create two pairs of files, something like list_double.c/.h and list_int.c/.h.
Also, some list elements can be structs that need functions like copy/isLess/isEqual to compare themselves.
I want to write something like this in my code:
/* main.c */
list_cons(void *data, list l);
Here data is a pointer to any type I want, and inside list_cons, element_copy works for any type of data I pass (obviously I need to copy the data, not the pointer to the data; void* is the only idea I had for generalizing the argument type of a function).
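One way to get that behavior, sketched under the assumption that the caller also passes the element size (as qsort() does), is to copy the bytes into storage owned by the node. The node layout and list_free are illustrative choices. A byte copy is a shallow copy, so structs that own pointers would still need a type-specific copy function.

```c
#include <stdlib.h>
#include <string.h>

struct list_node {
    void *data;             /* node-owned copy of the element */
    size_t size;            /* size of that element in bytes */
    struct list_node *next;
};
typedef struct list_node *list;

/* Prepends a copy of the size bytes at data to l.
   Returns NULL on allocation failure (l is left untouched). */
list list_cons(const void *data, size_t size, list l)
{
    list m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->data = malloc(size);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    memcpy(m->data, data, size); /* copies the value, not the pointer */
    m->size = size;
    m->next = l;
    return m;
}

/* Frees every node together with its element copy. */
void list_free(list l)
{
    while (l != NULL) {
        list next = l->next;
        free(l->data);
        free(l);
        l = next;
    }
}
```

Usage would look like list_cons(&x, sizeof x, l) for any object x; reading an element back still requires knowing its type, e.g. *(int *)l->data.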
As a general suggestion, you should not assume that list_cons would be the only way to construct a linked list. Sometimes malloc is just not available, or the user wants to preallocate everything in a static array, or wants to use a custom allocator, or...
As a concrete sample, you may look at https://github.com/torvalds/linux/blob/master/include/linux/list.h.
If you want a different license for your code, search for data structure implementations in the various BSD Unix sources.
The general idea is that you only require a linked list structure to contain next/prev and similar fields, not limiting user to your type names. All iterations and basic operations are defined as preprocessor macros, on top of which you implement complex algorithms.
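A much-simplified sketch of that style (not the Linux API itself): the user's struct embeds a link field, iteration is a macro, and a pointer to the enclosing struct is recovered with offsetof from <stddef.h>. struct slink, SLIST_FOR_EACH, CONTAINER_OF and slist_push are hypothetical names.

```c
#include <stddef.h>

/* The only thing the library requires: a link field inside the user's struct. */
struct slink {
    struct slink *next;
};

/* Recovers a pointer to the enclosing struct from a pointer to its link member. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)(void *)((char *)(ptr) - offsetof(type, member)))

/* Iterates over every link reachable from head. */
#define SLIST_FOR_EACH(pos, head) \
    for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* Pushes link onto the front of the list whose first link is *head. */
static void slist_push(struct slink **head, struct slink *link)
{
    link->next = *head;
    *head = link;
}

/* A user type; the link need not be the first member. */
struct point {
    int x, y;
    struct slink link;
};

/* Sums the x field of every point in the list. */
static int sum_x(struct slink *head)
{
    struct slink *pos;
    int total = 0;
    SLIST_FOR_EACH(pos, head)
        total += CONTAINER_OF(pos, struct point, link)->x;
    return total;
}
```

Because the nodes live inside the user's own structs, they can sit in a static array or come from a custom allocator; the list code never calls malloc.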
Notice that in C11 (read its draft standard n1570 and some C reference site), different types can be handled differently (see its §6.5.2).
In particular, the implementation can handle int, double and pointer values differently (their sizes and alignments are often different, see sizeof and alignof), and some ABI conventions pass them in different registers (e.g. in function calls).
So you cannot portably write something that handles int, double, etc. all in the same way (unless you use variadic functions, with <stdarg.h>).
You might decide to implement some "generic" list whose content is some arbitrary pointer. But then you need some conventions about those pointers (who allocates them, who frees them, perhaps what operations are allowed, etc.). Look into GLib doubly-linked lists for inspiration.
You could also use preprocessor techniques to generate an abstract data type (of your list) and functions implementing it, given some type name for the content. Look into SGLIB for inspiration.
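A sketch of the general preprocessor technique (not SGLIB's actual interface): one macro stamps out a node type plus cons/free functions for each element type. DEFINE_LIST and the generated names (list_int, list_int_cons, and so on) are hypothetical.

```c
#include <stdlib.h>

/* Generates the type list_<T> and the functions list_<T>_cons / list_<T>_free.
   T must be a single identifier (use a typedef for e.g. "unsigned long"). */
#define DEFINE_LIST(T)                                                \
    typedef struct list_##T {                                         \
        T value;                                                      \
        struct list_##T *next;                                        \
    } list_##T;                                                       \
                                                                      \
    static list_##T *list_##T##_cons(T value, list_##T *next)         \
    {                                                                 \
        list_##T *m = malloc(sizeof *m);                              \
        if (m != NULL) {                                              \
            m->value = value; /* plain assignment copies the value */ \
            m->next = next;                                           \
        }                                                             \
        return m;                                                     \
    }                                                                 \
                                                                      \
    static void list_##T##_free(list_##T *l)                          \
    {                                                                 \
        while (l != NULL) {                                           \
            list_##T *next = l->next;                                 \
            free(l);                                                  \
            l = next;                                                 \
        }                                                             \
    }

DEFINE_LIST(int)
DEFINE_LIST(double)
```

Each instantiation is fully type-checked, at the cost of one copy of the code per element type.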
You could also use some metaprogramming techniques: you'll describe somehow the type of the element, and you feed that description into your metaprogram which is some C code generator. Look into SWIG for inspiration. The generated C code would implement your list's abstract data type.
Don't forget memory management issues, and clearly describe and document your conventions around them. Read about RAII.
Think also of complex cases like a list of lists of strings (perhaps dynamically allocated à la strdup or obtained using asprintf). You'll discover that things are not simple, and you'll need explicit conventions (e.g. could a string be shared between two sublists? When would that string be freed? ...).
This might be a good place to use a union.
You can define your base data type as a union of the most common types you want to support. Then you would define an enum of the types in question and use that value to flag what the union contains.
enum element_type {
TYPE_INT,
TYPE_DOUBLE
};
typedef union {
int e_int;
double e_double;
} list_element;
struct list_node {
enum element_type type;
list_element value;
struct list_node* next;
};
Then you add to the list like this:
list list_cons(list_element d, enum element_type type, list l){
list m = malloc(sizeof(list_node));
if (m == NULL)
return NULL; /* allocation failed */
m->type = type;
m->value = d; /* a union is copied by plain assignment */
m->next = l;
return m;
}
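Reading the list back means switching on the stored tag before touching the union. A minimal sketch reusing the definitions above (repeated so the block compiles on its own); list_sum is a hypothetical helper.

```c
#include <stddef.h>

enum element_type { TYPE_INT, TYPE_DOUBLE };

typedef union {
    int e_int;
    double e_double;
} list_element;

struct list_node {
    enum element_type type;
    list_element value;
    struct list_node *next;
};

/* Adds up every element, reading only the union member the tag says is valid. */
double list_sum(const struct list_node *l)
{
    double total = 0.0;
    for (; l != NULL; l = l->next) {
        switch (l->type) {
        case TYPE_INT:
            total += l->value.e_int;
            break;
        case TYPE_DOUBLE:
            total += l->value.e_double;
            break;
        }
    }
    return total;
}
```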

Typecasting for functions with different arguments in c

So I have two functions that do the same thing but on variables of different types.
The first function fills up an array of integers when given an int arr[] argument.
The second function fills up a linked list with integers also when given a struct as an argument.
The struct for the linked list argument looks something like this:
typedef struct {node_t *head; int size;} list_t;
Now I have implemented a table of function pointers for the two functions as such:
typedef struct{
char *name; //name of the function
void (*fill)(int arr[]); //fill up the array
} alg_t;
alg_t algs[] = {
{"func1",fill_up_arr},
{"func2",fill_up_linkedList}
};
Notice that inside the struct that holds my pointers, the fill function pointer
takes int arr[] as an argument.
I only want one pointer of a function in that struct, is there a way I can use
typecasting so that other functions such as fill_up_linkedList require argument to be of type list_t instead of int arr[]?
//This is what I want my main to look like.
//I want func.fill to be called only once, thus
//dynamically performing the operations for all functions inside the
//table of functions
size_t algs_size = sizeof algs / sizeof algs[0];
int *arr = malloc(sizeof(int) * algs_size);
for(size_t i = 0; i < algs_size; i++){
alg_t func = algs[i];
func.fill(arr);
}
It seems that the problem with this code would arise when the loop tries to call the fill_up_linkedList function, as it needs a different argument.
How can I use a typecast in this situation?
The good news
C11 §6.3.2.3 Pointers ¶8 (under the general topic §6.3 Conversions) says:
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the referenced type,
the behavior is undefined.
That means you can store any and all function pointers in a common function pointer type, for example typedef void (*GenericFunctionPointer)(void). What is crucial, though, is that you convert from that generic function pointer type back to the correct function pointer type before you invoke the function via the pointer (and that you provide the correct argument list, and that you handle the return type appropriately, though ignoring the return value, if any, is always an option).
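A sketch of that rule applied to the question, with hypothetical stand-ins for node_t, list_t and the two fill functions: both pointers are stored as GenericFunctionPointer, and each is converted back to its own type before the call. The kind field is an added assumption; something has to tell the caller which type to convert back to.

```c
#include <stdlib.h>

typedef void (*GenericFunctionPointer)(void);

/* Hypothetical stand-ins for the question's types. */
typedef struct node_t { int value; struct node_t *next; } node_t;
typedef struct { node_t *head; int size; } list_t;

enum { N = 3 }; /* how many integers each function fills in */

static void fill_up_arr(int arr[])
{
    for (int i = 0; i < N; i++)
        arr[i] = i;
}

static void fill_up_linkedList(list_t *list)
{
    for (int i = 0; i < N; i++) {
        node_t *n = malloc(sizeof *n);
        if (n == NULL)
            return;
        n->value = i;
        n->next = list->head; /* prepend */
        list->head = n;
        list->size++;
    }
}

enum fill_kind { FILL_ARRAY, FILL_LIST };

typedef struct {
    const char *name;
    enum fill_kind kind;         /* which type to convert back to */
    GenericFunctionPointer fill; /* stored in the common type */
} alg_t;

static const alg_t algs[] = {
    {"func1", FILL_ARRAY, (GenericFunctionPointer)fill_up_arr},
    {"func2", FILL_LIST,  (GenericFunctionPointer)fill_up_linkedList},
};

/* Converts the stored pointer back to its real type before calling it. */
static void run(const alg_t *alg, int arr[], list_t *list)
{
    switch (alg->kind) {
    case FILL_ARRAY:
        ((void (*)(int[]))alg->fill)(arr);
        break;
    case FILL_LIST:
        ((void (*)(list_t *))alg->fill)(list);
        break;
    }
}
```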
The bad news
For two different function pointer types, each with one instance of the function, the infrastructure needed to support this is probably more elaborate than the savings, if any. On the other hand, if you have two or more different function pointer types, and most if not all of the types have many representative functions ('many' meaning 'more than one', as in the computer engineer's counting: "zero, one, many"), then the infrastructure can pay off. One of the issues is marshalling the function arguments: how are the arguments made accessible so that the function can be called via the pointer with the correct arguments?
So, doing things this way is complex and verbose.
The stated requirement
In a comment, the OP Moi says:
I only want to put one function in the struct. My goal is to find a way to allow fillArray to allow the passing of different args.
I have major reservations about the use of an uncounted array as the argument list as shown in the question — void (*fill)(int arr[]) is shown. In my view, it should be void (*fill)(size_t n, int arr[n]), using the variable length array notation. You can omit the n in the subscript if you wish — void (*fill)(size_t n, int arr[]) — or even use void (*fill)(int arr[], size_t n), which is the more classic order for the arguments.
Putting this concern aside, if you want a single function to accept different arguments, one way to achieve that is with void * as the type, but you have to be aware of the problems — one of which is type safety. You'll also need to borrow ideas from the standard C functions qsort() and bsearch(). The argument list will include the pointer to the start of the array as a void *, the size of each element of the array, and the number of elements in the array. You may also need analogues to the comparator functions.
Internal to the single called function, though, you will probably end up with two code paths, so although you call a single function via the pointer, you end up doing the equivalent of implementing two functions. You could use an interface similar to qsort()'s so that the two functions have the same interface and different bodies, and you use two pointers in the alg_t array.
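One way to read the qsort()-style suggestion, as a sketch: every fill function takes a void * target and a count, and each body converts the pointer to the type it actually expects. fill_fn, fill_array and fill_list are hypothetical names, and node_t/list_t are stand-ins for the question's types. The type-safety problem mentioned above remains: nothing stops a caller from passing the wrong kind of target.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the question's list types. */
typedef struct node_t { int value; struct node_t *next; } node_t;
typedef struct { node_t *head; int size; } list_t;

/* Common signature: target is an int array or a list_t; n values are filled. */
typedef void (*fill_fn)(void *target, size_t n);

static void fill_array(void *target, size_t n)
{
    int *arr = target; /* caller promises an int array of at least n elements */
    for (size_t i = 0; i < n; i++)
        arr[i] = (int)i;
}

static void fill_list(void *target, size_t n)
{
    list_t *list = target; /* caller promises a list_t */
    for (size_t i = 0; i < n; i++) {
        node_t *node = malloc(sizeof *node);
        if (node == NULL)
            return;
        node->value = (int)i;
        node->next = list->head; /* prepend */
        list->head = node;
        list->size++;
    }
}

typedef struct {
    const char *name;
    fill_fn fill; /* one pointer type for every entry */
} alg_t;

static const alg_t algs[] = {
    {"func1", fill_array},
    {"func2", fill_list},
};
```

The call site is uniform (algs[i].fill(target, n)), but there are still two logically separate functions behind the one pointer type.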
Summary
You probably can't achieve the stated requirement cleanly.
You will probably need two logically separate functions to handle the two separate interfaces, even if you smush all the code inside a single function.
Use a union:
typedef struct {
char *name; //name of the function
union {
void(*fillArray)(int arr[]); //fill up the array
void(*fillList)(YourListType list); //fill up the list
};
} alg_t;
alg_t a;
a.fillArray(...);
a.fillList(...);
Or:
typedef struct {
char *name; //name of the function
union {
void(*fillArray)(int arr[]); //fill up the array
void(*fillList)(YourListType list); //fill up the list
} functions;
} alg_t;
alg_t a;
a.functions.fillArray(...);
a.functions.fillList(...);
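Either way, the struct also needs to record which union member is valid, since calling through the other one would use the wrong function type. A sketch with a named union and a hypothetical kind tag; list_t is a stand-in for YourListType, and fillList takes a pointer here (an assumption) so the callee can modify the list.

```c
#include <stddef.h>

typedef struct { int size; } list_t; /* hypothetical stand-in for YourListType */

enum fill_kind { FILL_ARRAY, FILL_LIST };

typedef struct {
    const char *name;
    enum fill_kind kind; /* says which union member is valid */
    union {
        void (*fillArray)(int arr[]);
        void (*fillList)(list_t *list);
    } functions;
} alg_t;

static void fill_up_arr(int arr[]) { arr[0] = 42; }
static void fill_up_list(list_t *list) { list->size = 1; }

static const alg_t algs[] = {
    {"func1", FILL_ARRAY, {.fillArray = fill_up_arr}},
    {"func2", FILL_LIST,  {.fillList = fill_up_list}},
};

/* Calls whichever function the entry holds, with the matching argument. */
static void run(const alg_t *alg, int arr[], list_t *list)
{
    if (alg->kind == FILL_ARRAY)
        alg->functions.fillArray(arr);
    else
        alg->functions.fillList(list);
}
```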
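Since you only have 2 types, you can just use an enum and a macro:
enum e_type {e_foo,e_bar};
#define do_foobar(e,...) ((e) ? bar(__VA_ARGS__) : foo(__VA_ARGS__))
Replace foo and bar with your function names. Any nonzero e (such as e_bar) selects bar, and e_foo (zero) selects foo. Note that both calls are compiled with the same arguments, so both functions must accept them.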
If you only have the problem with the function parameters and not with the return value, there's a trick using a function declarator with an empty parameter list (a declaration without a prototype). The trick consists in leaving the list of parameters empty in the pointer type declaration, as in:
typedef void (*callback_ptr)();
which is different from:
typedef void (*callback_ptr)(void);
In the first case, the compiler will not check the arguments passed to the function, because the pointer points to a function type without a prototype, while the second explicitly says the function takes no parameters and will give you an error if you try to call it with arguments. The arguments you pass must still match the called function's actual parameters (after the default argument promotions), or the behavior is undefined. Also note that C23 removed this form: there, () means the same as (void).

Aliasing structures (or pasting definition of one into another)

I want to create an API for setting and getting fields of a structure in an opaque way (clients should only deal with pointers to them and pass them to the methods declared in the header files). Standard stuff: you define your structures inside the library's source files and do
typedef struct __internal_struct shiny_new_opaque_type;
Problem is that at the moment, the class is simply a wrapper around an already existing API (that will change soon). So the structures I need to use are defined in other header files. Those headers contain the full structure definitions, which I want to hide from my clients so that any attempt to dereference a pointer and access a structure member results in a compiler error. Hence, I don't want to include those headers in my header (only in the .c files). I see three possible ways of dealing with it.
1. Instead of
typedef struct __internal_struct shiny_new_opaque_type;
do
typedef void shiny_new_opaque_type;
and have my methods do pointer casting. This is dangerous since the compiler can't do type checks.
2. Copy-paste the structure definitions I'm currently using under a new struct __internal_struct (eventually I'll have to define my own struct anyway). Maybe this is the best option?
3. Define my __internal_struct for now to include a single member that is the corresponding structure from the other API I'm using, and use that. Kind of ugly...
Basically is there a way to typedef one structure to another or use an already defined structure as an anonymous member inside another, so that at the end of the day both structures are equivalent? Neither of the following works:
typedef struct transparent_struct struct __internal_struct;
struct __internal_struct
{
struct transparent; // anonymous, direct access to its members
}
EDIT:
From the comments, it seems to me that option 3, or a variation thereof, would be the way to go. There is also the possibility of never defining my struct, as @Akira pointed out. So:
In header: typedef struct my_type my_type; // never defined
And in my source always use it with a cast (struct transparent*)my_type_ptr
In header: typedef struct _internal_struct my_type;
And in source files:
struct _internal_struct {
struct transparent t;
};
Then I can use either of these:
my_type_ptr->t.member
((struct transparent*)my_type_ptr)->member
If you are planning to use opaque pointers, you should provide only an incomplete struct type to users and let them perform operations through the provided functions, where one of the parameters is a pointer to your incomplete struct type.
For example, let's consider that we have an API which provides a struct foo type and a void print(struct foo*) function to print the content of struct foo instances. A wrapper can be implemented as follows:
wrapper.h
#ifndef WRAPPER_H
#define WRAPPER_H
struct my_obj; /* incomplete type, but you can use pointers to it */
struct my_obj* create(void); /* creates new struct my_obj instance */
void destroy(struct my_obj*); /* deletes the pointed struct my_obj instance */
void set_name_and_id(struct my_obj*, const char*, unsigned);
void show(struct my_obj*);
#endif /* WRAPPER_H */
wrapper.c
#include "wrapper.h"
#include "api.h" /* API only included here */
#include <stdlib.h>
#include <string.h>
#define TO_FOO(my_ptr) ((struct foo*)(my_ptr))
struct my_obj* create(void) {
return calloc(1, sizeof(struct foo)); /* allocates memory for 'struct foo' */
}
void destroy(struct my_obj* obj) {
free(obj);
}
void set_name_and_id(struct my_obj* obj, const char* name, unsigned id) {
strcpy(TO_FOO(obj)->bar, name);
TO_FOO(obj)->baz = id;
}
void show(struct my_obj* obj) {
print(TO_FOO(obj)); /* accepts only 'struct foo' pointers */
}
When users include the wrapper.h from the example above, they won't see the api.h and won't be able to dereference the pointer to struct my_obj because it's an incomplete type.
To respond to your comment:
In this case both the internal_struct and the api one are of the same size, aligned and I can either access the api's members using internal_struct->api.member or ((struct API*)internal_struct)->member. What's your view on those two options?
According to N1570 draft (c11):
6.7.2.1 Structure and union specifiers
15 (...) A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
So, both of your approaches are good and safe; it's up to you which one you prefer. Using internal_struct->api.member is clear, so I would use this version.
Converting comments into an answer.
Avoid option 1 — the API shouldn't use void pointers because of the lack of type safety. C is bad enough as it is; don't go out of your way to drive holes through what type safety is available. If the interface type is struct SomeThing *, you can pass a void * to the function without wittering from the C compiler, but you can't pass a struct SomeThingElse * to the function (without a cast, but needing to add a cast should raise warning flags in your mind). If the API uses void *, you can pass any pointer type to the function without any casts or warnings; that's highly undesirable.
Option 2 is a maintenance liability, if not nightmare. Don't go there.
Therefore, option 3 is the way to go. You have two sub-options.
3A — your structure simply contains a single member that is a pointer to the API's structure type (struct internal_struct { struct API *api; }), and
3B — your structure simply contains a single member that is the API's structure type (struct internal_struct { struct API api; }) — the difference is the presence or absence of the *.
Both 3A and 3B work; which works better for you depends on the organization of the API you're working with — how near to opaque it treats its structure type. The more nearly opaque the structure type, the more appropriate 3A is. On the other hand, it incurs some overhead in accessing the data.
Indeed I ended up going with option 3B. In this case both the internal_struct and the api one are of the same size, aligned and I can either access the api's members using internal_struct->api.member or ((struct API*)internal_struct)->member. What's your view on those two options?
While the version with the cast works, it sucks as notation. Avoid casts whenever you can — they're a bludgeon that tells the compiler "I know better than you do what I'm doing". I avoid casts as much as possible. Yes, I sometimes use casts; that's almost unavoidable. But I avoid them when possible, and this is a case where it's eminently possible.
Using option 3B, the chances are the compiler generates the same code for both internal_struct->api.member and ((struct API *)internal_struct)->member. So, use the cleaner notation — which is also more succinct. There's a mild nuisance from repeating the api.; there's a bigger nuisance from adding parentheses and repeating struct API *.
If you did the cast once:
struct API *api_ptr = (struct API *)internal_struct;
and then used api_ptr->member etc. throughout, that might be sensible, but the castless version would still be better:
struct API *api_ptr = &internal_struct->api;
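A small sketch of option 3B, with a hypothetical struct API standing in for the wrapped library's type. Both accessors return the same address, as the §6.7.2.1 quote above guarantees, but only the castless one is checked by the compiler. api_ptr_of and api_ptr_by_cast are illustrative names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the wrapped library's structure. */
struct API {
    int member;
};

/* Option 3B: the wrapper embeds the API structure as its only (first) member. */
struct internal_struct {
    struct API api;
};

/* Castless access: the compiler checks that 'api' exists and is a struct API. */
static struct API *api_ptr_of(struct internal_struct *internal_struct)
{
    return &internal_struct->api;
}

/* Cast-based access: valid only because api is the first member; the cast
   would keep compiling silently if the layout ever changed. */
static struct API *api_ptr_by_cast(struct internal_struct *internal_struct)
{
    return (struct API *)internal_struct;
}
```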

Declare pointer to a function that accepts a struct... from within the struct definition?

I've been looking around a lot and found answers which almost address my question, but not quite.
typedef struct
{
int menuparams; //lots of these here
void (*menufunction)(MENUSTRUCT); //points to "void functionname(MENUSTRUCT *menu)"
}MENUSTRUCT;
I have a situation where I want to use a struct to store a function pointer. My code reads an array of these structs (checking a variable in each) and upon determining the appropriate one to use, follows its function pointer to the necessary function.
However, the function(s) in question accept a pointer to that exact same struct type, because they may need to perform some work on another struct of the same type. The pointer for this other struct is passed when the function pointer is followed.
My current implementation seems to work, but I want to know if I'm going to cause myself trouble in the future. It compiles and runs just fine, but I get the following warning on the line declaring the function pointer:
warning: parameter names (without types) in function declaration
This makes sense, because my struct doesn't exist before I've finished declaring it. But I can't figure out how best to rewrite this such that it's happy having the struct contain a reference to its own type. I've looked into forward-declaring the struct but the compiler gets grumpier the further away I move from the above implementation.
If I tried:
typedef struct menu
{
int menuparams; //lots of these here
void (*menufunction)(menu);
}MENUSTRUCT;
It does at least compile and work, but with the same warning.
Since it seems to work it's technically not the end of the world, but if at some point in the future the MCU might spontaneously burst into flames because of my stupidity, it would be best I know now!
It is easy: just use the complete type name, or forward-declare the typedef:
Option 1:
typedef struct menu {
int menuparams;
//Must use the whole type name, including the "struct" keyword
void (*menufunction)(struct menu *); //SHOULD BE A POINTER to the structure
}MENUSTRUCT;
Option 2:
typedef struct menu MENUSTRUCT;
//Without typedef
struct menu {
int menuparams;
void (*menufunction)(MENUSTRUCT*); //SHOULD BE A POINTER to the structure
};
Declare the typedef MENUSTRUCT first, and then define struct MENUSTRUCT:
typedef struct MENUSTRUCT MENUSTRUCT;
struct MENUSTRUCT
{
int menuparams; //lots of these here
void (*menufunction)(MENUSTRUCT*);
};
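To show the corrected declaration in use, here is a sketch of the dispatch described in the question: an array of MENUSTRUCT entries, a lookup on one field, and a call through the function pointer that passes a pointer to another entry. The id field, the two actions and menu_dispatch are hypothetical.

```c
#include <stddef.h>

typedef struct MENUSTRUCT MENUSTRUCT;

struct MENUSTRUCT {
    int id;                             /* field checked when searching the array */
    int menuparams;                     /* stands in for the real parameters */
    void (*menufunction)(MENUSTRUCT *); /* receives a pointer to another entry */
};

/* Example actions that modify the entry they are given. */
static void reset_params(MENUSTRUCT *menu)
{
    menu->menuparams = 0;
}

static void double_params(MENUSTRUCT *menu)
{
    menu->menuparams *= 2;
}

/* Calls the function of the first entry with a matching id and a non-NULL
   function, passing target. Returns 0 on success, -1 if no such entry exists. */
static int menu_dispatch(MENUSTRUCT menus[], size_t count, int id, MENUSTRUCT *target)
{
    for (size_t i = 0; i < count; i++) {
        if (menus[i].id == id && menus[i].menufunction != NULL) {
            menus[i].menufunction(target);
            return 0;
        }
    }
    return -1;
}
```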

Difference between using a structure member and cast a structure pointer when "emulate" polymorphism in C

I'm not sure if my wording is technically correct, so please correct me in both title and the main body of this question.
So basically my question is regarding emulating polymorphism in C. For example, suppose I have a tree, and there is a struct tree_node type. And I have some functions to help me insert nodes, delete nodes etc like this as an example:
void tree_insert(tree_node **root, tree_node *new_node);
Then I start to build other stuff for my app, and need to use this tree to maintain, say, family members. But for humans I have another struct, let's call it "struct human_node", which is defined like this, for example:
typedef struct human_node_ {
tree_node t_node;
char *name;
} human_node;
Now, obviously, I want to use the tree utility functions I built for the generic tree. But they take tree_node pointers. Now it's time for the polymorphism emulation. So here are the two options I have: one is to cast my human_node, the other is to use the t_node member in the human_node:
human_node *myfamily_tree_root, *new_family_guy;
//some initialization code and other code later...
tree_insert((tree_node **)&myfamily_tree_root, &(new_family_guy->t_node));
For conciseness, I put both ways in the single function call above.
And this is exactly where I have my confusion. So which one should I use and more importantly, why?
Both are standard, but in general if you can avoid type casts then you should pick the solution that avoids the type casts.
A common pattern in such inline (intrusive) data structure implementations is to not even require that the tree node (or equivalent) be the first element in the struct, since you might want to insert your nodes into multiple trees. Then you definitely want to use the second approach. To convert between the tree_node element and the struct containing it, you'll need some black magic macros, but it's worth it. For example, in an implementation of AVL trees I have these macros:
#ifndef offsetof
#define offsetof(s, e) ((size_t)&((s *)0)->e)
#endif
/* the bit at the end is to prevent mistakes where n is not an avl_node */
#define avl_data(n, type, field) ((type *)(void *)((char *)(n) - offsetof(type, field) - ((n) - (struct avl_node *)(n))))
So I can have something like:
struct foo {
int data;
struct avl_node tree_node_1;
struct avl_node tree_node_2;
};
int
tree_node_1_to_data(struct avl_node *x)
{
return avl_data(x, struct foo, tree_node_1)->data;
}
If you choose to make your code this generic you definitely want to take references to your tree_node members and not typecast the pointer to the struct.
This question is probably too broad for any specific answers, but you could for instance see how CPython does it.
Basically, all Python structs have the same header, and to define your own types you must be sure to start your struct with the PyObject_HEAD macro (or PyObject_VAR_HEAD for variably sized objects like strings). This adds stuff like a type tag, reference counts, and so on.
After instantiating objects, you pass them around as PyObject*s, and the functions will infer what type the object actually is (e.g. a string, a list, etc.) and be able to dispatch based on that. Yes, you have to type cast at some point to get to your actual object contents.
For instance, this is how Python's character strings are defined:
typedef struct {
PyObject_VAR_HEAD
Py_hash_t ob_shash;
char ob_sval[1];
/* Invariants:
* ob_sval contains space for 'ob_size+1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
*/
} PyBytesObject;
You can read more about the CPython's type object inheritance model. Extract:
Objects are always accessed through pointers of the type 'PyObject *'.
The type 'PyObject' is a structure that only contains the reference
count and the type pointer. The actual memory allocated for an object
contains other data that can only be accessed after casting the
pointer to a pointer to a longer structure type.
Note that this angle of attack may be more suited for interpreted code. There are probably other open source projects you may look at that are more directly relevant to your needs.
