Simulate a Java generic interface and abstract data type in C

I am trying to port a library written in Java to C. For a Java interface, I intend to substitute a struct of function pointers, for instance:
// Java code
public interface ActionsFunction {
Set<Action> actions(Object s);
}
/* C code */
typedef struct ActionsFunction {
List* (*actions)(void* s);
void (*clear_actions)(struct List **list); /* Since C doesn't have garbage collector */
} ActionsFunction;
My question is whether this is a suitable solution, and how I can simulate a generic interface such as:
public interface List <E> {
void add(E x);
Iterator<E> iterator();
}
UPDATE:
I also have to deal with another problem: implementing generic abstract data structures like List, Queue, Stack, etc., since the C standard library lacks those implementations. My approach is that client code passes a pointer to its data along with the data's size, allowing the library to hold the data without knowing its type. Again, this is just my idea; I'd like your advice on the design as well as the implementation technique.
My initial porting code can be found at:
https://github.com/PhamPhiLong/AIMA
The generic abstract data structures can be found in the utility subfolder.
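A minimal sketch of the design described in this question, combining both ideas: the interface becomes a table of function pointers, and the container copies a client-supplied number of bytes per element so it never needs to know the element type. The names ListOps, ArrayList and array_list_new are hypothetical, not taken from the linked repository.

```c
#include <stdlib.h>
#include <string.h>

/* A "Java-style" interface: a table of function pointers.
   Every implementation receives its own state through 'self'. */
typedef struct ListOps {
    int (*add)(void *self, const void *elem);           /* copies elem in; 0 on success */
    const void *(*get)(const void *self, size_t index); /* borrowed pointer, or NULL */
    size_t (*size)(const void *self);
    void (*destroy)(void *self);
} ListOps;

/* One implementation: a growable array storing elem_size bytes per element. */
typedef struct ArrayList {
    const ListOps *ops;   /* first member, so every "List" starts with its ops table */
    unsigned char *data;
    size_t elem_size, count, capacity;
} ArrayList;

static int array_add(void *self, const void *elem)
{
    ArrayList *a = self;
    if (a->count == a->capacity) {
        size_t cap = a->capacity ? a->capacity * 2 : 4;
        unsigned char *p = realloc(a->data, cap * a->elem_size);
        if (!p) return -1;
        a->data = p;
        a->capacity = cap;
    }
    memcpy(a->data + a->count * a->elem_size, elem, a->elem_size);
    a->count++;
    return 0;
}

static const void *array_get(const void *self, size_t index)
{
    const ArrayList *a = self;
    return index < a->count ? a->data + index * a->elem_size : NULL;
}

static size_t array_size(const void *self)
{
    return ((const ArrayList *)self)->count;
}

static void array_destroy(void *self)
{
    ArrayList *a = self;
    free(a->data);
    free(a);
}

static const ListOps array_list_ops = { array_add, array_get, array_size, array_destroy };

/* Constructor: the client only says how big one element is. */
ArrayList *array_list_new(size_t elem_size)
{
    ArrayList *a = calloc(1, sizeof *a);
    if (a) {
        a->ops = &array_list_ops;
        a->elem_size = elem_size;
    }
    return a;
}
```

Usage would look like `ArrayList *l = array_list_new(sizeof(int)); l->ops->add(l, &x);`. Because ops is the first member, code that only knows it has "some list" can call through the table without knowing which implementation it is talking to, which is roughly how a Java interface reference maps onto C.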

Here's a very brief example using macros to accomplish something like this. This can get hairy pretty quick, but if done correctly, you can maintain complete static type safety.
#include <stdlib.h>
#include <stdio.h>
/* Note: typeof is a GCC/Clang extension (standard only since C23). */
#define list_type(type) struct list_##type
/* A generic list node that keeps 'type' by value. */
#define define_list_val(type) \
list_type(type) { \
list_type(type) *next; \
type value; \
}
/* Append 'node' to the end of the list whose head pointer is *plist. */
#define list_add(plist, node) \
do \
{ \
typeof(plist) p; \
for (p = (plist); *p != NULL; p = &(*p)->next) ; \
*p = (node); \
(node)->next = NULL; \
} while(0)
#define list_foreach(plist, p) \
for ((p) = *(plist); (p) != NULL; (p) = (p)->next)
define_list_val(int) *g_list_ints;
define_list_val(float) *g_list_floats;
int main(void)
{
list_type(int) *node;
node = malloc(sizeof(*node));
if (node == NULL) return 1;
node->value = 42;
list_add(&g_list_ints, node);
node = malloc(sizeof(*node));
if (node == NULL) return 1;
node->value = 66;
list_add(&g_list_ints, node);
list_foreach(&g_list_ints, node) {
printf("Node: %d\n", node->value);
}
return 0;
}

There are a few common ways to do generic-ish programming in C. I would expect to use one or more of the following methods in trying to accomplish the task you've described.
MACROS: One is to use macros. In this example, MAX looks like a function, but operates on anything that can be compared with the ">" operator:
#define MAX(a,b) ((a) > (b) ? (a) : (b))
int i;
float f;
unsigned char b;
f = MAX(7.4, 2.5);
i = MAX(3, 4);
b = MAX(10, 20);
VOID *: Another method is to use void * pointers for representing generic data, and then pass function pointers into your algorithms to operate on the data. Look up the <stdlib.h> function qsort for a classic example of this technique.
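A small sketch of the qsort pattern: the algorithm only sees a void * base, an element count, an element size and a comparison callback, so one function sorts any element type. The wrapper names sort_ints and sort_strings are hypothetical; qsort and its comparator signature are standard <stdlib.h>.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for int elements: each void * points at one int. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow of x - y */
}

/* Comparator for an array of char *: each element is therefore a char **. */
static int compare_strings(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

void sort_ints(int *v, size_t n) { qsort(v, n, sizeof *v, compare_ints); }
void sort_strings(const char **v, size_t n) { qsort(v, n, sizeof *v, compare_strings); }
```

The type knowledge lives entirely in the comparator; qsort itself moves opaque blocks of sizeof *v bytes.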
UNIONS: Yet another, though probably seen less often, technique is to use unions to hold data of multiple different types. This makes your algorithms that operate on the data kinda ugly though and might not save much coding:
enum { VAR_DOUBLE, VAR_INT, VAR_STRING };
/* Declare a generic container struct for any type of data you want to operate on */
struct VarType
{
int type;
union
{
double d;
int i;
char * sptr;
} data;
};
/* Defined elsewhere: sorts out what to do based on the value of v.type */
void my_function(struct VarType v);
int main(void){
struct VarType x;
x.data.d = 1.75;
x.type = VAR_DOUBLE;
/* call some function that sorts out what to do based on value of x.type */
my_function( x );
return 0;
}
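One possible my_function, sketched under the assumption that it should produce a textual form of the value. To have something to return, this hypothetical version takes an output buffer in addition to the VarType; the switch on the tag is what keeps each union access correct.

```c
#include <stdio.h>

enum { VAR_DOUBLE, VAR_INT, VAR_STRING };

struct VarType {
    int type;              /* says which union member is valid */
    union {
        double d;
        int i;
        char *sptr;
    } data;
};

/* Hypothetical dispatcher: formats the active member into buf.
   Returns snprintf's result, or -1 for an unknown tag. */
int my_function(struct VarType v, char *buf, size_t len)
{
    switch (v.type) {
    case VAR_DOUBLE: return snprintf(buf, len, "%g", v.data.d);
    case VAR_INT:    return snprintf(buf, len, "%d", v.data.i);
    case VAR_STRING: return snprintf(buf, len, "%s", v.data.sptr);
    default:         return -1;
    }
}
```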
CLEVER CASTING & POINTER MATH: It's a pretty common idiom to see data structures whose functions operate on a specific kind of struct, and which require that struct to be included in your own struct to do anything useful.
The easy way to do this is to force the struct that allows insertion into the data structure to be the first member of your derived type. Then you can seamlessly cast back & forth between the two. The more versatile way is to use 'offsetof'.
For example:
#include <stddef.h> /* offsetof */
/* Simple types */
struct listNode { struct listNode * next; struct listNode * prev; };
struct list { struct listNode dummy; };
/* Functions that operate on those types */
int append( struct list * theList, struct listNode * theNode );
struct listNode * first( struct list *theList );
/* To use, you must do something like this: */
/* Define your own type that includes a list node */
typedef struct {
int x;
double y;
char name[16];
struct listNode node;
} MyCoolType;
int main(void) {
struct list myList; /* initialization omitted */
MyCoolType coolObject;
MyCoolType * ptr;
/* Add the 'coolObject's 'listNode' member to the list */
append( &myList, &coolObject.node );
/* Use ugly casting & pointer math to get back to your original type.
You may want to google 'offsetof' here. */
ptr = (MyCoolType *) ( (char*) first( &myList )
- offsetof(MyCoolType,node) );
return 0;
}
The libev documentation has some more good examples of this last technique:
http://search.cpan.org/dist/EV/libev/ev.pod#COMMON_OR_USEFUL_IDIOMS_(OR_BOTH)
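A compilable sketch of the offsetof technique, with the link member deliberately not placed first. CONTAINER_OF, push and sum_x are hypothetical names; the macro performs the same pointer arithmetic as the cast above, in the spirit of Linux's container_of and Windows' CONTAINING_RECORD.

```c
#include <stddef.h>

struct listNode { struct listNode *next; };

/* Recover a pointer to the enclosing struct from a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct {
    int x;
    double y;
    struct listNode node;   /* NOT the first member, so a plain cast would be wrong */
} MyCoolType;

/* Push onto a singly linked stack of bare nodes; the list code never sees MyCoolType. */
void push(struct listNode **head, struct listNode *n)
{
    n->next = *head;
    *head = n;
}

/* Walk the bare nodes and sum x over every enclosing MyCoolType. */
int sum_x(struct listNode *head)
{
    int total = 0;
    for (struct listNode *n = head; n != NULL; n = n->next)
        total += CONTAINER_OF(n, MyCoolType, node)->x;
    return total;
}
```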

Related

Storing and using type information in C

I'm coming from Java and I'm trying to implement a doubly linked list in C as an exercise. I wanted to do something like Java generics, where I would pass a pointer type to the list initialization and this pointer type would be used to cast the list's void pointer, but I'm not sure whether this is possible.
What I'm looking for is something that can be stored in a list struct and used to cast *data to the correct type from a node. I was thinking of using a double pointer but then I'd need to declare that as a void pointer and I'd have the same problem.
typedef struct node {
void *data;
struct node *next;
struct node *previous;
} node;
typedef struct list {
node *head;
node *tail;
//??? is there any way to store the data type of *data?
} list;
Typically, type-specific functions like the following are used.
void List_Put_int(list *L, int *i);
void List_Put_double(list *L, double *d);
int * List_Get_int(list *L);
double *List_Get_double(list *L);
A less beginner-friendly approach uses _Generic. C11 offers _Generic, which lets code be steered at compile time based on the type of an expression.
The code below saves/fetches 3 types of pointers. The macros would need to be extended for each new type. _Generic does not allow two listed types that may be the same (compatible), such as unsigned * and size_t *, so there are limitations.
The type_id(X) macro maps each of the 3 types to an integer ID, which may be used to check for run-time problems, as with LIST_POP(L, &d); below.
#include <assert.h>
#include <stdio.h>
typedef struct node {
void *data;
int type;
} node;
typedef struct list {
node *head;
node *tail;
} list;
node node_var;
void List_Push(list *l, void *p, int type) {
// tbd code - simplistic use of global for illustration only
node_var.data = p;
node_var.type = type;
}
void *List_Pop(list *l, int type) {
// tbd code
assert(node_var.type == type);
return node_var.data;
}
#define cast(X,ptr) _Generic((X), \
double *: (double *) (ptr), \
unsigned *: (unsigned *) (ptr), \
int *: (int *) (ptr) \
)
#define type_id(X) _Generic((X), \
double *: 1, \
unsigned *: 2, \
int *: 3 \
)
#define LIST_PUSH(L, data) { List_Push((L),(data), type_id(data)); }
#define LIST_POP(L, dataptr) (*(dataptr)=cast(*dataptr, List_Pop((L), type_id(*dataptr))) )
Usage example and output
int main() {
list *L = 0; // tbd initialization
int i = 42;
printf("%p %d\n", (void*) &i, i);
LIST_PUSH(L, &i);
int *j;
LIST_POP(L, &j);
printf("%p %d\n", (void*) j, *j);
double *d;
LIST_POP(L, &d);
}
<address of i> 42
<address of i> 42
assertion failure (the last LIST_POP asks for a double * but an int * was pushed)
There is no way to do what you want in C. There is no way to store a type in a variable and C doesn't have a template system like C++ that would allow you to fake it in the preprocessor.
You could define your own template-like macros that could quickly define your node and list structs for whatever type you need, but I think that sort of hackery is generally frowned upon unless you really need a whole bunch of linked lists that only differ in the type they store.
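For reference, a minimal sketch of such template-like macros, assuming element types that are single identifiers (a typedef is needed for something like unsigned long or char *). DEFINE_LIST and the generated names such as int_list_push are hypothetical:

```c
#include <stdlib.h>

/* DEFINE_LIST(T) stamps out a node type, a list type and two functions
   for element type T. T must be a single identifier. */
#define DEFINE_LIST(T)                                                    \
    typedef struct T##_node { struct T##_node *next; T value; } T##_node; \
    typedef struct T##_list { T##_node *head; size_t length; } T##_list;  \
    /* push a copy of value onto the front; 0 on success */               \
    static int T##_list_push(T##_list *l, T value)                        \
    {                                                                     \
        T##_node *n = malloc(sizeof *n);                                  \
        if (!n) return -1;                                                \
        n->value = value;                                                 \
        n->next = l->head;                                                \
        l->head = n;                                                      \
        l->length++;                                                      \
        return 0;                                                         \
    }                                                                     \
    /* free every node and reset the list */                              \
    static void T##_list_free(T##_list *l)                                \
    {                                                                     \
        while (l->head) {                                                 \
            T##_node *next = l->head->next;                               \
            free(l->head);                                                \
            l->head = next;                                               \
        }                                                                 \
        l->length = 0;                                                    \
    }

/* One instantiation per element type. */
DEFINE_LIST(int)
DEFINE_LIST(double)
```

Unlike the void * approach, passing a pointer to int_list_push draws a compiler diagnostic, at the cost of one copy of the functions per instantiated type.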
C doesn't have any runtime type information and doesn't have a type "Type". Types are meaningless once the code is compiled. So, there's no solution to what you ask provided by the language.
One common reason you would want to have a type available at runtime is that you have some code that might see different instances of your container and must do different things for different types stored in the container. You can easily solve such a situation using an enum, e.g.
enum ElementType
{
ET_INT, // int
ET_DOUBLE, // double
ET_CAR, // struct Car
// ...
};
and enumerate any type here that should ever go into your container. Another reason is if your container should take ownership of the objects stored in it and therefore must know how to destroy them (and sometimes how to clone them). For such cases, I recommend the use of function pointers:
typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);
Then extend your struct to contain these:
typedef struct list {
node *head;
node *tail;
ElementDeleter deleter;
ElementCloner cloner;
} list;
Make sure they are set to functions that actually delete or clone, respectively, an element of the type stored in your container, and then use them where needed; e.g. in a remove function, you could do something like
myList->deleter(myNode->data);
// delete the contained element without knowing its type
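A minimal sketch of a list that owns its elements through these two function pointers, assuming C-string elements for the example. clone_string, delete_string, list_push, list_remove_head and list_clear are hypothetical helpers; the list code itself never looks at the element type.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);

typedef struct node { void *data; struct node *next; } node;

typedef struct list {
    node *head;
    ElementDeleter deleter;
    ElementCloner cloner;
} list;

/* Element operations for C strings. */
static void *clone_string(const void *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy) memcpy(copy, s, n);
    return copy;
}
static void delete_string(void *s) { free(s); }

/* The list stores its own copy of each element, made by the cloner. */
int list_push(list *l, const void *element)
{
    node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->data = l->cloner(element);
    if (!n->data) { free(n); return -1; }
    n->next = l->head;
    l->head = n;
    return 0;
}

/* Remove and destroy the first element without knowing its type. */
void list_remove_head(list *l)
{
    node *n = l->head;
    if (!n) return;
    l->head = n->next;
    l->deleter(n->data);
    free(n);
}

void list_clear(list *l)
{
    while (l->head) list_remove_head(l);
}
```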
Create an enum type that records the data type, and allocate memory according to that enum. This can be done with a switch/case construct.
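A short sketch of that idea, assuming a hypothetical element struct that pairs the enum tag with a pointer to the allocated data; the switch picks the size to allocate from the tag.

```c
#include <stdlib.h>

enum ElementType { ET_INT, ET_DOUBLE, ET_STRING };

/* Hypothetical tagged element: the enum records what 'data' points to. */
typedef struct element {
    enum ElementType type;
    void *data;
} element;

/* Allocate room for 'count' items of the given type (chars, for ET_STRING).
   Returns 0 on success, -1 on an unknown type or allocation failure. */
int element_alloc(element *e, enum ElementType type, size_t count)
{
    size_t size;
    switch (type) {
    case ET_INT:    size = sizeof(int) * count;    break;
    case ET_DOUBLE: size = sizeof(double) * count; break;
    case ET_STRING: size = sizeof(char) * count;   break;
    default:        return -1;
    }
    e->type = type;
    e->data = malloc(size);
    return e->data ? 0 : -1;
}
```

Code that later reads e->data can switch on e->type in the same way to pick the right cast.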
Unlike Java or C++, C does not provide type safety for generic code. To answer your question succinctly, rearrange your node type this way:
struct node {
struct node* prev; /* put these at front */
struct node* next;
/* no data here */
};
You could then separately declare nodes carrying any data:
struct data_node {
struct data_node *prev; // keep these two data members at the front
struct data_node *next; // and in the same order as in struct node.
// you can add more data members here.
};
/* OR... */
struct data_node2 {
struct node node_data; /* WARNING: this may look a bit safer, but it is _only_ safe if placed at the front. */
/* more data ... */
};
You can then create a library that operates on data-less lists of nodes.
struct list { struct node* head; struct node* tail; };
void list_add(struct list* l, struct node* n);
void list_remove(struct list* l, struct node* n);
/* etc... */
By casting, you can use this 'generic list' API to operate on your own lists.
You can have some sort of type information in your list declaration, for what it's worth, since C does not provide meaningful type protection.
struct data_list
{
struct data_node* head; /* this makes intent clear. */
struct data_node* tail;
};
struct data2_list
{
struct data_node2* head;
struct data_node2* tail;
};
/* ... */
struct data_list my_list = { 0 };
struct data2_list my_list2 = { 0 };
struct data_node* my_data_node = malloc(sizeof(struct data_node));
struct data_node2* my_data_node2 = malloc(sizeof(struct data_node2));
/* ... */
list_add((struct list*)&my_list, (struct node*)my_data_node);
list_add((struct list*)&my_list2, &(my_data_node2->node_data));
/* the WARNING above is because one could also write this */
list_add((struct list*)&my_list2, (struct node*)my_data_node2);
/* etc... */
These two techniques generate the same object code, so which one you choose is up to you, really.
As an aside, I'd avoid the typedef struct notation where you can; writing struct explicitly increases readability in the long run, IMHO. Some will agree with me on this subject and some won't.

c programming #define struct { } declaring

I was looking at glibc code. Some code in glibc's queue.h caught my attention. I couldn't make sense of this struct definition. This struct doesn't have a name. Why? How does it work?
#define LIST_ENTRY(type) \
struct { \
struct type *le_next; /* next element */ \
struct type **le_prev; /* address of previous next element */ \
}
That is actually a preprocessor macro, which is expanded somewhere else (most probably followed by a member name).
In the comments at the start of that header file there is a reference to queue(3) man page that contains more details on that and other macros:
The macro LIST_ENTRY declares a structure that connects the elements
in the list.
And an example of use:
LIST_HEAD(listhead, entry) head = LIST_HEAD_INITIALIZER(head);
struct listhead *headp; /* List head. */
struct entry {
...
LIST_ENTRY(entry) entries; /* List. */
...
}
*n1, *n2, *n3, *np, *np_temp;
LIST_INIT(&head); /* Initialize the list. */
n1 = malloc(sizeof(struct entry)); /* Insert at the head. */
LIST_INSERT_HEAD(&head, n1, entries);
Since this is C code (not C++), and C lacks templates, this preprocessor macro is used to "simulate" templates (note the type parameter).
It's a macro that is used to declare a struct type, with next and prev pointers to instances of a second struct type. That second type can be a parent type, so you can make a "linkable struct" like this:
struct foo {
LIST_ENTRY(foo) list;
int value;
};
This creates a struct foo containing a member called list which in turn is the structure in the question, with the pointers pointing at struct foo.
We can now create a little linked list of struct foos like so:
struct foo fa, fb;
fa.value = 47;
fa.list.le_next = &fb;
fa.list.le_prev = NULL;
fb.value = 11;
fb.list.le_next = NULL;
fb.list.le_prev = &fa.list.le_next;
I'm not 100% sure about the last line, but I think it kind of makes sense.
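A sketch that also links the first element to a list head, following the queue(3) convention that le_prev holds the address of whichever pointer currently points at the element (this is what LIST_INSERT_HEAD sets up). Under that convention the last line above is right, while fa.list.le_prev would normally point at the head's pointer rather than be NULL. struct foo_head, link_two and unlink_foo are hypothetical; the macro is copied from the question so the example does not need <sys/queue.h>.

```c
#include <stddef.h>

/* The macro from the question. */
#define LIST_ENTRY(type) \
    struct { \
        struct type *le_next;  /* next element */ \
        struct type **le_prev; /* address of previous next element */ \
    }

struct foo {
    LIST_ENTRY(foo) list;
    int value;
};

/* Hypothetical list head: just a pointer to the first element. */
struct foo_head { struct foo *lh_first; };

/* Link two elements by hand: le_prev points at the pointer that points to us. */
void link_two(struct foo_head *head, struct foo *fa, struct foo *fb)
{
    head->lh_first = fa;
    fa->list.le_next = fb;
    fa->list.le_prev = &head->lh_first;   /* the head's pointer points to fa */
    fb->list.le_next = NULL;
    fb->list.le_prev = &fa->list.le_next; /* fa's next pointer points to fb */
}

/* Unlinking needs neither the head nor a search: rewrite the pointer to elm. */
void unlink_foo(struct foo *elm)
{
    if (elm->list.le_next != NULL)
        elm->list.le_next->list.le_prev = elm->list.le_prev;
    *elm->list.le_prev = elm->list.le_next;
}
```

The double pointer is what makes removal O(1) without a real prev pointer: the element can overwrite whichever next pointer (or head pointer) refers to it.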

Using macros in C to define data structures

I am trying to wrap my head around the concept of using macros to define data structure operations. The following code is a simple example of using the built-in list library in FreeBSD. In the library all operations are defined as macros. I have seen this approach in a couple of other libraries also.
I can see that this has some advantages, e.g. being able to use any data structure as an element in the list. But I do not quite understand how this works. For example:
What is stailhead? This seems to be "just" defined.
How to pass head and entries to a function?
What type is head, how can I declare a pointer to it?
Is there a standard name for this technique which I can use to search google, or any book which explains this concept? Any links or good explanation as to how this technique works will be much appreciated.
Thanks to Niklas B., I ran gcc -E and got this definition for head:
struct stailhead {
struct stailq_entry *stqh_first;
struct stailq_entry **stqh_last;
} head = { ((void *)0), &(head).stqh_first };
and this for stailq_entry
struct stailq_entry {
int value;
struct { struct stailq_entry *stqe_next; } entries;
};
So I guess head is of type struct stailhead.
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>
struct stailq_entry {
int value;
STAILQ_ENTRY(stailq_entry) entries;
};
int main(void)
{
STAILQ_HEAD(stailhead, stailq_entry) head = STAILQ_HEAD_INITIALIZER(head);
struct stailq_entry *n1;
unsigned i;
STAILQ_INIT(&head); /* Initialize the queue. */
for (i=0;i<10;i++){
n1 = malloc(sizeof(struct stailq_entry)); /* Insert at the head. */
n1->value = i;
STAILQ_INSERT_HEAD(&head, n1, entries);
}
n1 = NULL;
while (!STAILQ_EMPTY(&head)) {
n1 = STAILQ_LAST(&head, stailq_entry, entries);
STAILQ_REMOVE(&head, n1, stailq_entry, entries);
printf ("n2: %d\n", n1->value);
free(n1);
}
return (0);
}
First read this to get a hold of what these macros do, and then go to queue.h. You'll find your treasure trove there!
I found a few gold coins for you. Let's dig in a bit deeper and answer your questions.
What is stailhead? This seems to be "just" defined.
#define STAILQ_HEAD(name, type) \
struct name { \
struct type *stqh_first;/* first element */ \
struct type **stqh_last;/* addr of last next element */ \
}
STAILQ_HEAD(stailhead, entry) head =
STAILQ_HEAD_INITIALIZER(head);
struct stailhead *headp; /* Singly-linked tail queue head. */
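So stailhead is a structure tag: STAILQ_HEAD(stailhead, entry) defines struct stailhead and, in the same declaration, declares head as a variable of that type.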
How to pass head and entries to a function?
#define STAILQ_ENTRY(type) \
struct { \
struct type *stqe_next; /* next element */ \
}
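So entries and head (as explained before) are just structures. head can be passed like any other structure, typically by pointer (&head); entries has an anonymous struct type, so it is usually reached through a pointer to the element that contains it.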
What type is head, how can I declare a pointer to it?
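Already explained: head has type struct stailhead, so a pointer to it is declared as struct stailhead *headp; exactly as in the headp line above.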
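To make these answers concrete, a sketch that writes out the two gcc -E expansions from the question by hand (so it compiles without <sys/queue.h>) and passes the head to functions through a struct stailhead *. sum_values and insert_tail are hypothetical; insert_tail follows the same three steps as the BSD STAILQ_INSERT_TAIL macro.

```c
#include <stddef.h>

/* The expansions shown in the question, written out by hand. */
struct stailq_entry {
    int value;
    struct { struct stailq_entry *stqe_next; } entries;
};

struct stailhead {
    struct stailq_entry *stqh_first;
    struct stailq_entry **stqh_last;   /* address of the last next pointer */
};

/* head is an ordinary struct, so a function simply takes a pointer to it. */
int sum_values(const struct stailhead *headp)
{
    int total = 0;
    for (const struct stailq_entry *e = headp->stqh_first; e != NULL; e = e->entries.stqe_next)
        total += e->value;
    return total;
}

/* Append elm: store it through stqh_last, then remember elm's own next pointer. */
void insert_tail(struct stailhead *headp, struct stailq_entry *elm)
{
    elm->entries.stqe_next = NULL;
    *headp->stqh_last = elm;
    headp->stqh_last = &elm->entries.stqe_next;
}
```

Initializing stqh_last to the address of stqh_first (as STAILQ_HEAD_INITIALIZER does) is what lets the first insertion work without a special case.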
Read this man page for nice pretty examples.

Is there something in C like C++ templates? If not, how to re-use structures and functions for different data types?

I want to write a linked list whose data field can store any built-in or user-defined type. In C++ I would just use a template, but how do I accomplish this in C?
Do I have to rewrite the linked list struct and a bunch of operations on it for each data type I want it to store? Unions wouldn't work because the set of types a union can store is fixed in advance.
There's a reason people use languages other than C.... :-)
In C, you'd have your data structure operate with void* members, and you'd cast wherever you used them to the correct types. Macros can help with some of that noise.
There are different approaches to this problem:
using datatype void*: this means you have pointers to memory locations whose type is not further specified. If you retrieve such a pointer, you can explicitly state what is inside it: *(int*)(mystruct->voidptr) tells the compiler: look at the memory location mystruct->voidptr and interpret the contents as int.
another option is tricky preprocessor directives. However, this is usually a very non-trivial issue.
I also found http://sglib.sourceforge.net/
Edit: For the preprocessor trick:
#include <stdio.h>
#define mytype(t) struct { t val; }
int main(int argc, char *argv[]) {
mytype(int) myint;
myint.val=6;
printf ("%d\n", myint.val);
return 0;
}
This would be a simple wrapper for types, but I think it can become quite complicated.
It's less comfortable in C (there's a reason C++ is called C incremented), but it can be done with generic pointers (void *), with the application handling the type management itself.
A very nice implementation of generic data structures in C can be found in ubiqx modules, the sources are definitely worth reading.
With some care, you can do this using macros that build and manipulate structs. One of the most well-tested examples of this is the BSD "queue" library. It works on every platform I've tried (Unix, Windows, VMS) and consists of a single header file (no C file).
It has the unfortunate downside of being a bit hard to use, but it preserves as much type-safety as it can in C.
The header file is here: http://www.openbsd.org/cgi-bin/cvsweb/src/sys/sys/queue.h?rev=1.34;content-type=text%2Fplain, and the documentation on how to use it is here: http://www.openbsd.org/cgi-bin/man.cgi?query=queue.
Beyond that, no, you're stuck with losing type-safety (using (void *) all over the place) or moving to C++ and the STL.
Here's an option that's very flexible but requires a lot of work.
In your list node, store a pointer to the data as a void *:
struct node {
void *data;
struct node *next;
};
Then you'd create a suite of functions for each type that handle tasks like comparison, assignment, duplication, etc.:
// create a new instance of the data item and copy the value
// of the parameter to it.
void *copyInt(void *src)
{
int *p = malloc(sizeof *p);
if (p) *p = *(int *)src;
return p;
}
void assignInt(void *target, void *src)
{
// target points to a void * slot (e.g. a node's data member);
// we create a new instance for the assignment
*(void **)target = copyInt(src);
}
// returns -1 if lhs < rhs, 0 if lhs == rhs, 1 if lhs > rhs
int testInt(void *lhs, void *rhs)
{
if (*(int *)lhs < *(int *)rhs) return -1;
else if (*(int *)lhs == *(int *)rhs) return 0;
else return 1;
}
char *intToString(void *data)
{
// snprintf with a NULL buffer returns the number of chars needed (sign + digits)
int len = snprintf(NULL, 0, "%d", *(int *)data);
char *s = malloc(len + 1); // + terminator
if (s) sprintf(s, "%d", *(int *)data);
return s;
}
Then you could create a list type that has pointers to these functions, such as
struct list {
struct node *head;
void *(*cpy)(void *); // copy operation
int (*test)(void *, void *); // test operation
void (*asgn)(void *, void *); // assign operation
char *(*toStr)(void *); // get string representation
/* ... */
};
struct list myIntList;
struct list myDoubleList;
myIntList.cpy = copyInt;
myIntList.test = testInt;
myIntList.asgn = assignInt;
myIntList.toStr = intToString;
myDoubleList.cpy = copyDouble;
myDoubleList.test = testDouble;
myDoubleList.asgn = assignDouble;
myDoubleList.toStr = doubleToString;
...
Then, when you pass the list to an insert or search operation, you'd call the functions from the list object:
void addToList(struct list *l, void *value)
{
struct node *new, *cur = l->head; // l->head is assumed to be a dummy (sentinel) node
while (cur->next != NULL && l->test(cur->next->data, value) <= 0)
cur = cur->next;
new = malloc(sizeof *new);
if (!new)
{
// handle error here
}
else
{
new->data = l->cpy(value);
new->next = cur->next;
cur->next = new;
if (logging)
{
char *s = l->toStr(new->data);
fprintf(log, "Added value %s to list\n", s);
free(s);
}
}
}
...
i = 1;
addToList(&myIntList, &i);
f = 3.4;
addToList(&myDoubleList, &f);
By delegating the type-aware operations to separate functions called through function pointers, you now have a list structure that can store values of any type. To add support for new types, you only need to implement new copy, assign, toString, etc., functions for that new type.
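A compilable sketch of this design for int elements. It assumes the list keeps a dummy head node (so insertion before the first element needs no special case), uses const-qualified callback signatures, and omits the assign and toString operations; freeList is a hypothetical addition that frees each data copy before its node.

```c
#include <stdlib.h>

struct node { void *data; struct node *next; };

/* Each list carries the type-aware operations for its element type. */
struct list {
    struct node head;                          /* dummy: head.next is the first element */
    void *(*cpy)(const void *);                /* copy operation */
    int (*test)(const void *, const void *);   /* -1, 0 or 1 */
};

static void *copyInt(const void *src)
{
    int *p = malloc(sizeof *p);
    if (p) *p = *(const int *)src;
    return p;
}

static int testInt(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs, b = *(const int *)rhs;
    return (a > b) - (a < b);
}

/* Sorted insert: skip every element that compares <= value, then link a copy. */
int addToList(struct list *l, const void *value)
{
    struct node *cur = &l->head, *n;
    while (cur->next != NULL && l->test(cur->next->data, value) <= 0)
        cur = cur->next;
    n = malloc(sizeof *n);
    if (!n) return -1;
    n->data = l->cpy(value);
    if (!n->data) { free(n); return -1; }
    n->next = cur->next;
    cur->next = n;
    return 0;
}

/* Free each data copy, then its node. */
void freeList(struct list *l)
{
    struct node *cur = l->head.next;
    while (cur) {
        struct node *next = cur->next;
        free(cur->data);
        free(cur);
        cur = next;
    }
    l->head.next = NULL;
}
```

A double list would reuse addToList and freeList unchanged and only supply its own copy and test functions.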
There are drawbacks. For one thing, you can't use constants as function parameters (e.g., you can't do something simple like addToList(&myIntList, 1);) -- you have to assign everything to a variable first, and pass the address of the variable (which is why you need to create new instances of the data member when you add it to the list; if you just assigned the address of the variable, every element in the list would wind up pointing to the same object, which may no longer exist depending on the context).
Secondly, you wind up doing a lot of memory management; you don't just create a new instance of the list node, but you also must create a new instance of the data member. You must remember to free the data member before freeing the node. Then you're creating a new string instance every time you want to display the data, and you have to remember to free that string when you're done with it.
Finally, this solution throws type safety right out the window and into oncoming traffic (after lighting it on fire). The delegate functions are counting on you to keep the types straight; there's nothing preventing you from passing the address of a double variable to one of the int handling functions.
Between the memory management and the fact that you must make a function call for just about every operation, performance is going to suffer. This isn't a fast solution.
Of course, this assumes that every element in the list is the same type; if you're wanting to store elements of different types in the same list, then you're going to have to do something different, such as associate the functions with each node, rather than the list overall.
I wrote a generic linked list "template" in C using the preprocessor, but it's pretty horrible to look at, and heavily pre-processed code is not easy to debug.
These days I think you'd be better off using some other code generation tool such as Python / Cog: http://www.python.org/about/success/cog/
I agree with JonathanPatschke's answer that you should look at sys/queue.h, although I've never tried it myself, as it is not on some of the platforms I work with. I also agree with Vicki's answer to use Python.
But I've found that five or six very simple C macros meet most of my garden-variety needs. These macros help clean up ugly, bug-prone code, without littering it with hidden void *'s, which destroy type-safety. Some of these macros are:
#define ADD_LINK_TO_END_OF_LIST(add, head, tail) \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = (tail)->next = (add)
#define ADD_DOUBLE_LINK_TO_END_OF_LIST(add, head, tail) \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = ((add)->prev = (tail), (tail)->next = (add))
#define FREE_LINK_IN_LIST(p, dtor) do { /* singly-linked */ \
void *myLocalTemporaryPtr = (p)->next; \
dtor(p); \
(p) = myLocalTemporaryPtr;} while (0)
#define FREE_LINKED_LIST(p, dtor) do { \
while (p) \
FREE_LINK_IN_LIST(p, dtor);} while (0)
// copy "ctor" (shallow)
#define NEW_COPY(p) memcpy(myMalloc(sizeof *(p)), p, sizeof *(p))
// iterator
#define NEXT_IN_LIST(p, list) ((p) ? (p)->next : (list))
So, for example:
struct MyContact {
char *name;
char *address;
char *telephone;
...
struct MyContact *next;
} *myContactList = 0, *myContactTail; // the tail doesn't need to be init'd
...
struct MyContact newEntry = {0};
...
ADD_LINK_TO_END_OF_LIST(NEW_COPY(&newEntry), myContactList, myContactTail);
...
struct MyContact *i = 0;
while ((i = NEXT_IN_LIST(i, myContactList))) // iterate through list
// ...
The next and prev members have hard-coded names. They don't need to be void *, which avoids problems with strict aliasing. They do need to be zeroed when the data item is created.
The dtor argument for FREE_LINK_IN_LIST would typically be a function like free, or (void) to do nothing, or another macro such as:
#define MY_CONTACT_ENTRY_DTOR(p) \
do { if (p) { \
free((p)->name); \
free((p)->address); \
free((p)->telephone); \
free(p); \
}} while (0)
So for example, FREE_LINKED_LIST(myContactList, MY_CONTACT_ENTRY_DTOR) would free all the members of the (duck-typed) list headed by myContactList.
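A compilable sketch that exercises a subset of these macros, assuming myMalloc is plain malloc (a real version should check for NULL before the memcpy) and a cut-down MyContact with an int id. build_contacts and sum_ids are hypothetical helpers:

```c
#include <stdlib.h>
#include <string.h>

/* A subset of the macros above, with myMalloc replaced by malloc. */
#define ADD_LINK_TO_END_OF_LIST(add, head, tail) \
    if (!(head)) \
        (tail) = (head) = (add); \
    else \
        (tail) = (tail)->next = (add)

#define FREE_LINK_IN_LIST(p, dtor) do { /* singly-linked */ \
    void *myLocalTemporaryPtr = (p)->next; \
    dtor(p); \
    (p) = myLocalTemporaryPtr; } while (0)

#define FREE_LINKED_LIST(p, dtor) do { \
    while (p) \
        FREE_LINK_IN_LIST(p, dtor); } while (0)

#define NEW_COPY(p) memcpy(malloc(sizeof *(p)), p, sizeof *(p))
#define NEXT_IN_LIST(p, list) ((p) ? (p)->next : (list))

struct MyContact {
    int id;
    struct MyContact *next;
};

/* Build ids 1..n by copying a zero-linked stack template into the heap. */
struct MyContact *build_contacts(int n, struct MyContact **tailOut)
{
    struct MyContact *list = NULL, *tail = NULL;
    for (int k = 1; k <= n; k++) {
        struct MyContact entry = { k, NULL };   /* next must start out zeroed */
        ADD_LINK_TO_END_OF_LIST(NEW_COPY(&entry), list, tail);
    }
    *tailOut = tail;
    return list;
}

int sum_ids(struct MyContact *list)
{
    int total = 0;
    struct MyContact *i = NULL;
    while ((i = NEXT_IN_LIST(i, list)))   /* iterate through list */
        total += i->id;
    return total;
}
```

Note that ADD_LINK_TO_END_OF_LIST expands to a bare if/else, so it should not be used as the body of another unbraced if.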
There is one void * here, but perhaps it could be removed via gcc's typeof.
If you need a list that can hold elements of different types simultaneously, e.g. an int followed by three char * followed by a struct tm, then using void * for the data is the solution. But if you only need multiple list types with identical methods, the best solution depends on whether you want to avoid generating many instances of almost identical machine code, or just avoid typing source code.
A struct declaration doesn't generate any machine code...
struct int_node {
void *next;
int data;
};
struct long_node {
void *next;
long data;
};
...and one single function which uses a void * parameter and/or return value, can handle them all.
struct generic_node {
void *next;
};
void *insert(void *before_this, void *element, size_t element_sizes);
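A sketch of the single-function idea, assuming every node type puts void *next first. Rather than the insert signature above, it shows two hypothetical operations, push_front and length, that work on int_node and long_node alike by reading next through a void ** (a pointer to a struct, suitably converted, points to its first member).

```c
#include <stddef.h>

/* Any node type qualifies as long as its FIRST member is 'void *next'. */
struct int_node  { void *next; int data; };
struct long_node { void *next; long data; };

/* Converting a struct pointer yields a pointer to its first member,
   so *next_of(node) is that node's 'next' field. */
static void **next_of(void *node) { return (void **)node; }

/* Link element in front of head and return it as the new head. */
void *push_front(void *head, void *element)
{
    *next_of(element) = head;
    return element;
}

/* Count nodes; the same machine code serves every node type. */
size_t length(void *head)
{
    size_t n = 0;
    for (void *p = head; p != NULL; p = *next_of(p))
        n++;
    return n;
}
```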

Type-safe generic data structures in plain-old C?

I have done far more C++ programming than "plain old C" programming. One thing I sorely miss when programming in plain C is type-safe generic data structures, which are provided in C++ via templates.
For sake of concreteness, consider a generic singly linked list. In C++, it is a simple matter to define your own template class, and then instantiate it for the types you need.
In C, I can think of a few ways of implementing a generic singly linked list:
Write the linked list type(s) and supporting procedures once, using void pointers to go around the type system.
Write preprocessor macros taking the necessary type names, etc, to generate a type-specific version of the data structure and supporting procedures.
Use a more sophisticated, stand-alone tool to generate the code for the types you need.
I don't like option 1, as it subverts the type system, and would likely have worse performance than a specialized type-specific implementation. Using a uniform representation of the data structure for all types, and casting to/from void pointers, so far as I can see, necessitates an indirection that would be avoided by an implementation specialized for the element type.
Option 2 doesn't require any extra tools, but it feels somewhat clunky, and could give bad compiler errors when used improperly.
Option 3 could give better compiler error messages than option 2, as the specialized data structure code would reside in expanded form that could be opened in an editor and inspected by the programmer (as opposed to code generated by preprocessor macros). However, this option is the most heavyweight, a sort of "poor-man's templates". I have used this approach before, using a simple sed script to specialize a "templated" version of some C code.
I would like to program my future "low-level" projects in C rather than C++, but have been frightened by the thought of rewriting common data structures for each specific type.
What experience do people have with this issue? Are there good libraries of generic data structures and algorithms in C that do not go with Option 1 (i.e. casting to and from void pointers, which sacrifices type safety and adds a level of indirection)?
Option 1 is the approach taken by most C implementations of generic containers that I see. The Windows driver kit and the Linux kernel use a macro to allow links for the containers to be embedded anywhere in a structure, with the macro used to obtain the structure pointer from a pointer to the link field:
list_entry() macro in Linux
CONTAINING_RECORD() macro in Windows
Option 2 is the tack taken by BSD's tree.h and queue.h container implementation:
http://openbsd.su/src/sys/sys/queue.h
http://openbsd.su/src/sys/sys/tree.h
I don't think I'd consider either of these approaches type safe. Useful, but not type safe.
C has a different kind of beauty than C++, and type safety, or always being able to see what everything is when tracing through code in your debugger without involving casts, is typically not part of it.
C's beauty comes a lot from its lack of type safety, from working around the type system at the raw level of bits and bytes. Because of that, there are certain things it can do more easily without fighting against the language, like, say, variable-length structs, using the stack even for arrays whose sizes are determined at runtime, etc. It also tends to be a lot simpler to preserve ABI when you're working at this lower level.
So there's a different kind of aesthetic involved here as well as different challenges, and I'd recommend a shift in mindset when you work in C. To really appreciate it, I'd suggest doing things many people take for granted these days, like implementing your own memory allocator or device driver. When you're working at such a low level, you can't help but look at everything as memory layouts of bits and bytes as opposed to 'objects' with behaviors attached. Furthermore, there can come a point in such low-level bit/byte manipulation code where C becomes easier to comprehend than C++ code littered with reinterpret_casts, e.g.
As for your linked list example, I would suggest a non-intrusive version of a linked node (one that does not require storing list pointers into the element type, T, itself, allowing the linked list logic and representation to be decoupled from T itself), like so:
struct ListNode
{
struct ListNode* prev;
struct ListNode* next;
MAX_ALIGN char element[1]; // Watch out for alignment here.
// see your compiler's specific info on
// aligning data members.
};
Now we can create a list node like so:
struct ListNode* list_new_node(int element_size)
{
// Watch out for alignment here.
return malloc_max_aligned(sizeof(struct ListNode) + element_size - 1);
}
// create a list node for 'struct Foo'
void foo_init(struct Foo*);
struct ListNode* foo_node = list_new_node(sizeof(struct Foo));
foo_init((struct Foo*) foo_node->element);
To retrieve the element from the list as T*:
T* element = (T*) list_node->element;
Since it's C, there's no type checking whatsoever when casting pointers in this way, and that will probably also give you an uneasy feeling if you're coming from a C++ background.
The tricky part here is to make sure that this member, element, is properly aligned for whatever type you want to store. When you can solve that problem as portably as you need it to be, you'll have a powerful solution for creating efficient memory layouts and allocators. Often this will have you just using max alignment for everything which might seem wasteful, but typically isn't if you are using appropriate data structures and allocators which aren't paying this overhead for numerous small elements on an individual basis.
Now this solution still involves type casting. There's little you can do about that short of having a separate version of this list node, and of the logic that works with it, for every type T that you want to support (short of dynamic polymorphism). However, it does not involve an additional level of indirection as you might have thought was needed, and it still allocates the entire list node and element in a single allocation.
In many cases I would recommend this simple way of achieving genericity in C: replace T with a buffer whose length matches sizeof(T) and which is properly aligned. If you have a reasonably portable and safe way to guarantee proper alignment, you'll have a very powerful way of working with memory that often improves cache hits and reduces the frequency of heap allocations and deallocations, the amount of indirection required, build times, etc.
If you need more automation, like having list_new_node automatically initialize struct Foo, I would recommend creating a general type table struct that you can pass around, containing information such as how big T is and function pointers to create a default instance of T, copy T, clone T, destroy T, compare two Ts, etc. In C++, you can generate this table automatically using templates and built-in language concepts like copy constructors and destructors. C requires a bit more manual effort, but you can still reduce the boilerplate a bit with macros.
Another trick that can be useful if you go down a more macro-oriented code generation route is to cash in on a prefix- or suffix-based naming convention for identifiers. For example, CLONE(Type, ptr) could be defined to expand to Type##Clone(ptr), so CLONE(Foo, foo) would invoke FooClone(foo). This is a kind of cheat to get something akin to function overloading in C, and it is useful when generating code in bulk (when CLONE is used to implement another macro), or even when copying and pasting boilerplate-type code, since it at least improves the uniformity of the boilerplate.
Option 1, using either void * or some union-based variant, is what most C programs use, and it may give you BETTER performance than the C++/macro style of having multiple implementations for different types: less code duplication means less icache pressure and fewer icache misses.
GLib has a number of generic data structures in it: http://www.gtk.org/
CCAN has a collection of useful snippets: http://ccan.ozlabs.org/
Your option 1 is what most old-time C programmers would go for, possibly salted with a little of option 2 to cut down on the repetitive typing, and maybe employing a few function pointers for a flavor of polymorphism.
There's a common variation of option 1 that is more efficient because it uses unions to store the values in the list nodes, i.e. there's no additional indirection. The downside is that the list only accepts values of certain types, and it potentially wastes some memory if those types have different sizes.
However, it's possible to get rid of the union by using a flexible array member instead, if you're willing to break strict aliasing. C99 example code:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct ll_node
{
struct ll_node *next;
long long data[]; // use `long long` for alignment
};
extern struct ll_node *ll_unshift(
struct ll_node *head, size_t size, const void *value);
extern void *ll_get(struct ll_node *head, size_t index);
#define ll_unshift_value(LIST, TYPE, ...) \
ll_unshift((LIST), sizeof (TYPE), &(TYPE){ __VA_ARGS__ })
#define ll_get_value(LIST, INDEX, TYPE) \
(*(TYPE *)ll_get((LIST), (INDEX)))
struct ll_node *ll_unshift(struct ll_node *head, size_t size, const void *value)
{
struct ll_node *node = malloc(sizeof *node + size);
if(!node)
{
/* assert() is compiled out under NDEBUG, so fail explicitly instead */
fputs("PANIC: out of memory\n", stderr);
exit(EXIT_FAILURE);
}
memcpy(node->data, value, size);
node->next = head;
return node;
}
void *ll_get(struct ll_node *head, size_t index)
{
struct ll_node *current = head;
while(current && index--)
current = current->next;
return current ? current->data : NULL;
}
int main(void)
{
struct ll_node *head = NULL;
head = ll_unshift_value(head, int, 1);
head = ll_unshift_value(head, int, 2);
head = ll_unshift_value(head, int, 3);
printf("%i\n", ll_get_value(head, 0, int));
printf("%i\n", ll_get_value(head, 1, int));
printf("%i\n", ll_get_value(head, 2, int));
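/* release the nodes */
while(head)
{
struct ll_node *next = head->next;
free(head);
head = next;
}
return 0;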
}
An old question, I know, but in case it is still of interest: I was experimenting with option 2 (pre-processor macros) today and came up with the example below. It is slightly clunky, but not terrible. The code is not fully type safe, but it contains sanity checks that provide a reasonable level of safety. And dealing with the compiler error messages while writing it was mild compared to what I have seen when C++ templates come into play. It is probably best to start reading at the example usage code in the "main" function.
#include <stdio.h>
#define LIST_ELEMENT(type) \
struct \
{ \
void *pvNext; \
type value; \
}
#define ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement) \
do { \
(void)(&(pElement)->value == (type *)&(pElement)->value); \
(void)(sizeof(*(pElement)) == sizeof(LIST_ELEMENT(type))); \
} while(0)
#define SET_POINTER_TO_LIST_ELEMENT(type, pDest, pSource) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
void **pvDest = (void **)&(pDest); \
*pvDest = ((void *)(pSource)); \
} while(0)
#define LINK_LIST_ELEMENT(type, pDest, pSource) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
(pDest)->pvNext = ((void *)(pSource)); \
} while(0)
#define TERMINATE_LIST_AT_ELEMENT(type, pDest) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
(pDest)->pvNext = NULL; \
} while(0)
#define ADVANCE_POINTER_TO_LIST_ELEMENT(type, pElement) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement); \
void **pvElement = (void **)&(pElement); \
*pvElement = (pElement)->pvNext; \
} while(0)
typedef struct { int a; int b; } mytype;
int main(int argc, char **argv)
{
LIST_ELEMENT(mytype) el1;
LIST_ELEMENT(mytype) el2;
LIST_ELEMENT(mytype) *pEl;
el1.value.a = 1;
el1.value.b = 2;
el2.value.a = 3;
el2.value.b = 4;
LINK_LIST_ELEMENT(mytype, &el1, &el2);
TERMINATE_LIST_AT_ELEMENT(mytype, &el2);
printf("Testing.\n");
SET_POINTER_TO_LIST_ELEMENT(mytype, pEl, &el1);
if (pEl->value.a != 1)
printf("pEl->value.a != 1: %d.\n", pEl->value.a);
ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
if (pEl->value.a != 3)
printf("pEl->value.a != 3: %d.\n", pEl->value.a);
ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
if (pEl != NULL)
printf("pEl != NULL.\n");
printf("Done.\n");
return 0;
}
I use void pointers (void*) to represent generic data structures defined with structs and typedefs. Below I share my implementation of a library I'm working on.
With this kind of implementation, you can think of each new type, defined with typedef, as a pseudo-class. Here, a pseudo-class consists of the source file (some_type_implementation.c) and its header file (some_type_implementation.h).
In the source file, you define the struct that represents the new type. Note the struct in the "node.c" source file: there I made the "info" attribute a void pointer. A void pointer can hold any object pointer, but the price you have to pay is a type identifier inside the struct (int type), plus all the switches needed to handle each defined type properly. So, in the "node.h" header file, I defined the type "Node" (just to avoid having to type struct node every time), and I also had to define the constants "EMPTY_NODE", "COMPLEX_NODE", and "MATRIX_NODE".
You can compile it by hand with "gcc *.c -lm".
main.c Source File
#include <stdio.h>
#include <math.h>
#define PI M_PI
#include "complex.h"
#include "matrix.h"
#include "node.h"
int main()
{
//testCpx();
//testMtx();
testNode();
return 0;
}
node.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "node.h"
#include "complex.h"
#include "matrix.h"
#define PI M_PI
struct node
{
int type;
void* info;
};
Node* newNode(int type,void* info)
{
Node* newNode = (Node*) malloc(sizeof(Node));
if(newNode == NULL)
return NULL;
newNode->type = type;
/* void* can hold any object pointer, so no per-type cast is needed here;
 * storing info unconditionally also means it is never left uninitialized.
 * The type tag is what printNode() dispatches on. */
newNode->info = info;
return newNode;
}
int emptyInfoNode(Node* node)
{
return (node->info == NULL);
}
void printNode(Node* node)
{
if(emptyInfoNode(node))
{
printf("Type:%d\n",node->type);
printf("Empty info\n");
}
else
{
switch(node->type)
{
case COMPLEX_NODE:
printCpx(node->info);
break;
case MATRIX_NODE:
printMtx(node->info);
break;
}
}
}
void testNode()
{
Node *node1,*node2, *node3;
Complex *Z;
Matrix *M;
Z = mkCpx(POLAR,5,3*PI/4);
M = newMtx(3,4,PI);
node1 = newNode(COMPLEX_NODE,Z);
node2 = newNode(MATRIX_NODE,M);
node3 = newNode(EMPTY_NODE,NULL);
printNode(node1);
printNode(node2);
printNode(node3);
}
node.h Header File
#define EMPTY_NODE 0
#define COMPLEX_NODE 1
#define MATRIX_NODE 2
typedef struct node Node;
Node* newNode(int type,void* info);
int emptyInfoNode(Node* node);
void printNode(Node* node);
void testNode();
matrix.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "matrix.h"
struct matrix
{
// Meta-information about the matrix
int rows;
int cols;
// The elements of the matrix: an array of 'rows' pointers, each to 'cols' doubles
double** MTX;
};
Matrix* newMtx(int rows,int cols,double value)
{
register int row , col;
Matrix* M = (Matrix*)malloc(sizeof(Matrix));
M->rows = rows;
M->cols = cols;
M->MTX = (double**) malloc(rows*sizeof(double*));
for(row = 0; row < rows ; row++)
{
M->MTX[row] = (double*) malloc(cols*sizeof(double));
for(col = 0; col < cols ; col++)
M->MTX[row][col] = value;
}
return M;
}
Matrix* mkMtx(int rows,int cols,double** MTX)
{
Matrix* M;
if(MTX == NULL)
{
M = newMtx(rows,cols,0);
}
else
{
M = (Matrix*)malloc(sizeof(Matrix));
M->rows = rows;
M->cols = cols;
M->MTX = MTX;
}
return M;
}
double getElemMtx(Matrix* M , int row , int col)
{
return M->MTX[row][col];
}
void printRowMtx(double* row,int cols)
{
register int j;
for(j = 0 ; j < cols ; j++)
printf("%g ",row[j]);
}
void printMtx(Matrix* M)
{
register int row = 0, col = 0;
printf("\vSize\n");
printf("\tRows:%d\n",M->rows);
printf("\tCols:%d\n",M->cols);
printf("\n");
for(; row < M->rows ; row++)
{
printRowMtx(M->MTX[row],M->cols);
printf("\n");
}
printf("\n");
}
void testMtx()
{
Matrix* M = mkMtx(10,10,NULL);
printMtx(M);
}
matrix.h Header File
typedef struct matrix Matrix;
Matrix* newMtx(int rows,int cols,double value);
Matrix* mkMtx(int rows,int cols,double** MTX);
double getElemMtx(Matrix* M , int row , int col);
void printRowMtx(double* row,int cols);
void printMtx(Matrix* M);
void testMtx();
complex.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "complex.h"
struct complex
{
int type;
double a;
double b;
};
Complex* mkCpx(int type,double a,double b)
{
/** Doc - {{{
* This function makes a new Complex number.
*
* #params:
* |-->type: Is an integer that denotes whether the number is in
* | analytic (Re/Im) or polar form.
* | ANALITIC:0
* | POLAR :1
* |
* |-->a: Is the real part if type = 0 and is the radius if
* | type = 1
* |
* `-->b: Is the imaginary part if type = 0 and is the argument
* if type = 1
*
* #return:
* Returns the new Complex number initialized with the values
* passed
*}}} */
Complex* number = (Complex*)malloc(sizeof(Complex));
number->type = type;
number->a = a;
number->b = b;
return number;
}
void printCpx(Complex* number)
{
switch(number->type)
{
case ANALITIC:
printf("Re:%g | Im:%g\n",number->a,number->b);
break;
case POLAR:
printf("Radius:%g | Arg:%g\n",number->a,number->b);
break;
}
}
void testCpx()
{
Complex* Z = mkCpx(ANALITIC,3,2);
printCpx(Z);
}
complex.h Header File
#define ANALITIC 0
#define POLAR 1
typedef struct complex Complex;
Complex* mkCpx(int type,double a,double b);
void printCpx(Complex* number);
void testCpx();
I hope I haven't missed anything.
I am using option 2 for a couple of high-performance collections, and working through the amount of macro logic needed to make anything truly compile-time generic and worth using is extremely time-consuming. I am doing this purely for raw performance (games), using an X-macros approach.
A painful issue that constantly comes up with Option 2 is: "Given a finite set of options, such as 8/16/32/64-bit keys, do I make the key width a constant and define one function for each value it can take, or do I just make it a member variable?" The former puts more pressure on the instruction cache, since you end up with many near-duplicate functions that differ in only one or two numbers, while the latter means reading the value from memory at run time, which in the worst case costs a data cache miss. Since Option 1 is purely dynamic, you will make such values member variables without even thinking about it. This truly is micro-optimisation, though.
Also bear in mind the trade-off between returning pointers and returning values: returning a value is usually fastest when the data item is no larger than a pointer, whereas for larger items it is usually better to return a pointer than to force a copy of a large object by returning it by value.
I would strongly suggest going for Option 1 in any scenario where you are not 100% certain that collection performance will be your bottleneck. Even with my use of Option 2, my collections library supplies a "quick setup" which is like Option 1, i.e. use of void * values in my list and map. This is sufficient for 90+% of circumstances.
You could check out https://github.com/clehner/ll.c
It's easy to use:
#include <stdio.h>
#include <string.h>
#include "ll.h"
int main()
{
int *numbers = NULL;
*( numbers = ll_new(numbers) ) = 100;
*( numbers = ll_new(numbers) ) = 200;
printf("num is %d\n", *numbers);
numbers = ll_next(numbers);
printf("num is %d\n", *numbers);
typedef struct _s {
char *word;
} s;
s *string = NULL;
*( string = ll_new(string) ) = (s) {"a string"};
*( string = ll_new(string) ) = (s) {"another string"};
printf("string is %s\n", string->word);
string = ll_next( string );
printf("string is %s\n", string->word);
return 0;
}
Output:
num is 200
num is 100
string is another string
string is a string
