Is there something in C like C++ templates? If not, how to re-use structures and functions for different data types?

I want to write a linked list whose data field can store any built-in or user-defined type. In C++ I would just use a template, but how do I accomplish this in C?
Do I have to rewrite the linked list struct and all of its operations for each data type I want it to store? Unions wouldn't work because the set of types they can store is fixed in advance.

There's a reason people use languages other than C.... :-)
In C, you'd have your data structure operate with void* members, and you'd cast wherever you used them to the correct types. Macros can help with some of that noise.
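As a rough illustration of that idea, here is a minimal sketch of a list with a void * payload, plus a small macro that hides the cast. The names vnode, vlist_push, vlist_free, vlist_sum_ints and VNODE_AS are made up for this example, and the list does not own the data it points to.

```c
#include <assert.h>
#include <stdlib.h>

/* A node that stores a pointer to caller-owned data of any type. */
struct vnode {
    void *data;
    struct vnode *next;
};

/* Push a new node on the front of the list.
   Returns 0 on success, -1 if the allocation failed (list unchanged). */
static int vlist_push(struct vnode **head, void *data)
{
    struct vnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = data;
    n->next = *head;
    *head = n;
    return 0;
}

/* Hypothetical helper macro: hides the cast needed to read the payload.
   Nothing checks that 'type' matches what was actually stored. */
#define VNODE_AS(type, node) (*(type *)(node)->data)

/* Free the nodes only; the data they point to belongs to the caller. */
static void vlist_free(struct vnode *head)
{
    while (head != NULL) {
        struct vnode *next = head->next;
        free(head);
        head = next;
    }
}

/* Sum a list whose payloads are all known, by convention, to be int. */
static int vlist_sum_ints(const struct vnode *head)
{
    int sum = 0;
    for (; head != NULL; head = head->next)
        sum += VNODE_AS(int, head);
    return sum;
}
```

To use it, start with `struct vnode *list = NULL;`, push the addresses of some int variables with `vlist_push(&list, &x)`, and read them back through `VNODE_AS(int, node)`. The variables must outlive the list, since only their addresses are stored.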

There are different approaches to this problem:
Using the datatype void*: this means you have pointers to memory locations whose type is not further specified. When you retrieve such a pointer, you state explicitly what it points to: *(int*)(mystruct->voidptr) tells the compiler to look at the memory location mystruct->voidptr and interpret the contents as an int.
Another option is tricky preprocessor directives. However, this is usually far from trivial:
I also found http://sglib.sourceforge.net/
Edit: For the preprocessor trick:
#include <stdio.h>
#define mytype(t) struct { t val; }
int main(int argc, char *argv[]) {
mytype(int) myint;
myint.val=6;
printf ("%d\n", myint.val);
return 0;
}
This would be a simple wrapper for types, but I think it can become quite complicated.

It's less comfortable in C (there's a reason C++ is called C incremented), but it can be done with generic pointers (void *), with the application handling the type management itself.
A very nice implementation of generic data structures in C can be found in ubiqx modules, the sources are definitely worth reading.

With some care, you can do this using macros that build and manipulate structs. One of the most well-tested examples of this is the BSD "queue" library. It works on every platform I've tried (Unix, Windows, VMS) and consists of a single header file (no C file).
It has the unfortunate downside of being a bit hard to use, but it preserves as much type-safety as it can in C.
The header file is here: http://www.openbsd.org/cgi-bin/cvsweb/src/sys/sys/queue.h?rev=1.34;content-type=text%2Fplain, and the documentation on how to use it is here: http://www.openbsd.org/cgi-bin/man.cgi?query=queue.
Beyond that, no, you're stuck with losing type-safety (using (void *) all over the place) or moving to the STL.
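For a feel of what the queue macros look like in use, here is a small sketch using the singly-linked SLIST family. It assumes your platform ships <sys/queue.h> (the BSDs, macOS and glibc-based Linux do); the struct and function names (int_item, int_list, int_list_push and so on) are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* The element type embeds its own link via SLIST_ENTRY. */
struct int_item {
    int value;
    SLIST_ENTRY(int_item) link;
};

/* Declares 'struct int_list', a typed list head for struct int_item. */
SLIST_HEAD(int_list, int_item);

/* Allocate an item and insert it at the head. Returns 0 on success. */
static int int_list_push(struct int_list *list, int value)
{
    struct int_item *item = malloc(sizeof *item);
    if (item == NULL)
        return -1;
    item->value = value;
    SLIST_INSERT_HEAD(list, item, link);
    return 0;
}

/* Walk the list with the typed iterator macro; no casts are needed. */
static int int_list_sum(struct int_list *list)
{
    struct int_item *it;
    int sum = 0;
    SLIST_FOREACH(it, list, link)
        sum += it->value;
    return sum;
}

/* Remove and free every item. */
static void int_list_clear(struct int_list *list)
{
    while (!SLIST_EMPTY(list)) {
        struct int_item *first = SLIST_FIRST(list);
        SLIST_REMOVE_HEAD(list, link);
        free(first);
    }
}
```

A caller declares `struct int_list list;`, runs `SLIST_INIT(&list);` once, and then uses the functions above. Because the head type names the element type, passing the wrong kind of item is a compile-time error rather than a silent cast.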

Here's an option that's very flexible but requires a lot of work.
In your list node, store a pointer to the data as a void *:
struct node {
void *data;
struct node *next;
};
Then you'd create a suite of functions for each type that handle tasks like comparison, assignment, duplication, etc.:
// create a new instance of the data item and copy the value
// of the parameter to it.
void *copyInt(void *src)
{
int *p = malloc(sizeof *p);
if (p) *p = *(int *)src;
return p;
}
void assignInt(void *target, void *src)
{
// copy the int value from src into the existing target object
*(int *)target = *(int *)src;
}
// returns -1 if lhs < rhs, 0 if lhs == rhs, 1 if lhs > rhs
int testInt(void *lhs, void *rhs)
{
if (*(int *)lhs < *(int *)rhs) return -1;
else if (*(int *)lhs == *(int *)rhs) return 0;
else return 1;
}
char *intToString(void *data)
{
size_t digits = however_many_digits_in_an_int();
char *s = malloc(digits + 2); // sign + digits + terminator
sprintf(s, "%d", *(int *)data);
return s;
}
Then you could create a list type that has pointers to these functions, such as
struct list {
struct node *head;
void *(*cpy)(void *); // copy operation
int (*test)(void *, void *); // test operation
void (*asgn)(void *, void *); // assign operation
char *(*toStr)(void *); // get string representation
...
};
struct list myIntList;
struct list myDoubleList;
myIntList.cpy = copyInt;
myIntList.test = testInt;
myIntList.asgn = assignInt;
myIntList.toStr = intToString;
myDoubleList.cpy = copyDouble;
myDoubleList.test = testDouble;
myDoubleList.asgn = assignDouble;
myDoubleList.toStr = doubleToString;
...
Then, when you pass the list to an insert or search operation, you'd call the functions from the list object:
void addToList(struct list *l, void *value)
{
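struct node *new, **link = &l->head;
// walk to the first node that compares greater than the new value;
// tracking the link rather than the node also handles an empty list
while (*link != NULL && l->test((*link)->data, value) <= 0)
link = &(*link)->next;
new = malloc(sizeof *new);
if (!new)
{
// handle error here
}
else
{
new->data = l->cpy(value);
new->next = *link;
*link = new;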
if (logging)
{
char *s = l->toStr(new->data);
fprintf(log, "Added value %s to list\n", s);
free(s);
}
}
}
...
i = 1;
addToList(&myIntList, &i);
f = 3.4;
addToList(&myDoubleList, &f);
By delegating the type-aware operations to separate functions called through function pointers, you now have a list structure that can store values of any type. To add support for new types, you only need to implement new copy, assign, toString, etc., functions for that new type.
There are drawbacks. For one thing, you can't use constants as function parameters (e.g., you can't do something simple like addToList(&myIntList, 1);) -- you have to assign everything to a variable first, and pass the address of the variable (which is why you need to create new instances of the data member when you add it to the list; if you just assigned the address of the variable, every element in the list would wind up pointing to the same object, which may no longer exist depending on the context).
Secondly, you wind up doing a lot of memory management; you don't just create a new instance of the list node, but you also must create a new instance of the data member. You must remember to free the data member before freeing the node. Then you're creating a new string instance every time you want to display the data, and you have to remember to free that string when you're done with it.
Finally, this solution throws type safety right out the window and into oncoming traffic (after lighting it on fire). The delegate functions are counting on you to keep the types straight; there's nothing preventing you from passing the address of a double variable to one of the int handling functions.
Between the memory management and the fact that you must make a function call for just about every operation, performance is going to suffer. This isn't a fast solution.
Of course, this assumes that every element in the list is the same type; if you're wanting to store elements of different types in the same list, then you're going to have to do something different, such as associate the functions with each node, rather than the list overall.

I wrote a generic linked list "template" in C using the preprocessor, but it's pretty horrible to look at, and heavily pre-processed code is not easy to debug.
These days I think you'd be better off using some other code generation tool such as Python / Cog: http://www.python.org/about/success/cog/

I agree with JonathanPatschke's answer that you should look at sys/queue.h, although I've never tried it myself, as it is not on some of the platforms I work with. I also agree with Vicki's answer to use Python.
But I've found that five or six very simple C macros meet most of my garden-variety needs. These macros help clean up ugly, bug-prone code, without littering it with hidden void *'s, which destroy type-safety. Some of these macros are:
#define ADD_LINK_TO_END_OF_LIST(add, head, tail) \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = (tail)->next = (add)
#define ADD_DOUBLE_LINK_TO_END_OF_LIST(add, head, tail) \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = ((add)->prev = (tail), (tail)->next = (add))
#define FREE_LINK_IN_LIST(p, dtor) do { /* singly-linked */ \
void *myLocalTemporaryPtr = (p)->next; \
dtor(p); \
(p) = myLocalTemporaryPtr;} while (0)
#define FREE_LINKED_LIST(p, dtor) do { \
while (p) \
FREE_LINK_IN_LIST(p, dtor);} while (0)
// copy "ctor" (shallow)
#define NEW_COPY(p) memcpy(myMalloc(sizeof *(p)), p, sizeof *(p))
// iterator
#define NEXT_IN_LIST(p, list) ((p) ? (p)->next : (list))
So, for example:
struct MyContact {
char *name;
char *address;
char *telephone;
...
struct MyContact *next;
} *myContactList = 0, *myContactTail; // the tail doesn't need to be init'd
...
struct MyContact newEntry = {0};
...
ADD_LINK_TO_END_OF_LIST(NEW_COPY(&newEntry), myContactList, myContactTail);
...
struct MyContact *i = 0;
while ((i = NEXT_IN_LIST(i, myContactList))) // iterate through list
// ...
The next and prev members have hard-coded names. They don't need to be void *, which avoids problems with strict aliasing. They do need to be zeroed when the data item is created.
The dtor argument for FREE_LINK_IN_LIST would typically be a function like free, or (void) to do nothing, or another macro such as:
#define MY_CONTACT_ENTRY_DTOR(p) \
do { if (p) { \
free((p)->name); \
free((p)->address); \
free((p)->telephone); \
free(p); \
}} while (0)
So for example, FREE_LINKED_LIST(myContactList, MY_CONTACT_ENTRY_DTOR) would free all the members of the (duck-typed) list headed by myContactList.
There is one void * here, but perhaps it could be removed via gcc's typeof.

If you need a list that can hold elements of different types simultaneously, e.g. an int followed by three char * followed by a struct tm, then using void * for the data is the solution. But if you only need multiple list types with identical methods, the best solution depends on if you want to avoid generating many instances of almost identical machine code, or just avoid typing source code.
A struct declaration doesn't generate any machine code...
struct int_node {
void *next;
int data;
};
struct long_node {
void *next;
long data;
};
...and one single function which uses a void * parameter and/or return value, can handle them all.
struct generic_node {
void *next;
};
void *insert(void *before_this, void *element, size_t element_sizes);
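Here is one way such shared functions might look. This is only a sketch under the assumption that every node type keeps `void *next` as its first member, as in the declarations above; push_front and list_length are hypothetical names and are not the insert() declared in the answer.

```c
#include <assert.h>
#include <stddef.h>

struct int_node {
    void *next;
    int data;
};

struct long_node {
    void *next;
    long data;
};

/* Link 'element' in front of 'head' and return the new head.
   A pointer to a struct can be converted to a pointer to its first
   member, so the 'next' field is reachable as a void ** for any node. */
static void *push_front(void *head, void *element)
{
    *(void **)element = head;
    return element;
}

/* Count the nodes of any list built from such node types. */
static size_t list_length(void *head)
{
    size_t n = 0;
    for (void *p = head; p != NULL; p = *(void **)p)
        n++;
    return n;
}
```

Only one copy of each function ends up in the binary, but the compiler can no longer stop you from linking a long_node into a list of int_node.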

Related

Storing and using type information in C

I'm coming from Java and I'm trying to implement a doubly linked list in C as an exercise. I wanted to do something like Java generics, where I would pass a pointer type to the list initialization and this pointer type would be used to cast the list's void pointer, but I'm not sure if this is possible?
What I'm looking for is something that can be stored in a list struct and used to cast *data to the correct type from a node. I was thinking of using a double pointer but then I'd need to declare that as a void pointer and I'd have the same problem.
typedef struct node {
void *data;
struct node *next;
struct node *previous;
} node;
typedef struct list {
node *head;
node *tail;
//??? is there any way to store the data type of *data?
} list;
Typically, type-specific functions like the following are used.
void List_Put_int(list *L, int *i);
void List_Put_double(list *L, double *d);
int * List_Get_int(list *L);
double *List_Get_double(list *L);
An approach that is not so easy for learners uses _Generic. C11 offers _Generic, which allows code to be steered at compile time based on type.
The code below offers basic code to save/fetch 3 types of pointers. The macros would need expanding for each new type. _Generic does not allow listing 2 types that may be the same, like unsigned * and size_t *. So there are limitations.
The type_id(X) macro creates an enumeration for the 3 types, which may be used to check for run-time problems, as with LIST_POP(L, &d); below.
typedef struct node {
void *data;
int type;
} node;
typedef struct list {
node *head;
node *tail;
} list;
node node_var;
void List_Push(list *l, void *p, int type) {
// tbd code - simplistic use of global for illustration only
node_var.data = p;
node_var.type = type;
}
void *List_Pop(list *l, int type) {
// tbd code
assert(node_var.type == type);
return node_var.data;
}
#define cast(X,ptr) _Generic((X), \
double *: (double *) (ptr), \
unsigned *: (unsigned *) (ptr), \
int *: (int *) (ptr) \
)
#define type_id(X) _Generic((X), \
double *: 1, \
unsigned *: 2, \
int *: 3 \
)
#define LIST_PUSH(L, data) { List_Push((L),(data), type_id(data)); }
#define LIST_POP(L, dataptr) (*(dataptr)=cast(*dataptr, List_Pop((L), type_id(*dataptr))) )
Usage example and output
int main() {
list *L = 0; // tbd initialization
int i = 42;
printf("%p %d\n", (void*) &i, i);
LIST_PUSH(L, &i);
int *j;
LIST_POP(L, &j);
printf("%p %d\n", (void*) j, *j);
double *d;
LIST_POP(L, &d);
}
42
42
assertion error
There is no way to do what you want in C. There is no way to store a type in a variable and C doesn't have a template system like C++ that would allow you to fake it in the preprocessor.
You could define your own template-like macros that could quickly define your node and list structs for whatever type you need, but I think that sort of hackery is generally frowned upon unless you really need a whole bunch of linked lists that only differ in the type they store.
C doesn't have any runtime type information and doesn't have a type "Type". Types are meaningless once the code is compiled. So the language provides no solution to what you ask.
One common reason you would want to have a type available at runtime is that you have some code that might see different instances of your container and must do different things for different types stored in the container. You can easily solve such a situation using an enum, e.g.
enum ElementType
{
ET_INT, // int
ET_DOUBLE, // double
ET_CAR, // struct Car
// ...
};
and enumerate any type here that should ever go into your container. Another reason is if your container should take ownership of the objects stored in it and therefore must know how to destroy them (and sometimes how to clone them). For such cases, I recommend the use of function pointers:
typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);
Then extend your struct to contain these:
typedef struct list {
node *head;
node *tail;
ElementDeleter deleter;
ElementCloner cloner;
} list;
Make sure they are set to functions that actually delete or clone, respectively, an element of the type to be stored in your container, and then use them where needed. For example, in a remove function, you could do something like
myList->deleter(myNode->data);
// delete the contained element without knowing its type
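To show how the two function pointers fit together, here is a short sketch of a list that clones what it is given and deletes it on teardown. The typedefs and struct layouts follow the ones in this thread; clone_string, list_append and list_clear are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);

typedef struct node {
    void *data;
    struct node *next;
    struct node *previous;
} node;

typedef struct list {
    node *head;
    node *tail;
    ElementDeleter deleter;
    ElementCloner cloner;
} list;

/* Hypothetical cloner for C strings: returns a heap copy, or NULL. */
static void *clone_string(const void *element)
{
    const char *src = element;
    char *copy = malloc(strlen(src) + 1);
    if (copy != NULL)
        strcpy(copy, src);
    return copy;
}

/* Append a clone of 'element'; the list owns the clone.
   Returns 0 on success, -1 on allocation failure. */
static int list_append(list *l, const void *element)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = l->cloner(element);
    if (n->data == NULL) {
        free(n);
        return -1;
    }
    n->next = NULL;
    n->previous = l->tail;
    if (l->tail != NULL)
        l->tail->next = n;
    else
        l->head = n;
    l->tail = n;
    return 0;
}

/* Destroy every node, using the stored deleter for the payloads. */
static void list_clear(list *l)
{
    node *n = l->head;
    while (n != NULL) {
        node *next = n->next;
        l->deleter(n->data);
        free(n);
        n = next;
    }
    l->head = l->tail = NULL;
}
```

A list of strings would be set up as `list names = { NULL, NULL, free, clone_string };`. The list code itself never mentions char, so a different element type only needs a different pair of functions.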
Create an enum type that stores the data type, and allocate memory according to this enum. This can be done in a switch/case construct.
Unlike Java or C++, C does not provide any type safety. To answer your question succinctly, rearrange your node type this way:
struct node {
node* prev; /* put these at front */
node* next;
/* no data here */
};
You could then separately declare nodes carrying any data
struct data_node {
data_node *prev; // keep these two data members at the front
data_node *next; // and in the same order as in struct node.
// you can add more data members here.
};
/* OR... */
struct data_node2 {
node node_data; /* WARNING: this may look a bit safer, but it is only safe if placed at the front. */
/* more data ... */
};
You can then create a library that operates on data-less lists of nodes.
void list_add(list* l, node* n);
void list_remove(list* l, node* n);
/* etc... */
And by casting, you can use this 'generic list' API to operate on your lists.
You can have some sort of type information in your list declaration, for what it's worth, since C does not provide meaningful type protection.
struct data_list
{
data_node* head; /* this makes intent clear. */
data_node* tail;
};
struct data2_list
{
data_node2* head;
data_node2* tail;
};
/* ... */
data_node* my_data_node = malloc(sizeof(data_node));
data_node2* my_data_node2 = malloc(sizeof(data_node2));
/* ... */
list_add((list*)&my_list, (node*)my_data_node);
list_add((list*)&my_list2, &(my_data_node2->node_data));
/* warning above is because one could write this */
list_add((list*)&my_list2, (node*)my_data_node2);
/* etc... */
These two techniques generate the same object code, so which one you choose is up to you, really.
As an aside, avoid the typedef struct notation if your compiler allows it; most compilers do these days. It increases readability in the long run, IMHO. You can be certain some will agree with me on this subject and some won't, though.

Simulate a Java generic interface and abstract data type in C

I am trying to port a library written in Java into C programming language. For Java interface, I intend to use a struct of function-pointers to replace, for instance:
// Java code
public interface ActionsFunction {
Set<Action> actions(Object s);
}
/* C code */
typedef struct ActionsFunction {
List* (*actions)(void* s);
void (*clear_actions)(struct List **list); /* Since C doesn't have garbage collector */
} ActionsFunction;
My question is: whether it is a suitable solution or not, and how can I simulate a generic interface such as:
public interface List <E> {
void add(E x);
Iterator<E> iterator();
}
UPDATE:
I also face another problem: implementing generic abstract data structures like List, Queue, Stack, etc., since the C standard library lacks those implementations. My approach is that client code should pass a pointer to its data along with its size, allowing the library to hold the data without specifying its type. Once again, this is just my idea. I need your advice on the design as well as the implementation technique.
My initial porting code can be found at:
https://github.com/PhamPhiLong/AIMA
The generic abstract data structures can be found in the utility subfolder.
Here's a very brief example using macros to accomplish something like this. This can get hairy pretty quick, but if done correctly, you can maintain complete static type safety.
#include <stdlib.h>
#include <stdio.h>
#define list_type(type) struct __list_##type
/* A generic list node that keeps 'type' by value. */
#define define_list_val(type) \
list_type(type) { \
list_type(type) *next; \
type value; \
}
#define list_add(plist, node) \
do \
{ \
typeof(plist) p; \
for (p = plist; *p != NULL; p = &(*p)->next) ; \
*p = node; \
node->next = NULL; \
} while(0)
#define list_foreach(plist, p) \
for (p = *plist; p != NULL; p = p->next)
define_list_val(int) *g_list_ints;
define_list_val(float) *g_list_floats;
int main(void)
{
list_type(int) *node;
node = malloc(sizeof(*node));
node->value = 42;
list_add(&g_list_ints, node);
node = malloc(sizeof(*node));
node->value = 66;
list_add(&g_list_ints, node);
list_foreach(&g_list_ints, node) {
printf("Node: %d\n", node->value);
}
return 0;
}
There are a few common ways to do generic-ish programming in C. I would expect to use one or more of the following methods in trying to accomplish the task you've described.
MACROS: One is to use macros. In this example, MAX looks like a function, but operates on anything that can be compared with the ">" operator:
#define MAX(a,b) ((a) > (b) ? (a) : (b))
int i;
float f;
unsigned char b;
f = MAX(7.4, 2.5);
i = MAX(3, 4);
b = MAX(10, 20);
VOID *: Another method is to use void * pointers for representing generic data, and then pass function pointers into your algorithms to operate on the data. Look up the <stdlib.h> function qsort for a classic example of this technique.
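As a quick illustration of the qsort pattern, here is a sketch with two comparison callbacks. The names compare_ints and compare_strings are made up; qsort itself only ever sees const void * pointers to the array elements and relies on the callback to know the real type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for an array of int. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

/* Comparison callback for an array of 'const char *'.
   Each element is itself a pointer, so qsort hands us pointers
   to those pointers. */
static int compare_strings(const void *lhs, const void *rhs)
{
    const char *a = *(const char *const *)lhs;
    const char *b = *(const char *const *)rhs;
    return strcmp(a, b);
}
```

The same qsort call then sorts either array: `qsort(nums, count, sizeof nums[0], compare_ints)` for an int array, or `qsort(words, count, sizeof words[0], compare_strings)` for an array of strings.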
UNIONS: Yet another, though probably seen less often, technique is to use unions to hold data of multiple different types. This makes your algorithms that operate on the data kinda ugly though and might not save much coding:
enum { VAR_DOUBLE, VAR_INT, VAR_STRING };
/* Declare a generic container struct for any type of data you want to operate on */
struct VarType
{
int type;
union
{
double d;
int i;
char * sptr;
} data;
};
int main(){
struct VarType x;
x.data.d = 1.75;
x.type = VAR_DOUBLE;
/* call some function that sorts out what to do based on value of x.type */
my_function( x );
}
CLEVER CASTING & POINTER MATH: It's a pretty common idiom to see data structures with functions that operate on a specific kind of struct and then require that the struct be included in your struct to do anything useful.
The easy way to do this is to force the struct that allows insertion into the data structure to be the first member of your derived type. Then you can seamlessly cast back and forth between the two. The more versatile way is to use 'offsetof'. Here's a simple example.
For example:
/* Simple types */
struct listNode { struct listNode * next; struct listNode * prev; };
struct list { struct listNode dummy; };
/* Functions that operate on those types */
int append( struct list * theList, struct listNode * theNode );
struct listNode * first( struct list *theList );
/* To use, you must do something like this: */
/* Define your own type that includes a list node */
typedef struct {
int x;
double y;
char name[16];
struct listNode node;
} MyCoolType;
int main() {
struct list myList;
MyCoolType coolObject;
MyCoolType * ptr;
/* Add the 'coolObject's 'listNode' member to the list */
append( &myList, &coolObject.node );
/* Use ugly casting & pointer math to get back to your original type.
You may want to google 'offsetof' here (it is declared in <stddef.h>). */
ptr = (MyCoolType *) ( (char*) first( &myList )
- offsetof(MyCoolType,node) );
}
The libev documentation has some more good examples of this last technique:
http://search.cpan.org/dist/EV/libev/ev.pod#COMMON_OR_USEFUL_IDIOMS_(OR_BOTH)
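For completeness, here is a compilable sketch of the offsetof idiom. It assumes the dummy node acts as a circular sentinel, which the answer does not spell out, and the CONTAINER_OF macro and the list_init, list_append and list_first functions are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

struct listNode { struct listNode *next; struct listNode *prev; };
struct list { struct listNode dummy; };   /* assumed: circular sentinel */

/* Recover a pointer to the enclosing struct from a pointer to one of
   its members, by subtracting the member's byte offset. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list *l)
{
    l->dummy.next = l->dummy.prev = &l->dummy;
}

/* Link 'n' in just before the sentinel, i.e. at the end of the list. */
static void list_append(struct list *l, struct listNode *n)
{
    n->prev = l->dummy.prev;
    n->next = &l->dummy;
    l->dummy.prev->next = n;
    l->dummy.prev = n;
}

/* First real node, or NULL if the list is empty. */
static struct listNode *list_first(struct list *l)
{
    return l->dummy.next == &l->dummy ? NULL : l->dummy.next;
}

/* A user type that embeds a list node somewhere other than the front. */
typedef struct {
    int x;
    double y;
    char name[16];
    struct listNode node;
} MyCoolType;
```

After `list_append(&l, &obj.node)`, the original object is recovered with `CONTAINER_OF(list_first(&l), MyCoolType, node)`. The list code never needs to know that MyCoolType exists, and one object can embed several nodes to sit on several lists at once.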

how to create a linked list of structs in c

I have 3 types of structures: book, CD (in the CD I have the struct "song"- and the CD contain a list of songs), and a DVD.
I need to create a linked list of products of a store
My question is how to create a list of products without knowing which type is the pointer in it. It can be book, CD or DVD.
(I cannot use unions.)
Leaving the implementation of the CD / DVD data structures up to you, as well as the implementation of the linked list, you would probably want to do something like this:
enum ptype {
PTYPE_BOOK,
PTYPE_CD,
PTYPE_DVD,
};
struct book {
char *author;
char *title;
char *publisher;
char *isbn;
};
struct product {
enum ptype type;
void *data;
};
struct product_list {
struct product *product;
struct product_list *next;
};
The enumeration is responsible for distinguishing the type of product being pointed to. To create a book, for instance:
struct product *
create_book(char *author, char *title, char *publisher, char *isbn)
{
struct product *p;
struct book *b;
p = calloc(1, sizeof (*p));
if (p == NULL) {
return NULL;
}
p->type = PTYPE_BOOK;
p->data = calloc(1, sizeof(*b));
if (p->data == NULL) {
free(p);
return NULL;
}
b = p->data;
b->author = author;
b->title = title;
b->publisher = publisher;
b->isbn = isbn;
return p;
}
This is a typical interface when unions can't be used for whatever reason. It's unfortunate in that it requires much more memory allocation (and in reality, you'll probably have to strdup(3) author / title / publisher / isbn).
To retrieve a book from a product, you might like to have something like this:
static inline struct book *
get_book(struct product *p)
{
assert(p->type == PTYPE_BOOK);
return p->data;
}
You don't need to (and shouldn't) cast a void pointer in C. If you're using or supporting a C++ compiler, you may need to use return (struct book *)p->data;. You'd implement something similar for your CD and DVD types. Then, when you need to extract the product:
switch (p->type) {
case PTYPE_BOOK:
b = get_book(p);
break;
case PTYPE_CD:
c = get_cd(p);
break;
case PTYPE_DVD:
d = get_dvd(p);
break;
}
You may also want to look at using something other than a linked list for storing these things, especially if they will be read / traversed many times after they are created. (A vector would not be a bad idea). If you know how many items you'll have, this can help reduce the number of allocations you must perform, and the contiguous memory access will improve speed.
If you need to search entries, I suspect you'll need an external searchable data structure anyway.
You need to use void pointers for the data set. Here is a snippet of code from my linked list structures I use modified for your need:
#define CD 1
#define DVD 2
#define BOOK 3
/* Structure for linked list elements */
typedef struct ListElmt_ {
void *data;
unsigned datatype; /* variable to know which data type to cast as */
struct ListElmt_ *next;
} ListElmt;
#define list_data(element) ((element)->data)
Using the void pointer to pack your data into the list, you can now just test the datatype variable and cast as necessary. I use a macro to return list data (defined above). So you could use something like:
CD_struct *cd_data;
if (element->datatype == CD)
cd_data = (CD_struct *) list_data(element);
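If every struct has an item-type field as its first member, you can read that field through a pointer to any of them, whatever the struct actually is.
This works because the offset of the first member is always zero and does not depend on the rest of the struct's layout, provided the item-type field has the same type in every struct and always comes first.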
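A minimal sketch of that first-member idea, assuming made-up book, cd and dvd structs and an item_type enum (none of these definitions come from the question):

```c
#include <assert.h>
#include <stddef.h>

enum item_type { ITEM_BOOK, ITEM_CD, ITEM_DVD };

/* Every product struct starts with the same tag member. */
struct book { enum item_type itemtype; const char *title;  int pages;   };
struct cd   { enum item_type itemtype; const char *artist; int tracks;  };
struct dvd  { enum item_type itemtype; const char *title;  int minutes; };

/* List node that can point at any of the product structs. */
struct item_node {
    void *item;
    struct item_node *next;
};

/* A pointer to a struct, suitably converted, points to its first member,
   so the tag can be read without knowing the full type. */
static enum item_type item_type_of(const void *item)
{
    return *(const enum item_type *)item;
}

/* Count how many products of one kind are in the list. */
static int count_of_type(const struct item_node *head, enum item_type wanted)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        if (item_type_of(head->item) == wanted)
            n++;
    return n;
}
```

Once the tag says ITEM_CD, the caller casts `node->item` to `struct cd *` and uses it normally. No union is involved, but the tag must be set correctly when each product is created.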
One way could be :
Create a generic Structure 'Product'
Keep a variable to keep track of the current type of product.
Keep three pointers of Book, CD & DVD each.
Or As in Archie's Comment :
Create a generic Structure 'Product'
Keep a variable to keep track of the current type of product.
Keep a void * pointer & cast when needed.
I think the first one is useful if, at a later stage, a product can be of multiple types at once, e.g. Book + CD.

Type-safe generic data structures in plain-old C?

I have done far more C++ programming than "plain old C" programming. One thing I sorely miss when programming in plain C is type-safe generic data structures, which are provided in C++ via templates.
For sake of concreteness, consider a generic singly linked list. In C++, it is a simple matter to define your own template class, and then instantiate it for the types you need.
In C, I can think of a few ways of implementing a generic singly linked list:
Write the linked list type(s) and supporting procedures once, using void pointers to go around the type system.
Write preprocessor macros taking the necessary type names, etc, to generate a type-specific version of the data structure and supporting procedures.
Use a more sophisticated, stand-alone tool to generate the code for the types you need.
I don't like option 1, as it subverts the type system, and would likely have worse performance than a specialized type-specific implementation. Using a uniform representation of the data structure for all types, and casting to/from void pointers, so far as I can see, necessitates an indirection that would be avoided by an implementation specialized for the element type.
Option 2 doesn't require any extra tools, but it feels somewhat clunky, and could give bad compiler errors when used improperly.
Option 3 could give better compiler error messages than option 2, as the specialized data structure code would reside in expanded form that could be opened in an editor and inspected by the programmer (as opposed to code generated by preprocessor macros). However, this option is the most heavyweight, a sort of "poor-man's templates". I have used this approach before, using a simple sed script to specialize a "templated" version of some C code.
I would like to program my future "low-level" projects in C rather than C++, but have been frightened by the thought of rewriting common data structures for each specific type.
What experience do people have with this issue? Are there good libraries of generic data structures and algorithms in C that do not go with Option 1 (i.e. casting to and from void pointers, which sacrifices type safety and adds a level of indirection)?
Option 1 is the approach taken by most C implementations of generic containers that I see. The Windows driver kit and the Linux kernel use a macro to allow links for the containers to be embedded anywhere in a structure, with the macro used to obtain the structure pointer from a pointer to the link field:
list_entry() macro in Linux
CONTAINING_RECORD() macro in Windows
Option 2 is the tack taken by BSD's tree.h and queue.h container implementation:
http://openbsd.su/src/sys/sys/queue.h
http://openbsd.su/src/sys/sys/tree.h
I don't think I'd consider either of these approaches type safe. Useful, but not type safe.
C has a different kind of beauty to it than C++, and type safety and being able to always see what everything is when tracing through code without involving casts in your debugger is typically not one of them.
C's beauty comes a lot from its lack of type safety, of working around the type system and at the raw level of bits and bytes. Because of that, there's certain things it can do more easily without fighting against the language like, say, variable-length structs, using the stack even for arrays whose sizes are determined at runtime, etc. It also tends to be a lot simpler to preserve ABI when you're working at this lower level.
So there's a different kind of aesthetic involved here as well as different challenges, and I'd recommend a shift in mindset when you work in C. To really appreciate it, I'd suggest doing things many people take for granted these days, like implementing your own memory allocator or device driver. When you're working at such a low level, you can't help but look at everything as memory layouts of bits and bytes as opposed to 'objects' with behaviors attached. Furthermore, there can come a point in such low-level bit/byte manipulation code where C becomes easier to comprehend than C++ code littered with reinterpret_casts, e.g.
As for your linked list example, I would suggest a non-intrusive version of a linked node (one that does not require storing list pointers into the element type, T, itself, allowing the linked list logic and representation to be decoupled from T itself), like so:
struct ListNode
{
struct ListNode* prev;
struct ListNode* next;
MAX_ALIGN char element[1]; // Watch out for alignment here.
// see your compiler's specific info on
// aligning data members.
};
Now we can create a list node like so:
struct ListNode* list_new_node(int element_size)
{
// Watch out for alignment here.
return malloc_max_aligned(sizeof(struct ListNode) + element_size - 1);
}
// create a list node for 'struct Foo'
void foo_init(struct Foo*);
struct ListNode* foo_node = list_new_node(sizeof(struct Foo));
foo_init((struct Foo*)foo_node->element);
To retrieve the element from the list as T*:
T* element = (T*)list_node->element;
Since it's C, there's no type checking whatsoever when casting pointers in this way, and that will probably also give you an uneasy feeling if you're coming from a C++ background.
The tricky part here is to make sure that this member, element, is properly aligned for whatever type you want to store. When you can solve that problem as portably as you need it to be, you'll have a powerful solution for creating efficient memory layouts and allocators. Often this will have you just using max alignment for everything which might seem wasteful, but typically isn't if you are using appropriate data structures and allocators which aren't paying this overhead for numerous small elements on an individual basis.
Now this solution still involves the type casting. There's little you can do about that short of having a separate version of code of this list node and the corresponding logic to work with it for every type, T, that you want to support (short of dynamic polymorphism). However, it does not involve an additional level of indirection as you might have thought was needed, and still allocates the entire list node and element in a single allocation.
And I would recommend this simple way to achieve genericity in C in many cases. Simply replace T with a buffer that has a length matching sizeof(T) and aligned properly. If you have a reasonably portable and safe way you can generalize to ensure proper alignment, you'll have a very powerful way of working with memory in a way that often improves cache hits, reduces the frequency of heap allocations/deallocations, the amount of indirection required, build times, etc.
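If you can rely on C11, one possible way to spell the aligned buffer portably is a flexible array member aligned to max_align_t. This is a sketch under that assumption, not the MAX_ALIGN / malloc_max_aligned helpers mentioned earlier; struct Foo and is_max_aligned are placeholders for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct ListNode {
    struct ListNode *prev;
    struct ListNode *next;
    /* Storage for one element of any type with fundamental alignment.
       malloc already returns memory aligned for max_align_t, and the
       alignment specifier fixes the member's offset accordingly. */
    _Alignas(max_align_t) unsigned char element[];
};

/* One allocation holds both the links and the element storage. */
static struct ListNode *list_new_node(size_t element_size)
{
    struct ListNode *n = malloc(sizeof *n + element_size);
    if (n != NULL)
        n->prev = n->next = NULL;
    return n;
}

/* Helper for checking the claim about alignment. */
static int is_max_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(max_align_t) == 0;
}

/* Placeholder element type for the example. */
struct Foo {
    double weight;
    int id;
};
```

A node for struct Foo comes from `list_new_node(sizeof(struct Foo))`, and the element is reached with `(struct Foo *)node->element`. Types with extended alignment requirements, such as some SIMD vector types, would still need an aligned allocator.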
If you need more automation like having list_new_node automatically initialize struct Foo, I would recommend creating a general type table struct that you can pass around which contains information like how big T is, a function pointer pointing to a function to create a default instance of T, another to copy T, clone T, destroy T, a comparator, etc. In C++, you can generate this table automatically using templates and built-in language concepts like copy constructors and destructors. C requires a bit more manual effort, but you can still reduce the boilerplate a bit with macros.
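A type table along those lines might look roughly like this. The struct TypeInfo layout and the type_new / type_delete helpers are assumptions made for the sketch, and only a few of the operations mentioned (size, init, destroy, compare) are included.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-type descriptor: what generic code needs to know about T. */
struct TypeInfo {
    size_t size;                                    /* sizeof(T) */
    void (*init)(void *obj);                        /* default-construct in place */
    void (*destroy)(void *obj);                     /* release resources; may be NULL */
    int  (*compare)(const void *a, const void *b);  /* <0, 0, >0 */
};

struct Foo {
    int id;
};

static void foo_init(void *obj)
{
    ((struct Foo *)obj)->id = -1;
}

static int foo_compare(const void *a, const void *b)
{
    int x = ((const struct Foo *)a)->id;
    int y = ((const struct Foo *)b)->id;
    return (x > y) - (x < y);
}

/* The table for struct Foo, written by hand where C++ would generate it. */
static const struct TypeInfo foo_type = {
    sizeof(struct Foo), foo_init, NULL, foo_compare
};

/* Generic code: allocate and default-initialise one T described by 'type'. */
static void *type_new(const struct TypeInfo *type)
{
    void *obj = malloc(type->size);
    if (obj != NULL && type->init != NULL)
        type->init(obj);
    return obj;
}

/* Generic code: destroy and free an object created by type_new. */
static void type_delete(const struct TypeInfo *type, void *obj)
{
    if (obj == NULL)
        return;
    if (type->destroy != NULL)
        type->destroy(obj);
    free(obj);
}
```

A container would store a `const struct TypeInfo *` once and call through it, so supporting a new element type means writing one more table rather than touching the container code.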
Another trick that can be useful if you go with a more macro-oriented code generation route is to cash in on a prefix- or suffix-based naming convention for identifiers. For example, CLONE(Type, ptr) could be defined to return Type##Clone(ptr), so CLONE(Foo, foo) would invoke FooClone(foo). This is kind of a cheat to get something akin to function overloading in C. It is useful when generating code in bulk (when CLONE is used to implement another macro), or even when copying and pasting boilerplate-type code, to at least improve the uniformity of the boilerplate.
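The naming-convention trick can be sketched as follows. CLONE and FooClone come from the paragraph above; the struct layouts, the second type Bar, and BarClone are assumptions added only to show the dispatch.

```c
#include <assert.h>
#include <stdlib.h>

/* Token pasting turns CLONE(Foo, p) into FooClone(p). */
#define CLONE(Type, ptr) Type##Clone(ptr)

typedef struct { int x, y; } Foo;   /* hypothetical example types */
typedef struct { double r; } Bar;

/* Returns a heap copy of *src, or NULL if allocation fails. */
Foo *FooClone(const Foo *src)
{
    assert(src != NULL);
    Foo *copy = malloc(sizeof *copy);
    if (copy)
        *copy = *src;
    return copy;
}

Bar *BarClone(const Bar *src)
{
    assert(src != NULL);
    Bar *copy = malloc(sizeof *copy);
    if (copy)
        *copy = *src;
    return copy;
}
```

The dispatch happens purely by name at preprocessing time, so CLONE(Baz, p) fails to compile (or link) unless a BazClone exists.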
Option 1, whether using void * or some union-based variant, is what most C programs use, and it may give you BETTER performance than the C++/macro style of having multiple implementations for different types, as it has less code duplication and thus less icache pressure and fewer icache misses.
GLib has a bunch of generic data structures in it, http://www.gtk.org/
CCAN has a bunch of useful snippets and such http://ccan.ozlabs.org/
Your option 1 is what most old-time C programmers would go for, possibly salted with a little of option 2 to cut down on the repetitive typing, and just maybe employing a few function pointers for a flavor of polymorphism.
There's a common variation on option 1 that is more efficient because it uses unions to store the values in the list nodes, i.e. there's no additional indirection. The downside is that the list only accepts values of certain types and potentially wastes some memory if the types are of different sizes.
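A minimal sketch of the union variant, assuming C99. The names uvalue, utag, unode and the functions are hypothetical; the tag field is an addition so that readers of a node know which union member is live.

```c
#include <assert.h>
#include <stdlib.h>

/* The closed set of types this list can hold. */
union uvalue {
    int i;
    double d;
    const char *s;
};

enum utag { U_INT, U_DOUBLE, U_STR };

/* The value is stored inline: every node is as large as the biggest member. */
struct unode {
    struct unode *next;
    enum utag tag;
    union uvalue value;
};

/* Returns the new head, or NULL on allocation failure. */
struct unode *unode_push(struct unode *head, enum utag tag, union uvalue value)
{
    struct unode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->tag = tag;
    n->value = value;
    n->next = head;
    return n;
}

/* Checked accessor: reading the wrong member is a caller bug. */
int unode_int(const struct unode *n)
{
    assert(n != NULL && n->tag == U_INT);
    return n->value.i;
}

void unode_free_all(struct unode *head)
{
    while (head) {
        struct unode *next = head->next;
        free(head);
        head = next;
    }
}
```

A call looks like unode_push(head, U_INT, (union uvalue){ .i = 42 }). Adding a new element type means editing the union, which is exactly the limitation described above.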
However, it's possible to get rid of the union by using a flexible array member instead, if you're willing to break strict aliasing. C99 example code:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct ll_node
{
struct ll_node *next;
long long data[]; // use `long long` for alignment
};
extern struct ll_node *ll_unshift(
struct ll_node *head, size_t size, void *value);
extern void *ll_get(struct ll_node *head, size_t index);
#define ll_unshift_value(LIST, TYPE, ...) \
ll_unshift((LIST), sizeof (TYPE), &(TYPE){ __VA_ARGS__ })
#define ll_get_value(LIST, INDEX, TYPE) \
(*(TYPE *)ll_get((LIST), (INDEX)))
struct ll_node *ll_unshift(struct ll_node *head, size_t size, void *value)
{
struct ll_node *node = malloc(sizeof *node + size);
if(!node) assert(!"PANIC");
memcpy(node->data, value, size);
node->next = head;
return node;
}
void *ll_get(struct ll_node *head, size_t index)
{
struct ll_node *current = head;
while(current && index--)
current = current->next;
return current ? current->data : NULL;
}
int main(void)
{
struct ll_node *head = NULL;
head = ll_unshift_value(head, int, 1);
head = ll_unshift_value(head, int, 2);
head = ll_unshift_value(head, int, 3);
printf("%i\n", ll_get_value(head, 0, int));
printf("%i\n", ll_get_value(head, 1, int));
printf("%i\n", ll_get_value(head, 2, int));
return 0;
}
An old question, I know, but in case it is still of interest: I was experimenting with option 2 (pre-processor macros) today, and came up with the example I will paste below. Slightly clunky indeed, but not terrible. The code is not fully type safe, but contains sanity checks to provide a reasonable level of safety. And dealing with the compiler error messages while writing it was mild compared to what I have seen when C++ templates came into play. You are probably best off starting with the example usage code in the "main" function.
#include <stdio.h>
#define LIST_ELEMENT(type) \
struct \
{ \
void *pvNext; \
type value; \
}
#define ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement) \
do { \
(void)(&(pElement)->value == (type *)&(pElement)->value); \
(void)(sizeof(*(pElement)) == sizeof(LIST_ELEMENT(type))); \
} while(0)
#define SET_POINTER_TO_LIST_ELEMENT(type, pDest, pSource) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
void **pvDest = (void **)&(pDest); \
*pvDest = ((void *)(pSource)); \
} while(0)
#define LINK_LIST_ELEMENT(type, pDest, pSource) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
(pDest)->pvNext = ((void *)(pSource)); \
} while(0)
#define TERMINATE_LIST_AT_ELEMENT(type, pDest) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
(pDest)->pvNext = NULL; \
} while(0)
#define ADVANCE_POINTER_TO_LIST_ELEMENT(type, pElement) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement); \
void **pvElement = (void **)&(pElement); \
*pvElement = (pElement)->pvNext; \
} while(0)
typedef struct { int a; int b; } mytype;
int main(int argc, char **argv)
{
LIST_ELEMENT(mytype) el1;
LIST_ELEMENT(mytype) el2;
LIST_ELEMENT(mytype) *pEl;
el1.value.a = 1;
el1.value.b = 2;
el2.value.a = 3;
el2.value.b = 4;
LINK_LIST_ELEMENT(mytype, &el1, &el2);
TERMINATE_LIST_AT_ELEMENT(mytype, &el2);
printf("Testing.\n");
SET_POINTER_TO_LIST_ELEMENT(mytype, pEl, &el1);
if (pEl->value.a != 1)
printf("pEl->value.a != 1: %d.\n", pEl->value.a);
ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
if (pEl->value.a != 3)
printf("pEl->value.a != 3: %d.\n", pEl->value.a);
ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
if (pEl != NULL)
printf("pEl != NULL.\n");
printf("Done.\n");
return 0;
}
I use void pointers (void*) to represent generic data structures defined with structs and typedefs. Below I share my implementation of a lib which I'm working on.
With this kind of implementation, you can think of each new type, defined with typedef, like a pseudo-class. Here, this pseudo-class is the set of the source code (some_type_implementation.c) and its header file (some_type_implementation.h).
In the source code, you have to define the struct that will represent the new type. Note the struct in the "node.c" source file. There I made the "info" attribute a void pointer. This pointer may carry a pointer to any object type, but the price you pay is a type identifier inside the struct (int type), and all the switches needed to handle each defined type properly. So, in the "node.h" header file, I defined the type "Node" (just to avoid having to type struct node every time), and I also had to define the constants "EMPTY_NODE", "COMPLEX_NODE", and "MATRIX_NODE".
You can perform the compilation, by hand, with "gcc *.c -lm".
main.c Source File
#include <stdio.h>
#include <math.h>
#define PI M_PI
#include "complex.h"
#include "matrix.h"
#include "node.h"
int main()
{
//testCpx();
//testMtx();
testNode();
return 0;
}
node.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "node.h"
#include "complex.h"
#include "matrix.h"
#define PI M_PI
struct node
{
int type;
void* info;
};
Node* newNode(int type,void* info)
{
Node* newNode = (Node*) malloc(sizeof(Node));
if(newNode == NULL)
return NULL;
newNode->type = type;
// Any object pointer converts to void* implicitly, so no per-type cast is
// needed. Assigning unconditionally also covers NULL and EMPTY_NODE, which
// the switch used to leave uninitialized when info was non-NULL.
newNode->info = info;
return newNode;
}
int emptyInfoNode(Node* node)
{
return (node->info == NULL);
}
void printNode(Node* node)
{
if(emptyInfoNode(node))
{
printf("Type:%d\n",node->type);
printf("Empty info\n");
}
else
{
switch(node->type)
{
case COMPLEX_NODE:
printCpx(node->info);
break;
case MATRIX_NODE:
printMtx(node->info);
break;
}
}
}
void testNode()
{
Node *node1,*node2, *node3;
Complex *Z;
Matrix *M;
Z = mkCpx(POLAR,5,3*PI/4);
M = newMtx(3,4,PI);
node1 = newNode(COMPLEX_NODE,Z);
node2 = newNode(MATRIX_NODE,M);
node3 = newNode(EMPTY_NODE,NULL);
printNode(node1);
printNode(node2);
printNode(node3);
}
node.h Header File
#define EMPTY_NODE 0
#define COMPLEX_NODE 1
#define MATRIX_NODE 2
typedef struct node Node;
Node* newNode(int type,void* info);
int emptyInfoNode(Node* node);
void printNode(Node* node);
void testNode();
matrix.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "matrix.h"
struct matrix
{
// Meta-information about the matrix
int rows;
int cols;
// The elements of the matrix, as an array of pointers to rows
double** MTX;
};
Matrix* newMtx(int rows,int cols,double value)
{
register int row , col;
Matrix* M = (Matrix*)malloc(sizeof(Matrix));
M->rows = rows;
M->cols = cols;
M->MTX = (double**) malloc(rows*sizeof(double*));
for(row = 0; row < rows ; row++)
{
M->MTX[row] = (double*) malloc(cols*sizeof(double));
for(col = 0; col < cols ; col++)
M->MTX[row][col] = value;
}
return M;
}
Matrix* mkMtx(int rows,int cols,double** MTX)
{
Matrix* M;
if(MTX == NULL)
{
M = newMtx(rows,cols,0);
}
else
{
M = (Matrix*)malloc(sizeof(Matrix));
M->rows = rows;
M->cols = cols;
M->MTX = MTX;
}
return M;
}
double getElemMtx(Matrix* M , int row , int col)
{
return M->MTX[row][col];
}
void printRowMtx(double* row,int cols)
{
register int j;
for(j = 0 ; j < cols ; j++)
printf("%g ",row[j]);
}
void printMtx(Matrix* M)
{
register int row = 0, col = 0;
printf("\vSize\n");
printf("\tRows:%d\n",M->rows);
printf("\tCols:%d\n",M->cols);
printf("\n");
for(; row < M->rows ; row++)
{
printRowMtx(M->MTX[row],M->cols);
printf("\n");
}
printf("\n");
}
void testMtx()
{
Matrix* M = mkMtx(10,10,NULL);
printMtx(M);
}
matrix.h Header File
typedef struct matrix Matrix;
Matrix* newMtx(int rows,int cols,double value);
Matrix* mkMtx(int rows,int cols,double** MTX);
double getElemMtx(Matrix* M , int row , int col);
void printRowMtx(double* row,int cols);
void printMtx(Matrix* M);
void testMtx();
complex.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "complex.h"
struct complex
{
int type;
double a;
double b;
};
Complex* mkCpx(int type,double a,double b)
{
/** Doc - {{{
* This function makes a new Complex number.
*
* #params:
* |-->type: Is an integer that denotes if the number is in
* | the analytic or in the polar form.
* | ANALITIC:0
* | POLAR :1
* |
* |-->a: Is the real part if type = 0 and is the radius if
* | type = 1
* |
* `-->b: Is the imaginary part if type = 0 and is the argument
* if type = 1
*
* #return:
* Returns the new Complex number initialized with the values
* passed
*}}} */
Complex* number = (Complex*)malloc(sizeof(Complex));
number->type = type;
number->a = a;
number->b = b;
return number;
}
void printCpx(Complex* number)
{
switch(number->type)
{
case ANALITIC:
printf("Re:%g | Im:%g\n",number->a,number->b);
break;
case POLAR:
printf("Radius:%g | Arg:%g\n",number->a,number->b);
break;
}
}
void testCpx()
{
Complex* Z = mkCpx(ANALITIC,3,2);
printCpx(Z);
}
complex.h Header File
#define ANALITIC 0
#define POLAR 1
typedef struct complex Complex;
Complex* mkCpx(int type,double a,double b);
void printCpx(Complex* number);
void testCpx();
I hope I haven't missed anything.
I am using option 2 for a couple of high performance collections, and it is extremely time-consuming working through the amount of macro logic needed to do anything truly compile-time generic and worth using. I am doing this purely for raw performance (games). An X-macros approach is used.
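For readers unfamiliar with the term, here is a generic illustration of the X-macro technique applied to list nodes. It is not the collections library described here; ELEMENT_TYPES, DEFINE_NODE, DEFINE_PUSH and the generated names are all hypothetical.

```c
#include <stdlib.h>

/* The single source of truth: one X(suffix, type) row per element type. */
#define ELEMENT_TYPES(X) \
    X(int, int)          \
    X(dbl, double)

/* First expansion: one node struct per row. */
#define DEFINE_NODE(name, type) \
    struct node_##name { struct node_##name *next; type value; };
ELEMENT_TYPES(DEFINE_NODE)

/* Second expansion: one push function per row.
   Each returns the new head, or NULL on allocation failure. */
#define DEFINE_PUSH(name, type)                                           \
    struct node_##name *push_##name(struct node_##name *head, type value) \
    {                                                                     \
        struct node_##name *n = malloc(sizeof *n);                        \
        if (n == NULL)                                                    \
            return NULL;                                                  \
        n->value = value;                                                 \
        n->next = head;                                                   \
        return n;                                                         \
    }
ELEMENT_TYPES(DEFINE_PUSH)
```

This generates struct node_int with push_int and struct node_dbl with push_dbl. Supporting another type is a one-line change to ELEMENT_TYPES, at the cost of one more copy of every generated function.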
A painful issue that constantly comes up with Option 2 is, "Assuming some finite number of options, such as 8/16/32/64 bit keys, do I make said value a constant and define several functions each with a different element of this set of values that constant can take on, or do I just make it a member variable?" The former means a less performant instruction cache since you have a lot of repeated functions with just one or two numbers different, while the latter means you have to reference allocated variables which in the worst case means a data cache miss. Since Option 1 is purely dynamic, you will make such values member variables without even thinking about it. This truly is micro-optimisation, though.
Also bear in mind the trade-off between returning pointers vs. values: the latter is most performant when the size of the data item is less than or equal to pointer size; whereas if the data item is larger, it is most likely better to return pointers than to force a copy of a large object by returning value.
I would strongly suggest going for Option 1 in any scenario where you are not 100% certain that collection performance will be your bottleneck. Even with my use of Option 2, my collections library supplies a "quick setup" which is like Option 1, i.e. use of void * values in my list and map. This is sufficient for 90+% of circumstances.
You could check out https://github.com/clehner/ll.c
It's easy to use:
#include <stdio.h>
#include <string.h>
#include "ll.h"
int main()
{
int *numbers = NULL;
*( numbers = ll_new(numbers) ) = 100;
*( numbers = ll_new(numbers) ) = 200;
printf("num is %d\n", *numbers);
numbers = ll_next(numbers);
printf("num is %d\n", *numbers);
typedef struct _s {
char *word;
} s;
s *string = NULL;
*( string = ll_new(string) ) = (s) {"a string"};
*( string = ll_new(string) ) = (s) {"another string"};
printf("string is %s\n", string->word);
string = ll_next( string );
printf("string is %s\n", string->word);
return 0;
}
Output:
num is 200
num is 100
string is another string
string is a string

Modular data structure in C with dynamic data type

For my upcoming university C project, I'm required to make the code as modular as C allows. Basically, I'll have a .c file and a corresponding .h file for each data structure, like a linked list, binary tree, hash table, whatever...
Using a linked list as an example, I have this:
typedef struct sLinkedList {
int value;
struct sLinkedList *next;
} List;
But this forces value to be of type int and the user using this linked list library would be forced to directly change the source code of the library. I want to avoid that, I want to avoid the need to change the library, to make the code as modular as possible.
My project may need to use a linked list for a list of integers, or maybe a list of some structure. But I'm not going to duplicate the library files/code and change the code accordingly.
How can I solve this?
Unfortunately, there is no simple way to solve this. The most common pure C approach to this type of situation is to use a void* and copy the value into memory that you allocate and store in that pointer. This makes usage tricky, though, and is very error prone.
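A minimal sketch of that void* approach, where the list owns a private copy of each value. The names vnode, vlist_push and vlist_free are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vnode {
    struct vnode *next;
    void *value;   /* points at a heap copy owned by the node */
};

/* Copies `size` bytes from `value` into newly allocated memory.
   Returns the new head, or NULL on allocation failure. */
struct vnode *vlist_push(struct vnode *head, const void *value, size_t size)
{
    assert(value != NULL && size > 0);
    struct vnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = malloc(size);
    if (n->value == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->value, value, size);
    n->next = head;
    return n;
}

/* Frees each node together with its copied value. */
void vlist_free(struct vnode *head)
{
    while (head) {
        struct vnode *next = head->next;
        free(head->value);
        free(head);
        head = next;
    }
}
```

The error-prone part is on the reading side: the caller writes *(int *)node->value and nothing checks that the node really holds an int.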
Another alternative no one has mentioned yet can be found in the Linux kernel's list.h generic linked list implementation. The principle is this:
/* generic definition */
struct list {
struct list *next, *prev;
};
// some more code
/* specific version */
struct intlist {
struct list list;
int i;
};
If you make struct intlist* pointers, they can safely be cast (in C) to struct list* pointers, thus allowing you to write genericized functions that operate on struct list* and have them work regardless of datatype.
The list.h implementation uses some macro trickery to support arbitrary placement of the struct list inside your specific list, but I prefer to rely on the struct-cast-to-first-member trick myself. It makes the calling code much easier to read. Granted, it disables "multiple inheritance" (assuming you consider this to be some kind of inheritance) but next(mylist) looks nicer than next(mylist, list). Plus, if you can avoid delving into offsetof hackery, you're probably going to end up in better shape.
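A minimal sketch of the struct-cast-to-first-member trick, reusing struct list and struct intlist from above. The functions list_insert_after, list_length and intlist_next are hypothetical, and the list is assumed to be NULL-terminated rather than circular as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Generic part: knows nothing about the payload. */
struct list {
    struct list *next, *prev;
};

/* Specific part: struct list must be the FIRST member for the casts to be valid. */
struct intlist {
    struct list list;
    int i;
};

/* Generic insertion: link `node` directly after `pos`. */
void list_insert_after(struct list *pos, struct list *node)
{
    assert(pos != NULL && node != NULL);
    node->prev = pos;
    node->next = pos->next;
    if (pos->next)
        pos->next->prev = node;
    pos->next = node;
}

/* Generic traversal: count nodes from `head` to the end. */
size_t list_length(const struct list *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Typed convenience wrapper: cast back to the containing struct. */
struct intlist *intlist_next(struct intlist *node)
{
    return (struct intlist *)node->list.next;
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member and vice versa, which is what makes both (struct list *)&some_intlist and the cast in intlist_next well defined.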
Since this is a university project, we can't just give you the answer. Instead, I'd invite you to meditate on two C features: the void pointer (which you've likely encountered before), and the token pasting operator (which you may not have).
You can avoid this by defining value as void* value;. You can assign a pointer to any type of data this way, but the calling code is required to cast and dereference the pointer to the correct type. One way to keep track of this would be to add a short char array to the struct to note the type name.
This problem is precisely the reason why templates were developed for C++. The approach I've used once or twice in C is to have the value field be a void*, and cast the values thereto on insertion and cast them back on retrieval. This is far from type-safe, of course. For extra modularity, I might write insert_int(), get_mystruct() etc. functions for each type you use this for, and do the casting there.
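A minimal sketch of such per-type wrappers. insert_int follows the naming suggested above; the generic layer (gnode, g_insert, g_get) and get_int are hypothetical names, and all casting is confined to the two wrapper functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Generic layer: stores opaque pointers only. */
struct gnode {
    struct gnode *next;
    void *value;
};

/* Returns the new head, or NULL on allocation failure. */
struct gnode *g_insert(struct gnode *head, void *value)
{
    struct gnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Returns the value at `index`, or NULL if the list is shorter. */
void *g_get(struct gnode *head, size_t index)
{
    while (head != NULL && index-- > 0)
        head = head->next;
    return head ? head->value : NULL;
}

/* Typed layer for int: the only place that knows the element type. */
struct gnode *insert_int(struct gnode *head, int v)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    *p = v;
    struct gnode *n = g_insert(head, p);
    if (n == NULL)
        free(p);
    return n;
}

int get_int(struct gnode *head, size_t index)
{
    int *p = g_get(head, index);
    assert(p != NULL);   /* index out of range is a caller bug */
    return *p;
}
```

Callers that stick to insert_int and get_int never write a cast, although nothing stops someone from calling get_int on a list that holds another type.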
You can use void * instead of int. This allows the data to be of any type, but the user must know the type of the data.
For that, you can optionally add another member that records the type, for example an enum such as enum { INT, CHAR, FLOAT, ... }.
C has no templates as C++ does; void * is the de facto C solution.
Also, you can put the elements of the linked list in a separate struct, e.g:
typedef struct sLinkedListElem {
int value; /* or "void * value" */
} ListElem;
typedef struct sLinkedList {
ListElem data;
struct sLinkedList *next;
} List;
so that the elements can be changed without affecting the link-ing code.
Here is an example of linked list utilities in C:
struct Single_List_Node
{
struct Single_List_Node * p_next;
void * p_data;
};
struct Double_List_Node
{
struct Double_List_Node * p_next;
struct Double_List_Node * p_prev; // pointer to previous node
void * p_data;
};
struct Single_List_Data_Type
{
size_t size; // Number of elements in list
struct Single_List_Node * p_first_node;
struct Single_List_Node * p_last_node; // To make appending faster.
};
Some generic functions:
void Single_List_Create(struct Single_List_Data_Type * p_list)
{
if (p_list)
{
p_list->size = 0;
p_list->p_first_node = 0;
p_list->p_last_node = p_list->p_first_node;
}
return;
}
void Single_List_Append(struct Single_List_Data_Type * p_list,
void * p_data)
{
if (p_list)
{
struct Single_List_Node * p_new_node = malloc(sizeof(struct Single_List_Node));
if (p_new_node)
{
p_new_node->p_data = p_data;
p_new_node->p_next = 0;
if (p_list->p_last_node)
{
// Non-empty list: link the new node after the current tail.
p_list->p_last_node->p_next = p_new_node;
}
else
{
// Empty list: the new node is also the first node.
p_list->p_first_node = p_new_node;
}
// In both cases the new node becomes the tail.
p_list->p_last_node = p_new_node;
++(p_list->size);
}
}
return;
}
You can put all these functions into a single source file and the function declarations into a header file. This will allow you to use the functions with other programs and not have to recompile all the time. The void * for the pointer to data will allow you to use the list with many different data types.
(The above code comes as-is and has not been tested with any compiler. The responsibility of bug fixing is up to the user of the examples.)

Resources