C Generic ADT with function pointers - c

I'm writing a generic list ADT, and this is what I have in the header so far. From what I know, this is usually how it's done.
typedef struct node { /* _node is a reserved identifier at file scope */
void *data;
struct node *next;
} Node;
typedef struct {
Node *dummy;
int (*comparePtr) (void *d1, void *d2);
void (*destroyPtr) (void *data);
} List;
List *ListCreate (int (*comparePtr) (void *d1, void *d2), void (*destroyPtr) (void *data));
void ListDestroy (List *list);
void ListAddToTail (List *list, void *data);
int ListContains (List *list, void *data);
void *ListGetFromIndex (List *list, int index);
It works fine on the implementation side. What I noticed is that in order to use this ADT to store integers, I have to make calls like this:
int a = 5;
ListAddToTail (list, &a);
whereas in a perfect world I'd be able to do this
ListAddToTail (list, 55);
So the question is: is it possible to modify this to let me pass in any type of data, pointer or non-pointer (non-pointer being mainly primitive types like integers and characters)?

There's no clean, completely nice way to solve this. You have a few options:
On most platforms you can simply get away with stuffing an integer into a void * (ideally converting through intptr_t). The integer-to-pointer conversion is implementation-defined, so it's messy, but it works pretty well in practice, especially once you silence the warnings.
Define your own boxing functions or macros that allocate the required space and give you back a pointer. You can probably make a really nice macro using typeof tricks, but then you have to remember to free that space.
The main issue should be uniformity. Your list lets people store pointers. You should let them deal with questions like "how do I get a pointer to my data".
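A minimal sketch of the first option, assuming the platform provides intptr_t from <stdint.h> (optional in the C standard, though mainstream compilers ship it). The helper names int_to_ptr and ptr_to_int are hypothetical. Converting an integer to a pointer is implementation-defined, so treat this as a practical trick rather than a portable guarantee:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: stuff an integer into the void * slot itself.
   Integer <-> pointer conversion is implementation-defined; routing it
   through intptr_t behaves as expected on mainstream platforms. */
static inline void *int_to_ptr(intptr_t value)
{
    return (void *)value;
}

static inline intptr_t ptr_to_int(const void *p)
{
    return (intptr_t)p;
}
```

With helpers like these, the call from the question could become ListAddToTail(list, int_to_ptr(55)). The list's comparePtr and destroyPtr callbacks would then have to treat data as a stuffed integer; in particular, destroyPtr must not call free on it.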
EDIT
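I just made a primitive "box" macro (it relies on two GCC/Clang extensions, statement expressions and typeof, and returns NULL if malloc fails):
#define box(value) \
({ \
typeof(value) *box_ptr_ = malloc(sizeof *box_ptr_); \
if (box_ptr_) *box_ptr_ = (value); \
box_ptr_; \
})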
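A small usage sketch of the same boxing idea, assuming GCC or Clang (statement expressions are an extension, and __typeof__ is used so it also compiles in strict -std=c11 mode). BOX mirrors the macro above; UNBOX is a hypothetical companion added here for reading the value back:

```c
#include <assert.h>
#include <stdlib.h>

/* Heap-allocate a copy of `value` and evaluate to a pointer to it (NULL if
   malloc fails). Requires GCC/Clang: ({ ... }) and __typeof__ are extensions. */
#define BOX(value)                                              \
    ({                                                          \
        __typeof__(value) *box_ptr_ = malloc(sizeof *box_ptr_); \
        if (box_ptr_) *box_ptr_ = (value);                      \
        box_ptr_;                                               \
    })

/* Hypothetical companion: read a boxed value back as `type`. */
#define UNBOX(type, ptr) (*(type *)(ptr))
```

Because every box comes from malloc, the destroyPtr argument of ListCreate from the question could simply be free, so that whatever calls destroyPtr also releases the boxes.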

Related

Storing and using type information in C

I'm coming from Java and I'm trying to implement a doubly linked list in C as an exercise. I wanted to do something like Java generics, where I would pass a pointer type to the list initialization and this pointer type would be used to cast the list's void pointer, but I'm not sure if this is possible.
What I'm looking for is something that can be stored in a list struct and used to cast *data to the correct type from a node. I was thinking of using a double pointer but then I'd need to declare that as a void pointer and I'd have the same problem.
typedef struct node {
void *data;
struct node *next;
struct node *previous;
} node;
typedef struct list {
node *head;
node *tail;
//??? is there any way to store the data type of *data?
} list;
Typically, type-specific functions like the following are used.
void List_Put_int(list *L, int *i);
void List_Put_double(list *L, double *d);
int * List_Get_int(list *L);
double *List_Get_double(list *L);
A less learner-friendly approach uses _Generic. C11 offers _Generic, which lets code be steered at compile time based on the type of an expression.
The code below saves and fetches 3 types of pointers. The macros would need to be extended for each new type. _Generic does not allow two compatible types in the same association list, such as unsigned * and size_t * on a platform where size_t is unsigned int, so there are limitations.
The type_id(X) macro maps each of the 3 types to a distinct integer, which can be used to catch run-time mistakes, as with LIST_POP(L, &d); below.
#include <assert.h>
#include <stdio.h>
typedef struct node {
void *data;
int type;
} node;
typedef struct list {
node *head;
node *tail;
} list;
node node_var;
void List_Push(list *l, void *p, int type) {
// tbd code - simplistic use of global for illustration only
node_var.data = p;
node_var.type = type;
}
void *List_Pop(list *l, int type) {
// tbd code
assert(node_var.type == type);
return node_var.data;
}
#define cast(X,ptr) _Generic((X), \
double *: (double *) (ptr), \
unsigned *: (unsigned *) (ptr), \
int *: (int *) (ptr) \
)
#define type_id(X) _Generic((X), \
double *: 1, \
unsigned *: 2, \
int *: 3 \
)
#define LIST_PUSH(L, data) do { List_Push((L), (data), type_id(data)); } while (0)
#define LIST_POP(L, dataptr) (*(dataptr) = cast(*(dataptr), List_Pop((L), type_id(*(dataptr)))))
Usage example and output
int main() {
list *L = 0; // tbd initialization
int i = 42;
printf("%p %d\n", (void*) &i, i);
LIST_PUSH(L, &i);
int *j;
LIST_POP(L, &j);
printf("%p %d\n", (void*) j, *j);
double *d;
LIST_POP(L, &d);
}
<address of i> 42
<same address> 42
assertion failure (popping into a double * does not match the stored int * type id)
There is no way to do exactly what you want in C. There is no way to store a type in a variable, and C doesn't have a template system like C++'s; the closest you can get is faking one with the preprocessor.
You could define your own template-like macros that could quickly define your node and list structs for whatever type you need, but I think that sort of hackery is generally frowned upon unless you really need a whole bunch of linked lists that only differ in the type they store.
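As a rough sketch of such template-like macros (DEFINE_LIST and every name it generates are hypothetical, not from any library), one macro can stamp out a node type, a list type and its functions for each element type. NAME is pasted into identifiers, so it must be a single token; use a typedef for types like struct car or char *:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "template": generates NAME_node, NAME_list,
   NAME_list_append and NAME_list_free for element type T. */
#define DEFINE_LIST(T, NAME)                                            \
    typedef struct NAME##_node {                                        \
        T data;                                                         \
        struct NAME##_node *next;                                       \
    } NAME##_node;                                                      \
                                                                        \
    typedef struct NAME##_list {                                        \
        NAME##_node *head;                                              \
        NAME##_node *tail;                                              \
    } NAME##_list;                                                      \
                                                                        \
    /* Append a copy of value; 0 on success, -1 if malloc fails. */     \
    static inline int NAME##_list_append(NAME##_list *l, T value)       \
    {                                                                   \
        NAME##_node *n = malloc(sizeof *n);                             \
        if (!n) return -1;                                              \
        n->data = value;                                                \
        n->next = NULL;                                                 \
        if (l->tail) l->tail->next = n; else l->head = n;               \
        l->tail = n;                                                    \
        return 0;                                                       \
    }                                                                   \
                                                                        \
    /* Free every node and reset the list to empty. */                  \
    static inline void NAME##_list_free(NAME##_list *l)                 \
    {                                                                   \
        NAME##_node *n = l->head;                                       \
        while (n) {                                                     \
            NAME##_node *next = n->next;                                \
            free(n);                                                    \
            n = next;                                                   \
        }                                                               \
        l->head = l->tail = NULL;                                       \
    }

/* One instantiation per element type. */
DEFINE_LIST(int, int)
DEFINE_LIST(double, double)
```

Each instantiation produces distinct types, so the compiler diagnoses passing an int_list to double_list_append; the cost is one copy of the machine code per instantiation and macro-heavy code that is awkward to debug.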
C doesn't have any runtime type information and doesn't have a type "Type". Types are meaningless once the code has been compiled. So the language offers no solution to what you ask.
One common reason you would want to have a type available at runtime is that you have some code that might see different instances of your container and must do different things for different types stored in the container. You can easily solve such a situation using an enum, e.g.
enum ElementType
{
ET_INT, // int
ET_DOUBLE, // double
ET_CAR, // struct Car
// ...
};
and enumerate any type here that should ever go into your container. Another reason is if your container should take ownership of the objects stored in it and therefore must know how to destroy them (and sometimes how to clone them). For such cases, I recommend the use of function pointers:
typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);
Then extend your struct to contain these:
typedef struct list {
node *head;
node *tail;
ElementDeleter deleter;
ElementCloner cloner;
} list;
Make sure they are set to functions that actually delete or clone an element of the type stored in your container, and then use them where needed. For example, in a remove function you could do something like
myList->deleter(myNode->data);
// delete the contained element without knowing its type
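A minimal sketch of such a remove function, reusing the node and list layout from the question with the deleter and cloner members added. The name list_remove is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*ElementDeleter)(void *element);
typedef void *(*ElementCloner)(const void *element);

typedef struct node {
    void *data;
    struct node *next;
    struct node *previous;
} node;

typedef struct list {
    node *head;
    node *tail;
    ElementDeleter deleter;
    ElementCloner cloner;
} list;

/* Unlink n from l, then let the list's deleter destroy the element,
   so this code never needs to know the element's type. */
void list_remove(list *l, node *n)
{
    if (n->previous) n->previous->next = n->next; else l->head = n->next;
    if (n->next) n->next->previous = n->previous; else l->tail = n->previous;
    if (l->deleter) l->deleter(n->data);
    free(n);
}
```

For elements that are single malloc'd blocks, free itself has exactly the ElementDeleter signature and can be stored in deleter directly.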
Create an enum type that records the data type, and allocate memory according to that enum value. This can be done with a switch/case construct.
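A minimal sketch of that suggestion, assuming a hypothetical three-value enum (ET_INT, ET_DOUBLE, ET_STRING) and a hypothetical element_copy helper; the switch decides how much memory to allocate and how to copy the value for each tag:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tag enum: one constant per type the container may hold. */
enum ElementType { ET_INT, ET_DOUBLE, ET_STRING };

typedef struct node {
    enum ElementType type;   /* remembers what *data points to */
    void *data;
    struct node *next;
} node;

/* Allocate storage according to the tag and copy the value in.
   src must point to an int, a double, or a NUL-terminated string,
   matching type. Returns NULL on allocation failure. */
void *element_copy(enum ElementType type, const void *src)
{
    void *p = NULL;
    switch (type) {
    case ET_INT:
        p = malloc(sizeof(int));
        if (p) memcpy(p, src, sizeof(int));
        break;
    case ET_DOUBLE:
        p = malloc(sizeof(double));
        if (p) memcpy(p, src, sizeof(double));
        break;
    case ET_STRING: {
        size_t len = strlen(src) + 1;   /* include the terminator */
        p = malloc(len);
        if (p) memcpy(p, src, len);
        break;
    }
    }
    return p;
}
```

A node would store the tag next to the pointer, and any code that reads data switches on node->type the same way.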
Unlike Java or C++, C offers no type safety for generic containers. To answer your question succinctly, rearrange your node type this way:
struct node {
struct node *prev; /* put these at front */
struct node *next;
/* no data here */
};
You could then separately declare nodes carrying any data:
struct data_node {
struct data_node *prev; // keep these two data members at the front
struct data_node *next; // and in the same order as in struct node.
// you can add more data members here.
};
/* OR... */
struct data_node2 {
struct node node_data; /* WARNING: this may look a bit safer, but it is _only_ safe if placed at the front. */
/* more data ... */
};
You can then create a library that operates on data-less lists of nodes.
void list_add(struct list *l, struct node *n);
void list_remove(struct list *l, struct node *n);
/* etc... */
And, by casting, use this 'generic list' API to operate on your own lists.
You can have some sort of type information in your list declaration, for what it's worth, since C does not provide meaningful type protection.
struct data_list
{
struct data_node *head; /* this makes intent clear. */
struct data_node *tail;
};
struct data2_list
{
struct data_node2 *head;
struct data_node2 *tail;
};
/* ... */
struct data_node *my_data_node = malloc(sizeof(struct data_node));
struct data_node2 *my_data_node2 = malloc(sizeof(struct data_node2));
/* ... */
list_add((struct list *)&my_list, (struct node *)my_data_node);
list_add((struct list *)&my_list2, &my_data_node2->node_data);
/* the WARNING above is because one could also write this: */
list_add((struct list *)&my_list2, (struct node *)my_data_node2);
/* etc... */
These two techniques generate the same object code, so which one you choose is up to you, really.
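A compilable sketch of the second (embedded-node) variant, assuming hypothetical list_add/list_remove bodies and a hypothetical point_node payload type. Because the link is the first member, a struct node * taken from the list can be converted back to the enclosing struct (C11 6.7.2.1 guarantees that a pointer to a struct, suitably converted, points to its initial member, and vice versa):

```c
#include <assert.h>
#include <stddef.h>

/* Data-less link and list, as in the answer. */
struct node {
    struct node *prev;
    struct node *next;
};

struct list {
    struct node *head;
    struct node *tail;
};

/* Generic operations only ever see struct node. */
void list_add(struct list *l, struct node *n)
{
    n->next = NULL;
    n->prev = l->tail;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
}

void list_remove(struct list *l, struct node *n)
{
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Hypothetical payload: the link is embedded as the FIRST member. */
struct point_node {
    struct node link;
    int x, y;
};

/* Valid only because link is the first member of struct point_node. */
static struct point_node *as_point(struct node *n)
{
    return (struct point_node *)n;
}
```

Placing the link anywhere but first would break as_point; in that case the enclosing struct has to be recovered by subtracting the member's offsetof instead of casting.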
As an aside, I'd avoid the typedef struct notation and write struct out explicitly; IMHO it increases readability in the long run. You can be certain some will agree with me on this subject and some won't.

Coding Style -- Pass by Reference or Pass by Value?

In order to simplify the development of future school assignments I decided to create an API (is that what you would call it?) for two data structures I commonly use -- a linked list and a hash table.
In developing each of these I ended up with the following two insert functions:
int list_insert(list *l, char *data, unsigned int idx);
int hash_insert(hash_table **ht, char *data);
The list_insert() function (and all of the list functions) ended up being pass-by-value since I never had any need to directly modify the list * itself unless I was malloc'ing or free'ing it. However, because I wanted to include auto-rehashing in my hash table I found that I had to pass the table by-reference instead of by-value in any function that might force a rehash. Now I end up with syntax like the following:
list_insert(l, "foo", 3);
hash_insert(&ht, "foo");
The difference strikes me as a little odd and I found myself wondering if I should change the list functions to be pass-by-reference as well for consistency's sake -- even though none of my functions would need to leverage it. What's the typical consensus here? Should I only pass-by-reference if my function actually needs to modify its arguments or should I pass-by-reference for the sake of consistency?
Structure definitions:
typedef struct list_node list_node;
struct list_node {
char *data;
list_node *next;
list_node *prev;
};
typedef struct list list;
struct list {
list_node *head;
list_node *tail;
size_t size;
};
typedef struct hash_table hash_table;
struct hash_table {
list **table;
size_t entries;
size_t buckets;
float maxLoad;
unsigned int (*hash)(char*, unsigned int);
};
List functions:
list *list_createList(void);
list_node *list_createNode(void);
void list_destroyList(list *l);
void list_destroyNode(list_node *n);
int list_append(list *l, char *data);
int list_insert(list *l, char *data, unsigned int idx);
int list_remove(list *l, char *data, int (*compar)(const void*, const void*));
void list_push(list *l, char *data);
char *list_pop(list *l);
int list_count(list *l, char *data, int (*compar)(const void*, const void*));
int list_reverse(list *l);
int list_sort(list *l, int (*compar)(const void*, const void*));
int list_print(list *l, void (*print)(char *data));
Hash functions:
hash_table *hash_createTable(size_t buckets, float maxLoad, unsigned int (*hash)(char*, unsigned int));
void hash_destroyTable(hash_table *ht);
list *hash_list(const hash_table **ht);
int hash_checkLoad(hash_table **ht);
int hash_rehash(hash_table **ht);
int hash_insert(hash_table **ht, char *data);
void hash_stats(hash_table *ht);
int hash_print(hash_table *ht, void (*print)(char*));
Here is a general rule of thumb:
pass by value if its type is a native type (char, short, int, long, long long, float or double)
pass by reference if it is a union, struct or array
Additional considerations for passing by reference:
use const if it will not be modified
use restrict if pointers will not point to the same address
Sometimes a struct/union seems like the appropriate type, but can be replaced with arrays if the types are similar. This can help with optimization (loop vectorization for example)
That's up to you and takes a little intuition. When passing large structs I pass by reference so that I am not eating up extra stack space and burning cycles copying the struct. But with small structs like yours it may be more efficient to use the stack, depending on your target processor, how often you use the values, and what your compiler does. Your compiler may break the struct up and put its values into registers.
But if you do pass by reference and do not intend to modify the value, it is best practice to pass a pointer to const, e.g. const list *l. That way there is no risk of accidentally modifying the value, and it makes the interface cleaner: the caller knows the value won't change.
Consistency is nice, and I personally would lean in that direction, especially on large interfaces, because it may make things easier in the long run, but I would definitely use const. Doing so lets the compiler catch accidental assignments, so you don't have to track down a hard-to-find bug later.
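A minimal sketch of the const advice using the question's list structs; list_size and list_length are hypothetical read-only helpers, and the commented-out line shows the kind of accidental write the compiler would reject:

```c
#include <assert.h>
#include <stddef.h>

typedef struct list_node list_node;
struct list_node {
    char *data;
    list_node *next;
    list_node *prev;
};

typedef struct list list;
struct list {
    list_node *head;
    list_node *tail;
    size_t size;
};

/* Read-only: const list * promises the caller nothing is modified. */
size_t list_size(const list *l)
{
    /* l->size = 0; */  /* would not compile: l points to const */
    return l->size;
}

/* Walks the nodes through const pointers as well. */
size_t list_length(const list *l)
{
    size_t n = 0;
    for (const list_node *p = l->head; p != NULL; p = p->next)
        n++;
    return n;
}
```

The hash functions that may rehash still need hash_table **, since they may replace the caller's pointer; functions that only read could take a const pointer instead.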
See also: Passing a struct to a function in C

Is It A Generic Stack Data Structure Linked List Implementation in C?

My college professor taught us that a generic stack looks something like this (I basically copy-pasted this from the course support files):
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
typedef struct
{ size_t maxe, dime;
char *b, *sv, *vf;
} TStiva, *ASt;
#define DIME(a) (((ASt)(a))->dime)
#define BS(a) (((ASt)(a))->b)
#define SV(a) (((ASt)(a))->sv)
#define VF(a) (((ASt)(a))->vf)
#define DIMDIF(s,d) (DIME(s) != DIME(d))
#define VIDA(a) (VF(a) == BS(a))
#define PLINA(a) (VF(a) == SV(a))
// Function Declarations
void* InitS(size_t d,...);
int Push(void* a, void* ae);
int Pop (void* a, void* ae);
int Top (void* a, void* ae);
void *InitS(size_t d,...)
{ ASt a = (ASt)malloc(sizeof (TStiva));
va_list ap;
if (!a) return NULL;
va_start(ap,d);
a->maxe = va_arg(ap,size_t);
va_end(ap);
a->dime = d;
a->b = (char*)calloc(a->maxe, d);
if (!a->b) { free(a); return NULL; }
a->vf = a->b;
a->sv = a->b + d * a->maxe;
return (void *)a;
}
int Push(void *a, void *ae)
{ if( PLINA(a)) return 0;
memcpy (VF(a), ae, DIME(a));
VF(a) += DIME(a);
return 1;
}
int Pop(void *a, void *ae)
{ if(VIDA(a)) return 0;
VF(a) -= DIME(a);
memcpy (ae, VF(a), DIME(a));
return 1;
}
int Top(void *a, void *ae)
{ if(VIDA(a)) return 0;
memcpy (ae, VF(a)-DIME(a), DIME(a));
return 1;
}
Anyway, this is meant to be a generic stack implemented with arrays (vectors). What I don't understand is why the Top, Push and Pop functions refer to the stack data structure as a void *.
Doesn't generic mean that the values the data structure holds are generic? If so, referring to the data structure through its typedef instead of void * shouldn't make it any less generic.
I am asking this because I am about to create a Generic Stack implemented with Linked Lists and I am a bit confused.
This is my generic linked list data structure:
typedef struct Element {
struct Element *next;
void *value;
} TElement, *TList, **AList;
And for the Stack:
typedef struct Stack {
size_t size;
TList top;
} TStack, *AStack;
/* Function Definitions */
TStack InitStack(size_t);
void DeleteStack(AStack);
int Push(TStack, void*);
int Pop(TStack, void*);
int Top(TStack, void*);
Does anything seem not generic in my implementation?
Generic means that it can hold any data type (char *, int *, etc.). In C, void pointers (void *) let you store items of any type and get them back out, converting them back to the right type on retrieval.
So, it allows the program to be ignorant of the data types that you have in your custom data structure.
Referring to the structure itself (as long as you are not specifying the type of the data held in that structure) does not make it any less generic. So you can mention your TStack in your functions, as long as the data manipulated inside the stack is generic (i.e. void *).
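A minimal sketch of a linked-list stack along those lines, reusing the question's TElement/TStack types and copying size bytes per element with memcpy, as the professor's array version does. One assumption differs from the question: Push and Pop take the stack by pointer (AStack), because with a TStack passed by value the update to top would only happen in a copy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Element {
    struct Element *next;
    void *value;
} TElement, *TList, **AList;

typedef struct Stack {
    size_t size;   /* size in bytes of one element */
    TList top;
} TStack, *AStack;

/* Push a copy of the size bytes at ae; returns 0 if allocation fails. */
int Push(AStack s, const void *ae)
{
    TList e = malloc(sizeof *e);
    if (!e) return 0;
    e->value = malloc(s->size);
    if (!e->value) { free(e); return 0; }
    memcpy(e->value, ae, s->size);
    e->next = s->top;
    s->top = e;
    return 1;
}

/* Copy the top element into ae and remove it; returns 0 if empty. */
int Pop(AStack s, void *ae)
{
    TList e = s->top;
    if (!e) return 0;
    memcpy(ae, e->value, s->size);
    s->top = e->next;
    free(e->value);
    free(e);
    return 1;
}
```

Nothing in Push or Pop depends on the element type, so the stack stays generic even though the functions name TStack explicitly.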
void* is for generic purposes. Imagine it as a pointer to the memory, where of course the memory can hold anything. By void* you mean that you do not know what you point to, but you know that you point to something.
Yes, a void * can correctly implement a generic stack, but it creates a problem: you have no idea what type of data you are storing in the stack. A void * points to some valid block of memory, but carries absolutely no clue as to the type of that memory. So the code using this generic stack has to convert types explicitly. A void * can only be stored and passed around; you cannot dereference it or do pointer arithmetic on it without first converting it to a typed pointer.

Is there something in C like C++ templates? If not, how to re-use structures and functions for different data types?

I want to write a linked list whose data field can store any built-in or user-defined type. In C++ I would just use a template, but how do I accomplish this in C?
Do I have to rewrite the linked list struct and a whole set of operations on it for each data type I want it to store? Unions wouldn't work because the set of types a union can store is fixed in advance.
There's a reason people use languages other than C.... :-)
In C, you'd have your data structure operate with void* members, and you'd cast wherever you used them to the correct types. Macros can help with some of that noise.
There are different approaches to this problem:
using datatype void*: these means, you have pointers to memory locations whose type is not further specified. If you retrieve such a pointer, you can explicitly state what is inside it: *(int*)(mystruct->voidptr) tells the compiler: look at the memory location mystruct->voidptr and interpret the contents as int.
another option is preprocessor trickery, though this is usually far from trivial.
I also found http://sglib.sourceforge.net/
Edit: For the preprocessor trick:
#include <stdio.h>
#define mytype(t) struct { t val; }
int main(int argc, char *argv[]) {
mytype(int) myint;
myint.val=6;
printf ("%d\n", myint.val);
return 0;
}
This would be a simple wrapper for types, but I think it can become quite complicated.
It's less comfortable in C (there's a reason C++ is called C incremented), but it can be done with generic pointers (void *), with the application handling the type management itself.
A very nice implementation of generic data structures in C can be found in ubiqx modules, the sources are definitely worth reading.
With some care, you can do this using macros that build and manipulate structs. One of the most well-tested examples of this is the BSD "queue" library. It works on every platform I've tried (Unix, Windows, VMS) and consists of a single header file (no C file).
It has the unfortunate downside of being a bit hard to use, but it preserves as much type-safety as it can in C.
The header file is here: http://www.openbsd.org/cgi-bin/cvsweb/src/sys/sys/queue.h?rev=1.34;content-type=text%2Fplain, and the documentation on how to use it is here: http://www.openbsd.org/cgi-bin/man.cgi?query=queue.
Beyond that, no: you're stuck either losing type safety (using (void *) all over the place) or moving to C++ and the STL.
Here's an option that's very flexible but requires a lot of work.
In your list node, store a pointer to the data as a void *:
struct node {
void *data;
struct node *next;
};
Then you'd create a suite of functions for each type that handle tasks like comparison, assignment, duplication, etc.:
#include <stdio.h>
#include <stdlib.h>
// create a new instance of the data item and copy the value
// of the parameter to it.
void *copyInt(void *src)
{
int *p = malloc(sizeof *p);
if (p) *p = *(int *)src;
return p;
}
// target points to a void * slot; we create a new instance for the assignment
void assignInt(void *target, void *src)
{
*(void **)target = copyInt(src);
}
// returns -1 if lhs < rhs, 0 if lhs == rhs, 1 if lhs > rhs
int testInt(void *lhs, void *rhs)
{
if (*(int *)lhs < *(int *)rhs) return -1;
else if (*(int *)lhs == *(int *)rhs) return 0;
else return 1;
}
// the caller must free the returned string
char *intToString(void *data)
{
int len = snprintf(NULL, 0, "%d", *(int *)data); // characters needed, without terminator
char *s = malloc(len + 1);
if (s) sprintf(s, "%d", *(int *)data);
return s;
}
Then you could create a list type that has pointers to these functions, such as
struct list {
struct node *head;
void *(*cpy)(void *); // copy operation
int (*test)(void *, void *); // test operation
void (*asgn)(void *, void *); // assign operation
char *(*toStr)(void *); // get string representation
/* ... */
};
struct list myIntList;
struct list myDoubleList;
myIntList.cpy = copyInt;
myIntList.test = testInt;
myIntList.asgn = assignInt;
myIntList.toStr = intToString;
myDoubleList.cpy = copyDouble;
myDoubleList.test = testDouble;
myDoubleList.asgn = assignDouble;
myDoubleList.toStr = doubleToString;
...
Then, when you pass the list to an insert or search operation, you'd call the functions from the list object:
void addToList(struct list *l, void *value)
{
// find the link where value belongs, keeping the list sorted by l->test
struct node **link = &l->head;
while (*link != NULL && l->test((*link)->data, value) <= 0)
link = &(*link)->next;
struct node *new = malloc(sizeof *new);
if (!new)
{
// handle error here
return;
}
new->data = l->cpy(value);
new->next = *link;
*link = new;
if (logging) // logging and logFile are assumed globals, not shown
{
char *s = l->toStr(new->data);
fprintf(logFile, "Added value %s to list\n", s);
free(s);
}
}
...
i = 1;
addToList(&myIntList, &i);
f = 3.4;
addToList(&myDoubleList, &f);
By delegating the type-aware operations to separate functions called through function pointers, you now have a list structure that can store values of any type. To add support for new types, you only need to implement new copy, assign, toString, etc., functions for that new type.
There are drawbacks. For one thing, you can't use constants as function parameters (e.g., you can't do something simple like addToList(&myIntList, 1);) -- you have to assign everything to a variable first, and pass the address of the variable (which is why you need to create new instances of the data member when you add it to the list; if you just assigned the address of the variable, every element in the list would wind up pointing to the same object, which may no longer exist depending on the context).
Secondly, you wind up doing a lot of memory management; you don't just create a new instance of the list node, but you also must create a new instance of the data member. You must remember to free the data member before freeing the node. Then you're creating a new string instance every time you want to display the data, and you have to remember to free that string when you're done with it.
Finally, this solution throws type safety right out the window and into oncoming traffic (after lighting it on fire). The delegate functions are counting on you to keep the types straight; there's nothing preventing you from passing the address of a double variable to one of the int handling functions.
Between the memory management and the fact that you must make a function call for just about every operation, performance is going to suffer. This isn't a fast solution.
Of course, this assumes that every element in the list is the same type; if you're wanting to store elements of different types in the same list, then you're going to have to do something different, such as associate the functions with each node, rather than the list overall.
I wrote a generic linked list "template" in C using the preprocessor, but it's pretty horrible to look at, and heavily pre-processed code is not easy to debug.
These days I think you'd be better off using some other code generation tool such as Python / Cog: http://www.python.org/about/success/cog/
I agree with JonathanPatschke's answer that you should look at sys/queue.h, although I've never tried it myself, as it is not on some of the platforms I work with. I also agree with Vicki's answer to use Python.
But I've found that five or six very simple C macros meet most of my garden-variety needs. These macros help clean up ugly, bug-prone code, without littering it with hidden void *'s, which destroy type-safety. Some of these macros are:
#define ADD_LINK_TO_END_OF_LIST(add, head, tail) do { \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = (tail)->next = (add); \
} while (0)
// add is evaluated twice in the else branch: pass a plain pointer variable
#define ADD_DOUBLE_LINK_TO_END_OF_LIST(add, head, tail) do { \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = ((add)->prev = (tail), (tail)->next = (add)); \
} while (0)
#define FREE_LINK_IN_LIST(p, dtor) do { /* singly-linked */ \
void *myLocalTemporaryPtr = (p)->next; \
dtor(p); \
(p) = myLocalTemporaryPtr;} while (0)
#define FREE_LINKED_LIST(p, dtor) do { \
while (p) \
FREE_LINK_IN_LIST(p, dtor);} while (0)
// copy "ctor" (shallow); myMalloc is not shown and must never return NULL, since memcpy cannot take a null pointer
#define NEW_COPY(p) memcpy(myMalloc(sizeof *(p)), (p), sizeof *(p))
// iterator
#define NEXT_IN_LIST(p, list) ((p) ? (p)->next : (list))
So, for example:
struct MyContact {
char *name;
char *address;
char *telephone;
...
struct MyContact *next;
} *myContactList = 0, *myContactTail; // the tail doesn't need to be init'd
...
struct MyContact newEntry = {0};
...
ADD_LINK_TO_END_OF_LIST(NEW_COPY(&newEntry), myContactList, myContactTail);
...
struct MyContact *i = 0;
while ((i = NEXT_IN_LIST(i, myContactList))) // iterate through list
// ...
The next and prev members have hard-coded names. They don't need to be void *, which avoids problems with strict aliasing. They do need to be zeroed when the data item is created.
The dtor argument for FREE_LINK_IN_LIST would typically be a function like free, or (void) to do nothing, or another macro such as:
#define MY_CONTACT_ENTRY_DTOR(p) \
do { if (p) { \
free((p)->name); \
free((p)->address); \
free((p)->telephone); \
free(p); \
}} while (0)
So for example, FREE_LINKED_LIST(myContactList, MY_CONTACT_ENTRY_DTOR) would free all the members of the (duck-typed) list headed by myContactList.
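A self-contained sketch exercising these macros, assuming a hypothetical xmalloc in place of the answer's myMalloc (which is not shown there) and a hypothetical struct Item; the macros are wrapped in do { } while (0) so each behaves as a single statement:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for myMalloc: abort instead of returning NULL,
   so NEW_COPY never passes NULL to memcpy. */
static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (!p) { perror("malloc"); exit(EXIT_FAILURE); }
    return p;
}

#define ADD_LINK_TO_END_OF_LIST(add, head, tail) do { \
    if (!(head))                                      \
        (tail) = (head) = (add);                      \
    else                                              \
        (tail) = (tail)->next = (add);                \
} while (0)

#define FREE_LINK_IN_LIST(p, dtor) do {               \
    void *myLocalTemporaryPtr = (p)->next;            \
    dtor(p);                                          \
    (p) = myLocalTemporaryPtr;                        \
} while (0)

#define FREE_LINKED_LIST(p, dtor) do {                \
    while (p)                                         \
        FREE_LINK_IN_LIST(p, dtor);                   \
} while (0)

#define NEW_COPY(p) memcpy(xmalloc(sizeof *(p)), (p), sizeof *(p))
#define NEXT_IN_LIST(p, list) ((p) ? (p)->next : (list))

struct Item {
    int value;
    struct Item *next;   /* must be zeroed in each new item */
};

/* Build a list holding 0..n-1, sum it by iterating, then free it. */
static int sum_of_new_list(int n)
{
    struct Item *head = NULL, *tail = NULL;
    for (int k = 0; k < n; k++) {
        struct Item entry = {k, NULL};
        ADD_LINK_TO_END_OF_LIST(NEW_COPY(&entry), head, tail);
    }
    int sum = 0;
    struct Item *i = NULL;
    while ((i = NEXT_IN_LIST(i, head)))
        sum += i->value;
    FREE_LINKED_LIST(head, free);   /* leaves head == NULL */
    assert(head == NULL);
    return sum;
}
```

Note that NEW_COPY is given the address of the stack entry; passing the struct itself would not compile, since sizeof *(p) needs a pointer.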
There is one void * here, but perhaps it could be removed via gcc's typeof.
If you need a list that can hold elements of different types simultaneously, e.g. an int followed by three char * followed by a struct tm, then using void * for the data is the solution. But if you only need multiple list types with identical methods, the best solution depends on if you want to avoid generating many instances of almost identical machine code, or just avoid typing source code.
A struct declaration doesn't generate any machine code...
struct int_node {
void *next;
int data;
};
struct long_node {
void *next;
long data;
};
...and one single function which uses a void * parameter and/or return value, can handle them all.
struct generic_node {
void *next;
};
void *insert(void *before_this, void *element, size_t element_sizes);
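One possible reading of that single function, sketched as a hypothetical insert_after (the answer only gives a prototype, so these semantics are an assumption): it copies element_size bytes into a new heap node and links it after prev. It works for struct int_node and struct long_node alike because both start with void *next, and a pointer to a struct, suitably converted, points to its first member:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct int_node {
    void *next;
    int data;
};

struct long_node {
    void *next;
    long data;
};

/* Copy element_size bytes from element into a new node and link it after
   prev (or leave it unlinked if prev is NULL). Every node type must have
   void *next as its FIRST member. Returns NULL if malloc fails. */
void *insert_after(void *prev, const void *element, size_t element_size)
{
    void *node = malloc(element_size);
    if (!node) return NULL;
    memcpy(node, element, element_size);
    if (prev) {
        void **prev_next = prev;          /* the next member of prev */
        *(void **)node = *prev_next;      /* new node points at prev's successor */
        *prev_next = node;                /* prev now points at the new node */
    } else {
        *(void **)node = NULL;
    }
    return node;
}
```

Only one copy of insert_after exists in the binary, whatever the node type; the price is that the compiler cannot check that element really matches the node type being built.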

How to create a generic C free function

I have some C structures related to a 'list' data structure.
They look like this.
struct nmlist_element_s {
void *data;
struct nmlist_element_s *next;
};
typedef struct nmlist_element_s nmlist_element;
struct nmlist_s {
void (*destructor)(void *data);
int (*cmp)(const void *e1, const void *e2);
unsigned int size;
nmlist_element *head;
nmlist_element *tail;
};
typedef struct nmlist_s nmlist;
This way I can have different data types held in "nmlist_element->data".
The "constructor" (in terms of OOP) has the following signature:
nmlist *nmlist_alloc(void (*destructor)(void *data));
Where "destructor" is a specific function that deallocates "data" (held by the nmlist_element).
If I want to have a list containing integers as data, my "destructor" would look like this:
void int_destructor(void *data)
{
free((int*)data);
}
Still, I find it rather "unfriendly" to write a destructor function for every simple primitive data type. So is there a trick to write something like this (for primitives)?
void "x"_destructor(void *data, "x")
{
free(("x" *)data);
}
PS: I am not a macro fan myself, and in my short experience with C, I don't use them unless necessary.
The C free() function is already generic. Just use free(data).
You shouldn't need to cast it to an int * to free it. You can just free the void * directly.
If you really want to do it with macros (or, for future reference, if you really do have a function that is not generic, unlike free()), the best way is to use X macros.
#define TYPES_TO_DESTRUCT \
X(int) \
X(char) \
X(t_mytype)
#define X(type) \
void type##_destructor(void *data) { free((type *)data); }
TYPES_TO_DESTRUCT
#undef X
I think Kosta's got it right. But you're asking for something like
#define DESTROY(x, data) (free((x *)(data))) /* x##* would not compile: ## cannot paste a name and * into one token */

Resources