Best practice for generic data structure implementation in C

In my adventures implementing generic data structures in C, I've come across a dilemma. For example, in the following code:
void add_something(avl_tree_t * my_tree) {
int new_element = 123;
avl_insert(my_tree, (void*)&new_element);
}
int main() {
avl_tree_t * my_tree = avl_create();
add_something(my_tree);
// do stuff
avl_print(my_tree, function_that_prints_ints);
exit(0);
}
In which avl_insert is defined as
void avl_insert(avl_tree_t * tree, void * data) {
avl_node_t * new_node = malloc(sizeof(struct avl_node));
new_node->data = data;
// do tree balancing stuff
}
In order for my generic insertion function to work, I have to pass it a void * item to store. In this case that means passing the address of the new int I'm adding, cast to void *. If I am not mistaken, once add_something returns and we're back in main, that address no longer refers to a valid object, so the tree is left holding a dangling pointer.
One way I looked into to solve this issue is to pass in the size of the things I am storing in the tree as a parameter for avl_create, and then allocating memory for a copy of each element I insert. This works because you don't need the original address or value for whatever you added.
Another thing that works is only using the data structure in the span of a single function, which is obviously not viable.
My question is this: what is the best way to go about storing statically allocated data in a generic data structure, be it basic C types or user made structures?
Thank you in advance.

To store pointers to data with automatic storage duration, yes, you would have to know the size of the elements in the container and allocate and copy the pointed-to data.
The simplest way is to just allocate and copy in all cases, optionally using a user-specified clone() or create() function to make deep copies, if necessary. This also entails the use of a user-specified destroy() function to dispose of the copies properly (again, if necessary).
To be able to avoid the allocation, then you have to have some kind of state variable that lets you know if the container should allocate, or just copy the pointer value itself.
Note that this should apply to the container object, not to the individual nodes or elements. If a container stores data in one way or the other, it should store all data that way. See Principle of Least Astonishment.
This is the more complex approach, since you have to be sure to use the correct process for adding and deleting elements based on the state variable. It's usually much simpler to just make sure you never pass in a pointer to a value with automatic storage duration.
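A minimal sketch of the allocate-and-copy approach, assuming a hypothetical singly linked container rather than the question's AVL tree (clist_t, clist_create, clist_push and clist_destroy are illustrative names only). The element size is fixed when the container is created and every insert copies the caller's bytes, so the caller's variable may go out of scope afterwards:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: all names here are illustrative only. */
typedef struct clist_node {
    struct clist_node *next;
    void *data;               /* heap copy owned by the container */
} clist_node_t;

typedef struct {
    size_t elem_size;         /* size of every stored element */
    clist_node_t *head;
} clist_t;

clist_t *clist_create(size_t elem_size) {
    assert(elem_size > 0);
    clist_t *list = malloc(sizeof *list);
    if (list) {
        list->elem_size = elem_size;
        list->head = NULL;
    }
    return list;
}

/* Copies elem_size bytes from elem; returns 0 on success, -1 on failure. */
int clist_push(clist_t *list, const void *elem) {
    clist_node_t *node = malloc(sizeof *node);
    if (!node)
        return -1;
    node->data = malloc(list->elem_size);
    if (!node->data) {
        free(node);
        return -1;
    }
    memcpy(node->data, elem, list->elem_size);
    node->next = list->head;
    list->head = node;
    return 0;
}

/* Frees every copy, every node, and the container itself. */
void clist_destroy(clist_t *list) {
    if (!list)
        return;
    clist_node_t *node = list->head;
    while (node) {
        clist_node_t *next = node->next;
        free(node->data);
        free(node);
        node = next;
    }
    free(list);
}
```

A byte-wise memcpy is a shallow copy. Elements that own other allocations would need the user-supplied clone() and destroy() functions mentioned above.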

Use a mix-in style; e.g. do not make data part of the node but the node part of the data:
struct avl_node {
struct avl_node *parent;
struct avl_node *left;
struct avl_node *right;
};
struct person {
char const *name;
struct avl_node node;
};
struct animal {
struct avl_node node;
int dangerousness;
};
Constructors for animal are like
struct animal *animal_create(int d)
{
struct animal *animal = malloc(sizeof *animal);
if (!animal)
return NULL;
*animal = (struct animal) {
.node = AVL_NODE_INIT(),
.dangerousness = d,
};
return animal;
}
The generic AVL tree operations could look like
void avl_tree_insert(struct avl_node **root, struct avl_node *node,
int (*cmp)(struct avl_node const *a, struct avl_node const *b))
{
/* .... */
}
and a cmp function for animal like
int animal_cmp(struct avl_node const *a_, struct avl_node const *b_)
{
struct animal const *a = container_of(a_, struct animal, node);
struct animal const *b = container_of(b_, struct animal, node);
/* compare without subtracting, which could overflow */
return (a->dangerousness > b->dangerousness) - (a->dangerousness < b->dangerousness);
}
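container_of is not part of standard C. A minimal sketch of how it can be defined with offsetof, together with a lookup helper (person_from_node is a hypothetical name, and the struct definitions are repeated so the block stands alone):

```c
#include <assert.h>
#include <stddef.h>

/* Portable variant, without the type check of the Linux kernel macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct avl_node {
    struct avl_node *parent;
    struct avl_node *left;
    struct avl_node *right;
};

struct person {
    char const *name;
    struct avl_node node;     /* node is NOT the first member here */
};

/* Hypothetical helper: recover the enclosing person from its node. */
struct person *person_from_node(struct avl_node *n) {
    assert(n != NULL);
    return container_of(n, struct person, node);
}
```

Because the macro subtracts the member's offset, the node can sit anywhere inside the enclosing struct, and one struct can embed several nodes to belong to several trees at once.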

Related

Is it possible to write a generic traverse function in C for different list structures so long as they contain the "next" field?

First time asking a question but I did look around Google and stackoverflow to see if someone has asked something similar before. In malloc, recasting and free, it looked like the OP asked something similar for example. But it was more complicated.
I was wondering whether it's possible to create a generic function for a list structure in C that traverses the list given that you know that the different types of structures will always have a "next" field.
For example, given these two list-type structures:
typedef struct _list1 {
int value;
list1 *next;
} list1;
typedef struct _list2 {
int value;
char *string;
list2 *next;
} list2;
Is it possible to create a generic void freeList((void *) list) function or something which looks something like the below? I am aware it's a simple thing to write both free functions for each individual list separately.
void freeList((void *) list) {
// Included this because the structs would have different sizes
// so I thought it would be possible to cast it in order to properly dereference the field.
if (sizeof *list == sizeof list1)
*list = (list1) list;
else if (sizeof *list == sizeof list2)
*list = (list2) list;
if (!list) return;
else {
free(list->next);
free(list);
}
}
So far, my experiments with the code shown above didn't fare well given that gcc would complain about dereferencing a void * pointer.

Making a heterogeneous list can be achieved by the use of a tagged union, or just a tag and casting:
struct list_item {
struct list_item *next;
enum datatype type;
void *contents;
};
or
struct list_item {
struct list_item *next;
enum datatype type;
union {
int some_int;
char some_char;
} contents;
};
Then while traversing the list you just have to verify the type stored in type before using the contents of the element.
This check:
if (sizeof *list == sizeof list1)
*list = (list1) list;
else if (sizeof *list == sizeof list2)
*list = (list2) list;
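doesn't work because sizeof is evaluated at compile time from the static type of its operand; it knows nothing about what list points to at run time. Here *list has type void, so you are just asking for sizeof(void), which standard C does not allow (GCC accepts it as an extension and yields 1).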
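As a sketch of the tagged-union variant, assuming a hypothetical enum datatype with two tags (TYPE_INT and TYPE_CHAR) and a helper sum_ints that only touches elements whose tag says they hold an int:

```c
#include <assert.h>
#include <stddef.h>

enum datatype { TYPE_INT, TYPE_CHAR };   /* hypothetical tags */

struct list_item {
    struct list_item *next;
    enum datatype type;
    union {
        int some_int;
        char some_char;
    } contents;
};

/* Adds up the int elements, skipping everything else. */
long sum_ints(const struct list_item *item) {
    long total = 0;
    for (; item != NULL; item = item->next) {
        switch (item->type) {
        case TYPE_INT:
            total += item->contents.some_int;
            break;
        case TYPE_CHAR:
            break;                        /* not an int: ignore */
        default:
            assert(0 && "unknown datatype tag");
        }
    }
    return total;
}
```

The tag is checked at run time, which is exactly what the sizeof test in the question could not do.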

is it possible to create a generic function for a list structure in C that traverses the list given that you know that the different types of structures will always have a "next" field.
Yes, as mentioned before; you must be careful that every structure starts with the "next" field; the two structures in your post should therefore be reordered like this:
typedef struct _list1 {
struct _list1 *next; /* the typedef name is not visible yet, so use the struct tag */
int value;
} list1;
typedef struct _list2 {
struct _list2 *next;
int value;
char *string;
} list2;
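It is not the cleanest code, because every struct has to follow the same convention, but it works: C lays out members in declaration order and never puts padding before the first member, so a pointer to the struct is also a pointer to its next field.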
Is it possible to create a generic void freeList((void) *list) function or something which looks something like...
This is possible if your structs do not refer to malloc'ed memory, or if they do so in a uniform (and known) way (note that the first case is a sub-case of the second).
If the structs contain pointers pointing to memory that has to be freed, in fact, while freeing the struct the freeList() function should also free the referenced memory. A few solutions come to my mind:
1 - If all the different structs contain the same "pointers" layout, the routine can free those pointers in a uniform manner, knowing in advance what to do. In such scenario, one can also use pointer fields that are not used by all the structs, but only some.
2 - Every single instance of a struct could contain some helper field describing the pointer's layout. For example, just after the "next" field, another "mempntcnt" field could tell how many pointers (to be freed) follow the "next" field. Or, this "mempntcnt" could be passed as a parameter to freeList().
3 - This problem could be managed by a totally separated mechanism, outside the scope of freeList(). Much depends on the final usage: I mean, for a given (kind of) linked list, first call a routine that frees all the memory referenced by the list itself, then free the list by calling the common freeList(). After all, if different structs are needed, then different routines are used on them...
I hope I've been clear enough...

If you ensure that the next pointer is the first member of the struct then this is possible.
typedef struct list1 {
// next pointer must be first
struct list1 *next;
int value;
} list1;
typedef struct list2 {
// next pointer must be first
struct list2 *next;
int value;
char *string;
} list2;
void freeList(void *list) {
if (list) {
freeList(*(void**)list);
free(list);
}
}
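The recursive version uses one stack frame per element, so a very long list could exhaust the stack. An iterative sketch under the same assumption (every struct starts with its next pointer) might look like this. The names freeListIter and struct opaque_node are hypothetical, and the function returns the number of nodes freed purely so the result can be checked:

```c
#include <stdlib.h>
#include <string.h>

struct opaque_node;   /* hypothetical incomplete type, never defined */

/* Frees a list whose nodes all begin with their next pointer.
   Returns how many nodes were freed. */
size_t freeListIter(void *list) {
    size_t freed = 0;
    while (list) {
        struct opaque_node *next;
        /* Copy out the leading pointer. All pointers to struct types
           share one representation, so the bytes are valid here too. */
        memcpy(&next, list, sizeof next);
        free(list);
        list = next;
        freed++;
    }
    return freed;
}
```

Like the recursive version, this frees only the nodes themselves, not any memory their other members point to.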

Define a struct with a member pointing to another member

I'm trying to program a network in C. I have nodes which are linked to each other, and I'd like to do that by making a struct member point to another member (not to another node, because I want to preserve the identity of the links).
The code I made to do that is something like:
struct node{
int k; //number of links
struct node.link **link; //<- wrong
};
but this is not right as node is not a variable but a type of variable (this is already discussed as an error in another QA: first you have to define a variable of node type and then apply the .link, but this doesn't help here). There's also a QA called "Struct member point at another struct member" but they don't do it from definition and it is not so clear how to generalize it (at least for me).
Is it a correct way to do this?

The problem is that the C language doesn't let you create the type you want. You need a type T with the property *T has the same type as T. You can't do that. (Well, function pointers have that property, but that's an irrelevant technicality.)
You have to introduce a new name. C only lets you do this with structs or similar constructions.
struct link {
struct link *ptr;
};
struct node {
int k;
struct link *link;
};
This will get you what you want. Now, in order to go from a struct link * to a struct node *, you'll have to do some pointer math:
struct node *node_from_link(struct link *link) {
return (struct node *) ((char *) link - offsetof(struct node, link));
}
This is also provided by the container_of macro, which is not part of the C standard, but you can find a definition for it online.
Or, you could just go the traditional route.
// Usually easier to do it this way...
struct node {
int k;
struct node *link;
};

Is this what you are after?
struct Node
{
int k; //number of links
void* link;
};
struct Node* create()
{
struct Node* node = malloc(sizeof(struct Node));
node->k = 0;
node->link = 0;
return node;
}
void link(struct Node* from, struct Node* to)
{
from->link = &(to->link);
}
int main()
{
struct Node* child = create();
struct Node* parent = create();
link(parent, child);
return 0;
}
I've used void* for the link for the reason expressed by Dietrich: you want a pointer to the link to be the same type as the link. This effectively means a cast, so why not just use a generic pointer?

Membership in a structure, generalized or specific, is not an attribute of C data types. There is therefore no way to declare a pointer that can only point to a structure member, and not to any other variable of compatible type.
On the other hand, you don't need to do anything special to declare a pointer that can point to a member of another structure. You just need a pointer to that member's data type, and structure membership is irrelevant to that data type.
For example, you can have
struct node {
int k; /* number of links */
struct node **links; /* points to a dynamic array of node pointers */
struct node **one_link; /* points to a node pointer from another node */
};
In that case, it might make sense to do something like this:
struct node *n1 = /* ... */;
struct node *n2 = /* ... */;
n2->one_link = &(n1->links[3]);
Overall, though, I think this is kind of convoluted. There is probably a better way to structure your data.
Update:
Based on your description of what you're after:
[...] links are bidirectional, if I destroy one link (say the one that links node 1 to node 3) I'll need to destroy the node 1 link AND the corresponding link from node 3. Then I need to know more than just who is link to who. I need to know which link they are using.
there are at least two possible solutions, depending on details of how your nodes are structured. If they are structured like I show above, with an array (dynamic or not) of pointers to other nodes, then your general idea simply won't work. That's because the position of each link within an array of links will change as you delete other links (supposing that you close the gaps). Instead, you can just scan:
struct node {
int k; /* number of links */
struct node **links; /* points to a dynamic array of node pointers */
struct node *parent; /* points to a node that links to this one */
};
void delete_node(struct node *n) {
if (n->parent) {
int i;
for (i = 0; i < n->parent->k; i += 1) {
if (n->parent->links[i] == n) {
/* ... delete the ith element of n->parent->links ... */
break;
}
}
}
/* ... clean up node n ... */
}
If one node's links to others are stored in separate members, on the other hand, then you could indeed provide a double-pointer by which to remove links from the parent, but the presence of member k in your original structure tells me that's not your situation.

Ok, this is how I finally solved it in my program:
typedef struct node{
int k; //connectivity
struct link **enlace; //vector of LINKs
}NODE;
typedef struct link{
NODE *node1;
NODE *node2;
}LINK;
Basically, I defined two structures. One is the NODE type, which holds the node's connectivity and a vector of LINKs. The other is the LINK structure, which holds the information about the link itself, that is, which two nodes it connects.
With these two I'm able to create the network of nodes with a connectivity following a Poisson distribution, and then destroy each link one by one, choosing one link at random from a list and then redirecting the pointers of each node to NULL.
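A sketch of what destroying one link could look like with these two structures. The helpers node_forget_link and link_destroy are hypothetical names, and the sketch assumes each LINK was allocated with malloc:

```c
#include <stdlib.h>

struct link;                      /* forward declaration */

typedef struct node {
    int k;                        /* connectivity */
    struct link **enlace;         /* vector of k LINK pointers */
} NODE;

typedef struct link {
    NODE *node1;
    NODE *node2;
} LINK;

/* Hypothetical helper: clear every slot of n that points at l. */
static void node_forget_link(NODE *n, const LINK *l) {
    for (int i = 0; i < n->k; i++)
        if (n->enlace[i] == l)
            n->enlace[i] = NULL;
}

/* Hypothetical helper: detach l from both endpoints, then free it. */
void link_destroy(LINK *l) {
    if (!l)
        return;
    node_forget_link(l->node1, l);
    node_forget_link(l->node2, l);
    free(l);
}
```

Because the link knows both of its endpoints, no search through the rest of the network is needed to remove it from both sides.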

Polymorphic data structures in C

I am a C beginner with quite a lot of OOP experience (C#) and I am having trouble understanding how some notion of "polymorphism" can be achieved in C.
Right now, I am thinking about how to capture the logical structure of a file system using structs. I have a folder that contains both folders and files. Folders in this folder can contain other files and folders, etc.
My approach:
typedef enum { file, folder } node_type;
typedef struct node {
node_type type;
char *name;
struct node *next;
struct node *children;
} node;
Is this the best I can do? I have found a lot of posts on "polymorphism in C", but I would like to see how a polymorphic data structure like this can be built cleanly and efficiently (in terms of memory wasted on unused members of those structures).
Thanks.

I hope I understand what you want - I'm unsure but I guess you want to do something like that:
typedef struct
{
int type; // file or folder?
} Item;
typedef struct
{
Item base; // common header, must be the first member
// data related to a file
} File;
typedef struct
{
Item base; // common header, must be the first member
// data related to a folder - like pointer to list of Item
} Folder;
As long as both structures start with the same members in the same order and only add their own data after them, you can access the common part through a pointer to either struct.
Check this one out as well: How can I simulate OO-style polymorphism in C?
Edit: I'm not sure about the syntax above (took it from the link above). I'm used to writing it this way instead:
typedef struct
{
int type;
// data for file
} File;
typedef struct
{
int type;
// data for folder - list, etc
} Folder;

C has no intrinsic notion of polymorphism.
You will end up implementing the mechanisms that you want from scratch. That's not a bad thing. It gives you a lot more flexibility. For example, C++ virtual methods are hard-wired per class; you can't change method pointers per instance.
Here are a few ideas:
Your node_type field provides a way to do a runtime type query. Going further, you can pack multiple types into one struct using a discriminated (or tagged) union: http://en.wikipedia.org/wiki/Tagged_union. I'm not sure whether a variant type qualifies as OO though.
Polymorphism is usually about behavior. You could store function pointers ("methods") in the struct, with pointers to different functions providing different behavior for different object instances. The C++ way of doing things is for each class to have a table of function pointers, then each object instance references the table for its class (incidentally the table pointers can also play the role of your node_type for RTTI). This is called a virtual method table.
Data inheritance means that subclasses contain all of the base class' data members plus some extra stuff. In C the easiest way to do this is by embedding the base class struct at the head of the derived class struct. That way a pointer to derived is a pointer to base.
typedef struct BaseClass {
int baseMember;
} BaseClass;
typedef struct DerivedClass {
BaseClass base;
int derivedMember;
} DerivedClass;
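A short sketch of how the embedding is used in practice. The helpers base_get and derived_from_base are hypothetical names, and the struct definitions are repeated so the block stands alone:

```c
typedef struct BaseClass {
    int baseMember;
} BaseClass;

typedef struct DerivedClass {
    BaseClass base;          /* must stay the first member */
    int derivedMember;
} DerivedClass;

/* Works on any object that starts with a BaseClass. */
int base_get(const BaseClass *b) {
    return b->baseMember;
}

/* Downcast: only valid if b really is the base of a DerivedClass. */
DerivedClass *derived_from_base(BaseClass *b) {
    return (DerivedClass *)b;
}
```

Passing &d.base for a DerivedClass d gives the "upcast" without any cast at all. The downcast is unchecked, which is why a type tag such as node_type is usually kept in the base struct.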
You could do worse than read "Inside the C++ Object Model" by Stanley B. Lippman. For example, this will help if you want to get an idea of how to implement multiple inheritance.

Here's an illustration of old-school C polymorphism, based on ancient memories of X/Motif.
If you just want a discriminated union (or even just a typed structure with a child pointer that may be null), it's probably simpler in your case.
enum NodeType { TFile, TFolder };
struct Node {
enum NodeType type;
const char *name;
struct Node *next;
};
struct FileNode {
struct Node base_;
};
struct FolderNode {
struct Node base_;
struct Node *children;
/* assuming children are linked with their next pointers ... */
};
Here are the constructors - I'll leave populating the linked lists as an exercise for the reader ...
struct Node* create_file(const char *name) {
struct FileNode *file = malloc(sizeof(*file));
file->base_.type = TFile;
file->base_.name = name; /* strdup? */
file->base_.next = NULL;
return &file->base_;
}
struct Node* create_folder(const char *name) {
struct FolderNode *folder = malloc(sizeof(*folder));
folder->base_.type = TFolder;
folder->base_.name = name;
folder->base_.next = NULL;
folder->children = NULL;
return &folder->base_;
}
Now we can walk a hierarchy, checking the type of each node and responding appropriately. This relies on the first member subobject having zero offset to the parent - if that doesn't hold (or you need multiple inheritance), you have to use offsetof to convert between base and "derived" types.
void walk(struct Node *root,
void (*on_file)(struct FileNode *),
void (*on_folder)(struct FolderNode *))
{
struct Node *cur = root;
struct FileNode *file;
struct FolderNode *folder;
for (; cur != NULL; cur = cur->next) {
switch (cur->type) {
case TFile:
file = (struct FileNode *)cur;
on_file(file);
break;
case TFolder:
folder = (struct FolderNode *)cur;
on_folder(folder);
walk(folder->children, on_file, on_folder);
break;
}
}
}
Note that we have a sort-of-polymorphic base type, but instead of switching on the type enumeration we could have a more completely polymorphic setup with virtual functions. Just add a function pointer to Node, something like:
void (*visit)(struct Node *self,
void (*on_file)(struct FileNode *),
void (*on_folder)(struct FolderNode *));
and have create_file and create_folder set it to an appropriate function (say, visit_file or visit_folder). Then, instead of switching on the enumerated type, walk would just call
cur->visit(cur, on_file, on_folder);
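Put together, the function-pointer variant might look like the sketch below. It is self-contained, so the structs are repeated in reduced form without the type enumeration. The typedefs file_cb and folder_cb and the counting callbacks are hypothetical additions for illustration:

```c
#include <stddef.h>

struct Node;
struct FileNode;
struct FolderNode;

typedef void (*file_cb)(struct FileNode *);
typedef void (*folder_cb)(struct FolderNode *);

struct Node {
    const char *name;
    struct Node *next;
    /* "virtual method": each constructor stores the right function */
    void (*visit)(struct Node *self, file_cb on_file, folder_cb on_folder);
};

struct FileNode   { struct Node base_; };
struct FolderNode { struct Node base_; struct Node *children; };

void walk(struct Node *root, file_cb on_file, folder_cb on_folder) {
    for (struct Node *cur = root; cur != NULL; cur = cur->next)
        cur->visit(cur, on_file, on_folder);   /* no switch needed */
}

void visit_file(struct Node *self, file_cb on_file, folder_cb on_folder) {
    (void)on_folder;                           /* unused for files */
    on_file((struct FileNode *)self);
}

void visit_folder(struct Node *self, file_cb on_file, folder_cb on_folder) {
    struct FolderNode *folder = (struct FolderNode *)self;
    on_folder(folder);
    walk(folder->children, on_file, on_folder);
}

/* Example callbacks that just count what they see. */
int files_seen, folders_seen;
void count_file(struct FileNode *f)     { (void)f; files_seen++; }
void count_folder(struct FolderNode *d) { (void)d; folders_seen++; }
```

Adding a new node kind then only requires a new visit function and constructor; walk itself does not change.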

How is generic list manipulation function written?

I am a beginner in programming, so please go easy on me. I am finding it difficult to get an answer to my question, and I can't get my head around complex code. Can someone please explain, with simple code, how a generic list manipulation function that accepts elements of any kind is written? Thanks in advance.

This is normally done using void pointers:
typedef struct node {
struct node *next;
void *data;
} node;
node *insert(node *list, void *data) {
}
node *delete(node *list, node *to_delete) {
}
Such manipulation functions do not depend on the actual type of the data, so they can be implemented generically. For example, you can use a struct like the following for the data field above:
typedef struct data {
int type;
void *data;
} data;
/* .... */
data d;
d.type = INT; /* INT is an application-defined type tag */
d.data = malloc(sizeof(int));
node n = {NULL, &d};
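The empty insert and delete stubs could be filled in along the following lines. This is only a sketch: prepending on insert and leaving ownership of data with the caller are assumptions, not something the answer specifies.

```c
#include <stdlib.h>

typedef struct node {
    struct node *next;
    void *data;
} node;

/* Prepends data; returns the new head, or NULL if allocation fails
   (the original list is left untouched in that case). */
node *insert(node *list, void *data) {
    node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    n->next = list;
    n->data = data;
    return n;
}

/* Unlinks and frees to_delete (but not its data); returns the head. */
node *delete(node *list, node *to_delete) {
    node **link = &list;
    while (*link && *link != to_delete)
        link = &(*link)->next;
    if (*link) {
        *link = to_delete->next;
        free(to_delete);
    }
    return list;
}
```

Neither function ever dereferences data, which is what makes them usable for elements of any type.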

It looks like you need a heterogeneous list. Some pointers below:
Make the data element of the list node as a generic structure, which contains an indicator for data type and data.
/** This should be your data node **/
struct nodedata
{
int datatype;
void *data;
};
/** This should be your list node **/
struct listnode
{
struct nodedata *data;
struct listnode *next;
};
Using the above structure, you can store different types of data.
Use function pointers for comparison functions or invoke different functions depending upon the data type.

c function merge help

I have two functions:
void free_this(THIS *this)
{
THIS *this_tmp;
while (this_tmp = this)
{
if (this->str)
free(this->str);
this = this_tmp->next;
free(this_tmp);
}
}
void free_that(THAT *that)
{
THAT *that_tmp;
while (that_tmp = that)
{
if (that->id)
free(that->id);
that = that_tmp->next;
free(that_tmp);
}
}
Since they are very similar I was trying to come up with one function to handle them both. I can already just use a pointer to point to the correct data to free (i.e. point to either str from THIS struct or to id of THAT struct) however I can't figure out how to get around what type of struct is being dealt with since I can't just use a void pointer since void* has no member named 'NEXT'.
Any ideas?
Maybe I should just combine the two structs THIS and THAT into one somehow? here they are:
typedef struct this {
struct this *next;
char *str;
} THIS;
typedef struct that {
struct that *next;
char *id;
unsigned short result;
OTHERTHING *optr;
} THAT;
Could I possibly use the offsetof function somehow to get the next element?

You could implement the free function with a void * and field offsets. Untested:
void free_either(void *either, size_t other_offset, size_t next_offset)
{
while (either)
{
char *base = either;
/* read the pointers stored in the fields, not the field addresses */
void *next = *(void **)(base + next_offset);
free(*(void **)(base + other_offset));
free(either);
either = next;
}
}
free_either(this,offsetof(THIS,str),offsetof(THIS,next));
free_either(that,offsetof(THAT,id),offsetof(THAT,next));
You could then create macros to replace the old free_this or free_that functions.

Depends on the exact structure of THIS and THAT. If they are very similar, especially if str and id have the same offsets, you may be able to merge them into one object.
struct THIS {
void* str;
...
};
struct THAT {
void* id; /* is at the same offset as str */
...
};
union THAS {
struct THIS this;
struct THAT that;
void* pointer; /* at the same offset as str and id */
};
/* and use it like */
void free_thas(union THAS* thas) {
free(thas->pointer);
...
}
If you have a bad feeling about this, you are right. Some small change in THIS may cause THAT to explode and so on. Don't do it.

You have two different singly linked list types here. You could get around this by creating only a single type:
typedef struct node {
struct node *next;
void *data;
} NODE;
and have data point to either a char* (or simply a char) or another struct with the three data fields from THAT. Of course you have to remember to free() the data in your free_node() function.

Yet another way is through some primitive inheritance:
struct node {
struct node *next;
};
struct this {
struct node mynode;
...
};
struct that {
struct node mynode;
...
};
void free_any(struct node *this)
{
struct node *this_tmp;
while ((this_tmp = this))
{
this = this_tmp->next;
free(this_tmp);
}
}
This only works if "node" is at the top of the structures, and only allows you to thread one linked-list through these structures.
Also, this doesn't allow you to free anything specific to that type of structure; to do that, you would have to setup a callback function ( either by passing it in the free or in some control structure ) that would get invoked. I would probably instead implement a "pop" function which removes the element from the list, and to free the entire list I would pop off each element and then free them as required.
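A sketch of the callback idea, assuming the callback is passed directly to the free function. The release parameter, release_this and the str member are hypothetical, and free_any returns the number of elements freed only so the result can be checked:

```c
#include <stdlib.h>

struct node {
    struct node *next;
};

/* Frees every element. release (may be NULL) is called first so the
   caller can free members specific to the enclosing struct.
   Returns the number of elements freed. */
size_t free_any(struct node *list, void (*release)(struct node *)) {
    size_t freed = 0;
    while (list) {
        struct node *next = list->next;
        if (release)
            release(list);
        free(list);            /* valid because node is the first member */
        list = next;
        freed++;
    }
    return freed;
}

struct this {
    struct node mynode;        /* must be first */
    char *str;
};

/* Callback for struct this: frees the string it owns. */
void release_this(struct node *n) {
    free(((struct this *)n)->str);
}
```

A second list type would supply its own callback, for example one that frees id, while sharing the same loop.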

There are more fancy ways to do what you want - but the following example will suffice.
void free_that(void *mem, int type)
{
switch(type) {
case THIS_FLAG: {
THIS *this = mem, *next;
for(; this != NULL; this = next) {
next = this->next; /* save before freeing the node */
free(this->str);
free(this);
}
break;
}
case THAT_FLAG: {
THAT *that = mem, *next;
for(; that != NULL; that = next) {
next = that->next;
free(that->id);
free(that);
}
break;
}
default:
free(mem);
}
}
The more fancy way would be to add a void *mem as the first element in the structure and assign str and id as pointers that point to mem (where you malloc the memory). Doing this allows you to either always free the mem element or free the offset of zero cast to void*.