I have a struct of the form:
typedef struct node {
unsigned int * keys;
unsigned int * branches;
} NODE;
The number of keys and branches is determined at runtime, but is known. It is derived from another struct:
typedef struct tree {
unsigned int num_keys_per_node;
} TREE;
In order to allocate a NODE for this TREE, the manual steps would be:
NODE node;
unsigned int keys[tree->num_keys_per_node];
unsigned int branches[tree->num_keys_per_node + 1];
node.keys = keys;
node.branches = branches;
I need to allocate a lot of these nodes inside tight loops, only temporarily as I traverse a data structure, discarding them quickly as the node traversal continues. I could write a function that returns a pointer and malloc() the keys and branches on the heap and free() them manually, but I'd prefer to use the stack if possible.
Since this initialization logic is going to be repeated in a number of places, how can I define a macro, so that I can effectively do something like:
NODE node = CREATE_NODE_FOR_TREE(tree);
I'm having difficulty seeing a way to do this that makes the preprocessor produce valid syntax.
Happy to hear other approaches to dynamic struct allocation on stack memory too.
EDIT | I should never need more than one node in memory at the same time, so I can re-use the one struct repeatedly too.
Try to pass node as argument to the macro like so:
#define CREATE_NODE_FOR_TREE( \
node, \
tree) \
\
unsigned int keys[tree->num_keys_per_node]; \
unsigned int branches[tree->num_keys_per_node + 1]; \
\
node.keys = keys; \
node.branches = branches;
...
NODE node = {0};
CREATE_NODE_FOR_TREE(node, tree);
...
This solution assumes at least C99.
Compound literals can't be VLA and your size parameter is dynamic, so there is no possibility to do that directly with the syntax that you propose. I'd do the following:
#define NODE_ON_STACK(NAME, TREE) \
NODE NAME = { 0 }; \
register size_t NAME ## count = (TREE)->num_keys_per_node; \
auto unsigned int NAME ## keys[NAME ## count]; \
auto unsigned int NAME ## branches[NAME ## count + 1]; \
NAME.keys = NAME ## keys; \
NAME.branches = NAME ## branches
This works at any place in function scope where several declarations can be placed. register and auto ensure that it is never used in file scope. The NAME ## count variable ensures that the TREE argument is only evaluated once. You also could mangle the names of the identifiers that are generated a bit more to avoid conflicts, if you like.
Nitpicks:
always initialize struct variables
your TREE thing got somehow wrong in the question
the & operator in your question was wrong
int as an integer type is almost certainly wrong, things that are counting stuff should be unsigned
unsigned int is also wrong, size_t is usually best for everything that is supposed to count objects or parts of them.
Ah, and the usual warning tag: VLAs as auto variables are to be handled carefully because of stack overflow. But you knew that already, I suppose.
I think that something like the following might work, though I'm not sure if it's the best way to do this:
#define PASTE2(x,y) x##y
#define PASTE(x,y) PASTE2(x,y)
#define CREATE_NODE_FOR_TREE( n, tree) \
NODE n; \
unsigned int PASTE(n,_keys)[(tree)->num_keys_per_node]; \
unsigned int PASTE(n,_branches)[(tree)->num_keys_per_node + 1]; \
n.keys = PASTE(n,_keys); \
n.branches = PASTE(n,_branches);
The token pasting is there so that if at some point you needed to use more than one NODE at a time, the 'hidden' keys and branches locals would have their names 'scoped' to the NODE name to avoid a conflict.
To use it, instead of
NODE node = CREATE_NODE_FOR_TREE(tree);
You'd declare and initialize node like so:
CREATE_NODE_FOR_TREE(node, tree);
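For what it's worth, here is a minimal sketch of how a declaration-style macro like the ones above might be used inside a function. The macro name DECLARE_NODE_FOR_TREE and the function sum_of_filled_keys are made up for illustration, and it assumes a compiler with VLA support (C99, or C11 with VLAs available) and at least one key per node, since a zero-length VLA is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    unsigned int *keys;
    unsigned int *branches;
} NODE;

typedef struct tree {
    unsigned int num_keys_per_node;
} TREE;

/* Hypothetical macro: declares NODE `n` backed by two VLAs in the current
 * scope. The tree expression is evaluated once, into n##_count. */
#define DECLARE_NODE_FOR_TREE(n, t) \
    size_t n##_count = (t)->num_keys_per_node; \
    unsigned int n##_keys[n##_count]; \
    unsigned int n##_branches[n##_count + 1]; \
    NODE n = { n##_keys, n##_branches }

/* Fills the temporary node's keys with 1..count and returns their sum. */
unsigned int sum_of_filled_keys(const TREE *tree)
{
    assert(tree->num_keys_per_node > 0); /* VLA sizes must be positive */
    DECLARE_NODE_FOR_TREE(node, tree);
    unsigned int sum = 0;
    for (size_t i = 0; i < node_count; i++) {
        node.keys[i] = (unsigned int)i + 1;
        sum += node.keys[i];
    }
    return sum; /* the arrays vanish when the function returns */
}
```

Because the arrays live in the caller's scope, the node must not outlive the block in which the macro was expanded.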
Last week I was playing with a threaded suffix tree. The tree is too large to use recursion for traversal, and I have solved the problem in various ways before--using explicit stacks, continuations, you name it--and this time I added threaded pointers from all nodes, so I could traverse the tree without any additional allocation along the way.
The basic structure of nodes is
struct node
{
// more data here...
struct node *child;
tagged_ptr sibling;
};
where tagged_ptr is a pointer to struct node but with the lowest bit used to indicate whether it is pointing at a true sibling or pointing up to a sibling of an ancestor, where the traversal would go after traversing a sub-tree.
The idea is that you can traverse a (sub-)tree just following either child or sibling pointers:
static inline struct node *next(struct node *n)
{
return n->child ? n->child : tp_pointer(n->sibling);
}
...
struct node *sentinel = tp_pointer(n->sibling);
for (; n != sentinel; n = next(n))
// do stuff with n
(the sentinel is where you return to after seeing the entire subtree of n), or you can traverse just the children of a node, when searching down the tree, with
static inline struct node *next_sibling(struct node *n)
{
return tp_is_taggged(n->sibling) ? 0 : tp_pointer(n->sibling);
}
...
for (struct node *child = n->child;
child;
child = next_sibling(child))
// do something with child...
For this idea, I need to be able to distinguish between true sibling pointers and threaded pointers. At least I think so, I haven't figured out how to recognise when I am through the true children otherwise.
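To make the threading concrete, here is a small self-contained sketch of the two traversals on a four-node tree. The helper names, the id field and build_demo_tree are assumptions of this sketch, not the real suffix tree code, and it relies on the usual (implementation-defined) pointer-to-integer round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t tagged_ptr;

struct node {
    int id;
    struct node *child;
    tagged_ptr sibling; /* low bit set: thread to an ancestor's sibling */
};

static inline tagged_ptr tag_ptr(void *p, bool tag)
{
    assert(((tagged_ptr)p & 1) == 0); /* low bit must be free */
    return (tagged_ptr)p | (tagged_ptr)tag;
}
static inline bool tp_is_tagged(tagged_ptr tp) { return tp & 1; }
static inline void *tp_pointer(tagged_ptr tp) { return (void *)(tp & ~(tagged_ptr)1); }

static inline struct node *next(struct node *n)
{
    return n->child ? n->child : tp_pointer(n->sibling);
}
static inline struct node *next_sibling(struct node *n)
{
    return tp_is_tagged(n->sibling) ? NULL : tp_pointer(n->sibling);
}

/* Visits n and everything below it, stopping at n's successor. */
int count_subtree(struct node *n)
{
    int count = 0;
    struct node *sentinel = tp_pointer(n->sibling);
    for (; n != sentinel; n = next(n))
        count++;
    return count;
}

/* Visits only the direct children of n. */
int count_children(struct node *n)
{
    int count = 0;
    for (struct node *c = n->child; c; c = next_sibling(c))
        count++;
    return count;
}

/* Hypothetical demo tree: root(0) has children A(1) and B(2); A has child C(3). */
void build_demo_tree(struct node t[4])
{
    t[0] = (struct node){ 0, &t[1], tag_ptr(NULL, true) };
    t[1] = (struct node){ 1, &t[3], tag_ptr(&t[2], false) }; /* true sibling */
    t[2] = (struct node){ 2, NULL, tag_ptr(NULL, true) };
    t[3] = (struct node){ 3, NULL, tag_ptr(&t[2], true) };   /* thread up to B */
}
```

The last child of every node carries a tagged pointer, which is what lets next_sibling stop while next keeps going.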
That is where the tagged pointers come in. The alignment of struct node is higher than one
_Static_assert(_Alignof(struct node) > 1,
"Nodes must have alignment higher than one.");
so the least significant bit is free, and I can exploit that. I've used that a couple of times before, and it isn't too difficult to get a tagged pointer. It could be something like this:
typedef uintptr_t tagged_ptr;
static inline tagged_ptr tp_set(tagged_ptr tp) { return tp | 1; }
static inline tagged_ptr tp_unset(tagged_ptr tp) { return tp & ~1; }
static inline void * tp_pointer(tagged_ptr tp) { return (void *)tp_unset(tp); }
static inline bool tp_is_taggged(tagged_ptr tp) { return tp & 1; }
static inline tagged_ptr tag_ptr(void *ptr, bool tag) { return (tagged_ptr)ptr | tag; }
It just bothers me that I am throwing away all type information with this approach. I use the type uintptr_t instead of struct node * so I don't accidentally follow a pointer with a tag, but that is as far as type safety goes. Nothing will prevent me from setting a tagged pointer to struct node * to a pointer to int *.
Of course, in this application it isn't much of an issue. There is only one kind of tagged pointers and I can make sure to cast to the right type. I need some casting anyway to get the bits in the pointer. But I was wondering how far you could get with a generic tagged pointer, if you wanted more type safety.
I can get part of the way. I can define tagged pointers that remember their type, and I can ensure that you only assign the right kind of pointers to them. Using a union of a pointer and uintptr_t, I make sure that you cannot assign a pointer of the wrong type:
#define tagged_ptr(T) \
_Static_assert(sizeof(T *) == sizeof(uintptr_t), \
"Pointer type must match size of uintptr_t"); \
union { \
T *ptr; \
uintptr_t bits; \
}
#define tp_set(TP, P, TAG) \
do \
{ \
(TP).ptr = P; \
(TP).bits |= ((TAG)&1); \
} while (0)
#define tp_tag(TP) \
((TP).bits & 1)
Now you can declare tagged pointers of different types and you can assign to them and tag them, but only with the right type of pointers.
struct foo
{
int a, b;
tagged_ptr(struct foo) t;
};
_Static_assert(_Alignof(struct foo) > 1,
"Least significant bit must be free for tags.");
_Static_assert(_Alignof(int) > 1,
"Least significant bit must be free for tags.");
...
struct foo *x = malloc(sizeof *x);
tp_set(x->t, x, 1);
assert(tp_tag(x->t) == 1);
int i = 42;
tagged_ptr(int) tip;
tp_set(tip, &i, 0);
assert(tp_tag(tip) == 0);
//tp_set(x->t, &i, 0); // error
//tp_set(tip, x, 0); // error
However, I cannot get the pointer back without using compiler extensions.
If I have __typeof__ I could do this:
#define tp_ptr(TP) \
((__typeof__((TP).ptr))((TP).bits & ~1))
It gets the type from the tagged pointer and returns that, thus keeping the type-checker in the loop.
If I don't have __typeof__ but I have GCC's statement expressions I can provide the type, create a new tagged pointer where I can mask the bits out, check the type of the pointer, mask, and return:
#define tp_ptr(T, TP) \
({ tagged_ptr(T) tp; \
tp.ptr = (TP).ptr; /* checks type */ \
tp.bits &= ~1; \
tp.ptr; })
Is there a more portable way to get the type information preserved when extracting the pointer, i.e. a way to mask the last bit out without throwing the type information away completely? I have to cast to get the bits, of course, but I can preserve the type and cast back with the two approaches above. They just require compiler extensions, so they are not standard compliant.
I realise that it is a bit silly to go for standard C solutions here, considering that the second I start fiddling with the bits in a pointer I have left portability and entered implementation/undefined behaviour, but using the low bit in this way is likely to work more places than compiler extensions will, and I am curious if there is a way to do it.
I don't strictly need it, it just bothers me that I don't know how to do it. I would love either to know that it cannot be done, or know how to do it. Either will suit me equally well. Not knowing bothers me.
As it usually is, you come up with a solution right after you ask...
This will work, and in standard C:
#define tp_ptr(T, TP) \
((T *)0 == (TP).ptr, (T *)((TP).bits & ~1))
I don't need statement expressions if I don't need new variables, and I can simply check the type of the pointer against a NULL pointer of the desired type before I return the cast, masked bits in a comma expression.
The (T *)0 == (TP).ptr expression does the type-checking. After I have made sure that T is the right type, I can return a pointer of that type.
I don't use the result of the comparison, so it doesn't matter if I have a NULL pointer or not, and I would expect any compiler to optimise the comparison away.
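Putting the pieces together, a compact sketch of the union-based tagged pointer with the comma-expression extraction could look like this. struct item and follow_tagged_link are invented for the example, the static assertions are left out for brevity, and I cast the discarded comparison to void only to keep compilers quiet about an unused result.

```c
#include <assert.h>
#include <stdint.h>

#define tagged_ptr(T) union { T *ptr; uintptr_t bits; }

#define tp_set(TP, P, TAG) \
    do { (TP).ptr = (P); (TP).bits |= ((TAG) & 1); } while (0)

#define tp_tag(TP) ((TP).bits & 1)

/* The comparison only exists so the compiler checks T against the stored
 * pointer type; the second operand does the actual masking and cast. */
#define tp_ptr(T, TP) \
    ((void)((T *)0 == (TP).ptr), (T *)((TP).bits & ~(uintptr_t)1))

/* Hypothetical payload type for the demonstration. */
struct item {
    int value;
    tagged_ptr(struct item) link;
};

int follow_tagged_link(void)
{
    struct item target = { 9, { 0 } };
    struct item source = { 7, { 0 } };

    tp_set(source.link, &target, 1);
    assert(tp_tag(source.link) == 1);

    struct item *p = tp_ptr(struct item, source.link);
    assert(p == &target);
    return p->value;
}
```

Asking for the wrong type, e.g. tp_ptr(int, source.link), compares an int * with a struct item *, which compilers diagnose (whether as a warning or an error depends on the compiler and flags).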
I am looking at <linux/kfifo.h>, specifically the DECLARE_KFIFO macro, and I can't figure out why it uses a union.
#define DECLARE_KFIFO(name, size) \
union { \
struct kfifo name; \
unsigned char name##kfifo_buffer[size + sizeof(struct kfifo)]; \
}
My questions are as follows:
What is the point of using a union here? What design goal does that solve? Is it for performance reasons?
How would you even access this anonymous union? For example, this does not work:
#include <stdio.h>
int main()
{
union {
int a;
float b;
};
a = 10;
}
main.c:18:5: error: ‘a’ undeclared (first use in this function)
a = 10;
^
Why declare something that cannot be used?
How does this work with the INIT_KFIFO macro? How is it able to access the union, and what relation does it have to the previous macro?
If my memory serves me correctly, in C you can only use one member of a union at a time. So what is going on here? Will it just use one member of the declared union?
Note: This code is from the 2.6.33 kernel. I know it's old code, but the recent kernel 5.6.12 still uses a union in kfifo, only it's a lot more complicated.
The comments to the macro explain almost everything:
/**
* DECLARE_KFIFO - macro to declare a kfifo and the associated buffer
* #name: name of the declared kfifo datatype
* #size: size of the fifo buffer. Must be a power of two.
*
* Note1: the macro can be used inside struct or union declaration
* Note2: the macro creates two objects:
* A kfifo object with the given name and a buffer for the kfifo
* object named name##kfifo_buffer
*/
This macro is only used as a field of some other structure or union. Used that way, the macro creates (allocates inside the enclosing structure or union):
a kfifo object which can be accessed as a field with the name name, and
a buffer for the kfifo.
So, structure declaration like
struct my_struct {
int a;
char b;
DECLARE_KFIFO(my_fifo, 100);
};
has a similar effect to
struct my_struct {
int a;
char b;
// This field may be used for call kfifo functions
struct kfifo my_fifo;
// This field is never used directly.
// Pointer to this buffer is stored in the '.buffer' field of kfifo object.
unsigned char buffer_internal_to_kfifo_implementation[size];
};
It would be more natural to declare the two objects with an anonymous structure of two fields instead of a union:
#define DECLARE_KFIFO(name, size) \
struct { \
struct kfifo name; \
unsigned char name##kfifo_buffer[size]; \
}
It must be anonymous to allow direct access to its name field.
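As an illustration of why the anonymity matters, here is a small sketch outside the kernel. struct ring, DECLARE_RING and struct device are made-up stand-ins, not kernel definitions, and anonymous struct members need C11 (or the corresponding compiler extension).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kfifo; not the kernel's definition. */
struct ring {
    unsigned char *buffer;
    size_t size;
};

/* Anonymous struct: both members become members of the enclosing struct. */
#define DECLARE_RING(name, n) \
    struct { \
        struct ring name; \
        unsigned char name##_storage[n]; \
    }

struct device {
    int id;
    DECLARE_RING(rx, 16);
};

/* Points the ring at the storage the macro reserved next to it. */
void device_init(struct device *d)
{
    assert(d != NULL);
    d->rx.buffer = d->rx_storage;     /* accessed as a direct member */
    d->rx.size = sizeof d->rx_storage;
}
```

This also answers the "how would you even access it" part: an anonymous struct or union only makes sense as a member of another struct or union, where its fields are reached through the enclosing object (d->rx here), not as a free-standing declaration inside a function.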
Compared to the struct version, the actual implementation of DECLARE_KFIFO via an anonymous union gives the same "field" name and allocates the same number of bytes (size) for its buffer, because the array overlaps the kfifo object and is sizeof(struct kfifo) bytes longer than size:
union { \
struct kfifo name; \
unsigned char name##kfifo_buffer[size + sizeof(struct kfifo)]; \
}
It is difficult to say why they chose a union instead of a struct.
the recent kernel 5.6.12 still uses a union in kfifo, only it's a lot more complicated.
Not quite true. Newer kernel versions use a union for an entirely different purpose:
#define __STRUCT_KFIFO_COMMON(datatype, recsize, ptrtype) \
union { \
struct __kfifo kfifo; \
datatype *type; \
const datatype *const_type; \
char (*rectype)[recsize]; \
ptrtype *ptr; \
ptrtype const *ptr_const; \
}
Here kfifo is the only field that is accessed to read or write bytes of data. All the other fields are declared only so that their types can be extracted with the typeof() operator.
So, __STRUCT_KFIFO_COMMON is just a "clever" struct kfifo declaration which knows the types of the data it contains.
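The same trick can be shown in miniature. In this sketch (all names are invented, and __typeof__ is a GCC/Clang extension, just like the kernel's typeof), the type member of the union is never read or written; it exists only so that a macro can recover the element type from the object.

```c
#include <assert.h>
#include <stddef.h>

/* `raw` holds the data pointer; `type` only carries type information. */
#define TYPED_SLOT(T) union { void *raw; T *type; }

/* Casts `raw` back to the pointer type recorded by the `type` member. */
#define SLOT_LOAD(slot) (*(__typeof__((slot).type))(slot).raw)

int load_through_slot(int *source)
{
    assert(source != NULL);
    TYPED_SLOT(int) slot = { .raw = source };
    return SLOT_LOAD(slot); /* expands to *(int *)slot.raw */
}
```

Since the extra members share storage with the real one, they cost no space, which is presumably why a union rather than a struct is used for this.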
Definition of DECLARE_KFIFO macro
#define DECLARE_KFIFO(fifo, type, size) STRUCT_KFIFO(type, size) fifo
after expanding intermediate macros gives:
#define DECLARE_KFIFO(fifo, type, size) struct { \
__STRUCT_KFIFO_COMMON(type, recsize, ptrtype); \
type buf[((size < 2) || (size & (size - 1))) ? -1 : size]; \
} fifo
So this time it is a field of structure type with the given name. As in the old definition, this structure contains two fields:
struct kfifo object (its "clever" version) and
a buffer for kfifo.
Polymorphic structs are quite common in C but often involve explicit casts which allow for accidentally casting incompatible structs.
typedef struct ID {
char name[32];
} ID;
struct IntID {
struct ID id_base;
int value;
};
struct FloatID {
struct ID id_base;
float value;
};
void id_name_set(ID *id, const char *name)
{
strlcpy(id->name, name, sizeof(id->name));
}
/* macro that happens to use 'id_name_set', this is a bit contrived */
#define ID_NAME_SET_AND_VALUE(id, name, val) \
do { \
id_name_set((ID *)(id), (name)); \
(id)->value = (val); \
} while(0)
void func(void)
{
struct { int value; } not_an_id;
/* this can crash because not_an_id doesn't have an ID as its first member */
ID_NAME_SET_AND_VALUE(&not_an_id, "name", 10);
}
The issue here is we can't type check the id argument in the macro against a single type, since it could be an ID or any struct with an ID as its first member.
A lot of code I've seen simply casts to the struct all over the place, but it seems it is possible to have a more reliable method.
Is there a way to check at compile time?
Note, for the purpose of this question, we can assume all structs use the same member name for the struct they inherit from.
Note, I was hoping to be able to use something like this...
# define CHECK_TYPE_POLYMORPHIC(val, member, struct_name) \
(void)(_Generic((*(val)), \
/* base-struct */ struct_name: 0, \
/* sub-struct */ default: (_Generic(((val)->member), struct_name: 0))))
/* --- snip --- */
/* check that `var` is an `ID`, or `var->id_base` is */
CHECK_TYPE_POLYMORPHIC(var, id_base, ID);
...but this fails for ID types in the default case, because they have no id_base member.
So far the only way I found to do this is to type-check against a complete list of all structs, which isn't ideal in some cases (there may be many, or some may be defined locally and therefore not known to the macro, see: Compile time check against multiple types in C?).
You shouldn't use casts. A cast supposes that you know what you are doing and in the worst case leads to undefined behavior. You'd have to rely on the fact that the types that you are interested in all have that struct ID field with the same name.
Then, in the case that you present where you actually have a do-while kind of functional macro, you can easily place an auxiliary variable:
#define ID_NAME_SET_AND_VALUE(id, name, val) \
do { \
ID *_id = &((id)->id_base); \
id_name_set(_id, (name)); \
(id)->value = (val); \
} while(0)
If all goes well, the extra variable costs nothing; if not, the initialization is a constraint violation and compilation fails.
In a context where you can't place a variable you could use a compound literal, something like
(ID*){ &((id)->id_base) }
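A complete, compilable sketch of the compound-literal variant might look as follows. It assumes every "derived" struct names its base member id_base, swaps the non-standard strlcpy for snprintf, and adds a small helper, id_name_is, purely for checking the result.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct ID {
    char name[32];
} ID;

struct IntID {
    ID id_base;
    int value;
};

static void id_name_set(ID *id, const char *name)
{
    assert(name != NULL);
    snprintf(id->name, sizeof id->name, "%s", name); /* always terminates */
}

/* The compound literal only initializes if &(obj)->id_base really is an
 * ID *, so a struct without such a member fails to compile. No cast. */
#define ID_NAME_SET_AND_VALUE(obj, name, val) \
    do { \
        id_name_set((ID *){ &(obj)->id_base }, (name)); \
        (obj)->value = (val); \
    } while (0)

/* Hypothetical helper used to inspect the result. */
int id_name_is(const ID *id, const char *name)
{
    return strcmp(id->name, name) == 0;
}
```

Note that this covers the derived structs only; passing a plain ID * would still need a separate macro or a _Generic dispatch, since ID itself has no id_base member.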
The closest thing in C11 (the latest C standard) for compile-time polymorphism is its
type-generic expressions using the _Generic keyword, but I am not sure it fits your needs.
The GCC compiler also gives you its __builtin_type_compatible_p with which you could build e.g. some macros.
You could also customize GCC with some MELT extensions
I want to write a linked list whose data field can store any built-in or user-defined type. In C++ I would just use a template, but how do I accomplish this in C?
Do I have to rewrite the linked list struct and all of its operations for each data type I want it to store? Unions wouldn't work because the set of types they can store is fixed in advance.
There's a reason people use languages other than C.... :-)
In C, you'd have your data structure operate with void* members, and you'd cast wherever you used them to the correct types. Macros can help with some of that noise.
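A bare-bones sketch of that idea, with a macro hiding the cast. The names list_node, list_push and NODE_AS are made up here, and the list stores pointers only, so the caller keeps ownership of the pointed-to objects.

```c
#include <assert.h>
#include <stdlib.h>

struct list_node {
    void *data;              /* points at a caller-owned object of any type */
    struct list_node *next;
};

/* Pushes `data` onto the front; returns the new head, or NULL if malloc fails. */
struct list_node *list_push(struct list_node *head, void *data)
{
    assert(data != NULL);
    struct list_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = data;
    n->next = head;
    return n;
}

/* Reads the payload as type T. Nothing checks that T is correct. */
#define NODE_AS(node, T) (*(T *)(node)->data)
```

The macro removes some of the casting noise, but the compiler still cannot tell whether NODE_AS(n, int) is being applied to a node that actually holds an int.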
There are different approaches to this problem:
using datatype void*: this means you have pointers to memory locations whose type is not further specified. When you retrieve such a pointer, you explicitly state what it points to: *(int*)(mystruct->voidptr) tells the compiler: look at the memory location mystruct->voidptr and interpret the contents as int.
another option is tricky preprocessor directives. However, getting these right is usually far from trivial:
I also found http://sglib.sourceforge.net/
Edit: For the preprocessor trick:
#include <stdio.h>
#define mytype(t) struct { t val; }
int main(int argc, char *argv[]) {
mytype(int) myint;
myint.val=6;
printf ("%d\n", myint.val);
return 0;
}
This would be a simple wrapper for types, but I think it can become quite complicated.
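Taking the wrapper idea one step further, a macro can stamp out a whole typed list per element type, roughly like a poor man's template. This is only a sketch; DEFINE_LIST and the generated names are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Generates `struct NAME` plus NAME_push and NAME_front for element type T. */
#define DEFINE_LIST(T, NAME) \
    struct NAME { \
        T value; \
        struct NAME *next; \
    }; \
    /* Returns the new head, or NULL if allocation fails. */ \
    static struct NAME *NAME##_push(struct NAME *head, T value) \
    { \
        struct NAME *n = malloc(sizeof *n); \
        if (n == NULL) \
            return NULL; \
        n->value = value; \
        n->next = head; \
        return n; \
    } \
    static T NAME##_front(const struct NAME *head) \
    { \
        assert(head != NULL); \
        return head->value; \
    }

DEFINE_LIST(int, int_list)
DEFINE_LIST(double, double_list)
```

Each instantiation is fully type-checked (int_list_push will not accept a struct double_list *), at the price of duplicated machine code and macro bodies that are awkward to step through in a debugger.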
It's less comfortable in C (there's a reason C++ is called C incremented), but it can be done with generic pointers (void *), and the application handles the type management itself.
A very nice implementation of generic data structures in C can be found in ubiqx modules, the sources are definitely worth reading.
With some care, you can do this using macros that build and manipulate structs. One of the most well-tested examples of this is the BSD "queue" library. It works on every platform I've tried (Unix, Windows, VMS) and consists of a single header file (no C file).
It has the unfortunate downside of being a bit hard to use, but it preserves as much type-safety as it can in C.
The header file is here: http://www.openbsd.org/cgi-bin/cvsweb/src/sys/sys/queue.h?rev=1.34;content-type=text%2Fplain, and the documentation on how to use it is here: http://www.openbsd.org/cgi-bin/man.cgi?query=queue.
Beyond that, no, you're stuck with losing type-safety (using (void *) all over the place) or moving to the STL.
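For a taste of what the queue macros look like in use, here is a short sketch with the singly-linked SLIST family. It assumes a system that ships <sys/queue.h> (the BSDs, macOS, glibc); the struct and function names are my own.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

struct int_entry {
    int value;
    SLIST_ENTRY(int_entry) link;   /* embeds the typed next pointer */
};

SLIST_HEAD(int_list, int_entry);   /* declares struct int_list */

/* Builds a list holding 1..n, sums it, and frees it again. */
int sum_first_n(int n)
{
    assert(n >= 0);
    struct int_list head;
    SLIST_INIT(&head);

    for (int i = 1; i <= n; i++) {
        struct int_entry *e = malloc(sizeof *e);
        if (e == NULL)
            break;                 /* sketch: give up on allocation failure */
        e->value = i;
        SLIST_INSERT_HEAD(&head, e, link);
    }

    int sum = 0;
    struct int_entry *e;
    SLIST_FOREACH(e, &head, link)
        sum += e->value;

    while (!SLIST_EMPTY(&head)) {
        e = SLIST_FIRST(&head);
        SLIST_REMOVE_HEAD(&head, link);
        free(e);
    }
    return sum;
}
```

The link field lives inside your own struct, so the elements keep their real type throughout and no void * is involved.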
Here's an option that's very flexible but requires a lot of work.
In your list node, store a pointer to the data as a void *:
struct node {
void *data;
struct node *next;
};
Then you'd create a suite of functions for each type that handle tasks like comparison, assignment, duplication, etc.:
// create a new instance of the data item and copy the value
// of the parameter to it.
void *copyInt(void *src)
{
int *p = malloc(sizeof *p);
if (p) *p = *(int *)src;
return p;
}
void assignInt(void *target, void *src)
{
// overwrite the existing instance with the source value
*(int *)target = *(int *)src;
}
// returns -1 if lhs < rhs, 0 if lhs == rhs, 1 if lhs > rhs
int testInt(void *lhs, void *rhs)
{
if (*(int *)lhs < *(int *)rhs) return -1;
else if (*(int *)lhs == *(int *)rhs) return 0;
else return 1;
}
char *intToString(void *data)
{
// ask snprintf how many characters are needed, then add room for the terminator
int len = snprintf(NULL, 0, "%d", *(int *)data);
char *s = malloc((size_t)len + 1);
if (s) snprintf(s, (size_t)len + 1, "%d", *(int *)data);
return s;
}
Then you could create a list type that has pointers to these functions, such as
struct list {
struct node *head;
void *(*cpy)(void *); // copy operation
int (*test)(void *, void *); // test operation
void (*asgn)(void *, void *); // assign operation
char *(*toStr)(void *); // get string representation
...
};
struct list myIntList;
struct list myDoubleList;
myIntList.cpy = copyInt;
myIntList.test = testInt;
myIntList.asgn = assignInt;
myIntList.toStr = intToString;
myDoubleList.cpy = copyDouble;
myDoubleList.test = testDouble;
myDoubleList.asgn = assignDouble;
myDoubleList.toStr = doubleToString;
...
Then, when you pass the list to an insert or search operation, you'd call the functions from the list object:
void addToList(struct list *l, void *value)
{
// walk the links themselves, so inserting into an empty list
// or in front of the head needs no special case
struct node **link = &l->head;
struct node *new;
while (*link != NULL && l->test((*link)->data, value) <= 0)
link = &(*link)->next;
new = malloc(sizeof *new);
if (!new)
{
// handle error here
}
else
{
new->data = l->cpy(value);
new->next = *link;
*link = new;
if (logging)
{
char *s = l->toStr(new->data);
fprintf(log, "Added value %s to list\n", s);
free(s);
}
}
}
...
i = 1;
addToList(&myIntList, &i);
f = 3.4;
addToList(&myDoubleList, &f);
By delegating the type-aware operations to separate functions called through function pointers, you now have a list structure that can store values of any type. To add support for new types, you only need to implement new copy, assign, toString, etc., functions for that new type.
There are drawbacks. For one thing, you can't use constants as function parameters (e.g., you can't do something simple like addToList(&myIntList, 1);) -- you have to assign everything to a variable first, and pass the address of the variable (which is why you need to create new instances of the data member when you add it to the list; if you just assigned the address of the variable, every element in the list would wind up pointing to the same object, which may no longer exist depending on the context).
Secondly, you wind up doing a lot of memory management; you don't just create a new instance of the list node, but you also must create a new instance of the data member. You must remember to free the data member before freeing the node. Then you're creating a new string instance every time you want to display the data, and you have to remember to free that string when you're done with it.
Finally, this solution throws type safety right out the window and into oncoming traffic (after lighting it on fire). The delegate functions are counting on you to keep the types straight; there's nothing preventing you from passing the address of a double variable to one of the int handling functions.
Between the memory management and the fact that you must make a function call for just about every operation, performance is going to suffer. This isn't a fast solution.
Of course, this assumes that every element in the list is the same type; if you're wanting to store elements of different types in the same list, then you're going to have to do something different, such as associate the functions with each node, rather than the list overall.
I wrote a generic linked list "template" in C using the preprocessor, but it's pretty horrible to look at, and heavily pre-processed code is not easy to debug.
These days I think you'd be better off using some other code generation tool such as Python / Cog: http://www.python.org/about/success/cog/
I agree with JonathanPatschke's answer that you should look at sys/queue.h, although I've never tried it myself, as it is not on some of the platforms I work with. I also agree with Vicki's answer to use Python.
But I've found that five or six very simple C macros meet most of my garden-variety needs. These macros help clean up ugly, bug-prone code, without littering it with hidden void *'s, which destroy type-safety. Some of these macros are:
#define ADD_LINK_TO_END_OF_LIST(add, head, tail) \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = (tail)->next = (add)
#define ADD_DOUBLE_LINK_TO_END_OF_LIST(add, head, tail) \
if (!(head)) \
(tail) = (head) = (add); \
else \
(tail) = ((add)->prev = (tail), (tail)->next = (add))
#define FREE_LINK_IN_LIST(p, dtor) do { /* singly-linked */ \
void *myLocalTemporaryPtr = (p)->next; \
dtor(p); \
(p) = myLocalTemporaryPtr;} while (0)
#define FREE_LINKED_LIST(p, dtor) do { \
while (p) \
FREE_LINK_IN_LIST(p, dtor);} while (0)
// copy "ctor" (shallow)
#define NEW_COPY(p) memcpy(myMalloc(sizeof *(p)), p, sizeof *(p))
// iterator
#define NEXT_IN_LIST(p, list) ((p) ? (p)->next : (list))
So, for example:
struct MyContact {
char *name;
char *address;
char *telephone;
...
struct MyContact *next;
} *myContactList = 0, *myContactTail; // the tail doesn't need to be init'd
...
struct MyContact newEntry = {0};
...
ADD_LINK_TO_END_OF_LIST(NEW_COPY(&newEntry), myContactList, myContactTail);
...
struct MyContact *i = 0;
while ((i = NEXT_IN_LIST(i, myContactList))) // iterate through list
// ...
The next and prev members have hard-coded names. They don't need to be void *, which avoids problems with strict aliasing. They do need to be zeroed when the data item is created.
The dtor argument for FREE_LINK_IN_LIST would typically be a function like free, or (void) to do nothing, or another macro such as:
#define MY_CONTACT_ENTRY_DTOR(p) \
do { if (p) { \
free((p)->name); \
free((p)->address); \
free((p)->telephone); \
free(p); \
}} while (0)
So for example, FREE_LINKED_LIST(myContactList, MY_CONTACT_ENTRY_DTOR) would free all the members of the (duck-typed) list headed by myContactList.
There is one void * here, but perhaps it could be removed via gcc's typeof.
If you need a list that can hold elements of different types simultaneously, e.g. an int followed by three char * followed by a struct tm, then using void * for the data is the solution. But if you only need multiple list types with identical methods, the best solution depends on if you want to avoid generating many instances of almost identical machine code, or just avoid typing source code.
A struct declaration doesn't generate any machine code...
struct int_node {
void *next;
int data;
};
struct long_node {
void *next;
long data;
};
...and one single function which uses a void * parameter and/or return value, can handle them all.
struct generic_node {
void *next;
};
void *insert(void *before_this, void *element, size_t element_sizes);
The preface to this question is that I realize C macros are a touchy subject. Many times they can be replaced by a non-macro solution that is more secure and not subject to classic problems like incremented arguments. With that out of the way: I have a hash table implementation in C with linked nodes for collisions. I'm sure most have seen this a million times, but it goes a bit like this.
typedef struct tnode_t {
char* key; void* value; struct tnode_t* next;
} tnode_t;
typedef struct table_t {
tnode_t** nodes;
unsigned long node_count;
unsigned long iterator; // see macro below
...
} table_t;
I would like to provide an abstracted way of iterating through the nodes. I considered using a function which takes a function pointer and applies the function to each node but I often find this kind of solution very limiting so I came up with this macro:
#define tbleach(table, node) \
for(node=table->nodes[table->iterator=0];\
table->iterator<table->node_count;\
node=node?node->next:table->nodes[++table->iterator])\
if (node)
Which can be used like:
tnode_t* n;
tbleach(mytable, n) {
do_stuff_to(n->key, n->value);
}
The only downside I can see is that the iterator index is a part of the table so obviously you could not have two loops going on at the same time in the same table. I am not sure how to resolve this but I don't see it as a deal breaker considering how useful this little macro would be. So my question.
** updated **
I incorporated Zack and Jens's suggestions, removing the problem with "else" and declaring the iterator inside the for statement. Everything appears to work, but Visual Studio complains that "type name is not allowed" where the macro is used. I am wondering what exactly happens here because it compiles and runs, but I am not sure where the iterator is scoped.
#define tbleach(table, node) \
for(node=table->nodes[0], unsigned long i=0;\
i<table->node_count;\
node=node?node->next:table->nodes[++i])\
if (!node) {} else
Is this approach bad form and if not is there any way to improve it?
The only really unacceptable thing in there is what you already said -- the iterator is part of the table. You should pull that out like so:
typedef unsigned long table_iterator_t;
#define tbleach(table, iter, node) \
for ((iter) = 0, (node) = (table)->nodes[(iter)]; \
(iter) < (table)->node_count; \
(node) = ((node) && (node)->next) \
? (node)->next : (table)->nodes[++(iter)])
// use:
table_iterator_t i;
tnode_t *n;
tbleach(mytable, i, n) {
do_stuff_to(n->key, n->value);
}
I also sucked the if statement into the for-loop expressions because it's slightly safer that way (weird things will not happen if the next token after the close brace of the loop body is else). Note that the array entry table->nodes[table->node_count] will be read from, unlike the usual convention, so you need to allocate space for it (and make sure it's always NULL). This was true with your version as well, I think.
EDIT: Corrected logic for case where a table entry is NULL.
In addition to Zack's answer about the iterator and about the terminating if that may obfuscate the syntax:
If you have C99, use local variables in the for loop. This will avoid bad surprises of variables of the surrounding scope that will hold dangling pointers. Use something like this inside the macro:
for(nodeS node = ...
And, names with _t at the end are reserved by POSIX. So better not use them for your own types.
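One way to combine these suggestions is to bundle the bucket index and the current node into a single iterator struct, which can then be declared inside the for statement (C99). The sketch below is an assumption-laden variant, not the code from either answer: the type and function names are invented, _t suffixes are avoided, empty buckets are skipped by a helper, and nothing past the end of the bucket array is read.

```c
#include <assert.h>
#include <stddef.h>

struct tnode {
    const char *key;
    void *value;
    struct tnode *next;
};

struct table {
    struct tnode **nodes;
    unsigned long node_count;
};

struct table_iter {
    unsigned long bucket;
    struct tnode *node;
};

/* If the iterator has run off a chain, move on to the next non-empty bucket. */
static void table_settle(const struct table *t, struct table_iter *it)
{
    while (it->node == NULL && ++it->bucket < t->node_count)
        it->node = t->nodes[it->bucket];
}

static struct table_iter table_begin(const struct table *t)
{
    struct table_iter it = { 0, t->node_count ? t->nodes[0] : NULL };
    table_settle(t, &it);
    return it;
}

static void table_advance(const struct table *t, struct table_iter *it)
{
    it->node = it->node->next;
    table_settle(t, it);
}

/* The iterator is local to the loop, so loops over one table can nest. */
#define TABLE_FOREACH(it, tbl) \
    for (struct table_iter it = table_begin(tbl); \
         it.node != NULL; \
         table_advance((tbl), &it))

unsigned long table_count(const struct table *t)
{
    assert(t != NULL);
    unsigned long n = 0;
    TABLE_FOREACH(it, t)
        n++;
    return n;
}
```

Inside the loop body the current entry is it.node, e.g. do_stuff_to(it.node->key, it.node->value). Because the macro expands to a plain for statement with no trailing if, a following else cannot bind to it by accident.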