How evil is this foreach C macro?

The preface to this question is that I realize C macros are a touchy subject. Many times they can be replaced by a non-macro solution that is safer and not subject to classic problems like incremented arguments being evaluated more than once. With that out of the way: I have a hash table implementation in C with linked nodes for collisions. I'm sure most have seen this a million times, but it goes a bit like this:
typedef struct tnode_t {
char* key; void* value; struct tnode_t* next;
} tnode_t;
typedef struct table_t {
tnode_t** nodes;
unsigned long node_count;
unsigned long iterator; // see macro below
...
} table_t;
I would like to provide an abstracted way of iterating through the nodes. I considered using a function which takes a function pointer and applies the function to each node but I often find this kind of solution very limiting so I came up with this macro:
#define tbleach(table, node) \
for(node=table->nodes[table->iterator=0];\
table->iterator<table->node_count;\
node=node?node->next:table->nodes[++table->iterator])\
if (node)
Which can be used like:
tnode_t* n;
tbleach(mytable, n) {
do_stuff_to(n->key, n->value);
}
The only downside I can see is that the iterator index is a part of the table so obviously you could not have two loops going on at the same time in the same table. I am not sure how to resolve this but I don't see it as a deal breaker considering how useful this little macro would be. So my question.
** updated **
I incorporated Zack's and Jens's suggestions, removing the problem with "else" and declaring the iterator inside the for statement. Everything appears to work, but Visual Studio complains that "type name is not allowed" where the macro is used. I am wondering what exactly happens here, because it compiles and runs but I am not sure where the iterator is scoped.
#define tbleach(table, node) \
for(node=table->nodes[0], unsigned long i=0;\
i<table->node_count;\
node=node?node->next:table->nodes[++i])\
if (!node) {} else
Is this approach bad form and if not is there any way to improve it?

The only really unacceptable thing in there is what you already said -- the iterator is part of the table. You should pull that out like so:
typedef unsigned long table_iterator_t;
#define tbleach(table, iter, node) \
for ((iter) = 0, (node) = (table)->nodes[(iter)]; \
(iter) < (table)->node_count; \
(node) = ((node) && (node)->next) \
? (node)->next : (table)->nodes[++(iter)])
// use:
table_iterator_t i;
tnode_t *n;
tbleach(mytable, i, n) {
do_stuff_to(n->key, n->value);
}
I also sucked the if statement into the for-loop expressions because it's slightly safer that way (weird things will not happen if the next token after the close brace of the loop body is else). Note that the array entry table->nodes[table->node_count] will be read from, unlike the usual convention, so you need to allocate space for it (and make sure it's always NULL). This was true with your version as well, I think.
EDIT: Corrected logic for case where a table entry is NULL.
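One possible variation on the macro above, as a minimal sketch: the stepping logic moves into two small helper functions, so empty buckets are skipped, the loop body never sees a NULL node, and nodes[node_count] is never read. The names tnode, table, table_scan, table_next, TABLE_FOREACH and table_foreach_demo are made up for this illustration and are not part of the question or the answer.

```c
#include <stddef.h>

typedef struct tnode {
    char *key;
    void *value;
    struct tnode *next;
} tnode;

typedef struct table {
    tnode **nodes;              /* node_count bucket heads, NULL when empty */
    unsigned long node_count;
} table;

/* Return the head of the first non-empty bucket at index >= *iter, or NULL. */
static inline tnode *table_scan(const table *t, unsigned long *iter)
{
    for (; *iter < t->node_count; ++*iter)
        if (t->nodes[*iter])
            return t->nodes[*iter];
    return NULL;
}

/* Step to the next node in the chain, or to the next non-empty bucket. */
static inline tnode *table_next(const table *t, unsigned long *iter, tnode *node)
{
    if (node->next)
        return node->next;
    ++*iter;
    return table_scan(t, iter);
}

/* The caller supplies the iterator, so nested loops over one table work. */
#define TABLE_FOREACH(t, iter, node)                     \
    for ((iter) = 0, (node) = table_scan((t), &(iter));  \
         (node) != NULL;                                 \
         (node) = table_next((t), &(iter), (node)))

/* Usage: count the nodes of a 4-bucket table whose buckets 0 and 2 are empty. */
unsigned long table_foreach_demo(void)
{
    tnode c = { "c", NULL, NULL };
    tnode b = { "b", NULL, NULL };
    tnode a = { "a", NULL, &b };
    tnode *buckets[4] = { NULL, &a, NULL, &c };
    table t = { buckets, 4 };

    unsigned long i, count = 0;
    tnode *n;
    TABLE_FOREACH(&t, i, n)
        ++count;                /* n is never NULL inside the body */
    return count;
}
```

Because the macro expands to a single plain for statement, break and a following else behave as they would with any other loop. Freeing or unlinking the current node inside the body would still break the iteration.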

In addition to Zack's answer about the iterator and about the terminating if that may obfuscate the syntax:
If you have C99, use local variables in the for loop. This will avoid bad surprises of variables of the surrounding scope that will hold dangling pointers. Use something like this inside the macro:
for(nodeS node = ...
And, names with _t at the end are reserved by POSIX. So better not use them for your own types.

Related

Type-checking tagged pointers

Last week I was playing with a threaded suffix tree. The tree is too large to use recursion for traversal, and I have solved the problem in various ways before--using explicit stacks, continuations, you name it--and this time I added threaded pointers from all nodes, so I could traverse the tree without any additional allocation along the way.
The basic structure of nodes is
struct node
{
// more data here...
struct node *child;
tagged_ptr sibling;
};
where tagged_ptr is a pointer to struct node but with the lowest bit used to indicate whether it is pointing at a true sibling or pointing up to a sibling of an ancestor, where the traversal would go after traversing a sub-tree.
The idea is that you can traverse a (sub-)tree just following either child or sibling pointers:
static inline struct node *next(struct node *n)
{
return n->child ? n->child : tp_pointer(n->sibling);
}
...
struct node *sentinel = tp_pointer(n->sibling);
for (; n != sentinel; n = next(n))
// do stuff with n
(the sentinel is where you return to after seeing the entire subtree of n), or you can traverse just the children of a node, when searching down the tree, with
static inline struct node *next_sibling(struct node *n)
{
return tp_is_taggged(n->sibling) ? 0 : tp_pointer(n->sibling);
}
...
for (struct node *child = n->child;
child;
child = next_sibling(child))
// do something with child...
For this idea, I need to be able to distinguish between true sibling pointers and threaded pointers. At least I think so, I haven't figured out how to recognise when I am through the true children otherwise.
That is where the tagged pointers come in. The alignment of struct node is higher than one
_Static_assert(_Alignof(struct node) > 1,
"Nodes must have alignment higher than one.");
so the least significant bit is free, and I can exploit that. I've used that a couple of times before, and it isn't too difficult to get a tagged pointer. It could be something like this:
typedef uintptr_t tagged_ptr;
static inline tagged_ptr tp_set(tagged_ptr tp) { return tp | 1; }
static inline tagged_ptr tp_unset(tagged_ptr tp) { return tp & ~1; }
static inline void * tp_pointer(tagged_ptr tp) { return (void *)tp_unset(tp); }
static inline bool tp_is_taggged(tagged_ptr tp) { return tp & 1; }
static inline tagged_ptr tag_ptr(void *ptr, bool tag) { return (tagged_ptr)ptr | tag; }
It just bothers me that I am throwing away all type information with this approach. I use the type uintptr_t instead of struct node * so I don't accidentally follow a pointer with a tag, but that is as far as type safety goes. Nothing will prevent me from setting a tagged pointer to struct node * to a pointer to int *.
Of course, in this application it isn't much of an issue. There is only one kind of tagged pointers and I can make sure to cast to the right type. I need some casting anyway to get the bits in the pointer. But I was wondering how far you could get with a generic tagged pointer, if you wanted more type safety.
I can get part of the way. I can define tagged pointers that remember their type, and I can ensure that you only assign the right kind of pointers to them. Using a union of a pointer and uintptr_t, I make sure that you cannot assign a pointer of the wrong type:
#define tagged_ptr(T) \
_Static_assert(sizeof(T *) == sizeof(uintptr_t), \
"Pointer type must match size of uintptr_t"); \
union { \
T *ptr; \
uintptr_t bits; \
}
#define tp_set(TP, P, TAG) \
do \
{ \
(TP).ptr = P; \
(TP).bits |= ((TAG)&1); \
} while (0)
#define tp_tag(TP) \
((TP).bits & 1)
Now you can declare tagged pointers of different types and you can assign to them and tag them, but only with the right type of pointers.
struct foo
{
int a, b;
tagged_ptr(struct foo) t;
};
_Static_assert(_Alignof(struct foo) > 1,
"Least significant bit must be free for tags.");
_Static_assert(_Alignof(int) > 1,
"Least significant bit must be free for tags.");
...
struct foo *x = malloc(sizeof *x);
tp_set(x->t, x, 1);
assert(tp_tag(x->t) == 1);
int i = 42;
tagged_ptr(int) tip;
tp_set(tip, &i, 0);
assert(tp_tag(tip) == 0);
//tp_set(x->t, &i, 0); // error
//tp_set(tip, x, 0); // error
However, I cannot get the pointer back without using compiler extensions.
If I have __typeof__ I could do this:
#define tp_ptr(TP) \
((__typeof__((TP).ptr))((TP).bits & ~1))
It gets the type from the tagged pointer and returns that, thus keeping the type-checker in the loop.
If I don't have __typeof__ but I have GCC's statement expressions I can provide the type, create a new tagged pointer where I can mask the bits out, check the type of the pointer, mask, and return:
#define tp_ptr(T, TP) \
({ tagged_ptr(T) tp; \
tp.ptr = (TP).ptr; /* checks type */ \
tp.bits &= ~1; \
tp.ptr; })
Is there a more portable way to get the type information preserved when extracting the pointer, i.e. a way to mask the last bit out without throwing the type information away completely? I have to cast to get the bits, of course, but I can preserve the type and cast back with the two approaches above. They just require compiler extensions, so they are not standard compliant.
I realise that it is a bit silly to go for standard C solutions here, considering that the second I start fiddling with the bits in a pointer I have left portability and entered implementation/undefined behaviour, but using the low bit in this way is likely to work more places than compiler extensions will, and I am curious if there is a way to do it.
I don't strictly need it, it just bothers me that I don't know how to do it. I would love either to know that it cannot be done, or know how to do it. Either will suit me equally well. Not knowing bothers me.
As it usually is, you come up with a solution right after you ask...
This will work, and in standard C:
#define tp_ptr(T, TP) \
((T *)0 == (TP).ptr, (T *)((TP).bits & ~1))
I don't need statement expressions if I don't need new variables, and I can simply check the type of the pointer against a NULL pointer of the desired type before I return the cast, masked bits in a comma expression.
The ((T *)0 == (TP).ptr) expression does the type-checking. After I have made sure that T is the right type, I can return a pointer of that type.
I don't use the result of the comparison, so it doesn't matter if I have a NULL pointer or not, and I would expect any compiler to optimise the comparison away.
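Putting the pieces together, a self-contained sketch of the standard-C version could look like this. Three details are my own additions rather than part of the macros above: the comparison is cast to void to silence "unused value" warnings, the mask is written as ~(uintptr_t)1, and tp_roundtrip_demo is a made-up helper. Converting between pointers and uintptr_t remains implementation-defined, as discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* A pointer to T that shares storage with its integer representation. */
#define tagged_ptr(T) union { T *ptr; uintptr_t bits; }

/* Store P (type-checked by the assignment) and OR in the one-bit tag. */
#define tp_set(TP, P, TAG)                          \
    do {                                            \
        (TP).ptr = (P);                             \
        (TP).bits |= (uintptr_t)((TAG) & 1);        \
    } while (0)

#define tp_tag(TP) ((TP).bits & 1)

/* The comparison only exists so the compiler checks T against the stored
 * pointer type; its result is discarded. */
#define tp_ptr(T, TP) \
    ((void)((T *)0 == (TP).ptr), (T *)((TP).bits & ~(uintptr_t)1))

struct foo { int a, b; };

/* Round trip: tag a pointer, then recover both the tag and the pointer. */
bool tp_roundtrip_demo(void)
{
    static struct foo f = { 1, 2 };
    tagged_ptr(struct foo) t;

    tp_set(t, &f, 1);
    struct foo *back = tp_ptr(struct foo, t);
    return tp_tag(t) == 1 && back == &f && back->a == 1;
}
```

Asking for the wrong type, for example tp_ptr(int, t), compares int * with struct foo *, which compilers typically diagnose as a comparison of distinct pointer types. It is often only a warning, so building with warnings treated as errors is what makes the check binding.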

Generic hashtable in C

I'm trying to create a generic hash table in C. I've read a few different implementations, and came across a couple of different approaches.
The first is to use macros like this: http://attractivechaos.awardspace.com/khash.h.html
And the second is to use a struct with 2 void pointers like this:
struct hashmap_entry
{
void *key;
void *value;
};
From what I can tell this approach isn't great because it means that each entry in the map requires at least 2 allocations: one for the key and one for the value, regardless of the data types being stored. (Is that right???)
I haven't been able to find a decent way of keeping it generic without going the macro route. Does anyone have any tips or examples that might help me out?
C does not provide what you need directly, nevertheless you may want to do something like this:
Imagine that your hash table is a fixed-size array of doubly linked lists and that it is OK for items always to be allocated/destroyed on the application layer. These conditions will not work for every case, but in many cases they will. Then you will have these data structures and sketches of functions and prototypes:
struct HashItemCore
{
HashItemCore *m_prev;
HashItemCore *m_next;
};
struct HashTable
{
HashItemCore m_data[256]; // This is actually an array of circular
// doubly linked lists.
int (*GetHashValue)(HashItemCore *item);
bool (*CompareItems)(HashItemCore *item1, HashItemCore *item2);
void (*ReleaseItem)(HashItemCore *item);
};
void InitHash(HashTable *table)
{
// Ensure that user provided the callbacks.
assert(table->GetHashValue != NULL && table->CompareItems != NULL && table->ReleaseItem != NULL);
// Init all double linked lists. Pointers of empty list should point to themselves.
for (int i=0; i<256; ++i)
table->m_data[i].m_prev = table->m_data[i].m_next = table->m_data+i;
}
void AddToHash(HashTable *table, void *item);
void *GetFromHash(HashTable *table, void *item);
....
void *ClearHash(HashTable *table);
In these functions you need to implement the logic of the hash table. While working they will be calling user defined callbacks to find out the index of the slot and if items are identical or not.
The users of this table should define their own structures and callback functions for every pair of types that they want to use:
struct HashItemK1V1
{
HashItemCore m_core;
K1 key;
V1 value;
};
int CalcHashK1V1(void *p)
{
HashItemK1V1 *param = (HashItemK1V1*)p;
// App code.
}
bool CompareK1V1(void *p1, void *p2)
{
HashItemK1V1 *param1 = (HashItemK1V1*)p1;
HashItemK1V1 *param2 = (HashItemK1V1*)p2;
// App code.
}
void FreeK1V1(void *p)
{
HashItemK1V1 *param = (HashItemK1V1*)p;
// App code if needed.
free(p);
}
This approach will not provide type safety, because items are passed around as void pointers on the assumption that every application structure starts with a HashItemCore member. It is a sort of hand-made polymorphism. It may not be perfect, but it will work.
I implemented this approach in C++ using templates. But if you strip out all the C++ niceties, in a nutshell it is exactly what I described above. I have used my table in multiple projects and it worked like a charm.
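A compact plain-C sketch of this intrusive design, under a few assumptions of mine: the callbacks take HashItemCore pointers instead of void pointers, the release callback is left out, and HASH_SLOTS, IntDoubleItem, HashIntDouble, EqualIntDouble and intrusive_hash_demo are illustrative names that do not come from the answer.

```c
#include <stdbool.h>
#include <stddef.h>

#define HASH_SLOTS 256

typedef struct HashItemCore {
    struct HashItemCore *m_prev;
    struct HashItemCore *m_next;
} HashItemCore;

typedef struct HashTable {
    HashItemCore m_data[HASH_SLOTS];  /* list heads; an empty list points at itself */
    unsigned (*GetHashValue)(const HashItemCore *item);
    bool (*CompareItems)(const HashItemCore *a, const HashItemCore *b);
} HashTable;

void InitHash(HashTable *table)
{
    for (int i = 0; i < HASH_SLOTS; ++i)
        table->m_data[i].m_prev = table->m_data[i].m_next = &table->m_data[i];
}

/* Link an application-owned item right after the head of its slot. */
void AddToHash(HashTable *table, HashItemCore *item)
{
    HashItemCore *head = &table->m_data[table->GetHashValue(item) % HASH_SLOTS];
    item->m_next = head->m_next;
    item->m_prev = head;
    head->m_next->m_prev = item;
    head->m_next = item;
}

/* Walk one circular list until we are back at its head. */
HashItemCore *GetFromHash(HashTable *table, const HashItemCore *probe)
{
    HashItemCore *head = &table->m_data[table->GetHashValue(probe) % HASH_SLOTS];
    for (HashItemCore *p = head->m_next; p != head; p = p->m_next)
        if (table->CompareItems(p, probe))
            return p;
    return NULL;
}

/* Application side: an int -> double mapping. */
typedef struct IntDoubleItem {
    HashItemCore m_core;              /* must be the first member */
    int key;
    double value;
} IntDoubleItem;

static unsigned HashIntDouble(const HashItemCore *p)
{
    return (unsigned)((const IntDoubleItem *)p)->key;
}

static bool EqualIntDouble(const HashItemCore *a, const HashItemCore *b)
{
    return ((const IntDoubleItem *)a)->key == ((const IntDoubleItem *)b)->key;
}

double intrusive_hash_demo(void)
{
    static HashTable table;
    static IntDoubleItem a = { .key = 7,   .value = 1.5 };
    static IntDoubleItem b = { .key = 263, .value = 2.5 };  /* same slot as key 7 */

    table.GetHashValue = HashIntDouble;
    table.CompareItems = EqualIntDouble;
    InitHash(&table);
    AddToHash(&table, &a.m_core);
    AddToHash(&table, &b.m_core);

    IntDoubleItem probe = { .key = 263 };
    HashItemCore *hit = GetFromHash(&table, &probe.m_core);
    return hit ? ((IntDoubleItem *)hit)->value : -1.0;
}
```

The table never allocates: each item carries its own links, so one allocation (or none, as with the static items here) covers key, value and bookkeeping.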
A generic hashtable in C is a bad idea.
a neat implementation will require function pointers, which are slow, since these functions cannot be inlined (the general case will need at least two function calls per hop: one to compute the hash value and one for the final compare)
to allow inlining of functions you'll either have to
write the code manually
or use a code generator
or macros. Which can get messy
IIRC, the linux kernel uses macros to create and maintain (some of?) its hashtables.
C does not have generic data types, so what you want to do (no extra allocations and no void* casting) is not really possible. You can use macros to generate the right data functions/structs on the fly, but you're trying to avoid macros as well.
So you need to give up at least one of your ideas.
You could have a generic data structure without extra allocations by allocating something like:
size_t key_len;
size_t val_len;
char key[];
char val[];
in one go and then handing out either void pointers, or adding an api for each specific type.
Alternatively, if you have a limited number of types you need to handle, you could also tag the value with the right one so now each entry contains:
size_t key_len;
size_t val_len;
int val_type;
char key[];
char val[];
but in the API at least you can verify that the requested type is the right one.
Otherwise, to make everything generic, you're left with either macros, or changing the language.
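As a rough sketch of the single-allocation layout: C allows only one flexible array member per struct, so the key and value bytes share one trailing array here. The names entry, entry_new, entry_key, entry_val and entry_demo are hypothetical, and the value bytes may be misaligned for their type, which is why the demo reads them back with memcpy instead of casting.

```c
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    struct entry *next;       /* collision chain */
    size_t key_len;
    size_t val_len;
    unsigned char data[];     /* key bytes, immediately followed by value bytes */
} entry;

/* One malloc covers the header, the key copy and the value copy. */
entry *entry_new(const void *key, size_t key_len,
                 const void *val, size_t val_len)
{
    entry *e = malloc(sizeof *e + key_len + val_len);
    if (!e)
        return NULL;
    e->next = NULL;
    e->key_len = key_len;
    e->val_len = val_len;
    memcpy(e->data, key, key_len);
    memcpy(e->data + key_len, val, val_len);
    return e;
}

static inline const void *entry_key(const entry *e) { return e->data; }
static inline const void *entry_val(const entry *e) { return e->data + e->key_len; }

/* Usage: store an int key with a double value and read the value back. */
double entry_demo(void)
{
    int k = 42;
    double v = 3.25, out = 0.0;

    entry *e = entry_new(&k, sizeof k, &v, sizeof v);
    if (!e)
        return -1.0;
    memcpy(&out, entry_val(e), sizeof out);
    free(e);
    return out;
}
```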

How can I concisely assign to the members of a struct depending on a condition?

I have some code that looks like this:
struct mystruct
{
/* lots of members */
};
void mystruct_init( struct mystruct* dst, int const condition )
{
if ( condition )
{
/* initialize members individually a certain way */
}
else
{
/* initialize members individually another way */
}
}
Options I'm considering:
Simplest would be to have a function that assigns to every member and call that. Should I simply hope the compiler optimizes that call away?
Define a macro to explicitly avoid the function call overhead.
Write everything the long way.
What is the proper way to handle such a scenario in C11?
Just write a function that initializes a member, or if you want (opinion based), use a MACRO.
By the way, I would personally do it like this:
void mystruct_init( struct mystruct* dst, int const condition )
{
if ( condition )
init_first_way(..);
else
init_second_way(..);
}
or just use the ternary operator. Remember, you care about readability and always have in mind:
Simplicity is a virtue!
I really think worrying about optimization at this stage will make you a victim of premature optimization, since I doubt this will be the bottleneck.
In general, if you want to optimize your code, profile it (while it is built with optimization flags; many people do not know this, and I was one of them: Poor performance of stl list on vs2015 while deleting nodes which contain iterator to self's position in list), find the bottleneck, and try to optimize that bottleneck.
I do not think that there is any clear rule here. To me, it depends on the taste of the author.
Two obvious ways are:
// initialize members that are independent of 'condition'
if (condition) {
// initialize members one way
}
else {
// initialize members another way
}
The same may be written as:
// initialize members that are independent of 'condition'
// initialize members based on 'condition'
dst->memberx = condition ? something : something_else;
// ...
Please do not worry about one function call overhead.
I agree with the answers already posted (@gsamaras and @Arun). I just wanted to show another approach that I have found useful a couple of times.
The approach is to make some constants with the two (or more) relevant initialization values and then make a simple assignment based on one (or more) conditions.
Simple example:
#include <stdio.h>
#include <string.h>
struct mystruct
{
int a;
float b;
};
const struct mystruct initializer_a = { 1, 3.4 };
const struct mystruct initializer_b = { 5, 7.2 };
int main (void)
{
int condition = 0;
struct mystruct ms = condition ? initializer_a : initializer_b;
printf("%d %f\n", ms.a, ms.b);
return 0;
}
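Applied to the mystruct_init signature from the question, the same idea might look like the following sketch. The members a and b and the constant values are carried over from the example above; first_way, second_way and mystruct_init_demo are made-up names.

```c
struct mystruct {
    int a;
    float b;
};

void mystruct_init(struct mystruct *dst, int const condition)
{
    /* Both variants live in read-only storage; designated initializers
     * keep each value next to the member it belongs to. */
    static const struct mystruct first_way  = { .a = 1, .b = 3.4f };
    static const struct mystruct second_way = { .a = 5, .b = 7.2f };

    /* A single struct assignment copies every member at once. */
    *dst = condition ? first_way : second_way;
}

/* Usage: returns member a after initializing with the given condition. */
int mystruct_init_demo(int condition)
{
    struct mystruct s;
    mystruct_init(&s, condition);
    return s.a;
}
```

Members that are not named in a designated initializer are zero-initialized, so a struct with lots of members only needs to spell out the ones that differ from zero.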

Implementing different yet similar structure/function sets without copy-paste

I'm implementing a set of common yet not so trivial (or error-prone) data structures for C (here) and just came up with an idea that got me thinking.
The question in short is, what is the best way to implement two structures that use similar algorithms but have different interfaces, without having to copy-paste/rewrite the algorithm? By best, I mean most maintainable and debug-able.
I think it is obvious why you wouldn't want to have two copies of the same algorithm.
Motivation
Say you have a structure (call it map) with a set of associated functions (map_*()). Since the map needs to map anything to anything, we would normally implement it taking a void *key and void *data. However, think of a map of int to int. In this case, you would need to store all the keys and data in another array and give their addresses to the map, which is not so convenient.
Now imagine if there was a similar structure (call it mapc, c for "copies") that during initialization takes sizeof(your_key_type) and sizeof(your_data_type) and given void *key and void *data on insert, it would use memcpy to copy the keys and data in the map instead of just keeping the pointers. An example of usage:
int i;
mapc m;
mapc_init(&m, sizeof(int), sizeof(int));
for (i = 0; i < n; ++i)
{
int j = rand(); /* whatever */
mapc_insert(&m, &i, &j);
}
which is quite nice, because I don't need to keep another array of is and js.
My ideas
In the example above, map and mapc are very closely related. If you think about it, map and set structures and functions are also very similar. I have thought of the following ways to implement their algorithm only once and use it for all of them. Neither of them, however, is quite satisfying to me.
Use macros. Write the function code in a header file, leaving the structure dependent stuff as macros. For each structure, define the proper macros and include the file:
map_generic.h
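/* Two-level paste so that NAME is expanded before ## is applied */
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
#define INSERT(x) PASTE(x, _insert)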
int INSERT(NAME)(NAME *m, PARAMS)
{
// create node
ASSIGN_KEY_AND_DATA(node)
// get m->root
// add to tree starting from root
// rebalance from node to root
// etc
}
map.c
#define NAME map
#define PARAMS void *key, void *data
#define ASSIGN_KEY_AND_DATA(node) \
do {\
node->key = key;\
node->data = data;\
} while (0)
#include "map_generic.h"
mapc.c
#define NAME mapc
#define PARAMS void *key, void *data
#define ASSIGN_KEY_AND_DATA(node) \
do {\
memcpy(node->key, key, m->key_size);\
memcpy(node->data, data, m->data_size);\
} while (0)
#include "map_generic.h"
This method is not half bad, but it's not so elegant.
Use function pointers. For each part that is dependent on the structure, pass a function pointer.
map_generic.c
int map_generic_insert(void *m, void *key, void *data,
void (*assign_key_and_data)(void *, void *, void *, void *),
map_node *(*get_root)(void *))
{
// create node
assign_key_and_data(m, node, key, data);
root = get_root(m);
// add to tree starting from root
// rebalance from node to root
// etc
}
map.c
static void assign_key_and_data(void *m, void *node, void *key, void *data)
{
map_node *n = node;
n->key = key;
n->data = data;
}
static map_node *get_root(void *m)
{
return ((map *)m)->root;
}
int map_insert(map *m, void *key, void *data)
{
return map_generic_insert(m, key, data, assign_key_and_data, get_root);
}
mapc.c
static void assign_key_and_data(void *m, void *node, void *key, void *data)
{
map_node *n = node;
mapc *mc = m;
memcpy(n->key, key, mc->key_size);
memcpy(n->data, data, mc->data_size);
}
static map_node *get_root(void *m)
{
return ((mapc *)m)->root;
}
int mapc_insert(mapc *m, void *key, void *data)
{
return map_generic_insert(m, key, data, assign_key_and_data, get_root);
}
This method requires writing more functions that could have been avoided in the macro method (as you can see, the code here is longer) and doesn't allow optimizers to inline the functions (as they are not visible to map_generic.c file).
So, how would you go about implementing something like this?
Note: I wrote the code in the stack-overflow question form, so excuse me if there are minor errors.
Side question: Does anyone have a better idea for a suffix that says "this structure copies the data instead of the pointer"? I use c, which stands for "copies", but there could be a much better word for it in English that I don't know about.
Update:
I have come up with a third solution. In this solution, only one version of the map is written, the one that keeps a copy of data (mapc). This version would use memcpy to copy data. The other map is an interface to this, taking void *key and void *data pointers and sending &key and &data to mapc so that the address they contain would be copied (using memcpy).
This solution has the downside that a normal pointer assignment is done by memcpy, but it completely solves the issue otherwise and is very clean.
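To make the third solution concrete, here is a toy sketch. The mapc below is only a stand-in (a small fixed-capacity array with linear search and byte-wise key comparison) so that the wrapper can be shown end to end; a real mapc would be a tree with a user-supplied comparison. MAPC_CAP, MAPC_MAX_ELEM, mapc_find, map_init, map_find and map_wrapper_demo are names invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

enum { MAPC_CAP = 16, MAPC_MAX_ELEM = 16 };

/* Toy mapc: keeps copies of fixed-size keys and data. */
typedef struct mapc {
    size_t key_size, data_size, count;
    unsigned char keys[MAPC_CAP][MAPC_MAX_ELEM];
    unsigned char data[MAPC_CAP][MAPC_MAX_ELEM];
} mapc;

int mapc_init(mapc *m, size_t key_size, size_t data_size)
{
    if (key_size > MAPC_MAX_ELEM || data_size > MAPC_MAX_ELEM)
        return -1;
    m->key_size = key_size;
    m->data_size = data_size;
    m->count = 0;
    return 0;
}

int mapc_insert(mapc *m, const void *key, const void *data)
{
    if (m->count == MAPC_CAP)
        return -1;
    memcpy(m->keys[m->count], key, m->key_size);
    memcpy(m->data[m->count], data, m->data_size);
    m->count++;
    return 0;
}

const void *mapc_find(const mapc *m, const void *key)
{
    for (size_t i = 0; i < m->count; ++i)
        if (memcmp(m->keys[i], key, m->key_size) == 0)
            return m->data[i];
    return NULL;
}

/* map: a thin wrapper whose "copied" elements are the pointers themselves. */
typedef struct map { mapc impl; } map;

int map_init(map *m)
{
    return mapc_init(&m->impl, sizeof(void *), sizeof(void *));
}

int map_insert(map *m, void *key, void *data)
{
    /* Pass the addresses of the pointers so their values get copied. */
    return mapc_insert(&m->impl, &key, &data);
}

void *map_find(const map *m, void *key)
{
    const void *slot = mapc_find(&m->impl, &key);
    void *data = NULL;
    if (slot)
        memcpy(&data, slot, sizeof data);
    return data;
}

/* Usage: store a pointer pair and look the data pointer up again. */
int map_wrapper_demo(void)
{
    static int k = 1, v = 99;
    map m;

    if (map_init(&m) != 0 || map_insert(&m, &k, &v) != 0)
        return -1;
    int *found = map_find(&m, &k);
    return found ? *found : -1;
}
```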
Alternatively, one can only implement the map and use an extra vectorc with mapc which first copies the data to vector and then gives the address to a map. This has the side effect that deletion from mapc would either be substantially slower, or leave garbage (or require other structures to reuse the garbage).
Update 2:
I came to the conclusion that careless users might use my library the way they write C++, copy after copy after copy. Therefore, I am abandoning this idea and accepting only pointers.
You roughly covered both possible solutions.
The preprocessor macros roughly correspond to C++ templates and have the same advantages and disadvantages:
They are hard to read.
Complex macros are often hard to use (consider type safety of parameters etc.)
They are just "generators" of more code, so in the compiled output a lot of duplication is still there.
On the other hand, they allow the compiler to optimize a lot of stuff.
The function pointers roughly correspond to C++ polymorphism and they are IMHO a cleaner and generally easier-to-use solution, but they bring some cost at runtime (for tight loops, a few extra function calls can be expensive).
I generally prefer the function calls, unless the performance is really critical.
There's also a third option that you haven't considered: you can create an external script (written in another language) to generate your code from a series of templates. This is similar to the macro method, but you can use a language like Perl or Python to generate the code. Since these languages are more powerful than the C pre-processor, you can avoid some of the potential problems inherent in doing templates via macros. I have used this method in cases where I was tempted to use complex macros like in your example #1. In the end, it turned out to be less error-prone than using the C preprocessor. The downside is that between writing the generator script and updating the makefiles, it's a little more difficult to get set up initially (but IMO worth it in the end).
What you're looking for is polymorphism. C++, C# or other object oriented languages are more suitable to this task. Though many people have tried to implement polymorphic behavior in C.
The Code Project has some good articles/tutorials on the subject:
http://www.codeproject.com/Articles/10900/Polymorphism-in-C
http://www.codeproject.com/Articles/108830/Inheritance-and-Polymorphism-in-C

getting a substruct out of a big struct in C

I have a very big struct in an existing program. This struct includes a great number of bitfields.
I wish to save a part of it (say, 10 fields out of 150).
Example code I would use to save the substruct is:
typedef struct {int a;int b;char c;} bigstruct;
typedef struct {int a;char c;} smallstruct;
void substruct(smallstruct *s,bigstruct *b) {
s->a = b->a;
s->c = b->c;
}
int save_struct(bigstruct *bs) {
smallstruct s;
substruct(&s,bs);
save_struct(s);
}
I would also like selecting which part to save not to be too much hassle, since I want to change it every now and then. The naive approach presented above is very fragile and unmaintainable. When scaling up to 20 different fields, you have to change fields both in the smallstruct and in the substruct function.
I thought of two better approaches. Unfortunately, both require me to use some external CIL-like tool to parse my structs.
The first approach is automatically generating the substruct function. I'll just set the struct of smallstruct, and have a program that would parse it and generate the substruct function according to the fields in smallstruct.
The second approach is building (with C parser) a meta-information about bigstruct, and then write a library that would allow me to access a specific field in the struct. It would be like ad-hoc implementation of Java's class reflection.
For example, assuming no struct-alignment, for struct
struct st {
int a;
char c1:5;
char c2:3;
long d;
}
I'll generate the following meta information:
int field2distance[] = {0,sizeof(int),sizeof(int),sizeof(int)+sizeof(char)};
int field2size[] = {sizeof(int),1,1,sizeof(long)};
int field2bitmask[] = {0,0x1F,0xE0,0};
char *fieldNames[] = {"a","c1","c2","d"};
I'll get the ith field with this function:
long getFieldData(void *strct,int i) {
int distance = field2distance[i];
int size = field2size[i];
int bitmask = field2bitmask[i];
void *ptr = ((char *)strct + distance);
long result;
switch (size) {
case 1: //char
result = *(char*)ptr;
break;
case 2: //short
result = *(short*)ptr;
...
}
if (bitmask == 0) return result;
return (result & bitmask) >> num_of_trailing_zeros(bitmask);
}
Both methods require extra work, but once the parser is in your makefile, changing the substruct is a breeze.
However, I'd rather do that without any external dependencies.
Does anyone have a better idea? Were my ideas any good, and is there an available implementation of them on the internet?
From your description, it looks like you have access to and can modify your original structure. I suggest you refactor your substructure into a complete type (as you did in your example), and then make that structure a field on your big structure, encapsulating all of those fields in the original structure into the smaller structure.
Expanding on your small example:
typedef struct
{
int a;
char c;
} smallstruct;
typedef struct
{
int b;
smallstruct mysub;
} bigstruct;
Accessing the smallstruct info would be done like so:
/* stack-based allocation */
bigstruct mybig;
mybig.mysub.a = 1;
mybig.mysub.c = '1';
mybig.b = 2;
/* heap-based allocation */
bigstruct * mybig = (bigstruct *)malloc(sizeof(bigstruct));
mybig->mysub.a = 1;
mybig->mysub.c = '1';
mybig->b = 2;
But you could also pass around pointers to the small struct:
void dosomething(smallstruct * small)
{
small->a = 3;
small->c = '3';
}
/* stack based */
dosomething(&(mybig.mysub));
/* heap based */
dosomething(&((*mybig).mysub));
Benefits:
No Macros
No external dependencies
No memory-order casting hacks
Cleaner, easier-to-read and use code.
If changing the order of the fields isn't out of the question, you can rearrange the bigstruct fields in such a way that the smallstruct fields are together, and then it's simply a matter of casting from one to another (possibly adding an offset).
Something like:
typedef struct {int a;char c;int b;} bigstruct;
typedef struct {int a;char c;} smallstruct;
int save_struct(bigstruct *bs) {
save_struct((smallstruct *)bs);
}
Macros are your friend.
One solution would be to move the big struct out into its own include file and then have a macro party.
Instead of defining the structure normally, come up with a selection of macros, such as BEGIN_STRUCTURE, END_STRUCTURE, NORMAL_FIELD, SUBSET_FIELD
You can then include the file a few times, redefining those macros for each pass. The first pass will turn the defines into a normal structure, with both types of field being output as normal. The second would define NORMAL_FIELD as nothing and would create your subset. The third would create the appropriate code to copy the subset fields over.
You'll end up with a single definition of the structure, that lets you control which fields are in the subset and automatically creates suitable code for you.
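A single-file variation of this idea, as a sketch: instead of including a header several times, the field list is one macro that receives the two field macros as arguments. BIG_FIELDS, DECLARE, NOTHING, COPY and substruct_demo are illustrative names; the fields a, b and c are taken from the question's small example.

```c
/* One list describes every field; SUBSET_FIELD marks the ones to save. */
#define BIG_FIELDS(NORMAL_FIELD, SUBSET_FIELD) \
    SUBSET_FIELD(int,  a)                      \
    NORMAL_FIELD(int,  b)                      \
    SUBSET_FIELD(char, c)

#define DECLARE(type, name) type name;
#define NOTHING(type, name)
#define COPY(type, name)    dst->name = src->name;

/* Pass 1: every field becomes a member of the big structure. */
typedef struct { BIG_FIELDS(DECLARE, DECLARE) } bigstruct;

/* Pass 2: normal fields vanish, leaving only the subset. */
typedef struct { BIG_FIELDS(NOTHING, DECLARE) } smallstruct;

/* Pass 3: generate one assignment per subset field. */
void substruct(smallstruct *dst, const bigstruct *src)
{
    BIG_FIELDS(NOTHING, COPY)
}

/* Usage: returns 1 when the subset fields were copied correctly. */
int substruct_demo(void)
{
    bigstruct big = { .a = 1, .b = 2, .c = 'x' };
    smallstruct small;

    substruct(&small, &big);
    return small.a == 1 && small.c == 'x';
}
```

Moving a field in or out of the subset is then a one-word change in BIG_FIELDS. Bit-fields would need one more macro parameter for the width, which this sketch leaves out.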
Just to help you in getting your metadata, you can refer to the offsetof() macro, which also has the benefit of taking care of any padding you may have.
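For instance, a metadata table built with offsetof could look like the sketch below. The struct is a simplified version of the question's struct st without the bit-fields, because offsetof cannot be applied to a bit-field member; field_info, st_fields, get_field and offsetof_demo are made-up names.

```c
#include <stddef.h>
#include <string.h>

struct st {
    int a;
    char c;
    long d;
};

typedef struct {
    const char *name;
    size_t offset;    /* byte offset inside struct st, padding included */
    size_t size;
} field_info;

static const field_info st_fields[] = {
    { "a", offsetof(struct st, a), sizeof(int)  },
    { "c", offsetof(struct st, c), sizeof(char) },
    { "d", offsetof(struct st, d), sizeof(long) },
};

/* Copy the raw bytes of field i into out, which must have room for them. */
void get_field(const struct st *s, size_t i, void *out)
{
    memcpy(out, (const char *)s + st_fields[i].offset, st_fields[i].size);
}

/* Usage: read member d back through the metadata table. */
long offsetof_demo(void)
{
    struct st s = { .a = 1, .c = 'x', .d = 123456L };
    long d = 0;

    get_field(&s, 2, &d);
    return d;
}
```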
I suggest to take this approach:
Curse the guy who wrote the big structure. Get a voodoo doll and have some fun.
Mark each field of the big structure that you need somehow (macro or comment or whatever)
Write a small tool which reads the header file and extracts the marked fields. If you use comments, you can give each field a priority or something to sort them.
Write a new header file for the substructure (using a fixed header and footer).
Write a new C file which contains a function createSubStruct which takes a pointer to the big struct and returns a pointer to the substruct
In the function, loop over the fields collected and emit ss.field = bs.field (i.e. copy the fields one by one).
Add the small tool to your makefile and add the new header and C source file to your build
I suggest to use gawk, or any scripting language you're comfortable with, as the tool; that should take half an hour to build.
[EDIT] If you really want to try reflection (which I suggest against; it'll be a whole lot of work to get that working in C), then the offsetof() macro is your friend. This macro returns the offset of a field in a structure (which is most often not the sum of the sizes of the fields before it). See this article.
[EDIT2] Don't write your own parser. To get your own parser right will take months; I know since I've written lots of parsers in my life. Instead mark the parts of the original header file which need to be copied and then rely on the one parser which you know works: The one of your C compiler. Here are a couple of ideas how to make this work:
struct big_struct {
/**BEGIN_COPY*/
int i;
int j : 3;
int k : 2;
char * str;
/**END_COPY*/
...
struct x y; /**COPY_STRUCT*/
};
Just have your tool copy anything between /**BEGIN_COPY*/ and /**END_COPY*/.
Use special comments like /**COPY_STRUCT*/ to instruct your tool to generate a memcpy() instead of an assignment, etc.
This can be written and debugged in a few hours. It would take as long to set up a parser for C without any functionality; that is you'd just have something which can read valid C but you'd still have to write the part of the parser which understands C, and the part which does something useful with the data.
