'Multipurpose' linked list implementation in pure C - c

This is not exactly a technical question, since I know C kind of enough to do the things I need to (I mean, in terms of not 'letting the language get in your way'), so this question is basically a 'what direction to take' question.
Situation is: I am currently taking an advanced algorithms course, and for the sake of 'growing up as programmers', I am required to use pure C to implement the practical assignments (it works well: pretty much any small mistake you make actually forces you to understand completely what you're doing in order to fix it). In the course of implementing, I obviously run into the problem of having to implement the 'basic' data structures from the ground up: actually not only linked lists, but also stacks, trees, et cetera.
I am focusing on lists in this topic because it's typically a structure I end up using a lot in the program, either as a 'main' structure or as a 'helper' structure for other bigger ones (for example, a hash tree that resolves conflicts by using a linked list).
This requires that the list stores elements of lots of different types. I am assuming here as a premise that I don't want to re-code the list for every type. So, I can come up with these alternatives:
Making a list of void pointers (kinda inelegant; harder to debug)
Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB, 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
Your idea/solution
To make the question clear: which one of the above is best?
PS: Since I am basically in an academic context, I am also very interested in the view of people working with pure C out there in the industry. I understand that most pure C programmers are in the embedded devices area, where I don't think this kind of problem I am facing is common. However, if anyone out there knows how it's done 'in the real world', I would be very interested in your opinion.

A void * is a bit of a pain in a linked list since you have to manage it's allocation separately to the list itself. One approach I've used in the past is to have a 'variable sized' structure like:
typedef struct _tNode {
struct _tNode *prev;
struct _tNode *next;
int payloadType;
char payload[1]; // or use different type for alignment.
} tNode;
Now I realize that doesn't look variable sized but let's allocate a structure thus:
typedef struct {
char Name[30];
char Addr[50];
} tPerson;
tNode *node = malloc (sizeof (tNode) - 1 + sizeof (tPerson));
Now you have a node that, for all intents and purposes, looks like this:
typedef struct _tNode {
struct _tNode *prev;
struct _tNode *next;
int payloadType;
char Name[30];
char Addr[50];
} tNode;
or, in graphical form (where [n] means n bytes):
+----------------+
| prev[4] |
+----------------+
| next[4] |
+----------------+
| payloadType[4] |
+----------------+ +----------+
| payload[1] | <- overlap -> | Name[30] |
+----------------+ +----------+
| Addr[50] |
+----------+
That is, assuming you know how to address the payload correctly. This can be done as follows:
node->prev = NULL;
node->next = NULL;
node->payloadType = PLTYP_PERSON;
tPerson *person = &(node->payload); // cast for easy changes to payload.
strcpy (person->Name, "Bob Smith");
strcpy (person->Addr, "7 Station St");
That cast line simply casts the address of the payload character (in the tNode type) to be an address of the actual tPerson payload type.
Using this method, you can carry any payload type you want in a node, even different payload types in each node, without the wasted space of a union. This wastage can be seen with the following:
union {
int x;
char y[100];
} u;
where 96 bytes are wasted every time you store an integer type in the list (for a 4-byte integer).
The payload type in the tNode allows you to easily detect what type of payload this node is carrying, so your code can decide how to process it. You can use something along the lines of:
#define PAYLOAD_UNKNOWN 0
#define PAYLOAD_MANAGER 1
#define PAYLOAD_EMPLOYEE 2
#define PAYLOAD_CONTRACTOR 3
or (probably better):
typedef enum {
PAYLOAD_UNKNOWN,
PAYLOAD_MANAGER,
PAYLOAD_EMPLOYEE,
PAYLOAD_CONTRACTOR
} tPayLoad;

My $.002:
Making a list of void pointers (kinda diselegant; harder to debug)
This isn't such a bad choice, IMHO, if you must write in C. You might add API methods to allow the application to supply a print() method for ease of debugging. Similar methods could be invoked when (e.g.) items get added to or removed from the list. (For linked lists, this is usually not necessary, but for more complex data structures -- hash tables, for example) -- it can sometimes be a lifesaver.)
Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
I would avoid this like the plague. (Well, you did ask.) Having a manually-configured, compile-time dependency from the data structure to its contained types is the worst of all worlds. Again, IMHO.
Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB (sglib.sourceforge.net), 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
Intriguing idea, but since I don't know SGLIB, I can't say much more than that.
Your idea/solution
I'd go with the first choice.

I've done this in the past, in our code (which has since been converted to C++), and at the time, decided on the void* approach. I just did this for flexibility - we were almost always storing a pointer in the list anyways, and the simplicity of the solution, and usability of it outweighed (for me) the downsides to the other approaches.
That being said, there was one time where it caused some nasty bug that was difficult to debug, so it's definitely not a perfect solution. I think it's still the one I'd take, though, if I was doing this again now.

Using a preprocessor macro is the best option. The Linux kernel linked list is a excellent a eficient implementation of a circularly-linked list in C. Is very portable and easy to use. Here a standalone version of linux kernel 2.6.29 list.h header.
The FreeBSD/OpenBSD sys/queue is another good option for a generic macro based linked list

I haven't coded C in years but GLib claims to provide "a large set of utility functions for strings and common data structures", among which are linked lists.

Although It's tempting to think about solving this kind of problem using the techniques of another language, say, generics, in practice it's rarely a win. There are probably some canned solutions that get it right most of the time (and tell you in their documentation when they get it wrong), using that might miss the point of the assignment, So i'd think twice about it. For a very few number of cases, It might be feasable to roll your own, but for a project of any reasonable size, Its not likely to be worth the debugging effort.
Rather, When programming in language x, you should use the idioms of language x. Don't write java when you're using python. Don't write C when you're using scheme. Don't write C++ when you're using C99.
Myself, I'd probably end up using something like Pax's suggestion, but actually use a union of char[1] and void* and int, to make the common cases convenient (and an enumed type flag)
(I'd also probably end up implementing a fibonacci tree, just cause that sounds neat, and you can only implement RB Trees so many times before it loses it's flavor, even if that is better for the common cases it'd be used for.)
edit: based on your comment, it looks like you've got a pretty good case for using a canned solution. If your instructor allows it, and the syntax it offers feels comfortable, give it a whirl.

This is a good problem. There are two solutions I like:
Dave Hanson's C Interfaces and Implementations uses a list of void * pointers, which is good enough for me.
For my students, I wrote an awk script to generate type-specific list functions. Compared to preprocessor macros, it requires an extra build step, but the operation of the system is much more transparent to programmers without a lot of experience. And it really helps make the case for parametric polymorphism, which they see later in their curriculum.
Here's what one set of functions looks like:
int lengthEL (Explist *l);
Exp* nthEL (Explist *l, unsigned n);
Explist *mkEL (Exp *hd, Explist *tl);
The awk script is a 150-line horror; it searches C code for typedefs and generates a set of list functions for each one. It's very old; I could probably do better now :-)
I wouldn't give a list of unions the time of day (or space on my hard drive). It's not safe, and it's not extensible, so you may as well just use void * and be done with it.

One improvement over making it a list of void* would be making it a list of structs that contain a void* and some meta-data about what the void* points to, including its type, size, etc.
Other ideas: embed a Perl or Lisp interpreter.
Or go halfway: link with the Perl library and make it a list of Perl SVs or something.

I'd probably go with the void* approach myself, but it occurred to me that you could store your data as XML. Then the list can just have a char* for data (which you would parse on demand for whatever sub elements you need)....

Related

Read low pointer bit in way that could *probably* work on as many systems as possible

It seems that the low bit of pointers being 0 is more-or-less pretty portable (where portable obviously does not mean "standard", but that people get away with it and can use it to some advantage in some cases, hopefully disable-able with a compile switch).
Projects that want to get fiddly have used it, with less luck on the second lowest bit:
How portable is using the low bit of a pointer as a flag?
But let's say one doesn't want to just poke a bit of data or not into a pointer of a known type. What you wish you could do instead is to use that low bit being 0 to allow a pointer type to do "double-duty" as a terminator.
So your items look like this:
struct Item {
uintptr_t flags; // low bit zero means "not an item"
type1 field1;
type2 field2;
...
};
Then you'd like to have a situation where some container of items looks like this:
[(flags field1 field2...) (flags field1 field2...) some-pointer stuff stuff...]
You'd be thus getting away with a "sunk-cost" (let's say some internal management pointer in the data structure for another purpose) doing your termination for you.
UPDATE: To be clearer on the situation: this is where one controls the codebase and structures. So any pointer in a structure used like this you could declare as a union type, for instance:
union Maybe_Terminator_Pointer {
uintptr_t flags;
type1* pointer1;
type2* pointer2;
...
};
...and then use that, if it helps. Excluding char*s is fine, as they of course would not count.
So an extra type punning problem here is: the pointer being used to do the test-for-termination is an Item*, and the routine doing the checking doesn't know which sort of pointer some-pointer is specifically.
I'm wondering what--if any--is the best gamble is for being able to port and compile such a trick. That includes turning the pointers into unions, #ifdef'ing the endianness of the machine and getting a char* from the byte with the bit, etc. Whatever might be more likely to work, if anyone has experience or guesses.
Imagine it's worth the effort for your case, shaving off a large amount of data. And you have the backup scenario of if people compiling find the trick isn't working somewhere...an #ifdef could use full-sized items for terminators and waste the extra space. So wondering if there are any tips on to make this obviously-standards-violating trick have a better chance of working on more systems.
(Self-answering to provide more information and allow people to spot any potential problems with my alternative.)
So wondering if there are any tips on to make this obviously-standards-violating trick have a better chance of working on more systems.
Tip One (as per comments) is don't do this if you can possibly find another way.
For example, "you" mention this layout:
[(flags field1 field2...) (flags field1 field2...) some-pointer stuff stuff...]
But is there anything in "stuff stuff" that isn't a pointer--perhaps boring old integers that are known to be even--where you could do the same trick? If so, why not reorder this like:
[(flags field1 field2...) (flags field1 field2...) even-uintptr_t stuff...]
That way when you read flags from either in an Item or not, it will be the same type. If you look around at things that aren't opaque like pointers, you might find obvious non-opaque numbers in the current code that are always even...for instance, a lot of byte counts measuring aggregates are things you likely are guaranteed to have as % 2 = 0.
Tip Two applies to the above alternative--and perhaps helps the odds with the standards-breaking-pointer-version too. Be sure that when you write the value you go through an "aliasing" pointer, and do not write the field directly via . or ->.
There is no requirement for the compiler to ensure memory coherence between two different structures on a field, just because they are the same type. Say struct A begins with an uintptr_t field_a and struct B begins with an uintptr_t field_b, and you put a pointer to both at the same address. If you do some_a->field_a = value;, then reading back from a some_b->field_b pointer at that address might well not see that update, because the compiler doesn't expect you to be writing B fields via an A pointer.
Hence go through a pointer to do the write. Something like uintptr_t *alias = &some_a->field_a; and then *alias = value will enforce coherence with the successive reads of any integer (!). (Dissatisfaction with the performance consequences of this property of pointers is why restrict exists. If this trick is to work, it can only do so by exploiting the non-restrict behavior of pointers.)
(!) - I think you only have to do the write through a pointer, and not the reads, but perhaps someone can provide insight.

Linux kernel: why do 'subclass' structs put base class info at end?

I was reading the chapter in Beautiful Code on the Linux kernel and the author discusses how Linux kernel implements inheritance in the C language (amongst other topics). In a nutshell, a 'base' struct is defined and in order to inherit from it the 'subclass' struct places a copy of the base at the end of the subclass struct definition. The author then spends a couple pages explaining a clever and complicated macro to figure out how many bytes to back in order to convert from the base part of the object to the subclass part of the object.
My question: Within the subclass struct, why not declare the base struct as the first thing in the struct, instead of the last thing?
The main advantage of putting the base struct stuff first is when casting from the base to the subclass you wouldn't need to move the pointer at all - essentially, doing the cast just means telling the compiler to let your code use the 'extra' fields that the subclass struct has placed after the stuff that the base defines.
Just to clarify my question a little bit let me throw some code out:
struct device { // this is the 'base class' struct
int a;
int b;
//etc
}
struct usb_device { // this is the 'subclass' struct
int usb_a;
int usb_b;
struct device dev; // This is what confuses me -
// why put this here, rather than before usb_a?
}
If one happens to have a pointer to the "dev" field inside of a usb_device object then in order to cast it back to that usb_device object one needs to subtract 8 from that pointer. But if "dev" was the first thing in a usb_device casting the pointer wouldn't need to move the pointer at all.
Any help on this would be greatly appreciated. Even advice on where to find an answer would be appreciated - I'm not really sure how to Google for the architectural reason behind a decision like this. The closest I could find here on StackOverflow is:
why to use these weird nesting structure
And, just to be clear - I understand that a lot of bright people have worked on the Linux kernel for a long time so clearly there's a good reason for doing it this way, I just can't figure out what it is.
The Amiga OS uses this "common header" trick in a lot of places and it looked like a good idea at the time: Subclassing by simply casting the pointer type. But there are drawbacks.
Pro:
You can extend existing data structures
You can use the same pointer in all places where the base type is expected, no pointer arithmetic needed, saving precious cycles
It feels natural
Con:
Different compilers tend to align data structures differently. If the base structure ended with char a;, then you could have 0, 1 or 3 pad bytes afterwards before the next field of the subclass starts. This led to quite nasty bugs, especially when you had to maintain backwards compatibility (i.e. for some reason, you have to have a certain padding because an ancient compiler version had a bug and now, there is lots of code which expects the buggy padding).
You don't notice quickly when you pass the wrong structure around. With the code in your question, fields get trashed very quickly if the pointer arithmetic is wrong. That is a good thing since it raises chances that a bug is discovered more early.
It leads to an attitude "my compiler will fix it for me" (which it sometimes won't) and all the casts lead to a "I know better than the compiler" attitude. The latter one would make you automatically insert casts before understanding the error message, which would lead to all kinds of odd problems.
The Linux kernel is putting the common structure elsewhere; it can be but doesn't have to be at the end.
Pro:
Bugs will show early
You will have to do some pointer arithmetic for every structure, so you're used to it
You don't need casts
Con:
Not obvious
Code is more complex
I'm new to the Linux kernel code, so take my ramblings here with a grain of salt. As far as I can tell, there is no requirement as to where to put the "subclass" struct. That is exactly what the macros provide: You can cast to the "subclass" structure, regardless of its layout. This provides robustness to your code (the layout of a structure can be changed, without having to change your code.
Perhaps there is a convention of placing the "base class" struct at the end, but I'm not aware of it. I've seen lots of code in drivers, where different "base class" structs are used to cast back to the same "subclass" structure (from different fields in the "subclass" of course).
I don't have fresh experience from the Linux kernel, but from other kernels. I'd say that this doesn't matter at all.
You are not supposed to cast from one to the other. Allowing casts like that should only be done in very specific situations. In most cases it reduces the robustness and flexibility of the code and is considered quite sloppy. So the deepest "architectural reason" you're looking for might just be "because that's the order someone happened to write it in". Or alternatively, that's what the benchmarks showed would be the best for performance of some important code path in that code. Or alternatively, the person who wrote it thinks it looks pretty (I always build upside-down pyramids in my variable declarations and structs if I have no other constraints). Or someone happened to write it this way 20 years ago and since then everyone else has been copying it.
There might be some deeper design behind this, but I doubt it. There's just no reason to design those things at all. If you want to find out from an authoritative source why it's done this way, just submit a patch to linux that changes it and see who yells at you.
It's for multiple inheritance. struct dev isn't the only interface you can apply to a struct in the linux kernel, and if you have more than one, just casting the sub class to a base class wouldn't work. For example:
struct device {
int a;
int b;
// etc...
};
struct asdf {
int asdf_a;
};
struct usb_device {
int usb_a;
int usb_b;
struct device dev;
struct asdf asdf;
};

Why do C written libraries use so many structs?

I've looked to some open source Libraries in some places. And, I've realized which that Libraries are basically a great stack of structs. I've seen few methods.
Why does C written libraries uses so much structs? What's the basis behind this? This, for me, looked like a attempt to simulate object orientation, 'cause a fast searching told me that each struct is "instantiated" by the using program to make something, per example, in some Desktop enviroments for linux that I've seen that each window was a struct in the used GUI library.
Anyway, the question is that.
Structs are a great way to organize data. And data is fundamental, as Fred Brooks knew decades ago:
Show me your flowcharts and conceal your tables, and I shall continue
to be mystified. Show me your tables, and I won't usually need your
flowcharts; they'll be obvious.
Object-oriented programming doesn't have to be merely simulated in C, it can be realized. For example, did you know that inside your structs you can store function pointers which operate on those same structs, and then you are a little bit closer to C++'s classes?
Also consider extensibility: even a function taking many arguments may be improved by taking a single struct, because then its signature does not need to change when a new argument is added.
Finally, C does not have multiple return values from a single function call. But it can return a struct, which is about the same thing. C is a lot about building your own tools from the raw language, and being able to stash a bunch of related data and/or functions together in one place is a good building block.
With or without object orientation, structures are a useful way to group aggregate data into a single symbol. You can copy the structure wherever you like without having to write out all the members each time, and this makes the structure easier to change if you have to.
It also makes it easier to reference certain members using pointer arithmetic, if you're careful (see sockaddr).
Same argument as with arrays.
Simply put, there's no reason not to use structures.
Structures are useful while retrieving data using a pointer. Because single pointer is enough for complete bunch of data with in a structure.
One, it keeps the APIs clean. Instead of passing N separate arguments to a function, you pass a single argument containing N members.
Two, it allows the library to hide implementation details from the programmer. For example, the C FILE type abstracts away some details of stream I/O, details which vary from implementation to implementation. We don't need to know those details, so they're not exposed to us; we just use the FILE type to pass that information around.

what the author of nedtries means by "in-place"?

I. Just implemented a kind of bitwise trie (based on nedtries), but my code does lot
Of memory allocation (for each node).
Contrary to my implemetation, nedtries are claimed to be fast , among othet things,
Because of their small number of memory allocation (if any).
The author claim his implementation to be "in-place", but what does it really means in this context ?
And how does nedtries achieve such a small number of dynamic memory allocation ?
Ps: I know that the sources are available, but the code is pretty hard to follow and I cannot figure how it works
I'm the author, so this is for the benefit of the many according to Google who are similarly having difficulties in using nedtries. I would like to thank the people here on stackflow for not making unpleasant comments about me personally which some other discussions about nedtries do.
I am afraid I don't understand the difficulties with knowing how to use it. Usage is exceptionally easy - simply copy the example in the Readme.html file:
typedef struct foo_s foo_t;
struct foo_s {
NEDTRIE_ENTRY(foo_t) link;
size_t key;
};
typedef struct foo_tree_s foo_tree_t;
NEDTRIE_HEAD(foo_tree_s, foo_t);
static foo_tree_t footree;
static size_t fookeyfunct(const foo_t *RESTRICT r)
{
return r->key;
}
NEDTRIE_GENERATE(static, foo_tree_s, foo_s, link, fookeyfunct, NEDTRIE_NOBBLEZEROS(foo_tree_s));
int main(void)
{
foo_t a, b, c, *r;
NEDTRIE_INIT(&footree);
a.key=2;
NEDTRIE_INSERT(foo_tree_s, &footree, &a);
b.key=6;
NEDTRIE_INSERT(foo_tree_s, &footree, &b);
r=NEDTRIE_FIND(foo_tree_s, &footree, &b);
assert(r==&b);
c.key=5;
r=NEDTRIE_NFIND(foo_tree_s, &footree, &c);
assert(r==&b); /* NFIND finds next largest. Invert the key function to invert this */
NEDTRIE_REMOVE(foo_tree_s, &footree, &a);
NEDTRIE_FOREACH(r, foo_tree_s, &footree)
{
printf("%p, %u\n", r, r->key);
}
NEDTRIE_PREV(foo_tree_s, &footree, &a);
return 0;
}
You declare your item type - here it's struct foo_s. You need the NEDTRIE_ENTRY() inside it otherwise it can contain whatever you like. You also need a key generating function. Other than that, it's pretty boilerplate.
I wouldn't have chosen this system of macro based initialisation myself! But it's for compatibility with the BSD rbtree.h so nedtries is very easy to swap in to anything using BSD rbtree.h.
Regarding my usage of "in place"
algorithms, well I guess my lack of
computer science training shows
here. What I would call "in place"
is when you only use the memory
passed into a piece of code, so if
you hand 64 bytes to an in place
algorithm it will only touch that 64
bytes i.e. it won't make use of
extra metadata, or allocate some
extra memory, or indeed write to
global state. A good example is an
"in place" sort implementation where
only the collection being sorted
(and I suppose the thread stack)
gets touched.
Hence no, nedtries doesn't need a
memory allocator. It stores all the
data it needs in the NEDTRIE_ENTRY
and NEDTRIE_HEAD macro expansions.
In other words, when you allocate
your struct foo_s, you do all the
memory allocation for nedtries.
Regarding understanding the "macro
goodness", it's far easier to
understand the logic if you compile
it as C++ and then debug it :). The
C++ build uses templates and the
debugger will cleanly show you state
at any given time. In fact, all
debugging from my end happens in a
C++ build and I meticulously
transcribe the C++ changes into
macroised C.
Lastly, before a new release, I
search Google for people having
problems with my software to see if
I can fix things and I am typically
amazed what someone people say about
me and my free software. Firstly,
why didn't those people having
difficulties ask me directly for
help? If I know that there is
something wrong with the docs, then
I can fix them - equally, asking on
stackoverflow doesn't let me know
immediately that there is a docs
problem bur rather relies on me to
find it next release. So all I would
say is that if anyone finds a
problem with my docs, please do
email me and say so, even if there
is a discussion say like here on
stackflow.
Niall
I took a look at the nedtrie.h source code.
It seems that the reason it is "in-place" is that you have to add the trie bookkeeping data to the items that you want to store.
You use the NEDTRIE_ENTRY macro to add parent/child/next/prev links to your data structure, and you can then pass that data structure to the various trie routines, which will extract and use those added members.
So it is "in-place" in the sense that you augment your existing data structures and the trie code piggybacks on that.
At least that's what it looks like. There's lots of macro goodness in that code so I could have gotten myself confused (:
In-place means you operate on the original (input) data, so the input data becomes the output data. Not-in-place means that you have separate input and output data, and the input data is not modified. In-place operations have a number of advantages - smaller cache/memory footprint, lower memory bandwidth, hence typically better performance, etc, but they have the disadvantage that they are destructive, i.e. you lose the original input data (which may or may not matter, depending on the use case).
In-place means to operate on the input data and (possibly) update it. The implication is that there no copying and/moving of the input data. This may result in loosing the input data original values which you will need to consider if it is relevant for your particular case.

Automated field re-ordering in C structs to avoid padding

I've spent a few minutes manually re-ordering fields in a struct in order to reduce padding effects[1], which feels like a few minutes too much. My gut feeling says that my time could probably be better spent writing up a Perl script or whatnot to do this kind of optimization for me.
My question is whether this too is redundant; is there already some tool that I'm not aware of, or some compiler feature that I should be able to turn on[2] to pack structs?
The issue is even more complicated by the fact that this needs to be consistently optimized across a few different architectures, so whatever tool used needs to be able to account for different struct alignments and pointer sizes as well.
EDIT: A quick clarification -- what I want to do is re-order the field in the source code in order to avoid padding, not "pack" the struct as is compiling without padding.
EDIT #2: Another complication: depending on the configuration, sizes of some data types may also change. The obvious ones are pointers and pointer-diffs for different architectures, but also floating-point types (16, 32 or 64-bit depending on the 'exactness'), checksums (8 or 16-bit depending on 'speed') and some other non-obvious stuff.
[1] The struct in question is instantiated thousands of times on an embedded device, so each 4-byte reduction of the struct could mean the difference between a go and no-go for this project.
[2] Available compilers are GCC 3.* and 4.* , Visual Studio, TCC, ARM ADS 1.2, RVCT 3.* and a few others more obscure.
If every single word you can squeeze out of the storage is critical, then I have to recommend optimizing the struct by hand. A tool could arrange the members optimally for you, but it doesn't know, for example, that this value here that you're storing in 16 bits actually never goes above 1024, so you could steal the upper 6 bits for this value over here...
So a human will almost certainly beat a robot on this job.
[Edit] But it seems like you really don't want to hand-optimize your structs for each architecture. Maybe you really have a great many architectures to support?
I do think this problem isn't amenable to a general solution, but you might be able to encode your domain knowledge into a custom Perl/Python/something script that generates the struct definition for each architecture.
Also, if all your members have sizes that are powers of two, then you will get optimal packing simply by sorting members by size (largest first.) In that case, you can just use good old-fashioned macro-based struct-building - something like this:
#define MYSTRUCT_POINTERS \
Something* m_pSomeThing; \
OtherThing* m_pOtherThing;
#define MYSTRUCT_FLOATS \
FLOAT m_aFloat; \
FLOAT m_bFloat;
#if 64_BIT_POINTERS && 64_BIT_FLOATS
#define MYSTRUCT_64_BIT_MEMBERS MYSTRUCT_POINTERS MYSTRUCT_FLOATS
#else if 64_BIT_POINTERS
#define MYSTRUCT_64_BIT_MEMBERS MYSTRUCT_POINTERS
#else if 64_BIT_FLOATS
#define MYSTRUCT_64_BIT_MEMBERS MYSTRUCT_FLOATS
#else
#define MYSTRUCT_64_BIT_MEMBERS
#endif
// blah blah blah
struct MyStruct
{
MYSTRUCT_64_BIT_MEMBERS
MYSTRUCT_32_BIT_MEMBERS
MYSTRUCT_16_BIT_MEMBERS
MYSTRUCT_8_BIT_MEMBERS
};
There is a Perl script called pstruct that is usually included with Perl installations. The script will dump out structure member offsets and sizes. You could either modify pstruct or use its output as a starting point for making a utility that packs your structures the way you want.
$ cat foo.h
struct foo {
int x;
char y;
int b[5];
char c;
};
$ pstruct foo.h
struct foo {
int foo.x 0 4
char foo.y 4 1
foo.b 8 20
char foo.c 28 1
}
Most C compilers won't do this based on the fact that you can do weird stuff (like taking the address of an element in the struct and then use pointer magic to access the rest, bypassing the compiler). A famous example are the double linked lists in the AmigaOS which used guardian nodes as head and tail of the list (this makes it possible to avoid ifs when traversing the list). The guardian head node would always have pred == null and the tail node would have next == null, the developers rolled the two nodes into a single three-pointer struct head_next null tail_pred. By using the address of head_next or the null as the address of the head and tail nodes, they saved four bytes and one memory allocation (since they needed the whole structure only once).
So your best bet is probably to write the structures as pseudo code and then write a preprocessor script that creates the real structures from that.
It'll depend on the platform/compiler too. As noted, most compilers pad everything to a 4-byte alignment (or worse!), so assuming a struct with 2 shorts and a long:
short
long
short
will take up 12 bytes (with 2*2 bytes of padding).
reordering it to be
short
short
long
will still take up 12 bytes as the compiler will pad it to make data access quicker (which is the default for most desktops, as they prefer quick access over memory usage). Your embedded system has different needs, so you will have to use the #pragma pack regardless.
As for a tool to reorder, I would simply (manually) reorganise your struct layout so that different types are placed together. Put all the shorts in first, then put all the longs in, etc. If you're going to get packing done, that's what a tool would do anyway. You might have 2 bytes of padding in the middle at the transition points between types, but I wouldn't consider that to be worth worrying about.
Have a look at #pragma pack. This changes how the compiler aligns elements in the structure. You can use it to force them to be closely packed together without spaces.
See more details here
The compiler may not reorder fields in structs by its own head. The standard mandates that fields should be layed out in the order they are defined. Doing something else might break code in subtle ways.
As you write, it's of course entirely possible to make some kind of code generator that shuffles around fields in an efficient way. But I prefer to do this manually.
Thinking about how I'd go about making such a tool... I think I'd start with the debugging info.
Getting the size of each structure from the source is a pain. It overlaps a lot of work that the compiler already does. I'm not familiar enough with ELF to say exactly how to extract the structure size info from a debug binary, but I know that the info exists because debuggers can display it. Perhaps objdump or something else in the binutils package can get this for you trivially (for platforms that use ELF, at least).
After you've got the info, the rest is pretty straightforward. Order the members from largest to smallest, trying to keep as much as the ordering of the original struct as possible. With perl or python it'd even be easy to crossreference it with the rest of the source and maybe even preserve comments or #ifdefs depending on how cleanly they were used. The biggest pain would be changing all initializations of the struct in the entire codebase. Yikes.
Here's the thing. It sounds really nice, but I don't know of any such existing tool that does this, and by the time you write your own... I think you'll have been able to manually reorder most of the structs in your program.
I had same problem. As suggested in another answer, pstruct may help. But, it does give exactly what we need. In fact pstruct use debug information provided by gcc. I wrote another script based on same idea.
You have to generate assembly files with STUBS debug informations (-gstubs). (It would be possible to get same information from dwarf, but I used same method than pstruct). A good way todo this without modifing compiling process is is to add "-gstubs -save-temps=obj" to your compile options.
The followed script read assembly files and detect when an extra byte is added in a struct:
#!/usr/bin/perl -n
if (/.stabs[\t ]*"([^:]*):T[()0-9,]*=s([0-9]*)(.*),128,0,0,0/) {
my $struct_name = $1;
my $struct_size = $2;
my $desc = $3;
# Remove unused information from input
$desc =~ s/=ar\([0-9,]*\);[0-9]*;[-0-9]*;\([-0-9,]*\)//g;
$desc =~ s/=[a-zA-Z_0-9]+://g;
$desc =~ s/=[\*f]?\([0-9,]*\)//g;
$desc =~ s/:\([0-9,]*\)*//g;
my #members = split /;/, $desc;
my ($prev_size, $prev_offset, $prev_name) = (0, 0, "");
for $i (#members) {
my ($name, $offset, $size) = split /,/, $i;
my $correct_offset = $prev_offset + $prev_size;
if ($correct_offset < $offset) {
my $diff = ($offset - $correct_offset) / 8;
print "$struct_name.$name looks misplaced: $prev_offset + $prev_size = $correct_offset < $offset (diff = $diff bytes)\n";
}
# Skip static members
if ($offset != 0 || $size != 0) {
($prev_name, $prev_offset, $prev_size) = ($name, $offset, $size);
}
}
}
A good way to invoke it:
find . -name *.s | xargs ./detectPaddedStructs.pl | sort | un

Resources