Automated field re-ordering in C structs to avoid padding - c

I've spent a few minutes manually re-ordering fields in a struct in order to reduce padding effects[1], which feels like a few minutes too much. My gut feeling says that my time could probably be better spent writing up a Perl script or whatnot to do this kind of optimization for me.
My question is whether this too is redundant; is there already some tool that I'm not aware of, or some compiler feature that I should be able to turn on[2] to pack structs?
The issue is even more complicated by the fact that this needs to be consistently optimized across a few different architectures, so whatever tool used needs to be able to account for different struct alignments and pointer sizes as well.
EDIT: A quick clarification -- what I want to do is re-order the fields in the source code in order to avoid padding, not "pack" the struct as in compiling it without padding.
EDIT #2: Another complication: depending on the configuration, sizes of some data types may also change. The obvious ones are pointers and pointer-diffs for different architectures, but also floating-point types (16, 32 or 64-bit depending on the 'exactness'), checksums (8 or 16-bit depending on 'speed') and some other non-obvious stuff.
[1] The struct in question is instantiated thousands of times on an embedded device, so each 4-byte reduction of the struct could mean the difference between a go and no-go for this project.
[2] Available compilers are GCC 3.* and 4.* , Visual Studio, TCC, ARM ADS 1.2, RVCT 3.* and a few others more obscure.

If every single word you can squeeze out of the storage is critical, then I have to recommend optimizing the struct by hand. A tool could arrange the members optimally for you, but it doesn't know, for example, that this value here that you're storing in 16 bits actually never goes above 1024, so you could steal the upper 6 bits for this value over here...
So a human will almost certainly beat a robot on this job.
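To illustrate the kind of bit-stealing a human can spot, here is a minimal sketch. It assumes the 16-bit value really fits in 10 bits (0 to 1023), which leaves the upper 6 bits free for a second small field. The names pack16, low10 and high6 are made up for this example.

```c
#include <stdint.h>

/* Assumption: `value` fits in 10 bits (0..1023) and `extra` fits in 6 bits (0..63). */
static inline uint16_t pack16(uint16_t value, uint8_t extra)
{
    return (uint16_t)((value & 0x03FFu) | ((uint16_t)(extra & 0x3Fu) << 10));
}

/* The original value lives in the low 10 bits. */
static inline uint16_t low10(uint16_t packed)
{
    return packed & 0x03FFu;
}

/* The stolen field lives in the upper 6 bits. */
static inline uint8_t high6(uint16_t packed)
{
    return (uint8_t)(packed >> 10);
}
```

Two fields now share one 16-bit slot, at the cost of a mask and a shift on every access.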
[Edit] But it seems like you really don't want to hand-optimize your structs for each architecture. Maybe you really have a great many architectures to support?
I do think this problem isn't amenable to a general solution, but you might be able to encode your domain knowledge into a custom Perl/Python/something script that generates the struct definition for each architecture.
Also, if all your members have sizes that are powers of two, then you will get optimal packing simply by sorting members by size (largest first.) In that case, you can just use good old-fashioned macro-based struct-building - something like this:
#define MYSTRUCT_POINTERS \
    Something* m_pSomeThing; \
    OtherThing* m_pOtherThing;
#define MYSTRUCT_FLOATS \
    FLOAT m_aFloat; \
    FLOAT m_bFloat;
// HAVE_64_BIT_POINTERS and HAVE_64_BIT_FLOATS are set per target by the build configuration
#if HAVE_64_BIT_POINTERS && HAVE_64_BIT_FLOATS
#define MYSTRUCT_64_BIT_MEMBERS MYSTRUCT_POINTERS MYSTRUCT_FLOATS
#elif HAVE_64_BIT_POINTERS
#define MYSTRUCT_64_BIT_MEMBERS MYSTRUCT_POINTERS
#elif HAVE_64_BIT_FLOATS
#define MYSTRUCT_64_BIT_MEMBERS MYSTRUCT_FLOATS
#else
#define MYSTRUCT_64_BIT_MEMBERS
#endif
// blah blah blah
struct MyStruct
{
    MYSTRUCT_64_BIT_MEMBERS
    MYSTRUCT_32_BIT_MEMBERS
    MYSTRUCT_16_BIT_MEMBERS
    MYSTRUCT_8_BIT_MEMBERS
};
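As a rough check of the sort-by-size rule, the sketch below compares two orderings of the same four members. The struct names are invented for the example, and the sizes in the comments assume natural alignment (each member aligned to its own size), which is common but not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

struct unsorted_members {
    uint8_t  a;   /* typically followed by 3 bytes of padding */
    uint32_t b;
    uint8_t  c;   /* typically followed by 1 byte of padding */
    uint16_t d;
};                /* typically 12 bytes */

struct sorted_members {
    uint32_t b;   /* largest first */
    uint16_t d;
    uint8_t  a;
    uint8_t  c;
};                /* typically 8 bytes, no padding */

/* Bytes saved by sorting; query this on each target rather than trusting the comments. */
static inline size_t bytes_saved(void)
{
    return sizeof(struct unsorted_members) - sizeof(struct sorted_members);
}
```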

There is a Perl script called pstruct that is usually included with Perl installations. The script will dump out structure member offsets and sizes. You could either modify pstruct or use its output as a starting point for making a utility that packs your structures the way you want.
$ cat foo.h
struct foo {
int x;
char y;
int b[5];
char c;
};
$ pstruct foo.h
struct foo {
int foo.x 0 4
char foo.y 4 1
foo.b 8 20
char foo.c 28 1
}
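If pstruct is not available, similar numbers can be obtained from the compiler itself with offsetof and sizeof. A minimal sketch for the struct foo above; the helper names foo_padding_bytes and dump_foo are made up here, and the output depends on the target.

```c
#include <stddef.h>
#include <stdio.h>

struct foo {
    int  x;
    char y;
    int  b[5];
    char c;
};

/* Total padding = struct size minus the sum of the member sizes. */
static size_t foo_padding_bytes(void)
{
    return sizeof(struct foo)
         - (sizeof(int) + sizeof(char) + sizeof(int[5]) + sizeof(char));
}

/* Print one line per member: name, offset, size (the same columns pstruct shows). */
static void dump_foo(void)
{
    printf("foo.x %zu %zu\n", offsetof(struct foo, x), sizeof(int));
    printf("foo.y %zu %zu\n", offsetof(struct foo, y), sizeof(char));
    printf("foo.b %zu %zu\n", offsetof(struct foo, b), sizeof(int[5]));
    printf("foo.c %zu %zu\n", offsetof(struct foo, c), sizeof(char));
    printf("padding: %zu bytes\n", foo_padding_bytes());
}
```

Because the compiler for each target does the arithmetic, this also covers the case where type sizes change with the configuration.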

Most C compilers won't do this because you can do weird stuff (like taking the address of an element in the struct and then using pointer magic to access the rest, bypassing the compiler). A famous example is the doubly linked lists in AmigaOS, which used guardian nodes as head and tail of the list (this makes it possible to avoid ifs when traversing the list). The guardian head node would always have pred == null and the tail node would always have next == null, so the developers rolled the two nodes into a single three-pointer struct: head_next, null, tail_pred. By using the address of head_next or of the null as the address of the head and tail nodes, they saved four bytes and one memory allocation (since they needed the whole structure only once).
So your best bet is probably to write the structures as pseudo code and then write a preprocessor script that creates the real structures from that.

It'll depend on the platform/compiler too. As noted, most compilers pad everything to a 4-byte alignment (or worse!), so assuming a struct with 2 shorts and a long:
short
long
short
will take up 12 bytes (with 2*2 bytes of padding).
reordering it to be
short
short
long
will still take up 12 bytes as the compiler will pad it to make data access quicker (which is the default for most desktops, as they prefer quick access over memory usage). Your embedded system has different needs, so you will have to use the #pragma pack regardless.
As for a tool to reorder, I would simply (manually) reorganise your struct layout so that different types are placed together. Put all the shorts in first, then put all the longs in, etc. If you're going to get packing done, that's what a tool would do anyway. You might have 2 bytes of padding in the middle at the transition points between types, but I wouldn't consider that to be worth worrying about.
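Because the result depends on the ABI, it may be worth measuring the two orderings on each target instead of assuming. A small sketch (the struct and function names are invented for the example): on many ABIs where long is larger than short the grouped version comes out smaller, but nothing in the standard guarantees a particular size.

```c
#include <stddef.h>
#include <stdio.h>

struct interleaved { short s1; long l; short s2; };
struct grouped     { short s1; short s2; long l; };

/* Prints both sizes so the difference can be checked per compiler and target. */
static void report_short_long_sizes(void)
{
    printf("interleaved: %zu bytes, grouped: %zu bytes\n",
           sizeof(struct interleaved), sizeof(struct grouped));
}
```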

Have a look at #pragma pack. This changes how the compiler aligns elements in the structure. You can use it to force them to be closely packed together without spaces.
See more details here
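A minimal sketch of what this looks like. The push/pop form below is accepted by GCC, Clang and Visual C++; other compilers may spell it differently, so treat it as an assumption to check against each toolchain's manual. Note that packed members can end up misaligned, which some CPUs handle slowly or not at all.

```c
#include <stdint.h>

struct padded_pair {
    uint8_t  tag;
    uint32_t value;   /* usually preceded by 3 bytes of padding */
};

#pragma pack(push, 1)   /* from here on, members are placed on 1-byte boundaries */
struct packed_pair {
    uint8_t  tag;
    uint32_t value;     /* no padding: starts at offset 1 */
};
#pragma pack(pop)       /* restore the previous packing for the rest of the file */
```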

The compiler may not reorder fields in structs on its own. The standard mandates that fields be laid out in the order they are defined. Doing something else might break code in subtle ways.
As you write, it's of course entirely possible to make some kind of code generator that shuffles around fields in an efficient way. But I prefer to do this manually.

Thinking about how I'd go about making such a tool... I think I'd start with the debugging info.
Getting the size of each structure from the source is a pain. It overlaps a lot of work that the compiler already does. I'm not familiar enough with ELF to say exactly how to extract the structure size info from a debug binary, but I know that the info exists because debuggers can display it. Perhaps objdump or something else in the binutils package can get this for you trivially (for platforms that use ELF, at least).
After you've got the info, the rest is pretty straightforward. Order the members from largest to smallest, trying to keep as much of the ordering of the original struct as possible. With Perl or Python it'd even be easy to cross-reference it with the rest of the source and maybe even preserve comments or #ifdefs, depending on how cleanly they were used. The biggest pain would be changing all initializations of the struct in the entire codebase. Yikes.
Here's the thing. It sounds really nice, but I don't know of any such existing tool that does this, and by the time you write your own... I think you'll have been able to manually reorder most of the structs in your program.
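For what it's worth, the reordering step itself is small once sizes and alignments are known. The sketch below is not an existing tool: the struct member type and the function names are hypothetical. It sorts by alignment rather than by size, a deliberate swap that gives the same order for ordinary scalar members, and it breaks ties by original position to keep the original order where it can.

```c
#include <stddef.h>
#include <stdlib.h>

/* One struct member as a reordering tool might see it (hypothetical type). */
struct member {
    const char *name;
    size_t size;    /* sizeof the member */
    size_t align;   /* required alignment, assumed nonzero */
    size_t index;   /* position in the original declaration */
};

static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Size of a struct whose members are laid out in array order. */
static size_t layout_size(const struct member *m, size_t count)
{
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        offset = round_up(offset, m[i].align) + m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    return round_up(offset, max_align);   /* tail padding */
}

/* Largest alignment first; ties keep their original relative order. */
static int compare_members(const void *lhs, const void *rhs)
{
    const struct member *a = lhs, *b = rhs;
    if (a->align != b->align)
        return a->align > b->align ? -1 : 1;
    return (a->index > b->index) - (a->index < b->index);
}

static void reorder_members(struct member *m, size_t count)
{
    qsort(m, count, sizeof *m, compare_members);
}
```

Feeding layout_size the per-architecture sizes before and after reorder_members shows how much a given ordering would save on each target.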

I had the same problem. As suggested in another answer, pstruct may help, but it does not give exactly what we need. In fact, pstruct uses debug information provided by gcc. I wrote another script based on the same idea.
You have to generate assembly files with stabs debug information (-gstabs). (It would be possible to get the same information from DWARF, but I used the same method as pstruct.) A good way to do this without modifying the compile process is to add "-gstabs -save-temps=obj" to your compile options.
The following script reads the assembly files and detects when extra bytes are added in a struct:
#!/usr/bin/perl -n
if (/.stabs[\t ]*"([^:]*):T[()0-9,]*=s([0-9]*)(.*),128,0,0,0/) {
my $struct_name = $1;
my $struct_size = $2;
my $desc = $3;
# Remove unused information from input
$desc =~ s/=ar\([0-9,]*\);[0-9]*;[-0-9]*;\([-0-9,]*\)//g;
$desc =~ s/=[a-zA-Z_0-9]+://g;
$desc =~ s/=[\*f]?\([0-9,]*\)//g;
$desc =~ s/:\([0-9,]*\)*//g;
my @members = split /;/, $desc;
my ($prev_size, $prev_offset, $prev_name) = (0, 0, "");
for $i (@members) {
my ($name, $offset, $size) = split /,/, $i;
my $correct_offset = $prev_offset + $prev_size;
if ($correct_offset < $offset) {
my $diff = ($offset - $correct_offset) / 8;
print "$struct_name.$name looks misplaced: $prev_offset + $prev_size = $correct_offset < $offset (diff = $diff bytes)\n";
}
# Skip static members
if ($offset != 0 || $size != 0) {
($prev_name, $prev_offset, $prev_size) = ($name, $offset, $size);
}
}
}
A good way to invoke it:
find . -name *.s | xargs ./detectPaddedStructs.pl | sort | un

Related

Read low pointer bit in way that could *probably* work on as many systems as possible

It seems that the low bit of pointers being 0 is more-or-less pretty portable (where portable obviously does not mean "standard", but that people get away with it and can use it to some advantage in some cases, hopefully disable-able with a compile switch).
Projects that want to get fiddly have used it, with less luck on the second lowest bit:
How portable is using the low bit of a pointer as a flag?
But let's say one doesn't want to just poke a bit of data or not into a pointer of a known type. What you wish you could do instead is to use that low bit being 0 to allow a pointer type to do "double-duty" as a terminator.
So your items look like this:
struct Item {
uintptr_t flags; // low bit zero means "not an item"
type1 field1;
type2 field2;
...
};
Then you'd like to have a situation where some container of items looks like this:
[(flags field1 field2...) (flags field1 field2...) some-pointer stuff stuff...]
You'd be thus getting away with a "sunk-cost" (let's say some internal management pointer in the data structure for another purpose) doing your termination for you.
UPDATE: To be clearer on the situation: this is where one controls the codebase and structures. So any pointer in a structure used like this you could declare as a union type, for instance:
union Maybe_Terminator_Pointer {
uintptr_t flags;
type1* pointer1;
type2* pointer2;
...
};
...and then use that, if it helps. Excluding char*s is fine, as they of course would not count.
So an extra type punning problem here is: the pointer being used to do the test-for-termination is an Item*, and the routine doing the checking doesn't know which sort of pointer some-pointer is specifically.
I'm wondering what, if anything, is the best gamble for being able to port and compile such a trick. That includes turning the pointers into unions, #ifdef'ing the endianness of the machine and getting a char* from the byte with the bit, etc. Whatever might be more likely to work, if anyone has experience or guesses.
Imagine it's worth the effort for your case, shaving off a large amount of data. And you have the backup scenario of if people compiling find the trick isn't working somewhere...an #ifdef could use full-sized items for terminators and waste the extra space. So wondering if there are any tips on to make this obviously-standards-violating trick have a better chance of working on more systems.
(Self-answering to provide more information and allow people to spot any potential problems with my alternative.)
So wondering if there are any tips on to make this obviously-standards-violating trick have a better chance of working on more systems.
Tip One (as per comments) is don't do this if you can possibly find another way.
For example, "you" mention this layout:
[(flags field1 field2...) (flags field1 field2...) some-pointer stuff stuff...]
But is there anything in "stuff stuff" that isn't a pointer--perhaps boring old integers that are known to be even--where you could do the same trick? If so, why not reorder this like:
[(flags field1 field2...) (flags field1 field2...) even-uintptr_t stuff...]
That way, whenever you read flags, whether from an Item or not, it will be the same type. If you look around at things that aren't opaque like pointers, you might find obvious non-opaque numbers in the current code that are always even...for instance, a lot of byte counts measuring aggregates are things you likely are guaranteed to have as % 2 == 0.
Tip Two applies to the above alternative--and perhaps helps the odds with the standards-breaking-pointer-version too. Be sure that when you write the value you go through an "aliasing" pointer, and do not write the field directly via . or ->.
There is no requirement for the compiler to ensure memory coherence between two different structures on a field, just because they are the same type. Say struct A begins with an uintptr_t field_a and struct B begins with an uintptr_t field_b, and you put a pointer to both at the same address. If you do some_a->field_a = value;, then reading back from a some_b->field_b pointer at that address might well not see that update, because the compiler doesn't expect you to be writing B fields via an A pointer.
Hence go through a pointer to do the write. Something like uintptr_t *alias = &some_a->field_a; and then *alias = value will enforce coherence with the successive reads of any integer (!). (Dissatisfaction with the performance consequences of this property of pointers is why restrict exists. If this trick is to work, it can only do so by exploiting the non-restrict behavior of pointers.)
(!) - I think you only have to do the write through a pointer, and not the reads, but perhaps someone can provide insight.
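One way to sidestep the question of which pointer type may alias which is to read the first word with memcpy, which is allowed to inspect the bytes of any object. This is a different technique from the aliasing-pointer write described above, and it still rests on non-standard assumptions, which are listed in the comment. The name first_word_is_item is made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Assumptions (not guaranteed by the C standard):
 *  - a pointer stored in memory has the same size and bit pattern as uintptr_t;
 *  - pointers to the types involved are at least 2-byte aligned, so bit 0 is 0;
 *  - real items always store flags with bit 0 set.
 */
static int first_word_is_item(const void *slot)
{
    uintptr_t word;
    memcpy(&word, slot, sizeof word);   /* byte-wise read, no type punning */
    return (word & 1u) != 0;
}
```

The check never needs to know which pointer type the terminator slot holds, and an #ifdef can still swap in the full-sized terminator on platforms where the assumptions fail.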

Linux kernel: why do 'subclass' structs put base class info at end?

I was reading the chapter in Beautiful Code on the Linux kernel and the author discusses how the Linux kernel implements inheritance in the C language (amongst other topics). In a nutshell, a 'base' struct is defined and in order to inherit from it the 'subclass' struct places a copy of the base at the end of the subclass struct definition. The author then spends a couple of pages explaining a clever and complicated macro to figure out how many bytes to back up in order to convert from the base part of the object to the subclass part of the object.
My question: Within the subclass struct, why not declare the base struct as the first thing in the struct, instead of the last thing?
The main advantage of putting the base struct stuff first is when casting from the base to the subclass you wouldn't need to move the pointer at all - essentially, doing the cast just means telling the compiler to let your code use the 'extra' fields that the subclass struct has placed after the stuff that the base defines.
Just to clarify my question a little bit let me throw some code out:
struct device { // this is the 'base class' struct
    int a;
    int b;
    //etc
};
struct usb_device { // this is the 'subclass' struct
    int usb_a;
    int usb_b;
    struct device dev; // This is what confuses me -
                       // why put this here, rather than before usb_a?
};
If one happens to have a pointer to the "dev" field inside of a usb_device object then in order to cast it back to that usb_device object one needs to subtract 8 from that pointer. But if "dev" was the first thing in a usb_device casting the pointer wouldn't need to move the pointer at all.
Any help on this would be greatly appreciated. Even advice on where to find an answer would be appreciated - I'm not really sure how to Google for the architectural reason behind a decision like this. The closest I could find here on StackOverflow is:
why to use these weird nesting structure
And, just to be clear - I understand that a lot of bright people have worked on the Linux kernel for a long time so clearly there's a good reason for doing it this way, I just can't figure out what it is.
The Amiga OS uses this "common header" trick in a lot of places and it looked like a good idea at the time: Subclassing by simply casting the pointer type. But there are drawbacks.
Pro:
You can extend existing data structures
You can use the same pointer in all places where the base type is expected, no pointer arithmetic needed, saving precious cycles
It feels natural
Con:
Different compilers tend to align data structures differently. If the base structure ended with char a;, then you could have 0, 1 or 3 pad bytes afterwards before the next field of the subclass starts. This led to quite nasty bugs, especially when you had to maintain backwards compatibility (i.e. for some reason, you have to have a certain padding because an ancient compiler version had a bug and now, there is lots of code which expects the buggy padding).
You don't notice quickly when you pass the wrong structure around. With the code in your question, fields get trashed very quickly if the pointer arithmetic is wrong. That is a good thing, since it raises the chances that a bug is discovered earlier.
It leads to an attitude "my compiler will fix it for me" (which it sometimes won't) and all the casts lead to a "I know better than the compiler" attitude. The latter one would make you automatically insert casts before understanding the error message, which would lead to all kinds of odd problems.
The Linux kernel is putting the common structure elsewhere; it can be but doesn't have to be at the end.
Pro:
Bugs will show early
You will have to do some pointer arithmetic for every structure, so you're used to it
You don't need casts
Con:
Not obvious
Code is more complex
I'm new to the Linux kernel code, so take my ramblings here with a grain of salt. As far as I can tell, there is no requirement as to where to put the "subclass" struct. That is exactly what the macros provide: You can cast to the "subclass" structure, regardless of its layout. This provides robustness to your code (the layout of a structure can be changed without having to change your code).
Perhaps there is a convention of placing the "base class" struct at the end, but I'm not aware of it. I've seen lots of code in drivers, where different "base class" structs are used to cast back to the same "subclass" structure (from different fields in the "subclass" of course).
I don't have fresh experience from the Linux kernel, but from other kernels. I'd say that this doesn't matter at all.
You are not supposed to cast from one to the other. Allowing casts like that should only be done in very specific situations. In most cases it reduces the robustness and flexibility of the code and is considered quite sloppy. So the deepest "architectural reason" you're looking for might just be "because that's the order someone happened to write it in". Or alternatively, that's what the benchmarks showed would be the best for performance of some important code path in that code. Or alternatively, the person who wrote it thinks it looks pretty (I always build upside-down pyramids in my variable declarations and structs if I have no other constraints). Or someone happened to write it this way 20 years ago and since then everyone else has been copying it.
There might be some deeper design behind this, but I doubt it. There's just no reason to design those things at all. If you want to find out from an authoritative source why it's done this way, just submit a patch to linux that changes it and see who yells at you.
It's for multiple inheritance. struct dev isn't the only interface you can apply to a struct in the linux kernel, and if you have more than one, just casting the sub class to a base class wouldn't work. For example:
struct device {
int a;
int b;
// etc...
};
struct asdf {
int asdf_a;
};
struct usb_device {
int usb_a;
int usb_b;
struct device dev;
struct asdf asdf;
};
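A simplified sketch of how the conversion works with more than one embedded 'base'. CONTAINER_OF below is a stripped-down stand-in written for this example, not the kernel's actual container_of macro (which adds type checking); it just subtracts the member's offset from the member's address.

```c
#include <stddef.h>

struct device { int a; int b; };
struct asdf   { int asdf_a; };

struct usb_device {
    int usb_a;
    int usb_b;
    struct device dev;
    struct asdf   asdf;
};

/* Given a pointer to `member` inside a `type`, recover the enclosing object. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Each 'base' can find its way back to the same usb_device. */
static struct usb_device *usb_from_dev(struct device *d)
{
    return CONTAINER_OF(d, struct usb_device, dev);
}

static struct usb_device *usb_from_asdf(struct asdf *x)
{
    return CONTAINER_OF(x, struct usb_device, asdf);
}
```

Since the offset comes from offsetof, neither member has to be first or last, and moving them around does not break callers.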

Use casts to access a byte-array like a structure?

I'm working on a microcontroller-based software project.
A part of the project is a parser for a binary protocol.
The protocol is fixed and cannot be changed.
A PC is acting as a "master" and mainly transmits commands, which have to be executed by the "slave", the microcontroller board.
The protocol data is received by a hardware communication interface, e.g. UART, CAN or Ethernet.
That's not the problem.
After all bytes of a frame (4 - 10, depending on the command) are received, they are stored in a buffer of type "uint8_t cmdBuffer[10]" and a flag is set, indicating that the command can now be executed.
The first byte of a frame (cmdBuffer[0]) contains the command, the rest of the frame are parameters for the command, which may differ in number and size, depending on the command.
This means, the payload can be interpreted in many ways. For every possible command, the data bytes change their meaning.
I don't want lots of ugly bit operations; I want self-documenting code.
So my approach is:
I create a "typedef struct" for each command
After determining the command, the pointer to the cmdBuffer is cast to a pointer to my new typedef
By doing so, I can access the command's parameters as structure members, avoiding magic numbers in array accesses and bit operations for parameters > 8 bit, and the code is easier to read
Example:
typedef struct
{
uint8_t commandCode;
uint8_t parameter_1;
uint32_t anotherParameter;
uint16_t oneMoreParameter;
}payloadA_t;
//typedefs for payloadB_t and payloadC_t, which may have different parameters
void parseProtocolData(uint8_t *data, uint8_t length)
{
uint8_t commandToExecute;
//this, in fact, just returns data[0]
commandToExecute = getPayloadType(data, length);
if (commandToExecute == COMMAND_A)
{
executeCommand_A( (payloadA_t *) data);
}
else if (commandToExecute == COMMAND_B)
{
executeCommand_B( (payloadB_t *) data);
}
else if (commandToExecute == COMMAND_C)
{
executeCommand_C( (payloadC_t *) data);
}
else
{
//error, unknown command
}
}
I see two problems with this:
First, depending on the microcontroller architecture, the byteorder may be intel or motorola for 2 or 4- byte parameters.
This should not be much problem. The protocol itself uses network byte order. On the target controller, a macro can be used for correcting the order.
The major problem: there may be padding bytes in my typedef struct. I'm using gcc, so I can just add a "packed" attribute to my typedef. Other compilers provide pragmas for this. However, on 32-bit machines, packed structures will result in bigger (and slower) machine code. OK, this may also not be a problem. But I've heard there can be a hardware fault when accessing unaligned memory (on the ARM architecture, for example).
There are many commands (around 50), so I don't want access the cmdBuffer as an array
I think the "structure approach" increases code readability in contrast to the "array approach"
So my questions:
Is this approach OK, or is it just a dirty hack?
are there cases where the compiler can rely on the "strict aliasing rule" and make my approach not work?
Is there a better solution? How would you solve this problem?
Can this be kept, at least a little, portable?
Regards,
lugge
Generally, structs are dangerous for storing data protocols because of padding. For portable code, you probably want to avoid them. Keeping the raw array of data is therefore the best idea still. You only need a way to interpret it differently depending on the received command.
This scenario is a typical example where some kind of polymorphism is desired. Unfortunately, C has no built-in support for that OO feature, so you'll have to create it yourself.
The best way to do this depends on the nature of these different kinds of data. Since I don't know that, I can only suggest one such way; it may or may not be optimal for your specific case:
typedef enum
{
COMMAND_THIS,
COMMAND_THAT,
... // all 50 commands
COMMANDS_N // a constant which is equal to the number of commands
} cmd_index_t;
typedef struct
{
uint8_t command; // the original command, can be anything
cmd_index_t index; // a number 0 to 49
uint8_t data [MAX]; // the raw data
} cmd_t;
Step one would then be that upon receiving a command, you translate it to the corresponding index.
// ...receive data and place it in cmdBuffer[10], then:
cmd_t cmd;
cmd_create(&cmd, cmdBuffer[0], &cmdBuffer[1]);
...
void cmd_create (cmd_t* cmd, uint8_t command, uint8_t* data)
{
cmd->command = command;
memcpy(cmd->data, data, MAX);
switch(command)
{
case THIS: cmd->index = COMMAND_THIS; break;
case THAT: cmd->index = COMMAND_THAT; break;
...
}
}
Once you have an index from 0 to N-1, you can implement lookup tables. Each such lookup table can be an array of function pointers, which determine the specific interpretation of the data. For example:
typedef void (*interpreter_func_t)(uint8_t* data);
const interpreter_func_t interpret [COMMANDS_N] =
{
&interpret_this_command,
&interpret_that_command,
...
};
Use:
interpret[cmd->index] (cmd->data);
Then you can make similar lookup tables for different tasks.
initialize [cmd->index] (cmd->data);
interpret [cmd->index] (cmd->data);
repackage [cmd->index] (cmd->data);
do_stuff [cmd->index] (cmd->data);
...
Use different lookup tables for different architectures. Things like endianness can be handled inside the interpreter functions. And you can of course change the function prototypes; maybe you need to return something or pass more parameters, etc.
Note that the above example is most suitable when all commands result in the same kind of actions. If you need to do entirely different things depending on command, other approaches are more suitable.
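Put together, the table-driven dispatch might look like the sketch below. It is cut down to two commands, the handler bodies are placeholders invented for the example, and the handlers return int (rather than void as above) only so that the result can be checked.

```c
#include <stdint.h>

typedef enum { COMMAND_THIS, COMMAND_THAT, COMMANDS_N } cmd_index_t;

typedef int (*interpreter_func_t)(const uint8_t *data);

/* Hypothetical handlers: each one knows how its own payload is laid out. */
static int interpret_this_command(const uint8_t *data) { return data[0]; }
static int interpret_that_command(const uint8_t *data) { return data[0] + data[1]; }

/* One entry per command index; designated initializers keep table and enum in step. */
static const interpreter_func_t interpret[COMMANDS_N] = {
    [COMMAND_THIS] = interpret_this_command,
    [COMMAND_THAT] = interpret_that_command,
};

/* Returns -1 for an index outside the table instead of calling through it. */
static int run_command(cmd_index_t index, const uint8_t *data)
{
    if ((unsigned)index >= COMMANDS_N)
        return -1;
    return interpret[index](data);
}
```

The range check matters on a microcontroller: an unknown command byte from the wire should never turn into a jump through an uninitialized table slot.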
IMHO it is a dirty hack. The code may break when ported to a system with different alignment requirements, different variable sizes, different type representations (e.g. big endian / little endian). Or even on the same system but different version of compiler / system headers / whatever.
I don't think it violates strict aliasing, so long as the relevant bytes form a valid representation.
I would just write code to read the data in a well-defined manner, e.g.
bool extract_A(PayloadA_t *out, uint8_t const *in)
{
out->foo = in[0];
out->bar = read_uint32(in + 1, 4);
// ...
}
This may run slightly slower than the "hack" version, it depends on your requirements whether you prefer maintenance headaches, or those extra microseconds.
Answering your questions in the same order:
This approach is quite common, but it's still called a dirty hack by any book I know that mentions this technique. You spelled the reasons out yourself: in essence it's highly unportable or requires a lot of preprocessor magic to make it portable.
strict aliasing rule: see the top voted answer for What is the strict aliasing rule?
The only alternative solution I know is to explicitly code the deserialization as you mentioned yourself. This can actually be made very readable like this:
const uint8_t *p = buffer;
struct some_struct s; // some_struct stands for whatever type holds the fields
s.field1 = read_u32(&p);
s.field2 = read_u16(&p);
I.e., I would make the read functions move the pointer forward by the number of deserialized bytes.
As said above, you can use the preprocessor to handle different endianness and struct packing.
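A possible shape for such read functions, as a sketch. It assumes the wire format is big-endian (the question says the protocol uses network byte order) and that the cursor is declared as const uint8_t *. Because the values are assembled with shifts, the same code works on little- and big-endian hosts without any preprocessor switches.

```c
#include <stdint.h>

/* Read a big-endian (network order) 16-bit value and advance the cursor. */
static uint16_t read_u16(const uint8_t **p)
{
    uint16_t v = (uint16_t)(((uint16_t)(*p)[0] << 8) | (*p)[1]);
    *p += 2;
    return v;
}

/* Read a big-endian 32-bit value and advance the cursor. */
static uint32_t read_u32(const uint8_t **p)
{
    uint32_t v = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16)
               | ((uint32_t)(*p)[2] << 8)  |  (uint32_t)(*p)[3];
    *p += 4;
    return v;
}
```

Neither function checks how many bytes remain, so the caller has to validate the frame length first.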
It's a dirty hack. The biggest problem I see with this solution is memory alignment rather than endianness or struct packing.
The memory alignment issue is this. Some microcontrollers such as ARM require that multi-byte variables be aligned with certain memory offsets. That is, 2-byte half-words must be aligned on even memory addresses. And 4-byte words must be aligned on memory addresses that are multiples of 4. These alignment rules are not enforced by your serial protocol. So if you simply cast the serial data buffer into a packed structure then the individual structure members may not have the proper alignment. Then when your code tries to access a misaligned member it will result in an alignment fault or undefined behavior. (This is why the compiler creates an un-packed structure by default.)
Regarding endianness, it sounds like you're proposing to correct the byte order when your code accesses the member in the packed structure. If your code accesses the packed structure member multiple times then it will have to correct the endianness every time. It would be more efficient to just correct the endianness once, when the data is first received from the serial port. And this is another reason not to simply cast the data buffer into a packed structure.
When you receive the command, you should parse out each field individually into an unpacked structure where each member is properly aligned and has the proper endianness. Then your microcontroller code can access each member most efficiently. This solution is also more portable if done correctly.
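As a sketch of that parsing step for the payloadA_t from the question: the wire layout used here (fields back to back in declaration order, big-endian, 8 bytes in total) is an assumption, since the question does not spell it out, and parse_payload_a is a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ordinary, unpacked struct: the compiler is free to align every member. */
typedef struct {
    uint8_t  commandCode;
    uint8_t  parameter_1;
    uint32_t anotherParameter;
    uint16_t oneMoreParameter;
} payloadA_t;

/* Assumed wire layout: 1 + 1 + 4 + 2 = 8 bytes, no padding, big-endian. */
enum { PAYLOAD_A_WIRE_SIZE = 8 };

static bool parse_payload_a(const uint8_t *frame, size_t length, payloadA_t *out)
{
    if (length < PAYLOAD_A_WIRE_SIZE)
        return false;               /* short frame: leave *out untouched */

    out->commandCode      = frame[0];
    out->parameter_1      = frame[1];
    out->anotherParameter = ((uint32_t)frame[2] << 24) | ((uint32_t)frame[3] << 16)
                          | ((uint32_t)frame[4] << 8)  |  (uint32_t)frame[5];
    out->oneMoreParameter = (uint16_t)(((uint16_t)frame[6] << 8) | frame[7]);
    return true;
}
```

Only single bytes are read from the buffer, so no access is misaligned, and the byte order is fixed once at parse time.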
Yes, this is a memory alignment problem.
Which controller are you using?
Just declare the structure with the following syntax:
__attribute__((packed))
Maybe it will solve your problem.
Or you can try to access the variable by address (by reference) instead of by value.

what the author of nedtries means by "in-place"?

I just implemented a kind of bitwise trie (based on nedtries), but my code does a lot of memory allocation (one for each node).
Contrary to my implementation, nedtries are claimed to be fast, among other things because of their small number of memory allocations (if any).
The author claims his implementation to be "in-place", but what does that really mean in this context?
And how does nedtries achieve such a small number of dynamic memory allocations?
PS: I know that the sources are available, but the code is pretty hard to follow and I cannot figure out how it works.
I'm the author, so this is for the benefit of the many who, according to Google, are similarly having difficulties in using nedtries. I would like to thank the people here on Stack Overflow for not making unpleasant comments about me personally, which some other discussions about nedtries do.
I am afraid I don't understand the difficulties with knowing how to use it. Usage is exceptionally easy - simply copy the example in the Readme.html file:
typedef struct foo_s foo_t;
struct foo_s {
NEDTRIE_ENTRY(foo_t) link;
size_t key;
};
typedef struct foo_tree_s foo_tree_t;
NEDTRIE_HEAD(foo_tree_s, foo_t);
static foo_tree_t footree;
static size_t fookeyfunct(const foo_t *RESTRICT r)
{
return r->key;
}
NEDTRIE_GENERATE(static, foo_tree_s, foo_s, link, fookeyfunct, NEDTRIE_NOBBLEZEROS(foo_tree_s));
int main(void)
{
foo_t a, b, c, *r;
NEDTRIE_INIT(&footree);
a.key=2;
NEDTRIE_INSERT(foo_tree_s, &footree, &a);
b.key=6;
NEDTRIE_INSERT(foo_tree_s, &footree, &b);
r=NEDTRIE_FIND(foo_tree_s, &footree, &b);
assert(r==&b);
c.key=5;
r=NEDTRIE_NFIND(foo_tree_s, &footree, &c);
assert(r==&b); /* NFIND finds next largest. Invert the key function to invert this */
NEDTRIE_REMOVE(foo_tree_s, &footree, &a);
NEDTRIE_FOREACH(r, foo_tree_s, &footree)
{
printf("%p, %zu\n", (void *) r, r->key);
}
NEDTRIE_PREV(foo_tree_s, &footree, &a);
return 0;
}
You declare your item type - here it's struct foo_s. You need the NEDTRIE_ENTRY() inside it otherwise it can contain whatever you like. You also need a key generating function. Other than that, it's pretty boilerplate.
I wouldn't have chosen this system of macro based initialisation myself! But it's for compatibility with the BSD rbtree.h so nedtries is very easy to swap in to anything using BSD rbtree.h.
Regarding my usage of "in place" algorithms, well I guess my lack of computer science training shows here. What I would call "in place" is when you only use the memory passed into a piece of code, so if you hand 64 bytes to an in place algorithm it will only touch that 64 bytes i.e. it won't make use of extra metadata, or allocate some extra memory, or indeed write to global state. A good example is an "in place" sort implementation where only the collection being sorted (and I suppose the thread stack) gets touched.
Hence no, nedtries doesn't need a memory allocator. It stores all the data it needs in the NEDTRIE_ENTRY and NEDTRIE_HEAD macro expansions. In other words, when you allocate your struct foo_s, you do all the memory allocation for nedtries.
Regarding understanding the "macro goodness", it's far easier to understand the logic if you compile it as C++ and then debug it :). The C++ build uses templates and the debugger will cleanly show you state at any given time. In fact, all debugging from my end happens in a C++ build and I meticulously transcribe the C++ changes into macroised C.
Lastly, before a new release, I search Google for people having problems with my software to see if I can fix things, and I am typically amazed what some people say about me and my free software. Firstly, why didn't those people having difficulties ask me directly for help? If I know that there is something wrong with the docs, then I can fix them - equally, asking on Stack Overflow doesn't let me know immediately that there is a docs problem but rather relies on me to find it next release. So all I would say is that if anyone finds a problem with my docs, please do email me and say so, even if there is a discussion, say, like here on Stack Overflow.
Niall
I took a look at the nedtrie.h source code.
It seems that the reason it is "in-place" is that you have to add the trie bookkeeping data to the items that you want to store.
You use the NEDTRIE_ENTRY macro to add parent/child/next/prev links to your data structure, and you can then pass that data structure to the various trie routines, which will extract and use those added members.
So it is "in-place" in the sense that you augment your existing data structures and the trie code piggybacks on that.
At least that's what it looks like. There's lots of macro goodness in that code so I could have gotten myself confused (:
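To illustrate the general idea of bookkeeping that lives inside the caller's own struct, here is a minimal sketch. It is not nedtries code: MY_LIST_ENTRY, struct item, item_push and item_count are hypothetical names, and a singly linked list stands in for the trie. The point is only that the container never allocates anything itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ENTRY-style macro: it declares the link
   fields directly inside the user's struct. */
#define MY_LIST_ENTRY(type) struct { struct type *next; }

struct item {
    MY_LIST_ENTRY(item) link;  /* bookkeeping embedded in the item itself */
    int key;
};

struct item_head {
    struct item *first;
};

/* Pushes an already-allocated item; the container allocates no memory. */
void item_push(struct item_head *h, struct item *it)
{
    assert(h != NULL && it != NULL);
    it->link.next = h->first;
    h->first = it;
}

/* Counts the items by walking the embedded links. */
size_t item_count(const struct item_head *h)
{
    size_t n = 0;
    for (const struct item *it = h->first; it != NULL; it = it->link.next)
        n++;
    return n;
}
```

Items can live on the stack, in a static array, or in memory from any allocator; the list code only ever touches the fields the macro added.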
In-place means you operate on the original (input) data, so the input data becomes the output data. Not-in-place means that you have separate input and output data, and the input data is not modified. In-place operations have a number of advantages - smaller cache/memory footprint, lower memory bandwidth, hence typically better performance, etc, but they have the disadvantage that they are destructive, i.e. you lose the original input data (which may or may not matter, depending on the use case).
In-place means to operate on the input data and (possibly) update it. The implication is that there is no copying and/or moving of the input data. This may result in losing the original values of the input data, so you will need to consider whether that matters for your particular case.
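A small sketch of the distinction, using array reversal as the example operation. The function names reverse_in_place and reverse_copy are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* In-place: the caller's array is overwritten, so the original order is lost. */
void reverse_in_place(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Not in-place: the input is only read; the result goes to a separate buffer. */
void reverse_copy(const int *in, int *out, size_t n)
{
    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        out[i] = in[n - 1 - i];
}
```

The in-place version needs no second buffer, at the cost of destroying its input; the copying version needs n extra elements of storage but leaves the input intact.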

'Multipurpose' linked list implementation in pure C

This is not exactly a technical question, since I know C kind of enough to do the things I need to (I mean, in terms of not 'letting the language get in your way'), so this question is basically a 'what direction to take' question.
Situation is: I am currently taking an advanced algorithms course, and for the sake of 'growing up as programmers', I am required to use pure C to implement the practical assignments (it works well: pretty much any small mistake you make actually forces you to understand completely what you're doing in order to fix it). In the course of implementing, I obviously run into the problem of having to implement the 'basic' data structures from the ground up: actually not only linked lists, but also stacks, trees, et cetera.
I am focusing on lists in this topic because it's typically a structure I end up using a lot in the program, either as a 'main' structure or as a 'helper' structure for other bigger ones (for example, a hash tree that resolves conflicts by using a linked list).
This requires that the list stores elements of lots of different types. I am assuming here as a premise that I don't want to re-code the list for every type. So, I can come up with these alternatives:
Making a list of void pointers (kinda inelegant; harder to debug)
Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB, 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
Your idea/solution
To make the question clear: which one of the above is best?
PS: Since I am basically in an academic context, I am also very interested in the view of people working with pure C out there in the industry. I understand that most pure C programmers are in the embedded devices area, where I don't think this kind of problem I am facing is common. However, if anyone out there knows how it's done 'in the real world', I would be very interested in your opinion.
A void * is a bit of a pain in a linked list since you have to manage its allocation separately from the list itself. One approach I've used in the past is to have a 'variable sized' structure like:
typedef struct _tNode {
struct _tNode *prev;
struct _tNode *next;
int payloadType;
char payload[1]; // or use different type for alignment.
} tNode;
Now I realize that doesn't look variable sized but let's allocate a structure thus:
typedef struct {
char Name[30];
char Addr[50];
} tPerson;
tNode *node = malloc (sizeof (tNode) - 1 + sizeof (tPerson));
Now you have a node that, for all intents and purposes, looks like this:
typedef struct _tNode {
struct _tNode *prev;
struct _tNode *next;
int payloadType;
char Name[30];
char Addr[50];
} tNode;
or, in graphical form (where [n] means n bytes):
+----------------+
| prev[4] |
+----------------+
| next[4] |
+----------------+
| payloadType[4] |
+----------------+ +----------+
| payload[1] | <- overlap -> | Name[30] |
+----------------+ +----------+
| Addr[50] |
+----------+
That is, assuming you know how to address the payload correctly. This can be done as follows:
node->prev = NULL;
node->next = NULL;
node->payloadType = PLTYP_PERSON;
tPerson *person = (tPerson *) node->payload; // cast for easy changes to payload.
strcpy (person->Name, "Bob Smith");
strcpy (person->Addr, "7 Station St");
That cast line simply casts the address of the payload character (in the tNode type) to be an address of the actual tPerson payload type.
Using this method, you can carry any payload type you want in a node, even different payload types in each node, without the wasted space of a union. This wastage can be seen with the following:
union {
int x;
char y[100];
} u;
where 96 bytes are wasted every time you store an integer type in the list (for a 4-byte integer).
The payload type in the tNode allows you to easily detect what type of payload this node is carrying, so your code can decide how to process it. You can use something along the lines of:
#define PAYLOAD_UNKNOWN 0
#define PAYLOAD_MANAGER 1
#define PAYLOAD_EMPLOYEE 2
#define PAYLOAD_CONTRACTOR 3
or (probably better):
typedef enum {
PAYLOAD_UNKNOWN,
PAYLOAD_MANAGER,
PAYLOAD_EMPLOYEE,
PAYLOAD_CONTRACTOR
} tPayLoad;
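Putting the pieces together, a tagged, variable-sized node might be sketched as follows. This is an illustration under assumptions, not the answer's exact code: it uses a C99 flexible array member in place of payload[1], copies the payload in and out with memcpy to sidestep alignment concerns, and the names node_new, node_payload_size, PAYLOAD_INT and PAYLOAD_PERSON are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef enum { PAYLOAD_UNKNOWN, PAYLOAD_INT, PAYLOAD_PERSON } tPayLoad;

typedef struct {
    char Name[30];
    char Addr[50];
} tPerson;

typedef struct tNode {
    struct tNode *prev;
    struct tNode *next;
    tPayLoad payloadType;
    char payload[];          /* C99 flexible array member (assumption: C99 or later) */
} tNode;

/* Allocates one block holding both the node header and a copy of the payload. */
tNode *node_new(tPayLoad type, const void *data, size_t size)
{
    assert(data != NULL || size == 0);
    tNode *node = malloc(sizeof *node + size);
    if (node == NULL)
        return NULL;
    node->prev = NULL;
    node->next = NULL;
    node->payloadType = type;
    if (size > 0)
        memcpy(node->payload, data, size);
    return node;
}

/* Uses the tag to decide how many payload bytes the node carries. */
size_t node_payload_size(const tNode *node)
{
    switch (node->payloadType) {
    case PAYLOAD_INT:    return sizeof(int);
    case PAYLOAD_PERSON: return sizeof(tPerson);
    default:             return 0;
    }
}
```

Because the header and payload share one allocation, a single free() releases the whole node.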
My $.002:
Making a list of void pointers (kinda inelegant; harder to debug)
This isn't such a bad choice, IMHO, if you must write in C. You might add API methods to allow the application to supply a print() method for ease of debugging. Similar methods could be invoked when (e.g.) items get added to or removed from the list. (For linked lists, this is usually not necessary, but for more complex data structures -- hash tables, for example -- it can sometimes be a lifesaver.)
Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
I would avoid this like the plague. (Well, you did ask.) Having a manually-configured, compile-time dependency from the data structure to its contained types is the worst of all worlds. Again, IMHO.
Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB (sglib.sourceforge.net), 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
Intriguing idea, but since I don't know SGLIB, I can't say much more than that.
Your idea/solution
I'd go with the first choice.
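As a rough sketch of the first choice with an application-supplied print hook for debugging, something like the following could work. All names here (vlist, vnode, print_fn and the vlist_* functions) are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Callback the application supplies so the list can print its own items. */
typedef void (*print_fn)(const void *item, FILE *out);

typedef struct vnode {
    struct vnode *next;
    void *item;              /* owned by the caller, not by the list */
} vnode;

typedef struct {
    vnode *head;
    size_t length;
    print_fn print;          /* may be NULL */
} vlist;

void vlist_init(vlist *l, print_fn print)
{
    assert(l != NULL);
    l->head = NULL;
    l->length = 0;
    l->print = print;
}

/* Pushes at the front; returns 0 on success, -1 if allocation fails. */
int vlist_push(vlist *l, void *item)
{
    vnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->item = item;
    n->next = l->head;
    l->head = n;
    l->length++;
    return 0;
}

/* Debug dump: uses the callback if present, else prints raw addresses. */
void vlist_dump(const vlist *l, FILE *out)
{
    for (const vnode *n = l->head; n != NULL; n = n->next) {
        if (l->print != NULL)
            l->print(n->item, out);
        else
            fprintf(out, "%p\n", n->item);
    }
}

/* Frees the nodes only; the items stay with the caller. */
void vlist_clear(vlist *l)
{
    while (l->head != NULL) {
        vnode *next = l->head->next;
        free(l->head);
        l->head = next;
    }
    l->length = 0;
}

/* Example callback for a list whose items are ints. */
void print_int(const void *item, FILE *out)
{
    fprintf(out, "%d\n", *(const int *) item);
}
```

The callback is the only place that knows the element type, which keeps the list code itself type-agnostic.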
I've done this in the past, in our code (which has since been converted to C++), and at the time, decided on the void* approach. I just did this for flexibility - we were almost always storing a pointer in the list anyways, and the simplicity of the solution, and usability of it outweighed (for me) the downsides to the other approaches.
That being said, there was one time where it caused some nasty bug that was difficult to debug, so it's definitely not a perfect solution. I think it's still the one I'd take, though, if I was doing this again now.
Using a preprocessor macro is the best option. The Linux kernel linked list is an excellent and efficient implementation of a circularly-linked list in C. It is very portable and easy to use. Here is a standalone version of the Linux kernel 2.6.29 list.h header.
The FreeBSD/OpenBSD sys/queue is another good option for a generic macro-based linked list.
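The core trick behind this style of list is a generic link node embedded in the user's struct, plus a macro that recovers the enclosing struct from a pointer to that node. The sketch below only imitates that idea; it is not the kernel's list.h, and list_node, LIST_ENTRY_OF, list_init, list_add_tail, struct person and sum_ids are names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Generic link node; it knows nothing about the struct that contains it. */
struct list_node {
    struct list_node *next, *prev;
};

/* Recovers a pointer to the enclosing struct from a pointer to its member. */
#define LIST_ENTRY_OF(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* An empty circular list is a head node that points at itself. */
void list_init(struct list_node *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Links n in just before the head, i.e. at the tail of the list. */
void list_add_tail(struct list_node *head, struct list_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Example element type with an embedded link. */
struct person {
    int id;
    struct list_node link;
};

/* Walks the circular list and sums the ids of the enclosing structs. */
int sum_ids(struct list_node *head)
{
    int sum = 0;
    for (struct list_node *p = head->next; p != head; p = p->next)
        sum += LIST_ENTRY_OF(p, struct person, link)->id;
    return sum;
}
```

Since the list functions operate only on struct list_node, the same code serves every element type without regeneration or void pointers.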
I haven't coded C in years but GLib claims to provide "a large set of utility functions for strings and common data structures", among which are linked lists.
Although it's tempting to think about solving this kind of problem using the techniques of another language, say, generics, in practice it's rarely a win. There are probably some canned solutions that get it right most of the time (and tell you in their documentation when they get it wrong), but using one might miss the point of the assignment, so I'd think twice about it. In a very small number of cases it might be feasible to roll your own, but for a project of any reasonable size, it's not likely to be worth the debugging effort.
Rather, when programming in language x, you should use the idioms of language x. Don't write Java when you're using Python. Don't write C when you're using Scheme. Don't write C++ when you're using C99.
Myself, I'd probably end up using something like Pax's suggestion, but actually use a union of char[1] and void* and int, to make the common cases convenient (and an enumed type flag)
(I'd also probably end up implementing a Fibonacci tree, just 'cause that sounds neat, and you can only implement RB trees so many times before it loses its flavor, even if that is better for the common cases it'd be used for.)
edit: based on your comment, it looks like you've got a pretty good case for using a canned solution. If your instructor allows it, and the syntax it offers feels comfortable, give it a whirl.
This is a good problem. There are two solutions I like:
Dave Hanson's C Interfaces and Implementations uses a list of void * pointers, which is good enough for me.
For my students, I wrote an awk script to generate type-specific list functions. Compared to preprocessor macros, it requires an extra build step, but the operation of the system is much more transparent to programmers without a lot of experience. And it really helps make the case for parametric polymorphism, which they see later in their curriculum.
Here's what one set of functions looks like:
int lengthEL (Explist *l);
Exp* nthEL (Explist *l, unsigned n);
Explist *mkEL (Exp *hd, Explist *tl);
The awk script is a 150-line horror; it searches C code for typedefs and generates a set of list functions for each one. It's very old; I could probably do better now :-)
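For readers wondering what the bodies behind those three prototypes might look like, here is one plausible hand-written version. The awk script's real output is not shown in this answer, so the cons-cell layout of Explist, its field names hd and tl, and the placeholder definition of Exp are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder element type; the real Exp would come from the program's typedefs. */
typedef struct Exp {
    int value;
} Exp;

/* Assumed layout: a classic head/tail cons cell. */
typedef struct Explist {
    Exp *hd;
    struct Explist *tl;
} Explist;

/* Builds a new cell in front of tl; returns NULL if allocation fails. */
Explist *mkEL(Exp *hd, Explist *tl)
{
    Explist *l = malloc(sizeof *l);
    if (l != NULL) {
        l->hd = hd;
        l->tl = tl;
    }
    return l;
}

/* Number of cells in the list; the empty list is NULL. */
int lengthEL(Explist *l)
{
    int n = 0;
    for (; l != NULL; l = l->tl)
        n++;
    return n;
}

/* Zero-based element access; returns NULL when n is out of range. */
Exp *nthEL(Explist *l, unsigned n)
{
    while (l != NULL && n-- > 0)
        l = l->tl;
    return l != NULL ? l->hd : NULL;
}
```

Each element type gets its own copy of these functions with the type name substituted, which is what makes the return values fully typed.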
I wouldn't give a list of unions the time of day (or space on my hard drive). It's not safe, and it's not extensible, so you may as well just use void * and be done with it.
One improvement over making it a list of void* would be making it a list of structs that contain a void* and some meta-data about what the void* points to, including its type, size, etc.
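A minimal sketch of such an element, assuming a small enum for the type tag. The names elem, elem_type, elem_node and elem_as_int are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative type tags; a real program would list its own element types. */
typedef enum { ELEM_INT, ELEM_STRING } elem_type;

/* A void* together with metadata describing what it points to. */
typedef struct elem {
    void     *data;
    elem_type type;
    size_t    size;   /* size in bytes of the pointed-to object */
} elem;

typedef struct elem_node {
    struct elem_node *next;
    elem              value;
} elem_node;

/* Checked accessor: returns NULL instead of misinterpreting the payload. */
const int *elem_as_int(const elem *e)
{
    assert(e != NULL);
    if (e->type != ELEM_INT || e->size != sizeof(int))
        return NULL;
    return (const int *) e->data;
}
```

The metadata turns a silent type confusion into a NULL that the caller can test for, which addresses some of the debugging pain of a bare void * list.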
Other ideas: embed a Perl or Lisp interpreter.
Or go halfway: link with the Perl library and make it a list of Perl SVs or something.
I'd probably go with the void* approach myself, but it occurred to me that you could store your data as XML. Then the list can just have a char* for data (which you would parse on demand for whatever sub elements you need)....

Resources