The document "Good Practices in Library Design, Implementation, and Maintenance" by Ulrich Drepper says (bottom of page 5):
[...] the type definition should always create at least a minimal
amount of padding to allow future growth
[...]
Second, a structure should contain at the end a certain number of fill bytes.
struct the_struct
{
    int foo;
    // ...and more fields
    uintptr_t filler[8];
};
[...]
If at a later time a field has to be added to the structure the type definition can be changed to this:
struct the_struct
{
    int foo;
    // ...and more fields
    union
    {
        some_type_t new_field;
        uintptr_t filler[8];
    } u;
};
I don't see the point of adding this filler at the end of the structure. Yes, it means that when a new field (new_field) is added, the structure doesn't actually grow. But isn't the whole point of adding new fields to a structure that you didn't know you were going to need them? In this example, what if you want to add not one field but 20? Should you then use a filler of 1 kB just in case? Also, why is it important that the size of a struct doesn't change in subsequent versions of a library? If the library provides clean abstractions, that shouldn't matter, right? Finally, using a 64-byte filler (8 uintptr_t; yes, it's not necessarily 64 bytes) sounds like a waste of memory...
The document doesn't go into the details of this at all. Would you have any explanations to why this advice "adding fillers at the end of struct to plan for future growth" is a good one?
Depending on circumstances, yes, the size of the structure can be important for binary compatibility.
Consider stat(). It's typically called like this:
struct stat stbuf;
int r = stat(filename, &stbuf);
With this setup, if the size of the stat structure ever changes, every caller becomes invalid, and will need to be recompiled. If both the called and the calling code are part of the same project, that may not be a problem. But if (as in the case of stat(), which is a system call into the Unix/Linux kernel) there are lots and lots of callers out there, it's practically impossible to force them all to recompile, so the implication is that the size of the stat structure can never be changed.
This sort of problem mainly arises when the caller allocates (or inspects/manipulates) actual instances of the structure. If, on the other hand, the insides of the structure are only ever allocated and manipulated by library code -- if calling code deals only with pointers to the struct, and doesn't try to interpret the pointed-to structures -- it may not matter if the structure changes.
(Now, with all of that said, there are various other things that can be done to mitigate the issues if a struct has to change size. There are libraries where the caller allocates instances of a structure, but then passes both a pointer to the structure, and the size of the structure as the caller knows it, down into the library code. Newer library code can then detect a mismatch, and avoid setting or using newer fields which an older caller didn't allocate space for. And I believe gcc, at least, implements special hooks so that glibc can implement multiple versions of the same structure, and multiple versions of the library functions that use them, so that the correct library function can be used corresponding to the version of the structure that a particular caller is using. Going back to stat(), for example, under Linux there are at least two different versions of the stat structure, one which allocates 32 bits for the file size and one which allocates 64.)
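The caller-passes-size mitigation described above can be sketched as follows. Everything here is hypothetical (the struct, the function, and the field names are invented for illustration, not taken from any real library): the caller reports sizeof the struct as it knew it at compile time, and the library reads only that much.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical library struct: suppose version 2 added 'extra',
   so old callers compiled against v1 allocate a smaller struct. */
struct widget_opts {
    int    flags;
    int    depth;
    double extra;   /* added in v2 */
};

/* The caller passes sizeof the struct as it was compiled against. */
int widget_init(const struct widget_opts *opts, size_t opts_size)
{
    struct widget_opts local;
    memset(&local, 0, sizeof local);

    /* Copy only as much as the caller actually provided; fields the
       caller doesn't know about keep their zero defaults. */
    size_t n = opts_size < sizeof local ? opts_size : sizeof local;
    memcpy(&local, opts, n);

    /* ... use local.flags, local.depth, and local.extra here ... */
    return (int)n;   /* returned just so the sketch is testable */
}
```

A v1 caller invokes `widget_init(&opts, sizeof opts)` with its smaller struct, and the v2 library still behaves sensibly because the newer field stays at its zero default.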
But isn't the whole point of adding new fields to a structure that you
didn't know you were going to need them?
Well yes, if you knew all along that you would need those members, then it would be counter-productive to intentionally omit them. But sometimes you indeed discover only later that you need some additional fields. Drepper's recommendations speak to ways to design your code -- specifically your structure definitions -- so that you can add members with the minimum possible side effects.
In this example, what if you
want to add not one field but 20?
You don't start out saying "I'm going to want to add 20 members". Rather, you start out saying "I may later discover a need for some more members." That's a prudent position to take.
Should you then use a filler of 1k
bytes just in case?
That's a judgment call. I reckon that a KB of extra space in the structure definition is probably overkill in most cases, but there might be contexts where that's reasonable.
Also, why is it important that the size of a
struct doesn't change in subsequent versions of a library? If the
library provides clean abstractions, that shouldn't matter, right?
How important it is that the size remains constant is a subjective question, but the size is indeed relevant to binary compatibility for shared libraries. Specifically, the question is whether I can drop a new version of the shared lib in place of the old one, and expect existing programs to work with the new one without recompilation.
Technically, if the definition of the structure changes, even without its size changing, then the new definition is incompatible with the old one as far as the C language is concerned. In practice, however, with most C implementations, if the structure size is the same and the layout does not change except possibly within previously-unused space, then existing users will not notice the difference in many operations.
If the size does change, however, then:

- dynamic allocation of instances of the structure will not allocate the correct amount of space;
- arrays of the structure will not be laid out correctly;
- copying from one instance to another via memcpy() will not work correctly;
- binary I/O involving instances of the structure will not transfer the correct number of bytes.
There are likely other things that could go wrong with a size change that would (again, in practice) be fine when the change is merely the conversion of some trailing padding into meaningful members.
Do note: one thing that might still be a problem if the structure members change without the overall size changing is passing structures to functions by value and (somewhat less so) receiving them as return values. A library making use of this approach to provide for binary compatibility would do well to avoid providing functions that do those things.
Finally, using a 64-byte filler (8 uintptr_t (yes, it's not
necessarily 64 bytes)) sounds like a waste of memory...
In a situation in which those 64 bytes per structure are in fact a legitimate concern, that concern might trump binary compatibility. That would be the case if you anticipate a very large number of those structures being in use at the same time, or if you are extremely memory-constrained. In many cases, however, the extra space is inconsequential, whereas the extra scope for binary compatibility afforded by including padding is quite valuable.
The document doesn't go into the details of this at all. Would you
have any explanations to why this advice "adding fillers at the end of
struct to plan for future growth" is a good one?
Like most things, the recommendation needs to be evaluated relative to your particular context. In the foregoing, I've touched on most of the points you would want to consider in such an evaluation.
Related
When browsing through code on the Internet, I often see snippets like these:
struct epoll_event event;
memset(&event, 0, sizeof(event));
This pattern seems needless to me, if event is filled out in full, but it is widespread. Perhaps to take into account possible future changes of the struct?
This is surely just bad copy-and-paste coding. The man page for epoll does not document any need to zero-initialize the epoll_event structure, and does not do so in the examples. Future changes to the struct do not even seem to be possible (ABI), but if they were, the contract would clearly be that any parts of the structure not related to the events you requested would be ignored (and not even read, since the caller may be passing a pointer to storage that does not extend past the original definition).
Also, in general it's at best pointless and at worst incorrect/nonportable to use memset when a structure is supposed to be zero-initialized, since the zero representation need not be the zero value (for pointer and floating point types). Nowadays this generality is mostly a historical curiosity, and not relevant to a Linux-specific interface like epoll anyway, but it comes up as well with mbstate_t which exists in fully general C, and where zero initialization is required to correctly use the associated interfaces. The correct way to zero-initialize things that need zero values, rather than all-zero-bytes representations, is with the universal zero initializer, { 0 }.
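As a minimal sketch of the memset-vs-initializer distinction described above (the struct and function names here are made up for illustration):

```c
#include <string.h>
#include <stddef.h>

struct example {
    int    flags;
    void  *cookie;
    double weight;
};

/* memset gives all-zero *bytes* (including padding). On mainstream
   platforms that happens to produce null pointers and 0.0, but the
   C standard does not guarantee that all-bits-zero is the zero value
   for pointer or floating-point members. */
struct example make_example_memset(void)
{
    struct example e;
    memset(&e, 0, sizeof e);
    return e;
}

/* { 0 } gives all-zero *values*: a genuine null pointer and 0.0 on
   any conforming implementation (padding bytes are unspecified). */
struct example make_example_zeroinit(void)
{
    struct example e = { 0 };
    return e;
}
```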
Using memset like this can help you locate bugs faster. Consider it a defensive (even secure) style of programming.
Let's say you didn't use memset, and instead attempted to diligently fill in each member as documented by the API. If you ever forget to fill in a field (or a later API change adds a new field), then the value that field takes at run time is indeterminate; in practice it will be whatever the memory previously held.
What are the consequences?
If you are lucky, your code will immediately fail in a nice way that can be debugged, for example, if the unset field needs a highly specific value.
If you are unlucky, your code may still work, and it may work for years. Perhaps on your current operating system the program memory somehow already held the correct value expected by the API. But as you move your code across systems and compilers, expect confusing behavior: "it works on my machine, but I don't understand why it doesn't work on yours".
So in this case, memset helps you avoid this nondeterministic behavior.
Of course, you should still profile your code, check for uninitialized memory, write unit tests, etc. Using memset is not a replacement for those; it's just another technique for getting to safe software.
When designing structures to contain textual data, I have been using two basic approaches illustrated below:
typedef struct {
    STRING address1;
    STRING address2;
    STRING city;
    STRING state;
    STRING zip;
} ADDRESS;

typedef struct {
    STRING* address1;
    STRING* address2;
    STRING* city;
    STRING* state;
    STRING* zip;
} ADDRESS;
where STRING is some variable length string-storing type. The advantage of the pointer version is that I can store NULL indicating that data is missing. For example, address2 might be not provided for some addresses. In the type with embedded STRINGs, I have to use a "blank" string, meaning one that has 0 length.
With the pointers there is (possibly) more code burden because I have to check every member for NULL before using. The advantage is not that great, however, because usually the embedded version has to be checked too. For example, if I am printing an address, I have to check for a zero-length string and skip that line. With pointers the user can actually indicate they want a "blank" versus a missing value, although it is hard to see a use for this.
When creating or freeing the structure, pointers add a bunch of additional steps. My instinct is to standardize on the embedded style to save these steps, but I am concerned that there might be a hidden gotcha. Is this an unwarranted fear, or should I be using the pointers for some compelling reason?
Note that memory use is an issue, but it is pretty minor. The pointer version takes a little bit more memory because I am storing pointers to the structs in addition to the structs. But each string struct takes maybe 40 bytes on average, so if I am storing 4 byte pointers, then the pointer version costs maybe 10% more memory which is not significant. Having null pointers possible does not save significant memory because most fields are populated.
Question is About ADDRESS not STRING
Some of the respondents seem to be confused and think I am asking about global tradeoffs, like how to minimize my total work. That is not the case. I am asking about how to design ADDRESS, not STRING. The members of ADDRESS could have fixed arrays, or in other cases not. For the purposes of my question, I am not concerned about the consequences for the container.
I have already stated that the only issue I can see is that it costs more time to use pointers, but I get the benefit of being able to store a NULL. However, as I already said, that benefit does not seem to be significant, but maybe it is for some reason. That is the essence of my question: is there some hidden benefit of having this flexibility that I am not seeing and will wish I had later on.
If you don't understand the question, please read the preliminary answer I have written myself below (after some additional thought) to see the kind of answer I am looking for.
Tradeoffs over memory usage and reduction in mallocs
Seems like the tradeoffs center around two questions: 1) How precious is memory? and 2) Does it matter that a fixed amount of memory is allocated for the strings, limiting the lengths to be stored in each field?
If memory is more important than anything else, then the pointer version probably wins. If predictability of storage usage and avoidance of mallocs is preferred, and limiting the length of the names to some fixed amount is acceptable, then the fixed-length version may be the winner.
One problem with the embedded style is that STRING needs to be defined as something like char[MAX_CHAR + 1] where MAX_CHAR is a pessimistic maximum length for the given fields. The pointer style allows one to allocate the correct amount of memory. The downside as you mention is a much higher cognitive overhead managing your struct.
I have been considering this more deeply and I think that in most cases pointers are necessary because it is important to discriminate between blank and missing. The reason for this is that the missing data is needed when an input is invalid or corrupt or left out. For example, let's imagine that when reading from a file, the file is corrupted so a field like zip code is unreadable. In that case the data is "missing" and the pointer should be NULL. On the other hand, let's imagine that the place has no zip code, then it is "blank". So, NULL means that the user has not yet provided information, but blank means the user has provided the information and there is none of the type in question.
So, to further illustrate the importance of using the pointer, imagine that a complex structure is populated over time in different, asynchronous steps. Here we need to know what fields have been read, and which ones have not. Unless we are using pointers (or adding additional metadata), we have no way of telling the difference between a field that has been answered and one for which the answer is "none". Imagine the system prompting the user "what is the zip code?". User says, "this place has no zip code". Then 5 minutes later the system asks again, "what is the zip code?". That use case makes it clear that we need pointers in most cases.
In this light, the only situation where I should use embedded structs is when the container structure is guaranteed to have a full set of data whenever it is created.
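The NULL-versus-blank convention described above can be sketched like this, using a plain const char * in place of STRING and a reduced one-field struct (all names here are hypothetical):

```c
#include <stddef.h>

/* NULL = not yet provided / unreadable; "" = provided, but there is
   no zip code for this place. */
struct address {
    const char *zip;
};

/* Interpret the field according to the convention above. */
const char *describe_zip(const struct address *a)
{
    if (a->zip == NULL)
        return "unknown (ask the user)";
    if (a->zip[0] == '\0')
        return "no zip code";
    return a->zip;
}
```

The same three-way distinction is impossible with an embedded string member unless extra metadata (e.g. a "was this field set?" flag) is added alongside it.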
In all of the create info structs (vk*CreateInfo) in the new Vulkan API, there is ALWAYS an .sType member. Why is this there if the value can only be one thing? Also, the Vulkan specification is very explicit that you can only use vk*CreateInfo structs as parameters for their corresponding vkCreate* functions. It seems a little redundant. I can see that if the driver were passing this struct straight to the GPU, you might need to have it (I did notice it is always the first member). But having the app set it seems like a really bad idea; if the driver did it, apps would be much less error-prone, and prepending an int to a struct doesn't seem like a computationally expensive operation. I just don't see why it exists.
TL;DR
Why do the vk*CreateInfo structs have the .sType member?
They have one so that the pNext field actually works.
Yes, the API takes a struct with a proper C type, so both the caller and the receiver agree on what type that struct is. But especially nowadays, many such structs have linked lists of structures that provide additional information to the implementation. These extension structures (though many are core in Vulkan 1.1/2) are just like all other structures, with their own sType field.
These fields are crucial because the linked lists are built with pNext pointers... which are void*s. They have no set type. The way the implementation determines what a non-NULL pNext pointer points to is by examining the first 4 bytes stored there. This is the sType field; it allows the implementation to know what type to cast the pointer to.
Of course, the primary struct that an API takes doesn't strictly need an sType field, since its type is part of the API itself. However, there is a hypothetical reason to do so (it hasn't panned out in Vulkan releases).
A later version of Vulkan could expand on the creation of, for example, command buffer pools. But how would it do that? Well, they could add a whole new entrypoint: vkCreateCommandPool2. But this function would have almost the exact same signature as vkCreateCommandPool; the only difference is that they take different pCreateInfo structures.
So instead, all you have to do is declare a VkCommandPoolCreateInfo2 structure. And then declare that vkCreateCommandPool can take either one. How would the implementation tell which one you passed in?
Because the first 4 bytes of any such structure is sType. They can test that value. If the value is VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO, then it's the old structure. If it's VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO_2, then it's the new one.
Of course, as previously stated, this hasn't panned out; post-1.0 Vulkan versions opted to incorporate extension structs rather than replacing existing ones. But the option is there.
IMO, all code that returns a structure directly can be modified to return a pointer to a structure.
When is returning a structure directly a good practice?
Modified how? By returning a pointer to a static instance of the structure within the function, thus making the function non-reentrant, or by returning a pointer to a heap-allocated structure that the caller has to remember to free appropriately? I would consider returning a structure by value the good practice in the general case.
The biggest advantage to returning a complete structure instead of a pointer is that you don't have to mess with pointers. By avoiding the risks inherent with pointers, especially if you're allocating and freeing your own memory, both coding and debugging can be significantly simplified.
In many cases, the advantages of returning the structure directly outweigh the downsides (time/memory) of copying the entire structure to the stack. Unless you know that optimization is necessary, there's no reason not to take the easier path.
I see the following as the cases where I would most commonly opt for passing structs directly:
"Functional programming" style code. Lots of stuff is passed around and having pointers would complicate the code a lot (and that is not even counting if you need to start using malloc+free)
Small structs, like for example
struct Point { int x, y; };
aren't worth the trouble of passing stuff around by reference.
And lastly, let's not forget that pass-by-value and pass-by-reference are actually very different, so some classes of programs will be more suited to one style and will end up looking ugly if the other style is used instead.
These other answers are good, but I think missingno comes closest to "answering the question" by mentioning small structs. To be more concrete: if the struct itself is only a few machine words long, then both the "space" objection and the "time" objection are overcome. If a pointer is one word and the struct is two words, how much slower is the struct copy than the pointer copy? On a cached architecture, I suspect the answer is "not at all". And as for space, 2 words on the stack < 1 word on the stack + 2 words (+ overhead) on the heap.

But these considerations are only appropriate for specific cases: THIS portion of THIS program on THIS architecture.
For the level of writing C programs, you should use whichever is easier to read.
If you're trying to make your function side-effect free, returning a struct directly would help, because it would effectively be pass-by-value. Is it more efficient? No, passing by reference is quicker. But having no side effects can really simplify working with threads (a notoriously difficult task).
There are a few cases where returning a structure by value is contra-indicated:
1) A library function that returns 'token' data that is to be re-used later in other calls, eg. a file or socket stream descriptor. Returning a complete structure would break encapsulation of the library.
2) Structs containing data buffers of variable length where the struct has been sized to accommodate the absolute maximum size of the data but where the average data size is much less, eg. a network buffer struct that has a 'dataLen' int and a 'char data[65536]' at its end.
3) Large structs of any typedef where the cost of copying the data becomes significant, eg:
a) When the struct has to be returned through several function calls - multiple copying of the same data.
b) Where the struct is subsequently queued off to other threads - wide queues means longer lock times during the copy-in/copy-out and so increased chance of contention. That, and the size of the struct is inflicted on both producer and consumer thread stacks.
c) Where the struct is often moved around between layers, eg. protocol stack.
4) Where structs of varying def. are to be stored in any array/list/queue/stack/whateverContainer.
I suspect that I am so corrupted by C++ and other OO languages that I tend to malloc/new almost anything that cannot be stored in a native type.
Rgds,
Martin
Basically, I'm interested in writing a platform-independent garbage collector in C, probably using the mark-and-sweep algorithm or one of its common variants. Ideally, the interface would work along the following lines:
(1) gc_alloc() allocates memory
(2) gc_realloc() reallocates memory
(3) gc_run() runs the garbage collector.
I've already taken a look at the libgc garbage collection library developed by Boehm et al., but it isn't platform-independent; it's just been ported to many different systems. I'd like to implement a garbage collector that contains no system-dependent code. Speed isn't a huge issue.
Any suggestions?
Unfortunately, it's not really possible to make a truly platform-independent garbage collector in C. A strict reading of the C standard allows any type (except unsigned char) to have trap representations: bit patterns which, when read as a value of that type, result in undefined behavior (on some systems, a hardware exception). When scanning allocated blocks for pointers, you have no way of determining whether a particular block of memory contains a legal pointer value, or whether it will trap as soon as you try to look at the value in it.
Examining pointers as integers doesn't help either - no integer type is required to have a representation compatible with a pointer. intptr_t is optional (and was only standardized in C99), and I don't believe its representation is required to be compatible either. And integer types can have trap representations as well.
You also don't know the alignment requirements of pointers. On a platform where pointers have no alignment requirements (ie, can start at any byte) this means you need to stop at every byte, memcpy to a suitable pointer type, and examine the result. Oh, and different pointer types can have different representations as well, which is also unavoidable.
But the bigger issue is finding the root set. Boehm GC and the others tend to scan the stack, as well as static data, for pointers that should go in the root set. This is impossible without knowledge of the OS's memory layout. So you would need to have the user explicitly mark members of the root set, which sort of defeats the point of a garbage collector.
So, in short, you can't make a GC in truly portable C. In principle you can if you make a few assumptions:
Assume the root set will be explicitly given to you by the user.
Assume there are no trap representations for pointer or integer types.
Assume intptr_t is available, or assume all void *s are strictly ordered (ie, < and > work reasonably with pointers from different allocations).
Assume all data pointer types have representation compatible with void *.
Optional, but gives a big speed boost: Hardcode the alignment of pointers (this is far from universal, and will need to be compiler- and platform-specific) This assumption will let you skip memcpying pointers to a known-aligned location, and will also cut down in the number of potential pointers to examine.
If you make these assumptions you should be able to make a conservative mark-sweep allocator. Use a binary tree to hold information on where allocations are, and scan over every possible aligned pointer location in allocated blocks for pointers. However, the need to explicitly provide the root set will make this all pointless - it'll be malloc and free all over again, except that for a certain ill-defined set of objects you can skip it. Not exactly what GC is supposed to provide, but I suppose it might have its place as, eg, part of a virtual machine (in which case the root set would be derived from information available to the virtual machine).
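Under the assumptions listed above, the core of the conservative scan might look like this sketch. The bookkeeping is reduced to toy stand-ins (one fixed "heap" range and a mark counter); a real collector would consult its tree of live allocations instead, and the ordered pointer comparison relies on the strict-ordering assumption:

```c
#include <stdint.h>
#include <stddef.h>

/* Toy stand-ins for the collector's bookkeeping. */
static char heap[64];
static int  marks;

/* Would normally search the allocation tree; here: one range check.
   Comparing unrelated pointers is only OK under the strict-ordering
   assumption. */
static int is_heap_pointer(void *p)
{
    return (char *)p >= heap && (char *)p < heap + sizeof heap;
}

static void mark(void *p)
{
    (void)p;   /* a real collector would set a mark bit on the block */
    marks++;
}

/* Conservatively scan a block: treat every aligned word as a
   potential pointer and mark anything that lands inside the heap. */
static void scan_block(void *start, size_t len)
{
    uintptr_t *words = start;
    for (size_t i = 0; i < len / sizeof(uintptr_t); i++) {
        void *candidate = (void *)words[i];
        if (is_heap_pointer(candidate))
            mark(candidate);
    }
}
```

This is "conservative" precisely because an integer that happens to look like a heap address will pin a block alive; the scan cannot tell the difference.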
Note that this all applies only to conservative GCs - that is, ones which work blindly, scanning for pointers in data without knowing where it could be. If you're working on a VM, it's a lot easier - you can build a unified data type for all allocations by the VM that explicitly lists where pointers can be found. With this plus an explicit root set, you can build a non-conservative GC; this should be sufficient for building a VM or an interpreter.
For a mark-and-sweep algorithm, all you really need to do is calculate which objects are reachable from the root set, right? (It's been a while since I dug into this...)
This could be managed by a separate object graph for GC-managed objects, and "all" you would need to do is add functions to properly manage that graph whenever you allocate or modify managed objects.
If you also add reference counting for managed objects you would make it easier to calculate which ones are reachable directly from stack references.
This should probably be possible to write fairly platform-independent, though it might be arguable whether this would be a true garbage collector.
Simple pseudocode to show what I mean by reference counting and graph management:
some_object *pObject = gc_alloc(sizeof(some_object));
some_container *pContainer = gc_alloc(sizeof(some_container));
pContainer->pObject = pObject;

/* let the GC know that pContainer has a reference to pObject */
gc_object_reference(pContainer, pObject);

/* a new block of some kind */
{
    /* let the GC know we have a new reference for pObject */
    some_object *pReference = gc_reference(pObject);

    /* do stuff */
    ...

    /* let the GC know that this reference is no longer used */
    gc_left_scope(pReference);
}

gc_left_scope(pObject);

gc_run(); /* should not be able to recycle anything, since there is still
           * a live reference to pContainer, and that has a reference
           * to pObject */