What are the problems of mixing (Visual Studio) C/C++ projects which have different structure alignment options set? I know that I can obviously set the options differently in different projects, but does this have an affect on the behaviour of the output library/application?
Say I have a library, with structure alignment set to 1-byte, would there be a problem if I had an application which links to the library, but has structure alignment set to default? Does it make a difference if the structures of the library are not used by the application?
Thanks for any help with this!
The structure layout is determined by the alignment rules in affect at compilation time. For efficiency reasons, structure members are aligned in way as to make their addresses in memory multiples of 2, 4, 8 or sometimes larger values.
Extra padding in inserted between structure members depending on their sizes and alignment requirements, and extra padding may be added at the end of the structure to allow arrays of structures to keep this alignment valid for members of all array elements. malloc is guarantied to return a pointer aligned for all practical purposes:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement.
This behavior can sometimes be controlled via compiler switches and or pragmas as seems to be possible with Visual C/C++.
The consequence is important for structures for which said options will produce different padding. The code to access structure members with be different in the various libraries and programs handling shared data, invoking undefined behaviour.
It is therefore strongly advised to not use different alignment options for data structures shared between different pieces of compiled code. Whether they live in different programs, dynamic libraries, statically linked object files, user or kernel code, device drivers, etc. the result is the same: undefined behavior with potential catastrophic consequences (like any UB).
It only matters for structures used and shared between the incompatible modules, but it is very difficult to keep track over time of who uses and who does not use structure definitions published in shared header files. Structures that are only used locally and whose definition is not visible should not pose a problem, but only for as long as they stay private.
I you use compile time switches, all structures will be affected by these, including standard definitions from the C library, and all structures defined locally or globally in your header files. Standard include files may have provisions to keep them immune from such discrepancies, but your own structure definitions will not.
Don't do this with compile time switches unless you know exactly why you need to do it. Don't do this in source code (via pragmas, attributes and other non portable constructs) unless you are a real expert with complete control over every use of your data.
If you are concerned with structure packing, reorder the struct members and chose their types appropriately to achieve your goals. Changing the compiler's packing rules in highly error prone. See http://www.catb.org/esr/structure-packing/ for a complete study.
Related
This document called "Good Practices in Library Design, Implementation, and Maintenance" by Ulrich Drepper says (bottom of page 5):
[...] the type definition should always create at least a minimal
amount of padding to allow future growth
[...]
Second, a structure should contain at the end a certain number of fill bytes.
struct the_struct
{
int foo;
// ...and more fields
uintptr_t filler[8];
};
[...]
If at a later time a field has to be added to the structure the type definition can be changed to this:
struct the_struct
{
int foo;
// ...and more fields
union
{
some_type_t new_field;
uintptr_t filler[8];
} u;
};
I don't see the point of adding this filler at the end of the structure. Yes it means that when adding a new field (new_field) to the structure, it doesn't actually grow. But isn't the whole point of adding new fields to a structure that you didn't knew you were going to need them? In this example, what if you want to add not one field but 20? Should you then use a filler of 1k bytes just in case? Also, why does is it important that the size of a struct doesn't change in subsequent versions of a library? If the library provides clean abstractions, that shouldn't matter right? Finally, using a 64 bytes filler (8 uintpr_t (yes, it's not necessarily 64 bytes)) sounds like a waste of memory...
The document doesn't go into the details of this at all. Would you have any explanations to why this advice "adding fillers at the end of struct to plan for future growth" is a good one?
Depending on circumstances, yes, the size of the structure can be important for binary compatibility.
Consider stat(). It's typically called like this:
struct stat stbuf;
int r = stat(filename, &stbuf);
With this setup, if the size of the stat structure ever changes, every caller becomes invalid, and will need to be recompiled. If both the called and the calling code are part of the same project, that may not be a problem. But if (as in the case of stat(), which is a system call into the Unix/Linux kernel) there are lots and lots of callers out there, it's practically impossible to force them all to recompile, so the implication is that the size of the stat structure can never be changed.
This sort of problem mainly arises when the caller allocates (or inspects/manipulates) actual instances of the structure. If, on the other hand, the insides of the structure are only ever allocated and manipulated by library code -- if calling code deals only with pointers to the struct, and doesn't try to interpret the pointed-to structures -- it may not matter if the structure changes.
(Now, with all of that said, there are various other things that can be done to mitigate the issues if a struct has to change size. There are libraries where the caller allocates instances of a structure, but then passes both a pointer to the structure, and the size of the structure as the caller knows it, down into the library code. Newer library code can then detect a mismatch, and avoid setting or using newer fields which an older caller didn't allocate space for. And I believe gcc, at least, implements special hooks so that glibc can implement multiple versions of the same structure, and multiple versions of the library functions that use them, so that the correct library function can be used corresponding to the version of the structure that a particular caller is using. Going back to stat(), for example, under Linux there are at least two different versions of the stat structure, one which allocates 32 bits for the file size and one which allocates 64.)
But isn't the whole point of adding new fields to a structure that you
didn't knew you were going to need them?
Well yes, if you knew all along that you would need those members, then it would be counter-productive to intentionally omit them. But sometimes you indeed discover only later that you need some additional fields. Drepper's recommendations speak to ways to design your code -- specifically your structure definitions -- so that you can add members with the minimum possible side effects.
In this example, what if you
want to add not one field but 20?
You don't start out saying "I'm going to want to add 20 members". Rather, you start out saying "I may later discover a need for some more members." That's a prudent position to take.
Should you then use a filler of 1k
bytes just in case?
That's a judgment call. I recon that a KB of extra space in the structure definition is probably overkill in most cases, but there might be a context where that's reasonable.
Also, why does is it important that the size of a
struct doesn't change in subsequent versions of a library? If the
library provides clean abstractions, that shouldn't matter right?
How important it is that the size remains constant is a subjective question, but the size is indeed relevant to binary compatibility for shared libraries. Specifically, the question is whether I can drop a new version of the shared lib in place of the old one, and expect existing programs to work with the new one without recompilation.
Technically, if the definition of the structure changes, even without its size changing, then the new definition is incompatible with the old one as far as the C language is concerned. In practice, however, with most C implementations, if the structure size is the same and the layout does not change except possibly within previously-unused space, then existing users will not notice the difference in many operations.
If the size does change, however, then
dynamic allocation of instances of the structure will not allocate the correct amount of space.
arrays of the structure will not be laid out correctly.
copying from one instance to another via memcpy() will not work correctly.
binary I/O involving instances of the structure will not transfer the correct number of bytes.
There are likely other things that could go wrong with a size change that would (again, in practice) be ok under conversion of some trailing padding into meaningful members.
Do note: one thing that might still be a problem if the structure members change without the overall size changing is passing structures to functions by value and (somewhat less so) receiving them as return values. A library making use of this approach to provide for binary compatibility would do well to avoid providing functions that do those things.
Finally, using a 64 bytes filler (8 uintpr_t (yes, it's not
necessarily 64 bytes)) sounds like a waste of memory...
In a situation in which those 64 bytes per structure is in fact a legitimate concern, then that might trump binary compatibility concerns. That would be the case if you anticipate a very large number of those structures to be in use at the same time, or if you are extremely memory-constrained. In many cases, however, the extra space is inconsequential, whereas the extra scope for binary compatibility afforded by including padding is quite valuable.
The document doesn't go into the details of this at all. Would you
have any explanations to why this advice "adding fillers at the end of
struct to plan for future growth" is a good one?
Like most things, the recommendation needs to be evaluated relative to your particular context. In the foregoing, I've touched on most of the points you would want to consider in such an evaluation.
According to CERT coding rule POS49-C it is possible that different threads accessing different fields of the same structure may conflict.
Instead of bit-field, I use regular unsigned int.
struct multi_threaded_flags {
unsigned int flag1;
unsigned int flag2;
};
struct multi_threaded_flags flags;
void thread1(void) {
flags.flag1 = 1;
}
void thread2(void) {
flags.flag2 = 2;
}
I can see that even unsigned int, there can still be racing condition IF compiler decides to use load/store 8 bytes instead of 4 bytes.
I think compiler will never do that and racing condition will never happen here, but that's completely just my guess.
Is there any well-defined assembly/compiler documentation regarding this case ? I hope locking, which is costly, is the last resort when this situation happens to be undefined.
FYI, I use gcc.
The C11 memory model guarantees that accesses to distinct structure members (which aren't part of a bit-field) are independent, so you'll run into no problems modifying the two flags from different threads (i.e., the "load 8 bytes, modify 4, and write back 8" scenario is not allowed).
This guarantee does not extend in general to bitfields, so you have to be careful there.
Of course, if you are concurrently modifying the same flag from more than one thread, you'll likely trigger the prohibition against data races, so don't do that.
Before C11, ISO C had nothing to say about threads, and writing multi-threaded code relied on other standards (e.g. POSIX which defines a memory model for pthreads), and multi-threaded code essentially depended on the way real compilers worked.
Note that this CERT coding-standard rule is in the POSIX section, and appears to be about pthreads without C11. (There's a CON32-C. Prevent data races when accessing bit-fields from multiple threads rule for C11, where they solve the bit-field concurrency problem by simply promoting the bit-fields to unsigned char, which C11 defines as "separate memory locations". This rule appears to be an insufficiently-edited copy of that, because many of its suggestions suck.)
But unfortunately POSIX pthreads doesn't clearly define what a "memory location is", and this is all they have to say on the subject:
Single UNIX® Specification, Version 4, 2016 Edition (online HTML copy, requires free registration)
4.12 Memory Synchronization
Applications shall ensure that access to any memory location by more
than one thread of control (threads or processes) is restricted such
that no thread of control can read or modify a memory location while
another thread of control may be modifying it. Such access is
restricted using functions that synchronize thread execution and also
synchronize memory with respect to other threads.
This is why C11 defines it more clearly, where only bitfields are dangerous to write from different threads (barring compiler bugs).
However, I think everyone (including all compilers) agreed that separate int variables / struct members / array elements were separate "memory locations". Most real-world software doesn't take any special precautions for int or char variables that may be written by separate threads (especially outside of structs).
A compiler that gets int wrong will cause problems all over the place unless the bug is limited to very specific circumstances.
Most bugs like this are very hard to detect with testing, because usually the other data that's non-atomically loaded and stored back isn't written by another thread very often / ever. But if a compiler always did that for every int, problems would show up in some software pretty quickly.
Normally, separate char members would also be considered separate "memory locations", but some pre-C11 implementations might have exceptions to that rule. (Especially on early Alpha AXP which famously has no byte store instruction (so a C11 implementation would have to use 32-bit char), but optimizations that invent writes when updating multiple members can happen anywhere, either by accident or because the compiler developers define "memory location" as a 32 or 64-bit word.)
There's also the issue of compiler bugs. This can affect even compilers that intend to conform to C11. For example gcc bug 52080 which affected some non-x86 architectures. (Discovered in gcc4.7 in 2012, fixed in gcc4.8 a couple months later). Using a bitfield "tricked" the compiler into doing a non-atomic read-modify-write of the containing 64-bit word, even though that included a non-bitfield member. (Bitfields are bait for compiler bugs. Any defensive / safe-coding standard should recommend avoiding them in structs where different members can be modified from different threads. And definitely don't put them next to the actual lock.)
Herb Sutter's talk atomic<> Weapons: The C++ Memory Model and Modern Hardware part 2 goes into some detail about the kinds of compiler bugs that have affected multi-threaded code. Most of these should be shaken out by now (2017) if you're using a modern compiler. Most things like inventing writes (or non-atomic read and write-back of the same value) were usually still considered bugs before C11; C11 mostly just firmed up the rules compilers were already trying to follow. It also made it easier to report such bugs, because you could say unequivocally that it violates the standard instead of just "it breaks my code".
That coding-rule article is poorly written. Its examples with adjacent bit-fields are unsafe, but it claims that all variables are at risk. This is not true in general, especially not with C11. Many users of pthreads can or already do compile with C11 compilers.
(The phrasing I'm referring to is "Bit-fields are especially prone to this behavior", which incorrectly implies that this is allowed to happen with ordinary members of structs, or variables that happen to be adjacent outside of structs)
It's part of a defensive coding standard, but it should definitely make the distinction between what the standards require and what is just belt-and-suspenders defense against compiler bugs.
Also, putting variables that will usually be accessed by different threads into one struct is generally terrible. False sharing of a cache line (typically 64 bytes) is really bad for performance, causing cache misses and (on out-of-order x86 CPUs) memory-ordering mis-speculation (like a branch mispredict requiring a roll-back.) Putting separately-used shared variables into the same byte with bit-fields is even worse, because it prevents efficient stores (any store has to be a RMW of the containing byte).
Solving the bit-field problem by promoting the two bit-fields to unsigned char makes much more sense than using a mutex if they need to be independently writeable from separate threads. Or even unsigned long if you're paranoid.
If the two members are often used together, it makes sense to put them nearby. But if you're going to pad to a whole long to contain both members (like that article does), you might as well make them at least unsigned char or bool instead of 1-byte bitfields.
Although honestly, having two threads modify separate members of a struct at the same time seems like poor design unless one of the members is the lock, and the modification is part of an attempt to take the lock. Using a bit-field as a lock is a bad idea unless you're writing for a specific ISA building and your own lock primitive using something like x86's lock bts instruction to atomically test-and-set a bit. Even then it's a bad idea unless you need to pack it with other bitfields for space saving; the Linux code that exposed the gcc bug with an int lock:1 member was a horrible idea.
In addition, the flags are declared volatile to ensure that the compiler will not attempt to move operations on them outside the mutex.
If your compiler needs this, your compiler is seriously broken, and will create broken code for most multi-threaded programs. (Unless the compiler bug only happens with bit-fields, because shared bit-fields are rare).
Most code doesn't make shared variables volatile, and relies on the guarantee that mutex lock/unlock stops operations from reordering at compile or run time out of the critical section.
Back in 2012, and possibly still today, gcc -pthread might affect code-gen choices in C89/C99 mode (-std=gnu99). In discussion on an LWN article about that gcc bug, this user claimed that -pthread would prohibit the compiler from doing a 64-bit load/store when modifying a 32-bit variable, but that without -pthread it could do so (although on most architectures, IDK why it would). But it turns out that gcc bug manifested even with -pthread, so it was really a bug rather than an aggressive optimization choice.
ISO C11 standard:
N1570, section 3.14 definitions:
memory location: either an object of scalar type, or a maximal sequence of adjacent bit-fields all having nonzero width
NOTE 1 Two threads of execution can update and access separate memory locations without interfering with each other.
A bit-field and an adjacent non-bit-field member are in separate memory locations. ... It is not safe to concurrently update two non-atomic bit-fields in the same structure if all members declared between them are also (non-zero-length) bit-fields, no matter what the sizes of those intervening bit-fields happen to be.
(...gives an example of a struct with bit-fields...)
So in C11, you can't assume anything about the compiler munging other bit-fields when writing one bit-field, but otherwise you're safe. Unless you use a separator :0 field to force the compiler pad enough (or use atomic bit-ops) so that it can update your bit-field without concurrency problems for other fields. But if you want to be safe, it's probably not a good idea to use bit-fields at all in structs that are written by multiple threads at once.
See also other Notes in the C11 standard, e.g. the one linked by #Fuz in Is it well-defined behavior to modify one element of an array while another thread modifies another element of the same array? that explicitly says that compiler transformations that would make this dangerous are disallowed.
I am wondering whether C packs bytes in the stack for optimal CPU retrieval even if they are outside of a struct. And if not, why do so specifically for a struct?
Structs are very widely used in C, and the compiler does various tricks for (1) alignment of objects for access speeds (2) for specific architecture mappings (ex in ARM - Thumb) where a developer can write code to map to Peripheral registers. But sometimes, we need explicit control for transmission across different systems (like network protocols).
From the point of view of embedded systems (ARM), below is a specific recommendation - "peripheral locations should not be accessed using __packed structs (where unaligned members are allowed and there is no internal padding), or using C bitfields. This is because it is not possible to control the number and type of memory access that is being performed by the compiler. The result is code which is non-portable, has undesirable side-effects, and will not work as intended".
Also see Structure padding and packing
I have a large block of data where some operations would be fastest if the block were viewed as an array of 64 bit unsigned integers and others would be fastest if viewed as an array of 32 bit unsigned integers. By 'fastest', I mean fastest on average for the machines that will be running the code. My goal is to be near optimal in all the environments running the code, and I think this is possible if I use a void pointer, casting it to one of the two types for dereferencing. This brings me to my questions:
1) If I use a void pointer, will casting it to one of the two types for dereferencing be slower than directly using a pointer of the desired type?
2) Am I correct in my understanding of the standard that doing this will not violate the anti-aliasing rules, and that it will not produce any undefined or unspecified behaviour? The 32 and 64 bit types I am using exist and have no padding (this is a static assertion).
3) Am I correct in understanding the anti-aliasing rules to basically serve two purposes: type safety and compiler guarantees to enable optimization? If so, if all situations where the code I am discussing will be executed are such that no other dereferencing is happening, am I likely to loose out on any significant compiler optimizations?
I have tagged this with 'c11' because I need to prove from the c11 standard that the behaviour is well defined. Any references to the standard would be appreciated.
Finally, I would like to address a likely concern to be brought up in the responses, regarding "premature optimization". First off, this code is being ran on a diverse computing cluster, were performance is critical, and I know that even a one instruction slowdown in dereferencing would be significant. Second, testing this on all the hardware would take time I don't have to finish the project. There are a lot of different types of hardware, and I have a limited amount of time on site to actually work with the hardware. However, I am confident that an answer to this question will enable me to make the right design choice anyway.
EDIT: An answer and comments pointed out that there is an aliasing problem with this approach, which I verified directly in the c11 standard. An array of unions would require two address calculations and dereferencings in the 32 bit case, so I'd prefer a union of arrays. The questions then become:
1) Is there a performance problem in using a union member as an array as opposed to a pointer to the memory? I.e., is there a cost in union member access? Note that declaring two pointers to the the arrays violates the anti-aliasing rules, so access would need to be made directly through the union.
2) Are the contents of the array guaranteed invariant when accessed through one array then through the other?
There are different aspects to your question. First of all, interpreting memory with different types has several problems:
aliasing
alignment
padding
Aliasing is a "local" problem. Inside a function, you don't want to have pointers to the same object that have a different target type. If you do modify such pointed to objects, the compiler may pretend not to know that the object may have changed and optimize your program falsely. If you don't do that inside a function (e.g do a cast right at the beginning and stay with that interpretation) you should be fine for aliasing.
Alignment problems are often overlooked nowadays because many processors now are quite tolerant with alignment problems, but this is nothing portable and might also have performance impacts. So you'd have to ensure that your array is aligned in a way that is suitable for all types that you access it. This can be done with _Alignas in C11, older compilers have extensions that also allow for this. C11 adds some restrictions to aligment, e.g that this is always a power of 2, which should enable you to write portable code with respect to this problem.
Integer type padding is something rare these days (only exception is _Bool) but to be sure you should use types that are known not to have problems with that. In your case these are [u]int32_t and [u]int64_t that are known to have exactly the number of bits requested and of have two's complement representation for the signed types. If a platform doesn't support them, your program would simply not compile.
I would refrain from using a void pointer. A union of two arrays or an array of union will do better.
Use a proper alignment on the whole type. C11 provides alignas() as keywords. GCC has attributes for alignment which are non-standard (and work in per-11 standards as well). Other compilers may have none at all.
Depending on your architecture, there should be no performance impact. But this cannot be guaranteed (I do not see an issue her, however). You might even align the type to a larger type than 64 bits to fill a cache line perfectly. That might speed up prefetch and writeback.
Aliasing refers to the fact that an object is referenced by multiple pointer a the same time. This means the same memory address can be addressed using two different "sources". The problem is that the compiler may not be aware abouth this and thus may hold the value of a variable in a CPU register during some calculation without writing it back to memory instantly.
If the same variable is then referenced by the other "source" (i.e. pointer), the compiler may read invalid data from the memory location.
Imo is aliasing only relevant within a function if two pointers are pased inside. So, if you do not intend to pass two pointers to the same object (or part of it) there should be no problem at all. Otherwise, you should get comfortable with (compiler)barriers. Edit: C standard seems to be a bit more strict on that, as it requires just the lvalues accessing an object to fulfill certain criteria (C11 6.5/7 (n1570) - thks Matt McNabb).
Oh, and don't use int/long/etc. You really should use stdint.h types if you really need proper sized types.
Systems demand that certain primitives be aligned to certain points within the memory (ints to bytes that are multiples of 4, shorts to bytes that are multiples of 2, etc.). Of course, these can be optimized to waste the least space in padding.
My question is why doesn't GCC do this automatically? Is the more obvious heuristic (order variables from biggest size requirement to smallest) lacking in some way? Is some code dependent on the physical ordering of its structs (is that a good idea)?
I'm only asking because GCC is super optimized in a lot of ways but not in this one, and I'm thinking there must be some relatively cool explanation (to which I am oblivious).
gcc does not reorder the elements of a struct, because that would violate the C standard. Section 6.7.2.1 of the C99 standard states:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
Structs are frequently used as representations of the packing order of binary file formats and network protocols. This would break if that were done. In addition, different compilers would optimize things differently and linking code together from both would be impossible. This simply isn't feasible.
GCC is smarter than most of us in producing machine code from our source code; however, I shiver if it was smarter than us in re-arranging our structs, since it's data that e.g. can be written to a file. A struct that starts with 4 chars and then has a 4 byte integer would be useless if read on another system where GCC decided that it should re-arrange the struct members.
gcc SVN does have a structure reorganization optimization (-fipa-struct-reorg), but it requires whole-program analysis and isn't very powerful at the moment.
C compilers don't automatically pack structs precisely because of alignment issues like you mention. Accesses not on word boundaries (32-bit on most CPUs) carry heavy penalty on x86 and cause fatal traps on RISC architectures.
Not saying it's a good idea, but you can certainly write code that relies on the order of the members of a struct. For example, as a hack, often people cast a pointer to a struct as the type of a certain field inside that they want access to, then use pointer arithmetic to get there. To me this is a pretty dangerous idea, but I've seen it used, especially in C++ to force a variable that's been declared private to be publicly accessible when it's in a class from a 3rd party library and isn't publicly encapsulated. Reordering the members would totally break that.
You might want to try the latest gcc trunk or, struct-reorg-branch which is under active development.
https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Olga+Golovanevsky_+Memory+Layout+Optimizations+of+Structures+and+Objects.pdf