I have read articles and SO answers about how we can cutoff compiler added padding if we layout the structs in the decreasing order of size of independent fields.
eg: Instead of laying out (L1) the struct like this
typedef struct dummy {
char c1;
short s1;
int i1;
unsigned int ui1;
double d1;
} dummy_t;
create the layout (L2) like this
typedef struct dummy {
double d1;
unsigned int ui1;
int i1;
short s1;
char c1;
} dummy_t;
My question here is, are there scenarios where layout L2 has downsides when compared to L1 or any other layout ? Or can I take it into my design rules to always use L2 ? Also is the natural alignment rule
natural alignment = sizeof(T)
always true for primitive types ?
You should wonder about alignement/structure size only if you are running on a memory constrained system (you want to limit the size of each element). This makes sense only for systems/designs where you know you will store a lot of such structure and used storage space is an issue. On the other hand grouping the content of a structure logically can bring clarity to your code. I would always favor clarity against performance, especially at early design phases. Even more if you do not have performance related constraints.
About performance it might happen that accessing "random" content from L2 is worse than from L1 because memory cache alignement constraints can also come into play. (eg accessing in a row 2 elements of struct L1 can be slower or quicker than 2 elements of L2). For such optimization, profiling is the only way to know which layout is the best.
My question here is, are there scenarios where layout L2 has downsides when compared to L1 or any other layout ?
Sometimes you need to have members in a different order. Reasons for this may include:
The structure is part of a communications protocol and has to be sent byte-by-byte across a network or other communication device. In this case, the layout of the structure may be dictated by the protocol, and you will have to match it exactly (including padding or lack thereof).
The structure is part of a family of structures that share initial sequences of members, so those members must be placed first. This is sometimes done to allow structures to be handled in a “generic” way by operating on those initial members only.
The structure includes a buffer whose length will vary (by being allocated dynamically according to needs that arise during program execution). Such a buffer is implemented with a flexible array member, which must be the last member in the structure.
Also, there may be incidental effects of how members are ordered. For example, if member x happens to be used much more frequently than other members of the structure, putting it at the front might allow the compiler to access it with simpler address arithmetic, since its offset from the beginning of the structure will be zero. This is rarely a consideration in programming, but I mention it for completeness.
Aside such considerations, you are generally free to order members as you desire.
Also is the natural alignment rule natural alignment = sizeof(T) always true for primitive types ?
No. As an example, an eight-byte long might have an alignment require of one, two, four, or eight bytes.
… we can cutoff compiler added padding if we layout the structs in the decreasing order of size of independent fields.
This is not true for members that are aggregates (arrays, structures, and unions). Consider that a member char x[13]; is 13 bytes in size but only requires alignment of one byte. To minimize padding, order members in order of decreasing alignment requirement, not decreasing size.
IMO abstracting from the implementation details like caching (which is too broad to be discussed in the post answer) there is no difference.
The difference is if you place the variable sized (or zero sized) object at the end of the struct (example):
typedef struct
{
size_t size;
char data[];
}string;
string *create(size_t size)
{
string *ptr = malloc(sizeof(*ptr) + size);
if(ptr)
{
ptr -> size = size;
}
return ptr;
}
If course the logical member order (and potential packing) is required if struct will store some received binary data (for example communication packet header)
Related
I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes in to play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%lu",sizeof(book));
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!
Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So, the char elements which are 3 bytes in length, need no alignment because they are byte elements.
It is preferred, though not required, that the short element needs to be aligned on a short boundary -- which means an even address. By aligning it on a short boundary, the compiler can issue a single load short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
EDIT: . As #Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, the compiler could modify that in a different version of the compiler.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the strucuture. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.
#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with char and short the alignment is 2, because the alignment of short is two, if you have a struct, only out of chars it has an alignment of 1.
There are multiple ways to manipulate the alignment:
alignas(4) char[4]; // this can hold 32-bit ints
This is nonstandart, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
static_assert(sizeof(struct A) == 4);
static_assert(alignof(struct A) == 2);
static_assert(sizeof(struct B) == 3);
static_assert(alignof(struct B) == 1);
Usually compilers follow ABI of the target architecture.
It defines alignments of structures and primitive datatypes. And that affects to needed padding and sizes of structures. Because alignment is multiple of 4 in many architectures, size of structures are too.
Compilers may offer some attributes/options for changing alignments more or less directly.
For example gcc and clang offers: __attribute__ ((packed))
I have a system that can run either 32 or 64 bit. If I define a structure with 7 longs and 1 char, then I understand that if the structure runs on 32 bits, a long will be assigned 32 bits, a char will be assigned 8 bits and the structure will require at least 232 bits. But if the structure runs on 64 bits, then a long will be assigned 64 bits, a char will be assigned 8 bits and the structure will require at least 456 bits. I also understand that the memory will be optimized for arrays of a structure if the structure requires a power of 2 bits. This structure, then, will have to fill 256 bits on a 32 bit system or 512 bits on a 64 bit system. Will that padding be automatically added to the structure to optimize the memory, or should I add something to the structure to bring it closer to a power of 2 bits in order to optimize the processing of an array of those structures?
Edit: Just saw the article about array indexing using shifts vs multiplication. My advice would be to size the structs appropriately for your data, taking care not to waste space due to slop if you can help it. If a profiler determines that indexing elements is the major performance hit for you, you can try adding slop to specifically reach a specific byte size. However, my intuition tells me (on a modern system with caches), you'll suffer a bigger performance hit by needlessly increasing the size of your structures and pushing useful memory out of cache! :D
(Original response follows)
I don't think you'll see a performance penalty from not having your structure sized to a power of two. The performance issue with arrays of structs is typically due to alignment.
Alignment and Performance
In order to minimize the number of instructions required to access a scalar variable, the variable has to exist at a location in memory that is a multiple of its size in bytes. The implication for structures is as follows:
Practically speaking, since the address of a struct is equal to the address of its first member, the beginning address of a given struct has to be a multiple of the size of its first member, in bytes
For compiler writers, the pragmatic approach is to align structures to a multiple of the widest scalar value they contain
Arrays of structs are allocated with padding between them to guarantee that these alignment guarantees hold over the entire array - most compilers will do this for you! :)
Padding is also added between variables in the struct, if necessary to ensure that all structure members are properly aligned
Dealing with Structure Alignment
Most compilers on modern systems will automatically add padding after the structure in order to satisfy alignment requirements for self-aligned types.
So, in general, your structure will be aligned to a multiple of the largest element in the structure. As a result of the longs in your struct, each struct in an array will be spaced such that the beginning address of each is a multiple of sizeof(long). This is accomplished by transparently adding "slop" to the end of your struct. Try this and see what you get:
#include <stdio.h>
struct my_struct
{
long l1;
long l2;
long l3;
long l4;
long l5;
long l6;
long l7;
char c;
};
int main( int argc, char** argv )
{
printf("sizeof(my_struct) == %lu\n", sizeof(struct my_struct));
return 0;
};
/* EOF */
A Note on Packing:
In general, for self-aligned types on systems which support them you can usually use __attribute__((packed)) but this will likely result in a performance penalty as the number of machine instructions required to access a given member will be increased.
If you really care about not wasting space due to alignment slop and you don't need the full range of values in one of those longs, see if you can move that char into one of the longs with a mask or try using bitfields.
One of my personal favorite resources on structure packing and alignment: The Lost Art of C Structure Packing
Will that padding be automatically added to the structure to optimize
the memory, or should I add something to the structure to bring it
closer to a power of 2 bits in order to optimize the processing of an
array of those structures?
No, you don't need to add anything to your struct declaration. Generally speaking alignment and padding is taken care of by the compiler.
You can test that yourself by printing the output of sizeof(your_struct);.
It is however possible to do the opposite and optimize for size instead of speed. This can be useful is memory is scare or if you send your raw struct over the network. GCC has __attribute__((packed)) to do this.
This is a bit late, but I just want to share this article about structure packing with details on how to optimize the size of the struct variables with rearrangement of the order of declaration of the struct members:
http://www.catb.org/esr/structure-packing/
This question already has answers here:
Practical Use of Zero-Length Bitfields
(5 answers)
Closed 8 years ago.
In C, sometimes certain members of a structure tend to have misaligned offsets, as in case of this thread in HPUX community
In such a case, one is suggested to use zero-width bit field to align the(misaligned) next member.
Under what circumstance does misalignment of structure members happen? Is it not the job of the compiler to align offsets of members at word boundary?
"Misalignment" of a structure member can only occur if the alignment requirements of the structure member are deliberately hidden. (Or if some implementation-specific mechanism is used to suppress alignment, such as gcc's packed attribute`.)
For example, in the referenced problem, the issue is that there is a struct:
struct {
// ... stuff
int val;
unsigned char data[DATA_SIZE];
// ... more stuff
}
and the programmer attempts to use data as though it were a size_t:
*(size_t*)s->data
However, the programmer has declared data as unsigned char and the compiler therefore only guarantees that it is aligned for use as an unsigned char.
As it happens, data follows an int and is therefore also aligned for an int. On some architectures this would work, but on the target architecture a size_t is bigger than an int and requires a stricter alignment.
Obviously the compiler cannot know that you intend to use a structure member as though it were some other type. If you do that and compile for an architecture which requires proper alignment, you are likely to experience problems.
The referenced thread suggests inserting a zero-length size_t bit-field before the declaration of the unsigned char array in order to force the array to be aligned for size_t. While that solution may work on the target architecture, it is not portable and should not be used in portable code. There is no guarantee that a 0-length bit-field will occupy 0 bits, nor is there any guarantee that a bit-field based on size_t will actually be stored in a size_t or be appropriately aligned for any non bit-field use.
A better solution would be to use an anonymous union:
// ...
int val;
union {
size_t dummy;
unsigned char data[DATA_SIZE];
};
// ...
With C11, you can specify a minimum alignment explicitly:
// ...
int val;
_Alignas(size_t) unsigned char data[DATA_SIZE];
// ...
In this case, if you #include <stdalign.h>, you can spell _Alignas in a way which will also work with C++11:
int val;
alignas(size_t) unsigned char data[DATA_SIZE];
Q: Why does it misalignment happen? Is it not the job of the compiler to align offsets of members at word boundary?
You are probably aware that the reason that structure fields are aligned to specific boundaries is to improve performance. A properly aligned field may only require a single memory fetch operation by the CPU; where a mis-aligned field will require at least two memory fetch operations (twice the CPU time).
As you indicated, it is the compilers job to align structure fields for fastest CPU access; unless a programmer over-rides the compiler's default behavior.
Then the question might be; Why would the programmer over-ride the compiler's default alignment of structure fields?
One example of why a programmer would want to over-ride the default alignment is when sending a structure 'over the wire' to another computer. Generally, a programmer wants to pack as much data as possible, into fewest number of bytes.
Hence, the programmer will disable the default alignment when structure density is more important than CPU performance accessing structure fields.
Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
}
The padding of a structure is non-deterministic (that is, is implementation-dependent), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-structs solves this in gcc, but bears other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?
The compiler is limited by several traditional and practical limitations.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has been often used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements that you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.
If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code use a trick where it assumes that the last member of the structure is the highest-addressed in memory to signify the start of a much larger data block (of unknown size at compile time).
Reordering fields of structures can sometime yield good gains in data size and often also in code size, especially in 64 bit memory model. Here an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
If you have an array of these structures you gain 50% in the data but also in code size, as indexing on a power of 2 is simpler than on other sizes.
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
As for the other point of your question. Why doesn't the compiler do this reordering of fields, it is because in that case, it would be difficult to implement unions of structures that use a common pattern. Like for example.
struct header {
enum type;
int len;
};
struct a {
enum type;
int len;
bool whatever1;
};
struct b {
enum type;
int len;
long whatever2;
long whatever4;
};
struct c {
enum type;
int len;
float fl;
};
union u {
struct h header;
struct a a;
struct b b;
struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
If I remember correctly the standard even mandates this behaviour.
All,
Here is an example on Unions which I find confusing.
struct s1
{
int a;
char b;
union
{
struct
{
char *c;
long d;
}
long e;
}var;
};
Considering that char is 1 byte, int is 2 bytes and long is 4 bytes. What would be the size of the entire struct here ? Will the union size be {size of char*}+ {size of double} ? I am confused because of the struct wrapped in the union.
Also, how can I access the variable d in the struct. var.d ?
The sizes are implementation-defined, because of padding. A union will be at least the size of the largest member, while a struct will be at least the sum of the members' sizes. The inner struct will be at least sizeof(char *) to sizeof(long), so the union will be at least that big. The outer struct will be at least sizeof(int) + 1 + sizeof(char *) + sizeof(long). All of the structs and unions can have padding.
You are using an extension to the standard, unnamed fields. In ISO C, there would be no way to access the inner struct. But in GCC (and I believe MSVC), you can do var.d.
Also, you're missing the semi-colon after the inner struct.
With no padding, and assuming sizeof(int)==sizeof(char *)==sizeof(long)==4, the size of the outer struct will be 13.
Breaking it down, the union var overlaps an anonymous struct with a single long. That inner struct is larger (a pointer and a long) so its size controls the size of the union, making the union consume 8 bytes. The other members are 4 bytes and 1 byte, so the total is 13.
In any sensible implementation with the size assumptions I made above, this struct will be padded to either 2 byte or 4 byte boundaries, adding at least 1 or 3 additional bytes to the size.
Edit: In general, since the sizes of all of the member types are themselves implementation defined, and the padding is implementation defined, you need to refer to the documentation for your implementation and the platform to know for sure.
The implementation is allowed to insert padding after essentially any element of a struct. Sensible implementations use as little padding as required to comply with platform requirements (e.g. RISC processors often require that a value is aligned to the size of that value) or for performance.
If using a struct to map fields to the layout of values assumed by file format specification, a coprocessor in shared memory, a hardware device, or any similar case where the packing and layout actually matter, then you might want to be concerned that you are testing at either compile time or run time that your assumptions of the member layout are true. This can be done by verifying the size of the whole structure, as well as the offsets of its members.
See this question among others for a discussion of compile-time assertion tricks.
Unions are dangerous and risky to use without strict discipline. And the fact you put it in a struct is really dangerous because by default all struct members are public: that exposes the possibility of client code making changes to your union, without informing your program what type of data it stuffed in there. If you use a union you should put it in a class where at least you can hide it by making it private.
We had a dev years ago who drank the koolaid of unions, and put it in all his data structures. As a result, the features he wrote with it, are now one of the most despised parts of our entire application, since they are unmodifiable, unfixable and incomprehensible.
Also Unions throw away all the type safety that modern c/c++ compilers give you. Surely if you lie to the compiler it will get back at you someday. Well actually it will get back at your customer when your app crashes.