Structure Size Optimization

Structure Size Optimization - c

I have a system that can run either 32 or 64 bit. If I define a structure with 7 longs and 1 char, then I understand that if the structure runs on 32 bits, a long will be assigned 32 bits, a char will be assigned 8 bits and the structure will require at least 232 bits. But if the structure runs on 64 bits, then a long will be assigned 64 bits, a char will be assigned 8 bits and the structure will require at least 456 bits. I also understand that the memory will be optimized for arrays of a structure if the structure requires a power of 2 bits. This structure, then, will have to fill 256 bits on a 32 bit system or 512 bits on a 64 bit system. Will that padding be automatically added to the structure to optimize the memory, or should I add something to the structure to bring it closer to a power of 2 bits in order to optimize the processing of an array of those structures?

Edit: Just saw the article about array indexing using shifts vs multiplication. My advice would be to size the structs appropriately for your data, taking care not to waste space due to slop if you can help it. If a profiler determines that indexing elements is the major performance hit for you, you can try adding slop to specifically reach a specific byte size. However, my intuition tells me (on a modern system with caches), you'll suffer a bigger performance hit by needlessly increasing the size of your structures and pushing useful memory out of cache! :D
(Original response follows)
I don't think you'll see a performance penalty from not having your structure sized to a power of two. The performance issue with arrays of structs is typically due to alignment.
Alignment and Performance
In order to minimize the number of instructions required to access a scalar variable, the variable has to exist at a location in memory that is a multiple of its size in bytes. The implication for structures is as follows:
Practically speaking, since the address of a struct is equal to the address of its first member, the beginning address of a given struct has to be a multiple of the size of its first member, in bytes
For compiler writers, the pragmatic approach is to align structures to a multiple of the widest scalar value they contain
Arrays of structs are allocated with padding between them to guarantee that these alignment guarantees hold over the entire array - most compilers will do this for you! :)
Padding is also added between variables in the struct, if necessary to ensure that all structure members are properly aligned
Dealing with Structure Alignment
Most compilers on modern systems will automatically add padding after the structure in order to satisfy alignment requirements for self-aligned types.
So, in general, your structure will be aligned to a multiple of the largest element in the structure. As a result of the longs in your struct, each struct in an array will be spaced such that the beginning address of each is a multiple of sizeof(long). This is accomplished by transparently adding "slop" to the end of your struct. Try this and see what you get:
#include <stdio.h>
struct my_struct
{
long l1;
long l2;
long l3;
long l4;
long l5;
long l6;
long l7;
char c;
};
int main( int argc, char** argv )
{
printf("sizeof(my_struct) == %lu\n", sizeof(struct my_struct));
return 0;
};
/* EOF */
A Note on Packing:
In general, for self-aligned types on systems which support them you can usually use __attribute__((packed)) but this will likely result in a performance penalty as the number of machine instructions required to access a given member will be increased.
If you really care about not wasting space due to alignment slop and you don't need the full range of values in one of those longs, see if you can move that char into one of the longs with a mask or try using bitfields.
One of my personal favorite resources on structure packing and alignment: The Lost Art of C Structure Packing

Will that padding be automatically added to the structure to optimize
the memory, or should I add something to the structure to bring it
closer to a power of 2 bits in order to optimize the processing of an
array of those structures?
No, you don't need to add anything to your struct declaration. Generally speaking alignment and padding is taken care of by the compiler.
You can test that yourself by printing the output of sizeof(your_struct);.
It is however possible to do the opposite and optimize for size instead of speed. This can be useful is memory is scare or if you send your raw struct over the network. GCC has __attribute__((packed)) to do this.

This is a bit late, but I just want to share this article about structure packing with details on how to optimize the size of the struct variables with rearrangement of the order of declaration of the struct members:
http://www.catb.org/esr/structure-packing/

Related

Best practice for struct layout in C/C++

I have read articles and SO answers about how we can cutoff compiler added padding if we layout the structs in the decreasing order of size of independent fields.
eg: Instead of laying out (L1) the struct like this
typedef struct dummy {
char c1;
short s1;
int i1;
unsigned int ui1;
double d1;
} dummy_t;
create the layout (L2) like this
typedef struct dummy {
double d1;
unsigned int ui1;
int i1;
short s1;
char c1;
} dummy_t;
My question here is, are there scenarios where layout L2 has downsides when compared to L1 or any other layout ? Or can I take it into my design rules to always use L2 ? Also is the natural alignment rule
natural alignment = sizeof(T)
always true for primitive types ?

You should wonder about alignement/structure size only if you are running on a memory constrained system (you want to limit the size of each element). This makes sense only for systems/designs where you know you will store a lot of such structure and used storage space is an issue. On the other hand grouping the content of a structure logically can bring clarity to your code. I would always favor clarity against performance, especially at early design phases. Even more if you do not have performance related constraints.
About performance it might happen that accessing "random" content from L2 is worse than from L1 because memory cache alignement constraints can also come into play. (eg accessing in a row 2 elements of struct L1 can be slower or quicker than 2 elements of L2). For such optimization, profiling is the only way to know which layout is the best.

My question here is, are there scenarios where layout L2 has downsides when compared to L1 or any other layout ?
Sometimes you need to have members in a different order. Reasons for this may include:
The structure is part of a communications protocol and has to be sent byte-by-byte across a network or other communication device. In this case, the layout of the structure may be dictated by the protocol, and you will have to match it exactly (including padding or lack thereof).
The structure is part of a family of structures that share initial sequences of members, so those members must be placed first. This is sometimes done to allow structures to be handled in a “generic” way by operating on those initial members only.
The structure includes a buffer whose length will vary (by being allocated dynamically according to needs that arise during program execution). Such a buffer is implemented with a flexible array member, which must be the last member in the structure.
Also, there may be incidental effects of how members are ordered. For example, if member x happens to be used much more frequently than other members of the structure, putting it at the front might allow the compiler to access it with simpler address arithmetic, since its offset from the beginning of the structure will be zero. This is rarely a consideration in programming, but I mention it for completeness.
Aside such considerations, you are generally free to order members as you desire.
Also is the natural alignment rule natural alignment = sizeof(T) always true for primitive types ?
No. As an example, an eight-byte long might have an alignment require of one, two, four, or eight bytes.
… we can cutoff compiler added padding if we layout the structs in the decreasing order of size of independent fields.
This is not true for members that are aggregates (arrays, structures, and unions). Consider that a member char x[13]; is 13 bytes in size but only requires alignment of one byte. To minimize padding, order members in order of decreasing alignment requirement, not decreasing size.

IMO abstracting from the implementation details like caching (which is too broad to be discussed in the post answer) there is no difference.
The difference is if you place the variable sized (or zero sized) object at the end of the struct (example):
typedef struct
{
size_t size;
char data[];
}string;
string *create(size_t size)
{
string *ptr = malloc(sizeof(*ptr) + size);
if(ptr)
{
ptr -> size = size;
}
return ptr;
}
If course the logical member order (and potential packing) is required if struct will store some received binary data (for example communication packet header)

How can size of a structure be a non-multiple of 4?

I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes in to play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%lu",sizeof(book));
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!

Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So, the char elements which are 3 bytes in length, need no alignment because they are byte elements.
It is preferred, though not required, that the short element needs to be aligned on a short boundary -- which means an even address. By aligning it on a short boundary, the compiler can issue a single load short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
EDIT: . As #Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, the compiler could modify that in a different version of the compiler.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the strucuture. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.

#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with char and short the alignment is 2, because the alignment of short is two, if you have a struct, only out of chars it has an alignment of 1.
There are multiple ways to manipulate the alignment:
alignas(4) char[4]; // this can hold 32-bit ints
This is nonstandart, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
static_assert(sizeof(struct A) == 4);
static_assert(alignof(struct A) == 2);
static_assert(sizeof(struct B) == 3);
static_assert(alignof(struct B) == 1);

Usually compilers follow ABI of the target architecture.
It defines alignments of structures and primitive datatypes. And that affects to needed padding and sizes of structures. Because alignment is multiple of 4 in many architectures, size of structures are too.
Compilers may offer some attributes/options for changing alignments more or less directly.
For example gcc and clang offers: __attribute__ ((packed))

What does it mean for an SSE vector to be "16 byte alligned" and how can I ensure that it is?

I'm working with vectors and matrices right now and it was suggested to me that I should use SSE instead of using float arrays. However while reading the definition for the C intrinsics and the Assembly instructions it looks like there is a different version of some of the function where the vector has to be "16 byte aligned" and a slower version where the vector isn't aligned. What does having the vector be 16 byte aligned mean? How can I ensure that my vectors are 16 byte aligned?

Alignment ensures that objects are aligned on an address that is a multiple of some power of two. 16-byte-aligned means that the numeric value of the address is a multiple of 16. Alignment is important because CPUs are often less efficient or downright incapable of loading memory that doesn't have the required alignment.
Your ABI determines the natural alignment of types. In general, integer types and floating-point types are aligned to either their own size, or the size of the largest object of that kind that your CPU can treat at once, whichever is smaller. For instance, on 64-bit Intel machines, 32-bit integers are aligned on 4 bytes, 64-bit integers are aligned on 8 bytes, and 128-bit integers are also aligned on 8 bytes.
The alignment of structures and unions is the same as their most aligned field. This means that if your struct contains a field that has a 2-byte alignment and another field that has an 8-byte alignment, the structure will be aligned to 8 bytes.
In C++, you can use the alignof operator, just like the sizeof operator, to get the alignment of a type. In C, the same construct becomes available when you include <stdalign.h>; alternatively, you can use _Alignof without including anything.
AFAIK, there is no standard way to force alignment to be specific value in C or C++, but there are compiler-specific extensions to do it. On Clang and GCC, you can use the __attribute__((aligned(N))) attribute:
struct s_Stuff {
int var1;
short var2;
char padding[10];
} __attribute__((aligned(16)));
(Example.)
(This attribute is not to be confused with __attribute__((align(N))), which sets the alignment of a variable.)
Off the top of my head, I'm not sure for Visual Studio, but according to SoronelHaetir, that would be __declspec(align(N)). Not sure where it goes on the struct declaration.
In the context of vector instructions, alignment is important because people tend to create arrays of floating-point values and operate on them, instead of using types that are known to be aligned. However, __m128, __m256 and __m512 (and all of their variants, like _m128i and such) from <emmintrin.h>, if your compiler environment has it, are guaranteed to be aligned on the proper boundaries for use with aligned intrinsics.
Depending on your platform, malloc may or may not return memory that is aligned on the correct boundary for vector objects. aligned_alloc was introduced in C11 to address these issues, but not all platforms support it.
Apple: does not support aligned_alloc; malloc returns objects on the most exigent alignment that the platform supports;
Windows: does not support aligned_alloc; malloc returns objects aligned on the largest alignment that VC++ will naturally put an object on without an alignment specification; use _aligned_malloc for vector types
Linux: malloc returns objects aligned on an 8- or 16-byte boundary; use aligned_alloc.
In general, it's possible to request slightly more memory and perform alignment yourself with minimal penalties (aside that you're on your own to write a free-like function that will accept a pointer returned by this function):
void* aligned_malloc(size_t size, size_t alignment) {
intptr_t alignment_mask = alignment - 1;
void* memory = malloc(size + alignment_mask);
intptr_t unaligned_ptr = (intptr_t)memory;
intptr_t aligned_ptr = (unaligned_ptr + alignment_mask) & ~alignment_mask;
return (void*)aligned_ptr;
}
Purists might argue that treating pointers as integers is evil, but at the time of writing, they probably won't have a practical cross-platform solution to offer in exchange.

xx-byte alignment means that a the variable's memory address modulo xx is 0.
Ensuring that is a compiler-specific operation, visual c++ for example has __declspec(align(...)), which will work for variables that the compiler allocates (at file or function scope for example), alignment is somewhat harder for dynamic memory, you can use aligned_malloc for that, although your library may already guarantee 16-byte alignment for malloc, it's generally larger alignments that require such a call.

New Edit to improve and focus my answer to the specific query
To ensure data alignment in memory, there are specific functions in C to force this (assuming your data is compatible - where your data matches or discretely fits into your required alignment)
The function to use is [_aligned_malloc][1] instead of vanilla malloc.
// Using _aligned_malloc
// Note alignment should be 2^N where N is any positive int.
int alignment = 16;
ptr = _aligned_malloc('required_size', alignment);
if (ptr == NULL)
{
printf_s( "Error allocation aligned memory.");
return -1;
}
This will (if it succeeds) force your data to align on the 16 byte boundary and should satisfy the requirements for SSE.
Older answer where I waffle on about struct member alignment, which matters - but is not directly answering the query
To ensure struct member byte alignment, you can be careful how you arrange members in your structs (largest first), or you can set this (to some degree) in your compiler settings, member attributes or struct attributes.
Assuming 32 bit machine, 4 byte ints: This is still 4 byte aligned in memory (first largest member is 4 bytes), but padded to be 16 bytes in size.
struct s_Stuff {
int var1; /* 4 bytes */
short var2; /* 2 bytes */
char padding[10]; /* ensure totals struct size is 16 */
}
The compiler usually pads each member to assist with natural alignment, but the padding may be at the end of the struct too. This is struct member data alignment.
Older compiler struct member alignment settings could look similar to these 2 images below...But this is different to data alignment which relates to memory allocation and storage of the data.
It confuses me when Borland uses the phrase (from the images) Data Alignment, and MS uses Struct member alignment. (Although they both refer to specifically struct member alignment)
To maximise efficiency, you need to code for your hardware (or vector processing in this case), so lets assume 32 bit, 4 byte ints, etc. Then you want to use tight structs to save space, but padded structs may improve speed.
struct s_Stuff {
float f1; /* 4 bytes */
float f2; /* 4 bytes */
float f3; /* 4 bytes */
short var2; /* 2 bytes */
}
This struct may be padded to also align the struct members to 4 byte multiples....The compiler will do this unless you specify that it keeps single byte struct member alignment - so the size ON FILE could be 14 bytes, but still in MEMORY an array of this struct would be 16 bytes in size (with 2 bytes wasted), with an unknown data alignment (possibly 8 bytes as default by malloc but not guaranteed. As mentioned above you can force the data alignment in memory with _aligned_malloc on some platforms)
Also regarding member alignment in a struct, the compiler will use multiples of the largest member to set the alignment. Or more specifically:
A struct is always aligned to the largest type’s alignment
requirements
...from here
If you are using a UNION, you are correct that it is forced to the largest possible struct see here
Check that your compiler settings do not contradict your desired struct member alignment / padding too, or else your structs may differ in size to what you expect.
Now, why is it faster? See here which explains how alignment allows the hardware to transmit discrete chunks of data and maximises the use of the hardware that passes around data. That is, the data does not need to be split up or re-arranged at every stage - through the hardware processing
As a rule, its best to set your compiler to resonate with your hardware (and platform OS) so that your alignment (and padding) works best with your hardware processing ability. 32 bit machines usually work best with 4 byte (32 bit) member alignment, but then data written to file with 4 byte member alignment can consume more space than wanted.
Specifically regarding SSE vectors, as this link states, 4 * 4 bytes is they best way to ensure 16 byte alignment, perhaps like this. (And they refer to data alignment here)
struct s_data {
float array[4];
}
or simply an array of floats, or doubles.

What is the importance of trailing padding bytes in that case?

I have a structure of the following type:
typedef struct
{
unsigned char A;
unsigned long int B;
unsigned short int C;
}
According to the alignment requirements of each basic data type and the alignment requirement of the whole structure, the allocation in memory will be like that:
My question is what is the importance of those trailing padding bytes as long as each structure member is naturally aligned to its size and could be accessed by our processor in one cycle (assuming that the bus size of processor is 32-bit) without alignment faults.
Also, if we declared an array of "2" of this structure, without taking into consideration the trailing bytes, the allocation in memory will be as following:
Each member in the two structures is naturally aligned to its size and could be accessed in one cycle without alignment faults.
So, what is the importance of trailing bytes in this case ?!

The comments from Bryan Olivier and Hans Passant are both right.
Essentially, you have answered your own question: In the 2nd drawing, the alignment of the members of both the first and second array item are correct. If the compiler could layout structures like this there would be no importance to the trailing pad bytes. But it can't.
In C, a structure's layout and size must be the same for every instance of a structure. In your second example, the sizeof(array[0]) is 10 and sizeof(array[1]) is 8. The address of B2 is only two greater than A2, but &B1 is four greater than &A1.
It's more than just alignment - it's ensuring a constant layout and size while still ensuring alignment. Even then it would not require trailing pad bytes if the first byte was aligned, but as you have noticed if you add arrays, then you need them.
Alignment+Layout/Size+Arrays => trailing pad is required.

As Andres said, the compiler can't generate special layout to every member in the array of structures.
For example, assume that the programmer has defined a structure with the following type
typedef struct
{
unsigned char A;
unsigned long int B;
unsigned short int C;
} myStructureType;
And then the programmer has created an instance of this type:
myStructureType myStructure;
The compiler will allocate memory to myStructure as following:
If the programmer has decided to create an array of myStructureType, the compiler will repeat the previous pattern in memory by the number of array's elements as following:
If the compiler neglects the trailing padding bytes, the memory will become misaligned as following
32-bit modern processor will need two memory cycles and some masking operations to fetch B1 (the element B in the index "1" of the array of structures). However, an old processor will fire an alignment faults.
That's why the trailing padding was important in this case

Why aren't bitfields allowed with normal variables?

I wonder why bitfields work with unions/structs but not with a normal variable like int or short.
This works:
struct foo {
int bar : 10;
};
But this fails:
int bar : 10; // "Expected ';' at end of declaration"
Why is this feature only available in unions/structs and not with variables? Isn't it technical the same?
Edit:
If it would be allowed you could make a variable with 3 bytes for instance without using the struct/union member each time. This is how I would to it with a struct:
struct int24_t {
int x : 24 __attribute__((packed));
};
struct int24_t var; // sizeof(var) is now 3
// access the value would be easier:
var.x = 123;

This is a subjective question, "Why does the spec say this?" But I'll give it my shot.
Variables in a function normally have "automatic" storage, as opposed to one of the other durations (static duration, thread duration, and allocated duration).
In a struct, you are explicitly defining the memory layout of some object. But in a function, the compiler automatically allocates storage in some unspecified manner to your variables. Here's a question: how many bytes does x take up on the stack?
// sizeof(unsigned) == 4
unsigned x;
It could take up 4 bytes, or it could take up 8, or 12, or 0, or it could get placed in three different registers at the same time, or the stack and a register, or it could get four places on the stack.
The point is that the compiler is doing the allocation for you. Since you are not doing the layout of the stack, you should not specify the bit widths.
Extended discussion: Bitfields are actually a bit special. The spec states that adjacent bitfields get packed into the same storage unit. Bitfields are not actually objects.
You cannot sizeof() a bit field.
You cannot malloc() a bit field.
You cannot &addressof a bit field.
All of these things you can do with objects in C, but not with bitfields. Bitfields are a special thing made just for structures and nowhere else.
About int24_t (updated): It works on some architectures, but not others. It is not even slightly portable.
typedef struct {
int x : 24 __attribute__((packed));
} int24_t;
On Linux ELF/x64, OS X/x86, OS X/x64, sizeof(int24_t) == 3. But on OS X/PowerPC, sizeof(int24_t) == 4.
Note the code GCC generates for loading int24_t is basically equivalent to this:
int result = (((char *) ptr)[0] << 16) |
(((unsigned char *) ptr)[1] << 8) |
((unsigned char *)ptr)[2];
It's 9 instructions on x64, just to load a single value.

Members of a structure or union have relationships between their storage location. A compiler cannot reorder or pack them in clever ways to save space due to strict constraints on the layout; basically the only freedom a compiler has in laying out structures is the freedom to add extra padding beyond the amount that's needed for alignment. Bitfields allow you to manually give the compiler more freedom to pack information tightly by promising that (1) you don't need the address of these members, and (2) you don't need to store values outside a certain limited range.
If you're talking about individual variables rather than structure members, in the abstract machine they have no relationship between their storage locations. If they're local automatic variables in a function and their addresses are never taken, the compiler is free to keep them in registers or pack them in memory however it likes. There would be little or no benefit to providing such hints to the compiler manually.

Because it's not meaningful. Bitfield declarations are used to share and reorganize bits between fields of a struct. If you have no members, just a single variable, that is of constant size (which is implementation-defined), For example, it's a contradiction to declare a char, which is almost certainly 8 bits wide, as a one or twelwe bit variable.

If one has a struct QBLOB which contains combines four 2-bit bitfields into a single byte, every time that struct is used will represent a savings of three bytes as compared with a struct that simply contained four fields of type unsigned char. If one declares an array QBLOB myArray[1000000], such an array will take only 1,000,000 bytes; if QBLOB had been a struct with four unsigned char fields, it would have needed 3,000,000 bytes more. Thus, the ability to use bitfields may represent a big memory savings.
By contrast, on most architectures, declaring a simple variable to be of an optimally-sized bitfield type could save at most 15 bits as compared with declaring it to be the smallest suitable standard integral type. Since accessing bitfields generally requires more code than accessing variables of standard integral types, there are few cases where declaring individual variables as bit fields would offer any advantage.
There is one notable exception to this principle, though: some architectures include features which can set, clear, and test individual bits even more efficiently than they can read and write bytes. Compilers for some such architectures include a bit type, and will pack eight variables of that type into each byte of of storage. Such variables are often restricted to static or global scope, since the specialized instructions that handle them may be restricted to using certain areas of memory (the linker can ensure any such variables get placed where they have to go).

All objects must occupy one or more contiguous bytes or words, but a bitfield is not an object; it's simply a user-friendly way of masking out bits in a word. The struct containing the bitfield must occupy a whole number of bytes or words; the compiler just adds the necessary padding in case the bitfield sizes don't add up to a full word.
There's no technical reason why you couldn't extend C syntax to define bitfields outside of a struct (AFAIK), but they'd be of questionable utility for the amount of work involved.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight