Size of a struct - c

As far as I know, the size of a structure depends on which compiler is used, and the compiler may add padding to satisfy alignment requirements.
On a 64-bit system, I tested two examples:
Example 1:
struct
{
uint8 a;
uint32 b;
uint8 c;
}ABC;
size of(uint8 a) == 1 bytes + 3 bytes padding
size of(uint32 b) == 4 bytes + 0 padding
size of(uint8 c) == 1 bytes + 3 padding
==> So, size of(ABC) = 12 bytes.
Example 2:
struct
{
uint8 a;
uint16 b;
uint8 c;
}ABC;
size of(uint8 a) == 1 bytes + 1 bytes padding
size of(uint16 b) == 2 bytes + 0 padding
size of(uint8 c) == 1 bytes + 3 padding
==> So, I assumed size of(ABC) = 8 bytes.
However, the compiler returns size of(ABC) = 6 bytes.
Why is size of(ABC) 6 bytes in Example 2 instead of the 8 bytes I expected?

The compiler lays out objects of a structure type in such a way that the data member with the strictest alignment requirement is appropriately aligned.
In this structure declaration
struct
{
uint8 a;
uint16 b;
uint8 c;
}ABC;
the data member with the strictest alignment is b. Its address must be a multiple of two bytes, so one byte of padding is inserted after a. One more byte of padding is added after c so that the size of the structure is a multiple of 2; that way, every element in an array of such structures stays aligned on a 2-byte boundary.

The compiler may add padding for alignment requirements. Note that this applies not only to padding between the fields of a struct, but also may apply to the end of the struct (so that arrays of the structure type will have each element properly aligned).
For example:
struct foo_t {
int x;
char c;
};
Even though the c field doesn't need padding, the struct will generally have a sizeof(struct foo_t) == 8 (on a 32-bit system - rather a system with a 32-bit int type) because there will need to be 3 bytes of padding after the c field.
Note that the padding might not be required by the system (like x86 or Cortex M3) but compilers might still add it for performance reasons.

size of(uint8 c) == 1 bytes + 3 padding
==> So, I assumed size of(ABC) = 8 bytes.
There is no reason to add three bytes of padding. Because the alignment requirement of uint8 (presumably really uint8_t or equivalent) is one byte and the alignment requirement of uint16 (presumably really uint16_t or equivalent) is two bytes, the alignment requirement of the full structure is the maximum of those, two bytes. Having used one byte for uint8 a, a byte for padding to make uint16 b aligned, two bytes for uint16 b, and a byte for uint8 c, the size of the structure up to that point is five bytes. Then only one more byte is needed to make it a multiple of its alignment requirement, so the total is six bytes.
The rules typically used to lay out a structure are:
1) Each member in the structure has some size s and some alignment requirement a.
2) The compiler starts with a size S set to zero and an alignment requirement A set to one (byte).
3) The compiler processes each member in the structure in order:
   a) Consider the member’s alignment requirement a. If S is not currently a multiple of a, then add just enough bytes to S so that it is a multiple of a. This determines where the member will go; it will go at offset S from the beginning of the structure (for the current value of S).
   b) Set A to the least common multiple[1] of A and a.
   c) Add s to S, to set aside space for the member.
4) When the above process is done for each member, consider the structure’s alignment requirement A. If S is not currently a multiple of A, then add just enough to S so that it is a multiple of A.
5) The size of the structure is the value of S when the above is done.
Additionally:
If any member is an array, its size is the number of elements multiplied by the size of each element, and its alignment requirement is the alignment requirement of an element.
If any member is a structure, its size and alignment requirement are calculated as above.
If any member is a union, its size is the size of its largest member plus just enough to make it a multiple of the least common multiple[1] of the alignments of all the members.
For elementary types (int, double, et cetera), the alignment requirements are implementation-defined and are usually largely determined by the hardware. On many processors, it is faster to load and store data when it has a certain alignment (usually when its address in memory is a multiple of its size). Beyond this, the rules above follow largely from logic; they put each member where it must be to satisfy alignment requirements without using more space than necessary.
Footnote
[1] I have worded this for a general case as using the least common multiple of alignment requirements. However, since alignment requirements are always powers of two, the least common multiple of any set of alignment requirements is the largest of them.

Related

When do structures not have padding? [duplicate]

sizeof(x) returns 2 for the structure below
struct s {
short c;
} x;
but for the structure
struct s {
short c;
char a;
} x;
sizeof(x) returns 4, Why?
The second one gets one padding byte (assuming short is 2 bytes long and char 1 byte long). Shouldn't the first structure have 2 padding bytes then (and thus be 4 bytes long)?
The predominant use of padding is to align structure members as required by the hardware (or other aspects of the C implementation). An algorithm for laying out data in a struct is in this answer.
To answer the question in your title, when do structures not have padding: A structure does not require padding for alignment if each member’s alignment requirement is a divisor of the total size of all preceding members and of the total size of all members. (A C implementation may still add padding for reasons other than alignment, but that is a bit unusual.)
For your examples, let’s suppose, in a C implementation, short is two bytes in size and requires two-byte alignment. By definition, char is one byte and requires one-byte alignment.
Then, in struct s {short c;}:
c is put at the beginning of the struct. There is never any padding at the beginning.
If we make an array of these struct, the next struct s will begin two bytes beyond the first, and its member c will still be at a multiple of two bytes, so it is aligned correctly.
Therefore, we do not need any padding to make this work.
In contrast, in struct s {short c; char a;}:
c is put at the beginning.
a is put two bytes after c. This is fine, since a only requires one-byte alignment.
If we do not add any padding, the size of the struct is three bytes. Then, if we make an array of these struct, the next struct s will begin three bytes from the start.
In that second struct s, the c member will be at an offset of three bytes. That violates the alignment requirement for short.
Therefore, to make this struct work, we must add one byte of padding. This makes the total size four bytes. Then, in an array of these struct, all the members will be at boundaries required by their alignment.
Even if you declare just a single object of a structure, as in struct s {short c; char a;} x;, a structure is always laid out so it can be used in an array.
The first structure has one member of size 2 (assuming short has size 2 on your system). It is as good as having an array of short directly.
The second structure is different: short variables are best accessed at even addresses. Without padding, we would have the following:
struct s arr[5]; // an array
void * a = arr; // needed to reference it
Then,
arr[0].c is at a.
arr[0].a is at a + 2 bytes.
arr[1].c is at a + 3 bytes (!).
arr[1].a is at a + 5 bytes (!).
Since it is preferable to have arr[1].c at an even address, the compiler adds padding. Then,
arr[1].c is at a + 4 bytes.
arr[1].a is at a + 6 bytes.

Size of the structure assuming that padding is enabled

Assume that I have a structure defined as shown below:
typedef struct
{
char a;
int b;
char c;
}abc_t;
Now, as per the rules of padding, a char variable can start at any address since it occupies only a single byte, an int variable should start at an address divisible by 4, and a short variable should start at an even address.
In that case, let us assume that the char variable starts at OFFSET 0.
struct
{
char a; // OFFSET 0+3 bytes padding
int b; // OFFSET 4
char c; // OFFSET 8+3 bytes padding
}abc_t;
Here the total size of structure would become 12.
But my doubt is this: if the first element of the structure, 'char a' here, starts at offset 1, then based on the rules of padding we would have only 2 bytes of padding after a, and hence the size of the structure would be 8 bytes.
struct
{
char a;//OFFSET 1+2 bytes
int b;//OFFSET 4
char c;//OFFSET 8
}abc_t;
Same would be the case of any structure variable which would start with short variables.
Can you please tell me if my understanding regarding this is correct or can we safely assume that first member of any structure would always start at an address which is divisible by 4?
Thanks a lot in advance.
There are a few issues here:
1) The size of int is not necessarily 4 bytes; it is sizeof(int), which is implementation-defined. It can be 4 bytes, 8 bytes, or even 2 bytes depending on the platform. See Is the size of C "int" 2 bytes or 4 bytes?.
2) Field alignment in a struct is up to the compiler. The order of fields is preserved, but the order in which you put the fields doesn't by itself guarantee tighter packing. See Why isn't sizeof for a struct equal to the sum of sizeof of each member?.
3) Grouping smaller fields together would USUALLY result in tighter packing.
4) A struct object begins at an address that satisfies the struct's own alignment requirement, which is at least as strict as that of any of its members. There is never any padding before the first field, so the first member is always at offset 0 and cannot start at "offset 1". The address is only guaranteed to be divisible by 4 if some member requires 4-byte alignment. See Why use address of first element of struct, rather than struct itself?.
Hope this helps.

Why should a struct's size reflect its alignment?

According to Wikipedia:
the last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member
In my understanding, it means that in the following:
struct A {
char *p; // 8 bytes
char c; // 1 byte
};
struct B {
struct A a; // 16 bytes
char d; // 1 byte
};
Struct A will have a size of 16 bytes, and struct B will have a size of 24 bytes.
The common explanation is that arrays of A should have their elements accessible at the address of the array plus the index times the size of A.
But I fail to see why that is the case. Why could we not say A has size 9 and B has size 10 (both with 8 bytes alignment), and use a special "array-storage" size when indexing into an array?
Of course, we'd still store those types in arrays in a way compatible with their alignment (using 16 bytes to store each B element). Then, we'd simply compute element addresses by taking into account their alignment, instead of considering their size alone (the compiler can do that statically).
For example, we could store 64 objects in a 1 KiB array of B's, instead of only 42.
In each translation unit of C, sizeof(T) is the same, regardless of the context of T. Your proposal would introduce at least two values for sizeof(T): one for arrays of T and a different one for individual objects of T. This basically introduces context-dependence into the sizeof operator. It is incompatible with how C handles pointers, arrays, and addresses of objects.
Consider the following:
#include <string.h> /* for memset */
void zero_A(struct A *a) { memset(a, 0, sizeof(*a)); }
/* ... */
struct A single;
struct A several[3];
struct B b;
b.d = 3;
zero_A(&b.a);
zero_A(&single);
zero_A(several+1);
Under your proposal, zero_A would have to know whether the pointer it was passed pointed to struct A in an array context (where sizeof(*a) == 16) or struct A outside of an array context (where sizeof(*a) == 9). Standard C doesn't support this. If the compiler guessed wrong, or the information was lost (e.g., in a round-trip through a volatile struct A *), then zero_A(&single) would invoke undefined behavior (by writing past the bounds of single), and zero_A(&b.a) would overwrite b.d and also invoke undefined behavior.
Tightly packing structs into an array is a relatively uncommon requirement, and adding context-dependence to sizeof would introduce a lot of complications to the language, its libraries, and ABIs. There are times you need to do this, and C gives you the tools you need: memcpy and unions.

Why padding are added, if char comes after int?

For example, there is a structure
struct A
{
char a;
int i;
};
In this case, we have a[1 byte] + padding[3 byte] + int[4 byte] = 8.
Now let's make a small change to the struct above:
struct A
{
int i;
char a;
};
In this case char comes after int, so there should be no need to add padding bytes, which would mean sizeof(A) = 5 bytes; but in this case I also get 8 bytes. Why?
OK, and what about this case:
struct s
{
int b;
double c;
char a;
};
According to the logic given below, the size should be: b[4 bytes] + padding[4 bytes] + c[8] + a[1] + padding[7 bytes to align with double] = 24,
but when I run it I get 16. How is this possible?
In this case char comes after int and no need to add padding bytes, it means sizeof(A) = 5 byte, but in this case I also get the 8 byte result. Why ?
First, you need to understand why padding is needed.
Wikipedia says:
Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system) or larger. Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding.
To make the size a multiple of 4 (the alignment of int), the second snippet is padded with 3 bytes. After compilation, it is laid out as if it had been declared as
struct A
{
int i;
char a;
char Padding[3]; // 3 bytes to make total size of the structure 8 bytes
};
EDIT: Always remember these two golden rules of structure padding:
Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure.
The last member is padded with the number of bytes required so that the total size of the structure should be a multiple of the largest alignment of any structure member.
In case of
struct s
{
int b;
double c;
char a;
};
alignment will take place as
struct s
{
int b; // 4 bytes. b is followed by a member with larger alignment.
char Padding1[4]; // 4 bytes of padding is needed
double c; // 8 bytes
char a; // 1 byte. Last member of struct.
char Padding2[7]; // 7 bytes to make total size of the structure 24 bytes
};
Also note that changing the order of members in a structure can change the amount of padding required to maintain alignment. Padding can be reduced by sorting members by descending alignment requirement.
struct s
{
double c; // 8 bytes
int b; // 4 bytes
char a; // 1 byte. Only last member will be padded to give structure of size 16
};
The reason the compiler has to add padding at the end of your struct is that the struct can be part of an array, and each element of an array must be properly aligned.
It seems your platform wants an int to be aligned to 4 bytes.
If you declare an array of your struct A:
struct A array[2];
Then the first int member of array[1] should also have an alignment of 4 bytes. So the compiler pads your struct A to be 8 bytes to accomplish that, whilst if it didn't add any padding and sizeof(struct A) were 5 bytes, array[1] would not be properly aligned.
(Keep in mind that a compiler can't insert padding between array elements; padding has to be part of the array elements themselves, since sizeof array must be the same as sizeof(struct A) * 2 in the above case.)
Not only must each member of the struct be properly aligned, but the struct itself must be aligned to the strictest alignment requirement among its members. So padding is added to struct A so that its size is a multiple of the larger of the alignment requirements of i and a.
Have a look at C FAQ here
If one is going to have an array of structures, all elements within the array must have the same size and alignment; this would imply that for things in an array the size must be a multiple of alignment. The only time it would be useful to have a structure whose size was not a multiple of alignment would be if it was not incorporated directly into another array, but was instead used as part of another structure. That kind of situation does occur sometimes, but not sufficiently often as to merit special attention in the language design.

size of C structure

#include <stdio.h>
struct st1{
int a:1; int b:3; int c:6; int d:3;
}s1;
struct st2{
char a:3;
}s2;
int main(void){
printf("%zu : %zu",sizeof(s1),sizeof(s2)); /* sizeof yields a size_t, so use %zu */
getchar();
return 0;
}
I am getting the output as 2 : 1
Could you please explain how this program works, and what the : (as in a:1) is used for here?
Thank you
The : defines a bit-field.
In your example, objects of type struct st1 use 13 bits in some arrangement chosen by the compiler.
The particular arrangement chosen when you compiled the code produced an object that occupies 2 bytes. The 13 bits are not necessarily the first (or last) in those bytes.
The other struct type (struct st2) occupies (3 bits out of) 1 byte.
The : used there is not an operator; it specifies the width in bits of each field. sizeof yields a size in whole bytes, so in your case the first struct, which needs 13 bits, occupies 2 bytes, and the second occupies 1 byte.
There are at least two things worth noting here:
Every object must be addressable, which means it will at least occupy the size of one char.
The implementation is free to add padding for alignment or other issues as it sees fit. In other words, a struct containing two ints is not guaranteed to be equal in size to sizeof(int)*2.

Resources