Size of C Structure containing char - c

I have defined 2 structures in C Language, both contains 2 ints and 1 char. When I print size of both of them, its giving results I can't justify. This is my code :
#include<stdio.h>
struct sample
{
int a:6;
int b:12;
char s;
} st;
struct name
{
int a:28;
int b:12;
char ch;
} st1;
int main()
{
int i=sizeof(st);
printf("st : %d\n\n",i);
i=sizeof(st1);
printf("st1 : %d\n\n",i);
}
output is :
st : 4
st1 : 8
How the size of st is 4 byte and st1 is 8 bytes?
I found this similar question but the values I am getting are still not justified. According to this question, my structure st should be taking 3 bytes and st1 should be taking 6 bytes.

A typical compiler will determine structure alignment requirements from the most strict alignment requirement among the struct members. Bit-fields are typically treated as ordinary fields in that process.
That means since your first structure contains a member of type int (which, apparently, has alignment requirement of 4), the entire structure will have the alignment requirement of 4 and its size will always be divisible by 4. 4 is the minimum size it can possibly have. Since you used only 18 bits for the bit-fields (3 bytes at least), the compiler managed to pack everything into 4 bytes. But it can't make it smaller than 4.
(Note, BTW, that just by counting the bits required gives us the total of 26. How on Earth you expected this to fit into 3 bytes, even ignoring any alignment considerations, is not clear to me.)
In your second structure you used 40 bits for the bit-fields (which involves 5 bytes already). Now the compiler is unable to pack everything into 4 bytes, so it uses 8 bytes. Again, the struct size is required to be divisible by 4, meaning that if 4 is not enough, then next smallest size is 8.
If you want to override that alignment behavior, you have to use your compiler-specific features to force the 1-byte alignment to all data types (#pragma pack etc.). That way you might be able to reduce the size of your second struct. But what you observe now is perfectly expected under the default alignment settings.

Related

Understanding memory alignment constraints and padding bytes in C

I have following code snippet.
#include<stdio.h>
int main(){
typedef struct{
int a;
int b;
int c;
char ch1;
int d;
} str;
printf("Size: %d \n",sizeof(str));
return 0;
}
Which is giving output as follows
Size: 20
I know that size of the structure is greater than the summation of the sizes of components of the structure because of padding added to satisfy memeory alignment constraints.
I want to know how it is decided that how many bytes of padding have to be added. On what does it depend ? Does it depends on CPU architecture ? And does it depends on compiler also ? I am using here 64bit CPU and gcc compiler. How will the output change if these parameters change.
I know there are similar questions on StackOverflow, but they do not explain this memory alignment constraints thoroughly.
It in general depends on the requirements of the architecture. There's loads over here, but it can be summarized as follows:
Storage for the basic C datatypes on an x86 or ARM processor doesn’t
normally start at arbitrary byte addresses in memory. Rather, each
type except char has an alignment requirement; chars can start on any
byte address, but 2-byte shorts must start on an even address, 4-byte
ints or floats must start on an address divisible by 4, and 8-byte
longs or doubles must start on an address divisible by 8. Signed or
unsigned makes no difference.
In your case the following is probably taking place: sizeof(str) = 4 (4 bytes for int) + 4 (4 bytes for int) + 1 ( 1 byte for char) + 7 (3 bytes padding + 4 bytes for int) = 20
The padding is there so that int is at an address that's a multiple of 4 bytes. This requirement comes from the fact that int is 4 bytes long (my assumption regarding the architecture you're using). But this will vary from one architecture to another.
On what does it depend ? Does it depends on CPU architecture ? And does it depends on compiler also ?
CPU, operating system, and compiler at least.
I know that it depends on the CPU architecture, I think you can find some interesting articles that talks about this on the net, wikipedia is not bad in my opinion.
For me I am using a 64 bit linux machine, and what I can say is that, every field is aligned so that it would be on a memory address divisible by its size (for basic types), for example :
int and float are aligned by 4 (must be in a memory adress divisible by 4)
char and bool by 1 ( which means no padding)
double and pointers are aligned by 8
Best way to avoid padding is to put your fields from largest size to smallest (when there is only basic fields)
When there is composed fields, it a little more difficult for me to explain here, but I think you can figure it out yourself in a paper

C - Type Name : Number?

I was wondering what the following code is doing exactly? I know it's something to do with memory alignment but when I ask for the sizeof(vehicle) it prints 20 but the struct's actual size is 22. I just need to understand how this works, thanks!
struct vehicle {
short wheels:8;
short fuelTank : 6;
short weight;
char license[16];
};
printf("\n%d", sizeof(struct vehicle));
20
Memory will be allocated as (assuming memory word size is of 8 bits)
struct vehicle {
short wheels:8; // 1 byte
short fuelTank : 6;
// padd 2 bits to make fuelTank of 1 byte.
short weight; // 2 bytes.
char license[16]; // 16 bytes.
};
1 + 1 + 2 + 16 = 20 bytes.
Consider a machine with a word size of 32bit. The two first fields fit in a whole 16bit word as they occupy 8 + 6 = 14 bits. The second field, while not a bitfield (doesn't have the :<number> thing to allocate space in bits) can fit another 16 bits word to complete a 32 bit word, so the three first fields can pack in a 32bit word (4 bytes) if the architecture allows to access the memory in 16 bit quantities. Finaly, if you add 16 characters to that, this gives the 20 bytes that sizeof operator sends to printf.
Why do you assume the sizeof (struct vehicle) is 22 bytes? You allowed the compiler to print it and it said it's 20. Compilers are free to pad (or not) the structures to achieve better performance. That's an architecture dependency, and as you have not said architecture and compiler used, it is not possible to go further.
For example, 32bit intel arch allows to pad words at even boundaries without performance penalties, so this is a good selection in order to save memory. On other architectures, perhaps it's not allowed to use 16bit integers and data must be padded to fit the third field (leading to 22 bytes for the whole structure)
The only warranty you have when sizing data is that the compiler must allocate enough space to fit everything in an efficient way, so the only thing you can assume from that declaration is that it will occupy at least the minimum space to represent one field of 8 bit, other of 6, a complete short (I'll assume a short is 16 bit) and 16 characters (assuming 8 bits per char) it ammounts to 8 + 6 + 16 + 16*8 = 158 bits minimum.
Suppose we are writing a compiler for D. Knuth MIX machine. As it's stated in his book Fundamental Algorithms, this machine has an unspecified byte size of 64..100 bytes, requiring five to construct one addressable word (plus a binary sign). If you had a byte size independent compiler (one that compiles for any MIX machine, without assumptions of byte size) you have to use no more than 64 possible values per byte, leading to 6 bit per byte. You then would assume the second field fills one complete byte (and the sign drawn from the word it belongs to) and the first field needs two complete bytes (using half of the values for negative values) The third field might be in the second word, filling three complete bytes (6*3 = 18) and the sign of that word. The next 16 chars can begin on the next word, summing up to five complete words, so the whole structure will have 1 + 1 + 4 = 6 words, or 30 bytes. But if you want to handle effectively three signed fields, you'll need three complete words for the three fields (as each has a sign field only) leading to 7 words or 35 bytes.
I have suggested this example because of the particular characteristics of this architecture, that makes one to think on not so uncommon architectures that some time ago where in common use (the first machines ever built where not binary based, like some of these MIX machines)
Note
You can try to print the actual offsets of the fields, to see where in the structure are located and see where the compiler is padding.
#define OFFSET(Typ, field) ((int)&((Typ *)0)->field)
(Note, edited)
This macro will tell you the offset as an int. Use it as OFFSET(struct vehicle, weight) or OFFSET(struct vehicle, license[3])
Note
I had to edit the last macro definition as it complains on some architectures as the conversion of pointer -> int is not always possible (on 64bit architectures, it looses some bits) so it's better to compute the difference of two pointers, which is a proper size_t value, than to convert it directly from pointer.
#define OFFSET(Typ, field) ((char *)&((Typ *)0)->field - (char *)0)

Size of struct containing double field

Firstly, I understand byte padding in structs. But I have still a small test contain a double field in struct and I don't know how to explain this :
typedef struct {
char a;
double b;
}data;
typedef struct{
char a;
int b;
}single;
int main(){
printf("%d\n",sizeof(double));
printf("%d\n",sizeof(single));
printf("%d\n",sizeof(data));
}
Through out this test, the answer is : 8 8 and 16.
Why this result make me thinking ?
By second test, we can see size of word on my machine is 4 bytes.
By first test, we can see size of double is 8 bytes.
So, at the struct data : the result should be 12 bytes : 4 bytes for char and 8 bytes for double.
But, I don't know why the result is 16 bytes. (So strange with me)
Please explain it for me, thanks :)
It's sixteen bytes so that if you have an array of datas, the double values can all be aligned on 8-byte boundaries. Aligning data properly in memory can make a big difference in performance. Misaligned data can be slower to operate on, and slower to fetch and store.
The procedure typically used for laying out data in a struct is essentially this:
Set Offset = 0.
For each member in the struct: Let A be its alignment requirement (e.g., 1, 2, 4, or 8 bytes, possibly more). Add to Offset the number of bytes needed to make it a multiple of A. (Given that A is a power of two, this can be done with Offset += -Offset & A-1, assuming two’s complement for the negation.) Assign the current value of Offset to be the offset of this member. Add the size of the member to Offset.
After processing all members: Let A be the greatest alignment requirement of any member. Add to Offset the number of bytes needed to make it a multiple of A. This final value of Offset is the size of the struct.
As Earnest Friedman-Hill stated, the last step adds padding to the end of the struct so that, in an array of them, each struct begins at the required alignment.
So, for a struct such as struct { char c; double d; int32_t i; }, on a typical implementation, you have:
Set Offset to 0.
char requires alignment of 1, so Offset is already a multiple of 1 (0•1 is 0). Put c at this offset, 0. Add the size of c, 1, to Offset, making it 1.
double requires alignment of 8, so add 7 to Offset, making it 8. Put d at this offset, 8. Add the size of d, 8, to Offset, making it 16.
int requires alignment of 4, so Offset is already a multiple of 4 (4•4 is 16). Put i at this offset, 16. Add the size of i, 4, to Offset, making it 20.
At the end, the largest alignment required was 8, so add 4 to Offset, making it 24. The size of this struct is 24 bytes.
Observe that the above has nothing to do with any word size of the machine. It only uses the alignment requirement of each member. The alignment requirement can be different for each type, and it can be different from the size of the type, and the above will still work.
(The algorithm breaks if alignment requirements are not powers of two. That could be fixed by making the last step increase the offset to be a multiple of the least common multiple of all the alignments.)
What is strange to you (or I'm missing something)?
The logic is the same (the padding is according to the "biggest" primitive field in the struct (I mean - int, double, char, etc.))
As in single, you have
1 (sizeof(char)) + 3 (padding) + 4 (sizeof(int))
it's the same in data:
1 (sizeof(char)) +
7 (padding, it's sizeof(double) - sizeof(char)) +
8 (sizeof(double))
which is 16.
The compiler probably aligns all structure sizes to be a multiple of 8
Alignment is up to the compiler unless you explicitly specify it using compiler specific directives.
Variables aren't necessarily word aligned. Sometimes they're double word aligned for efficieny. In the particular case of floating points, they can be aligned to even higher values so that SSE will work.

sizeof operator returning strange result

language is C with gcc compiler
if i make a struct like so
struct test {
char item_1[2];
char item_3[4];
char item_4[2];
char item_5[2];
char item_6[4];
};
and i do a sizeof(struct test) it returns 14(bytes). which is the expected result.
but if i do
struct test {
int16_t item_1;
int32_t item_3;
int16_t item_4;
int16_t item_5;
int32_t item_6;
};
sizeof(struct test) will return something strange like 44. when i debug using gnu ddd i can see inside the structure and see that it all looks normal and all items have the expected number of bytes.
so why is the sizeof operator returning a unexpected value ?
You have mixed 32/16 bit integers.
int16_t item_1[2]; // 2 * 2 = 4
int32_t item_3[4]; // 4 * 4 = 16
int16_t item_4[2]; // 2 * 2 = 4
int16_t item_5[2]; // 2 * 2 = 4
int32_t item_6[4]; // 4 * 4 = 16
// sum = 44
Compilers may insert padding between struct members, or after the last member. This is normally done to satisfy alignment requirements. For example, an int32_t object might require 4-byte alignment, so the compiler inserts 2 bytes of padding between the first and second members. The details will vary depending on the platform.
compiler is required to insert padding between structure members to align each member on its type's boundary and to order the structure's members as written. If you reorder them such that the largest members are at the beginning of the structure definition, they will be the same size. But since you're trying it with arrays of char, I'm guessing you don't have the structure's original definition, and you're trying to access some externally defined and created object's fields. In that case, I suggest you either obtain the proper headers or use the char[] version and cast char* to whatever type it really is.
The alignment issue is because modern 32-bit CPU's work in 32-bit word sizes on 32-bit address boundaries. If a 32-bit word on a 32-bit boundary contains 2 16-bit values, this results in extra instructions being required to obtain the correct 16-bits (i.e. masking and shifting so that the correct 16-bits are all that remains). If a 32-bit word is split across a 32-bit boundary then there is even more work to do to retrieve it.
By default the trade-off is to have faster programs and not use the memory as efficiently as possible. If an extra couple of instructions are needed everywhere where a structure member is used then this will probably use more memory than tighter packing the structure saves so whilst not obvious at first it's the correct choice.
If you have a requirement to store structures efficiently (at a cost to performance) then "#pragma pack" allows you to tighter pack members, but it results in bigger slower programs.
http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html
http://www.cplusplus.com/forum/general/14659/
struct test {
int16_t item_1[2]; // 2 * 16 +
int32_t item_3[4]; // 4 * 32 +
int16_t item_4[2]; // 2 * 16 +
int16_t item_5[2]; // 2 * 16 +
int32_t item_6[4]; // 4 * 32
// total = 352
};
352 bits divided by 8 (8 bits go into one byte) is 44 bytes.

How the size of this structure comes out to be 4 byte

I do have a structure having bit-fields in it.Its comes out to be 2 bytes according to me but its coming out to be 4 .I have read some question related to this here on stackoverflow but not able to relate to my problem.This is structure i do have
struct s{
char b;
int c:8;
};
int main()
{
printf("sizeof struct s = %d bytes \n",sizeof(struct s));
return 0;
}
if int type has to be on its memory boundary,then output should be 8 bytes but its showing 4 bytes??
Source: http://geeksforgeeks.org/?p=9705
In sum: it is optimizing the packing of bits (that's what bit-fields are meant for) as maximum as possible without compromising on alignment.
A variable’s data alignment deals with the way the data stored in these banks. For example, the natural alignment of int on 32-bit machine is 4 bytes. When a data type is naturally aligned, the CPU fetches it in minimum read cycles.
Similarly, the natural alignment of short int is 2 bytes. It means, a short int can be stored in bank 0 – bank 1 pair or bank 2 – bank 3 pair. A double requires 8 bytes, and occupies two rows in the memory banks. Any misalignment of double will force more than two read cycles to fetch double data.
Note that a double variable will be allocated on 8 byte boundary on 32 bit machine and requires two memory read cycles. On a 64 bit machine, based on number of banks, double variable will be allocated on 8 byte boundary and requires only one memory read cycle.
So the compiler will introduce alignment requirement to every structure. It will be as that of the largest member of the structure. If you remove char from your struct, you will still get 4 bytes.
In your struct, char is 1 byte aligned. It is followed by an int bit-field, which is 4 byte aligned for integers, but you defined a bit-field.
8 bits = 1 byte. Char can be any byte boundary. So Char + Int:8 = 2 bytes. Well, that's an odd byte boundary so the compiler adds an additional 2 bytes to maintain the 4-byte boundary.
For it to be 8 bytes, you would have to declare an actual int (4 bytes) and a char (1 byte). That's 5 bytes. Well that's another odd byte boundary, so the struct is padded to 8 bytes.
What I have commonly done in the past to control the padding is to place fillers in between my struct to always maintain the 4 byte boundary. So if I have a struct like this:
struct s {
int id;
char b;
};
I am going to insert allocation as follows:
struct d {
int id;
char b;
char temp[3];
}
That would give me a struct with a size of 4 bytes + 1 byte + 3 bytes = 8 bytes! This way I can ensure that my struct is padded the way I want it, especially if I transmit it somewhere over the network. Also, if I ever change my implementation (such as if I were to maybe save this struct into a binary file, the fillers were there from the beginning and so as long as I maintain my initial structure, all is well!)
Finally, you can read this post on C Structure size with bit-fields for more explanation.
int c:8; means that you are declaring a bit-field with the size of 8 bits. Since the alignemt on 32 bit systems is normally 4 bytes (=32 bits) your object will appear to have 4 bytes instead of 2 bytes (char + 8 bit).
But if you specify that c should occupy 8 bits, it's not really an int, is it? The size of c + b is 2 bytes, but your compiler pads the struct to 4 bytes.
They alignment of fields in a struct is compiler/platform dependent.
Maybe your compiler uses 16-bit integers for bitfields less than or equal to 16 bits in length, maybe it never aligns structs on anything smaller than a 4-byte boundary.
Bottom line: If you want to know how struct fields are aligned you should read the documentation for the compiler and platform you are using.
In generic, platform-independent C, you can never know the size of a struct/union nor the size of a bit-field. The compiler is free to add as many padding bytes as it likes anywhere inside the struct/union/bit-field, except at the very first memory location.
In addition, the compiler is also free to add any number of padding bits to a bit-field, and may put them anywhere it likes, because which bit is msb and lsb is not defined by C.
When it comes to bit-fields, you are left out in the cold by the C language, there is no standard for them. You must read compiler documentation in detail to know how they will behave on your specific platform, and they are completely non-portable.
The sensible solution is to never ever use bit fields, they are a reduntant feature of C. Instead, use bit-wise operators. Bit-wise operators and in-depth documented bit-fields will generate the same machine code (non-documented bit-fields are free to result in quite arbitrary code). But bit-wise operators are guaranteed to work the same on any compiler/system in the world.

Resources