How to calculate size of structure with bit field? - c

#include <stdio.h>
struct test {
unsigned int x;
long int y : 33;
unsigned int z;
};
int main()
{
struct test t;
printf("%d", sizeof(t));
return 0;
}
I am getting the output as 24. How does it equate to that?

As your implementation accepts long int y : 33; a long int hase more than 32 bits on your system, so I shall assume 64.
If plain int are also 64 bits, the result of 24 is normal.
If they are only 32 bits, you have encountered padding and alignment. For performance reasons, 64 bits types on 64 bits systems are aligned on a 64 bits boundary. So you have:
4 bytes for the first int
4 padding bytes to have a 8 bytes boundary
8 bytes for the container of the bit field
4 bytes for the second int
4 padding bytes to allow proper alignment of arrays
Total: 24 bytes

Related

Bitfields and alignment

Trying to pack data into a packet. This packet should be 64 bits. I have this:
typedef union {
uint64_t raw;
struct {
unsigned int magic : 8;
unsigned int parity : 1;
unsigned int stype : 8;
unsigned int sid : 8;
unsigned int mlength : 31;
unsigned int message : 8;
} spacket;
} packet_t;
But it seems that alignment is not guaranteed. Because when I run this:
#include <strings.h>
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
const char *number_to_binary(uint64_t x)
{
static char b[65];
b[64] = '\0';
uint64_t z;
int w = 0;
for (z = 1; w < 64; z <<= 1, ++w)
{
b[w] = ((x & z) == z) ? '1' : '0';
}
return b;
}
int main(void)
{
packet_t ipacket;
bzero(&ipacket, sizeof(packet_t));
ipacket.spacket.magic = 255;
printf("%s\n", number_to_binary(ipacket.raw));
ipacket.spacket.parity = 1;
printf("%s\n", number_to_binary(ipacket.raw));
ipacket.spacket.stype = 255;
printf("%s\n", number_to_binary(ipacket.raw));
ipacket.spacket.sid = 255;
printf("%s\n", number_to_binary(ipacket.raw));
ipacket.spacket.mlength = 2147483647;
printf("%s\n", number_to_binary(ipacket.raw));
ipacket.spacket.message = 255;
printf("%s\n", number_to_binary(ipacket.raw));
}
I get (big endian):
1111111100000000000000000000000000000000000000000000000000000000
1111111110000000000000000000000000000000000000000000000000000000
1111111111111111100000000000000000000000000000000000000000000000
1111111111111111111111111000000000000000000000000000000000000000
1111111111111111111111111000000011111111111111111111111111111110
1111111111111111111111111000000011111111111111111111111111111110
My .mlength field is lost somewhere on the right part although it should be right next to the .sid field.
This page confirms it: Alignment of the allocation unit that holds a bit field is unspecified. But if this is the case, how do people are packing data into bit fields which is their purpose in the first place?
24 bits seems to be the maximum size the .mlength field is able to take before the .message field is kicked out.
Almost everything about the layout of bit-fields is implementation-defined in the standard, as you'd find from numerous other questions on the subject on SO. (Amongst others, you could look at Questions about bitfields and especially Bit field's memory management in C).
If you want your bit fields to be packed into 64 bits, you'll have to trust that your compiler allows you to use 64-bit types for the fields, and then use:
typedef union {
uint64_t raw;
struct {
uint64_t magic : 8;
uint64_t parity : 1;
uint64_t stype : 8;
uint64_t sid : 8;
uint64_t mlength : 31;
uint64_t message : 8;
} spacket;
} packet_t;
As originally written, under one plausible (common) scheme, your bit fields would be split into new 32-bit words when there isn't space enough left in the current one. That is, magic, parity, stype and sid would occupy 25 bits; there isn't enough room left in a 32-bit unsigned int to hold another 31 bits, so mlength is stored in the next unsigned int, and there isn't enough space left over in that unit to store message so that is stored in the third unsigned int unit. That would give you a structure occupying 3 * sizeof(unsigned int) or 12 bytes — and the union would occupy 16 bytes because of the alignment requirements on uint64_t.
Note that the standard does not guarantee that what I show will work. However, under many compilers, it probably will work. (Specifically, it works with GCC 5.3.0 on Mac OS X 10.11.4.)
Depending on your architecture and/or compiler your data will be aligned to different sizes. From your observations I would guess that you are seeing the consequences of 32 bit aligning. If you take a look at the sizeof your union and that is more than 8 bytes (64 bits) data has been padded for alignment.
With 32 bit alignment mlength and message will only be able to stay next to each other if they sum up to less than or equal 32 bits. This is probably what you see with your 24 bit limit.
If you want your struct to only take 64 bits with 32 bit alignment you will have to rearrange it a little bit. The single bit parity should be next to the 31 bit mlength and your 4 8 bit variables should be grouped together.

What's the value of CMSG_ALIGN macro

Is it right to say the value of CMSG_ALIGN(i) is the minimum value of multiple of 8 and >=i if i is an unsigned int variable?
#include <stdio.h>
int main(void)
{
int i;
for (i=0; i<50; ++i) {
printf("%d\n", CMSG_ALIGN(i));
}
}
Output I get:
/* i CMSG_ALIGN(i)
*
* 0 0
* [1,8] 8
* [9,16] 16
*/
Is it right to say the value of CMSG_ALIGN(i) is the minimum value of
multiple of 8 and >=i if i is an unsigned int variable?
No. The alignment for a given value is not necessarily 8 on all platforms. If it were to be 8 always, CMSG_ALIGN() wouldn't be necessary at all.
You are probably on a 64-bit system. So it's aligned on 8 byte boundary. But if you run the same code on a 32-bit platform, you would probably see that it's a 4 byte alignment.
Note that CMCG_ALIGN() returns size_t. So %zu is the correct format string to print a size_t value.

Extract chars from array as UINT32

So I have a buffer filled with bytes that I know should be at least 16 bytes long.
I dont care about bytes 0 - 11.
I know that the 4 bytes from 12 to 15 represent a 32 bit number.
How can I just extract these bytes and represent them as a 32 bit number.
You can convert each byte to an 8-bit unsigned number, and you can combine these numbers to one 32-bit number using bit operations:
uint32_t result = 0;
for (int i = 12; i < 16; i++) {
result <<= 8;
result |= (uint8_t)bytes[i];
}
I have a fixation with unions{}, I can't help it. This might help:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
uint32_t
convert_int(char bytes[4])
{
union {uint32_t n; char bytes[4];} box;
memcpy(&box.bytes[0], bytes, sizeof(*bytes));
return ntohl(box.n);
}
int
main(void)
{
uint32_t number;
number = convert_int("\x0A\x00\x00\x00");
printf("%d\n", number);
return 0;
}
convert_int() accepts 4 bytes in network byte order (Big endian) (most significant byte first) and translate it to a 32 bits integer; host byte order (either Big or little endian). Since you have control over the buffer, you can place the argument as needed.

Size of structure with bit fields

Here I have a code snippet.
#include <stdio.h>
int main()
{
struct value
{
int bit1 : 1;
int bit2 : 4;
int bit3 : 4;
} bit;
printf("%d",sizeof(bit));
return 0;
}
I'm getting the output as 4 (32 bit compiler).
Can anyone explain me how? Why is it not 1+ 4 + 4 = 9?
I've never worked with bit fields before so would love some help. Thank you. :)
When you tell the C compiler this:
int bit1 : 1
It interprets it as, and allocates to it, an integer; but refers to it's first bit as bit1.
So if we consider your code:
struct value
{
int bit1 : 1;
int bit2 : 4;
int bit3 : 4;
} bit;
What you are telling the compiler is this: Take necessary number of the ints, and refer to the chunks bit 1 as bit1, then refer to bits 2 - 5 as bit2, and then refer to bits 6 - 9 as bit3.
Since the complete number of bits required are 9, and an int is 32 bits (in your computer's architecture), memory space of only 1 int is required. Thus you get the size as 4 (bytes).
Instead, if you were to define the struct using chars, since char is 8 bits, the compiler would allocate the memory space of two chars for each struct value. And you will get 2 (bytes) as your output.
Because C requests to pack the bits in the same unit (here one signed int / unsigned int):
(C99, 6.7.2.1p10) "If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit"
The processor just likes chucking around 32 bits in one go - not 9, 34 etc.
It just rounds it up to what the processor likes. (Keep the worker happy)

struct representation in memory on 64 bit machine

For my curiosity I have written a program which was to show each byte of my struct. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <limits.h>
#define MAX_INT 2147483647
#define MAX_LONG 9223372036854775807
typedef struct _serialize_test{
char a;
unsigned int b;
char ab;
unsigned long long int c;
}serialize_test_t;
int main(int argc, char**argv){
serialize_test_t *t;
t = malloc(sizeof(serialize_test_t));
t->a = 'A';
t->ab = 'N';
t->b = MAX_INT;
t->c = MAX_LONG;
printf("%x %x %x %x %d %d\n", t->a, t->b, t->ab, t->c, sizeof(serialize_test_t), sizeof(unsigned long long int));
char *ptr = (char *)t;
int i;
for (i=0; i < sizeof(serialize_test_t) - 1; i++){
printf("%x = %x\n", ptr + i, *(ptr + i));
}
return 0;
}
and here is the output:
41 7fffffff 4e ffffffff 24 8
26b2010 = 41
26b2011 = 0
26b2012 = 0
26b2013 = 0
26b2014 = ffffffff
26b2015 = ffffffff
26b2016 = ffffffff
26b2017 = 7f
26b2018 = 4e
26b2019 = 0
26b201a = 0
26b201b = 0
26b201c = 0
26b201d = 0
26b201e = 0
26b201f = 0
26b2020 = ffffffff
26b2021 = ffffffff
26b2022 = ffffffff
26b2023 = ffffffff
26b2024 = ffffffff
26b2025 = ffffffff
26b2026 = ffffffff
And here is the question:
if sizeof(long long int) is 8, then why sizeof(serialize_test_t) is 24 instead of 32 - I always thought that size of struct is rounded to largest type and multiplied by number of fields like here for example: 8(bytes)*4(fields) = 32(bytes) — by default, with no pragma pack directives?
Also when I cast that struct to char * I can see from the output that the offset between values in memory is not 8 bytes. Can you give me a clue? Or maybe this is just some compiler optimizations?
On modern 32-bit machines like the SPARC or the Intel [34]86, or any Motorola chip from the 68020 up, each data iten must usually be ``self-aligned'', beginning on an address that is a multiple of its type size. Thus, 32-bit types must begin on a 32-bit boundary, 16-bit types on a 16-bit boundary, 8-bit types may begin anywhere, struct/array/union types have the alignment of their most restrictive member.
The total size of the structure will depend on the packing.In your case it's going as 8 byte so final structure will look like
typedef struct _serialize_test{
char a;//size 1 byte
padding for 3 Byte;
unsigned int b;//size 4 Byte
char ab;//size 1 Byte again
padding of 7 byte;
unsigned long long int c;//size 8 byte
}serialize_test_t;
in this manner first two and last two are aligned properly and total size reaches upto 24.
Depends on the alignment chosen by your compiler. However, you can reasonably expect the following defaults:
typedef struct _serialize_test{
char a; // Requires 1-byte alignment
unsigned int b; // Requires 4-byte alignment
char ab; // Requires 1-byte alignment
unsigned long long int c; // Requires 4- or 8-byte alignment, depending on native register size
}serialize_test_t;
Given the above requirements, the first field will be at offset zero.
Field b will start at offset 4 (after 3 bytes padding).
The next field starts at offset 8 (no padding required).
The next field starts at offset 12 (32-bit) or 16 (64-bit) (after another 3 or 7 bytes padding).
This gives you a total size of 20 or 24, depending on the alignment requirements for long long on your platform.
GCC has an offsetof function that you can use to identify the offset of any particular member, or you can define one yourself:
// modulo errors in parentheses...
#define offsetof(TYPE,MEMBER) (int)((char *)&((TYPE *)0)->MEMBER - (char *)((TYPE *)0))
Which basically calculates the offset using the difference in address using an imaginary base address for the aggregate type.
The padding is generally added so that the struct is a multiple of the word size (in this case 8)
So the first 2 fields are in one 8 byte chunk. The third field is in another 8 byte chunk and the last is in one 8 byte chunk. For a total of 24 bytes.
char
padding
padding
padding
unsigned int
unsigned int
unsigned int
unsigned int
char // Word Boundary
padding
padding
padding
padding
padding
padding
padding
unsigned long long int // Word Boundary
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
Has to do with alignment.
The size of the struct is not rounded to the largest type and multiplied by the fields. The bytes are aligned each by their respective types:
http://en.wikipedia.org/wiki/Data_structure_alignment#Architectures
Alignment works in that the type must appear in a memory address that is a multiple of its size, so:
Char is 1 byte aligned, so it can appear anywhere in memory that is a multiple of 1 (anywhere).
The unsigned int is needs to start at an address that is a multiple of 4.
The char can be anywhere.
and then the long long needs to be in a multiple of 8.
If you take a look at the addresses, this is the case.
The compiler is only concerned about the individual alignment of the struct members, one by one. It does not think about the struct as whole. Because on the binary level a struct does not exist, just a chunk of individual variables allocated at a certain address offset. There's no such thing as "struct round-up", the compiler couldn't care less about how large struct is, as long as all struct members are properly aligned.
The C standard says nothing about the manner of padding, apart from that a compiler is not allowed to add padding bytes at the very beginning of the struct. Apart from that, the compiler is free to add any number of padding bytes anywhere in the struct. It could 999 padding bytes and it would still conform to the standard.
So the compiler goes through the struct and sees: here's a char, it needs alignment. In this case, the CPU can probably handle 32 bit accesses, i.e. 4 byte alignment. Because it only adds 3 padding bytes.
Next it spots a 32 bit int, no alignment required, it is left as it is. Then another char, 3 padding bytes, then a 64 bit int, no alignment required.

Resources