Unexpected behaviour of bit fields in C

#include <stdio.h>

int main(void)
{
    struct bitfield
    {
        unsigned a:5;
        unsigned c:5;
        unsigned b:6;
    } bit;
    char *p;
    struct bitfield *ptr, bit1 = {1, 3, 3};
    p = (char *)&bit1;
    p++;
    printf("%d", *p);
    return 0;
}
Explanation:
Binary value of a=1 is 00001 (in 5 bits)
Binary value of c=3 is 00011 (in 5 bits)
Binary value of b=3 is 000011 (in 6 bits)
My question is: how will it be represented in memory?
When I compile and run it, the output is 12, and I am not able to figure out why. In my view the memory representation would be in the format below:

00001 000011 00011
|                |
501              500   (let's say 500 is the starting address)

Please correct me if I am wrong here.

The actual representation is:

000011 00011 00001
  b      c     a

When aligned as bytes:

00001100 01100001
|        |
p+1      p

The byte at address p+1 is 00001100, which gives 12.
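To see this concretely, here is a minimal sketch (assuming a little-endian machine where the compiler allocates bit-fields starting from the least significant bit, as GCC does on x86) that prints both bytes:

#include <stdio.h>

struct bitfield {
    unsigned a:5;
    unsigned c:5;
    unsigned b:6;
};

int main(void)
{
    struct bitfield bit1 = {1, 3, 3};
    unsigned char *p = (unsigned char *)&bit1;

    /* With low-to-high allocation on a little-endian machine:
       p[0] = 011 00001 (c's low bits, then a) = 0x61 = 97
       p[1] = 000011 00 (b, then c's high bits) = 0x0c = 12 */
    printf("%u %u\n", p[0], p[1]);
    return 0;
}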

The C standard does not completely specify how bit-fields are packed into bytes. The details depend on each C implementation.
From C 2011 6.7.2.1:
11 An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

From the C11 standard (6.7.2.1):
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
I know for a fact that GCC and other compilers on unix-like systems order bit fields in the host byte order, as can be seen from the definition of an IP header in an operating system source I had handy:
struct ip {
#if _BYTE_ORDER == _LITTLE_ENDIAN
        u_int   ip_hl:4,        /* header length */
                ip_v:4;         /* version */
#endif
#if _BYTE_ORDER == _BIG_ENDIAN
        u_int   ip_v:4,         /* version */
                ip_hl:4;        /* header length */
#endif
        /* ... rest of the header ... */
};
Other compilers might do the same. Since you're most likely on a little-endian machine, your bit fields will be backwards from what you're expecting (in addition to the bytes of the word already being backwards). Notice that the order of the fields in the struct in your question is "a, c, b", not "a, b, c", which makes this all the more confusing. Most likely it looks like this in memory:

byte 0      byte 1
01100001    00001100
xxxaaaaa    bbbbbbcc

All three bit fields fit in one int; the padding that is added automatically sits in bytes 2 and 3. a is allocated first, in the lowest five bits of byte 0. c comes next: its low three bits fill the top of byte 0 (marked x above) and its two highest bits, which are 0, spill into the lowest bits of byte 1. b is allocated last, in the top six bits of byte 1.
Notice that the picture has the lowest byte address on the left, with the bits of each byte written most-significant first (your picture ran the bytes in one direction and the bits in another, which makes everything more confusing, especially combined with the unusual field order "a, c, b").
If none of the above made any sense, run this program and then read up on byte ordering:
#include <stdio.h>

int
main(int argc, char **argv)
{
    unsigned int i = 0x01020304;
    unsigned char *p;

    p = (unsigned char *)&i;
    printf("0x%x 0x%x 0x%x 0x%x\n",
        (unsigned int)p[0], (unsigned int)p[1],
        (unsigned int)p[2], (unsigned int)p[3]);
    return 0;
}
Then when you understand what little-endian does to the ordering of bytes in an int, map your bit-field on top of that, but with the fields backwards. Then it might start making sense (I've been doing this for years and it's still confusing as hell).
Here's another example to show how the bit fields end up backwards twice: once because the compiler decides to allocate them backwards on a little-endian machine, and once again because of the byte order of ints:
#include <stdio.h>

int
main(int argc, char **argv)
{
    struct bf {
        unsigned a:4, b:4, c:4, d:4, e:4, f:4, g:4, h:4;
    } bf = { 1, 2, 3, 4, 5, 6, 7, 8 };
    unsigned int *i;
    unsigned char *p;

    p = (unsigned char *)&bf;
    i = (unsigned int *)&bf;
    printf("0x%x 0x%x 0x%x 0x%x\n",
        (unsigned int)p[0], (unsigned int)p[1],
        (unsigned int)p[2], (unsigned int)p[3]);
    printf("0x%x\n", *i);
    return 0;
}
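For what it's worth, on a little-endian x86 machine with GCC this typically prints 0x21 0x43 0x65 0x87 on the first line and 0x87654321 on the second: each pair of 4-bit fields lands in one byte (the lower-numbered field in the lower bits), and the bytes of the int then come out reversed when read through the char pointer.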

Related

Difference between C struct bitfields on char and on int

When using bitfields in C, I found differences I did not expect, related to the actual type that is used to declare the fields.
I didn't find any clear explanation. Now the problem is identified, so even though there is no clear response, this post may be useful to anyone facing the same issue.
Still, if someone can point to a formal explanation, that would be great.
The following structure takes 2 bytes in memory.
struct {
    char field0 : 1; // 1 bit - bit 0
    char field1 : 2; // 2 bits - bits 2 down to 1
    char field2 ;    // 8 bits - bits 15 down to 8
} reg0;
This one takes 4 bytes in memory; the question is why?
struct {
    int field0 : 1;  // 1 bit - bit 0
    int field1 : 2;  // 2 bits - bits 2 down to 1
    char field2 ;    // 8 bits - bits 15 down to 8
} reg1;
In both cases, the bits are organized in memory in the same way: field2 always takes bits 15 down to 8.
I tried to find some literature on the subject, but still can't get a clear explanation.
The two most appropriate links I could find are:
http://www.catb.org/esr/structure-packing/
http://www.msg.ucsf.edu/local/programs/IBM_Compilers/C:C++/html/language/ref/clrc03defbitf.htm
However, none really explains why the second structure takes 4 bytes. Actually, reading carefully, I would even expect the structure to take 2 bytes.
In both cases,
field0 takes 1 bit
field1 takes 2 bits
field2 takes 8 bits, and is aligned on the first available byte address
Hence, the useful data requires 2 bytes in both cases.
So what is behind the scenes that makes reg1 take 4 bytes?
Full Code Example:
#include <stdio.h>
// Register structure using char
typedef struct {
    // Reg0
    struct _reg0_bitfieldsA {
        char field0 : 1;
        char field1 : 2;
        char field2 ;
    } reg0;
    // Next reg
    char NextReg;
} regfileA_t;
// Register structure using int
typedef struct {
    // Reg1
    struct _reg1_bitfieldsB {
        int field0 : 1;
        int field1 : 2;
        char field2 ;
    } reg1;
    // Next reg
    char NextReg;
} regfileB_t;
regfileA_t regsA;
regfileB_t regsB;
int main(int argc, char const *argv[])
{
    int *ptrA, *ptrB;
    printf("sizeof(regsA) == %zu\n", sizeof(regsA));           // prints 3 - as expected
    printf("sizeof(regsB) == %zu\n", sizeof(regsB));           // prints 8 - why?
    printf("\n");
    printf("sizeof(regsA.reg0) == %zu\n", sizeof(regsA.reg0)); // prints 2 - as expected
    printf("sizeof(regsB.reg1) == %zu\n", sizeof(regsB.reg1)); // prints 4 - the int bit fields make the struct take 4 bytes
    printf("\n");
    printf("addrof(regsA.reg0)    == 0x%08x\n", (int)(&regsA.reg0));    // 0x0804A028
    printf("addrof(regsA.NextReg) == 0x%08x\n", (int)(&regsA.NextReg)); // 0x0804A02A = prev + 2
    printf("addrof(regsB.reg1)    == 0x%08x\n", (int)(&regsB.reg1));    // 0x0804A020
    printf("addrof(regsB.NextReg) == 0x%08x\n", (int)(&regsB.NextReg)); // 0x0804A024 = prev + 4 - my register is not at the right place then
    printf("\n");
    regsA.reg0.field0 = 1;
    regsA.reg0.field1 = 3;
    regsA.reg0.field2 = 0xAB;
    regsB.reg1.field0 = 1;
    regsB.reg1.field1 = 3;
    regsB.reg1.field2 = 0xAB;
    ptrA = (int *)&regsA;
    ptrB = (int *)&regsB;
    printf("regsA.reg0.value == 0x%08x\n", (int)(*ptrA)); // 0x0000AB07 (expected)
    printf("regsB.reg1.value == 0x%08x\n", (int)(*ptrB)); // 0x0000AB07 (expected)
    return 0;
}
When I first wrote the struct, I expected reg1 to take only 2 bytes, so that the next register would be at offset 2.
The relevant part of the standard is C11/C17 6.7.2.1p11:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
that, in connection with C11/C17 6.7.2.1p5
A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type. It is implementation-defined whether atomic types are permitted.
and the fact that you're using char means that there is nothing to discuss in general - for a specific implementation, check the compiler manuals. Here's the one for GCC.
From the 2 excerpts it follows that an implementation is free to use absolutely whatever type it wants to implement the bitfields - it could even use int64_t for both of these cases, making the structure 16 bytes. The only thing a conforming implementation must do is pack the bits within the chosen addressable storage unit if enough space remains.
For GCC using the System V ABI on 386-compatible (32-bit) processors, the following holds:
Plain bit-fields (that is, those neither signed nor unsigned) always have nonnegative values. Although they may have type char, short, int, or long (which can have negative values), these bit-fields have the same range as a bit-field of the same size with the corresponding unsigned type. Bit-fields obey the same size and alignment rules as other structure and union members, with the following additions:
Bit-fields are allocated from right to left (least to most significant).
A bit-field must entirely reside in a storage unit appropriate for its declared type. Thus a bit-field never crosses its unit boundary.
Bit-fields may share a storage unit with other struct/union members, including members that are not bit-fields. Of course, struct members occupy different parts of the storage unit.
Unnamed bit-fields' types do not affect the alignment of a structure or union, although individual bit-fields' member offsets obey the alignment constraints.
i.e. in the System V ABI for 386, int f : 1 says that the bit-field f must lie within an int. If entire bytes of space remain, a following char within the same struct will be packed inside this int, even if it is not a bit-field.
Using this knowledge, the layout for
struct {
    int a : 1;  // 1 bit - bit 0
    int b : 2;  // 2 bits - bits 2 down to 1
    char c ;    // 8 bits - bits 15 down to 8
} reg1;
will be
                    1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|a b b x x x x x|c c c c c c c c|x x x x x x x x|x x x x x x x x|
<------------------------------ int ---------------------------->
and the layout for
struct {
    char a : 1; // 1 bit - bit 0
    char b : 2; // 2 bits - bits 2 down to 1
    char c ;    // 8 bits - bits 15 down to 8
} reg1;
will be
                    1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
|a b b x x x x x|c c c c c c c c|
<---- char ----><---- char ---->
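If you want to verify this on your own toolchain, here is a minimal sketch (the struct shapes follow the question; the expected numbers assume GCC with the System V ABI as described above, and the names creg/ireg are just illustrative):

#include <stdio.h>
#include <stddef.h>

struct creg { char field0 : 1; char field1 : 2; char field2; };
struct ireg { int  field0 : 1; int  field1 : 2; char field2; };

int main(void)
{
    /* Expected with GCC/System V: creg has size 2, ireg has size 4,
       and field2 sits at byte offset 1 (bits 15..8) in both. */
    printf("creg: size %zu, field2 at offset %zu\n",
           sizeof(struct creg), offsetof(struct creg, field2));
    printf("ireg: size %zu, field2 at offset %zu\n",
           sizeof(struct ireg), offsetof(struct ireg, field2));
    return 0;
}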
So there are tricky edge cases. Compare the 2 definitions here:
struct x {
    short a : 2;
    short b : 15;
    char c ;
};
struct y {
    int a : 2;
    int b : 15;
    char c ;
};
Because a bit-field must not cross its unit boundary, the struct x members a and b need to go into different shorts. Then there is not enough space left to accommodate the char c, so it must come after them. And the entire struct must be suitably aligned for short, so it will be 6 bytes on i386. The latter, however, packs a and b into the 17 lowest bits of the int, and since there is still one entire addressable byte left within the int, c is packed there too; hence sizeof (struct y) will be 4.
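A quick way to confirm this edge case (again assuming GCC on a System V i386-style target, where a short unit is 2 bytes and an int unit 4):

#include <stdio.h>

struct x { short a : 2; short b : 15; char c; };
struct y { int   a : 2; int   b : 15; char c; };

int main(void)
{
    /* Expected: 6 for struct x (two shorts plus char plus padding),
       4 for struct y (everything packed into a single int). */
    printf("sizeof(struct x) = %zu\n", sizeof(struct x));
    printf("sizeof(struct y) = %zu\n", sizeof(struct y));
    return 0;
}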
Finally, you must really specify whether the int or char is signed or not - the default might not be what you expect! The standard leaves it up to the implementation, and GCC has a compile-time switch to change it.

Copying a 4 element character array into an integer in C

A char is 1 byte and an integer is 4 bytes. I want to copy a char[4] into an integer byte by byte. I thought of different methods, but I'm getting different answers.
char str[4]="abc";
unsigned int a = *(unsigned int*)str;
unsigned int b = str[0]<<24 | str[1]<<16 | str[2]<<8 | str[3];
unsigned int c;
memcpy(&c, str, 4);
printf("%u %u %u\n", a, b, c);
Output is
6513249 1633837824 6513249
Which one is correct? What is going wrong?
It's an endianness issue. When you interpret the char* as an int*, the first byte of the string becomes the least significant byte of the integer (because you ran this code on x86, which is little-endian), while with the manual conversion the first byte becomes the most significant.
To put this into pictures, this is the source array:
a b c \0
+------+------+------+------+
| 0x61 | 0x62 | 0x63 | 0x00 | <---- bytes in memory
+------+------+------+------+
When these bytes are interpreted as an integer in a little endian architecture the result is 0x00636261, which is decimal 6513249. On the other hand, placing each byte manually yields 0x61626300 -- decimal 1633837824.
Of course, treating a char* as an int* is undefined behavior, so the difference is not important in practice because you are not really allowed to use the first conversion anyway. There is, however, a way to achieve the same result: type punning through a union:
union {
    char str[4];
    unsigned int ui;
} u;
strcpy(u.str, "abc");
printf("%u\n", u.ui);
Neither of the first two is correct.
The first violates aliasing rules and may fail because the address of str is not properly aligned for an unsigned int. To reinterpret the bytes of a string as an unsigned int with the host system byte order, you may copy it with memcpy:
unsigned int a; memcpy(&a, &str, sizeof a);
(Presuming the size of an unsigned int and the size of str are the same.)
The second may fail with integer overflow, because str[0] is promoted to an int, so str[0]<<24 has type int, but the value produced by the shift may be larger than is representable in an int. To remedy this, use:
unsigned int b = (unsigned int) str[0] << 24 | …;
This second method interprets the bytes from str in big-endian order, regardless of the order of bytes in an unsigned int in the host system.
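Putting the two corrected approaches side by side, a minimal sketch (the printed values assume a little-endian host such as x86):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[4] = "abc";
    unsigned int a, b;

    /* Host byte order: copy the object representation. */
    memcpy(&a, str, sizeof a);

    /* Big-endian interpretation, independent of host byte order;
       the casts avoid sign extension and signed-overflow trouble. */
    b = (unsigned int)(unsigned char)str[0] << 24 |
        (unsigned int)(unsigned char)str[1] << 16 |
        (unsigned int)(unsigned char)str[2] << 8  |
        (unsigned int)(unsigned char)str[3];

    printf("%u %u\n", a, b);    /* 6513249 1633837824 on little-endian x86 */
    return 0;
}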
unsigned int a = *(unsigned int*)str;
This initialization is not correct and invokes undefined behavior. It violates C aliasing rules and potentially violates processor alignment requirements.
You said you want to copy byte-by-byte.
That means the line unsigned int a = *(unsigned int*)str; is not allowed. However, what you're doing is a fairly common way of reading an array as a different type (such as when you're reading a stream from disk).
It just needs some tweaking:
#include <stdio.h>

int main(void)
{
    char *str = "abc";
    unsigned a;
    char *c = (char *)&a;
    int i;

    for (i = 0; i < sizeof(unsigned); i++) {
        c[i] = str[i];
    }
    printf("%u\n", a);
    return 0;
}
Bear in mind, the data you're reading may not share the same endianness as the machine you're reading it on. This might help:
#include <stdint.h>

void
changeEndian32(void *data)
{
    uint8_t *cp = (uint8_t *)data;
    union {
        uint32_t word;
        uint8_t bytes[4];
    } temp;

    temp.bytes[0] = cp[3];
    temp.bytes[1] = cp[2];
    temp.bytes[2] = cp[1];
    temp.bytes[3] = cp[0];
    *((uint32_t *)data) = temp.word;
}
Both are correct in a way:
Your first solution copies in native byte order (i.e. the byte order the CPU uses) and thus may give different results depending on the type of CPU.
Your second solution copies in big endian byte order (i.e. most significant byte at lowest address) no matter what the CPU uses. It will yield the same value on all types of CPUs.
What is correct depends on how the original data (array of char) is meant to be interpreted.
E.g. Java code (class files) always uses big-endian byte order (no matter what the CPU is using). So if you want to read ints from a Java class file, you have to use the second way. In other cases you might want to use the CPU-dependent way (I think Matlab writes ints in native byte order into files, cf. this question).
If you're using the CVI (National Instruments) compiler, you can use the Scan function to do this:
unsigned int a;
For big endian:
Scan(str,"%1i[b4uzi1o3210]>%i",&a);
For little endian:
Scan(str,"%1i[b4uzi1o0123]>%i",&a);
The o modifier specifies the byte order.
i inside the square brackets indicates where to start in the str array.

struct representation in memory on 64 bit machine

Out of curiosity, I wrote a program to show each byte of my struct. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <limits.h>
#define MAX_INT 2147483647
#define MAX_LONG 9223372036854775807
typedef struct _serialize_test {
    char a;
    unsigned int b;
    char ab;
    unsigned long long int c;
} serialize_test_t;
int main(int argc, char **argv)
{
    serialize_test_t *t;
    t = malloc(sizeof(serialize_test_t));
    t->a = 'A';
    t->ab = 'N';
    t->b = MAX_INT;
    t->c = MAX_LONG;
    printf("%x %x %x %x %d %d\n", t->a, t->b, t->ab, t->c, sizeof(serialize_test_t), sizeof(unsigned long long int));
    char *ptr = (char *)t;
    int i;
    for (i = 0; i < sizeof(serialize_test_t) - 1; i++) {
        printf("%x = %x\n", ptr + i, *(ptr + i));
    }
    return 0;
}
and here is the output:
41 7fffffff 4e ffffffff 24 8
26b2010 = 41
26b2011 = 0
26b2012 = 0
26b2013 = 0
26b2014 = ffffffff
26b2015 = ffffffff
26b2016 = ffffffff
26b2017 = 7f
26b2018 = 4e
26b2019 = 0
26b201a = 0
26b201b = 0
26b201c = 0
26b201d = 0
26b201e = 0
26b201f = 0
26b2020 = ffffffff
26b2021 = ffffffff
26b2022 = ffffffff
26b2023 = ffffffff
26b2024 = ffffffff
26b2025 = ffffffff
26b2026 = ffffffff
And here is the question:
if sizeof(long long int) is 8, then why is sizeof(serialize_test_t) 24 instead of 32? I always thought that the size of a struct is rounded up to the largest type and multiplied by the number of fields, like here for example: 8 (bytes) * 4 (fields) = 32 (bytes), by default, with no pragma pack directives.
Also, when I cast that struct to char * I can see from the output that the offset between values in memory is not 8 bytes. Can you give me a clue? Or maybe this is just some compiler optimization?
On modern 32-bit machines like the SPARC or the Intel [34]86, or any Motorola chip from the 68020 up, each data item must usually be "self-aligned", beginning on an address that is a multiple of its type size. Thus, 32-bit types must begin on a 32-bit boundary, 16-bit types on a 16-bit boundary, 8-bit types may begin anywhere, and struct/array/union types have the alignment of their most restrictive member.
The total size of the structure depends on the packing. In your case the alignment unit is 8 bytes, so the final structure will look like:
typedef struct _serialize_test{
    char a;                   // size 1 byte
    // 3 bytes of padding
    unsigned int b;           // size 4 bytes
    char ab;                  // size 1 byte again
    // 7 bytes of padding
    unsigned long long int c; // size 8 bytes
}serialize_test_t;
In this manner each member is aligned properly and the total size comes to 24.
Depends on the alignment chosen by your compiler. However, you can reasonably expect the following defaults:
typedef struct _serialize_test{
    char a;                   // Requires 1-byte alignment
    unsigned int b;           // Requires 4-byte alignment
    char ab;                  // Requires 1-byte alignment
    unsigned long long int c; // Requires 4- or 8-byte alignment, depending on native register size
}serialize_test_t;
Given the above requirements, the first field will be at offset zero.
Field b will start at offset 4 (after 3 bytes padding).
The next field starts at offset 8 (no padding required).
The next field starts at offset 12 (32-bit) or 16 (64-bit) (after another 3 or 7 bytes padding).
This gives you a total size of 20 or 24, depending on the alignment requirements for long long on your platform.
The standard offsetof macro (from <stddef.h>) identifies the offset of any particular member, or you can define one yourself:
// modulo errors in parentheses...
#define offsetof(TYPE,MEMBER) (int)((char *)&((TYPE *)0)->MEMBER - (char *)((TYPE *)0))
This basically calculates the offset as the difference in addresses, using an imaginary instance of the aggregate type placed at address zero.
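Applied to the struct from the question, a minimal sketch (the expected numbers assume a typical x86-64 ABI where long long is 8-byte aligned):

#include <stdio.h>
#include <stddef.h>

typedef struct _serialize_test {
    char a;
    unsigned int b;
    char ab;
    unsigned long long int c;
} serialize_test_t;

int main(void)
{
    /* Expected on x86-64: a=0 b=4 ab=8 c=16 size=24 */
    printf("a=%zu b=%zu ab=%zu c=%zu size=%zu\n",
           offsetof(serialize_test_t, a),
           offsetof(serialize_test_t, b),
           offsetof(serialize_test_t, ab),
           offsetof(serialize_test_t, c),
           sizeof(serialize_test_t));
    return 0;
}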
The padding is generally added so that the struct is a multiple of the word size (in this case 8)
So the first 2 fields are in one 8 byte chunk. The third field is in another 8 byte chunk and the last is in one 8 byte chunk. For a total of 24 bytes.
char
padding
padding
padding
unsigned int
unsigned int
unsigned int
unsigned int
char // Word Boundary
padding
padding
padding
padding
padding
padding
padding
unsigned long long int // Word Boundary
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
This has to do with alignment.
The size of the struct is not rounded up to the largest type multiplied by the number of fields. Instead, the bytes are aligned according to each member's own type:
http://en.wikipedia.org/wiki/Data_structure_alignment#Architectures
Alignment works in that each type must appear at a memory address that is a multiple of its size, so:
A char is 1-byte aligned, so it can appear anywhere in memory that is a multiple of 1 (anywhere).
The unsigned int needs to start at an address that is a multiple of 4.
The next char can be anywhere.
Then the long long needs to be at a multiple of 8.
If you take a look at the addresses, this is the case.
The compiler is only concerned with the individual alignment of the struct members, one by one. It does not think about the struct as a whole, because at the binary level a struct does not exist, just a chunk of individual variables allocated at certain address offsets. There's no such thing as "struct round-up"; the compiler couldn't care less how large the struct is, as long as every member is properly aligned.
The C standard says nothing about the manner of padding, apart from the fact that a compiler is not allowed to add padding bytes at the very beginning of the struct. Beyond that, the compiler is free to add any number of padding bytes anywhere in the struct. It could add 999 padding bytes and it would still conform to the standard.
So the compiler goes through the struct and sees: here's a char, followed by a 32-bit int that needs 4-byte alignment, so it inserts 3 padding bytes after the char. The int itself is already aligned and is left as it is. Then comes another char, followed by a 64-bit int that needs 8-byte alignment, so 7 padding bytes are inserted before it.

Memory layout of struct having bitfields

I have this C struct: (representing an IP datagram)
struct ip_dgram
{
    unsigned int ver : 4;
    unsigned int hlen : 4;
    unsigned int stype : 8;
    unsigned int tlen : 16;
    unsigned int fid : 16;
    unsigned int flags : 3;
    unsigned int foff : 13;
    unsigned int ttl : 8;
    unsigned int pcol : 8;
    unsigned int chksm : 16;
    unsigned int src : 32;
    unsigned int des : 32;
    unsigned char opt[40];
};
I'm assigning values to it, and then printing its memory layout in 16-bit words like this:
#include <stdio.h>
#include <arpa/inet.h>

//prints 16 bits at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned short int *ptr = (unsigned short int *)&dgram;
    int i, j;

    //print only 10 words
    for (i = 0; i < 10; i++)
    {
        for (j = 15; j >= 0; j--)
        {
            if ((*ptr) & (1 << j)) printf("1");
            else printf("0");
            if (j % 8 == 0) printf(" ");
        }
        ptr++;
        printf("\n");
    }
}
int main()
{
    struct ip_dgram dgram;

    dgram.ver = 4;
    dgram.hlen = 5;
    dgram.stype = 0;
    dgram.tlen = 28;
    dgram.fid = 1;
    dgram.flags = 0;
    dgram.foff = 0;
    dgram.ttl = 4;
    dgram.pcol = 17;
    dgram.chksm = 0;
    dgram.src = (unsigned int)htonl(inet_addr("10.12.14.5"));
    dgram.des = (unsigned int)htonl(inet_addr("12.6.7.9"));
    print_dgram(dgram);
    return 0;
}
I get this output:
00000000 01010100
00000000 00011100
00000000 00000001
00000000 00000000
00010001 00000100
00000000 00000000
00001110 00000101
00001010 00001100
00000111 00001001
00001100 00000110
But I expect this (expected bit layout shown in an image not reproduced here):
The output is partially correct; somewhere, the bytes and nibbles seem to be interchanged. Is there some endianness issue here? Are bit-fields not good for this purpose? I really don't know. Any help? Thanks in advance!
No, bit-fields are not good for this purpose: the layout is compiler-dependent.
It's generally not a good idea to use bitfields for data where you want to control the resulting layout, unless you have (compiler-specific) means, such as #pragmas, to do so.
The best way is probably to implement this without bitfields, i.e. by doing the needed bitwise operations yourself. This is annoying, but way easier than somehow digging up a way to fix this. Also, it's platform-independent.
Define the header as just an array of 16-bit words, and then you can compute the checksum easily enough.
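For example, here is a sketch of the usual Internet one's-complement checksum over such an array of 16-bit words (assuming the words are already in network byte order and the checksum field has been zeroed first; the function name is just illustrative):

#include <stdint.h>
#include <stddef.h>

/* One's-complement sum of 16-bit words, folded and inverted,
   as used for the IPv4 header checksum. */
uint16_t ip_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;

    while (count--) {
        sum += *words++;
        while (sum > 0xffff)            /* fold carries back in */
            sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)~sum;
}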
The C11 standard says:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.
I'm pretty sure this is undesirable, as it means there might be padding between your fields, and that you can't control the order of your fields. Not just that, but you're at the whim of the implementation in terms of network byte order. Additionally, imagine if an unsigned int is only 16 bits, and you're asking to fit a 32-bit bitfield into it:
The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.
I suggest using an array of unsigned chars instead of a struct. This way you're guaranteed control over padding and network byte order. Start off with the size in bits that you want your structure to be in total. I'll assume you're declaring this in a constant such as IP_PACKET_BITCOUNT:
typedef unsigned char ip_packet[(IP_PACKET_BITCOUNT / CHAR_BIT) + (IP_PACKET_BITCOUNT % CHAR_BIT > 0)];
Write a function, void set_bits(ip_packet p, size_t bitfield_offset, size_t bitfield_width, unsigned char *value) { ... }, which allows you to set the bits starting at p[bitfield_offset / CHAR_BIT], bit bitfield_offset % CHAR_BIT, to the bits found in value, up to bitfield_width bits in length. This will be the most complicated part of your task.
Then you could define identifiers such as VER_OFFSET 0 and VER_WIDTH 4, HLEN_OFFSET 4 and HLEN_WIDTH 4, etc. to make modification of the array less painful. A sketch of such a helper follows.
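Here is a minimal sketch of such a helper, simplified to take the value as an unsigned (good for widths up to the width of unsigned) rather than a byte array; the name, signature and MSB-first convention are assumptions in the spirit of the suggestion, not a fixed API:

#include <limits.h>
#include <stddef.h>

/* Write the low `width` bits of `value` into buffer p, MSB-first,
   starting at absolute bit offset `offset` (network bit order). */
void set_bits(unsigned char *p, size_t offset, size_t width, unsigned value)
{
    while (width--) {
        size_t bit = offset + width;    /* lowest bit of value goes last */
        unsigned char mask =
            (unsigned char)(1u << (CHAR_BIT - 1 - bit % CHAR_BIT));

        if (value & 1u)
            p[bit / CHAR_BIT] |= mask;
        else
            p[bit / CHAR_BIT] &= (unsigned char)~mask;
        value >>= 1;
    }
}

With the identifiers above, setting the version field would then look like set_bits(packet, VER_OFFSET, VER_WIDTH, 4).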
Although the question was asked a long time back, there's no answer explaining your result. I'll answer it; hopefully it'll be useful to someone.
I'll illustrate the bug using the first 16 bits of your data structure.
Please note: this explanation is guaranteed to be true only for your particular combination of processor and compiler. If either changes, the behaviour may change.
Fields:
unsigned int ver : 4;
unsigned int hlen : 4;
unsigned int stype : 8;
Assigned to:
dgram.ver = 4;
dgram.hlen = 5;
dgram.stype = 0;
The compiler starts assigning bit fields at offset 0. This means the first byte of your data structure is stored in memory as:

Bit offset:   7     4     0
             -------------
             |  5  |  4  |
             -------------

The first 16 bits after assignment look like this:

Bit offset:   7     4     0     7     4     0
             -------------     -------------
             |  5  |  4  |     |  0  |  0  |
             -------------     -------------
Memory address:  100               101
You are using an unsigned 16-bit pointer to dereference memory address 100. As a result, address 100 is treated as the LSB of a 16-bit number, and 101 as the MSB.
If you print *ptr in hex you'll see this:
*ptr = 0x0054
Your loop runs over this 16-bit value from the most significant bit down, and hence you get:

00000000 0101 0100
-------- ---- ----
    0      5    4
Solution:
Change the order of the elements to:
unsigned int hlen : 4;
unsigned int ver : 4;
unsigned int stype : 8;
And use an unsigned char * pointer to traverse and print the values.
It should work.
Please note, as others have said, this behaviour is platform- and compiler-specific. If either changes, you need to verify that the memory layout of your data structure is correct.
For Chinese users: you can refer to this blog for more details; it's really good.
In summary, due to endianness there is byte order as well as bit order. Bit order is the order in which the bits of one byte are saved in memory, and it follows the same rule as byte order with respect to endianness.
Your picture is designed in network order, which is big-endian, so your struct definition is effectively for a big-endian machine. Per your output, your PC is little-endian, so you need to change the struct field order to use it there.
The way you show each bit is also incorrect, since when a byte is fetched as a char, the bit order has changed from machine order (little-endian in your case) to the normal order we humans use. You may change it as follows, per the referenced blog:
void
dump_native_bits_storage_layout(unsigned char *p, int bytes_num)
{
    union flag_t {
        unsigned char c;
        struct base_flag_t {
            unsigned int p7:1,
                         p6:1,
                         p5:1,
                         p4:1,
                         p3:1,
                         p2:1,
                         p1:1,
                         p0:1;
        } base;
    } f;

    for (int i = 0; i < bytes_num; i++) {
        f.c = *(p + i);
        printf("%d%d%d%d %d%d%d%d ",
               f.base.p7,
               f.base.p6,
               f.base.p5,
               f.base.p4,
               f.base.p3,
               f.base.p2,
               f.base.p1,
               f.base.p0);
    }
    printf("\n");
}

//prints 16 bits at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned char *ptr = (unsigned char *)&dgram;
    int i;

    //print only 10 words
    for (i = 0; i < 10; i++)
    {
        dump_native_bits_storage_layout(ptr, 2);
        ptr += 2;
    }
}
@unwind
A typical use case of bit fields is interpreting/emulating byte code or CPU instructions with a given layout. "Don't use it, because you cannot control it" is an answer for children.
@Bruce
For Intel/GCC I see a packed little-endian bit layout, i.e. in struct ip_dgram the field ver is represented by bits 0..3, the field hlen by bits 4..7 ...
For correctness of operation, it is required to verify the memory layout against your design at runtime:
#include <iostream>

struct ModelIndicator
{
    int a:4;
    int b:4;
    int c:4;
};

union UModelIndicator
{
    ModelIndicator i;
    int v;
};

// test packed little endian
static bool verifyLayoutModel()
{
    UModelIndicator um;
    um.v = 0;
    um.i.a = 2; // 0..3
    um.i.b = 3; // 4..7
    um.i.c = 9; // 8..11
    return um.v == (9 << 8) + (3 << 4) + 2;
}

int main()
{
    if (!verifyLayoutModel())
    {
        std::cerr << "Invalid memory layout" << std::endl;
        return -1;
    }
    // ...
}
If the above test fails, you need to consider compiler pragmas, or adjust your structures (and verifyLayoutModel()) accordingly.
I agree with what unwind said. Bit fields are compiler-dependent.
If you need the bits to be in a specific order, pack the data into a character buffer yourself: pack each element, then advance the buffer by the size of the element just packed.
// UInt4 is a hypothetical 4-bit-wide type used for illustration
void pack(char **buffer)
{
    if (buffer && *buffer)
    {
        // pack ver: assign the first 4 bits the value 4
        *((UInt4 *)*buffer) = 4;
        *buffer += sizeof(UInt4);
        // assign the next 4 bits the value 5
        *((UInt4 *)*buffer) = 5;
        *buffer += sizeof(UInt4);
        // ... continue packing
    }
}
Compiler-dependent or not, it depends on whether you want a very fast program or one that works across different compilers. To write a fast, compact application in C, use a struct with bit fields. If you want a slower, general-purpose program, write the bit manipulation out by hand.

Storing an unsigned integer into a struct

I've written this piece of code where I've assigned an unsigned integer to two different structs. In fact they're the same but one of them has the __attribute__((packed)).
#include <stdio.h>
#include <stdlib.h>
struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

void proc(void *addr) {
    struct st1 *varst1 = (struct st1 *)addr;
    struct st2 *varst2 = (struct st2 *)addr;
    printf("opcode in varst1: %c,%c,%c\n", varst1->opcode[0], varst1->opcode[1], varst1->opcode[2]);
    printf("opcode in varst2: %c,%c,%c\n", varst2->opcode[0], varst2->opcode[1], varst2->opcode[2]);
    printf("target in varst1: %d\n", varst1->target);
    printf("target in varst2: %d\n", varst2->target);
}

int main(int argc, char *argv[]) {
    unsigned int *var;
    var = (unsigned int *)malloc(sizeof(unsigned int));
    *var = 0x11334433;
    proc((void *)var);
    return 0;
}
The output is:
opcode in varst1: 3,D,3
opcode in varst2: 3,D,3
target in varst1: 17
target in varst2: 0
Given that I'm storing this number
0x11334433 == 00010001001100110100010000110011
I'd like to know why that is the output I get.
This is to do with data alignment. Most compilers will align data on address boundaries that help with general performance. So in the second case, the struct without the packed attribute, there is an extra padding byte between the char[3] and the int to align the int on a four-byte boundary. In the packed version that padding byte is absent.

byte : 0         1         2         3        4  5  6  7
st1  : opcode[0] opcode[1] opcode[2] |-----------int---------|
st2  : opcode[0] opcode[1] opcode[2] padding |------int------|

You allocate an unsigned int and pass that to the function:

byte : 0         1         2         3        4  5  6  7
alloc: |--------------int--------------|      |-unallocated-|
st1  : opcode[0] opcode[1] opcode[2] |-----------int---------|
st2  : opcode[0] opcode[1] opcode[2] padding |------int------|

If you're using a little-endian system, the lowest eight bits (rightmost) are stored at byte 0 (0x33), byte 1 has 0x44, byte 2 has 0x33 and byte 3 has 0x11. In the st2 structure the int value is mapped to memory entirely beyond the end of the allocated amount, while in the st1 version the lowest byte of the int maps to byte 3, which holds 0x11. So st2 produces 0 and st1 produces 0x11, decimal 17.
You are lucky that the unallocated memory is zero and that you have no memory range checking going on. Writing to the ints in st1 and st2 in this case could corrupt memory at worst, generate memory guard errors, or do nothing. It is undefined and dependent on the runtime implementation of the memory manager.
In general, avoid void *.
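To make the size difference visible, a minimal sketch (the expected output assumes GCC on x86, where int is 4-byte aligned):

#include <stdio.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

int main(void)
{
    /* Expected: 7 for the packed struct (no padding byte),
       8 for the unpacked one (a padding byte before target). */
    printf("%zu %zu\n", sizeof(struct st1), sizeof(struct st2));
    return 0;
}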
Your bytes look like this:
00010001 00110011 01000100 00110011
Though since your machine is little-endian, in memory they're actually laid out like this:
00110011 01000100 00110011 00010001
If your struct is packed, then the first three bytes are associated with opcode and the 4th is the low byte of target - that's why the packed struct has a target of 17 (00010001 in binary).
The unpacked struct has a padding byte after opcode, so its target starts past the end of the allocated int, in memory that happens to hold zeros - which is why target in varst2 is zero.
%c interprets the argument as an ASCII code and prints the character:
'3' has ASCII code 0x33
'D' has ASCII code 0x44
And 17 is 0x11.
An int is stored little-endian or big-endian depending on the processor architecture -- you can't depend on the bytes going into your struct's fields in order.
The int target in the unpacked version starts past the end of the allocated int, so it stays 0.
