Storing an unsigned integer into a struct - c

I've written this piece of code where I assign an unsigned integer to two different structs. They are identical except that one of them has __attribute__((packed)).
#include <stdio.h>
#include <stdlib.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

void proc(void* addr) {
    struct st1* varst1 = (struct st1*)addr;
    struct st2* varst2 = (struct st2*)addr;
    printf("opcode in varst1: %c,%c,%c\n", varst1->opcode[0], varst1->opcode[1], varst1->opcode[2]);
    printf("opcode in varst2: %c,%c,%c\n", varst2->opcode[0], varst2->opcode[1], varst2->opcode[2]);
    printf("target in varst1: %d\n", varst1->target);
    printf("target in varst2: %d\n", varst2->target);
}

int main(int argc, char* argv[]) {
    unsigned int* var;
    var = (unsigned int*)malloc(sizeof(unsigned int));
    *var = 0x11334433;
    proc((void*)var);
    return 0;
}
The output is:
opcode in varst1: 3,D,3
opcode in varst2: 3,D,3
target in varst1: 17
target in varst2: 0
Given that I'm storing this number
0x11334433 == 00010001 00110011 01000100 00110011
I'd like to know why that is the output I get.

This is to do with data alignment. Most compilers will align data on address boundaries that help with general performance. So in the second case, the struct without the packed attribute, there is an extra padding byte between the char[3] and the int to align the int on a four-byte boundary. In the packed version (st1) that padding byte is missing.
byte :  0          1          2          3         4    5    6    7
st1  :  opcode[0]  opcode[1]  opcode[2]  |-----------int-----------|
st2  :  opcode[0]  opcode[1]  opcode[2]  padding   |--------int---------|
You allocate an unsigned int and pass that to the function:
byte  :  0          1          2          3         4    5    6    7
alloc :  |----------allocated int-----------|       |---unallocated----|
st1   :  opcode[0]  opcode[1]  opcode[2]  |-----------int-----------|
st2   :  opcode[0]  opcode[1]  opcode[2]  padding   |--------int---------|
If you're using a little-endian system then the lowest eight bits (rightmost) are stored at byte 0 (0x33), byte 1 has 0x44, byte 2 has 0x33 and byte 3 has 0x11. In the st2 structure the int value is mapped entirely to memory beyond the end of the allocated amount, while in the st1 version the lowest byte of the int is mapped to byte 3, 0x11. So st2 produces 0 and st1 produces 0x11 (17).
You are lucky that the unallocated memory happens to be zero and that you have no memory range checking going on. Writing to the ints in st1 and st2 in this case could at worst corrupt memory, trigger memory-guard errors, or do nothing. It is undefined behaviour and dependent on the runtime implementation of the memory manager.
In general, avoid void *.
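As a rough sanity check, a small program along these lines (a sketch, assuming a GCC/Clang-style compiler that supports __attribute__((packed)) and a typical ABI where int is 4 bytes) prints the sizes and the offset of target in both structs:

#include <stdio.h>
#include <stddef.h>

struct st1 {
    unsigned char opcode[3];
    unsigned int target;
} __attribute__((packed));   /* GCC/Clang extension */

struct st2 {
    unsigned char opcode[3];
    unsigned int target;
};

int main(void) {
    /* On a typical implementation this prints size 7 / offset 3 for the packed
       struct and size 8 / offset 4 for the padded one; exact numbers are ABI-dependent. */
    printf("st1: size=%zu, offset of target=%zu\n",
           sizeof(struct st1), offsetof(struct st1, target));
    printf("st2: size=%zu, offset of target=%zu\n",
           sizeof(struct st2), offsetof(struct st2, target));
    return 0;
}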

Your bytes look like this:
00010001 00110011 01000100 00110011
But because your machine is little-endian, in memory they are actually stored as:
00110011 01000100 00110011 00010001
If your struct is packed then the first three bytes are associated with opcode, and the 4th is the start of target - that's why the packed struct has a target of 17, which is 00010001 in binary.
In the unpacked struct the 4th byte is padding, so target falls entirely in the (zero-filled) memory past the allocated int, which is why target in varst2 is zero.

%c interprets the argument as the ascii code of a character and prints the character
3's ascii code is 0x33
D's ascii code is 0x44
17 is 0x11
An int is stored little-endian or big-endian depending on the processor architecture -- you can't rely on the bytes landing in your struct's fields in source order.
In the unpacked version, target starts past the end of the allocated int, so it reads 0.
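To see the byte order directly, a sketch like the following (plain standard C, nothing assumed beyond that) stores the same constant and prints each byte of its representation; on a little-endian machine you would expect 33 44 33 11:

#include <stdio.h>
#include <stddef.h>

int main(void) {
    unsigned int v = 0x11334433;
    const unsigned char *p = (const unsigned char *)&v;  /* inspect the representation byte by byte */
    for (size_t i = 0; i < sizeof v; i++)
        printf("%02x ", p[i]);   /* little-endian: 33 44 33 11; big-endian: 11 33 44 33 */
    printf("\n");
    return 0;
}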

Related

Why is the output of this C program like this?

The output of the following code comes out to be 512 0 2, however I expected 512 0 0. Can somebody please help!
#include <stdio.h>

int main()
{
    union a
    {
        int i;
        char ch[2];
    };
    union a z = { 512 };
    printf("%d %d %d\n", z.i, z.ch[0], z.ch[1]);
    return 0;
}
You have built a union whose members overlap in memory. Now you assign 512 (0x0200) to it.
First byte  = 0x00
Second byte = 0x02
The integer i and your array ch[2] use the same memory, with ch covering the two lowest bytes of i!
Let's assume for simplicity that int is 2 bytes. In this case the memory for your union will be 2 bytes. Let's also assume that the union is located at address 0x0000. From the results you are getting I can tell you're using a little-endian machine: address 0x0000 holds the value 0x00 and address 0x0001 holds the value 0x02.
z.i prints 512 correctly.
z.ch[0] gets the value from address 0x0000, which is 0.
z.ch[1] gets the value from address 0x0001, which is 2.
Big Endian Byte Order: The most significant byte (the "big end") of
the data is placed at the byte with the lowest address. The rest of
the data is placed in order in the next three bytes in memory.
Little Endian Byte Order: The least significant byte (the "little
end") of the data is placed at the byte with the lowest address. The
rest of the data is placed in order in the next three bytes in memory.
I think you have some confusion with a struct and a union.
A union uses the same memory for all of its members and a struct has a separate memory for each member.
See the following extension of your code (at IDEone ):
#include <stdio.h>

int main()
{
    union a
    {
        int i;
        char ch[2];
    };
    union a aa = { 512 };
    printf("Union: %d %d %d\n", aa.i, aa.ch[0], aa.ch[1]);

    struct b
    {
        int i;
        char ch[2];
    };
    struct b bb = { 512 };
    printf("Struct: %d %d %d\n", bb.i, bb.ch[0], bb.ch[1]);
    return 0;
}
Output:
Union: 512 0 2
Struct: 512 0 0

Size of structure with bit fields

Here I have a code snippet.
#include <stdio.h>

int main()
{
    struct value
    {
        int bit1 : 1;
        int bit2 : 4;
        int bit3 : 4;
    } bit;
    printf("%zu\n", sizeof(bit));
    return 0;
}
I'm getting the output as 4 (32 bit compiler).
Can anyone explain to me how? Why is it not 1 + 4 + 4 = 9?
I've never worked with bit fields before so would love some help. Thank you. :)
When you tell the C compiler this:
int bit1 : 1
it allocates an int-sized storage unit, but refers to its first bit as bit1.
So if we consider your code:
struct value
{
int bit1 : 1;
int bit2 : 4;
int bit3 : 4;
} bit;
What you are telling the compiler is this: take as many ints as necessary, refer to the chunk's first bit as bit1, then refer to the next four bits as bit2, and then refer to the four bits after that as bit3.
Since the total number of bits required is 9, and an int is 32 bits (on your architecture), the memory space of only 1 int is required. Thus you get the size as 4 (bytes).
If instead you were to declare the bit-fields using char, since char is 8 bits, the compiler would allocate the space of two chars for each struct value, and you would get 2 (bytes) as your output.
Because C requires the bits to be packed into the same unit (here one signed int / unsigned int):
(C99, 6.7.2.1p10) "If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit"
The processor just likes moving 32 bits around in one go, not 9 or 34.
The compiler simply rounds the size up to what the processor likes (keep the worker happy).
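For comparison, here is a small sketch that prints both sizes; note that bit-fields of non-int types are implementation-defined, so the char variant is an assumption that holds for GCC/Clang-style compilers:

#include <stdio.h>

struct value_int {
    int bit1 : 1;
    int bit2 : 4;
    int bit3 : 4;
};

struct value_char {            /* char bit-fields: accepted by common compilers, implementation-defined */
    unsigned char bit1 : 1;
    unsigned char bit2 : 4;
    unsigned char bit3 : 4;
};

int main(void)
{
    /* Typically prints 4 and 2 on a platform with 32-bit int and 8-bit char. */
    printf("%zu %zu\n", sizeof(struct value_int), sizeof(struct value_char));
    return 0;
}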

Unexpected Behaviour of bit field operator in C

#include <stdio.h>

int main(void)
{
    struct bitfield
    {
        unsigned a:5;
        unsigned c:5;
        unsigned b:6;
    } bit;
    char *p;
    struct bitfield *ptr, bit1 = {1, 3, 3};
    p = (char *)&bit1;
    p++;
    printf("%d\n", *p);
    return 0;
}
Explanation:
Binary value of a=1 is 00001 (in 5 bits)
Binary value of c=3 is 00011 (in 5 bits)
Binary value of b=3 is 000011 (in 6 bits)
My question is: how will it be represented in memory?
When I compile it, the output is 12, and I am not able to figure out why. In my view the memory representation would be in the format below:
00001 000011 00011
|                |
501              500  (let's say the starting address)
Please correct me if I am wrong here.
The actual representation is like:
000011 00011 00001
   b     c     a
When split into bytes:
00001100 01100001
    |        |
   p+1       p
At the address (p+1) the byte is 00001100, which gives 12.
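A quick way to check this is to dump the struct's bytes, as in the sketch below. The allocation order of bit-fields is implementation-defined, so the expected 61 0c output is an assumption that holds for GCC-style layout on a little-endian machine:

#include <stdio.h>
#include <stddef.h>

struct bitfield {
    unsigned a:5;
    unsigned c:5;
    unsigned b:6;
};

int main(void)
{
    struct bitfield bit1 = {1, 3, 3};
    const unsigned char *p = (const unsigned char *)&bit1;
    /* Expect 61 0c 00 00 with GCC on x86: a and the low bits of c in byte 0,
       the rest of c and b in byte 1, padding in bytes 2-3. */
    for (size_t i = 0; i < sizeof bit1; i++)
        printf("%02x ", p[i]);
    printf("\n");
    return 0;
}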
The C standard does not completely specify how bit-fields are packed into bytes. The details depend on each C implementation.
From C 2011 6.7.2.1:
11 An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
From the C11 standard (6.7.2.1):
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
I know for a fact that GCC and other compilers on Unix-like systems order bit fields in the host byte order, as evidenced by the definition of the IP header in an operating-system source I had handy:
struct ip {
#if _BYTE_ORDER == _LITTLE_ENDIAN
    u_int   ip_hl:4,        /* header length */
            ip_v:4;         /* version */
#endif
#if _BYTE_ORDER == _BIG_ENDIAN
    u_int   ip_v:4,         /* version */
            ip_hl:4;        /* header length */
#endif
    /* ... remaining header fields ... */
};
Other compilers might do the same. Since you're most likely on a little endian machine, your bit field will be backwards from what you're expecting (in addition to the words being backwards already). Most likely it looks like this in memory (notice that the order of your fields in the struct in your question is "a, c, b", not "a, b, c", just to make this all more confusing):
01100001 00001100
|        |
byte 0   byte 1
|  |     |     |
x  a     b     c
So all three bit fields can be stuffed into one int. Padding is added automatically to fill the unit; it ends up in bytes 2 and 3. Field a is allocated first, in the lowest five bits of byte 0. Then c follows: its low three bits land in the top of byte 0 (x in my picture above) and its high two bits, which are 0 here, land in the bottom of byte 1. Finally b takes the top six bits of byte 1.
Notice that the picture has the lowest-addressed byte on the left and each byte written in the usual most-significant-bit-first notation (your picture had the bits running in one direction and the bytes in another, which makes everything more confusing, especially with your field order being "a, c, b" rather than "a, b, c").
If none of the above made any sense, run this program and then read up on byte ordering:
#include <stdio.h>

int
main(int argc, char **argv)
{
    unsigned int i = 0x01020304;
    unsigned char *p;

    p = (unsigned char *)&i;
    printf("0x%x 0x%x 0x%x 0x%x\n", (unsigned int)p[0], (unsigned int)p[1], (unsigned int)p[2], (unsigned int)p[3]);
    return 0;
}
Then when you understand what little-endian does to the ordering of bytes in an int, map your bit-field on top of that, but with the fields backwards. Then it might start making sense (I've been doing this for years and it's still confusing as hell).
Another example to show how the bit fields are backwards twice, once because the compiler decides to put them backwards on a little-endian machine, and then once again because of the byte order of ints:
#include <stdio.h>

int
main(int argc, char **argv)
{
    struct bf {
        unsigned a:4,b:4,c:4,d:4,e:4,f:4,g:4,h:4;
    } bf = { 1, 2, 3, 4, 5, 6, 7, 8 };
    unsigned int *i;
    unsigned char *p;

    p = (unsigned char *)&bf;
    i = (unsigned int *)&bf;
    printf("0x%x 0x%x 0x%x 0x%x\n", (unsigned int)p[0], (unsigned int)p[1], (unsigned int)p[2], (unsigned int)p[3]);
    printf("0x%x\n", *i);
    return 0;
}

struct representation in memory on 64 bit machine

Out of curiosity I have written a program to show each byte of my struct. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <limits.h>

#define MAX_INT 2147483647
#define MAX_LONG 9223372036854775807

typedef struct _serialize_test{
    char a;
    unsigned int b;
    char ab;
    unsigned long long int c;
}serialize_test_t;

int main(int argc, char**argv){
    serialize_test_t *t;
    t = malloc(sizeof(serialize_test_t));
    t->a = 'A';
    t->ab = 'N';
    t->b = MAX_INT;
    t->c = MAX_LONG;
    printf("%x %x %x %x %d %d\n", t->a, t->b, t->ab, t->c, sizeof(serialize_test_t), sizeof(unsigned long long int));
    char *ptr = (char *)t;
    int i;
    for (i=0; i < sizeof(serialize_test_t) - 1; i++){
        printf("%x = %x\n", ptr + i, *(ptr + i));
    }
    return 0;
}
and here is the output:
41 7fffffff 4e ffffffff 24 8
26b2010 = 41
26b2011 = 0
26b2012 = 0
26b2013 = 0
26b2014 = ffffffff
26b2015 = ffffffff
26b2016 = ffffffff
26b2017 = 7f
26b2018 = 4e
26b2019 = 0
26b201a = 0
26b201b = 0
26b201c = 0
26b201d = 0
26b201e = 0
26b201f = 0
26b2020 = ffffffff
26b2021 = ffffffff
26b2022 = ffffffff
26b2023 = ffffffff
26b2024 = ffffffff
26b2025 = ffffffff
26b2026 = ffffffff
And here is the question:
if sizeof(long long int) is 8, then why is sizeof(serialize_test_t) 24 instead of 32? I always thought that the size of a struct is rounded to the largest type and multiplied by the number of fields, like here for example: 8 (bytes) * 4 (fields) = 32 (bytes), by default, with no pragma pack directives.
Also, when I cast that struct to char * I can see from the output that the offset between values in memory is not 8 bytes. Can you give me a clue? Or maybe this is just some compiler optimization?
On modern 32-bit machines like the SPARC or the Intel [34]86, or any Motorola chip from the 68020 up, each data item must usually be "self-aligned", beginning on an address that is a multiple of its type size. Thus, 32-bit types must begin on a 32-bit boundary, 16-bit types on a 16-bit boundary, 8-bit types may begin anywhere, and struct/array/union types have the alignment of their most restrictive member.
The total size of the structure depends on the packing. In your case the alignment unit is 8 bytes, so the final structure will look like this:
typedef struct _serialize_test{
    char a;                    // 1 byte
    // 3 bytes of padding
    unsigned int b;            // 4 bytes
    char ab;                   // 1 byte
    // 7 bytes of padding
    unsigned long long int c;  // 8 bytes
}serialize_test_t;
This way every member is properly aligned and the total size comes to 24 bytes.
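As a rough check, a sketch like this (assuming a GCC/Clang-style compiler, since __attribute__((packed)) is an extension) compares the padded and packed sizes; with the layout above you would expect 24 and 14:

#include <stdio.h>

typedef struct {
    char a;
    unsigned int b;
    char ab;
    unsigned long long int c;
} padded_t;

typedef struct {
    char a;
    unsigned int b;
    char ab;
    unsigned long long int c;
} __attribute__((packed)) packed_t;   /* no padding at all */

int main(void){
    /* Typically prints 24 14 on a 64-bit x86-64 system. */
    printf("%zu %zu\n", sizeof(padded_t), sizeof(packed_t));
    return 0;
}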
Depends on the alignment chosen by your compiler. However, you can reasonably expect the following defaults:
typedef struct _serialize_test{
char a; // Requires 1-byte alignment
unsigned int b; // Requires 4-byte alignment
char ab; // Requires 1-byte alignment
unsigned long long int c; // Requires 4- or 8-byte alignment, depending on native register size
}serialize_test_t;
Given the above requirements, the first field will be at offset zero.
Field b will start at offset 4 (after 3 bytes padding).
The next field starts at offset 8 (no padding required).
The next field starts at offset 12 (32-bit) or 16 (64-bit) (after another 3 or 7 bytes padding).
This gives you a total size of 20 or 24, depending on the alignment requirements for long long on your platform.
The standard offsetof macro (from <stddef.h>, also provided by GCC) can be used to identify the offset of any particular member, or you can define one yourself:
// modulo errors in parentheses...
#define offsetof(TYPE,MEMBER) (int)((char *)&((TYPE *)0)->MEMBER - (char *)((TYPE *)0))
This basically calculates the offset as the difference between the member's address and an imaginary base address (0) for the aggregate type.
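For instance, here is a short sketch (just an illustration; the exact numbers depend on your ABI) that prints each member's offset for the struct from the question:

#include <stdio.h>
#include <stddef.h>

typedef struct _serialize_test{
    char a;
    unsigned int b;
    char ab;
    unsigned long long int c;
} serialize_test_t;

int main(void){
    /* On a typical 64-bit ABI this prints a=0 b=4 ab=8 c=16 size=24. */
    printf("a=%zu b=%zu ab=%zu c=%zu size=%zu\n",
           offsetof(serialize_test_t, a),
           offsetof(serialize_test_t, b),
           offsetof(serialize_test_t, ab),
           offsetof(serialize_test_t, c),
           sizeof(serialize_test_t));
    return 0;
}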
The padding is generally added so that each member is aligned and the struct as a whole is a multiple of the word size (in this case 8).
So the first two fields are in one 8-byte chunk, the third field is in another 8-byte chunk, and the last is in its own 8-byte chunk, for a total of 24 bytes.
char
padding
padding
padding
unsigned int
unsigned int
unsigned int
unsigned int
char // Word Boundary
padding
padding
padding
padding
padding
padding
padding
unsigned long long int // Word Boundary
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
unsigned long long int
Has to do with alignment.
The size of the struct is not rounded to the largest type and multiplied by the number of fields. Each member is aligned according to its own type:
http://en.wikipedia.org/wiki/Data_structure_alignment#Architectures
Alignment works in that each type must appear at a memory address that is a multiple of its size, so:
A char is 1-byte aligned, so it can appear anywhere in memory that is a multiple of 1 (anywhere).
The unsigned int needs to start at an address that is a multiple of 4.
The next char can be anywhere.
And the long long needs to start at an address that is a multiple of 8.
If you take a look at the addresses, this is the case.
The compiler is only concerned with the individual alignment of the struct members, one by one. It does not think about the struct as a whole, because at the binary level a struct does not exist, just a chunk of individual variables allocated at certain address offsets. There's no such thing as "struct round-up"; the compiler couldn't care less how large the struct is, as long as all struct members are properly aligned.
The C standard says nothing about the manner of padding, apart from that a compiler is not allowed to add padding bytes at the very beginning of the struct. Apart from that, the compiler is free to add any number of padding bytes anywhere in the struct. It could add 999 padding bytes and still conform to the standard.
So the compiler goes through the struct and sees: here's a char, followed by an int that needs 4-byte alignment, so it adds 3 padding bytes after the char.
Next it spots the 32-bit int, already aligned, so it is left as it is. Then another char, followed by a 64-bit int that on this 64-bit system needs 8-byte alignment, so 7 padding bytes are inserted, and finally the 64-bit int itself, which needs no further padding.
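If you want to see the alignment requirements your compiler actually uses, a C11 sketch with _Alignof (assuming a C11 compiler; the values are platform-specific) can print them alongside the struct size:

#include <stdio.h>
#include <stdalign.h>

struct s { char a; unsigned int b; char ab; unsigned long long c; };

int main(void){
    /* On x86-64 Linux this typically prints char=1 int=4 long long=8 struct=8 size=24. */
    printf("char=%zu int=%zu long long=%zu struct=%zu size=%zu\n",
           alignof(char), alignof(unsigned int),
           alignof(unsigned long long), alignof(struct s), sizeof(struct s));
    return 0;
}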

Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 260;
    int *p = &n;
    char *pp = (char*)p;
    *pp = 0;

    printf("n = %d\n", n);
    system("PAUSE");
    return 0;
}
The output of the program is n = 256.
I think I understand why, but I am not really sure.
Can anyone give me a clear explanation, please?
Thanks a lot.
The int 260 (= 256 * 1 + 4) will look like this in memory - note that this depends on the endianness of the machine - also, this is for a 32-bit (4 byte) int:
0x04 0x01 0x00 0x00
By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).
You're apparently working on a little-endian machine. What's happening is that you're starting with an int that takes up at least two bytes. The value 260 is 256+4. The 256 goes in the second byte, and the 4 in the first byte. When you write 0 to the first byte, you're left with only the 256 in the second byte.
In C a pointer references a block of bytes based on the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0 it only changes the first byte of the integer value; because of how numbers are stored in memory on modern machines (effectively in reverse order from how you would write them) you are overwriting the least significant byte (which was 4), and you are left with 256 as the value.
I understood exactly what happens by changing the value:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 260;
    int *p = &n;
    char *pp = (char*)p;
    *pp = 20;

    printf("pp = %d\n", (int)*pp);
    printf("n = %d\n", (int)n);
    system("PAUSE");
    return 0;
}
The output values are
20
and
276
So basically the problem is not that you have data loss; it is that the char pointer points only to the first byte of the int and so it changes only that byte. The other bytes are not changed, and that is where the strange values come from (if you are on an Intel processor the first byte is the least significant, which is why you change the "smallest" part of the number).
Your problem is the assignment
*pp = 0;
You're dereferencing pp which points to n, and changing n.
However, pp is a char pointer, so it doesn't change all of n,
which is an int. This causes the binary complications described in the other answers.
In terms of the C language, the description for what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char * - the latter is better for reasons that would just unnecessarily complicate things if I went into them here.
As schnaader answered, on a little endian, twos complement implementation with 32-bit int, the representation of 260 is:
0x04 0x01 0x00 0x00
and overwriting the first byte with 0 yields:
0x00 0x01 0x00 0x00
which is the representation for 256 on such an implementation.
C allows implementations which have padding bits and trap representations (which raise a signal/abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe to do. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bits would still be implementation-dependent).
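If you want to inspect or modify a single byte of the representation without worrying about aliasing, one common pattern (a sketch, not the only way) is to go through unsigned char or memcpy:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int n = 260;
    unsigned char byte0 = 0;     /* new value for the lowest-addressed byte */

    memcpy(&n, &byte0, 1);       /* overwrite just the first byte of n's representation */
    printf("n = %d\n", n);       /* 256 on a little-endian machine, where byte 0 held the 4 */
    return 0;
}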
Considering 32-bit systems, 260 will be represented like this:
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000100 (Byte-0)
Now when p is cast to a char pointer, the label on the pointer changes, but the memory contents don't. It means that earlier p could access 4 bytes, as it was an integer pointer, but now it can only access 1 byte as it is a char pointer. So only the LSB gets changed to zero, not all 4 bytes.
And it becomes
00000000 (Byte-3) 00000000 (Byte-2) 00000001 (Byte-1) 00000000 (Byte-0)
Hence, the output is 256.
