Memory allocation for union in C

I was recently studying unions and ended up confused even after reading a lot about them.
#include <stdio.h>
union test
{
    int x;
    char arr[4];
    int y;
};

int main()
{
    union test t;
    t.x = 0;
    t.arr[1] = 'G';
    printf("%s\n", t.arr);
    printf("%d\n", t.x);
    return 0;
}
What I understood is:
Since x and arr share the same memory, when we set x = 0, all characters of arr are set to 0, and 0 is the ASCII value of '\0'. When we do t.arr[1] = 'G', arr becomes "\0G\0\0". When we print a string using "%s", printf starts from the first character and keeps printing until it finds a '\0'. Since the first character itself is '\0', nothing is printed.
What I don't get is the second printf statement.
Now since arr is "\0G\0\0", and the same location is shared with x and y,
what I think x should be is the following:
00000000 01000111 00000000 00000000 ("\0G\0\0")
so t.x should print 4653056.
But what it's printing is 18176.
Where am I going wrong?
Is this technically undefined, is it due to some silly mistake, or am I missing a concept?

All members of a union share the same memory. Assume the starting address of the union is 0x100.
When you write t.x = 0;, all 4 bytes are initialized to zero:
-------------------------------------------------
| 0000 0000 | 0000 0000 | 0000 0000 | 0000 0000 |
-------------------------------------------------
    0x103       0x102       0x101       0x100
            x, arr and y all start at 0x100
When you write t.arr[1] = 'G';, arr[1] (the byte at 0x101) is overwritten with the ASCII value of 'G', and memory looks like:
-------------------------------------------------
| 0000 0000 | 0000 0000 | 0100 0111 | 0000 0000 |
-------------------------------------------------
    0x103       0x102       0x101       0x100
Read back as an int on a little-endian machine, this is 0x00004700, which is 18176.
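If it helps, here is a minimal sketch (assuming a 4-byte int and a little-endian machine, as above) that dumps each byte of the union so you can see this layout directly:
#include <stdio.h>

union test
{
    int x;
    char arr[4];
    int y;
};

int main(void)
{
    union test t;
    t.x = 0;
    t.arr[1] = 'G';

    /* Dump the raw bytes of the union, lowest address first. */
    unsigned char *p = (unsigned char *)&t;
    for (size_t i = 0; i < sizeof t; i++)
        printf("byte %zu: 0x%02X\n", i, p[i]);

    printf("t.x = %d\n", t.x);   /* 18176 on a little-endian machine */
    return 0;
}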

tl;dr: endianness!
When t.x is read back from the memory occupied by your union, the bytes are interpreted according to your system's byte order, which here is little endian.
So instead of the number you expected from reading the bytes left to right in memory (0x00470000), you get 0x00004700, which is 18176, exactly what you see.
Code example:
#include <stdio.h>
union test
{
    int x;
    char arr[4];
    int y;
};

int main()
{
    union test t;
    t.x = 0;
    t.arr[1] = 'G';
    printf("%s\n", t.arr);
    printf("%d\n", t.x);  // prints 18176
    t.x = 0;
    t.arr[2] = 'G';
    printf("%d\n", t.x);  // prints 4653056
    return 0;
}
Or in Python:
import struct
union_data = b"\x00G\x00\x00"
print(struct.unpack("<I", union_data)[0])  # This is little endian. Prints 18176
print(struct.unpack(">I", union_data)[0])  # This is big endian. Prints 4653056
Bonus! You can also use the function htonl to convert a 32-bit value from host byte order to network byte order (big endian). See more in the docs.
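For completeness, a minimal sketch of that conversion with htonl (declared in <arpa/inet.h> on POSIX systems); this assumes a little-endian host, where the call actually swaps the bytes:
#include <stdio.h>
#include <arpa/inet.h>   /* htonl(); POSIX header, use <winsock2.h> on Windows */

int main(void)
{
    unsigned int host = 18176;        /* 0x00004700 in host (little-endian) order */
    unsigned int net  = htonl(host);  /* host-to-network: big-endian byte order */
    printf("host order:    %u (0x%08X)\n", host, host);
    printf("network order: %u (0x%08X)\n", net, net);   /* 4653056 (0x00470000) on a little-endian host */
    return 0;
}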

Related

Why pointer not giving its Ascii Value?

This is my current code, as follows:
#include <stdio.h>
int main() {
    /* code */
    char a[5] = {'a', 'b'};
    int *p = a;
    printf("%d\n", *p);
    return 0;
}
When I execute my code it is showing 25185 instead of giving me an ASCII value.
Why is this happening?
Thank you
This is undefined behavior, so anything can happen. As for what you're observing in particular, here's the explanation:
If an array only has some of its values initialized at declaration, the remaining values are zero. So your array a is 'a', 'b', '\0', '\0', '\0'. When a pointer to the beginning of this array is interpreted as a 32-bit, little-endian int, this has the value 0x00006261, or 25185 in decimal.
(Disclaimer: the other answer shows you why you get 25185, this one shows how you can achieve your goal.)
If you want to output the ASCII value of a[0] (which seems to be what you're trying to do with int *p = a;), tell printf() that you are passing a byte, and use a char* (a pointer to a character, which is a byte in C) to point to it:
int main(int argc, char **argv)
{
    char a[5] = {'a', 'b'};
    char *p = a;            // points to a char, i.e. a byte
    printf("%hhx\n", *p);   // tells printf it's a byte-sized value
    return 0;
}
Initializing a[5] with {'a', 'b'} gives values to the first 2 bytes:
        ascii   binary
  a     97      0110 0001
  b     98      0110 0010
So when you read it through an integer pointer on a little-endian machine, the byte 'a' is the least significant one: the value is 0110 0010 0110 0001 (0x6261) = 25185 in decimal, and the upper 2 bytes are 0.
If you read it through a char pointer, it reads one byte: 0110 0001 = 97.
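If what you actually want is the integer made of those bytes, a safer pattern than dereferencing through an int* is to memcpy the bytes into an int; a minimal sketch, assuming a 4-byte little-endian int:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a[5] = {'a', 'b'};   /* the remaining elements are zero-initialized */

    /* The first byte, read as a character value. */
    printf("a[0] = %d ('%c')\n", a[0], a[0]);   /* 97 ('a') */

    /* Copy the first sizeof(int) bytes into an int instead of dereferencing
       an int* aimed at a char array; this avoids alignment and
       strict-aliasing problems. */
    int v;
    memcpy(&v, a, sizeof v);
    printf("first %zu bytes as int: %d\n", sizeof v, v);   /* 25185 on a little-endian machine */
    return 0;
}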

inexplicable change in value while using pointer typecasting

#include <stdio.h>

int main(void)
{
    int num = 1;
    char *b;
    b = (char*) &num;
    *(++b) = 2;
    printf("%d\n", num);
    return 0;
}
Explanation: When I compiled this code, I got "513" as the output. When I comment out this line:
*(++b) = 2;
the output becomes "1".
Question 1: Why did I get "513" as the output?
Question 2: Why did the output change when I commented out that line?
Assuming that an int is 32 bit with little endian byte ordering on your system, the representation of num is 0x00000001 and looks like this in memory:
-----------------
| 1 | 0 | 0 | 0 |
-----------------
Then you point b to num:
b
|
v
-----------------
| 1 | 0 | 0 | 0 |
-----------------
Then you do *(++b)=2;, it increments b, dereferences the incremented pointer, and writes 2 to that location. So you now have.
b
|
v
-----------------
| 1 | 2 | 0 | 0 |
-----------------
So now the representation of num is 0x00000201, which is 513 in decimal.
On your machine an int is evidently 4 bytes stored little endian, and a char is one byte, which is why you are seeing this completely normal behavior.
Your num variable looks like this in memory:
0x00000001
You take its address, treat it as a char*, then set the second char (byte) to 2, so num becomes:
0x00000201
When you convert that back to decimal, it correctly outputs 513.
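A quick way to convince yourself is to print num in hex as well; a small sketch:
#include <stdio.h>

int main(void)
{
    int num = 1;
    char *b = (char *)&num;
    *(++b) = 2;   /* write 2 into the second byte of num */
    printf("%d (0x%X)\n", num, (unsigned)num);   /* 513 (0x201) on a little-endian machine */
    return 0;
}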
When I compiled this code, I got "513" as the output.
You are getting this output because of this statement:
*(++b)=2;
Initially, b points to num. The statement *(++b) = 2 first increments the pointer b so that it points to the next byte, then dereferences it and assigns 2 to that byte.
Assuming that int is 32 bits on your platform:
initially
num = 1
00000000 00000000 00000000 00000001
num after this statement
*(++b)=2;
00000000 00000000 00000010 00000001
which is the binary representation of `513`.
When I comment out the line *(++b) = 2;, the output becomes "1".
Of course: you initialized num with 1, so what else would you expect when printing num without making any change to it?
This:
int num = 1;
is represented on a 32-bit system as below (let's assume the base address of num is 0x100):
    0x103       0x102       0x101       0x100
-------------------------------------------------
| 0000 0000 | 0000 0000 | 0000 0000 | 0000 0001 |
-------------------------------------------------
                       num
  MSB <------------------------------------> LSB
And here:
char *b;          /* a char pointer, i.e. it points to 1 byte at a time */
b = (char*)&num;  /* b points to &num, i.e. 0x100 in the diagram above */
it looks like:
    0x103       0x102       0x101       0x100
-------------------------------------------------
| 0000 0000 | 0000 0000 | 0000 0000 | 0000 0001 |
-------------------------------------------------
                       num
                                          b <-- b points to 0x100
Now when this:
*(++b) = 2;
gets executed, ++b happens first, which means the char pointer b is incremented by one byte, i.e. it now points to location 0x101; then only the content of location 0x101 is assigned the value 2. It looks like:
    0x103       0x102       0x101       0x100
-------------------------------------------------
| 0000 0000 | 0000 0000 | 0000 0010 | 0000 0001 |
-------------------------------------------------
                       num
                              b <-- b points to 0x101
Now when you print num it prints 512 + 1, which is 513. I hope that clears up your doubt.
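To see the diagrams above reflected in real memory, here is a minimal sketch that dumps the bytes of num before and after the write (assuming a 4-byte int; the exact bytes shown are for a little-endian machine):
#include <stdio.h>

/* Print the bytes of an int in memory order (lowest address first). */
static void dump(const char *label, const int *v)
{
    const unsigned char *p = (const unsigned char *)v;
    printf("%s:", label);
    for (size_t i = 0; i < sizeof *v; i++)
        printf(" %02X", p[i]);
    printf("\n");
}

int main(void)
{
    int num = 1;
    dump("before", &num);        /* 01 00 00 00 */

    char *b = (char *)&num;
    *(++b) = 2;                  /* overwrite the byte at offset 1 (0x101 in the diagram) */

    dump("after ", &num);        /* 01 02 00 00 */
    printf("num = %d\n", num);   /* 513 */
    return 0;
}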

Printf in C prints ffffffe1 instead of e1

I am very confused. I have a small program where I am printing the value at different address locations.
#include <stdio.h>

int main ()
{
    // unsigned int x = 0x15711056;
    unsigned int x = 0x15b11056;
    char *c = (char*) &x;
    printf("*c is: 0x%x\n", *c);
    printf("size of %d\n", sizeof(x));
    printf("Value at first address %x\n", *(c+0));
    printf("Value at second address %x\n", *(c+1));
    printf("Value at third address %x\n", *(c+2));
    printf("Value at fourth address %x\n", *(c+3));
    return 0;
}
For the commented-out unsigned int x, the printf values are as expected, i.e.
printf("Value at first address %x\n", *(c+0)) = 56
printf("Value at second address %x\n", *(c+1))= 10
printf("Value at third address %x\n", *(c+2))= 71
printf("Value at fourth address %x\n", *(c+3))= 15
But for the uncommented unsigned int x, why am I getting the result below for *(c+2)? It should be b1, not ffffffb1. Please help me understand this. I am running it on an online IDE, https://www.onlinegdb.com/online_c_compiler. My PC has an Intel i7.
printf("Value at first address %x\n", *(c+0)) = 56
printf("Value at second address %x\n", *(c+1))= 10
printf("Value at third address %x\n", *(c+2))= ffffffb1
printf("Value at fourth address %x\n", *(c+3))= 15
The value is sign-extended: 0xB1 is 1011 0001 in binary, so its high bit is set; you need to use an unsigned char pointer:
unsigned char *c = (unsigned char*) &x;
Your code would work for any bytes up to 0x7F.
c points to a signed char. 0xB1 is 1011 0001; you see that
the most significant bit is 1, so it's a negative number.
When you pass *(c+2) to printf, it gets promoted to an int which is
signed. Sign extension fills the rest of the bits with the same value as the
most significant bit from your char, which is 1. At this point printf
gets 1111 1111 1111 1111 1111 1111 1011 0001.
%x in printf prints it as an unsigned int, thus it prints 0xFFFFFFB1.
You have to declare your pointer as an unsigned char pointer:
unsigned char *c = (unsigned char*) &x;
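Alternatively, if you want to keep the plain char pointer, you can convert just at the point of printing; a small sketch of that variant:
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x15b11056;
    char *c = (char *)&x;

    /* Converting to unsigned char before the default promotion to int
       prevents any sign extension from reaching the printed value. */
    printf("Value at third address %x\n", (unsigned char)*(c + 2));   /* b1 */
    return 0;
}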
unsigned int x = 0x15b11056;  /* let's say the starting address of x is 0x100 */
char *c = (char*) &x;         /* c is a char pointer, i.e. it fetches 1 byte at a time, and it points to 0x100 */
x looks like this:
-------------------------------------------------
| 0001 0101 | 1011 0001 | 0001 0000 | 0101 0110 |
-------------------------------------------------
    0x103       0x102       0x101       0x100
                       x
                                          c <-- c points to 0x100
Next, when you do *(c+2), let's expand it:
*(c+2) = *(0x100 + 2*1)     /* c is a char pointer, so it is incremented by 1 byte at a time */
       = *(0x102)
       = 1011 0001 (in binary)
Notice that the sign bit here is 1, which means the sign bit is going to be copied into the remaining bytes.
You are printing with the %x format, which expects an unsigned type, but c points to a signed byte, so when the value is promoted the sign bit gets copied into the remaining bytes.
For *(c+2) the input looks like:
0000 0000 | 0000 0000 | 0000 0000 | 1011 0001
The sign bit is one, so it is copied into the remaining bytes, and the result looks like:
1111 1111 | 1111 1111 | 1111 1111 | 1011 0001
  f    f      f    f      f    f      b    1
I explained the particular part you had a doubt about; I hope it helps.
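Another common idiom is to mask off everything except the low byte after the promotion, which gives the same result; a minimal sketch:
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x15b11056;
    char *c = (char *)&x;

    /* *(c + 2) is promoted (and possibly sign-extended) to int first;
       ANDing with 0xff keeps only the low byte. */
    printf("Value at third address %x\n", *(c + 2) & 0xff);   /* b1 */
    return 0;
}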

Initialization of a union in C

I came across this objective question on the C programming language. The output for the following code is supposed to be 0 2, but I don't understand why.
Please explain the initialization process. Here's the code:
#include <stdio.h>

int main()
{
    union a
    {
        int x;
        char y[2];
    };
    union a z = {512};
    printf("\n%d %d", z.y[0], z.y[1]);
    return 0;
}
I am going to assume that you are on a little-endian system where sizeof(int) is 4 bytes (32 bits) and sizeof(char) is 1 byte (8 bits), and one in which integers are represented in two's complement form. A union only has the size of its largest member, and all the members share this exact piece of memory.
Now, you are writing to this memory an integer value of 512.
512 in binary is 1000000000.
or in 32 bit two's complement form:
00000000 00000000 00000010 00000000.
Now convert this to its little endian representation and you'll get:
00000000 00000010 00000000 00000000
|______| |______|
    |        |
  y[0]     y[1]
Now see the above what happens when you access it using indices of a char array.
Thus, y[0] is 00000000 which is 0,
and y[1] is 00000010 which is 2.
The memory allocated for the union is the size of the largest type in the union, which is int in this case. Let's say the size of int on your system is 2 bytes; then
512 will be 0x200.
The representation looks like:
0000 0010   0000 0000
|_______|   |_______|
  Byte 1      Byte 0
So the first byte is 0 and the second one is 2 (on little-endian systems).
char is one byte on all systems.
So the accesses z.y[0] and z.y[1] are per-byte accesses:
z.y[0] = 0000 0000 = 0
z.y[1] = 0000 0010 = 2
I am just showing you how the memory is allocated and how the value is stored. You need to consider the points below, since the output depends on them.
Points to be noted:
The output is completely system dependent.
The endianness and sizeof(int) matter, and both vary across systems.
PS: Both members occupy the same memory in the union.
The standard says that
6.2.5 Types:
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
The compiler allocates only enough space for the largest of the members, which overlay each other within this space. In your case, memory is allocated for the int data type (assuming 4 bytes). The line
union a z = {512};
will initialize the first member of union z, i.e. x becomes 512. In binary it is represented as 0000 0000 0000 0000 0000 0010 0000 0000 on a 32-bit machine.
Memory representation for this would depend on the machine architecture. On a 32-bit machine it either will be like (store the least significant byte in the smallest address-- Little Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0010
0x1002 0000 0000
0x1003 0000 0000
or like (store the most significant byte in the smallest address -- Big Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0000
0x1002 0000 0010
0x1003 0000 0000
z.y[0] will access the content at address 0x1000 and z.y[1] will access the content at address 0x1001, and those contents depend on which representation above is used.
It seems that your machine uses the little-endian representation, therefore z.y[0] = 0 and z.y[1] = 2 and the output is 0 2.
But, you should note that footnote 95 of section 6.5.2.3 states that
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
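If you want to check at run time which representation your machine uses before relying on that output, the usual trick is exactly this kind of type punning through a union; a minimal sketch:
#include <stdio.h>

int main(void)
{
    union
    {
        unsigned int  i;
        unsigned char c[sizeof(unsigned int)];
    } u = { 1u };

    /* On a little-endian machine the least significant byte sits at the
       lowest address, so c[0] is 1; on a big-endian machine it is 0. */
    if (u.c[0] == 1)
        printf("little endian\n");
    else
        printf("big endian\n");
    return 0;
}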
The size of the union is the size needed to hold its largest member. So, here it is the size of int.
Assuming 4 bytes per int and 1 byte per char, we can say: sizeof(union a) = 4 bytes.
Now, let's see how it is actually stored in memory:
For example, an instance of the union, a, is stored at 2000-2003:
2000 -> least significant (rightmost when written) byte of int x, and y[0]
2001 -> 2nd byte of int x, and y[1]
2002 -> 3rd byte of int x
2003 -> most significant byte of int x
Now, when you initialize z with 512 (i.e. z.x = 512):
since z.x = 0x00000200,
M[2000] = 0x00
M[2001] = 0x02
M[2002] = 0x00
M[2003] = 0x00
So when you print y[0] and y[1], it prints the data at M[2000] and M[2001], which are 0 and 2 in decimal respectively.
For objects with automatic (non-static) storage duration, this initialization is identical to the assignment:
union a z;
z.x = 512;
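As a side note, since C99 you can also choose which member the initializer applies to with a designated initializer, instead of always initializing the first member; a short sketch (the names mirror the question's union):
#include <stdio.h>

union a
{
    int  x;
    char y[2];
};

int main(void)
{
    union a z1 = {512};           /* initializes the first member, x    */
    union a z2 = {.y = {0, 2}};   /* C99 designated initializer: sets y */

    printf("%d %d\n", z1.y[0], z1.y[1]);   /* 0 2 on a little-endian machine */
    printf("%d %d\n", z2.y[0], z2.y[1]);   /* 0 2 regardless of endianness   */
    return 0;
}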

using C Pointer with char array

int i=512;
char *c = (char *)&i;
c[0] =1;
printf("%d",i);
this displays "513"; it adds 1 to i.
int i=512;
char *c = (char *)&i;
c[1] =1;
printf("%d",i);
whereas this displays 256; it divides it by 2.
Can someone please explain why? Thanks a lot.
Binary
The 32-bit number 512, expressed in binary, is just:
00000000000000000000001000000000
because 2 to the power of 9 is 512. Conventionally, you read the bits from right-to-left.
Here are some other decimal numbers in binary:
0001 = 1
0010 = 2
0011 = 3
0100 = 4
The Cast: Reinterpreting the Int as an Array of Bytes
When you do this:
int i = 512;
char *c = (char *)&i;
you are interpreting the 4-byte integer as an array of characters (8-bit bytes), as you probably know. If not, here's what's going on:
&i
takes the address of the variable i.
(char *)&i
reinterprets it (or casts it) to a pointer to char type. This means it can now be used like an array. Since you know an int is at least 32 bits on your machine, you can access its bytes using c[0], c[1], c[2], c[3].
Depending on the endianness of the system, the bytes of the number might be laid out: most significant byte first (big endian), or least significant byte first (little endian). x86 processors are little endian. This basically means the number 512 is laid out as in the example above, i.e.:
00000000 00000000 00000010 00000000
  c[3]     c[2]     c[1]     c[0]
I've grouped the bits into separate 8-bit chunks (bytes) corresponding to the way they are laid out in memory. Note, you also read them right-to-left here, so we can keep with conventions for the binary number system.
Consequences
Now setting c[0] = 1 has this effect:
00000000 00000000 00000010 00000001
  c[3]     c[2]     c[1]     c[0]
which is 2^9 + 2^0 == 513 in decimal.
Setting c[1] = 1 has this effect:
00000000 00000000 00000001 00000000
  c[3]     c[2]     c[1]     c[0]
which is 2^8 == 256 in decimal, because you've overwritten the second byte 00000010 with 00000001
Do note that on a big-endian system the bytes would be stored in the reverse order of a little-endian system. This means you'd get totally different results if you ran the code on one of those machines.
Remember that a char is 8 bits. The bit representation of 512 is:
512 = 0000 0010 0000 0000
So when you do char *c = (char *)&i;, you get (on a little-endian machine):
c[1] = 0000 0010
c[0] = 0000 0000
When you do c[0] = 1,
you make it 0000 0010 0000 0001, which is 513.
When you do c[1] = 1, you make it 0000 0001 0000 0000, which is 256.
Before you wonder why what you're seeing is "odd", consider the platform you're running your code on, and the endianness therein.
Then consider the following
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 512;
    printf("%d : ", i);
    unsigned char *p = (unsigned char*)&i;
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    char *c = (char *)&i;
    c[0] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    i = 512;
    c[1] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    return 0;
}
On my platform (Macbook Air, OS X 10.8, Intel x64 Arch)
512 : 00020000
513 : 01020000
256 : 00010000
Couple what you see above with what you have hopefully read about endianness, and you can clearly see my platform is little endian. So what's yours?
Since you are aliasing an int through a char pointer, and a char is 8 bits wide (a byte), the assignment:
c[1] = 1;
will set the second byte of i to 0000 0001. Bytes 1, 3 and 4 (if sizeof(int) == 4) stay unmodified. Previously, that second byte was 0000 0010 (since I assume you're on an x86-based computer, which is a little-endian architecture). So basically, you shifted the only bit that was set one position to the right; that's a division by 2.
On a little-endian machine and a compiler with 32-bit int, you originally had these four bytes in i:
  c[0]     c[1]     c[2]     c[3]
00000000 00000010 00000000 00000000
After the assignment, i was set to:
  c[0]     c[1]     c[2]     c[3]
00000000 00000001 00000000 00000000
and therefore it went from 512 to 256.
Now you should understand why c[0] = 1 results in 513 :-) Think about which byte is set to 1 and that the assignment doesn't change the other bytes at all.
It's because your machine is little endian, meaning the least-significant byte is stored first in memory.
You said int i=512;. 512 is 0x00000200 in hex (assuming a 32-bit OS for simplicity). Let's look at how i would be stored in memory as hexadecimal bytes:
00 02 00 00 // 4 bytes, least-significant byte first
Now we interpret that same memory location as a character array by doing char *c = (char *)&i; - same memory, different interpretation:
00 02 00 00
c[0][1][2][3]
Now we change c[0] with c[0] =1; and the memory looks like
01 02 00 00
Which means if we look at it as a little endian int again (by doing printf("%d",i);), it's hex 0x00000201, which is 513 decimal.
Now if we go back and change c[1] with c[1] =1;, your memory now becomes:
00 01 00 00
Now we go back and interpret it as a little endian int, it's hex 0x00000100, which is 256 decimal.
Whether the data is stored little endian or big endian depends on the machine; for more, read about endianness.
The C language doesn't guarantee either order.
512 in binary:
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0000   ==> 512
=============================================
(suppose the int i is stored starting at address 0x12345678)
After char *c = (char *)&i;, c[0] refers to the lowest-addressed byte, which holds either the least significant byte (little endian) or the most significant byte (big endian) of i.
Modifying the value through c[0] results in 513 if c[0] refers to the least significant byte:
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0001   ==> 513
=============================================
or, it can be:
=============================================
0000 0001 | 0000 0000 | 0000 0010 | 0000 0000   ==> 2^24 + 512
=============================================
Similarly for 256: c[1] refers to the 2nd byte from the right (on a little-endian machine), as in the figure below:
=============================================
0000 0000 | 0000 0000 | 0000 0001 | 0000 0000   ==> 256
=============================================
So it comes down to how your system's implementation represents numbers.
