Unable to understand the output of a union program in C

I know the basic properties of union in C but still couldn't understand the output, can somebody explain this?
#include <stdio.h>

int main()
{
    union uni_t {
        int i;
        char ch[2];
    };
    union uni_t z = {512};
    printf("%d%d", z.ch[0], z.ch[1]);
    return 0;
}
The output when running this program is
02

union a
{
    int i;
    char ch[2];
};
This declares a type union a, the contents of which (i.e. the memory area of a variable of this type) could be accessed as either an integer (a.i) or a 2-element char array (a.ch).
union a z ={512};
This defines a variable z of type union a and initializes its first member (which happens to be a.i of type int) to the value of 512. (Cantfindname has the binary representation of that.)
printf( "%d%d", z.ch[0], z.ch[1] );
This takes the first character, then the second character from a.ch, and prints their numerical values. Again, Cantfindname talks about endianness and how it affects the results. Basically, you are taking apart an int byte by byte.
And the whole shebang is apparently assuming that sizeof( int ) == 2, which hasn't been true for desktop computers for... quite some time, so you might want to be looking at a more up-to-date tutorial. ;-)

What you get here is the result of endianness (http://en.wikipedia.org/wiki/Endianness).
512 is 0b0000 0010 0000 0000 in binary, which on a little-endian machine is stored in memory low byte first, as 0000 0000 0000 0010. So ch[0] reads the low-order byte (0b0000 0000 = 0 in decimal) and ch[1] reads the high-order byte (0b0000 0010 = 2 in decimal).
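To see this on your own machine, here is a minimal sketch (assuming a little-endian system with a 4-byte int, and that reading the char member after writing the int reinterprets the bytes as described above) that dumps every byte of 512:

#include <stdio.h>

int main(void)
{
    union {
        int i;
        unsigned char ch[sizeof(int)]; /* one array element per byte of the int */
    } u;

    u.i = 512; /* 0x00000200 */
    for (size_t k = 0; k < sizeof u.ch; k++)
        printf("ch[%zu] = %d\n", k, u.ch[k]);
    /* on a little-endian machine with a 4-byte int this prints 0, 2, 0, 0 */
    return 0;
}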

Note that on a 32-bit machine sizeof(int) is 4, so a 4-byte int does not fit inside the 2-element char array; the 2-byte layout described below assumes a 16-bit system or a short int member whose size is 2 bytes.
A union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements.
union uni_t
{
    short int i;
    char ch[2];
};
This code snippet declares a union with two members: a short integer and a character array.
The union can be used to hold different values at different times by simply assigning to the appropriate member.
union uni_t z ={512};
This defines a variable z of type union uni_t and initializes the integer member ( i ) to the value of 512.
So the value stored in z becomes 0b0000 0010 0000 0000.
When this value is read through the character array on a little-endian machine, ch[0] refers to the low-order byte and ch[1] to the high-order byte:
ch[1] = 0b00000010 = 2
ch[0] = 0b00000000 = 0
So printf("%d%d",z.ch[0],z.ch[1]) results in
02
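A small sketch of the 2-byte variant this answer describes (assuming short int is 16 bits and the machine is little-endian, so the output matches the original 02):

#include <stdio.h>

int main(void)
{
    union uni_t {
        short int i; /* assumed to be 2 bytes here */
        char ch[2];
    };
    union uni_t z = { 512 }; /* initializes the first member, i */

    printf("%d%d\n", z.ch[0], z.ch[1]); /* prints 02 on a little-endian machine */
    return 0;
}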

Related

Union data type field behavior

I am having trouble figuring out how this piece of code works.
Mainly, I am confused by how x.br gets the value of 516 after x.str.a and x.str.b get their values of 4 and 2, respectively.
I am new to unions, so maybe there is something I am missing, but shouldn't there only be 1 active field in a union at any given time?
#include <stdio.h>

void f(short num, short* res) {
    if (num) {
        *res = *res * 10 + num % 10;
        f(num / 10, res);
    }
}

typedef union {
    short br;
    struct {
        char a, b;
    } str;
} un;

void main() {
    short res = 0;
    un x;
    x.str.a = 4;
    x.str.b = 2;
    f(x.br, &res);
    x.br = res;
    printf("%d %d %d\n", x.br, x.str.a, x.str.b);
}
I would be very thankful if somebody cleared this up for me, thank you!
To add to @Deepstop's answer, and to correct an important point about your understanding -
shouldn't there only be 1 active field in an union at any given time?
There's no such thing as an active field in a union. All the fields refer to the exact same piece of memory (apart from any bytes that a smaller member doesn't cover).
You can look at the different fields as different ways to interpret the same data, i.e. you can read your union as two fields of 8 bits each or as one field of 16 bits. Both will always "work" at the same time.
OK, short is likely a 16-bit integer, and char a and b are each 8-bit chars.
So you are using the same 16-bit memory location for both.
0000 0010 0000 0100 is the 16 bit representation of 516
0000 0010 is the 8 bit representation of 2
0000 0100 is the 8 bit representation of 4
The CPU you are running this on is 'little-endian', so the low-order byte of a 16-bit integer comes first in memory. a occupies that first byte and holds 4, while b occupies the second, high-order byte and holds 2.
So by writing 4 then 2 into consecutive bytes, and reading them back as a 16-bit integer, you get 2 * 256 + 4, which is 516. If you wrote 3 then 5, you'd get 5 * 256 + 3, which is 1283.
The point is that union puts two data structures in exactly the same memory location.
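Here is a rough sketch of that arithmetic (assuming an 8-bit char, a 16-bit short, and a little-endian CPU), writing the two bytes and reading them back the way the union does:

#include <stdio.h>

typedef union {
    short br;
    struct {
        char a, b;
    } str;
} un;

int main(void)
{
    un x;

    x.str.a = 4; /* lands in the low-order byte on a little-endian CPU */
    x.str.b = 2; /* lands in the high-order byte */
    printf("%d\n", x.br); /* 2 * 256 + 4 = 516 */

    x.str.a = 3;
    x.str.b = 5;
    printf("%d\n", x.br); /* 5 * 256 + 3 = 1283 */
    return 0;
}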
how does x.br get the value of 516 after x.str.a and x.str.b get their
values of 4 and 2
Your union definition
typedef union {
    short br;
    struct {
        char a, b;
    } str;
} un;
specifies that un.br shares the same memory address as un.str. That is the whole point of a union. It means that when you modify the value of un.br, you are also modifying the values of un.str.a and un.str.b.
I am new to unions, so maybe there is something I am missing, but
shouldn't there only be 1 active field in an union at any given time?
Not sure what you mean by "only be 1 active field", but the members of a union are all mapped to the same memory address so any time that you write a value to a union member it writes that value to the same memory address as the other members. If you want the members to be mapped to different memory addresses so that when you write the value of a member it only modifies the value of that specific member, then you should use a struct and not a union.
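To make that last point concrete, here is a small comparison sketch (the type and member names are just illustrative) printing the size and member addresses of an equivalent union and struct; the union members share one address, while the struct members get distinct ones:

#include <stdio.h>

union u_t  { short br; char a; };
struct s_t { short br; char a; };

int main(void)
{
    union u_t u;
    struct s_t s;

    /* union members overlap in memory; struct members are laid out one after another */
    printf("union : sizeof=%zu &br=%p &a=%p\n",
           sizeof u, (void *)&u.br, (void *)&u.a);
    printf("struct: sizeof=%zu &br=%p &a=%p\n",
           sizeof s, (void *)&s.br, (void *)&s.a);
    return 0;
}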

Variable assignment disparity inside a union

What I've heard about a union is that it allocates the memory space of its biggest member. Here I'm trying to assign the 'same' value in two different ways, but it's ending up problematic.
First,
#include <stdio.h>

union h {
    int a;
    char b;
};

int main()
{
    union h h1;
    h1.b = 'X';
    printf("%d %c\n", h1.a, h1.b);
    return 0;
}
The output would be a large random number followed by 'X'
-1674402216 X
When I also assigned a number to h1.a first,
#include <stdio.h>

union h {
    int a;
    char b;
};

int main()
{
    union h h1;
    h1.a = 1;
    h1.b = 'X';
    printf("%d %c\n", h1.a, h1.b);
    return 0;
}
This gives the output
88 X
Can someone help me to figure out what exactly is happening here ?
Thank you :)
Union members occupy the same space in memory.
So your union looks something like this:
Address   Contents
  ...
  N       'X'  <- h1.b, also the lowest-addressed byte of h1.a
  N+1       ?  <- byte of h1.a (uninitialized)
  N+2       ?  <- byte of h1.a (uninitialized)
  N+3       ?  <- byte of h1.a (uninitialized)
  ...
(Assuming a system with a 32-bit integer.)
By assigning 'X' you have also modified one byte of your uninitialized a. Your value (-1674402216) is 9C32A658 in base 16. Its least significant byte is 58 hex, which is the ASCII code of 'X', and the other three bytes kept their uninitialized values.
In your second case you first initialized the int to 1 (which sets all but the least significant byte to 0), then you overwrote the least significant byte with 'X'. You get 88 (the ASCII code of 'X') when the value is read as an int, and the original 'X' when looking at the char member.
Not to forget to mention: a layout like this is implementation-defined. The standard does say, as mentioned in the comments to your question, that you should not access a member other than the one last written, while at the same time it is common practice to do exactly that (see these threads: Why do we need C Unions?, What is the strict aliasing rule?).
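A short sketch of the second case (assuming an 8-bit char, a little-endian machine, and ASCII, so 'X' is 0x58), printing the int in hex so the overwritten byte is visible:

#include <stdio.h>

union h {
    int a;
    char b;
};

int main(void)
{
    union h h1;

    h1.a = 1;   /* bytes on a little-endian machine: 01 00 00 00 */
    h1.b = 'X'; /* overwrites the lowest-addressed byte with 0x58 */
    printf("%d 0x%x %c\n", h1.a, h1.a, h1.b); /* 88 0x58 X */
    return 0;
}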

Initialization of a union in C

I came across this objective question on the C programming language. The output for the following code is supposed to be 0 2, but I don't understand why.
Please explain the initialization process. Here's the code:
#include <stdio.h>

int main()
{
    union a
    {
        int x;
        char y[2];
    };
    union a z = {512};
    printf("\n%d %d", z.y[0], z.y[1]);
    return 0;
}
I am going to assume that you use a little endian system where sizeof int is 4 bytes (32 bits) and sizeof a char is 1 byte (8 bits), and one in which integers are represented in two's complement form. A union only has the size of its largest member, and all the members point to this exact piece of memory.
Now, you are writing to this memory an integer value of 512.
512 in binary is 1000000000.
or in 32 bit two's complement form:
00000000 00000000 00000010 00000000.
Now convert this to its little endian representation and you'll get:
00000000 00000010 00000000 00000000
|______| |______|
  y[0]     y[1]
Now see the above what happens when you access it using indices of a char array.
Thus, y[0] is 00000000 which is 0,
and y[1] is 00000010 which is 2.
The memory allocated for the union is the size of the largest type in the union, which is int in this case. Let's say the size of int on your system is 2 bytes; then
512 will be 0x200.
Representation looks like:
0000 0010   0000 0000
  Byte 1      Byte 0
So the first byte is 0 and the second one is 2 (on little-endian systems).
char is one byte on all systems.
So the access z.y[0] and z.y[1] is per byte access.
z.y[0] = 0000 0000 = 0
z.y[1] = 0000 0010 = 2
I am just showing how the memory is allocated and how the value is stored. You need to consider the points below, since the output depends on them.
Points to be noted:
The output is completely system-dependent.
The endianness and sizeof(int) matter, and both vary across systems.
PS: The memory occupied by both members is the same in a union.
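Since both the byte order and sizeof(int) vary across systems, a small sketch like the following (assuming only that an unsigned char pointer may read an object's bytes) shows what your own machine actually stores for 512:

#include <stdio.h>

int main(void)
{
    union a { int x; char y[2]; } z = { 512 };
    const unsigned char *p = (const unsigned char *)&z.x;

    printf("sizeof(int) = %zu, bytes of 512:", sizeof z.x);
    for (size_t k = 0; k < sizeof z.x; k++)
        printf(" %02x", p[k]); /* 00 02 00 00 on a little-endian system with a 4-byte int */
    printf("\n");
    return 0;
}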
The standard says that
6.2.5 Types:
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
The compiler allocates only enough space for the largest of the members, which overlay each other within this space. In your case, memory is allocated for the int data type (assuming 4 bytes). The line
union a z = {512};
will initialize the first member of union z, i.e. x becomes 512. In binary it is represented as 0000 0000 0000 0000 0000 0010 0000 0000 on a 32-bit machine.
Memory representation for this depends on the machine architecture. On a 32-bit machine it will either look like this (least significant byte at the smallest address -- little endian)
Address Value
0x1000 0000 0000
0x1001 0000 0010
0x1002 0000 0000
0x1003 0000 0000
or like (store the most significant byte in the smallest address -- Big Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0000
0x1002 0000 0010
0x1003 0000 0000
z.y[0] will access the content at address 0x1000 and z.y[1] will access the content at address 0x1001, and those contents depend on the representation above.
It seems that your machine uses the little-endian representation, therefore z.y[0] = 0 and z.y[1] = 2 and the output is 0 2.
But, you should note that footnote 95 of section 6.5.2.3 states that
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
The size of the union is the size needed to hold its largest member. So, here it is the size of int.
Assuming 4 bytes per int and 1 byte per char, we can say: sizeof(union a) = 4 bytes.
Now, let's see how it is actually stored in memory:
For example, an instance of the union, a, is stored at 2000-2003:
2000 -> last (4th / least significant / rightmost) byte of int x, y[0]
2001 -> 3rd byte of int x, y[1]
2002 -> 2nd byte of int x
2003 -> 1st byte of int x (most significant)
Now, when you initialize z with 512:
since z = 0x00000200,
M[2000] = 0x00
M[2001] = 0x02
M[2002] = 0x00
M[2003] = 0x00
So, when you print y[0] and y[1], it prints the data at M[2000] and M[2001], which is 0 and 2 in decimal respectively.
For automatic (non-static) members, the initialization is identical to assignment:
union a z;
z.x = 512;
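A tiny sketch (assuming the same little-endian layout discussed above) confirming that the braced initializer and a plain assignment to the first member produce the same bytes:

#include <stdio.h>

union a { int x; char y[2]; };

int main(void)
{
    union a z1 = { 512 }; /* braced initializer sets the first member, x */
    union a z2;
    z2.x = 512;           /* equivalent assignment for an automatic variable */

    printf("%d %d\n", z1.y[0], z1.y[1]); /* 0 2 on a little-endian machine */
    printf("%d %d\n", z2.y[0], z2.y[1]); /* same output */
    return 0;
}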

How this code works - regarding union initialization in C?

I got the output 0 2 for this program,
but I don't know why.
Please explain; I think only int i is initialized with 512.
But how did ch[1] get the value 2?
#include <stdio.h>

int main()
{
    union a /* declared */
    {
        int i;
        char ch[2];
    };
    union a z = { 512 };
    printf("%d%d", z.ch[0], z.ch[1]);
    return 0;
}
A union declaration means that all its members are allocated the same memory. So your int i and char ch[2] reference the same memory space -- in other words, they are aliased. Whenever you change one, you change the other as well.
Now, assuming your ints are 32-bit wide and you're on a little-endian system like x86, i = 512 (512 == 0x00000200) actually looks like this in memory:
0x00 0x02 0x00 0x00.
with the first two values corresponding directly to the 2-character array:
ch[0] ch[1]
So you get ch[0] == 0x0 and ch[1] == 0x02.
Try setting your i = 0x1234 and see what effect it will have on your character array.
Based on your question, it's possible that you may want to use a struct instead of union -- then its members would be allocated in memory sequentially (one after the other).
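Here is a rough sketch of the i = 0x1234 experiment suggested above (assuming a little-endian machine, where you should see 0x34 followed by 0x12):

#include <stdio.h>

int main(void)
{
    union a {
        int i;
        char ch[2];
    } z;

    z.i = 0x1234;
    printf("%x %x\n", z.ch[0], z.ch[1]); /* 34 12 on a little-endian machine */
    return 0;
}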
512 is 0x200 in hex, so the first byte of your union is 0 and the second is 2. If you don't specify which union member should be initialized, the first one is taken -- the int in your case.
You get 2 for the second byte because the first byte of ch is initialized with 0 and the second one with 2.
Simple: 512 = binary 1000000000, so ch[0] will get the 8 zeroes (assuming your system is little endian) and ch[1] will get the 10 part, which, in decimal, is 2.
You are mixing up 'struct' with 'union'. In a union you collect differently typed and named data in one place (with length = the maximum of the members' sizes), which you can access through any member, and for which you yourself have to make sure you read the right data.
Your example allocates memory for max(sizeof(int), sizeof(char[2])).
It makes no difference whether you say z.i = 32 or z.ch[0] = ' '; both write into the same memory.
You got 0 2 for good reasons, but the C standard says that the behavior is not defined: if you write i, then the value of ch can theoretically be anything.
However, GCC assures that the data will be well-aligned.

int to char casting

int i = 259; /* 03010000 in Little Endian ; 00000103 in Big Endian */
char c = (char)i; /* returns 03 in both Little and Big Endian?? */
On my computer it assigns 03 to char c, and I have a little-endian machine, but I don't know whether the char cast reads the least significant byte or the byte pointed to by the i variable.
Endianness doesn't actually change anything here. The conversion isn't defined in terms of storing one particular byte (MSB, LSB, etc.).
If char is unsigned it will wrap around. Assuming 8-bit char 259 % 256 = 3
If char is signed the result is implementation defined. Thank you pmg: 6.3.1.3/3 in the C99 Standard
Since you're casting from a larger integer type to a smaller one, it takes the least significant part regardless of endianness. If you were casting pointers instead, though, it would take the byte at the address, which would depend on endianness.
So c = (char)i assigns the least-significant byte to c, but c = *((char *)(&i)) would assign the first byte at the address of i to c, which would be the same thing on little-endian systems only.
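A small sketch contrasting the two (the pointer version is only equivalent to the cast on little-endian machines):

#include <stdio.h>

int main(void)
{
    int i = 259;             /* 0x00000103 */
    char c1 = (char)i;       /* always the least significant byte: 3 */
    char c2 = *((char *)&i); /* first byte in memory: 3 on little-endian, 0 on big-endian */

    printf("%d %d\n", c1, c2);
    return 0;
}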
If you want to test for little/big endian, you can use a union:
int isBigEndian (void)
{
    union foo {
        size_t i;
        char cp[sizeof(size_t)];
    } u;
    u.i = 1;
    return *u.cp != 1;
}
It works because in little endian, it would look like 01 00 ... 00, but in big endian, it would be 00 ... 00 01 (the ... is made up of zeros). So if the first byte is 0, the test returns true. Otherwise it returns false. Beware, however, that there also exist mixed endian machines that store data differently (some can switch endianness; others just store the data differently). The PDP-11 stored a 32-bit int as two 16-bit words, except the order of the words was reversed (e.g. 0x01234567 was 4567 0123).
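For completeness, a quick way to try the test is to wrap it in a main, roughly like this sketch:

#include <stdio.h>

int isBigEndian(void) /* same test as above */
{
    union foo {
        size_t i;
        char cp[sizeof(size_t)];
    } u;
    u.i = 1;
    return *u.cp != 1; /* first byte is 0 only on a big-endian machine */
}

int main(void)
{
    printf("This machine is %s-endian\n", isBigEndian() ? "big" : "little");
    return 0;
}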
When casting from int (4 bytes) to char (1 byte), it preserves the least significant byte.
Eg:
int x = 0x3F1;                      // 0x3F1 = 0000 0011 1111 0001
char y = (char)x;                   // 1111 0001 --> -15 in decimal (two's complement, if char is signed)
unsigned char z = (unsigned char)x; // 1111 0001 --> 241 in decimal
