Initialization of a union in C - c

I came across this objective question on the C programming language. The output for the following code is supposed to be 0 2, but I don't understand why.
Please explain the initialization process. Here's the code:
#include <stdio.h>
int main()
{
    union a
    {
        int x;
        char y[2];
    };
    union a z = {512};
    printf("\n%d %d", z.y[0], z.y[1]);
    return 0;
}

I am going to assume that you use a little endian system where sizeof int is 4 bytes (32 bits) and sizeof a char is 1 byte (8 bits), and one in which integers are represented in two's complement form. A union only has the size of its largest member, and all the members point to this exact piece of memory.
Now, you are writing to this memory an integer value of 512.
512 in binary is 1000000000.
or in 32 bit two's complement form:
00000000 00000000 00000010 00000000.
Now convert this to its little endian representation (least significant byte first) and you'll get:
00000000 00000010 00000000 00000000
|______| |______|
   |        |
  y[0]     y[1]
Now look at what happens when you access this memory through the indices of a char array.
Thus, y[0] is 00000000 which is 0,
and y[1] is 00000010 which is 2.
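The byte layout described above can be checked directly. The following is a minimal sketch, not part of the original answer: the helper name int_bytes is made up for illustration, and it assumes only that a byte has 8 bits and that int has no padding bits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the object representation of value into out, lowest address first.
   Returns the number of bytes copied, which is at most sizeof(int). */
size_t int_bytes(int value, unsigned char *out, size_t out_size)
{
    size_t n = sizeof value < out_size ? sizeof value : out_size;

    assert(out != NULL);
    memcpy(out, &value, n);   /* copying bytes with memcpy is always allowed */
    return n;
}
```

On a little-endian machine with a 4-byte int, int_bytes(512, buf, sizeof buf) fills buf with 0, 2, 0, 0, which matches the values read through y[0] and y[1].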

The memory allocated for the union is the size of the largest type in the union, which is int in this case. Let's say the size of int on your system is 2 bytes; then
512 will be 0x200.
The representation looks like:
0000 0010 0000 0000
|_______| |_______|
 Byte 1    Byte 0
So the first byte in memory (byte 0) is 0 and the second one (byte 1) is 2 on little-endian systems.
char is one byte on all systems.
So the accesses z.y[0] and z.y[1] are per-byte accesses.
z.y[0] = 0000 0000 = 0
z.y[1] = 0000 0010 = 2
I am just showing how the memory is allocated and how the value is stored. You need to consider the points below, since the output depends on them.
Points to be noted:
The output is completely system dependent.
The endianness and sizeof(int) matter, and both vary across systems.
PS: Both members of the union occupy the same memory.
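Because the output depends on byte order, it can help to test the byte order at run time. This is a small sketch under the assumption that unsigned int is wider than one byte; the function name is_little_endian is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of an unsigned int holds the
   least significant bits (little endian), 0 otherwise. */
int is_little_endian(void)
{
    unsigned int probe = 1u;
    unsigned char first;

    assert(sizeof probe > 1);      /* the check is meaningless for 1-byte ints */
    memcpy(&first, &probe, 1);     /* look at the byte at the lowest address */
    return first == 1;
}
```

When is_little_endian() returns 1, the program in the question prints 0 2; when it returns 0, the two bytes read through y come from the high-order end of x instead.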

The standard says that
6.2.5 Types:
A union type describes an overlapping nonempty set of member objects, each of which has an optionally specified name and possibly distinct type.
The compiler allocates only enough space for the largest of the members, which overlay each other within this space. In your case, memory is allocated for int data type (assuming 4-bytes). The line
union a z = {512};
will initialize the first member of union z, i.e. x becomes 512. In binary it is represented as 0000 0000 0000 0000 0000 0010 0000 0000 on a 32-bit machine.
The memory representation of this depends on the machine architecture. On a 32-bit machine it will either be like (store the least significant byte in the smallest address -- Little Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0010
0x1002 0000 0000
0x1003 0000 0000
or like (store the most significant byte in the smallest address -- Big Endian)
Address Value
0x1000 0000 0000
0x1001 0000 0000
0x1002 0000 0010
0x1003 0000 0000
z.y[0] will access the content at address 0x1000 and z.y[1] will access the content at address 0x1001, and those contents depend on the representation above.
It seems that your machine uses the Little Endian representation, and therefore z.y[0] = 0 and z.y[1] = 2 and the output is 0 2.
But, you should note that footnote 95 of section 6.5.2.3 states that
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
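Since the footnote warns about trap representations, a common way to look at the bytes of an object safely is to read them as unsigned char, which has no trap representations. A minimal sketch under that assumption (the union int_view and the function byte_of_int are hypothetical names, not from the question):

```c
#include <assert.h>
#include <stddef.h>

/* Overlay an int with an array covering every one of its bytes. */
union int_view {
    int x;
    unsigned char bytes[sizeof(int)];
};

/* Store value through x, then read one byte back through bytes
   (type punning through a union, as described in the footnote). */
unsigned char byte_of_int(int value, size_t index)
{
    union int_view v;

    assert(index < sizeof(int));
    v.x = value;
    return v.bytes[index];
}
```

With value 512, exactly one index yields 2 and all others yield 0; which index that is depends on the byte order of the machine.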

The size of the union is determined by its largest member. So, here it is the size of int.
Assuming 4 bytes per int and 1 byte per char, we can say: sizeof(union a) = 4 bytes.
Now, let's see how it is actually stored in memory:
For example, an instance of the union, a, is stored at 2000-2003:
2000 -> last(4th / least significant / rightmost) byte of int x, y[0]
2001 -> 3rd byte of int x, y[1]
2002 -> 2nd byte of int x
2003 -> 1st byte of int x (most significant)
Now, when you initialize z with 512:
since z.x = 0x00000200,
M[2000] = 0x00
M[2001] = 0x02
M[2002] = 0x00
M[2003] = 0x00
So, when you print y[0] and y[1], it prints the data at M[2000] and M[2001], which are 0 and 2 in decimal respectively.

For automatic (non-static) variables, the initialization is equivalent to an assignment to the first member:
union a z;
z.x = 512;

Related

Why casting short* to int* shows incorrect value

To better learn how malloc and pointers work internally, I created an array of short. On my system, int is double the size of short, so I created another pointer q of type int* and set its address to the casted value of p:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main() {
    short* p = (short*) malloc(2 * sizeof(short));
    int* q = (int*) p;
    assert(sizeof *q == 2 * sizeof *p);
    p[0] = 0;
    p[1] = 1;
    printf("%u\n", *q);
}
When I print *q it shows the number 65536 instead of 1 and I can't figure out why. If I understand correctly, p should be represented as the following (assuming short is 2 bytes and int is 4 bytes):
p[0] p[1]
0000 0000 0000 0000 | 0000 0000 0000 0001
So *q should read 4 bytes hence reading the value 1. Instead it shows 65536 which is represented as:
0000 0000 0000 0001 0000 0000 0000 0000
Most systems you're likely to interact with these days use little-endian byte ordering, which means that the least significant byte comes first.
So the bytes starting at p[1] contain 0x01 0x00, not 0x00 0x01. This also means the bytes starting at p[0] are 0x00 0x00 0x01 0x00. If these bytes are then interpreted as a 4 byte int it has the value 0x00010000, i.e. 65536 decimal.
Also, reinterpreting bytes in this fashion (i.e. taking a pointer to one type, casting it to another pointer type, and dereferencing), is an aliasing violation and triggers undefined behavior, so there is no guarantee this will always work in this way.
This is due to endianness (https://en.wikipedia.org/wiki/Endianness).
This determines which byte comes first in memory. Therefore, if you flip the bytes in your representation, you get exactly what you provided as the representation for 65536.
You seem to be on a little endian machine.
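The two possible results can be reproduced without the aliasing violation. The sketch below is an illustration, not the asker's code: it uses fixed-width types instead of short and int, the function names are made up, and memcpy replaces the dereference of a cast pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value of a 32-bit read over two 16-bit elements when the element at the
   lower address (first) holds the low-order half: little endian. */
uint32_t read_pair_le(uint16_t first, uint16_t second)
{
    return (uint32_t)first | ((uint32_t)second << 16);
}

/* The same read on a big-endian machine: first holds the high-order half. */
uint32_t read_pair_be(uint16_t first, uint16_t second)
{
    return ((uint32_t)first << 16) | (uint32_t)second;
}

/* What this machine actually stores, read by copying the bytes with
   memcpy rather than dereferencing a cast pointer. */
uint32_t read_pair_native(const uint16_t pair[2])
{
    uint32_t value;

    assert(pair != NULL);
    memcpy(&value, pair, sizeof value);
    return value;
}
```

For the elements 0 and 1 from the question, read_pair_le gives 65536 and read_pair_be gives 1; read_pair_native returns whichever of the two matches the machine it runs on.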

Converting array of characters to an array of uint32_t in c-- is this the proper way?

I am trying to convert an array of characters into an array of uint32_t in order to use that in a CRC calculation. I was curious if this is the correct way to do this or if it is dangerous? I have a habit of doing dangerous conversions and I am trying to learn better ways to convert things that are less dangerous :). I know that each char in the array is 8 bits. Should I sum 4 of the characters up and toss it into an index of the unsigned int array or is it ok just to place each character in its separate array? Would summing four 8 bit characters up change their values into the array? I have read something about shifting characters, however, I am not sure exactly how to shift the four characters into one index of the unsigned int array.
text[i] is my array of characters.
uint32_t inputText[512];
for( i = 0; i < 504; i++)
{
    inputText[i] = (uint32_t)text[i];
}
The cast seems fine; although, I'm not sure why you say i < 504 when your array of uint32_ts is 512. (If you do want to only convert 504 values and you want a 512-length array, you might want to use array[512] = {0} to ensure the memory is zeroed out instead of the last 8 values being set to whatever was previously in the memory.) Nonetheless, it is perfectly safe to say: SomeArrayOfLargerType[i] = (largerType_t)SomeArrayOfSmallerType[i], but bear in mind that how it is now, your binary will end up looking something like:
0100 0001 -> 0000 0000 0000 0000 0000 0000 0100 0001
So, those 24 leading 0s might be an undesired result.
As for summing up four characters, that will almost definitely not work out how you want; unless you literally want the sum like 0000 0001 (one) + 0000 0010 (two) = 0000 0011 (three). If you would instead want the previous example to produce 00000001 00000010, then yes, you would need to apply shifts.
UPDATE - Some information about shifting via example:
The following would be an example of shifting:
uint32_t valueArray[FINAL_LENGTH] = {0};
int i;
for(i=0; i < TEXT_LENGTH; i++){ // TEXT_LENGTH is the initial message/text length (512 bytes or something)
    int mode = i % 4; // 4-to-1 value storage ratio (4 uint8s being stored as 1 uint32)
    int writeLocation = i / 4; // integer division truncates, so something like 3/4 = 0 (which is desired)
    uint32_t byte = (unsigned char)text[i]; // go through unsigned char so a negative char is not sign-extended
    switch(mode){
        case(0):
            // add to bottom 8-bits of index
            valueArray[writeLocation] = byte;
            break;
        case(1):
            valueArray[writeLocation] |= (byte << 8); // shift to left by 8 bits to insert to second byte
            break;
        case(2):
            valueArray[writeLocation] |= (byte << 16); // shift to left by 16 bits to insert to third byte
            break;
        case(3):
            valueArray[writeLocation] |= (byte << 24); // shift to left by 24 bits to insert to fourth byte
            break;
        default:
            printf("Some error occurred here... If source has been modified, please check to make sure the number of case handlers == the possible values for mode.\n");
    }
}
You can see an example of that running here: https://ideone.com/OcDMoM (Note, there is some runtime error when executing that on IDEOne. I haven't looked intensely for that issue, though, as the output still seems to be accurate and the code is just meant to serve as an example.)
Essentially, because each byte is 8-bits, and you want to store the bytes in 4-byte chunks (32-bits each), you need four different cases for how far you shift. In the first case, the first 8-bits are filled in by a byte from the message. In the second case, the second 8-bits are filled in by the following byte in the message (which is left shifted by 8-bits because that is the offset for the binary position). And that continues for the remaining 2 bytes, and then it repeats starting at the next index of the initial message array.
When combining the bytes, |= is used because that will take what is already in uint32 and it will perform a bitwise OR on it, so the final values will combine into one single value.
So, to break down a simple example like what I had in my initial post, let's say I have 0000 0001 (one) and 0000 0010 (two), with an initial 16-bit integer to hold them 0000 0000 0000 0000. The first byte is assigned to the 16-bit integer making it 0000 0000 0000 0001. Then the second byte is left shifted by 8 making it 0000 0010 0000 0000. Finally, the two are combined via a bitwise OR, so the 16-bit integer becomes: 0000 0010 0000 0001.
In the case of a 32-bit integer to hold 4 bytes, that process will repeat 2 more times, each time shifting by 8 more bits, and then it will proceed to the next uint32 to repeat the process.
Hopefully that all makes sense. If not, I can try to clarify further.
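The four switch cases differ only in the shift amount, which is 8 * (i % 4), so the same packing can be written as one loop. This is a sketch of that idea rather than the answer's own code; pack_bytes_le is a hypothetical name, and taking the input as unsigned char avoids sign extension when a byte is 0x80 or above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n bytes into 32-bit words, with the first byte of each group of four
   in the low 8 bits. out must have room for (n + 3) / 4 words.
   Returns the number of words written. */
size_t pack_bytes_le(const unsigned char *in, size_t n, uint32_t *out)
{
    size_t words = (n + 3) / 4;

    assert(in != NULL && out != NULL);

    for (size_t w = 0; w < words; w++)
        out[w] = 0;                       /* start from zero so |= is safe */

    for (size_t i = 0; i < n; i++)
        out[i / 4] |= (uint32_t)in[i] << (8 * (i % 4));

    return words;
}
```

Packing the five bytes 1, 2, 3, 4, 5 this way yields the two words 0x04030201 and 0x00000005. Whether a CRC routine expects this byte order or the reverse depends on the CRC definition in use, so that remains something to check.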

unable to understand the output of union program in C

I know the basic properties of union in C but still couldn't understand the output, can somebody explain this?
#include <stdio.h>
int main()
{
    union uni_t{
        int i;
        char ch[2];
    };
    union uni_t z ={512};
    printf("%d%d",z.ch[0],z.ch[1]);
    return 0;
}
The output when running this program is
02
union a
{
int i;
char ch[2];
}
This declares a type union a, the contents of which (i.e. the memory area of a variable of this type) could be accessed as either an integer (a.i) or a 2-element char array (a.ch).
union a z ={512};
This defines a variable z of type union a and initializes its first member (which happens to be a.i of type int) to the value of 512. (Cantfindname has the binary representation of that.)
printf( "%d%d", z.ch[0], z.ch[1] );
This takes the first character, then the second character from a.ch, and prints their numerical values. Again, Cantfindname talks about endianness and how it affects the results. Basically, you are taking apart an int byte-by-byte.
And the whole shebang is apparently assuming that sizeof( int ) == 2, which hasn't been true for desktop computers for... quite some time, so you might want to be looking at a more up-to-date tutorial. ;-)
What you get here is the result of endianness (http://en.wikipedia.org/wiki/Endianness).
512 is 0b0000 0010 0000 0000 in binary, which on a little-endian machine is stored in memory low byte first, as 0000 0000 0000 0010. Then ch[0] reads the first byte in memory (0b0000 0000 = 0 in decimal) and ch[1] reads the second byte (0b0000 0010 = 2 in decimal), which prints 02.
On a little-endian machine this output does not depend on the size of int: with sizeof(int) = 4 the bytes in memory are 00 02 00 00, so ch[0] and ch[1] are still 0 and 2. The explanation below uses a short int with a size of 2 bytes only to keep the picture small.
A Union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements.
union uni_t
{
short int i;
char ch[2];
};
This code snippet declares a union having two members: an integer and a character array.
The union can hold different values at different times simply by assigning to its members.
union uni_t z ={512};
This defines a variable z of type union uni_t and initializes the integer member ( i ) to the value of 512.
So the value stored in z becomes : 0b0000 0010 0000 0000
When this value is accessed through the character array on a little-endian machine, ch[0] refers to the low-order byte, which is stored first, and ch[1] refers to the high-order byte.
ch[0] = 0b00000000 = 0
ch[1] = 0b00000010 = 2
So printf("%d%d",z.ch[0],z.ch[1]) prints
02
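To see how byte order and the size of the integer member interact, the sketch below simulates the memory layout arithmetically instead of relying on the machine it runs on. The function name first_two_bytes and its parameters are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write to out the two bytes at the lowest addresses of an integer that is
   int_size bytes wide and holds value, for the given byte order. */
void first_two_bytes(uint32_t value, size_t int_size, int little_endian,
                     unsigned char out[2])
{
    assert(int_size >= 2 && int_size <= 4);

    for (size_t k = 0; k < 2; k++) {
        /* pos is the significance of the byte stored at address offset k:
           0 means least significant. */
        size_t pos = little_endian ? k : int_size - 1 - k;
        out[k] = (unsigned char)((value >> (8 * pos)) & 0xFFu);
    }
}
```

For the value 512 this gives 0 and 2 for a little-endian layout whether int_size is 2 or 4, 2 and 0 for a big-endian 2-byte layout, and 0 and 0 for a big-endian 4-byte layout.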

using C Pointer with char array

int i=512;
char *c = (char *)&i;
c[0] =1;
printf("%d",i);
this displays "513", it adds 1 to i.
int i=512;
char *c = (char *)&i;
c[1] =1;
printf("%d",i);
whereas this displays 256. Divides it by 2.
Can someone please explain why? thanks a lot
Binary
The 32-bit number 512 expressed in binary, is just:
00000000000000000000001000000000
because 2 to the power of 9 is 512. Conventionally, you read the bits from right-to-left.
Here are some other decimal numbers in binary:
0001 = 1
0010 = 2
0011 = 3
0100 = 4
The Cast: Reinterpreting the Int as an Array of Bytes
When you do this:
int i = 512;
char *c = (char *)&i;
you are interpreting the 4-byte integer as an array of characters (8-bit bytes), as you probably know. If not, here's what's going on:
&i
takes the address of the variable i.
(char *)&i
reinterprets it (or casts it) to a pointer to char type. This means it can now be used like an array. Since you know an int is at least 32 bits on your machine, you can access its bytes using c[0], c[1], c[2], c[3].
Depending on the endianness of the system, the bytes of the number might be laid out: most significant byte first (big endian), or least significant byte first (little endian). x86 processors are little endian. This basically means the number 512 is laid out as in the example above, i.e.:
00000000 00000000 00000010 00000000
  c[3]     c[2]     c[1]     c[0]
I've grouped the bits into separate 8-bit chunks (bytes) corresponding to the way they are laid out in memory. Note, you also read them right-to-left here, so we can keep with conventions for the binary number system.
Consequences
Now setting c[0] = 1 has this effect:
00000000 00000000 00000010 00000001
  c[3]     c[2]     c[1]     c[0]
which is 2^9 + 2^0 == 513 in decimal.
Setting c[1] = 1 has this effect:
00000000 00000000 00000001 00000000
  c[3]     c[2]     c[1]     c[0]
which is 2^8 == 256 in decimal, because you've overwritten the second byte 00000010 with 00000001.
Note that on a big-endian system the bytes would be stored in the reverse order, so running the same code on one of those machines would give totally different results.
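The effect of writing through c[0] or c[1] can be wrapped in a small helper that works on a copy of the bytes. This is a minimal sketch with a made-up name, set_byte; it reflects whatever byte order the machine running it uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return value with the byte at address offset index replaced by byte,
   i.e. the result of c[index] = byte in the question. */
int set_byte(int value, size_t index, unsigned char byte)
{
    unsigned char raw[sizeof(int)];

    assert(index < sizeof raw);
    memcpy(raw, &value, sizeof raw);   /* bytes of value, lowest address first */
    raw[index] = byte;
    memcpy(&value, raw, sizeof raw);   /* reassemble the int from its bytes */
    return value;
}
```

On a little-endian machine with a 4-byte int, set_byte(512, 0, 1) returns 513 and set_byte(512, 1, 1) returns 256, matching the two programs in the question.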
Remember that a char is 8 bits. The bit representation of 512 is
512 = 10 0000 0000
When you do char *c = (char *)&i; you get (on a little-endian machine):
c[1] = 0000 0010
c[0] = 0000 0000
When you do c[0] = 1,
you make it 10 0000 0001, which is 513.
When you do c[1] = 1, you make it 01 0000 0000, which is 256.
Before you wonder why what you're seeing is "odd", consider the platform you're running your code on, and the endianness therein.
Then consider the following
#include <stdio.h>
int main(int argc, char *argv[])
{
    int i=512;
    printf("%d : ", i);
    unsigned char *p = (unsigned char*)&i;
    for (size_t j=0;j<sizeof(i);j++)
        printf("%02X", p[j]);
    printf("\n");
    char *c = (char *)&i;
    c[0] =1;
    printf("%d : ", i);
    for (size_t j=0;j<sizeof(i);j++)
        printf("%02X", p[j]);
    printf("\n");
    i = 512;
    c[1] =1;
    printf("%d : ", i);
    for (size_t j=0;j<sizeof(i);j++)
        printf("%02X", p[j]);
    printf("\n");
    return 0;
}
On my platform (Macbook Air, OS X 10.8, Intel x64 Arch)
512 : 00020000
513 : 01020000
256 : 00010000
Couple what you see above with what you have hopefully read about endianness, and you can clearly see my platform is little endian. So what's yours?
Since you are aliasing an int through a char pointer, and a char is 8 bits wide (a byte), the assignment:
c[1] = 1;
will set the second byte of i to 00000001. Bytes 1, 3 and 4 (if sizeof(int) == 4) will stay unmodified. Previously, that second byte was 00000010 (since I assume you're on an x86-based computer, which is a little-endian architecture). So basically, you shifted the only bit that was set one position to the right. That's a division by 2.
On a little-endian machine and a compiler with 32-bit int, you originally had these four bytes in i:
c[0] c[1] c[2] c[3]
00000000 00000010 00000000 00000000
After the assignment, i was set to:
c[0] c[1] c[2] c[3]
00000000 00000001 00000000 00000000
and therefore it went from 512 to 256.
Now you should understand why c[0] = 1 results in 513 :-) Think about which byte is set to 1 and that the assignment doesn't change the other bytes at all.
It's because your machine is little endian, meaning the least-significant byte is stored first in memory.
You said int i=512;. 512 is 0x00000200 in hex (assuming a 32-bit OS for simplicity). Let's look at how i would be stored in memory as hexadecimal bytes:
00 02 00 00 // 4 bytes, least-significant byte first
Now we interpret that same memory location as a character array by doing char *c = (char *)&i; - same memory, different interpretation:
00   02   00   00
c[0] c[1] c[2] c[3]
Now we change c[0] with c[0] =1; and the memory looks like
01 02 00 00
Which means if we look at it as a little endian int again (by doing printf("%d",i);), it's hex 0x00000201, which is 513 decimal.
Now if we go back and change c[1] with c[1] =1;, your memory now becomes:
00 01 00 00
Now we go back and interpret it as a little endian int, it's hex 0x00000100, which is 256 decimal.
It depends on whether the machine is little endian or big endian, which determines how the bytes are stored. For more, read about endianness.
The C language doesn't guarantee either byte order.
512 in binary :
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0000 ==>512
=============================================
12 34 56 78
(suppose 0x12345678 is the address of this int)
After char *c = (char *)&i; c[0] refers either to the byte marked 78 or to the byte marked 12.
Modifying the value through c[0] results in 513 if it refers to the byte marked 78:
=============================================
0000 0000 | 0000 0000 | 0000 0010 | 0000 0001 ==> 513
=============================================
or, can be
=============================================
0000 0001 | 0000 0000 | 0000 0010 | 0000 0000 ==>2^24+512
=============================================
Similarly for 256: c[1] then refers to the second byte from the right,
as in the figure below:
=============================================
0000 0000 | 0000 0000 | 0000 0001 | 0000 0000 ==>256
=============================================
So it depends on how your system implements the representation of numbers.

Output Explanation of this program in C?

I have this program in C:
int main(int argc, char *argv[])
{
    int i=300;
    char *ptr = &i;
    *++ptr=2;
    printf("%d",i);
    return 0;
}
The output is 556 on little endian.
I tried to understand the output. Here is my explanation.
The question is: will the answer remain the same on a big-endian machine?
i = 300;
=> i = 100101100 //in binary in word format => B B Hb 0001 00101100 where B = Byte and Hb = Half Byte
(A)=> in memory (assuming it is Little endian))
0x12345678 - 1100 - 0010 ( Is this correct for little endian)
0x12345679 - 0001 - 0000
0x1234567a - 0000 - 0000
0x1234567b - 0000 - 0000
0x1234567c - Location of the next integer (location of ptr++ or ptr + 1 if ptr were an integer pointer: with ptr of type int *, doing ++ptr would increment it by 4 bytes, the size of int)
when
(B) we do char *ptr = &i;
ptr is of type char * => doing ++ptr increments it by 1 byte (the size of char)
so after ++ptr it points to location 0x12345679 (which has 0001 - 0000)
now we are doing
*++ptr = 2
=> 0x12345679 will be overwritten by 2 => 0x12345679 will have 0010 - 0000 instead of 0001 - 0000
so the new memory content will look like this :
(C)
0x12345678 - 1100 - 0010
0x12345679 - 0010 - 0000
0x1234567a - 0000 - 0000
0x1234567b - 0000 - 0000
which is equivalent to => B B Hb 0010 00101100 where B = Byte and Hb = Half Byte
Is my reasoning correct? Is there any other, shorter method for this?
Rgds,
Softy
In a little-endian 32-bit system, the int 300 (0x012c) is typically(*) stored as 4 sequential bytes, lowest first: 2C 01 00 00. When you increment the char pointer that was formerly the int pointer &i, you're pointing at the second byte of that sequence, and setting it to 2 makes the sequence 2C 02 00 00 -- which, when turned back into an int, is 0x22c or 556.
(As for your understanding of the bit sequence...it seems a bit off. Endianness affects byte order in memory, as the byte is the smallest addressable unit. The bits within the byte don't get reversed; the low-order byte will be 2C (00101100) whether the system is little-endian or big-endian. (Even if the system did reverse the bits of a byte, it'd reverse them again to present them to you as a number, so you wouldn't notice a difference.) The big difference is where that byte appears in the sequence. The only places where bit order matters, is in hardware and drivers and such where you can receive less than a byte at a time.)
In a big-endian system, the int is typically(*) represented by the byte sequence 00 00 01 2C (differing from the little-endian representation solely in the byte order -- highest byte comes first). You're still modifying the second byte of the sequence, though...making 00 02 01 2C, which as an int is 0x02012c or 131372.
(*) Lots of things come into play here, including two's complement (which almost all systems use these days...but C doesn't require it), the value of sizeof(int), alignment/padding, and whether the system is truly big- or little-endian or a half-assed implementation of it. This is a big part of why mucking around with the bytes of a bigger type so often leads to undefined or implementation-specific behavior.
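Both results can be reproduced on any machine by simulating the byte layout arithmetically. The sketch below assumes a 4-byte integer with no padding bits; the function name patch_byte and its parameters are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay value out as four bytes in the given byte order, overwrite the byte
   at address offset index, and read the four bytes back as an integer. */
uint32_t patch_byte(uint32_t value, size_t index, uint8_t byte,
                    int little_endian)
{
    unsigned char mem[4];
    uint32_t result = 0;

    assert(index < sizeof mem);

    /* Store: offset k holds the byte of significance pos (0 = lowest). */
    for (size_t k = 0; k < sizeof mem; k++) {
        size_t pos = little_endian ? k : sizeof mem - 1 - k;
        mem[k] = (unsigned char)((value >> (8 * pos)) & 0xFFu);
    }

    mem[index] = byte;   /* the equivalent of *++ptr = 2 when index is 1 */

    /* Load: reassemble the integer using the same byte order. */
    for (size_t k = 0; k < sizeof mem; k++) {
        size_t pos = little_endian ? k : sizeof mem - 1 - k;
        result |= (uint32_t)mem[k] << (8 * pos);
    }
    return result;
}
```

patch_byte(300, 1, 2, 1) returns 556 (bytes 2C 02 00 00), and patch_byte(300, 1, 2, 0) returns 131372 (bytes 00 02 01 2C).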
This is implementation defined. The internal representation of an int is not known according to the standard, so what you're doing is not portable. See section 6.2.6.2 in the C standard.
However, as most implementations use two's complement representation of signed ints, the endianness will affect the result as described in cHao's answer.
This is your int:
int i = 300;
And this is what the memory contains at &i: 2c 01 00 00
With the next instruction you assign the address of i to ptr, and then you move to the next byte with ++ptr and change its value to 2:
char *ptr = &i;
*++ptr = 2;
So now the memory contains: 2c 02 00 00 (i.e. 556).
The difference is that in a big-endian system in the address of i you would have seen 00 00 01 2C, and after the change: 00 02 01 2C.
Note, however, that the internal representation of an int is implementation-defined:
For signed integer types, the bits of the object representation shall
be divided into three groups: value bits, padding bits, and the sign
bit. There need not be any padding bits; signed char shall not have
any padding bits. There shall be exactly one sign bit. Each bit that
is a value bit shall have the same value as the same bit in the object
representation of the corresponding unsigned type (if there are M
value bits in the signed type and N in the unsigned type, then M ≤ N).
If the sign bit is zero, it shall not affect the resulting value. If
the sign bit is one, the value shall be modified in one of the
following ways: — the corresponding value with sign bit 0 is negated
(sign and magnitude); — the sign bit has the value −(2^M) (two’s
complement); — the sign bit has the value −(2^M − 1) (ones’
complement). Which of these applies is implementation-defined, as
is whether the value with sign bit 1 and all value bits zero (for the
first two), or with sign bit and all value bits 1 (for ones’
complement), is a trap representation or a normal value. In the case
of sign and magnitude and ones’ complement, if this representation is
a normal value it is called a negative zero.
I like experiments and that's the reason for having the PowerPC G5.
stacktest.c:
#include <stdio.h>
int main(int argc, char *argv[])
{
    int i=300;
    char *ptr = (char *)&i;
    *++ptr=2;
    /* Added the Hex dump */
    printf("%d or %x\n",i, i);
    return 0;
}
Build command:
powerpc-apple-darwin9-gcc-4.2.1 -o stacktest stacktest.c
Output:
131372 or 2012c
Summary: cHao's answer is complete, and in case you're in doubt, here is the experimental evidence.

Resources