Variable assignment disparity inside a union - c

What I've heard about a union is that it allocates memory for its largest member. Here I'm trying to assign the 'same' value in two different ways, but it's ending up problematic.
First,
#include <stdio.h>
union h {
int a;
char b;
};
int main()
{
union h h1;
h1.b = 'X';
printf("%d %c\n",h1.a, h1.b );
return 0;
}
The output would be a large random number followed by 'X'
-1674402216 X
When I also assigned a number to h1.a,
#include <stdio.h>
union h {
int a;
char b;
};
int main()
{
union h h1;
h1.a = 1;
h1.b = 'X';
printf("%d %c\n",h1.a, h1.b );
return 0;
}
This gives the output
88 X
Can someone help me figure out what exactly is happening here?
Thank you :)

Union members occupy the same space in memory.
So your union looks something like this:
N-1 ...
--------
N ||X||a||
N+1 | |a||
N+2 | |a||
N+3 | |a||
... | |
--------
...
(Assuming system with 32 bit integer.)
By assigning X you have also modified one byte of your uninitialized a. Your value (-1674402216) can be interpreted as 9C32A658 in base 16. Your least significant byte is 58 hex, which is the ASCII code of X, and your other three bytes kept their initial uninitialized value.
In your second case you first initialized the int to 1 (which set all but the least significant byte to 0), then you overwrote the least significant byte with X, so you got 88 (the ASCII code of X) when interpreting it as an int, and the original 'X' when looking at the char member.
Not to forget: a layout like this is implementation-defined. The standard does say, as mentioned in the comments on your question, that you should not actually read a member other than the one written last, while at the same time it is common practice to use unions exactly for this (see these threads: Why do we need C Unions?, What is the strict aliasing rule?).

Related

Strange output when using union type members

My code:
#include<stdio.h>
union U{
int x;
char y;
};
int main()
{
union U u1;
u1.x = 258;
u1.y = '0';
printf("%d%d",u1.x,u1.y);
return 0;
}
Strangely, the output is 30448.
Can someone please explain how this happens?
You may be misunderstanding the purpose of a union. It is meant to store only one variable at a time, but this variable can have multiple types. The last value stored will overwrite the previous one.
In your case u1.y (which is '0'; recall that the 1-byte ASCII code for '0' is 48 in decimal) is the last value stored. This corresponds to the last 2 digits of your output, since you print '0' as its decimal ASCII code.
As for the first part of the output, note that you overwrite the int variable 258, which is presumably 4 bytes (but for the sake of explanation I will assume it's 2 bytes) with the 1 byte wide char variable 48.
The binary value for 258 (assuming 2 bytes wide int) is:
|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|
| 2nd byte | 1st byte |
The binary value for 48 (1 byte wide char variable) is:
| | | | | | | | |0|0|1|1|0|0|0|0|
| 1st byte |
When you overwrite the two byte union variable with a one byte variable only the 8 least significant bits(least significant byte) will be overwritten, so you'll end up with:
|0|0|0|0|0|0|0|1|x|x|x|x|x|x|x|x|
| | | | | | | | |0|0|1|1|0|0|0|0|
|0|0|0|0|0|0|0|1|0|0|1|1|0|0|0|0|
And this is the binary representation of 304.
So your code first prints the 2-byte-wide (for the sake of the example) int 304 and next the 1-byte-wide value 48 (the ASCII code of '0', promoted to int), hence the output 30448.
Note that this behavior is not undefined.
ISO/IEC 9899:2017 N2176
§ 6.5.2.3
97) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called “type punning”). This might be a trap representation.
§ 6.2.6.1
6 - When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
7 - When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
For confirmation you can use:
printf("%p %p\n", (void*)&u1.x, (void*)&u1.y);
This will print the memory address of both u1.x and u1.y and you will not be shocked to find that they are the same.
You cannot read a member of a union other than the one last written into
#include<stdio.h>
union U{
int x;
char y;
};
int main()
{
union U u1;
u1.x = 258; // write into member x: OK
u1.y = '0'; // write into member y: OK
printf("%d%d",u1.x,u1.y); // read both member x and y: WRONG
// can only read member y
return 0;
}

C - access memory directly using address?

I've been lightly studying C for a few weeks now with some book.
int main(void)
{
float num = 3.15;
int *ptr = (int *)&num; //so I can use line 8 and 10
for (int i = 0; i < 32; i++)
{
if (!(i % 8) && (i / 8))
printf(" ");
printf("%d", *ptr >> (31 - i) & 1);
}
return 0;
}
output : 01000000 01001001 10011001 10011010
As you see 3.15 in single precision float is 01000000 01001001 10011001 10011010.
So let's say ptr points to address 0x1efb40.
Here are the questions:
As I understood in the book, first 8 bits of num data is stored in 0x1efb40, 2nd 8 bits in 0x1efb41, next 8 bits in 0x1efb42 and last 8 bits in 0x1efb43. Am I right?
If I'm right, is there any way I can directly access the 2nd 8 bits with hex address value 0x1efb41? Thereby can I change the data to something like 11111111?
The ordering of bytes within a datatype is known as endianness and is system specific. What you describe with the least significant byte (LSB) first is called little endian and is what you would find on x86 based processors.
As for accessing particular bytes of a representation, you can use a pointer to an unsigned char to point to the variable in question to view the specific bytes. For example:
float num = 3.15;
unsigned char *p = (unsigned char *)&num;
int i;
for (i=0; i<sizeof(num); i++) {
printf("byte %d = %02x\n", i, p[i]);
}
Note that accessing the bytes like this is only allowed through a character pointer, not an int *, as the latter violates strict aliasing.
The code you wrote is not actually valid C. C has a rule called "strict aliasing," which states that if a region of memory contains a value of one type (e.g. float), it cannot be accessed as though it was another type (e.g. int). This rule has its origins in some performance optimizations that let the compiler generate faster code. I can't say it's an obvious rule, but it's the rule.
You can work around this by using a union. If you make a union like union { float num; int numAsInt; }, you can store a float and then read it as an integer. The result is unspecified. Alternatively, you are always permitted to access the bytes of a value as chars (just not anything larger). char is given special treatment (presumably to make it so you can copy a buffer of data as bytes, then cast it to your data's type and access it, which is something that happens a lot in low-level code like network stacks).
Welcome to a fun corner of learning C. There's unspecified behavior and undefined behavior. Informally, unspecified behavior says "we won't say what happens, but it will be reasonable." The C spec will not say what order the bytes are in. But it will say that you will get some bytes. Undefined behavior is nastier. Undefined behavior says anything can happen, ranging from compiler errors to exceptions at runtime, to absolutely nothing at all (making you think your code is valid when it is not).
As for the values, dbush points out in his answer that the order of the bytes is defined by the platform you are on. You are seeing a "little endian" representation of an IEEE 754 floating point number. On other platforms, it may be different.
Union punning is much safer:
#include <stdio.h>
typedef union
{
unsigned char uc[sizeof(double)];
float f;
double d;
}u_t;
void print(u_t u, size_t size, int endianess)
{
size_t start = 0;
int increment = 1;
if(endianess)
{
start = size - 1;
increment = -1;
}
for(size_t index = 0; index < size; index++)
{
printf("%hhx ", u.uc[start]);
start += increment;
}
printf("\n");
}
int main(void)
{
u_t u;
u.f = 3.15f;
print(u, sizeof(float),0);
print(u, sizeof(float),1);
u.d = 3.15;
print(u, sizeof(double),0);
print(u, sizeof(double),1);
return 0;
}
you can test it yourself: https://ideone.com/7ABZaj

unable to understand the output of union program in C

I know the basic properties of union in C but still couldn't understand the output, can somebody explain this?
#include <stdio.h>
int main()
{
union uni_t{
int i;
char ch[2];
};
union uni_t z ={512};
printf("%d%d",z.ch[0],z.ch[1]);
return 0;
}
The output when running this program is
02
union a
{
int i;
char ch[2];
}
This declares a type union a, the contents of which (i.e. the memory area of a variable of this type) could be accessed as either an integer (a.i) or a 2-element char array (a.ch).
union a z ={512};
This defines a variable z of type union a and initializes its first member (which happens to be a.i of type int) to the value of 512. (Cantfindname has the binary representation of that.)
printf( "%d%d", z.ch[0], z.ch[1] );
This takes the first character, then the second character from a.ch, and prints their numerical value. Again, Cantfindname talks about endianness and how it affects the results. Basically, you are taking apart an int byte-by-byte.
And the whole shebang is apparently assuming that sizeof( int ) == 2, which hasn't been true for desktop computers for... quite some time, so you might want to be looking at a more up-to-date tutorial. ;-)
What you get here is the result of endianness (http://en.wikipedia.org/wiki/Endianness).
512 is 0b0000 0010 0000 0000 in binary. On a little-endian machine the least significant byte is stored first, so the two bytes sit in memory as 0000 0000 followed by 0000 0010. ch[0] reads the byte at the lower address (0b0000 0000 = 0 in decimal) and ch[1] reads the next byte (0b0000 0010 = 2 in decimal), which gives the output 02.
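With a 4-byte int on a little-endian machine the output is the same, because ch[0] and ch[1] overlay the two lowest-addressed bytes of the int and the upper two bytes of 512 are zero. The walkthrough below uses short int, which is 2 bytes on common platforms, so that both members of the union have the same size.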
A Union is a variable that may hold (at different times) objects of different types and sizes, with the compiler keeping track of size and alignment requirements.
union uni_t
{
short int i;
char ch[2];
};
This code snippet declares a union having two members: an integer and a character array.
The union can be used to hold different values at different times by simply assigning to the corresponding member.
union uni_t z ={512};
This defines a variable z of type union uni_t and initializes the integer member ( i ) to the value of 512.
So the value stored in z becomes : 0b0000 0010 0000 0000
When this value is read through the character array on a little-endian machine, ch[1] refers to the first (most significant) byte of the binary value above and ch[0] to the second (least significant) byte.
ch[1] = 0b00000010 = 2
ch[0] = 0b00000000 = 0
So printf("%d%d",z.ch[0],z.ch[1]) results to
02

Treating a character array as an integer - Learn C the Hard Way Extra credit

In Zed Shaw's "Learn C the Hard Way", exercise 9 (http://c.learncodethehardway.org/book/ex9.html) there is an extra credit question that I find interesting. He defines a 4-character array and asks the reader to figure out how to use the array as a 4-byte integer.
At this point I know just enough to be dangerous, and I was thinking the answer is something along these lines:
#include <stdio.h>
int main(int argc, char *argv[])
{
char name[4] = {'A'};
int *name_int;
name_int = &name;
printf("%d", *name_int);
return 0;
}
My thinking was that if I created an int pointer whose value is the address of the array, the int type would use the byte of data at that address, followed by the next 3 bytes of data available. In my limited understanding, I am under the impression that both an int and an array would use memory in the same way: starting at an arbitrary memory address, then using the next address in sequence, and so on.
However, the output of this isn't what I expected: I get the ASCII value of 'A'. Which to me seems to indicate that my solution is incorrect, my understanding of how memory is handled is incorrect, or both.
How can this little hack be accomplished and where is it I am going wrong? I am hoping to walk away from this with a better understanding of how pointers and references work, and how memory is stored and used.
Thank you!
You are running into little-endian vs big-endian representation of numbers.
Let's take a look at the values of the 4 bytes used to represent a 4-byte integer.
+----+----+----+----+
| N1 | N2 | N3 | N4 |
+----+----+----+----+
In a big-endian representation, these 4 bytes represent:
N1*2^24 + N2*2^16 + N3*2^8 + N4
In a little-endian representation, those 4 bytes represent:
N1 + N2*2^8 + N3*2^16 + N4*2^24
In your case:
N1 = 'A' (65 decimal)
N2 = 0
N3 = 0
N4 = 0
Since the value of integer you are getting is 65, you have a little endian representation. If you want to treat those numbers like a big-endian representation, you can use the following:
#include <stdio.h>
int main(int argc, char *argv[])
{
int i;
char nameString[4] = {'A'};
int name = 0;
for ( i = 0; i < 4; ++i )
{
name = (name << 8) + nameString[i];
}
printf("%d\n", name);
printf("%X\n", name);
return 0;
}
The output I get with the above code:
1090519040
41000000
You may also try the function memcpy().
Use a char array as a source and an unassigned int variable as the destination.

How this code works - regarding union initialization in c?

I got the output 0 2 for this program, but I don't know why.
Please explain. I think only int i is initialized with 512,
so how did ch[1] get the value 2?
#include <stdio.h>
int main()
{
union a /* declared */
{
int i; char ch[2];
};
union a z = { 512 };
printf("%d%d", z.ch[0], z.ch[1]);
return 0;
}
Union declaration means that all its members are allocated the same memory. So your int i and char ch[2] are referencing the same memory space -- in other words, they are aliased. Whenever you change one, you will change the other as well.
Now, assuming your ints are 32-bit wide and you're on a little-endian system like x86, i = 512 (512 == 0x00000200) actually looks like this in memory:
0x00 0x02 0x00 0x00.
with the first two values corresponding directly to the 2-character array:
ch[0] ch[1]
So you get ch[0] == 0x0 and ch[1] == 0x02.
Try setting your i = 0x1234 and see what effect it will have on your character array.
Based on your question, it's possible that you may want to use a struct instead of union -- then its members would be allocated in memory sequentially (one after the other).
512 is 0x200 in hex, so the first byte of your union is 0 and the second is 2. If you don't specify which union member should be initialized, the first one will be taken, the int in your case.
You get 2 for the second byte of your string because the first byte of ch is initialized with 0 and the second one with 2.
Simple: 512 = binary 1000000000, so ch[0] will get the 8 zeroes (assuming your system is little endian) and ch[1] will get the 10 part, which, in decimal, is 2.
You are mixing up 'struct' and 'union'. In a union you collect differently typed and named data into one field (with length = maximum (size of data)), which you can access, and for which you have to make sure yourself that you get the right data.
Your example allocates memory for max(sizeof(int), sizeof(char[2])).
On a little-endian system, z.i = 32 and z.ch[0] = ' ' put the same value (32, the ASCII code of a space) into the same byte.
You got 0 2 for good reasons, but the C standard says that the behavior is not defined. If you write i, then the value of ch can theoretically be anything.
However, gcc assures that the data will be well-aligned.

Resources