How are values changed in C unions? - c

#include <stdio.h>
int main()
{
    typedef union {
        int a;
        char c;
        float f;
    } myu;
    myu sam;
    sam.a = 10;
    sam.f = (float)5.99;
    sam.c = 'H';
    printf("%d\n %c\n %f\n", sam.a, sam.c, sam.f);
    return 0;
}
Output
1086303816
H
5.990025
How come the value of the integer has changed so drastically while the float is almost the same?

The fields of a union all share the same starting memory address. This means that writing to one member will overwrite the contents of another.
When you write one member and then read a different member, the representation of the written member (i.e. how it is laid out in memory) is reinterpreted as the representation of the read member. Integers and floating-point types have very different representations, so it makes sense that the value you get from reading a float as though it were an int can differ greatly from what was stored.
Things become even more complicated if the two types are not the same size. If a smaller field is written and a larger field is read, the excess bytes might not even have been initialized.
In your example, you first write the value 10 to the int member. Then you write the value 5.99 to the float member. Assuming int and float are both 4 bytes in length, all of the bytes used by the int member are overwritten by the float member.
When you then change the char member, this only changes the first byte. Assuming a float is represented in little-endian IEEE754, this changes just the low-order byte of the mantissa, so only the digits furthest to the right are affected.

Try this: instead of using printf (which will mostly print nonsense), show the raw memory after each modification.
The code below assumes that int and float are 32 bit types and that your compiler does not add padding bytes in this union.
#include <string.h>
#include <stdio.h>
#include <assert.h>
void showmemory(void *myu)
{
    unsigned char memory[4];
    memcpy(memory, myu, 4);
    for (int i = 0; i < 4; i++)
    {
        printf("%02x ", memory[i]);
    }
    printf("\n");
}
int main()
{
    typedef union {
        int a;
        char c;
        float f;
    } myu;
    assert(sizeof(myu) == 4); // assume size of the union is 4 bytes
    myu sam;
    sam.a = 10;
    showmemory(&sam);
    sam.f = (float)5.99;
    showmemory(&sam);
    sam.c = 'H';
    showmemory(&sam);
}
Possible output on a little endian system:
0a 00 00 00 // 0a is 10 in hexadecimal
14 ae bf 40 // 5.99 in float
48 ae bf 40 // 48 is 'H'

How come the value of the integer has changed so drastically while the float is almost the same?
That is just a coincidence. Your union will be stored in 4 bytes. When you assign the field "a" the value 10, the binary representation of the union is 0x0000000A. Then, when you assign the field "f" the value 5.99, it becomes 0x40bfae14. Finally, when you set c to 'H' (0x48 in hex), it overwrites only the first byte, which corresponds to the low-order part of the float's mantissa. Thus, the float value changes only slightly. For more information about floating-point encoding, you can check this handy website out.

In "traditional" C, any object that are not bitfields and do not have a register storage class will represent an association between a sequence of consecutive bytes somewhere in memory and a means of reading or writing values of the object's type. Storing a value into an object of type T will convert the value into a a pattern of sizeof (T) * CHAR_BIT bits, and store that pattern into the associated memory. Reading an object of type T will read the sizeof (T) * CHAR_BIT bits from the object's associated storage and convert that bit pattern into a value of type T, without regard for how the underlying storage came to hold that bit pattern.
A union object serves to reserve space for the largest member, and then creates an association between each member and a region of storage that begins at the start of the union object. Any write to a member of a union will affect the appropriate part of the underlying storage, and any read of a union member will interpret whatever happens to be in the underlying storage as a value of its type. This will be true whether the member is accessed directly, or via pointer or array syntax.
The "traditional C" model is really quite simple. The C Standard, however, is much more complicated because the authors wanted to allow implementations to deviate from that behavior when doing so wouldn't interfere with whatever their customers need to do. This in turn has been interpreted by some compiler writers as an invitation to deviate from the traditional behavior without any regard for whether the traditional behavior might be useful in more circumstances than the bare minimums mandated by the Standard.

Related

From which endpoint and how does C read the variables?

I was messing around with pointers in C and was trying to read values from the same address using different types of pointers. First I created a double variable and assigned the number 26 to it.
double g = 26;
like so. And then I assigned g's address to a void pointer: void *vptr = &g;. After that, I tried to read the value at the address of g as a float by type-casting:
float *atr = (float*) (vptr);. When I tried to print the value of *atr it gave me 0.000000. Then I used a pointer to a character, since characters are 1 byte, and tried to see the values of those 8 bytes one by one.
char *t;
t = (char*) vptr;
for (int i = 0; i < 8; i++){
printf("%x t[%d]: %d\n",t+i , i, t[i]);
}
it gave me this output
ffffcbe9 t[1]: 0
ffffcbea t[2]: 0
ffffcbeb t[3]: 0
ffffcbec t[4]: 0
ffffcbed t[5]: 0
ffffcbee t[6]: 58
ffffcbef t[7]: 64
Then I checked binary representation of g which is 01000000 00111010 00000000 00000000 00000000 00000000 00000000 00000000 using this website.
When I convert every byte to decimal individually, first byte becomes 64 and the second is 58.
So it was basically reversed. Then I tried to read it as a float again, but this time I shifted the address:
atr = (float*) (vptr+4);. I didn't know how many bytes it would shift, but coincidentally I discovered that it shifts by one, just like char pointers.
This time I printed it as printf("%f\n",*atr); and now it gave me 2.906250.
When I checked its binary representation it was 01000000 00111010 00000000 00000000, which is the first half of the variable g. So I am kind of confused about how C reads values from addresses, since it looks like C reads the values from the right end, and when I add positive numbers to addresses it shifts towards the left end.
The order in which the bytes of a scalar object are stored in C is implementation-defined, per C 2018 6.2.6.1 2. (Array elements are of course stored in ascending order by index, and members of structures are in order of declaration, possibly with padding between them.)
The behavior of using *atr after float *atr = (float*) (vptr); is not defined by the C standard, due to the aliasing rules in C 2018 6.5 7. It is defined to examine the bytes through a char lvalue, as you did with t[i], although which bytes are which is implementation-defined per above.
A proper way to reinterpret some bytes of a double as a float is to copy them byte by byte, which you can do with manual code using a char * or simply float f; memcpy(&f, &g, sizeof f);. (memcpy is specified to work as if by copying bytes, per C 2018 7.24.2.1 2.) This will of course only reinterpret the low-addressed bytes of the double as a float, which has two problems:
The low-address bytes may not be the ones that contain the most significant bytes of the double.
float and double commonly use different formats, and the difference is not simply that float has fewer bits in the significand (the fraction portion). The exponent field is also a different width and has a different encoding bias. So reinterpreting the double this way generally will not give you a float that has about the same value as a double.
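For completeness, a minimal sketch of that memcpy approach; the printed value is not meaningful as a number, for the two reasons just listed, and depends on the implementation's byte order and floating-point formats:
#include <stdio.h>
#include <string.h>

int main(void)
{
    double g = 26;
    float f;

    /* Copy the low-addressed sizeof(float) bytes of g into f. This is
       well-defined behavior, but the resulting value depends on the
       machine's byte order and on the float/double formats. */
    memcpy(&f, &g, sizeof f);
    printf("%f\n", f);
    return 0;
}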
I didn't know how many bytes it would shift, but coincidentally I discovered that it shifts by one, just like char pointers.
Supporting arithmetic on void * is a GCC extension that is, as far as I know, needless. When offsets are added to or subtracted from void *, GCC does arithmetic as if it were a char *. This appears to be needless because one can do the desired arithmetic simply by using a char * instead of a void *, so the extension does not provide any new function.
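A portable version of the asker's shifted read, doing the byte arithmetic on a char * and reading through memcpy instead of a float * (a sketch assuming the usual 8-byte double and 4-byte float):
#include <stdio.h>
#include <string.h>

int main(void)
{
    double g = 26;
    void *vptr = &g;

    /* Portable equivalent of the GCC extension "vptr + 4":
       do the byte arithmetic on a char * instead. */
    char *p = (char *)vptr + 4;

    /* Read the reinterpreted bytes with memcpy rather than by
       dereferencing a float * that aliases a double. */
    float f;
    memcpy(&f, p, sizeof f);
    printf("%f\n", f);   /* 2.906250 on the asker's little-endian system */
    return 0;
}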

unexpected byte order after casting pointer-to-char into pointer-to-int

unsigned char tab[4] = {0, 0, 0, 14};
If I print as individual bytes...
printf("tab[1] : %u\n", tab[0]); // output: 0
printf("tab[2] : %u\n", tab[1]); // output: 0
printf("tab[3] : %u\n", tab[2]); // output: 0
printf("tab[4] : %u\n", tab[3]); // output: 14
If I print as an integer...
unsigned int fourbyte;
fourbyte = *((unsigned int *)tab);
printf("fourbyte : %u\n", fourbyte); // output: 234881024
My output in binary is : 00001110 00000000 00000000 00000000, which is the data I wanted but in this order tab[3] tab[2] tab[1] tab[0].
Any explanation of that, why the unsigned int pointer points to the last byte instead of the first ?
The correct answer here is that you should not have expected any relationship, order or otherwise. Except for unions, the C standard does not define a linear address space in which objects of different types can overlap. It is the case on many architecture/compiler-toolchain combinations that these coincidences can occur from time to time, but you should never rely on them. The fact that casting a pointer to a suitable scalar type yields a number comparable to others of the same type in no way implies that number is any particular memory address.
So:
int* p;
int z = 3;
int* pz = &z;
size_t cookie = (size_t)pz;
p = (int*)cookie;
printf("%d", *p); // Prints 3.
Works because the standard says it must work when cookie is derived from the same type of pointer that it is being converted to. Converting to any other type is undefined behavior. Pointers do not represent memory, they reference 'storage' in the abstract. They are merely references to objects or NULL, and the standard defines how pointers to the same object must behave and how they can be converted to scalar values and back again.
Given:
char array[5] = "five";
The standard says that &(array[0]) < &(array[1]) and that (&(array[0]) + 1) == &(array[1]), but it is silent on how elements in array are ordered in memory. The compiler writers are free to use whatever machine code and memory layouts they deem appropriate for the target architecture.
In the case of unions, which provides for some overlap of objects in storage, the standard only says that each of its fields must be suitably aligned for their types, but just about everything else about them is implementation defined. The key clause is 6.2.6.1 p7:
When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
The gist of all of this is that the C standard defines an abstract machine. The compiler generates an architecture specific simulation of that machine based on your code. You cannot understand the C abstract machine through simple empirical means because implementation details bleed into your data set. You must limit your observations to those that are relevant to the abstraction. Therefore, avoid undefined behavior and be very aware of implementation defined behaviors.
Your example code is running on a computer that is Little-Endian. This term means that the "first byte" of an integer contains the least significant bits. By contrast, a Big-Endian computer stores the most significant bits in the first byte.
Edited to add: the way that you've demonstrated this is decidedly unsafe, as it relies upon undefined behavior to get "direct access" to the memory. There is a safer demonstration here.
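The linked demonstration is not reproduced here, but a sketch of one safer approach is to copy the array's bytes into the integer with memcpy instead of dereferencing a cast pointer (assuming a 4-byte unsigned int):
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char tab[4] = {0, 0, 0, 14};
    unsigned int fourbyte;

    /* memcpy avoids the aliasing and alignment problems of
       *(unsigned int *)tab; the value still depends on the
       machine's byte order. */
    memcpy(&fourbyte, tab, sizeof fourbyte);
    printf("fourbyte : %u\n", fourbyte);   /* 234881024 on a little-endian host */
    return 0;
}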

What does casting char* do to a reference of an int? (Using C)

In my course for intro to operating systems, our task is to determine if a system is big- or little-endian. There are plenty of results I've found on how to do it, and I've done my best to construct my own version of the code. I suspect it's not the best way of doing it, but it seems to work:
#include <stdio.h>
int main() {
    int a = 0x1234;
    unsigned char *start = (unsigned char*) &a;
    int len = sizeof( int );
    if( start[0] > start[ len - 1 ] ) {
        //biggest in front (Little Endian)
        printf("1");
    } else if( start[0] < start[ len - 1 ] ) {
        //smallest in front (Big Endian)
        printf("0");
    } else {
        //unable to determine with set value
        printf( "Please try a different integer (non-zero). " );
    }
}
I've seen this line of code (or some version of it) in almost all the answers I've seen:
unsigned char *start = (unsigned char*) &a;
What is happening here? I understand casting in general, but what happens when you cast an int's address to a char pointer? I know:
unsigned int *p = &a;
assigns the memory address of a to p, and that you can affect the value of a by dereferencing p. But I'm totally lost with what's happening with the char and, more importantly, not sure why my code works.
Thanks for helping me with my first SO post. :)
When you cast between pointers of different types, the result is generally implementation-defined (it depends on the system and the compiler). There are no guarantees that you can access the pointer or that it is correctly aligned, etc.
But for the special case when you cast to a pointer to character, the standard actually guarantees that you get a pointer to the lowest addressed byte of the object (C11 6.3.2.3 §7).
So the compiler will implement the code you have posted in such a way that you get a pointer to the least significant byte of the int. As we can tell from your code, that byte may contain different values depending on endianess.
If you have a 16-bit CPU, the char pointer will point at memory containing 0x12 in case of big endian, or 0x34 in case of little endian.
For a 32-bit CPU, the int would contain 0x00001234, so you would get 0x00 in case of big endian and 0x34 in case of little endian.
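A minimal sketch that shows this on your own machine, relying only on the byte-access guarantee described above:
#include <stdio.h>

int main(void)
{
    int a = 0x1234;
    unsigned char *start = (unsigned char *)&a;

    /* The lowest-addressed byte prints first: "34 12 00 00" on a
       little-endian 32-bit int, "00 00 12 34" on a big-endian one. */
    for (size_t i = 0; i < sizeof a; i++)
        printf("%02x ", start[i]);
    printf("\n");
    return 0;
}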
If you dereference an integer pointer, you will get 4 bytes of data (this depends on the compiler; assuming gcc). But if you want only one byte, cast that pointer to a character pointer and dereference it. You will get one byte of data. Casting tells the compiler how many bytes to read instead of the original data type's byte size.
Values stored in memory are a set of '1's and '0's which by themselves do not mean anything. Datatypes are used for recognizing and interpreting what the values mean. So let's say, at a particular memory location, the data stored is the following set of bits ad infinitum: 01001010 ..... By itself this data is meaningless.
A pointer (other than a void pointer) contains 2 pieces of information. It contains the starting position of a set of bytes, and the way in which the set of bits are to be interpreted. For details, you can see: http://en.wikipedia.org/wiki/C_data_types and references therein.
So if you have
a char *c,
a short int *i,
and a float *f
which look at the bits mentioned above, c, i, and f hold the same address, but *c takes the first 8 bits and interprets them in a certain way. So you can do things like printf("The character is %c", *c). On the other hand, *i takes the first 16 bits and interprets them in a certain way. In this case, it will be meaningful to say printf("The value is %d", *i). Again, for *f, printf("The value is %f", *f) is meaningful.
The real differences come when you do math with these. For example,
c++ advances the pointer by 1 byte,
i++ advances it by 2 bytes (the typical size of a short int),
and f++ advances it by 4 bytes (the typical size of a float).
More importantly, for
(*c)++, (*i)++, and (*f)++ the algorithm used for doing the addition is totally different.
In your question, when you do a cast from one pointer type to another, you already know that the algorithm you are going to use for manipulating the bits present at that location will be easier if you interpret those bits as an unsigned char rather than an unsigned int. The same operators +, -, etc. will act differently depending upon what datatype they are looking at. If you have worked on physics problems where a coordinate transformation made the solution very simple, then this is the closest analog to that operation. You are transforming one problem into another that is easier to solve.
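A small sketch of that idea: three pointer types referring to the same starting address, where +1 moves each pointer by the size of its pointed-to type (the exact sizes are implementation-defined; 1, 2 and 4 bytes are merely typical):
#include <stdio.h>

int main(void)
{
    float value = 0.0f;

    /* Three pointers can start at the same address... */
    char      *c = (char *)&value;
    short int *i = (short int *)&value;
    float     *f = &value;

    printf("start:    %p %p %p\n", (void *)c, (void *)i, (void *)f);

    /* ...but pointer arithmetic steps by the pointed-to type's size. */
    printf("after +1: %p %p %p\n",
           (void *)(c + 1), (void *)(i + 1), (void *)(f + 1));
    return 0;
}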

A small program for understanding unions in C [duplicate]

Suppose I define a union like this:
#include <stdio.h>
int main() {
    union u {
        int i;
        float f;
    };
    union u tst;
    tst.f = 23.45;
    printf("%d\n", tst.i);
    return 0;
}
Can somebody tell me what the memory where tst is stored will look like?
I am trying to understand the output 1102813594 that this program produces.
It depends on the implementation (compiler, OS, etc.) but you can use the debugger to actually see the memory contents if you want.
For example, in my MSVC 2008:
0x00415748 9a 99 bb 41
is the memory contents. Read from LSB on the left side (Intel, little-endian machine), this is 0x41bb999a or indeed 1102813594.
Generally, however, the integer and float are stored in the same bytes. Depending on how you access the union, you get the integer or floating point interpretation of those bytes. The size of the memory space, again, depends on the implementation, although it's usually the largest of its constituents aligned to some fixed boundary.
Why is the value what it is in your (or my) case? You should read about floating-point number representation for that (look up IEEE 754).
The result depends on the compiler implementation, but for most x86 compilers, float and int will be the same size. Wikipedia has a pretty good diagram of the layout of a 32 bit float http://en.wikipedia.org/wiki/Single_precision_floating-point_format, that can help to explain 1102813594.
If you print out the int as a hex value, it will be easier to figure out.
printf("%x\n", tst.i);
With a union, both variables are stored starting at the same memory location. A float is stored in an IEEE format (IEEE 754, as pointed out by others): a sign-magnitude, normalized representation in which the significand of a normalized value is always between 1 and 2 and the exponent can be anything in its range.
You are taking the 4 bytes of that representation and reading them as an int (you can look up which bits go where in the 32 bits that a float occupies). So the number basically means nothing and isn't useful as an int. That is, unless you know why you would want to do something like that, but usually a float and int combo isn't very useful.
Strictly speaking, the C standard does not dictate the exact float format, but in practice nearly every implementation uses IEEE 754, so the result is predictable on common platforms.
In a union, the members share the same memory, so we can read the bits of the float value back as an integer value.
The floating-point format is different from integer storage, so the union lets us observe that difference.
For example:
If I store the integer value 12 in the union (32 bits), we can read those same 32 bits back in floating-point format.
A float is stored as a sign (1 bit), an exponent (8 bits) and a significand (23 bits).
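To make that layout concrete, here is a small sketch (assuming a 32-bit int/unsigned int and IEEE 754 single precision) that stores the question's 23.45 in a union and splits the bit pattern into those three fields:
#include <stdio.h>

int main(void)
{
    union { int i; float f; unsigned int u; } tst;

    tst.f = 23.45f;

    /* Split the 32-bit pattern into the IEEE 754 single-precision fields. */
    unsigned int sign     = (tst.u >> 31) & 0x1u;      /* 1 bit  */
    unsigned int exponent = (tst.u >> 23) & 0xFFu;     /* 8 bits, biased by 127 */
    unsigned int fraction =  tst.u        & 0x7FFFFFu; /* 23 bits */

    printf("as int: %d  as hex: 0x%08x\n", tst.i, tst.u);  /* 1102813594, 0x41bb999a */
    printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06x\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}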
I wrote a little program that shows what happens when you preserve the bit pattern of a 32-bit float into a 32-bit integer. It gives you the exact same output you are experiencing:
#include <iostream>
int main()
{
    float f = 23.45;
    int x = *reinterpret_cast<int*>(&f);
    std::cout << x; // 1102813594
}

How to convert struct to char array in C

I'm trying to convert a struct to a char array to send over the network. However, I get some weird output from the char array when I do.
#include <stdio.h>
struct x
{
    int x;
} __attribute__((packed));
int main()
{
    struct x a;
    a.x = 127;
    char *b = (char *)&a;
    int i;
    for (i = 0; i < 4; i++)
        printf("%02x ", b[i]);
    printf("\n");
    for (i = 0; i < 4; i++)
        printf("%d ", b[i]);
    printf("\n");
    return 0;
}
Here is the output for various values of a.x (on an X86 using gcc):
127:
7f 00 00 00
127 0 0 0
128:
ffffff80 00 00 00
-128 0 0 0
255:
ffffffff 00 00 00
-1 0 0 0
256:
00 01 00 00
0 1 0 0
I understand the values for 127 and 256, but why do the numbers change when going to 128? Why wouldn't it just be:
80 00 00 00
128 0 0 0
Am I forgetting to do something in the conversion process or am I forgetting something about integer representation?
*Note: This is just a small test program. In a real program I have more in the struct, better variable names, and I convert to little-endian.
*Edit: formatting
What you see is the sign-preserving conversion from char to int. This behavior results from the fact that on your system, char is signed (note: char is not signed on all systems). That will lead to negative values if a bit pattern yields a negative value when interpreted as a char. Promoting such a char to an int will preserve the sign, and the int will be negative too. Note that even if you don't write an explicit (int) cast, the compiler will automatically promote the character to an int when passing it to printf. The solution is to convert your value to unsigned char first:
for (i=0; i<4; i++)
printf("%02x ", (unsigned char)b[i]);
Alternatively, you can use unsigned char* from the start on:
unsigned char *b = (unsigned char *)&a;
And then you don't need any cast at the time you print it with printf.
The x format specifier by itself says that the argument is an int, and since the number is negative, printf requires eight characters to show all four non-zero bytes of the int-sized value. The 0 modifier tells printf to pad the output with zeros, and the 2 modifier says that the minimum output should be two characters long. As far as I can tell, printf doesn't provide a way to specify a maximum width, except for strings.
Now then, you're only passing a char, so bare x tells the function to use the full int that got passed instead, due to default argument promotion for "..." parameters. Try the hh modifier to tell the function to treat the argument as just a char instead:
printf("%02hhx", b[i]);
char is a signed type on your platform; so with two's complement, 0x80 is -128 for an 8-bit integer (i.e. a byte)
Treating your struct as if it were a char array is undefined behavior. To send it over the network, use proper serialization instead. It's a pain in C++ and even more so in C, but it's the only way your app will work independently of the machines reading and writing.
http://en.wikipedia.org/wiki/Serialization#C
Converting your structure to characters or bytes the way you're doing it is going to lead to issues when you do try to make it network neutral. Why not address that problem now? There are a variety of different techniques you can use, all of which are likely to be more "portable" than what you're trying to do. For instance:
Sending numeric data across the network in a machine-neutral fashion has long been dealt with, in the POSIX/Unix world, via the functions htonl, htons, ntohl and ntohs. See, for example, the byteorder(3) manual page on a FreeBSD or Linux system. A sketch using these functions follows this list.
Converting data to and from a completely neutral representation like JSON is also perfectly acceptable. The amount of time your programs spend converting the data between JSON and native forms is likely to pale in comparison to the network transmission latencies.
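As promised above, a minimal sketch of the htonl/ntohl approach applied to the question's struct (the helper names serialize_x and deserialize_x are just illustrative, and arpa/inet.h is the POSIX header that declares these functions):
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl/ntohl on POSIX systems */

struct x { int x; };

/* Write the int field into buf in network (big-endian) byte order. */
size_t serialize_x(const struct x *in, unsigned char *buf)
{
    uint32_t net = htonl((uint32_t)in->x);
    memcpy(buf, &net, sizeof net);
    return sizeof net;
}

/* Read it back on the receiving host, whatever its byte order is. */
void deserialize_x(struct x *out, const unsigned char *buf)
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    out->x = (int)ntohl(net);
}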
char is signed on your platform, so what you are seeing is the two's complement representation; casting to (unsigned char*) will fix that (Rowland just beat me).
On a side note you may want to change
for (i=0; i<4; i++) {
//...
}
to
for (i=0; i<sizeof(struct x); i++) {
//...
}
The signedness of the char array is not the root of the problem! (It is a problem, but not the only problem.)
Alignment! That's the key word here. That's why you should NEVER try to treat structs like raw memory. Compilers (and various optimization flags), operating systems, and phases of the moon all do strange and exciting things to the actual locations in memory of "adjacent" fields in a structure. For example, if you have a struct with a char followed by an int, the whole struct will typically be EIGHT bytes in memory: the char, 3 blank, useless padding bytes, and then 4 bytes for the int. The machine likes to do things like this so structs fit cleanly on pages of memory, and such like.
Take an introductory course to machine architecture at your local college. Meanwhile, serialize properly. Never treat structs like char arrays.
When you go to send it, just use:
(char*)&CustomPacket
to convert. Works for me.
You may want to convert to an unsigned char array.
Unless you have very convincing measurements showing that every octet is precious, don't do this. Use a readable ASCII protocol like SMTP, NNTP, or one of the many other fine Internet protocols codified by the IETF.
If you really must have a binary format, it's still not safe just to shove out the bytes in a struct, because the byte order, basic sizes, or alignment constraints may differ from host to host. You must design your wire protocol to use well-defined sizes and a well-defined byte order. For your implementation, either use macros like ntohl(3) or use shifting and masking to put bytes into your stream. Whatever you do, make sure your code produces the same results on both big-endian and little-endian hosts.
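A sketch of the shifting-and-masking approach for one 32-bit field (the helper names are illustrative); it produces the same byte stream regardless of the host's own byte order:
#include <stdint.h>

/* Emit a 32-bit value most-significant byte first (big-endian wire
   order), independent of the host's byte order. */
void put_u32_be(unsigned char *buf, uint32_t value)
{
    buf[0] = (unsigned char)(value >> 24);
    buf[1] = (unsigned char)(value >> 16);
    buf[2] = (unsigned char)(value >> 8);
    buf[3] = (unsigned char)(value);
}

/* Reassemble it the same way on any receiving host. */
uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}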
