Consider this program:
#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main()
{
    union myUnion a;
    a.x = 5;
    a.y = 3.2;
    printf("%d\n%.2Lf", a.x, a.y);
    return 0;
}
Output:
-858993459
3.20
This is fine, as reading the int member reinterprets some of the bits last written through the long double member. However, the reverse doesn't really apply:
#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main()
{
    union myUnion a;
    a.y = 3.2;
    a.x = 5;
    printf("%d\n%.2Lf", a.x, a.y);
    return 0;
}
Output:
5
3.20
The question is: why doesn't the long double get reinterpreted as some garbage value, given that four of its bytes should now hold the integer? It is not a coincidence: the program outputs 3.20 for all values of a.x, not just 5.
However, the reverse doesn't really apply
On a little endian system (least significant byte of a multi-byte value is at the lowest address), the int will correspond to the least significant bits of the mantissa of the long double. You have to print that long double with a great deal of precision to see the effect of that int on those insignificant digits.
On a big endian system, like a PowerPC box, things would be different: the int would line up with the most significant part of the long double, overlapping the sign bit, the exponent, and the most significant mantissa bits. Changes in x would thus have drastic effects on the observed floating-point value, even if only a few significant digits are printed. However, for small values of x, the value appears to be zero.
On a PPC64 system, the following version of the program:
#include <stdio.h>

union myUnion
{
    int x;
    long double y;
};

int main(void)
{
    union myUnion a;
    a.y = 3.2;
    int i;
    for (i = 0; i < 1000; i++) {
        a.x = i;
        printf("%d -- %.2Lf\n", a.x, a.y);
    }
    return 0;
}
prints nothing but
1 -- 0.00
2 -- 0.00
[...]
999 -- 0.00
This is because we're creating an exponent field with all zeros, giving rise to values close to zero. However, the initial value 3.2 is completely clobbered; it doesn't just have its least significant bits ruffled.
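You can verify this layout yourself by dumping the raw bytes of the union. Here is a minimal sketch (the output depends on your platform's endianness and long double format, so treat it as an experiment rather than a specification):

#include <stdio.h>
#include <string.h>

union myUnion
{
    int x;
    long double y;
};

int main(void)
{
    union myUnion a;
    unsigned char bytes[sizeof a];

    a.y = 3.2L;
    memcpy(bytes, &a, sizeof a);       /* snapshot the raw bytes */
    for (size_t i = 0; i < sizeof a; i++)
        printf("%02x ", bytes[i]);     /* lowest address printed first */
    putchar('\n');
    return 0;
}

On a little endian x86 machine the int-sized region at the start holds the least significant mantissa bytes; on a big endian machine it holds the sign, exponent and top mantissa bits.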
The long double type is very large, so on implementations where x lines up with the least significant bits of the mantissa of y, and the other bits of the union are not affected when writing through x, you need to print the value with much higher precision to see the effect of modifying the x field.
The write only affects the lower half of the mantissa, so it won't make any noticeable difference at the number of digits you're printing. The difference can be seen, however, when you print 64 digits.
This program will show the difference:
#include <stdio.h>
#include <string.h>
#include <ctype.h>

union myUnion
{
    int x;
    long double y;
};

int main()
{
    union myUnion a;
    a.y = 3.2;
    a.x = 5;
    printf("%d\n%.64Lf\n", a.x, a.y);
    a.y = 3.2;
    printf("%.64Lf\n", a.y);
    return 0;
}
My output:
5
3.1999999992549419413918193599855044340074528008699417114257812500
3.2000000000000001776356839400250464677810668945312500000000000000
Based on my knowledge of the 80-bit long double format, this overwrites the lower half of the 64-bit mantissa, which doesn't skew the result much, so this prints somewhat accurate results.
If you had done this in my program:
a.x = 0;
the result would've been:
0
3.1999999992549419403076171875000000000000000000000000000000000000
3.2000000000000001776356839400250464677810668945312500000000000000
which is only slightly different.
Answers posted by Mohit Jain, Kaz and JL2210 provide good insight into your observations and how to investigate further, but be aware that the C Standard does not guarantee this behavior:
6.2.6 Representations of types
6.2.6.1 General
6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
7 When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.
As a consequence, the behavior described in the answers is not guaranteed, as all the bytes of the long double y member could be modified by setting the int x member, including the bytes that are not part of the int. These bytes can take any value, and the contents of y could even be a trap representation, causing undefined behavior.
As commented by Kaz, gcc gives a stronger guarantee than the C Standard: its documentation describes type-punning as a common practice: The practice of reading from a different union member than the one most recently written to (called type-punning) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. This practice is actually condoned in the C Standard since C11, as documented in this answer: https://stackoverflow.com/a/11996970/4593267 . Yet in my reading of this footnote, there is still no guarantee about the bytes of y that are not part of x.
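If you need the bytes of y outside the int to be preserved with an actual guarantee from the Standard, you can bypass the union and overwrite only the bytes you intend to change with memcpy. A minimal sketch reproducing the experiment from the question (the resulting value of y is still representation-dependent, but memcpy is guaranteed to leave the remaining bytes untouched):

#include <stdio.h>
#include <string.h>

int main(void)
{
    long double y = 3.2L;
    int x = 5;

    /* overwrite only sizeof(int) bytes at the start of y;
       the remaining bytes of y are guaranteed to stay intact */
    memcpy(&y, &x, sizeof x);
    printf("%.64Lf\n", y);
    return 0;
}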
Related
I'm trying to interface a board with a Raspberry Pi.
I have to read/write values to the board via Modbus, but I can't write floating-point values the way the board does.
I'm using C, and the Eclipse debug perspective to see the variables' values directly.
The board sends me 0x46C35000, which should be 25,000 decimal, but Eclipse shows me 1.18720512e+009...
When I try this website http://www.binaryconvert.com/convert_float.html?hexadecimal=46C35000 I obtain 25,000.
What's the problem?
For testing purposes I'm using this:
#include <stdio.h>

int main(){
    while(1){ // to view the value easily in the debug perspective
        float test = 0x46C35000;
        printf("%f\n",test);
    }
    return 0;
}
Thanks!
When you do this:
float test = 0x46C35000;
You're setting the value to 0x46C35000 (decimal 1187205120), not the representation.
You can do what you want as follows:
union {
    uint32_t i;
    float f;
} u = { 0x46C35000 };

printf("f=%f\n", u.f);
This safely allows an unsigned 32-bit value to be interpreted as a float.
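As a complete program, this might look like the following sketch (assuming float is a 32-bit IEEE-754 type, as on most desktop systems):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    union {
        uint32_t i;
        float f;
    } u = { 0x46C35000 };

    printf("f=%f\n", u.f);   /* expected: f=25000.000000 on IEEE-754 systems */
    return 0;
}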
You’re confusing logical value and internal representation. Your assignment sets the value, which is thereafter 0x46C35000, i.e. 1187205120.
To set the internal representation of the floating point number you need to make a few assumptions about how floating point numbers are represented in memory. The assumptions on the website you’re using (IEEE 754, 32 bit) are fair on a general purpose computer though.
To change the internal representation, use memcpy to copy the raw bytes into the float:
// Ensure our assumptions are correct:
#if !defined(__STDC_IEC_559__) && !defined(__GCC_IEC_559)
# error Floating points might not be in IEEE 754/IEC 559 format!
#endif
_Static_assert(sizeof(float) == sizeof(uint32_t), "Floats are not 32 bit numbers");
float f;
uint32_t rep = 0x46C35000;
memcpy(&f, &rep, sizeof f);
printf("%f\n", f);
Output: 25000.000000.
(This requires the header stdint.h for uint32_t, and string.h for memcpy.)
The constant 0x46C35000 being assigned to a float will implicitly convert the int value 1187205120 into a float, rather than directly overlay the bits into the IEEE-754 floating point format.
I normally use a union for this sort of thing:
#include <stdio.h>
#include <stdint.h>

typedef union
{
    float f;
    uint32_t i;
} FU;

int main()
{
    FU foo;
    foo.f = 25000.0;
    printf("%.8X\n", foo.i);
    foo.i = 0x46C35000;
    printf("%f\n", foo.f);
    return 0;
}
Output:
46C35000
25000.000000
You can understand how data are represented in memory when you access them through their address:
#include <stdio.h>

int main()
{
    float f25000;        // totally unused; has exactly the same size as `int'
    int i = 0x46C35000;  // put the binary value 0x46C35000 into `int' (4-byte representation of an integer)
    float *faddr;        // pointer (address) to float

    faddr = (float*)&i;  // put the address of `i' into `faddr' so `faddr' points to `i' in memory
    printf("f=%f\n", *faddr); // print the value pointed to by `faddr'
    return 0;
}
and the result:
$ gcc -o f25000 f25000.c; ./f25000
f=25000.000000
What it does is:
put 0x46C35000 into the int i
copy the address of i into faddr, which is then an address pointing at data in memory, in this case data of float type
print the value pointed to by faddr, treating it as a float
and you get your 25000.0.
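A variant of the same idea that avoids the pointer cast (and the strict aliasing concerns discussed further down this page) is to copy the bytes with memcpy. A minimal sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int i = 0x46C35000;
    float f;

    memcpy(&f, &i, sizeof f);   /* copy the four bytes of i into f */
    printf("f=%f\n", f);        /* 25000.000000 with 32-bit IEEE-754 floats */
    return 0;
}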
I cannot figure out how to convert the value of a dereferenced float pointer when it is obtained from an integer cast into a float pointer. I'm sorry if I'm wording this incorrectly. Here is an example of what I mean:
#include <stdio.h>

int main(void) {
    int i;
    float *f;

    i = 1092616192;
    f = (float *)&i;
    printf("i is %d and f is %f\n", i, *f);
}
The output for f is 10. How did I get that result?
Normally, the value of 1092616192 in hexadecimal is 0x41200000.
In floating-point, that will give you:
sign = positive (0b)
exponent = 130, 2^3 (10000010b)
significand = 2097152, 1.25 (01000000000000000000000b)
2^3 * 1.25 = 8 * 1.25 = 10
To explain: the exponent field uses an offset (biased) encoding, so you have to subtract 127 from it to get the actual exponent. 130 - 127 = 3. And since this is a binary encoding, we use 2 as the base: 2^3 = 8.
To explain the significand part, you start with an implicit 'whole' value of 1. The uppermost (leftmost) fraction bit is worth half of that, 0.5. The next bit is worth half of 0.5, namely 0.25. Because only the 0.25 bit and the implicit 1 bit are set, the significand represents 1 + 0.25 = 1.25.
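To make the arithmetic above concrete, here is a sketch that extracts the three fields from 0x41200000 and recombines them (assumes IEEE-754 single precision; link with -lm for pow):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    uint32_t bits = 0x41200000;                 /* 1092616192 */
    uint32_t sign = bits >> 31;                 /* 0: positive */
    uint32_t expfield = (bits >> 23) & 0xFF;    /* 130 */
    uint32_t fraction = bits & 0x7FFFFF;        /* 0x200000 = 2097152 */

    double significand = 1.0 + fraction / 8388608.0;   /* 1 + 2097152/2^23 = 1.25 */
    double value = (sign ? -1.0 : 1.0) * significand
                 * pow(2.0, (int)expfield - 127);      /* 1.25 * 2^3 = 10 */

    printf("sign=%u exponent=%u significand=%g value=%g\n",
           sign, expfield, significand, value);
    return 0;
}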
What you are trying to do is called type-punning. It should be done via a union, or using memcpy() and is only meaningful on an architecture where sizeof(int) == sizeof(float) without padding bits. The result is highly dependent on the architecture: byte ordering and floating point representation will affect the reinterpreted value. The presence of padding bits would invoke undefined behavior as the representation of float 15.0 could be a trap value for type int.
Here is how you get the number corresponding to 15.0:
#include <stdio.h>

int main(void) {
    union {
        float f;
        int i;
        unsigned int u;
    } u;

    u.f = 15;
    printf("re-interpreting the bits of float %.1f as int gives %d (%#x in hex)\n",
           u.f, u.i, u.u);
    return 0;
}
output on an Intel PC:
re-interpreting the bits of float 15.0 as int gives 1097859072 (0x41700000 in hex)
You are trying to predict the consequence of undefined behavior - it depends on a lot of things, including the hardware and OS you are using.
Basically, what you are doing is throwing a glass against the wall and getting a certain shard. Now you are asking how to get a differently shaped shard. Well, you need to throw the glass differently against the wall...
Here's the code:
#include <stdio.h>

union
{
    unsigned u;
    double d;
} a, b;

int main(void)
{
    printf("Enter a, b:");
    scanf("%lf %lf", &a.d, &b.d);
    if (a.d > b.d)
    {
        a.u^=b.u^=a.u^=b.u;
    }
    printf("a=%g, b=%g\n", a.d, b.d);
    return 0;
}
The a.u^=b.u^=a.u^=b.u; statement should have swapped a and b if a>b, but it seems that whatever I enter, the output will always be exactly my input.
a.u^=b.u^=a.u^=b.u; causes undefined behaviour by writing to a.u twice without a sequence point. See here for discussion of this code.
You could write:
unsigned tmp;
tmp = a.u;
a.u = b.u;
b.u = tmp;
which will swap a.u and b.u. However this may not achieve the goal of swapping the two doubles, if double is a larger type than unsigned on your system (a common scenario).
It's likely that double is 64 bits, while unsigned is only 32 bits. When you swap the unsigned members of the unions, you're only getting half of the doubles.
If you change d to float, or change u to unsigned long long, it will probably work, since they're likely to be the same size.
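For example, here is a corrected version of the program along those lines. This is a sketch assuming unsigned long long and double are both 64 bits, which is common but not guaranteed:

#include <stdio.h>

union
{
    unsigned long long u;
    double d;
} a, b;

int main(void)
{
    printf("Enter a, b:");
    if (scanf("%lf %lf", &a.d, &b.d) != 2)
        return 1;
    if (a.d > b.d)
    {
        unsigned long long tmp = a.u;   /* plain three-step swap */
        a.u = b.u;
        b.u = tmp;
    }
    printf("a=%g, b=%g\n", a.d, b.d);
    return 0;
}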
You're also causing UB by writing to the variables twice without a sequence point. The proper way to write the XOR swap is with multiple statements.
b.u ^= a.u;
a.u ^= b.u;
b.u ^= a.u;
For more about why not to use XOR for swapping, see Why don't people use xor swaps?
In a typical environment, the types unsigned and double have different sizes in memory. That is why the variables do not appear to change.
Also, you cannot XOR-swap floating-point variables directly, because XOR is not defined for them; their representation in memory is entirely different.
I saw the following piece of code in an open-source AAC decoder:
static void flt_round(float32_t *pf)
{
    int32_t flg;
    uint32_t tmp, tmp1, tmp2;

    tmp = *(uint32_t*)pf;
    flg = tmp & (uint32_t)0x00008000;
    tmp &= (uint32_t)0xffff0000;
    tmp1 = tmp;
    /* round 1/2 lsb toward infinity */
    if (flg)
    {
        tmp &= (uint32_t)0xff800000; /* extract exponent and sign */
        tmp |= (uint32_t)0x00010000; /* insert 1 lsb */
        tmp2 = tmp;                  /* add 1 lsb and elided one */
        tmp &= (uint32_t)0xff800000; /* extract exponent and sign */
        *pf = *(float32_t*)&tmp1 + *(float32_t*)&tmp2 - *(float32_t*)&tmp;
    } else {
        *pf = *(float32_t*)&tmp;
    }
}
In that code, the line
*pf = *(float32_t*)&tmp;
is the same as
*pf = (float32_t)tmp;
isn't it? Or is there a difference, maybe in performance?
Thank you.
No, they're completely different. Say the value of tmp is 1. Their code will give *pf the value of whatever floating point number has the same binary representation as the integer 1. Your code would give it the floating point value 1.0!
This code is editing the value of a float, knowing it is formatted using the standard IEEE 754 floating-point representation.
*(float32_t*)&tmp;
means: reinterpret the address of tmp as a pointer to a 32-bit float, and read the value it points to.
(float32_t)tmp;
means: convert the numeric value of the integer tmp to a 32-bit float, which may round if the value cannot be represented exactly.
Very different.
The first causes the bit pattern of tmp to be reinterpreted as a float.
The second causes the numerical value of tmp to be converted to float (within the accuracy that it can be represented including rounding).
Try this:
#include <stdio.h>
#include <stdint.h>

typedef float float32_t; /* as defined in the decoder's headers */

int main(void) {
    int32_t n = 1078530011;
    float32_t f;

    f = *(float32_t*)(&n);
    printf("reinterpret the bit pattern of %d as float - f==%f\n", n, f);
    f = (float32_t)n;
    printf("cast the numerical value of %d as float - f==%f\n", n, f);
    return 0;
}
Example output:
reinterpret the bit pattern of 1078530011 as float - f==3.141593
cast the numerical value of 1078530011 as float - f==1078530048.000000
It's like thinking that
const char* str="3568";
int a=*(int*)str;
int b=atoi(str);
will assign a and b the same values.
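A quick sketch that puts numbers on the analogy (the value of a depends on your byte order and character encoding; memcpy is used instead of the cast to keep the access well-defined, and a 4-byte int is assumed):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *str = "3568";
    int a, b;

    memcpy(&a, str, sizeof a);   /* bit pattern of the characters '3','5','6','8' */
    b = atoi(str);               /* numeric value 3568 */
    printf("a=%d b=%d\n", a, b);
    return 0;
}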
First to answer the question, my_float = (float)my_int safely converts the integer to a float according to the rules of the standard (6.3.1.4).
When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.
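For example, the first positive integer that cannot be represented exactly in a 32-bit IEEE-754 float is 2^24 + 1, so the "nearest representable value" clause kicks in. A short sketch:

#include <stdio.h>

int main(void)
{
    int n = 16777217;       /* 2^24 + 1: not exactly representable in float */
    float f = (float)n;     /* rounds to a neighboring representable value */

    printf("%d -> %.1f\n", n, f);   /* typically prints 16777217 -> 16777216.0 */
    return 0;
}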
my_float = *(float*)&my_int on the other hand, is a dirty trick, telling the program that the binary contents of the integer should be treated as if they were a float variable, with no concerns at all.
However, the person who wrote the dirty trick was probably not aware of it leading to undefined behavior for another reason: it violates the strict aliasing rule.
To fix this bug, you either have to tell your compiler to behave in a non-standard, non-portable manner (for example gcc -fno-strict-aliasing), which I don't recommend.
Or preferably, you rewrite the code so that it doesn't rely on undefined behavior. The best way is to use a union, to which strict aliasing doesn't apply, in the following manner:
typedef union
{
    uint32_t as_int;
    float32_t as_float;
} converter_t;

uint32_t value1, value2, value3; // do something with these variables

*pf = (converter_t){value1}.as_float +
      (converter_t){value2}.as_float -
      (converter_t){value3}.as_float;
Also it is good practice to add the following sanity check:
static_assert(sizeof(converter_t) == sizeof(uint32_t),
              "Unexpected padding or wrong type sizes!");
In C programming, I found a weird problem that runs counter to my intuition. When I declare an integer as INT_MAX (2147483647, defined in limits.h) and implicitly convert it to a float value, it seems to work fine, i.e., the float value matches the maximum integer. Then, when I convert the float back to an integer, something interesting happens: the new integer becomes the minimum integer (-2147483648).
The source codes look as below:
int a = INT_MAX;
float b = a; // b is correct
int a_new = b; // a_new becomes INT_MIN
I am not sure what happens when the float number b is converted back to the integer a_new. So, is there any reasonable way to find the maximum value that can be converted back and forth between the integer and float types?
PS: The value INT_MAX - 100 works fine, but that is just an arbitrary workaround.
This answer assumes that float is an IEEE-754 single-precision float encoded in 32 bits, and that an int is 32 bits. See this Wikipedia article for more information about IEEE-754.
Floating point numbers only have 24 bits of precision, compared with 32 bits for an int. Therefore int values from 0 to 16777215 have an exact representation as floating point numbers, but numbers greater than 16777215 do not necessarily have exact representations as floats. The following code demonstrates this fact (on systems that use IEEE-754).
for ( int a = 16777210; a < 16777224; a++ )
{
    float b = a;
    int c = b;
    printf( "a=%d c=%d b=0x%08x\n", a, c, *((int*)&b) );
}
The expected output is
a=16777210 c=16777210 b=0x4b7ffffa
a=16777211 c=16777211 b=0x4b7ffffb
a=16777212 c=16777212 b=0x4b7ffffc
a=16777213 c=16777213 b=0x4b7ffffd
a=16777214 c=16777214 b=0x4b7ffffe
a=16777215 c=16777215 b=0x4b7fffff
a=16777216 c=16777216 b=0x4b800000
a=16777217 c=16777216 b=0x4b800000
a=16777218 c=16777218 b=0x4b800001
a=16777219 c=16777220 b=0x4b800002
a=16777220 c=16777220 b=0x4b800002
a=16777221 c=16777220 b=0x4b800002
a=16777222 c=16777222 b=0x4b800003
a=16777223 c=16777224 b=0x4b800004
Of interest here is that the float value 0x4b800002 is used to represent the three int values 16777219, 16777220, and 16777221, and thus converting 16777219 to a float and back to an int does not preserve the exact value of the int.
The two floating point values that are closest to INT_MAX are 2147483520 and 2147483648, which can be demonstrated with this code
for ( int a = 2147483520; a < 2147483647; a++ )
{
    float b = a;
    int c = b;
    printf( "a=%d c=%d b=0x%08x\n", a, c, *((int*)&b) );
}
The interesting parts of the output are
a=2147483520 c=2147483520 b=0x4effffff
a=2147483521 c=2147483520 b=0x4effffff
...
a=2147483582 c=2147483520 b=0x4effffff
a=2147483583 c=2147483520 b=0x4effffff
a=2147483584 c=-2147483648 b=0x4f000000
a=2147483585 c=-2147483648 b=0x4f000000
...
a=2147483645 c=-2147483648 b=0x4f000000
a=2147483646 c=-2147483648 b=0x4f000000
Note that all 32-bit int values from 2147483584 to 2147483647 will be rounded up to a float value of 2147483648. The largest int value that will round down is 2147483583, which is the same as (INT_MAX - 64) on a 32-bit system.
One might conclude therefore that numbers below (INT_MAX - 64) will safely convert from int to float and back to int. But that is only true on systems where the size of an int is 32-bits, and a float is encoded per IEEE-754.
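Under those same assumptions, here is a sketch that searches downward from INT_MAX for the largest value that survives the round trip; the comparison against 2147483648.0f guards against the undefined out-of-range float-to-int conversion:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    for ( int a = INT_MAX; a > INT_MAX - 256; a-- )
    {
        float b = (float)a;
        /* only convert back when b is in range for int */
        if ( b < 2147483648.0f && (int)b == a )
        {
            printf( "largest round-trippable int near INT_MAX: %d\n", a );
            break;
        }
    }
    return 0;
}

On a system matching these assumptions it prints 2147483520, consistent with the table above.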