What does this C idiom mean? [duplicate] - c

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
John Carmack’s Unusual Fast Inverse Square Root (Quake III)
I came across this piece of code on a blog recently. It is from the Quake3 engine and is meant to calculate the inverse square root quickly using the Newton-Raphson method.
float InvSqrt (float x){
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
What is the reason for doing int i = *(int*)&x;? Doing int i = (int) x; instead gives a completely different result.

int i = *(int*)&x; doesn't convert x to an int. It reads the actual bits of the float x, and those four bytes, interpreted as an int, are usually a completely different value from the one you'd expect.
For reference, doing this is a really bad idea unless you know exactly how float values are represented in memory.

int i = *(int*)&x; says "take the four bytes which make up the float value x, and treat them as if they were an int." float values and int values are stored using completely different methods (e.g. int 4 and float 4.0 have completely different bit patterns).

The number that ends up in i is the binary value of the IEEE floating point representation of the number in x. The link explains what that looks like. This is not a common C idiom, it's a clever trick from before the SSE instructions got added to commercially available x86 processors.
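To make the difference between converting and reinterpreting concrete, here is a minimal sketch. It assumes float is a 32-bit IEEE 754 value, and float_bits is a hypothetical helper name, not something from the Quake code. It uses memcpy instead of the pointer cast so that it does not rely on aliasing tricks.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bit pattern of a float.
   Assumes float is 32 bits wide and uses the IEEE 754 layout. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copy the bytes, no value conversion */
    return bits;
}
```

With these assumptions, float_bits(4.0f) is 0x40800000, while (int)4.0f is simply 4.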

Related

Generic Function To extract Decimal part of a float without Floor() [duplicate]

This question already has answers here:
How to extract the decimal part from a floating point number in C?
(16 answers)
Closed 5 years ago.
I want to split a float number into two separate parts, a real and a non-real part.
For example: if x = 45.678, then my function has to give real = 45 and non_real = 678. I have tried the following logic.
split ( float x, unsigned int *real, unsigned int *non_real)
{
*real = x;
*non_real = ((int)(x*N_DECIMAL_POINTS_PRECISION)%N_DECIMAL_POINTS_PRECISION);
printf ("Real = %d , Non_Real = %d\n", *real, *non_real);
}
where N_DECIMAL_POINTS_PRECISION = 10000. It gives the decimal part up to 4 digits, but no further.
It works only for one specific decimal precision. The code is not generic; it also has to work for floating point numbers like 9.565784 and 45.6875322 and so on. So if anyone could help me with this, it would be really helpful.
Thanks in advance.
Use floor() to find the integer part, and then subtract the integer part from the original value to find the fractional part.
Note: The problem you're most likely having is that some numbers are too large for the integer part to fit in the range of an int.
--Added--
If and only if you are able to assume that an unsigned int is larger than the floating point representation's significand (e.g. 32-bit unsigned int and IEEE standard single-precision floating point with only 23 fractional bits, where "32 > 23" is true), then a number that is too large for an unsigned int can't have any fractional bits. This leads to a solution like:
if(x >= UINT_MAX) { /* UINT_MAX rounds up to 2^32 as a float, so use >= */
integer_part = x;
fractional_part = 0;
} else {
integer_part = (unsigned int)x;
fractional_part = x - integer_part;
}
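A minimal sketch of the same idea, under the assumption that float is IEEE 754 single precision: any float with magnitude of at least 2^23 has no fractional bits, so only smaller values need the truncating cast. The function name split_float and its output parameters are hypothetical, and both parts are returned as float to avoid integer range problems.

```c
/* Hypothetical helper: splits x into an integer part and a fractional part.
   Assumes IEEE 754 single precision, where every float with magnitude
   >= 2^23 (8388608) is already an integer. */
void split_float(float x, float *integer_part, float *fractional_part)
{
    if (!(x > -8388608.0f && x < 8388608.0f)) {
        /* Too large to carry fractional bits (this branch also catches NaN). */
        *integer_part = x;
        *fractional_part = 0.0f;
    } else {
        /* |x| < 2^23 always fits in a long; the cast truncates toward zero. */
        *integer_part = (float)(long)x;
        *fractional_part = x - *integer_part;
    }
}
```

Note that for an input like 45.678f the fractional part will not be exactly 0.678, because 45.678 itself is not exactly representable as a float.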

Float conversion in C on ARM [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 7 years ago.
I am trying to convert two bytes to a float and I have a problem with precision.
In my case I read a temperature and store it in two bytes. For example, for 14.69*C I store 14 (dec) in one byte and 69 (dec) in the second byte. Then I would like to convert these bytes to a float and compare it with another float, for example:
byte byte1 = 0xE;
byte byte2 = 0x45;
float temp1 = (float) byte1*1.0 + (float) byte2*0.01; // byte2*0.1 if byte2<10
float temp2 = 14.69;
...
if (temp1==temp2){
...
}
I expected temp1 to be 14.69, but its value is 14.68999958. Why, and what is the solution?
Every floating point operation can introduce a rounding error, and constants such as 0.01 and 14.69 are not exactly representable in binary to begin with. You can try to reduce the error by replacing floating point arithmetic with int as much as possible. For example:
((float)((unsigned int)byte1 * 100 + (unsigned int)byte2))/100.0
Also, comparing floats for strict equality can fail due to machine precision issues, so you should use if (fabsf(f1 - f2) < EPSILON).
I think you should use the bytes as they are, before converting them to float. Floats are not really precise when it comes to equality.
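Both suggestions can be sketched as follows. The names temp_hundredths and nearly_equal are hypothetical, and the tolerance passed in is something you would have to choose for your own data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: combines whole degrees and hundredths into a single
   integer count of hundredths, so temperatures compare exactly. */
int temp_hundredths(uint8_t whole, uint8_t hundredths)
{
    return (int)whole * 100 + (int)hundredths;
}

/* Hypothetical helper: tolerance-based comparison for when floats are
   unavoidable. eps is an assumption the caller must pick. */
bool nearly_equal(float a, float b, float eps)
{
    float diff = a > b ? a - b : b - a;  /* absolute difference */
    return diff < eps;
}
```

With bytes 0xE and 0x45, temp_hundredths gives 1469, which can be compared against another reading with a plain ==.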

Unusual conversion between float and long [duplicate]

This question already has answers here:
John Carmack's Unusual Fast Inverse Square Root (Quake III)
(6 answers)
Closed 8 years ago.
I found a very complex function that is an implementation of the fast inverse square root. I honestly do not understand how this function works, but the following conversion between a long and a float caught my eye:
i = *(long *) &y;
Here is the full code:
inline float Q_rsqrt(float number)
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = *(long *) &y;
i = 0x5f3759df - (i >> 1);
y = * (float *) &i;
y = y * (threehalfs - (x2 * y * y));
return y;
}
The cast simply reinterprets the bits of y as a long so that it can perform integer arithmetic on them.
See Wikipedia for an explanation of the algorithm: Fast inverse square root.
The code makes use of the knowledge that, on the target platform, sizeof(long) == sizeof(float).
@R.. also helpfully adds the following in a comment:
It's also invalid C -- it's an aliasing violation. A correct version of this program needs to use either memcpy or possibly (this is less clear that it's correct, but real compilers document support for it) union-based type punning. The version in OP's code will definitely be "miscompiled" (i.e. in a way different than the author's intent) by real compilers though.
This means that the code is not only architecture-specific, it is also compiler-specific.
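For illustration, here is a sketch of what a memcpy-based version could look like. It is not the original Quake code: the name rsqrt_memcpy is made up, and it assumes float is a 32-bit IEEE 754 value. It uses uint32_t instead of long so that it does not depend on sizeof(long) == sizeof(float).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical rewrite of the fast inverse square root without the
   aliasing violation. Assumes 32-bit IEEE 754 float and number > 0. */
float rsqrt_memcpy(float number)
{
    const float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);        /* reinterpret the float's bits */
    i = 0x5f3759dfu - (i >> 1);      /* magic-constant initial guess */
    memcpy(&y, &i, sizeof y);        /* back to float */
    y = y * (1.5f - x2 * y * y);     /* one Newton-Raphson step */
    return y;
}
```

For an input of 4.0f the result is close to, but not exactly, 0.5.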

matlab and c differ with cos function

I have a program implemented in matlab and the same program in c, and the results differ.
I am bit puzzled that the cos function does not return the exact same result.
I use the same computer, Intel Core 2 Duo, and 8 bytes double data type in both cases.
Why does the result differ?
Here is the test:
c:
double a = 2.89308776595231886830;
double b = cos(a);
printf("a = %.50f\n", a);
printf("b = %.50f\n", b);
printf("sizeof(a): %zu\n", sizeof(a));
printf("sizeof(b): %zu\n", sizeof(b));
a = 2.89308776595231886830106304842047393321990966796875
b = -0.96928123535654842068964853751822374761104583740234
sizeof(a): 8
sizeof(b): 8
matlab:
a = 2.89308776595231886830
b = cos(a);
fprintf('a = %.50f\n', a);
fprintf('b = %.50f\n', b);
whos('a')
whos('b')
a = 2.89308776595231886830106304842047393321990966796875
b = -0.96928123535654830966734607500256970524787902832031
Name Size Bytes Class Attributes
a 1x1 8 double
Name Size Bytes Class Attributes
b 1x1 8 double
So, b differs a bit (very slightly, but enough to make my debugging task difficult)
b = -0.96928123535654842068964853751822374761104583740234 c
b = -0.96928123535654830966734607500256970524787902832031 matlab
Does matlab not use the hardware cos function built into the Intel processor?
Is there a simple way to use the same cos function in matlab and c (with exactly the same results), even if a bit slower, so that I can safely compare the results of my matlab and c programs?
Update:
thanks a lot for your answers!
So, as you have pointed out, the cos functions of matlab and c differ.
That's amazing! I thought they were both using the cos function built into the Intel microprocessor.
The cos version of Java is equal (at least for this test) to the one of matlab.
You can also try it from matlab: b=java.lang.Math.cos(a)
Then I wrote a small MEX function to use the c version of cos from within matlab, and it works fine. This allows me to debug my program (the same one implemented in matlab and c) and see at what point they differ, which was the purpose of this post.
The only problem is that calling the MEX c version of cos from matlab is way too slow.
I am now trying to call the Java cos function from c (as it gives the same result as matlab), to see if that goes faster.
Floating point numbers are stored in binary, not decimal. A double precision float has 53 bits of precision (52 stored bits plus an implicit leading bit), which translates to roughly 15 to 17 significant decimal digits. In other words, any decimal number with up to 15 significant digits survives a round trip through a double, and 17 significant digits are always enough to uniquely determine which double was printed.
As a dyadic rational, a double has an exact representation in decimal, which takes many more decimal places than that to represent (in your case, 52 or 53 places, I believe). However, the standards for printf and similar functions do not require the digits beyond that precision to be correct; they could be complete nonsense. I suspect one of the two environments is printing the exact value, and the other is printing a poor approximation, and that in reality both correspond to the exact same binary double value.
Using the script at http://www.mathworks.com/matlabcentral/fileexchange/1777-from-double-to-string
the difference between the two numbers is only in the last bit:
octave:1> bc = -0.96928123535654842068964853751822374761104583740234;
octave:2> bm = -0.96928123535654830966734607500256970524787902832031;
octave:3> num2bin(bc)
ans = -.11111000001000101101000010100110011110111001110001011*2^+0
octave:4> num2bin(bm)
ans = -.11111000001000101101000010100110011110111001110001010*2^+0
One of them must be closer to the "correct" answer, assuming the value given for a is exact.
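The "only in the last bit" observation can also be checked directly in C. The sketch below assumes 64-bit IEEE 754 doubles; double_bits and ulp_distance are hypothetical helper names, and the distance is only meaningful for two finite doubles of the same sign.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of a double (assumes 64-bit IEEE 754). */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: number of representable doubles between a and b.
   Only valid for finite values with the same sign. */
uint64_t ulp_distance(double a, double b)
{
    uint64_t ia = double_bits(a);
    uint64_t ib = double_bits(b);
    return ia > ib ? ia - ib : ib - ia;
}
```

Calling ulp_distance on the two values of b printed in the question should give 1, i.e. they are neighbouring doubles.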
>> be = vpa('cos(2.89308776595231886830)',50)
be =
-.96928123535654836529707365425580405084360377470583
>> bc = -0.96928123535654842068964853751822374761104583740234;
>> bm = -0.96928123535654830966734607500256970524787902832031;
>> abs(bc-be)
ans =
.5539257488326242e-16
>> abs(bm-be)
ans =
.5562972757925323e-16
So, the C library result is more accurate.
For the purposes of your question, however, you should not expect to get the same answer in matlab and whichever C library you linked with.
The result is the same up to 15 decimal places, I suspect that is sufficient for almost all applications and if you require more you should probably be implementing your own version of cosine anyway such that you are in control of the specifics and your code is portable across different C compilers.
They will differ because they undoubtedly use different methods to calculate the approximation to the result or iterate a different number of times. As cosine is defined as an infinite series of terms an approximation must be used for its software implementation. The CORDIC algorithm is one common implementation.
Unfortunately, I don't know the specifics of the implementation in either case, indeed the C one will depend on which C standard library implementation you are using.
As others have explained, when you enter that number directly in your source code, not all the fraction digits will be used, as you only get 15 to 16 decimal places of precision. In fact, the literal gets converted to the nearest double value in binary, so any digits beyond that precision do not survive.
To make things worse, and according to @R, IEEE 754 tolerates error in the last bit when using the cosine function. I actually ran into this when using different compilers.
To illustrate, I tested with the following MEX file, once compiled with the default LCC compiler, and then using VS2010 (I am on WinXP 32-bit).
In one function we directly call the C functions (mexPrintf is simply a macro #defined as printf). In the other, we call mexEvalString to evaluate stuff in the MATLAB engine (equivalent to using the command prompt in MATLAB).
prec.c
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "mex.h"
void c_test()
{
double a = 2.89308776595231886830L;
double b = cos(a);
mexPrintf("[C] a = %.25Lf (%16Lx)\n", a, a);
mexPrintf("[C] b = %.25Lf (%16Lx)\n", b, b);
}
void matlab_test()
{
mexEvalString("a = 2.89308776595231886830;");
mexEvalString("b = cos(a);");
mexEvalString("fprintf('[M] a = %.25f (%bx)\\n', a, a)");
mexEvalString("fprintf('[M] b = %.25f (%bx)\\n', b, b)");
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
matlab_test();
c_test();
}
compiled with LCC
>> prec
[M] a = 2.8930877659523189000000000 (4007250b32d9c886)
[M] b = -0.9692812353565483100000000 (bfef045a14cf738a)
[C] a = 2.8930877659523189000000000 ( 32d9c886)
[C] b = -0.9692812353565484200000000 ( 14cf738b) <---
compiled with VS2010
>> prec
[M] a = 2.8930877659523189000000000 (4007250b32d9c886)
[M] b = -0.9692812353565483100000000 (bfef045a14cf738a)
[C] a = 2.8930877659523189000000000 ( 32d9c886)
[C] b = -0.9692812353565483100000000 ( 14cf738a) <---
I compile the above using: mex -v -largeArrayDims prec.c, and switch between the backend compilers using: mex -setup
Note that I also tried to print the hexadecimal representation of the numbers. I only managed to show the lower half of binary double numbers in C (perhaps you can get the other half using some sort of bit manipulations, but I'm not sure how!)
Finally, if you need more precision in your calculations, consider using a library for variable precision arithmetic. In MATLAB, if you have access to the Symbolic Math Toolbox, try:
>> a = sym('2.89308776595231886830');
>> b = cos(a);
>> vpa(b,25)
ans =
-0.9692812353565483652970737
So you can see that the actual value is somewhere between the two different approximations I got above, and in fact they are all equal up to the 15th decimal place:
-0.96928123535654831.. # 0xbfef045a14cf738a
-0.96928123535654836.. # <--- actual value (cannot be represented in 64-bit)
-0.96928123535654842.. # 0xbfef045a14cf738b
^
15th digit --/
UPDATE:
If you want to correctly display the hexadecimal representation of floating point numbers in C, use this helper function instead (similar to NUM2HEX function in MATLAB):
/* you need to adjust for double/float datatypes, big/little endianness */
void num2hex(double x)
{
unsigned char *p = (unsigned char *) &x;
int i;
for(i=sizeof(double)-1; i>=0; i--) {
printf("%02x", p[i]);
}
}
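If the endianness caveat is a concern, one alternative sketch is to copy the double into a 64-bit integer and print that, which gives the same digit order on big- and little-endian machines. This assumes a 64-bit IEEE 754 double; double_to_hex is a hypothetical name, and it writes into a caller-supplied buffer instead of printing.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the 16 hex digits of x's bit pattern,
   plus a terminating NUL, into out. Assumes double is 64 bits. */
void double_to_hex(double x, char out[17])
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret, no conversion */
    snprintf(out, 17, "%016" PRIx64, bits);   /* zero-padded, lowercase hex */
}
```

For example, 1.0 comes out as 3ff0000000000000.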

How do I compute maximum/minimum of 8 different float values

I need to find the maximum and minimum of 8 float values I get. I did it as follows, but the float comparisons are going awry, as warned by any good C book!
How do I compute the max and min in an accurate way?
main()
{
float mx,mx1,mx2,mx3,mx4,mn,mn1,mn2,mn3,mn4,tm1,tm2;
mx1 = mymax(2.1,2.01); //this returns 2.09999 instead of 2.1 because a is passed as 2.09999.
mx2 = mymax(-3.5,7.000001);
mx3 = mymax(7,5);
mx4 = mymax(7.0000011,0); //this returns incorrectly- 7.000001
tm1 = mymax(mx1,mx2);
tm2 = mymax(mx3,mx4);
mx = mymax(tm1,tm2);
mn1 = mymin(2.1,2.01);
mn2 = mymin(-3.5,7.000001);
mn3 = mymin(7,5);
mn4 = mymin(7.0000011,0);
tm1 = mymin(mn1,mn2);
tm2 = mymin(mn3,mn4);
mn = mymin(tm1,tm2);
printf("Max is %f, Min is %f \n",mx,mn);
getch();
}
float mymax(float a,float b)
{
if(a >= b)
{
return a;
}
else
{
return b;
}
}
float mymin(float a,float b)
{
if(a <= b)
{
return a;
}
else
{
return b;
}
}
How can I do exact comparisons of these floats? This is all C code.
thank you.
-AD.
You are doing exact comparison of these floats. The problem (with your example code at least) is that float simply does not have enough digits of precision to represent the values of your literals sufficiently. 7.000001 and 7.0000011 simply are so close together that the mantissa of a 32 bit float cannot represent them differently.
But the example seems artificial. What is the real problem you're trying to solve? What values will you actually be working with? Or is this just an academic exercise?
The best solution depends on the answer to that. If your actual values just require somewhat more precision than float can provide, use double. If you need exact representation of decimal digits, use a decimal type library. If you want to improve your understanding of how floating point values work, read The Floating-Point Guide.
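The point about the literals can be checked with a small sketch. It assumes float is IEEE 754 single precision, and same_float is a hypothetical helper that rounds two double values to float before comparing them.

```c
#include <stdbool.h>

/* Hypothetical helper: true if a and b round to the same float value. */
bool same_float(double a, double b)
{
    return (float)a == (float)b;
}
```

Under that assumption, same_float(7.000001, 7.0000011) is true, because both literals land on the same float, while 2.1 and 2.01 remain distinct.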
You can do exact comparison of floats. Either directly as floats, or, for non-negative values, by copying their bits into ints.
float a = 1.0f;
float b = 2.0f;
int ia, ib;
memcpy(&ia, &a, sizeof ia); /* needs <string.h>; avoids the aliasing problem of *(int *)&a */
memcpy(&ib, &b, sizeof ib);
/* you can compare a and b, or ia and ib, and the results will be the same
as long as the floats are non-negative and not NaN.
Such floats are ordered the correct way when their bits are considered as int
and thus can be compared (provided that float and int both are 32 bits).
Negative floats are stored as sign and magnitude, so their int order is reversed.
*/
But you will never be able to represent exactly 2.1 as a float.
Your problem is not a problem of comparison, it is a problem of representation of a value.
I'd claim that these comparisons are actually exact, since no value is altered.
The problem is that many float literals can't be represented exactly by IEEE-754 floating point numbers; 2.1 is one example.
If you need an exact representation of decimal fractions, you could, for example, write your own fixed point BCD arithmetic.
Concerning finding min and max at the same time:
A way that needs fewer comparisons: for each index pair (2*i, 2*i+1), first compare the two elements to find the smaller and the larger one (n/2 comparisons).
Then find the minimum of the smaller elements (n/2 - 1 comparisons) and the maximum of the larger elements (n/2 - 1 comparisons).
So, for even n, we get 3*n/2 - 2 comparisons instead of the 2*n - 2 needed when finding the minimum and maximum separately. For n = 8 that is 10 comparisons instead of 14.
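A possible sketch of the pairwise approach is shown below. The function name min_max and its signature are made up for illustration; it requires n >= 1, folds the pair comparison and the min/max updates into one loop, and returns the number of comparisons so the count can be checked.

```c
#include <stddef.h>

/* Hypothetical helper: finds the minimum and maximum of v[0..n-1] by
   comparing elements in pairs. Requires n >= 1. Returns the number of
   float comparisons performed. */
size_t min_max(const float *v, size_t n, float *min_out, float *max_out)
{
    size_t comparisons = 0;
    size_t i;
    float lo, hi;

    if (n % 2 == 0) {
        /* Even n: the first pair initialises both lo and hi. */
        comparisons++;
        if (v[0] < v[1]) { lo = v[0]; hi = v[1]; }
        else             { lo = v[1]; hi = v[0]; }
        i = 2;
    } else {
        /* Odd n: the first element initialises both. */
        lo = hi = v[0];
        i = 1;
    }

    for (; i + 1 < n; i += 2) {
        float small, large;
        /* One comparison orders the pair, two more update lo and hi. */
        comparisons += 3;
        if (v[i] < v[i + 1]) { small = v[i];     large = v[i + 1]; }
        else                 { small = v[i + 1]; large = v[i]; }
        if (small < lo) lo = small;
        if (large > hi) hi = large;
    }

    *min_out = lo;
    *max_out = hi;
    return comparisons;
}
```

For the eight values from the question this performs 10 comparisons, compared with 14 for two separate passes.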
The < and > comparisons always work correctly with floats or doubles. Only the == comparison is problematic, which is why you are advised to compare against an epsilon instead.
So your method of calculating min and max has no issue. Note that if you use float, you should write the literal as 2.1f instead of 2.1. Just a note.
