I'm learning C and am confused because my code seems to evaluate ( 1e16 - 1 >= 1e16 ) as true when it should be false. My code is below; it prints
9999999999999999 INVALIDBIG\n
when I would expect it to print nothing. I thought any problems with large numbers could be avoided by using long long.
#include <stdio.h>

int main(void)
{
    long long z;
    z = 9999999999999999;
    if ( z >= 1e16 || z < 0 )
    {
        printf("%lli INVALIDBIG\n",z);
    }
}
1e16 is a literal of type double, and floats/doubles can be imprecise for decimal arithmetic and comparison (one common example: decimal 0.2 has no exact binary representation). The long long z is converted to double for the comparison, and I'm guessing the standard double representation can't hold the precision needed (maybe someone else can demonstrate the binary mantissa/sign representations).
Try changing the 1e16 to (long double)1e16; it then doesn't print your message, at least on platforms where long double is wider than double. (Update: or, as the other question-commenter added, change 1e16 to an integer literal.)
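A minimal sketch of the integer-literal fix mentioned above. The helper names is_invalid_big and report are made up for illustration; the point is that the limit is an integer constant, so the whole comparison stays in integer arithmetic and no conversion to double takes place:

```c
#include <stdio.h>

/* Hypothetical helper: the limit is an integer constant, so z is never
   converted to double and no rounding can occur. */
int is_invalid_big(long long z)
{
    const long long limit = 10000000000000000LL; /* 10^16 as an integer */
    return z >= limit || z < 0;
}

/* Example use: prints the message only for values that are really out of range. */
void report(long long z)
{
    if (is_invalid_big(z)) {
        printf("%lld INVALIDBIG\n", z);
    }
}
```

With this version, report(9999999999999999LL) prints nothing, which is the behavior the question expected.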
Doubles and floats can hold only a limited number of significant digits. In your case, the double values of 9999999999999999 and 1e16 have identical 8-byte representations. You can check them byte by byte:
long long z = 9999999999999999;
double dz1 = z;
double dz2 = 1e16;
/* prints 0 */
printf("memcmp: %d\n", memcmp(&dz1, &dz2, sizeof(double)));
So, they are equal.
Smaller integers can be stored in double with perfect precision. For example, see Double-precision floating-point format or biggest integer that can be stored in a double
Every integer up to 2^53 (9007199254740992) can be converted to double exactly; beyond that, not every integer is representable.
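To make the limit concrete, here is a small sketch that checks whether a long long lies in the range where every integer is exactly representable as a double. It assumes double is IEEE 754 binary64 (53-bit significand), and the function name is hypothetical. Values outside this range may still happen to be exact (1e16 is), so the check is sufficient, not necessary:

```c
/* Assumes IEEE 754 binary64: every integer with magnitude <= 2^53 is exact. */
int in_exact_double_range(long long z)
{
    const long long limit = 1LL << 53; /* 9007199254740992 */
    return z >= -limit && z <= limit;
}
```

The value from the question, 9999999999999999, falls outside this range, which is why it cannot be trusted to survive the conversion.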
Related
I stumbled on one issue while I was implementing in C the given algorithm:
int getNumberOfAllFactors(int number) {
int counter = 0;
double sqrt_num = sqrt(number);
for (int i = 1; i <= sqrt_num; i++) {
if ( number % i == 0) {
counter = counter + 2;
}
}
if (number == sqrt_num * sqrt_num)
counter--;
return counter;
}
The reason for the second condition is to correct for perfect squares (e.g. 36 = 6 * 6); however, it does not avoid false positives like this one:
sqrt(325) = 18.027756377319946
18.027756377319946 * 18.027756377319946 = 325.0
So my questions are: how can I avoid this, and what is the best way in C to find out whether a double has any digits after the decimal point? Should I cast the square root from double to an integer?
In your case, you could test it like this:
if (sqrt_num == (int)sqrt_num)
You should probably use the modf() family of functions:
#include <math.h>
double modf(double value, double *iptr);
The modf functions break the argument value into integral and fractional parts, each of
which has the same type and sign as the argument. They store the integral part (in
floating-point format) in the object pointed to by iptr.
This is more reliable than trying to use direct conversions to int because an int is typically a 32-bit number and a double can usually store far larger integer values (up to 53 bits worth) so you can run into errors unnecessarily. If you decide you must use a conversion to int and are working with double values, at least use long long for the conversion rather than int.
(The other members of the family are modff() which handles float and modfl() which handles long double.)
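For the factor-counting function itself, one way to sidestep the floating-point question is to avoid sqrt() altogether. The following is a sketch of that alternative (it is not the modf() approach described above), assuming number is positive; it compares i * i with number in integer arithmetic:

```c
/* Counts all divisors of a positive number using integer arithmetic only. */
int getNumberOfAllFactors(int number)
{
    int counter = 0;
    /* long long avoids overflow of i * i for large int values */
    for (long long i = 1; i * i <= number; i++) {
        if (number % i == 0) {
            /* i and number / i are both divisors; count a square root once */
            counter += (i * i == number) ? 1 : 2;
        }
    }
    return counter;
}
```

Because no double is involved, a non-square input can never be mistaken for a perfect square.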
Can two floating point values (IEEE 754 binary64) be compared as integers? Eg.
long long a = * (long long *) ptr_to_double1,
b = * (long long *) ptr_to_double2;
if (a < b) {...}
assuming the size of long long and double is the same.
YES - Comparing the bit-patterns for two floats as if they were integers (aka "type-punning") produces meaningful results under some restricted scenarios...
Identical to floating-point comparison when:
Both numbers are positive, positive-zero, or positive-infinity.
One positive and one negative number, and you are using a signed integer comparison.
Inverse of floating-point comparison when:
Both numbers are negative, negative-zero, or negative-infinity.
One positive and one negative number, and you are using an unsigned integer comparison.
Not comparable to floating-point comparison when:
Either number is one of the NaN values - Ordered floating-point comparisons (<, <=, ==, >=, >) with a NaN always return false, and this simply can't be modeled in integer operations where exactly one of the following is always true: (A < B), (A == B), (B < A).
Negative floating-point numbers are a bit funky b/c they are handled very differently than in the 2's complement arithmetic used for integers. Doing an integer +1 on the representation for a negative float will make it a bigger negative number.
With a little bit manipulation, you can make both positive and negative floats comparable with integer operations (this can come in handy for some optimizations):
#include <bit>      // std::bit_cast (C++20)
#include <cstdint>
std::int32_t float_to_comparable_integer(float f) {
    std::uint32_t bits = std::bit_cast<std::uint32_t>(f);
    const std::uint32_t sign_bit = bits & 0x80000000u;
    // Compilers typically turn this if-statement into a conditional move (CMOV) on x86,
    // which avoids a branch that the CPU might mispredict.
    if (sign_bit) {
        // Reverse the order of negative values; -0.0 maps to -1, just below +0.0.
        bits = 0x7FFFFFFFu - bits;
    }
    return static_cast<std::int32_t>(bits);
}
Again, this does not work for NaN values, which always return false from comparisons, and have multiple valid bit representations:
Signaling NaNs (w/ sign bit): Anything between 0xFF800001, and 0xFFBFFFFF.
Signaling NaNs (w/o sign bit): Anything between 0x7F800001, and 0x7FBFFFFF.
Quiet NaNs (w/ sign bit): Anything between 0xFFC00000, and 0xFFFFFFFF.
Quiet NaNs (w/o sign bit): Anything between 0x7FC00000, and 0x7FFFFFFF.
IEEE-754 bit format: http://www.puntoflotante.net/FLOATING-POINT-FORMAT-IEEE-754.htm
More on Type-Punning: https://randomascii.wordpress.com/2012/01/23/stupid-float-tricks-2/
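The same idea can be sketched in C for double. This assumes IEEE 754 binary64 with the same byte order for double and 64-bit integers, and it uses memcpy instead of a pointer cast to avoid aliasing problems. The function name is hypothetical, and NaN inputs are still not handled:

```c
#include <stdint.h>
#include <string.h>

/* Maps a non-NaN double to an integer key such that a < b implies
   key(a) < key(b). Both zeros map to key 0. */
int64_t double_to_comparable_integer(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* copy the raw bit pattern */

    const uint64_t sign_bit = UINT64_C(0x8000000000000000);
    if (bits & sign_bit) {
        /* negative: a larger magnitude must give a smaller key */
        return -(int64_t)(bits & ~sign_bit);
    }
    return (int64_t)bits;
}
```

Unlike the float version above, this variant maps -0.0 and +0.0 to the same key, which matches how the two zeros compare as floating-point values.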
No. Two floating point values (IEEE 754 binary64) cannot compare simply as integers with if (a < b).
IEEE 754 binary64
The order of double values is not the same as the order of integers (unless you are on a rare sign-magnitude machine). Think positive vs. negative numbers.
double has values like 0.0 and -0.0 which have the same value but different bit patterns.
double has "Not-a-number"s that do not compare like their binary equivalent integer representation.
If both double values were positive and not "Not-a-number", and endianness, aliasing, alignment, etc. were not an issue, OP's idea would work.
Alternatively, a more complex if() ... condition would work - see below
[non-IEEE 754 binary64]
Some double use an encoding where there are multiple representations of the same value. This would differ from an "integer" compare.
Tested code: needs 2's complement, same endian for double and the integers, does not account for NaN.
#include <stdint.h>
int compare(double a, double b) {
union {
double d;
int64_t i64;
uint64_t u64;
} ua, ub;
ua.d = a;
ub.d = b;
// Cope with -0.0 right away
if (ua.u64 == 0x8000000000000000) ua.u64 = 0;
if (ub.u64 == 0x8000000000000000) ub.u64 = 0;
// Signs differ?
if ((ua.i64 < 0) != (ub.i64 < 0)) {
return ua.i64 >= 0 ? 1 : -1;
}
// If numbers are negative
if (ua.i64 < 0) {
ua.u64 = -ua.u64;
ub.u64 = -ub.u64;
}
return (ua.u64 > ub.u64) - (ua.u64 < ub.u64);
}
Thanks to @David C. Rankin for a correction.
Test code
void testcmp(double a, double b) {
int t1 = (a > b) - (a < b);
int t2 = compare(a, b);
if (t1 != t2) {
printf("%le %le %d %d\n", a, b, t1, t2);
}
}
#include <float.h>
void testcmps() {
// Various interesting `double`
static const double a[] = {
-1.0 / 0.0, -DBL_MAX, -1.0, -DBL_MIN, -0.0,
+0.0, DBL_MIN, 1.0, DBL_MAX, +1.0 / 0.0 };
int n = sizeof a / sizeof a[0];
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
testcmp(a[i], a[j]);
}
}
puts("!");
}
If you strictly cast the bit value of a floating point number to its correspondingly-sized signed integer (as you've done), then signed integer comparison of the results will match the comparison of the original floating-point values only when neither value is NaN and the two values are not both negative. The sign-magnitude encoding makes two negative values compare in reverse order, and -0.0 compares below +0.0. Put another way, this comparison is legitimate for non-negative finite and infinite values, and for pairs with opposite signs.
In other words, for double-precision (64-bits), this comparison will be valid if the following tests pass:
long long exponentMask = 0x7ff0000000000000;
long long mantissaMask = 0x000fffffffffffff;
bool isNumber = ((x & exponentMask) != exponentMask) // Not exp 0x7ff
|| ((x & mantissaMask) == 0); // Infinities
for each operand x.
Of course, if you can pre-qualify your floating-point values, then a quick isnan() test (from <math.h>) would be much clearer. You'd have to profile to understand the performance implications.
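Wrapped into a function, the bit test above might look like this. It is only a sketch assuming IEEE 754 binary64; the function name is made up, and it takes the raw bits as an unsigned 64-bit value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true unless the bit pattern encodes a NaN
   (exponent all ones and a non-zero mantissa). */
bool bits_are_number(uint64_t x)
{
    const uint64_t exponentMask = UINT64_C(0x7ff0000000000000);
    const uint64_t mantissaMask = UINT64_C(0x000fffffffffffff);
    return ((x & exponentMask) != exponentMask)  /* exponent is not 0x7ff */
        || ((x & mantissaMask) == 0);            /* infinities */
}
```

Both infinities pass the test, while every quiet or signaling NaN pattern fails it.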
There are two parts to your question:
Can two floating point numbers be compared? The answer to this is yes. It is perfectly valid to compare the magnitudes of floating point numbers. Generally you want to avoid equality comparisons because of rounding issues, but
if (a < b)
will work just fine.
Can two floating point numbers be compared as integers? This answer is also yes, but this will require casting. This question should help with that answer: convert from long long to int and the other way back in c++
I'm working with big numbers: 241, 233, and 662581978748022. I want to find out whether 662581978748022/241/233 is a whole number or has a fractional part. All of them are long long int. If I write
double var = 662581978748022/241/233;
it still gives a whole number (e.g. xxx.0000) even when the true quotient is not whole. Because of that, comparing var with
long long int var2 = 662581978748022/241/233;
is still true when it shouldn't be. How else can I find out whether the result is a whole number?
When you write x = integer-number (operator) integer-number, the right-hand side is computed as an integer, whatever the type of x.
Example:
double x;
x = 3/2;
Now x is "1.000…". Because 3 is integer, 2 is integer, so the operation is performed as integer. It is then converted to double for the =.
If you want your operation to be performed as float/double, you must cast at least one of the operands on the right-hand side:
double x;
x = (double)3/2;
Now x is "1.5".
So your double var=662581978748022/241/233 (it is the same if numbers are integer variables) is computed as an integer value.
As said by @Kevin, use modulo (%) to get the remainder, or use floating-point values.
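A short sketch of the modulo approach, staying entirely in integer arithmetic. The function name is hypothetical, and the divisors are assumed to be non-zero:

```c
#include <stdbool.h>

/* True if n / a / b has no fractional part. Dividing by a and then by b
   is exact only when each step leaves no remainder. */
bool divides_evenly(long long n, long long a, long long b)
{
    if (n % a != 0) {
        return false;
    }
    return (n / a) % b == 0;
}
```

For the numbers in the question, this would be called as divides_evenly(662581978748022LL, 241, 233).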
In C, I found a weird problem that runs counter to my intuition. When I declare an integer as INT_MAX (2147483647, defined in limits.h) and implicitly convert it to a float, it seems to work fine, i.e., the float value appears to be the same as the maximum integer. But when I convert the float back to an integer, something interesting happens: the new integer becomes the minimum integer (-2147483648).
The source code looks like this:
int a = INT_MAX;
float b = a; // b is correct
int a_new = b; // a_new becomes INT_MIN
I am not sure what happens when the float b is converted to the integer a_new. So, is there a reasonable way to find the maximum value that can be converted back and forth between the integer and float types?
PS: The value of INT_MAX - 100 works fine, but this is just an arbitrary workaround.
This answer assumes that float is an IEEE-754 single precision float encoded as 32-bits, and that an int is 32-bits. See this Wikipedia article for more information about IEEE-754.
A float has only 24 bits of precision, compared with 32 bits for an int. Therefore int values from 0 to 16777215 have an exact representation as floating point numbers, but numbers greater than 16777215 do not necessarily have exact representations as floats. The following code demonstrates this fact (on systems that use IEEE-754).
for ( int a = 16777210; a < 16777224; a++ )
{
float b = a;
int c = b;
printf( "a=%d c=%d b=0x%08x\n", a, c, *((int*)&b) );
}
The expected output is
a=16777210 c=16777210 b=0x4b7ffffa
a=16777211 c=16777211 b=0x4b7ffffb
a=16777212 c=16777212 b=0x4b7ffffc
a=16777213 c=16777213 b=0x4b7ffffd
a=16777214 c=16777214 b=0x4b7ffffe
a=16777215 c=16777215 b=0x4b7fffff
a=16777216 c=16777216 b=0x4b800000
a=16777217 c=16777216 b=0x4b800000
a=16777218 c=16777218 b=0x4b800001
a=16777219 c=16777220 b=0x4b800002
a=16777220 c=16777220 b=0x4b800002
a=16777221 c=16777220 b=0x4b800002
a=16777222 c=16777222 b=0x4b800003
a=16777223 c=16777224 b=0x4b800004
Of interest here is that the float value 0x4b800002 is used to represent the three int values 16777219, 16777220, and 16777221, and thus converting 16777219 to a float and back to an int does not preserve the exact value of the int.
The two floating point values that are closest to INT_MAX are 2147483520 and 2147483648, which can be demonstrated with this code
for ( int a = 2147483520; a < 2147483647; a++ )
{
float b = a;
int c = b;
printf( "a=%d c=%d b=0x%08x\n", a, c, *((int*)&b) );
}
The interesting parts of the output are
a=2147483520 c=2147483520 b=0x4effffff
a=2147483521 c=2147483520 b=0x4effffff
...
a=2147483582 c=2147483520 b=0x4effffff
a=2147483583 c=2147483520 b=0x4effffff
a=2147483584 c=-2147483648 b=0x4f000000
a=2147483585 c=-2147483648 b=0x4f000000
...
a=2147483645 c=-2147483648 b=0x4f000000
a=2147483646 c=-2147483648 b=0x4f000000
Note that all 32-bit int values from 2147483584 to 2147483647 are rounded up to a float value of 2147483648. That value is out of range for a 32-bit int, so converting it back is undefined behavior (here it happened to produce INT_MIN). The largest int value that rounds down is 2147483583, which is the same as (INT_MAX - 64) when int is 32 bits.
One might conclude therefore that numbers at or below (INT_MAX - 64) will safely convert from int to float and back to int without overflowing, although not necessarily to the same value. But that is only true on systems where an int is 32 bits and a float is encoded per IEEE-754.
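A sketch of a round-trip check that avoids the out-of-range conversion, assuming a 32-bit int and an IEEE-754 single-precision float as in the rest of this answer (the function name is hypothetical):

```c
#include <stdbool.h>

/* True if a survives int -> float -> int unchanged.
   Assumes 32-bit int and IEEE-754 binary32 float. */
bool int_round_trips_through_float(int a)
{
    float f = (float)a;
    /* 2^31 does not fit in a 32-bit int; converting it back would be
       undefined behavior, so reject it before the cast. */
    if (f >= 2147483648.0f) {
        return false;
    }
    return (int)f == a;
}
```

The range check comes before the cast on purpose: once the float holds 2147483648, the conversion back to int can no longer be relied on.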
Something odd is occurring in my C code below.
I want to compare numbers and I round to 4 decimal places.
I have debugged and can see the data being passed in.
The value of tmp_ptr->current_longitude is 6722.31500000, and the value of tmp_ptr->current_latitude is 930.0876500000.
After using the sprintf statements:
charTmpPtrXPos = "6722.3150" and charTmpPtrYPos = "930.0876".
I expect the exact same results for speed_info->myXPos and speed_info->myYPos, but strangely, even though speed_info->myXPos = 6722.31500000 and speed_info->myYPos = 930.0876500000, the sprintf statements give
charSpeedPtrYPos = "930.0877"
So basically the sprintf statement behaves differently for the second value and appears to round it up. Having debugged this I know the input to the sprintf statement is exactly the same.
Can anyone think of a reason for this?
sizeOfSpeedList = op_prg_list_size (global_speed_trajectory);
tmp_ptr= (WsqT_Location_Message*)op_prg_mem_alloc(sizeof(WsqT_Location_Message));
tmp_ptr = mbls_convert_lat_long_to_xy (own_node_objid);
sprintf(charTmpPtrXPos, "%0.4lf", tmp_ptr->current_longitude);
sprintf(charTmpPtrYPos, "%0.4lf", tmp_ptr->current_latitude);
speed_info = (SpeedInformation *) op_prg_mem_alloc (sizeof (SpeedInformation));
for (count=0; count<sizeOfSpeedList; count++)
{
speed_info = (SpeedInformation*) op_prg_list_access (global_speed_trajectory, count);
sprintf(charSpeedPtrXPos, "%0.4lf", speed_info->myXPos);
sprintf(charSpeedPtrYPos, "%0.4lf", speed_info->myYPos);
//if((tmp_ptr->current_longitude == speed_info->myXPos) && (tmp_ptr->current_latitude == speed_info->myYPos))
if ((strcmp(charTmpPtrXPos, charSpeedPtrXPos) == 0) && (strcmp(charTmpPtrYPos, charSpeedPtrYPos) == 0))
{
my_speed = speed_info->speed;
break;
}
}
printf() typically rounds to the nearest decimal representation, with ties sent to the “even” one (that is, the representation whose last digit is 0, 2, 4, 6, or 8).
However, you must understand that most numbers that are finitely representable in decimal are not finitely representable in binary floating-point. The real number 930.08765, for instance, is not representable as binary floating-point. What you really have as a double value (and what is converted to decimal) is another number, likely slightly above 930.08765, in all likelihood 930.0876500000000532963895238935947418212890625. It is normal for this number to be rounded to the decimal representation 930.0877 since it is closer to this representation than to 930.0876.
Note that if you are using Visual Studio, your *printf() functions may be limited to showing 17 significant digits, preventing you from observing the exact value of the double nearest 930.08765.
It is the difference between a float and a double.
OP is likely using tmp_ptr->current_latitude as a float and speed_info->myYPos as a double. Suggest OP use double throughout unless space/speed oblige the use of float.
#include <stdio.h>
int main(void) {
    float f1 = 930.08765;
    double d1 = 930.08765;
    printf("float %0.4f\ndouble %0.4f\n", f1, d1);
    return 0;
}
float 930.0876
double 930.0877
As the typical float uses an IEEE 4-byte binary floating point representation, f1 takes on the exact value of
930.087646484375
930.0876 (This is the 4 digit printed result)
This is the closest float value to 930.08765.
Likewise, for a double, d1 takes on the exact value of
930.0876500000000532963895238935947418212890625
930.0877 (This is the 4 digit printed result)
In general, one could use more decimal places to reduce the likelihood of this happening with other numbers, but not eliminate it.
Candidate quick fix
sprintf(charSpeedPtrYPos, "%0.4lf", (float) speed_info->myYPos);
This would first convert the value of speed_info->myYPos to a float. Because float arguments passed to a variadic function such as sprintf() are promoted to double, the value is then converted back to double. The net result is a loss of precision in the number, but the same string conversion results.
printf("(float) double %0.4f\n", (float) d1);
// (float) double 930.0876
BTW: The l in "%0.4lf" serves no code generation purpose. It is allowed though.
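If the goal is only to decide whether two coordinates agree to about four decimal places, an alternative to comparing formatted strings is to compare the absolute difference against a tolerance. This is a sketch of that swapped-in technique, not what the question's code does; the function name and the choice of tolerance are assumptions:

```c
#include <stdbool.h>

/* True if a and b differ by less than tol. */
bool nearly_equal(double a, double b, double tol)
{
    double diff = a - b;
    if (diff < 0) {
        diff = -diff;
    }
    return diff < tol;
}
```

With a tolerance such as 0.00005, the float and double versions of 930.08765 compare as equal even though they format differently with %0.4f.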
See: http://www.cplusplus.com/reference/cstdio/printf/
A dot followed by a number specifies the precision, which is 4 in your case. Try a higher precision if you need it. I tried your number with a precision of 5 and it isn't rounded up anymore. So it should be:
sprintf(charSpeedPtrYPos, "%0.5lf", speed_info->myYPos);
..or any higher number which fits your needs.