Since there are finitely many floating point numbers and one can compare each possible pair of such numbers (I assume), there must always exist a number 'b' which is
smaller than some given number 'a' (not +/- infinity) and
there exists no number 'c' smaller than 'a' and greater than 'b';
i.e. the 'next' smaller floating-point-represented number. I wonder if:
there is a function smaller(float a) returning such number b (or greater(float a) for that matter) in the C programming language
if not, then if there is a way to obtain these 'next' numbers for certain types of numbers 'a', for example if 'a' is an integer/zero.
Trying
float smaller(float a) return a - 0.00...001f;
seems to me like a hack that probably doesn't work for all possible inputs, but I might be wrong, so that's why I'm turning to you guys. Any help is appreciated.
Indeed there is. You're after the "nextafter" family of functions.
These can be used to move from one floating point number to the next, much in the same way as you can use ++ and -- for integral types.
See https://en.cppreference.com/w/c/numeric/math/nextafter
(This is C documentation).
The C99/POSIX functions nextafter/nexttoward can do this. You provide a start value x and a destination value y, and they return the next value from the start in the direction of the destination.
Also, if your language does not have the nextafter family of functions, but does let you treat values stored in memory as integers (by pointer casting or other, dirtier tricks), then, for any floating-point type (double, float, half, ...) that conforms to IEEE 754, if you want to find the next larger number than value, you can do
FLOATING value = ...;
if (value >= 0) {
    integer_increment(value);
} else {
    integer_decrement(value);
}
and vice versa for the next smaller number, where integer_increment increments the value of value as if value was of an integral type.
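In C, one way to sketch this bit-pattern trick for float is shown here. It assumes that float is a 32-bit IEEE 754 type and that the input is finite. The name next_larger is hypothetical, and the explicit handling of -0.0 is an addition that the pseudocode above does not have (without it, incrementing the bits of -0.0 would step downwards).

```c
#include <stdint.h>
#include <string.h>

/* Next larger float, found by stepping the bit pattern.
   Assumes IEEE 754 binary32 and a finite input (no NaN, no infinity). */
float next_larger(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret without pointer casts */

    if (bits == 0x80000000u) {
        bits = 0;                         /* treat -0.0 like +0.0 */
    }
    if (bits & 0x80000000u) {
        bits--;                           /* negative: smaller magnitude is a larger value */
    } else {
        bits++;                           /* non-negative: larger magnitude is a larger value */
    }

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Using memcpy rather than a pointer cast avoids strict-aliasing problems; where nextafterf is available it remains the more portable choice.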
Related
I've just implemented a line of code, where two numbers need to be divided and the result needs to be rounded up to the next integer number. I started very naïvely:
i_quotient = ceil(a/b);
As the numbers a and b are both integer numbers, this did not work: the division gets executed as an integer division, which is rounding down by default, so I need to force the division to be a floating point operation:
i_quotient = ceil((double) a / b);
Now this seems to work, but it leaves a warning saying that I am trying to assign a double to an integer, and indeed, following the header file "math.h" the return type of the ceil() function is "double", and now I'm lost: what's the sense of a rounding function to return a double? Can anybody enlighten me about this?
A double has a range that can be greater than any integer type.
Returning double is the only way to ensure that the result type has a range that can handle all possible input.
ceil() takes a double as an argument. So, if it were to return an integer, what integer type would you choose that can still represent its ceiled value?
Whatever may be the type, it should be able to represent all possible double values.
The integer type that can hold the highest possible value is uintmax_t.
But that doesn't guarantee it can hold all double values, even if in some implementations it can.
So, it makes sense to return a double value for ceil(). If an integer value is needed, then the caller can always cast it to the desired integer type.
OP starts with two integers a,b and questions why a function double ceil(double) that takes a double, does not return some integer type.
Most floating-point math functions take floating point arguments and return the same type.
A big reason double ceil(double) does not return an integer type is that such limited functionality is rarely needed. Integer types have (or almost always have) a more limited range than double. ceil(DBL_MAX) is not expected to fit in an integer type.
There is little need to use double math to solve an integer problem.
If code needs to divide integers and round up the quotient, use the following. Ref:#mch
i_quotient = (a + b - 1) / b;
The above will handle most of OP's cases when a >= 0 and b > 0. Other considerations are needed when a or b are negative or if a + b - 1 may overflow.
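A small sketch of both options, under the same restriction a >= 0 and b > 0. The function names are hypothetical; the second variant is an alternative that avoids the a + b - 1 overflow by using the remainder instead.

```c
/* Rounded-up quotient; requires a >= 0, b > 0 and a + b - 1 <= INT_MAX. */
int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Same result for a >= 0, b > 0, but never forms a + b - 1,
   so it cannot overflow for large a. */
int ceil_div_no_overflow(int a, int b)
{
    return a / b + (a % b != 0);
}
```

For instance, both return 4 for a = 7, b = 2 and also 4 for a = 8, b = 2.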
Because why should it? Converting between int and double takes time. This overhead can become significant. If you want to convert a double to int do so explicitly:
i_quotient = (int)ceil((double) a / b);
Check this answer if you want to know more about this latency. You have to consider that C is quite old and achievable performance was one of the top priorities. But even C# and other modern languages usually return a floating value for ceil just for consistency.
Leaving technical discussions apart, couldn't be simply for consistency?
If the function takes a double it should return a result of the same type, if there's no particular reasons to return a different type.
It's up to the user to transform it to an integer if he needs to.
After all you may be working only with doubles in your application.
Although ceil means rounding up to the next whole number, that doesn't strictly mean the result has to be of an integer type. Obviously an integer is a whole number, but that shouldn't prejudice us.
I'm new to C, and I'm having such a hard time understanding this material. I really need help! Please someone help.
In arithmetic, the sum of any two positive integers is greater than either:
(n+m) > n for n, m > 0
(n+m) > m for n, m > 0
C has an addition operator +. Does this arithmetic rule hold in C?
I know this is false. But can someone please explain why, so I can understand it? Could you please provide a counter-example?
Thank you in advance.
(I won't solve this for you, but will provide some pointers.)
It is false for both integer and floating-point arithmetic, for different reasons.
Integers are susceptible to overflow.
Adding a very small floating-point number m to a very large number n returns n. Have a read of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
It doesn't hold, since C's integers are not the "abstract" infinitely-sized integers that the real integers (in mathematics) are.
In C, integers are discrete and digital, and implemented using a fixed number of bits. This leads to a limited range, and problems when you go (or try to go) out of range. Unsigned integers wrap around, which is very "un-natural"; signed overflow is undefined behaviour, although in practice it typically wraps too.
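A concrete counter-example can be sketched with unsigned int, where out-of-range results are defined to wrap around (signed overflow is undefined behaviour, so it makes a poor demonstration). The helper name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Tests the rule (n + m) > n using unsigned int arithmetic. */
bool sum_exceeds_first(unsigned int n, unsigned int m)
{
    return n + m > n;
}
```

sum_exceeds_first(1u, 2u) is true as expected, but sum_exceeds_first(UINT_MAX, 1u) is false: UINT_MAX + 1u wraps to 0, which is not greater than UINT_MAX.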
A brief search did not turn up nice answers describing these, so I'd rather attempt to answer this nicely here, for beginners.
The answer is false, of course, but why so?
Integers
In C, or any programming language providing some kind of integer type, this type does not mean it in the mathematical sense. In mathematical sense non-negative integers range from 0 to infinity. A computer, however has only limited storage, so integers necessarily are constrained to something less than infinity.
This alone proves that a + b > a and a + b > b can not be true all the time, since it can be set up so that both a and b are less than the largest number the computer can represent in its storage, but a + b is larger than that.
What exactly happens here depends. Some mentioned wraparound, but that's not necessarily the case. The C language defines signed integer overflow as undefined behaviour in the first place (unsigned arithmetic, by contrast, is defined to wrap around), meaning that anything, including fire and smoke, may happen if the code happens to step on it (of course in reality that won't happen, but interpreting the standard strictly it could, as well as a breach of the space-time continuum).
I won't describe how wraparound works here since it is beyond the scope of the problem itself.
Floating point
The case here is again just the same like for integers: the key to understand why mathematics don't fully apply here is that the computer has a limited storage.
Floating numbers in the computers memory are represented much like scientific notation: a mantissa, and an exponent. Both of these have a fixed limited range depending on the type of the floating point variable.
In base 10, you may conceive this like you have the exponent ranging from 10 ^ -10 to 10 ^ 10, and the mantissa having like 4 fraction digits after the decimal point, always normalized.
With this in mind, check these example additions:
1.2345 * (10 ^ 0) + 1.0237 * (10 ^ 5)
5.2345 * (10 ^ 10) + 6.7891 * (10 ^ 10)
The first is an example where the result will equal one of the input numbers while both were larger than zero. The second is an example where the result is out of range.
The floating point representation computers use can, however, represent infinity, and two of them at that: positive infinity and negative infinity. So while the first example passes as a proof, the second does not, since that addition's result is positive infinity.
However, with this in mind, you could produce another example that serves as proof:
3.1416 * (10 ^ 0) + (+ infinity)
Of course the result is positive infinity, no matter what you add it to. And of course positive infinity is not larger than positive infinity, so proved again.
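The same two counter-examples can be sketched in C with float rather than the toy base-10 format above. The helper name and the concrete values are assumptions chosen for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Tests the rule (n + m) > n using float arithmetic. */
bool sum_exceeds_first_f(float n, float m)
{
    float sum = n + m;   /* storing into a float drops any extra precision */
    return sum > n;
}
```

sum_exceeds_first_f(1.0e20f, 1.0f) is false because 1.0f is far below the spacing between floats near 1.0e20f, so the sum rounds back to 1.0e20f. sum_exceeds_first_f(INFINITY, 3.1416f) is false as well, since infinity is not greater than infinity.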
I'm trying to get the user to input a number between 0.00001 and 1.00000, edges not included, into a float variable. I can assume that the user isn't typing more than 5 digits after the dot.
now, here is what I have written:
printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
scanf("%f", &gap);
while ((gap < 0.00002) || (gap > 0.99999))
{
printf("Enter required Leibniz gap.(Between 0.00001 to 1.00000)\n");
scanf("%f", &gap);
}
Now, when I type the smallest number possible, 0.00002, I'm getting stuck in the while loop.
When I ran the debugger I saw that 0.00002 is stored with this value in the float variable: 1.99999995e-005
Can anybody clarify what I am doing wrong? Why isn't 0.00002 meeting the conditions? What is this "1.99999995e-005" thing?
The problem here is that you are using a float variable (gap), but you are comparing it with a double constant (0.00002). The constant is double because floating-point constants in C are double unless otherwise specified.
An underlying issue is that the number 0.00002 is not representable in either float or double. (It's not representable at all in binary floating point because its binary expansion is infinitely long, like the decimal expansion of ⅓.) So when you write 0.00002 in a program, the C compiler substitutes it with a double value which is very close to 0.00002. Similarly, when scanf reads the number 0.00002 into a float variable, it substitutes a float value which is very close to 0.00002. Since double numbers have more bits than floats, the double value is closer to 0.00002 than the float value.
When you compare two floating point values with different precision, the compiler converts the value with less precision into exactly the same value with more precision. (The set of values representable as double is a superset of the set of values representable as float, so it is always possible to find a double whose value is the same as the value of a float.) And that's what happens when gap < 0.00002 is executed: gap is converted to the double of the same value, and that is compared with the double (close to) 0.00002. Since both of these values are actually slightly less than 0.00002, and the double is closer, the float is less than the double.
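The effect can be reproduced with two tiny helpers (hypothetical names), assuming the compiler converts constants to the nearest representable value, as gcc does:

```c
#include <stdbool.h>

/* Compares against a double constant: gap is promoted to double first. */
bool below_double_constant(float gap)
{
    return gap < 0.00002;
}

/* Compares against a float constant: no promotion takes place. */
bool below_float_constant(float gap)
{
    return gap < 0.00002F;
}
```

With float gap = 0.00002F, below_double_constant(gap) is true (the float is slightly below the double), while below_float_constant(gap) is false because both sides are the same float.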
You can solve this problem in a couple of ways. First, you can avoid the conversion, either by making gap a double and changing the scanf format to %lf, or by comparing gap to a float:
while (gap < 0.00002F || gap > 0.99999F) {
But that's not really correct, for a couple of reasons. First, there is actually no guarantee that the floating point conversion done by the C compiler is the same as the conversion done by the standard library (scanf), and the standard allows the compiler to use "either the nearest representable value, or the larger or smaller representable value immediately adjacent to the nearest representable value, chosen in an implementation-defined manner." (It doesn't specify in detail which value scanf produces either, but recommends that it be the nearest representable value.) As it happens, gcc and glibc (the C compiler and standard library used on Linux) both produce the nearest representable value, but other implementations might not.
Anyway, according to your error message, you want the value to be between 0.00001 and 1.00000. So your test should be precisely that:
while (gap <= 0.00001F || gap >= 1.0000F) { ...
(assuming you keep gap as a float.)
Any of the above solutions will work. Personally, I'd make gap a double in order to make the comparison more intuitive, and also change the comparison to compare against 0.00001 and 1.0000.
By the way, the E-05 suffix means "times ten to the power of -5" (the E stands for Exponent). You'll see that a lot; it's a standard way of writing floating point constants.
floats cannot store exact values for every possible number (there are infinitely many numbers between 0 and 1, so that is impossible). Assigning 0.00002 to a float stores a different but very close number, which is what you are experiencing. Absolute precision decreases as the magnitude of the number grows.
So you can't directly compare two close floats and expect reliable results.
More information on floating points can be found on this Wikipedia page.
What you could do is emulate fixed point math. Have an int n = 100000; to represent 1.00000 internally (1000 -> 0.001 and such) and do calculations accordingly or use a fixed point math library.
A single-precision number stores its significand with 23 fraction bits: the significand lies between 1 and 2-2^-23 and has a quantization step of 2^-23. If a value cannot be represented with that step, the nearest representable value is used instead, according to the IEEE 754 rounding rules:
0.00002*65536 = 1.31072 // exponent 2^-16 is chosen so the significand is in [1, 2)
1.31072/(2^-23) = 10995116.28 // is not an integer multiple
// of the quantization step, so the
10995116*(2^-23) = 1.310719967 // nearest value is chosen
10995117*(2^-23) = 1.310720086 // from these two variants
Scaled back by 2^-16, the first variant equals 1.999999949×10⁻⁵ in decimal format and the second one equals 2.000000131×10⁻⁵. The first one is nearer to 0.00002, so it is chosen as the result (it is the 1.99999995e-005 shown by the debugger), and its value is not equal to 0.00002.
The total precision of a floating-point number depends on the exponent as well (it ranges from -126 to 127 for normalized values): it is the significand's quantization step multiplied by the power of two given by the exponent. For 0.00002 the total precision is (2^-23)×(2^-16) ≈ 1.8×(10^-12). This means that if we add to 0.00002 a value smaller than half of this step, then 0.00002 remains the same. In general, the meaningful bits of a floating-point number span from 1×2^exponent down to (2^-23)×2^exponent.
That is why a very popular approach is to compare two floating numbers using some epsilon value which is greater than quantization step.
Like some of the comments said, due to how floating point numbers are represented, you will see errors like this.
A solution to this is to change the comparison to
gap + 1e-8 < 0.00002
This gives you a small window of tolerance, enough to let most of the cases you want through and to reject most of those you don't.
float a;
a=8.3;
if(a==8.3)
printf("1");
else
printf("2");
When I assign 8.3 or 8.4 to a and compare it with 8.3 or 8.4 respectively, the output is 2, but when I do the same with 8.5 the output is 1. I found that this is related to the concept of a recurring binary, which takes 8 bytes. I want to know how to find out which numbers are recurring in binary. Kindly give some input.
Recurring numbers are not representable, hence floating point comparison will not work.
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Also, as the second comment says, the floating point literal 8.3 has type double while a has type float.
Comparing with epsilon – absolute error
Since floating point calculations involve a bit of uncertainty we can try to allow for this by seeing if two numbers are ‘close’ to each other. If you decide – based on error analysis, testing, or a wild guess – that the result should always be within 0.00001 of the expected result then you can change your comparison to this:
if (fabs(result - expectedResult) < 0.00001)
For example, 3/7 is a repeating binary fraction, so its value computed in double precision differs from its value stored in single precision. Thus comparing 3/7 with its stored single-precision value fails.
For more please read - What Every Computer Scientist Should Know About Floating-Point Arithmetic
You should not compare floating point numbers for equality using ==. Because of how floating point numbers are actually stored in memory it will give inaccurate results.
Use something like this to determine if your number a is close enough to the desired value:
if (fabs(a - 8.3) < 0.0000005)
There are two problems here.
First is that floating point literals like 8.3 have type double, while a has type float. Doubles and floats store values to different precisions, and for values that don't have an exact floating point representation (such as 8.3), the stored values are slightly different. Thus, the comparison fails.
You could fix this by writing the comparison as a==8.3f; the f suffix forces the literal to be a float instead of a double.
However, it's bad juju to compare floating point values directly; again, most values cannot be represented exactly, but only to an approximation. If a were the result of an expression involving multiple floating-point calculations, it may not be equivalent to 8.3f. Ideally, you should look at the difference between the two values, and if it's less than some threshold, then they are effectively equivalent:
if ( fabs( a - 8.3f) < EPSILON )
{
// a is "equal enough" to 8.3
}
The exact value of EPSILON depends on a number of factors, not least of which is the magnitude of the values being compared. You only have so many digits of precision, so if the values you're trying to compare are greater than 999999.0, then you can't test for differences within 0.000001 of each other.
I need a C rounding function which rounds numbers like MATLAB's round function. Is there one? If you don't know how MATLAB's round function works see this link:
MATLAB round function
I was thinking I might just write my own simple round function to match MATLAB's functionality.
Thanks,
DemiSheep
This sounds similar to the round() function from math.h
These functions shall round their argument to the nearest integer value in floating-point format, rounding halfway cases away from zero, regardless of the current rounding direction.
There's also lrint(), which gives you an integer (long) return value, though lrint() and friends obey the current rounding direction - you'll have to set that using fesetround(); the various rounding directions are found here.
Check out the standard header <fenv.h>, specifically the fesetround() function and the four macros FE_DOWNWARD, FE_TOWARDZERO, FE_TONEAREST and FE_UPWARD. This controls how floating point values are rounded to integers. Make sure your implementation (i.e., C compiler / C library) actually supports this (by checking the return value of fesetround() and the documentation of your implementation).
Functions honoring these settings include (from <math.h>):
llrint()
llrintf()
llrintl()
lrint()
lrintf()
lrintl()
rint()
rintf()
rintl()
nearbyint()
nearbyintf()
nearbyintl()
depending on your needs (parameter type and return type, with or without inexact floating point exception).
NOTE: round(), roundf() and roundl(), as well as the lround() and llround() families, do look like they belong in the list above, but they do not honor the rounding mode set by fesetround()!! They always round halfway cases away from zero.
Refer to your most favourite standard library documentation for the exact details.
No, C (before C99) doesn't have a round function. The typical approach is something like this:
double sign(double x) {
    if (x < 0.0)
        return -1.0;
    return 1.0;
}
double round(double x) {
    /* Add 0.5 away from zero first, then truncate toward zero. */
    return (long long)(x + 0.5 * sign(x));
}
This rounds to an integer, assuming the original number is in the range that can be represented by a long long. If you want to round to a specific number of places after the decimal point, that can be a bit harder. If the numbers aren't too large or too small, you can multiply by 10^N, round to an integer, and divide by 10^N again (keeping in mind that this may introduce some rounding errors of its own).
If there isn't a round() function in the standard library, you could, if dealing with floating-point numbers, arbitrarily evaluate each value, analyze the number in the place after the place you want to round to, check to see if it's greater, equal-to, or less-than 5; Then, if the value is less than 5, you can floor() the number you're ultimately looking at. If the value of the digit after the place you're rounding to is 5 or greater, you can proceed to having the function floor() the number being evaluated, then add 1.
I apologize for any inefficiency tied to this.
If I'm not mistaken you are looking for something like floor and ceil and you shall find them in <math.h>
The documentation specifies
Y = round(X) rounds the elements of X to the nearest integers.
Note the plural: as per regular MATLAB operations, it operates on all elements of a matrix. The C equivalents posted above only deal with a single value at once. If you can use C++, check out std::valarray. If not, then a good ol' for loop is your friend.