How to multiply floating point in ANSI C? - c

The following code:
float numberForFactorial;
float floatingPart = 1.400000;
int integralPart = 1;
numberForFactorial = ((floatingPart) - (float)integralPart) * 10;
printf("%d", (int)numberForFactorial);
Prints 3 instead of 4. Can you explain why?

The float closest to 1.400000 is slightly less than 1.4. You can verify that by doing
printf("%0.8f\n", floatingPart);
The result from ideone is 1.39999998. This means that after subtracting the integral part and multiplying by 10 you get a value slightly below 4 (about 3.9999998), and the cast to int truncates it to 3.
To avoid this issue, use rounding instead of truncation. One easy way to round is by adding half before truncation:
printf("%d", (int)(numberForFactorial + 0.5f));
will print 4 as you were expecting. You can also use round, rint, lround, or modf to get the same result. Rounding is a complex topic, so choose the method whose constraints match your situation best.

This is due to the binary representation of floating-point values. More specifically, 0.4 (2/5) cannot be written as a finite sum of negative powers of two (1/2, 1/4, 1/8, ...), so the significand can only hold an approximation.
The literal 1.400000 is stored as something closer to 1.399999976158142 in its binary representation. The cast to int truncates the non-integer part, giving 3 as the final result.
To be pedantic, the C standard does not require a binary representation for floating-point types; however, IEEE 754 is the de facto standard in today's computing.


Using floorf to reduce the number of decimals

I would like to use the first five digits of a number for computation.
For example,
A floating point number: 4.23654897E-05
I wish to use 4.2365E-05. I tried the following:
#include <math.h>
#include <stdio.h>
float num = 4.23654897E-05;
int main(){
    float rounded_down = floorf(num * 10000) / 10000;
    printf("%f\n", rounded_down);
    return 0;
}
The output is 0.000000. The desired output is 4.2365E-05.
In short, say 52 bits are allocated for storing the mantissa. Is there a way to reduce the number of bits being allocated?
Any suggestions on how this can be done?
A number x that is positive and within the normal range can be rounded down approximately to five significant digits with:
double l = pow(10, floor(log10(x)) - 4);
double y = l * floor(x / l);
This is useful only for tinkering with floating-point arithmetic as a learning tool. The exact mathematical result is generally not exactly representable, because binary floating-point cannot represent most decimal values exactly. Additionally, rounding errors can occur in the pow, /, and * operations that may cause the result to differ slightly from the true mathematical result of rounding x to five significant digits. Also, poor implementations of log10 or pow can cause the result to differ from the true mathematical result.
I'd go:
printf("%.6f", num);
Or you can try using snprintf() from stdio.h:
float num = 4.23654897E-05; char output[50];
snprintf(output, 50, "%f", num);
printf("%s", output);
The result is expected. Multiplying by 10000 yields 0.423..., and the largest integer not greater than that is 0, so the result is 0. To display a value rounded to a certain number of decimal places, use the %f format specifier with a precision.
If you check the documentation for the return value of floorf, you will see: "If no errors occur, the largest integer value not greater than arg, that is ⌊arg⌋, is returned", where arg is the passed argument.
Without using floorf, you can use the %e (or %E) format specifier to print it accordingly.
which outputs:
After David's comment:
Your way of doing things is right, but the number you multiplied by is wrong. The thing is, 4.2365E-05 is 0.000042365.... If you multiply it by 10000 it becomes 0.42365.... floorf returns a float in this case; store it in a variable and the rounded value will be in that variable. But you will see that the rounded-down value is 0, which is what you got.
float rounded_down = floorf(num * 10000) / 10000;
This will hold the correct value rounded down to 4 digits after . (not in exponent notation with E or e). Don't confuse the value with the format specifier used to represent it.
What you need to do in order to get the result you want is move the decimal point further to the right. To do that, multiply by a larger number (1e7, 1e8, or whatever you need).
I would like to use the first five digits of a number for computation.
In general, floating point numbers are encoded using binary and OP wants to use 5 significant decimal digits. This is problematic as numbers like 4.23654897E-05 and 4.2365E-05 are not exactly representable as a float/double. The best we can do is get close.
The floor*() approach has problems with 1) negative numbers (should have used trunc()) and 2) values near x.99995 that during rounding may change the number of digits. I strongly recommend against it here as such solutions employing it fail many corner cases.
The *10000 * power10, round, /(10000 * power10) approach suffers from 1) the power10 calculation (1e5 in this case), 2) rounding errors in the multiplication, and 3) overflow potential. The needed power10 may not be exact. Multiplication errors show up when the product is close to xxxxx.5. Often this intermediate calculation is done using wider double math, so the corner cases are rare. Rounding with a cast to (some_int_type) is also poor: it has limited range and truncates, where round() or rint() would be better.
An approach that gets close to OP's goal: print to 5 significant digits using %e and convert back. Not highly efficient, yet handles all cases well.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float num = 4.23654897E-05f;
    //         sign d   .   dddd e   sign expo + \0
    #define N (1  + 1 + 1 + 4  + 1 + 1  + 4  + 1)
    char buf[N*2];  // Use a generous buffer - I like 2x what I think is needed.
    // OP wants 5 significant digits so print 4 digits after the decimal point.
    sprintf(buf, "%.4e", num);
    float rounded = (float) atof(buf);
    printf("%.5e %s\n", rounded, buf);
    return 0;
}
Output:
4.23650e-05 4.2365e-05
Why 5 in %.5e: Typical float will print up to 6 significant decimal digits as expected (research FLT_DIG), so 5 digits after the decimal point are printed. The exact value of rounded in this case was about 4.236500171...e-05 as 4.2365e-05 is not exactly representable as a float.

How to round 8.475 to 8.48 in C (rounding function that takes into account representation issues)? Reducing probability of issue

I am trying to round 8.475 to 8.48 (to two decimal places in C). The problem is that 8.475 internally is represented as 8.47499999999999964473:
double input_test =8.475;
printf("input tests: %.20f, %.20f \n", input_test, *&input_test);
input tests: 8.47499999999999964473, 8.47499999999999964473
So, if I had an ideal round function, it would round 8.475 = 8.4749999... to 8.47. So the built-in round function is not appropriate for me. I see that the rounding problem arises in cases of "underflow", and therefore I am trying to use the following algorithm:
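double MyRound2(double *value) {
    double ad;
    long long mzr;
    double resval;
    if (*value < 0.000000001)
        ad = -0.501;
    else
        ad = 0.501;
    mzr = (long long)(*value);   /* integral part */
    resval = *value - mzr;       /* fractional part */
    /* divide by 100.0 so the hundredths are not lost to integer division */
    resval = (long long)(resval * 100 + ad) / 100.0;
    return mzr + resval;         /* add the integral part back */
}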
This solves the "underflow" issue and it works well for "overflow" issues as well. The problem is that there are valid values x.xxx99 for which this function incorrectly gives a bigger value (because of the 0.001 in 0.501). How can I solve this issue? How can I devise an algorithm that detects floating point representation issues and takes them into account when rounding? Maybe C already has such a clever rounding function? Maybe I can select a different value for the constant ad, such that the probability of such rounding errors goes to zero (I mostly work with money values with up to 4 decimal digits).
I have read all the popular articles about floating point representation and I know that there are tricky and unsolvable issues, but my client does not accept such an explanation, because the client can clearly demonstrate that Excel handles (reproduces, rounds and so on) floating point numbers without representation issues.
(The C and C++ standards are intentionally flexible when it comes to the specification of the double type; quite often it is IEEE754 64 bit type. So your observed result is platform-dependent).
You are observing one of the pitfalls of using floating point types.
Sadly there isn't an "out-of-the-box" fix for this. (Adding a small constant pre-rounding just pushes the problem to other numbers).
Moral of the story: don't use floating point types for money.
Use a special currency type instead, or work in "pence" using an integral type.
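A minimal sketch of the integral approach, assuming amounts are stored as whole thousandths of the currency unit (so 8.475 is the integer 8475). The function name and the choice of thousandths are illustrative assumptions, not part of the question.

```c
/* Amounts are held as whole thousandths of the currency unit (8.475 -> 8475).
   Returns the amount in whole hundredths, rounding halves away from zero. */
long long thousandths_to_hundredths(long long thousandths)
{
    if (thousandths >= 0)
        return (thousandths + 5) / 10;
    return -((-thousandths + 5) / 10);
}
```

Because 8475 is stored exactly, rounding it gives 848 (that is, 8.48) with no representation issue to work around.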
By the way, Excel does use an IEEE754 double precision floating point for its number type, but it also has some clever tricks up its sleeve. Essentially it tracks the joke digits carefully and also is clever with its formatting. This is how it can evaluate 1/3 + 1/3 + 1/3 exactly. But even it will get money calculations wrong sometimes.
For financial calculations, it is better to work in base 10 to avoid representation issues when going to/from binary. In many countries, financial software is even legally required to do so. Here is one library for IEEE 754R Decimal Floating-Point Arithmetic, have not tried it myself:
Also note that working in decimal floating-point instead of fixed-point representation allows clever algorithms like the Kahan summation algorithm, to avoid accumulation of rounding errors. A noteworthy difference to normal floating point is that numbers with few significant digits are not normalized, so you can have e.g. both 1*10^2 and .1*10^3.
An implementation note: one representation in the standard uses a binary significand, to allow software implementations using a standard binary ALU.
How about this one: define some threshold. This threshold is the distance to the next multiple of 0.005 within which you assume the difference could be a representation error. Check whether the value is below that multiple and within the threshold. Round as usual and, at the end, if you detected that it was, add 0.01.
That said, this is only a workaround and somewhat of a code smell. If you don't need too much speed, go for some other type than float. Like your own type that works like
class myDecimal{ int digits; int exponent_of_ten; } with value = digits * E exponent_of_ten
I am not trying to argue that using floating point numbers to represent money is advisable - it is not! But sometimes you have no choice... We do kind of work with money (life insurance calculations) and are forced to use floating point numbers for everything, including values representing money.
Now there are quite some different rounding behaviours out there: round up, round down, round half up, round half down, round half even, maybe more. It looks like you were after round half up method.
Our round-half-up function - here translated from Java - looks like this:
#include <iostream>
#include <cmath>
#include <cfloat>
using namespace std;
int main()
{
    double value = 8.47499999999999964473;
    double result = value * pow(10, 2);
    result = nextafter(result + (result > 0.0 ? 1e-8 : -1e-8), DBL_MAX);
    double integral = floor(result);
    double fraction = result - integral;
    if (fraction >= 0.5) {
        result = ceil(result);
    } else {
        result = integral;
    }
    result /= pow(10, 2);
    cout << result << endl;
    return 0;
}
where nextafter is a function returning the next representable floating point value after the given value, in the direction of the second argument. This code has been tested with C++11 (AFAIK nextafter is also available in Boost); the result written to standard output is 8.48.

Multiplying two floats doesn't give exact result

I am trying to multiply two floats as follows:
float number1 = 321.12;
float number2 = 345.34;
float result = number1 * number2;
The result I want to see is 110895.582, but when I run the code it just gives me 110896. I run into this issue most of the time. Any calculator gives me the exact result with all decimals. How can I achieve that result?
edit : It's C code. I'm using XCode iOS simulator.
There's a lot of rounding going on.
float a = 321.12; // this number will be rounded
float b = 345.34; // this number will also be rounded
float r = a * b; // and this number will be rounded too
printf("%.15f\n", r);
I get 110895.578125000000000 after the three separate roundings.
If you want more than 6 decimal digits' worth of precision, you will have to use double and not float. (Note that I said "decimal digits' worth", because you don't get decimal digits, you get binary.) As it stands, 1/2 ULP of error (a worst-case bound for a perfectly rounded result) is about 0.004.
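To compare the two types side by side, here is a small sketch. The function names are hypothetical, and it assumes float and double are IEEE 754 single and double precision.

```c
/* Product of two floats, rounded to float precision. */
float product_float(float a, float b)
{
    float r = a * b;   /* result is rounded to the nearest float */
    return r;
}

/* Same product carried out in double precision. */
double product_double(double a, double b)
{
    return a * b;
}
```

product_float(321.12f, 345.34f) gives 110895.578125, as reported above, while product_double(321.12, 345.34) agrees with 110895.5808 to well beyond the fourth decimal place. Neither is exact; double only pushes the error further out.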
If you want exactly rounded decimal numbers, you will have to use a specialized decimal library for such a task. A double has more than enough precision for scientists, but if you work with money everything has to be 100% exact. No floating point numbers for money.
Unlike integers, floating point numbers take some real work before you can get accustomed to their pitfalls. See "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which is the classic introduction to the topic.
Edit: Actually, I'm not sure that the code rounds three times. It might round five times, since the constants for a and b might be rounded first to double-precision and then to single-precision when they are stored. But I don't know the rules of this part of C very well.
You will never get the exact result that way.
First of all, number1 ≠ 321.12 because that value cannot be represented exactly in a base-2 system. You'll need an infinite number of bits for it.
The same holds for number2 ≠ 345.34.
So, you start with inexact values.
Then the product will get rounded because multiplication gives you double the number of significant digits but the product has to be stored in float again if you multiply floats.
You probably want to use a 10-based system for your numbers. Or, in case your numbers only have 2 decimal digits in the fractional part, you can use integers (32-bit integers are sufficient in this case, but you may end up needing 64-bit):
32112 * 34534 = 1108955808.
That represents 321.12 * 345.34 = 110895.5808.
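The integer approach can be sketched as follows, assuming both inputs are given in whole hundredths so that the product is in ten-thousandths. All three function names are hypothetical, and 64-bit integers are used to leave headroom.

```c
/* Multiply two amounts given in hundredths; the result is in ten-thousandths. */
long long multiply_hundredths(long long a, long long b)
{
    return a * b;
}

/* Whole part of a non-negative count of ten-thousandths. */
long long ten_thousandths_whole(long long v)
{
    return v / 10000;
}

/* Fractional part (0..9999) of a non-negative count of ten-thousandths. */
long long ten_thousandths_fraction(long long v)
{
    return v % 10000;
}
```

To display the result, print the two parts with a format such as "%lld.%04lld"; for 32112 and 34534 that gives 110895.5808.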
Since you are using C you could easily set the precision by using "%.xf" where x is the wanted precision.
For example:
float n1 = 321.12;
float n2 = 345.34;
float result = n1 * n2;
printf("%.20f", result);
However, note that float only gives six digits of precision. For better precision use double.
Floating point variables are only an approximate representation, not a precise one. Not every number can "fit" into a float variable. For example, there is no way to put 1/10 (0.1) into a binary variable, just like it's not possible to put 1/3 into a decimal one (you can only approximate it with an endless 0.33333...).
When outputting such variables, it's usual for several rounding options to be applied. Unless you set them all, you can never be sure which of them are applied. This is especially true for << operators, as the stream can be told how to round BEFORE <<.
Printf also does some rounding. Consider
float t = 0.1f;
printf("result: %f\n", t);
result: 0.100000
Well, it looks fine. Why? Because printf defaulted to some precision and rounded the output. Let's dial in 50 places after the decimal point:
float t = 0.1f;
printf("result: %.50f\n", t);
result: 0.10000000149011611938476562500000000000000000000000
That's different, isn't it? After 625 the float ran out of capacity to hold more data, that's why we see zeroes.
A double can hold more digits, but 0.1 in binary is not finite. Double has to give up, eventually:
double t = 0.1;
printf("result: %.70f\n", t);
result: 0.1000000000000000055511151231257827021181583404541015625000000000000000
In your example, 321.12 alone is enough to cause trouble:
float t = 321.12f;
printf("and the result is: %.50f\n", t);
and the result is: 321.11999511718750000000000000000000000000000000000000
This is why one has to round floating point values before presenting them to humans.
Calculator programs don't use floats or doubles at all. They implement a decimal number format, e.g.:
struct decimal
{
    int mantissa; // meaningful digits
    int exponent; // number of decimal zeroes
};
Of course, that requires reinventing all operations: addition, subtraction, multiplication and division. Or just look for a decimal library.
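As a rough sketch of what "reinventing the operations" involves, here are addition and multiplication for such a type. This version uses a long long mantissa, treats the exponent as a power of ten that may be negative, and ignores overflow; those choices and all function names are assumptions for illustration, not how any particular calculator is implemented.

```c
struct decimal {
    long long mantissa;  /* meaningful digits */
    int exponent;        /* power of ten applied to the mantissa */
};

/* Convenience constructor: value = mantissa * 10^exponent. */
struct decimal decimal_make(long long mantissa, int exponent)
{
    struct decimal d = { mantissa, exponent };
    return d;
}

/* Multiply: mantissas multiply, exponents add. No overflow checks. */
struct decimal decimal_multiply(struct decimal a, struct decimal b)
{
    return decimal_make(a.mantissa * b.mantissa, a.exponent + b.exponent);
}

/* Add: bring both operands to the smaller exponent, then add mantissas. */
struct decimal decimal_add(struct decimal a, struct decimal b)
{
    while (a.exponent > b.exponent) { a.mantissa *= 10; a.exponent--; }
    while (b.exponent > a.exponent) { b.mantissa *= 10; b.exponent--; }
    return decimal_make(a.mantissa + b.mantissa, a.exponent);
}
```

For example, 1.5 is decimal_make(15, -1) and 0.25 is decimal_make(25, -2); their sum is {175, -2} and their product is {375, -3}, both exact. A real decimal library also handles overflow, division and rounding modes.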

Can anyone explain me this feature simply?

I have the following code,
float a = 0.7;
if(0.7 > a)
    printf("Hi\n");
else
    printf("Hello\n"); //Line1
and
float a = 0.98;
if(0.98 > a)
    printf("Hi\n");
else
    printf("Hello\n"); //Line2
Here Line1 outputs Hi but Line2 outputs Hello. I assumed there would be a fixed rule for comparing a double constant with a float, i.e. one of them would always be larger on evaluation. But these two snippets show that sometimes the double constant is larger and at other times the float is larger. Is there a rounding issue behind this? If so, please explain it to me.
Thanks in advance.
What you have is called representation error.
To see what is going on you might find it easier to first consider the decimal representations of 1/3, 1/2 and 2/3 stored with different precision (3 decimal places or 6 decimal places):
a = 0.333
b = 0.333333
a < b
a = 0.500
b = 0.500000
a == b
a = 0.667
b = 0.666667
a > b
Increasing the precision can make the number slightly larger, slightly smaller, or have the same value.
The same logic applies to binary floating point numbers.
float a = 0.7;
Now a is the closest single-precision floating point value to 0.7. For the comparison 0.7 > a, a is promoted to double, since the type of the constant 0.7 is double, and the constant's value is the closest double-precision floating point value to 0.7. These two values are different, since 0.7 isn't exactly representable, so one value is larger than the other.
The same applies to 0.98. Sometimes, the closest single-precision value is larger than the decimal fraction and the closest double-precision number smaller, sometimes the other way round.
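A short sketch that reports which way a given constant moves when it is narrowed to float. The function name is hypothetical, and the stated results assume IEEE 754 single and double precision.

```c
/* Returns -1, 0 or +1 depending on whether converting x to float
   makes it smaller, leaves it unchanged, or makes it larger. */
int float_rounding_direction(double x)
{
    float  narrowed = (float)x;          /* nearest float to x */
    double widened  = (double)narrowed;  /* exact: every float is a double */
    if (widened < x) return -1;
    if (widened > x) return 1;
    return 0;
}
```

For 0.7 the function returns -1 (the float is smaller, so 0.7 > a is true), for 0.98 it returns +1 (the float is larger, so 0.98 > a is false), and for 0.5 it returns 0 because 0.5 is exactly representable.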
This is part of What Every Computer Scientist Should Know About Floating-Point Arithmetic.
This is simply one of the issues with floating point precision.
While there are infinitely many real numbers, only a finite number of them can be represented in floating point, because of the fixed number of bits. So there will be rounding errors when using floats in this manner.
There is no simple rule for whether a given constant rounds up or down; it depends on the value and on the floating-point format the implementation uses.

Why is not a==0 in the following code?

#include <stdio.h>
int main( )
{
    float a=1.0;
    long i;
    for(i=0; i<100; i++)
        a = a - 0.01;
    printf("%e\n", a);
    return 0;
}
Result is: 6.59e-07
It's a binary floating point number, not a decimal one - therefore you need to expect rounding errors. See the Basic section in this article:
What Every Programmer Should Know About Floating-Point Arithmetic
For example, the value 0.01 does not have a precise representation in a binary floating point type. To get a "correct" result in your sample you would have to either round or use a decimal floating point type (see Wikipedia):
Binary fixed-point types are most commonly used, because the rescaling operations can be implemented as fast bit shifts. Binary fixed-point numbers can represent fractional powers of two exactly, but, like binary floating-point numbers, cannot exactly represent fractional powers of ten. If exact fractional powers of ten are desired, then a decimal format should be used. For example, one-tenth (0.1) and one-hundredth (0.01) can be represented only approximately by binary fixed-point or binary floating-point representations, while they can be represented exactly in decimal fixed-point or decimal floating-point representations. These representations may be encoded in many ways, including BCD.
There are two questions here. If you're asking, why is my printf statement displaying the result as 6.59e-07 instead of 0.000000659, it's because you've used the format specifier for Scientific Notation: %e. You want %f for the floating point a.
If you're asking why the result is not exactly zero rather than 0.000000659, the answer is (as others have pointed out) that with floating point arithmetic using binary numbers you need to expect rounding.
If you print the float with %f instead, the default six decimal places hide most of the error: a value of about 6.59e-07 is displayed as 0.000001.
That's floating point rounding error at work. Each time you subtract a fraction, you get approximately the result you'd expect on paper, so the final result is very close to zero, but not necessarily exactly zero.
Floating point numbers have limited precision, which is why you get this result.
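The accumulation can be sketched next to one way of avoiding it, which is to count in whole hundredths and convert once at the end. Both function names are hypothetical; the first mirrors the loop from the question.

```c
/* Subtract 0.01 from 1.0f the given number of times, as in the question. */
float subtract_hundredths_float(int steps)
{
    float a = 1.0f;
    for (int i = 0; i < steps; i++)
        a = a - 0.01;        /* each step is rounded back to float */
    return a;
}

/* Count in whole hundredths and convert to floating point only once. */
double subtract_hundredths_exact(int steps)
{
    int hundredths = 100;    /* 1.00 as an integer count of 0.01 */
    for (int i = 0; i < steps; i++)
        hundredths -= 1;     /* integer arithmetic is exact */
    return hundredths / 100.0;
}
```

After 100 steps the first function returns a tiny nonzero value, while the second returns exactly 0.0 because no rounding happens until the final division.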
