While writing this answer, I used the mpf_pow_ui function to calculate 12.3 ^ 123, and the result is different from the one given by WolframAlpha (which, by the way, also uses GMP).
I converted the code to pure C to simplify:
#include <stdio.h>
#include <gmp.h>

int main (void) {
    mpf_t a, c;
    unsigned long int b = 123UL;
    mpf_inits(a, c, NULL);
    mpf_set_d(a, 12.3);
    mpf_pow_ui(c, a, b);
    gmp_printf("c = %.50Ff\n", c);
    return 0;
}
Which results in
While WolframAlpha returns
1.14374367934617190099880295228066276746218078451850229775887975052369504785666896446606568365201542169649974727730628842345343196581134895919942820874449837212099476648958359023796078549041949007807220625356526926729664064846685758382803707100766740220839267 × 10^134
which starts to disagree with the mpf_pow_ui result at the 15th digit.
Am I doing something wrong in the code, is this a limitation of GMP, or is WolframAlpha giving an incorrect result?

Am I doing something wrong in the code, is this a limitation of GMP, or is WolframAlpha giving an incorrect result?
You are doing something different from what Wolfram is doing (obviously). Your code is not wrong, per se, but it is not doing what you probably think it is doing. Compare the output of this variation:
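#include <stdio.h>
#include <gmp.h>

int main (void) {
    mpf_t a, c;
    unsigned long int b = 123UL;

    /* Base initialized from a double: an exact copy of the binary approximation of 12.3 */
    mpf_inits(a, c, NULL);
    mpf_set_d(a, 12.3);
    mpf_pow_ui(c, a, b);
    gmp_printf("c = %.50Ff\n", c);

    /* Base initialized from a decimal string: 12.3 to the precision of a1 */
    mpf_t a1, c1;
    mpf_inits(a1, c1, NULL);
    mpf_set_str(a1, "12.3", 10);
    mpf_pow_ui(c1, a1, b);
    gmp_printf("c' = %.50Ff\n", c1);
    return 0;
}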
c = 114374367934618002778643226182707594198913258409535335775583252201365538178632825702225459029661601216944929436371688246107986574246790.32099077871758646985223686110515186972735931183764
c' = 114374367934617190099880295228066276746218078451850229775887975052369504785666896446606568365201542169649974727730628842345343196581134.89591994282087444983721209947664895835902379607855
The difference between the two output values arises because my C implementation and yours represent values of type double in binary floating point, and 12.3 is not exactly representable in binary floating point (see Is floating point math broken?). C provides the closest approximation available, which, assuming 64-bit IEEE 754 representation, matches to about 15 decimal digits of precision. When you initialize a GMP variable with such a value, you get an exact GMP representation of the actual double value, which is only an approximation to 12.3 decimal.
But GMP can represent 12.3 (decimal) to whatever precision you choose.* You chose a very high precision, so when you use a decimal string to initialize your MP-float variable you get a much closer approximation than when you used a double. Naturally, performing the same operation on those different values produces different results. The GMP result in the latter case appears to agree with the Wolfram result to the full precision in which it is expressed.
Note also that in a general sense, one can also use decimal floating-point, in software or (if you are so equipped) in hardware. The value 12.3 (decimal) can be represented exactly in such a format, but that's not what GMP uses.
* Or indeed, GMP can represent 12.3 exactly as an MP rational, though that's not what the code above does.

This gives a result similar to WolframAlpha's:
from decimal import Decimal
from decimal import getcontext
getcontext().prec = 200
print(Decimal('12.3') ** 123)
So you must be doing something wrong in your GMP configuration.


Using the round() function in C

I'm a bit confused about the round() function in C.
First of all, man says:
#include <math.h>
double round(double x);
These functions return the rounded integer value.
If x is integral, +0, -0, NaN, or infinite, x itself is returned.
Is the return value a double / float or an int?
Second, I've created a function that first rounds, then casts to int. Later in my code I use it as a means to compare doubles:
int tointn(double in, int n)
{
    int i = 0;
    i = (int)round(in * pow(10, n));
    return i;
}
This function apparently isn't stable throughout my tests. Is there redundancy here? Well... I'm not only looking for an answer, but for a better understanding of the subject.
The wording in the man-page is meant to be read literally, that is in its mathematical sense. The wording "x is integral" means that x is an element of Z, not that x has the data type int.
Casting a double to int can be dangerous: a double can represent every integer exactly up to 2^53 (assuming an IEEE 754 conforming binary64), while the maximum value an int can hold may be much smaller (int is usually 32 bits wide on 32-bit architectures and also on many 64-bit architectures). Converting a value that is out of range for int is undefined behavior.
If you need only powers of ten you can test it with this little program yourself:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(){
    int i;
    for(i = 0;i < 26;i++){
        /* the (int) cast goes out of range once pow(10,i) exceeds INT_MAX */
        printf("%d:\t%.2f\t%d\n",i, pow(10,i), (int)pow(10,i));
    }
    return 0;
}
Instead of casting, you should use a function that returns a proper integer type, such as lround(3).
Here is an excerpt from the man page:
#include <math.h>
double round(double x);
float roundf(float x);
long double roundl(long double x);
Note: the returned value never has an integer type. It is a floating-point value whose fractional part is 0.
Note: the type of the returned value depends on which of these functions is called.
Here is an excerpt from the man page about which way the rounding will be done:
These functions round x to the nearest integer, but round halfway cases
away from zero (regardless of the current rounding direction, see
fenv(3)), instead of to the nearest even integer like rint(3).
For example, round(0.5) is 1.0, and round(-0.5) is -1.0.
If you want a long integer to be returned then please use lround:
long int tolongint(double in)
{
    return lround(in);
}
For details, see lround, which has been available since the C99 standard (and C++11).

pow numeric error in C

I'm wondering where the numeric error happens, and in which layer.
Let me explain using an example:
int p = pow(5, 3);
printf("%d", p);
I've tested this code on various HW and compilers (VS and GCC) and some of them print out 124, and some 125.
On the same HW (OS) I get different results with different compilers (VS and GCC).
On different HW (OS) I get different results with the same compiler (cc (GCC) 4.8.1).
AFAIK, pow computes 124.99999999 and that gets truncated to int, but where does this error happen?
Or, in other words, where does the correction happen (124.99 -> 125)?
Is it a compiler-HW interaction?
Edit:
Here's an additional snippet to play with (keep an eye on p=5, p=18, ...):
#include <stdio.h>
#include <math.h>

int main(void) {
    int p;
    for (p = 1; p < 20; p++) {
        printf("\n%d %d %f %f", (int) pow(p, 3), (int) exp(3 * log(p)), pow(p, 3), exp(3 * log(p)));
    }
    return 0;
}
(First note that for an IEEE754 double precision floating point type, all integers up to the 53rd power of 2 can be represented exactly. Blaming floating point precision for integral pow inaccuracies is normally incorrect).
pow(x, y) is normally implemented in C as exp(y * log(x)). Hence it can "go off" for even quite small integral cases.
For small integral cases, I normally write the computation long-hand, and for other integral arguments I use a 3rd party library. Although a do-it-yourself solution using a for loop is tempting, there are effective optimisations that can be done for integral powers that such a solution might not exploit.
As for the observed different results, it could be down to some of the platforms using an 80 bit floating point intermediary. Perhaps some of the computations then are above 125 and others are below that.

Can I calculate error introduced by doubles?

Suppose I have an irrational number like \sqrt{3}. As it is irrational, it has no finite decimal representation. So when you try to express it as an IEEE 754 double, you introduce an error.
A decimal representation with a lot of digits is:
Now, when I calculate \sqrt{3}, I get 1.732051:
#include <stdio.h> // printf
#include <math.h> // needed for sqrt

int main() {
    double myVar = sqrt (3);
    printf("as double:\t%f\n", myVar);
    return 0;
}
According to Wolfram|Alpha, I have an error of 1.11100... × 10^-7.
Is there any way I can calculate the error myself?
(I don't mind switching to C++, Python or Java. I could probably also use Mathematica, if there is no simple alternative)
Just to clarify: I don't want a solution that works only for \sqrt{3}. I would like to get a function that gives me the error for any number. If that is not possible, I would at least like to know how Wolfram|Alpha gets more digits.
My try
While writing this question, I found this:
#include <stdio.h> // printf
#include <math.h> // needed for sqrtl
#include <float.h> // needed for LDBL_DIG

int main() {
    long double r = sqrtl(3.0L);
    printf("Precision: %d digits; %.*Lg\n",LDBL_DIG,LDBL_DIG,r);
    return 0;
}
With this one, I can get the error down to 2.0 * 10^-18 according to Wolfram|Alpha. So I thought this might be close enough to get a good estimation of the error. I wrote this:
#include <stdio.h> // printf
#include <math.h> // needed for sqrt
#include <float.h>

int main() {
    double myVar = sqrt (3);
    long double r = sqrtl(3.0L);
    long double error = abs(r-myVar) / r;
    printf("Double:\t\t%f\n", myVar);
    printf("Precision:\t%d digits; %.*Lg\n",LDBL_DIG,LDBL_DIG,r);
    printf("Error:\t\t%.*Lg\n", LDBL_DIG, error);
    return 0;
}
But it outputs:
Double: 1.732051
Precision: 18 digits; 1.73205080756887729
Error: 0
How can I fix that to get the error?
What Every Computer Scientist Should Know About Floating-Point Arithmetic by Goldberg is the definitive guide you are looking for.
printf rounds doubles to 6 places when you use %f without a precision.
double x = 1.3;
long double y = 1.3L;
long double err = y - (double) x;
printf("Error %.20Lf\n", err);
My output: -0.00000000000000004445
If the result is 0, your long double and double are the same.
One way to obtain an interval that is guaranteed to contain the real value of the computation is to use interval arithmetic. Then, comparing the double result to the interval tells you how far the double computation is, at worst, from the real computation.
Frama-C's value analysis can do this for you with option -all-rounding-modes.
double Frama_C_sqrt(double x);

double sqrt(double x)
{
    return Frama_C_sqrt(x);
}

double y;

int main(){
    y = sqrt(3.0);
}
Analyzing the program with:
frama-c -val t.c -float-normal -all-rounding-modes
[value] Values at end of function main:
y ∈ [1.7320508075688772 .. 1.7320508075688774]
This means that the real value of sqrt(3), and thus the value that would be in variable y if the program computed with real numbers, is within the double bounds [1.7320508075688772 .. 1.7320508075688774].
Frama-C's value analysis does not support the long double type, but if I understand correctly, you were only using long double as reference to estimate the error made with double. The drawback of that method is that long double is itself imprecise. With interval arithmetic as implemented in Frama-C's value analysis, the real value of the computation is guaranteed to be within the displayed bounds.
The line Double: 1.732051 is misleading: printf("Double:\t\t%f\n", myVar); prints only six decimal places.
The actual value of double myVar is
1.732050807568877281 //18 digits
so 1.732050807568877281-1.732050807568877281 is zero
According to the C standard printf("%f", d) will default to 6 digits after the decimal point. This is not the full precision of your double.
It might be that double and long double happen to be the same on your architecture. I have different sizes for them on my architecture and get a non-zero error in your example code.
You want fabsl instead of abs when calculating the error, at least when using C. (In C, abs is integer.) With this substitution, I get:
Double: 1.732051
Precision: 18 digits; 1.73205080756887729
Error: 5.79643049346087304e-17
(Calculated on Mac OS X 10.8.3 with Apple clang 4.0.)
Using long double to estimate the errors in double is a reasonable approach for a few simple calculations, except:
If you are calculating the more accurate long double results, why bother with double?
Error behavior in sequences of calculations is hard to describe and can grow to the point where long double is not providing an accurate estimate of the exact result.
There exist perverse situations where long double gets less accurate results than double. (Mostly encountered when somebody constructs an example to teach students a lesson, but they exist nonetheless.)
In general, there is no simple and efficient way to calculate the error in a floating-point result in a sequence of calculations. If there were, it would be effectively a means of calculating a more accurate result, and we would use that instead of the floating-point calculations alone.
In special cases, such as when developing math library routines, the errors resulting from a particular sequence of code are studied carefully (and the code is redesigned as necessary to have acceptable error behavior). More often, error is estimated either by performing various “experiments” to see how much results fluctuate with varying inputs or by studying general mathematical behavior of systems.
You also asked “I would like to get a function that gives me the error for any number.” Well, that is easy: given any number x and the calculated result x', the error is exactly x' - x. The actual problem is that you probably do not have a description of x that can be used to evaluate that expression easily. In your example, x is sqrt(3). Obviously, then, the error is x' - sqrt(3), and x' is exactly 1.732050807568877193176604123436845839023590087890625. Now all you need to do is evaluate sqrt(3). In other words, numerically evaluating the error is about as hard as numerically evaluating the original number.
Is there some class of numbers you want to perform this analysis for?
Also, do you actually want to calculate the error or just a good bound on the error? The latter is somewhat easier, although it remains hard for sequences of calculations. For all elementary operations, IEEE 754 requires the produced result to be the result that is nearest the mathematically exact result (in the appropriate direction for the rounding mode being used). In round-to-nearest mode, this implies that each result is at most 1/2 ULP (unit of least precision) away from the exact result. For operations such as those found in the standard math library (sine, logarithm, et cetera), most libraries will produce results within a few ULP of the exact result.

Storing numbers with higher precision in C

I am writing a program in which I need to store numbers with very high precision (around 10^-10) and then use them as a parameter (create_bloomfilter([yet to decide the type] falsePositivity, long expected_num_of_elem)).
The highest precision I am able to get is with double (something around 10^-6) which is not sufficient.
How can we store numbers with higher precision in C?
You have been misinformed about double.
The smallest positive normalized number you can store in a double is about 2×10^-308; denormalized numbers can be smaller, going down to about 5×10^-324. A double has the equivalent of about 15-17 decimal digits of precision, which is sufficient to measure the diameter of the Earth to within the size of a red blood cell, one of the smallest cells in the human body.
If you really need more precision, you need MPFR. (If your algorithms are numerically unstable, MPFR might not help.)
Edit: I figured out what you are doing wrong.
In C, 10^-7 is an integer expression. It should be equal to -13 on most systems. The ^ operator is the bitwise XOR operator, not the exponentiation operator. There is no exponentiation operator in C, because C operators generally correspond to more primitive operations, at least in terms of hardware implementation.
You want 1e-7, or pow(10, -7).
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    printf("2e-308 = %g\n", 2e-308);
    printf("2 * pow(10, -308) = %g\n", 2 * pow(10, -308));
    printf("10^-7 = %d\n", 10^-7);   /* ^ is bitwise XOR, not exponentiation */
    return 0;
}
2e-308 = 2e-308
2 * pow(10, -308) = 2e-308
10^-7 = -13
Note that there are a lot of gotchas with floating point numbers.
Try the GNU MPFR library and the GNU GMP library.
The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
Is long double sufficient? Some implementations use a 128-bit long double, which should easily handle your requirements.
If you're looking for something extremely strong, check out MPFR.

Why does GCC give an unexpected result when adding float values?

I'm using GCC to compile a program which adds floats, longs, ints and chars. When it runs, the result is bad. The following program unexpectedly prints the value of 34032.101562.
Recompiling with a Microsoft compiler gives the right result.
#include <stdio.h>

int main (void) {
    const char val_c = 10;
    const int val_i = 20;
    const long val_l = 34000;
    const float val_f = 2.1;
    float result;
    result = val_c + val_i + val_l + val_f;
    printf("%f\n", result);
    return 0;
}
What do you think the "right result" is? I'm guessing that you believe it is 34032.1. It isn't.
2.1 is not representable as a float, so val_f instead is initialized with the closest representable float value. In binary, 2.1 is:
10.000110011001100110011001100110011...
A float has 24 binary digits, so the value of val_f in binary is:
10.0001100110011001100110
The expression result = val_c + val_i + val_l + val_f computes 34030 + val_f, which is evaluated in single-precision and causes another rounding to occur:
  1000010011101110.0
+               10.0001100110011001100110
= 1000010011110000.0001100110011001100110
which rounds to 24 digits:
  1000010011110000.00011010
In decimal, this result is exactly 34032.1015625. Because the %f format prints 6 digits after the decimal point (unless specified otherwise), this is rounded again, and printf prints 34032.101562.
Now, why do you not get this result when you compile with MSVC? The C and C++ standards allow floating-point calculations to be carried out in a wider type if the compiler chooses to do so. MSVC does this with your calculation, which means that the result of 34030 + val_f is not rounded before being passed to printf. In that case, the exact floating-point value being printed is 34032.099999999991268850862979888916015625, which is rounded to 34032.1 by printf.
Why don't all compilers do what MSVC does? A few reasons. First, it's slower on some processors. Second, and more importantly, although it can give more accurate answers, the programmer cannot depend on that -- seemingly unrelated code changes can cause the answer to change in the presence of this behavior. Because of this, carrying extra precision often causes more problems than it solves.
Google David Goldberg's paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
The float format has only about 6-7 digits of precision. Use %7.1f or some other reasonable format and you will like your results better.
I don't see any problem here. 2.1 has no exact representation in IEEE floating-point format, and as such, it is converting the entire answer to a floating-point number with around 6-7 (correct) sig-figs. If you need more precision, use a double.
