Storing numbers with higher precision in C

I am writing a program in which I need to store numbers with very high precision (around 10^-10) and then use them further as a parameter, e.g. create_bloomfilter([yet to decide the type] falsePositivity, long expected_num_of_elem).
The highest precision I am able to get is with double (something around 10^-6), which is not sufficient.
How can we store numbers with higher precision in C?

You have been misinformed about double.
The smallest positive number you can store in a double is about 2×10^-308, not counting denormalized numbers, which can be smaller; denormals go down to about 5×10^-324. Doubles have the equivalent of about 15-17 significant decimal digits of precision, which is sufficient to measure the diameter of the Earth to within the size of a red blood cell.
If you really need more precision, you need MPFR. (If your algorithms are numerically unstable, MPFR might not help.)
Edit: I figured out what you are doing wrong.
In C, 10^-7 is an integer expression. It should be equal to -13 on most systems. The ^ operator is the bitwise XOR operator, not the exponentiation operator. There is no exponentiation operator in C, because C operators generally correspond to more primitive operations, at least in terms of hardware implementation.
You want 1e-7, or pow(10, -7).
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    printf("2e-308 = %g\n", 2e-308);
    printf("2 * pow(10, -308) = %g\n", 2 * pow(10, -308));
    printf("10^-7 = %d\n", 10^-7);
    return 0;
}
Output:
2e-308 = 2e-308
2 * pow(10, -308) = 2e-308
10^-7 = -13
Note that there are a lot of gotchas with floating point numbers.
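For example, a small program like this one illustrates a classic gotcha: 0.1 and 0.2 have no exact binary representation, so testing their sum for equality against 0.3 fails on a typical IEEE 754 system.
#include <stdio.h>

int main(void)
{
    double a = 0.1 + 0.2;

    /* 0.1 and 0.2 are stored inexactly, so the sum is not exactly 0.3 */
    if (a == 0.3)
        printf("equal\n");
    else
        printf("not equal: %.17g vs %.17g\n", a, 0.3);

    return 0;
}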

Try the GNU MPFR library and the GNU GMP library.
The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
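As a rough sketch of what MPFR usage looks like (assuming both libraries are installed and the program is linked with -lmpfr -lgmp), here is a tiny example computing sqrt(2) to 50 decimal places with 200 bits of working precision:
#include <stdio.h>
#include <mpfr.h>

int main(void)
{
    mpfr_t s;

    mpfr_init2(s, 200);              /* 200 bits of precision (~60 decimal digits) */
    mpfr_sqrt_ui(s, 2, MPFR_RNDN);   /* s = sqrt(2), correctly rounded */
    mpfr_printf("sqrt(2) = %.50Rf\n", s);

    mpfr_clear(s);
    return 0;
}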

Is long double sufficient? Some implementations use a 128-bit long double, which should easily handle your requirements.
http://en.wikipedia.org/wiki/Quadruple_precision
If you're looking for something extremely strong, check out MPFR

Related

How to round 8.475 to 8.48 in C (rounding function that takes into account representation issues)? Reducing probability of issue

I am trying to round 8.475 to 8.48 (to two decimal places in C). The problem is that 8.475 internally is represented as 8.47499999999999964473:
double input_test = 8.475;
printf("input tests: %.20f, %.20f \n", input_test, *&input_test);
gives:
input tests: 8.47499999999999964473, 8.47499999999999964473
So, if I had an ideal round function, it would round 8.475 = 8.4749999... to 8.47; the built-in round function is therefore not appropriate for me. I see that the rounding problem arises in cases of "underflow", and therefore I am trying to use the following algorithm:
double MyRound2(double *value) {
    double ad;
    long long mzr;
    double resval;

    if (*value < 0.000000001)
        ad = -0.501;
    else
        ad = 0.501;

    mzr = (long long)(*value);
    resval = *value - mzr;
    resval = ((long long)(resval * 100 + ad)) / 100.0;
    return mzr + resval;
}
This solves the "underflow" issue and it works well for "overflow" issues as well. The problem is that there are valid values x.xxx99 for which this function incorrectly gives a bigger value (because of the 0.001 in 0.501). How can I solve this issue, and how can I devise an algorithm that detects the floating point representation issue and rounds taking it into account? Maybe C already has such a clever rounding function? Maybe I can select a different value for the constant ad, such that the probability of such rounding errors goes to zero (I mostly work with money values with up to 4 decimal digits).
I have read all the popular articles about floating point representation and I know that there are tricky and unsolvable issues, but my client does not accept such an explanation, because the client can clearly demonstrate that Excel handles (reproduces, rounds and so on) floating point numbers without representation issues.
(The C and C++ standards are intentionally flexible when it comes to the specification of the double type; quite often it is the IEEE 754 64-bit type. So your observed result is platform-dependent.)
You are observing one of the pitfalls of using floating point types.
Sadly there isn't an "out-of-the-box" fix for this. (Adding a small constant pre-rounding just pushes the problem to other numbers).
Moral of the story: don't use floating point types for money.
Use a special currency type instead, or work in "pence" using an integral type.
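As a minimal sketch of the "work in pence" idea (the money_cents name is just illustrative): keep amounts as integer counts of the smallest currency unit, do all arithmetic on integers, and only format as a decimal for display.
#include <stdio.h>
#include <inttypes.h>

/* Illustrative only: store money as an integer number of cents/pence. */
typedef int64_t money_cents;

int main(void)
{
    money_cents a = 847;        /* 8.47 */
    money_cents b = 1;          /* 0.01 */
    money_cents total = a + b;  /* exact integer arithmetic, no 8.4749999... */

    printf("total = %" PRId64 ".%02" PRId64 "\n", total / 100, total % 100);
    return 0;
}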
By the way, Excel does use an IEEE 754 double precision floating point for its number type, but it also has some clever tricks up its sleeve. Essentially it tracks the significant digits carefully and is also clever with its formatting. This is how it can evaluate 1/3 + 1/3 + 1/3 exactly. But even it will get money calculations wrong sometimes.
For financial calculations, it is better to work in base 10 to avoid representation issues when converting to/from binary. In many countries, financial software is even legally required to do so. Here is one library for IEEE 754R decimal floating-point arithmetic (I have not tried it myself):
http://www.netlib.org/misc/intel/
Also note that working in decimal floating point instead of a fixed-point representation allows clever algorithms like the Kahan summation algorithm to avoid accumulation of rounding errors. A noteworthy difference from normal floating point is that numbers with few significant digits are not normalized, so you can have e.g. both 1×10^2 and 0.1×10^3.
An implementation note is that one representation in the standard uses a binary significand, to allow software implementations using a standard binary ALU.
How about this: define some threshold, i.e. a distance to the next multiple of 0.005 within which you assume the gap is merely an imprecision error. If the value falls within that distance, round as usual and then add 0.01 at the end.
That said, this is only a workaround and somewhat of a code smell. If you don't need too much speed, go for some type other than a binary floating point one, e.g. your own type along the lines of
class myDecimal { int digits; int exponent_of_ten; } with value = digits × 10^exponent_of_ten.
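In C terms, a minimal sketch of such a type might look like the following (purely illustrative: the names and the conversion helper are made up here, and no decimal arithmetic is shown):
#include <stdio.h>
#include <math.h>

/* Illustrative decimal type: value = digits * 10^exponent_of_ten */
struct my_decimal {
    long long digits;      /* significand, e.g. 8475                    */
    int exponent_of_ten;   /* scale, e.g. -3 means 8475 * 10^-3 = 8.475 */
};

/* Convert to double only at the edge, e.g. for display. */
static double my_decimal_to_double(struct my_decimal d)
{
    return (double)d.digits * pow(10.0, d.exponent_of_ten);
}

int main(void)
{
    struct my_decimal price = { 8475, -3 };   /* holds 8.475 exactly */
    printf("%.3f\n", my_decimal_to_double(price));
    return 0;
}
The point is that 8.475 is stored exactly as the integer pair (8475, -3); the lossy conversion to binary floating point happens only when formatting for output.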
I am not trying to argue that using floating point numbers to represent money is advisable - it is not! But sometimes you have no choice... We do kind of work with money (life insurance calculations) and are forced to use floating point numbers for everything, including values representing money.
Now there are quite a few different rounding behaviours out there: round up, round down, round half up, round half down, round half even, maybe more. It looks like you were after the round-half-up method.
Our round-half-up function - here translated from Java - looks like this:
#include <iostream>
#include <cmath>
#include <cfloat>

using namespace std;

int main()
{
    double value = 8.47499999999999964473;
    double result = value * pow(10, 2);
    result = nextafter(result + (result > 0.0 ? 1e-8 : -1e-8), DBL_MAX);
    double integral = floor(result);
    double fraction = result - integral;
    if (fraction >= 0.5) {
        result = ceil(result);
    } else {
        result = integral;
    }
    result /= pow(10, 2);
    cout << result << endl;
    return 0;
}
where nextafter is a function returning the next representable floating point value after the given value, in the direction of its second argument. This code is proven to work using C++11 (AFAIK nextafter is also available in Boost); the result written to the standard output is 8.48.

Max value of datatypes in C

I am trying to understand the maximum value that I can store in C. I tried doing printf("%f", pow(2, x)). The answer is correct until x = 1023; it prints inf when x = 1024.
I am sorry that it is a basic question but I am trying to understand how C assigns datatypes' sizes based on my machine.
I have a Mac (64-bit processor). My understanding was that since my processor is 64-bit, it would only be able to do calculations on values up to 2^64. Clearly pow(2, 1023) is greater than that, yet my program works fine until x = 1023. How is this possible? Does the GNU compiler have something to do with this?
If this is a duplicate of other question kindly give the link.
In C the pow() function returns a double, and the double type is typically a 64-bit IEEE format representation of a floating point number.
The basic idea of floating point is to express a number in the same general way as e.g. 1.234×10^56. Here you have a mantissa 1.234 and an exponent 56. Both C++ and C allow this decimal notation for floating point literals (but not for integer types), but in practice the internal representation is binary, with a power of 2 rather than a power of 10.
The limit you ran up against was the supported range for the exponent in your compiler's representation of double numbers; probably 64-bit IEEE 754.
The limits of the various built-in integral numerical types are available as symbolic constants from <limits.h>. The limits of the built-in floating point types are available as symbolic constants from <float.h>. See the table over at cppreference.com for more details.
In C++ these limits are also available via the numeric_limits class template from <limits>.
"64-bit processor" typically means that it can deal with integers that contain at most 64 bits at a time (i.e. in a single instruction), not that it can only process numbers with 64 binary digits or less. Using arbitrary precision arithmetic you can do calculations on numbers that are arbitrarily large, provided that you have enough memory (and time), just like how us humans can do operations on big values with only 10 fingers. Read more here: What is the biggest number you can generate using a 64-bit processor?
However pow(2, 1023) is a little bit different. It's not an integer but a floating-point number (of type double in C), represented by a sign, a mantissa and an exponent as (−1)^sign × mantissa × 2^exponent; for pow(2, 1023) the mantissa is 1 and the exponent is 1023. Not all digits are stored, so it's only accurate to the first few digits. However, most systems use binary floating-point types, so they can store the precise value of a power of 2 up to a large exponent, depending on the exponent range. Most modern systems' floating-point types conform to the IEEE 754 standard, with double mapping to binary64 (double precision), therefore the maximum value will be
2^1023 × (1 + (1 − 2^-52)) ≈ 1.7976931348623157 × 10^308
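You can verify that formula against DBL_MAX yourself; here is a small sketch that builds 2^1023 × (2 − 2^-52) with ldexp and compares it to DBL_MAX:
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    /* (2 - 2^-52) * 2^1023, the largest finite binary64 value */
    double computed = ldexp(2.0 - ldexp(1.0, -52), 1023);

    printf("computed = %.17g\n", computed);
    printf("DBL_MAX  = %.17g\n", DBL_MAX);
    printf("equal: %d\n", computed == DBL_MAX);   /* prints 1 on IEEE 754 systems */
    return 0;
}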
The maximum value for a double is DBL_MAX. This is defined by <float.h> in C, or <cfloat> in C++. The numeric value may vary across systems, but you can always refer to it by the macro DBL_MAX.
You can print this:
printf("%f\n", DBL_MAX);
The integer data types all have similar macros defined in <limits.h>: e.g. ULLONG_MAX is the biggest value for unsigned long long. If printing with printf make sure to use the correct format specifier.
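For instance, a short example printing a few of these limits with matching format specifiers:
#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void)
{
    printf("INT_MAX    = %d\n", INT_MAX);
    printf("LONG_MAX   = %ld\n", LONG_MAX);
    printf("ULLONG_MAX = %llu\n", ULLONG_MAX);   /* unsigned long long -> %llu */
    printf("DBL_MAX    = %e\n", DBL_MAX);
    return 0;
}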

Very long definition of PI

I'm debugging some old C code and it has a definition #define PI 3.14... where ... is about 50 other digits.
Why is this? I said I could reduce the number to about 16 decimal places, but my boss snarled at me, saying that the other digits are there for platform independence and forward compatibility. But will it slow the program down?
No, this will not slow down the program, unless you are running on an incredibly underpowered 1MHz DSP chip that has to do floating point arithmetic in software as opposed to passing it off to a dedicated FPU. This would mean that any mathematical operations that use floating point data are much slower than just using integer arithmetic.
In general, greater precision is only going to introduce a slowdown if the most time-consuming part of your program is doing a lot of calculations in rapid succession, and floating point calculations are especially slow. On a modern CPU, this is generally not the case, with the possible exception of certain chips that cause an 80-cycle stall on things like floating point underflow. That kind of issue likely exceeds the domain of this question.
First, it's better to use a common definition of PI, such as M_PI from <math.h> (a widespread extension to the C standard, defined as #define M_PI 3.14159265358979323846). If you insist, you can go ahead and define it manually.
Also, the best precision commonly available through long double in C is the equivalent of about 19 digits.
According to Wikipedia, the 80-bit "Intel" IEEE 754 extended-precision long double, which is 80 bits padded to 16 bytes in memory, has a 64-bit mantissa with no implicit bit, which gets you 19.26 decimal digits. This has been the almost universal standard for long double for ages, but recently things have started to change.
The newer 128-bit quad-precision format has 112 mantissa bits plus an implicit bit, which gets you 34 decimal digits. GCC implements this as the __float128 type and there is (if memory serves) a compiler option to set long double to it.
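As a quick way to see what a given platform offers, here is a small program printing the round-trip decimal precision constants from <float.h>:
#include <stdio.h>
#include <float.h>

int main(void)
{
    /* decimal digits that survive a round trip through each type */
    printf("FLT_DIG  = %d\n", FLT_DIG);    /* typically 6  */
    printf("DBL_DIG  = %d\n", DBL_DIG);    /* typically 15 */
    printf("LDBL_DIG = %d\n", LDBL_DIG);   /* 18 for 80-bit x87, 33 for quad */
    return 0;
}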
Personally, if I were required to use our own definition of pi, I'd write something like this:
#ifndef M_PI
#define PI 3.14159265358979323846264338327950288419716939937510
#else
#define PI M_PI
#endif
If the latest C standard supports an even wider floating point primitive data type, it's pretty much a guarantee that constants in the math library would be updated to support this.
References
More Precise Floating point Data Types than double?, Accessed 2014-03-13, <https://stackoverflow.com/questions/15659668/more-precise-floating-point-data-types-than-double>
Math constant PI value in C, Accessed 2014-03-13, <https://stackoverflow.com/questions/9912151/math-constant-pi-value-in-c>
The number of digits in a macro definition almost certainly will have no effect at all on run-time performance.
Macro expansion is textual. That means that if you have:
#define PI 3.14159... /* 50 digits */
then any time you refer to PI in code to which that definition is visible, it will be as if you had written out 3.14159....
C has just three floating-point types: float, double, and long double. Their sizes and precisions are implementation-defined, but they're typically 32 bits, 64 bits, and something wider than 64 bits (the size of long double typically varies more from system to system than the other two do).
If you use PI in an expression, it will be evaluated as a value of some specific type. And in fact, if there's no L suffix on the literal, it will be of type double.
So if you write:
double x = PI / 2.0;
it's as if you had written:
double x = 3.14159... / 2.0;
The compiler will probably evaluate the division at compile time generating a value of type double. Any extra precision in the literal will be discarded.
To see this, you can try writing a small program that uses the PI macro and examining an assembly listing.
For example:
#include <stdio.h>

#define PI 3.141592653589793238462643383279502884198716939937510582097164

int main(void) {
    double x = PI;
    printf("x = %g\n", x);
}
On my x86_64 system, the generated machine code has no reference to the full precision value. The instruction corresponding to the initialization is:
movabsq $4614256656552045848, %rax
where 4614256656552045848 is a 64-bit integer corresponding to the binary IEEE double-precision representation of a number as close as possible to 3.141592653589793238462643383279502884198716939937510582097164.
The actual stored floating-point value on my system happens to be exactly:
3.1415926535897931159979634685441851615905761718750000000000000000
of which only about 16 decimal digits are significant.

How to process calculations with high-precision numbers?

When I want to work with very large and very small numbers, how should I sum and compare values in C?
#include <stdio.h>
#include <math.h>

int main(void) {
    if (1.0 + (1 / pow(10, 50)) == 1.0)
        printf("true");
    else
        printf("false");
    return 0;
}
How can I make it return false?
You can't make it return false with standard C types. You'll need to use a high-precision floating point library.
In C99, the most precision you can get from a built-in type is long double, which is a 64-bit, 80-bit extended, or 128-bit floating-point type on most modern C compilers/architectures. If you want more precision, consider using some of the libraries that, for example, GCC itself uses (a minimal GMP sketch follows the list):
GMP (http://gmplib.org/) - arbitrary precision arithmetic for both integers and floats;
MPFR (http://www.mpfr.org/) - multiple precision floating-point library (claimed to round correctly)
MPC (http://www.multiprecision.org/index.php?prog=mpc) - arbitrary-precision complex number library.
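As a rough sketch with GMP's mpf_t type (assuming GMP is installed and the program is linked with -lgmp), the same comparison done at 256 bits of precision no longer collapses 1 + 10^-50 to 1:
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpf_t one, p, eps, sum;

    mpf_set_default_prec(256);   /* plenty of bits to represent 1e-50 */
    mpf_init_set_ui(one, 1);
    mpf_init(p);
    mpf_init(eps);
    mpf_init(sum);

    mpf_set_ui(p, 10);
    mpf_pow_ui(p, p, 50);        /* p   = 10^50      */
    mpf_ui_div(eps, 1, p);       /* eps = 1 / 10^50  */
    mpf_add(sum, one, eps);      /* sum = 1 + 10^-50 */

    printf("%s\n", mpf_cmp(sum, one) == 0 ? "true" : "false");   /* prints false */

    mpf_clear(one); mpf_clear(p); mpf_clear(eps); mpf_clear(sum);
    return 0;
}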

Scramble a floating point number?

I need a repeatable pseudo-random function from floats in [0,1] to floats in [0,1]. I.e. given a 32-bit IEEE float, return a "different" one (as random as possible, given the 24 bits of mantissa). It has to be repeatable, so keeping tons of internal state is out. And unfortunately it has to work with only 32-bit int and single-float math (no doubles and not even 32x32=64bit multiply, though I could emulate that if needed -- basically it needs to work on older CUDA hardware). The better the randomness the better, of course, within these rather severe limitations. Anyone have any ideas?
(I've been through Park-Miller, which requires 64-bit int math, and the CUDA version of Park-Miller which requires doubles, Mersenne Twisters which have lots of internal state, and a few other things which didn't work.)
Best I understand the requirements, a hash accomplishes the desired functionality. Re-interpret the float input as an integer, apply the hash function to produce an integer approximately uniformly distributed in [0, 2^32), then multiply this integer by 2^-32 to convert the result back to a float roughly uniformly distributed in [0,1]. One suitable hash function which does not require multiplication is Bob Jenkins' mix(), which can be found here: http://www.burtleburtle.net/bob/hash/doobs.html.
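For reference, Jenkins' mix() is a small macro over three 32-bit words, reproduced here from memory (verify against the page linked above before relying on it); note that it modifies all three arguments, which is why the scramble_float function below reads the scrambled result back out of j.
/* Bob Jenkins' 96-bit mix function (from lookup2), as commonly reproduced */
#define mix(a,b,c) \
{ \
    a -= b; a -= c; a ^= (c >> 13); \
    b -= c; b -= a; b ^= (a << 8);  \
    c -= a; c -= b; c ^= (b >> 13); \
    a -= b; a -= c; a ^= (c >> 12); \
    b -= c; b -= a; b ^= (a << 16); \
    c -= a; c -= b; c ^= (b >> 5);  \
    a -= b; a -= c; a ^= (c >> 3);  \
    b -= c; b -= a; b ^= (a << 10); \
    c -= a; c -= b; c ^= (b >> 15); \
}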
To re-interpret the bits of a float as an integer and vice versa, there are two choices in CUDA. Use intrinsics, or use C++-style reinterpretation casts:
float f;
int i;
i = __float_as_int(f);
f = __int_as_float(i);
i = reinterpret_cast<int&>(f);
f = reinterpret_cast<float&>(i);
So as a self-contained function, the entire process might look something like this:
/* transform float in [0,1] into a different float in [0,1] */
float scramble_float (float f)
{
    unsigned int magic1 = 0x96f563ae; /* number of your choice */
    unsigned int magic2 = 0xb93c7563; /* number of your choice */
    unsigned int j;
    j = reinterpret_cast<unsigned int &>(f);
    mix (magic1, magic2, j);
    return 2.3283064365386963e-10f * j;
}
The NVIDIA CUDA Toolkit includes a library called CURAND that I believe fits your requirements: it produces repeatable results (assuming you start with the same seed), works on the GPU, supports 32-bit floats and ints, and should work on older GPUs. It also supports multiple pseudo- and quasi-random generation algorithms and distributions.
[Note: a problem with using the C library rand() function (other than that it does not run in CUDA on the device) is that on Windows, rand() only returns values up to 32767 (15 random bits), and thus any float created by division by RAND_MAX has only 15 random bits of precision. What's more, on Linux/Mac RAND_MAX is 2^31 - 1, so code that uses it is not numerically portable.]
Why not use the standard C library rand() function and divide the result by RAND_MAX?
#include <stdlib.h>

float randf (void)
{
    return rand() / (float) RAND_MAX;
}
