How to process calculations with high-digit numbers in C?

When I want to work with very large and very small numbers, how should I sum and compare values in C?
#include <stdio.h>
#include <math.h>

int main(void) {
    if (1.0 + (1 / pow(10, 50)) == 1.0)
        printf("true");
    else
        printf("false");
    return 0;
}
How can I make it print "false"?

You can't make it print "false" with standard C floating-point types. You'll need to use a high-precision floating-point library.

In C99, the most precision you can get from a built-in type is long double, which on most modern compilers/architectures is a 64-bit, 80-bit x87 extended, or 128-bit floating-point format. If you want more precision, consider libraries such as the ones used by GCC:
GMP (http://gmplib.org/) - arbitrary-precision arithmetic for both integers and floats;
MPFR (http://www.mpfr.org/) - multiple-precision floating-point library with correct rounding;
MPC (http://www.multiprecision.org/index.php?prog=mpc) - arbitrary-precision complex number library.
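For example, MPFR can evaluate the original comparison at a precision high enough to distinguish 1 + 10^-50 from 1. A minimal sketch (link with -lmpfr -lgmp):

#include <stdio.h>
#include <mpfr.h>

int main(void) {
    mpfr_t one, tiny, sum;
    /* 200 bits of precision (about 60 decimal digits) easily covers 10^-50 */
    mpfr_inits2(200, one, tiny, sum, (mpfr_ptr) 0);

    mpfr_set_ui(one, 1, MPFR_RNDN);
    mpfr_set_str(tiny, "1e-50", 10, MPFR_RNDN);
    mpfr_add(sum, one, tiny, MPFR_RNDN);

    if (mpfr_cmp(sum, one) == 0)
        printf("true\n");
    else
        printf("false\n");   /* at 200 bits the sum differs from 1 */

    mpfr_clears(one, tiny, sum, (mpfr_ptr) 0);
    return 0;
}

At 200 bits, 1 + 10^-50 is representable as a value distinct from 1, so the program prints "false".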

Related

GNU MP low precision while using mpf_pow function

While writing this answer, I used the mpf_pow function to calculate 12.3 ^ 123, and the result is different from the one given by WolframAlpha (which by the way also uses GMP).
I reduced the code to pure C to simplify:
#include <stdio.h>
#include <gmp.h>

int main (void) {
    mpf_t a, c;
    unsigned long int b = 123UL;

    mpf_set_default_prec(100000);
    mpf_inits(a, c, NULL);
    mpf_set_d(a, 12.3);
    mpf_pow_ui(c, a, b);
    gmp_printf("c = %.50Ff\n", c);
    return 0;
}
Which results in
114374367934618002778643226182707594198913258409535335775583252201365538178632825702225459029661601216944929436371688246107986574246790.32099077871758646985223686110515186972735931183764
While WolframAlpha returns
1.14374367934617190099880295228066276746218078451850229775887975052369504785666896446606568365201542169649974727730628842345343196581134895919942820874449837212099476648958359023796078549041949007807220625356526926729664064846685758382803707100766740220839267 × 10^134
which starts to disagree with mpf_pow at the 15th digit.
Am I doing something wrong in the code, is this a limitation of GMP, or is WolframAlpha giving an incorrect result?
Am I doing something wrong in the code, is this a limitation of GMP, or is WolframAlpha giving an incorrect result?
You are doing something different from what Wolfram is doing (obviously). Your code is not wrong, per se, but it is not doing what you probably think it is doing. Compare the output of this variation:
#include <stdio.h>
#include <gmp.h>

int main (void) {
    mpf_t a, c;
    unsigned long int b = 123UL;

    mpf_set_default_prec(100000);
    mpf_inits(a, c, NULL);
    mpf_set_d(a, 12.3);
    mpf_pow_ui(c, a, b);
    gmp_printf("c = %.50Ff\n", c);
    putchar('\n');

    mpf_t a1, c1;
    mpf_inits(a1, c1, NULL);
    mpf_set_str(a1, "12.3", 10);
    mpf_pow_ui(c1, a1, b);
    gmp_printf("c' = %.50Ff\n", c1);
    return 0;
}
...
c = 114374367934618002778643226182707594198913258409535335775583252201365538178632825702225459029661601216944929436371688246107986574246790.32099077871758646985223686110515186972735931183764
c' = 114374367934617190099880295228066276746218078451850229775887975052369504785666896446606568365201542169649974727730628842345343196581134.89591994282087444983721209947664895835902379607855
The difference between the two output values arises because my C implementation and yours represent values of type double in binary floating point, and 12.3 is not exactly representable in binary floating point (see Is floating point math broken?). C provides the closest approximation available, which, assuming 64-bit IEEE 754 representation, matches to about 15 decimal digits of precision. When you initialize a GMP variable with such a value, you get an exact GMP representation of the actual double value, which is only an approximation to 12.3 decimal.
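A one-line check of the underlying cause: printing the literal with more digits than a double actually carries exposes the stored approximation.

#include <stdio.h>

int main(void) {
    /* The double nearest to 12.3 is not 12.3 exactly; extra digits reveal it. */
    printf("%.20f\n", 12.3);   /* e.g. 12.30000000000000071054 on IEEE 754 */
    return 0;
}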
But GMP can represent 12.3 (decimal) to whatever precision you choose.* You chose a very high precision, so when you use a decimal string to initialize your MP-float variable you get a much closer approximation than when you used a double. Naturally, performing the same operation on those different values produces different results. The GMP result in the latter case appears to agree with the Wolfram result to the full precision in which it is expressed.
Note also that in a general sense, one can also use decimal floating-point, in software or (if you are so equipped) in hardware. The value 12.3 (decimal) can be represented exactly in such a format, but that's not what GMP uses.
* Or indeed, GMP can represent 12.3 exactly as an MP rational, though that's not what the code above does.
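For completeness, a minimal sketch of that rational route: GMP's mpq type stores 12.3 exactly as the ratio 123/10 (link with -lgmp).

#include <stdio.h>
#include <gmp.h>

int main(void) {
    mpq_t r;
    mpq_init(r);
    mpq_set_str(r, "123/10", 10);  /* exactly 12.3 as a rational */
    mpq_canonicalize(r);
    gmp_printf("r = %Qd\n", r);    /* prints 123/10 */
    mpq_clear(r);
    return 0;
}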
This gives a result similar to WolframAlpha's:
from decimal import Decimal
from decimal import getcontext
getcontext().prec = 200
print(Decimal('12.3') ** 123)
So the discrepancy must come from how the GMP computation is set up, not from WolframAlpha.

How to guarantee exact size of double in C?

So, I am aware that types from the stdint.h header provide standardized-width integer types, but I am wondering what type or method one uses to guarantee the size of a double or other floating-point type across platforms. Specifically, this would deal with packing data into a void* buffer.
#include <stdio.h>
#include <stdlib.h>

void write_double(void* buf, double num)
{
    *(double*)buf = num;
}

double read_double(void* buf)
{
    return *(double*)buf;
}

int main(void) {
    void* buffer = malloc(sizeof(double));
    write_double(buffer, 55);
    printf("The double is %f\n", read_double(buffer));
    return 0;
}
Say like in the above program, if I wrote that void* to a file or if it was used on another system, would there be some standard way to guarantee size of a floating point type or double?
How to guarantee exact size of double in C?
Use _Static_assert()
#include <limits.h>

int main(void) {
    _Static_assert(sizeof(double) * CHAR_BIT == 64, "Unexpected double size");
    return 0;
}
_Static_assert has been available since C11. Otherwise, code could use a run-time assert.
#include <assert.h>
#include <limits.h>

int main(void) {
    assert(sizeof(double) * CHAR_BIT == 64);
    return 0;
}
Although this ensures the size of a double is 64 bits, it does not ensure adherence to the IEEE 754 double-precision binary floating-point format.
Code could use __STDC_IEC_559__
"An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex" (C11 Annex F, IEC 60559 floating-point arithmetic).
Yet that may be too strict. Many implementations adhere to most of that standard, yet still do not set the macro.
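A small sketch of how one might probe these properties, combining the macro check with the <float.h> limits:

#include <float.h>
#include <stdio.h>

int main(void) {
#if defined(__STDC_IEC_559__)
    puts("__STDC_IEC_559__ defined: IEC 60559 (IEEE 754) arithmetic claimed");
#else
    puts("__STDC_IEC_559__ not defined");
#endif
    /* IEEE 754 double precision has a 53-bit significand and max exponent 1024 */
    printf("DBL_MANT_DIG = %d, DBL_MAX_EXP = %d\n", DBL_MANT_DIG, DBL_MAX_EXP);
    return 0;
}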
would there be some standard way to guarantee size of a floating point type or double?
The best guarantee is to write the FP value as its hex representation or as an exponential with sufficient decimal digits. See Printf width specifier to maintain precision of floating-point value.
The problem with floating-point types is that the C standard doesn't specify how they should be represented. The use of IEEE 754 is not required.
If you're communicating between a system that uses IEEE 754 and one that doesn't, you won't be able to write on one and read on the other even if the sizes are the same.
You need to serialize the data in a known format. You can either use sprintf to convert it to a text format, or you can do some math to determine the base and mantissa and store those.
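A minimal sketch of the text route, using C99's %a hexadecimal floating-point format, which round-trips binary floating-point values exactly:

#include <stdio.h>
#include <stdlib.h>

/* Serialize a double as hex-float text, then parse it back with strtod. */
int main(void) {
    double original = 55.0;
    char buf[64];

    snprintf(buf, sizeof buf, "%a", original);   /* e.g. "0x1.b8p+5" */
    double restored = strtod(buf, NULL);

    printf("serialized: %s\n", buf);
    printf("round-trip ok: %s\n", restored == original ? "yes" : "no");
    return 0;
}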
Floating-point values are defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754) and have standard sizes:
float, in full "single precision floating point number": 32 bits
double, in full "double precision floating point number": 64 bits
The following also exist:
Half-precision floating-point format
Quadruple precision floating-point format
Extended precision floating-point format
This format is referenced by the C11 standard, in Annex F "IEC 60559 floating-point arithmetic" of ISO/IEC 9899:2011(en).
Why use CHAR_BIT and assert at runtime? We can do this at compile time.
void write_double(void* buf, double num)
{
    char checkdoublesize[(sizeof(double) == 8) ? 1 : -1];
    *(double*)buf = num;
}
Your code is still not fully defined, as it doesn't guarantee IEEE format or endianness, but this will catch a bad double size. If your platform is new enough to have htonq, the following will make endianness work:
void write_double(void* buf, double num)
{
    char checkdoublesize[(sizeof(double) == 8) ? 1 : -1];
    *(int64_t*)buf = htonq(*(volatile int64_t*)&num);
}

double read_double(void* buf)
{
    int64_t n = ntohq(*(int64_t*)buf);
    return *(volatile double*)&n;
}
Here volatile is merely the shortest way to tell the compiler that the pointer cast really is intended. Usually it does the right thing anyway, but after N levels of inlining maybe it won't anymore.
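An alternative to the volatile casts that sidesteps the aliasing question entirely is memcpy-based type punning, which is well defined in standard C and which compilers typically reduce to a plain register move; a minimal sketch:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* memcpy-based type punning: well defined in standard C, no aliasing concerns. */
static uint64_t double_to_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

static double bits_to_double(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

int main(void) {
    double x = 55.0;
    uint64_t bits = double_to_bits(x);
    printf("bits = 0x%016" PRIx64 ", back = %f\n", bits, bits_to_double(bits));
    return 0;
}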

pow numeric error in C

I'm wondering where the numeric error happens, and in what layer.
Let me explain using an example:
int p = pow(5, 3);
printf("%d", p);
I've tested this code on various HW and compilers (VS and GCC) and some of them print out 124, and some 125.
On the same HW (OS) I get different results with different compilers (VS and GCC).
On different HW (OS) I get different results with the same compiler (cc (GCC) 4.8.1).
AFAIK, pow computes to 124.99999999 and that gets truncated to int, but where does this error happen?
Or, in other words, where does the correction happen (124.99 -> 125)?
Is it a compiler-HW interaction?
//****** edited:
Here's an additional snippet to play with (keep an eye on p=5, p=18, ...):
#include <stdio.h>
#include <math.h>

int main(void) {
    int p;
    for (p = 1; p < 20; p++) {
        printf("\n%d %d %f %f", (int) pow(p, 3), (int) exp(3 * log(p)),
               pow(p, 3), exp(3 * log(p)));
    }
    return 0;
}
(First note that for an IEEE 754 double-precision floating-point type, all integers of magnitude up to 2^53 can be represented exactly. Blaming floating-point precision for integral pow inaccuracies is normally incorrect.)
pow(x, y) is normally implemented in C as exp(y * log(x)). Hence it can "go off" for even quite small integral cases.
For small integral cases, I normally write the computation long-hand, and for other integral arguments I use a 3rd party library. Although a do-it-yourself solution using a for loop is tempting, there are effective optimisations that can be done for integral powers that such a solution might not exploit.
As for the observed differences in results, it could be down to some of the platforms using an 80-bit floating-point intermediate. Perhaps some of the computations then come out at or above 125 and others just below it.
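As suggested above, a minimal sketch of the long-hand route for small integral cases, using exact integer exponentiation by squaring:

#include <stdio.h>

/* Exact integer power by repeated squaring: no floating point involved,
   so no rounding error, as long as the result fits in the type. */
static long long ipow(long long base, unsigned int exp)
{
    long long result = 1;
    while (exp > 0) {
        if (exp & 1u)
            result *= base;
        exp >>= 1;
        if (exp)
            base *= base;
    }
    return result;
}

int main(void)
{
    printf("%lld\n", ipow(5, 3));   /* always prints 125 */
    return 0;
}

If the floating-point pow result must be converted to int, rounding it (for example with lround from <math.h>) rather than truncating it also avoids the 124-versus-125 surprise.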

Storing numbers with higher precision in C

I am writing a program in which I need to store numbers with very high precision (around 10^-10) and then use them as a parameter (create_bloomfilter([yet to decide the type] falsePositivity, long expected_num_of_elem)).
The highest precision I am able to get is with double (something around 10^-6) which is not sufficient.
How can we store numbers with higher precision in C?
You have been misinformed about double.
The smallest positive number you can store in a double is about 2×10^-308, not counting denormalized numbers, which can be smaller; denormals go down to about 5×10^-324. Doubles have the equivalent of about 15-17 decimal digits of precision, which is sufficient to measure the diameter of the Earth to within the size of a red blood cell, one of the smallest cells in the human body.
If you really need more precision, you need MPFR. (If your algorithms are numerically unstable, MPFR might not help.)
Edit: I figured out what you are doing wrong.
In C, 10^-7 is an integer expression. It should be equal to -13 on most systems. The ^ operator is the bitwise XOR operator, not the exponentiation operator. There is no exponentiation operator in C, because C operators generally correspond to more primitive operations, at least in terms of hardware implementation.
You want 1e-7, or pow(10, -7).
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    printf("2e-308 = %g\n", 2e-308);
    printf("2 * pow(10, -308) = %g\n", 2 * pow(10, -308));
    printf("10^-7 = %d\n", 10^-7);
    return 0;
}
Output:
2e-308 = 2e-308
2 * pow(10, -308) = 2e-308
10^-7 = -13
Note that there are a lot of gotchas with floating point numbers.
Try the GNU MPFR library and the GNU GMP library:
The MPFR library is a C library for multiple-precision floating-point computations with correct rounding.
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
Is long double sufficient? Some implementations use a 128-bit long double, which should easily handle your requirements.
http://en.wikipedia.org/wiki/Quadruple_precision
If you're looking for something extremely strong, check out MPFR
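To check how much precision long double actually offers on a given platform, one can print the <float.h> limits; a minimal sketch:

#include <float.h>
#include <stdio.h>

int main(void) {
    /* Epsilon and decimal-digit counts for double and long double. */
    printf("DBL_EPSILON  = %e  (%d decimal digits)\n", DBL_EPSILON, DBL_DIG);
    printf("LDBL_EPSILON = %Le (%d decimal digits)\n", LDBL_EPSILON, LDBL_DIG);
    return 0;
}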

Scramble a floating point number?

I need a repeatable pseudo-random function from floats in [0,1] to floats in [0,1]. I.e. given a 32-bit IEEE float, return a "different" one (as random as possible, given the 24 bits of mantissa). It has to be repeatable, so keeping tons of internal state is out. And unfortunately it has to work with only 32-bit int and single-float math (no doubles and not even 32x32=64bit multiply, though I could emulate that if needed -- basically it needs to work on older CUDA hardware). The better the randomness the better, of course, within these rather severe limitations. Anyone have any ideas?
(I've been through Park-Miller, which requires 64-bit int math, and the CUDA version of Park-Miller which requires doubles, Mersenne Twisters which have lots of internal state, and a few other things which didn't work.)
Best I understand the requirements, a hash accomplishes the desired functionality. Reinterpret the float input as an integer, apply the hash function to produce an integer approximately uniformly distributed in [0, 2^32), then multiply this integer by 2^-32 to convert the result back to a float roughly uniformly distributed in [0,1]. One suitable hash function which does not require multiplication is Bob Jenkins' mix(), which can be found here: http://www.burtleburtle.net/bob/hash/doobs.html.
To reinterpret the bits of a float as an integer and vice versa, there are two choices in CUDA: use intrinsics, or use C++-style reinterpretation casts:
float f;
int i;
i = __float_as_int(f);
f = __int_as_float(i);
i = reinterpret_cast<int&>(f);
f = reinterpret_cast<float&>(i);
So as a self-contained function, the entire process might look something like this:
/* transform float in [0,1] into a different float in [0,1] */
float scramble_float (float f)
{
    unsigned int magic1 = 0x96f563ae; /* number of your choice */
    unsigned int magic2 = 0xb93c7563; /* number of your choice */
    unsigned int j;
    j = reinterpret_cast<unsigned int &>(f);
    mix (magic1, magic2, j);
    return 2.3283064365386963e-10f * j;
}
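For a self-contained, host-testable illustration of the same pipeline (reinterpret the bits, hash them, scale by 2^-32), the sketch below substitutes MurmurHash3's fmix32 finalizer for mix(); note this substitution is for illustration only, and unlike Jenkins' mix() it relies on 32-bit multiplies:

#include <stdio.h>
#include <string.h>

/* MurmurHash3's 32-bit finalizer: a small, stateless avalanche function. */
static unsigned int fmix32(unsigned int h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Reinterpret the float's bits, hash them, scale back into [0,1). */
static float scramble_float(float f)
{
    unsigned int j;
    memcpy(&j, &f, sizeof j);           /* portable bit reinterpretation */
    j = fmix32(j);
    return 2.3283064365386963e-10f * j; /* 2^-32 */
}

int main(void)
{
    for (float x = 0.0f; x <= 1.0f; x += 0.25f)
        printf("%f -> %f\n", x, scramble_float(x));
    return 0;
}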
The NVIDIA CUDA Toolkit includes a library called CURAND that I believe fits your requirements: it produces repeatable results (assuming you start with the same seed), works on the GPU, supports 32-bit floats and ints, and should work on older GPUs. It also supports multiple pseudo- and quasi-random generation algorithms and distributions.
[Note: a problem with using the C library rand() function (other than that it does not run in CUDA code on the device) is that on Windows rand() only returns a 15-bit value (RAND_MAX is 32767), and thus any float created by division by RAND_MAX has only about 15 random bits of precision. What's more, on Linux/macOS it returns a 31-bit value, so code that uses it is not numerically portable.]
Why not use the standard C library rand() function and divide the result by RAND_MAX?
#include <stdlib.h>

float randf (void)
{
    return rand() / (float) RAND_MAX;
}
