I'm performing some calculations on arbitrary-precision integers using the GNU Multiple Precision (GMP) library. Then I need the decimal digits of the result. But not all of them: just, let's say, the hundred most significant digits (that is, the digits the number starts with) or a selected range of digits from the middle of the number (e.g. digits 100..200 from a 1000-digit number).
Is there any way to do it in GMP?
I couldn't find any functions in the documentation to extract a range of decimal digits as a string. The conversion functions which convert mpz_t to character strings always convert the entire number. One can only specify the radix, but not the starting/ending digit.
Is there any better way to do it other than converting the entire number into a humongous string only to take a small piece of it and throw out the rest?
Edit: What I need is not to control the precision of my numbers or limit it to a particular fixed amount of digits, but selecting a subset of digits from the digit string of the number of arbitrary precision.
Here's an example of what I need:
7^1316831 = 19821203202357042996...2076482743
The actual number has 1112852 digits, most of which I abbreviated with the ellipsis.
Now, I need only an arbitrarily chosen substring of this humongous string of digits. For example, the ten most significant digits (1982120320 in this case). Or the digits from 1112841th to 1112849th (21203202 in this case). Or just a single digit at the 1112841th position (2 in this case).
If I were to first convert my GMP number to a string of decimal digits with mpz_get_str, I would have to allocate a tremendous amount of memory for these digits only to use a tiny fraction of them and throw out the rest. (Not to mention that the original mpz_t number in binary representation already eats up quite a lot.)
If you know the number of decimal digits of x = 7^1316831 in advance (here, 1112852), then you get the lowest, say, 10 digits with:
x % (10^10), and the upper 20 digits with:
x / (10^(1112852 - 20)).
Note: I get 19821203202357042995 for the latter; the final digit is 5, not 6.
I don't think you can do that in GMP. However, you can use the Boost Multiprecision Library:
Depending upon the number type, precision may be arbitrarily large (limited only by available memory), fixed at compile time (for example 50 or 100 decimal digits), or a variable controlled at run-time by member functions. The types are expression-template-enabled for better performance than naive user-defined types.
Emphasis mine
Another alternative is ttmath with the type ttmath::Big<e,m>, which lets you control the precision you need. Any fixed-precision type will work, provided that you only need the most significant digits, since they all drop the low-order digits the way float and double do. Those digits don't affect the high digits of the result, hence they can be omitted safely. For instance, if you need the high 20 digits, then use a type that can store 20 digits and a little more, in order to provide enough data for correct rounding later.
For demonstration, let's take a simple example of 7⁷ = 823543 where you only need the top 2 digits. Using a 4-digit type for the calculation you'll get this:
7⁵ = 16807 => round to 1681×10¹ and store
7⁵×7 = 1681×10¹×7 = 11767×10¹ ≈ 1177×10²
7⁵×7×7 = 1177×10²×7 = 8239×10²
As you can see the top digits are the same even without needing to get the full exact result. Calculating the full precision using GMP not only wastes a lot of time but also memory. Think about the amount of memory you need to store the result of another operation on 2 bigints to get the digits you want. By fixing the precision instead of leaving it at infinite you'll decrease the CPU and memory usage significantly.
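The rounding scheme in that worked example can be sketched in plain C like this. pow_leading is a hypothetical name, and the limits on base and digits are assumptions chosen so the intermediate product fits in an unsigned long long. Rounding errors accumulate with every step, so only the leading digits of the mantissa should be trusted, which is why you keep a few more digits than you need.

```c
#include <assert.h>

/* Hypothetical helper: approximates base^n as mant * 10^exp10, keeping at
   most `digits` significant decimal digits after every multiplication. */
void pow_leading(unsigned base, unsigned n, int digits,
                 unsigned long long *mant, int *exp10)
{
    /* Assumed limits: m < 10^15 and base <= 1000 keep m * base below 10^18. */
    assert(base >= 1 && base <= 1000 && digits >= 1 && digits <= 15);

    unsigned long long limit = 1;
    for (int i = 0; i < digits; i++)
        limit *= 10;

    unsigned long long m = 1;
    int e = 0;
    for (unsigned i = 0; i < n; i++) {
        m *= base;
        /* Round away the lowest digit until the mantissa fits again. */
        while (m >= limit) {
            m = (m + 5) / 10;
            e++;
        }
    }
    *mant = m;
    *exp10 = e;
}
```

Calling pow_leading(7, 7, 4, &m, &e) follows the same steps as the example and ends with a mantissa whose top two digits are 82, matching 823543.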
If you need the 100th to 200th high-order digits, then use a type that has enough room for 201 digits and more, and extract those 101 digits after the calculation. But this will be more wasteful, so you may need to change to an arbitrary-precision (or fixed-precision) type that uses a base that's a power of 10 for its limbs (I'm using GMP notation here). For example, if the type uses base 10⁹ then each limb represents 9 digits in the decimal output and you can get an arbitrary decimal digit directly, without any conversion from binary to decimal. That means zero waste for the string. I'm not sure which library uses base 10ⁿ, but you can look at Mini-Pi's implementation, which uses base 10⁹, or write it yourself. This way it also works for efficiently getting the high digits.
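To illustrate why power-of-ten limbs make digit extraction trivial, here is a minimal sketch. The layout is an assumption for illustration (little-endian array of uint32_t limbs, each holding a value from 0 to 999999999), and decimal_digit_at is a hypothetical name, not taken from any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns decimal digit k of a number stored in base 10^9 limbs, where
   k = 0 is the least significant digit and limbs[0] is the lowest limb. */
int decimal_digit_at(const uint32_t *limbs, size_t nlimbs, size_t k)
{
    static const uint32_t pow10[9] = {
        1u, 10u, 100u, 1000u, 10000u, 100000u,
        1000000u, 10000000u, 100000000u
    };
    assert(k / 9 < nlimbs);
    /* Each limb covers 9 decimal digits, so no base conversion is needed. */
    return (int)(limbs[k / 9] / pow10[k % 9] % 10);
}
```

With limbs {123456789, 987654321}, which stand for 987654321123456789, digit 0 is 9 and digit 17 is 9, and each lookup touches a single limb.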
See
How are extremely large floating-point numbers represented in memory?
What is the simplest way of implementing bigint in C?
TLDR, at the bottom :)
Brief:
I am in the process of creating a basic arithmetic library (addition, subtraction, ...) for handling huge numbers. One of the problems I am facing is printing these huge binary numbers in decimal.
I have huge binary number stored in an array of uint64_t. e.g.
uint64_t a[64] = {0};
Now, the goal is to print the 64*64bits binary number in the console/file as its decimal value.
Initial Work:
To elaborate the problem I want to describe how I printed hex value.
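int i;
int s = 1;
a[1] = (uint64_t)0xFF;
for (i = s; i >= 0; i--)
{
    /* 16 hex digits per 64-bit element; the cast matches the %llX specifier */
    printf("0x%016llX, ", (unsigned long long)a[i]);
}
Output:
0x00000000000000FF, 0x0000000000000000,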
Similarly, for printing the OCT value I can just take the 3 least significant bits of a, print the decimal equivalent of those bits, right-shift all the bits of a by 3, and keep repeating until all the values of a have been printed. (Print in reverse order to keep the first Oct digit on the right.)
I can print Hex and Oct value of a binary of unlimited size just by repeating this unit algorithm, but I could not find/develop one for Decimal which I can repeat over and over again to print a[64](or something bigger).
What I have thought of:
My initial idea was to keep subtracting
max_64 = (uint64_t)10000000000000000000ULL; // (i.e. 10^19)
the biggest power of 10 that fits in a uint64_t, from a until the value inside a is smaller than max_64 (which is basically equivalent to rem_64 = a % max_64) and print the rem_64 value using
printf("%019llu",rem_64);
which gives the last (least significant) 19 decimal digits of the number a.
Then do an arithmetic operation similar to (not the code):
a = a/max_64; /* Integer division(no fractional part) to remove right most 19 dec digits from 'a' */
and keep repeating and printing 19 decimal digits. (print in such a way that first found 19 digits are on the right, then next 19 digits on its left and so on...).
The problem is that this process is too long, and I don't want to use all of this just to print the dec value. I am looking for a process which avoids these huge, time-consuming arithmetic operations.
What I believe is that there must be a way to print huge size just by repeating an algorithm (similar to how Hex and Oct can be printed) and I hope someone could point me to the right direction.
What my library can do(so far):
Add (Using Full-Adder)
Sub (Using Full-subtractor)
Compare (by comparing array size and comparing array elements)
Div (Integer division, no fractional part)
Modulus (%)
Multiplication (basically repeated addition :( )
I will write code for other operations if needed, but I would like to implement the printing function independent of the library if possible.
Consider the problem like this:
You have been given a binary number X of n bits (1<=n<=64*64) you have to print out X in decimal. You can use existing library if absolutely needed but better if unused.
TLDR:
Any code, reference or unit algorithm which I can repeat to print the decimal value of a binary number of very large and/or unknown size would be very helpful. The emphasis is on the algorithm, i.e. I don't need code: if someone could describe a process, I will be able to implement it. Thanks in advance.
When faced with such doubts, and given that there are many bigint libraries out there, it is interesting to look into their code. I had a look at Java's BigInteger, which has a toString method, and they do two things:
for small numbers, they bite the bullet and do something similar to what you proposed - straightforward link-by-link base conversion, outputting decimal numbers in each step.
for large numbers, they use the recursive Schönhage algorithm, which they quote in the comments as being referred to in, among other places,
Knuth, Donald, The Art of Computer Programming, Vol. 2, Answers to
Exercises (4.4) Question 14.
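For the small-number case, a straightforward conversion can be sketched as below. This is an illustration of the general idea, not Java's actual code. It assumes 32-bit limbs stored little-endian (instead of the uint64_t array in the question) so that each intermediate value fits in 64 bits, and it peels off 9 decimal digits per pass by dividing the whole array by 10^9. The names divmod_1e9 and to_decimal are hypothetical, and the input array is destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Divides the little-endian number in place by 10^9; returns the remainder. */
static uint32_t divmod_1e9(uint32_t *limbs, size_t n)
{
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0;) {
        uint64_t cur = (rem << 32) | limbs[i];   /* rem < 10^9, so no overflow */
        limbs[i] = (uint32_t)(cur / 1000000000u);
        rem = cur % 1000000000u;
    }
    return (uint32_t)rem;
}

/* Writes the decimal form of limbs[0..n) into out (capacity cap, at least
   10 * n + 2 bytes is always enough) and returns the number of digits. */
size_t to_decimal(uint32_t *limbs, size_t n, char *out, size_t cap)
{
    assert(cap >= 2);
    size_t pos = cap;                 /* fill the buffer from the right */
    out[--pos] = '\0';

    while (n > 0 && limbs[n - 1] == 0)
        n--;                          /* ignore leading zero limbs */
    if (n == 0)
        out[--pos] = '0';

    while (n > 0) {
        uint32_t chunk = divmod_1e9(limbs, n);
        while (n > 0 && limbs[n - 1] == 0)
            n--;
        /* Inner chunks are zero-padded to 9 digits; the top chunk is not. */
        for (int d = 0; d < 9 && (n > 0 || chunk > 0); d++) {
            assert(pos > 0);
            out[--pos] = (char)('0' + chunk % 10);
            chunk /= 10;
        }
    }
    memmove(out, out + pos, cap - pos);
    return cap - pos - 1;
}
```

Each pass costs one sweep over the array, so the whole conversion is quadratic in the number of limbs. That is fine for 64 x 64 bits; the recursive approach only pays off for much larger numbers.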
Is there a way to print doubles in C using printf so that the precision of the print is the least possible (so that, for example, an integer is always printed without decimals)?
I know that you can do something like printf("%.0f",number); But I am wondering if there is a way to use the minimum precision that makes the print exact (whenever the number can be expressed finitely in base 10 of course).
Every finite double, whether encoded in base 2 (the usual), base 10, or base 16, can be printed exactly with a finite number of base-10 digits. DBL_MIN may take 100+ digits to do so, but it is not infinite. printf() is not required to perform to that level, so it ends up being custom code (see "printing doubles without zeros").
Recommend sprintf(buffer, "%.*e", DBL_DECIMAL_DIG - 1, some_double) and post-process the buffer to remove least significant 0 as needed for a "close enough" answer to code's goal.
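A minimal sketch of that post-processing step might look like this. format_trimmed is a hypothetical name, DBL_DECIMAL_DIG requires C11, and the result is the "close enough" form described above, not necessarily the shortest string that round-trips.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <string.h>

/* Prints x with DBL_DECIMAL_DIG significant digits in %e form, then removes
   trailing zeros (and a bare '.') from the mantissa. 32 bytes is enough. */
void format_trimmed(double x, char *buf, size_t cap)
{
    assert(cap >= 32);
    snprintf(buf, cap, "%.*e", DBL_DECIMAL_DIG - 1, x);

    char *e = strchr(buf, 'e');
    if (e == NULL)
        return;                       /* inf or nan: nothing to trim */

    char *end = e;
    while (end > buf && end[-1] == '0')
        end--;                        /* drop trailing zeros of the mantissa */
    if (end > buf && end[-1] == '.')
        end--;                        /* drop a decimal point left on its own */
    memmove(end, e, strlen(e) + 1);   /* close the gap before the exponent */
}
```

On an implementation that prints two-digit exponents, 1234.5 comes out as 1.2345e+03 and 1.0 as 1e+00. A value such as 0.1 keeps all 17 digits, because its nearest double is not exactly 0.1.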
What is the simplest solution to print a double (printf) in C so that:
exactly N characters are used (will be around 6) for all double numbers (nan and infinities are handled separately), positive and negative alike (+ or - always as first char);
decimal representation ('.' always present) is used as long as the numeric chars are not all 0 (i.e. too small number) or the decimal point is the last of the N char (i.e too big number). Otherwise switch to scientific representation, always occupying exactly N chars.
All the solutions I can think of seem quite involved, any idea to obtain this result easily (efficiency is not a concern here) ?
Thanks!
I could not find a way to do this via a single printf call, here is my solution.
At least 9 chars must be used, since (with +/- in front) that's the minimum number of chars for scientific notation (for example: +1.0E-002). In the following I consider the case of 9 chars. One of the following two formats is chosen based on the conditions listed below:
Scientific format '%+.1e':
chars 4 to 9 as per decimal format are 0 and the number is not identical to 0 (i.e. too small for decimal)
the '.' char is not present between char 3 and char 8 as per decimal format (i.e. too large for decimal)
Decimal format '%+.6f':
Infinite or nan
All other cases
It's easy to adapt to a representation longer than 9 chars by changing the constants above.
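The selection logic above could be sketched as follows. This is one reading of it, not the original code: format_width9 is a hypothetical name, "too small" is taken to mean that every digit in the 9-char decimal form is 0, the decimal form is truncated (not rounded) to 9 chars, and inf/nan are simply padded with spaces. For the scientific form it tries '%+.2e' first and falls back to '%+.1e' when the exponent needs three digits, so the result is 9 chars either way.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

#define WIDTH 9

/* Writes exactly WIDTH chars (plus the terminator) describing x into out. */
void format_width9(double x, char out[WIDTH + 1])
{
    char buf[512];                    /* "%+.6f" of DBL_MAX needs ~318 chars */

    if (!isfinite(x)) {
        snprintf(buf, sizeof buf, "%+f", x);
        snprintf(out, WIDTH + 1, "%-*s", WIDTH, buf);   /* pad with spaces */
        return;
    }

    int len = snprintf(buf, sizeof buf, "%+.6f", x);
    assert(len >= WIDTH && (size_t)len < sizeof buf);
    buf[WIDTH] = '\0';                /* decimal form, cut to WIDTH chars */

    /* Too large: no '.' within chars 1..8, so no fractional digit is visible. */
    int too_large = memchr(buf, '.', WIDTH - 1) == NULL;
    /* Too small: the value is nonzero but every visible digit is 0. */
    int too_small = x != 0.0 && strspn(buf + 1, "0.") == (size_t)(WIDTH - 1);

    if (too_large || too_small) {
        len = snprintf(buf, sizeof buf, "%+.2e", x);
        if (len > WIDTH)              /* three-digit exponent */
            snprintf(buf, sizeof buf, "%+.1e", x);
    }
    memcpy(out, buf, WIDTH);
    out[WIDTH] = '\0';
}
```

For instance, 1.5 stays in decimal form as +1.500000, while 1e-9 and 12345678.0 switch to scientific form.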
I have a (big) array of floats; each float takes 4 bytes.
Is there a way, given the fact that my floats are ranged between 0 and 255, to store each float in less than 4 bytes?
I can do any amount of computation on the whole array.
I'm using C.
How much precision do you need?
You can store each float in 2 bytes by representing it as an unsigned short (ranges from 0 to 65,535) and dividing all values by 2^8 when you need the actual value. This is essentially the same as using a fixed point format instead of floating point.
Your precision is limited to 1.0 / (2^8) = 0.00390625 when you do this, however.
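As a rough sketch of that 2-byte scheme (the function names are made up for illustration, and the input is assumed to lie in [0, 255] as stated in the question):

```c
#include <assert.h>
#include <stdint.h>

/* Stores v as an unsigned 8.8 fixed-point number: 8 integer bits and
   8 fractional bits, rounded to the nearest step of 1/256. */
uint16_t encode_fixed_8_8(float v)
{
    assert(v >= 0.0f && v <= 255.0f);       /* range assumed by the question */
    return (uint16_t)(v * 256.0f + 0.5f);   /* 255 * 256 = 65280 still fits */
}

/* Recovers the approximate float value. */
float decode_fixed_8_8(uint16_t q)
{
    return (float)q / 256.0f;
}
```

Rounding to nearest (instead of truncating) keeps the error within half a step, i.e. 1/512. Values that are multiples of 1/256, such as 1.5, survive the round trip exactly.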
The absolute range of your data doesn't really matter that much, it's the amount of precision you need. If you can get away with e.g. 6 digits of precision, then you only need as much storage as would be required to store the integers from 1-1000000, and that's 20 bits. So, supposing this, what you can do is:
1) Shift your data so that the smallest element has value 0. I.e. subtract a single value from every element. Record this shift.
2) Scale (multiply) your data by a number just large enough so that after truncation to an integer, you will not lose any precision you need.
3) Now this might be tricky unless you can pack your data into convenient 8- or 16-bit units--pack the data into successive unsigned integers. Each one of your data values needs 20 bits in this example, so value 1 takes up the first 20 bits of integer 1, value 2 takes up the remaining 12 bits of integer 1 and the first 8 bits of integer 2, and so on. In this hypothetical case you end up saving ~40%.
4) Now, 'decoding'. Unpack the values (you have saved the # of bits in each one), un-scale, and un-shift.
So, this will do it, and might be faster and more compact than standard compression algorithms, as they aren't allowed to make assumptions about how much precision you need, but you are.
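The packing in step 3 is the fiddly part, so here is a sketch for the 20-bit case. pack20 and unpack20 are hypothetical names, the word array is assumed to be zero-initialised before packing, and the shift and scale from steps 1 and 2 are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VALUE_BITS 20u
#define VALUE_MASK ((1u << VALUE_BITS) - 1u)

/* Stores a 20-bit value at slot `index`; words must start out as all zeros
   and hold at least (index + 1) * 20 bits. */
void pack20(uint32_t *words, size_t index, uint32_t value)
{
    assert(value <= VALUE_MASK);
    size_t bit = index * VALUE_BITS;
    size_t word = bit / 32;
    unsigned off = (unsigned)(bit % 32);

    uint64_t v = (uint64_t)value << off;      /* may spill into the next word */
    words[word] |= (uint32_t)v;
    if (off + VALUE_BITS > 32)
        words[word + 1] |= (uint32_t)(v >> 32);
}

/* Reads back the 20-bit value stored at slot `index`. */
uint32_t unpack20(const uint32_t *words, size_t index)
{
    size_t bit = index * VALUE_BITS;
    size_t word = bit / 32;
    unsigned off = (unsigned)(bit % 32);

    uint64_t w = words[word];
    if (off + VALUE_BITS > 32)
        w |= (uint64_t)words[word + 1] << 32;
    return (uint32_t)(w >> off) & VALUE_MASK;
}
```

As in the description, slot 0 occupies the low 20 bits of the first word, and slot 1 takes the remaining 12 bits plus the first 8 bits of the second word.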
For example, you could store the integers (floats ending in .0) in one byte each, but the other floats need more bytes.
You could also use fixed-point if you don't worry about precision...
I would like to convert the exponential number 5.52794e+15.
It won't fit in an int, but a long long will do:
long long x = 5.52794e+15;
Alternative: you want to print it as an integer.
double d = 5.52794e+15;
printf("%15.0lf\n",d );
Gives:
5527940000000000
Besides using long longs, you could also:
Use a packed decimal library.
You could use logarithmic (or geometric, etc) scaling in a standard int or long.
Use a structure of ints, representing the significand and exponent, and do floating point math manually (or with a good library).
I've used all three methods, they all have their ups and downs. Packed Decimal is slowest and most accurate. Logarithmic scaling is by far the fastest and easiest to implement, and least accurate. Reproducing floating point via integers is in-between in performance, essentially the same in accuracy as "real" floating point, and hardest to implement.
All 3 are slower than using floating point hardware, assuming your hardware has floating point!
long long x = 5.52794e+15;
where the exponent "15" has to be an integer, and the coefficient 5.52794 is any real number or could even be integer.
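The significand has limited precision: a float represents integers exactly only up to 2^24 = 16,777,216, and a double only up to 2^53 (about 9.007 × 10^15). Digits beyond that precision are lost to rounding.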
This becomes significant where you have a running total and are adding in small values and keeping an accurate total. - for example power usage where the value that you are adding in is not a whole number.