How can I scan a number having up to 10^18 digits in C [closed] - c

No data type can store such a large number. Using an array like
int a[pow(10,pow(10,18))] won't do the job either, because pow() returns double and double can't store 10^(10^18).
Does anyone have any idea?
I'm trying to solve the following problem:
Consider an integer with N digits (in decimal notation, without leading zeroes) D1,D2,D3,…,DN. Here, D1 is the most significant digit and DN the least significant. The weight of this integer is defined as:
∑_{i=2}^{N} (D_i − D_{i−1}).
You are given integers N and W. Find the number of positive integers with N digits (without leading zeroes) and weight equal to W. Compute this number modulo 10^9+7.
Input:
The first line of the input contains a single integer T denoting the number of test cases. The description of T test cases follows.
The first and only line of each test case contains two space-separated integers N and W denoting the number of digits and the required weight.
Output:
For each test case, print a single line containing one integer: the number of N-digit positive integers with weight W, modulo 10^9+7.
Constraints:
1 ≤ T ≤ 10^5
2 ≤ N ≤ 10^18
|W| ≤ 300

You don't need to store a number with 10^18 digits. Look at the definition of the weight:
∑_{i=2}^{N} (D_i − D_{i−1})
Each element in the sum is the difference of two consecutive digits.
Let's take for example a 4 digit number whose digits are D1, D2, D3, D4. Then the sum is:
(D2 - D1) + (D3 - D2) + (D4 - D3)
Reordering the operands:
D4 - D3 + D3 - D2 + D2 - D1
You'll see that all but the first and last digits cancel out! So the whole sum is D4 - D1. In fact, for any number of digits N, the sum is:
DN - D1
So only the first and the last digits are relevant. You should be able to figure out the rest from there.
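One way "the rest" might look, as a minimal sketch rather than a reference solution: since the weight is DN - D1, count the (first digit, last digit) pairs whose difference is W, then multiply by 10^(N-2) for the free middle digits, all modulo 10^9+7. The names MOD, pow_mod and count_numbers are made up for this illustration.

```c
#include <stdint.h>

#define MOD 1000000007ULL

/* Computes base^exp modulo MOD by repeated squaring.
   Operands stay below MOD, so each product fits in 64 bits. */
static uint64_t pow_mod(uint64_t base, uint64_t exp)
{
    uint64_t result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

/* Number of n-digit integers (n >= 2, no leading zero) with weight w,
   modulo MOD. Only the first and last digits matter. */
uint64_t count_numbers(uint64_t n, int w)
{
    int pairs = 0;
    for (int first = 1; first <= 9; first++) {   /* no leading zero */
        int last = first + w;                    /* weight = last - first */
        if (last >= 0 && last <= 9)
            pairs++;
    }
    /* The n - 2 middle digits can be anything. */
    return (uint64_t)pairs * pow_mod(10, n - 2) % MOD;
}
```

Note that N is only ever used as an exponent here, so it fits comfortably in a uint64_t and no number with 10^18 digits is ever stored.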

There are libraries for handling these sorts of problems, like The GNU Multiple Precision Arithmetic Library:
What is GMP?
GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
The main target applications for GMP are cryptography applications and research, Internet security applications, algebra systems, computational algebra research, etc.
But a number with 10^18 decimal digits would take a huge (and effectively impossible) amount of memory: about 3.32 × 10^18 bits, which is roughly 0.36 EiB.

The referenced problem takes input in the range 2 ≤ N ≤ 10^18. The value you have to read is N itself, and N is not a number with 10^18 digits (let alone 10^(10^18), which is absurdly enormous): it is at most 1,000,000,000,000,000,000, a 19-digit value. This will fit inside a 64 bit integer, signed or unsigned.
Use int64_t from stdint.h.
10^18 is pushing the limits of 64 bit integers, probably why they chose it. Anything larger should use an arbitrary precision math library such as GMP.
...but there are limits. Simply storing a number with 10^18 digits would take hundreds of millions of gigabytes. So while the problem is about solving for numbers with 10^18 digits, I strongly suspect you're not supposed to solve it by actually storing those numbers. There's some mathematical technique you're supposed to apply.


Multiplication of 2 numbers with a maximum of 2000 digits [duplicate]

Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100. HINT: multiplication can be done using repeated additions.
Up to a certain point, the program works using long double, but when working with larger numbers, only INF is displayed. Any ideas?
Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100.
OK. The nature of multiplication is that if a number with N digits is multiplied by a number with M digits, then the result will have up to N+M digits. In other words, a 2048-digit number times a number below 100 (at most 2 digits) can have up to 2050 decimal digits, which is roughly 6810 bits.
A long double could be anything (it's implementation dependent). Most likely (on Windows, or on anything that isn't 80x86) it's a synonym for double, but sometimes it might be larger (e.g. the 80-bit extended precision format). The best you can realistically hope for is a dodgy estimate with lots of precision loss and not a correct result.
The worst case (the most likely case) is that the exponent isn't big enough either. E.g. for double the (unbiased) exponent has to be in the range −1022 to +1023, so attempting to shove a number that needs thousands of bits in there will cause an overflow (an infinity).
What you're actually being asked to do is implement a program that uses "big integers". The idea would be to store the numbers as arrays of integers, like uint32_t result[213]; (213 × 32 = 6816 bits), so that you actually do have enough bits to get a correct result without precision loss or overflow problems.
With this in mind, you want a multiplication algorithm that can work with big integers. Note: I'd recommend something from that Wikipedia page's "Algorithms for multiplying by hand" section - there's faster/more advanced algorithms that are way too complicated for (what I assume is) a university assignment.
Also, the "HINT: multiplication can be done using repeated additions" is a red herring to distract you. It'd take literally days for a computer to do the equivalent of a while(source2 != 0) { result += source1; source2--; } when source2 is itself a large number.
Here's a few hints.
Multiplying a 2048 digit string by a 100 digit string might yield a string with as many as 2148 digits. That's too large for any primitive C type. So you'll have to do all the math the hard way against "strings". So stay in the string space, since your input will most likely be read in as such.
Let's say you are trying to multiply "123456" x "789".
That's equivalent to 123456 * (700 + 80 + 9)
Which is equivalent to 123456 * 700 + 123456 * 80 + 123456 * 9
Which is equivalent to doing these steps:
result1 = Multiply 123456 by 7 and add two zeros at the end
result2 = Multiply 123456 by 8 and add one zero at the end
result3 = Multiply 123456 by 9
final result = result1+result2+result3
So all you need is a handful of primitives that can take a digit string of arbitrary length and do some math operations on it.
You just need these three functions:
// Returns a new string that is identical to s but with a specific number of
// zeros added to the end.
// e.g. MultiplyByPowerOfTen("123", 3) returns "123000"
char* MultiplyByPowerOfTen(const char* s, size_t zerosToAdd);

// Performs multiplication on the big integer represented by s
// by the specified digit
// e.g. Multiply("12345", 2) returns "24690"
char* Multiply(const char* s, int digit); // where digit is between 0 and 9

// Performs addition on the big integers represented by s1 and s2
// e.g. Add("12345", "678") returns "13023"
char* Add(const char* s1, const char* s2);
Final hint. Any character at position i in your string can be converted to its integer equivalent like this:
int digit = s[i] - '0';
And any digit can be converted back to a printable char:
char c = '0' + digit;
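For illustration only, here is one possible way to fill in those three primitives and combine them, as a sketch that assumes the inputs are non-empty strings of ASCII digits. BigMultiply is a made-up name for the combining step, every function returns a malloc'd string the caller must free, and allocation failures inside BigMultiply are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Copy of s with zerosToAdd '0' characters appended. */
char *MultiplyByPowerOfTen(const char *s, size_t zerosToAdd)
{
    size_t len = strlen(s);
    char *r = malloc(len + zerosToAdd + 1);
    if (!r) return NULL;
    memcpy(r, s, len);
    memset(r + len, '0', zerosToAdd);
    r[len + zerosToAdd] = '\0';
    return r;
}

/* Drops leading zeros in place, keeping at least one digit. */
static void StripLeadingZeros(char *buf)
{
    size_t len = strlen(buf), skip = strspn(buf, "0");
    if (skip == len && len > 0) skip = len - 1;
    memmove(buf, buf + skip, len + 1 - skip);
}

/* s times a single digit (0..9), schoolbook style from the right. */
char *Multiply(const char *s, int digit)
{
    size_t len = strlen(s);
    char *buf = malloc(len + 2);           /* room for one carry digit */
    if (!buf) return NULL;
    int carry = 0;
    buf[len + 1] = '\0';
    for (size_t i = len; i-- > 0; ) {
        int v = (s[i] - '0') * digit + carry;
        buf[i + 1] = (char)('0' + v % 10);
        carry = v / 10;
    }
    buf[0] = (char)('0' + carry);
    StripLeadingZeros(buf);
    return buf;
}

/* s1 + s2, digit by digit from the right. */
char *Add(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2);
    size_t n = (n1 > n2 ? n1 : n2) + 1;    /* one extra digit for carry */
    char *buf = malloc(n + 1);
    if (!buf) return NULL;
    int carry = 0;
    buf[n] = '\0';
    for (size_t k = 0; k < n; k++) {       /* k counts from the right */
        int v = carry;
        if (k < n1) v += s1[n1 - 1 - k] - '0';
        if (k < n2) v += s2[n2 - 1 - k] - '0';
        buf[n - 1 - k] = (char)('0' + v % 10);
        carry = v / 10;
    }
    StripLeadingZeros(buf);
    return buf;
}

/* Sum of (a * each digit of b), shifted by that digit's place value. */
char *BigMultiply(const char *a, const char *b)
{
    size_t nb = strlen(b);
    char *total = MultiplyByPowerOfTen("0", 0);
    for (size_t i = 0; i < nb; i++) {
        char *partial = Multiply(a, b[i] - '0');
        char *shifted = MultiplyByPowerOfTen(partial, nb - 1 - i);
        char *sum = Add(total, shifted);
        free(partial); free(shifted); free(total);
        total = sum;
    }
    return total;
}
```

With this sketch, BigMultiply("123456", "789") walks through exactly the result1 + result2 + result3 steps listed above.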

C thinking : float vs. integers and float representation

When using integers in C (and in many other languages), one must pay attention to precision when dividing. It is always better to multiply and add things (thus creating a larger intermediary result, so long as it doesn't overflow) before dividing.
But what about floats? Does that still hold? Or are they represented in such a way that it is better to divide number of similar orders of magnitude rather than large ones by small ones?
The representation of floats/doubles and similar floating-point working, is geared towards retaining numbers of significant digits (aka "precision"), rather than a fixed number of decimal places, such as happens in fixed-point, or integer working.
It is best to avoid combining quantities, that may give rise to implicit under or overflow in terms of the exponent, ie at the limits of the floating-point number range.
Hence, addition/subtraction of quantities of widely differing magnitudes (either explicitly, or due to having opposite signs) should be avoided and re-arranged, where possible, to avoid this well-known route to lost precision.
Example: it's better to refactor/re-order
small + big + small + big + small
as
(small+small+small) + big + big
since the smalls individually might make no difference to a big, and hence their contribution might disappear.
If there is any "noise" or imprecision in the lower bits of any quantity, it's also wise to be aware how loss of significant bits propagates through a computation.
With integers:
As long as there is no overflow, +,-,* is always exact.
With division, the result is truncated and often not equal to the mathematical answer.
For integers ia, ib, ic, multiplying before dividing (ia*ib/ic rather than ia*(ib/ic)) is better, as the quotient is based on more bits of the product ia*ib than of ib.
With floating point:
Issues are subtle. Again, as long as there is no over/underflow, the order of a *,/ sequence makes less impact than with integers. FP *,/ is akin to adding/subtracting logs. Typical results are within 0.5 ULP of the mathematically correct answer.
With FP and +,- the result for fa,fb,fc can differ significantly from the mathematically correct one when 1) values are far apart in magnitude or 2) subtracting values that are nearly equal, so that the error in a prior calculation now becomes significant.
Consider the quadratic equation:
double d = sqrt(b*b - 4*a*c); // assume b*b - 4*a*c >= 0
double root1 = (-b + d)/(2*a);
double root2 = (-b - d)/(2*a);
Versus
double d = sqrt(b*b - 4*a*c); // assume b*b - 4*a*c >= 0
double root1 = (b < 0) ? (-b + d)/(2*a) : (-b - d)/(2*a);
double root2 = c/(a*root1); // assume a*root1 != 0
The 2nd has much better root2 precision result when one root is near 0 and |b| is nearly d. This is because the b,d subtraction cancels many bits of significance allowing the error in the calculation of d to become significant.
(for integer) It is always better to multiply and add things (thus creating a larger intermediary result, so long as it doesn't overflow) before dividing.
Does that still hold (for floats)?
In general the answer is No
It is easy to construct an example where adding all input before division will give you a huge rounding error.
Assume you want to add 10000000000 values and divide them by 1000. Further assume that each value is 1. So the expected result is 10000000.
Method 1
However, if you add all the values before division, you'll get the result 16777.216 (for a 32 bit float). As you can see it is pretty much off.
Method 2
So is it better to divide each value by 1000 before adding it to the result? If you do that, you'll get the result 32768.0 (for a 32 bit float). As you can see it is pretty much off as well.
Method 3
However, if you keep adding values until the temporary result is greater than 1000000, then divide the temporary result by 1000, add that intermediate result to the final result, and repeat until you have added a total of 10000000000 values, you will get the correct result.
So there is no simple "always add before division" or "always divide before adding" when dealing with floating point. As a general rule it is typically a good idea to keep operands in similar magnitude. That is what the third example does.
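The effect behind Method 1 and Method 3 can be reproduced on a smaller scale. The following is a rough sketch, assuming an IEEE 754 32-bit float; the function names are made up for this illustration, and volatile is only there to force every intermediate sum to be rounded to float. A float cannot represent 16777217, so a running total of ones stops growing at 16777216 (2^24), while summing in chunks keeps every operand small enough to stay exact.

```c
/* Method 1 style: add `count` ones, then divide by 1000.
   Once the total reaches 2^24, adding 1.0f no longer changes it. */
float sum_then_divide(unsigned long count)
{
    volatile float total = 0.0f;
    for (unsigned long i = 0; i < count; i++)
        total = total + 1.0f;
    return total / 1000.0f;
}

/* Method 3 style: accumulate `chunk` ones at a time, divide each
   partial sum by 1000, and add the quotient to the final result. */
float sum_in_chunks(unsigned long count, unsigned long chunk)
{
    volatile float result = 0.0f;
    volatile float partial = 0.0f;
    unsigned long in_chunk = 0;
    for (unsigned long i = 0; i < count; i++) {
        partial = partial + 1.0f;
        if (++in_chunk == chunk) {
            result = result + partial / 1000.0f;
            partial = 0.0f;
            in_chunk = 0;
        }
    }
    return result + partial / 1000.0f;   /* leftover partial chunk */
}
```

For 20,000,000 values the exact answer is 20000; sum_then_divide falls short because the running total is stuck at 16777216, while sum_in_chunks with a chunk of 1,000,000 keeps the operands at similar magnitudes.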

GMP most significant digits

I'm performing some calculations on arbitrary precision integers using GNU Multiple Precision (GMP) library. Then I need the decimal digits of the result. But not all of them: just, let's say, a hundred of most significant digits (that is, the digits the number starts with) or a selected range of digits from the middle of the number (e.g. digits 100..200 from a 1000-digit number).
Is there any way to do it in GMP?
I couldn't find any functions in the documentation to extract a range of decimal digits as a string. The conversion functions which convert mpz_t to character strings always convert the entire number. One can only specify the radix, but not the starting/ending digit.
Is there any better way to do it other than converting the entire number into a humongous string only to take a small piece of it and throw out the rest?
Edit: What I need is not to control the precision of my numbers or limit it to a particular fixed amount of digits, but selecting a subset of digits from the digit string of the number of arbitrary precision.
Here's an example of what I need:
7^1316831 = 19821203202357042996...2076482743
The actual number has 1112852 digits, which I contracted into the ....
Now, I need only an arbitrarily chosen substring of this humongous string of digits. For example, the ten most significant digits (1982120320 in this case). Or the digits from 1112841th to 1112849th (21203202 in this case). Or just a single digit at the 1112841th position (2 in this case).
If I were to first convert my GMP number to a string of decimal digits with mpz_get_str, I would have to allocate a tremendous amount of memory for these digits only to use a tiny fraction of them and throw out the rest. (Not to mention that the original mpz_t number in binary representation already eats up quite a lot.)
If you know the number of decimal digits of x = 7^1316831 in advance, e.g., 1112852, then you get the lower, say, 10 digits with:
x % (10^10), and the upper 20 digits with:
x / (10^(1112852 - 20)).
Note, I get 19821203202357042995 for the latter; 5 at final, not 6.
I don't think you can do that in GMP. However you can use Boost Multiprecision Library
Depending upon the number type, precision may be arbitrarily large (limited only by available memory), fixed at compile time (for example 50 or 100 decimal digits), or a variable controlled at run-time by member functions. The types are expression-template-enabled for better performance than naive user-defined types.
Emphasis mine
Another alternative is ttmath with the type ttmath::Big<e,m> that you can control the needed precision. Any fixed-precision types will work, provided that you only need the most significant digits, as they all drop the low significant digits like how float and double work. Those digits don't affect the high digits of the result, hence can be omitted safely. For instance if you need the high 20 digits then use a type that can store 20 digits and a little more, in order to provide enough data for correct rounding later
For demonstration let's take a simple example of 7^7 = 823543 where you only need the top 2 digits. Using a 4-digit type for calculation you'll get this
7^5 = 16807 => round to 1681×10^1 and store
7^5×7 = 1681×10^1×7 = 11767×10^1 ≈ 1177×10^2
7^5×7×7 = 1177×10^2×7 = 8239×10^2
As you can see the top digits are the same even without needing to get the full exact result. Calculating the full precision using GMP not only wastes a lot of time but also memory. Think about the amount of memory you need to store the result of another operation on 2 bigints to get the digits you want. By fixing the precision instead of leaving it at infinite you'll decrease the CPU and memory usage significantly.
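The same rounding scheme can be sketched in a few lines of C. This is only an illustration of the idea, not how any particular library works: TruncDec and pow_leading_digits are made-up names, and the sketch assumes base × 10^digits fits in a uint64_t.

```c
#include <stdint.h>

/* Approximate value: mantissa × 10^exponent. */
typedef struct {
    uint64_t mantissa;
    int exponent;
} TruncDec;

/* Computes base^power while keeping only `digits` significant
   decimal digits, rounding to nearest after every multiplication. */
TruncDec pow_leading_digits(unsigned base, unsigned power, unsigned digits)
{
    uint64_t limit = 1;
    for (unsigned i = 0; i < digits; i++)
        limit *= 10;                      /* limit = 10^digits */

    TruncDec r = {1, 0};
    for (unsigned i = 0; i < power; i++) {
        r.mantissa *= base;
        while (r.mantissa >= limit) {     /* too many digits: drop one */
            r.mantissa = (r.mantissa + 5) / 10;
            r.exponent++;
        }
    }
    return r;
}
```

Calling pow_leading_digits(7, 7, 4) goes through the same intermediate values as the worked example. Each rounding step introduces a small error, so for long computations the working precision needs some guard digits beyond the ones you intend to read off.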
If you need the 100th to 200th high order digits then use a type that has enough room for 201 digits and more, and extract those 101 digits after calculation. But this will be more wasteful, so you may need to change to an arbitrary-precision (or fixed-precision) type that uses a base that's a power of 10 for its limbs (I'm using GMP notation here). For example, if the type uses base 10^9 then each limb represents 9 digits in the decimal output and you can get an arbitrary decimal digit directly without any conversion from binary to decimal. That means zero waste for the string. I'm not sure which library uses base 10^n but you can look at Mini-Pi's implementation, which uses base 10^9, or write it yourself. This way it also works for efficiently getting the high digits.
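To make the base 10^9 idea concrete, here is a small sketch of digit lookup. It assumes a hypothetical layout where limbs[0] is the least significant limb and each limb holds a value from 0 to 999999999; decimal_digit_at is a made-up name, not a function from GMP or Mini-Pi.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the decimal digit at position pos (0 = units digit) of a
   number stored as base-10^9 limbs, least significant limb first.
   Positions beyond the stored limbs read as 0. */
int decimal_digit_at(const uint32_t *limbs, size_t nlimbs, size_t pos)
{
    static const uint32_t pow10[9] = {
        1, 10, 100, 1000, 10000, 100000,
        1000000, 10000000, 100000000
    };
    size_t limb = pos / 9;                /* which limb holds the digit */
    if (limb >= nlimbs)
        return 0;
    return (int)(limbs[limb] / pow10[pos % 9] % 10);
}
```

Because every limb maps to exactly nine decimal digits, picking out digit k touches a single limb, so a range of digits can be read without converting the whole number to a string.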
See
How are extremely large floating-point numbers represented in memory?
What is the simplest way of implementing bigint in C?

Printing multiple integers as one arbitrarily long decimal string

Say I have 16 64-bit unsigned integers. I have been careful to feed the carry as appropriate between them when performing operations. Could I feed them into a method to convert all of them into a single string of decimal digits, as though it was one 1024-bit binary number? In other words, is it possible to make a method that will work for an arbitrary number of integers that represent one larger integer?
I imagine that it would be more difficult for signed integers, as there is the most significant bit to deal with. I suppose it would be that the most significant integer would be the signed integer, and the rest would be unsigned, to represent the remaining 'parts' of the number.
(This is semi-related to another question.)
You could use the double dabble algorithm, which circumvents the need for multi-precision multiplication and division. In fact, the Wikipedia page contains a C implementation for this algorithm.
This is a bit unclear.
Of course, a function such as
void print_1024bit(uint64_t digits[]);
could be written to do this. But if you mean if any of the standard library's printf()-family of functions can do this, then I think the answer is no.
As you probably saw in the other question, the core of converting a binary number into a different base b is made of two operations:
Modulo b, to figure out the current least significant digit
Division by b, to remove that digit once it's been generated
When applied until the number is 0, this generates all the digits in reverse order.
So, you need to implement "modulo 10" and "divide by 10" for your 1024-bit number.
For instance, consider the number decimal 4711, which we want to convert to octal just for this example:
4711 % 8 is 7, so the right-most digit is 7
4711 / 8 is 588
588 % 8 is 4, the next digit is 4
588 / 8 is 73
73 % 8 is 1
73 / 8 is 9
9 % 8 is 1
9 / 8 is 1
1 % 8 is 1
1 / 8 is 0, we're done.
So, reading the remainders from the bottom up, we conclude that 4711 in decimal is 11147 in octal. You can use a calculator to verify this, or just trust me. :)
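Applied to the multi-word case, the same modulo/divide loop might look like the sketch below. It assumes the words are stored least significant first (limbs[0] is the lowest 64 bits) and that the value is unsigned; divmod10 and to_decimal are made-up names. Each 64-bit word is divided as two 32-bit halves so that the running remainder never overflows a uint64_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Divides the n-word number by 10 in place and returns the remainder.
   Works from the most significant word down, like long division. */
static unsigned divmod10(uint64_t *limbs, size_t n)
{
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {
        uint64_t hi = (rem << 32) | (limbs[i] >> 32);
        uint64_t qhi = hi / 10;
        rem = hi % 10;
        uint64_t lo = (rem << 32) | (limbs[i] & 0xFFFFFFFFu);
        uint64_t qlo = lo / 10;
        rem = lo % 10;
        limbs[i] = (qhi << 32) | qlo;
    }
    return (unsigned)rem;
}

static int is_zero(const uint64_t *limbs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (limbs[i] != 0)
            return 0;
    return 1;
}

/* Writes the decimal representation into out (outsz bytes, including
   the terminating NUL). The input words are destroyed.
   Returns 0 on success, -1 if out is too small. */
int to_decimal(uint64_t *limbs, size_t n, char *out, size_t outsz)
{
    size_t len = 0;
    do {
        if (len + 1 >= outsz)
            return -1;
        out[len++] = (char)('0' + divmod10(limbs, n));
    } while (!is_zero(limbs, n));
    out[len] = '\0';
    /* Digits were produced least significant first: reverse them. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        char t = out[i];
        out[i] = out[j];
        out[j] = t;
    }
    return 0;
}
```

A 1024-bit value has at most 309 decimal digits, so a 310-byte buffer is enough for 16 words. This is quadratic in the number of words, which is fine at this size.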
It's possible, of course, but not terribly straight-forward.
Rather than reinventing the wheel, how about reusing a library?
The GNU Multi Precision Arithmetic Library is one such possibility. I've not needed such things myself, but it seems to fit your bill.

Modulo operation in output

In most coding competitions where the output of a program is presumed to be very large, it is generally instructed to take the output modulo 10000007 (or, in any case, a prime number). What is the significance of the number being prime? In many cases I find that the number given is something like 100004 (i.e. not a prime number).
A prime number is used for two reasons. One reason is that the integers modulo a prime form a mathematical field. Arithmetic in a field works in many ways like arithmetic on integers. This makes a field useful in certain contest problems where otherwise the sequences used might collapse in some cases. Certain arithmetic might produce zero, other trivial results, or simpler results than desired, because the numbers involved happen upon a factor of the modulus, causing some reduction or elimination to occur.
Another reason is to compel the programmer to deal with arithmetic on integers of a certain size. If a composite number were used, then other techniques could be used that did not resort to arithmetic with large integers.
For example, suppose we want to know what 13^2 is modulo 35, but we have only a very small processor that cannot handle three-digit numbers, so it cannot compute 13^2 = 169.
Well, 35 = 5·7, and 13 is congruent to 3 modulo 5 and 6 modulo 7. Instead of computing the square of 13, we can compute the squares of these residues, which tells us that 13^2 is congruent to 3^2 = 9 ≡ 4 modulo 5 and is congruent to 6^2 = 36 ≡ 1 modulo 7. Combining these residues requires some additional knowledge (the Extended Euclidean Algorithm). For these particular numbers, we can multiply the residue modulo 5 by 21 and the residue modulo 7 by 15 to get 4·21+1·15 = 99. Reducing that modulo 35 yields 29, which is the answer (the residue of 13^2 modulo 35).
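The worked example can be written out as a short sketch in C. The function name square_mod35_crt is made up for this illustration; the constants 21 and 15 are the ones from the text (21 is 1 modulo 5 and 0 modulo 7, while 15 is 0 modulo 5 and 1 modulo 7).

```c
/* Computes (x * x) mod 35 without ever squaring x itself:
   square the residues modulo 5 and modulo 7, then recombine. */
unsigned square_mod35_crt(unsigned x)
{
    unsigned r5 = x % 5;
    unsigned r7 = x % 7;
    unsigned s5 = (r5 * r5) % 5;   /* square modulo 5, at most 4*4 */
    unsigned s7 = (r7 * r7) % 7;   /* square modulo 7, at most 6*6 */
    /* 21 selects the mod-5 residue, 15 selects the mod-7 residue. */
    return (s5 * 21 + s7 * 15) % 35;
}
```

The largest intermediate value here is 4·21 + 6·15 = 174, which the "small processor" in the example could still not hold; the point is only that no product on the order of the square of the modulus is needed.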
If the modulus is prime, this circumvention of the arithmetic is not available. A prime modulus essentially requires the use of arithmetic on numbers up to the square of the modulus (or time-consuming workarounds), but a composite modulus would allow the use of arithmetic on smaller numbers, up to just twice the modulus.
