Algorithm for doing arithmetic operations on very large numbers - c

I need an algorithm to perform arithmetic operations on large numbers (way above the range of float, double, int or any other data type for that matter). I am required to write the code in C. I tried looking it up in Knuth, Donald, The Art of Computer Programming, ISBN 0-201-89684-2, Volume 2: Seminumerical Algorithms, Section 4.3.1: The Classical Algorithms, but couldn't get through it. I just need the algorithm, not the code.

For addition, as far as I know, you won't get much better than the simple linear O(n) algorithm, i.e., just add digit by digit. You likely have to read the entire input anyway, so it's at least linear. You might be able to do various tricks to get the constant down.
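A minimal sketch of that digit-by-digit scheme, storing numbers as little-endian arrays of base-10 digits (the representation and names are illustrative; real bignum libraries use base-2^32 "limbs" instead of decimal digits):

    #include <stddef.h>

    /* Add two big numbers stored as little-endian arrays of base-10 digits.
       out must have room for max(na, nb) + 1 digits; returns the digit count. */
    size_t big_add(const unsigned char *a, size_t na,
                   const unsigned char *b, size_t nb,
                   unsigned char *out)
    {
        size_t n = na > nb ? na : nb;
        unsigned carry = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned s = carry;
            if (i < na) s += a[i];
            if (i < nb) s += b[i];
            out[i] = (unsigned char)(s % 10);
            carry = s / 10;
        }
        if (carry) out[n++] = (unsigned char)carry;
        return n;
    }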
Your main issue is going to be multiplication, due to the quadratic nature of the basic long multiplication algorithm. You might want to consider one of the several much faster methods given here. The Karatsuba method is a little tricky to implement nicely but is probably the easiest non-trivial algorithm that will give you a gain. Otherwise, you'll have to look a bit more into the Fast Fourier Transform methods, such as Schönhage-Strassen's algorithm or Fürer's algorithm.
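For reference, the quadratic schoolbook multiplication on the same digit-array representation is only a few lines (a sketch; this is the baseline the faster methods improve on):

    #include <stddef.h>

    /* O(na*nb) schoolbook multiplication on little-endian base-10 digit arrays.
       out must hold na + nb digits and be zero-initialized by the caller. */
    void big_mul(const unsigned char *a, size_t na,
                 const unsigned char *b, size_t nb,
                 unsigned char *out)
    {
        for (size_t i = 0; i < na; i++) {
            unsigned carry = 0;
            for (size_t j = 0; j < nb; j++) {
                unsigned s = out[i + j] + a[i] * b[j] + carry;
                out[i + j] = (unsigned char)(s % 10);
                carry = s / 10;
            }
            out[i + nb] = (unsigned char)(out[i + nb] + carry);
        }
    }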
See also Big O notation.

I think the Karatsuba algorithm is the best way to perform arithmetic operations on large numbers. For sufficiently large n, another generalization, the Schönhage–Strassen algorithm, is even faster.
You can look for the algorithm in
Karatsuba
or
Karatsuba_Multiplication
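To see where Karatsuba's saving comes from, here is a single level of the recursion applied to machine words: three multiplications replace the schoolbook method's four (a toy sketch; a real bignum implementation recurses on digit arrays instead):

    #include <stdint.h>

    /* One Karatsuba level: multiply two 32-bit numbers into a 64-bit result
       using three 16x16-bit products instead of four.
       With x = x1*2^16 + x0 and y = y1*2^16 + y0:
         x*y = z2*2^32 + z1*2^16 + z0,  where z1 = (x1+x0)(y1+y0) - z2 - z0. */
    uint64_t karatsuba32(uint32_t x, uint32_t y)
    {
        uint32_t x1 = x >> 16, x0 = x & 0xFFFF;
        uint32_t y1 = y >> 16, y0 = y & 0xFFFF;

        uint64_t z0 = (uint64_t)x0 * y0;              /* low halves  */
        uint64_t z2 = (uint64_t)x1 * y1;              /* high halves */
        uint64_t z1 = (uint64_t)(x1 + x0) * (y1 + y0) - z2 - z0;

        return (z2 << 32) + (z1 << 16) + z0;
    }

Applied recursively to n-digit numbers, this recurrence gives the O(n^1.585) running time.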

There is no special algorithm for performing arithmetic operations on very large numbers; the arithmetic operations remain the same. What you need is a class like http://www.codeproject.com/KB/cs/BigInt.aspx

The book Prime Numbers and Computer Methods for Factorization by Riesel has an appendix with easy-to-read code for multiple-precision arithmetic.

For just the algorithms, read Knuth vol 2 or Crandall and Pomerance. For the coding, I would suggest getting the obvious algorithms working first before moving on to Karatsuba or Fourier transforms.

Numerical stability of Simplex Algorithm

Edit: Simplex, the mathematical optimization algorithm, not to be confused with simplex noise or triangulation.
I'm implementing my own linear programming solver and I would like to do so using 32-bit floats. I know Simplex is very sensitive to the precision of the numbers because it performs lots of calculations, and if too little precision is used, rounding errors may occur. But still, I would like to implement it using 32-bit floats so I can make the instructions 4-wide, that is, so I can use SIMD to perform 4 calculations at a time. I'm aware that I could use doubles and make the instructions 2-wide, but 4 is greater than 2 :)
I have had problems with my floating-point implementation where the solution was suboptimal or the problem was said to be infeasible. This happens especially with mixed-integer linear programs, which I solve with the branch-and-bound method.
So my question is: how can I prevent, as much as possible, rounding errors from resulting in infeasible, unbounded or suboptimal solutions?
I know one thing I can do is to scale the input values so that they are close to one (http://lpsolve.sourceforge.net/5.5/scaling.htm). Is there something else I can do?
Yes. I tried to implement an algorithm for the extended knapsack problem using the branch-and-bound method with a greedy algorithm as the heuristic, which is the exact analogue of the simplex method running with a pivoting strategy that chooses the largest objective increase.
I had problems with the numerical stability of the algorithm too.
Personally, I don't think there is an easy way to eliminate the issues while we keep using floating point, but there is a way to detect the instability during the branching process.
From experiment rather than rigorous stability analysis, I think the majority of errors propagate through integer solutions that are extremely close to the constraints of the system.
Given any integer solution, we figure out the slack for that solution, and if the elements of the slack vector are extremely small, on the order of 1e-14 to 1e-15, then we stop the branching and report instability.
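A sketch of that detection step, assuming constraints of the form A·x ≤ b (names and layout are illustrative; note that with 32-bit floats, as in the question, the tolerance would have to be far larger than the 1e-14 quoted above for doubles, more like 1e-5 or 1e-6):

    #include <math.h>
    #include <stddef.h>

    /* Compute the slack s = b - A*x for an m-by-n constraint system (A stored
       row-major) and report instability when any constraint is almost exactly
       tight, i.e. its slack is smaller in magnitude than tol. */
    int slack_is_unstable(const float *A, const float *b, const float *x,
                          size_t m, size_t n, float tol)
    {
        for (size_t i = 0; i < m; i++) {
            float s = b[i];
            for (size_t j = 0; j < n; j++)
                s -= A[i * n + j] * x[j];
            if (fabsf(s) < tol)
                return 1;   /* stop branching here and report instability */
        }
        return 0;
    }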

Determine if a given integer is an element of the Fibonacci sequence in C without using float

I recently had an interview where I failed and was finally told I did not have enough experience to work for them.
The position was embedded C software developer. The target platform was some kind of very simple 32-bit architecture whose processor does not support floating-point numbers or their operations. Therefore double and float cannot be used.
The task was to develop a C routine for this architecture that takes one integer and returns whether or not it is a Fibonacci number. However, only an additional 1K of temporary memory may be used during execution. That means: even if I simulate very large integers, I can't just build up the sequence and iterate through it.
As far as I know, a positive integer n is a Fibonacci number exactly when one of
5n² + 4
or
5n² − 4
is a perfect square. Therefore I answered: it is simple, since the routine only has to determine whether that is the case.
They responded then: on the current target architecture no floating-point-like operations are supported, therefore no square roots can be obtained with the stdlib's sqrt function. It was also mentioned that basic operations like division and modulus may not work either because of the architecture's limitations.
Then I said: okay, we could build an array of the square numbers up to 256, then iterate through it and compare the entries against the numbers given by the formulas above. They said this was a bad approach, even if it would work, and did not accept that answer.
Finally I gave up, since I had no other ideas. I asked what the solution would be; they said it wouldn't be told, but they advised me to look for it myself. My first approach (the two formulas) should be the key, but the square root has to be computed some other way.
I googled a lot at home but never found any "alternative" square-root algorithms; everywhere, floating-point numbers were used.
For operations like division and modulus, so-called integer division may be used. But what can be used for the square root?
Even though I failed the interview test, this is a very interesting topic for me: working on architectures where no floating-point operations are available.
Therefore my questions:
How can floating-point numbers be simulated if only integers are allowed?
What would be a possible solution in C for the mentioned problem? Code examples are welcome.
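One well-known answer to the square-root part is the bitwise ("digit-by-digit") integer square root, which needs neither floating point nor division; a minimal sketch:

    #include <stdint.h>

    /* floor(sqrt(n)) using only shifts, additions and comparisons. */
    uint32_t isqrt64(uint64_t n)
    {
        uint64_t root = 0, bit = 1ULL << 62;
        while (bit > n) bit >>= 2;
        while (bit) {
            if (n >= root + bit) {
                n -= root + bit;
                root = (root >> 1) + bit;
            } else {
                root >>= 1;
            }
            bit >>= 2;
        }
        return (uint32_t)root;
    }

With this, 5n² + 4 and 5n² − 4 can be tested for being perfect squares in pure integer arithmetic (square the returned root back in 64 bits and compare), though as the next answer explains, 5n² overflows 64 bits for the largest 32-bit inputs.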
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 52 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will be very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.)
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, which is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square root estimate is close enough, we could safely do the verification modulo 2³¹.
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common category of suboptimal programming is clinging desperately to the first idea you came up with even as its complications become increasingly evident. That will be one of the things the interviewer wants to know about you: how flexible are you when presented with new information or new requirements?
So what other ways are there to know whether n is a Fibonacci number? One interesting fact is that if n is Fib(k), then k is the floor of logφ(n×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation (finding the highest set bit), we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k). None of the above involves numbers bigger than the capacity of a 32-bit signed type.
Even more simply, we could just run through the Fibonacci series in a loop, checking to see if we hit the target number. Only 47 iterations are necessary. Alternatively, those 47 numbers could be precalculated and searched with binary search, using far less than the 1K bytes you are allowed.
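The loop version is a few lines (a sketch for 32-bit unsigned inputs; the 64-bit intermediates avoid wrap-around near the top of the range):

    #include <stdint.h>

    /* 1 if n is a Fibonacci number, 0 otherwise. Only additions and
       comparisons are used; F(47) is the largest Fibonacci number that
       fits in 32 bits, so the loop runs at most ~47 times. */
    int is_fib(uint32_t n)
    {
        uint64_t a = 0, b = 1;
        while (b < n) {
            uint64_t t = a + b;
            a = b;
            b = t;
        }
        return n == 0 || b == n;
    }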
It is unlikely that an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate's approach to problems of this nature and their general knowledge of algorithms. Notably, the notion of iterating through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using a find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.
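A sketch of that table variant: the 48 Fibonacci numbers F(0)..F(47) that fit in 32 bits occupy 192 bytes, comfortably inside the stated 1K budget, and binary search needs at most six probes (names are illustrative):

    #include <stdint.h>

    static uint32_t fib_table[48];   /* F(0) .. F(47); 192 bytes */

    static void init_fib_table(void)
    {
        fib_table[0] = 0;
        fib_table[1] = 1;
        for (int i = 2; i < 48; i++)
            fib_table[i] = fib_table[i - 1] + fib_table[i - 2];
    }

    int is_fib_lookup(uint32_t n)
    {
        int lo = 0, hi = 47;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (fib_table[mid] == n) return 1;
            if (fib_table[mid] < n) lo = mid + 1;
            else hi = mid - 1;
        }
        return 0;
    }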

Fast factorization of polynomial with integers coefficients

I want to quickly factor a polynomial over the ring of integers (the original polynomial has integer coefficients and all of the factors have integer coefficients).
For example I want to decompose 4*x^6 + 20*x^5 + 29*x^4 - 14*x^3 - 71*x^2 - 48*x as (2*x^4 + 7*x^3 + 4*x^2 - 13*x - 16)*(2*x + 3)*x.
Which algorithm should I pick to avoid overly complex code and an inefficient approach (in terms of the total number of arithmetic operations and memory consumption)?
I'm going to use the C programming language.
For example, maybe there are good algorithms for polynomial factorization over the ring of integers modulo a prime number?
Since Sage is free and open source, you should be able to find the algorithm that Sage uses and then call it or at worst re-implement it in C. However, if you really must write a procedure from scratch, this is what I would do: First find the gcd of all the coefficients and divide that out, which makes your polynomial "content free". Then take the derivative and find the polynomial gcd of the original polynomial and its derivative. Take that factor out of the original polynomial by polynomial division, which breaks your problem into two parts: factoring a content-free, square free polynomial (p/gcd(p,p')), and factoring another polynomial (gcd(p,p')) which may not be square free. For the latter, start over at the beginning, until you have reduced the problem to factoring one or more content-free, square-free polynomials.
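A sketch of just the first of those steps, dividing out the content (the dense coefficient representation and names are illustrative):

    #include <stdlib.h>

    static long gcd(long a, long b)        /* gcd of non-negative integers */
    {
        while (b) { long t = a % b; a = b; b = t; }
        return a;
    }

    /* Divide the content out of p[0..deg] (coefficient of x^i in p[i]),
       leaving a "content free" polynomial. Returns the content. */
    long remove_content(long *p, int deg)
    {
        long g = 0;
        for (int i = 0; i <= deg; i++)
            g = gcd(g, labs(p[i]));
        if (g > 1)
            for (int i = 0; i <= deg; i++)
                p[i] /= g;
        return g ? g : 1;
    }

The square-free split then needs a polynomial gcd (e.g. via pseudo-remainders over the integers), which is where the real work starts.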
The next step would be to implement a factoring algorithm mod p. Berlekamp's algorithm is probably easiest, although Cantor-Zassenhaus is state of the art.
Finally, apply the Zassenhaus algorithm to factor over the integers. If you find it is too slow, it can be improved using the Lenstra–Lenstra–Lovász lattice basis reduction algorithm: http://en.wikipedia.org/wiki/Factorization_of_polynomials#Factoring_univariate_polynomials_over_the_integers
As you can see, this is all rather complicated and depends on a great deal of theory from abstract algebra. You're much better off using the same library that Sage uses, or re-implementing the Sage implementation, or even just calling a running version of the Sage kernel from within your program.
According to this answer on mathoverflow, Sage uses FLINT to do factorisation.
FLINT (Fast Library for Number Theory) is a C library in support of computations in number theory. It's also a research project into algorithms in number theory.
So it is possible to study and even use the implementation of the factorization algorithms in that library, which is well-tested and stable.
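If you go that route, factoring the example polynomial takes only a few lines (a sketch against FLINT 2.x's fmpz_poly interface; function and field names have shifted between versions, so check the current documentation):

    #include <flint/fmpz_poly.h>
    #include <flint/fmpz_poly_factor.h>

    int main(void)
    {
        fmpz_poly_t f;
        fmpz_poly_factor_t fac;

        fmpz_poly_init(f);
        /* 4*x^6 + 20*x^5 + 29*x^4 - 14*x^3 - 71*x^2 - 48*x */
        fmpz_poly_set_coeff_si(f, 6, 4);
        fmpz_poly_set_coeff_si(f, 5, 20);
        fmpz_poly_set_coeff_si(f, 4, 29);
        fmpz_poly_set_coeff_si(f, 3, -14);
        fmpz_poly_set_coeff_si(f, 2, -71);
        fmpz_poly_set_coeff_si(f, 1, -48);

        fmpz_poly_factor_init(fac);
        fmpz_poly_factor(fac, f);            /* factor over Z */

        for (slong i = 0; i < fac->num; i++) {
            fmpz_poly_print_pretty(fac->p + i, "x");
            flint_printf("  (multiplicity %wd)\n", fac->exp[i]);
        }

        fmpz_poly_factor_clear(fac);
        fmpz_poly_clear(f);
        return 0;
    }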

which encoding to use for genetic algorithm?

I want to code a genetic algorithm in C to optimize a function of 10 variables (x1 to x10). However, I am not able to figure out which encoding I should use. I have mostly seen binary encoding used in examples, but the variables in my case can take real values. Also, is value encoding a good option for these types of problems?
For real-valued problems I would suggest trying CMA-ES or another ES variant. CMA-ES is certainly the current state of the art for real-valued problems. It is designed to find good solutions in multidimensional problems quickly. There are implementations available on Hansen's page. There's also a C# implementation in the works for HeuristicLab. Evolution strategies are algorithms that were specifically designed for real-valued optimization problems. They are very similar to genetic algorithms (both were invented around the same time, but in different places). The main distinction is that for ES the main driver is mutation, and it features a clever adaptation of the mutation strength. Without this adaptation the (local) optimum cannot be located in time. CMA-ES is easy to configure: all it needs is the initial standard deviation and, optionally, the population size (otherwise there's a formula that estimates it given the problem size).
Genetic algorithms can of course also be applied, but you have to use specific operators that mutate variables only to a very small degree. For example, there's the Breeder Genetic Algorithm from Mühlenbein. In general, however, genetic algorithms are more suited to problems that need the right combination of things, e.g. which items to include in a knapsack problem, or which functions and terminals to combine into a formula (genetic programming), and less to problems where you need to find the right value for something. There are of course variants of the genetic algorithm to solve these; look for real-coded genetic algorithms (RCGA or RGA).
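For illustration, a small-step mutation operator of the kind such real-coded GAs use might look as follows (a sketch: Box-Muller stands in for any Gaussian sampler, sigma controls how small the changes are, and all names are illustrative):

    #include <math.h>
    #include <stdlib.h>

    /* Standard normal variate via the Box-Muller transform. */
    static double gauss(void)
    {
        double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);  /* avoid log(0) */
        double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
    }

    /* Gaussian mutation for a real-coded GA: with probability pm, nudge each
       of the n genes by a small normal step and clamp it to [lo, hi]. */
    void mutate(double *genes, int n, double pm, double sigma,
                double lo, double hi)
    {
        for (int i = 0; i < n; i++) {
            if ((double)rand() / RAND_MAX < pm) {
                genes[i] += sigma * gauss();
                if (genes[i] < lo) genes[i] = lo;
                if (genes[i] > hi) genes[i] = hi;
            }
        }
    }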
Another algorithm suited to real-valued problems is Particle Swarm Optimization, but in my opinion it is harder to configure. I'd start with SPSO-2011, the 2011 standard PSO.
If your problem contains integer variables, the choice becomes more difficult. Evolution strategies do not perform as well when variables are discrete, because the adaptation schemes for integer variables are different. A genetic algorithm becomes an interesting first-choice algorithm again.
A genetic algorithm is best used when two answers that are pretty close to optimal will make something else pretty close to optimal when combined. The problem with a pure binary encoding is that if you don't check your crossover you end up getting two answers which may not have all that much to do with the original answers.
That said, this is only really an issue if your number of variables is very small and the amount of data in your variables is large. As far as picking an encoding, it's more of an art than a science and it depends on your problem. I would suggest going with an encoding that fits the amount of precision you want. With 10 variables you won't go that far wrong however you encode it; an 8-bit ASCII encoding would probably work fine.
Hope that helps.

What's differential evolution and how does it compare to a genetic algorithm?

From what I've read so far they seem very similar.
Differential evolution uses floating point numbers instead, and the solutions are called vectors? I'm not quite sure what that means.
If someone could provide an overview with a little bit about the advantages and disadvantages of both.
Well, both genetic algorithms and differential evolution are examples of evolutionary computation.
Genetic algorithms keep pretty closely to the metaphor of genetic reproduction. Even the language is mostly the same -- both talk of chromosomes, both talk of genes, the genes are drawn from discrete alphabets, both talk of crossover, and the crossover is fairly close to a low-level understanding of genetic reproduction, etc.
Differential evolution is in the same style, but the correspondences are not as exact. The first big change is that DE uses actual real numbers (in the strict mathematical sense -- they're implemented as floats, or doubles, or whatever, but in theory they're ranging over the field of reals). As a result, the ideas of mutation and crossover are substantially different. The mutation operator is modified so much that it's hard for me to even see why it's called mutation, as such, except that it serves the same purpose of breaking things out of local minima.
On the plus side, there are a handful of results showing DEs are often more effective and/or more efficient than genetic algorithms. And when working in numerical optimization, it's nice to be able to represent things as actual real numbers instead of having to work your way around to a chromosomal kind of representation, first. (Note: I've read about them, but I've not messed extensively with them so I can't really comment from first hand knowledge.)
On the negative side, I don't think there's been any proof of convergence for DEs, yet.
Differential evolution is actually a specific subset of the broader space of genetic algorithms, with the following restrictions:
The genotype is some form of real-valued vector
The mutation / crossover operations make use of the difference between two or more vectors in the population to create a new vector (typically by adding some random proportion of the difference to one of the existing vectors, plus a small amount of random noise)
DE performs well for certain situations because the vectors can be considered to form a "cloud" that explores the high-value areas of the solution space quite effectively. It's pretty closely related to particle swarm optimization in some senses.
It still has the usual GA problem of getting stuck in local minima however.
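For concreteness, the classic DE/rand/1/bin step that builds a trial vector looks like this (a sketch; F and CR are the usual DE control parameters, and r1, r2, r3 must be distinct population members different from the target):

    #include <stdlib.h>

    /* Build one DE/rand/1/bin trial vector of length n:
       mutation  v = r1 + F * (r2 - r3), then binomial crossover with the
       target x. jrand forces at least one component to come from v. */
    void de_trial(const double *x,
                  const double *r1, const double *r2, const double *r3,
                  double *trial, int n, double F, double CR)
    {
        int jrand = rand() % n;
        for (int j = 0; j < n; j++) {
            if (j == jrand || (double)rand() / RAND_MAX < CR)
                trial[j] = r1[j] + F * (r2[j] - r3[j]);
            else
                trial[j] = x[j];
        }
    }

The trial vector then replaces x in the next generation only if it scores at least as well, which is DE's greedy selection.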
