Numerical stability of Simplex Algorithm - c

Edit: Simplex the mathematical optimization algorithm, not to be confused with simplex noise or triangulation.
I'm implementing my own linear programming solver and I would like to do so using 32bit floats. I know Simplex is very sensitive to the precision of the numbers because it performs lots of calculations and if too little precision is used, rounding errors may occur. But still, I would like to implement it using 32bit floats so I can make the instructions 4-wide, that is, so I can use SIMD to perform 4 calculations at a time. I'm aware that I could use doubles and make instructions 2-wide, but 4 is greater than 2 :)
I have had problems with my floating point implementation where the solution was suboptimal or the problem was said to be unfeasible. This happens especially with mixed integer linear programs, which I solve with the branch and bound method.
So my question is: how can I prevent as much as possible having rounding errors resulting in unfeasible, unbounded or suboptimal solutions?
I know one thing I can do is to scale the input values so that they are close to one (http://lpsolve.sourceforge.net/5.5/scaling.htm). Is there something else I can do?

Yes, I tried to implement an algorithm for the Extended Knapsack problem using the Branch and bound method and Greedy Algorithm as a heuristic, is the exact analogue of the simplex running with a pivoting strategy that chooses the largest objective increase.
I had problems with the numerical stabilities of the algorithm too.
Personally, I don't think there is an easy way to eliminate the issues if we keep using the floating-point, but there is a way to detect the instability during the branching process.
I think, via experiment instead of rigorous maths on Stability Analysis, the majority of errors propagate through an integer solution that is extremely close to the constraints of the system.
Given any integer solution, we figure out the slack for that solution, and if the elements of the vector are extremely small, or on the magnitude of 1e-14 to 1e-15, then stop the branching and report instability.

Related

Determine if a given integer number is element of the Fibonacci sequence in C without using float

I had recently an interview, where I failed and was finally told having not enough experience to work for them.
The position was embedded C software developer. Target platform was some kind of very simple 32-bit architecture, those processor does not support floating-point numbers and their operations. Therefore double and float numbers cannot be used.
The task was to develop a C routine for this architecture. This takes one integer and returns whether or not that is a Fibonacci number. However, from the memory only an additional 1K temporary space is allowed to use during the execution. That means: even if I simulate very great integers, I can't just build up the sequence and interate through.
As far as I know, a positive integer is a exactly then a Fibonacci number if one of
(5n ^ 2) + 4
or
(5n ^ 2) − 4
is a perfect square. Therefore I responded the question: it is simple, since the routine must determine whether or not that is the case.
They responded then: on the current target architecture no floating-point-like operations are supported, therefore no square root numbers can be retrieved by using the stdlib's sqrt function. It was also mentioned that basic operations like division and modulus may also not work because of the architecture's limitations.
Then I said, okay, we may build an array with the square numbers till 256. Then we could iterate through and compare them to the numbers given by the formulas (see above). They said: this is a bad approach, even if it would work. Therefore they did not accept that answer.
Finally I gave up. Since I had no other ideas. I asked, what would be the solution: they said, it won't be told; but advised me to try to look for it myself. My first approach (the 2 formula) should be the key, but the square root may be done alternatively.
I googled at home a lot, but never found any "alternative" square root counter algorithms. Everywhere was permitted to use floating numbers.
For operations like division and modulus, the so-called "integer-division" may be used. But what is to be used for square root?
Even if I failed the interview test, this is a very interesting topic for me, to work on architectures where no floating-point operations are allowed.
Therefore my questions:
How can floating numbers simulated (if only integers are allowed to use)?
What would be a possible soultion in C for that mentioned problem? Code examples are welcome.
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 52 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square root estimation is close enough, we could safely do the verification modulo 2³¹.
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common caregory of suboptimal programming is clinging desperately to the first idea you come up with even when as its complications become increasingly evident. That will be one of the things the interviewer wants to know about you: how flexible are you when presented with new information or new requirements.
So what other ways are there to know if n is a Fibonacci number. One interesting fact is that if n is Fib(k), then k is the floor of logφ(k×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation, we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k). None of the above involved numbers bigger than the capacity of a 32-bit signed type.
Even more simply, we could just run through the Fibonacci series in a loop, checking to see if we hit the target number. Only 47 loops are necessary. Alternatively, these 47 numbers could be precalculated and searched with binary search, using far less than the 1k bytes you are allowed.
It is unlikely an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate’s approaches to problems of this nature and their general knowledge of algorithms. Notably, the notion to iterate through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.

which encoding to use for genetic algorithm?

I want to code a genetic algorithm in C for optimizing a function of 10 variables (x1 to x10). However I am not able to figure out which encoding I should use. I have mostly seen binary encoding being used in example but the variables in my case can take real values. Also, is value encoding a good option for these types of problems?
For real valued problems I would suggest to try CMA-ES or another ES variant. CMA-ES certainly is the current state of the art for real-valued problems. It is designed to find good solutions in multidimensional problems quickly. There are implementations available on Hansen's page. There's also a C# implementation in the work for HeuristicLab. Evolution strategies are algorithms that were specifically designed for real-valued optimization problems. They are very similar to genetic algorithms (both were invented around the same time, but in different places). The main distinction is that for ES the main driver is mutation and it features a clever adaption of the mutation strength. Without this adaption the (local) optimum cannot be located in time. CMA-ES is easy to configure, all it needs is the initial standard deviation and optionally the population size (otherwise there's a formula that estimates this given the problem size).
Genetic algorithms can of course also be applied, but you have to use some specific operators which are able to mutate variables only with very small degree. For example there's the Breeder Genetic Algorithm from Mühlenbein. In general however genetic algorithms are more suited for problems that need a right combination of things. E.g. which items to include in a knapsack problem or which functions and terminals to combine to a formula (genetic programming). Less for problems, where you need to find the right value for something. Although of course there are variants of the genetic algorithm to solve these, look for Real coded Genetic Algorithm (RCGA or RGA).
Another algorithm suited for real-valued problems is Particle Swarm Optimization, but in my opinion it is harder to configure. I'd start with SPSO-2011 the 2011 standard PSO.
If your problem contains integer variables choices become more difficult. Evolution strategies do not perform so well when variables are discrete, because the adaptation schemes for integer variables are different. A genetic algorithm becomes an interesting first-choice algorithm again.
A genetic algorithm is best used when two answers that are pretty close to optimal will make something else pretty close to optimal when combined. The problem with a pure binary encoding is that if you don't check your crossover you end up getting two answers which may not have all that much to do with the original answers.
That said, this is only really an issue if your number of variables is very small and the amount of data in your variables is large. As far as picking an encoding, it's more of an art than a science and it depends on your problem. I would suggest going with an encoding that fits the amount of precision you want. With 10 variables you won't got that far wrong however you encode it, an 8-bit ASCII encoder would probably work fine.
Hope that helps.

Algorithm for doing arithmetic operation on very large numbers

I need a algorithm to perform arithmetic operations on large numbers(that are way above range of float, double int or any other data type for that matter). I am required to write the code in C. I tried looking up here: Knuth, Donald, The Art of Computer Programming, ISBN 0-201-89684-2, Volume 2: Seminumerical Algorithms, Section 4.3.1: The Classical Algorithms but couldn't stand it. I just need the algorithm not the code.
For addition, as far as I know, you won't get much better than the simple linear O(n) algorithm, i.e., just add digit by digit. You likely, have to read the entire input anyway, so it's at least linear. You might be able to do various tricks to get the constant down.
Your main issue is going to be multiplication, due to the quadratic nature of the basic long multiplication algorithm. You might want to consider one of the several much faster methods given here. The Karatsuba method is a little tricky to implement nicely but is probably the easiest non-trivial algorithm that will give you a gain. Otherwise, you'll have to look a bit more into the Fast Fourier Transform methods, such as Schönhage-Strassen's algorithm or Fürer's algorithm.
See also Big O notation.
I think Karatsuba algorithm is best to perform arithmetic operations on large numbers.For sufficiently large n, another generalization, the Schönhage–Strassen algorithm, is even faster.
You can look for the algorithm in
Karatsuba
or
Karatsuba_Multiplication
There is no algorithm to perform arithmetic operations on very large numbers. The arithmetic operations remains the same. What you need is in class like http://www.codeproject.com/KB/cs/BigInt.aspx
The book Prime Numbers and Computer Methods for Factorization by Riesel has an appendix with easy-to-read code for multiple-precision arithmetic.
For just the algorithms, read Knuth vol 2 or Crandall and Pomerance. For the coding, I would suggest getting the obvious algorithms working first before moving on to Karatsuba or Fourier transforms.

Efficiency of arcsin computation from sine lookup table

I have implemented a lookup table to compute sine/cosine values in my system. I now need inverse trigonometric functions (arcsin/arccos).
My application is running on an embedded device on which I can't add a second lookup table for arcsin as I am limited in program memory. So the solution I had in mind was to browse over the sine lookup table to retrieve the corresponding index.
I am wondering if this solution will be more efficient than using the standard implementation coming from the math standard library.
Has someone already experimented on this?
The current implementation of the LUT is an array of the sine values from 0 to PI/2. The value stored in the table are multiplied by 4096 to stay with integer values with enough precision for my application. The lookup table as a resolution of 1/4096 which give us an array of 6434 values.
Then I have two funcitons sine & cosine that takes an angle in radian multiplied by 4096 as argument. Those functions convert the given angle to the corresponding angle in the first quadrant and read the corresponding value in the table.
My application runs on dsPIC33F at 40 MIPS an I use the C30 compiling suite.
It's pretty hard to say anything with certainty since you have not told us about the hardware, the compiler or your code. However, a priori, I'd expect the standard library from your compiler to be more efficient than your code.
It is perhaps unfortunate that you have to use the C30 compiler which does not support C++, otherwise I'd point you to Optimizing Math-Intensive Applications with Fixed-Point Arithmetic and its associated library.
However the general principles of the CORDIC algorithm apply, and the memory footprint will be far smaller than your current implementation. The article explains the generation of arctan() and the arccos() and arcsin() can be calculated from that as described here.
Of course that suggests also that you will need square-root and division also. These may be expensive though PIC24/dsPIC have hardware integer division. The article on math acceleration deals with square-root also. It is likely that your look-up table approach will be faster for the direct look-up, but perhaps not for the reverse search, but the approaches explained in this article are more general and more precise (the library uses 64bit integers as 36.28 bit fixed point, you might get away with less precision and range in your application), and certainly faster than a standard library implementation using software-floating-point.
You can use a "halfway" approach, combining a coarse-grained lookup table to save memory, and a numeric approximation for the intermediate values (e.g. Maclaurin Series, which will be more accurate than linear interpolation.)
Some examples here.
This question also has some related links.
A binary search of 6434 will take ~12 lookups to find the value, followed by an interpolation if more accuracy is needed. Due to the nature if the sin curve, you will get much more accuracy at one end than the other. If you can spare the memory, making your own inverse table evenly spaced on the inputs is likely a better bet for speed and accuracy.
In terms of comparison to the built-in version, you'll have to test that. When you do, pay attention to how much the size of your image increases. The stdin implementations can be pretty hefty in some systems.

Finding prime factors to large numbers using specially-crafted CPUs

My understanding is that many public key cryptographic algorithms these days depend on large prime numbers to make up the keys, and it is the difficulty in factoring the product of two primes that makes the encryption hard to break. It is also my understanding that one of the reasons that factoring such large numbers is so difficult, is that the sheer size of the numbers used means that no CPU can efficiently operate on the numbers, since our minuscule 32 and 64 bit CPUs are no match for 1024, 2048 or even 4096 bit numbers. Specialized Big Integer math libraries must be used in order to process those numbers, and those libraries are inherently slow since a CPU can only hold (and process) small chunks (like 32 or 64 bits) at one time.
So...
Why can't you build a highly specialized custom chip with 2048 bit registers, and giant arithmetic circuits, much in the same way that we scaled from 8 to 16 to 32 to 64-bit CPUs, just build one a LOT larger? This chip wouldn't need most of the circuitry on conventional CPUs, after all it wouldn't need to handle things like virtual memory, multithreading or I/O. It wouldn't even need to be a general-purpose processor supporting stored instructions. Just the bare minimum to perform the necessary arithmetical calculations on ginormous numbers.
I don't know a whole lot about IC design, but I do remember learning about how logic gates work, how to build a half adder, full adder, then link together a bunch of adders to do multi-bit arithmetic. Just scale up. A lot.
Now, I'm fairly certain that there is a very good reason (or 17) that the above won't work (since otherwise one of the many people smarter than I am would have already done it) but I am interested in knowing why it won't work.
(Note: This question may need some re-working, as I'm not even sure yet if the question makes sense)
What #cube said, and the fact that a giant arithmetic logic unit would take more time for the logic signals to stabilize, and include other complications in digital design. Digital logic design includes something that you take for granted in software, namely that signals through combinational logic take a small but nonzero time to propagate and settle. A 32x32 multiplier needs to be designed carefully. A 1024x1024 multiplier would not only take a huge amount of physical resources in a chip, but it also would be slower than a 32x32 multiplier (though perhaps faster than a 32x32 multiplier computing all the partial products needed to perform a 1024x1024 multiply). Plus it's not only the multiplier that's the bottleneck: you've got memory pathways. You'd have to spend a bunch of time gathering the 1024 bits from a memory circuit that's only 32 bits wide, and storing the resulting 2048 bits back into the memory circuit.
Almost certainly it's better to get a bunch of "conventional" 32-bit or 64-bit systems working in parallel: you get the speedup w/o the hardware design complexity.
edit: if anyone has ACM access (I don't), perhaps take a look at this paper to see what it says.
Its because this speedup would be only in O(n), but the complexity of factoring the number is something like O(2^n) (with respect to the number of bits). So if you made this überprocessor and factorized the numbers 1000 times faster, I would only have to make the numbers 10 bits larger and we would be back on the start again.
As indicated above, the primary problem is simply how many possibilities you have to go through to factor a number. That being said, specialized computers do exist to do this sort of thing.
The real progress for this sort of cryptography is improvements in number factoring algorithms. Currently, the fastest known general algorithm is the general number field sieve.
Historically, we seem to be able to factor numbers twice as large each decade. Part of that is faster hardware, and part of it is simply a better understanding of mathematics and how to perform factoring.
I can't comment on the feasibility of an approach exactly like the one you described, but people do similar things very frequently using FPGAs:
Crack DES keys
Crack GSM conversations
Open source graphics card
Shamir & Tromer suggest a similar approach, using a kind of grid computing:
This article discusses a new design for a custom hardware
implementation of the sieving step, which
reduces [the cost of sieving, relative to TWINKLE,] to about $10M. The new device,
called TWIRL, can be seen as an extension of the
TWINKLE device. However, unlike TWINKLE it
does not have optoelectronic components, and can
thus be manufactured using standard VLSI technology
on silicon wafers. The underlying idea is to use
a single copy of the input to solve many subproblems
in parallel. Since input storage dominates cost, if the
parallelization overhead is kept low then the resulting
speedup is obtained essentially for free. Indeed, the
main challenge lies in achieving this parallelism efficiently while allowing compact storage of the input.
Addressing this involves myriad considerations, ranging
from number theory to VLSI technology.
Why don't you try building an uber-quantum computer and run Shor's algorithm on it?
"... If a quantum computer with a sufficient number of qubits were to be constructed, Shor's algorithm could be used to break public-key cryptography schemes such as the widely used RSA scheme. RSA is based on the assumption that factoring large numbers is computationally infeasible. So far as is known, this assumption is valid for classical (non-quantum) computers; no classical algorithm is known that can factor in polynomial time. However, Shor's algorithm shows that factoring is efficient on a quantum computer, so a sufficiently large quantum computer can break RSA. ..." -Wikipedia

Resources