Efficiency of arcsin computation from sine lookup table - c

I have implemented a lookup table to compute sine/cosine values in my system. I now need inverse trigonometric functions (arcsin/arccos).
My application is running on an embedded device on which I can't add a second lookup table for arcsin as I am limited in program memory. So the solution I had in mind was to browse over the sine lookup table to retrieve the corresponding index.
I am wondering if this solution will be more efficient than using the standard implementation coming from the math standard library.
Has someone already experimented on this?
The current implementation of the LUT is an array of the sine values from 0 to PI/2. The values stored in the table are multiplied by 4096 to stay with integer values with enough precision for my application. The lookup table has a resolution of 1/4096, which gives us an array of 6434 values.
Then I have two functions, sine and cosine, that take an angle in radians multiplied by 4096 as argument. Those functions convert the given angle to the corresponding angle in the first quadrant and read the corresponding value in the table.
My application runs on a dsPIC33F at 40 MIPS and I use the C30 compiling suite.

It's pretty hard to say anything with certainty since you have not told us about the hardware, the compiler or your code. However, a priori, I'd expect the standard library from your compiler to be more efficient than your code.

It is perhaps unfortunate that you have to use the C30 compiler which does not support C++, otherwise I'd point you to Optimizing Math-Intensive Applications with Fixed-Point Arithmetic and its associated library.
However, the general principles of the CORDIC algorithm apply, and the memory footprint will be far smaller than your current implementation. The article explains the generation of arctan(); arccos() and arcsin() can be calculated from that as described here.
Of course, that suggests that you will also need square root and division. These may be expensive, though PIC24/dsPIC have hardware integer division. The article on math acceleration deals with square root as well. It is likely that your look-up table approach will be faster for the direct look-up, but perhaps not for the reverse search. The approaches explained in this article are more general and more precise (the library uses 64-bit integers as 36.28-bit fixed point; you might get away with less precision and range in your application), and certainly faster than a standard library implementation using software floating point.

You can use a "halfway" approach, combining a coarse-grained lookup table to save memory, and a numeric approximation for the intermediate values (e.g. Maclaurin Series, which will be more accurate than linear interpolation.)
Some examples here.
This question also has some related links.
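As a rough illustration of the "halfway" approach, here is a minimal sketch that uses the simpler of the two options, linear interpolation, on a coarse table. The table size (9 entries), the name `sine_interp` and the angle unit (the first quadrant split into 2048 steps, so that the segment index and the fraction fall out of a shift and a mask) are assumptions made for the example, not part of the original design.

```c
#include <stdint.h>

/* Hypothetical coarse table: round(sin(i * pi/16) * 4096) for i = 0..8,
 * covering the first quadrant only. */
static const int16_t coarse_sine[9] = {
    0, 799, 1567, 2276, 2896, 3406, 3784, 4017, 4096
};

/* phase: first-quadrant angle in units of (pi/2)/2048, valid range 0..2048.
 * The upper bits select a table segment, the low 8 bits are the position
 * inside that segment. Returns sin(angle) * 4096. */
int16_t sine_interp(uint16_t phase)
{
    uint16_t idx  = phase >> 8;     /* segment index, 0..8 */
    uint16_t frac = phase & 0xFF;   /* position within the segment, 0..255 */

    if (idx >= 8) {
        return coarse_sine[8];      /* clamp at pi/2 */
    }

    int32_t y0 = coarse_sine[idx];
    int32_t y1 = coarse_sine[idx + 1];

    /* Straight line between the two neighbouring entries. */
    return (int16_t)(y0 + (((y1 - y0) * frac) >> 8));
}
```

With only 8 segments the error is large near 0 where the curve is steepest relative to the table step; a real table would be sized from the accuracy the application needs.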

A binary search of 6434 entries will take about 13 lookups to find the value, followed by an interpolation if more accuracy is needed. Due to the nature of the sine curve, you will get much more accuracy at one end than the other. If you can spare the memory, making your own inverse table evenly spaced on the inputs is likely a better bet for speed and accuracy.
In terms of comparison to the built-in version, you'll have to test that. When you do, pay attention to how much the size of your image increases. The stdlib implementations can be pretty hefty on some systems.
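The reverse search could look roughly like the sketch below. It assumes the table is non-decreasing (true for sine on the first quadrant) and that `value` is not below `table[0]`; the function name and the small demo table are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Small stand-in for the real 6434-entry table:
 * round(sin(i * pi/16) * 4096) for i = 0..8. */
static const int16_t demo_sine[9] = {
    0, 799, 1567, 2276, 2896, 3406, 3784, 4017, 4096
};

/* Returns the largest index i such that table[i] <= value.
 * Requires len >= 1, a non-decreasing table, and value >= table[0]. */
size_t sine_table_find(const int16_t *table, size_t len, int16_t value)
{
    size_t lo = 0;
    size_t hi = len - 1;

    while (lo < hi) {
        /* Take the upper middle so that "lo = mid" always makes progress. */
        size_t mid = lo + (hi - lo + 1) / 2;
        if (table[mid] <= value) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

The returned index is the angle in table steps. Near PI/2 many neighbouring entries of a fine table hold the same value, which is the accuracy problem mentioned above.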


How to compare two Math library implementations?

As you know, C standard library defines several standard functions calls that should be implemented by any compliant implementation e.g., Newlib, MUSL, GLIBC ...
If I am targeting Linux for example, I have to choose between glibc and MUSL, and my criterion is the accuracy of the math library libm. How can I compare two possible implementations of, say, sin() or cos()?
A naive approach would be to compare the output of both implementations on a set of randomly generated inputs against a reference (from Matlab for example), but is there any more reliable/formal/structured/guided way to compare/model the two? I tried to see if there is any research in this direction but I did not find any, so any pointers are appreciated.
Some thoughts:
You can use the GNU Multiple Precision Arithmetic Library (GnuMP) to generate good reference results.
It is possible to test most, if not all of the single-argument single-precision (IEEE-754 binary32) routines exhaustively. (For some of the macOS trigonometric routines, such as sinf, we tested one implementation exhaustively, verifying that it returned faithfully rounded results, meaning the result was the mathematical value [if representable] or one of the two adjacent values [if not]. Then, when changing implementations, we compared one to the other. If a new-implementation result was identical to the old-implementation result, it passed. Otherwise, GnuMP was used to test it. Since new implementations largely coincided with old implementations, this resulted in few invocations of GnuMP, so we were able to exhaustively test a new routine implementation in about three minutes, if I recall correctly.)
It is not feasible to test the multiple-argument or double-precision routines exhaustively.
When comparing implementations, you have to choose a metric, or several metrics. A library that has a good worst-case error is good for proofs; its bound can be asserted to hold true for any argument, and that can be used to derive further bounds in subsequent computations. But a library that has a good average error may tend to produce better results for, say, physics simulations that use large arrays of data. For some applications, only the errors in a “normal” domain may be relevant (angles around −2π to +2π), so errors in reducing large arguments (up to around 10³⁰⁸) may be irrelevant because those arguments are never used.
There are some common points where various routines should be tested. For example, for the trigonometric routines, test at various fractions of π. Aside from being mathematically interesting, these tend to be where implementations switch between approximations, internally. Also test at large numbers that are representable but happen to be very near multiples of simple fractions of π. These are the worst cases for argument reduction and can yield huge relative errors if not done correctly. They require number theory to find. Testing in any sort of scattershot approach, or even orderly approaches that fail to consider this reduction problem, will fail to find these troublesome arguments, so it would be easy to report as accurate a routine that had huge errors.
On the other hand, there are important points to test that cannot be known without internal knowledge of the implementation. For example, when designing a sine routine, I would use the Remez algorithm to find a minimax polynomial, aiming for it to be good from, say, –π/2 to +π/2 (quite large for this sort of thing, but just for example). Then I would look at the arithmetic and rounding errors that could occur during argument reduction. Sometimes they would produce a result a little outside that interval. So I would go back to the minimax polynomial generation and push for a slightly larger interval. And I would also look for improvements in the argument reduction. In the end, I would end up with a reduction guaranteed to produce results within a certain interval and a polynomial known to be good to a certain accuracy within that interval. To test my routine, you then need to know the endpoints of that interval, and you have to be able to find some arguments for which the argument reduction yields points near those endpoints, which means you have to have some idea of how my argument reduction is implemented—how many bits does it use, and such. Like the troublesome arguments mentioned above, these points cannot be found with a scattershot approach. But unlike those above, they cannot be found from pure mathematics; you need information about the implementation. This makes it practically impossible to know you have compared the worst potential arguments for implementations.
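Whatever inputs are chosen, the comparison itself needs an error measure. One common choice is the distance in units in the last place (ULP) between two results. The sketch below is one way to compute it for single precision; it assumes `float` is IEEE-754 binary32 and that neither argument is a NaN, and the function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Map a float to an integer key that increases monotonically with the
 * float's value, so that adjacent floats have adjacent keys.
 * +0.0f and -0.0f both map to 0. */
static int32_t float_order_key(float x)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to read the bits */
    /* Negative floats are stored as sign + magnitude; mirror them. */
    return (bits < 0) ? (int32_t)(INT32_MIN - bits) : bits;
}

/* Number of representable floats between a and b (0 if they are equal). */
uint32_t ulp_distance(float a, float b)
{
    int64_t d = (int64_t)float_order_key(a) - (int64_t)float_order_key(b);
    return (uint32_t)(d < 0 ? -d : d);
}
```

For an exhaustive single-precision comparison one would loop over all 2³² bit patterns, call both implementations on each, and record the maximum and the average of `ulp_distance`, consulting a high-precision reference only where the two disagree.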

Numerical stability of Simplex Algorithm

Edit: Simplex the mathematical optimization algorithm, not to be confused with simplex noise or triangulation.
I'm implementing my own linear programming solver and I would like to do so using 32bit floats. I know Simplex is very sensitive to the precision of the numbers because it performs lots of calculations and if too little precision is used, rounding errors may occur. But still, I would like to implement it using 32bit floats so I can make the instructions 4-wide, that is, so I can use SIMD to perform 4 calculations at a time. I'm aware that I could use doubles and make instructions 2-wide, but 4 is greater than 2 :)
I have had problems with my floating point implementation where the solution was suboptimal or the problem was said to be unfeasible. This happens especially with mixed integer linear programs, which I solve with the branch and bound method.
So my question is: how can I prevent as much as possible having rounding errors resulting in unfeasible, unbounded or suboptimal solutions?
I know one thing I can do is to scale the input values so that they are close to one (http://lpsolve.sourceforge.net/5.5/scaling.htm). Is there something else I can do?
Yes, I tried to implement an algorithm for the Extended Knapsack problem using the branch and bound method with a greedy algorithm as a heuristic, which is the exact analogue of the simplex running with a pivoting strategy that chooses the largest objective increase.
I had problems with the numerical stability of the algorithm too.
Personally, I don't think there is an easy way to eliminate the issues if we keep using floating point, but there is a way to detect the instability during the branching process.
I think, via experiment instead of rigorous maths on Stability Analysis, the majority of errors propagate through an integer solution that is extremely close to the constraints of the system.
Given any integer solution, we figure out the slack for that solution, and if the elements of the vector are extremely small, or on the magnitude of 1e-14 to 1e-15, then stop the branching and report instability.
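A minimal sketch of that check for a single constraint row `a_row · x <= b` is shown below. The function name, the use of `double`, and the decision to treat an exactly zero slack as a legitimately tight constraint rather than as instability are all assumptions of this example; the tolerance is left as a parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the slack b - a_row . x is nonzero but smaller in
 * magnitude than tol, i.e. the candidate solution sits suspiciously
 * close to the constraint boundary. */
bool slack_is_suspicious(const double *a_row, const double *x, size_t n,
                         double b, double tol)
{
    double lhs = 0.0;
    for (size_t i = 0; i < n; ++i) {
        lhs += a_row[i] * x[i];
    }

    double slack = b - lhs;
    double mag = (slack < 0.0) ? -slack : slack;

    return mag > 0.0 && mag < tol;
}
```

In a branch and bound loop this would be called for every constraint of each integer candidate, with something like `tol = 1e-14`; branching stops and instability is reported as soon as it returns true.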

Determine if a given integer number is element of the Fibonacci sequence in C without using float

I recently had an interview, which I failed, and I was finally told that I do not have enough experience to work for them.
The position was embedded C software developer. The target platform was some kind of very simple 32-bit architecture whose processor does not support floating-point numbers and their operations. Therefore double and float numbers cannot be used.
The task was to develop a C routine for this architecture. It takes one integer and returns whether or not that is a Fibonacci number. However, only an additional 1K of temporary memory may be used during execution. That means: even if I simulate very large integers, I can't just build up the sequence and iterate through it.
As far as I know, a positive integer n is a Fibonacci number exactly when one of
5n² + 4
5n² − 4
is a perfect square. Therefore I answered the question: it is simple, since the routine must determine whether or not that is the case.
They responded then: on the current target architecture no floating-point-like operations are supported, therefore no square root numbers can be retrieved by using the stdlib's sqrt function. It was also mentioned that basic operations like division and modulus may also not work because of the architecture's limitations.
Then I said, okay, we may build an array with the square numbers till 256. Then we could iterate through and compare them to the numbers given by the formulas (see above). They said: this is a bad approach, even if it would work. Therefore they did not accept that answer.
Finally I gave up, since I had no other ideas. I asked what the solution would be: they said they would not tell me, but advised me to look for it myself. My first approach (the two formulas) should be the key, but the square root may be computed in an alternative way.
I googled at home a lot, but never found any "alternative" square root algorithms. Everywhere, the use of floating-point numbers was permitted.
For operations like division and modulus, the so-called "integer-division" may be used. But what is to be used for square root?
Even if I failed the interview test, this is a very interesting topic for me, to work on architectures where no floating-point operations are allowed.
Therefore my questions:
How can floating-point numbers be simulated (if only integers are allowed)?
What would be a possible solution in C for the problem mentioned above? Code examples are welcome.
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 53 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will be very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.)
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, which is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square root estimation is close enough, we could safely do the verification modulo 2³¹.
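To make the idea concrete, here is a sketch under some assumptions that go beyond the text: it relies on the compiler providing `uint64_t` arithmetic (emulated in software on a 32-bit core), it does the verification modulo 2⁶⁴ rather than 2³¹, and it simply tries a small window of candidates around the guess instead of stepping with the (n+1)² identity. The constant and the function name are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* sqrt(5) scaled by 2^30 and rounded to the nearest integer. */
#define SQRT5_Q30 2400959709ULL

bool is_fibonacci_by_squares(uint32_t n)
{
    /* 5n^2 can exceed 64 bits; unsigned arithmetic wraps modulo 2^64.
     * That is harmless here because every candidate r is within a few
     * units of n*sqrt(5), so r^2 and 5n^2 differ by far less than 2^63. */
    uint64_t five_n2 = 5ULL * n * n;

    /* Integer guess for n * sqrt(5); no division, no floating point. */
    uint64_t guess = ((uint64_t)n * SQRT5_Q30) >> 30;

    /* The guess can be off by a few units, so check a small window. */
    uint64_t first = (guess > 3) ? guess - 3 : 0;
    for (uint64_t r = first; r <= guess + 3; ++r) {
        uint64_t sq = r * r;
        if (sq == five_n2 + 4 || sq + 4 == five_n2) {
            return true;    /* r^2 == 5n^2 + 4  or  r^2 == 5n^2 - 4 */
        }
    }
    return false;
}
```

For example, n = 13 gives 5n² − 4 = 841 = 29², so 13 is accepted, while n = 4 gives 76 and 84, neither of which is a square.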
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common category of suboptimal programming is clinging desperately to the first idea you come up with even as its complications become increasingly evident. That will be one of the things the interviewer wants to know about you: how flexible are you when presented with new information or new requirements.
So what other ways are there to know if n is a Fibonacci number? One interesting fact is that if n is Fib(k), then k is the floor of logφ(n×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation, we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k). None of the above involves numbers bigger than the capacity of a 32-bit signed type.
Even more simply, we could just run through the Fibonacci series in a loop, checking to see if we hit the target number. Only 47 loops are necessary. Alternatively, these 47 numbers could be precalculated and searched with binary search, using far less than the 1k bytes you are allowed.
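The loop version could be sketched as follows. It uses only addition and comparison; the function name and the choice of `uint32_t` (rather than a signed type) are assumptions of the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Walks the Fibonacci sequence upward until it reaches or passes n. */
bool is_fibonacci(uint32_t n)
{
    uint32_t a = 0;   /* Fib(k)   */
    uint32_t b = 1;   /* Fib(k+1) */

    while (a < n) {
        uint32_t next = a + b;
        if (next < b) {
            /* a + b wrapped around: b is the largest Fibonacci number
             * that fits in 32 bits, so n can only match b itself. */
            return b == n;
        }
        a = b;
        b = next;
    }
    return a == n;
}
```

The state is two words, far below the 1K limit, and the loop body runs fewer than 50 times for any 32-bit input.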
It is unlikely an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate’s approaches to problems of this nature and their general knowledge of algorithms. Notably, the notion to iterate through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.

How to do floating point calculations with integers

I have a coprocessor attached to the main processor. Some floating point calculations needs to be done in the coprocessor, but it does not support hardware floating point instructions, and emulation is too slow.
Now one way is to have the main processor scale the floating point values so that they can be represented as integers, send them to the coprocessor, which performs some calculations, and scale those values back on return. However, that wouldn't work most of the time, as the numbers would eventually become too big or too small for the range of those integers. So my question is: what is the fastest way of doing this properly?
You are saying emulation is too slow. I guess you mean emulation of floating point. The only remaining alternative if scaled integers are not sufficient, is fixed point math but it's not exactly fast either, even though it's much faster than emulated float.
Also, you are never going to escape the fact that with both scaled integers, and fixed point math, you are going to get less dynamic range than with floating point.
However, if your range is known in advance, the fixed point math implementation can be tuned for the range you need.
Here is an article on fixed point. The gist of the trick is deciding how to split the variable, how many bits for the low and high part of the number.
A full implementation of fixed point for C can be found here. (BSD license.) There are others.
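To show what "splitting the variable" means in practice, here is a minimal sketch of a 16.16 format: 16 bits for the integer part, 16 for the fraction, stored in an `int32_t`. The type and function names are invented for this example and are not taken from the library linked above.

```c
#include <stdint.h>

typedef int32_t q16_16;                 /* 16 integer bits, 16 fraction bits */

#define Q_FRAC_BITS 16
#define Q_ONE ((q16_16)1 << Q_FRAC_BITS)   /* the value 1.0 */

q16_16 q_from_int(int32_t v)
{
    return (q16_16)(v * Q_ONE);         /* v must fit in 16 bits */
}

/* Addition and subtraction work directly on the raw integers. */
q16_16 q_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* The product of two 16.16 values has 32 fraction bits, so widen to
 * 64 bits and scale back down by 2^16. */
q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q_ONE);
}

/* Pre-scale the dividend so the quotient keeps 16 fraction bits.
 * b must not be zero. */
q16_16 q_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q_ONE) / b);
}
```

Moving the split point (say 8.24 or 24.8) trades range for resolution, which is the tuning mentioned above when the value range is known in advance. No overflow checking is done here.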
In addition to @Amigable Clark Kant's suggestion, Anthony Williams' fixed point math library provides a C++ fixed class that can be used almost interchangeably with float or double and on ARM gives a 5x performance improvement over software floating point. It includes a complete fixed point version of the standard math library including trig and log functions etc. using the CORDIC algorithm.

Sine function in Matlab/C/Java or any other program in digital systems

How do Matlab/C generate a sine wave? I mean, do they store the values for every angle? But if they do, then there are infinitely many values that need to be stored.
There are many, many routines for calculating sines of angles using only the basic arithmetic operators available on any modern digital computer. One such, but only one example, is the CORDIC algorithm. I don't know what algorithm(s) Matlab uses for trigonometric functions.
Computers don't simply look up the value of a trigonometric function in a stored table of values, though some of the algorithms in current use do look up critical values in stored tables. What those tables contain is algorithm-specific and would probably not accord with a naive expectation of a trig table.
Note that this question has been asked many times on SO, this answer seems to be the pick of them.
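For a flavour of how CORDIC gets by with shifts, adds and a small table, here is a fixed-point sketch in rotation mode. It is an illustration only, not what Matlab or any particular library does. Angles and results are in 16.16 fixed point, the input must lie roughly within ±π/2, and right-shifting a negative value is assumed to be an arithmetic shift (true for the common compilers, but implementation-defined in C). All names are made up.

```c
#include <stdint.h>

#define CORDIC_ITERS 16

/* atan(2^-i) in radians, scaled by 65536, for i = 0..15. */
static const int32_t cordic_atan_q16[CORDIC_ITERS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

/* 1/gain of the rotation sequence (about 0.607253), scaled by 65536. */
#define CORDIC_K_Q16 39797

/* Rotates the vector (K, 0) by `angle` using only shifts and adds. */
static void cordic_rotate(int32_t angle, int32_t *x_out, int32_t *y_out)
{
    int32_t x = CORDIC_K_Q16;
    int32_t y = 0;
    int32_t z = angle;              /* angle still left to rotate */

    for (int i = 0; i < CORDIC_ITERS; ++i) {
        int32_t dx = x >> i;
        int32_t dy = y >> i;
        if (z >= 0) {               /* rotate counter-clockwise */
            x -= dy;  y += dx;  z -= cordic_atan_q16[i];
        } else {                    /* rotate clockwise */
            x += dy;  y -= dx;  z += cordic_atan_q16[i];
        }
    }
    *x_out = x;
    *y_out = y;
}

int32_t cordic_sin_q16(int32_t angle)
{
    int32_t x, y;
    cordic_rotate(angle, &x, &y);
    return y;
}

int32_t cordic_cos_q16(int32_t angle)
{
    int32_t x, y;
    cordic_rotate(angle, &x, &y);
    return x;
}
```

The stored table here holds the 16 rotation angles, not sine values, which is an example of a table that does not look like a classic trig table.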
No, they generally don't. And even if they did, no, there are not "infinite values". Digital finite (real) computers don't deal well with the infinite, but that applies just as well to the angle (the input to the sine function). You can't ask for "every" angle, since the angle itself must be expressed in a finite set of bits.
I think using Taylor series is common, or other solutions based on interpolation.
Also, modern CPUs have instructions for computing sine, but that of course just pushes the question of how it's implemented down a level or three.
See also this very similar question for more details.
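As a simple illustration of the series approach (production libraries typically use carefully fitted polynomials and argument reduction rather than a raw Taylor series), the sketch below sums the first seven terms of the Taylor series of sine. It assumes the caller has already reduced the angle to about [−π/2, π/2]; the function name is made up.

```c
/* sin(x) ~ x - x^3/3! + x^5/5! - ... - x^13/13!, for |x| <= pi/2. */
double sin_taylor(double x)
{
    double x2   = x * x;
    double term = x;    /* current term, starts at x^1 / 1! */
    double sum  = x;

    for (int k = 1; k <= 6; ++k) {
        /* Each term is the previous one times -x^2 / ((2k)(2k+1)). */
        term *= -x2 / ((2.0 * k) * (2.0 * k + 1.0));
        sum  += term;
    }
    return sum;
}
```

Only multiplication, division and addition are used, and nothing is stored per angle, which answers the "infinite values" worry: the value is computed on demand for whatever finite-precision angle is passed in.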
Before the 90s, computers would numerically calculate sine and other trigonometric functions using the basic operations of addition, subtraction, multiplication and division. The algorithm for calculating them was usually included in basic compiler libraries. At some point, it became possible to add an optional hardware processor called the floating point unit (FPU). My understanding of the FPU is that it did have hard values of the trig functions stored on it. The speed of calculating trig functions would increase dramatically with the inclusion of the FPU. Since the 90s, however, the FPU has been bundled with the CPU.
I can't seem to find any explicit description on precisely how sine and other trig functions are implemented by the FPU and generally that is an implementation detail left to the electrical engineer designing the chip. However it really is only necessary to know the values from 0 to pi/2. Everything else can be easily calculated from those values.
EDIT: Ok here is the implementation used on the FPU: CORDIC, which I found at this answer.
