I am developing an algorithm for an audio application for mobile platforms. It appears to me that floating-point support is not yet ubiquitous on mobile processors, so developing in fixed point would be a safer bet.
I have written floating-point FFT routines for some time now with a degree of success; however, writing one in fixed point turned out to be rather difficult. Namely, I would like to improve the precision and find a way to handle potential overflows. The problem is that, unlike floating-point FFTs, descriptions of fixed-point FFT algorithms are hard to come by on the Internet.
Has anyone had some experience developing such algorithms?
Your first choice should probably be to use a native-optimized FFT. There are processing requirements for fixed-point FFTs that are difficult to express efficiently in portable C (or probably any language): saturation arithmetic is probably the biggest obstacle. Assembly libraries tend to take advantage of processor-specific instructions for these.
If you still want a portable ANSI C fixed point FFT, I only know of one choice: kissfft. (Disclaimer : I wrote it)
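To make the saturation point concrete, here is a minimal sketch of saturating arithmetic in portable C. It assumes the Q15 format (16-bit samples representing values in [-1, 1)); the names `q15_t`, `q15_add`, `q15_mul` and `q15_half_sum` are made up for illustration and are not taken from kissfft or any other library.

```c
#include <stdint.h>

/* Q15 format (an assumption of this sketch): a 16-bit signed integer v
   represents the real value v / 32768. */
typedef int16_t q15_t;

/* Clamp a 32-bit intermediate into the 16-bit range instead of wrapping. */
static q15_t q15_saturate(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (q15_t)x;
}

/* Saturating addition: widen to 32 bits first so the sum cannot overflow. */
static q15_t q15_add(q15_t a, q15_t b)
{
    return q15_saturate((int32_t)a + (int32_t)b);
}

/* Saturating multiply with rounding. The 32-bit product is in Q30, so add
   half an LSB and shift right by 15 to get back to Q15. Right-shifting a
   negative value is implementation-defined in C; this sketch assumes the
   usual arithmetic shift. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return q15_saturate((p + (1 << 14)) >> 15);
}

/* Sum scaled by 1/2, as often done once per radix-2 FFT stage so that the
   values cannot grow out of range (at the cost of one bit of precision). */
static q15_t q15_half_sum(q15_t a, q15_t b)
{
    return (q15_t)(((int32_t)a + (int32_t)b) >> 1);
}
```

The comparisons in `q15_saturate` are exactly what a dedicated saturating instruction does in one step, which is why hand-tuned assembly libraries tend to win here.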
I have read great things about the FFT library at http://anthonix.com/ffts/index.html. It reportedly works well on mobile platforms, and the site contains benchmarks.
I have been working on an automated tool that converts floating-point C code to fixed-point, with a variety of options for tradeoffs between accuracy and execution time. I have had good results with a number of algorithms, including a 2D 8x8 discrete cosine transform. My target platform is typically an ARM Cortex-M processor but similar results should be achievable on other platforms. Would you be interested in letting me take a crack at your FFT?
Edit: Simplex the mathematical optimization algorithm, not to be confused with simplex noise or triangulation.
I'm implementing my own linear programming solver and I would like to do so using 32bit floats. I know Simplex is very sensitive to the precision of the numbers because it performs lots of calculations and if too little precision is used, rounding errors may occur. But still, I would like to implement it using 32bit floats so I can make the instructions 4-wide, that is, so I can use SIMD to perform 4 calculations at a time. I'm aware that I could use doubles and make instructions 2-wide, but 4 is greater than 2 :)
I have had problems with my floating point implementation where the solution was suboptimal or the problem was reported as infeasible. This happens especially with mixed integer linear programs, which I solve with the branch and bound method.
So my question is: how can I prevent, as much as possible, rounding errors that result in infeasible, unbounded or suboptimal solutions?
I know one thing I can do is to scale the input values so that they are close to one (http://lpsolve.sourceforge.net/5.5/scaling.htm). Is there something else I can do?
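As a minimal sketch of the scaling idea, the routine below divides each constraint row and its right-hand side by the largest absolute coefficient in that row. This is only one simple scheme (row equilibration), not necessarily what lp_solve does; the function name `scale_rows` and the row-major layout are assumptions made for illustration.

```c
#include <math.h>
#include <stddef.h>

/* Scale each constraint row of the m x n row-major matrix a, together with
   its right-hand side b[i], by the largest absolute coefficient in the row.
   Afterwards every non-empty row has a largest coefficient of magnitude 1. */
static void scale_rows(float *a, float *b, size_t m, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        float big = 0.0f;
        for (size_t j = 0; j < n; ++j) {
            float v = fabsf(a[i * n + j]);
            if (v > big) big = v;
        }
        if (big == 0.0f) continue;   /* empty row: nothing to scale */
        for (size_t j = 0; j < n; ++j)
            a[i * n + j] /= big;
        b[i] /= big;
    }
}
```

Dividing a whole row and its right-hand side by the same positive number does not change the feasible region, so the solution vector needs no rescaling afterwards.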
Yes. I tried to implement an algorithm for the Extended Knapsack problem using the branch and bound method with a greedy algorithm as a heuristic, which is the exact analogue of simplex running with a pivoting strategy that chooses the largest objective increase.
I had problems with the numerical stabilities of the algorithm too.
Personally, I don't think there is an easy way to eliminate the issues if we keep using the floating-point, but there is a way to detect the instability during the branching process.
Judging from experiments rather than a rigorous stability analysis, I think the majority of errors propagate through integer solutions that lie extremely close to the constraints of the system.
Given any integer solution, we compute the slack vector for that solution; if its elements are extremely small, on the order of 1e-14 to 1e-15, we stop branching and report instability.
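A rough sketch of that check might look as follows. It is my reading of the idea, not a tested recipe: it assumes constraints of the form A x <= b, uses doubles (for 32-bit floats the threshold would have to be far larger, since float epsilon is about 1.19e-7), and flags any slack that is nonzero but smaller than a tolerance. The name `slack_is_suspicious` is hypothetical.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* For constraints A x <= b (A is m x n, row-major), compute each slack
   b[i] - (A x)[i] and return true if any slack is nonzero but smaller in
   magnitude than tol. Treating an exactly zero slack as harmless is an
   assumption of this sketch. */
static bool slack_is_suspicious(const double *a, const double *b,
                                const double *x, size_t m, size_t n,
                                double tol)
{
    for (size_t i = 0; i < m; ++i) {
        double ax = 0.0;
        for (size_t j = 0; j < n; ++j)
            ax += a[i * n + j] * x[j];
        double slack = b[i] - ax;
        if (slack != 0.0 && fabs(slack) < tol)
            return true;    /* candidate sits almost on a constraint */
    }
    return false;
}
```

In a branch and bound loop this would be called on each candidate integer solution before branching further.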
I want to code a genetic algorithm in C for optimizing a function of 10 variables (x1 to x10). However I am not able to figure out which encoding I should use. I have mostly seen binary encoding being used in example but the variables in my case can take real values. Also, is value encoding a good option for these types of problems?
For real-valued problems I would suggest trying CMA-ES or another ES variant. CMA-ES certainly is the current state of the art for real-valued problems. It is designed to find good solutions in multidimensional problems quickly. There are implementations available on Hansen's page. There's also a C# implementation in the works for HeuristicLab. Evolution strategies are algorithms that were specifically designed for real-valued optimization problems. They are very similar to genetic algorithms (both were invented around the same time, but in different places). The main distinction is that for ES the main driver is mutation, and it features a clever adaptation of the mutation strength. Without this adaptation the (local) optimum cannot be located in time. CMA-ES is easy to configure: all it needs is the initial standard deviation and optionally the population size (otherwise there's a formula that estimates this given the problem size).
Genetic algorithms can of course also be applied, but you have to use specific operators that mutate variables only by a very small amount. For example, there's the Breeder Genetic Algorithm from Mühlenbein. In general, however, genetic algorithms are more suited to problems that need the right combination of things, e.g. which items to include in a knapsack problem or which functions and terminals to combine into a formula (genetic programming). They are less suited to problems where you need to find the right value for something, although there are variants of the genetic algorithm that solve these; look for Real-coded Genetic Algorithm (RCGA or RGA).
Another algorithm suited for real-valued problems is Particle Swarm Optimization, but in my opinion it is harder to configure. I'd start with SPSO-2011, the 2011 standard PSO.
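To give an idea of what real-valued (value) encoding looks like in C, here is a minimal sketch of two operators on a chromosome stored as a plain array of 10 doubles. It is not the Breeder GA or any specific published RCGA; the operator choices (whole arithmetic crossover, small uniform mutation) and all names are assumptions for illustration.

```c
#include <stdlib.h>

#define NVARS 10   /* x1 .. x10 */

/* Uniform random double in [0, 1]. */
static double urand(void)
{
    return (double)rand() / (double)RAND_MAX;
}

/* Whole arithmetic crossover: each child gene is a weighted average of
   the two parent genes, child = alpha * p1 + (1 - alpha) * p2. */
static void crossover(const double *p1, const double *p2, double alpha,
                      double *child)
{
    for (int i = 0; i < NVARS; ++i)
        child[i] = alpha * p1[i] + (1.0 - alpha) * p2[i];
}

/* Small-step mutation: with probability rate, shift a gene by a uniform
   amount in [-step, step], then clamp it to the allowed range [lo, hi]. */
static void mutate(double *x, double rate, double step, double lo, double hi)
{
    for (int i = 0; i < NVARS; ++i) {
        if (urand() < rate) {
            x[i] += (2.0 * urand() - 1.0) * step;
            if (x[i] < lo) x[i] = lo;
            if (x[i] > hi) x[i] = hi;
        }
    }
}
```

Because the genes are the variables themselves, a small `step` gives exactly the kind of small-degree mutation described above, which is awkward to get with a binary encoding.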
If your problem contains integer variables choices become more difficult. Evolution strategies do not perform so well when variables are discrete, because the adaptation schemes for integer variables are different. A genetic algorithm becomes an interesting first-choice algorithm again.
A genetic algorithm is best used when two answers that are pretty close to optimal will make something else pretty close to optimal when combined. The problem with a pure binary encoding is that if you don't check your crossover you end up getting two answers which may not have all that much to do with the original answers.
That said, this is only really an issue if your number of variables is very small and the amount of data in your variables is large. As far as picking an encoding goes, it's more of an art than a science and it depends on your problem. I would suggest going with an encoding that fits the amount of precision you want. With 10 variables you won't go that far wrong however you encode it; an 8-bit ASCII encoder would probably work fine.
Hope that helps.
We are looking for exemplar problems and codes that will run on any or all of shared memory, distributed memory, and GPGPU architectures. The reference platform we are using is LittleFe (littlefe.net), an open-design, low cost educational cluster currently with six dual core CPUs, each with an nVidia chipset.
These problems and solutions will be good for teaching parallelism to any newbie by providing working examples and opportunities to roll up your sleeves and code. Stackoverflow experts have good insight and are likely to have some favorites.
Calculating area under a curve is interesting, simple and easy to understand, but there are bound to be ones that are just as easily expressed and chock full of opportunities to practice and learn.
Hybrid examples using more than one of the memory architectures are most desirable, and reflective of where parallel programming seems to be trending.
On LittleFe we have predominantly been using three applications. The first is an analysis of optimal targets on a dartboard, which is highly parallel with little communication overhead. The second is Conway's Game of Life, which is typical of problems sharing boundary conditions. It has a moderate communication overhead. The third is an n-body model of galaxy formation, which requires heavy communication overhead.
The CUDA programming guide (PDF) contains a detailed analysis of the implementation of matrix multiplication on a GPU. That seems to be the staple "hello world" example for learning GPU programming.
Furthermore, the CUDA SDK contains tens of other well-explained examples of GPU programming in CUDA and OpenCL. My favorite is the colliding balls example (a demo with a few thousand balls colliding in real time).
Two of my favorites are numerical integration and finding prime numbers. For the first we code the midpoint rectangle rule on the function f(x) = 4.0 / (1.0 + x*x). Integrating this function between 0 and 1 gives an approximation of the constant pi, which makes checking the correctness of the answer easy. The parallelism is across the range of the integration (computing the areas of rectangles).
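A serial version of the integration example can be sketched in a few lines of C. The function name `midpoint_pi` is made up here; the comment marks the loop that students would parallelize, for instance with an OpenMP reduction.

```c
/* Integrand: the integral of f over [0, 1] equals pi. */
static double f(double x)
{
    return 4.0 / (1.0 + x * x);
}

/* Midpoint rectangle rule with n rectangles of width h on [0, 1]. */
static double midpoint_pi(long n)
{
    double h = 1.0 / (double)n;
    double sum = 0.0;

    /* Each iteration is independent, so this loop is the natural place to
       parallelize, e.g. with
       "#pragma omp parallel for reduction(+:sum)" when using OpenMP. */
    for (long i = 0; i < n; ++i)
        sum += f(((double)i + 0.5) * h);

    return sum * h;
}
```

On a distributed memory system the same loop can be split into contiguous index ranges per process, with the partial sums combined at the end.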
For the second, we input an integer range and then identify and save the prime numbers in that range. We use brute-force division of each value by all possible factors; if any divisor other than 1 or the number itself is found, the value is composite. If a prime is found, we count it and store it in a shared array. The parallelism comes from dividing up the range, since testing the primality of N is independent of testing M. There is some trickiness needed to share the prime store between threads or to gather distributed partial answers.
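The serial core of the prime example could look like the sketch below. The names `is_prime` and `find_primes` and the fixed-capacity output buffer are assumptions for illustration; the shared store is deliberately left as a plain array, since protecting or merging it is the part students have to work out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Brute force: try every possible factor between 2 and v - 1. */
static bool is_prime(long v)
{
    if (v < 2) return false;
    for (long d = 2; d < v; ++d)
        if (v % d == 0) return false;   /* found a divisor: composite */
    return true;
}

/* Test every value in [lo, hi]; store primes in out (up to cap entries)
   and return how many primes were found in total. */
static size_t find_primes(long lo, long hi, long *out, size_t cap)
{
    size_t count = 0;
    for (long v = lo; v <= hi; ++v) {
        if (is_prime(v)) {
            if (count < cap) out[count] = v;
            ++count;
        }
    }
    return count;
}
```

In a parallel version each thread or process would call `find_primes` on its own sub-range; the increment of `count` and the write to `out` are the operations that need synchronization or a final gather step.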
These are very basic and simple problems to solve, which allows students to focus on the parallel implementation and not so much on the computation involved.
One of the more complex but easy example problems is the BLAS routine sgemm or dgemm (C = alpha * A x B + beta * C) where A, B, C are matrices of valid sizes and alpha and beta are scalars. The types may be single precision floating point (sgemm) or double precision floating point (dgemm).
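For reference, a naive single-precision version of that formula can be sketched as below. This is not the BLAS interface (the real sgemm takes transpose flags and leading dimensions and assumes column-major storage); the name `naive_sgemm` and the row-major layout are choices made only for this example.

```c
#include <stddef.h>

/* C = alpha * A x B + beta * C, with A of size m x k, B of size k x n and
   C of size m x n, all stored row-major. */
static void naive_sgemm(size_t m, size_t n, size_t k,
                        float alpha, const float *a, const float *b,
                        float beta, float *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)   /* dot product of row i, column j */
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

Every element of C is computed independently, so the two outer loops can be distributed over threads, processes or GPU threads; the interesting part on each architecture is how to reuse the data of A and B efficiently.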
Implementing this simple routine on different platforms and architectures gives some insight into how they function and their working principles. For more details on BLAS and the ?gemm routine, have a look at http://www.netlib.org/blas.
The only thing to watch out for is that a double precision implementation on the GPU requires a GPU with double precision capabilities.
I'm doing some Project Euler problems and most of the time, the computations involve large numbers beyond int, float, double etc.
Firstly, I know that I should be looking for more efficient ways of calculation so as to avoid the large number problem. I've heard of the Bignum libraries.
But, out of academic interest, I'd like to know how to code my own solution to this problem.
Can any expert please help me out? (My language is C)
You need to store the big numbers in a base that your computer can easily handle with its native types, and then store the digits in a variable length array. I'd suggest that for simplicity you start by storing the numbers in base 10 just to get the hang of how to do this. It will make debugging a lot easier.
Once you have a class that can store the numbers in this form, it's just a matter of implementing the operations add, subtract, multiply, etc. on this class. Each operation will have to iterate over digits of its operands and combine them, being careful to carry correctly so that your digits are never larger than the base. Addition and subtraction are simple. Multiplication requires a bit more work as the naive algorithm requires nested loops. Then once you have that working, you can try implementing exponentiation in an efficient manner (e.g. repeated squaring).
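Since the question is about C, here is a minimal sketch of that approach with a struct instead of a class. It assumes base 10, non-negative numbers and a fixed maximum size (a real implementation would allocate the digit array dynamically); all names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define BIG_MAX_DIGITS 512

typedef struct {
    int len;                           /* number of digits in use */
    unsigned char d[BIG_MAX_DIGITS];   /* d[0] is the least significant digit */
} big_t;

/* Parse a string of decimal digits; unused high digits stay zero. */
static void big_from_string(big_t *x, const char *s)
{
    int n = (int)strlen(s);
    assert(n > 0 && n <= BIG_MAX_DIGITS);
    memset(x, 0, sizeof *x);
    x->len = n;
    for (int i = 0; i < n; ++i)
        x->d[i] = (unsigned char)(s[n - 1 - i] - '0');
}

/* Write the digits, most significant first; out needs x->len + 1 bytes. */
static void big_to_string(const big_t *x, char *out)
{
    for (int i = 0; i < x->len; ++i)
        out[i] = (char)('0' + x->d[x->len - 1 - i]);
    out[x->len] = '\0';
}

/* Schoolbook addition with carry. */
static void big_add(big_t *r, const big_t *a, const big_t *b)
{
    big_t t;
    int n = a->len > b->len ? a->len : b->len;
    int carry = 0;
    assert(n < BIG_MAX_DIGITS);
    memset(&t, 0, sizeof t);
    for (int i = 0; i < n; ++i) {
        int s = a->d[i] + b->d[i] + carry;
        t.d[i] = (unsigned char)(s % 10);
        carry = s / 10;
    }
    if (carry) t.d[n++] = (unsigned char)carry;
    t.len = n;
    *r = t;
}

/* Schoolbook multiplication: nested loops over the digits. */
static void big_mul(big_t *r, const big_t *a, const big_t *b)
{
    big_t t;
    assert(a->len + b->len <= BIG_MAX_DIGITS);
    memset(&t, 0, sizeof t);
    for (int i = 0; i < a->len; ++i) {
        int carry = 0;
        for (int j = 0; j < b->len; ++j) {
            int cur = t.d[i + j] + a->d[i] * b->d[j] + carry;
            t.d[i + j] = (unsigned char)(cur % 10);
            carry = cur / 10;
        }
        t.d[i + b->len] = (unsigned char)(t.d[i + b->len] + carry);
    }
    t.len = a->len + b->len;
    while (t.len > 1 && t.d[t.len - 1] == 0) --t.len;   /* strip leading zeros */
    *r = t;
}
```

The results are built in a temporary and copied out at the end, so calls such as `big_mul(&x, &x, &x)` work, which is convenient for repeated squaring.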
If you are planning to write a serious bignum implementation, base 10 won't cut it. It's wasteful of memory and it will be slow. You should choose a base that is natural for the computer, such as 256 or the word size (2**32). However this will make simple operations more difficult as you will get overflows if you naively add two digits, so you will need to handle that very carefully.
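One common way to deal with the overflow, sketched below, is to keep 32-bit digits ("limbs") but do the arithmetic on them in a 64-bit type, so the carry ends up in the upper half of the intermediate result. The function name `limbs_add` and the little-endian limb order are assumptions of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b for numbers of n limbs in base 2^32, least significant limb
   first. Returns the carry out of the top limb (0 or 1). */
static uint32_t limbs_add(uint32_t *r, const uint32_t *a, const uint32_t *b,
                          size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; ++i) {
        /* The sum of two limbs plus a carry always fits in 64 bits. */
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)s;    /* low 32 bits are the result digit */
        carry = s >> 32;       /* high bits are the carry into the next limb */
    }
    return (uint32_t)carry;
}
```

The same widening trick works for multiplication, because the product of two 32-bit limbs plus a 32-bit carry and a 32-bit accumulator still fits in 64 bits.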
C is not a good choice for Project Euler. The benefits of C are raw speed, machine portability (to an extent, with standard C), language interoperability (if some language communicates with another, C is a popular first choice), sticking close to a specific library or platform's API (because C is common, e.g. OS API), and a stable language & stdlib. None of these benefits apply to solving Project Euler problems. Not even raw speed, because most of the problems aren't about raw computation, but understanding the algorithm required, and you can sit there all day and wait before submission.
If you are attempting Project Euler problems to broaden your experience with C, that's perfectly fine, just realize this experience doesn't necessarily apply to long-lived and real-world C projects you may work on.
For this kind of short, one-off problem those languages commonly described as "scripting languages" will work better, faster (in dev time), and easier. Try Python, it stays close to C in many ways, including a C API, and out of the various popular "scripting languages" is possibly the one for which you will find the most use in conjunction with C projects.
This may become an unpopular answer, but it isn't a rant—plus I really like C and use C/C++ often—and there is an explicit answer here to your problem: "don't use C", with your final large number solution depending on which alternative you choose. Again picking on Python, integers do not have an upper bound (note below), and I use this to naturally code answers to Project Euler problems, where in other languages I have to use a painful-by-comparison alternative number library.
(Python integers: There are two integer types in 2.x, 'int' and 'long' (which have been completely unified in 3.x). The conversion between them is practically seamless, and 'long' allows arbitrarily large values, instead of just being a bigger 'int' type as C's long is.)
A popular bignum library for C/C++ is the GNU MP Bignum Library. I've used it for several Project Euler problems, but the fact remains that C isn't a very suitable language for Euler problems. If performance were more important, C would have more to give, but as it is you're much better off using a language with built-in bignum support, such as Ruby (there are lots of others).
A simple way is to think of the number as its string representation in base b. Suppose b=10, simple arithmetic operation like addition on two such strings can be done using the same method we use when adding numbers by pen and paper. The same goes for other simple operations. For better results, you can take a larger base.
A simple bignum implementation like that should be enough for most Project Euler problems (probably all, but I haven't solved much at Euler so can't be sure), but there are ways of using much faster algorithms for operations such as multiplication and division/mod.
Although I recommend writing your own bignum for practice, if you are really stuck you can take ideas from the code of existing bigint libraries. For a serious implementation, something like gmp is the obvious choice. But you can also find small bigints coded by other people when solving similar practice problems online (e.g. Abednego's bigint.cpp).
Here's a nice and simple bignum module for C. You can learn from it for ideas. The C code isn't the highest quality, but the algorithm is well implemented and quite common.
For more advanced stuff, look up GMP.
If you want a nice C++ version (I know, you said C, but this is really interesting code), take a look at the internals of CGAL: http://www.cgal.org/
Kernel-based classifiers usually require O(n^3) training time because of the inner-product computation between pairs of instances. To speed up training, the inner-product values can be pre-computed and stored in a two-dimensional array. However, when the number of instances is very large, say over 100,000, there will not be sufficient memory to do so.
So any better idea for this?
For modern implementations of support vector machines, the scaling of the training algorithm is dependent on lots of factors, such as the nature of the training data and kernel that you are using. The scaling factor of O(n^3) is an analytical result and isn't particularly useful in predicting how SVM training will scale in real-world situations. For example, empirical estimates of the training algorithm used by SVMLight put the scaling against training set size to be approximately O(n^2).
I would suggest you ask this question in the kernel machines forum. I think you're more likely to get a better answer than on Stack Overflow, which is more of a general-purpose programming site.
The Relevance Vector Machine has a sequential training mode in which you do not need to keep the entire kernel matrix in memory. You can basically calculate a column at a time, determine if it appears relevant, and throw it away otherwise. I have not had much luck with it myself, though, and the RVM has some other issues. There is most likely a better solution in the realm of Gaussian Processes. I haven't really sat down much with those, but I have seen mention of an online algorithm for it.
I am not a numerical analyst, but isn't the QR decomposition which you need to do ordinary least-squares linear regression also O(n^3)?
Anyways, you'll probably want to search the literature (since this is fairly new stuff) for online learning or active learning versions of the algorithm you're using. The general idea is to either discard data far from your decision boundary or to not include them in the first place. The danger is that you might get locked into a bad local maximum and then your online/active algorithm will ignore data that would help you get out.