Trigonometric functions on an embedded system - C

The sin and cos functions are slow and need a lot of resources to run on embedded systems. How does one calculate sin and cos in a faster, more resource-efficient way?

Calculating a Taylor or Fourier series is always going to be time-consuming.
In an embedded system, you should think about lookup tables.
There might also be interesting information on the 'Net about how Hewlett-Packard optimised such calculations in their early scientific calculators; I recall seeing such material at the time.

A lookup table with interpolation would without doubt be the most efficient solution. If you want to use less memory however, CORDIC is a pretty efficient algorithm for calculating values of trig functions, and is commonly implemented in handheld calculators.
As a side point, it doesn't make any sense to represent these functions using Fourier series, since you're just creating a circular problem of how you then evaluate the sin/cos terms of the series. A Taylor series is a well-known approximation method, but the error turns out to be unacceptably large in many cases.
You may also want to check out this question and its answers, regarding fast trigonometric functions for Java (thus the code could be ported easily). It mentions both the CORDIC and Chebyshev approximations, among others. One of them will undoubtedly suit your needs.
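To make the lookup-table-with-interpolation idea concrete, here is a minimal C sketch; the table size, the names, and the turns-based angle unit are my own choices, not anything from the question:

    #include <math.h>

    #define LUT_SIZE 256                    /* entries covering one full turn */

    static float sin_lut[LUT_SIZE + 1];     /* extra entry simplifies interpolation */

    /* Call once at startup, or generate the table offline as a const array. */
    void sin_lut_init(void)
    {
        for (int i = 0; i <= LUT_SIZE; i++)
            sin_lut[i] = sinf(i * 2.0f * 3.14159265f / LUT_SIZE);
    }

    /* Sine of an angle expressed as a fraction of a full turn. */
    float sin_turns(float turns)
    {
        float pos  = (turns - floorf(turns)) * LUT_SIZE;  /* wrap into [0, LUT_SIZE) */
        int   i    = (int)pos;
        float frac = pos - i;
        return sin_lut[i] + frac * (sin_lut[i + 1] - sin_lut[i]);
    }

Cosine comes for free as sin_turns(t + 0.25f), and with 256 entries plus interpolation the worst-case error is on the order of 1e-4, which is enough for many control applications.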

Depends on what you need it for. If you are not very fussed about your angle accuracy (e.g. if to the nearest degree is OK) then just use a lookup table of values. If you don't have an FPU, work in fixed-point.
One simple way to calculate sin/cos functions is with Taylor series (as shown under Trigonometric Functions here). The fewer terms you use, the less accurate the values but the faster the calculations.
Fourier series calculations require some sin/cos values to be known. If you store things in the frequency domain most of the time, though, you can potentially save on calculations - depending on what it is you are doing.
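For illustration, a truncated Taylor expansion with range reduction might look like this in C (a sketch; the function name and four-term cutoff are my own choices):

    #include <math.h>

    /* Truncated Taylor series for sin(x). The series converges fastest near 0,
       so reduce the argument to [-pi, pi] first; even then, with only four
       terms the error grows toward the ends of the range. */
    float sin_taylor(float x)
    {
        const float PI     = 3.14159265f;
        const float TWO_PI = 6.28318531f;

        x = fmodf(x, TWO_PI);              /* range reduction */
        if (x >  PI) x -= TWO_PI;
        if (x < -PI) x += TWO_PI;

        float x2 = x * x;
        /* x - x^3/3! + x^5/5! - x^7/7!, evaluated by Horner's scheme */
        return x * (1.0f + x2 * (-1.0f / 6.0f
                  + x2 * (1.0f / 120.0f
                  + x2 * (-1.0f / 5040.0f))));
    }

Dropping or adding one term trades roughly an order of magnitude of accuracy against one multiply-add, which is exactly the speed/accuracy knob described above.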

This Dr. Dobb's article: Optimizing Math-Intensive Applications with Fixed-Point Arithmetic has a good explanation of CORDIC algorithms and provides complete source code for the library discussed in the article.

See the Stack Overflow question How do Trigonometric functions work? The accepted answer there explains some details of how to do range reduction, then use CORDIC, then do some further optimizations.
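The core CORDIC iteration is compact enough to show here. This is a minimal Q16.16 fixed-point sketch (my own constants and formatting, not code from either link); it assumes the angle has already been range-reduced to [-pi/2, pi/2] and that the compiler performs arithmetic right shifts:

    #include <stdint.h>

    /* atan(2^-i) in Q16.16 radians, i = 0..15 */
    static const int32_t cordic_atan[16] = {
        51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
        256, 128, 64, 32, 16, 8, 4, 2
    };

    #define CORDIC_K 39797  /* 0.607252935 * 2^16: pre-compensates the gain */

    /* sin and cos of angle (Q16.16 radians, |angle| <= pi/2). */
    void cordic_sincos(int32_t angle, int32_t *s, int32_t *c)
    {
        int32_t x = CORDIC_K, y = 0, z = angle;
        for (int i = 0; i < 16; i++) {
            int32_t dx = y >> i, dy = x >> i;
            if (z >= 0) { x -= dx; y += dy; z -= cordic_atan[i]; }
            else        { x += dx; y -= dy; z += cordic_atan[i]; }
        }
        *c = x;  /* cos(angle) in Q16.16 */
        *s = y;  /* sin(angle) in Q16.16 */
    }

Each iteration adds roughly one bit of precision and uses only shifts and adds, which is why CORDIC was so popular on hardware without a multiplier.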

Lookup-tables
Taylor series, like you say
Note that with lookup-tables, you can often optimize things by limiting the domain, e.g. represent the angle as an unsigned char, giving you only 256 steps around the circle but also a very compact table. Similar things can be done to the stored values, like using fixed-point.
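A sketch of that compact-table idea in C (the Q1.14 value format and all names are my own choices):

    #include <stdint.h>
    #include <math.h>

    /* Angle as an unsigned char: 256 "binary degrees" per turn, so wraparound
       is free. Values are Q1.14 fixed point (1.0 == 16384). Only the first
       quadrant is stored; symmetry supplies the other three. */
    static int16_t sin_q14[65];

    void sin_q14_init(void)          /* or dump this as a const table at build time */
    {
        for (int i = 0; i <= 64; i++)
            sin_q14[i] = (int16_t)lroundf(sinf(i * 3.14159265f / 128.0f) * 16384.0f);
    }

    int16_t sin_bdeg(uint8_t angle)  /* sin(angle * 2*pi / 256) in Q1.14 */
    {
        uint8_t a = angle & 0x7F;                /* fold into the first half turn */
        if (a > 64) a = 128 - a;                 /* sin(pi - t) == sin(t)         */
        int16_t v = sin_q14[a];
        return (angle & 0x80) ? (int16_t)-v : v; /* second half turn is negative  */
    }

Cosine is then sin_bdeg(angle + 64), with wraparound handled by the uint8_t arithmetic.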

There seems to be a nice pseudocode example here and explicit code here.
However, as @unwind suggested, you might want to precalculate these tables on a decent computer and load them onto the embedded device.
If your answer doesn't have to be very exact, the lookup table can be rather small and you'll be able to store it in your device's memory. If you need higher accuracy, you'll need to calculate the values on the device. It's a tradeoff between memory, time and required precision; the answer depends on the specific nature of your project.

In some cases you can manage with just an IIR filter, tuned to resonate at the needed frequency.
Look here: http://www.ee.ic.ac.uk/pcheung/teaching/ee3_Study_Project/Sinewave%20Generation(708).pdf
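The note is built around a two-pole resonator: the recurrence y[n] = 2*cos(w)*y[n-1] - y[n-2] generates sin(n*w) with a single multiply per sample. A tiny sketch, with my own parameter choices:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const float w = 2.0f * 3.14159265f * 1000.0f / 48000.0f; /* 1 kHz at 48 kHz */
        const float k = 2.0f * cosf(w);
        float y0 = 0.0f;      /* y[0] = sin(0) */
        float y1 = sinf(w);   /* y[1] = sin(w) */

        for (int n = 0; n < 100; n++) {
            printf("%f\n", y0);           /* y0 == sin(n * w) */
            float next = k * y1 - y0;
            y0 = y1;
            y1 = next;
        }
        return 0;
    }

In finite precision the amplitude drifts slowly, so long-running oscillators typically need periodic re-normalization.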

This may be of some help / inspiration:
Magical square root in Quake III

I'm a bit late to the party, but anyway I want to share a ready-made, efficient solution that uses a lookup table (table generator included): DFTrig.
DFTrig consists of two parts:
A lookup table generator, tablegen (written in Java, but that doesn't matter much), that receives several options and produces C code (a const struct containing the lookup table);
A small C module that works with the lookup table generated by tablegen.
Of course, the lookup table contains only minimal information: sine values for a single quadrant, i.e. [0, 90] degrees. That is enough to calculate sine/cosine for any angle.
The behavior is quite customizable. You may specify:
Factor by which each item in the lookup table is multiplied (on a per-table basis);
Step in degrees between items in the table (on a per-table basis);
Type of items in the table (common for the whole C project).
So, depending on your needs, you may:
Generate a single table for the whole application with the maximum factor, so that any subsystem of your C project may use that single table with its desired factor; values are recalculated if the requested factor differs from the table's;
Generate multiple tables, each with its own factor, so that each subsystem of your C project uses its dedicated table. Values can then be returned from the table as is, without recalculation, which is faster.
I use it in my embedded projects; it works nicely.

You can take a look at this arbitrary fixed point library for 8-bit AVR microcontrollers:
https://community.atmel.com/projects/afp-arbitrary-fixed-point-lib
EDIT: link updated

Related

Comparing the two feature sets

I am working on a classification task with two feature sets derived from one dataset: we first obtain two feature matrices using two different feature extraction methods, and now I need to compare them. However, the two feature sets reach almost the same recognition accuracy (using 10-fold cross-validation with an SVM). My question is:
Is there a way to design a meaningful experiment to show the difference between the two methods? What are your suggestions?
Note: I already saw the similar questions on Stack Overflow; however, I am looking for another approach.
You can
Perform a dimensionality reduction on the features in their respective spaces. This will allow you to see differences in the distribution of data points. Kudos if you apply a kernel, ideally the one used by the SVM (a linear kernel otherwise).
Do distribution testing on the features to see if they differ much.
Project the predictions into an output space and look at the distance between the vectors.

SVM and Neural Network

What is the difference between an SVM and a neural network?
Is it true that a linear SVM is the same as an NN, and that for non-linearly separable problems an NN adds hidden layers while an SVM changes the dimensionality of the space?
There are two parts to this question. The first part is "what is the form of the function learned by these methods?" For NNs and SVMs this is typically the same. For example, a single-hidden-layer neural network uses exactly the same form of model as an SVM. That is:
Given an input vector x, the output is:
output(x) = sum_over_all_i weight_i * nonlinear_function_i(x)
Generally the nonlinear functions will also have some parameters. So these methods need to learn how many nonlinear functions should be used, what their parameters are, and what the value of all the weight_i weights should be.
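To make this shared form concrete, here is a rough C sketch of such a decision function with RBF nonlinearities, the shape an SVM's decision function takes; all names and signatures here are illustrative, not from any library:

    #include <math.h>
    #include <stddef.h>

    /* output(x) = sum_i w_i * phi_i(x). For an RBF-kernel SVM, phi_i(x) is
       K(sv_i, x) for support vector sv_i; for a one-hidden-layer NN it would
       be e.g. tanh(a_i . x + b_i). The outer weighted sum is the same. */
    static double rbf_kernel(const double *a, const double *b, size_t dim, double gamma)
    {
        double d2 = 0.0;
        for (size_t j = 0; j < dim; j++) {
            double d = a[j] - b[j];
            d2 += d * d;
        }
        return exp(-gamma * d2);
    }

    /* sv: n support vectors of dimension dim, row-major; w[i] holds
       alpha_i * y_i for an SVM; b is the bias. Classify by the sign. */
    double decision(const double *x, const double *sv, const double *w,
                    size_t n, size_t dim, double gamma, double b)
    {
        double out = b;
        for (size_t i = 0; i < n; i++)
            out += w[i] * rbf_kernel(&sv[i * dim], x, dim, gamma);
        return out;
    }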
Therefore, the difference between an SVM and an NN is in how they decide what these parameters should be set to. Usually when someone says they are using a neural network they mean they are trying to find the parameters which minimize the mean squared prediction error with respect to a set of training examples. They will also almost always be using the stochastic gradient descent optimization algorithm to do this. SVMs, on the other hand, try to minimize both training error and some measure of "hypothesis complexity". So they will find a set of parameters that fits the data but also is "simple" in some sense. You can think of it like Occam's razor for machine learning. The most common optimization algorithm used with SVMs is sequential minimal optimization.
Another big difference between the two methods is that stochastic gradient descent isn't guaranteed to find the optimal set of parameters when used the way NN implementations employ it. However, any decent SVM implementation is going to find the optimal set of parameters. People like to say that neural networks get stuck in a local minimum while SVMs don't.
NNs are heuristic, while SVMs are theoretically founded. An SVM is guaranteed to converge towards the best solution in the PAC (probably approximately correct) sense. For example, for two linearly separable classes, an SVM will draw the separating hyperplane directly halfway between the nearest points of the two classes (these become support vectors). A neural network would draw any line which separates the samples; it would be correct for the training set but might not have the best generalization properties.
So no, even for linearly separable problems, NNs and SVMs are not the same.
In the case of linearly non-separable classes, both SVMs and NNs apply a non-linear projection into a higher-dimensional space. In the case of NNs this is achieved by introducing additional neurons in the hidden layer(s). For SVMs, a kernel function is used to the same effect. A neat property of the kernel function is that the computational complexity doesn't rise with the number of dimensions, while for NNs it obviously rises with the number of neurons.
Running a simple out-of-the-box comparison between support vector machines and neural networks (WITHOUT any parameter selection) on several popular regression and classification datasets demonstrates the practical differences: an SVM becomes a very slow predictor if many support vectors are created, while a neural network's prediction speed is much higher and its model size much smaller. On the other hand, the training time is much shorter for SVMs. Concerning accuracy/loss, despite the aforementioned theoretical drawbacks of neural networks, both methods are on par; especially for regression problems, neural networks often outperform support vector machines. Depending on your specific problem, this might help you choose the right model.
Both Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are supervised machine learning classifiers. An ANN is a parametric classifier that uses hyperparameter tuning during the training phase. An SVM is a non-parametric classifier that finds a separating hyperplane (if a linear kernel is used) between the classes. Actually, in terms of model performance, SVMs are sometimes equivalent to a shallow neural network architecture. Generally, an ANN will outperform an SVM when there is a large number of training instances; however, neither outperforms the other over the full range of problems.
We can summarize the advantages of the ANN over the SVM as follows:
ANNs can handle multi-class problems by producing probabilities for each class. In contrast, SVMs handle these problems using independent one-versus-all classifiers, each producing a single binary output. For example, a single ANN can be trained to solve the hand-written digits problem, while 10 SVMs (one for each digit) are required.
Another advantage of ANNs, from the perspective of model size, is that the model is fixed in terms of its input nodes, hidden layers, and output nodes; in an SVM, however, the number of support vectors could reach the number of training instances in the worst case.
An SVM does not perform well when the number of features is greater than the number of samples. More work in feature engineering is required for an SVM than for a multi-layer neural network.
On the other hand, SVMs are better than ANNs in certain respects:
In comparison to SVMs, ANNs are more prone to becoming trapped in local minima, meaning that they sometimes miss the global picture.
While most machine learning algorithms can overfit if they don’t have enough training samples, ANNs can also overfit if training goes on for too long - a problem that SVMs do not have.
SVM models are easier to understand. There are different kernels that provide different levels of flexibility beyond the classical linear kernel, such as the Radial Basis Function (RBF) kernel. Unlike the linear kernel, the RBF kernel can handle the case where the relation between class labels and attributes is nonlinear.
SVMs and NNs share the perceptron as a basic building block, but SVMs also use the kernel trick to raise the dimension, say from 2D to 3D, with a mapping such as phi(x, y) = (x^2, y^2, sqrt(2)*x*y), which can make linearly inseparable points separable by a plane in the lifted space. Want a demo? Just ask me :)
Actually, they are exactly equivalent to each other. The only difference is in their standard implementations, with different choices of activation function, regularization, and so on. Also, I have not yet seen a dual formulation for neural networks, but SVMs are moving toward the primal anyway.
Practically, most of your assumptions are often quite true. I'll elaborate: for linearly separable classes, a linear SVM works quite well and is much faster to train. For non-linear classes there is the kernel trick, which sends your data to a higher-dimensional space. This trick, however, has two disadvantages compared to an NN. First, you have to search for the right parameters, because the classifier will only work if, in the higher dimension, the two sets are linearly separable. Now, testing parameters is often done by grid search, which is CPU-time consuming. The other problem is that this whole technique is not as general as an NN (for example, in NLP it often results in a poor classifier).

How do I pick a good representation for a board game tactic for a genetic algorithm?

For my bachelor's thesis I want to write a genetic algorithm that learns to play the game of Stratego (if you don't know this game, it's probably safe to assume I said chess). I haven't done actual AI projects before, so it's an eye-opener to see how little I actually know about implementing things.
The thing I'm stuck with is coming up with a good representation for an actual strategy. I'm probably making some thinking error, but some problems I encounter:
I don't assume you would have a representation containing a lot of transitions between board positions, since that would just be brute-forcing it, right?
What could the branches of a decision tree look like? Any representation I come up with doesn't have interchangeable branches... If I were to use a bit string, which is apparently also common, what would the bits represent?
Do I assign scores to the distance between certain pieces? How would I represent that?
I think I ought to know these things after three-plus years of study, so I feel pretty stupid - this must look like I have no clue at all. Still, any help or tips on what to Google would be appreciated!
I think you could define a decision model and then try to optimize the parameters of that model. You can also create multi-stage decision models. I once did something similar for solving a dynamic dial-a-ride problem (paper here) by modeling it as a two-stage linear decision problem. To give you an example, you could:
For each of your pieces, decide which one is to move next. Each piece is characterized by certain features derived from its position on the board, e.g. its ability to score, the danger it is in, how many other pieces it protects, and so on. These features can be combined (e.g. in a linear model, through a neural network, through a symbolic expression tree, a decision tree, ...) and give you a rank for which piece to act with next.
Act with the piece you selected. Again, there is a certain number of actions that can be taken, each with certain features. Again you can combine and rank them, and one action will have the highest priority; this is the one you choose to perform.
The features you extract can be very simple or insanely complex; it's a trade-off between what you think will work best and how long it takes to compute.
To evaluate and improve the quality of your decision model you can then simulate these decisions in several games against opponents and train the parameters of the model that combines these features to rank the moves (e.g. using a GA). This way you tune the model to win as many games as possible against the specified opponents. You can test the generality of that model by playing against opponents it has not seen before.
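To make the first stage concrete, a linear move-scoring model of the kind described above might look like this in C. Every feature name here is invented for illustration; the genome the GA evolves is just the weight vector:

    #define NUM_FEATURES 4   /* illustrative; a real feature set would be larger */

    /* One candidate move described by hand-crafted features. */
    typedef struct {
        double capture_value;   /* value of the piece captured, if any    */
        double danger;          /* how exposed the piece is after moving  */
        double info_revealed;   /* how much the move tells the opponent   */
        double advance;         /* progress toward the enemy flag         */
    } MoveFeatures;

    /* The genome the GA evolves is simply one weight per feature. */
    double score_move(const MoveFeatures *m, const double g[NUM_FEATURES])
    {
        return g[0] * m->capture_value
             + g[1] * m->danger          /* a negative weight penalizes danger */
             + g[2] * m->info_revealed
             + g[3] * m->advance;
    }

The fitness of a genome is then simply the fraction of simulated games won when moves are chosen by highest score.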
As Mathew Hall just said, you can use GP for this (if your model is a complex rule), but this is just one kind of model. In my case a linear combination of the weights did very well.
Btw, if you're interested, we've also got software for heuristic optimization which provides GA, GP and that kind of thing. It's called HeuristicLab. It's GPL and open source, and comes with a GUI (Windows). We have a howto on evaluating the fitness function in an external program (data exchange using protocol buffers), so you can work on your simulation and your decision model and let the algorithms in HeuristicLab optimize your parameters.
Vincent,
First, don't feel stupid. You've been (I infer) studying basic computer science for three years; now you're applying those basic techniques to something pretty specialized - a particular application (Stratego) in a narrow field (artificial intelligence).
Second, make sure your advisor fully understands the rules of Stratego. Stratego is played on a larger board, with more pieces (and more types of pieces) than chess. This gives it a vastly larger space of legal positions and a vastly larger space of legal moves. It is also a game of hidden information, which increases the difficulty yet again. Your advisor may want to limit the scope of the project, e.g., concentrate on a variant with full observation. I don't know why you think Stratego is the simpler game, except that the moves of the pieces are a little simpler.
Third, I think the right thing to do at first is to take a look at how games in general are handled in the field of AI. Russell and Norvig, chapters 3 (for general background) and 5 (for two-player games), are pretty accessible and well-written. You'll see two basic ideas: one, that you're basically performing a huge search in a tree looking for a win, and two, that for any non-trivial game the trees are too large, so you search to a certain depth and then cop out with a "board evaluation function" and look for one of those. I think your third bullet point is in this vein.
The board evaluation function is the magic, and probably a good candidate for using either a genetic algorithm or a genetic program, either of which might be used in conjunction with a neural network. The basic idea is that you are trying to design (or evolve, actually) a function that takes a board position as input and outputs a single number. Large numbers correspond to strong positions, and small numbers to weak positions. There is a famous paper by Chellapilla and Fogel showing how to do this for the game of checkers:
http://library.natural-selection.com/Library/1999/Evolving_NN_Checkers.pdf
I think that's a great paper, tying three great strands of AI together: Adversarial search, genetic algorithms, and neural networks. It should give you some inspiration about how to represent your board, how to think about board evaluations, etc.
Be warned, though, that what you're trying to do is substantially more complex than Chellapilla and Fogel's work. That's okay - it's 13 years later, after all, and you'll be at this for a while. You're still going to have a problem representing the board, because the AI player has imperfect knowledge of its opponent's state; initially nothing is known but positions, but eventually, as pieces are eliminated in conflict, one can start using first-order logic or related techniques to narrow down individual pieces, and possibly even probabilistic methods to infer information about the whole set. (Some of this may be beyond the scope of an undergrad project.)
The fact that you are having problems coming up with a representation for an actual strategy is not that surprising. In fact, I would argue that it is the most challenging part of what you are attempting. Unfortunately, I haven't heard of Stratego, so, being a bit lazy, I am going to assume you said chess.
The trouble is that a chess strategy is rather a complex thing. You suggest a representation containing lots of transitions between board positions in the GA, but a chess board has more possible positions than there are atoms in the universe, so this is clearly not going to work very well. What you will likely need to do is encode in the GA a series of weights/parameters that are attached to something that takes in the board position and fires out a move; I believe this is what you are hinting at in your second suggestion.
Probably the simplest suggestion would be to use some sort of generic function approximator like a neural network; perceptrons and radial basis functions are two possibilities. You can encode the weights for the various nodes into the GA, although there are other fairly sound ways to train a neural network - see backpropagation. You could perhaps encode the network structure instead/as well; this has the advantage that a fair amount of research has been done into developing neural networks with genetic algorithms, so you wouldn't be starting completely from scratch.
You still need to come up with how you are going to present the board to the neural network and interpret the result from it. In particular, with chess you would have to take note that a lot of moves will be illegal. It would be very beneficial if you could encode the board and interpret the result such that only legal moves are presented. I would suggest implementing the mechanics of the system and then playing around with different board representations to see what gives good results. A few off-the-top-of-my-head ideas to get you started, although I am not really convinced any of them is an especially great way to do this:
A bit string with all 64 squares one after another, with a number representing what is present in each square. The most obvious option, but probably a rather bad representation, as a lot of work will be required to filter out illegal moves.
A bit string with all 64 squares one after another, with a number representing what can move to each square. This has the advantage of embodying the covering concept of chess, where you want to gain as much coverage of the board with your pieces as possible, but it still has problems with illegal moves and with distinguishing friendly/enemy pieces.
A bit string with all 32 pieces one after another, with a number representing the location of each piece on the board.
In general, though, I would suggest that chess is rather a complex game to start with; I think it will be rather hard to get something playing to a standard noticeably better than random. I don't know if Stratego is any simpler, but I would strongly suggest you opt for a fairly simple game. That will let you focus on getting the mechanics of the implementation and the representation of the game state correct.
Anyway hope that is of some help to you.
EDIT: As a quick addition, it is worth looking into how standard chess AIs work; I believe most use some sort of minimax system.
When you say "tactic", do you mean you want the GA to give you a general algorithm to play the game (i.e. evolve an AI) or do you want the game to use a GA to search the space of possible moves to generate a move at each turn?
If you want to do the former, then look into using genetic programming (GP). You could try to use it to produce the best AI you can for a fixed tree size. JGAP already comes with support for GP as well. See the JGAP Robocode example for an instance of this. This approach does mean you need a domain-specific language for a Stratego AI, so you'll need to think carefully about how you expose the board and pieces to it.
Using GP means your fitness function can just be how well the AI does at a fixed number of pre-programmed games, but that requires a good AI player to start with (or a very patient human).
@DonAndre's answer is absolutely correct for movement. In general, problems involving state-based decisions are hard to model with GAs, requiring some form of GP (either explicit or, as @DonAndre suggested, trees that are essentially declarative programs).
A general Stratego player seems to me quite challenging, but if you have a reasonable Stratego playing program, "Setting up your Stratego board" would be an excellent GA problem. The initial positions of your pieces would be the phenotype and the outcome of the external Stratego-playing code would be the fitness. It is intuitively likely that random setups would be disadvantaged versus setups that have a few "good ideas" and that small "good ideas" could be combined into fitter-and-fitter setups.
...
On the general problem of what a decision tree might look like: even trying to come up with a simple example, I kept finding it hard to make one small enough, but maybe consider the case where you are evaluating whether to attack a same-ranked piece (which, IIRC, destroys both your piece and the other piece?):
double locationNeed = aVeryComplexDecisionTree(); // context for the larger move decision
if (thatRank == thisRank) {
    double sacrificeWillingness = SACRIFICE_GENETIC_BASE;  // evolved; assume range 0.0 - 1.0
    double sacrificeNeed = anotherComplexTree();           // 0.0 - 1.0
    double sacrificeInContext = sacrificeNeed * SACRIFICE_NEED_GENETIC_DISCOUNT; // 0.0 - 1.0
    // sacrifice when the discounted need exceeds the piece's reluctance
    if (sacrificeInContext > 1.0 - sacrificeWillingness) {
        // ...OK, this piece is "willing" to sacrifice itself
    }
}
One way or the other, the basic idea is that you'd still have a lot of coding of Stratego play; you'd just be seeking places where you could insert parameters that would change the outcome. Here I had the idea of a "base" disposition to sacrifice itself (presumably higher in common pieces) and a genetically determined "discount" parameter that would weight whether the piece would "accept or reject" the need for a sacrifice.

AI Techniques for Face Detection

Can anyone list all the different techniques used in face detection? Techniques like neural networks, support vector machines, eigenfaces, etc.
What others are there?
The technique I'm going to talk about is a more machine-learning-oriented approach; in my opinion it is quite fascinating, though not very recent: it was described in the paper "Robust Real-Time Face Detection" by Viola and Jones. I used the OpenCV implementation for a university project.
It is based on Haar-like features, which consist of additions and subtractions of pixel intensities within rectangular regions of the image. These can be computed very fast using a structure called the integral image, for which GPGPU implementations also exist (sometimes called "prefix scan"). After computing the integral image in linear time, any Haar-like feature can be evaluated in constant time. A feature is basically a function that takes a 24x24 sub-window S of the image and computes a value feature(S); a triplet (feature, threshold, polarity) is called a weak classifier, because
polarity * feature(S) < polarity * threshold
holds true on certain images and false on others; a weak classifier is expected to perform just a little better than random guessing (for instance, it should have an accuracy of at least 51-52%).
Polarity is either -1 or +1.
Feature space is big (~160'000 features), but finite.
Although the threshold could in principle be any number, simple considerations on the training set show that if there are N examples, only N + 1 thresholds per polarity and per feature have to be examined in order to find the one that yields the best accuracy. The best weak classifier can thus be found by exhaustively searching the space of triplets.
Basically, a strong classifier can be assembled by iteratively choosing the best possible weak classifier, using an algorithm called "adaptive boosting", or AdaBoost; at each iteration, examples which were misclassified in the previous iteration are weighted more. The strong classifier is characterized by its own global threshold, computed by AdaBoost.
Several strong classifiers are combined as stages in an attentional cascade. The idea behind the attentional cascade is that 24x24 sub-windows that are obviously not faces are discarded in the first stages; a strong classifier usually contains only a few weak classifiers (like 30 or 40), so it is very fast to compute. Each stage should have very high recall, while the false positive rate is not very important. If there are 10 stages, each with 0.99 recall and a 0.3 false positive rate, the final cascade will have roughly 0.9 recall (0.99^10 ≈ 0.90) and an extremely low false positive rate (0.3^10 ≈ 6 × 10^-6). For this reason, strong classifiers are usually tuned to increase both recall and false positive rate; tuning basically involves reducing the global threshold computed by AdaBoost.
A sub-window that makes its way to the end of the cascade is considered a face.
Many sub-windows of the initial image, possibly overlapping and possibly after rescaling the image, must be tested.
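As a supplement, the integral image trick mentioned above is simple enough to sketch in C (flat arrays with a zero border; the names are my own):

    #include <stdint.h>
    #include <stddef.h>

    /* Integral image: ii holds, at (x, y), the sum of all pixels above and to
       the left, inclusive. One linear pass; any rectangle sum then costs four
       lookups. ii is (w+1) x (h+1), with a zero border to avoid edge cases. */
    void integral_image(const uint8_t *img, uint32_t *ii, size_t w, size_t h)
    {
        for (size_t x = 0; x <= w; x++) ii[x] = 0;           /* top border  */
        for (size_t y = 1; y <= h; y++) {
            ii[y * (w + 1)] = 0;                             /* left border */
            for (size_t x = 1; x <= w; x++)
                ii[y * (w + 1) + x] = img[(y - 1) * w + (x - 1)]
                                    + ii[(y - 1) * (w + 1) + x]
                                    + ii[y * (w + 1) + x - 1]
                                    - ii[(y - 1) * (w + 1) + x - 1];
        }
    }

    /* Sum of pixels in the rectangle [x0, x1) x [y0, y1) in constant time. */
    uint32_t rect_sum(const uint32_t *ii, size_t w,
                      size_t x0, size_t y0, size_t x1, size_t y1)
    {
        return ii[y1 * (w + 1) + x1] - ii[y0 * (w + 1) + x1]
             - ii[y1 * (w + 1) + x0] + ii[y0 * (w + 1) + x0];
    }

A Haar-like feature is then just a signed combination of two or three rect_sum calls, which is why evaluation is constant-time regardless of the feature's size.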
An emerging but rather effective approach to the broad class of vision problems, including face detection, is the use of Hierarchical Temporal Memory (HTM), a concept/technology developed by Numenta.
Very loosely speaking, this is a neural-network-like approach. This type of network has a tree shape where the number of nodes decreases significantly at each level. HTM models some of the structural and algorithmic properties of the neocortex. In [possible] departure from the neocortex, the classification algorithm implemented at the level of each node is Bayesian. The HTM model is based on the memory-prediction theory of brain function and relies heavily on the temporal nature of inputs; this may explain its ability to deal with vision problems, as these are typically temporal (or can be made so) and also require tolerance for noise and "fuzziness".
While Numenta has produced vision kits and demo applications for some time, Vitamin D recently produced - I think - the first commercial application of HTM technology, at least in the domain of vision applications.
If you need this not just as theory but actually want to do face detection, I recommend finding already-implemented solutions.
There are plenty of tested libraries for different languages, and they are widely used for this purpose. Look at this SO thread for more information: Face recognition library.

Measuring the performance of classification algorithm

I've got a classification problem on my hands which I'd like to address with a machine learning algorithm (probably Bayesian or Markovian; the question is independent of the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classifier, taking the overfitting problem into account.
That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples and use these very same samples to measure fitness, I might run into an overfitting problem - the classifier will know the exact answers for the training instances without having much predictive power, rendering the fitness results useless.
An obvious solution would be separating the hand-tagged samples into training and test samples; I'd like to learn about methods for selecting a statistically significant set of training samples.
White papers, book pointers, and PDFs much appreciated!
You could use 10-fold cross-validation for this. I believe it's a pretty standard approach for evaluating the performance of a classification algorithm.
The basic idea is to divide your learning samples into 10 subsets. Then use one subset as test data and the others as training data. Repeat this for each subset and calculate the average performance at the end.
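A skeleton of that procedure in C, with the classifier left as hypothetical train/accuracy hooks you would supply yourself:

    #include <stdlib.h>

    #define K 10

    /* Hypothetical hooks: plug in your own classifier here. */
    extern void   train(const int *train_idx, int n_train);
    extern double accuracy(const int *test_idx, int n_test);

    double cross_validate(int n_samples)
    {
        /* shuffle indices so the folds are random */
        int *idx = malloc(n_samples * sizeof *idx);
        for (int i = 0; i < n_samples; i++) idx[i] = i;
        for (int i = n_samples - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }

        double total = 0.0;
        for (int k = 0; k < K; k++) {
            int lo = k * n_samples / K, hi = (k + 1) * n_samples / K;
            /* fold k = test set; everything else = training set */
            int *tr = malloc(n_samples * sizeof *tr);
            int n_tr = 0;
            for (int i = 0; i < n_samples; i++)
                if (i < lo || i >= hi) tr[n_tr++] = idx[i];
            train(tr, n_tr);
            total += accuracy(&idx[lo], hi - lo);
            free(tr);
        }
        free(idx);
        return total / K;   /* average accuracy over the K folds */
    }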
As Mr. Brownstone said, 10-fold cross-validation is probably the best way to go. I recently had to evaluate the performance of a number of different classifiers; for this I used Weka, which has an API and a load of tools that allow you to easily test the performance of lots of different classifiers.
