How can I use TF distributed to train many estimators in a distributed environment? - distributed

I'm working on an optimization problem using genetic algorithms.
Let's say I have a population of 4 members.Every member generates 10 children.
The member or worker is an estimator object. I'd like to assign every 10 estimators to do some training steps on a GPU in other machine.
Considering the population_size is 4 and I have 4 GPUs, I'd like to train every 10 estimators asynchronously with a GPU from another machine.
Any ideas of how we can achieve that ?

Related

How to split the data into training and test sets?

One approach to split the data into two disjoint sets, one for training and one for tests is taking the first 80% as the training set and the rest as the test set. Is there another approach to split the data into training and test sets?
** For example, I have a data contains 20 attributes and 5000 objects. Therefore, I will take 12 attributes and 1000 objects as my training data and 3 attributes from the 12 attributes as test set. Is this method correct?
No, that's invalid. You would always use all features in all data sets. You split by "objects" (examples).
It's not clear why you are taking just 1000 objects and trying to extract a training set from that. What happened to the other 4000 you threw away?
Train on 4000 objects / 20 features. Cross-validate on 500 objects / 20 features. Evaluate performance on the remaining 500 objects/ 20 features.
If your training produces a classifier based on 12 features, it could be (very) hard to evaluate its performances on a test set based only on a subset of these features (your classifier is expecting 12 inputs and you'll give only 3).
Feature/attribute selection/extraction is important if your data contains many redundant or irrelevant features. So you could identify and use only the most informative features (maybe 12 features) but your training/validation/test sets should be based on the same number of features (e.g. since you're mentioning weka Why do I get the error message 'training and test set are not compatible'?).
Remaining on a training/validation/test split (holdout method), a problem you can face is that the samples might not be representative.
For example, some classes might be represented with very few instance or even with no instances at all.
A possible improvement is stratification: sampling for training and testing within classes. This ensures that each class is represented with approximately equal proportions in both subsets.
However, by partitioning the available data into fixed training/test set, you drastically reduce the number of samples which can be used for learning the model. An alternative is cross validation.

Viola - jones adaboost method

I have read the paper about Viola-Jones method for object detection and am confused by a few things.
1 - For Adaboost does each round mean that we calculate all the 160k features across all images then find the one with the least error (which as i understand is a 'weak classifier? please correct me if I am wrong). If yes then won't this take extremely long to train on a large set of images which can take months ? And also how many rounds will you run it for if this is correct.
2- If the above point is wrong then does it mean that for each feature, we evaluate all the non-face and face images with one feature and compare it to a certain acceptable threshold of error, and if it lies below this acceptable threshold then we take this feature as a weak classifier and then update the weights before using the next feature from the 160k features.
I have tried understanding the MATLAB code in this link http://www.ece301.com/ml-doc/54-face-detect-matlab-1.html but not sure if the way he implemented adaboost was correct.
It would also be a great help if there was a link that would explain adaboost used by Viola-Jones in a simple and clear way.
AdaBoost tries out multiple weak classifier over several rounds , Selecting the best weak classifier in each round and combining the best classifier to create a strong classifier.
Example for adaboost :
Data point Classifier 1 Classifier 2 Classifier 3
x1 Fail pass fail
x2 Pass fail pass
x3 fail pass pass
x4 pass fail pass
AdaBoost can use classifier that are consistently wrong by reversing their decision .
1) Yes. If your parameters are, say, 5000 positive windows and 5000 negative ones (extracted from the set of negative background images you provided initially), then at each stage every feature is evaluated in turn for all 10000 windows and the best one is added to the feature set.
And yes in the original paper of Viola-Jones each feature is a weak classifier.
The process of checking all features will take a long time, but not weeks. As a matter of fact the bottle neck is the gathering of negative widows over the last stages than could require days or also weeks The number of rounds depends on the reaching of the given conditions. Each stage requires a number of round (usually more in final stages). So for example: stage 1 requires 3 features (3 rounds), stage 2 requires 5 features (5 rounds), stage 3 requires 6 features (6 rounds), etc.
2) The above point is true so it doesn’t apply.

Getting win percentages for Texas hold'em poker without monte carlo/exhaustive enumeration

Sorry I'm just starting this project and don't have any ideas or code, I'm asking more of a theoretical question than a programming one.
It seems that every google search provides the same responses and it's very hard to find an answer to this question:
Is there a way to calculate win percentages for texas holdem poker (the same way they do on poker after dark or other televised poker events) without using the monte carlo/exhaustive enumeration methods. Assuming all cards are face up and we know every card in the deck.
Every response on other forums just seems to be "use pokerstove" or something similar, I'm looking for the theory to write the code.
Thanks.
Is there a way to calculate win percentages for texas holdem poker
(the same way they do on poker after dark or other televised poker
events) without using the monte carlo/exhaustive enumeration methods.
In specific instances it is possible...
You can use a perfect lookup table preflop for two players heads-up preflop matchups: note that the "typical" 169 vs 169 approximation ain't good enough (say Jh Th vs 9h 8h ain't really "JTs vs 98s": I mean, that would quite a gross approximation).
Besides that if you have a lot of memory and if you can live with gigantic cache misses, you technically could precompute gigantic lookup tables (say on the server side) and do lookups for other cases (e.g. for every possible three players all-in matchups preflop), but you'd really need a lot of memory : )
Note that "full enumeration" at flop and turn ain't an issue: at flop there's only 2 more cards to come, so there are typically only C(45,2) [two players all-in at flop, we know 2*2 holecards + 3 community cards -- hence leaving 990 possibilities] or C(43,2) [three players all-in at at flop, we know 3*2 holecards + 3 community cards].
So an actual evaluator would not use one but several methods. For example:
lookup table for two players all-in preflop (the fastest)
full enum for any number of players all-in at flop or turn because it's tiny (max 990 possibilities) -- very fast
monte-carlo or full enum for three players or more all-in preflop -- incredibly slower
It is interesting to see here that in the most typical cases you'll get the result very, very fast: most actual all-ins involve two players, not three or more.
So you're either looking up in a "1 vs 1 preflop" lookup table or doing full C(45,2) or C(46,1) full enum (which are, in both case, amazingly fast).
It's really only the "three players or more all-in preflop" case which do take time.
The answer is no.
There is no closed form computation that you can do to compute poker equities. Using combinatorics, you can identify and solve many subproblems, which speeds up computation.
For example, if you are considering all five card hands, there are 52 choose 5 = 2,598,960 different hands. But knowing that suits are equivalent and using combinatorial methods (either analytic or computational), you can reduce the space of all hands to 134,459 classes each weighted according to the number of different hands in each equivalent class.
There are also various ways of using exhaustive evaluations tailored to your application. If you need to perform some subset of evaluations repeatedly, you can use caches or precomputed lookup tables targeted to your specific needs.

Backpropagation overall error chart with very small slope... Is this normal?

I'm training a neural network with backpropagation algorithm and this is the chart of Overall Errors:
( I'm calculating Overall error by this formula : http://www.colinfahey.com/neural_network_with_back_propagation_learning/neural_network_with_back_propagation_learning_en.html Part 6.3 : Overall training error)
I have used Power Trendline and after calculations, I saw that if epoches = 13000 => overall error = 0.2
Isn't this too high?
Is this chart normal? Seems that the training process will take too long... Right? What should I do? Isn't there any faster way?
EDIT : My neural network has a hidden layer with 200 neurons. and my input and output layers have 10-12 neurons. My problem is clustering characters. (it clusters Persian characters into some clusters with supervised training)
So you are using a ANN with 200 input nodes with 10-12 hidden nodes in the hidden layer, what activation function are you using if any for your hidden layer and output layer?
Is this a standard back propagation training algorithm and what training function are you using?
Each type of training function will affect the speed of training and in some cases its ability to generalise, you don't want to train against your data such that your neural network is only good for your training data.
So ideally you want decent training data that could be a sub sample of your real data say 15%.
You could training your data using conjugate gradient based algorithm:
http://www.mathworks.co.uk/help/toolbox/nnet/ug/bss331l-1.html#bss331l-2
this will train your network quickly.
10-12 nodes may not be ideal for your data, you can try changing the number in blocks of 5 or add another layer, in general more layers will improve the ability of your network to classify your problem but will increase the computational complexity and hence slow down the training.
Presumably these 10-12 nodes are 'features' you are trying to classify?
If so, you may wish to normalise them, so rescale each to between 0 and 1 or -1 to 1 depending on your activation function (e.g. tan sigmoidal will produce values in range -1 to +1):
http://www.heatonresearch.com/node/706
You may also train a neural network to identify the ideal number of nodes you should have in your hidden layer.

A good "hello world" program for drools planner

I am trying to implement Drools Planner for allocating timetables. At the moment, my proficiency in Java and JavaBean design pattern is low and I need something simple to practice on.
Is there an AI optimization problem that
known to be solved very well with 'X' algorithm
the data model lends itself to be expressed in JavaBean design pattern in a simple manner
uses fewest number of extra features (like planning entity difficulty)
Such a problem would be good to cut my teeth on Drools Planner.
I am trying N-Queens problem right now which seems the simplest of these. So I am looking for something of this league.
Update: See CloudBalancingHelloWorld.java in optaplanner-examples (Drools Planner is renamed to OptaPlanner).
You could also try implementing the ITC2007 curriculum course scheduling yourself and then compare it with the source code of the example in Drools Planner.
If you want to keep it simple but get decent results too, follow this recipe and go for First Fit followed by Tabu Search.
Another good idea, is to join the ITC2011 scheduling competition: it's still open till 1-MAY-2012 and very similar to the curriculum course scheduling example.
I am trying 2X2 Sudoku (generating and solving) as something simple. You can model it on Nqueens code. While 2x2 sudokus are solved easily, 3x3 sudokus may get stuck. So you can implement swap moves.
Another interesting problem would be bucket sums. Given 10 buckets, each able to contain 5 numbers each, and 50 numbers; make a program to allocate the numbers so that the sum of numbers in each bucket are more or less even.
Bucket Bucket0 3 6 19 16 11 =55
Bucket Bucket1 8 2 5 25 15 =55
...
Bucket Bucket7 3 25 4 16 8 =56
Bucket Bucket8 12 20 12 9 2 =55
Bucket Bucket9 4 9 11 12 20 =56
This has practical implications, such as evenly distributing tasks of varying toughness throughout the week.
A collection of some problems: http://eclipseclp.org/examples/index.html

Resources