I've created a hill climbing algorithm which randomly generates a solution then copies that solution and mutates it a little to see if it ends up with a better solution. If it does it keeps the new solution and discards the old one.
If I want to add simulated annealing to this algorithm could I just start off with a higher mutation rate and decrease the mutation rate a little each time a new solution is created?
I assume then the mutation rate would act as the simulated annealing algorithm's temperature, is that correct?
The mutation rate would act as a temperature for the annealing, but just choosing the better solution every time won't be true simulated annealing.
You need to accept or reject the new solution with a probability that depends on how much better or worse it is and on the mutation rate (i.e. the deltaE and the temperature), so that simulated annealing can get out of local optima. If you always take the better choice, you might get stuck in a local optimum.
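As a minimal sketch of that acceptance step, assuming a minimisation problem and the usual Metropolis rule (accept a worse candidate with probability exp(-deltaE / T)): the names `accept`, `anneal`, `cost`, `mutate` and the geometric cooling schedule are illustrative choices, not something taken from the question.

```python
import math
import random


def accept(delta_e, temperature, rng=random):
    """Metropolis rule for minimisation.

    delta_e = cost(candidate) - cost(current). Improvements are always
    accepted; worse candidates are accepted with probability
    exp(-delta_e / temperature).
    """
    if delta_e <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(cost, mutate, start, t_start=1.0, cooling=0.995, steps=5000, seed=0):
    """Hill climbing plus probabilistic acceptance (hypothetical helper)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t_start
    for _ in range(steps):
        candidate = mutate(current, rng)
        candidate_cost = cost(candidate)
        if accept(candidate_cost - current_cost, temperature, rng):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # assumed geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    result = anneal(
        cost=lambda x: (x - 3.0) ** 2,
        mutate=lambda x, rng: x + rng.uniform(-1.0, 1.0),
        start=0.0,
    )
    print(result)
```

With the temperature at zero, `accept` only ever takes improvements, which is exactly the plain hill climber described in the question.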
I need to create a load test for a certain number of requests in a given time. I successfully set up the Precise Throughput Timer and I believe I understand how it works. What I don't understand is how other timers, specifically the Gaussian Random Timer, would affect it.
I have run my test plan with and without the Gaussian Random Timer, but I don't see much difference in the results. Would adding the Gaussian Random Timer help me better simulate my users' behavior?
I would say that these timers are mutually exclusive:
Precise Throughput Timer - allows you to reach and maintain the desired throughput (number of requests per given amount of time)
Gaussian Random Timer - allows you to simulate "think time"
If your goal is to mimic real users' behavior as closely as possible, go for the Gaussian Random Timer, because real users don't hammer the application under test non-stop. They need some time to "think" between operations, i.e. locate the button and move the mouse pointer there, read something, type something, etc. So if your test simulates real users using real browsers, use the Gaussian Random Timer and put realistic think times between operations. If you need your test to produce a certain number of hits per second, just increase the number of threads (virtual users) accordingly. Check out What is the Relationship Between Users and Hits Per Second? for a comprehensive explanation if needed.
On the other hand, the Precise Throughput Timer is handy when there are no "real users", for example when you're testing an API, a database or a message queue and need to send a specific number of requests per second.
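As a rough illustration of how users and hits per second relate when think time is involved, here is a back-of-the-envelope estimate based on Little's Law (concurrent users ≈ throughput × (response time + think time)). The function name and the example figures are assumptions for illustration; this is not a JMeter API.

```python
import math


def users_needed(target_rps, response_time_s, think_time_s):
    """Estimate the number of virtual users for a target throughput.

    Each user completes one request every (response time + think time)
    seconds, so N users produce roughly N / (response + think) requests
    per second.
    """
    return math.ceil(target_rps * (response_time_s + think_time_s))


if __name__ == "__main__":
    # Assumed figures: 50 requests/s, 0.5 s response time, 1.5 s think time.
    print(users_needed(50, 0.5, 1.5))
```

The estimate only holds while the system under test keeps its response time stable; once it saturates, adding threads no longer raises the throughput.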
I'm trying to implement the MCTS algorithm on a game. I can only use around 0.33 seconds per move. In this time I can generate one or two games per child from the start state, which has around 500 child nodes. My simulations aren't random, but of course I can't make the right choice based on 1 or 2 simulations. Further into the game the tree becomes smaller and I can base my choices on more simulations.
So my problem is in the first few moves. Is there a way to improve the MCTS algorithm so it can simulate more games or should I use another algorithm?
Is it possible to come up with some heuristic evaluation function for states? I realise that one of the primary benefits of MCTS is that in theory you wouldn't need this, BUT: if you can create a somewhat reasonable evaluation function anyway, this will allow you to stop simulations early, before they reach a terminal game state. Then you can back-up the evaluation of such a non-terminal game state instead of just a win or a loss. If you stop your simulations early like this, you may be able to run more simulations (because every individual simulation takes less time).
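A minimal sketch of such a truncated playout, assuming the game is exposed through a handful of callables. All of the names here (`policy`, `apply_move`, `evaluate`, `max_depth` and so on) are hypothetical placeholders for whatever your game code provides.

```python
def truncated_playout(state, legal_moves, apply_move, is_terminal,
                      terminal_value, evaluate, policy, max_depth):
    """Play out from `state`, but stop after `max_depth` moves.

    Returns the true result if a terminal state is reached in time,
    otherwise the heuristic evaluation of the state where the playout
    was cut off. That value is what gets backed up through the tree.
    """
    depth = 0
    while not is_terminal(state):
        if depth >= max_depth:
            return evaluate(state)
        move = policy(state, legal_moves(state))
        state = apply_move(state, move)
        depth += 1
    return terminal_value(state)


if __name__ == "__main__":
    # Toy game: a counter that ends at 10; each move adds 1 or 2.
    value = truncated_playout(
        state=0,
        legal_moves=lambda s: [1, 2],
        apply_move=lambda s, m: s + m,
        is_terminal=lambda s: s >= 10,
        terminal_value=lambda s: 1.0,
        evaluate=lambda s: s / 10.0,
        policy=lambda s, moves: moves[-1],
        max_depth=3,
    )
    print(value)
```

The cut-off depth is a tuning knob: shorter playouts give more simulations per move but lean harder on the quality of the evaluation function.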
Apart from that, you'll want to try to find ways to "generalize". If you run one simulation, you should try to see if you can also extract some useful information from that simulation for other nodes in the tree which you didn't go through. Examples of enhancements you may want to consider in this spirit are AMAF, RAVE, Progressive History, and N-Gram Selection Technique.
Do you happen to know where the bottleneck is for your performance? You could investigate this using a profiler. If most of your processing time is spent in functions related to the game (move generation, advancing from one state to the next, etc.), you know for sure that you're going to be limited in the number of simulations you can do. You should then try to implement enhancements that make each individual simulation as informative as possible. This can for example mean using really good, computationally expensive evaluation functions. If the game code itself already is very well optimized and fast, moving extra computation time into things like evaluation functions will be more harmful to your simulation count and probably pay off less.
For more on this last idea, it may be interesting to have a look through some stuff I wrote on my MCTS-based agent in General Video Game AI, which is also a real-time environment with a very computationally expensive game, meaning that simulation counts are severely constrained (but the branching factor is much smaller than it seems to be in your case). PDF files of my publications on this are also available online.
I have a problem where I am going to have a bunch of n-bodies. The movement of each is prescribed by existing data; however, when a body is in range of another one, certain properties about it change. For the sake of this question we'll just assume you have a counter per body that counts the time you were around other bodies. So basically you start with t = 0, you spend 5 seconds around body 2, so your t is now 5. I am wondering what's the best way to go about this. I don't have the data yet, but I was wondering whether it's appropriate for me to explore something like CUDA/OpenCL, or whether I should stick with optimizing this across a multi-core CPU machine. The amount of data that this will be simulated across is about 500 bodies, which each have movements described down to the second over a 30 day period, so that's 43200 points of data per body.
Brute force nbody is definitely suited to GPUs, because it is "embarrassingly parallel". Each body-to-body interaction computation is completely independent of any other. Your variation that includes keeping track of time spent in the "presence" of other bodies would be a straightforward addition to the existing body-to-body force computation, since everything is done on a timestep basis anyway.
Here's some sample CUDA code for nbody.
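Separately from the CUDA sample, the counter logic itself can be sketched in plain Python to show what each timestep has to compute. This is a brute-force CPU version under stated assumptions: 2D positions, a fixed interaction `radius`, and a counter that advances once per timestep when a body is near at least one other body. The function name and data layout are illustrative.

```python
def proximity_seconds(positions, radius, dt=1.0):
    """Accumulate, per body, the time spent within `radius` of any other body.

    positions[t][i] is the (x, y) position of body i at timestep t.
    """
    n = len(positions[0])
    counters = [0.0] * n
    r2 = radius * radius
    for frame in positions:
        near = [False] * n
        # Every pair check is independent of the others, which is what
        # makes this loop nest a natural fit for one-thread-per-pair
        # (or per-body) execution on a GPU.
        for i in range(n):
            xi, yi = frame[i]
            for j in range(i + 1, n):
                xj, yj = frame[j]
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                    near[i] = near[j] = True
        for i in range(n):
            if near[i]:
                counters[i] += dt
    return counters


if __name__ == "__main__":
    frames = [
        [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)],
        [(0.0, 0.0), (5.0, 0.0), (10.0, 10.0)],
    ]
    print(proximity_seconds(frames, radius=2.0))
```

For 500 bodies this is about 125,000 pair checks per timestep, so it may be worth timing a vectorised or multi-core CPU version before committing to a GPU port.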
So, for larger networks with large data sets, they take a while to train. It would be awesome if there was a way to share the computing time across multiple machines. However, the issue with that is that when a neural network is training, the weights are constantly being altered every iteration, and each iteration is more or less based on the last -- which makes the idea of distributed computing at the very least a challenge.
I've thought that for each portion of the network, the server could send maybe 1000 sets of data to train a network on... but... you'd have roughly the same computing time, as I wouldn't be able to train on different sets of data simultaneously (which is what I want to do).
But even if I could split up the network's training into blocks of different data sets to train on, how would I know when I'm done with that set of data? Especially if the amount of data sent to the client machine isn't enough to achieve the desired error?
I welcome all ideas.
Quoting http://en.wikipedia.org/wiki/Backpropagation#Multithreaded_Backpropagation:
When multicore computers are used multithreaded techniques can greatly decrease the amount of time that backpropagation takes to converge. If batching is being used, it is relatively simple to adapt the backpropagation algorithm to operate in a multithreaded manner.
The training data is broken up into equally large batches for each of the threads. Each thread executes the forward and backward propagations. The weight and threshold deltas are summed for each of the threads. At the end of each iteration all threads must pause briefly for the weight and threshold deltas to be summed and applied to the neural network.
which is essentially what other answers here describe.
Depending on your ANN model, you can exploit some parallelism on multiple machines by running the same model with the same training and validation data on multiple machines, but with different ANN properties (initial values, ANN parameters, noise, etc.) for different runs.
I used to do this a lot to make sure I'd explored the problem space effectively and wasn't stuck in local minima etc. This is a very easy way to take advantage of multiple machines without having to recode your algorithm. Just another approach you might want to consider.
My assumption is you have more than 1 training set, and you have a gold standard. Also, I assume you have some way of storing the state of the neural network (whether it's a list of probability weights for each node, or something along those lines).
Using as many compute nodes in a cluster as you can, launch the program on a data set on each node. Save the results for each, and test on the gold standard. Whichever neural network state performs best, set it as the input for the next round of training. Repeat as much as you see fit.
If I understand correctly, you're trying to figure out a way to train an ANN on a cluster of machines? As you stated, partitioning the network isn't the right approach, and as far as I know, is seemingly unfeasible for most models. A possible approach might be to partition the training sets and run local copies of your network, and then merge the results. An intuitive way to do this and gain some validation along the way would be with cross-validation. As you stated, knowing when the network has had the right amount of training is a problem, but that variability is a problem inherent to neural nets in general, not in parallelizing the work.
As you also stated, the updates that happen during each iteration of training are dependent on the current state of the weights, but without mixing up training sets/validation, you're likely overfitting. This is why CV is nice, because your training sets will all get a chance to play a role in the training, and the validating, across multiple samples.
If you do batch training, the weights are only altered after you have been through the entire dataset. You can compute the weight update vector for each data point in the set on a separate machine/core and add them up at the end, then proceed with the next epoch.
Here is a link to a question about batch training.
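To make the batch-training idea concrete, here is a small sketch that uses a linear model with squared error as a stand-in for a neural network. The shards are processed in a plain loop here; in a real setup each `shard_gradient` call would run on its own machine or core. The function names and the toy data are assumptions for illustration.

```python
def shard_gradient(weights, shard):
    """Sum of squared-error gradients for a linear model over one shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for k, xi in enumerate(x):
            grad[k] += err * xi
    return grad


def batch_step(weights, shards, lr):
    """One epoch: gather per-shard gradients, then apply a single update."""
    total = [0.0] * len(weights)
    n = 0
    for shard in shards:  # each iteration could run on a separate worker
        g = shard_gradient(weights, shard)
        total = [t + gk for t, gk in zip(total, g)]
        n += len(shard)
    return [w - lr * t / n for w, t in zip(weights, total)]


if __name__ == "__main__":
    # Toy data for y = 2x + 1, with a constant 1.0 feature for the bias.
    data = [([float(x), 1.0], 2.0 * x + 1.0) for x in range(4)]
    shards = [data[:2], data[2:]]
    weights = [0.0, 0.0]
    for _ in range(2000):
        weights = batch_step(weights, shards, lr=0.1)
    print(weights)
```

Because the weights only change after all shard gradients have been added up, the result matches single-machine batch training up to floating-point rounding. The cost is one synchronisation point per epoch.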
I'm using a feed-forward neural network in Python using the pybrain implementation. For the training, I'll be using the back-propagation algorithm. I know that with neural networks, we need to have just the right amount of data in order not to under/over-train the network. I could get about 1200 different templates of training data for the datasets.
So here's the question:
How do I calculate the optimal amount of data for my training? Since I've tried with 500 items in the dataset and it took many hours to converge, I would prefer not to have to try too many sizes. The results were quite good with this last size, but I would like to find the optimal amount. The neural network has about 7 inputs, 3 hidden nodes and one output.
How do I calculate the optimal amount of data for my training?
It's completely solution-dependent. There's also a bit of art with the science. The only way to know if you're into overfitting territory is to regularly test your network against a set of validation data (that is, data you do not train with). When performance on that set of data begins to drop, you've probably trained too far -- roll back to the last iteration.
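The roll-back described above is usually called early stopping. Below is a library-agnostic sketch: `train_one_epoch` and `validation_error` are hypothetical callables you would wrap around your own trainer (they are not pybrain functions), and `patience` is an assumed tolerance for how many non-improving epochs to wait before giving up.

```python
import copy


def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=1000, patience=10):
    """Train until validation error stops improving, keeping the best snapshot."""
    best_error = float("inf")
    best_model = copy.deepcopy(model)
    epochs_since_best = 0
    for _ in range(max_epochs):
        train_one_epoch(model)            # mutates `model` in place
        error = validation_error(model)   # measured on held-out data only
        if error < best_error:
            best_error = error
            best_model = copy.deepcopy(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break
    return best_model, best_error


if __name__ == "__main__":
    # Toy stand-in: validation error is lowest after exactly 5 epochs.
    toy = {"epoch": 0}

    def step(m):
        m["epoch"] += 1

    best, err = train_with_early_stopping(
        toy, step, lambda m: (m["epoch"] - 5) ** 2, patience=3
    )
    print(best, err, toy)
```

Keeping a deep copy of the best weights means the "roll back" is just returning that snapshot rather than retraining.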
The results were quite good with this last size but I would like to find the optimal amount.
"Optimal" isn't necessarily possible; it also depends on your definition. What you're generally looking for is a high degree of confidence that a given set of weights will perform "well" on unseen data. That's the idea behind a validation set.
The diversity of the dataset is much more important than the quantity of samples you are feeding to the network.
You should customize your dataset to include and reinforce the data you want the network to learn.
After you have crafted this custom dataset, you have to start experimenting with the number of samples, as it is completely dependent on your problem.
For example: If you are building a neural network to detect the peaks of a particular signal, it would be completely useless to train your network with a zillion samples of signals that do not have peaks. There lies the importance of customizing your training dataset no matter how many samples you have.
Technically speaking, in the general case, and assuming all examples are correct, then more examples are always better. The question really is, what is the marginal improvement (first derivative of answer quality)?
You can test this by training it with 10 examples, checking quality (say 95%), then 20, and so on, to get a table like:
Examples   Quality
10         95%
20         96%
30         96.5%
40         96.55%
50         96.56%
You can then clearly see your marginal gains, and make your decision accordingly.
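A small sketch of reading the marginal gains off such a table, using the illustrative figures above (they are example numbers, not measurements). The function name is hypothetical.

```python
def marginal_gains(curve):
    """Return (sample_count, quality gain over the previous row) pairs.

    `curve` is a list of (sample_count, quality_percent) tuples in
    increasing order of sample count.
    """
    return [
        (n1, round(q1 - q0, 4))
        for (n0, q0), (n1, q1) in zip(curve, curve[1:])
    ]


if __name__ == "__main__":
    curve = [(10, 95.0), (20, 96.0), (30, 96.5), (40, 96.55), (50, 96.56)]
    for n, gain in marginal_gains(curve):
        print(n, gain)
```

Once the gain per extra batch of examples falls below whatever improvement you care about, collecting or training on more data is unlikely to be worth the extra training time.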