Bayesian Network based on outcomes - artificial-intelligence

I'm pretty new here, but have a question that I would like some help with. I'm studying machine learning and specifically Bayesian Networks. The problem I am trying to solve is:
Consider a cow that might have a disease. You can detect this disease with a milk test that has a false positive rate of .05 and a false negative rate of .01. This test is done for 5 days in a row, with 5 outcomes. Given a set of outcomes, determine the state of the disease. Assume that the prior probability of an infection on day one is 0.001, and that the state of infection at a given day depends only on its state at the previous day, such that the probability that an infection would persist to the next day is .70, while the probability of a new infection is 0.002.
Your then given a set of outcomes and asked to determine some stuff based on those outcomes. I'm not quite sure how to construct the network for this problem and was wondering if anyone had some pointers.
Thanks.

The Bayesian network should have the following structure:
infectn says whether the cow is infected on day n
testn gives the result of the test on day n

As shown in the diagram, this is a so-called "hidden Markov model" (HMM). Searching for that term should turn up a lot of info.

Related

MCTS UCT with a scoring system

I'm trying to solve a variant of 2048 by a Monte-Carlo Tree Search. I found that UCT could a good way to have some trade-off between exploration/exploitation.
My only issue is that all the versions I've seen assume that the score is a win percentage. How can I adapt it to a game where the score is the value of the board at the last state, and thus going from 1-MAX and not a win.
I could normalize the score using the constant c by dividing by MAX but then it would overweight exploration at early stage of the game (since you get bad average score) and overweight exploitation at late stage of the game.
Indeed most of the literature assumes your games are either lost or won and award a score of 0 or 1, which will turn into a win ratio when averaged over the number of games played. Then exploration parameter C is usually set to sqrt(2) which is optimal for the UCB in bandit problems.
To find out what a good C is in general you have to step back a bit and see what the UCT is really doing. If one node in your tree had an exceptionally bad score in the one rollout it had then exploitation says you should never choose it again. But you've only played that node once, so it might have just been bad luck. To acknowledge this you give that node a bonus. How much? Enough to make it a viable choice even if its average score is the lowest possible and some other node has the highest average score possible. Because with enough plays it might turn out that the one rollout your bad node had was indeed a fluke, and the node actually turns out to be pretty reliable with good scores. Of course, if you get more bad scores then it will likely not be bad luck so it won't deserve more rollouts.
So with scores ranging from 0 to 1 a C of sqrt(2) is a good value. If your game has a maximum achievable score then you can normalize your scores by dividing by the max and force your scores into to 0-1 range to suit a C of sqrt(2). Alternatively you don't normalize the scores but multiply C by your maximum score. The effect is the same: the UCT exploration bonus is large enough to give your underdog nodes some rollouts and a chance to prove themselves.
There is an alternative way of setting C dynamically that has given me good results. As you play, you keep track of the highest and lowest scores you've ever seen in each node (and subtree). This is the range of scores possible and this gives you a hint of how big C should be in order to give not well explored underdog nodes a fair chance. Every time i descend into the tree and pick a new root i adjust C to be sqrt(2) * score range for the new root. In addition, as rollouts complete and their scores turn out the be a new highest or lowest score i adjust C in the same way. By continually adjusting C this way as you play but also as you pick a new root you keep C as large as it needs to be to converge but as small as it can be to converge fast. Note that the minimum score is as important as the max: if every rollout will yield at minimum a certain score then C won't need to overcome it. Only the difference between max and min matters.

Neural network inputs

I don't really know how to word this question so bear with me..
Let's say I am developing a neural network for rating each runner in an athletics race. I give the neural network information regarding the runner, eg. win%, days since last run etc. etc.
My question is - in this case where the neural network is rating runners, can I give the network an input like the race weather ? e.g. I give the network 1.00 for hot, 2.00 for cold, 3.00 for OK .. ?
The reason I am asking this question: The greater the output of the neural network, the better the runner. So, this means that the higher the win % input, the bigger the rating. If I give the neural network inputs whereby the greater the value doesn't necessarily mean the better the runner, will the network be able to understand and use/interpret this input?
Please let me know if the question doesn't make sense!
Neural nets can correctly model irrelevant inputs (by assigning them low weights) and inputs that are inversely related to the desired output (by assigning negative weights). Neural nets do better with inputs that are continuously-varying, so your example of 1.00 for hot, 2.00 for cold, 3.00 for OK .. is not ideal: better would be 0.00 for hot, 1.00 for OK, 2.00 for cool.
In situations such as your country code where there is no real continuous relationship, the best encoding (from a convergence standpoint) is to use a set of boolean attributes (isArgentina, isAustralia, ..., isZambia). Even without that, though, a neural net ought to be able to model an input of discrete values (i.e., if countries were relevant and if you encoded them as numbers, eventually a neural net should be able to converge on 87 (Kenya) is correlated with high performance). In such a situation, it might take more hidden nodes or a longer training period.
The whole point of neural nets is to use them in situations where simple statistical analysis is difficult, so I disagree with the other answer that says that you should pre-judge your data.
What a neural network does is map relations between inputs and outputs. This means that you have to have some kind of objective for your neural network. Examples of such objectives could be "to predict the winner", "to predict how fast each runner will be" or to "predict the complete results from a race". Which of those examples that are plausible for you to attempt depends, of course, on what data you have available.
If you have a large dataset (say e.g. a few hundred races for each runner) where the resulting time and all predictory variables (including weather) are recorded and you establish that there is a relationship between weather and an individual runners performance a neural network would very well be able to map such a relationship, even if it is a different relationship for each individual runner.
Example of good weather variables to record could be sun intensity (W/m2), head wind (m/s) and temperature (deg C). Then each runners performance could be modeled using these variables and then the neural network could be used to predict a runners performance (observe that this approach would require one neural network per runner).

Multiple Output Neural Network

I have built my first neural network in python, and i've been playing around with a few datasets; it's going well so far !
I have a quick question regarding modelling events with multiple outcomes: -
Say i wish to train a network to tell me the probability of each runner winning a 100m sprint. I would give the network all of the relevant data regarding each runner, and the number of outputs would be equal to the number of runners in the race.
My question is, using a sigmoid function, how can i ensure the sum of the outputs will be equal to 1.0 ? Will the network naturally learn to do this, or will i have to somehow make this happen explicitly ? If so, how would i go about doing this ?
Many Thanks.
The output from your neural network will approach 1. I don't think it will actually get to 1.
You actually don't need to see which output is equal to 1. Once you've trained your network up to a specific error level, when you present the inputs, just look for the maximum output in your output later. For example, let's say your output layer presents the following output: [0.0001, 0.00023, 0.0041, 0.99999412, 0.0012, 0.0002], then the runner that won the race is runner number 4.
So yes, your network will "learn" to produce 1, but it won't exactly be 1. This is why you train to within a certain error rate. I recently created a neural network to recognize handwritten digits, and this is the method that I used. In my output layer, I have a vector with 10 components. The first component represents 0, and the last component represents 9. So when I present a 4 to the network, I expect the output vector to look like [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]. Of course, it's not what I get exactly, but it's what I train the network to provide. So to find which digit it is, I simply check to see which component has the highest output or score.
Now in your second question, I believe you're asking how the network would learn to provide the correct answer? To do this, you need to provide your network with some training data and train it until the output is under a certain error threshold. So what you need is a set of data that contains the inputs and the correct output. Initially your neural network will be set up with random weights (there are some algorithms that help you select better weights to minimize training time, but that's a little more advanced). Next you need a way to tell the neural network to learn from the data provided. So basically you give the data to the neural network and it provides an output, which is highly likely to be wrong. Then you compare that data with the expected (correct) output and you tell the neural network to update its weights so that it gets closer to the correct answer. You do this over and over again until the error is below a certain threshold.
The easiest way to do this is to implement the stochastic backpropagation algorithm. In this algorithm, you calculate the error between the actual output of the neural network and the expected output. Then you backpropagate the error from the output layer all the way up to the weights to the hidden layer, adjusting the weights as you go. Then you repeat this process until the error that you calculate is below a certain threshold. So during each step, you're getting closer and closer towards your solution.
You can use the algorithm described here. There is a decent amount of math involved, so be prepared for that! If you want to see an example of an implementation of this algorithm, you can take a look at this Java code that I have on github. The code uses momentum and a simple form of simulated annealing as well, but the standard backpropagation algorithm should be easily discernible. The Wikipedia article on backpropagation has a link to an implementation of the backpropagation algorithm in Python.
You're probably not going to understand the algorithm immediately; expect to spend some time understanding it and working through some of the math. I sat down with a pencil and paper as I was coding, and that's how I eventually understood what was going on.
Here are a few resources that should help you understand backpropagation a little better:
The learning process: backpropagation
Error backpropagation
If you want some more resources, you can also take a look at my answer here.
Basically you want a function of multiple real numbers that converts those real numbers into probabilities (each between 0 to 1, sum to 1). You can this easily by post processing the output of your network.
Your network gives you real numbers r1, r2, ..., rn that increases as the probability of each runner wins the race.
Then compute exp(r1), exp(r2), ..., and sum them up for ers = exp(r1) + exp(r2) + ... + exp(rn). Then the probability that the first racer wins is exp(r1) / ers.
This is a one use of the Boltzman distribution. http://en.wikipedia.org/wiki/Boltzmann_distribution
Your network should work around that and learn it naturally eventually.
To make the network learn that a little faster, here's what springs to mind first:
add an additional output called 'sum' (summing all the other output neurons) -- if you want all the output neurons to be in an separate layer, just add a layer of outputs, first numRunners outputs just connect to corresponding neuron in the previous layer, and the last numRunners+1-th neuron you connect to all the neurons from the previous layer, and fix the weights to 1)
the training set would contain 0-1 vectors for each runner (did-did not run), and the "expected" result would be a 0-1 vector 00..00001000..01 first 1 marking the runner that won the race, last 1 marking the "sum" of "probabilities"
for the unknown races, the network would try to predict which runner would win. Since the outputs have contiguous values (more-or-less :D) they can be read as "the certainty of the network that the runner would win the race" -- which is what you're looking for
Even without the additional sum neuron, this is the rough description of the way the training data should be arranged.

TD(λ) in Delphi/Pascal (Temporal Difference Learning)

I have an artificial neural network which plays Tic-Tac-Toe - but it is not complete yet.
What I have yet:
the reward array "R[t]" with integer values for every timestep or move "t" (1=player A wins, 0=draw, -1=player B wins)
The input values are correctly propagated through the network.
the formula for adjusting the weights:
What is missing:
the TD learning: I still need a procedure which "backpropagates" the network's errors using the TD(λ) algorithm.
But I don't really understand this algorithm.
My approach so far ...
The trace decay parameter λ should be "0.1" as distal states should not get that much of the reward.
The learning rate is "0.5" in both layers (input and hidden).
It's a case of delayed reward: The reward remains "0" until the game ends. Then the reward becomes "1" for the first player's win, "-1" for the second player's win or "0" in case of a draw.
My questions:
How and when do you calculate the net's error (TD error)?
How can you implement the "backpropagation" of the error?
How are the weights adjusted using TD(λ)?
Thank you so much in advance :)
If you're serious about making this work, then understanding TD-lambda would be very helpful. Sutton and Barto's book, "Reinforcement Learning" is available for free in HTML format and covers this algorithm in detail. Basically, what TD-lambda does is create a mapping between a game state and the expected reward at the game's end. As games are played, states that are more likely to lead to winning states tend to get higher expected reward values.
For a simple game like tic-tac-toe, you're better off starting with a tabular mapping (just track an expected reward value for every possible game state). Then once you've got that working, you can try using a NN for the mapping instead. But I would suggest trying a separate, simpler NN project first...
I have been confused about this too, but I believe this is the way it works:
Starting from the end node, you check R, (output received) and E, (output expected). If E = R, it's fine, and you have no changes to make.
If E != R, you see how far off it was, based on thresholds and whatnot, and then shift the weights or threshold up or down a bit. Then, based on the new weights, you go back in, and guess whether or not it was too high, or too low, and repeat, with a weaker effect.
I've never really tried this algorithm, but that's basically the version of the idea as I understand it.
As far as I remember you do the training with a known result set - so you calculate the output for a known input and subtract your known output value from that - that is the error.
Then you use the error to correct the net - for a single layer NN adjusted with the delta rule I know that an epsilon of 0.5 is too high - something like 0.1 is better - slower but better. With backpropagation it is a bit more advanced - but as far as I remember the math equation description of a NN is complex and hard to understand - it is not that complicated.
take a look at
http://www.codeproject.com/KB/recipes/BP.aspx
or google for "backpropagation c" - it is probably easier to understand in code.

Determining which inputs to weigh in an evolutionary algorithm

I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
Using these inputs and their weights we can determine the value of taking any action. For example, if putting the straight line shape all the way in the right column will eliminate the gaps of 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps and so that action gets a low score.
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
In neural networks, you can select 'interesting' potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do similarly in other contexts.
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a tetris game state may be described by the list of occupied cells. A string of bits describing this information would be a suitable input to that stage of the learning algorithm. actually training on that is still challenging; how do you know whether those are useful results. I suppose you could roll the whole algorithm into a single blob, where the algorithm is fed with the successive states of play and the output would just be the block placements, with higher scoring algorithms selected for future generations.
Another choice might be to use a large corpus of plays from other sources; such as recorded plays from human players or a hand-crafted ai, and select the algorithms who's outputs bear a strong correlation to some interesting fact or another from the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you choose M selected features there are 2^M subsets, so there is a lot to look at.
I would to the following:
For each subset S
run your code to optimize the weights W
save S and the corresponding W
Then for each pair S-W, you can run G games for each pair and save the score L for each one. Now you have a table like this:
feature1 feature2 feature3 featureM subset_code game_number scoreL
1 0 1 1 S1 1 10500
1 0 1 1 S1 2 6230
...
0 1 1 0 S2 G + 1 30120
0 1 1 0 S2 G + 2 25900
Now you can run some component selection algorithm (PCA for example) and decide which features are worth to explain scoreL.
A tip: When running the code to optimize W, seed the random number generator, so that each different 'evolving brain' is tested against the same piece sequence.
I hope it helps in something!

Resources