What AI theory uses "win" counts to determine strategy?

I am new to AI, and was wondering what (if any) AI theory best describes the following scenario:
I have an agent that can be programmed to execute one of three states in a competitive scenario. I program the agent to execute each state five times. A win count is maintained while testing each state. Results would look something like this:
State   Wins
1       3
2       1
3       3
After all testing is complete, the state(s) with the highest (or joint-highest) win count are identified from the result set. In the above example, states 1 and 3 are determined to be the most effective states.
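As a minimal sketch of that selection step (variable names are illustrative):

// Win counts per state, as in the example above.
const wins = { 1: 3, 2: 1, 3: 3 };

// Find the highest win count, then every state that ties for it.
const best = Math.max(...Object.values(wins));
const bestStates = Object.keys(wins).filter(s => wins[s] === best);
console.log(bestStates); // ["1", "3"]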

Related

How can I predict a user generated distribution by learning from previous distributions

I am trying to program a prediction algorithm that predicts the distribution of marbles among 4 cups based on the previous user input, but I have no idea where to start or which techniques can be used to solve this.
Example:
There are 4 cups numbered from 0 to 3, and the user receives x marbles which he distributes among those cups. Each round the user receives another amount of marbles (or the same amount), and before the user distributes them, the algorithm tries to predict the distribution based on the user's previous inputs. After that, the user "corrects" it. The goal is that the user does not have to correct anything, because the algorithm predicts the correct distribution. However, the pattern in which the user distributes the marbles can change, and the algorithm has to adapt.
This is the simplest design of the problem, and it is already not trivial to solve. However, it gets exponentially more complex when the marbles have additional properties that can be used for distribution.
For example, they could have a color and a weight.
So, for example, how does the algorithm learn that the user (most of the time) puts the marbles of the same color in one cup, but cup 2 is (most of the time) empty, while the rest are equally distributed?
So in my head the algorithm has to do something like this:
- Search for patterns after the user's distribution is done. Patterns could be the amount of marbles per cup, the weight per cup, or anything else.
- If a pattern is found, a predefined value (weight) is added to that pattern.
- If a previously found pattern does not occur again, a predefined value is subtracted from its weight.
- When the algorithm has to predict, all patterns whose weight exceeds a predefined threshold are applied.
I am not sure whether I am missing something, how I would implement something like this, or in which area I should look for answers.
First of all, bear in mind that human behavior does not always follow a pattern. If the user distributes the marbles randomly, it will be hard to predict the next move!
But if there IS a pattern in the distributions, you might be able to predict it using an algorithm such as a neural network or a decision tree.
For example:
// dataset1
// weight/color pairs for 10 marbles, flattened into one array:
// [weight1, color1, weight2, color2, ...]
let dataset1 = [4,3,1,1,2,3,4,5,3,4,7,6,4,4,2,4,1,6,1,2]
// labels1
// the cup (0-3) each of the 10 marbles was placed in
let labels1 = [2,1,0,2,1,1,3,1,0,1]
Now you can train an algorithm, for example a neural network or a decision tree.
This isn't real code, just an example of how it could work.
let net = new NeuralNet()
net.train(dataset1, labels1)
After training with lots of data (at least hundreds of these datasets), you can give the network a new dataset and it will give you a prediction of the cup distribution.
let newMarbleSet = [...]
let prediction = net.predict(newMarbleSet)
It's up to you what you want to do with this prediction.
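For a concrete starting point, here is a minimal, self-contained baseline (my own illustration, not the only option): count how often each color lands in each cup, and predict the most frequent cup.

// Frequency-counting baseline: counts[color][cup] = times observed.
const counts = {};

function observe(color, cup) {
  counts[color] = counts[color] || [0, 0, 0, 0];
  counts[color][cup] += 1;
}

function predict(color) {
  const c = counts[color];
  if (!c) return 0; // no history for this color yet; default guess
  return c.indexOf(Math.max(...c)); // most frequent cup so far
}

// Example: the user tends to put color 5 into cup 2.
observe(5, 2);
observe(5, 2);
observe(5, 1);
console.log(predict(5)); // 2

Decaying the counts each round (multiplying them by a factor below 1) would let the predictor adapt when the user's pattern changes, which matches the add/subtract weighting idea in the question.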

Predict probabilities of a recommender system?

I have a dataset with a sparse utility matrix (user-product) with binary input: 1 if the user 𝑖 bought the product 𝑗, and 0 if they haven't.
However, it has a different meaning on the test set: 0 means that we don't know if the user bought this product, and 1 means that we're sure the user bought that given product.
I need to get, for each user and each product, the probability that user 𝑖 bought product 𝑗 in the test set. For this I wish to use different matrix factorization techniques like FunkSVD, NMF, or SVD++, but I'm quite confused:
These techniques would only allow me to get the label (1 or 0) on the test set, but I need to compute the probability of getting 1, not the label itself.
How can I approach this problem? Or do I treat it as a classification problem and then use all the common classification techniques?
One approach that might help is based on Recurrent Knowledge Graph Embedding.
Try converting the user-item interactions into a knowledge graph and mining the top-n paths in the graph between user_i and item_j. Then build an RNN model, as described in the paper and its code, to get the probabilities.
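A simpler route than the graph model (my own suggestion, not part of the answer above): factorization models produce an unbounded score for each (user, item) pair, and passing that score through a logistic function turns it into something you can read as a probability. A sketch:

// Hypothetical latent factors from any matrix factorization
// (FunkSVD, NMF, SVD++): one vector per user and one per item.
const userFactors = [0.8, -0.2, 0.5];
const itemFactors = [0.4, 0.1, -0.3];

// Raw affinity score: dot product of the factor vectors.
const score = userFactors.reduce((s, u, k) => s + u * itemFactors[k], 0);

// Logistic squashing maps the score into (0, 1),
// read here as P(user i bought product j).
const probability = 1 / (1 + Math.exp(-score));
console.log(probability); // ~0.54 for these example factors

Training the factors directly against a logistic loss (as in logistic matrix factorization) makes this probabilistic reading more principled than squashing the scores after the fact.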

Determining Number of Outputs for trading type ANN

I am currently trying to implement an ANN that does 1-for-1 trades with 8 different possible goods.
I am wondering how I determine the number of outputs necessary for the ANN to perform adequately.
Should the number of outputs be equivalent to the number of possible trades? Meaning, if I have 8 different goods and can trade each one for each of the 8 goods, does the ANN need 8*8 outputs?
To summarize: does an ANN need a number of outputs equal to the number of distinct actions it can perform?
Edit: To clarify, the goods have a worth specific to a situation, which is the input given to the ANN. The 8*8 refers to the number of possible combinations of trading one of the goods for any other.
Thank you in advance.
Classification engine
Neural networks (feed-forward) are classification engines - they are not necessarily meant for "storing knowledge" the way decision trees and logical knowledge bases are.
Though it certainly is possible to store predefined decisions inside a neural network - much like a gigantic if-clause.
Number of outputs
If the different outputs represent different classes, you should use one output signal per class.
If you were to let one output signal imply different classes depending on its value, you would be hinting to the network that an output signal of 10 "is a better class" than one of -10. Therefore I would strongly recommend using one output signal per class, although this will require more training (with the advantage of possibly fewer plateaus in the search space).
I am not sure what you are referring to with:
Meaning, if I have 8 different goods and can trade each one for each of the 8 goods, does the ANN need 8*8 outputs?
Are you going to input a set of "stock values" and force the net to output which stocks to buy and sell?
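To make the one-output-per-action encoding concrete (a sketch; the index scheme is my own illustration): with 8 goods, each directed trade (from, to) maps to one of 8*8 output units, and the chosen action is the unit with the highest activation.

const NUM_GOODS = 8;

// Encode a trade (give good `from`, receive good `to`) as an output index.
function tradeToIndex(from, to) {
  return from * NUM_GOODS + to;
}

// Decode the network's winning output unit back into a trade.
function indexToTrade(index) {
  return { from: Math.floor(index / NUM_GOODS), to: index % NUM_GOODS };
}

// Given all 64 output activations, pick the highest-scoring trade.
function bestTrade(outputs) {
  let best = 0;
  for (let i = 1; i < outputs.length; i++) {
    if (outputs[i] > outputs[best]) best = i;
  }
  return indexToTrade(best);
}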

Genetic Algorithm Sudoku - optimizing mutation

I am in the process of writing a genetic algorithm to solve Sudoku puzzles and was hoping for some input. The algorithm solves puzzles occasionally (about 1 out of 10 times on the same puzzle with max 1,000,000 iterations) and I am trying to get a little input about mutation rates, repopulation, and splicing. Any input is greatly appreciated, as this is brand new to me and I feel like I am not doing things 100% correctly.
A quick overview of the algorithm
Fitness Function
Counts the number of unique values from 1 through 9 in each column, row, and 3*3 sub-box. Each subset's count of unique values is divided by 9, giving a floating-point value between 0 and 1. The sum of these 27 values is then divided by 27, yielding a total fitness between 0 and 1, where 1 indicates a solved puzzle.
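A minimal sketch of that fitness function, assuming the board is stored as a flat array of 81 digits (the representation and helper names are my own):

// Each row, column, and 3x3 box scores (unique digits)/9; the mean of
// all 27 subset scores is the fitness, so 1.0 means solved.
function fitness(board) {
  const subsets = [];
  for (let i = 0; i < 9; i++) {
    subsets.push([...Array(9)].map((_, j) => board[i * 9 + j]));  // row i
    subsets.push([...Array(9)].map((_, j) => board[j * 9 + i]));  // column i
    const r = Math.floor(i / 3) * 3, c = (i % 3) * 3;             // box i
    subsets.push([...Array(9)].map((_, j) =>
      board[(r + Math.floor(j / 3)) * 9 + c + (j % 3)]));
  }
  const total = subsets.reduce((s, cells) => s + new Set(cells).size / 9, 0);
  return total / 27;
}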
Population Size:
100
Selection:
Roulette method. Each individual is selected at random, with higher-fitness individuals having a slightly better chance of selection.
Reproduction:
Two randomly selected chromosomes/boards swap a randomly selected subset (row, column, or 3*3 box); the choice of subset (which row, column, or box) is random. The resulting boards are introduced into the population.
Reproduction Rate: 12% of population per cycle
There are six reproductions per iteration, resulting in 12 new chromosomes per cycle of the algorithm.
Mutation: mutation occurs at a rate of 2 percent of the population after 10 iterations with no improvement in the highest fitness.
Listed below are the three mutation methods, which have varying selection probabilities.
1: Swap randomly selected numbers. The method selects two random numbers and swaps them throughout the board. This method seems to have the greatest impact early in the algorithm's growth pattern. 25% chance of selection.
2: Introduce random changes: randomly select two cells and change their values. This method seems to help keep the algorithm from converging. 65% chance of selection.
3: Count the occurrences of each value on the board. A solved board contains exactly nine of each number from 1 to 9. This method takes any number that occurs fewer than 9 times and randomly swaps it with a number that occurs more than 9 times. This seems to have a positive impact, but is only used sparingly. 10% chance of selection.
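A sketch of mutation method 3, again assuming a flat 81-digit board (helper names are mine):

// Replace one occurrence of an over-represented digit with an
// under-represented one, nudging the board toward 9 of each digit.
function countRepairMutation(board) {
  const counts = Array(10).fill(0);
  for (const v of board) counts[v]++;
  const under = [], over = [];
  for (let d = 1; d <= 9; d++) {
    if (counts[d] < 9) under.push(d);
    if (counts[d] > 9) over.push(d);
  }
  if (under.length === 0) return; // already exactly 9 of each digit
  const from = over[Math.floor(Math.random() * over.length)];
  const to = under[Math.floor(Math.random() * under.length)];
  // Pick one random cell holding the over-represented digit and replace it.
  const cells = board.flatMap((v, i) => (v === from ? [i] : []));
  board[cells[Math.floor(Math.random() * cells.length)]] = to;
}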
My main question is at what rate I should apply the mutation methods. It seems that as I increase mutation, I get faster initial results. However, as the result approaches a correct solution, I think the higher rate of change introduces too many bad chromosomes and genes into the population. With a lower rate of change, though, the algorithm seems to converge too early.
One last question is whether there is a better approach to mutation.
You can anneal the mutation rate over time to get the sort of convergence behavior you're describing. But I actually think there are probably bigger gains to be had by modifying other parts of your algorithm.
Roulette wheel selection applies a very high degree of selection pressure in general. It tends to cause a pretty rapid loss of diversity fairly early in the process. Binary tournament selection is usually a better place to start experimenting. It's a more gradual form of pressure, and just as importantly, it's much better controlled.
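A minimal sketch of binary tournament selection (assuming a fitness function over individuals):

// Sample two individuals uniformly at random and keep the fitter one.
// Repeating this to fill the mating pool gives mild, well-controlled
// selection pressure; larger tournaments increase the pressure.
function binaryTournament(population, fitness) {
  const a = population[Math.floor(Math.random() * population.length)];
  const b = population[Math.floor(Math.random() * population.length)];
  return fitness(a) >= fitness(b) ? a : b;
}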
With a less aggressive selection mechanism, you can afford to produce more offspring, since you don't have to worry about producing so many near-copies of the best one or two individuals. Rather than 12% of the population producing offspring (possibly less because of repetition of parents in the mating pool), I'd go with 100%. You don't necessarily need to literally make sure every parent participates; just generate the same number of offspring as you have parents.
Some form of mild elitism will probably then be helpful so that you don't lose good parents. Maybe keep the best 2-5 individuals from the parent population if they're better than the worst 2-5 offspring.
With elitism, you can use a bit higher mutation rate. All three of your operators seem useful. (Note that #3 is actually a form of local search embedded in your genetic algorithm. That's often a huge win in terms of performance. You could in fact extend #3 into a much more sophisticated method that looped until it couldn't figure out how to make any further improvements.)
I don't see an obviously better or worse set of weights for your three mutation operators. I think at that point you're firmly within the realm of experimental parameter tuning. Another idea is to inject a bit of knowledge into the process: for example, early on, choose between the operators randomly; later, as the algorithm is converging, favor the mutation operators you think are more likely to help finish "almost-solved" boards.
I once made a fairly competent Sudoku solver using a GA and blogged about the details (including different representations and mutation operators) here:
http://fakeguido.blogspot.com/2010/05/solving-sudoku-with-genetic-algorithms.html

Determining which inputs to weigh in an evolutionary algorithm

I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
Using these inputs and their weights, we can determine the value of taking any action. For example, if putting the straight-line shape all the way in the right column will eliminate the gaps in 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps, so that action gets a low score.
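A sketch of that weighted-sum evaluation (the feature extractors and weights are hypothetical placeholders, not from the paper):

// Score a candidate action by a weighted sum of hand-picked features
// of the resulting state. The weights are what the GA evolves.
const features = [
  state => state.gapsCreated,
  state => state.rowsCleared,
  state => state.avgColumnHeight,
];

function scoreAction(stateAfterAction, weights) {
  return features.reduce(
    (sum, f, i) => sum + weights[i] * f(stateAfterAction), 0);
}

// The agent plays whichever action's resulting state scores highest.
const mockState = { gapsCreated: 2, rowsCleared: 1, avgColumnHeight: 5.5 };
console.log(scoreAction(mockState, [-1.0, 3.0, -0.5])); // -1.75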
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
In neural networks, you can select 'interesting' potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do something similar in other contexts.
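For example, a quick screen with plain Pearson correlation between one candidate input and the labels (a sketch):

// Features whose |r| is near 0 are weak candidates; a strong positive
// or negative r marks an "interesting" input.
function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
}

console.log(pearson([1, 2, 3, 4], [2, 4, 6, 8])); // 1 (perfect correlation)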
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a Tetris game state may be described by the list of occupied cells, and a string of bits describing this information would be a suitable input to that stage of the learning algorithm. Actually training on that is still challenging, though: how do you know whether the results are useful? I suppose you could roll the whole algorithm into a single blob, where the algorithm is fed the successive states of play and the output is just the block placements, with higher-scoring algorithms selected for future generations.
Another choice might be to use a large corpus of plays from other sources, such as recorded plays from human players or a hand-crafted AI, and select the algorithms whose outputs bear a strong correlation to some interesting fact from the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you choose M candidate features, there are 2^M subsets, so there is a lot to look at.
I would do the following:
For each subset S:
    run your code to optimize the weights W
    save S and the corresponding W
Then, for each S-W pair, run G games and save the score L for each one. Now you have a table like this:
feature1  feature2  feature3  featureM  subset_code  game_number  scoreL
1         0         1         1         S1           1            10500
1         0         1         1         S1           2            6230
...
0         1         1         0         S2           G+1          30120
0         1         1         0         S2           G+2          25900
Now you can run a component-selection algorithm (PCA, for example) and decide which features are worth keeping to explain scoreL.
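A sketch of the subset enumeration step (the bitmask encoding and the optimizeWeights placeholder are mine):

// Enumerate every non-empty subset of M candidate features as a bitmask.
const M = 4;
for (let mask = 1; mask < (1 << M); mask++) {
  const subset = [];
  for (let f = 0; f < M; f++) {
    if (mask & (1 << f)) subset.push(f);
  }
  // For each subset: run the existing GA, e.g. optimizeWeights(subset),
  // then play G games and record (subset, weights, score) rows.
  console.log(subset);
}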
A tip: when running the code to optimize W, seed the random number generator so that each different 'evolving brain' is tested against the same piece sequence.
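JavaScript's Math.random() cannot be seeded, so a small seedable PRNG has to stand in for it; mulberry32 is a well-known tiny routine for this:

// Same seed => same random sequence, so every candidate is evaluated
// against the same piece sequence.
function mulberry32(a) {
  return function () {
    let t = (a += 0x6D2B79F5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rng = mulberry32(42);
console.log(rng(), rng()); // deterministic pair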
I hope this helps!
