I am writing a game for the Game Boy Advance and I am implementing basic AI in the form of a binary search tree. There is one human player and one computer player. I need to figure out a way of telling how aggressively the human player is attacking the computer. The human pushes a button to attack (and must be within a certain radius of the computer), so my first thought was to compare the number of attacks to the number of iterations my main while loop had gone through. This seems like a poor way to do it, though, because that number will depend on the frame rate, which can fluctuate. Does anyone have an idea for a better way to do this?
Why not just keep a count of the number of times the human player could have attacked the computer player? You obviously have to check whether he's in range anyway, so whenever he is, increment a could_have_attacked counter, and if he actually does attack, increment an attacked counter. Then you can compare how often the human player attacks to how often he has the opportunity.
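For instance, a minimal sketch of those two counters in C (the function and variable names here are invented for illustration, not from the question):

int could_have_attacked = 0;
int attacked = 0;

/* Call this wherever the range check already happens. */
void update_aggression_counters(int player_in_range, int player_pressed_attack)
{
    if (player_in_range) {
        could_have_attacked++;          /* an opportunity existed this check */
        if (player_pressed_attack)
            attacked++;                 /* and the player took it */
    }
}

/* Attacks per opportunity, 0.0 .. 1.0 - higher suggests a more aggressive player. */
float aggression_ratio(void)
{
    if (could_have_attacked == 0)
        return 0.0f;
    return (float)attacked / (float)could_have_attacked;
}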
Alternatively, compare how often he is in range to how often he is not in range, or more generally keep track of the average distance between the players. More aggressive players will probably stay a lot closer to their opponents than passive players.
Your best bet will probably be a combination of different methods. Keep several different measurements, try several different ways of weighting them against each other, and use extensive playtesting to pick the combination that works best.
You have to define what aggressive is. For example, you could say aggressive is a player that pursues the AI, so you would check whether he is in range more often than he is out of range. Or you could say aggressive is a player that attacks every chance he gets, so you would check the ratio of successful attacks to the amount of time the player has to attack. Or you could say an aggressive player will attack when he is injured. Once you define what "aggressive" means in your game world, you can establish key performance indicators of when the player falls under that definition.
You can use a pseudorandom generator for that. Say you have a rnd() function that returns a value from 0 to 1, where a greater value corresponds to higher aggressiveness. You introduce a coefficient from 0 to 1. Each time you need to determine whether the character should attack, you call rnd() and compare the value with the coefficient: if it's greater than the coefficient, attack; otherwise don't.
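As a rough sketch, assuming rnd() is the uniform [0,1) function described above and the other names are made up:

extern float rnd(void);        /* assumed: uniform value in [0,1) */
float coefficient = 0.5f;      /* threshold in [0,1]; lower means the character attacks more often */

int should_attack(void)
{
    return rnd() > coefficient;    /* attack when the random draw exceeds the threshold */
}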
I thought of something along the lines of counting the average time between combats and the average rate of fire while in combat. To do so, you would need five variables like this:
float avgTimeWithoutCombat;     // running average of peaceful time between fights
float currentTimeWithoutCombat; // peaceful time accumulated since the last fight ended
float avgRateOfFire;            // running average of shots per second while in combat
int shotsFired;                 // shots fired in the current fight
float combatTimeElapsed;        // duration of the current fight
As long as the player does not fight, you accumulate the "peaceful" time in currentTimeWithoutCombat. When he fires, add it to avgTimeWithoutCombat and divide by 2 to get a simple running average, then reset it.
Every time the player attacks, increase shotsFired. Also count how long the combat has lasted. Check in the main loop whether a certain time has elapsed without any attacks. If so, the combat has ended and you can calculate avgRateOfFire.
Now you have two values to measure the aggressiveness of the player:
avgTimeWithoutCombat: The bigger this number, the less aggressive the player.
avgRateOfFire: The bigger this number, the more aggressive the player.
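A possible way to wire those variables up, sketched in C; the per-frame delta time, the combat timeout, and the extra timeSinceLastShot helper are assumptions for illustration:

float avgTimeWithoutCombat = 0.0f;
float currentTimeWithoutCombat = 0.0f;
float avgRateOfFire = 0.0f;
int   shotsFired = 0;
float combatTimeElapsed = 0.0f;

float timeSinceLastShot = 0.0f;          /* helper for detecting the end of a fight */
const float COMBAT_TIMEOUT = 3.0f;       /* seconds without attacks before combat counts as over */

void on_player_attack(void)
{
    if (shotsFired == 0) {
        /* combat begins: fold the peaceful time into the running average and reset it */
        avgTimeWithoutCombat = (avgTimeWithoutCombat + currentTimeWithoutCombat) / 2.0f;
        currentTimeWithoutCombat = 0.0f;
    }
    shotsFired++;
    timeSinceLastShot = 0.0f;
}

void per_frame_update(float dt)          /* dt = seconds since the last frame */
{
    if (shotsFired == 0) {
        currentTimeWithoutCombat += dt;  /* still peaceful */
    } else {
        combatTimeElapsed += dt;
        timeSinceLastShot += dt;
        if (timeSinceLastShot > COMBAT_TIMEOUT) {
            /* combat over: average in the shots-per-second of this fight */
            avgRateOfFire = (avgRateOfFire + (float)shotsFired / combatTimeElapsed) / 2.0f;
            shotsFired = 0;
            combatTimeElapsed = 0.0f;
        }
    }
}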
I'm trying to solve a variant of 2048 with a Monte-Carlo Tree Search. I found that UCT could be a good way to get some trade-off between exploration and exploitation.
My only issue is that all the versions I've seen assume that the score is a win percentage. How can I adapt it to a game where the score is the value of the board in the final state, and thus ranges from 1 to MAX rather than being a win or a loss?
I could normalize the score (so the constant c still applies) by dividing by MAX, but then it would overweight exploration in the early stages of the game (since you get a bad average score) and overweight exploitation in the late stages.
Indeed, most of the literature assumes your games are either lost or won and awards a score of 0 or 1, which turns into a win ratio when averaged over the number of games played. The exploration parameter C is then usually set to sqrt(2), which is optimal for UCB in bandit problems.
To find out what a good C is in general you have to step back a bit and see what the UCT is really doing. If one node in your tree had an exceptionally bad score in the one rollout it had then exploitation says you should never choose it again. But you've only played that node once, so it might have just been bad luck. To acknowledge this you give that node a bonus. How much? Enough to make it a viable choice even if its average score is the lowest possible and some other node has the highest average score possible. Because with enough plays it might turn out that the one rollout your bad node had was indeed a fluke, and the node actually turns out to be pretty reliable with good scores. Of course, if you get more bad scores then it will likely not be bad luck so it won't deserve more rollouts.
So with scores ranging from 0 to 1, a C of sqrt(2) is a good value. If your game has a maximum achievable score, you can normalize your scores by dividing by the max, forcing them into the 0-1 range to suit a C of sqrt(2). Alternatively, you don't normalize the scores but multiply C by your maximum score. The effect is the same: the UCT exploration bonus is large enough to give your underdog nodes some rollouts and a chance to prove themselves.
There is an alternative way of setting C dynamically that has given me good results. As you play, you keep track of the highest and lowest scores you've ever seen in each node (and subtree). This is the range of scores possible, and it gives you a hint of how big C should be in order to give not-well-explored underdog nodes a fair chance. Every time I descend into the tree and pick a new root, I adjust C to be sqrt(2) * score range for the new root. In addition, as rollouts complete and their scores turn out to be a new highest or lowest score, I adjust C in the same way. By continually adjusting C this way, both as you play and as you pick a new root, you keep C as large as it needs to be to converge but as small as it can be to converge fast. Note that the minimum score is as important as the max: if every rollout will yield at minimum a certain score, then C won't need to overcome it. Only the difference between max and min matters.
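To make that concrete, here is roughly what UCT child selection with a range-scaled exploration constant could look like in C; the node layout and field names are assumptions, not the poster's code:

#include <math.h>
#include <float.h>

typedef struct Node {
    double total_score;        /* sum of rollout scores through this node */
    int    visits;
    double min_score;          /* lowest rollout score seen in this subtree */
    double max_score;          /* highest rollout score seen in this subtree */
    struct Node **children;
    int    n_children;
} Node;

Node *select_child(const Node *parent)
{
    /* scale sqrt(2) by the score range observed under the current root */
    double C = sqrt(2.0) * (parent->max_score - parent->min_score);
    Node *best = NULL;
    double best_value = -DBL_MAX;
    for (int i = 0; i < parent->n_children; i++) {
        Node *child = parent->children[i];
        if (child->visits == 0)
            return child;                               /* try unvisited children first */
        double exploit = child->total_score / child->visits;
        double explore = C * sqrt(log((double)parent->visits) / child->visits);
        if (exploit + explore > best_value) {
            best_value = exploit + explore;
            best = child;
        }
    }
    return best;
}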
I've been studying neural networks lately. I'll explain my goal: I'm trying to teach monsters to walk, stand, and basically perform actions that "reward" them (maximize the fitness function).
The NN receives sensor inputs and outputs muscle activity. The problem comes down to training the weights and biases of the neurons.
My problem is that I'm not sure if I'm doing things right, and with neural networks I can make a mistake and never know about it. So I'll explain what I'm doing in general, and if you spot a mistake please correct me!
1) I create a neural network with neurons that use hyperbolic tangent transfer function.
2) Create a population of random "chromosomes", each containing an array of doubles as genes (the weights and biases in the NN), the length of the array being the number of weights and biases in the NN. The genes have a lower and upper limit, usually [-2,2], within which their random values are generated at initialization and mutation.
For each generation:
3) For each chromosome, I update the NN weights and test the monster for about 5000 frames. Every 10 frames, network outputs are generated from the sensor inputs. The outputs are double values normalized to [0,1], and they control "muscles" (springs) in the body by changing their neutral length according to that value. The fitness value is then calculated.
4) Perform the genetic algorithm operators: first create crossovers with ~0.4 probability, then mutate with ~0.1 probability, depending on chromosome length. Mutation randomizes the gene to a value between its lower and upper limit (see the sketch after this list). Elitism: the two best solutions are left unchanged for the next generation.
Repeat until generations>maxGenerations or max fitness is reached.
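If it helps, a bounded mutation step along the lines of step 4 might look like this in C; the gene limits and the rand01() helper are assumptions, not your code:

#include <stdlib.h>

#define GENE_MIN (-2.0)
#define GENE_MAX ( 2.0)

double rand01(void) { return (double)rand() / (double)RAND_MAX; }   /* uniform in [0,1] */

void mutate(double *genes, int n_genes, double mutation_rate)
{
    for (int i = 0; i < n_genes; i++) {
        if (rand01() < mutation_rate) {
            /* re-randomize the gene inside its limits, as described in step 4 */
            genes[i] = GENE_MIN + rand01() * (GENE_MAX - GENE_MIN);
        }
    }
}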
I'm not sure about a few things in my code: should there be a limit on the weights and biases? If yes, it constrains the potential results the NN could achieve. If no, then how do I initialize the values, and mutate? I'm afraid that adding a random value as mutation will get stuck in local optima, like hill climbing. No limit would reduce the number of parameters I need to consider when initializing the whole thing, which is nice!
Is hyperbolic tangent a good choice? Why or why not?
Do I have to normalize the input sensor data? If yes, between what values?
Also, I'm not sure whether I'm making a mistake by outputting a double value for flexing instead of a binary one. Higher than 0.5 means flex, lower means release, could be an option, whereas now I'm just using the value as the flex amount.
Don't consider bugs in my code as a reason for bad results, because I checked many times and implemented XOR, which worked perfectly.
I would greatly appreciate any help, thank you!
I assume you are referring to feed-forward neural networks, i.e., forward-connected layers of neurons.
It's OK to use hyperbolic tangent or a sigmoid function. Just make sure they are continuous and differentiable over their domain; otherwise the learning algorithm (gradient descent) might not feed the error back correctly into the first layers.
You should normalize each input either to a range such as [-1,+1] or to [-std,+std] using the z-score. That way, all of your inputs will have a similar weight in the decision function.
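A minimal z-score normalization of one input channel could look like this (a sketch, not part of the original answer):

#include <math.h>

/* Normalize a sensor channel to zero mean and unit variance (z-score). */
void zscore(float *x, int n)
{
    float mean = 0.0f, var = 0.0f;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (x[i] - mean) * (x[i] - mean);
    float std = sqrtf(var / n);
    if (std > 0.0f)
        for (int i = 0; i < n; i++) x[i] = (x[i] - mean) / std;
}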
You do not specify the targets of your outputs, whether they are discrete or floating point.
I also wonder: since FFNNs are supervised, what data are you training your algorithm with?
Given is an array of 320 elements (int16) which represents an audio signal (16-bit LPCM) of 20 ms duration. I am looking for a very simple and very fast method to decide whether this array contains active audio (like speech or music) rather than noise or silence. I don't need very high-quality decisions, but it must be very fast.
It occurred to me first to add up all the squares or absolute values of the elements and compare their sum with a threshold, but such a method is very slow on my system, even though it is O(n).
You're not going to get much faster than a sum-of-squares approach.
One optimization that you may not be doing so far is to use a running total. That is, at each time step, instead of summing the squares of the last n samples, keep a running total and update it with the square of the most recent sample. To keep the running total from growing and growing over time, add an exponential decay. In pseudocode:
decay_constant = 0.999;   // Some suitable value smaller than 1
total = 0;
for t = 1, 2, ...
    // Exponential decay
    total = total * decay_constant;
    // Add in the square of the latest sample
    total += current_sample * current_sample;
    if total > threshold
        // do something
    end
end
Of course, you'll have to tune the decay constant and threshold to suit your application. If this isn't fast enough to run in real time, you have a seriously underpowered DSP...
You might try calculating two simple "statistics". The first would be the spread (max - min); silence will have a very low spread. The second would be variety: divide the range of possible values into, say, 16 brackets of equal width, and as you go through the elements, determine which bracket each element falls into. Noise will have similar counts in all brackets, whereas music or speech should prefer some of them while neglecting others.
This should be possible to do in just one pass through the array, and you do not need complicated arithmetic, just some additions and comparisons of values.
Also consider some approximation, for example taking only every fourth value, which reduces the number of checked elements to 80. For an audio signal, this should be okay.
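A one-pass sketch of both statistics in C, assuming 16-bit samples, 16 equal-width brackets, and a step parameter for the every-fourth-sample approximation (names invented):

#include <stdint.h>

/* Returns the spread (max - min) and fills bracket_counts[16] in a single pass. */
int analyze_frame(const int16_t *samples, int n, int step, int bracket_counts[16])
{
    int16_t min = samples[0], max = samples[0];
    for (int i = 0; i < 16; i++) bracket_counts[i] = 0;
    for (int i = 0; i < n; i += step) {                /* step = 4 checks only every fourth sample */
        int16_t s = samples[i];
        if (s < min) min = s;
        if (s > max) max = s;
        bracket_counts[((int)s + 32768) >> 12]++;      /* 65536 / 16 = 4096-wide brackets */
    }
    return (int)max - (int)min;                        /* silence gives a very low spread */
}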
I did something like this a while back. After some experimentation I arrived at a solution that worked sufficiently well in my case.
I used the rate of change in the cube of the running average over about 120 ms. When there is silence (that is, only noise), the expression hovers around zero. As soon as the rate starts increasing over a couple of runs, you probably have some action going on.
rate = cur_avg^3 - prev_avg^3
I used a cube because the square just wasn't aggressive enough. If the cube is too slow for you, try using the square and a bit shift instead. Hope this helps.
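In code form the idea might look roughly like this; the per-frame averaging window and the threshold are assumptions, not the original tuning:

#include <stdint.h>

static float prev_avg = 0.0f;     /* running average from the previous frame */

/* Returns nonzero when the cube of the average amplitude is rising fast enough. */
int detect_activity(const int16_t *frame, int n, float threshold)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += (frame[i] < 0) ? -(float)frame[i] : (float)frame[i];
    float cur_avg = sum / (float)n;
    float rate = cur_avg * cur_avg * cur_avg - prev_avg * prev_avg * prev_avg;
    prev_avg = cur_avg;
    return rate > threshold;      /* a rising rate over a couple of frames suggests activity */
}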
Clearly the complexity must be at least O(n). Some simple algorithm that computes a value range is probably good enough for the moment, but I would look up Voice Activity Detection on the web and look for related code samples.
While thinking about this question and conversing with the participants, the idea came up that shuffling a finite set of clearly biased random numbers makes them random because you don't know the order in which they were chosen. Is this true, and if so, can someone point to some resources?
EDIT: I think I might have been a little unclear. Suppose you have a bad random number generator and take n values from it. These are biased (the RNG is bad). Is there a way, through shuffling, to make the output of the RNG over multiple trials statistically match the output of a known good RNG?
False.
There is an easy test: assume the bias in the original set-creation algorithm is "creates sets whose arithmetic average is significantly lower than the expected average". Obviously, shuffling the result of the algorithm will not change the average and thus will not remove the bias.
Also, regarding your clarification: how would you shuffle the set? Using the same bad output from the bad RNG that created the set in the first place? Or using a better RNG? Which raises the question of why you don't just use that directly.
It's not true. In the other question, the problem is to select 30 random numbers in [1..9] with a sum of 200. After choosing on average about 20 of them randomly, you reach a point where you can't select nines anymore because that would push the total sum over 200. Of the remaining 10 numbers, most will be ones and twos. So in the end, ones and twos are heavily overrepresented in the selected numbers. Shuffling doesn't change that. But it's not clear what the random distribution really should look like, so one could say this is as good a solution as any.
In general, if your "random" numbers will be biased to, say, low numbers, they will be biased that way no matter the ordering.
Just shuffling a set of already-random numbers won't do anything to the probability distribution, of course. That would mean false. Perhaps I misunderstand your question, though?
I would say false, with a caveat:
I think there is random, and then there is 'random-enough'. For most applications that I have needed to work on, 'random-enough' was more than enough, i.e. picking a 'random' ad to display on a page from a list of 300 or so that have paid to be placed on that site.
I am sure a mathematician could prove my very basic 'random' selection criteria is not truly random at all, but in fact is predictable - for my clients, and for the users, nobody cares.
On the other hand, if I were writing a video game to be used in Las Vegas where large amounts of money were at stake, I'd define random differently (and might have a hard time coming up with something truly random).
False
The set is finite; suppose it consists of n numbers. What happens if you choose n+1 numbers? Let's also consider a basic random function as implemented in many languages, which gives you a random number in [0,1). Suppose this number is limited to three digits after the decimal point, giving you a set of 1000 possible numbers (0.000 - 0.999). In most cases you will not need all 1000 of these numbers, so this amount of randomness is more than enough.
However for some uses, you will need a better random generator than this. So it all comes down to exactly how many random numbers you are going to need, and how random you need them to be.
Addition after reading the original question: in the case that you have some sort of constraint (such as in the original question, in which each set of selected numbers must sum up to a certain N), you are not really selecting random numbers per se, but rather choosing numbers in a random order from a given set (specifically, a permutation of numbers summing up to N).
Addition to edit: Suppose your bad number generator generated the sequence (1,1,1,2,2,2). Does the permutation (1,2,2,1,1,2) satisfy your definition of random?
Completely and utterly untrue: shuffling doesn't remove a bias, it just conceals it from the casual observer. It's like removing your dog's fondly-laid present from your carpet by just pushing it under the sofa - you really haven't solved the problem, you've just made it less conspicuous. Anyone with a nose knows that there is still a problem that needs removing.
The randomness must be applied evenly over the whole range, so here's one way (off the top of my head, lots of assumptions, yadda yadda. The point is the approach, not the code - start with everything even, then introduce your randomness in a consistent fashion until you're done. The only bias now is dependent on the values chosen for 'target' and 'numberofnumbers', which is part of the question.)
target = 200
numberofnumbers = 30
numbers = array();
for (i = 0; i < numberofnumbers; i++)
    numbers[i] = 9
while (sum(numbers) > target)
    numbers[random(numberofnumbers)]--
False. Consider a bad random number generator producing only zeros (I said it was BAD :-) No amount of shuffling the zeros would change any property of that sequence.
I want to program a chess engine which learns to make good moves and win against other players. I've already coded a representation of the chess board and a function which outputs all possible moves. So I only need an evaluation function which says how good a given situation of the board is. Therefore, I would like to use an artificial neural network which should then evaluate a given position. The output should be a numerical value: the higher the value, the better the position is for the white player.
My approach is to build a network of 385 neurons: there are six unique chess pieces and 64 fields on the board. So for every field we take 6 neurons (one for each piece type). If there is a white piece, the input value is 1. If there is a black piece, the value is -1. And if there is no piece of that sort on that field, the value is 0. In addition to that, there should be 1 neuron for the player to move. If it is White's turn, the input value is 1, and if it's Black's turn, the value is -1.
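For illustration, that 385-value input vector could be filled like this; the board array and its piece encoding are invented here, not part of the question:

/* board[64]: 0 = empty, 1..6 = white piece type, -1..-6 = black piece type (assumed encoding) */
void encode_position(const int board[64], int white_to_move, float input[385])
{
    for (int i = 0; i < 385; i++) input[i] = 0.0f;
    for (int sq = 0; sq < 64; sq++) {
        int piece = board[sq];
        if (piece > 0) input[sq * 6 + (piece - 1)]  =  1.0f;   /* white piece of that type */
        if (piece < 0) input[sq * 6 + (-piece - 1)] = -1.0f;   /* black piece of that type */
    }
    input[384] = white_to_move ? 1.0f : -1.0f;                 /* neuron for the player to move */
}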
I think the configuration of the neural network is quite good. But the main part is missing: how can I implement this neural network in a programming language (e.g. Delphi)? I think the weights for each neuron should be the same in the beginning. Depending on the result of a match, the weights should then be adjusted. But how? I think I should let two computer players (both using my engine) play against each other. If White wins, Black gets the feedback that its weights aren't good.
So it would be great if you could help me implement the neural network in a coding language (ideally Delphi, otherwise pseudo-code). Thanks in advance!
In case somebody randomly finds this page. Given what we know now, what the OP proposes is almost certainly possible. In fact we managed to do it for a game with much larger state space - Go ( https://deepmind.com/research/case-studies/alphago-the-story-so-far ).
I don't see why you can't have a neural net for a static evaluator if you also do some classic mini-max lookahead with alpha-beta pruning. Lots of Chess engines use minimax with a braindead static evaluator that just adds up the pieces or something; it doesn't matter so much if you have enough levels of minimax. I don't know how much of an improvement the net would make but there's little to lose. Training it would be tricky though. I'd suggest using an engine that looks ahead many moves (and takes loads of CPU etc) to train the evaluator for an engine that looks ahead fewer moves. That way you end up with an engine that doesn't take as much CPU (hopefully).
Edit: I wrote the above in 2010, and now in 2020 Stockfish NNUE has done it. "The network is optimized and trained on the [classical Stockfish] evaluations of millions of positions at moderate search depth" and then used as a static evaluator, and in their initial tests they got an 80-elo improvement when using this static evaluator instead of their previous one (or, equivalently, the same elo with a little less CPU time). So yes it does work, and you don't even have to train the network at high search depth as I originally suggested: moderate search depth is enough, but the key is to use many millions of positions.
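A sketch of how such a static evaluator plugs into alpha-beta search (negamax form); the Board/Move types and the helper functions are hypothetical placeholders, not from any particular engine:

typedef struct Board Board;
typedef int Move;                                        /* placeholder move type */

extern int   generate_moves(const Board *b, Move out[], int max);
extern void  make_move(Board *b, Move m);
extern void  undo_move(Board *b, Move m);
extern float nn_evaluate(const Board *b);                /* trained network, scored from the side to move's view */

float negamax(Board *b, int depth, float alpha, float beta)
{
    if (depth == 0)
        return nn_evaluate(b);                           /* neural net as the static evaluator */
    Move moves[256];
    int n = generate_moves(b, moves, 256);
    if (n == 0)
        return nn_evaluate(b);                           /* no legal moves: let the evaluator score it */
    for (int i = 0; i < n; i++) {
        make_move(b, moves[i]);
        float score = -negamax(b, depth - 1, -beta, -alpha);
        undo_move(b, moves[i]);
        if (score >= beta)
            return beta;                                 /* beta cutoff */
        if (score > alpha)
            alpha = score;
    }
    return alpha;
}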
Been there, done that. Since there is no continuity in your problem (the value of a position is not closely related to another position that differs by only one change in the value of one input), there is very little chance a NN would work. And it never did in my experiments.
I would rather see a simulated annealing system with an ad-hoc heuristic (of which there are plenty out there) to evaluate the value of the position...
However, if you are set on using a NN, it is relatively easy to represent. A general NN is simply a graph, with each node being a neuron. Each neuron has a current activation value and a transition formula to compute its next activation value, based on the input values, i.e. the activation values of all the nodes that have a link to it.
A more classical NN, that is, one with an input layer, an output layer, identical neurons in each layer, and no time dependency, can thus be represented by an array of input nodes, an array of output nodes, and a linked graph of nodes connecting them. Each node possesses a current activation value and a list of nodes it forwards to. Computing the output value is simply a matter of setting the activations of the input neurons to the input values and iterating through each subsequent layer in turn, computing the activation values from the previous layer using the transition formula. When you have reached the last (output) layer, you have your result.
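As a concrete sketch of that layer-by-layer computation, assuming a tanh activation as the transition formula and weights stored row-major as weights[output][input] (both assumptions, not from the answer):

#include <math.h>

/* One fully connected layer: out[j] = tanh(bias[j] + sum_i weights[j][i] * in[i]). */
void forward_layer(const float *in, int n_in,
                   const float *weights, const float *bias,
                   float *out, int n_out)
{
    for (int j = 0; j < n_out; j++) {
        float sum = bias[j];
        for (int i = 0; i < n_in; i++)
            sum += weights[j * n_in + i] * in[i];
        out[j] = tanhf(sum);          /* the transition formula: weighted sum through the activation */
    }
}

Calling this once per layer, starting from the input activations, walks the network exactly as described above.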
It is possible, but not trivial by any means.
https://erikbern.com/2014/11/29/deep-learning-for-chess/
To train his evaluation function, he used a lot of computing power.
To summarize generally, you could go about it as follows. Your evaluation function is a feed-forward NN. Let the matrix computations produce a scalar output valuing how good the move is. The input vector for the network is the board state represented by all the pieces on the board, so say a white pawn is 1, a white knight is 2, ... and an empty square is 0. An example board-state input vector is simply a sequence of values 0-12. This evaluator can be trained on grandmaster games (available in a FICS database, for example), over many games, minimizing the loss between what the current parameters say is the highest-valued move and the move the grandmaster actually made (which should have the highest valuation). This of course assumes that the grandmaster moves are correct and optimal.
What you need to train an ANN is either something like backpropagation learning or some form of genetic algorithm. But chess is such a complex game that it is unlikely a simple ANN will learn to play it, even more so if the learning process is unsupervised.
Further, your question does not say anything about the number of layers. You want to use 385 input neurons to encode the current situation. But how do you want to decide what to do? One neuron per field? Highest excitation wins? But there is often more than one possible move.
Further, you will need several hidden layers - the functions that can be represented with only an input and an output layer, without a hidden layer, are really limited.
So I do not want to discourage you from trying it, but the chances of a successful implementation and training within, say, a year or so are practically zero.
I tried to build and train an ANN to play Tic-tac-toe when I was 16 or so ... and I failed. I would suggest trying such a simple game first.
The main problem I see here is one of training. You say you want your ANN to take the current board position and evaluate how good it is for a player. (I assume you will take every possible move for a player, apply it to the current board state, evaluate via the ANN and then take the one with the highest output - ie: hill climbing)
Your options as I see them are:
Develop some heuristic function to evaluate the board state and train the network off that. But that begs the question of why use an ANN at all, when you could just use your heuristic.
Use some statistical measure such as "How many games were won by white or black from this board configuration?", which would give you a fitness value between white or black. The difficulty with that is the amount of training data required for the size of your problem space.
With the second option you could always feed it board sequences from grandmaster games and hope there is enough coverage for the ANN to develop a solution.
Due to the complexity of the problem I'd want to throw the largest network (ie: lots of internal nodes) at it as I could without slowing down the training too much.
Your input algorithm is sound - all positions, all pieces, and both players are accounted for. You may need an input layer for every past state of the gameboard, so that past events are used as input again.
The output layer should (in some form) give the piece to move, and the location to move to.
Write a genetic algorithm using a connectome which contains all neuron weights and synapse strengths, and begin multiple separated gene pools with a large number of connectomes in each.
Make them play one another, keep the best handful, crossover and mutate the best connectomes to repopulate the pool.
Read Blondie24: http://www.amazon.co.uk/Blondie24-Playing-Kaufmann-Artificial-Intelligence/dp/1558607838.
It deals with checkers instead of chess but the principles are the same.
Came here to say what Silas said. Using a minimax algorithm, you can expect to be able to look ahead N moves. Using Alpha-beta pruning, you can expand that to theoretically 2*N moves, but more realistically 3*N/4 moves. Neural networks are really appropriate here.
Perhaps though a genetic algorithm could be used.