How would Transposition Tables work with Hypermax? - artificial-intelligence

I was wondering if someone out there could help me understand how Transposition Tables could be incorporated into the Hypermax algorithm. Any examples, pseudo-code, tips, or implementation references would be much appreciated!
A little background:
Hypermax is a recursive game tree search algorithm for n-player games, typically with 3+ players. It's an extension of minimax and alpha-beta pruning.
Generally, at each node in the game tree the current player (the chooser) looks at all of the moves it can make and chooses the one that maximizes its own utility. This differs from minimax / negamax, where players alternately maximize and minimize.
I understand how transposition tables work, but I don't know how the values stored in them would be used to initiate cutoffs when a transposition table entry is found. A transposition flag is required in minimax with transposition tables & alpha-beta pruning; I can't seem to wrap my head around how that would be incorporated here.
Hypermax Algorithm without Transposition Tables in Javascript:
/**
 * @param {*} state A game state object.
 * @param {number[]} alphaVector The alpha vector.
 * @returns {number[]} An array of utility values for each player.
 */
function hypermax(state, alphaVector) {
    // If terminal, return the utilities for all of the players
    if (state.isTerminal()) {
        return state.calculateUtilities();
    }
    // Play out each move
    var moves = state.getLegalMoves();
    var bestUtilityVector = null;
    for (var i = 0; i < moves.length; ++i) {
        var move = moves[i];
        state.doMove(move); // move to child state - updates game board and advances to the next player
        var utilityVector = hypermax(state, alphaVector.slice(0)); // copy the alpha values down
        state.undoMove(move); // return to this state - reverts board updates and rolls back the player
        // Select this as best utility if first found
        if (i === 0) {
            bestUtilityVector = utilityVector;
        }
        // Update alpha
        if (utilityVector[state.currentPlayer] > alphaVector[state.currentPlayer]) {
            alphaVector[state.currentPlayer] = utilityVector[state.currentPlayer];
            bestUtilityVector = utilityVector;
        }
        // Alpha prune: the utilities are zero-sum, so if the alpha values
        // sum to >= 0 the remaining moves cannot matter
        var sum = 0;
        for (var j = 0; j < alphaVector.length; ++j) {
            sum += alphaVector[j];
        }
        if (sum >= 0) {
            break;
        }
    }
    return bestUtilityVector;
}
References:
An implementation of Hypermax without Transposition Tables: https://meatfighter.com/spotai/#references_2
Minimax (negamax variant) with alpha-beta pruning and transposition tables: https://en.wikipedia.org/wiki/Negamax#Negamax_with_alpha_beta_pruning_and_transposition_tables
Original derivation and Proofs of Hypermax: http://uu.diva-portal.org/smash/get/diva2:761634/FULLTEXT01.pdf

The question is quite broad, so this is a similarly broad answer - if there is something specific, please clarify what you don't understand.
Transposition tables are not guaranteed to be correct in multi-player games, but if you implement them carefully they can be. This is discussed briefly in this thesis:
Multi-Player Games, Algorithms and Approaches
To summarize, there are three things to note about transposition
tables in multi-player game trees. First, they require that we be
consistent with our node-ordering. Second, they can be less
effective than in two-player games, due to the fact that it takes more
moves for a transposition to occur. Finally, speculative pruning can
benefit from transposition tables, as they can offset the cost of
re-searching portions of the game tree.
Beyond ordering issues, you may need to store things like the depth of search underneath a branch, the next player to play, and the bounds used for pruning the subtree. If, for instance, you have different bounds for pruning a tree in your first search, you may not produce correct results in the second search.
HyperMax is only a slight variant of Max^n with speculative pruning, so you might want to look at that context to see if you can implement things in Max^n.

Related

How to select array of integers in UPPAAL?

I am using UPPAAL for a class and I would like to create an array of integers within a range, using a select statement.
For background, I am modelling a modified game of nim with 3 players and 3 heaps, where a player can either pick up to 3 matches from a single heap, or pick the same number of matches from ALL the heaps (assuming there are enough matches left in all of them).
So far I have an apparently working nim game with 3 players (according to some basic queries in the verifier), taking matches from a single heap, but I need to extend the players to be able to take from all the heaps, and I would prefer not to hardcode variables like heap1Taken, heap1TakenAmount, heap2Taken, heap2TakenAmount, etc. :-)
I ended up creating an array int[0, MAX] beru[3]; and two functions, set_beru and beru_init.
void set_beru(int[0, MAX]& beru[3], int[0, 2] index, int[1, MAX] value) {
    for (i : int[0, 2]) {
        if (i == index) {
            beru[i] = value;
        } else {
            beru[i] = 0;
        }
    }
}

void beru_init(int[0, MAX]& beru[3], int[1, MAX] init_value) {
    for (i : int[0, 2]) {
        beru[i] = init_value;
    }
}
A player of the game then has two possible transitions from ready_to_play to playing: one of them selects a heap index and an amount, then calls set_beru; the other selects an amount and calls beru_init. Both of them have guards that make sure the move is legal, of course.
When a player is in the playing state, he signals on a channel and the game board updates the heaps using the beru array. This allows the players to play according to the full set of rules.

Solving Lights out for AI Course

So I was given the following task: Given that all lights in a 5x5 version of a game are turned on, write an algorithm using UCS / A* / BFS / Greedy best first search that finds a solution.
What I did first was realize that UCS would be unnecessary, as the cost of moving from one state to another is always 1 (pressing a button flips itself and the neighbouring ones). So I wrote BFS instead. It turned out that it runs too long and fills up the queue, even though I was careful to delete parent nodes once I was finished with them so as not to overflow the memory. It would run for around 5-6 mins and then crash because of memory.
Next, I wrote DFS (even though it was not mentioned as one of the possibilities) and it did find a solution in 123 secs, at depth 15 (I used depth-limited search because I knew that there was a solution at depth 15).
What I am wondering now is: am I missing something? Is there a good heuristic for solving this problem with A* search? I came up with absolutely nothing on the heuristic front, because finding one doesn't seem at all trivial for this problem.
Thanks very much. Looking forward to some help from you guys
Here is my source code(I think it's pretty straightforward to follow):
struct state
{
    bool board[25];
    bool clicked[25];
    int cost;
    int h;
    struct state* from;
};

int visited[1<<25];
int dx[5] = {0, 5, -5};
int MAX_DEPTH = 1<<30;
bool found = false;

struct state* MakeStartState()
{
    struct state* noviCvor = new struct state();
    for(int i = 0; i < 25; i++) noviCvor->board[i] = false, noviCvor->clicked[i] = false;
    noviCvor->cost = 0;
    //h=...
    noviCvor->from = NULL;
    return noviCvor;
}

struct state* MakeNextState(struct state* temp, int press_pos)
{
    struct state* noviCvor = new struct state();
    for(int i = 0; i < 25; i++) noviCvor->board[i] = temp->board[i], noviCvor->clicked[i] = temp->clicked[i];
    noviCvor->clicked[press_pos] = true;
    noviCvor->cost = temp->cost + 1;
    //h=...
    noviCvor->from = temp;
    int temp_pos;
    for(int k = 0; k < 3; k++)
    {
        temp_pos = press_pos + dx[k];
        if(temp_pos >= 0 && temp_pos < 25)
        {
            noviCvor->board[temp_pos] = !noviCvor->board[temp_pos];
        }
    }
    if( ((press_pos+1) % 5 != 0) && (press_pos+1) < 25 )
        noviCvor->board[press_pos+1] = !noviCvor->board[press_pos+1];
    if( (press_pos % 5 != 0) && (press_pos-1) >= 0 )
        noviCvor->board[press_pos-1] = !noviCvor->board[press_pos-1];
    return noviCvor;
}

bool CheckFinalState(struct state* temp)
{
    for(int i = 0; i < 25; i++)
    {
        if(!temp->board[i]) return false;
    }
    return true;
}

int bijection_mapping(struct state* temp)
{
    int temp_pow = 1;
    int mapping = 0;
    for(int i = 0; i < 25; i++)
    {
        if(temp->board[i])
            mapping += temp_pow;
        temp_pow *= 2;
    }
    return mapping;
}

void BFS()
{
    queue<struct state*> Q;
    struct state* start = MakeStartState();
    Q.push(start);
    struct state* temp;
    visited[ bijection_mapping(start) ] = 1;
    while(!Q.empty())
    {
        temp = Q.front();
        Q.pop();
        visited[ bijection_mapping(temp) ] = 2;
        for(int i = 0; i < 25; i++)
        {
            if(!temp->clicked[i])
            {
                struct state* next = MakeNextState(temp, i);
                int mapa = bijection_mapping(next);
                if(visited[ mapa ] == 0)
                {
                    if(CheckFinalState(next))
                    {
                        printf("SOLUTION FOUND\n"); // originally "NADJENO RESENJE"
                        exit(0);
                    }
                    visited[ mapa ] = 1;
                    Q.push(next);
                }
            }
        }
        delete temp;
    }
}
PS. As I am not using a map anymore (I switched to an array) for visited states, my DFS solution improved from 123 secs to 54 secs, but BFS still crashes.
First of all, you may already recognize that in Lights Out you never have to flip the same switch more than once, and it doesn't matter in which order you flip the switches. You can thus describe the current state in two distinct ways: either in terms of which lights are on, or in terms of which switches have been flipped. The latter, together with the starting pattern of lights, gives you the former.
To employ a graph-search algorithm to solve the problem, you need a notion of adjacency. That follows more easily from the second characterization: two states are adjacent if there is exactly one switch in which they differ. That characterization also directly encodes the length of the path to each node (= the number of switches that have been flipped), and it reduces the number of subsequent moves that need to be considered for each state, since all possible paths to each node are encoded in its pattern of switches.
You could use that in a breadth-first search relatively easily (and this may be what you in fact tried). BFS is equivalent to Dijkstra's algorithm in that case, even without using an explicit priority queue, because you enqueue new nodes to explore in priority (path-length) order.
You can also convert that to an A* search with addition of a suitable heuristic. For example, since each move turns off at most five lights, one could take as the heuristic the number of lights still on after each move, divided by 5. Though that's a bit crude, I'm inclined to think that it would be of some help. You do need a real priority queue for that alternative, however.
As far as implementation goes, do recognize that you can represent both the pattern of lights currently on and the pattern of switches that have been pressed as bit vectors. Each pattern fits in a 32-bit integer, and a list of visited states requires 2^25 bits, which is well within the capacity of modern computing systems. Even if you use that many bytes instead, you ought to be able to handle it. Moreover, you can perform all needed operations using bitwise arithmetic operators, especially XOR. Thus, this problem (at its given size) ought to be computable relatively quickly.
Update:
As I mentioned in comments, I decided to solve the problem for myself, with -- it seemed to me -- very good success. I used a variety of techniques to achieve good performance and minimize memory usage, and in this case, those mostly were complementary. Here are some of my tricks:
I represented each whole-system state with a single uint64_t. The top 32 bits contain a bitmask of which switches have been flipped, and the bottom 32 contain a bitmask of which lights are on as a result. I wrapped these in a struct along with a single pointer to link them together as elements of a queue. A given state can be tested as a solution with one bitwise-and operation and one integer comparison.
I created a pre-initialized array of 25 uint64_t bitmasks representing the effect of each move. One bit set among the top 32 of each represents the switch that is flipped, and between three and five bits set among the bottom 32 represent the lights that are toggled as a result. The effect of flipping one switch can then be computed simply as new_state = old_state ^ move[i].
I implemented plain breadth-first search instead of A*, in part because I was trying to put something together quickly, and in particular because that way I could use a regular queue instead of a priority queue.
I structured my BFS in a way that naturally avoided visiting the same state twice, without having to actually track which states had ever been enqueued. This was based on some insight into how to efficiently generate distinct bit patterns without repeating, with those having fewer bits set generated before those having more bits set. The latter criterion was satisfied fairly naturally by the queue-based approach required anyway for BFS.
I used a second (plain) queue to recycle dynamically-allocated queue nodes after they were removed from the main queue, to minimize the number of calls to malloc().
Overall code was a bit less than 200 lines, including blank and comment lines, data type declarations, I/O, queue implementation (plain C, no STL) -- everything.
Note, by the way, that the priority queue employed in standard Dijkstra and in A* is primarily about finding the right answer (shortest path), and only secondarily about doing so efficiently. Enqueueing and dequeueing from a standard queue can both be O(1), whereas those operations on a priority queue are O(log m) in the number of elements in the queue. A* and BFS both have worst-case queue size upper bounds of O(n) in the total number of states. Thus, BFS will scale better than A* with problem size; the only question is whether the former reliably gives you the right answer, which in this case, it does.

Explore grid with obstacles

I need an algorithm for a robot to explore an n*n grid with obstacles (a maze, if you wish). The goal is to explore all the squares without obstacles in them and avoid the squares with obstacles. The trick is that an obstacle forces the robot to change its path, causing it to miss the possibly free squares behind the obstacle. I can lazily increment/decrement the robot's x/y coordinates to move it in any of the four directions when there are no obstacles, and the robot can re-traverse a previously seen path (if needed) in order to reach other free squares. The algorithm should terminate when ALL the free squares have been visited at least once.
Any simple lazy/efficient way to do this? a pseudo-code will be greatly appreciated
I believe the problem is reducible from the Traveling Salesman Problem, and is thus NP-hard, so you are unlikely to find a polynomial algorithm that solves the problem both optimally and efficiently.
However, you might want to adapt some of the heuristics and approximations for TSP; I believe they can be adjusted to this problem as well, since it seems very close to TSP in the first place.
EDIT:
If finding the shortest path is not a requirement, and you want any path, a simple DFS with a maintained visited set will do. At the step in DFS where you come back from the recursion, you move the robot back to the previous square; this way the robot is ensured to explore all squares, if there is a path to all of them.
pseudo code for DFS:
search(path, visited, current):
    visited.add(current)
    for each square s adjacent to current:
        if (s is an obstacle): // cannot go to a square which is an obstacle
            continue
        if (s is not in visited): // no point going to a square that was already visited
            path.add(s) // go to s
            search(path, visited, s) // recursively visit all squares accessible from s
            path.add(current) // step back from s to current, so you can visit the next neighbor of current
invoke with search([source], {}, source)
Note that optimization heuristics can be used before the for each step - the heuristic will just be to reorder the iteration order of the nodes.
Just keep a list of unexplored neighbors. A clever heuristic for choosing which cell from the list to visit next can be used to make it more efficient if necessary.
Pseudocode (uses a Stack to keep track of the unexplored neighbors, resulting in a DFS):
// init
Cell first_cell;
CellStack stack;
stack.push(first_cell);

while (!stack.isEmpty()) {
    Cell currentCell = stack.pop();
    currentCell.markVisited();
    for (neighbor in currentCell.getNeighbors()) {
        if (!neighbor.isObstacle() && !neighbor.isVisited()) { // skip obstacles and already-visited cells
            stack.push(neighbor);
        }
    }
}
I think the best way to solve the problem is recursively (go to the nearest free cell), with the following additional heuristic: go to the nearest free cell with the fewest free neighbours (prefer dead ends).
Pseudocode:
// init:
for (cell in all_cells) {
    cell.weight = calc_neighbors_number(cell);
}
path = [];

void calc_path(cell)
{
    cell.visited = true;
    path.push_back(cell);
    preferred_cell = null;
    for (neighbor in cell.neighbors) {
        if (neighbor.visited) {
            continue;
        }
        if (preferred_cell == null || preferred_cell.weight > neighbor.weight) {
            preferred_cell = neighbor;
        }
    }
    if (preferred_cell != null) {
        calc_path(preferred_cell);
    }
}

// Start algorithm:
calc_path(start);

Algorithm for 3D dice generation

I am making a simple test application in C that is supposed to generate three dimensional dice. I am going to use OpenGL to do the actual drawing, but I cannot figure out how to actually generate the vertices. Of course, the whole point of this test was to see if my algorithm worked, but I found a major logic error that I cannot fix. Can somebody please point me to an article, website, or something that explains the concept? If not, although I would prefer to do the actual implementation myself, the C code is acceptable.
Basically, this is what I did before I forgot what I was doing for the algorithm:
void calculateVertices(int sides) {
    BOOL isDone = FALSE;
    int vectorsPerSide = 3;
    VDVector3f *vertices = malloc(sizeof(VDVector3f) * (sides + 1));
    VDVector3f *normals = malloc(sizeof(VDVector3f) * (sides + 1));
    while (!isDone) {
        // Start by positioning the first vertex.
        vertices[0] = VDVector3fMake(0.0, 0.0, 1.0);
        for (int index = 1; index <= sides; index++) {
            // (unfinished - this is where I lost track of the algorithm)
        }
        // Not a match, increase the number of vectors
        vectorsPerSide++;
    }
}
Basically, it loops until a match is found. This sounds inefficient to me, but I had no other idea as to how to do this. The first vertex will actually be removed from the array at the end; I was going to use it to create the first side, which would have been used to properly position the others.
My main goal here is to be able to pass number (like 30) to it, and have it set the vertices automatically. I will not have protections against making one sided and two sided dice, because I have something special in mind. I will have those vertices entered elsewhere.
Thanks in advance for the help!
By the way, I have an algorithm that can normalize the completed vertex array. You don't have to bother helping with that.
I don't think it is possible to generalize this. How, for example, would you make a fair 5- or 9-sided die? I don't think I have ever seen such a thing. A quick search on Wikipedia suggests platonic solids may be what you are after: http://en.wikipedia.org/wiki/Platonic_solid

Efficient Implementation of Fitness-Proportionate "Roulette" Selection

I am currently writing a keyboard layout optimization algorithm in C (such as the one designed by Peter Klausler) and I want to implement a fitness-proportionate selection as described here (PDF Link):
With roulette selection you select members of the population based on a roulette wheel model. Make a pie chart, where the area of a member's slice relative to the whole circle is the ratio of the member's fitness to the total population. As you can see, if a point on the circumference of the circle is picked at random, those population members with higher fitness will have a higher probability of being picked. This ensures natural selection takes place.
The problem is, I don't see how to implement it efficiently. I've thought of two methods: one is unreliable, and the other is slow.
First, the slow one:
For a keyboard pool of length N, create an array of length N where each element of the array actually contains two elements, a minimum and a maximum value. Each keyboard has a corresponding minimum and maximum value, and the range is based on the fitness of the keyboard. For example, if keyboard zero has a fitness of 10, keyboard one has a fitness of 20, and keyboard two has a fitness of 25, it would look like this:
Code:
array[0][0] = 0;  // minimum
array[0][1] = 9;  // maximum
array[1][0] = 10;
array[1][1] = 29;
array[2][0] = 30;
array[2][1] = 54;
(In this case a lower fitness is better, since it means less effort is required.)
Then generate a random number. For whichever range that number falls into, the corresponding keyboard is "killed" and replaced with the offspring of a different keyboard. Repeat this as many times as desired.
The problem with this is that it is very slow. It takes O(N^2) operations to finish.
Next the fast one:
First figure out what the lowest and highest fitnesses for the keyboards are. Then generate a random number between (lowest fitness) and (highest fitness) and kill all keyboards with a fitness higher than the generated number. This is efficient, but it's not guaranteed to only kill half the keyboards. It also has somewhat different mechanics from a "roulette wheel" selection, so it may not even be applicable.
So the question is, what is an efficient implementation?
There is a somewhat efficient algorithm on page 36 of this book (Link), but the problem is, it's only efficient if you do the roulette selection only one or a few times. Is there any efficient way to do many roulette selections in parallel?
For one thing, it sounds like you are talking about unfitness scores if you want to "kill off" your selection (which is likely to be a keyboard with high score).
I see no need to maintain two arrays. I think the simplest way is to maintain a single array of scores, which you then iterate through to make a choice:
/* These will need to be populated at the outset */
int scores[100];
int totalScore;

for (int gen = 0; gen < nGenerations; ++gen) {
    /* Perform a selection and update */
    int r = rand() % totalScore; /* HACK: using % introduces bias */
    int t = 0;
    for (int i = 0; i < 100; ++i) {
        t += scores[i];
        if (r < t) {
            /* Bingo! */
            totalScore -= scores[i];
            keyboards[i] = generate_new_keyboard_somehow();
            scores[i] = score_keyboard(keyboards[i]);
            totalScore += scores[i]; /* Now totalScore is correct again */
            break; /* stop after replacing the selected keyboard */
        }
    }
}
Each selection/update takes O(n) time for n keyboards.