Confusion about Transposition Tables (chess programming) - artificial-intelligence

I'm currently using transposition tables for move ordering. Using iterative deepening search, I store the minimax value of the previous iteration to order moves for the next iteration. That's all fine and good.
Here's my confusion:
If I find a certain position in my transposition table, I use the previously calculated score for move ordering (from the previous iteration of iterative deepening). However, if this position's score has already been updated in the current iteration (after minimax returned) and the position is found AGAIN in another subtree within the same iteration, I don't want to just use it for move ordering... I should be able to return the value directly, because the value has now been calculated for this iteration and is exact.
Here's my question: Is it standard to have two transposition tables? One for the previous iteration, and one for the current iteration of iterative deepening. So I would first check the table for the current iteration, to see if the minimax value was calculated already, and simply return this value. If it's not in this table, I would use the table for the previous iteration, for move ordering. If it's in neither, then it's a new position I haven't seen before in this search.
Is this line of thinking correct, or is there a more efficient method?

Typically you'd want to store not only the last minimax value found for a state in your table, but also the depth limit you used in the iteration when you stored that state. That way, when you look it up later, you can tell whether the entry was last updated in a previous iteration or in the current one by comparing the depth limit stored in the entry to your current depth limit. If they are equal, you know the entry was last updated in the current iteration, and you can use the stored value directly without any extra search.
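To make that concrete, here is a minimal single-table sketch (the names TTEntry, TranspositionTable, probe and store are illustrative, not taken from any particular engine): the stored depth decides whether an entry can be returned directly or should only seed move ordering.

#include <cstdint>
#include <vector>

// One entry per slot; an always-replace scheme keeps the sketch short.
struct TTEntry {
    uint64_t key   = 0;   // Zobrist hash of the position
    int      value = 0;   // minimax / negamax score
    int      depth = -1;  // remaining search depth when the entry was stored
};

class TranspositionTable {
public:
    explicit TranspositionTable(std::size_t slots) : entries_(slots) {}

    void store(uint64_t key, int value, int depth) {
        TTEntry& e = entries_[key % entries_.size()];
        e.key = key; e.value = value; e.depth = depth;
    }

    const TTEntry* probe(uint64_t key) const {
        const TTEntry& e = entries_[key % entries_.size()];
        return e.key == key ? &e : nullptr;
    }

private:
    std::vector<TTEntry> entries_;
};

// Inside the search (pseudo-usage):
//   if (const TTEntry* e = tt.probe(hash)) {
//       if (e->depth >= remainingDepth) return e->value; // searched at least this deep: reuse
//       // otherwise use e->value only to order moves
//   }

A real engine would also store a bound type (exact / lower / upper) and check it before returning the value, which is exactly the 2-bit bound field visible in the Stockfish entry quoted in the next answer.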

I agree with @Dennis_Soemers. You should store the depth, and maybe even the alpha/beta bound type, in your transposition table. No, you should not need two tables.
Let's examine the Stockfish source code for its transposition table.
https://github.com/official-stockfish/Stockfish/blob/master/src/tt.h
/// TTEntry struct is the 10 bytes transposition table entry, defined as below:
///
/// key 16 bit
/// move 16 bit
/// value 16 bit
/// eval value 16 bit
/// generation 6 bit
/// bound type 2 bit
/// depth 8 bit
The save function for the table is defined as:
void save(Key k, Value v, Bound b, Depth d, Move m, Value ev, uint8_t g)
Now, if you have two identical positions from depths d-1 and d, you can do something like:
// My hash key is now position + depth
Key my_hash_key = k + d;
You can then easily probe both the previous iteration and the current iteration:
Key previous_iter_key = my_position_key + (d - 1);
probe(previous_iter_key, ...);
Key current_iter_key = my_position_key + d;
probe(current_iter_key, ...);

Related

Support efficient access and insert where indices must shift upon insertion, s.t. A[k] = A'[k+1] for inserting at j and k>j?

I have a special case I'm trying to attack, where I have an initial set of r indices which correspond to positions of set bits in an n-length bitvector. I would like to associate, in some data structure A, a weight w with the kth set bit, such that A[k] = w. However, I also support insert operations on the bitvector: when setting a bit j, I want to perform A.insert(j, w) while shifting the entries to the right so they still correspond to the kth set bit.
The overall problem is harder to explain, but I require the order since k is actually the rank of the set bit, and I need rank/select operations to remain consistent.
Since I need insert/access with shifting, maps won't help. The overall algorithm performs O(n) steps, each requiring an insert and an access, and for my data size I'd like to avoid the overall O(n^2) time/space that comes with O(n) insert (arrays) or O(n) access (lists).
Is there an obvious approach here? I'd like to keep the weights such that array-style access (kth element) is possible regardless of which key is used to balance the structure.

Algorithm for finding the lowest price to get through an array

I've been thinking a lot about the following problem:
We are given an array of n numbers. We start at the first index and our task is to get to the last index. Every move we can jump one or two steps forward and the number at the index we jump to represents the cost we need to pay for visiting that index. We need to find the cheapest way for getting to the end of the array.
For example, if the array is [2,1,4,2,5], the cheapest way to get to the end costs 10: we visit the indexes 1->2->4->5 and pay 2+1+2+5 = 10, which is the cheapest possible way. Let f(i) be the cheapest price to get to index i. We can calculate this easily in O(n) time with dynamic programming by realizing that f(i) = arr[i] + min(f(i-1), f(i-2)).
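For reference, a short sketch of that O(n) dynamic program (the function name cheapestPath is mine), matching the example above:

#include <algorithm>
#include <vector>

// f(i) = arr[i] + min(f(i-1), f(i-2)); the start index is always paid.
// Assumes the array is non-empty.
long long cheapestPath(const std::vector<long long>& arr) {
    std::size_t n = arr.size();
    std::vector<long long> f(n);
    f[0] = arr[0];
    if (n > 1) f[1] = arr[0] + arr[1];
    for (std::size_t i = 2; i < n; ++i)
        f[i] = arr[i] + std::min(f[i - 1], f[i - 2]);
    return f[n - 1];
}
// cheapestPath({2, 1, 4, 2, 5}) == 10, as in the example.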
But here's the twist:
The array gets updated several times and after every update we need to be able to tell in O(logn) time what is the cheapest way at the moment. Updating the array happens by telling the index which will be changed and the number it will be changed to. For example the update could be arr[2] = 7 changing our example array to [2,7,4,2,5]. Now the cheapest way would be 11.
Now how can we support these updates in O(logn) time? Any ideas?
Here's what I've come up with so far:
First I would create an array f for the dynamic programming as described before. I would store the content of this array in a segment tree s in the following way: s(i) = f(i) - f(i-1). This would allow me to update intervals of f (adding a constant to every value) in O(logn) time and to ask for the value at a given index in O(logn) time. This would come in handy, since after some updates it often happens that all the values in f need to be increased by a constant from some given index onwards. So by asking for the value at index n in the segment tree (i.e. the prefix sum of s up to n, which equals f(n)) after every update, we would get the answer we need.
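As an illustration of just that building block (not the final solution), the difference-array trick can be sketched with a Fenwick tree instead of a full segment tree, since only "add a constant to a suffix of f" and "read one f(i)" are needed; the names below are mine:

#include <vector>

// Fenwick tree over the difference array s(i) = f(i) - f(i-1), 1-indexed.
// addSuffix(l, d) adds d to every f(i) with i >= l;
// valueAt(i) returns f(i) as the prefix sum s(1) + ... + s(i).
struct Fenwick {
    std::vector<long long> bit;
    explicit Fenwick(int n) : bit(n + 1, 0) {}

    void addSuffix(int l, long long d) {
        for (; l < (int)bit.size(); l += l & -l) bit[l] += d;
    }
    long long valueAt(int i) const {
        long long s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    }
};
// Load the initial f with addSuffix(i, f[i] - f[i-1]) for each i (treating f[0] as 0);
// afterwards valueAt(n) returns the current f(n).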
There are however different things that can happen after an update:
Only one value in f needs to be updated. For example, if [2,1,4,2,5,4,9,3,7] gets changed to [2,1,9,2,5,4,9,3,7], only f(3) would need to be updated, since no cheapest way went through the 3rd index anyway.
All the values in f after a given index need to be updated by a constant. This is what the segment tree is good for.
Every other value in f after a given index needs to be updated by a constant.
Something more random.
Alright, I managed to solve the problem all myself so I decided to share the solution with you. :)
I was on the right track with dynamic programming and a segment tree, but I was feeding the segment tree in the wrong way in my previous attempts.
Here's how we can support the updates in O(logn) time:
The idea is to use a binary segment tree where the leaves of the tree represent the current array and every node stores 4 different values.
v1 = The lowest cost to get from the leftmost descendant to the rightmost descendant
v2 = The lowest cost to get from the leftmost descendant to the second rightmost descendant
v3 = The lowest cost to get from the second leftmost descendant to the rightmost descendant
v4 = The lowest cost to get from the second leftmost descendant to the second rightmost descendant
By descendants I mean the descendants of the node that are also leaves.
When updating the array, we update the value at the leaf and then at all of its ancestors up to the root. Since at every node we already know all 4 of these values for its two children, we can easily calculate the new 4 values for the parent node. To give an example: v1_current_node = min(v2_leftchild + v1_rightchild, v1_leftchild + v1_rightchild, v1_leftchild + v3_rightchild). The other three values can be calculated in a similar way.
Since there are only O(logn) ancestors for every leaf, and all 4 values are calculated in O(1) time it takes only O(logn) time to update the entire tree.
Now that we know the 4 values for every node, we can in a similar way calculate the lowest cost from the first to the nth node by combining the canonical nodes (the nodes covering the largest powers of 2) that together cover the first n leaves. For example, if n = 11 and we want the lowest cost from the first to the eleventh node, this can be done using the node that covers leaves 1-8, the node that covers leaves 9-10, and the leaf node 11. For each of those three nodes we know the 4 values described above, and we can combine that information in a similar way to figure out the answer. At most O(logn) nodes need to be considered for this, so it is not a problem.
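Here is a compact sketch of that tree. The class and member names are mine, and so is the leaf initialization, which is one way to make the combination rule above work out: a single cell can either be paid for (v1 = its cost) or skipped over entirely (v4 = 0), while starting or ending "beside" a single cell is impossible (v2 = v3 = infinity). Since the query always runs from the first to the last index of the whole array, the root's v1 is the answer directly; the prefix decomposition described above is only needed for queries shorter than the whole array.

#include <algorithm>
#include <vector>

struct CostTree {
    // Large enough to mean "impossible", small enough that sums never overflow.
    static constexpr long long INF = 100000000000000000LL;  // ~1e17

    // For the leaf range [lo..hi] of a node:
    //   v1 = min cost lo -> hi      v2 = min cost lo -> hi-1
    //   v3 = min cost lo+1 -> hi    v4 = min cost lo+1 -> hi-1
    struct Node { long long v1, v2, v3, v4; };

    int n;
    std::vector<long long> a;
    std::vector<Node> t;

    static Node leaf(long long cost) { return {cost, INF, INF, 0}; }

    static Node combine(const Node& L, const Node& R) {
        return { std::min({L.v1 + R.v1, L.v1 + R.v3, L.v2 + R.v1}),
                 std::min({L.v1 + R.v2, L.v1 + R.v4, L.v2 + R.v2}),
                 std::min({L.v3 + R.v1, L.v3 + R.v3, L.v4 + R.v1}),
                 std::min({L.v3 + R.v2, L.v3 + R.v4, L.v4 + R.v2}) };
    }

    explicit CostTree(const std::vector<long long>& arr)   // assumes arr is non-empty
        : n((int)arr.size()), a(arr), t(4 * n) { build(1, 0, n - 1); }

    void build(int node, int lo, int hi) {
        if (lo == hi) { t[node] = leaf(a[lo]); return; }
        int mid = (lo + hi) / 2;
        build(2 * node, lo, mid);
        build(2 * node + 1, mid + 1, hi);
        t[node] = combine(t[2 * node], t[2 * node + 1]);
    }

    void update(int node, int lo, int hi, int pos) {
        if (lo == hi) { t[node] = leaf(a[lo]); return; }
        int mid = (lo + hi) / 2;
        if (pos <= mid) update(2 * node, lo, mid, pos);
        else            update(2 * node + 1, mid + 1, hi, pos);
        t[node] = combine(t[2 * node], t[2 * node + 1]);
    }

    void set(int pos, long long cost) { a[pos] = cost; update(1, 0, n - 1, pos); }  // O(log n)
    long long cheapest() const { return t[1].v1; }   // first index -> last index
};

// CostTree t({2, 1, 4, 2, 5});  t.cheapest() == 10
// t.set(1, 7);                  t.cheapest() == 11   (the example update arr[2] = 7, here 0-indexed)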

How to design inserting to an infinite array

Problem statement
Imagine we have an infinite array, where we store integers. When n elements are in the array, only the first n cells are used; the rest are empty.
I'm trying to come up with a data structure / algorithm that is capable of:
checking whether an element is stored
inserting a new element if it is not already stored
deleting an element if it is stored
Each operation has to be in O(sqrt(n)).
Approach 1
I've come across this site, where the following algorithm was presented:
The array is (virtually - imagine this) divided into subarrays of lengths 1, 4, 9, 16, 25, 36, 49, etc.; the last subarray may not be filled entirely, so its length need not be a perfect square.
The assumption is that, when we consider those subarrays as sets, they are in increasing order: every element of a heap further to the right is greater than any element of the heaps to its left.
Each such subarray represents a binary heap. A max heap.
Lookup: go along the first indexes of the heaps (so again 1, 4, 9, 16, ...) until you find the first heap whose max (the max is stored at those indexes) is at least your number, then check that subarray / heap.
Insert: once you have done the lookup, insert the element into the heap where it should be. When that heap is full, take its greatest element and insert it into the next heap, and so on.
Unfortunately, this solution is O(sqrt(n) * log(n)).
How to make it pure O(sqrt(n))?
Idea 2
Since all the operations require the lookup to be performed, I imagine that inserting and deleting would both be O(1). Just a guess. And probably once inserting is done, deleting will be obvious.
Clarification
What does the infinite array mean?
Basically, you can store any number of elements in it. It is infinite. However, there are two restrictions. First - one cell can only store one element. Second - when the array currently stores n elements, only the first n cells can be used.
What about the order?
It does not matter.
Have you considered a bi-parental heap (aka: BEAP)?
The heap maintains a height of sqrt(n), which means that insert, find, and remove all run in O(sqrt(n)) in the worst case.
These structures are described in Munro and Suwanda's 1980 paper Implicit data structures for fast search and update.
Create a linked list of k arrays which represent hash tables.
Per the idea of the first site, let the hash tables be sized to contain 1, 4, 9, 16, 25, 36, 49, ... elements.
With k hash tables the data structure therefore contains N = k(k+1)(2k+1)/6 = O(k^3) elements (by the well-known formula for the sum of squares).
You can then search each hash table in turn for an element. The hash check, insert, and delete operations all work in O(1) expected time (assuming separate chaining so that deletions can be handled gracefully), and since k = O(N^(1/3)), which is well below sqrt(N), this fulfills the time requirements of your algorithm.
If a hash table is full, add an additional one to the linked list. If a hash table is empty, remove it from the list (add it back in if necessary later). List insertion/deletion is O(1) for a doubly-linked list, so this does not affect the time complexity.
Note that this improves on other answers which suggest a straight-out hash table because rehashing will not be required as the data structure grows.
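A rough sketch of this structure (the answer leaves several details open, e.g. which table a new element goes into, so those choices and the names below are mine):

#include <list>
#include <unordered_set>

// A list of hash tables with capacities 1, 4, 9, 16, ...
class SquareTables {
    std::list<std::unordered_set<int>> tables_;

public:
    bool contains(int x) const {
        for (const auto& t : tables_)
            if (t.count(x)) return true;      // O(1) expected per table, O(k) tables
        return false;
    }

    void insert(int x) {
        if (contains(x)) return;
        std::size_t i = 1;
        for (auto& t : tables_) {
            if (t.size() < i * i) { t.insert(x); return; }   // first table with free room
            ++i;
        }
        tables_.emplace_back();               // all tables full: open the next square
        tables_.back().insert(x);
    }

    void erase(int x) {
        for (auto it = tables_.begin(); it != tables_.end(); ++it)
            if (it->erase(x)) {
                if (it->empty()) tables_.erase(it);   // drop empty tables, as suggested above
                return;
            }
    }
};

This keeps each operation at O(k) table probes; keeping k within O(N^(1/3)) after many deletions would need some rebuilding policy that the answer does not spell out.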
I think Approach 1 works; I just think some of the math is wrong.
The number of subarrays is not O(sqrt(n)); it's O(n^(1/3)).
So you get O(log(n) * n^(1/3)) = O((log(n) / n^(1/6)) * n^(1/2)), and since lim(log(n) / n^(1/6)) = 0, this is o(sqrt(n)), i.e. within the O(sqrt(n)) bound.
My CS is a bit rusty, so you'll have to double check this. Please let me know if I got this wrong.
The short answer is that fulfilling all of your requirements is impossible for the simple fact that an array is a representation of elements ordered by index; and if you want to keep the first n elements referenced by the first n indexes, as you say, any deletion can potentially require re-indexing (that is shifting elements up the array) on the order of O(n) operations.
That said, ignoring deletion, this was my earlier proposal. Since your array is infinite, perhaps you won't mind if I bend one of the rules a little. Think of your array as similar to memory addresses in a computer, then build a balanced binary tree, assigning a block of array elements to each node (I'm not too experienced with trees, but I believe you'll need a block of four elements: two for the children, one for the value, and one for the height). The elements reserved for the children simply contain the starting indexes of the children's blocks (nodes). You would use 4n = O(n) cells instead of strictly n for the first n elements (bending your rule a little), and get much better complexity, since the operations on a balanced BST are O(log2 n). Instead of assigning blocks of elements, node construction could also be done by dividing each array element into sections of bits, of which you would likely have enough in a theoretically infinite scenario.
Since you are storing integers, just make the array 4 billion ints wide. Then, when you add an element, increment the counter at the index equal to that element by 1. Adding, removing, and checking for an element will all take O(1) time. It's basically just a hash table without the hash.
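Taken literally, that is a direct-address table, something like the sketch below; the obvious catch is that 2^32 four-byte counters is 16 GiB of memory.

#include <cstdint>
#include <vector>

// One counter per possible 32-bit value ("a hash table without the hash").
struct DirectTable {
    std::vector<uint32_t> count;
    DirectTable() : count(1ULL << 32, 0) {}           // 2^32 counters

    void insert(uint32_t x)          { ++count[x]; }
    void remove(uint32_t x)          { if (count[x] > 0) --count[x]; }
    bool contains(uint32_t x) const  { return count[x] > 0; }
};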

Unique set sums from a list

I'm doing some algorithm practice and came across this.
I have a list/array, something like this: [1, 5, 4, 9, 10, 2, ...]
How would I go about finding unique sets of the list that have the same sum? For example, (5, 4) would equal (9), and (1, 5) would equal 6, and so on.
I am familiar with finding all the subsets of a list, but the added trick here is that the sets have to be unique, as in: if one index was used for one set, the same index cannot be used for another set.
Any thoughts? Thanks.
Edit:
After thinking about this some more, here is what I have. I make a list of all possible sets, not worrying about uniqueness. Then I get the min and max of the original superset array. I loop through the values from min to max, incrementing by 1, and check each set against a hashmap I create. If the sum of a set is equal to the value we are checking, we add that set to a list; additionally, for every index used by that set we set the corresponding key in the hashmap to True. We keep checking each set against the hashmap, skipping any set that reuses an index that is already marked. Then we return the list of lists, which should contain only the unique sets.
Make sense?
"Make a list of all possible sets" - this is of course exponential in time and space. Here is a polynomial-time solution (if I understood the problem):
Iterate over the list keeping two pointers: each time the first one is incremented, the second one runs from that index+1 to the end of the list.
For every combination of two elements, if their sum is lower than the required sum, their combined value is pushed onto the end of the list (keeping the original indexes that comprise it).
Once a combination of elements that sums to exactly the required value is found, all the elements at the corresponding indexes are removed from the list, and that set is inserted into the solution vector of disjoint sets.
At the end of this iterative process you will have a maximal collection of disjoint sets - not necessarily the one with the most elements or the most groups, but one such that no further set with the required sum could be created from the elements remaining in the list.

Determine if more than half the keys in an array of size n are the same key in O(n) time? [duplicate]

You have an array or list of keys of known size n. It is unknown how many unique keys there are in this list, could be as little as 0 and up to and including n. The keys are in no particular order and they really can't be, as these keys have no concept of greater than or less than, only equality or inequality. Now before you say hash map, here's one more condition that I think throws a wrench in that idea: The value of each key is private. The only information you can get about the key is whether or not it is equal to another key. So basically:
class key{
private:
T data;
...
public:
...
bool operator==(const key &k) const { return data == k.data; }
bool operator!=(const key &k) const { return data != k.data; }
};
key array[n];
Now, is there an algorithm that can determine if more than half of the keys in the array are the same key in linear time? If not, what about O(n*log(n))? So for example say the array only has 3 unique keys. 60% of the array is populated with keys where key.data==foo, 30% key.data==bar and 10% key.data==derp. The algorithm only needs to determine that more than 50% of the keys are of the same kind (keys with data==foo) and also return one of those keys.
According to my professor it can be done in O(n) time but he says we only have to find one that can do it in O(n*log(n)) time.
If you can extract and hold any key for further comparisons, then the Boyer-Moore Majority Vote Algorithm is your friend.
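A sketch of that algorithm over the question's key type, using only == as required (the function name is mine; two passes, O(n) comparisons, O(1) extra space):

#include <vector>

// Pass 1 finds the only possible majority candidate; pass 2 verifies it.
// Returns a pointer to one of the majority keys, or nullptr if none exists.
template <typename Key>
const Key* majority(const std::vector<Key>& a) {
    if (a.empty()) return nullptr;
    std::size_t candidate = 0, count = 1;
    for (std::size_t i = 1; i < a.size(); ++i) {
        if (count == 0)                { candidate = i; count = 1; }
        else if (a[i] == a[candidate]) ++count;
        else                           --count;
    }
    std::size_t occurrences = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] == a[candidate]) ++occurrences;
    return 2 * occurrences > a.size() ? &a[candidate] : nullptr;
}

This meets the professor's O(n) bound: one candidate pass plus one counting pass (it assumes operator== is const, as in the corrected snippet above).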
If you don't want to use the BM algorithm, you could use the following two algorithms, based on the same idea.
Algorithm a. Maintain a set S of M element-count pairs (M being a small fraction of N, for example 10) while going through the array; for each element:
1.1. If element E is in the set, increase the count of the corresponding pair: (E, count) -> (E, count+1).
1.2. If not, drop the element with the minimal count and insert the new pair (E, 1).
If an element has frequency F > 0.5, it will be in the set at the end of this procedure with probability (very roughly; actually much higher) 1 - (1-F)^M. In a second run, calculate the actual frequencies of the elements in the set S.
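A sketch of Algorithm a as described (eviction resets the count to 1, exactly as in step 1.2; the names and the choice of a plain vector for the M tracked pairs are mine), again using only ==:

#include <vector>

// Keeps M candidate (index, count) pairs; a second pass measures the exact
// frequency of the survivors. Returns a heavy hitter or nullptr.
template <typename Key>
const Key* heavyHitter(const std::vector<Key>& a, std::size_t M = 10) {
    struct Slot { std::size_t index, count; };
    std::vector<Slot> slots;

    for (std::size_t i = 0; i < a.size(); ++i) {
        bool tracked = false;
        for (Slot& s : slots)
            if (a[i] == a[s.index]) { ++s.count; tracked = true; break; }  // step 1.1
        if (tracked) continue;
        if (slots.size() < M) { slots.push_back({i, 1}); continue; }
        std::size_t victim = 0;                                            // step 1.2: evict rarest
        for (std::size_t j = 1; j < slots.size(); ++j)
            if (slots[j].count < slots[victim].count) victim = j;
        slots[victim] = {i, 1};
    }

    for (const Slot& s : slots) {                 // "second run": exact frequencies
        std::size_t occurrences = 0;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (a[i] == a[s.index]) ++occurrences;
        if (2 * occurrences > a.size()) return &a[s.index];
    }
    return nullptr;
}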
Algorithm b. Take N series, each of M randomly picked elements of the array; in each series select the most frequent element by any method, then calculate its frequency in each series and the mean frequency over all series. The maximal error of this frequency estimate is roughly F (the real frequency) / sqrt(N). So if you get F_e * (1 - 1.0/sqrt(N)) > 0.5, you have found the most frequent element; if you get F_e * (1 + 1.0/sqrt(N)) < 0.5, no such element exists. Here F_e is the estimated frequency.
One solution that comes to my mind: pick the first element from the array, traverse the list, and put all matching elements into a separate arraylist. Then pick the second element from the original list and compare it with the first; if they are equal, skip it and pick the next one. This could be a possible solution.
