Infinite loop in Modified Value Iteration (MDP GridWorld) - artificial-intelligence

Consider a simple 3x4 GridWorld with a step reward of -0.04:
[ ][ ][ ][+1]
[ ][W][ ][-1]
[ ][ ][ ][ ]
where W is a wall and +1/-1 are terminal states. An agent can move in any direction, but it only succeeds in moving in the planned direction 80% of the time; 10% of the time it goes right (relative to the direction) and 10% left.
In the policy iteration algorithm, we first generate a random policy. Let's say this policy gets generated:
[L][L][L][+1]
[L][W][L][-1]
[L][L][L][L ]
where L means left.
Now we run the modified value iteration algorithm until the values at successive iterations no longer differ by much.
We initialize the values to 0 (except for the terminal states):
[0][0][0][+1]
[0][W][0][-1]
[0][0][0][0 ]
But here's what I don't get:
Since we use the formula

newValue(s) = Reward + 0.8 * previousValue(forwardState) + 0.1 * previousValue(leftState) + 0.1 * previousValue(rightState)

to fill in the new values, whatever lies behind the policy direction at a state can never influence that cell's value. Since only the terminal states +1 and -1 can get the value iteration going, and under this policy they always get ignored,
wouldn't that just create an infinite loop?
With each iteration we would always be getting multiples of 0.04, so the differences between successive iterations would always be constant (except for the lower right corner, but it won't influence anything...).
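Here is a minimal sketch in C of the update in question (my own illustration, not from the original post): one synchronous sweep of policy evaluation for the all-left policy, with no discounting, assuming that bumping into the wall or the border leaves the agent in place. Running a few sweeps makes it easy to check whether the iteration differences really stay constant.

#include <stdio.h>

#define ROWS 3
#define COLS 4
#define REWARD -0.04

/* Value of the cell the agent lands in when it tries to move by (dr, dc)
   from (r, c); hitting the wall cell (1,1) or the border means staying put. */
double land(double v[ROWS][COLS], int r, int c, int dr, int dc) {
    int nr = r + dr, nc = c + dc;
    if (nr < 0 || nr >= ROWS || nc < 0 || nc >= COLS || (nr == 1 && nc == 1))
        return v[r][c];
    return v[nr][nc];
}

/* One sweep of policy evaluation for the all-left policy: facing west,
   the 10% slips go to the agent's left (south) and right (north). */
void sweep(double v[ROWS][COLS]) {
    double nv[ROWS][COLS];
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            nv[r][c] = v[r][c];
            if ((r == 1 && c == 1) || (c == 3 && r < 2)) /* wall, terminals */
                continue;
            nv[r][c] = REWARD
                     + 0.8 * land(v, r, c, 0, -1)  /* intended: west */
                     + 0.1 * land(v, r, c, 1, 0)   /* slip: south */
                     + 0.1 * land(v, r, c, -1, 0); /* slip: north */
        }
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            v[r][c] = nv[r][c];
}

int main(void) {
    double v[ROWS][COLS] = {{0, 0, 0, 1}, {0, 0, 0, -1}, {0, 0, 0, 0}};
    for (int it = 1; it <= 5; it++) {
        sweep(v);
        printf("after sweep %d: v[0][0] = %f\n", it, v[0][0]);
    }
    return 0;
}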

Related

Interleaving array {a1,a2,....,an,b1,b2,...,bn} to {a1,b1,a2,b2,...,an,bn} in O(n) time and O(1) space

I have to interleave a given array of the form
{a1,a2,....,an,b1,b2,...,bn}
as
{a1,b1,a2,b2,...,an,bn}
in O(n) time and O(1) space.
Example:
Input - {1,2,3,4,5,6}
Output- {1,4,2,5,3,6}
This is the arrangement of elements by indices:
Initial Index    Final Index
0                0
1                2
2                4
3                1
4                3
5                5
By observation, after working through some examples, I found that ai (i < n/2) goes from index i to index 2i, and bi (i >= n/2) goes from index i to index ((i - n/2) * 2) + 1, with 0-based indices and n the total length. You can verify this yourself. Correct me if I am wrong.
However, I am not able to correctly apply this logic in code.
My pseudo code:
for (i = 0; i < n; i++)
    if (i < n/2)
        swap(arr[i], arr[2*i]);
    else
        swap(arr[i], arr[((i - n/2)*2) + 1]);
It's not working.
How can I write an algorithm to solve this problem?
Element bn is in the correct position already, so let's forget about it and only worry about the other N = 2n-1 elements. Notice that N is always odd.
Now the problem can be restated as "move the element at each position i to position 2i % N"
The item at position 0 doesn't move, so let's start at position 1.
If you start at position 1 and move its item to position 2%N, you have to remember the item at position 2%N before you replace it. Then the one from position 2%N goes to position 4%N, the one from 4%N goes to 8%N, etc., until you get back to position 1, where you can put the remaining item into the slot you left.
You are guaranteed to return to slot 1, because N is odd and multiplying by 2 mod an odd number is invertible. You are not guaranteed to cover all positions before you get back, though. The whole permutation will break into some number of cycles.
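In code, following a single such cycle looks something like this (a minimal C sketch; the function name and the plain int array are my choices, not from the answer):

/* Move every element along the cycle of i -> 2i % N that starts at `start`.
   `carried` holds the item evicted from the slot we last wrote. */
void follow_cycle(int a[], int N, int start) {
    int carried = a[start];
    int i = 2 * start % N;
    while (i != start) {
        int evicted = a[i];
        a[i] = carried;        /* the carried item belongs at position i */
        carried = evicted;     /* now relocate the item that was there   */
        i = 2 * i % N;
    }
    a[start] = carried;        /* close the cycle */
}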
If you can start this process at one element from each cycle, then you will do the whole job. The trouble is figuring out which ones are done and which ones aren't, so you don't cover any cycle twice.
I don't think you can do this for arbitrary N in a way that meets your time and space constraints... BUT if N = 2^x - 1 for some x, then this problem is much easier, because each cycle consists of exactly the cyclic shifts of some bit pattern (multiplying by 2 mod 2^x - 1 cyclically shifts an x-bit string by one position). You can generate a single representative for each cycle (called a cycle leader) in constant time per index. (I'll describe the procedure in an appendix at the end.)
Now we have the basis for a recursive algorithm that meets your constraints.
Given [a1...an,b1...bn]:
Find the largest x such that 2^x <= 2n, and let m = 2^(x-1)
Rotate the middle elements to create [a1...am, b1...bm, a(m+1)...an, b(m+1)...bn]
Interleave the first 2m elements of the array in linear time using the above-described procedure, since that subarray has modulus 2m - 1 = 2^x - 1
Recurse to interleave the last part of the array.
Since the last part of the array we recurse on is guaranteed to be at most half the size of the original, we have this recurrence for the time complexity:
T(N) = O(N) + T(N/2)
= O(N)
And note that the recursion is a tail call, so you can do this in constant space.
Appendix: Generating cycle leaders for shifts mod 2^x - 1
A simple algorithm for doing this is given in a paper called "An algorithm for generating necklaces of beads in 2 colors" by Fredricksen and Kessler. You can get a PDF here: https://core.ac.uk/download/pdf/82148295.pdf
The implementation is easy. Start with x 0s, and repeatedly:
Set the lowest order 0 bit to 1. Let this be bit y
Copy the lower order bits starting from the top
The result is a cycle leader if x-y divides x
Repeat until you have all x 1s
For example, if x=8 and we're at 10011111, the lowest 0 is bit 5. We switch it to 1 and then copy the remainder from the top to give 10110110. 8-5=3, though, and 3 does not divide 8, so this one is not a cycle leader and we continue to the next.
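Here is a minimal C sketch of the procedure as described (the bit-twiddling details are my own, and x is assumed smaller than the width of unsigned int). It prints one leader per cycle; mapping each leader to a starting position for the interleaving pass is left out:

#include <stdio.h>

void print_cycle_leaders(int x) {
    unsigned bits = 0;                 /* start with x 0s; all-zeros is a leader */
    printf("%u\n", bits);
    while (bits != (1u << x) - 1) {
        int y = 0;                     /* find the lowest order 0 bit */
        while (bits & (1u << y)) y++;
        bits |= 1u << y;               /* set bit y to 1 */
        bits &= ~((1u << y) - 1);      /* clear the bits below it... */
        for (int k = 0; k < y; k++) {  /* ...and refill them from the top */
            int src = x - 1 - k % (x - y);
            if (bits & (1u << src)) bits |= 1u << (y - 1 - k);
        }
        if (x % (x - y) == 0)          /* leader iff x-y divides x */
            printf("%u\n", bits);
    }
}

int main(void) { print_cycle_leaders(4); return 0; }

For x = 4 this prints 0, 1 (0001), 3 (0011), 5 (0101), 7 (0111) and 15 (1111), one per rotation class of 4-bit strings.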
The algorithm I'm going to propose is probably not O(n).
It's not based on swapping elements but on moving them, which could be O(1) per move if you have a linked list rather than an array.
Given an array of N = 2n elements, at iteration i (counting from 1) you take the element at position N/2 + i and move it to position 2i (1-based):
a1,a2,a3,...,an,b1,b2,b3,...,bn
a1,b1,a2,a3,...,an,b2,b3,...,bn
a1,b1,a2,b2,a3,...,an,b3,...,bn
a1,b1,a2,b2,a3,b3,...,an,...,bn
and so on.
Example with n = 4, i.e. N = 8:
1,2,3,4,5,6,7,8
1,5,2,3,4,6,7,8
1,5,2,6,3,4,7,8
1,5,2,6,3,7,4,8
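A minimal C sketch of this insertion scheme on a plain array (my own rendering; 0-based, so the element at n + i lands at 2i + 1). The memmove makes it O(n^2) overall, which matches the caveat that only a list would make each move O(1):

#include <string.h>

/* Interleave a[0..2n-1] = {a1..an, b1..bn} by repeatedly pulling the next
   b-element down to its final slot and shifting the block in between. */
void interleave_by_insertion(int a[], int n) {
    for (int i = 0; i < n; i++) {
        int b = a[n + i];                    /* next b-element */
        memmove(&a[2*i + 2], &a[2*i + 1],    /* shift the in-between block */
                (n - i - 1) * sizeof a[0]);
        a[2*i + 1] = b;                      /* drop it into place */
    }
}

On {1,2,3,4,5,6,7,8} this reproduces the steps above: 1,5,2,3,4,6,7,8 then 1,5,2,6,3,4,7,8 then 1,5,2,6,3,7,4,8.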
One idea, which is a little complex, is to suppose each location has the following sort key:
1, 3, 5, ..., 2n-1 | 2, 4, 6, ..., 2n
a1, a2, ..., an    | b1, b2, ..., bn
Then use in-place merging of two sorted arrays, as explained in this article, in O(n) time and O(1) space. However, we need to manage this indexing during the process.
There is a practical linear time* in-place algorithm described in this question. Pseudocode and C code are included.
It involves swapping the first 1/2 of the items into the correct place, then unscrambling the permutation of the 1/4 of the items that got moved, then repeating for the remaining 1/2 array.
Unscrambling the permutation uses the fact that left items move into the right side with an alternating "add to end, swap oldest" pattern. We can find the i'th index in this permutation with this rule:
For even i, the end was at i/2.
For odd i, the oldest was added to the end at step (i-1)/2
*The number of data moves is definitely O(N). The question asks for the time complexity of the unscramble index calculation. I believe it is no worse than O(lg lg N).

Changing the values of an array by the distance of the indexes (C)

I'm having a hard time with this one:
I need to write a function in C that receives a binary array and its size, and the function should calculate and replace the current values with the distance (by indexes) of each 1 to the closest 0.
For example: if the function receives the array {1,1,0,1,1,1,0,1}, then the new values of the array should be {2,1,0,1,2,1,0,1}. It is known that the input has at least one zero.
So the first step I thought about was to locate a pair of zeros (or just one if there is only one) and set them as two indexes (z1, z2). Then I set another index i that checks every time which zero is the closest to it (by absolute value), and then the difference between i and z1 or z2 would be the new value.
I have the plan but things are not going exactly as planned. Basically I deleted the code (it wasn't good anyway), so I would appreciate any help. Thanks!
This problem is based on two things:
Keep an array left[i] which holds the distance from index i to the nearest 0 at or to its left, computed scanning left to right.
Keep an array right[i] which holds the distance from index i to the nearest 0 at or to its right, computed scanning right to left.
Both can be calculated in a single loop each: O(n).
Then for each position take the minimum of left[i] and right[i]. That is the answer for a 1 at position i.
Overall the time complexity is O(n).
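Since the question asks for C, here is a minimal sketch of the two-pass idea (the names are mine; C99 variable-length arrays are used for brevity):

/* Replace each 1 in a[0..n-1] with its distance to the closest 0.
   Assumes the input contains at least one zero, as stated. */
void nearest_zero_distances(int a[], int n) {
    int left[n], right[n];
    int big = n;                              /* larger than any distance */

    int d = big;                              /* left-to-right pass */
    for (int i = 0; i < n; i++) {
        if (a[i] == 0) d = 0; else if (d < big) d++;
        left[i] = d;
    }
    d = big;                                  /* right-to-left pass */
    for (int i = n - 1; i >= 0; i--) {
        if (a[i] == 0) d = 0; else if (d < big) d++;
        right[i] = d;
    }
    for (int i = 0; i < n; i++)               /* take the minimum */
        a[i] = left[i] < right[i] ? left[i] : right[i];
}

For {1,1,0,1,1,1,0,1} this produces {2,1,0,1,2,1,0,1}, matching the example.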

Find an element in an array, but the element can jump

There is an array where all but one of the cells are 0, and we want to find the index of that single non-zero cell. The problem is, every time that you check for a cell in this array, that non-zero element will do one of the following:
move forward by 1
move backward by 1
stay where it is.
For example, if that element is currently at position 10, and I check what is in arr[5], then the element may be at position 9, 10 or 11 after I checked arr[5].
We only need to find the position where the element is currently at, not where it started at (which is impossible).
The hard part is, if we write a for loop, there really is no way to know if the element is currently in front of you, or behind you.
Some more context if it helps:
The interviewer did give a hint: maybe I should move my pointer back after checking some number x of cells. The problem is, when should I move back, and by how many slots?
While "thinking out loud", I started listing a bunch of common approaches hoping that something would hit. When I said recursion, the interviewer did say "recursion is a good start". I don't know if recursion really is the right approach, because I don't see how I can do recursion and #1 at the same time.
The interviewer said this problem can't be solved in O(n^2). So we are looking at at least O(n^3), or maybe even exponential.
Tl;dr: Your best bet is to keep checking each even index in the array in turn, wrapping around as many times as necessary until you find your target. On average you will stumble upon your target in the middle of your second pass.
First off, as many have already said, it is indeed impossible to ensure you will find your target element in any given amount of time. If the element knows where your next sample will be, it can always place itself somewhere else just in time. The best you can do is to sample the array in a way that minimizes the expected number of accesses - and because after each sample you learn nothing except if you were successful or not and a success means you stop sampling, an optimal strategy can be described simply as a sequence of indexes that should be checked, dependent only on the size of the array you're looking through. We can test each strategy in turn via automated means to see how well they perform. The results will depend on the specifics of the problem, so let's make some assumptions:
The question doesn't specify the starting position of our target. Let us assume that the starting position is chosen uniformly across the entire array.
The question doesn't specify the probability that our target moves. For simplicity let's say it's independent of parameters such as the current position in the array, the time passed and the history of samples. Using probability 1/3 for each option gives us the least information, so let's use that.
Let us test our algorithms on an array of 101 elements. Also, let us test each algorithm one million times, just to be reasonably sure about its average case behavior.
The algorithms I've tested are:
Random sampling: after each attempt we forget where we were looking and choose an entirely new index at random. Each sample has an independent 1/n chance of succeeding, so we expect to take n samples on average. This is our control.
Sweep: try each position in sequence until our target is found. If our target wasn't moving, this would take n/2 samples on average. Our target is moving, however, so we may miss it on our first sweep.
Slow sweep: the same, except we test each position several times before moving on. Proposed by Patrick Trentin with a slowdown factor of 30x, tested with a slowdown factor of 2x.
Fast sweep: the opposite of slow sweep. After the first sample we skip (k-1) cells before testing the next one. The first pass starts at ary[0], the next at ary[1] and so on. Tested with each speed up factor (k) from 2 to 5.
Left-right sweep: First we check each index in turn from left to right, then each index from right to left. This algorithm would be guaranteed to find our target if it was always moving (which it isn't).
Smart greedy: Proposed by Aziuth. The idea behind this algorithm is that we track each cell's probability of holding our target, then always sample the cell with the highest probability. On one hand this algorithm is relatively complex, on the other hand it sounds like it should give us optimal results.
Results:
The results are shown as [average] ± [standard deviation].
Random sampling: 100.889145 ± 100.318212
At this point I have realised a fencepost error in my code. Good thing we have a control sample. This also establishes that we have in the ballpark of two or three digits of useful precision (sqrt #samples), which is in line with other tests of this type.
Sweep: 100.327030 ± 91.210692
The chance of our target squeezing through the net well counteracts the effect of the target taking n/2 time on average to reach the net. The algorithm doesn't really fare any better than a random sample on average, but it's more consistent in its performance and it isn't hard to implement either.
slow sweep (x0.5): 128.272588 ± 99.003681
While the slow movement of our net means our target will probably get caught in the net during the first sweep and won't need a second sweep, it also means the first sweep takes twice as long. All in all, relying on the target moving onto us seems a little inefficient.
fast sweep x2: 75.981733 ± 72.620600
fast sweep x3: 84.576265 ± 83.117648
fast sweep x4: 88.811068 ± 87.676049
fast sweep x5: 91.264716 ± 90.337139
That's... a little surprising at first. While skipping every other step means we complete each lap in twice as many turns, each lap also has a reduced chance of actually encountering the target. A nicer view is to compare Sweep and FastSweep in broom-space: rotate each sample so that the index being sampled is always at 0 and the target drifts towards the left a bit faster. In Sweep, the target moves at 0, 1 or 2 speed each step. A quick parallel with the Fibonacci base tells us that the target should hit the broom/net around 62% of the time. If it misses, it takes another 100 turns to come back. In FastSweep, the target moves at 1, 2 or 3 speed each step meaning it misses more often, but it also takes half as much time to retry. Since the retry time drops more than the hit rate, it is advantageous to use FastSweep over Sweep.
Left-right sweep: 100.572156 ± 91.503060
Mostly acts like an ordinary sweep, and its score and standard deviation reflect that. Not too surprising a result.
Aziuth's smart greedy: 87.982552 ± 85.649941
At this point I have to admit a fault in my code: this algorithm is heavily dependent on its initial behavior (which is unspecified by Aziuth and was chosen to be randomised in my tests). But performance concerns meant that this algorithm will always choose the same randomized order each time. The results are then characteristic of that randomisation rather than of the algorithm as a whole.
Always picking the most likely spot should find our target as fast as possible, right? Unfortunately, this complex algorithm barely competes with fast sweep x3. Why? I realise this is just speculation, but let us peek at the sequence Smart Greedy actually generates: during the first pass, each cell has an equal probability of containing the target, so the algorithm has to choose. If it chooses randomly, it could pick up in the ballpark of 20% of cells before the dips in probability reach all of them. Afterwards the landscape is mostly smooth where the array hasn't been sampled recently, so the algorithm eventually stops sweeping and starts jumping around randomly. The real problem is that the algorithm is too greedy and doesn't really care about herding the target so it could pick at the target more easily.
Nevertheless, this complex algorithm does fare better than both the simple Sweep and a random sampler. It still can't, however, compete with the simplicity and surprising efficiency of FastSweep. Repeated tests have shown that the initial randomisation could swing the efficiency anywhere between 80% run time (20% speedup) and 90% run time (10% speedup).
Finally, here's the code that was used to generate the results:
class WalkSim
  attr_reader :limit, :current, :time, :p_stay

  def initialize limit, p_stay
    @p_stay = p_stay
    @limit = limit
    @current = rand(limit + 1)
    @time = 0
  end

  def poke n
    r = n == @current
    @current += (rand(2) == 1 ? 1 : -1) if rand > @p_stay
    @current = [0, @current, @limit].sort[1]
    @time += 1
    r
  end

  def WalkSim.bench limit, p_stay, runs
    histogram = Hash.new{0}
    runs.times do
      sim = WalkSim.new limit, p_stay
      gen = yield
      nil until sim.poke gen.next
      histogram[sim.time] += 1
    end
    histogram.to_a.sort
  end
end
class Array; def sum; reduce 0, :+; end; end

def stats histogram
  count = histogram.map{|k,v| v}.sum.to_f
  avg = histogram.map{|k,v| k*v}.sum / count
  variance = histogram.map{|k,v| (k-avg)**2*v}.sum / (count - 1)
  {avg: avg, stddev: variance ** 0.5}
end
RUNS  = 1_000_000
PSTAY = 1.0/3
LIMIT = 100

puts "random sampling"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y| loop{y.yield rand(LIMIT + 1)}}
}

puts "sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y| loop{0.upto(LIMIT){|i| y.yield i}}}
}

puts "x0.5 speed sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y| loop{0.upto(LIMIT){|i| 2.times{y.yield i}}}}
}

(2..5).each do |speed|
  puts "x#{speed} speed sweep"
  p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
    Enumerator.new {|y| loop{speed.times{|off| off.step(LIMIT, speed){|i| y.yield i}}}}
  }
end

puts "sweep LR"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y| loop{
    0.upto(LIMIT){|i| y.yield i}
    LIMIT.downto(0){|i| y.yield i}
  }}
}

$sg_gen = Enumerator.new do |y|
  probs = Array.new(LIMIT + 1){1.0 / (LIMIT + 1)}
  loop do
    ix = probs.each_with_index.map{|v,i| [v,rand,i]}.max.last
    probs[ix] = 0
    probs = [probs[0] * (1 + PSTAY)/2 + probs[1] * (1 - PSTAY)/2,
             *probs.each_cons(3).map{|a, b, c| (a + c) / 2 * (1 - PSTAY) + b * PSTAY},
             probs[-1] * (1 + PSTAY)/2 + probs[-2] * (1 - PSTAY)/2]
    y.yield ix
  end
end

$sg_cache = []
def sg_enum
  Enumerator.new{|y|
    $sg_cache.each{|n| y.yield n}
    $sg_gen.each{|n| $sg_cache.push n; y.yield n}
  }
end

puts "smart greedy"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {sg_enum}
No, forget everything about loops.
Copy this array to another array and then check which cells are now non-zero. For example, if your main array is mainArray[], you can use:
int temp[sizeOfMainArray];
int counter = 0;
while(counter < sizeOfArray)
{
    temp[counter] = mainArray[counter];
    counter++;
}
//then check what is non-zero in the copied array
counter = 0;
while(counter < sizeOfArray)
{
    if(temp[counter] != 0)
    {
        std::cout << "I Found It!!!";
    }
    counter++;
}//end of while
One approach, perhaps:
i - Keep four index variables f, f1, l, l1: f points at 0, f1 at 1, l at n-1 (the end of the array) and l1 at n-2 (the second to last element).
ii - Check the elements at f1 and l1 - are any of them non-zero? If so, stop. If not, check the elements at f and l (to see if the element has jumped back by 1).
iii - If f and l are still zero, increment the indexes and repeat step ii. Stop when f1 > l1.
Iff an equality check against an array index is what makes the non-zero element jump, why not think of a way where we don't really require an equality check with an array index?
int check = 0;
for(int i = 0; i < arr.length; i++) {
    check |= arr[i];
    if(check != 0)
        break;
}
Orrr... maybe you can keep reading arr[mid]. The non-zero element will end up there, some day. Reasoning: Patrick Trentin seems to have put it in his answer (somewhat - it's not really that, but you'll get the idea).
If you have some information about the array, maybe we can come up with a niftier approach.
Ignoring the trivial case where the 1 is in the first cell of the array: if you iterate through the array testing each element in turn, you must eventually get to a position i where the 1 is in cell i+2. So when you read cell i+1, one of three things is going to happen.
The 1 stays where it is; you're going to find it next time you look.
The 1 moves away from you; you're back to the starting position, with the 1 at i+2 next time.
The 1 moves to the cell you've just checked; it dodged your scan.
Re-reading cell i+1 will find the 1 in case 3, but just gives it another chance to move in cases 1 and 2, so a strategy based on re-reading won't work.
My choice would therefore be to adopt a brute force approach: if I keep scanning the array then I'm going to hit case 1 at some point and find the elusive 1.
Assumptions:
The array is not a true array. This is obvious given the problem; we have some class that behaves somewhat like an array.
The array is mostly hidden. The only public operations are [] and size().
The array is obfuscated. We cannot get any information by retrieving its address and then analyzing the memory at that position. Even if we iterated through the whole memory of our system, we couldn't do tricks, due to some advanced cryptographic means.
Every field of the array has the same probability of being the first field that hosts the one.
We know the probabilities of how the one changes its position when triggered.
Probability controlled algorithm:
Introduce another array of the same size, the probability array (of doubles).
This array is initialized with all fields set to 1/size.
Every time we use [] on the base array, the probability array changes in this way:
The accessed position is set to zero (it did not contain the one).
Each entry becomes the sum of its neighbors times the probability of that neighbor jumping to the entry's position (prob_array_next_it[i] = prob_array_last_it[i-1]*prob_jump_to_right + prob_array_last_it[i+1]*prob_jump_to_left + prob_array_last_it[i]*prob_dont_jump; different for i=0 and i=size-1, of course).
The probability array is normalized (setting one entry to zero drops the sum of the probabilities below one).
The algorithm accesses the field with the highest probability (choosing among those that share it).
It might be possible to optimize this by controlling the flow of probabilities, but that would need to be based on the wandering event and might require some research.
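A minimal C sketch of one round of this algorithm, assuming jump probabilities of 1/3 and that a jump out of bounds means staying put (all names are mine; the answer only gives pseudocode):

#include <stdlib.h>

/* Index of the entry with the highest probability (first one on ties). */
int argmax(const double p[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (p[i] > p[best]) best = i;
    return best;
}

/* After probing position `miss` and failing: zero it, diffuse, normalize. */
void update_after_miss(double p[], int n, int miss) {
    const double third = 1.0 / 3.0;
    double *next = malloc(n * sizeof *next);
    double sum = 0.0;

    p[miss] = 0.0;                                 /* did not contain the one */
    for (int i = 0; i < n; i++) {
        double v = p[i] * third;                   /* stays put */
        v += (i > 0     ? p[i-1] : p[i]) * third;  /* jump from the left  */
        v += (i < n - 1 ? p[i+1] : p[i]) * third;  /* jump from the right */
        next[i] = v;
        sum += v;
    }
    for (int i = 0; i < n; i++)
        p[i] = next[i] / sum;                      /* renormalize to 1 */
    free(next);
}

The search loop is then: ix = argmax(p, n); probe the hidden array at ix; on a miss, call update_after_miss(p, n, ix) and repeat. This reproduces the hand-worked example below.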
No algorithm that tries to solve this problem is guaranteed to terminate after some fixed time. For a complexity measure, we would analyze the average case.
Example:
Jump probabilities are 1/3, nothing happens if trying to jump out of bounds
Initialize:
Hidden array: 0 0 1 0 0 0 0 0
Probability array: 1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
First iteration: try [0] -> failure
Hidden array: 0 0 1 0 0 0 0 0 (no jump)
Probability array step 1: 0 1/8 1/8 1/8 1/8 1/8 1/8 1/8
Probability array step 2: 1/24 2/24 1/8 1/8 1/8 1/8 1/8 1/8
Probability array step 3, the same normalized (whole array * 8/7): 1/21 2/21 1/7 1/7 1/7 1/7 1/7 1/7
Second iteration: try [2], as 1/7 is the maximum and this is the first field with 1/7 -> success.
(The example should be clear by now. Of course it might not work this fast on another example; I had no interest in doing this for a lot of iterations, since the probabilities get cumbersome to compute by hand and one would need to implement it. Note that if the one had jumped to the left, we wouldn't have checked it so fast, even if it remained there for some time.)

How to calculate distance between 2 points in a 2D matrix

I am both new to this website and new to C. I need a program to find the average number of 'jumps' it takes to get from each point to all the others.
The idea is this: find the "jump" distance from 1 to 2, 1 to 3, 1 to 4 ... 1 to 9, or from 2 to 1, 2 to 3, 2 to 4, 2 to 5, etc.
Doing it on the first row is simple: just (2-1) or (3-1) and you get the correct number. But if I want to find the distance between 1 and 4 or 1 and 8, then I have absolutely no idea.
The dimensions of the matrix should potentially be changeable, but I just want help with a 3x3 matrix.
Could anyone show me how to find it?
A jump means a vertical or horizontal move from one point to another: from 1 to 2 = 1, from 1 to 9 = 4 (shortest path only).
The definition of "distance" on this kind of problems is always tricky.
Imagine that the points are marks on a field, and you can freely walk all over it. Then you could take any path from one point to the other. The shortest route would be a straight line; its length would be the length of the vector that joins the points, which happens to be the difference vector between the two points' positions. This length can be computed with the help of Pythagoras' theorem: dist = sqrt((x2-x1)^2 + (y2-y1)^2). This is known as the Euclidean distance between the points.
Now imagine that you are in a city, and each point is a building. You can't walk over a building, so the only options are to go either up/down or left/right. Then, the shortest distance is given by the sum of the components of the difference vector; which is the mathematical way of saying that "go down 2 blocks and then one block to the left" means walking 3 blocks' distance: dist = abs(x2-x1) + abs(y2-y1). This is known as the Manhattan distance between the points.
In your problem, however, it looks like the only possible move is to jump to an adjacent point in a single step, diagonals allowed. Then the problem gets a bit trickier, because the path can be very irregular. You need some graph theory here, very useful when modeling problems with linked elements, or "nodes". Each point would be a node, connected to its neighbors, and the problem would be to find the shortest path to another given point. If jumps had different weights (for instance, if jumping diagonally were harder), an easy way to solve this would be Dijkstra's algorithm; more details on implementation at Wikipedia.
If the cost is always the same, then the problem reduces to counting the number of jumps in a breadth-first search from the source point to the destination.
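To make the BFS suggestion concrete, here is a minimal C sketch for a small grid (the dimensions and names are my choices). For plain horizontal/vertical moves the result equals the Manhattan distance, but unlike the closed formula, the BFS survives walls or other movement rules:

#include <string.h>

#define R 3
#define C 3

/* Number of horizontal/vertical jumps from (sr,sc) to (tr,tc), or -1. */
int jump_distance(int sr, int sc, int tr, int tc) {
    int dist[R][C];
    memset(dist, -1, sizeof dist);           /* -1 marks "not visited" */
    int queue[R * C][2], head = 0, tail = 0;
    dist[sr][sc] = 0;
    queue[tail][0] = sr; queue[tail][1] = sc; tail++;
    int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    while (head < tail) {
        int r = queue[head][0], c = queue[head][1]; head++;
        if (r == tr && c == tc) return dist[r][c];
        for (int k = 0; k < 4; k++) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr >= 0 && nr < R && nc >= 0 && nc < C && dist[nr][nc] < 0) {
                dist[nr][nc] = dist[r][c] + 1;
                queue[tail][0] = nr; queue[tail][1] = nc; tail++;
            }
        }
    }
    return -1;  /* unreachable */
}

For example, jump_distance(0, 0, 2, 2) returns 4, matching the "1 to 9 = 4" example in the question.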
Let's define the 'jump' distance : "the number of hops required to reach from Point A [Ax,Ay] to Point B [Bx,By]."
Now there can be two ways in which the hops are allowed :
Horizontally/Vertically: in this case, you can go up/down or left/right. As you have to travel the X axis and Y axis independently, your answer is: jumpDistance = abs(Bx - Ax) + abs(By - Ay);
Horizontally/Vertically and also Diagonally: in this case, you can go up/down, left/right and diagonally as well. How it differs from Case 1 is that now you have the ability to change your X axis and Y axis together at the cost of only one jump. Your answer now is: jumpDistance = max(abs(Bx - Ax), abs(By - Ay));
What is the definition of "jump-distance" ?
If you mean how many jumps a man needs to go from square M to square N, and he can only jump vertically and horizontally, one possibility is:
dist = abs(x2 - x1) + abs(y2 - y1);
For example, the jump-distance between 1 and 9 is |3-1| + |3-1| = 4.
There are two ways to calculate jump distance.
1) When only horizontal and vertical movements are allowed, all you need to do is form a rectangle between the two points and calculate the lengths of two adjacent sides. For example, to move from 1 to 9, first move from 1 to 3 and then from 3 to 9. (Converted to code in the sketch below.)
2) When movements in all eight directions are allowed, things get trickier. Say you want to move from 1 to 6. You'll need to move from 1 to 5, and then from 5 to 6. The way to do it in code is to take the maximum of the differences in the x and y coordinates. In this example the difference in the x coordinate is 2 (3-1) and in the y coordinate it is 1 (2-1), so the maximum, 2, is the answer. (Converted to code in the sketch below.)
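Both cases convert to one-liners; a minimal C sketch (the function names are mine):

#include <stdlib.h>

/* Case 1: horizontal/vertical moves only (Manhattan distance). */
int jump_distance_hv(int r1, int c1, int r2, int c2) {
    return abs(r2 - r1) + abs(c2 - c1);
}

/* Case 2: diagonal moves allowed as well (Chebyshev distance). */
int jump_distance_diag(int r1, int c1, int r2, int c2) {
    int dr = abs(r2 - r1), dc = abs(c2 - c1);
    return dr > dc ? dr : dc;
}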

Fastest way to generate next move in TIC TAC TOE game

In an X's and O's game (i.e. Tic Tac Toe, 3x3), if you write a program for this, give a fast way to generate the moves by the computer. I mean this should be the fastest way possible.
All I could think of at the time was to store all the board configurations in a hash so that getting the best move is an O(1) operation.
Each board square can be either 0, 1, or 2: 0 represents an empty square, 1 represents an X and 2 represents an O.
So every square can be filled with any of the three. There are approximately 3^9 board configurations.
In short, we need a hash of size 3^9. For hashing, we can use the base-3 representation: each number in base 3 will be 9 digits long, each digit corresponding to one square.
To look up a position in the hash, we find the decimal value of this 9-digit base-3 number.
Now, each square can be associated with a row number and a column number. To identify each square uniquely, we can again use base 3: say SQ[1][2] becomes 12 in base 3, which is 5 in decimal.
Thus, we have effectively designed an algorithm that is fast enough to calculate the next move in O(1).
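A minimal C sketch of the base-3 encoding described above (my own rendering):

/* cells[r][c]: 0 = empty, 1 = X, 2 = O. The result is a unique index
   into a table of 3^9 = 19683 entries. */
int board_to_index(const int cells[3][3]) {
    int index = 0;
    for (int r = 0; r < 3; r++)
        for (int c = 0; c < 3; c++)
            index = index * 3 + cells[r][c];
    return index;
}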
But the interviewer insisted on reducing the space complexity, as a DOS system doesn't have that much memory.
How can we reduce the space complexity with no change in time complexity?
Please help me so that I don't miss this type of question in the future.
For a small game like this, a different way of going about this is to pre-compute and store the potential game tree in a table.
Looking first at the situation where the human starts, she obviously has 9 different start positions. A game-play table would then contain 9 entry points, each pointing to the correct response - you could use the guidelines outlined in this question to calculate the responses - as well as the next-level table of human responses. This time there are only 7 possible responses. For the next level there will be 5, then 3, then just 1. In total, there will be 9 * 7 * 5 * 3 * 1 = 945 entries in the table, but that can be compressed by exploiting symmetries, i.e. rotations and flipped colors.
Of course, the situation where the computer starts is similar in principle but the table is actually smaller because the computer will probably want to start by playing the middle piece - or at least avoid certain spots.
There are not 3^9 different board configurations. Just as tomdemuyt says, there are 9! different board configurations, i.e., 9 choices at first, 8 choices next, 7 choices after that, and so on.
Also, we can further reduce the space complexity by accounting for symmetry. For example, for the first move, placing an X in [0,0] is the same as placing it in [0,2], [2,0], and [2,2]. I believe this reduces 9! to 9!/4
We can even reduce that by accounting for which board configurations were already won before the final (9th) move. I don't know the exact number, but a detailed explanation can be found in the Wikipedia article: http://en.wikipedia.org/wiki/Tic-tac-toe
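To illustrate the symmetry reduction, here is a minimal C sketch that maps a board to a canonical representative over its 8 symmetries (the permutation tables and names are mine); symmetric positions then share a single table entry:

/* The 8 symmetries of the 3x3 board (squares numbered 0..8 row by row):
   4 rotations, each optionally mirrored. */
static const int SYM[8][9] = {
    {0,1,2,3,4,5,6,7,8},  /* identity          */
    {6,3,0,7,4,1,8,5,2},  /* rotate 90         */
    {8,7,6,5,4,3,2,1,0},  /* rotate 180        */
    {2,5,8,1,4,7,0,3,6},  /* rotate 270        */
    {2,1,0,5,4,3,8,7,6},  /* mirror left-right */
    {0,3,6,1,4,7,2,5,8},  /* transpose         */
    {6,7,8,3,4,5,0,1,2},  /* mirror up-down    */
    {8,5,2,7,4,1,6,3,0},  /* anti-transpose    */
};

/* Smallest base-3 index over all symmetric variants of the position. */
int canonical_index(const int cells[9]) {
    int best = -1;
    for (int s = 0; s < 8; s++) {
        int index = 0;
        for (int k = 0; k < 9; k++)
            index = index * 3 + cells[SYM[s][k]];
        if (best < 0 || index < best) best = index;
    }
    return best;
}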
The assumption of 3^9 is wrong. It would include, for example, a board that holds only X's, which is impossible since the players alternate placing an X or an O.
My initial thought was that there are (9*8*7*6*5*4*3*2) * 2 possibilities.
First player has 9 choices, second player has 8 choices, first player has 7 etc.
I put * 2 because you might have different best moves depending on who starts.
Now 3^9 is 19683 and 9! is 362880, so clearly this is not the superior solution - a lot of 'different scenarios' actually end up looking exactly the same. Still, the basic idea that many of the 19683 board setups are invalid remains.
This piece of code, which probably could be replaced by a simple formula, tells me the count of positions you want to have a move for:
<script>
a = permuteString( "X........" ); document.write( Object.keys(a).length + "<br>" ); console.log( a );
a = permuteString( "XO......." ); document.write( Object.keys(a).length + "<br>" ); console.log( a );
a = permuteString( "XOX......" ); document.write( Object.keys(a).length + "<br>" ); console.log( a );
a = permuteString( "XOXO....." ); document.write( Object.keys(a).length + "<br>" ); console.log( a );
a = permuteString( "XOXOX...." ); document.write( Object.keys(a).length + "<br>" ); console.log( a );
a = permuteString( "XOXOXO..." ); document.write( Object.keys(a).length + "<br>" ); console.log( a );
a = permuteString( "XOXOXOX.." ); document.write( Object.keys(a).length + "<br>" ); console.log( a );

//Subset of the Array.prototype.splice() functionality for a string
function spliceString( s , i )
{
    var a = s.split("");
    a.splice( i , 1 );
    return a.join("");
}

//Permute the possibilities, throw away equivalencies
function permuteString( s )
{
    //Holds result
    var result = {};
    //Sanity
    if( s.length < 2 ) return result;
    //The atomic case, if AB is given return { AB : true , BA : true }
    if( s.length == 2 )
    {
        result[s] = true;
        result[s.charAt(1)+s.charAt(0)] = true;
        return result;
    }
    //Enumerate
    for( var head = 0 ; head < s.length ; head++ )
    {
        var o = permuteString( spliceString( s , head ) );
        for( var key in o )
            result[ s.charAt( head ) + key ] = true;
    }
    return result;
}
</script>
This gives the following numbers:
1st move : 9
2nd move : 72
3rd move : 252
4th move : 756
5th move : 1260
6th move : 1680
7th move : 1260
So in total 5289 moves, this is without even checking for already finished games or symmetry.
These numbers allow you to lookup a move through an array, you can generate this array yourself by looping over all possible games.
The game of Tic Tac Toe is sufficiently simple that the optimal algorithm may be implemented by a machine built from Tinker Toys (a brand of sticks and fasteners). Since the level of hardware complexity encapsulated by such a construction is below that of a typical 1970s microprocessor, the time required to find out what moves have been made would in most cases exceed the time required to figure out the next move.
Probably the simplest approach would be to have a table which, given the presence or absence of markers of a given player (2^9, or 512 entries), would indicate which squares would turn two-in-a-rows into three-in-a-rows. Start by doing a lookup with the pieces owned by the player on move; if any square which would complete a three-in-a-row is not taken by the opponent, take it. Otherwise look up the opponent's combination of pieces; any square it turns up that isn't already occupied must be taken. Otherwise, if the center is available, take it; if only the center is taken, take a corner. Otherwise take an edge.
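A minimal C sketch of that 512-entry table (the bitmask layout and names are mine): completes[mask] is the set of squares that would turn one of mask's two-in-a-rows into a three-in-a-row. The caller still has to check that a suggested square isn't taken by the opponent:

/* The 8 winning lines as octal bitmasks over squares 0..8 (row by row). */
static const int LINES[8] = {
    0007, 0070, 0700,   /* rows      */
    0111, 0222, 0444,   /* columns   */
    0421, 0124          /* diagonals */
};

void build_completion_table(int completes[512]) {
    for (int mask = 0; mask < 512; mask++) {
        completes[mask] = 0;
        for (int i = 0; i < 8; i++) {
            int missing = LINES[i] & ~mask;
            /* exactly one square of this line not yet owned by the player? */
            if (missing != 0 && (missing & (missing - 1)) == 0)
                completes[mask] |= missing;
        }
    }
}

On move, first consult completes[my_pieces] for a winning square, then completes[opponent_pieces] for a square to block, then fall back to center/corner/edge as described above.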
It might be more interesting to open up your question to 4x4x4 Tic Tac Toe, since that represents a sufficient level of complexity that 1970's-era computer implementations would often take many seconds per move. While today's computers are thousands of times faster than e.g. the Atari 2600, the level of computation at least gets beyond trivial.
If one extends the game to 4x4x4, there will be many possibilities for trading off speed, RAM, and code space. Unlike the original game which has 8 winning lines, the 4x4x4 version has (IIRC) 76. If one keeps track of each line as being in one of 8 states [ten if one counts wins], and for each vacant square one keeps track of how many of the winning lines that pass through it are in what states, it should be possible to formulate some pretty fast heuristics based upon that information. It would probably be necessary to use an exhaustive search algorithm to ensure that heuristics would in fact win, but once the heuristics were validated they should be able to run much faster than would an exhaustive search.
