I have a 2D array of a crossword puzzle:
0 1 2 3 4
0 t h i s q
1 m i l e u
2 a b y w i
3 n b y s c
4 l o o t k
And given a list of words to find: this, quick, loot, mile, by, man
Objective:
Find words in the puzzle in any direction (horizontal, vertical, diagonal)
Output:
Is to list all the points where the word is located.
for example, this: (0,0) , (0, 1), (0, 2), (0,3)
I'm having trouble getting started on this.
Something like this. Might require some fixes but in general:
find_word(x,y,word,index,inc_x,inc_y):
if x,y is not on board return false
if letter[x,y] != word[index] return false
if it is last letter return true
call find_word(x+inc_x,y+inc_y,word,index+1,inc_x,inc_y)
foreach x:
foreach y
foreach word
for x_inc in -1,0,1
for y_inc in -1,0,1
if x_inc + y_inc == 0 continue
find_word(x,y,word,0,x_inc,y_inc)
If you want to make if faster than make a tree of words and you can remove cycle on words.
For example imagine you have: abc, aab, aac.
a
a
b
c
b
Now imagine you have found a letter a.
Call find_word and get letter d for ex. There's no d in tree of a, so no word can be found.
Call find_word and get letter a for ex. Now narrow search of words to subtree of a.
Thus you can go from w*h*num_words*max_length only to w*h*max_length.
Related
This was an interview question.
I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.
After significant struggling, I proudly came up with O(nlogn) solution that takes O(1) space. My idea was to divide range [1,n-1] into two halves and determine which of two halves contains more elements from the input array (I was using Pigeonhole principle). The algorithm continues recursively until it reaches the interval [X,X] where X occurs twice and that is a duplicate.
The interviewer was satisfied, but then he told me that there exists O(n) solution with constant space. He generously offered few hints (something related to permutations?), but I had no idea how to come up with such solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found few (easier) variations of this problem, but not this specific one. Thank you.
EDIT: In order to make things even more complicated, interviewer mentioned that the input array should not be modified.
Take the very last element (x).
Save the element at position x (y).
If x == y you found a duplicate.
Overwrite position x with x.
Assign x = y and continue with step 2.
You are basically sorting the array, it is possible because you know where the element has to be inserted. O(1) extra space and O(n) time complexity. You just have to be careful with the indices, for simplicity I assumed first index is 1 here (not 0) so we don't have to do +1 or -1.
Edit: without modifying the input array
This algorithm is based on the idea that we have to find the entry point of the permutation cycle, then we also found a duplicate (again 1-based array for simplicity):
Example:
2 3 4 1 5 4 6 7 8
Entry: 8 7 6
Permutation cycle: 4 1 2 3
As we can see the duplicate (4) is the first number of the cycle.
Finding the permutation cycle
x = last element
x = element at position x
repeat step 2. n times (in total), this guarantees that we entered the cycle
Measuring the cycle length
a = last x from above, b = last x from above, counter c = 0
a = element at position a, b = elment at position b, b = element at position b, c++ (so we make 2 steps forward with b and 1 step forward in the cycle with a)
if a == b the cycle length is c, otherwise continue with step 2.
Finding the entry point to the cycle
x = last element
x = element at position x
repeat step 2. c times (in total)
y = last element
if x == y then x is a solution (x made one full cycle and y is just about to enter the cycle)
x = element at position x, y = element at position y
repeat steps 5. and 6. until a solution was found.
The 3 major steps are all O(n) and sequential therefore the overall complexity is also O(n) and the space complexity is O(1).
Example from above:
x takes the following values: 8 7 6 4 1 2 3 4 1 2
a takes the following values: 2 3 4 1 2
b takes the following values: 2 4 2 4 2
therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)
x takes the following values: 8 7 6 4 | 1 2 3 4
y takes the following values: | 8 7 6 4
x == y == 4 in the end and this is a solution!
Example 2 as requested in the comments: 3 1 4 6 1 2 5
Entering cycle: 5 1 3 4 6 2 1 3
Measuring cycle length:
a: 3 4 6 2 1 3
b: 3 6 1 4 2 3
c = 5
Finding the entry point:
x: 5 1 3 4 6 | 2 1
y: | 5 1
x == y == 1 is a solution
Here is a possible implementation:
function checkDuplicate(arr) {
console.log(arr.join(", "));
let len = arr.length
,pos = 0
,done = 0
,cur = arr[0]
;
while (done < len) {
if (pos === cur) {
cur = arr[++pos];
} else {
pos = cur;
if (arr[pos] === cur) {
console.log(`> duplicate is ${cur}`);
return cur;
}
cur = arr[pos];
}
done++;
}
console.log("> no duplicate");
return -1;
}
for (t of [
[0, 1, 2, 3]
,[0, 1, 2, 1]
,[1, 0, 2, 3]
,[1, 1, 0, 2, 4]
]) checkDuplicate(t);
It is basically the solution proposed by #maraca (typed too slowly!) It has constant space requirements (for the local variables), but apart from that only uses the original array for its storage. It should be O(n) in the worst case, because as soon as a duplicate is found, the process terminates.
If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:
Note: The following assume that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" with "Negate the item at index i-1".
Set i to 0 and do the following loop:
Increment i until item i is unflagged.
Set j to i and do the following loop:
Set j to vector[j].
if the item at j is flagged, j is a duplicate. Terminate both loops.
Flag the item at j.
If j != i, continue the inner loop.
Traverse the vector setting each element to its absolute value (i.e. unflag everything to restore the vector).
It depends what tools are you(your app) can use. Currently a lot of frameworks/libraries exists. For exmaple in case of C++ standart you can use std::map<> ,as maraca mentioned.
Or if you have time you can made your own implementation of binary tree, but you need to keep in mind that insert of elements differs in comarison with usual array. In this case you can optimise search of duplicates as it possible in your particular case.
binary tree expl. ref:
https://www.wikiwand.com/en/Binary_tree
I have a 2D array square size.
such as :
(3x3) (4x4)
1 2 3 or 1 2 3 4
8 9 4 12 13 14 5
7 6 5 11 16 15 6
10 9 8 7
I am trying to find a solution to get by giving a value and the array size the Y, X position of the 2D array.
Exemple:
>> find_y_x_in_snail(3, 4)
1, 2
# in a 3x3 array, search value 4
return y=1 x=2
the only idea i have to create the snail in a 2D array and return the position.. not that great.
I found the opposite algorithm here (first exemple)
Any idea ?
You could use this function:
def find_y_x_in_snail(n, v):
r = 0
span = n
while v > span:
v -= span
r += 1
span -= r%2
d, m = divmod(r,4);
c = n-1-d
return [d, d+v, c, c-v][m], [d+v-1, c, c-v, d][m] # y, x
Explanation
r is the number of corners the "snake" needs to take to get to the targetted value.
span is the number of values in the current, straight segment of the snake, i.e. it starts with n, and decreases after the next corner, and again every second next corner.
d is the distance the snake has from the nearest side of the matrix, i.e. the "winding" level.
m indicates which of the 4 sides the segment -- containing the targetted value -- is at:
0: up
1: right
2: down
3: left
Depending on m a value is taken from a list with 4 expressions, each one tailored for the corresponding side: it defines the y coordinate. A similar method is applied (but with different expressions) for x.
Let us assume that we have 2 sorted arrays A and B of integers and a given interval [L,M] . If x is an element of A and y an element of B ,our task is to find all pairs of (x,y) that hold the following property: L<=y-x<=M.
Which is the most suitable algorithm for that purpose?
So far ,I have considered the following solution:
Brute force. Check the difference of all possible pairs of elements with a double loop .Complexity O(n^2).
A slightly different version of the previous solution is to make use of the fact that arrays are sorted by not checking the elements of A ,once difference gets out of interval .Complexity would still be O(n^2) but hopefully our program would run faster at an average case.
However ,I believe that O(n^2) is not optimal .Is there an algorithm with better complexity?
Here is a solution.
Have a pointer at the beginning of each array say i for array A and j for array B.
Calculate the difference between B[j] and A[i].
If it is less than L, increment the pointer in array B[], i.e increment j by 1
If it is more than M, increment i, i.e pointer of A.
If the difference is in between, then do the following:
search for the position of an element whose value is B[j]-A[i]-L or the nearest
element whose value is lesser than (B[j]-A[i])-L in array A. This
takes O(logN) time. Say the position is p. Increment the count of
(x,y) pairs by p-i+1
Increment only pointer j
My solution only counts the number of possible (x,y) pairs in O(NlogN) time
For A=[1,2,3] and B=[10,12,15] and L=12 and M=14, answer is 3.
Hope this helps. I leave it up to you, to implement the solution
Edit: Enumerating all the possible (x,y) pairs would take O(N^2) worst case time. We will be able to return the count of such pairs (x,y) in O(NlogN) time. Sorry for not clarifying it earlier.
Edit2: I am attaching a sample implementation of my proposed method below:
def binsearch(a, x):
low = 0
high = len(a)-1
while(low<=high):
mid = (low+high)/2
if a[mid] == x:
return mid
elif a[mid]<x:
k = mid
low = low + mid
else:
high = high - mid
return k
a = [1, 2, 3]
b = [10, 12, 15, 16]
i = 0
j = 0
lenA = len(a)
lenB = len(b)
L = 12
M = 14
count = 0
result = []
while i<lenA and j<lenB:
if b[j] - a[i] < L:
j = j + 1
elif b[j] - a[i] > M:
i = i + 1
else:
p = binsearch(a,b[j]-L)
count = count + p - i + 1
j = j + 1
print "number of (x,y) pairs: ", count
Because it's possible for every combination to be in the specified range, the worst-case is O([A][B]), which is basically O(n^2)
However, if you want the best simple algorithm, this is what I've come up with. It starts similarly to user-targaryen's algorithm, but handles overlaps in a simplistic fashion
Create three variables: x,y,Z and s (set all to 0)
Create a boolean 'success' and set to false
Calculate Z = B[y] - A[x]
if Z < L
increment y
if Z >= L and <= M
if success is false
set s = y
set success = true
increment y
store x,y
if Z > M
set y = s //this may seem inefficient with my current example
//but you'll see the necessity if you have a sorted list with duplicate values)
//you can just change the A from my example to {1,1,2,2,3} to see what I mean
set success = false
an example:
A = {1,2,3,4,5}
B = {3,4,5,6,7}
L = 2, M = 3
In this example, the first pair is x,y. The second number is s. The third pair is the values A[x] and B[y]. The fourth number is Z, the difference between A[x] and B[y]. The final value is X for not a match and O for a match
0,0 - 0 - 1,3 = 2 O
increment y
0,1 - 0 - 1,4 = 3 O
increment y
0,2 - 0 - 1,5 = 4 X
//this is the tricky part. Look closely at the changes this makes
set y to s
increment x
1,0 - 0 - 2,3 = 1 X
increment y
1,1 - 0 - 2,4 = 2 O
set s = y, set success = true
increment y
1,2 - 1 - 2,5 = 3 O
increment y
1,3 - 1 - 2,6 = 4 X
set y to s
increment x
2,1 - 1 - 3,4 = 1 X
increment y
2,2 - 1 - 3,5 = 2 O
set s = y, set success = true
increment y
2,3 - 2 - 3,6 = 3 O
... and so on
There are n positive numbers (A1 , ... An) on a circle, how do we divide this circle into subsegments with sum greater or equal to m so that number of subsegments are maximum in O(n) , or O(nlogn)
eg :
n = 6 m = 6
3 1 2 3 6 3
ANS = 3
since we can divide the array into three subsegments{[2,4],[5,5],[6,1]}
If there is at least one number greater or equal to m in the array, just cut the array into smallest possible pieces starting from one of these numbers. Otherwise (and if sum of numbers is at least 2*m) use a pointer-chasing algorithm.
This algorithm uses 2 additional arrays: L for chain lengths (initially zero) and S for starting indices (initially equal to own indices: 0, 1, 2, ...). And 2 array indices: F and B (initially zero).
Increment F while sum between F and B is less than m. Then increment B while sum between F and B is greater than m (but stop when is is still greater or equal to m).
Update arrays: L[F] = 1 + L[B], S[F] = S[B].
Repeat steps 1,2 while F<n. While incrementing F on step 1, copy most recently updated values to L[F] and S[F].
Reset F to zero.
Increment F while sum of elements before F and after B is less than m. Then increment B while sum before F and after B is greater than m (but stop when is is still greater or equal to m).
If F <= S[B] use L[B] + 1 to update maximum number of subsegments.
Repeat steps 5,6 while B<n.
The FIND-S algorithm is probably one of the most simple machine learning algorithms. However, I can't find many examples out there.. Just the standard 'sunny, rainy, play-ball' examples that's always used in machine learning. Please could someone help me with this application (its a past exam question in machine learning).
Hypotheses are of the form a <= x <= b, c <= y <= d where x and y are points in an x,y plane and c and d are any integer. Basically, these hypotheses define rectangles in the x,y space.
These are the training examples where - is a negative example and + is a positive example and the pairs are the x,y co-ordinates:
+ 4, 4
+ 5, 3
+ 6, 5
- 1, 3
- 2, 6
- 5, 1
- 5, 8
- 9, 4
All I want to do is apply FIND-S to this example! It must be simple! Either some tips or a solution would be awesome.
Thank you.
Find-S seeks the most restrictive (ie most 'specific') hypothesis that fits all the positive examples (negatives are ignored).
In your case, there's an obvious graphical interpretation: "find the smallest rectangle that contains all the '+' coordinates"...
... which would be a=4, b=6, c=3, d=5.
The algorithm for doing it would be something like this:
Define a hypothesis rectangle h[a,b,c,d], and initialise it to [-,-,-,-]
for each + example e {
if e is not within h {
enlarge h to be just big enough to hold e (and all previous e's)
} else { do nothing: h already contains e }
}
If we step through this with your training set, we get:
0. h = [-,-,-,-] // initial value
1. h = [4,4,4,4] // (4,4) is not in h: change h so it just contains (4,4)
2. h = [4,5,3,4] // (5,3) is not in h, so enlarge h to fit (4,4) and (5,3)
3. h = [4,6,3,5] // (6,5) is not in h, so enlarge again
4. // no more positive examples left, so we're done.