I was reading about the augmenting path (Kuhn's) algorithm for finding the maximum matching size in an unweighted bipartite graph.
The algorithm tries to find an alternating path (consisting of alternating unmatched and matched edges) that starts and ends at unmatched vertices. If such a path is found, it is augmented and the matching size is increased by 1.
I understand the algorithm itself, but I have trouble understanding the way it is generally implemented. Here is a reference implementation: https://sites.google.com/site/indy256/algo/kuhn_matching2
At each stage we should try to find an alternating path beginning at an unmatched vertex on the left. However, in the given implementation, each iteration starts the search from only a single unmatched vertex rather than trying all unmatched vertices as possible start locations, as shown in the following code:
for (int u = 0; u < n1; u++) {
    if (findPath(graph, u, matching, new boolean[n1]))
        ++matches;
}
This way, a vertex on the left that is not matched during its own iteration is never tried again. I can't understand why this is optimal.
It's sufficient to prove that no augmenting path remains after the algorithm ends, since no augmenting path means the matching (flow) is maximum. Let A[i] denote the i-th vertex on the left and B[i] the i-th vertex on the right.
If A[i] has been matched, then it stays matched after any later augmentation (an augmenting path may change which vertex it is matched to, but never unmatches it).
So our only concern is the case where we considered A[i] and found no match, but after some later iterations of the for loop A[i] suddenly has a new augmenting path. We will show this never happens.
Suppose A[i] had no augmenting path, and let S be the set of vertices that can be visited by dfs(i). There are only two ways for a new vertex (one not previously in S) to be included in S later:
Case 1: for some A[x] in S, the edge A[x] - B[y] changes from matched to unmatched (so B[y] would be included in S afterwards).
Contradiction: such an augmentation would have to contain a path B[y] - A[x] - ... - Sink, but the Sink is not reachable from S (otherwise A[i] would already have had an augmenting path), so this cannot happen.
Case 2: for some B[y] in S, the edge B[y] - A[x] changes from unmatched to matched (so A[x] would be included in S afterwards).
Contradiction again: that augmentation would have to contain a path A[x] - B[y] - ... - Sink, but again the Sink cannot be reached from B[y], which is in S.
For the reasons above, it is impossible to accidentally leave an augmenting path behind, which implies the matching is maximum.
According to my understanding, the algorithm tries to find an augmenting path for each left vertex u (from 0 to n1-1). If such a path exists, u can be added to the matching, keeping all previously added vertices in the matching. If no such path exists, it is impossible to add u to the matching. This follows from the special properties of augmenting paths.
As we check this for each u, we find the maximum number of vertices that can be added to the matching, which gives the optimality guarantee.
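To make this concrete, here is a minimal C sketch of the whole procedure. The linked reference is in Java and is not reproduced here; the adjacency-matrix representation, the sizes N1/N2 and the helper names below are illustrative assumptions, not taken from it.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define N1 100                /* illustrative sizes of the two sides */
#define N2 100

bool adj[N1][N2];             /* adj[u][v]: edge between left u and right v        */
int  matchTo[N2];             /* matchTo[v]: left vertex matched to right v, or -1 */

/* DFS: try to find an augmenting path starting at left vertex u. */
bool findPath(int u, bool visited[N1]) {
    if (visited[u]) return false;
    visited[u] = true;
    for (int v = 0; v < N2; v++) {
        if (!adj[u][v]) continue;
        /* take v if it is free, or if the left vertex currently using v
           can itself be re-routed to some other right vertex            */
        if (matchTo[v] == -1 || findPath(matchTo[v], visited)) {
            matchTo[v] = u;
            return true;
        }
    }
    return false;
}

int maxMatching(void) {
    memset(matchTo, -1, sizeof matchTo);
    int matches = 0;
    for (int u = 0; u < N1; u++) {             /* each left vertex is tried exactly once */
        bool visited[N1] = { false };
        if (findPath(u, visited)) ++matches;
    }
    return matches;
}

int main(void) {
    adj[0][0] = adj[0][1] = true;              /* tiny example: 0-{0,1}, 1-{0} */
    adj[1][0] = true;
    printf("%d\n", maxMatching());             /* prints 2 */
    return 0;
}

Note that findPath may re-route vertices that were matched in earlier iterations, but it never unmatches them, which is exactly why each left vertex needs to be tried only once.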
Given the heuristic values h(A)=5 and h(B)=1, A* graph search will put A and B on the frontier with f(A)=2+5=7 and f(B)=4+1=5, then select B for expansion and put G on the frontier with f(G)=4+4=8. It will then select A for expansion, but will not do anything since both S and B are already expanded and not on the frontier, and therefore it will select G next and return a non-optimal solution.
Is my argument correct?
There are two heuristic concepts here:
Admissible heuristic: When for each node n in the graph, h(n) never overestimates the cost of reaching the goal.
Consistent heuristic: When for each node n in the graph and each node m of its successors, h(n) <= h(m) + c(n,m), where c(n,m) is the cost of the arc from n to m.
Your heuristic function is admissible but not consistent, since as you have shown:
h(A) > h(B) + c(A,B), 5 > 2.
If the heuristic is consistent, then the estimated total cost of a partial solution is non-decreasing along a path, i.e. f(n) <= f(m) whenever m is a successor of n. As we can see again:
f(A) = g(A) + h(A) = 2 + 5 = 7 > f(B) = g(B) + h(B) = 4 + 1 = 5,
so this heuristic function does not satisfy this property.
With respect to A*:
A* using an admissible heuristic is guaranteed to find the shortest path from the start to the goal.
A* using a consistent heuristic, in addition to finding the shortest path, also guarantees that once a node is expanded we have already found the shortest path to that node, and therefore no node needs to be re-explored.
So, answering your question: A* has to be implemented so that it reopens a node when a shorter path to it is found (also updating the path cost), and the node is then re-added to the open set or frontier. Therefore your argument is not correct, since B has to be added to the frontier again (now with the path S->A->B and cost 3).
If you can restrict A* to be used only with consistent heuristic functions, then yes, you can discard paths to nodes that have already been explored.
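To make the node re-opening concrete, here is a small self-contained C sketch. The graph below is reconstructed from the numbers in the question (S->A=2, S->B=4, A->B=1, B->G=4, h(A)=5, h(B)=1), so treat the exact costs as assumptions; the relevant part is the relaxation step, which puts a node back on the frontier whenever a shorter path to it is found.

#include <stdio.h>
#include <limits.h>
#include <stdbool.h>

enum { S, A, B, G, V };                /* V = number of vertices */
int cost[V][V] = {
    /* S  A  B  G */
    {  0, 2, 4, 0 },   /* S */
    {  0, 0, 1, 0 },   /* A */
    {  0, 0, 0, 4 },   /* B */
    {  0, 0, 0, 0 },   /* G */
};
int h[V] = { 0, 5, 1, 0 };             /* h(S) is not given and does not affect the search */

int astar(int start, int goal) {
    int g[V];
    bool open[V] = { false };
    for (int i = 0; i < V; i++) g[i] = INT_MAX;
    g[start] = 0;
    open[start] = true;

    for (;;) {
        /* pick the open node with the smallest f = g + h */
        int cur = -1;
        for (int i = 0; i < V; i++)
            if (open[i] && (cur == -1 || g[i] + h[i] < g[cur] + h[cur]))
                cur = i;
        if (cur == -1) return -1;      /* frontier empty: no path */
        if (cur == goal) return g[cur];
        open[cur] = false;             /* "close" cur */

        for (int nxt = 0; nxt < V; nxt++) {
            if (cost[cur][nxt] == 0) continue;
            int ng = g[cur] + cost[cur][nxt];
            if (ng < g[nxt]) {         /* shorter path found */
                g[nxt] = ng;
                open[nxt] = true;      /* put it back on the frontier, even if closed */
            }
        }
    }
}

int main(void) {
    printf("cost S->G: %d\n", astar(S, G));   /* 7 via S->A->B->G, not 8 via S->B->G */
    return 0;
}

If closed nodes were never put back on the frontier, the improved path to B found via A would be ignored and the search would typically return the suboptimal cost 8 instead of 7.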
You maintain an ordered priority queue of objects on the frontier. You then take the best candidate, expand it in all available directions, and put the new nodes in the priority queue. So it's possible for A to be pushed to the back of the queue even though the optimal path actually goes through it. It's also possible for A to be hemmed in by neighbours that were reached through sub-optimal paths, in which case most algorithms won't try to expand it, as you say.
A* is only a way of finding a reasonable path; it doesn't necessarily find the globally optimal path.
I have encountered variations of this problem multiple times, and most recently it became a bottleneck in my arithmetic coder implementation. Given N (<= 256) segments of known non-negative sizes Si laid out in order starting from the origin, for a given x I want to find n such that
S0 + S1 + ... + Sn-1 <= x < S0 + S1 + ... + Sn
The catch is that lookups and updates are done at about the same frequency, and almost every update is in the form of increasing the size of a segment by 1. Also, the bigger a segment, the higher the probability it will be looked up or updated again.
Some sort of tree seems like the obvious approach, but I have been unable to come up with any tree implementation that satisfactorily takes advantage of the known domain-specific details.
Given the relatively small size of N, I also tried linear approaches, but they turned out to be considerably slower than a naive binary tree (even after some optimization, like starting from the back of the list for numbers above half the total).
Similarly, I tested introducing an intermediate step that remaps values so as to keep segments ordered by size, to make access faster for the most frequently used ones, but the added overhead exceeded the gains.
Sorry for the unclear title -- despite it being a fairly basic problem, I am not aware of any specific names for it.
I suppose some BST would do. You could add a numeric member (int or long) to each node that keeps the sum of the values of all left descendants. Then each lookup takes roughly logarithmic time, and when an item is added, removed or modified you only have to update its ancestors on the way back up from the recursion. You could use a self-balancing or self-adjusting tree structure, for example an AVL tree to keep the worst-case search optimal, or a splay tree to speed up searches for the most frequently used items. Take care to update the left-subtree sums during rebalancing or splaying.
You could use a binary tree where each node n contains two integers A_n and U_n, where initially
A_n = S_0 + ... + S_n and U_n = 0.
At any later time, let T_n = S_0 + ... + S_n denote the current prefix sum.
When looking for the place of a query x, you walk down the tree, knowing that for each node m the current value of T_m is A_m + U_m + sum_{p} U_p, where p ranges over the ancestors of m at which we descended into the right child to reach m.
This handles lookups in O(log N).
For an update of the n-th interval (increasing its size by y), you just search for it in the tree, increasing the value of U_m by y for each node m that you visit along the way. This handles updates in O(log N) as well.
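For what it's worth, since N is small and fixed and the operations are exactly prefix-sum search and point increment, a standard structure with the same O(log N) bounds is a Fenwick (binary indexed) tree. This is not the A_n/U_n scheme described above, just a closely related alternative; a minimal C sketch:

#include <stdio.h>

#define MAXN 256               /* N <= 256 per the question */

long long tree[MAXN + 1];      /* 1-based Fenwick tree over the segment sizes */
int n = MAXN;

/* Add delta to the size of segment i (0-based). O(log N). */
void update(int i, long long delta) {
    for (int j = i + 1; j <= n; j += j & -j)
        tree[j] += delta;
}

/* Find the segment index k such that S0+...+S(k-1) <= x < S0+...+Sk.
   Requires all sizes non-negative and x < total size. O(log N). */
int find(long long x) {
    int pos = 0;
    for (int pw = 256; pw > 0; pw >>= 1) {   /* largest power of two <= n */
        if (pos + pw <= n && tree[pos + pw] <= x) {
            pos += pw;
            x -= tree[pos];
        }
    }
    return pos;
}

int main(void) {
    update(0, 3); update(1, 5); update(2, 2);          /* sizes 3, 5, 2 */
    printf("%d %d %d\n", find(0), find(3), find(8));   /* prints 0 1 2  */
    update(1, 1);                                      /* bump segment 1 by one */
    printf("%d\n", find(8));                           /* now prints 1  */
    return 0;
}

The "bigger segments are hotter" property mentioned in the question is not exploited here; this only gives the plain O(log N) bounds.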
So I am implementing the A* algorithm in C. Here's the procedure.
I am using a priority queue (backed by an array) for all the open nodes. Since I'll have duplicate distances, i.e. more than one node with the same distance/priority, while inserting a node into the PQ I swap it with its parent even when their priorities are equal, so that my newest entered member stays on top (or as high as possible) and I keep following a particular direction. Also, on removing, when I swap the topmost element with the last one, if the swapped last element has the same priority as one of its children it again gets swapped downwards. (I am not sure if this affects anything.)
Now the problem: say I have a 100*100 matrix in which I am moving, with obstacles from (0,20) to (15,20). For a starting position (2,2) and ending position (16,20) I get a straight path, i.e. first go all the way to the right, then go down till 15, then move one right and I am done.
But if the start is (2,2) and the end is (12,78), i.e. the points are separated by the obstacles and the path has to go around them, I still go via (16,20) and my path after (16,20) is straight, but my path up to (16,20) is zig-zag: I go some distance straight down, then some right, then down, then right and so on, eventually reaching (16,20) and going straight after that.
Why this zig-zag path for the first half of the distance? What can I do to make sure that my path is straight, as it is when my destination is (16,20) rather than (12,78)?
Thanks.
void findPath(int array[ROW][COLUMN], int sourceX, int sourceY, int destX, int destY) {
    PQ pq[SIZE];                     /* open list, kept as a binary heap in an array */
    int x, y;                        /* coordinates of the node just removed         */
    insert(pq, sourceX, sourceY);
    while (!empty(pq)) {
        remove(pq);                  /* pops the best node (its coordinates go into x, y) */
        if (removedIsDestination)
            break;                   /* path found */
        insertAdjacent(pq, x, y, destX, destY);
    }
}
void insert(PQ pq[SIZE], PQ element) {
    ++sizeOfPQ;
    pq[sizeOfPQ] = element;          /* assignment, not a comparison */
    int i = sizeOfPQ;
    while (i > 0) {
        /* sift up; <= (rather than <) keeps the newest element on top when priorities tie */
        if (pq[i].priority <= pq[(i-1)/2].priority) {
            swapWithParent;
            i = (i - 1) / 2;
        }
        else
            break;
    }
}
You should change your scoring part. Right now you calculate the absolute distance. Instead, calculate the minimum move distance. If you count each move as one, then if you are at (x,y) and going to (dX,dY), that would be
distance moved + (max(x,dX) - min(x,dX) + max(y,dY) - min(y,dY))
A lower value is considered a higher score.
This heuristic is a guess at how many moves it would take if there was nothing in the way.
The nice thing about the heuristic is that you can change it to get the results you want. For example, if you prefer to move in a straight line, as you suggest, then you can make this change:
= distance moved + (max(x,dX) - min(x,dX) + max(y,dY) - min(y,dY))
+ (1 if this is a turn from the last move)
This will cause you to "find" solutions which tend to go in the same direction.
If you want to FORCE as few turns as possible:
= distance moved + (max(x,dX) - min(x,dX) + max(y,dY) - min(y,dY))
+ (1 times the number of turns made)
This is what is nice about A*: the heuristic informs the search. You will still always find a solution, but if there is more than one you can influence where you look first, which makes it good for simulating AI behavior.
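As a compact C sketch of the scoring just described (the function name and signature are illustrative, not taken from the question's code); abs(x - dX) + abs(y - dY) is the same quantity as the max/min expression above:

#include <stdlib.h>   /* abs */

/* f = moves already made + Manhattan distance still to cover + turn penalty.
   turnWeight = 0 gives the plain heuristic, 1 prefers straight lines,
   a large value (e.g. 25) forces as few turns as possible.              */
int score(int distanceMoved, int turnsMade, int turnWeight,
          int x, int y, int dX, int dY) {
    return distanceMoved + abs(x - dX) + abs(y - dY) + turnWeight * turnsMade;
}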
Doubt: How are the first and the second way of calculating different from each other?
The first one puts a lower priority on a move that is a turn. The second one puts a lower priority on a path with more turns. In some cases (e.g., the first turn) the value will be the same, but overall the second one will pick paths that have as few turns as possible, where the first one might not.
Also, regarding "1 if this is a turn from the last move": say I have the source at the top left and the destination at the bottom right, so my path normally would be left, left, left, ... down, down, down, ... Now, according to "1 if this is a turn from the last move", when I change from left to down, will I add 1?
Yes
Won't it make the total value larger, so the priority for the down move will decrease?
Yes, exactly. You don't want to look first at choices that have a turn in them. This makes them lower priority, and your algorithm will investigate other, higher-priority options first -- exactly what you want.
Or does "1 if this is a turn from the last move" mean the case when I move to a cell that is not abutting the cell previously worked upon? Thanks.
No, I don't understand this question -- I don't think it makes sense in this context -- all moves have to abut the previous cell, diagonal moves are not allowed.
Though, I'd really appreciate it if you could tell me one instance where the first and second methods give different answers. Thanks a lot. :)
Not so easy without seeing the details of your algorithm but the following might work:
The red areas are blocks. The green path is what I would expect the first one to do; it locally tries to find the least-turn move. The blue path is the least-turn solution. Note that how far apart the red areas are, and the details of your algorithm, influence whether this will work. As I have it above, having an extra turn only costs 1 in the heuristic. So, if you want to be sure this will work, change the heuristic like this:
= distance moved + (max(x,dX) - min(x,dX) + max(y,dY) - min(y,dY))
+ (25 times the number of turns made)
Where 25 is bigger than the distance to get past the 2nd turn in the green path. (Thus after the 2nd turn the blue path will be searched.)
Problem: Start with a set S of size 2n+1 and a subset A of S of size n. You have functions addElement(A,x) and removeElement(A,x) that can add or remove an element of A. Write a function that cycles through all the subsets of S of size n or n+1 using just these two operations on A.
I figured out that there are (2n+1 choose n) + (2n+1 choose n+1) = 2 * (2n+1 choose n) subsets that I need to find. So here's the structure for my function:
for (int k = 0; k < 2*binomial(2*n+1, n); ++k) {
    if (k mod 2 == 0) {
        // somehow choose x from S-A
        A = addElement(A, x);
        printSet(A, n+1);
    } else {
        // somehow choose x from A
        A = removeElement(A, x);
        printSet(A, n);
    }
}
The function binomial(2n+1,n) just gives the binomial coefficient, and the function printSet prints the elements of A so that I can see if I hit all the sets.
I don't know how to choose the element to add or remove, though. I tried lots of different things, but I didn't get anything that worked in general.
For n=1, here's a solution that I found that works:
for (int k = 0; k < 6; ++k) {
    if (k mod 2 == 0) {
        x = S[A[0] mod 3];
        A = addElement(A, x);
        printSet(A, 2);
    } else {
        x = A[0];
        A = removeElement(A, x);
        printSet(A, 1);
    }
}
and the output for S = [1,2,3] and A=[1] is:
[1,2]
[2]
[2,3]
[3]
[3,1]
[1]
But I can't even get this to work for n=2. Can someone give me some help with this one?
This isn't a solution so much as it's another way to think about the problem.
Make the following graph:
Vertices are all subsets of S of sizes n or n+1.
There is an edge between v and w if the two sets differ by one element.
For example, for n=1, you get the following cycle:
{1} --- {1,3} --- {3}
 |                 |
 |                 |
{1,2} --- {2} --- {2,3}
Your problem is to find a Hamiltonian cycle:
A Hamiltonian cycle (or Hamiltonian circuit) is a cycle in an
undirected graph which visits each vertex exactly once and also
returns to the starting vertex. Determining whether such paths and
cycles exist in graphs is the Hamiltonian path problem which is
NP-complete.
In other words, this problem is hard.
There are a handful of theorems giving sufficient conditions for a Hamiltonian cycle to exist in a graph (e.g. if all vertices have degree at least N/2, where N is the number of vertices), but none that I know of immediately implies that this graph has a Hamiltonian cycle.
You could try one of the myriad algorithms to determine if a Hamiltonian cycle exists. For example, from the wikipedia article on the Hamiltonian path problem:
A trivial heuristic algorithm for locating hamiltonian paths is to
construct a path abc... and extend it until no longer possible; when
the path abc...xyz cannot be extended any longer because all
neighbours of z already lie in the path, one goes back one step,
removing the edge yz and extending the path with a different neighbour
of y; if no choice produces a hamiltonian path, then one takes a
further step back, removing the edge xy and extending the path with a
different neighbour of x, and so on. This algorithm will certainly
find an hamiltonian path (if any) but it runs in exponential time.
Hope this helps.
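For what it's worth, here is a brute-force C sketch of that backtracking idea applied to this particular graph. Everything about it is illustrative and it is only practical for very small n (it prints a cycle instantly for n = 2); as the quote says, the running time is exponential in general.

#include <stdio.h>

#define MAXV 128                       /* enough vertices for n <= 3 */
int nVerts, verts[MAXV];               /* vertices: bitmask subsets of size n or n+1 */
int path[MAXV], used[MAXV];

int popcount(int x) { int c = 0; while (x) { c += x & 1; x >>= 1; } return c; }

/* two subsets are adjacent iff they differ in exactly one element */
int adjacent(int a, int b) { return popcount(a ^ b) == 1; }

void printSubset(int mask) {
    printf("{");
    for (int e = 0, first = 1; mask; mask >>= 1, e++)
        if (mask & 1) { printf(first ? "%d" : ",%d", e); first = 0; }
    printf("}\n");
}

/* extend the partial cycle path[0..depth-1]; return 1 once a full cycle closes */
int extend(int depth) {
    if (depth == nVerts)
        return adjacent(verts[path[depth - 1]], verts[path[0]]);
    for (int i = 0; i < nVerts; i++) {
        if (!used[i] && adjacent(verts[path[depth - 1]], verts[i])) {
            used[i] = 1; path[depth] = i;
            if (extend(depth + 1)) return 1;
            used[i] = 0;                        /* backtrack */
        }
    }
    return 0;
}

int main(void) {
    int n = 2, size = 2 * n + 1;                /* S = {0, ..., 2n} */
    for (int m = 0; m < (1 << size); m++) {
        int c = popcount(m);
        if (c == n || c == n + 1) verts[nVerts++] = m;
    }
    path[0] = 0;                                /* start the cycle anywhere */
    used[0] = 1;
    if (extend(1))
        for (int i = 0; i < nVerts; i++) printSubset(verts[path[i]]);
    else
        printf("no Hamiltonian cycle found\n");
    return 0;
}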
Good News: Though the Hamiltonian cycle problem is difficult in general, this graph is very nice: it's bipartite and (n+1)-regular. This means there may be a nice solution for this particular graph.
Bad News: After doing a bit of searching, it turns out that this problem is known as the Middle Levels Conjecture, and it seems to have originated around 1980. As best I can tell, the problem is still open in general, but it has been computer verified for n <= 17 (and I found a preprint from 12/2009 claiming to verify n=18). These two pages have additional information about the problem and references:
http://www.math.uiuc.edu/~west/openp/revolving.html
http://garden.irmacs.sfu.ca/?q=op/middle_levels_problem
This sort of thing is covered in Knuth Vol 4A (which, despite Charles Stross's excellent Laundry novels, is now openly available). I think your request is satisfied by a section of a monotonic binary Gray code described in section 7.2.1.1. There is an online preprint with a PDF version at http://www.kcats.org/csci/464/doc/knuth/fascicles/fasc2a.pdf
I tried working out the problem using Tarjan's algorithm and one algorithm from the website http://discuss.techinterview.org/default.asp?interview.11.532716.6, but neither is clear to me. Maybe my recursion concepts are not built up properly. Please give a small demonstration to explain the above two examples. I have an idea of the Union-Find data structure.
It looks like a very interesting problem, so I have to figure it out anyhow. I am preparing for interviews.
If any other logic/algorithm exists, please share.
The LCA algorithm tries to do a simple thing: Figure out paths from the two nodes in question to the root. Now, these two paths would have a common suffix (assuming that the path ends at the root). The LCA is the first node where the suffix begins.
Consider the following tree:
        r
       / \
      s   *
     / \
    u   t
   /   / \
  *   v   *
     / \
    *   *
In order to find the LCA(u, v) we proceed as follows:
Path from u to root: Path(u, r) = usr
Path from v to root: Path(v, r) = vtsr
Now, we check for the common suffix:
Common suffix: 'sr'
Therefore LCA(u, v) = first node of the suffix = s
Note the actual algorithms do not go all the way up to the root. They use Disjoint-Set data structures to stop when they reach s.
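Since the question asks specifically about Tarjan's (offline, union-find based) algorithm, here is a small C sketch of it. The node numbering, example tree and query list are illustrative assumptions; the tree mirrors the diagram above, with 0=r, 1=s, 3=u, 4=t, 6=v.

#include <stdio.h>

#define MAXN 16
#define MAXC 4

int parent[MAXN];                 /* union-find parent          */
int ancestor[MAXN];               /* current candidate ancestor */
int visited[MAXN];
int children[MAXN][MAXC];
int nChildren[MAXN];

int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

struct query { int u, v; } queries[] = { {3, 6}, {5, 8} };
int nQueries = 2;

void dfs(int u) {
    parent[u] = u;                /* MakeSet(u) */
    ancestor[u] = u;
    for (int i = 0; i < nChildren[u]; i++) {
        int c = children[u][i];
        dfs(c);
        parent[find(c)] = u;      /* union the child's set into u's, keeping u as representative */
        ancestor[find(u)] = u;
    }
    visited[u] = 1;
    for (int q = 0; q < nQueries; q++) {
        int other = queries[q].u == u ? queries[q].v
                  : queries[q].v == u ? queries[q].u : -1;
        if (other >= 0 && visited[other])
            printf("LCA(%d, %d) = %d\n",
                   queries[q].u, queries[q].v, ancestor[find(other)]);
    }
}

void addChild(int p, int c) { children[p][nChildren[p]++] = c; }

int main(void) {
    /* 0=r, 1=s, 2=r's other child, 3=u, 4=t, 5=u's child,
       6=v, 7=t's other child, 8 and 9 = v's children        */
    addChild(0, 1); addChild(0, 2);
    addChild(1, 3); addChild(1, 4);
    addChild(3, 5);
    addChild(4, 6); addChild(4, 7);
    addChild(6, 8); addChild(6, 9);
    dfs(0);                       /* prints LCA(3,6)=1 and LCA(5,8)=1, i.e. s */
    return 0;
}

The key invariant is that when a query partner is already visited, find() on it gives the deepest ancestor whose DFS is still in progress, which is exactly the LCA.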
An excellent set of alternative approaches are explained here.
Since you mentioned job interviews, I thought of the variation of this problem where you are limited to O(1) memory usage.
In this case, consider the following algorithm:
1) Scan the tree from node u up to the root, finding the path length L(u)
2) Scan the tree from node v up to the root, finding the path length L(v)
3) Calculate the path length difference D = |L(u)-L(v)|
4) Walk up D nodes from the deeper node (the one with the longer path)
5) Walk up the tree in parallel from the two nodes, until you hit the same node
6) Return this node as the LCA
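A direct C sketch of these six steps, assuming each node carries a parent pointer (the struct shape is an assumption, not given in the question):

struct node { struct node *parent; };       /* assumed node shape */

static int depth(struct node *n) {          /* steps 1-2: path length to the root */
    int d = 0;
    while (n->parent) { n = n->parent; d++; }
    return d;
}

struct node *lca(struct node *u, struct node *v) {
    int du = depth(u), dv = depth(v);
    while (du > dv) { u = u->parent; du--; }   /* steps 3-4: skip the depth difference */
    while (dv > du) { v = v->parent; dv--; }
    while (u != v) { u = u->parent; v = v->parent; }  /* step 5: walk up in lock step */
    return u;                                         /* step 6 */
}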
Assuming you only need to solve the problem once (per data set), a simple approach is to collect the set of ancestors of one node (along with the node itself), and then walk the list of ancestors of the other until you find a member of that set, which is necessarily the lowest common ancestor. Pseudocode for that is given below:
Let A and B begin as the nodes in question.
seen := set containing the root node
while A is not root:
    add A to seen
    A := A's parent
while B is not in seen:
    B := B's parent
B is now the lowest common ancestor.
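A C rendering of this pseudocode, using a plain array in place of a set and again assuming a parent-pointer node (so it is O(depth^2) in the worst case rather than O(depth) with a real hash set):

#include <stdlib.h>

struct node { struct node *parent; };       /* assumed node shape */

struct node *lcaWithSet(struct node *a, struct node *b) {
    int count = 1;                          /* count a and its ancestors... */
    for (struct node *p = a; p->parent; p = p->parent) count++;
    struct node **seen = malloc(count * sizeof *seen);
    int i = 0;
    for (struct node *p = a; p; p = p->parent) seen[i++] = p;   /* ...then store them */

    for (;; b = b->parent)                  /* walk up from b until we hit the set */
        for (int j = 0; j < count; j++)
            if (seen[j] == b) { free(seen); return b; }
}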
Another method is to compute the entire path-to-root for each node, then scan from the right looking for a common suffix. Its first element is the LCA. Which one of these is faster depends on your data.
If you will be needing to find LCAs of many pairs of nodes, then you can make various space/time trade-offs:
You could, for instance, pre-compute the depth of each node. This lets you avoid re-creating the sets (or paths) each time: first walk up from the deeper node to the depth of the shallower node, then walk the two nodes toward the root in lock step; when these walks meet, you have the LCA.
Another approach annotates nodes with their next ancestor at depth-mod-H, so that you first solve a similar-but-H-times-smaller problem and then an H-sized instance of the first problem. This is good on very deep trees, and H is generally chosen as the square root of the average depth of the tree.