a* algorithm pseudocode - c

I am trying to implement in C the A* algorithm pseudocode given by Wikipedia, but I am really stuck understanding what the reconstruct_path function is. Can someone explain to me what the variables in this function (p, p + current_node, set) represent?
function A*(start, goal)
    closedset := the empty set    // The set of nodes already evaluated.
    openset := {start}            // The set of tentative nodes to be evaluated, initially containing the start node
    came_from := the empty map    // The map of navigated nodes.
    g_score[start] := 0           // Cost from start along best known path.
    // Estimated total cost from start to goal through y.
    f_score[start] := g_score[start] + heuristic_cost_estimate(start, goal)

    while openset is not empty
        current := the node in openset having the lowest f_score[] value
        if current = goal
            return reconstruct_path(came_from, goal)
        remove current from openset
        add current to closedset
        for each neighbor in neighbor_nodes(current)
            tentative_g_score := g_score[current] + dist_between(current, neighbor)
            if neighbor in closedset
                if tentative_g_score >= g_score[neighbor]
                    continue
            if neighbor not in openset or tentative_g_score < g_score[neighbor]
                came_from[neighbor] := current
                g_score[neighbor] := tentative_g_score
                f_score[neighbor] := g_score[neighbor] + heuristic_cost_estimate(neighbor, goal)
                if neighbor not in openset
                    add neighbor to openset

    return failure

function reconstruct_path(came_from, current_node)
    if came_from[current_node] in set
        p := reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return current_node
Thank you

came_from is a map of navigated nodes, like the comment says. It can be implemented in several ways, but a classic map is fine for this purpose (even a list would work).
If you are not familiar with maps, check out std::map.
The goal of A* is to find a list of moves that will solve the given problem (represented as a graph). A solution is a path through the graph.
In the pseudocode proposed, came_from stores the "history" of the solution you are currently evaluating (so, a possible path through the graph).
When you explore a node (a new node, or one found with a lower cost than before):
if neighbor not in openset or tentative_g_score < g_score[neighbor]
    came_from[neighbor] := current
you are saving in the came_from map the node you came from. (It's simpler to think of it as the ordered list of moves until the solution node is reached; a map is used instead of a list for performance reasons.)
The line above basically means:
"Now I'll visit neighbor node. Remember that I reached neighbor node
coming from current node".
When the goal node is reached, A* needs to return the list of moves from the start node to the goal. You have the reference to the goal node, so you can now reconstruct the list of moves to reach it from the start node (reconstruct_path), because you stored each move in the came_from map.
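Since you are working in C: if you number your nodes 0..N-1, the map can be a plain array. A minimal sketch (every name here is an assumption, nothing comes from the Wikipedia pseudocode):

#define N 100                    /* number of graph nodes (assumed) */

int came_from[N];

void init_came_from(void)
{
    for (int i = 0; i < N; i++)
        came_from[i] = -1;       /* -1 = no predecessor recorded yet */
}

/* Called when A* accepts `neighbor` via `current`: */
void record_step(int current, int neighbor)
{
    came_from[neighbor] = current;   /* "I reached neighbor from current" */
}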

You have a set of nodes, and each node in your path can "point" to its predecessor (the node you came from to reach it) - this is what the came_from map is storing.
You want your a* function to return a list* of nodes in the path.
Now, back to return (p + current_node) - this code basically means: return a list which contains all the elements of p, with current_node at the end. So it's p with one element appended.
You can see that, because this function is recursive, at the beginning the list will contain a single element - the first in your path, which will be start. New elements are then appended, ending with the goal element at the end.
You could also look at it this way: your algorithm allowed you to find a path from goal to start (you just need to follow the came_from of your nodes). This function lets you traverse that path from start to goal thanks to recursion, so you end up with a list of some sort, containing your path in the correct order.
* by list I mean some structure that represent a sequence of elements, not a set.
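To make this concrete in C: here is a minimal iterative sketch of reconstruct_path, assuming nodes are int indices, came_from[i] holds the predecessor of node i, and -1 marks the start node (all of this is an assumption, not part of the original pseudocode):

/* Fill `path` with the nodes from start to `current` (the goal).
   Returns the path length, or -1 if the buffer is too small. */
int reconstruct_path(const int *came_from, int current, int *path, int max_len)
{
    int len = 0;
    /* Walk backwards from the goal to the start. */
    for (int n = current; n != -1; n = came_from[n]) {
        if (len >= max_len) return -1;
        path[len++] = n;
    }
    /* The nodes were collected goal-to-start; reverse them in place
       (this replaces the recursive "p + current_node" appending). */
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        int tmp = path[i];
        path[i] = path[j];
        path[j] = tmp;
    }
    return len;
}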

Related

Given a DAG, the length of the longest path and the node in which it ends, how do I retrace my steps so I can print each node of the longest path?

I'm working on the problem of finding the longest chain of parallelepipeds that can be nested inside each other, given a list of parallelepipeds.
My approach was to represent the graph with an adjacency list, do a topological sort, and then for each node in the topological array "unrelax" the edges, giving me the longest path.
Below is the code, but I don't think it matters for the question.
typedef struct Edge {
    int src;            /* source node */
    int dst;            /* destination node */
    struct Edge *next;
} Edge;

int maxend;             /* node in which the longest path ends */
int mp;                 /* length of the longest path */

for (int i = 0; i < G.n; i++)
{
    int j = TA[i];      /* TA is the topologically sorted array */
    if (G.edges[j] != NULL)
    {
        if (DTA[j] == -1) DTA[j] = 0;
        Edge *tmp = G.edges[j];
        while (tmp != NULL)
        {
            /* DTA keeps track of the maximum distance of each node in TA */
            if (DTA[tmp->src] >= DTA[tmp->dst]) {
                DTA[tmp->dst] = DTA[tmp->src] + 1;
                if (DTA[tmp->dst] > mp) {
                    mp = DTA[tmp->dst];
                    maxend = tmp->dst;
                }
            }
            tmp = tmp->next;
        }
    }
}
In the end I have the length of the longest path and the node in which said path ends, but how do I efficiently recreate the path?
If parallelepiped A contains parallelepiped B, and parallelepiped B contains parallelepiped C, then parallelepiped A contains parallelepiped C as well. This means each edge has a weight of 1, and the vertex where the longest path starts has the furthest node of the path in its adjacency list.
I've thought of 3 solutions, but none of them look great.
1. Iterate the edges of each vertex that has weight 0 (so no predecessors), and if there is a choice, avoid choosing the edge that connects it with the furthest node (as said before, the shortest path between the starting node and the ending node will be 1).
2. In the array that tracks the maximum distance of each node in the topologically sorted array: start from the index representing the furthest node we found and see if the previous node has a compatible distance (as in, the previous node has 1 less distance than the furthest node). If it does, check its adjacency list to see if the furthest node is in it (because if the furthest node has a distance of 10, there could be several nodes that have a distance of 9 but are unconnected to it). Repeat until we reach the root of the path.
3. The most promising candidate so far: create an array of pointers that keeps track of the "maximum" parent of each node. In the code above, every time a node's maximum distance changes, the node it was reached from has a longer distance than the previous parent, which means we can update the maximum parent associated with the current node.
Edit: I ended up just allocating a new array, and every time I updated the weight of a node (DTA[tmp->src] >= DTA[tmp->dst]) I also stored the number of the source node in the cell of the destination node.
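That is, alongside DTA you keep a parent array and set parent[tmp->dst] = tmp->src whenever the distance improves. A minimal sketch of the retracing step that this enables (names assumed, matching the code above):

#include <stdio.h>

/* Walk the recorded predecessors from the end of the longest path back
   to its start; parent[v] == -1 marks a node with no predecessor. */
void print_longest_path(const int *parent, int maxend)
{
    for (int v = maxend; v != -1; v = parent[v])
        printf("%d\n", v);   /* printed end-to-start; reverse if needed */
}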
I am assuming the graph edge u <- v indicates that box u is big enough to contain v.
I suggest you dump the topological sort. Instead:
SET weight of every edge to -1
LOOP
    LOOP over leaf nodes (out degree zero, box too small to contain others)
        Run Dijkstra algorithm (gives longest path, with predecessors)
        Save length of longest path, and path itself
    SAVE longest path
    REMOVE nodes on longest path from graph
    IF all nodes gone from graph
        OUTPUT saved longest paths (lists of nested boxes)
        STOP
This is called a "greedy" algorithm. It is not guaranteed to give the optimal result, but it is fast and simple, always gives a reasonable result, and often gives the optimal one.
I think this solves it, unless there's something I don't understand.
The highest-weighted path in a DAG is equivalent to the lowest-weighted path if you make the edge weights negative. Then you can apply Dijkstra's algorithm directly.
A longest path between two given vertices s and t in a weighted graph
G is the same thing as a shortest path in a graph −G derived from G by
changing every weight to its negation.
This might even be a special case of Dijkstra that is simpler... not sure.
To retrieve the longest path, you start at the end and go backwards:
Start at the vertex with the greatest DTA, V_max
Find the edges that end at V_max (edge->dst == V_max)
Find an edge from a source Src_max whose DTA value is 1 less than the max (DTA[Src_max] == DTA[V_max] - 1)
Repeat this recursively until there are no more source vertices
To make that a little more efficient, you can reverse the endpoints of the edges on the way down and then follow the path back to the start. That way each reverse step is O(1).
I think option 3 is the most promising. You can search for the longest path with DFS starting from all the root vertices (those without incoming edges), increasing the 'max distance' of each vertex encountered.
This is quite a simple solution, but it may traverse some paths more than once. For example, for the edges (a,f), (b,c), (c,f), (d,e), (e,c)
a------------f
            /
b----c-----/
      /
d--e--/
(all directed rightwards)
the starting vertices are a, b, and d; the edge (c,f) will be traversed twice and the vertex f's distance will be updated three times. If we append the rest of the alphabet to f in a simple chain,
a------------f-----g-- - - ---y---z
            /
b----c-----/
      /
d--e--/
the whole chain from f to z will probably be traversed three times, too.
You can avoid this by separating the phases and modifying the graph between them: after finding all the starting vertices (a, b, d), increment the distance of each vertex directly reachable from those (f, c, e), then remove the starting vertices and their edges from the graph - and re-iterate as long as some edges remain.
This will transform the example graph after the first step like this:
     f-----g-- - - ---y---z
    /
c--/
  /
e-/
and we can see that all the junction vertices (c and f) will wait until the longest path to them is found before letting the analysis go further past them.
That needs iteratively seeking the starting vertices, which may be time-consuming unless you do some preprocessing (for example, counting all incoming edges for each vertex and storing the vertices in some sorted data structure, like an integer-indexed multimap or a simple min-heap).
The question remains open whether the whole overhead of truncating the graph and rescanning it for new root vertices gives a net gain over traversing the final parts of common paths multiple times in your particular graph...
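A minimal sketch of that bookkeeping in C, reusing the Edge type from the question (the in-degree array stands in for physically removing vertices and edges; everything else - the names, the dist output array - is an assumption):

#include <stdlib.h>

/* Longest distance (in edges) to every vertex of a DAG, in the layered
   fashion described above: a vertex is processed only after all of its
   incoming edges have been "removed". */
void longest_distances(int n, Edge **edges, int *dist)
{
    int *indeg = calloc(n, sizeof *indeg);
    int *queue = malloc(n * sizeof *queue);
    int head = 0, tail = 0;

    /* Count the incoming edges of every vertex. */
    for (int v = 0; v < n; v++)
        for (Edge *e = edges[v]; e != NULL; e = e->next)
            indeg[e->dst]++;

    /* The starting vertices are those with no incoming edges. */
    for (int v = 0; v < n; v++) {
        dist[v] = 0;
        if (indeg[v] == 0)
            queue[tail++] = v;
    }

    while (head < tail) {
        int v = queue[head++];
        for (Edge *e = edges[v]; e != NULL; e = e->next) {
            if (dist[v] + 1 > dist[e->dst])
                dist[e->dst] = dist[v] + 1;   /* longer path to dst found */
            if (--indeg[e->dst] == 0)         /* "remove" the edge */
                queue[tail++] = e->dst;       /* dst became a starting vertex */
        }
    }
    free(indeg);
    free(queue);
}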

Inserting elements from a sorted array into a BST in an efficient way

I am struggling with this exercise:
Given a BST T whose nodes contain only a key field and left and right fields, and a sorted array A which contains m keys, write an efficient algorithm which inserts into T all of A's keys which are not already present in T. You must not apply the InsertBST(T, key) algorithm to single keys of A.
For example, if the BST contains 1,3,4,5 and A contains 1,2,5,6, I have to insert 2 and 6 without using InsertBST(T,2) and InsertBST(T,6).
This is what I tried:
Algo(T, A, index)
    if T != null then
        Algo(T->sx, A, index)
        p = NewNode()
        p->key = A[index]
        if T->key > A[index] then
            T->sx = p
            index = index + 1
        else if T->key < A[index] then
            T->dx = p
            index = index + 1
        Algo(T->dx, A, index)
But it inserts nodes at the wrong places. Can you help me?
I see the following issues with your attempt:
In each recursive call you are going to insert a node - so, for instance, also when you reach the first, left-most leaf. But this cannot be right: the first node insertion may have to happen completely elsewhere in the tree, without affecting this leaf at all.
There is no check that index stays within bounds. The code assumes that there are exactly as many values to insert as there are nodes in the tree.
There is no provision for when the value is equal to the value of an existing node in the tree. In that case the value should not be added at all, but ignored.
In most languages, index will be a local variable of the Algo function, so updating it (with + 1) will have no effect on the caller's index variable. You would need a single index variable that is shared across all executions of Algo. In some languages you may be able to pass index by reference, while in others you may be able to access a more global variable, which is not passed as an argument.
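Since the exercise looks C-like (T->sx, T->dx), here is a tiny sketch of that last point: in C, the shared counter can be passed by pointer, so every recursive call advances the same index. The Node type and the inorder task here are illustrative assumptions, not the exercise's solution:

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Collect the keys of a BST into `out` in sorted order; *index is
   shared by all recursive calls because it is passed by pointer. */
void inorder_collect(const Node *t, int *out, int *index)
{
    if (t == NULL) return;
    inorder_collect(t->left, out, index);
    out[(*index)++] = t->key;     /* the same counter across all calls */
    inorder_collect(t->right, out, index);
}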
Algorithm
The algorithm to use is as follows:
Perform the inorder traversal (as you already did), but keep track of the previously visited node. Only when you detect that the next value to be inserted lies between the values of those two nodes do you need to act: in that case create a new node and insert it. Here is how to detect where to insert it. Either:
the current node has no left child, which also means the previous node (in the inorder sequence) does not exist or is higher up the tree. In this case the new node should become the left child of the current node, or
the current node has a left child, which also means the previous node is in the left subtree and has no right child of its own. In that case, add the new node as the right child of the previous node.
The Sentinel idea
There is a possibility that after visiting the last node in inorder sequence, there are still values to insert, because they are greater than the greatest value in the tree. Either you must treat that case separately after the recursive traversal has finished, or you can begin your algorithm by adding a new, temporary root to your tree, which has a dummy, but large value -- greater than the greatest value to insert. Its left child is then the real root. With that trick you are sure that the recursive logic described above will insert nodes for all values, because they will all be less than that temporary node's value.
Implementation in Python
Here is the algorithm in Python syntax, using that "sentinel" node idea:
def insertsorted(root, values):
    if not values:
        return root  # Nothing to do
    # These variables are common to all recursive executions:
    i = 0        # index in values
    prev = None  # the previously visited node in inorder sequence

    def recur(node):
        nonlocal i, prev  # We allow those shared variables to be modified here
        if i < len(values) and node.left:
            recur(node.left)
        while i < len(values) and values[i] <= node.value:
            if values[i] < node.value:
                newnode = Node(values[i])
                if node.left is None:
                    node.left = newnode
                else:
                    prev.right = newnode
                prev = newnode
            i += 1
        prev = node
        if i < len(values) and node.right:
            recur(node.right)

    # Create a temporary node that will become the parent of the root.
    # Give it a value greater than the last one in the array.
    sentinel = Node(values[-1] + 1)
    sentinel.left = root
    recur(sentinel)
    # Return the real root to the caller
    return sentinel.left

Need some explanation about trees in C

Leaf *findLeaf(Leaf *R, int data)
{
    if (R->data >= data)
    {
        if (R->left == NULL) return R;
        else return findLeaf(R->left, data);
    }
    else
    {
        if (R->right == NULL) return R;
        else return findLeaf(R->right, data);
    }
}
void traverse(Leaf *R)
{
    if (R == root) { printf("ROOT is %d\n", R->data); }
    if (R->left != NULL)
    {
        printf("Left data %d\n", R->left->data);
        traverse(R->left);
    }
    if (R->right != NULL)
    {
        printf("Right data %d\n", R->right->data);
        traverse(R->right);
    }
}
These code snippets work fine, but I wonder how they work.
I need a brief explanation of recursion. Thanks for your help.
A Leaf struct will look something like this:
typedef struct struct_t {
    int data;
    struct struct_t *left;   // These allow structs to be chained together, where each node
    struct struct_t *right;  // has two pointers to two more nodes, causing exponential growth.
} Leaf;
The function takes a pointer to a Leaf we call R, and some data to search against; it returns a pointer to a Leaf:
Leaf *findLeaf(Leaf *R,int data){
This piece of code decides whether we should go left or right; the tree is known to be ordered because the insert function follows this same rule for going left and right:
if(R->data >= data ){
This is the base case of the recursive function: if we have reached the last node in a branch (a leaf), return that Leaf.
The base case of a recursive function has the task of ending the recursion and returning a result. Without it, the function would never finish.
if(R->left == NULL) return R;
This is how we walk through the tree. Here we are traversing down the left side, because the current node's data was larger than (or equal to) the data we are searching for. (Smaller data is always inserted to the left, so the tree stays ordered.)
What is happening is that now we call findLeaf() with R->left; but imagine we get to this point again in that next call.
It would become R->left->left in reference to the first call. If the data were larger than the current node's, we would go right instead.
else return findLeaf(R->left,data);
Now we are at the case where the data was larger than the current node's, so we are going right.
} else {
This is exactly the same as with the left.
if(R->right == NULL) return R;
else return findLeaf(R->right,data);
}
}
In the end, the return of the function can be conceptualized as a chain like R->right->right->left.
Let's take this tree (its shape is implied by the walkthrough below) and operate on it with findLeaf():

    8
   /
  3
   \
    6
   /
  4

findLeaf(root, 4)   // In this example, root is already pointing to (8)
We start at the root, at the top of the tree, which contains 8.
First we check R->data >= data, where we know R->data is (8) and data is (4). Since data is smaller than R->data (the current node), we enter the if statement.
Here we check whether the left Leaf is NULL. It isn't, and so we skip to the else.
Now we return findLeaf(R->left, data);, but to return it, we must evaluate it first. This causes us to enter a second call, where we compare (3) to (4) and try again.
Going through the entire process again, we will compare (6) to (4) and then finally find our node when we compare (4) to (4). Now we will backtrack through the calls and return a chain like this:
R(8)->(3)->(6)->(4)
Edit: Also, coincidentally, I wrote a blog post about traversing a linked list to explain the nature of a Binary Search Tree here.
Each Leaf contains three values:
data - an integer
left and right, both pointers to another leaf.
left, right, or both might be NULL, meaning there isn't another leaf in that direction.
So that's a tree. There's one Leaf at the root, and you can follow the trail of left or right pointers until you reach a NULL.
The key to recursion is that if you follow the path by one Leaf, the remaining problem is exactly the same (but "one smaller") as the problem you had when you were at the root. So you can call the same function to solve the problem. Eventually the routine will be at a Leaf with NULL as its pointer, and you've solved the problem.
It's probably easiest to understand a list before you understand a tree. So instead of a Leaf with two pointers, left and right, you have a Node with just one pointer, next. To follow the list to its end, recursively:
Node *findEnd(Node *node) {
    if (node->next == NULL) {
        return node;   // Solved!!
    } else {
        return findEnd(node->next);
    }
}
What's different about your findLeaf? Well, it uses the data parameter to decide whether to follow the left or right pointer, but otherwise it's exactly the same.
You should be able to make sense of traverse() with this knowledge. It uses the same principle of recursion to visit every Leaf in the structure.
Recursion means a function that breaks a problem down into 2 cases:
one step of solving the problem, while calling itself with the remainder of the problem
the last step of solving the problem (the base case)
Recursion is simply a different way of looping through code.
Recursive algorithms generally work hand in hand with some form of data structure - in your case the tree. You need to imagine the recursion - at a very high level - as "reapply the same logic on a subset of the problem".
In your case, the subset of the problem is either the branch of the tree on the right or the branch of the tree on the left.
So, let's look at the traverse algorithm:
It takes the leaf you pass to the method and, if it's the ROOT leaf, says so.
Then, if there is a "left" sub-leaf, it displays the data attached to it and restarts the algorithm (the recursion) on that left node.
If that left node were the ROOT, it would state it (no chance after the first recursion, since the ROOT is at the top).
Then, if there is a "left" sub-leaf of our left node, it displays it and restarts the algorithm on this left-of-left node.
When reaching the bottom left, i.e. when there is no left leaf left (following? :) ), it does the same on the first right leaf. If there is neither a left leaf nor a right leaf - meaning we are at a real leaf with no sub-leaves - the recursive call ends, so the algorithm resumes from the place it was at before recursing, with all the variables back in the state they were in at that point.
After the first recursive call terminates, you move from the bottom-left leaf up one leaf, go down its right leaf if there is one, and start again: printing and moving to the left.
All in all, the end result is that you walk through your whole tree in a left-first way.
Tell me if it's not crystal clear, and then try to apply the same pattern to the findLeaf recursive algorithm.
A little comment about recursion and then a little comment about searching on a tree:
let's suppose you want to calculate n!. You can do (pseudocode)
fac 0 = 1
fac (n+1) = (n+1) * fac n
So recursion is solving a problem by manipulating the result of solving the same problem on smaller data. See http://en.wikipedia.org/wiki/Recursion.
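A direct transcription into C (purely illustrative):

/* n! computed recursively: the base case (0! = 1) ends the recursion. */
unsigned long fac(unsigned int n)
{
    if (n == 0)
        return 1;              /* fac 0 = 1 */
    return n * fac(n - 1);     /* fac (n+1) = (n+1) * fac n */
}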
So now, let's suppose we have a data structure tree
T = (L, e, R)
with L the left subtree, e the root, and R the right subtree. So let's say you want to find the value v in that tree; you would do:
find v LEAF = false   // you can't find any value in an empty tree; base case
find v (L, e, R) =
    if v == e
        then something(e)
    else
        if v < e
            find v L   // recursion: 'go and search for v in the left subtree'
        else
            find v R   // recursion: 'go and search for v in the right subtree'
        end
    end
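The same search in C, using the Leaf type from the question (a sketch; the pseudocode's something(e) is reduced to returning true):

#include <stdbool.h>

/* true if v occurs in the tree rooted at t; the empty tree is the base case. */
bool find(Leaf *t, int v)
{
    if (t == NULL)
        return false;             /* you can't find anything in an empty tree */
    if (v == t->data)
        return true;
    if (v < t->data)
        return find(t->left, v);  /* go and search for v in the left subtree */
    return find(t->right, v);     /* go and search for v in the right subtree */
}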

finding longest path in an adjacency list

I have an adjacency list I have created for a given graph with nodes and weighted edges. I am trying to figure out the best way to find the longest path within the graph. I have a topological sort method, which I've heard can be useful, but I am unsure how to use it to find the longest path. So, is there a way to accomplish this using topological sort, or is there a more efficient method?
Here is an example of my output for the adjacency list (the value in parentheses is the cost of the edge to the node after the arrow, i.e. (cost)-> node):
Node 0 (4)->1(9)->2
Node 1 (10)->3
Node 2 (8)->3
Node 3
Node 4 (3)->8(3)->7
Node 5 (2)->8(4)->7(2)->0
Node 6 (2)->7(1)->0
Node 7 (5)->9(6)->1(4)->2
Node 8 (6)->9(5)->1
Node 9 (7)->3
Node 10 (12)->4(11)->5(1)->6
Bryan already answered your question above, but I thought I could go in more depth.
First, as he pointed out, this problem is only easily solvable if there are no cycles. If there are cycles you run into the situation where you have infinitely long paths. In that case, you might define a longest path to be any path with no repeated nodes. Unfortunately, this problem can be shown to be NP-Hard. So instead, we'll focus on the problem which it seems like you actually need to solve (since you mentioned the topological sort)--longest path in a Directed Acyclic Graph (DAG). We'll also assume that we have two nodes s and t that are our start and end nodes. The problem is a bit uglier otherwise unless you can make certain assumptions about your graph. If you understand the text below, and such assumptions in your graphs are correct, then perhaps you can remove the s and t restrictions (otherwise, you'll have to run it on every pair of vertices in your graph! Slow...)
The first step in the algorithm is to topologically order the vertices. Intuitively this makes sense. Say you order them from left to right (i.e. the leftmost node will have no incoming edges). The longest path from s to t will generally start from the left and end on the right. It's also impossible for the path to ever go in the left direction. This gives you a sequential ordering to generate the longest path--start at the left and move right.
The next step is to sequentially go left to right and define the longest path for each node. For any node that has no incoming edges, the longest path to that node is 0 (this is true by definition). For any node with incoming edges, recursively define the longest path to that node to be the maximum over all incoming edges + the longest path to get to the "incoming" neighbor (note that this number might be negative, if, for example, all of the incoming edges are negative!). Intuitively this makes sense, but the proof is also trivial:
Suppose our algorithm claims that the longest path to some node v is d but the actual longest path is some d' > d. Pick the "least" such node v (we use the ordering as defined by the topological sort. In other words, we pick the "left-most" node that our algorithm failed at. This is important so that we can assume that our algorithm has correctly determined the longest path for any nodes to the "left" of v). Define the length of the hypothetical longest path to be d' = d_1 + e where d_1 is the length of the hypothetical path up to a node v_prev with edge e to v (note the sloppy naming. The edge e also has weight e). We can define it as such because any path to v must go through one of its neighbors which have an edge going to v (since you can't get to v without getting there via some edge that goes to it). Then d_1 must be the longest path to v_prev (else, contradiction. There is a longer path which contradicts our choice of v as the "least" such node!) and our algorithm would choose the path containing d_1 + e as desired.
To generate the actual path you can figure out which edge was used. Say you've reconstructed the path up to some vertex v which has longest path length d. Then go over all incoming vertices and find the one with longest path length d' = d - e where e is the weight of the edge going into v. You could also just keep track of the parents' of nodes as you go through the algorithm. That is, when you find the longest path to v, set its parent to whichever adjacent node was chosen. You can use simple contradiction to show why either method generates the longest path.
Finally, some pseudocode (sorry, it's basically C#. This is a lot messier to code in C without custom classes, and I haven't coded C in a while).
public List<Node> FindLongestPath(Graph graph, Node start, Node end)
{
    var longestPathLengths = new Dictionary<Node, int>();
    var orderedNodes = graph.Nodes.TopologicallySort();

    // Remove any nodes that are topologically less than start.
    // They cannot be in a path from start to end by definition.
    while (orderedNodes.Pop() != start);
    // Push it back onto the top of the stack
    orderedNodes.Push(start);

    // Run the algorithm until we process the end node
    while (true)
    {
        var node = orderedNodes.Pop();
        if (node.IncomingEdges.Count() == 0)
        {
            longestPathLengths.Add(node, 0);
        }
        else
        {
            var longestPathLength = int.MinValue;
            foreach (var incomingEdge in node.IncomingEdges)
            {
                var currPathLength = longestPathLengths[incomingEdge.Parent] +
                                     incomingEdge.Weight;
                if (currPathLength > longestPathLength)
                {
                    longestPathLength = currPathLength;
                }
            }
            longestPathLengths.Add(node, longestPathLength);
        }
        if (node == end)
        {
            break;
        }
    }

    // Reconstruct the path. Go backwards until we hit start.
    var current = end;
    var longestPath = new List<Node>();
    while (current != start)
    {
        foreach (var incomingEdge in current.IncomingEdges)
        {
            if (longestPathLengths[incomingEdge.Parent] ==
                longestPathLengths[current] - incomingEdge.Weight)
            {
                longestPath.Insert(0, incomingEdge.Parent);
                current = incomingEdge.Parent;
                break;
            }
        }
    }
    return longestPath;
}
Note that this implementation is not particularly efficient, but hopefully it's clear! You can optimize it in a lot of small ways that should be obvious as you think through the code/implementation. Generally, if you store more stuff in memory, it'll run faster. The way you structure your Graph is also critical. For instance, it didn't seem like you had an IncomingEdges property for your nodes. But without that, finding the incoming edges for each node is a pain (and is not performant!). In my opinion, graph algorithms are conceptually different from, say, algorithms on strings and arrays, because the implementation matters so much! If you read the wiki entries on graph algorithms, you'll find they often give three or four different runtimes based on different implementations (with different data structures). Keep this in mind if you care about speed.
Assuming your graph has no cycles (otherwise the longest path becomes a vague concept), you can indeed use a topological sort. Walk the topological order, and for each node compute its longest distance from a source node by looking at all its predecessors and adding the weight of the edge connecting them to their distance. Then choose the predecessor that gives you the longest distance for this node. The topological sort guarantees that all the predecessors already have their distance correctly determined.
If, in addition to the length of the longest path, you also want the path itself, then start at the node that gave the longest length and look at all its predecessors to find the one that resulted in this length. Repeat this process until you have found a source node of the graph.

merged linked list in C

This question was asked to me in an interview:
There are two headers of two linked lists.
There is a merged linked list in C, where the second linked list is merged into the first one at some point.
How could we identify the merging point, and what is the complexity of finding that point?
Could anybody please help?
O(n)
search = list1->header;
if (mixed->header == list1->header) search = list2->header;
while (mixed->next != search) mixed = mixed->next;
Edit: new names for the variables and a few comments

/* search is what we want to find. Here it's the head of `list2` */
search = list2->header;
/* unless the merging put `list2` first; then we want to search for `list1` */
if (mixed->header == list2->header) search = list1->header;
/* assume (wrongly) that the header of the mixed list is the merge point */
mergepoint = mixed->header;
/* traverse the mixed list until we find the pointer we're searching for */
while (mergepoint->next != search) mergepoint = mergepoint->next;
/* mergepoint now points to the merge point */
Update: This assumes the Y-shaped joining of two linked lists as described better in Steve Jessop's post. But I think the description of the problem is sufficiently ambiguous that various interpretations are possible, of which this is only one.
This can be done with a single pass through one list plus a partial pass through the other. In other words, it's O(n).
Here's my proposed algorithm:
Create a hashmap. (Yes, this is busywork in C if you don't have a library handy for it).
The keys will be pointers to the items in List1 (i.e. the head pointer and each link).
The values will be integers denoting the position, i.e. distance from the head of List1.
Run through List1, keeping track of the position, and hash all your pointers and positions.
Run through List2, keeping track of the position, and find the first pointer that occurs in the hashmap.
At this point, you'll know the position in List2 of the first node common to both lists.
The hashmap entry will also contain the position in List1 of that same node.
That will nicely identify your merge point.
Do you mean you have a Y-shape, like this:
list1: A -> B -> C -> D -> E -> F
list2: X -> Y -> Z -> E -> F
Where A .. Z are singly-linked list nodes. We want to find the "merge point" E, which is defined to be the first node appearing in both lists. Is that correct?
If so, then I would attach the last node of list2 (F) to the first node of list2 (X). This turns list2 into a loop:
list2 : X -> Y -> Z -> E -> F -> X -> ...
But more importantly:
list1 : A -> B -> C -> D -> E -> F -> X -> Y -> Z -> E -> ...
This reduces the question to a previously-solved problem, which can be solved in O(n) time and O(1) additional storage.
But reading your question, another possibility is that by "merge" you mean "insert". So you have two lists like this:
list1: A -> B -> C
list2: D -> E -> F
and then another completely separate list:
list3: A -> B -> D -> E -> F -> C
where this time, A .. F are the values contained in the list, not the nodes themselves.
If the values are all different, you just need to search list3 for D (or for the later of D and A, if you don't know which list it was that was copied into the other). Which seems like a pointless question. If values can be repeated, then you have to check for the full sequence of list2 inside list3. But just because you find "DEF" doesn't mean that's where list2 was inserted - maybe "DEF" already occurred several times in list1 beforehand, and you've just found the first of those. For instance if I insert "DEF" into "ABCDEF", and the result is "ABCDEFDEF", then did I insert at index 3 or at index 6? There's no way to tell, so the question can't be answered.
So, in conclusion, I don't understand the question. But I might have answered it anyway.
If the question means list2 is contained in list1 (that is, list2 points somewhere into the middle of list1), then it is easy - just walk list1 and compare pointers until you reach list2.
However, such an interpretation does not make much sense, because by inserting list2 into list1 (like 1 1 2 2 1), you would also modify list2 - the last 1 becomes part of list2.
So I will assume the question is about the Y shape:
list1: A -> B -> C -> D -> E -> F
list2: X -> Y -> Z -> E -> F
This can be solved using a hashtable, as Carl suggested.
A solution without a hashtable would be this:
Walk list1 and disconnect all its pointers as you go
Walk list2. When it ends, you've reached the junction point
Repair the pointers in list1
Disconnecting and repairing pointers in list1 can be done easily using recursion:
Disconnect(node)
{
    if (node->next == NULL)
        walk list2 to its end, that is the solution, remember it
    else
    {
        tmp = node->next;
        node->next = NULL;
        Disconnect(tmp);
        node->next = tmp; // repair
    }
}
Now call Disconnect(list1).
That is: recurse down list1 and disconnect the pointers. When you reach the end, execute step 2 (walk list2 to find the junction), then repair the pointers while returning from the recursion.
This solution modifies list1 temporarily, so it is not thread-safe; you should use a lock around the Disconnect(list1) call.
// try this code to merge the two lists
void mergeNode() {
    node *copy, *current, *current1;
    merge = NULL;
    current = head;
    current1 = head1;
    while (current != NULL) {
        if (merge == NULL) {
            node *tmp = (node *)malloc(sizeof(node));
            tmp->data = current->data;
            tmp->link = NULL;
            merge = tmp;
        } else {
            copy = merge;
            while (copy->link != NULL)
                copy = copy->link;
            node *tmp = (node *)malloc(sizeof(node));
            tmp->data = current->data;
            tmp->link = copy->link;
            copy->link = tmp;
        }
        current = current->link;
    }
    while (current1 != NULL) {
        copy = merge;
        while (copy->link != NULL)
            copy = copy->link;
        node *tmp = (node *)malloc(sizeof(node));
        tmp->data = current1->data;
        tmp->link = copy->link;
        copy->link = tmp;
        current1 = current1->link;
    }
    display(merge);
}
Sorry if my answer seems too simple, but if you have two linked lists which are identified by a header and you join them, so that
A -> B -> C -> D is the first list, and
1 -> 2 -> 3 -> 4 is the second, then suppose
A -> B -> C -> 1 -> 2 -> 3 -> 4 -> D is the result
then to find the merging point you need to go through the final list until you find the second header (the 1). That takes O(n1) in the worst case, where n1 is the number of elements of the first list (this happens if the second list is merged at the end).
That's how I would interpret the question. The reference to the C language probably means that you have no 'objects' or pre-packaged data structures, unless specified.
[update] As suggested by Sebastian, if the two lists above have the same elements, my solution won't work. I suspect that this is where the C language comes into play: you can search for the address of the first element of the second list (its head). Thus the duplicates objection won't hold.
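A sketch of that pointer scan in C (the minimal Node type and the names are assumptions):

typedef struct Node { struct Node *next; } Node;

/* Return the position of head2 inside the merged list, or -1 if the
   second list's head never appears in it. */
int merge_index(Node *merged, Node *head2)
{
    int i = 0;
    for (Node *p = merged; p != NULL; p = p->next, i++)
        if (p == head2)
            return i;   /* found the merging point at index i */
    return -1;
}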
Well, there are several approaches to solving this problem.
Note that I am only discussing the approaches [corner cases may need to be handled separately], going from brute force to the best one.
Considering N: the number of nodes in the first linked list
M: the number of nodes in the second linked list
Approach 1:
Compare each node of the first linked list with every node of the second list. Stop when you find a matching node; this is the merging point.
while (head1)
{
    cur2 = head2;
    while (cur2)
    {
        if (cur2 == head1)
            return cur2;
        cur2 = cur2->next;
    }
    head1 = head1->next;
}
Time Complexity: O(N*M)
Space Complexity: O(1)
Approach 2:
Maintain two stacks. Push all the nodes of the first linked list onto the first stack. Repeat the same for the second linked list.
Start popping nodes from both stacks until the popped nodes no longer match. The last matching node is the merging point.
Time Complexity: O(N+M)
Space Complexity: O(N+M)
Approach 3:
Make use of a hash table. Insert all the nodes of the first linked list into the hash.
Search for the first matching node of the second list in the hash.
This is the merging point.
Time Complexity: O(N+M)
Space Complexity: O(N)
Note that the space complexity may vary depending upon the hash function used [talking about C, where you are supposed to implement your own hash function].
Approach 4:
Insert all the nodes of the first linked list [by nodes, I mean addresses] into an array.
Sort the array with some stable sorting algorithm in O(N logN) time [merge sort would be better].
Now search for the first matching node from the second linked list.
Time Complexity: O(N logN)
Space Complexity: O(N)
Note that this approach may be better than Approach 3 [in terms of space] as it doesn't use a hash.
Approach 5:
1. Take an array of size M+N.
2. Insert each node from the first linked list, followed by each node from the second linked list.
3. Search for the first repeating element [can be found in one scan in O(M+N) time].
Time Complexity: O(N+M)
Space Complexity: O(N+M)
Approach 6: [A better approach]
1. Modify the first linked list and make it circular.
2. Now, starting from the head of the second linked list, find the start of the loop using Floyd's cycle-detection algorithm (the "tortoise and hare"); a sketch follows below.
3. Remove the loop [it can be easily removed, as we know the last node].
Time Complexity: O(N+M)
Space Complexity: O(1)
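A sketch of step 2 in C (assuming a minimal Node type; this is the standard tortoise-and-hare, not code from the answer above):

typedef struct Node { struct Node *next; } Node;

/* Find the first node of the loop after list1 has been made circular. */
Node *find_loop_start(Node *head)
{
    Node *slow = head, *fast = head;
    /* Phase 1: advance at speeds 1 and 2 until the pointers meet. */
    do {
        if (fast == NULL || fast->next == NULL)
            return NULL;                 /* no loop: the lists never merge */
        slow = slow->next;
        fast = fast->next->next;
    } while (slow != fast);
    /* Phase 2: restart one pointer at the head; moving both one step at
       a time, they meet again exactly at the start of the loop, which
       is the merging point. */
    slow = head;
    while (slow != fast) {
        slow = slow->next;
        fast = fast->next;
    }
    return slow;
}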
Approach 7: [Probably the best one]
1. Count the number of nodes in first linked list[say c1].
2. Count the number of nodes in second linked list[say c2].
3. Find the difference[Lets say c1>c2] diff=c1-c2.
4. Take two pointers p1 & p2, p1 pointing to the head of the first linked list & p2 pointing to the head of the second linked list.
5. Move p1 diff times.
6. Move both p1 & p2 each node at a time until both point to the same node.
7. p1 or p2 indicates the merging point.
Time Complexity: O(N+M)
Space Complexity: O(1)
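And a sketch of approach 7 in C (same assumed Node type as the previous sketch):

/* Advance the longer list by the length difference, then walk both
   lists in lockstep; the first shared node is the merging point. */
Node *merge_point(Node *head1, Node *head2)
{
    int c1 = 0, c2 = 0;
    for (Node *p = head1; p != NULL; p = p->next) c1++;
    for (Node *p = head2; p != NULL; p = p->next) c2++;
    while (c1 > c2) { head1 = head1->next; c1--; }   /* step 5 */
    while (c2 > c1) { head2 = head2->next; c2--; }
    while (head1 != head2) {                         /* step 6 */
        head1 = head1->next;
        head2 = head2->next;
    }
    return head1;   /* step 7: NULL if the lists never meet */
}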
The trivial solution is obviously O(N+M). Hmm... what could be better? You can go from the start to the end of a list, or vice versa. If you have threads, you can go in both directions at the same time, so it should be a little bit quicker.
