finding longest path in an adjacency list - c

I have created an adjacency list for a given graph with nodes and weighted edges, and I am trying to figure out the best way to find the longest path within the graph. I have a topological sort method, which I've heard can be useful, but I am unsure how to use it to find the longest path. Is there a way to accomplish this using a topological sort, or is there a more efficient method?
Here is an example of my output for the adjacency list. The value in parentheses is the cost to get to the node after the arrow, i.e. (cost)->node:
Node 0 (4)->1(9)->2
Node 1 (10)->3
Node 2 (8)->3
Node 3
Node 4 (3)->8(3)->7
Node 5 (2)->8(4)->7(2)->0
Node 6 (2)->7(1)->0
Node 7 (5)->9(6)->1(4)->2
Node 8 (6)->9(5)->1
Node 9 (7)->3
Node 10 (12)->4(11)->5(1)->6

Bryan already answered your question above, but I thought I could go in more depth.
First, as he pointed out, this problem is only easily solvable if there are no cycles. If there are cycles you run into the situation where you have infinitely long paths. In that case, you might define a longest path to be any path with no repeated nodes. Unfortunately, this problem can be shown to be NP-Hard. So instead, we'll focus on the problem which it seems like you actually need to solve (since you mentioned the topological sort)--longest path in a Directed Acyclic Graph (DAG). We'll also assume that we have two nodes s and t that are our start and end nodes. The problem is a bit uglier otherwise unless you can make certain assumptions about your graph. If you understand the text below, and such assumptions in your graphs are correct, then perhaps you can remove the s and t restrictions (otherwise, you'll have to run it on every pair of vertices in your graph! Slow...)
The first step in the algorithm is to topologically order the vertices. Intuitively this makes sense. Say you order them from left to right (i.e. the leftmost node will have no incoming edges). The longest path from s to t will generally start from the left and end on the right. It's also impossible for the path to ever go in the left direction. This gives you a sequential ordering to generate the longest path--start at the left and move right.
The next step is to sequentially go left to right and define the longest path for each node. For any node that has no incoming edges, the longest path to that node is 0 (this is true by definition). For any node with incoming edges, recursively define the longest path to that node to be the maximum over all incoming edges + the longest path to get to the "incoming" neighbor (note that this number might be negative, if, for example, all of the incoming edges are negative!). Intuitively this makes sense, but the proof is also trivial:
Suppose our algorithm claims that the longest path to some node v is d but the actual longest path is some d' > d. Pick the "least" such node v (we use the ordering as defined by the topological sort. In other words, we pick the "left-most" node that our algorithm failed at. This is important so that we can assume that our algorithm has correctly determined the longest path for any nodes to the "left" of v). Define the length of the hypothetical longest path to be d' = d_1 + e where d_1 is the length of the hypothetical path up to a node v_prev with edge e to v (note the sloppy naming. The edge e also has weight e). We can define it as such because any path to v must go through one of its neighbors which have an edge going to v (since you can't get to v without getting there via some edge that goes to it). Then d_1 must be the longest path to v_prev (else, contradiction. There is a longer path which contradicts our choice of v as the "least" such node!) and our algorithm would choose the path containing d_1 + e as desired.
To generate the actual path you can figure out which edge was used. Say you've reconstructed the path back to some vertex v which has longest path length d. Then go over all incoming edges and find the neighbor with longest path length d' = d - e, where e is the weight of the edge going into v. You could also just keep track of the parents of nodes as you go through the algorithm. That is, when you find the longest path to v, set its parent to whichever adjacent node was chosen. You can use simple contradiction to show why either method generates the longest path.
Finally some pseudocode (sorry, it's basically in C#. This is a lot messier to code in C without custom classes and I haven't coded C in a while).
public List<Node> FindLongestPath(Graph graph, Node start, Node end)
{
    // Longest distance from start; nodes missing from the map are unreachable
    var longestPathLengths = new Dictionary<Node, int>();
    // TopologicallySort() is assumed to return a stack with the "leftmost" node on top
    var orderedNodes = graph.Nodes.TopologicallySort();
    // Remove any nodes that are topologically less than start.
    // They cannot be in a path from start to end by definition
    while (orderedNodes.Pop() != start);
    // The longest path from start to itself has length 0
    longestPathLengths.Add(start, 0);
    // Do algorithm until we process the end node
    while (orderedNodes.Count > 0)
    {
        var node = orderedNodes.Pop();
        var reachable = false;
        var longestPathLength = int.MinValue;
        foreach (var incomingEdge in node.IncomingEdges)
        {
            // Ignore parents that cannot be reached from start
            if (!longestPathLengths.ContainsKey(incomingEdge.Parent))
            {
                continue;
            }
            var currPathLength = longestPathLengths[incomingEdge.Parent] +
                                 incomingEdge.Weight;
            if (!reachable || currPathLength > longestPathLength)
            {
                longestPathLength = currPathLength;
                reachable = true;
            }
        }
        if (reachable)
        {
            longestPathLengths.Add(node, longestPathLength);
        }
        if (node == end)
        {
            break;
        }
    }
    // No path from start to end
    if (!longestPathLengths.ContainsKey(end))
    {
        return null;
    }
    // Reconstruct path. Go backwards until we hit start
    var curr = end;
    var longestPath = new List<Node> { end };
    while (curr != start)
    {
        foreach (var incomingEdge in curr.IncomingEdges)
        {
            if (longestPathLengths.ContainsKey(incomingEdge.Parent) &&
                longestPathLengths[incomingEdge.Parent] ==
                longestPathLengths[curr] - incomingEdge.Weight)
            {
                longestPath.Insert(0, incomingEdge.Parent);
                curr = incomingEdge.Parent;
                break;
            }
        }
    }
    return longestPath;
}
Note that this implementation is not particularly efficient, but hopefully it's clear! You can optimize in a lot of small ways that should be obvious as you think through the code/implementation. Generally, if you store more stuff in memory, it'll run faster. The way you structure your Graph is also critical. For instance, it didn't seem like you had an IncomingEdges property for your nodes. But without that, finding the incoming edges for each node is a pain (and is not performant!). In my opinion, graph algorithms are conceptually different from, say, algorithms on strings and arrays because the implementation matters so much! If you read the wiki entries on graph algorithms you'll find they often give three or four different runtimes based on different implementations (with different data structures). Keep this in mind if you care about speed

Assuming your graph has no cycles (otherwise the longest path becomes a vague concept), you can indeed use a topological sort. Walk the nodes in topological order and, for each node, compute its longest distance from a source node by looking at all of its predecessors and adding the weight of the connecting edge to each predecessor's distance. Then choose the predecessor that gives you the longest distance for this node. The topological sort guarantees that all of a node's predecessors already have their distances correctly determined.
If, in addition to the length of the longest path, you also want the path itself, start at the node that gave the longest length and look at all its predecessors to find the one that resulted in this length. Then repeat this process until you have found a source node of the graph.
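As a rough C sketch of the steps described above (not the asker's actual code): it assumes a hypothetical Arc node type for the adjacency list and a topo[] array that already holds all nodes in topological order. The names longest_path_dag and build_path are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical adjacency-list node: one outgoing edge. */
typedef struct Arc {
    int to;            /* destination node */
    int cost;          /* edge weight */
    struct Arc *next;  /* next outgoing edge of the same source */
} Arc;

/*
 * topo[] must list all n nodes in topological order.
 * Fills dist[v] with the length of the longest path ending at v and
 * parent[v] with the predecessor on that path (-1 if the path starts at v).
 * Returns the node at which the overall longest path ends (-1 if n == 0).
 */
int longest_path_dag(int n, Arc **adj, const int *topo, int *dist, int *parent)
{
    int best = -1;

    for (int v = 0; v < n; v++) {
        dist[v] = 0;
        parent[v] = -1;
    }
    for (int i = 0; i < n; i++) {
        int u = topo[i];  /* every predecessor of u has been processed already */
        if (best < 0 || dist[u] > dist[best])
            best = u;
        for (const Arc *a = adj[u]; a != NULL; a = a->next) {
            if (dist[u] + a->cost > dist[a->to]) {
                dist[a->to] = dist[u] + a->cost;
                parent[a->to] = u;
            }
        }
    }
    return best;
}

/* Writes the path ending at `end` into out[] in start-to-end order
 * and returns the number of nodes on it. */
int build_path(const int *parent, int end, int *out)
{
    int len = 0;

    for (int v = end; v != -1; v = parent[v])
        out[len++] = v;
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        int t = out[i];
        out[i] = out[j];
        out[j] = t;
    }
    return len;
}
```

With the graph from the question and the order 10, 4, 5, 6, 8, 7, 0, 9, 1, 2, 3, this should report a longest path of length 31 ending at node 3 (for example 10 -> 4 -> 7 -> 1 -> 3). Relaxing outgoing edges is used here because the question's lists store outgoing edges; it is equivalent to taking the maximum over incoming edges.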

Related

Given a DAG, the length of the longest path and the node in which it ends, how do I retrace my steps so I can print each node of the longest path?

I'm working on a problem of finding the most parallelepipeds that can be stored into each other given a list of parallelepipeds.
My approach was to represent the graph with an adjacency list, do a topological sort and then for each node in the topological array "unrelax" the edges, giving me the longest path.
Below is the code but I don't think it matters for the question.
typedef struct Edge {
int src; /* source node */
int dst; /* destination node */
struct Edge *next;
} Edge;
int maxend; //node in which the longest path ends
int mp; // longest path
for (int i = 0; i < G.n; i++)
{
int j = TA[i]; //TA is the topological sorted array
if (G.edges[j] != NULL)
{
if(DTA[j] == -1) DTA[j] = 0;
Edge* tmp = G.edges[j];
while (tmp != NULL)
{
if(DTA[tmp->src] >= DTA[tmp->dst]){ //DTA is the array that keeps track of the maximum distance of each node in TA
DTA[tmp->dst] = DTA[tmp->src]+1;
if (DTA[tmp->dst] > mp) {
mp = DTA[tmp->dst];
maxend = tmp->dst;
}
}
tmp = tmp->next;
}
}
}
In the end I have the length of the longest path and the node in which said path ends, but how do I efficiently recreate the path?
If parallelepiped A contains parallelepiped B and parallelepiped B contains parallelepiped C, then parallelepiped A contains parallelepiped C as well. This means that each edge has a weight of 1 and that the vertex where the longest path starts has the furthest node of the path in its adjacency list.
I've thought of 3 solutions but none of them look great.
1. Iterate the edges of each vertex that has weight 0 (so no predecessors) and, if there is a choice, avoid choosing the edge that connects it with the furthest node (as said before, the shortest path between the starting node and the ending node will be 1).
2. In the array that tracks the maximum distance of each node in the topologically sorted array: start from the index representing the furthest node we found and see if the previous node has a compatible distance (as in, the previous node has 1 less distance than the furthest node). If it does, check its adjacency list to see if the furthest node is in it (because if the furthest node has a distance of 10 there could be several nodes that have a distance of 9 but are unconnected to it). Repeat until we reach the root of the path.
3. Most probable candidate so far: create an array of pointers that keeps track of the "maximum" parent of each node. In the code above, every time a node has its maximum distance changed it means that its parent node, if it had any, had a longer distance than the previous parent, which means we can change the maximum parent associated with the current node.
Edit: I ended up just allocating a new array, and every time I updated the distance of a node (DTA[tmp->src] >= DTA[tmp->dst]) I also stored the number of the source node in the cell of the destination node.
I am assuming the graph edge u <- v indicates that box u is big enough to contain v.
I suggest you dump the topological sort. Instead:
SET weight of every edge to -1
LOOP
LOOP over leaf nodes ( out degree zero, box too small to contain others )
Run Dijkstra algorithm ( gives longest path, with predecessors )
Save length of longest path, and path itself
SAVE longest path
REMOVE nodes on longest path from graph
IF all nodes gone from graph
OUTPUT saved longest paths ( lists of nested boxes )
STOP
This is called a "greedy" algorithm. It is not guaranteed to give the optimal result. But it is fast and simple, always gives a reasonable result and often does give the optimal.
I think this solves it, unless there's something I don't understand.
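The highest-weighted path in a DAG is equivalent to the lowest-weighted path if you make the edge weights negative. Note that Dijkstra's algorithm assumes non-negative weights, so on the negated graph you need a shortest-path method that tolerates negative weights, such as relaxing the edges in topological order (or Bellman-Ford).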
A longest path between two given vertices s and t in a weighted graph
G is the same thing as a shortest path in a graph −G derived from G by
changing every weight to its negation.
This might even be a special case of Dijkstra that is simpler... not sure.
To retrieve the longest path, you start at the end and go backwards:
Start at the vertex with the greatest DTA V_max
Find the edges that end at V_max (edge->dest = V_max)
Find an edge Src_max where the DTA value is 1 less than the max (DTA[Src_max] == DTA[V_max] - 1)
Repeat this recursively until there are no more source vertices
To make that a little more efficient, you can reverse the endpoints of the edges on the way down and then follow the path back to the start. That way each reverse step is O(1).
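The backward walk described above could look roughly like this in C. This is only a sketch: it reuses the Edge struct from the question, assumes edges[u] is the list of edges leaving node u and that DTA[] already holds the final distances, and retrace_path is a made-up name. Because the lists are indexed by source node, each step scans the candidate predecessors, which is the cost the reversed edges would avoid.

```c
#include <stddef.h>

typedef struct Edge {
    int src;            /* source node */
    int dst;            /* destination node */
    struct Edge *next;
} Edge;

/*
 * Rebuilds the longest path that ends at maxend by walking backwards.
 * path[] must have room for DTA[maxend] + 1 entries; it is filled in
 * start-to-end order. Returns the number of nodes on the path, or -1
 * if no matching predecessor is found (DTA inconsistent with the graph).
 */
int retrace_path(int n, Edge **edges, const int *DTA, int maxend, int *path)
{
    int len = DTA[maxend] + 1;   /* unit weights: nodes = edges + 1 */
    int k = len - 1;
    int cur = maxend;

    path[k] = cur;
    while (k > 0) {
        int found = -1;
        for (int u = 0; u < n && found < 0; u++) {
            if (DTA[u] != DTA[cur] - 1)
                continue;        /* distance not compatible */
            for (const Edge *e = edges[u]; e != NULL; e = e->next) {
                if (e->dst == cur) {
                    found = u;   /* u is one step before cur */
                    break;
                }
            }
        }
        if (found < 0)
            return -1;
        path[--k] = found;
        cur = found;
    }
    return len;
}
```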
I think option 3 is the most promising. You can search for the longest path with DFS starting from all the root vertices (those without incoming edges), increasing the 'max distance' of each vertex encountered.
This is quite a simple solution, but it may traverse some paths more than once. For example, for edges (a,f), (b,c), (c,f), (d,e), (e,c)
a------------f
            /
   b----c--/
       /
d--e--/
(all directed rightwards)
the starting vertices are a, b, and d, the edge (c,f) will be traversed twice and the vertex f distance will be updated three times. If we append the rest of alphabet to f in a simple chain,
a------------f-----g-- - - ---y---z
            /
   b----c--/
       /
d--e--/
the whole chain from f to z will be probably traversed three times, too.
You can avoid this by separating the phases and modifying the graph between them: after finding all the starting vertices (a, b, d) increment the distance of each vertex available from those (f, c, e), then remove starting vertices and their edges from the graph - and re-iterate as long as some edges remain.
This will transform the example graph after the first step like this:
             f-----g-- - - ---y---z
            /
        c--/
       /
   e--/
and we can see all the junction vertices (c and f) will wait until the longest path to them is found before letting the analysis go further past them.
That needs iterative seeking for starting vertices, which may be time consuming unless you do some preprocessing (for example, counting all incoming edges for each vertex and storing vertices in some sorting data structure, like an integer-indexed multimap or a simple min-heap.)
The question remains open whether the overhead of truncating the graph and rescanning it for new root vertices gives a net gain over traversing the final parts of common paths multiple times in your particular graph...
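One way to sketch the phased approach in C is to count incoming edges up front and keep the current "root" vertices in a plain array used as a queue, so the graph itself never has to be modified. This is an illustration under assumptions: unit edge weights, the Edge struct from the question with edges[u] listing the edges that leave u, and a hypothetical function name.

```c
#include <stdlib.h>

typedef struct Edge {
    int src;            /* source node */
    int dst;            /* destination node */
    struct Edge *next;
} Edge;

/*
 * Fills dist[v] with the number of edges on the longest path ending at v.
 * Returns the length of the longest path in the graph, or -1 if the graph
 * has a cycle or memory could not be allocated.
 */
int longest_by_peeling(int n, Edge **edges, int *dist)
{
    int *indeg = calloc((size_t)n, sizeof *indeg);
    int *queue = malloc((size_t)n * sizeof *queue);
    int head = 0, tail = 0, longest = 0;

    if (indeg == NULL || queue == NULL) {
        free(indeg);
        free(queue);
        return -1;
    }
    for (int u = 0; u < n; u++)
        for (const Edge *e = edges[u]; e != NULL; e = e->next)
            indeg[e->dst]++;
    for (int v = 0; v < n; v++) {
        dist[v] = 0;
        if (indeg[v] == 0)
            queue[tail++] = v;          /* initial starting vertices */
    }
    while (head < tail) {
        int u = queue[head++];          /* "remove" u from the graph */
        for (const Edge *e = edges[u]; e != NULL; e = e->next) {
            if (dist[u] + 1 > dist[e->dst])
                dist[e->dst] = dist[u] + 1;
            if (dist[e->dst] > longest)
                longest = dist[e->dst];
            /* a junction is released only after its last incoming edge */
            if (--indeg[e->dst] == 0)
                queue[tail++] = e->dst;
        }
    }
    if (tail != n)
        longest = -1;                   /* leftover vertices: cycle */
    free(indeg);
    free(queue);
    return longest;
}
```

Each edge is looked at a constant number of times, so a shared tail such as f ... z in the example is walked only once.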

BFS for m-ary tree - C

I'm taking a course in C this term, and I have got an assignment which deals mainly with pointers: building an m-ary tree.
Some description:
We receive at the command line arguments: the file name of a text and two numbers which represent the keys of some two vertices in the graph (I will explain later what we have to do with these two vertices).
The first line of the text is actually the total number of the vertices of the graph, the next line could for example include numbers like "2 5" which implies that vertices 2 and 5 are children of vertex with key 0, the next line may include "6 0" which says that vertex with key of 1 is the father of 6 and 0 vertices, and so on...
If some line contains only '-' then it's a leaf.
This part actually deals with parsing and defining the suitable structure for vertex, and I have already done that (but I have to take care of corner cases later on...).
Now, my problem begins - we have to find the number of edges in the tree in Big O of 1 time complexity; find the root in Big O of n (where n is the number of vertices) time complexity; find the simple shortest path between the two vertices (I think it can also be done with BFS) we received at the command line in Big O of n squared time complexity; find the minimal and maximum heights of the tree; find the diameter of the tree in Big O of n squared time complexity.
To implement it, we have to use BFS and we can use their implementation of queue.
Here is my vertex struct:
typedef struct Vertex {
size_t key;
unsigned int amountOfNeighbors; // The current amount of neighbors
unsigned int capacity; // The capacity of the neighbors (It's updating during run-time)
struct Vertex* parent;
struct Vertex** neighbors; // The possible parent and children of a vertex
} Vertex;
I have gone through the pseudo-code of BFS and it uses the idea of the next and previous vertices of a vertex. That concept is not used in my implementation and I really don't know how to integrate it with my code properly...
Secondly, I have no idea how I can calculate the number of edges in the tree in O(1) - it seems impossible - it requires me to go through all the vertices at least once which is O(n)...
So I actually need help to adjust the BFS algorithm to my needs, and find a way to calculate the number of edges in constant time complexity.
Thanks in advance!
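A possible starting point, offered only as a sketch: a tree with n vertices always has n - 1 edges, so the edge count needs no traversal once n has been read from the first line. BFS can run directly on the neighbors array from the struct above. The version below assumes keys run from 0 to n - 1, uses a plain array instead of the course's queue implementation, and the names edge_count and bfs are invented for illustration.

```c
#include <stdlib.h>

typedef struct Vertex {
    size_t key;
    unsigned int amountOfNeighbors;
    unsigned int capacity;
    struct Vertex *parent;
    struct Vertex **neighbors;
} Vertex;

/* A tree with n vertices has exactly n - 1 edges: O(1). */
size_t edge_count(size_t n)
{
    return n == 0 ? 0 : n - 1;
}

/*
 * BFS from src over the neighbors arrays.
 * dist[k] becomes the number of edges from src to the vertex with key k
 * (-1 if unreachable); prev[k] becomes the key of the vertex from which
 * k was first reached (-1 for src). Both arrays need n entries.
 * Returns 0 on success, -1 if the queue could not be allocated.
 */
int bfs(Vertex *src, size_t n, int *dist, int *prev)
{
    Vertex **queue = malloc(n * sizeof *queue);
    size_t head = 0, tail = 0;

    if (queue == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        dist[i] = -1;
        prev[i] = -1;
    }
    dist[src->key] = 0;
    queue[tail++] = src;
    while (head < tail) {
        Vertex *v = queue[head++];
        for (unsigned int i = 0; i < v->amountOfNeighbors; i++) {
            Vertex *w = v->neighbors[i];
            if (dist[w->key] == -1) {      /* not visited yet */
                dist[w->key] = dist[v->key] + 1;
                prev[w->key] = (int)v->key;
                queue[tail++] = w;
            }
        }
    }
    free(queue);
    return 0;
}
```

Following prev[] back from the second vertex gives the path between the two command-line vertices, and the largest value in dist[] after a BFS from the root is the maximum height.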

Optimizing a method to find the most traversed edge given an adjacency graph and several traversals

I am given N vertices of a tree and its corresponding adjacency graph represented as an N by N array, adjGraph[N][N]. For example, if (1,3) is an edge, then adjGraph[0][2] == 1. Otherwise, adjGraph[i][j] == 0 for (i,j)s that are not edges.
I'm given a series of inputs in the form of:
1 5
which denotes that a path has been traversed starting from vertex 1 to vertex 5. I wish to find the edge that was traversed the most times, along with the number of times it was traversed. To do this, I have another N by N array, numPass[N][N], whose elements I first initialize to 0, then increment by 1 every time I identify a path that includes an edge that matches its index. For example, if path (2,4) included edges (2,3) and (3,4), I would increment numPass[1][2] and numPass[2][3] by 1 each.
As I understand it, the main issue to tackle is that the inputs only give information of the starting vertex and ending vertex, and it is up to me to figure out which edges connect the two. Since the given graph is a tree, any path between two vertices is unique. Therefore, I assumed that given the index of the ending vertex for any input path, I would be able to recursively backtrack which edges were connected.
The following is the function code that I have tried to implement with that idea in mind:
// find the (unique) path of edges from vertices x to y
// and increment edges crossed during such a path
void findPath(int x, int y, int N, int adjGraph[][N], int numPass[][N]) {
int temp;
// if the path is a single edge, case is trivial
if (adjGraph[x][y] == 1) {
numPass[x][y] += 1;
return;
}
// otherwise, find path by backtracking from y
backtrack: while (1) {
temp = y-1;
if (adjGraph[temp][y] == 1) {
numPass[temp][y] += 1;
break;
}
}
if (adjGraph[x][temp] == 1) {
numPass[x][temp] += 1;
return;
} else {
y = temp;
goto backtrack;
}
However, the problem is that while my code works fine for small inputs, it runs out of memory for large inputs, since I have a required memory limit of 128MB and time limit of 1 second. The ranges for the inputs are up to 222222 vertices, and 222222 input paths.
How could I optimize my method to satisfy such large inputs?
Get rid of the adjacency matrix (it uses O(N^2) space). Use adjacency lists instead.
Use a more efficient algorithm. Let's make the tree rooted. For a path from a to b we can add 1 to a and to b and subtract 2 from their lca. Summing these values over the subtree of a vertex v then counts exactly the paths that cross the edge between v and its parent: a path contributes 1 if exactly one of its endpoints lies in the subtree and 0 otherwise.
After processing all paths, the number of paths going through an edge is just this sum in the subtree below it.
If we use an efficient algorithm to compute the lca, this solution works in O(N + Q * log N), where Q is the number of paths. That looks good enough for these constraints (we can actually do even better by using more complex and more efficient algorithms for finding the lca, but I don't think it's necessary here).
Note: lca means lowest common ancestor.
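Here is a small C sketch of that counting idea. To stay short it makes two simplifying assumptions that are not part of the answer: vertices are 0-indexed with vertex 0 as the root and parent[v] < v for every other vertex (a BFS or DFS from the root can produce such a numbering), and the lca is found by naive parent climbing, which is slower than the O(log N) methods the answer has in mind. The function name is hypothetical.

```c
#include <stdlib.h>

/*
 * parent[0] must be -1 and parent[v] < v for v > 0 (assumption).
 * paths[i][0] and paths[i][1] are the endpoints of the i-th path.
 * Returns the vertex v whose edge (parent[v], v) is crossed most often and
 * stores that count in *best_count; returns -1 if no edge is crossed or
 * memory could not be allocated.
 */
int most_traversed_edge(int n, const int *parent, int q,
                        const int (*paths)[2], int *best_count)
{
    int *depth = calloc((size_t)n, sizeof *depth);
    int *cnt = calloc((size_t)n, sizeof *cnt);
    int best_v = -1;

    *best_count = 0;
    if (depth == NULL || cnt == NULL) {
        free(depth);
        free(cnt);
        return -1;
    }
    for (int v = 1; v < n; v++)
        depth[v] = depth[parent[v]] + 1;

    for (int i = 0; i < q; i++) {
        int a = paths[i][0], b = paths[i][1];
        cnt[a]++;
        cnt[b]++;
        /* naive lca: lift the deeper vertex, then climb together */
        while (depth[a] > depth[b]) a = parent[a];
        while (depth[b] > depth[a]) b = parent[b];
        while (a != b) {
            a = parent[a];
            b = parent[b];
        }
        cnt[a] -= 2;
    }

    /* Children have larger indices, so going downwards in index order
     * means cnt[v] already holds the whole subtree sum when v is visited. */
    for (int v = n - 1; v >= 1; v--) {
        if (cnt[v] > *best_count) {
            *best_count = cnt[v];
            best_v = v;
        }
        cnt[parent[v]] += cnt[v];
    }
    free(depth);
    free(cnt);
    return best_v;
}
```

Memory use is a few arrays of N ints instead of two N by N matrices. Swapping the climbing loops for binary lifting or another fast lca method would bring the per-path cost down to the bound quoted above.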

a* algorithm pseudocode

I am trying to implement in c the pseudocode of a* algorithm given by wikipedia but I am really stuck in understanding what is the reconstruct_path function, can someone explain to me what do the variables in this function (p, p+current_node, set) represent?
function A*(start,goal)
    closedset := the empty set    // The set of nodes already evaluated.
    openset := {start}    // The set of tentative nodes to be evaluated, initially containing the start node
    came_from := the empty map    // The map of navigated nodes.
    g_score[start] := 0    // Cost from start along best known path.
    // Estimated total cost from start to goal through y.
    f_score[start] := g_score[start] + heuristic_cost_estimate(start, goal)
    while openset is not empty
        current := the node in openset having the lowest f_score[] value
        if current = goal
            return reconstruct_path(came_from, goal)
        remove current from openset
        add current to closedset
        for each neighbor in neighbor_nodes(current)
            tentative_g_score := g_score[current] + dist_between(current,neighbor)
            if neighbor in closedset
                if tentative_g_score >= g_score[neighbor]
                    continue
            if neighbor not in openset or tentative_g_score < g_score[neighbor]
                came_from[neighbor] := current
                g_score[neighbor] := tentative_g_score
                f_score[neighbor] := g_score[neighbor] + heuristic_cost_estimate(neighbor, goal)
                if neighbor not in openset
                    add neighbor to openset
    return failure
function reconstruct_path(came_from, current_node)
    if came_from[current_node] in set
        p := reconstruct_path(came_from, came_from[current_node])
        return (p + current_node)
    else
        return current_node
Thank you
came_from is a map of navigated nodes, like the comment says. It can be implemented in several ways, but a classic map should be fine for this purpose(even a list is fine).
If you are not familiar with maps, checkout std::map.
The goal of A* is to find a list of moves, that will solve the given problem (represented as a graph). A solution is a path through the graph.
In the pseudocode proposed, came_from store the "history" of the solution you are actually evaluating (so a possible path through the graph).
When you explore a node (a new node or one with less cost in the already visited list):
if neighbor not in openset or tentative_g_score < g_score[neighbor]
came_from[neighbor] := current
you are saving in the came_from map the node you came from. (It's simpler to think of it as the ordered list of moves until the solution node is reached. A map is used instead of a list for performance reasons.)
The line above basically means:
"Now I'll visit neighbor node. Remember that I reached neighbor node
coming from current node".
When the goal node is reached, A* needs to return the list of moves from the start node to the goal. You have a reference to the goal node, so you can now reconstruct the list of moves to reach it from the start node (reconstruct_path), because you stored the moves in the came_from map.
You have a set of nodes, and each node in your path can "point" to its predecessor (the node you came from to reach this node). This is what the came_from map stores.
You want your a* function to return a list* of nodes in the path.
Now, back to return (p + current_node) - this code basically means return a list which contains all elements from p with current_node at the end. So it's p with 1 element added to the end of p.
You can see, that because this function is recursive, at the beginning it will contain a single element - first in your path, which will be a start. You will then add new elements to it, ending with goal element at the end.
You could also look at it this way: your algorithm allowed you to find a path from goal to start (you just need to follow the came_from of your nodes). This function lets you traverse the path from start to goal thanks to recursion, so you should end up with a list of some sort containing your path in the correct order.
* by list I mean some structure that represent a sequence of elements, not a set.
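In C, one simple way to picture this is to number the nodes 0..n-1 and store came_from as an int array, with -1 meaning "no predecessor" (this plays the role of the "in set" test in the pseudocode). The sketch below is an iterative equivalent of the recursive function, under those assumptions: it follows the predecessors from the goal and then reverses the result so the path reads start to goal.

```c
#include <stddef.h>

/*
 * came_from[v] is the node from which v was reached, or -1 if v has no
 * predecessor (the start node). Writes the path into path[] in
 * start-to-goal order and returns its length, or 0 if path[] (which has
 * room for `capacity` nodes) is too small.
 */
size_t reconstruct_path(const int *came_from, int goal,
                        int *path, size_t capacity)
{
    size_t len = 0;

    /* Collect goal, its predecessor, that node's predecessor, ... */
    for (int v = goal; v != -1; v = came_from[v]) {
        if (len == capacity)
            return 0;
        path[len++] = v;
    }
    /* The nodes were collected goal-first, so reverse them in place. */
    for (size_t i = 0, j = len - 1; i < j; i++, j--) {
        int tmp = path[i];
        path[i] = path[j];
        path[j] = tmp;
    }
    return len;
}
```

The recursion in the pseudocode achieves the same ordering implicitly: the deepest call returns the start node first, and each caller appends its own node on the way back up.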

compare nodes of a binary tree

If I have two binary trees, how would I check if the elements in all the nodes are equal.
Any ideas on how to solve this problem?
You would do a parallel tree traversal - choose your order (pre-order, post-order, in-order). If at any time the values stored in the current nodes differ, so do the two trees. If one left node is null and the other isn't, the trees are different; ditto for right nodes.
Does node order matter? I'm assuming for this answer that the two following trees:
  1       1
 / \     / \
3   2   2   3
are not equal, because node position and order is taken into account for the comparison.
A few hints
Do you agree that two empty trees are equal?
Do you agree that two trees that only have a root node, with identical node values, are equal?
Can't you generalize this approach?
Being a bit more precise
Consider this generic tree:
        rootnode(value=V)
           /     \
          /       \
  --------         -------
 |  left  |       | right |
 | subtree|       |subtree|
  --------         -------
rootnode is a single node. The two children are more generic, and represent binary trees. The children can either be empty, or a single node, or a fully-grown binary tree.
Do you agree that this representation is generic enough to represent any kind of non-empty binary tree? Are you able to decompose, say, this simple tree into my representation?
If you understand this concept, then this decomposition can help you to solve the problem. If you do understand the concept, but can't go any further with the algorithm, please comment here and I'll be a bit more specific :)
you could use something like Tree Traversal to check each value.
If the trees are binary search trees, so that a pre-order walk will produce a reliable, repeatable ordering of items, the existing answers will work. If they're arbitrary binary trees, you have a much more interesting problem, and should look into hash tables.
My solution would be to flatten the two trees into 2 arrays (using level order), and then iterate through each item and compare. You know both arrays are the same order. You can do simple pre-checks such as if the array sizes differ then the two trees aren't the same.
Level Order is fairly easy to implement, the Wikipedia article on tree traversal basically gives you everything you need, including code. If efficiency is being asked for in the question, then a non-recursive solution is best, and done using a FIFO list (a Queue in C# parlance - I'm not a C programmer).
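A non-recursive variant along these lines might look like the following C sketch. Instead of flattening each tree into its own array, it walks both trees level by level at the same time and queues pairs of nodes, including NULL children, so that a difference in shape is detected as well as a difference in values. The node layout and the function name are assumptions made for the example.

```c
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *left, *right;
} node;

typedef struct {
    const node *a, *b;
} Pair;

/* Returns 1 if both trees have the same shape and values, 0 if they
 * differ, and -1 if memory for the queue could not be allocated. */
int is_same_level_order(const node *a, const node *b)
{
    size_t cap = 16, head = 0, tail = 0;
    Pair *q = malloc(cap * sizeof *q);
    int same = 1;

    if (q == NULL)
        return -1;
    q[tail++] = (Pair){a, b};
    while (head < tail) {
        Pair p = q[head++];
        if (p.a == NULL && p.b == NULL)
            continue;                 /* both subtrees are empty here */
        if (p.a == NULL || p.b == NULL || p.a->data != p.b->data) {
            same = 0;                 /* shape or value mismatch */
            break;
        }
        if (tail + 2 > cap) {         /* grow the queue when it is full */
            Pair *bigger = realloc(q, 2 * cap * sizeof *q);
            if (bigger == NULL) {
                same = -1;
                break;
            }
            q = bigger;
            cap *= 2;
        }
        q[tail++] = (Pair){p.a->left, p.b->left};
        q[tail++] = (Pair){p.a->right, p.b->right};
    }
    free(q);
    return same;
}
```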
Let the two trees pass through the same tree traversal logic and match the outputs. If even a single node's data does not match, the trees don't match.
Or you could just create a simple tree traversal logic and compare the node values at each recursion.
You can use recursion to check whether the nodes are equal and then check the subtrees. The code can be written as follows in Java.
public boolean sameTree(Node root1, Node root2) {
    // base case: both are empty
    if (root1 == null && root2 == null)
        return true;
    // exactly one is empty: the shapes differ
    if (root1 == null || root2 == null)
        return false;
    // assumes Node.equals() compares the values stored in the two nodes
    if (root1.equals(root2)) {
        // subtrees
        boolean left = sameTree(root1.left, root2.left);
        boolean right = sameTree(root1.right, root2.right);
        return (left && right);
    } else {
        return false;
    }
} // end sameTree()
Here is a C version, since the question is tagged C.
int is_same(node* T1, node* T2)
{
    /* both empty: equal */
    if (!T1 && !T2)
        return 1;
    /* only one empty: different structure */
    if (!T1 || !T2)
        return 0;
    if (T1->data == T2->data)
    {
        int left = is_same(T1->left, T2->left);
        int right = is_same(T1->right, T2->right);
        return (left && right);
    }
    else
        return 0;
}
Takes care of structure as well as values.
One line of code is enough to check whether two binary trees are equal (same values and same structure).
bool isEqual(BinaryTreeNode *a, BinaryTreeNode *b)
{
return (a && b) ? (a->m_nValue==b->m_nValue && isEqual(a->m_pLeft,b->m_pLeft) && isEqual(a->m_pRight,b->m_pRight)) : (a == b);
}
If your values are ints in a known range (say, with maximum value n), you can use an array. Traverse the first tree using whatever method you want, marking each node's value in the array at the matching index (using the node data as the index). Then traverse the second tree and check, for every node in it, whether array[node.data] is marked. If every node is marked and both trees have the same number of nodes, the trees hold the same values. Note that this compares only the values, not the structure of the trees.
**assuming for each tree all nodes are unique

Resources