BFS for m-ary tree - C

I'm taking a course in C this term, and I've got an assignment that deals with pointers - building an m-ary tree.
Some description:
We receive command-line arguments: the name of a text file and two numbers which are the keys of two vertices in the graph (I'll explain later what we have to do with these two vertices).
The first line of the file is the total number of vertices in the graph. The next line could, for example, contain "2 5", which means vertices 2 and 5 are children of the vertex with key 0; the line after that might contain "6 0", which says the vertex with key 1 is the parent of vertices 6 and 0, and so on...
If a line contains only '-', that vertex is a leaf.
This part actually deals with parsing and defining a suitable structure for a vertex, and I have already done that (but I have to take care of corner cases later on...).
Now my problem begins. We have to:
- find the number of edges in the tree in O(1) time;
- find the root in O(n) time, where n is the number of vertices;
- find the simple shortest path between the two vertices given on the command line in O(n^2) time (I think this can also be done with BFS);
- find the minimal and maximal heights of the tree;
- find the diameter of the tree in O(n^2) time.
To implement this we have to use BFS, and we may use their queue implementation.
Here is my vertex struct:
typedef struct Vertex {
    size_t key;
    unsigned int amountOfNeighbors; // The current amount of neighbors
    unsigned int capacity;          // The capacity of the neighbors (it's updated during run-time)
    struct Vertex* parent;
    struct Vertex** neighbors;      // The possible parent and children of a vertex
} Vertex;
I have gone through the pseudo-code of BFS and it uses the idea of the next and previous vertices of a vertex - a concept that isn't used in my implementation, and I really don't know how to integrate it with my code properly...
Secondly, I have no idea how I can calculate the number of edges in the tree in O(1) - it seems impossible, since it requires going through all the vertices at least once, which is O(n)...
So I actually need help adjusting the BFS algorithm to my needs, and finding a way to calculate the number of edges in constant time.
Thanks in advance!
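For concreteness, a minimal BFS over this Vertex struct could look roughly like the sketch below; the array-based queue and the visited array indexed by key are illustrative choices, not the course's queue API, and error handling is omitted.
#include <stdbool.h>
#include <stdlib.h>

/* Breadth-first search from start; dist[v->key] receives the number of
   edges between start and v. It walks the undirected neighbor lists, so
   the parent is visited like any other neighbor. */
void bfs(Vertex *start, size_t numVertices, unsigned int *dist)
{
    bool *visited = calloc(numVertices, sizeof(bool));
    Vertex **queue = malloc(numVertices * sizeof(Vertex *));
    size_t head = 0, tail = 0;

    queue[tail++] = start;                     /* enqueue the start vertex */
    visited[start->key] = true;
    dist[start->key] = 0;

    while (head < tail) {
        Vertex *u = queue[head++];             /* dequeue */
        for (unsigned int i = 0; i < u->amountOfNeighbors; ++i) {
            Vertex *v = u->neighbors[i];
            if (!visited[v->key]) {
                visited[v->key] = true;
                dist[v->key] = dist[u->key] + 1;
                queue[tail++] = v;             /* enqueue */
            }
        }
    }
    free(queue);
    free(visited);
}
Run from one of the two command-line vertices, dist[otherKey] gives the length of the simple path between them (paths in a tree are unique).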

Related

Given a DAG, the length of the longest path and the node in which it ends, how do I retrace my steps so I can print each node of the longest path?

I'm working on the problem of finding the longest chain of parallelepipeds that can be nested one inside another, given a list of parallelepipeds.
My approach was to represent the graph with an adjacency list, do a topological sort and then for each node in the topological array "unrelax" the edges, giving me the longest path.
Below is the code but I don't think it matters for the question.
typedef struct Edge {
    int src; /* source node */
    int dst; /* destination node */
    struct Edge *next;
} Edge;
int maxend; // node in which the longest path ends
int mp;     // length of the longest path
for (int i = 0; i < G.n; i++)
{
    int j = TA[i]; // TA is the topologically sorted array
    if (G.edges[j] != NULL)
    {
        if (DTA[j] == -1) DTA[j] = 0;
        Edge* tmp = G.edges[j];
        while (tmp != NULL)
        {
            if (DTA[tmp->src] >= DTA[tmp->dst]) { // DTA keeps track of the maximum distance of each node in TA
                DTA[tmp->dst] = DTA[tmp->src] + 1;
                if (DTA[tmp->dst] > mp) {
                    mp = DTA[tmp->dst];
                    maxend = tmp->dst;
                }
            }
            tmp = tmp->next;
        }
    }
}
In the end I have the length of the longest path and the node in which that path ends, but how do I efficiently reconstruct the path?
If parallelepiped A contains parallelepiped B and parallelepiped B contains parallelepiped C, then parallelepiped A contains parallelepiped C as well, which means each edge has a weight of 1 and the vertex where the longest path starts has the furthest node of the path in its adjacency list.
I've thought of 3 solutions but none of them look great.
1. Iterate over the edges of each vertex that has weight 0 (i.e. no predecessors) and, if there is a choice, avoid choosing the edge that connects it directly to the furthest node (as said before, the shortest path between the starting node and the ending node has length 1).
2. In the array that tracks the maximum distance of each node in the topologically sorted array: start from the index of the furthest node we found and see whether a previous node has a compatible distance (that is, exactly 1 less than the furthest node). If it does, check its adjacency list to see whether the furthest node is in it (because if the furthest node has a distance of 10, there could be several nodes with a distance of 9 that are not connected to it). Repeat until we reach the root of the path.
3. Most promising candidate so far: create an array of pointers that keeps track of the "maximum" parent of each node. In the code above, every time a node has its maximum distance changed, the new parent node had a longer distance than the previous parent, so we can update the maximum parent associated with the current node.
Edit: I ended up just allocating a new array, and every time I updated the distance of a node (DTA[tmp->src] >= DTA[tmp->dst]) I also stored the source vertex in the cell of the destination vertex.
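In code, that change could look roughly like this, reusing G, TA, DTA, Edge, mp and maxend from the snippet above and adding a pred array (includes and error handling omitted; the pred array is an illustrative addition, not part of the original code):
int *pred = malloc(G.n * sizeof *pred);   /* pred[v] = vertex we came from on the best path to v */
for (int v = 0; v < G.n; v++) pred[v] = -1;

mp = 0;
maxend = -1;
for (int i = 0; i < G.n; i++) {
    int j = TA[i];
    if (G.edges[j] != NULL && DTA[j] == -1)
        DTA[j] = 0;
    for (Edge *tmp = G.edges[j]; tmp != NULL; tmp = tmp->next) {
        if (DTA[tmp->src] >= DTA[tmp->dst]) {
            DTA[tmp->dst] = DTA[tmp->src] + 1;
            pred[tmp->dst] = tmp->src;        /* remember where the longer path came from */
            if (DTA[tmp->dst] > mp) {
                mp = DTA[tmp->dst];
                maxend = tmp->dst;
            }
        }
    }
}

/* Walk the predecessor chain back from maxend; this prints the path reversed. */
for (int v = maxend; v != -1; v = pred[v])
    printf("%d ", v);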
I am assuming the graph edge u <- v indicates that box u is big enough to contain v.
I suggest you dump the topological sort. Instead:
SET weight of every edge to -1
LOOP
    LOOP over leaf nodes ( out degree zero, box too small to contain others )
        Run Dijkstra algorithm ( gives longest path, with predecessors )
        Save length of longest path, and path itself
    SAVE longest path
    REMOVE nodes on longest path from graph
    IF all nodes gone from graph
        OUTPUT saved longest paths ( lists of nested boxes )
        STOP
This is called a "greedy" algorithm. It is not guaranteed to give the optimal result, but it is fast and simple, always gives a reasonable result, and often does give the optimal one.
I think this solves it, unless there's something I don't understand.
The highest-weighted path in a DAG is equivalent to the lowest-weighted path if you make the edge weights negative. Then you can apply Dijkstra's algorithm directly.
A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph -G derived from G by changing every weight to its negation.
This might even be a special case of Dijkstra that is simpler... not sure.
To retrieve the longest path, you start at the end and go backwards:
- Start at the vertex V_max with the greatest DTA value
- Find the edges that end at V_max (edge->dst == V_max)
- Among their sources, find a vertex Src_max whose DTA value is 1 less than the maximum (DTA[Src_max] == DTA[V_max] - 1)
- Repeat this recursively until there are no more source vertices
To make that a little more efficient, you can reverse the endpoints of the edges on the way down and then follow the path back to the start. That way each reverse step is O(1).
I think option 3 is the most promising. You can search for the longest path with DFS starting from all the root vertices (those without incoming edges), increasing the 'max distance' for each vertex encountered.
This is quite a simple solution, but it may traverse some paths more than once. For example, for edges (a,f), (b,c), (c,f), (d,e), (e,c)
a------------f
/
b----c--/
/
d--e--/
(all directed rightwards)
the starting vertices are a, b, and d; the edge (c,f) will be traversed twice and the distance of vertex f will be updated three times. If we append the rest of the alphabet to f in a simple chain,
a------------f-----g-- - - ---y---z
/
b----c--/
/
d--e--/
the whole chain from f to z will probably be traversed three times, too.
You can avoid this by separating the phases and modifying the graph between them: after finding all the starting vertices (a, b, d), update the distance of each vertex reachable from them (f, c, e), then remove the starting vertices and their edges from the graph, and re-iterate as long as some edges remain.
This will transform the example graph after the first step like this:
f-----g-- - - ---y---z
/
c--/
/
e--/
and we can see all the junction vertices (c and f) will wait until the longest path to them is found before letting the analysis go further past them.
That needs repeated searching for starting vertices, which may be time-consuming unless you do some preprocessing (for example, counting all incoming edges for each vertex and storing the vertices in some sorted data structure, like an integer-indexed multimap or a simple min-heap).
The question remains open whether the whole overhead of truncating the graph and rescanning it for new root vertices is a net gain compared with traversing some final parts of common paths multiple times in your particular graph...
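For illustration, that layered processing boils down to something like the sketch below: a vertex becomes a "starting vertex" once its remaining in-degree drops to zero, and its distance is final at that point (the graph representation and names here are assumptions, not code from the question):
#include <stdlib.h>

/* Longest-path distances in a DAG, layer by layer (Kahn-style):
   adj[u]/adj_len[u] list the successors of u, indeg[] holds the
   in-degrees (consumed by this function), dist[] receives the result. */
void longest_distances(int n, int **adj, const int *adj_len,
                       int *indeg, int *dist)
{
    int *queue = malloc(n * sizeof *queue);
    int head = 0, tail = 0;

    for (int v = 0; v < n; v++) {
        dist[v] = 0;
        if (indeg[v] == 0)
            queue[tail++] = v;          /* the initial starting vertices */
    }
    while (head < tail) {
        int u = queue[head++];          /* dist[u] is final here */
        for (int i = 0; i < adj_len[u]; i++) {
            int v = adj[u][i];
            if (dist[u] + 1 > dist[v])
                dist[v] = dist[u] + 1;  /* longest distance seen so far */
            if (--indeg[v] == 0)        /* "remove" edge u -> v */
                queue[tail++] = v;      /* v joins the next layer */
        }
    }
    free(queue);
}
Each edge is looked at exactly once here, which is the saving the layered approach is after.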

Optimizing a method to find the most traversed edge given an adjacency graph and several traversals

I am given N vertices of a tree and its corresponding adjacency graph represented as an N by N array, adjGraph[N][N]. For example, if (1,3) is an edge, then adjGraph[0][2] == 1. Otherwise, adjGraph[i][j] == 0 for (i,j)s that are not edges.
I'm given a series of inputs in the form of:
1 5
which denote that a path has been traversed starting from vertex 1 to vertex 5. I wish to find the edge that was traversed the most times, along with the number of times it was traversed. To do this, I have another N by N array, numPass[N][N], whose elements I first initialize to 0, then increment by 1 every time I identify a path that includes an edge that matches its index. For example, if path (2,4) included edges (2,3) and (3,4), I would increment numPass[1][2] and numPass[2][3] by 1 each.
As I understand it, the main issue to tackle is that the inputs only give information of the starting vertex and ending vertex, and it is up to me to figure out which edges connect the two. Since the given graph is a tree, any path between two vertices is unique. Therefore, I assumed that given the index of the ending vertex for any input path, I would be able to recursively backtrack which edges were connected.
The following is the function code that I have tried to implement with that idea in mind:
// find the (unique) path of edges from vertices x to y
// and increment edges crossed during such a path
void findPath(int x, int y, int N, int adjGraph[][N], int numPass[][N]) {
    int temp;
    // if the path is a single edge, case is trivial
    if (adjGraph[x][y] == 1) {
        numPass[x][y] += 1;
        return;
    }
    // otherwise, find path by backtracking from y
backtrack:
    while (1) {
        temp = y - 1;
        if (adjGraph[temp][y] == 1) {
            numPass[temp][y] += 1;
            break;
        }
    }
    if (adjGraph[x][temp] == 1) {
        numPass[x][temp] += 1;
        return;
    } else {
        y = temp;
        goto backtrack;
    }
}
However, the problem is that while my code works fine for small inputs, it runs out of memory for large inputs, since I have a required memory limit of 128MB and time limit of 1 second. The ranges for the inputs are up to 222222 vertices, and 222222 input paths.
How could I optimize my method to satisfy such large inputs?
Get rid of the adjacency matrix (it uses O(N^2) space). Use adjacency lists instead.
Use a more efficient algorithm. Let's make the tree rooted. For a path from a to b we can add 1 at a and at b and subtract 2 at their lca (it is easy to see that this way a one is added to exactly the edges on this path and to no others).
After processing all paths, the number of paths going through the edge is just a sum in the subtree.
If we use an efficient algorithm to compute the lca, this solution works in O(N + Q * log N), where Q is the number of paths. That looks good enough for these constraints (we could actually do even better by using more complex and more efficient algorithms for finding the lca, but I don't think it's necessary here).
Note: lca means lowest common ancestor.
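To make the bookkeeping concrete, here is a rough sketch of the counting step, assuming the tree is already rooted (a parent[] array and a root-first vertex ordering) and that an lca(a, b) routine exists elsewhere; all of these names are placeholders rather than given code:
int lca(int a, int b);    /* assumed to exist, e.g. via binary lifting */

/* Mark one path: +1 at both endpoints, -2 at their lowest common ancestor. */
void add_path(int a, int b, int *diff)
{
    diff[a] += 1;
    diff[b] += 1;
    diff[lca(a, b)] -= 2;
}

/* After all paths are marked, push the marks up the tree.
   order[] lists the vertices root-first (parents before children),
   so iterating it backwards sums each subtree bottom-up. Afterwards
   diff[v] equals the number of paths using the edge (v, parent[v]). */
void subtree_sums(int n, const int *order, const int *parent, int *diff)
{
    for (int i = n - 1; i >= 0; i--) {
        int v = order[i];
        if (parent[v] >= 0)
            diff[parent[v]] += diff[v];
    }
}
The most traversed edge is then the non-root vertex v with the largest accumulated diff[v], i.e. the edge (v, parent[v]).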

Count number of paths of all lengths in a DAG

Suppose you have an unweighted DAG and two vertices, a start s and an end t. The problem is to count how many paths there are from s to t of length 1, 2, 3, ..., N-1, where N is the number of vertices in the DAG.
My approach:
1. Build a matrix d of size N*N, where d[u][k] is the number of ways to reach u from s in exactly k steps, and set d[s][0] = 1.
2. Find a topological sorting TS of the DAG.
3. Now, for every vertex u in TS:
   - clone the array d[u] as a
   - shift every element in a right by 1 (i.e., insert 0 on the left, discard the rightmost element)
   - for every adjacent vertex v of u, add array a to array d[v]
The answer is d[t].
This seems to work in O(V+EV). I'm wondering if there is a more efficient O(V+E) way?
The optimal algorithm is quite likely O(VE). However, a simpler implementation is possible using BFS, allowing vertices to be visited multiple times (in most practical cases this will use less memory than O(V^2)).
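For reference, the shift-and-add DP from the question could look roughly like this in C, assuming an adjacency list and a precomputed topological order (illustrative names, no overflow handling beyond using long long):
/* d[u][k] = number of ways to reach u from s in exactly k steps.
   ts[] is a topological order, adj[u]/adj_len[u] the successors of u. */
void count_paths(int n, int s, int **adj, const int *adj_len,
                 const int *ts, long long **d)
{
    for (int u = 0; u < n; u++)
        for (int k = 0; k < n; k++)
            d[u][k] = 0;
    d[s][0] = 1;

    for (int i = 0; i < n; i++) {
        int u = ts[i];                   /* d[u] is complete at this point */
        for (int j = 0; j < adj_len[u]; j++) {
            int v = adj[u][j];
            /* a length-k path to u becomes a length-(k+1) path to v */
            for (int k = 0; k + 1 < n; k++)
                d[v][k + 1] += d[u][k];
        }
    }
    /* d[t][1..n-1] now holds the per-length counts for the target t */
}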

Counting the number of shortest paths in a directed graph

My homework is to count the number of shortest paths from S to every other vertex in a directed graph, using the C language.
The graph is given as a txt file like this:
3 // number of vertices in G
{2,3},{1},{} // the first {} lists the neighbors of V1, the second the neighbors of V2, and so on
and I have to print an array with the number of shortest paths from S.
The algorithm I use is BFS with some additions:
numOfShortest(G, s)
    for each vertex x in V - {s}
        do color[x] = white, d[x] = 0, f[x] = 0
    color[s] = gray, d[s] = 0, f[s] = 1
    enqueue(Q, s)                        // let Q be a queue
    while Q is not empty
        do u = dequeue(Q)
           for each vertex v in N(u)     // for every neighbor of u
               do if color[v] = white
                      then color[v] = gray, d[v] = d[u] + 1
                           f[v] = f[v] + f[u]    // v gets at least the same number of paths as u
                           enqueue(Q, v)
                  else if color[v] = gray
                      then if d[u] < d[v]
                               then f[v] = f[v] + f[u]
           color[u] = black              // when finished with every N(u)
Now I have to take a few things into account (correct me if I'm wrong):
- implement a queue using a linked list
- make a struct called vertex for each v, which contains its neighbors (using a dynamic array)
- somehow scan the neighbors written in the file into the neighbors field of the struct vertex
Perhaps I took it too far with the preparations and there is a simpler way to do this; I've got a bit of a mess in my head.
Thanks to whoever can help.
You should start by having a look at the Dijkstra algorithm to get the shortest path from one vertex S to every other vertex in the graph.
Then maybe mixing it with a BFS-like algorithm will help you count what you mean.
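A rough C sketch of the BFS-style counting in the question's pseudocode, with illustrative names (not a complete program), might look like this:
#include <stdlib.h>

/* Count shortest paths from s in an unweighted directed graph.
   adj[u]/adj_len[u] list the neighbors of u; d[] receives the BFS
   distances (-1 = unreachable) and f[] the number of shortest paths. */
void count_shortest_paths(int n, int s, int **adj, const int *adj_len,
                          int *d, long long *f)
{
    int *queue = malloc(n * sizeof *queue);
    int head = 0, tail = 0;

    for (int v = 0; v < n; v++) { d[v] = -1; f[v] = 0; }
    d[s] = 0;
    f[s] = 1;
    queue[tail++] = s;

    while (head < tail) {
        int u = queue[head++];
        for (int i = 0; i < adj_len[u]; i++) {
            int v = adj[u][i];
            if (d[v] == -1) {              /* first time we reach v */
                d[v] = d[u] + 1;
                f[v] = f[u];
                queue[tail++] = v;
            } else if (d[v] == d[u] + 1) { /* another shortest way into v */
                f[v] += f[u];
            }
        }
    }
    free(queue);
}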
You can use a 2D array to store the entire graph.
Let int a[][] be the 2D array.
As this is your assignment I am not going to give you the code, but I can give you a way to store the graph.
First assign a[i][j] = 0 everywhere; this means j is not a neighbor of i.
Then read the number of nodes into a variable and use a loop to take the neighbors in sequence, saving them in the array, i.e. for your input file:
NumOfNode = 3
a[1][2] = 1;
a[1][3] = 1;
a[2][1] = 1;
After that, in your algorithm, if a[i][j] is 1 then there is an edge from i to j; if a[i][j] is 0 then there is no edge from i to j.

finding longest path in an adjacency list

I have an adjacency list I have created for a given graph with nodes and weighted edges. I am trying to figure out the best way to find the longest path within the graph. I have a topological sort method, which I've heard can be useful, but I am unsure how to use it to find the longest path. So is there a way to accomplish this using topological sort, or is there a more efficient method?
Here is an example of my output for the adjacency list (the value in parentheses is the cost to get to the node after the arrow, i.e. (cost)->node):
Node 0 (4)->1(9)->2
Node 1 (10)->3
Node 2 (8)->3
Node 3
Node 4 (3)->8(3)->7
Node 5 (2)->8(4)->7(2)->0
Node 6 (2)->7(1)->0
Node 7 (5)->9(6)->1(4)->2
Node 8 (6)->9(5)->1
Node 9 (7)->3
Node 10 (12)->4(11)->5(1)->6
Bryan already answered your question above, but I thought I could go in more depth.
First, as he pointed out, this problem is only easily solvable if there are no cycles. If there are cycles you run into the situation where you have infinitely long paths. In that case, you might define a longest path to be any path with no repeated nodes. Unfortunately, this problem can be shown to be NP-Hard. So instead, we'll focus on the problem which it seems like you actually need to solve (since you mentioned the topological sort)--longest path in a Directed Acyclic Graph (DAG). We'll also assume that we have two nodes s and t that are our start and end nodes. The problem is a bit uglier otherwise unless you can make certain assumptions about your graph. If you understand the text below, and such assumptions in your graphs are correct, then perhaps you can remove the s and t restrictions (otherwise, you'll have to run it on every pair of vertices in your graph! Slow...)
The first step in the algorithm is to topologically order the vertices. Intuitively this makes sense. Say you order them from left to right (i.e. the leftmost node will have no incoming edges). The longest path from s to t will generally start from the left and end on the right. It's also impossible for the path to ever go in the left direction. This gives you a sequential ordering to generate the longest path--start at the left and move right.
The next step is to sequentially go left to right and define the longest path for each node. For any node that has no incoming edges, the longest path to that node is 0 (this is true by definition). For any node with incoming edges, recursively define the longest path to that node to be the maximum over all incoming edges + the longest path to get to the "incoming" neighbor (note that this number might be negative, if, for example, all of the incoming edges are negative!). Intuitively this makes sense, but the proof is also trivial:
Suppose our algorithm claims that the longest path to some node v is d but the actual longest path is some d' > d. Pick the "least" such node v (we use the ordering as defined by the topological sort. In other words, we pick the "left-most" node that our algorithm failed at. This is important so that we can assume that our algorithm has correctly determined the longest path for any nodes to the "left" of v). Define the length of the hypothetical longest path to be d' = d_1 + e where d_1 is the length of the hypothetical path up to a node v_prev with edge e to v (note the sloppy naming. The edge e also has weight e). We can define it as such because any path to v must go through one of its neighbors which have an edge going to v (since you can't get to v without getting there via some edge that goes to it). Then d_1 must be the longest path to v_prev (else, contradiction. There is a longer path which contradicts our choice of v as the "least" such node!) and our algorithm would choose the path containing d_1 + e as desired.
To generate the actual path you can figure out which edge was used. Say you've reconstructed the path up to some vertex v which has longest path length d. Then go over all incoming vertices and find the one with longest path length d' = d - e where e is the weight of the edge going into v. You could also just keep track of the parents of nodes as you go through the algorithm. That is, when you find the longest path to v, set its parent to whichever adjacent node was chosen. You can use simple contradiction to show why either method generates the longest path.
Finally some pseudocode (sorry, it's basically in C#. This is a lot messier to code in C without custom classes and I haven't coded C in a while).
public List<Node> FindLongestPath(Graph graph, Node start, Node end)
{
    var longestPathLengths = new Dictionary<Node, int>();
    var orderedNodes = graph.Nodes.TopologicallySort();
    // Remove any nodes that are topologically less than start.
    // They cannot be in a path from start to end by definition.
    while (orderedNodes.Pop() != start);
    // Push it back onto the top of the stack
    orderedNodes.Push(start);
    // Do algorithm until we process the end node
    while (true)
    {
        var node = orderedNodes.Pop();
        if (node.IncomingEdges.Count() == 0)
        {
            longestPathLengths.Add(node, 0);
        }
        else
        {
            var longestPathLength = int.MinValue;
            foreach (var incomingEdge in node.IncomingEdges)
            {
                var currPathLength = longestPathLengths[incomingEdge.Parent] +
                                     incomingEdge.Weight;
                if (currPathLength > longestPathLength)
                {
                    longestPathLength = currPathLength;
                }
            }
            longestPathLengths.Add(node, longestPathLength);
        }
        if (node == end)
        {
            break;
        }
    }
    // Reconstruct path. Go backwards until we hit start
    var pathNode = end;
    var longestPath = new List<Node> { end };
    while (pathNode != start)
    {
        foreach (var incomingEdge in pathNode.IncomingEdges)
        {
            if (longestPathLengths[incomingEdge.Parent] ==
                longestPathLengths[pathNode] - incomingEdge.Weight)
            {
                longestPath.Insert(0, incomingEdge.Parent);
                pathNode = incomingEdge.Parent;
                break;
            }
        }
    }
    return longestPath;
}
Note that this implementation is not particularly efficient, but hopefully it's clear! You can optimize in a lot of small ways that should be obvious as you think through the code/implementation. Generally, if you store more stuff in memory, it'll run faster. The way you structure your Graph is also critical. For instance, it didn't seem like you had an IncomingEdges property for your nodes. But without that, finding the incoming edges for each node is a pain (and is not performant!). In my opinion, graph algorithms are conceptually different from, say, algorithms on strings and arrays because the implementation matters so much! If you read the wiki entries on graph algorithms you'll find they often give three or four different runtimes based on different implementations (with different data structures). Keep this in mind if you care about speed.
Assuming your graph has no cycles (otherwise "longest path" becomes a vague concept), you can indeed use a topological sort. Walk the topological order and, for each node, compute its longest distance from a source node by looking at all its predecessors and adding the weight of the connecting edge to their distance; then choose the predecessor that gives you the longest distance for this node. The topological sort guarantees that all your predecessors already have their distances correctly determined.
If, in addition to the length of the longest path, you also want the path itself, start at the node that gave the longest length and look at all its predecessors to find the one that produced this length. Repeat this process until you have found a source node of the graph.
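A compact C sketch of that procedure, assuming a precomputed topological order, incoming adjacency lists with parallel edge weights, and storing the winning predecessor as you go (all names here are illustrative):
#include <stdio.h>

/* dist[v] = length of the longest path from any source to v;
   pred[v] = predecessor of v on such a path, or -1 for a source.
   topo[]  = vertices in topological order;
   in[v]/in_len[v] = incoming neighbors of v, w[v][i] = weight of the
   edge from in[v][i] to v. */
void dag_longest_path(int n, const int *topo, int **in, const int *in_len,
                      int **w, long long *dist, int *pred)
{
    for (int i = 0; i < n; i++) {
        int v = topo[i];
        dist[v] = 0;                     /* sources keep distance 0 */
        pred[v] = -1;
        for (int j = 0; j < in_len[v]; j++) {
            int u = in[v][j];
            long long cand = dist[u] + w[v][j];
            if (pred[v] == -1 || cand > dist[v]) {
                dist[v] = cand;          /* best incoming edge so far */
                pred[v] = u;
            }
        }
    }
}

/* Walk the predecessor chain back from the vertex with the largest dist;
   this prints the path in reverse order. */
void print_path_reversed(int end, const int *pred)
{
    for (int v = end; v != -1; v = pred[v])
        printf("%d\n", v);
}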
