Finding the diameter of an m-ary tree in C

I have to analyze an m-ary tree in C, mainly using BFS.
There are some requirements I have not managed to implement for a while:
1. Find the diameter of the tree.
2. Given two vertices in the tree - find the shortest simple path between them.
As for 1: I went through related topics on Stack Overflow and saw some implementations (unfortunately not in C) which are not very clear to me. They calculate the diameter by running BFS twice, starting from an arbitrary vertex, and I'm not sure whether the second BFS has to "remember" the visited array from the first BFS.
As for 2: I really don't know how to approach it, but I believe BFS can somehow be used here as well.
Moreover, I have to implement these two requirements in O(n^2) time complexity.
Besides that, I have to find the maximal and minimal heights of the tree.
As for the maximal height: I have implemented BFS (not sure it's absolutely correct), which to my understanding handles the maximal height.
As for the minimal height: I have no idea how to find it.
Here are my vertex struct and BFS implementations:
typedef struct Vertex {
    size_t key;
    size_t amountOfNeighbors;  // The current amount of neighbors
    size_t capacity;           // The capacity of the neighbors (it's updated during run-time)
    struct Vertex* parent;
    struct Vertex** neighbors; // The possible parent and children of a vertex
} Vertex;
Vertex* bfs(Vertex* allVertices, size_t numOfVertices, Vertex* startVertex, size_t* pathDistance) {
    if (startVertex->neighbors == NULL) { // In case we have only one vertex in the graph
        *pathDistance = 0;
        return startVertex;
    }
    Queue* q = (Queue*)malloc(sizeof(size_t) * numOfVertices);
    int* visited = (int*)malloc(sizeof(int) * numOfVertices);
    for (size_t i = 0; i < numOfVertices; i++) {
        visited[i] = 0; // Mark all the vertices as unvisited
    }
    size_t lastVertex = 0; // Actually indicates the furthermost vertex from startVertex
    *pathDistance = 0;     // The number of edges between lastVertex and startVertex
    enqueue(q, startVertex->key);
    visited[startVertex->key] = 1; // Mark as visited
    while (!queueIsEmpty(q)) {
        unsigned int currentVertex = dequeue(q); // The key of the current vertex
        Vertex* s = &allVertices[currentVertex];
        size_t currentAmountOfNeighbors = 0; // Detects the number of processed neighbors of the current vertex
        for (Vertex** child = s->neighbors; currentAmountOfNeighbors < s->amountOfNeighbors; currentAmountOfNeighbors++) {
            if (!visited[(*child)->key]) {
                visited[(*child)->key] = 1;
                enqueue(q, (*child)->key);
                child++; // TODO Validate it's a correct use of memory!
            }
        }
        *pathDistance += 1; // Another layer passed
        lastVertex = peekQueue(q);
    }
    Vertex* furtherMostVertexFromS = &allVertices[lastVertex];
    free(q);
    q = NULL;
    return furtherMostVertexFromS;
}
My difficulties and open questions are described above; any help with any of them will be appreciated.

Firstly, questions of this nature are more appropriate for the CS Stack Exchange, but I'll try to help regardless.
For your first question (finding the diameter), note that the longest path in the tree must begin (or end) at the deepest node in the tree (which is a leaf). BFS helps you find the depths of all nodes, and thus helps you find the deepest node. Can you figure out from there how to find the other end of said path? Hint: think about the procedure for finding the deepest node of a graph.
There seems to be a misunderstanding on your part about how BFS works: note that the point of keeping track of visited nodes is to avoid crossing back-edges, that is, to avoid cycles, which aren't possible in a tree.
But hypothetically, even if you do maintain such a 'visited' array (e.g. if you want your algorithm to handle cyclic graphs), why would it be shared between different BFS invocations?
As for the second question: BFS finds the distances between the starting node and all other nodes in the graph (also called 'depth' when started from the root). In particular, these are the shortest paths (on an unweighted graph).
The answers to the rest of your questions are also related; the key takeaway is that in an acyclic, unweighted graph, BFS lets you find the shortest path/minimal distance from the starting node (consult an algorithms textbook for more details on that).
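To make the two-BFS idea concrete, here is a minimal sketch in C. It assumes a plain adjacency-list representation rather than the asker's Queue/Vertex types, and the names (Tree, farthest_bfs, diameter) are made up for this example: run BFS from any vertex to find the farthest vertex u, then run a second, completely independent BFS from u; the distance to the farthest vertex found by that second run is the diameter. Each call allocates its own fresh visited array, so nothing is shared between the two invocations.

#include <stdlib.h>

/* Minimal adjacency-list tree: adj[v] is an array of deg[v] neighbor indices. */
typedef struct {
    size_t n;        /* number of vertices              */
    size_t **adj;    /* adj[v][i] = i-th neighbor of v  */
    size_t *deg;     /* deg[v]   = number of neighbors  */
} Tree;

/* BFS from `start`; returns the farthest vertex and writes its distance to *dist. */
static size_t farthest_bfs(const Tree *t, size_t start, size_t *dist) {
    size_t *queue = malloc(t->n * sizeof *queue);
    size_t *depth = calloc(t->n, sizeof *depth);
    char *visited = calloc(t->n, 1);            /* fresh visited array per call */
    size_t head = 0, tail = 0, far = start;

    queue[tail++] = start;
    visited[start] = 1;
    while (head < tail) {
        size_t v = queue[head++];
        if (depth[v] > depth[far]) far = v;     /* track the deepest vertex seen */
        for (size_t i = 0; i < t->deg[v]; i++) {
            size_t w = t->adj[v][i];
            if (!visited[w]) {
                visited[w] = 1;
                depth[w] = depth[v] + 1;
                queue[tail++] = w;
            }
        }
    }
    *dist = depth[far];
    free(queue); free(depth); free(visited);
    return far;
}

/* Diameter = distance found by a second BFS started at the vertex the first BFS returned. */
size_t diameter(const Tree *t) {
    size_t d = 0;
    size_t u = farthest_bfs(t, 0, &d);  /* first BFS from an arbitrary vertex */
    farthest_bfs(t, u, &d);             /* second BFS, independent visited[]  */
    return d;
}

Two BFS passes over a tree with n vertices cost O(n), comfortably within the required O(n^2) bound. The same helper also answers the other questions: starting the BFS from the root, the minimal height is the depth of the shallowest leaf encountered, and if you additionally record each vertex's BFS parent you can walk the parent links back from one endpoint to recover the shortest simple path between two given vertices.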


Given a DAG, the length of the longest path and the node in which it ends, how do I retrace my steps so I can print each node of the longest path?

I'm working on a problem of finding the largest number of parallelepipeds that can be nested inside each other, given a list of parallelepipeds.
My approach was to represent the graph with an adjacency list, do a topological sort and then for each node in the topological array "unrelax" the edges, giving me the longest path.
Below is the code but I don't think it matters for the question.
typedef struct Edge {
    int src;            // source node
    int dst;            // destination node
    struct Edge *next;
} Edge;

int maxend = -1; // node in which the longest path ends
int mp = 0;      // length of the longest path
for (int i = 0; i < G.n; i++)
{
    int j = TA[i]; // TA is the topologically sorted array
    if (G.edges[j] != NULL)
    {
        if (DTA[j] == -1) DTA[j] = 0;
        Edge* tmp = G.edges[j];
        while (tmp != NULL)
        {
            if (DTA[tmp->src] >= DTA[tmp->dst]) { // DTA is the array that keeps track of the maximum distance of each node in TA
                DTA[tmp->dst] = DTA[tmp->src] + 1;
                if (DTA[tmp->dst] > mp) {
                    mp = DTA[tmp->dst];
                    maxend = tmp->dst;
                }
            }
            tmp = tmp->next;
        }
    }
}
In the end I have the length of the longest path and the node in which said path ends, but how do I efficiently recreate the path?
If parallelepiped A contains parallelepiped B and parallelepiped B contains parallelepiped C, that means that parallelepiped A can contain parallelepiped C as well, which means that each edge has a weight of 1 and the vertex where the longest path starts has the furthest node of the path in its adjacency list.
I've thought of 3 solutions but none of them look great.
1. Iterate over the edges of each vertex that has weight 0 (so no predecessors) and, if there is a choice, avoid choosing the edge that connects it with the furthest node (as said before, the shortest path between the starting node and the ending node would be 1).
2. In the array that tracks the maximum distance of each node of the topologically sorted array: start from the index representing the furthest node we found and see if the previous node has a compatible distance (as in, the previous node has a distance exactly 1 less than the furthest node). If it does, check its adjacency list to see if the furthest node is in it (because if the furthest node has a distance of 10, there could be several nodes that have a distance of 9 but are unconnected to it). Repeat until we reach the root of the path.
3. The most probable candidate so far: create an array of pointers that keeps track of the "maximum" parent of each node. In the code above, every time a node has its maximum distance changed, it means that the node responsible for the change had a longer distance than the previous parent (if there was one), which means we can change the maximum parent associated with the current node.
Edit: I ended up just allocating a new array, and every time I updated the weight of a node ( DTA[tmp->src] >= DTA[tmp->dst] ) I also stored the number of the source node in the cell of the destination node.
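That edit amounts to keeping a predecessor array alongside DTA. A minimal sketch of it (assuming a prev array of the same size as DTA with every cell initialized to -1; prev is an illustrative name, not taken from the question, and this is a fragment rather than a complete program):

/* During relaxation: remember which predecessor produced the best distance. */
if (DTA[tmp->src] >= DTA[tmp->dst]) {
    DTA[tmp->dst] = DTA[tmp->src] + 1;
    prev[tmp->dst] = tmp->src;          /* record the source of the improvement */
    if (DTA[tmp->dst] > mp) {
        mp = DTA[tmp->dst];
        maxend = tmp->dst;
    }
}

/* After the main loop: follow prev[] from maxend back to the start of the path. */
for (int v = maxend; v != -1; v = prev[v])
    printf("%d\n", v);                  /* prints the path in reverse order */

The loop at the end prints the path from maxend back to its start; storing the values in a buffer and reversing gives the forward order.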
I am assuming the graph edge u <- v indicates that box u is big enough to contain v.
I suggest you dump the topological sort. Instead:
SET weight of every edge to -1
LOOP
    LOOP over leaf nodes ( out degree zero, box too small to contain others )
        Run Dijkstra algorithm ( gives longest path, with predecessors )
        Save length of longest path, and path itself
    SAVE longest path
    REMOVE nodes on longest path from graph
    IF all nodes gone from graph
        OUTPUT saved longest paths ( lists of nested boxes )
        STOP
This is called a "greedy" algorithm. It is not guaranteed to give the optimal result. But it is fast and simple, always gives a reasonable result and often does give the optimal.
I think this solves it, unless there's something I don't understand.
The highest-weighted path in a DAG is equivalent to the lowest-weighted path if you make the edge weights negative. Then you can apply Dijkstra's algorithm directly.
A longest path between two given vertices s and t in a weighted graph
G is the same thing as a shortest path in a graph −G derived from G by
changing every weight to its negation.
This might even be a special case of Dijkstra that is simpler... not sure.
To retrieve the longest path, you start at the end and go backwards:
Start at the vertex with the greatest DTA value, V_max
Find the edges that end at V_max (edge->dst == V_max)
Find an edge whose source Src_max has a DTA value exactly 1 less than the max (DTA[Src_max] == DTA[V_max] - 1)
Repeat this recursively until there are no more source vertices
To make that a little more efficient, you can reverse the endpoints of the edges on the way down and then follow the path back to the start. That way each reverse step is O(1).
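A sketch of that backward walk (assuming the graph is also available as a flat array edges[0..m-1] with src/dst fields, plus the DTA array and maxend from the question; this is a fragment, and the linear scan per step is only for clarity):

int v = maxend;
while (DTA[v] > 0) {
    int found = 0;
    for (int e = 0; e < m && !found; e++) {
        /* look for an edge ending at v whose source is exactly one step shorter */
        if (edges[e].dst == v && DTA[edges[e].src] == DTA[v] - 1) {
            printf("%d <- %d\n", v, edges[e].src);
            v = edges[e].src;
            found = 1;
        }
    }
    if (!found) break; /* should not happen if DTA was filled consistently */
}

Reversing the edge endpoints on the way down, as suggested above, removes the need for the inner scan and makes each backward step O(1).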
I think option 3 is the most promising. You can search for the longest path with DFS starting from all the root vertices (those without incoming edges), increasing the 'max distance' for each vertex encountered.
This is quite a simple solution, but it may traverse some paths more than once. For example, for edges (a,f), (b,c), (c,f), (d,e), (e,c)
a------------f
            /
b----c-----/
    /
d--e/
(all directed rightwards)
the starting vertices are a, b, and d; the edge (c,f) will be traversed twice and the distance of vertex f will be updated three times. If we append the rest of the alphabet to f in a simple chain,
a------------f-----g-- - - ---y---z
            /
b----c-----/
    /
d--e/
the whole chain from f to z will probably be traversed three times, too.
You can avoid this by separating the phases and modifying the graph between them: after finding all the starting vertices (a, b, d) increment the distance of each vertex available from those (f, c, e), then remove starting vertices and their edges from the graph - and re-iterate as long as some edges remain.
This will transform the example graph after the first step like this:
             f-----g-- - - ---y---z
            /
     c-----/
    /
   e/
and we can see that all the junction vertices (c and f) will wait until the longest path to them is found before letting the analysis go further past them.
That needs iterative seeking for starting vertices, which may be time consuming unless you do some preprocessing (for example, counting all incoming edges for each vertex and storing the vertices in some sorted data structure, like an integer-indexed multimap or a simple min-heap).
The question remains open whether the whole overhead of truncating the graph and rescanning it for new root vertices is a net gain compared with traversing some final parts of common paths multiple times in your particular graph...
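For reference, the phase-by-phase idea above (find the starting vertices, push distances one layer, remove them, repeat) is essentially the in-degree driven topological processing of the DAG. A minimal, illustrative sketch in C, assuming the graph is given as parallel src/dst edge arrays; none of these names come from the question:

#include <stdlib.h>
#include <string.h>

/* Longest path (in edges) in a DAG given as an edge list, via in-degree peeling. */
int longest_path(int n, int m, const int *src, const int *dst, int *dist) {
    int *indeg = calloc(n, sizeof *indeg);
    int *queue = malloc(n * sizeof *queue);
    int head = 0, tail = 0, best = 0;

    memset(dist, 0, n * sizeof *dist);
    for (int e = 0; e < m; e++) indeg[dst[e]]++;
    for (int v = 0; v < n; v++)
        if (indeg[v] == 0) queue[tail++] = v;   /* the starting (root) vertices */

    while (head < tail) {
        int v = queue[head++];
        for (int e = 0; e < m; e++) {           /* O(n*m); index edges by source to speed this up */
            if (src[e] != v) continue;
            if (dist[v] + 1 > dist[dst[e]]) dist[dst[e]] = dist[v] + 1;
            if (dist[dst[e]] > best) best = dist[dst[e]];
            if (--indeg[dst[e]] == 0) queue[tail++] = dst[e];
        }
    }
    free(indeg); free(queue);
    return best;
}

Here the vertices whose in-degree drops to zero play the role of the "new starting vertices" after each truncation step, so no explicit rewriting of the graph is needed.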

Find the index of the farthest smaller number in the right side of an array

Given an array of size N. For every element in the array, the task is to find the index of the farthest element in the array to the right which is smaller than the current element. If no such number exists then print -1
This question is taken from here
Sample Test Cases
Input
3, 1, 5, 2, 4
Output
3, -1, 4, -1, -1
Input
1, 2, 3, 4, 0
Output
4, 4, 4, 4, -1
I would also like to clarify that this is not a duplicate of this post here. While I did understand the solution mentioned in that post, I would really like to know why my approach below does not work for all test cases.
I came up with the following approach,
Create a binary search tree from the right side of the array
Each node stores the following info: the value, the index of the current element, and the index of the farthest smaller element to its right
While inserting, check whether the current element being inserted (while moving into the right subtree) satisfies the condition, and update farthestDst accordingly
I tried to submit this, but I got Wrong Answer (failing test case not shown) despite running successfully against some sample test cases. I have attached my code in C++ below
class TreeNode {
public:
    // farthestDst is the index of the smallest element which is farthest away to its right
    int val, idx, farthestDst;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int value, int index, int dst) {
        val = value;
        idx = index;
        farthestDst = dst;
        left = right = NULL;
    }
};
class Solution {
public:
    TreeNode* root = NULL;
    unordered_map<int, TreeNode*> mp; // store the address of each node just to speed up search
    TreeNode* insertBST(TreeNode* root, int val, int idx, int dst) {
        if (root == NULL) {
            TreeNode* node = new TreeNode(val, idx, dst);
            mp[val] = node;
            return node;
        }
        else if (val >= root->val) { // checking the condition
            if ((root->idx) - idx > dst) {
                dst = root->idx;
            }
            root->right = insertBST(root->right, val, idx, dst);
        }
        else {
            root->left = insertBST(root->left, val, idx, dst);
        }
        return root;
    }
    // actual function to complete, where N is the size of the vector and nums contains the values
    vector<int> farNumber(int N, vector<int> nums) {
        vector<int> res;
        if (nums.size() == 0) { // base case: check if nums is empty
            return res;
        }
        for (int i = nums.size() - 1; i >= 0; i--) {
            root = insertBST(root, nums[i], i, -1);
        }
        for (int i = 0; i < nums.size(); i++) {
            TreeNode* node = mp[nums[i]];
            res.push_back(node->farthestDst);
        }
        return res;
    }
};
Just a note, if anyone wants to test their solution, they can do so at this link
Please do let me know if further clarification about the code is needed
Any help would be appreciated. Thanks!
mp[] assumes that each element value appears at most once in the input. This is not given as part of the problem description, so it's not guaranteed. If some value appears more than once, its original value in mp[] will be overwritten. (Ironically, the standard library already gives you a balanced BST in the form of std::map, usually a red-black tree.)
Not technically a bug, but as pointed out by nice_dev in a comment, because your BST performs no rebalancing, it can become arbitrarily badly balanced, leading to O(n) insertion times and O(n^2) performance overall. This will occur on, e.g., sorted or reverse-sorted inputs. There are probably test cases large enough to cause timeouts for O(n^2)-time algorithms.
Unfortunately, adding rebalancing to your code to bring the worst-case time down to O(n log n) will cause it to become incorrect, because it currently depends on a delicate property: It doesn't compare each inserted element with all smaller-valued elements to its right, but only with the ones you encounter on the path down from the root of the BST. Whenever during this traversal you encounter an element at position j with value nums[j] < nums[i], you ignore all elements in its left subtree. With your current implementation, this is safe: Although these elements are all known to be smaller than nums[i] by the BST property, they can't be further to the right than j is, because insertion order means that every child is to the left of its parent. But if you change the algorithm to perform tree rotations to rebalance the tree, the second property can be lost -- you could miss some element at position k with nums[k] < nums[j] < nums[i] but k > j.
Finally, having both a member variable root and a function argument root is confusing.

Find maximum subtree in the given BST such that it has no duplicates

Given a BST which allows duplicates as separate vertices, how do I find the highest subtree such that it has no duplicates?
This is the idea:
(1) Check if the root value appears in its right subtree (insertion works this way: left < root <= right). If not, the tree has no duplicates. I always look for it going left from the root's right child.
(2) Traversing and doing (1), I can find all subtrees without duplicates, storing their root pointers and heights.
(3) Comparing heights, I can find the largest sought subtree.
I don't know how to store this information while traversing. I found programs for finding all duplicate subtrees of a BST that use hash maps, but if possible I would prefer to avoid hash maps, as I haven't covered them in my course yet.
typedef struct vertex {
    int data;
    struct vertex *left;
    struct vertex *right;
} vertex, *pvertex;

// Utility functions
int Height(pvertex t) {
    if (t == NULL)
        return 0;
    if (Height(t->left) > Height(t->right))
        return Height(t->left) + 1;
    else
        return Height(t->right) + 1;
}

int DoesItOccur(pvertex t, int k) {
    if (!t)
        return 0;
    if (t->data == k)
        return 1;
    if (t->data < k) {
        return DoesItOccur(t->left, k);
    }
}

// My function
pvertex MaxSeeked(pvertex t) {
    if (!t)
        return NULL;
    if (DoesItOccur(t->right, t->data) == 0)
        return t;
    else if {
        if (t->left && t->right) {
            if (Height(MaxSeeked(t->left)) > Height(MaxSeeked(t->right)))
                return t->left;
            else
                return t->right;
        }
    }
    else if {
        ......
    }
}
I don't know how to store this information while traversing. I found programs for finding all duplicate subtrees of a BST that use hash maps, but if possible I would prefer to avoid hash maps, as I haven't covered them in my course yet.
Note in the first place that you only need to track all the subtrees of the maximal height discovered so far. Or maybe you can limit that to just one such, if that's all you need to discover. For efficiency, you should also track what that maximal height actually is.
I'll suppose that you must not add members to your node structure, but if you could, you could add a member or two in which to record whether the tree rooted at each node contains any dupes, and how high that tree is. You could populate those data as you go, remember what the maximum height is, and then make a second traversal to collect the nodes.
But without modifying any nodes themselves, you can still track the current candidates by other means, such as a linked list. And you can put whatever metadata you want into the tracking data structure. For example,
struct nondupe_subtree {
    struct vertex *root;
    int height;
    struct nondupe_subtree *next;
};
You can then, say, perform a selective traversal of your tree in breadth first order, carrying along a linked list of struct nondupe_subtree nodes:
Start by visiting the root node.
Test the subtree rooted at each visited node to see whether it contains any dupes, according to the procedure you have described.
If so then enqueue its children for traversal.
If not then measure the subtree height and update your linked list (or not) accordingly. Do not enqueue this node's children.
When no more nodes are enqueued for traversal, your linked list contains the roots of all the maximal-height subtrees without dupes.
Note that this algorithm would in many cases be significantly sped up if you could compute and store all the subtree heights in an initial DFS pass, for it is otherwise prone to performing duplicate tree-height computations. Many of them, in some cases.
Note also that although it does simplify this particular algorithm, your rule of always putting dupes to the right works against balanced trees, which may also reduce performance. In the worst case, where all vertices are duplicates, your "tree" will perforce be linear.
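A rough sketch of that breadth-first collection, reusing the question's vertex/pvertex type and Height helper together with the nondupe_subtree struct above. The HasDupes check below is only a stand-in for the procedure described in the question (it asks, at every node of the subtree, whether that node's value re-appears in that node's right subtree), and MAX_NODES is an assumed bound on the queue size:

#include <stdlib.h>

#define MAX_NODES 1024   /* assumed upper bound on the number of vertices */

/* A subtree is duplicate-free iff no node's value occurs again in that node's right subtree. */
static int OccursIn(pvertex t, int k) {
    if (!t) return 0;
    return t->data == k || OccursIn(t->left, k) || OccursIn(t->right, k);
}
static int HasDupes(pvertex t) {
    if (!t) return 0;
    return OccursIn(t->right, t->data) || HasDupes(t->left) || HasDupes(t->right);
}

/* Collect the roots of all maximal-height duplicate-free subtrees into a linked list. */
struct nondupe_subtree *find_nondupe_subtrees(pvertex root) {
    pvertex queue[MAX_NODES];
    int head = 0, tail = 0, best_height = 0;
    struct nondupe_subtree *list = NULL;

    if (root) queue[tail++] = root;
    while (head < tail) {
        pvertex t = queue[head++];
        if (!t) continue;
        if (!HasDupes(t)) {                     /* dupe-free: candidate, do not go deeper      */
            int h = Height(t);
            if (h > best_height) {              /* strictly taller: discard previous candidates */
                list = NULL;                    /* (free the old list nodes in real code)       */
                best_height = h;
            }
            if (h == best_height) {
                struct nondupe_subtree *e = malloc(sizeof *e);
                e->root = t;
                e->height = h;
                e->next = list;
                list = e;
            }
        } else {                                /* dupes present: enqueue children instead      */
            queue[tail++] = t->left;
            queue[tail++] = t->right;
        }
    }
    return list;
}

As the notes above say, precomputing all subtree heights in one DFS pass would avoid the repeated Height calls.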

Making edges for ice-sliding puzzle path finding

My apologies for the title; I just specified the problem I encountered in this puzzle.
I'm making a path-finding method that finds the least distance travelled, which depends on where the asterisks are.
The rules of the game are simple: traverse from A to B, but you can only move in a straight line and cannot stop moving in that direction until you hit an asterisk (or B), as if you were sliding across every zero.
For example, the photo shows the shortest path from A to B, with 23 as the total distance travelled.
The first idea that came to my mind was to initially build a matrix, for which I have my code here:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

int main()
{
    FILE *hehehe = fopen("input.txt", "r");

    //==========================ADJACENCY MATRIX INITIALIZATION=======================================//
    int row, column, i, j;
    fscanf(hehehe, "%d", &row);
    fscanf(hehehe, "%d", &column);
    char c;
    c = fgetc(hehehe);
    int matrix[row][column];
    c = fgetc(hehehe);
    for (i = 0; i < row; i++)
    {
        for (j = 0; j < column; j++)
        {
            if (c == '*') {
                matrix[i][j] = 1;
                c = fgetc(hehehe);
            }
            else if (c == 'A')
            {
                matrix[i][j] = 2;
                c = fgetc(hehehe);
            }
            else if (c == 'B')
            {
                matrix[i][j] = 3;
                c = fgetc(hehehe);
            }
            else {
                matrix[i][j] = 0;
                c = fgetc(hehehe);
            }
            if (c == '\n') { c = fgetc(hehehe); }
        }
    }
    for (i = 0; i < row; i++)
    {
        for (j = 0; j < column; j++)
        {
            //if (matrix[i][j] == 1) printf("*");
            //else printf(" ");
            printf("%d ", matrix[i][j]);
        }
        printf("\n");
    }
    fclose(hehehe);
}
Any idea or suggestion for continuing, i.e. for making an edge along every straight line in the photo, is highly appreciated. Thank you in advance for any help.
In this case, I think a matrix is overdoing it. Because you cannot make a move while sliding, you only need a directed graph.
There are a few things to keep in mind while making your algorithm:
Stopping points are the goal, or any space adjacent to a wall/asterisk.
A vertex should store two values: direction and location.
For each asterisk or obstacle, add adjacent spaces to the list of vertices (if they don't exist in your graph yet). They only require one direction.
For each B, add a vertex with all possible directions. (Or one vertex for each direction, depending on whether that makes it easier).
For each vertex, find the closest vertex in the direction stored. Draw an edge between both vertices (if it does not already exist); a sketch of this sliding step appears below.
Run an appropriate search algorithm. If distance matters, use Dijkstra's. If it doesn't, use Breadth-first. If there are special scoring rules, consider A*.
Because your search space doesn't seem so large, I haven't fully optimized the algorithm. This is why I mention checking that vertices and edges aren't already added. Optimizing these is possible, and I can help with that if you need, but if-statements aren't costly enough to warrant premature optimization. Because your search space is easy to simplify, Breadth-first and Dijkstra's algorithm are absolutely perfect; they find the shortest path and their performance cost is nowhere near as high as putting them on a 2D grid.
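As an illustration of the "find the closest vertex in the direction stored" step, a small helper might slide from a stopping point until the next cell is a wall or the edge of the grid. This is a sketch against the 0/1/2/3 matrix encoding from the question; the function name and out-parameters are made up:

/* Slide from (r, c) one cell at a time in direction (dr, dc) until the next cell
   is an asterisk (1) or outside the grid. Returns the number of cells travelled
   and writes the stopping point to (*stop_r, *stop_c). */
int slide(int rows, int cols, int matrix[rows][cols],
          int r, int c, int dr, int dc, int *stop_r, int *stop_c) {
    int dist = 0;
    while (1) {
        int nr = r + dr, nc = c + dc;
        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols || matrix[nr][nc] == 1)
            break;                    /* wall or border ahead: stop here              */
        r = nr; c = nc; dist++;
        if (matrix[r][c] == 3)        /* reached B: the goal is also a stopping point */
            break;
    }
    *stop_r = r;
    *stop_c = c;
    return dist;
}

The returned distance can be used as the edge weight if you go with Dijkstra's algorithm, since the puzzle scores by total distance travelled rather than by number of moves.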
If you're not sure of how to make your data structures, here's the way I'd approach it.
1. Graph Structures

Direction   // x and y tuple/variable, integer, or string
Point       // x and y tuple/variable. Note that you can use this as a direction too
Vertex
  - Point
  - Map<Direction, Edge>  // each direction is linked to another vertex
                          // maps in C can be made with two arrays
                          // a vertex for each direction may be easier
Edge
  - Vertex                // you can store both vertices, but you only need to store the one being moved to
                          // without OOP, reuse a simple struct before making it complex
Graph
  - Vertices              // array of Vertex
                          // each vertex stores its edges; the graph doesn't need to
2. Pathfinder Structures

Node
  - Parent   // a link to the previous node
             // trace the links back to construct a path
  - Depth    // last node's depth + 1
  - Vertex   // the vertex hit by the pathfinder
Path
  - Nodes    // while parent is not null,
             // add a node to this list,
             // then read it backward
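If it helps, here is one way those structures might look in plain C. This is only a sketch under the assumptions above: four fixed directions, the per-vertex "map" realized as an array indexed by direction, and an extra distance field on the edge because the puzzle scores by total distance travelled. All names are illustrative.

#include <stddef.h>

typedef struct { int x, y; } Point;                 /* also usable as a direction vector */

typedef enum { DIR_UP, DIR_DOWN, DIR_LEFT, DIR_RIGHT, DIR_COUNT } Direction;

typedef struct Vertex Vertex;

typedef struct Edge {
    Vertex *to;              /* the stopping point reached by sliding in one direction */
    int     distance;        /* number of cells slid across (edge weight for Dijkstra) */
} Edge;

struct Vertex {
    Point point;             /* location on the grid                                   */
    Edge  out[DIR_COUNT];    /* "map" from direction to edge; to == NULL if blocked    */
};

typedef struct {
    Vertex *vertices;        /* one per stopping point (plus A and B)                  */
    size_t  count;
} Graph;

/* Pathfinder node: trace parent links back from B to reconstruct the path. */
typedef struct Node {
    struct Node  *parent;
    int           depth;     /* previous node's depth + 1                              */
    const Vertex *vertex;    /* the vertex reached by the pathfinder                   */
} Node;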

How to optimize Dijkstra algorithm for a single shortest path between 2 nodes?

I was trying to understand this implementation in C of the Dijkstra algorithm and at the same time modify it so that only the shortest path between 2 specific nodes (source and destination) is found.
However, I don't know exactly what to do. The way I see it, there's not much to change; I can't seem to change d[] or prev[], because those arrays aggregate some important data for the shortest path calculation.
The only thing I can think of is stopping the algorithm when the path is found, that is, breaking the loop when mini == destination, at the moment it is being marked as visited.
Is there anything else I could do to make it better or is that enough?
EDIT:
While I appreciate the suggestions given to me, people still fail to answer exactly what I asked. All I want to know is how to optimize the algorithm so that it only searches for the shortest path between 2 nodes. I'm not interested, for now, in all the other general optimizations. What I'm saying is: in an algorithm that finds all shortest paths from a node X to all other nodes, how do I optimize it to only search for a specific path?
P.S.: I just noticed that the for loops start at 1 and run until <=; why can't they start at 0 and run until <?
The implementation in your question uses an adjacency matrix, which leads to an O(n^2) implementation. Graphs in the real world are usually sparse: the number of nodes n can be very big, but the number of edges is far less than n^2.
You'd better look at a heap-based Dijkstra implementation.
BTW, single-pair shortest path cannot be solved asymptotically faster than single-source shortest paths from one node.
#include <algorithm>
#include <cstring>
using namespace std;
#define MAXN 100
#define HEAP_SIZE 100
typedef int Graph[MAXN][MAXN];
template <class COST_TYPE>
class Heap
{
public:
int data[HEAP_SIZE],index[HEAP_SIZE],size;
COST_TYPE cost[HEAP_SIZE];
void shift_up(int i)
{
int j;
while(i>0)
{
j=(i-1)/2;
if(cost[data[i]]<cost[data[j]])
{
swap(index[data[i]],index[data[j]]);
swap(data[i],data[j]);
i=j;
}
else break;
}
}
void shift_down(int i)
{
int j,k;
while(2*i+1<size)
{
j=2*i+1;
k=j+1;
if(k<size&&cost[data[k]]<cost[data[j]]&&cost[data[k]]<cost[data[i]])
{
swap(index[data[k]],index[data[i]]);
swap(data[k],data[i]);
i=k;
}
else if(cost[data[j]]<cost[data[i]])
{
swap(index[data[j]],index[data[i]]);
swap(data[j],data[i]);
i=j;
}
else break;
}
}
void init()
{
size=0;
memset(index,-1,sizeof(index));
memset(cost,-1,sizeof(cost));
}
bool empty()
{
return(size==0);
}
int pop()
{
int res=data[0];
data[0]=data[size-1];
index[data[0]]=0;
size--;
shift_down(0);
return res;
}
int top()
{
return data[0];
}
void push(int x,COST_TYPE c)
{
if(index[x]==-1)
{
cost[x]=c;
data[size]=x;
index[x]=size;
size++;
shift_up(index[x]);
}
else
{
if(c<cost[x])
{
cost[x]=c;
shift_up(index[x]);
shift_down(index[x]);
}
}
}
};
int Dijkstra(Graph G,int n,int s,int t)
{
Heap<int> heap;
heap.init();
heap.push(s,0);
while(!heap.empty())
{
int u=heap.pop();
if(u==t)
return heap.cost[t];
for(int i=0;i<n;i++)
if(G[u][i]>=0)
heap.push(i,heap.cost[u]+G[u][i]);
}
return -1;
}
You could perhaps improve somewhat by maintaining separate 'open' and 'closed' (unvisited and visited) lists; it may improve seek times a little.
Currently you search for an unvisited node with the smallest distance to the source.
1) You could maintain a separate 'open' list that gets smaller and smaller as you iterate, thus making your search space progressively smaller.
2) If you maintain a 'closed' list (those nodes you have visited) you can check the distance against only those nodes. This will progressively increase your search space, but you don't have to check all nodes each iteration. The distance check against nodes that have not been visited yet serves no purpose.
Also: perhaps consider following the graph when picking the next node to evaluate: on the 'closed' list you could seek the smallest distance and then search for an 'open' node among its connections. (If a node turns out to have no open nodes among its connections, you can remove it from the closed list; it is a dead end.)
You can even use this connectivity to form your open list; this would also help with islands (your code will currently crash if your graph has islands).
You could also pre-build a more efficient connection graph instead of a cross table containing all possible node combinations (e.g. a Node struct with a neighbours[] node list). This would remove having to check all nodes for each node in the dist[][] array.
Instead of initializing all node distances to infinity, you could initialize them to the 'smallest possible optimistic distance' to the target and favor node processing based on that (your possibilities differ here; if the nodes are on a 2D plane you could use the straight-line distance). See A* descriptions for the heuristic. I once implemented this around a queue; I'm not entirely sure how I would integrate it into this code (without a queue, that is).
The biggest improvement you can make over Dijkstra is using A* instead. Of course, this requires that you have a heuristic function.
