Implement data structure with efficient insert/findKthSmallest operations - arrays

I was asked this question by an interviewer. I tried solving it with an array, making sure the array stays sorted on every insert. However, I don't think this is the best solution. What would be a good solution for this problem?

You can use a balanced binary search tree (e.g. a red-black BST) for this.
Simply insert the nodes into the balanced BST, and have each node maintain its rank within its own subtree. Since we need to find the kth smallest element, we can maintain the count of elements in the left subtree of each node.
Now, start traversing from the root and check:
If k = N + 1, where N is the number of nodes in the root's left subtree, then the root is the kth node.
Else if k <= N, continue searching in the left subtree of the root.
Else if k > N + 1, continue in the right subtree, searching for the (k - N - 1)th smallest element.
Insertion into a balanced BST takes O(log n), and searching for the kth smallest element takes O(h), where h is the height of the tree.

You can use an order statistic tree. Essentially, it is a self-balancing binary search tree where each node also stores the cardinality of its subtree. Maintaining the cardinality does not increase the complexity of any tree operations, but allows your query to be performed in O(log n) time (a sketch of keeping the counts up to date follows the steps below):
If the left subtree has cardinality > k, recurse on the left subtree.
Otherwise, if the left subtree has cardinality < k, reduce k by that cardinality plus one and recurse on the right subtree.
Otherwise the left subtree's cardinality equals k, so return the current node's element.
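A minimal sketch of the bookkeeping, assuming an ordinary (unbalanced) BST insert; in a real self-balancing tree the size fields of the nodes touched by a rotation also have to be recomputed. The Node layout and field names here are my own, not from the question.

#include <stdlib.h>

typedef struct Node {
    int key;
    int size;                      /* number of nodes in this subtree, including this one */
    struct Node *left, *right;
} Node;

static int sizeOf(const Node *n) { return n ? n->size : 0; }

Node *insert(Node *root, int key)
{
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        n->key = key;
        n->size = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    root->size = 1 + sizeOf(root->left) + sizeOf(root->right);  /* keep the count up to date */
    return root;
}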

Related

Given a DAG, the length of the longest path and the node in which it ends, how do I retrace my steps so I can print each node of the longest path?

I'm working on a problem of finding the maximum number of parallelepipeds that can be nested inside each other, given a list of parallelepipeds.
My approach was to represent the graph with an adjacency list, do a topological sort and then for each node in the topological array "unrelax" the edges, giving me the longest path.
Below is the code but I don't think it matters for the question.
typedef struct Edge {
    int src;              /* source node */
    int dst;              /* destination node */
    struct Edge *next;
} Edge;

int maxend;  /* node in which the longest path ends */
int mp;      /* length of the longest path */

for (int i = 0; i < G.n; i++)
{
    int j = TA[i];  /* TA is the topologically sorted array */
    if (G.edges[j] != NULL)
    {
        if (DTA[j] == -1) DTA[j] = 0;
        Edge *tmp = G.edges[j];
        while (tmp != NULL)
        {
            /* DTA keeps track of the maximum distance of each node in TA */
            if (DTA[tmp->src] >= DTA[tmp->dst]) {
                DTA[tmp->dst] = DTA[tmp->src] + 1;
                if (DTA[tmp->dst] > mp) {
                    mp = DTA[tmp->dst];
                    maxend = tmp->dst;
                }
            }
            tmp = tmp->next;
        }
    }
}
In the end I have the length of the longest path and the node in which said path ends, but how do I efficiently recreate the path?
If parallelepiped A contains parallelepiped B and parallelepiped B contains parallelepiped C, that means parallelepiped A contains parallelepiped C as well, which means each edge has a weight of 1 and the vertex where the longest path starts has the furthest node of the path in its adjacency list.
I've thought of 3 solutions but none of them look great.
1. Iterate the edges of each vertex that has weight 0 (so no predecessors) and, if there is a choice, avoid choosing the edge that connects it to the furthest node (as said before, the shortest path between the starting node and the ending node will be 1).
2. In the array that tracks the maximum distance of each node in the topologically sorted array: start from the index representing the furthest node we found and see if the previous node has a compatible distance (as in, the previous node has 1 less distance than the furthest node). If it does, check its adjacency list to see if the furthest node is in it (because if the furthest node has a distance of 10 there could be several nodes that have a distance of 9 but are unconnected to it). Repeat until we reach the root of the path.
3. Most probable candidate so far: create an array of pointers that keeps track of the "maximum" parent of each node. In the code above, every time a node has its maximum distance changed, the node that caused the update had a longer distance than the previously recorded parent (if any), which means we can change the maximum parent associated with the current node.
Edit: I ended up just allocating a new array, and every time I updated the weight of a node (DTA[tmp->src] >= DTA[tmp->dst]) I also stored the number of the source node in the cell of the destination node.
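For reference, a minimal sketch of that parent-array idea, plugged into the loop from the snippet above and reusing its names (Edge, DTA, TA, G, maxend, mp); the parent array is the only new name, initialised to -1 for every vertex:

int *parent = malloc(G.n * sizeof *parent);
for (int v = 0; v < G.n; v++) parent[v] = -1;

for (int i = 0; i < G.n; i++) {
    int j = TA[i];
    if (DTA[j] == -1) DTA[j] = 0;
    for (Edge *tmp = G.edges[j]; tmp != NULL; tmp = tmp->next) {
        if (DTA[tmp->src] >= DTA[tmp->dst]) {
            DTA[tmp->dst] = DTA[tmp->src] + 1;
            parent[tmp->dst] = tmp->src;      /* remember where the best distance came from */
            if (DTA[tmp->dst] > mp) {
                mp = DTA[tmp->dst];
                maxend = tmp->dst;
            }
        }
    }
}

/* walk back from the end of the longest path; this prints it in reverse order */
for (int v = maxend; v != -1; v = parent[v])
    printf("%d\n", v);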
I am assuming the graph edge u <- v indicates that box u is big enough to contain v.
I suggest you dump the topological sort. Instead:
SET weight of every edge to -1
LOOP
    LOOP over leaf nodes (out-degree zero: box too small to contain others)
        Run Dijkstra's algorithm (gives longest path, with predecessors)
        Save length of longest path, and the path itself
    SAVE longest path
    REMOVE nodes on longest path from graph
    IF all nodes gone from graph
        OUTPUT saved longest paths (lists of nested boxes)
        STOP
This is called a "greedy" algorithm. It is not guaranteed to give the optimal result, but it is fast and simple, always gives a reasonable result, and often does give the optimal one.
I think this solves it, unless there's something I don't understand.
The highest-weighted path in a DAG is equivalent to the lowest-weighted path if you make the edge weights negative. Then you can apply Dijkstra's algorithm directly.
A longest path between two given vertices s and t in a weighted graph
G is the same thing as a shortest path in a graph −G derived from G by
changing every weight to its negation.
This might even be a special case of Dijkstra that is simpler... not sure.
To retrieve the longest path, you start at the end and go backwards:
Start at the vertex with the greatest DTA value, V_max.
Find the edges that end at V_max (edge->dst == V_max).
Among their sources, find a vertex Src_max whose DTA value is exactly 1 less than the maximum (DTA[Src_max] == DTA[V_max] - 1).
Repeat this recursively until there are no more source vertices.
To make that a little more efficient, you can reverse the endpoints of the edges on the way down and then follow the path back to the start. That way each reverse step is O(1).
I think option 3 is the most promising. You can search for the longest path with DFS starting from all the root vertices (those without incoming edges), increasing the 'max distance' for each vertex encountered.
This is quite a simple solution, but it may traverse some paths more than once. For example, for edges (a,f), (b,c), (c,f), (d,e), (e,c)
a------------f
         /
b----c--/
     /
d--e--/
(all directed rightwards)
the starting vertices are a, b, and d; the edge (c,f) will be traversed twice and the distance of vertex f will be updated three times. If we append the rest of the alphabet to f in a simple chain,
a------------f-----g-- - - ---y---z
         /
b----c--/
     /
d--e--/
the whole chain from f to z will probably be traversed three times, too.
You can avoid this by separating the phases and modifying the graph between them: after finding all the starting vertices (a, b, d) increment the distance of each vertex available from those (f, c, e), then remove starting vertices and their edges from the graph - and re-iterate as long as some edges remain.
This will transform the example graph after the first step like this:
f-----g-- - - ---y---z
  /
c--/
  /
e--/
and we can see all the junction vertices (c and f) will wait until the longest path to them is found before letting the analysis go further past them.
That needs iterative seeking for starting vertices, which may be time-consuming unless you do some preprocessing (for example, counting all incoming edges for each vertex and storing the vertices in some sorted data structure, like an integer-indexed multimap or a simple min-heap).
The question remains open whether the whole overhead of truncating the graph and rescanning it for new root vertices gives a net gain compared with traversing some final parts of common paths multiple times in your particular graph...
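A rough sketch of that layer-by-layer idea (essentially Kahn's algorithm with distance propagation). It reuses the Edge list, G, and DTA names from the question; indeg and queue are names introduced here purely for illustration, and DTA is assumed to start at 0 for every vertex:

int *indeg = calloc(G.n, sizeof *indeg);
int *queue = malloc(G.n * sizeof *queue);
int head = 0, tail = 0;

for (int v = 0; v < G.n; v++)
    for (Edge *e = G.edges[v]; e; e = e->next)
        indeg[e->dst]++;                      /* count incoming edges */

for (int v = 0; v < G.n; v++) {
    DTA[v] = 0;
    if (indeg[v] == 0) queue[tail++] = v;     /* starting vertices */
}

while (head < tail) {
    int v = queue[head++];
    for (Edge *e = G.edges[v]; e; e = e->next) {
        if (DTA[v] + 1 > DTA[e->dst])
            DTA[e->dst] = DTA[v] + 1;         /* longest distance found so far */
        if (--indeg[e->dst] == 0)             /* "remove" v's edges; enqueue new roots */
            queue[tail++] = e->dst;
    }
}

Each vertex is dequeued only after all of its incoming edges have been accounted for, so no part of a chain is traversed more than once.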

time complexity of iterating over an avl tree using nextInOrder

Let's say we have a binary AVL tree in which each node holds a pointer to its parent.
We also have a function that gives us the next item in order, called treeSuccesor.
We can assume that its time complexity is O(log(N)).
What will be the time complexity of iterating over the tree with it, starting from the lowest value and ending at the highest value?
For the given AVL tree, what will be the time complexity of iterating over it from node 17 to node 85 using the treeSuccesor function?
iteration algorithm:
while (L != L2)   // L2 is the ending node, 85 in the image
{
    L = treeSuccesor(L);
}
Traversing any binary tree can be done in time O(n) since each link is passed twice: once going downwards and once going upwards. For each node the work is constant.
The complexity is not O(n log n) because even though the work of finding the next node is O(log n) in the worst case for an AVL tree (for a general binary tree it is even O(n)), the average work over all the nodes in the tree is constant.
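For intuition, here is a minimal sketch of what such a successor function typically looks like with parent pointers (the Node layout is an assumption, not code from the question):

typedef struct Node {
    int key;
    struct Node *left, *right, *parent;
} Node;

Node *treeSuccesor(Node *n)
{
    if (n->right) {                      /* successor is the leftmost node of the right subtree */
        n = n->right;
        while (n->left) n = n->left;
        return n;
    }
    while (n->parent && n == n->parent->right)
        n = n->parent;                   /* climb until we come up from a left child */
    return n->parent;                    /* NULL when n was the maximum */
}

Iterating from the minimum to the maximum calls this n - 1 times, but every edge of the tree is walked at most twice across all calls (once down, once up), so the whole iteration costs O(n) even though a single call can cost O(log n).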

Kth smallest in stream of numbers

We are given a stream of numbers and Q queries.
At each query, we are given a number k.
We need to find the kth smallest number at that point of the stream.
How to approach this problem?
total size of stream is < 10^5
1 < number < 10^9
I tried a linked list, but finding the right position is time-consuming, and with an array, inserting is time-consuming.
You can use some kind of search tree. There are many different kinds of search trees, but all the common ones allow insertion in O(log n), and if each node also keeps the size of its subtree, finding the kth element in O(log n) too.
If the stream is too long to keep all the numbers in memory and you also know an upper bound on k, you can prune the tree by only keeping a number of elements equal to the upper bound.
You can use a max heap with size = k.
Put elements into the heap until its size reaches k. After that, for each new element, push it and pop the heap's root so the size stays k. Removing (extracting) the root makes sense because at that point at least k elements are smaller than the root value.
When you have finished iterating the stream, the root of the heap will be the kth smallest element, because the heap holds the k smallest elements and the root is the largest among them.
As the heap's size is k, the time complexity is O(n log k), which can be a bit better than O(n log n), and the implementation is quite easy.
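A minimal sketch of that idea, assuming a single fixed k queried once after the whole stream has been read (the MaxHeap type and offer function are names made up for this sketch; instead of pushing and then popping, a new element is admitted only when it is smaller than the current root, which is equivalent):

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int *data;
    int size;
    int capacity;   /* capacity == k */
} MaxHeap;

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* restore the heap property after replacing the root */
static void siftDown(MaxHeap *h, int i)
{
    for (;;) {
        int largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < h->size && h->data[l] > h->data[largest]) largest = l;
        if (r < h->size && h->data[r] > h->data[largest]) largest = r;
        if (largest == i) break;
        swap(&h->data[i], &h->data[largest]);
        i = largest;
    }
}

/* bubble a newly appended element up to its place */
static void siftUp(MaxHeap *h, int i)
{
    while (i > 0 && h->data[(i - 1) / 2] < h->data[i]) {
        swap(&h->data[(i - 1) / 2], &h->data[i]);
        i = (i - 1) / 2;
    }
}

/* feed one stream element into the heap of the k smallest seen so far */
static void offer(MaxHeap *h, int x)
{
    if (h->size < h->capacity) {          /* still fewer than k elements */
        h->data[h->size] = x;
        siftUp(h, h->size++);
    } else if (x < h->data[0]) {          /* x beats the current kth smallest */
        h->data[0] = x;
        siftDown(h, 0);
    }
}

int main(void)
{
    int stream[] = {7, 2, 9, 4, 1, 8, 5};
    int k = 3;
    MaxHeap h = { malloc(k * sizeof(int)), 0, k };

    for (size_t i = 0; i < sizeof stream / sizeof *stream; i++)
        offer(&h, stream[i]);

    printf("%d-th smallest: %d\n", k, h.data[0]);   /* root = kth smallest */
    free(h.data);
    return 0;
}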

Finding the kth smallest value in a BST

Here is what I have to find the kth smallest value in a binary search tree:
struct treeNode
{
    int data;
    struct treeNode *left, *right;
};

int rank(struct treeNode* ptr, int k)
{
    if(node == NULL)
        return root;
    while(ptr->left != NULL) {
        ptr = ptr->left;
        return rank(ptr->left)
    }
}
This is obviously not correct. Without providing the solution, could someone guide me in the right direction as to how I could solve this? I am having trouble figuring out how I could find the kth smallest element in a BST.
A BST is a sorted binary tree; an in-order traversal (left subtree, current node, right subtree) visits the node values in sorted order. To find the kth smallest node, just do an in-order traversal with a counter. The counter starts at 0; whenever a node is visited, increase it by one; when it reaches k, that node is the kth smallest one.
If you have the sizes of each of the subtrees, this can be doable without having to read the data into an array (or otherwise traversing the tree) and counting up. If you don't keep the size information handy, you'll need a helper function to calculate the size.
The basic idea: figure out the index of the current node. If it is greater than k, you need to search the left subtree. If it is less than k, search the right subtree, offsetting k by the nodes accounted for by the left subtree and the current node. Note that this is essentially the same as searching through a regular BST, except this time we are searching by index, not data. Some pseudocode:
if size of left subtree is equal to k:
    // the current node is the kth
    return data of current node
else if size of left subtree is greater than k:
    // the kth node is on the left
    repeat on the left subtree
else if size of left subtree is less than k:
    // the kth node is on the right
    reduce k by the size of the left subtree + 1   // need to find the (k')th node in the right subtree
    repeat on the right subtree
To illustrate, consider this tree with the marked indices (don't even worry about the data as it's not important in the search):
      3
     / \
    2   6
   /   / \
  0   4   7
   \   \
    1   5
Suppose we want to find the 2nd (k = 2).
Starting at 3, the size of the left subtree is 3.
It is greater than k so move to the left subtree.
The size of the left subtree is 2.
k is also 2 so the current node must be the 2nd.
Suppose we want to find the 4th (k = 4).
Starting at 3, the size of the left subtree is 3.
It is less than k, so adjust the new k to be 0 (k' = 4 - (3 + 1)) and move to the right subtree.
Starting at 6, the size of the left subtree is 2.
It is greater than k' (0) so move to the left subtree.
The size of the left subtree is 0.
k' is also 0 so the current node must be the 4th.
You get the idea.
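If each node stores the size of its subtree, the search above can be written roughly like this (the sizedNode struct and the 0-based k are assumptions of this sketch, not part of the question):

struct sizedNode
{
    int data;
    int size;                          /* number of nodes in this subtree */
    struct sizedNode *left, *right;
};

static int sizeOf(struct sizedNode *n) { return n ? n->size : 0; }

/* returns the node holding the kth smallest value (k = 0 .. size-1), or NULL */
struct sizedNode *kth(struct sizedNode *n, int k)
{
    while (n) {
        int leftSize = sizeOf(n->left);
        if (k == leftSize)
            return n;                  /* exactly k nodes are smaller: this is it */
        if (k < leftSize)
            n = n->left;               /* the kth node is on the left */
        else {
            k -= leftSize + 1;         /* skip the left subtree and this node */
            n = n->right;
        }
    }
    return NULL;                       /* k was out of range */
}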
This should work:
int rank(struct treeNode* n, int k, int* chk)
{
    if (!n) return -1;
    int _chk = 0;
    if (!chk) chk = &_chk;            /* the top-level call may pass a null pointer */
    int t = rank(n->left, k, chk);    /* look in the left subtree first */
    if (t >= 0) return t;
    if (++*chk > k) return n->data;   /* this node is the (k+1)th visited, i.e. index k */
    t = rank(n->right, k, chk);       /* otherwise continue in the right subtree */
    if (t >= 0) return t;
    return -1;                        /* fewer than k+1 nodes in this subtree */
}
Call it as rank(root, k, NULL); k is 0-based.

AVL Tree insertion

How do I calculate the balance factor for a particular node when I am recursively calling an insert function to add a node to an AVL tree? I haven't started on the rotation logic; I simply want to calculate the balance factors.
In my current attempt, I am forced to store heights of left & right subtrees as I can't find the balance factor without them.
typedef struct _avlTree
{
    int num;
    int balFactor;
    int height[2];   /* left & right subtree heights */
    struct _avlTree *left, *right;
} *avlTree;

int avlAdd(avlTree a, avlTree aNew)
{
    ...
    if (a->left == NULL)   /* left subtree insertion case */
    {
        a->left = aNew;
        return 1;
    }
    else
    {
        a->height[0] = avlAdd(a->left, aNew);
        a->balFactor = a->height[0] - a->height[1];
        return (a->height[0] > a->height[1]) ? (a->height[0] + 1) : (a->height[1] + 1);
    }
    ...
}
The balance factor is the difference in heights between the right and left subtrees of a node.
When creating a new node, initialize the balance factor to zero since it is balanced (it has no subtrees).
If you are inserting a new node to the right, increase the balance factor by 1.
If you are inserting a new node to the left, decrease the balance factor by 1.
After rebalancing (rotating), if you increased the height of the subtree at this node, recursively propagate the height increase to the parent node. A sketch of this bookkeeping (with the rotations still left out) follows below.
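A rough sketch of the "report whether the subtree grew" idea, leaving the rotations out (so balFactor can temporarily reach +/-2). The node layout is the question's minus the height array, and balFactor is taken as right height minus left height:

typedef struct _avlNode {
    int num;
    int balFactor;                     /* right subtree height - left subtree height */
    struct _avlNode *left, *right;
} avlNode;

/* inserts aNew below a; returns 1 if the height of the subtree rooted at a grew.
   aNew is assumed to arrive with balFactor = 0 and NULL children. */
int avlAdd(avlNode *a, avlNode *aNew)
{
    int grew;
    if (aNew->num < a->num) {
        if (a->left == NULL) { a->left = aNew; grew = 1; }
        else                   grew = avlAdd(a->left, aNew);
        if (grew) a->balFactor--;      /* the left side got taller */
    } else {
        if (a->right == NULL) { a->right = aNew; grew = 1; }
        else                    grew = avlAdd(a->right, aNew);
        if (grew) a->balFactor++;      /* the right side got taller */
    }
    /* the subtree rooted at a only gets taller when the new node lengthened its
       taller side; if balFactor came back to 0, the shorter side merely caught up */
    return grew && a->balFactor != 0;
}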
Here is a very simple approach. If there was a recursive height() function, then balance factor can be computed simply as
node->balFactor = height( node->right ) - height( node->left );
This is not the best approach though, since the complexity of this approach is O(h), where h is the height of the node in the AVL tree. For a better approach, a bigger discussion is required :)
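For completeness, a sketch of such a recursive height(), reusing the avlTree typedef from the question and taking the height of an empty subtree to be 0 (one common convention):

int height(avlTree node)
{
    if (node == NULL)
        return 0;
    int hl = height(node->left);
    int hr = height(node->right);
    return 1 + (hl > hr ? hl : hr);   /* one more than the taller subtree */
}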
There are numerous resources on AVL tree in the web, a chosen few are:
http://en.wikipedia.org/wiki/AVL_tree
C implementation: http://www.stanford.edu/~blp/avl/libavl.html
Animation: http://www.cs.jhu.edu/~goodrich/dsa/trees/avltree.html
Animation: http://www.strille.net/works/media_technology_projects/avl-tree_2001/
BTW, the avlAdd() function looks wrong. I don't see where aNew->num is compared to a->num; whether to go to the left or right subtree must depend on that. The given code seems to add to the left subtree unconditionally.
