Is it possible to print all node ids / values that are traversed in a visit? Take the following code sample:
top-down visit(asts) {
    case x: println(typeOf(x));
}
I now get all types of the nodes that were visited. I would instead like to know the ids / values of all x that were encountered (without printing the entire tree). In other words, I am curious as to how Rascal traverses the list[Declaration] asts containing lang::java::m3::ASTs.
As a corollary, is it possible to print the direct ancestors / descendants of a Declaration regardless of their type? Or to print the total number of children of an AST?
In the documentation (https://www.rascal-mpl.org/docs/rascal/expressions/visit/) there is no mention of this.
In principle a Rascal value does not have an "id": its structure is its "id" and its id is its structure. You can print a hash if you want, for example using md5sum from the IO module: case value x: println(md5sum(x));
Rascal traverses lists from left to right. In top-down mode it visits the list itself before its elements, and in bottom-up mode it goes to the children first.
Printing the total number of nodes in a tree: (0 | it + 1 | /_ <- t)
Printing all children of a node: import Node; and then case node n: println(getChildren(n));
Ancestors are not accessible unless you match deeper, i.e. include two levels of nodes in a pattern.
I am looking at this problem discussed on YouTube:
Given two binary trees, determine whether they have the same inorder traversal:
  Tree 1         Tree 2
    5              3
   / \            / \
  3   7          1   6
 /   /              / \
1   6              5   7
[1,3,5,6,7]      [1,3,5,6,7]
I wanted to know how to solve this problem by doing a simultaneous in-order traversal of both trees using only recursion. I know people alluded to it in the comment section, but I assume they meant doing it iteratively.
My first thought was to use one function and pass in two lists to hold the values of the trees, and then compare the lists in the parent function. But this seems to work only on trees that have the same height (I think).
def dual_inorder_traversal(self, p, q, pn=[], qn=[]):
    if not p and not q: return
    if not q: pn.append(p.val)
    if not p: qn.append(q.val)
    if p and q:
        self.dual_inorder_traversal(p.left, q.left, pn, qn)
        pn.append(p.val)
        qn.append(q.val)
        self.dual_inorder_traversal(p.right, q.right, pn, qn)
    return pn, qn
I then tried mutual recursion, in which I have two functions, one that recurses tree1 and another that recurses tree2, and have them call each other. My idea was that we could append the root in the tree1 function when we visit it, and then pop it from the list and compare when we visit the root in the tree2 function. Not even going to post what I tried because it didn't work at all lol. Also, I'm not even sure mutual recursion is possible in this case.
You are right that your first attempt to recurse through both trees in a single recursive function is difficult to do when the trees have different shapes: you may need to recur deeper in one tree, while in the other you are stuck at a leaf. This means you need to maintain lists to collect values (as can be seen in your code).
But storing values in lists really defeats the purpose of the challenge. You might then just as well collect all values in two lists and compare the lists, or collect the values of one tree in one list and compare against that list while traversing the other (as is done in the video you linked to). Yet it should not be necessary to use that much space for the job.
The idea of having two separate traversals going on in tandem is more promising. It becomes really easy when you make the traversal function a generator. That way the function suspends whenever the next value is produced, giving you the opportunity to advance the other traversal by one step as well.
Here is an inorder generator you could define in the Node class:
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def inorder(self):
        if self.left:
            yield from self.left.inorder()
        yield self.value
        if self.right:
            yield from self.right.inorder()
Now it is a matter of iterating through p.inorder() and q.inorder() in tandem. You could write this with a loop, but itertools has a very useful zip_longest function for exactly that purpose:
from itertools import zip_longest

def same_inorder(p, q):
    return all(v1 == v2 for v1, v2 in zip_longest(p.inorder(), q.inorder()))
There is a boundary case when the trees have a different number of nodes but their inorder traversals agree until the smaller tree is fully iterated. Then zip_longest will fill up the gap and produce None values to compare with. So as long as the tree nodes do not have None as a value, that is fine. If, however, you expect nodes to have None values (quite odd), then use a different filler value. For instance, nan could be an option:
zip_longest(p.inorder(), q.inorder(), fillvalue=float("nan"))
As a demo, let's take these two trees:
    5              3
   / \            / \
  3   7          1   6
 /   /              / \
1   6              5   7
Then you can create and compare their inorder traversals as follows:
tree1 = Node(5,
    Node(3,
        Node(1)
    ),
    Node(7,
        Node(6)
    )
)

tree2 = Node(3,
    Node(1),
    Node(6,
        Node(5),
        Node(7)
    )
)
print(same_inorder(tree1, tree2)) # True
Time and Space use
Let's define m as the number of nodes in the first tree and n as the number of nodes in the second tree.
Runtime complexity: O(m + n)
Average auxiliary space complexity: O(log(n) + log(m))
The auxiliary space excludes the space already used by the input, which is O(n + m). On top of this we need stack space during the recursion. For reasonably balanced trees that is O(log(n) + log(m)) on average; for degenerate, list-like trees it can grow to O(n + m) in the worst case.
I am given two arrays: one defines the relationships between the nodes and the other gives the values of the nodes.
arr1={0,1,1,1,3,3,4}
arr2={22,100,3,3,4,5,9}
arr1 defines the relationships: the root is node 1 (its entry is 0), the parent of nodes 2, 3 and 4 is node 1, the parent of nodes 5 and 6 is node 3, and the parent of node 7 is node 4.
arr2 gives the values of the nodes: node 1 has a value of 22, node 2 has a value of 100, and so on.
I have to find the maximum sum of node values such that no two included nodes have a parent or grandparent relationship.
sample input:
a[i]=[0,1,1,1,3,3,6,6]
b[i]=[1,2,3,4,5,100,7,8]
output: 111
I am new to data structures and algorithms and not able to even think of a solution. Any type of help will do, thanks.
You can solve it using Dynamic Programming.
Consider an array dp[] which stores the answer for each vertex and its subtree.
The DP state would be:
dp[currentVertex] = max(sum of all children's dp values,
                        b[currentVertex] + sum of the dp values of all
                        vertices whose great-grandparent is currentVertex)
You need to build the DP table bottom-up, so start from the leaves.
The answer is dp[root] after all the calculation.
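As a rough illustration, here is a minimal Python sketch of this recurrence (the function name and representation are my own), using the 1-indexed parent array a, with 0 marking the root, and the value array b from the question:

from collections import defaultdict
from functools import lru_cache

def max_independent_sum(a, b):
    # a[i-1] is the parent of node i; build the children lists
    children = defaultdict(list)
    root = None
    for i, p in enumerate(a, start=1):
        if p == 0:
            root = i
        else:
            children[p].append(i)

    @lru_cache(maxsize=None)
    def dp(v):
        # Option 1: skip v and take the best of each child's subtree.
        skip = sum(dp(c) for c in children[v])
        # Option 2: take v; its children and grandchildren are excluded,
        # so continue from the great-grandchildren.
        take = b[v - 1]
        for c in children[v]:
            for gc in children[c]:
                for ggc in children[gc]:
                    take += dp(ggc)
        return max(skip, take)

    return dp(root)

print(max_independent_sum([0, 1, 1, 1, 3, 3, 6, 6],
                          [1, 2, 3, 4, 5, 100, 7, 8]))  # prints 111, as expected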
So I have a problem which I'm pretty sure is solvable, but after many, many hours of thinking and discussion, only partial progress has been made.
The issue is as follows. I'm building a BTree of, potentially, a few million keys. When searching the BTree, it is paged on demand from disk into memory, and each page-in operation is relatively expensive. This effectively means that we want to traverse as few nodes as possible (although once a node has been paged in, moving around inside it costs nothing). As a result, we don't want to waste space by having lots of nodes near minimum capacity. In theory, this should be preventable (within reason), since the structure of the tree depends on the order in which the keys were inserted.
So the question is how to reorder the keys such that, after the BTree is built, the fewest number of nodes are used. Here's an example: [image omitted: a space-optimal and a space-pessimal tree built from the same keys, taken from the Optimal 2,3-Trees paper below]
I did stumble on the question In what order should you insert a set of known keys into a B-Tree to get minimal height?, which unfortunately asks a slightly different question, and whose answers don't seem to solve my problem. It is also worth adding that we want the mathematical guarantees that come from not building the tree manually and only using the insert operation. We don't want to build a tree manually, make a mistake, and then find it is unsearchable!
I've also stumbled upon 2 research papers which are so close to solving my question but aren't quite there!
Time- and Space-Optimality in B-Trees and Optimal 2,3-Trees (which is in fact where the above image came from) discuss and quantify the differences between space-optimal and space-pessimal BTrees, but as far as I can see they don't go so far as to describe how to design an insertion order.
Any help on this would be greatly, greatly appreciated.
Thanks
Research papers can be found at:
http://www.uqac.ca/rebaine/8INF805/Automne2007/Sujets2007Automne/p174-rosenberg.pdf
http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=1143&context=hmc_fac_pub
EDIT: I ended up filling a BTree skeleton, constructed as described in the above papers, with the FILLORDER algorithm. As previously mentioned, I was hoping to avoid this; however, I ended up implementing it before the two excellent answers were posted!
The algorithm below should work for B-Trees with a minimum of d keys per node and a maximum of 2*d. I suppose it can be generalized to 2*d + 1 maximum keys if the way the median is selected is known.
The algorithm below is designed to minimize the number of nodes, not just the height of the tree.
The method is based on the idea of putting keys into any non-full leaf or, if all leaves are full, putting the key under the lowest non-full node.
More precisely, the tree generated by the proposed algorithm meets the following requirements:
It has the minimum possible height;
It has no more than two non-full nodes on each level. (It is always the two rightmost nodes.)
Since we know that the number of nodes on any level except the root is strictly equal to the sum of the node count and the total key count on the level above, we can prove that there is no valid rearrangement of nodes between levels which decreases the total number of nodes. For example, increasing the number of keys inserted above some level will increase the number of nodes on that level, and consequently the total number of nodes; while any attempt to decrease the number of keys above that level will decrease the node count on that level and fail to fit all keys on that level without increasing the tree height.
It is also obvious that the arrangement of keys on any given level is one of the optimal ones.
Using the reasoning above, a more formal proof by mathematical induction can be constructed.
The idea is to hold a list of counters (the size of the list is no bigger than the height of the tree) to track how many keys have been added on each level. Once d keys have been added to some level, a node filled to half capacity has been created on that level; if there are enough keys to fill the other half of this node, we should skip those keys and add a root for the higher level instead. This way, the root will be placed exactly between the first half of the previous subtree and the first half of the next subtree, which will cause a split: the root takes its place, and the two subtree halves become separated. The places for the skipped keys remain safe while we go through bigger keys, and can be filled later.
Here is nearly working (pseudo)code; the array needs to be sorted, and N below is the array length:
PushArray(BTree bTree, int d, key[] Array)
{
    List<int> counters = new List<int>{0};
    // skip[] contains the number of keys to skip over
    // after filling a node of some order in half
    List<int> skip = new List<int>();
    List<Pair<int,int>> skipList = new List<Pair<int,int>>();

    int i = -1;
    while(true)
    {
        int order = 0;
        while(counters[order] == d) order += 1;
        for(int j = order - 1; j >= 0; j--) counters[j] = 0;
        if (counters.Count <= order + 1) counters.Add(0);
        counters[order] += 1;
        if (skip.Count <= order)
            skip.Add(i + 2);
        if (order > 0)
            skipList.Add({i, order}); // skipped parts that will be needed later
        i += skip[order];
        if (i > N) break;
        bTree.Push(Array[i]);
    }

    // now we need to add all skipped keys in the correct order
    foreach(Pair<int,int> p in skipList)
    {
        for(int i = p.2; i > 0; i--)
            PushArray(bTree, d, Array.SubArray(p.1 + skip[i - 1], skip[i] - 1));
    }
}
Example:
Here is how the keys and the corresponding counters should be arranged for d = 2 during the first pass through the array. I marked the keys pushed into the B-Tree during the first pass (before the loop with recursion) with 'o' and the skipped ones with 'x'.
                                                                         24
            4              9             14             19                                         29
  0  1  2  3     5  6  7  8    10 11 12 13    15 16 17 18    20 21 22 23    25 26 27 28    30 ...
  o  o  x  x  o  o  o  x  x  o  o  o  x  x  x  x  x  x  x  x  x  x  x  x  o  o  o  x  x  o  o ...
  1  2        0  1  2        0  1  2                                      0  1  2        0  1 ...
  0  0        1  1  1        2  2  2                                      0  0  0        1  1 ...
  0  0        0  0  0        0  0  0                                      1  1  1        1  1 ...
skip[0] = 1
skip[1] = 3
skip[2] = 13
Since we don't iterate through the skipped keys, we have O(n) time complexity (not counting the cost of the B-Tree insertions themselves), for a sorted array.
In this form it may be unclear how this works when there are not enough keys to fill the second half of a node after a skipped block, but we can avoid skipping all skip[order] keys if the total length of the array is less than about i + 2 * skip[order], and skip skip[order - 1] keys instead. The following line, placed after changing the counters but before changing the variable i, achieves this:
while(order > 0 && i + 2*skip[order] > N) --order;
This is correct because, if the total count of keys on the current level is less than or equal to 3*d, they are still split correctly when added in the original order. It leads to a slightly different arrangement of keys between the two last nodes on some levels, but it does not break any of the described requirements, and it may make the behavior easier to understand.
Maybe it is worthwhile to find some animation and watch how it works. Here is the sequence that should be generated for the range 0..29: 0 1 4 5 6 9 10 11 24 25 26 29 /end of first pass/ 2 3 7 8 14 15 16 19 20 21 12 13 17 18 22 23 27 28
The algorithm below attempts to prepare the order of the keys so that you need neither control over, nor even knowledge of, the insertion procedure. The only assumption is that overfilled tree nodes are split either at the middle or at the position of the last inserted element; otherwise the B-tree can be treated as a black box.
The trick is to trigger node splits in a controlled way. First you fill a node exactly: the left half with keys that belong together and the right half with another range of keys that belong together. Finally you insert a key that falls in between those two ranges but belongs with neither; the two subranges are split into separate nodes, and the last inserted key ends up in the parent node. After splitting off in this fashion you can fill the remainder of both child nodes to make the tree as compact as possible. This also works for parent nodes with more than two child nodes; just repeat the trick with one of the children until the desired number of child nodes is created. Below, I use what is conceptually the rightmost child node as the "splitting ground" (steps 5 and 6.1).
Apply the splitting trick recursively, and all elements should end up in their ideal place (which depends on the number of elements). I believe the algorithm below guarantees that the height of the tree is always minimal and that all nodes except for the root are as full as possible. However, as you can probably imagine, it is hard to be completely sure without actually implementing and testing it thoroughly. I have tried this on paper, and I do feel confident that this algorithm, or something extremely similar, should do the job.
Implied: tree T with maximum branching factor M.
Top procedure with keys of length N:
1. Sort the keys.
2. Set minimal-tree-height to ceil(log(N+1)/log(M)).
3. Call insert-chunk with chunk = keys and H = minimal-tree-height.
Procedure insert-chunk with chunk of length L, subtree height H:
1. If H is equal to 1:
   1.1. Insert all keys from the chunk into T.
   1.2. Return immediately.
2. Set the ideal subchunk size S to pow(M, H - 1).
3. Set the number of subtrees T to ceil((L + 1) / S).
4. Set the actual subchunk size S' to ceil((L + 1) / T).
5. Recursively call insert-chunk with chunk' = the last floor((S - 1) / 2) keys of chunk, and H' = H - 1.
6. For each of the ceil(L / S') subchunks (of size S') except for the last, with index I:
   6.1. Recursively call insert-chunk with chunk' = the first ceil((S - 1) / 2) keys of subchunk I, and H' = H - 1.
   6.2. Insert the last key of subchunk I into T (this insertion purposefully triggers a split).
   6.3. Recursively call insert-chunk with chunk' = the remaining keys of subchunk I (if any), and H' = H - 1.
7. Recursively call insert-chunk with chunk' = the remaining keys of the last subchunk, and H' = H - 1.
Note that the recursive procedure is called twice for each subtree; that is fine, because the first call always creates a perfectly filled half subtree.
Here is a way that leads to minimum height in any search tree (including a B-tree):
sort the array
say a node can hold m keys in the B-tree
divide the array recursively into m+1 equal parts, using m keys as separators in the parent
construct each child subtree from its n/(m+1) sorted keys, using recursion
Example:
m = 2, array = [1 2 3 4 5 6 7 8 9 10]
Divide the array into three parts:
root = [4, 8]
Recursively solve:
child1 = [1 2 3]
root1 = [2]
left1 = [1]
right1 = [3]
Similarly, solve for all children recursively.
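For what it's worth, here is a minimal Python sketch of this recursive division (the name and nested-dict node representation are mine; it builds the node layout directly rather than driving a real B-tree insert, and it does not enforce minimum-fill invariants):

def build_btree(keys, m):
    # keys must be sorted; each node holds up to m keys and m+1 children
    if len(keys) <= m:
        return {"keys": list(keys), "children": []}
    n = len(keys)
    base, extra = divmod(n - m, m + 1)  # spread the non-separator keys evenly
    node_keys, children, start = [], [], 0
    for j in range(m + 1):
        size = base + (1 if j < extra else 0)
        children.append(build_btree(keys[start:start + size], m))
        if j < m:
            node_keys.append(keys[start + size])  # separator key for the parent
        start += size + 1
    return {"keys": node_keys, "children": children}

print(build_btree(list(range(1, 11)), 2))
# root keys [4, 8] with children [1 2 3], [5 6 7] and [9 10], as in the example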
So is this about optimising the creation procedure, or optimising the tree?
You can clearly create a maximally efficient B-Tree by first creating a full Balanced Binary Tree, and then contracting nodes.
At any level in a binary search tree, the gap in numbers between two nodes contains all the numbers between those two values, by the definition of a binary search tree, and this is more or less the definition of a B-Tree node. You simply start contracting the binary tree divisions into B-Tree nodes. Since the binary tree is balanced by construction, the gaps between nodes on the same level always contain the same number of nodes (assuming the tree is filled), so the BTree constructed this way is guaranteed to be balanced.
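As a rough sketch of the contraction step (my own illustration, not part of the answer itself): collapsing k levels of a balanced binary search tree into one node yields B-Tree nodes with up to 2^k - 1 keys. Assuming binary nodes with value/left/right attributes, something like:

class BNode:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def contract(root, levels=2):
    # Collapse the top `levels` levels of the binary tree rooted at `root`
    # into a single B-Tree node; the subtrees hanging below become children.
    if root is None:
        return None
    keys, subtrees = [], []

    def walk(node, depth):
        if node is None or depth == levels:
            subtrees.append(node)  # boundary subtree (may be empty)
            return
        walk(node.left, depth + 1)  # in-order walk keeps the keys sorted
        keys.append(node.value)
        walk(node.right, depth + 1)

    walk(root, 0)
    return BNode(keys, [contract(t, levels) for t in subtrees])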
In practice this is probably quite a slow way to create a BTree, but it certainly meets your criteria for constructing the optimal B-Tree, and the literature on creating balanced binary trees is comprehensive.
=====================================
In your case, where you might take an off-the-shelf "better" solution over a constructed optimal version, have you considered simply changing the number of children nodes can have? Your diagram looks like a classic 2-3 tree, but it is perfectly possible to have a 3-4 tree or a 3-5 tree, which means that every node will have at least three children.
Your question is about BTree optimization. It is unlikely that you are doing this just for fun, so I can only assume that you would like to optimize data accesses, maybe as part of database programming or something like that. You wrote: "When searching the BTree, it is paged on demand from disk into memory", which means that you either do not have enough memory to do any sort of caching, or you have a policy to use as little memory as possible. Either way, this may be the root cause of why any answer to your question will not be satisfying. Let me explain why.
When it comes to data access optimization, memory is your friend. Whether you do read or write optimization, you need memory. Any sort of write optimization always works on the assumption that it can read information in a quick way (from memory); sorting needs data. If you do not have enough memory for read optimization, you will not have it for write optimization either.
As soon as you are willing to accept at least some memory use, you can rethink your statement "When searching the BTree, it is paged on demand from disk into memory", which makes room for balancing between read and write optimization. A maximally optimized BTree is maximal write optimization. In most data access scenarios I know, you get one write for every 10-100 reads, which means that maximal write optimization is likely to give poor performance in terms of overall data access. That is why databases accept restructuring cycles, key space waste, unbalanced BTrees and things like that...
I have been thinking about this problem for two days and have not found a practicable solution:
I have a two-dimensional array and want to find the biggest group of connected items (horizontally and vertically, not diagonally) in which no item occurs twice.
Examples for possible groups:
--FG- or -F--- or -----
--E-- -E--- ---AF
-BC-- CBD-- ----B
-AD-- -A--- --CDE
This is a simplified view of my problem, because in "reality" the array is 6x9 and there are three different types of "elements" (let's say numbers, letters and symbols), each with 30 distinct possible items, plus a blank (-) element. In a first pass I check each position and find all connected items of the same element type. This was relatively easy to achieve with a recursive function; field 0,0 is at the bottom left (another simplified view):
12AB-1-   The check for   -AB----
23CD23-   position 2:0    -CD----
2*CE55-   ("C") would     --CE---
#2E2*AA   result in       --E----
#$A23BC   this:           --A----
$$F1+*E                   --F----
21C31*2                   --C----
The check for position 2:0 "C" would result in an array with 10 connected "letter" items. Now I search for the biggest number of connected items in this new array that are distinct, so that there are no two duplicate items in the new group. For position 2:0 this would result in at most 4 connected distinct items, because you cannot reach another item without touching an item that is already in the group (here, another C).
For my problem it is enough to detect at most 6 different connected items in the 10-item group.
A possible group for the above example would be (when I check position 2:1 "F"):
--B----
--D----
--C----
--E----
--A----
--F----
-------
I can't find an algorithm that does this, unlike the simple recursive function I use to find all items of the same element type in the array. It seems to be far more complex.
For example, the algorithm must also recognize that it should not add the E at position 3:4 to the group, but rather the E at position 2:3.
I think the intermediate step described above (first finding all connected items of one element type) is unnecessary, but at the moment I do it here, and in my code, to make things clearer :)
This is a DFS problem. The algorithm should be: for each connected component, start a DFS with a map of the items seen so far. Here is pseudocode:
void dfs(position pos, map<char, bool> m, int number) {
    // if the element has been seen before on this path, quit
    if (m[array2d[pos]] == true)
        return;
    // it is seen now
    m[array2d[pos]] = true;
    // update the maximum
    maxValue = max(maxValue, number);
    // go to each neighbor
    foreach (neighbor of pos) {
        dfs(neighbor, m, number + 1);
    }
    // unlock the position when backtracking
    m[array2d[pos]] = false;
}
I believe you should start this DFS from each location in the array.
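For reference, here is a runnable Python translation of the same backtracking DFS (my own; it assumes the grid is given as a list of strings with '-' marking empty cells, and it measures the longest run of connected, pairwise-distinct items):

def longest_distinct_path(grid):
    rows, cols = len(grid), len(grid[0])
    best = 0

    def dfs(r, c, seen):
        nonlocal best
        item = grid[r][c]
        if item == '-' or item in seen:
            return  # blank cell or duplicate item: stop this path
        seen.add(item)
        best = max(best, len(seen))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                dfs(nr, nc, seen)
        seen.remove(item)  # unlock the item when backtracking

    for r in range(rows):
        for c in range(cols):
            dfs(r, c, set())
    return best

print(longest_distinct_path(["--FG-", "--E--", "-BC--", "-AD--"]))  # 7 (D-A-B-C-E-F-G)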
Because all the algorithms I tried didn't work or would need big recursion stacks, I have done it another way:
For my purpose it is enough to check for at most 5 connected different items in a group of elements. I made masks (around 60) for all possible combinations of 5 items. Here are five examples:
----- ----- ----- ----- *----
----- ----- ----- ----- *----
----- ----- ----- ***-- ***--
----- ---*- --*-- *---- -----
***** ****- ****- *---- -----
Now I check each connected component with these masks. If all five items at the mask positions are different, the check succeeds. The actual start position for the check in the mask is always one of the four corners.
This way it takes less memory and fewer calculations than any algorithm I tried, but this solution would not be acceptable for more than six or seven items, because there would be too many masks.
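To illustrate the idea, here is a hypothetical Python sketch of such a mask check (the encoding of a mask as five (row, column) offsets from the starting corner is my own, and only two of the roughly 60 masks are shown):

# Each mask lists five (dr, dc) offsets from the starting corner.
MASKS = [
    [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],  # a straight line of five
    [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3)],  # an L-shape: four across, one up
    # ... the remaining masks
]

def mask_matches(grid, r, c, mask):
    seen = []
    for dr, dc in mask:
        rr, cc = r + dr, c + dc
        if not (0 <= rr < len(grid) and 0 <= cc < len(grid[0])):
            return False  # the mask sticks out of the grid
        item = grid[rr][cc]
        if item == '-' or item in seen:
            return False  # blank cell or duplicate item
        seen.append(item)
    return True

A component then contains five connected distinct items as soon as mask_matches succeeds for some corner cell and some mask.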
I am working on an assignment for an Algorithms and Data Structures class. I am having trouble understanding the instructions given. I will do my best to explain the problem.
The input I am given is a positive integer n, followed by n positive integers which represent the frequencies (or weights) of the symbols in an ordered character set. The first goal is to construct a tree that gives an approximate order-preserving Huffman code for each character of the ordered character set. We are to accomplish this by "greedily merging the two adjacent trees whose weights have the smallest sum."
In the assignment we are shown that a conventional Huffman code tree is constructed by first inserting the weights into a priority queue. Then, by using a delmin() function to "pop" the root off the priority queue, I can obtain the two nodes with the lowest frequencies and merge them into one node, whose left and right children are these two lowest-frequency nodes and whose priority is the sum of the priorities of its children. This merged node is then inserted back into the min-heap. The process is repeated until all input nodes have been merged. I have implemented this using an array of size 2n-1, with the input nodes at indices 0...n-1 and the merged nodes at n...2n-1.
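(For reference, the conventional construction described above can be sketched in Python with heapq standing in for the priority queue; the tuple layout and the tie-breaking index are my own choices:)

import heapq

def huffman(weights):
    # entries are (weight, tiebreak, left, right); leaves have no children
    heap = [(w, i, None, None) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    next_id = len(weights)
    while len(heap) > 1:
        a = heapq.heappop(heap)  # delmin(): the lowest-frequency tree
        b = heapq.heappop(heap)
        # merged node: children a and b, priority = sum of the priorities
        heapq.heappush(heap, (a[0] + b[0], next_id, a, b))
        next_id += 1
    return heap[0]  # the root of the Huffman tree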
I do not understand how I can greedily merge the two adjacent trees whose weights have the smallest sum. My input has basically been organized into a min-heap, and from there I must find the two adjacent nodes with the smallest sum and merge them. By adjacent, I assume my professor means that they are next to each other in the input.
Example Input:
9
1
2
3
3
2
1
1
2
3
Then my min-heap would look like so:
        1
      /   \
     2     1
    / \   / \
   2   2 3   1
  / \
 3   3
The two adjacent trees (or nodes) with the smallest sum, then, are the two consecutive 1's that appear near the end of the input. What logic can I apply to start with these nodes? I seem to be missing something, but I can't quite grasp it. Please let me know if you need any more information; I can elaborate or provide the entire assignment page if something is unclear.
I think this can be done with a small modification to the conventional algorithm. Instead of storing single trees in your priority queue heap, store pairs of adjacent trees. Then, at each step, you remove the minimum pair (t1, t2) as well as the up to two other pairs that contain those trees, i.e. (u, t1) and (t2, r). Then merge t1 and t2 into a new tree t', re-insert the pairs (u, t') and (t', r) into the heap with updated weights, and repeat.
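A simpler (though O(n^2) rather than heap-based) Python sketch of the same greedy rule, scanning for the cheapest adjacent pair each round instead of maintaining a heap of pairs (names are mine; leaves are represented by their symbol index):

def order_preserving_huffman(weights):
    # trees holds (weight, tree) pairs in the original symbol order
    trees = [(w, i) for i, w in enumerate(weights)]
    while len(trees) > 1:
        # find the adjacent pair with the smallest weight sum
        i = min(range(len(trees) - 1),
                key=lambda j: trees[j][0] + trees[j + 1][0])
        (w1, t1), (w2, t2) = trees[i], trees[i + 1]
        trees[i:i + 2] = [(w1 + w2, (t1, t2))]  # merge, preserving order
    return trees[0]

print(order_preserving_huffman([1, 2, 3, 3, 2, 1, 1, 2, 3]))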
You need to pop two trees and make a third tree. Join the tree with the smaller sum to its left node and the second tree to its right node, then put this tree back into the heap. From your example:
Pop 2 trees from the heap:
1 1
Make a tree:
  ?
 / \
?   ?
Put the smaller tree in the left node: min(1, 1) = 1
  ?
 / \
1   ?
Put the second tree in the right node:
  ?
 / \
1   1
The tree you made has sum = sum of left node + sum of right node:
  2
 / \
1   1
Put the new tree (sum 2) back into the heap.
Finally you will have one tree; it is the Huffman tree.