Checking for node repetition in multi-parent tree

I've got a pretty simple tree implementation in SQL:
CREATE TABLE [dbo].[Nodes] (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NULL
);
CREATE TABLE [dbo].[NodeNodes] (
[ParentNodeId] [int] NOT NULL,
[ChildNodeId] [int] NOT NULL
);
My tree implementation is such that a node can have multiple parents. This is so the user can create custom trees that group together commonly used nodes. For example:
        1              8          9
      /   \           / \        / \
     2     3         4   7      2   6
    / \   / \                  / \
   4   5 6   7                4   5
Node | Parents | Children
---------------------------
1 | - | 2,3
2 | 1,9 | 4,5
3 | 1 | 6,7
4 | 2,8 | -
5 | 2 | -
6 | 3,9 | -
7 | 3,8 | -
8 | - | 4,7
9 | - | 2,6
So there are three trees which are indicated by the three nodes with no parent. My problem is validating a potential relationship when the user adds a node as a child of another. I would like no node to appear twice in the same tree. For example, adding node 2 as a child of node 6 should fail because that would cause node 2 to appear twice in 1's tree and 9's tree. I'm having trouble writing an efficient algorithm that does this.
My first idea was to find all the roots of the prospective parent, flatten the trees of the roots to get one list of nodes per tree, then intersect those lists with the prospective child, and finally pass the validation only if all of the resultant intersected lists are empty. Going with the example, I'd get these steps:
1) Trace prospective parent through all parents to roots:
6->3->1
6->9
2) Flatten trees of the roots
1: {1,2,3,4,5,6,7}
9: {2,4,5,6,9}
3) Intersect lists with the prospective child
1: {1,2,3,4,5,6,7}^{2} = {2}
9: {2,4,5,6,9}^{2} = {2}
4) Only pass if all result lists are empty
1: {2} != {} ; fail
9: {2} != {} ; fail
This process works, except for the fact that it requires putting entire trees into memory. I have some trees with 20,000+ nodes and this takes almost a minute to run. This performance isn't a 100% dealbreaker, but it is very frustrating. Is there a more efficient algorithm to do this?
Edit 4/2 2pm
The above algorithm doesn't actually work. deroby pointed out that adding 9 as a child to 7 will be passed by the algorithm but shouldn't be. The problem is that adding a node with children to another node will succeed as long as the node isn't repeated -- it doesn't validate the children.
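A minimal sketch of the check with the fix from the edit applied, comparing the prospective child's whole subtree (not just the child) against every tree the parent belongs to. It is written in Python rather than SQL for brevity. The edge list mirrors the example table, and the names `build`, `reachable` and `can_add_child` are made up for illustration. It still walks whole trees, so it illustrates correctness rather than a faster algorithm.

```python
from collections import defaultdict

# Edges mirror the (ParentNodeId, ChildNodeId) rows of the example table.
EDGES = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7),
         (8, 4), (8, 7), (9, 2), (9, 6)]

def build(edges):
    """Return (children, parents) adjacency maps for the edge list."""
    children, parents = defaultdict(set), defaultdict(set)
    for p, c in edges:
        children[p].add(c)
        parents[c].add(p)
    return children, parents

def reachable(start, adjacency):
    """Return start plus every node reachable from it via adjacency."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def can_add_child(parent, child, edges):
    """True if attaching child under parent repeats no node in any tree."""
    children, parents = build(edges)
    incoming = reachable(child, children)    # child and its whole subtree
    ancestors = reachable(parent, parents)   # parent and everything above it
    roots = [n for n in ancestors if not parents.get(n)]
    # Reject if a tree receiving the subtree already holds one of its nodes.
    return all(incoming.isdisjoint(reachable(r, children)) for r in roots)

print(can_add_child(6, 2, EDGES))  # False: 2 is already in the trees of 1 and 9
print(can_add_child(7, 9, EDGES))  # False: 9 brings 2 and 6 into the tree of 1
print(can_add_child(8, 5, EDGES))  # True: 5 does not occur in the tree of 8
```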

A year later I stumbled upon my own question and I decided I would add my solution. It turns out I had just forgotten my basic data structures. What I originally thought was a simple tree was actually a directed graph, and what I was testing for was a cycle. Seeing as how cycle detection is a pretty common thing, there should be numerous solutions and discussions about it out there on the internets. See Best algorithm for detecting cycles in a directed graph for one example.
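As a rough sketch of what a cycle test looks like for a single new relationship (assuming the edges are held in a dictionary of child lists, which is not how the SQL tables store them): adding parent -> child closes a cycle exactly when the parent is already reachable from the child. The function name `would_create_cycle` is hypothetical, and it checks for cycles only.

```python
def would_create_cycle(parent, child, children):
    """True if adding the edge parent -> child would close a cycle,
    i.e. parent is already reachable from child (or is the same node)."""
    stack, seen = [child], {child}
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        for nxt in children.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Child lists taken from the example table in the question.
CHILDREN = {1: [2, 3], 2: [4, 5], 3: [6, 7], 8: [4, 7], 9: [2, 6]}

print(would_create_cycle(5, 1, CHILDREN))  # True: 5 is below 1 already
print(would_create_cycle(8, 5, CHILDREN))  # False: 8 is not below 5
```

Only the part of the graph below the prospective child is visited, so nothing above the parent has to be loaded.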

Related

Longest acyclic path from A to B in a graph

I'm learning C, and I want to find the longest acyclic path in a graph from A to B.
For example, I have a graph like this:
       0
       |
       1
      / \
    2 --- 3
    |    / \
    5---6---4
Now I want to write a function which can find the longest acyclic path from node A to node B.
input: longestPath(node A, node B) output: the longest length is x, node_a -> ... -> node_b
for example:
input: longestPath(0, 6) output: the longest length is 6, 0 -> 1 -> 2 -> 3 -> 4 -> 6
(the answer may not be unique; any one of the correct answers is fine)
But I have no idea how to implement a suitable algorithm to find it.
Should I use BFS or DFS to find all possible paths and compare them? (but it seems slow)
Could you please give me some advice? Thanks!
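For what it is worth, a DFS with backtracking is the straightforward way to enumerate simple paths. Below is a small sketch, in Python rather than C to keep it short. The edge list is my reading of the diagram and therefore an assumption, and `longest_path` is a made-up name. Finding a longest simple path is NP-hard in general, so this exhaustive search is only practical for small graphs.

```python
# Undirected edges as read from the diagram (an assumption).
EDGES = [(0, 1), (1, 2), (1, 3), (2, 3), (2, 5),
         (3, 6), (3, 4), (5, 6), (6, 4)]

def longest_path(edges, start, goal):
    """Return one longest simple path from start to goal as a list of nodes."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    best = []
    path = [start]
    on_path = {start}          # nodes currently on the path, to avoid repeats

    def dfs(node):
        nonlocal best
        if node == goal:
            if len(path) > len(best):
                best = path.copy()
            return
        for nxt in graph.get(node, ()):
            if nxt not in on_path:
                on_path.add(nxt)
                path.append(nxt)
                dfs(nxt)
                path.pop()          # backtrack
                on_path.remove(nxt)

    dfs(start)
    return best

result = longest_path(EDGES, 0, 6)
print(len(result), " -> ".join(map(str, result)))
```

With these edges the longest path from 0 to 6 visits 6 nodes, which matches the length in the example output.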

Time-based UUID does not follow creation order when implementing RFC 4122

I am creating a custom algorithm to embed info into a time-based UUID. According to RFC 4122, the version 1 UUID has the following structure:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          time_low                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       time_mid                |         time_hi_and_version   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         node (2-5)                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
I've found that the low part of the timestamp (its rightmost 32 bits) goes at the front of the ID, making it the most significant part when sorting UUIDs.
What I do not understand is how, with this layout, sorting UUIDs is supposed to follow creation order.
To illustrate the question, here are two example timestamps where t1 > t2 but the UUIDs created from them sort in the reverse order.
t1 = 137601405637595834 // 0x1e8dbbfd79f92ba
t2 = 3617559227 // 0xd79f92bb
are transformed to the following parts
t1_low: Uint = 3617559226 // 0xd79f92ba
t1_mid: Ushort = 56255 // 0xdbbf
t1_hi: Ushort = 488 // 0x1e8
t2_low: Uint = 3617559227 // 0xd79f92bb
t2_mid: Ushort = 0 // 0x0
t2_hi: Ushort = 0 // 0x0
Since the least significant bytes are not relevant for the order in this case, I will ignore that for the sake of simplification.
The UUIDs generated using these timestamps are
UUID1 = d79f92ba-dbbf-11e8-8808-000000000002
UUID2 = d79f92bb-0000-1000-a68b-000000000004
Clearly UUID1 < UUID2 even though their timestamps are in the reverse order.
What is wrong with my analysis?
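The field split above can be reproduced with Python's standard `uuid` module, as a sketch. The helper name `v1_from_timestamp` is made up, and the clock sequence and node values are simply read back from the two example UUIDs. It shows that the textual order and the embedded-timestamp order disagree.

```python
import uuid

def v1_from_timestamp(ts, clock_seq, node):
    """Build a version 1 UUID from a 60-bit timestamp (RFC 4122 field layout)."""
    time_low = ts & 0xFFFFFFFF                         # lowest 32 bits come first
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000   # version 1 in the top 4 bits
    clk_hi = ((clock_seq >> 8) & 0x3F) | 0x80          # RFC 4122 variant bits
    clk_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clk_hi, clk_low, node))

t1 = 137601405637595834   # 0x1e8dbbfd79f92ba
t2 = 3617559227           # 0xd79f92bb

u1 = v1_from_timestamp(t1, 0x0808, 2)
u2 = v1_from_timestamp(t2, 0x268B, 4)

print(u1)                   # d79f92ba-dbbf-11e8-8808-000000000002
print(u2)                   # d79f92bb-0000-1000-a68b-000000000004
print(str(u1) < str(u2))    # True, although t1 > t2
print(u1.time > u2.time)    # True: the embedded timestamps keep the real order
```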

The UUIDv1 spec deliberately puts the most entropy in the high-order bits so that keys do not sort as you expected; instead, they will be seemingly randomly yet roughly evenly distributed across the full number range regardless of creation order--just like UUIDv3/v4/v5.
If you want a sortable timestamp, add another column; using UUID as anything but an opaque identifier will end up biting you later.

Storing k-ary tree with an array

Does storing a k-ary tree as an array only work if you fill in each node from left to right with k-children before moving to the next one?
Ex:
      1
    / | \
   2  3  4
 / | \
5  6  7
Can be stored as an array that looks like:
[X,1,2,3,4,5,6,7]
 0 1 2 3 4 5 6 7
And any parent can be found by taking the index/k.
However, for the same data but stored as:
      1
    / | \
   2  3  4
  /|  |
 5 6  7
with 7 as a child of 3, indexing no longer works.
Also, in general, the siblings are within +- k indices of the current node but how do I make sure I'm not accidentally accessing a parent/uncle node?

Assuming the root node is at index 0 in the array and each node has up to n children (your k), the children of the node at index i are at indexes (i*n) + 1 through (i*n) + n. A node's parent is at index (i-1)/n, using integer division. There are similar equations for if you want to put the root at index 1, but there's no good reason to leave index 0 unoccupied.
If you want to visit all of the siblings of the current node, first find the parent node, and then visit all of that node's children. That way you won't accidentally visit an "uncle" node.
Normally, trees stored in an array like this are complete trees (a complete binary tree being the usual example): all levels except possibly the last are full, and if the last level isn't completely full, then it's filled from left to right.
You don't have to fill in all positions, but if you don't then you need to have some kind of flag at that node's position to tell you that it's empty.
But storing the tree as in your second example, where the first child of index 3 is where one would normally put the last child of index 2 breaks those calculations. You would have to store an array of child indexes in the parent node, and the index of the parent in each child node.
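The index arithmetic above, as a short Python sketch with the root at index 0. The function names are made up for illustration, `k` is the number of children per node, and `size` is the number of occupied slots in the array.

```python
def children(i, k):
    """Indexes of the k child slots of the node at index i."""
    return list(range(i * k + 1, i * k + k + 1))

def parent(i, k):
    """Index of the parent of node i, or None for the root."""
    if i == 0:
        return None
    return (i - 1) // k

def siblings(i, k, size):
    """Indexes of the other children of i's parent that are inside the array."""
    p = parent(i, k)
    if p is None:
        return []
    return [c for c in children(p, k) if c != i and c < size]

tree = [1, 2, 3, 4, 5, 6, 7]      # the first example from the question, k = 3
k = 3

print(children(0, k))             # [1, 2, 3] -> values 2, 3, 4
print(children(1, k))             # [4, 5, 6] -> values 5, 6, 7
print(parent(6, k))               # 1 -> value 2
print(siblings(5, k, len(tree)))  # [4, 6] -> values 5 and 7
```

Going through the parent to enumerate siblings is what keeps "uncle" nodes out of the result.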

KDB+ / Q table creation with foreign key

My question is about creating a table with q and using foreign keys. I know how to do it the following way
q)T1:([id:1 2 3 4 5]d1:"acbde")
q)T2:([id:1 2 3 4 5]f1:`T1$2 2 2 4 4)
But now let's say I want to create the table by flipping a dictionary built with the ! operator, this way
q)T3:1!flip ((`id`f1 )!((1 2 3 4 5);(2 2 2 4 4)))
How can I set a foreign key to table T1's primary key when creating the table this way?
Update
Well, I thought the example above would be enough for me to solve my actual problem, but unfortunately it's not.
What if I have this list-of-lists layout, A and B:
q)A:enlist 1 2 3 4 5
q)B:(enlist "abcde"), (enlist `v`w`x`y`z)
q)flip (`id`v1`v2)!(B,A)
How can I make the list A a foreign key to table T1?
Update 2
And how would I implement it if A comes from somewhere else and I don't initialize it myself? Do I have to make a copy of the list?

You can use the same syntax on the column values list:
q)T3:1!flip ((`id`f1 )!((1 2 3 4 5);(`T1$2 2 2 4 4)))
q)T3~T2
1b
Update:
Again for this case we can use the same syntax on the list -
q)A:enlist`T1$1 2 3 4 5
q)meta flip (`id`v1`v2)!(B,A)
c | t f a
--| ------
id| c
v1| s
v2| j T1
Update2:
Same syntax applied to the variable name:
q)A:1 2 3 4 5
q)meta flip (`id`v1`v2)!(B,enlist`T1$A)
c | t f a
--| ------
id| c
v1| s
v2| j T1

input value to create binary search trees

In a binary search tree, if I give the input "2,1,3,4,5", the tree will look like this:
  2
 / \
1   3
     \
      4
       \
        5
But with input like "5,2,1,3,7,6,8", it looks like this:
     5
   /   \
  2     7
 / \   / \
1   3 6   8
So my question is: how can I produce inputs so that a balanced tree structure like the one above is obtained? (I don't want to use AVL trees.) Are there tricks to sort or rearrange the numbers so that, given as input in that order, they produce a balanced tree? I'm looking for inputs that create trees of height up to 10.

One very simple way to guarantee a balanced tree is to sort the input, then recursively insert the values as follows:
Insert the middle of the whole array.
Recursively insert the left and right halves of the array using this procedure.
For example, in your case of the values 1 - 8 with 4 removed, you would sort the values as
1 2 3 5 6 7 8
You would then insert 5, then recursively apply this procedure to the two halves. On the half 1 2 3, you would insert 2, then recursively insert 1 and 3. This gives the ordering so far as
5 2 1 3
Now, you recursively process the other half, 6 7 8, which inserts 7 and then recursively inserts 6 and 8. Overall, this produces the ordering
5 2 1 3 7 6 8
which is precisely the ordering you came up with earlier in your post.
This procedure runs in O(n lg n) time. I'm not positive that this is optimal, so if someone else wants to post a better answer I'd love to see it.
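A minimal Python sketch of the procedure above. The function name `balanced_insertion_order` is made up; it returns the order in which to feed the values to a plain BST insert.

```python
def balanced_insertion_order(values):
    """Return values reordered so plain BST insertion yields a balanced tree."""
    ordered = sorted(values)
    result = []

    def visit(lo, hi):
        # Handle the half-open range ordered[lo:hi].
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        result.append(ordered[mid])   # insert the middle element first
        visit(lo, mid)                # then the left half
        visit(mid + 1, hi)            # then the right half

    visit(0, len(ordered))
    return result

print(balanced_insertion_order([1, 2, 3, 5, 6, 7, 8]))
# [5, 2, 1, 3, 7, 6, 8]
```

After the sort, the reordering itself touches each value once, so the sort dominates the running time of this step.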