Genealogy: Relating family relationship in database

How is the algorithm for a family tree in genealogy done?
For example, parent A has children B and C. What if child C has a child of their own in the future? How is that child added to the tree using a database?
I've looked at Jit's RGraph tree graph, which populates the tree from JSON data. I'd like to expand the tree by generating the JSON from a database, but I can't figure out what the database should look like as more children are added. I think it's doable with JSON alone, but I can't fully grasp it when the data is dynamic.
What model/structure is suitable for this kind of problem?

It could be as simple as a single table with the columns PERSON_ID, FATHER_ID and MOTHER_ID, where FK1 and FK2 denote FOREIGN KEYs (FATHER_ID references PERSON_ID and MOTHER_ID also references PERSON_ID). FATHER_ID and/or MOTHER_ID can be left NULL if unknown.
This model is not perfect. For example, it doesn't enforce the parents' gender (that the father is male and the mother female), and it only represents biological parents, not other kinds of relationships such as adoption (all of which can be done, but with certain complications in the model).
However, it's very simple and can be naturally traversed (in recursive fashion), analyzed or exported.
For example, parent A has children B and C. What if child C has a child of their own in the future? How is that child added to the tree using a database?
A child D of C could be represented like this:
PERSON_ID | FATHER_ID | MOTHER_ID
A         |           |
B         | A         |
C         | A         |
D         | C         |
Or like this if C is a mother:
PERSON_ID | FATHER_ID | MOTHER_ID
A         |           |
B         | A         |
C         | A         |
D         |           | C
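The three-column model above can be sketched as runnable code. This is only an illustration, assuming SQLite through Python's sqlite3 module; the table name person and the helpers descendants and subtree are hypothetical names, not part of the answer. It shows that adding D later is a single INSERT, and that the tree can be walked recursively and turned into nested JSON of the kind a tree widget typically consumes (the exact keys such a widget expects are an assumption).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE person (
        person_id TEXT PRIMARY KEY,
        father_id TEXT REFERENCES person(person_id),  -- NULL if unknown
        mother_id TEXT REFERENCES person(person_id)   -- NULL if unknown
    )
""")
conn.executemany(
    "INSERT INTO person (person_id, father_id, mother_id) VALUES (?, ?, ?)",
    [("A", None, None), ("B", "A", None), ("C", "A", None)],
)
# Later on, C (here a mother) has a child D: one more row, no schema change.
conn.execute("INSERT INTO person VALUES ('D', NULL, 'C')")


def descendants(conn, ancestor_id):
    """Return (person_id, generation) for everyone below ancestor_id."""
    return conn.execute("""
        WITH RECURSIVE tree(person_id, depth) AS (
            SELECT person_id, 1 FROM person
            WHERE father_id = :root OR mother_id = :root
            UNION
            SELECT p.person_id, t.depth + 1
            FROM person AS p
            JOIN tree AS t
              ON p.father_id = t.person_id OR p.mother_id = t.person_id
        )
        SELECT person_id, depth FROM tree ORDER BY depth, person_id
    """, {"root": ancestor_id}).fetchall()


def subtree(conn, person_id):
    """Build a nested dict (hypothetical id/children layout) for JSON export."""
    children = conn.execute(
        "SELECT person_id FROM person "
        "WHERE father_id = ? OR mother_id = ? ORDER BY person_id",
        (person_id, person_id),
    ).fetchall()
    return {"id": person_id,
            "children": [subtree(conn, child) for (child,) in children]}


family_tree = descendants(conn, "A")
tree_json = json.dumps(subtree(conn, "A"))
```

Because every child is just another row, the JSON can be regenerated from the table whenever the data changes.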

IMO it's an adjacency list model with some extension. For example, to have 2 parents you can simply add a special (end) node, or you can mark the node with a comment.

You might look at a prior post that uses the GEDCOM model.
I use Union and Person tables with a common Union_ID key in both. In the Person table, this refers to the parents. In the graph, the Union may have 2 inbound parents and an unlimited number of outbound children.
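A rough sketch of that Union/Person layout, assuming SQLite through Python's sqlite3 module. The answer only states that both tables share a Union_ID key; the table name family_union (UNION is a reserved word in SQL), the parent1_id/parent2_id columns, and the sample people are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE family_union (
        union_id   INTEGER PRIMARY KEY,
        parent1_id INTEGER REFERENCES person(person_id),  -- assumed column
        parent2_id INTEGER REFERENCES person(person_id)   -- assumed column
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        -- Union_ID on a person points at the union of that person's parents.
        union_id  INTEGER REFERENCES family_union(union_id)
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Alice", None), (2, "Bob", None)])
conn.execute("INSERT INTO family_union VALUES (1, 1, 2)")
# Any number of children can point at the same union.
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(3, "Carol", 1), (4, "Dave", 1)])


def parents_of(conn, person_id):
    """Follow the person's union_id to the (up to two) inbound parents."""
    row = conn.execute("""
        SELECT u.parent1_id, u.parent2_id
        FROM person AS p
        JOIN family_union AS u ON u.union_id = p.union_id
        WHERE p.person_id = ?
    """, (person_id,)).fetchone()
    return [pid for pid in row if pid is not None] if row else []


def children_of(conn, union_id):
    """All outbound children of a union."""
    return [pid for (pid,) in conn.execute(
        "SELECT person_id FROM person WHERE union_id = ? ORDER BY person_id",
        (union_id,))]
```

Here parents_of(conn, 3) follows Carol's union to both parents, while children_of(conn, 1) lists every child of that union.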

Related

Best Way to Store Hierarchal Data (Parent <- Child <- Grandchild)

I have a dataset that I need to work with that represents a part schematic for a large machine. I need to come up with an appropriate database schema for this dataset and am having trouble coming up with something to use that represents this data efficiently.
The top level components are the biggest "structures", and as you traverse down the hierarchy, the data represents inner components, or components that make up the inner components. For example, at the top level, there could be an engine as a level 1 component, and then a level 2 component is a piston, which goes into an engine, and a level 3 component could be a gasket that goes into the piston.
This representation is spread across a few hundred lines of a CSV file. There are 3 columns for IDs:
a master_id, which all components have
a parent_id, which all components have as well but their value varies based on the situation.
If the component in question is a level 1 part, the parent_id is its own master_id.
If the component in question is a level 2 part, the parent_id is the master_id of the level 1 component.
If the component in question is a level 3 part, the parent_id is the master_id of the level 2 component.
Basically, the parent ID of any component is the master ID of the component in the level above it. So the lv1 parent is the lv1 master (since it's the root), the lv2 parent is the lv1 master, and the lv3 parent is the lv2 master. Also, multiple components can share a parent ID, meaning multiple lv2 parts, for example, can have the same parent ID.
a grandparent_id, which only level 3 components have (though not all lv3 components, for some reason; I didn't make this data set). If a component is lv3 and has a grandparent_id, the grandparent ID is a direct link back to the master ID of the lv1 component. Yeah, confusing, right?
So here's an example. A lv3 component has a master_id of 700000137, a parent_id of 600000049, and a grandparent_id of 500000006. If we look at the component with a master of 600000049, we'll see that this is a lv2 component that has a parent id of 500000006, which is the master id of a lv1 component, and again is the grandparent of this lv3 component.
I prefaced this post saying I need to come up with a database representation for this data set (it has later use in a project, but the data organization is the first step). I'm comfortable using PostgreSQL, so my initial thought was to make 3 tables, master, parent, and grandparent, where based on the key that I'm parsing out, I would insert the row into the appropriate table and foreign key back to the other tables if there were parent or grandparent keys. But I realized this could get quite hairy, especially since there could be multiple foreign keys linking back to a single master ID, and I feel that with this representation some data could get repeated, which I obviously don't want.
My second thought was to use something like a Python dictionary, where I essentially build out a tree-like structure with the lv1 components in the top level, the lv2 components in the second, etc. I could then convert the dictionary into JSON, since Python is nice that way, and store that JSON blob in the database. But this JSON blob could potentially get REALLY big, though I guess that's just something I'd have to live with as the dataset grows. This part schematic I was given is only for one machine, so basically each entry in my database would be like
id | name      | json
---+-----------+--------------------
1  | machine_a | JSON_BLOB_MACHINE_A
2  | machine_b | JSON_BLOB_MACHINE_B
etc...
Does my second approach seem better than trying to create separate tables that represent each part level and foreign keying back to parents? If there's a better way to do this with Postgres, I'd appreciate you explaining it. Otherwise, I'm probably going to go with the latter route. Thanks!
If you don't need to join parts across machines, then I think a jsonb column for parts may be best. You can still index jsonb using GIN indexes and get really good performance from queries.
As long as the parts are not shared among many machines, which would make updating part properties across all machines tricky, you're probably OK.
This should make queries for a single machine pretty effortless, as the majority of the data is self-contained.
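The dictionary-to-JSON step described in the question could look roughly like the sketch below, whose output is what would be stored in the jsonb column. It assumes the CSV has already been parsed into dicts; the three IDs come from the example in the question, while the name field, the part names attached to those IDs, and the build_tree helper are made up for illustration. The grandparent_id column is ignored because it can be derived by following parent_id twice.

```python
import json

# Hypothetical parsed CSV rows; a level 1 part has parent_id == master_id.
rows = [
    {"master_id": 500000006, "parent_id": 500000006, "name": "engine"},
    {"master_id": 600000049, "parent_id": 500000006, "name": "piston"},
    {"master_id": 700000137, "parent_id": 600000049, "name": "gasket"},
]


def build_tree(rows):
    """Nest every component under its parent; return the level 1 roots."""
    nodes = {
        r["master_id"]: {"master_id": r["master_id"],
                         "name": r["name"],
                         "children": []}
        for r in rows
    }
    roots = []
    for r in rows:
        node = nodes[r["master_id"]]
        if r["parent_id"] == r["master_id"]:
            roots.append(node)  # self-parented: a root component
        else:
            nodes[r["parent_id"]]["children"].append(node)
    return roots


machine_tree = build_tree(rows)
machine_json = json.dumps(machine_tree)  # one blob per machine
```

Since the nodes are indexed by master_id first, the rows can arrive in any order in the CSV.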

How to model a list to different entities efficiently?

Given the model below:
CustomerType1(id, telephone, address)
CustomerType2(id, telephone, name)
OrderType1(id, timestamp, customerType1.id, comments, enum1)
OrderType2(id, timestamp, customerType2.id, comments)
OrderType3(id, timestamp, name)
How would I model the following?
OrderList(id, OrderType.id, ..)
OrderItem(OrderList.id, MenuItem.id)
A. Would I need 3 different types of OrderLists in order to adapt to the orderTypes?
OrderList1(id, OrderType1.id, ..)
OrderItem1(OrderList1.id, MenuItem.id)
OrderList2(id, OrderType2.id, ..)
OrderItem2(OrderList2.id, MenuItem.id)
OrderList3(id, OrderType3.id, ..)
OrderItem3(OrderList3.id, MenuItem.id)
Or
B. Would 3 definitions of a relationship between orderLists and OrderTypes be better?
OrderList_Type1(orderList.id, orderType1.id)
OrderList_Type2(orderList.id, orderType2.id)
OrderList_Type3(orderList.id, orderType3.id)
This seems like a really inefficient way to store data, and I feel like I've modelled this incorrectly (it still makes sense, but it might not be good for scaling/efficiency). Is there a better way to model this?
Note: the given model can be changed but it would still have to contain the same information.
1. Your UML model is ok
From the point of view of a UML class diagram, your model and the OrderList and OrderItem extensions you want to add are clear and unambiguous, and I don't see any modeling problem there.
To avoid excessive copy/paste I have only added 2 parent classes, named ...Base. This is a common OOP modelling technique.
Drawn as a UML class diagram, your model looks like this:
2. For the physical implementation I would choose B
As for the implementation of this model, of the two choices you gave ((A) many copy/pastes, (B) normalize and minimize the schema), I would take path (B), drawn below.
This is how one of our company's software systems models class inheritance in relational terms, and it works quite well.
In our system most of the necessary glue code is generated automatically. The main piece is the generated OrderType2View, which joins the corresponding fields from the parent table OrderTypeBase and translates all DML operations (e.g. an insert) into DML operations on both OrderType2 and OrderTypeBase, adding the correct OrderTypeClassId to every record in the parent table so that it is easy to tell which child table contains the specific part of the record.
Thanks to the generator we can easily extend the model with further parent classes (the inheritance hierarchy and the number of joined tables can be of any depth) and still let older code treat the records as their general parent types, without caring about the details.
I don't know if there are better ways, but given (A) or (B) I would choose (B) because it is a design that works (I have seen it :)
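A hand-written sketch of the base-table idea described above, assuming SQLite through Python's sqlite3 module. The names OrderTypeBase, OrderType2, OrderTypeClassId and OrderType2View come from the answer; the column layout, the orderTypeId column on OrderList, and the insert_order_type2 helper are assumptions, and the generated glue code of the real system is replaced here by one plain function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE OrderTypeBase (
        id               INTEGER PRIMARY KEY,
        timestamp        TEXT NOT NULL,
        OrderTypeClassId INTEGER NOT NULL  -- which child table holds the rest
    );
    CREATE TABLE OrderType2 (
        id              INTEGER PRIMARY KEY REFERENCES OrderTypeBase(id),
        customerType2Id INTEGER,
        comments        TEXT
    );
    -- A single OrderList table serves every order type via the base table.
    CREATE TABLE OrderList (
        id          INTEGER PRIMARY KEY,
        orderTypeId INTEGER NOT NULL REFERENCES OrderTypeBase(id)
    );
    CREATE VIEW OrderType2View AS
        SELECT b.id, b.timestamp, c.customerType2Id, c.comments
        FROM OrderTypeBase AS b
        JOIN OrderType2 AS c ON c.id = b.id
        WHERE b.OrderTypeClassId = 2;
""")


def insert_order_type2(conn, timestamp, customer_id, comments):
    """Write the shared part to the base table and the rest to OrderType2."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute(
            "INSERT INTO OrderTypeBase (timestamp, OrderTypeClassId) "
            "VALUES (?, 2)", (timestamp,))
        order_id = cur.lastrowid
        conn.execute(
            "INSERT INTO OrderType2 (id, customerType2Id, comments) "
            "VALUES (?, ?, ?)", (order_id, customer_id, comments))
    return order_id


order_id = insert_order_type2(conn, "2024-01-01T12:00:00", 7, "no onions")
conn.execute("INSERT INTO OrderList (orderTypeId) VALUES (?)", (order_id,))
type2_orders = conn.execute(
    "SELECT id, timestamp, customerType2Id, comments FROM OrderType2View"
).fetchall()
```

OrderType1 and OrderType3 would get their own child tables and views in the same way, while OrderList and OrderItem stay single tables.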

Table with hierarchy or multiple tables?

I am designing a database model for an application, and I have a table Post whose rows each belong to a category. Category will logically be another table.
But several categories belong to a super category (or domain, or area), and my question is this:
Should I create another table for super categories or domains, or build the hierarchy into the Category table with a key that points to the parent?
I hope the problem is clear.
PS. I know I can solve this either way, but are there any benefits to the first solution over the second, or vice versa?
Thanks
It depends: if nearly every category has a parent, you could add the parent's serial (ID) as a column. Then your category table will look like
+--+----+------+
|ID|Name|Parent|
+--+----+------+
The problem with this representation is that, unless the hierarchy is cyclic, some categories will have no parent. Furthermore, a category can only have one parent.
Therefore I would suggest using an additional category_hierarchy table:
+-----+------+
|Child|Parent|
+-----+------+
The disadvantage of this approach is that nearly every category ID is stored a second time. If nearly all categories have parents, the redundancy scales roughly with the number of categories. If the relations are sparse, however, one saves space. A well-chosen join keeps the second representation from being slow; you can, for instance, define a view to handle such requests.
Furthermore there are situations where the second approach can improve speed. For instance if you don't need the hierarchy all the time (for instance when mapping serials to the category-name), lookups in the category table can be faster, simply because the table is more compact and thus more parts of the table will be cached.
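The second layout, including a view that hides the join, might be sketched like this. It assumes SQLite through Python's sqlite3 module purely for illustration; the view name category_with_parent and the sample categories are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Compact lookup table: no hierarchy information stored here.
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Separate hierarchy table: a row exists only where a parent exists.
    CREATE TABLE category_hierarchy (
        child  INTEGER NOT NULL REFERENCES category(id),
        parent INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (child, parent)
    );
    -- Hypothetical view that handles the join for callers.
    CREATE VIEW category_with_parent AS
        SELECT c.id, c.name, p.name AS parent_name
        FROM category AS c
        LEFT JOIN category_hierarchy AS h ON h.child = c.id
        LEFT JOIN category AS p ON p.id = h.parent;

    INSERT INTO category (id, name)
        VALUES (1, 'Science'), (2, 'Physics'), (3, 'Chemistry');
    INSERT INTO category_hierarchy (child, parent)
        VALUES (2, 1), (3, 1);
""")

listing = conn.execute(
    "SELECT name, parent_name FROM category_with_parent ORDER BY id"
).fetchall()
```

Top-level categories simply have no row in category_hierarchy, and the composite primary key would also allow a category to have more than one parent.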

Database Design: Hierarchical Data

I am having trouble arriving at a normalized relational database design to describe a small hierarchy which deviates enough from the typical hierarchy examples such that I am unsure how to proceed my first time tackling such a problem.
My problem is as follows:
Each branch in the hierarchy is guaranteed to be either 2, 4, or 6 levels deep.
If it is 2 levels deep, the hierarchy looks like this:
Category / Group / Component
If it is 4 levels deep, it looks like this:
Category / Group / Component / Group / Component
If it is 6 levels deep, it looks like this:
Category / Group / Component / Group / Component / Group / Component
Categories, Groups, and Components each have their own set of attributes. To further complicate matters, a relationship exists between a Component and entity A, a Component and entity B, and a Component and entity C.
My original thought was to strive to keep the Components in one table, however, I have been unable to come up with a normalized solution that satisfies this goal.
Instead, I came up with a normalized solution where there is a separate table for Components at each of the three possible component levels. However, I am not really comfortable with this because it triples the number of tables capturing links between components and entities A, B, and C (9 total link tables rather than 3 if all components were in one table).
Here is what the design I came up with looks like:
TABLE: Group_1_Components
ATTRIBUTES: Row_ID, Category, Component
RELATES-TO: Group_1_Components_A_Links, Group_1_Components_B_Links, Group_1_Components_C_Links, Group_2_Components
TABLE: Group_2_Components
ATTRIBUTES: Row_ID, Group, Component, Group_1_Component_Row_ID
RELATES-TO: Group_2_Components_A_Links, Group_2_Components_B_Links, Group_2_Components_C_Links, Group_1_Components, Group_3_Components
TABLE: Group_3_Components
ATTRIBUTES: Row_ID, Group, Component, Group_2_Component_Row_ID
RELATES-TO: Group_3_Components_A_Links, Group_3_Components_B_Links, Group_3_Components_C_Links, Group_2_Components
Each of the 9 link tables contains two Row IDs to address a many-to-many relationship with either table A, B, or C.
Is this a reasonable design or am I overlooking a simpler, more typical solution? I looked at a few design techniques specific to capturing hierarchies in a relational database, notably the adjacency list, but I am not sure they fit here, nor do they appear to be normalized solutions.
It should be noted that the hierarchy will seldom be modified; it will frequently be read, where reads retrieve either all of the components or the components at a specific level for a selected group. The link tables to entities A, B, and C will be written to regularly.
Any and all suggestions are welcome. Thanks in advance for your help. Brian
I suggest that you de-normalize your data so that your hierarchy is based on component/group entities, so that you match "regular" hierarchies. In this case you can have the following tables:
a) Components
b) Groups
c) Component_Groups - with a unique key on component_id and group_id to ensure that you only have one combination for each component and group
In this case then your hierarchy will be: Category -> Component_Group -> Component_Group -> Component_Group
Another option for this kind of problem is using a self-referencing table. Just one table.
Single table with ID, PARENT_ID and a TYPE so you can distinguish CATEGORY, GROUP and COMPONENT.
All categories would have no PARENT_ID and then you could search for all child objects where the parent id is equal to the id of the category you want to dive deeper into.
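The single self-referencing table could be sketched as below, assuming SQLite through Python's sqlite3 module. The ID, PARENT_ID and TYPE columns follow the answer; the table name node, the name column, the sample rows, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL for categories
        type      TEXT NOT NULL
                  CHECK (type IN ('CATEGORY', 'GROUP', 'COMPONENT')),
        name      TEXT NOT NULL
    );
    -- Category / Group / Component / Group / Component (a 4-level branch)
    INSERT INTO node (id, parent_id, type, name) VALUES
        (1, NULL, 'CATEGORY',  'Vehicles'),
        (2, 1,    'GROUP',     'Drivetrain'),
        (3, 2,    'COMPONENT', 'Engine'),
        (4, 3,    'GROUP',     'Cylinders'),
        (5, 4,    'COMPONENT', 'Piston');
""")


def children(conn, parent_id):
    """Direct children of one node: a single level of the hierarchy."""
    return conn.execute(
        "SELECT type, name FROM node WHERE parent_id = ? ORDER BY id",
        (parent_id,)).fetchall()


def components_under(conn, root_id):
    """Every component below root_id, with its depth relative to the root."""
    return conn.execute("""
        WITH RECURSIVE sub(id, type, name, depth) AS (
            SELECT id, type, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.type, n.name, s.depth + 1
            FROM node AS n
            JOIN sub AS s ON n.parent_id = s.id
        )
        SELECT name, depth FROM sub
        WHERE type = 'COMPONENT'
        ORDER BY depth, name
    """, (root_id,)).fetchall()
```

With all components in one table, the links to entities A, B and C need only three link tables, each referencing node.id, and filtering on depth gives the components at a specific level.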

Self-Referential ManyToMany Convention in CakePHP

I have an existing data model where I can rename things freely to match CakePHP's conventions. I have a type of graph node, where a node can have an arbitrary number of child nodes and an arbitrary number of parent nodes (uni-directional relationships).
Here's the table of nodes, following CakePHP's conventions:
Table: nodes
Column: node_id (INT)
Column: description (TEXT)
My question is what the join table should look like? Here is what it looks like now:
Table: nodes_nodes
Column: parent_node_id (INT)
Column: child_node_id (INT)
And what the documentation implies it should be:
Table: nodes_nodes
Column: node_id (INT)
Column: node_id (INT)
Notice that two column names are the same, which obviously won't work. What should these two columns be called? Or can CakePHP's conventions not handle this situation without configuration?
As neilcrookes noted, there are some articles on the web about doing this in CakePHP. Here is one of them, using User HABTM User (friends) as an example.
In that linked article, you can ignore everything after the User class definition if you aren't going to be paginating on the model.
If a node has child nodes, do those child nodes automatically have the first node as a parent?
This relationship may be similar to a users-to-users relationship, where the relationship represents the common 'friend' notion in social networks. I suggest searching for user/friend data models to see if that helps.
