Database Design: Hierarchical Data - database

I am having trouble arriving at a normalized relational database design for a small hierarchy. It deviates enough from the typical hierarchy examples that I am unsure how to proceed, this being my first time tackling such a problem.
My problem is as follows:
Each branch in the hierarchy is guaranteed to be either 2, 4, or 6 levels deep.
If it is 2 levels deep, the hierarchy looks like this:
Category / Group / Component
If it is 4 levels deep, it looks like this:
Category / Group / Component / Group / Component
If it is 6 levels deep, it looks like this:
Category / Group / Component / Group / Component / Group / Component
Categories, Groups, and Components each have their own set of attributes. To further complicate matters, a relationship exists between a Component and entity A, a Component and entity B, and a Component and entity C.
My original thought was to strive to keep the Components in one table, however, I have been unable to come up with a normalized solution that satisfies this goal.
Instead, I came up with a normalized solution where there is a separate table for Components at each of the three possible component levels. However, I am not really comfortable with this because it triples the number of tables capturing links between components and entities A, B, and C (9 total link tables rather than 3 if all components were in one table).
Here is what the design I came up with looks like:
TABLE: Group_1_Components
ATTRIBUTES: Row_ID, Category, Component
RELATES-TO: Group_1_Components_A_Links, Group_1_Components_B_Links, Group_1_Components_C_Links, Group_2_Components
TABLE: Group_2_Components
ATTRIBUTES: Row_ID, Group, Component, Group_1_Component_Row_ID
RELATES-TO: Group_2_Components_A_Links, Group_2_Components_B_Links, Group_2_Components_C_Links, Group_1_Components, Group_3_Components
TABLE: Group_3_Components
ATTRIBUTES: Row_ID, Group, Component, Group_2_Component_Row_ID
RELATES-TO: Group_3_Components_A_Links, Group_3_Components_B_Links, Group_3_Components_C_Links, Group_2_Components
Each of the 9 link tables contains two Row IDs to address a many-to-many relationship with either table A, B, or C.
Is this a reasonable design or am I overlooking a simpler, more typical solution? I looked at a few design techniques specific to capturing hierarchies in a relational database, notably the adjacency list, but I am not sure they fit here, nor do they appear to be normalized solutions.
It should be noted that the hierarchy will seldom be modified; it will be read frequently, and reads retrieve either all of the components or the components at a specific level for a selected group. The link tables to entities A, B, and C will be written to regularly.
Any and all suggestions are welcome. Thanks in advance for your help. Brian

I suggest that you de-normalize your data so that your hierarchy is based on component/group entities, so that you match "regular" hierarchies. In this case you can have the following tables:
a) Components
b) Groups
c) Component_Groups - with a unique key on component_id and group_id to ensure that you only have one combination for each component and group
In this case then your hierarchy will be: Category -> Component_Group -> Component_Group -> Component_Group

Another option for this kind of problem is using a self-referencing table. Just one table.
Single table with ID, PARENT_ID and a TYPE so you can distinguish CATEGORY, GROUP and COMPONENT.
All categories would have no PARENT_ID and then you could search for all child objects where the parent id is equal to the id of the category you want to dive deeper into.
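As a rough illustration of the self-referencing table described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `node`, the `name` column, and the sample rows are assumptions made for the example; only ID, PARENT_ID and TYPE come from the answer. The recursive query is one possible way to read a whole branch in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table for everything; "node" and "name" are hypothetical names.
conn.execute("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL for categories
        type      TEXT NOT NULL
                  CHECK (type IN ('CATEGORY', 'GROUP', 'COMPONENT')),
        name      TEXT NOT NULL
    )
""")

# Sample data: one 4-level branch (Category / Group / Component / Group / Component).
conn.executemany(
    "INSERT INTO node (id, parent_id, type, name) VALUES (?, ?, ?, ?)",
    [
        (1, None, "CATEGORY", "Category X"),
        (2, 1, "GROUP", "Group 1"),
        (3, 2, "COMPONENT", "Component 1a"),
        (4, 3, "GROUP", "Group 2"),
        (5, 4, "COMPONENT", "Component 2a"),
    ],
)


def children(conn, parent_id):
    """Direct children of one node: parent_id equals the given id."""
    return conn.execute(
        "SELECT id, type, name FROM node WHERE parent_id = ? ORDER BY id",
        (parent_id,),
    ).fetchall()


def subtree(conn, root_id):
    """All descendants of root_id as (id, type, name, depth) rows."""
    return conn.execute(
        """
        WITH RECURSIVE tree(id, type, name, depth) AS (
            SELECT id, type, name, 1 FROM node WHERE parent_id = ?
            UNION ALL
            SELECT n.id, n.type, n.name, t.depth + 1
            FROM node n JOIN tree t ON n.parent_id = t.id
        )
        SELECT id, type, name, depth FROM tree ORDER BY depth, id
        """,
        (root_id,),
    ).fetchall()


components = [row for row in subtree(conn, 1) if row[1] == "COMPONENT"]
```

With all components in one table, the links to entities A, B, and C would need only one link table each, keyed on `node.id`.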

Related

Best Way to Store Hierarchical Data (Parent <- Child <- Grandchild)

I have a dataset that I need to work with that represents a part schematic for a large machine. I need to come up with an appropriate database schema for this dataset and am having trouble coming up with something to use that represents this data efficiently.
The top level components are the biggest "structures", and as you traverse down the hierarchy, the data represents inner components, or components that make up the inner components. For example, at the top level, there could be an engine as a level 1 component, and then a level 2 component is a piston, which goes into an engine, and a level 3 component could be a gasket that goes into the piston.
This representation is spread across a few hundred lines of a CSV file. There are 3 columns for IDs:
a master_id, which all components have
a parent_id, which all components have as well but their value varies based on the situation.
If the component in question is a level 1 part, the parent_id is its own master_id.
If the component in question is a level 2 part, the parent_id is the master_id of the level 1 component.
If the component in question is a level 3 part, the parent_id is the master_id of the level 2 component.
Basically, the parent id of any component is the master id of the component in the level above it. So lv1 parent is lv1 master (since it's the root), lv2 parent is lv1 master, and lv3 parent is lv2 master. Also, multiple components can share a parent ID, meaning multiple lv2 parts, for example, can have the same parent ID.
a grandparent_id, which only level 3 components have (but not all lv3 components for some reason (idk I didn't make this data set)). If a component is lv3 and has a grandparent_id, the grandparent ID is a direct link back to the master ID of the lv1 component. Yeah, confusing right?
So here's an example. A lv3 component has a master_id of 700000137, a parent_id of 600000049, and a grandparent_id of 500000006. If we look at the component with a master of 600000049, we'll see that this is a lv2 component that has a parent id of 500000006, which is the master id of a lv1 component, and again is the grandparent of this lv3 component.
I prefaced this post saying I need to come up with a database representation for this data set (it has later use in a project but the data organization is the first step). I'm comfortable using PostgreSQL, so my initial thought was to make 3 tables, master, parent, and grandparent, where based on the key that I'm parsing out, I would insert the row into the appropriate table and foreign key back to the other tables if there were parent or grandparent keys. But I realized this could get quite hairy, especially since there could be multiple foreign keys linking back to a single master id, and I feel that with this representation some data could get repeated, which I obviously don't want happening.
My second thought was to use something like a python dictionary, where I essentially build out a tree like structure where the lv1 components are in the top level, the lv2 components in the second, etc. I could then convert the dictionary into JSON, since Python is nice that way, and store that json blob in the database. But, this JSON blob could potentially get REALLY big, though I guess that's just something I'd have to live with as the dataset grows. This part schematic I was given is only for one machine, so basically each entry in my database would be like
id | name | json
----------------------
1 | machine_a | JSON_BLOB_MACHINE_A
----------------------
2 | machine_b | JSON_BLOB_MACHINE_B
etc...
Does my second approach seem better than trying to create separate tables that represent each part level and foreign keying back to parents? If there's a better way to do this with Postgres, I'd appreciate you explaining it. Otherwise, I'm probably going to go with the latter route. Thanks!
If you don't need to join parts in other machines, then I think a jsonb column for parts may be best. You can still index jsonb using GIN indexes and get really good performance from queries.
As long as the parts are not shared among many machines, which would make updating part properties across all machines tricky, then you are probably OK.
This should make queries for a machine pretty effortless, as the majority of the data is self-contained.
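The nested-dictionary approach from the question could be sketched roughly as follows. The row layout `(master_id, parent_id)` and the `children` key are assumptions for the example; the three IDs are the ones quoted in the question. The grandparent_id column is left out here because, as the question describes it, it can be recovered by following parent_id twice.

```python
import json

# Hypothetical parsed CSV rows as (master_id, parent_id) pairs,
# using the IDs from the example in the question.
rows = [
    (500000006, 500000006),  # level 1: parent_id is its own master_id
    (600000049, 500000006),  # level 2: parent is the level 1 component
    (700000137, 600000049),  # level 3: parent is the level 2 component
]


def build_tree(rows):
    """Turn flat (master_id, parent_id) rows into a list of nested dicts."""
    nodes = {master: {"master_id": master, "children": []} for master, _ in rows}
    roots = []
    for master, parent in rows:
        if parent == master:
            # A component that is its own parent is a top-level (root) part.
            roots.append(nodes[master])
        else:
            nodes[parent]["children"].append(nodes[master])
    return roots


tree = build_tree(rows)

# The JSON text that would go into the per-machine json/jsonb column.
blob = json.dumps(tree)
```

Because every node is looked up by its master_id, the rows can arrive in any order, and several components can share the same parent.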

Problems defining normalized SQL database schema with recursive relationships and empty values

I have a question regarding the correct implementation of a Schema that I'm currently wrecking my head with:
We have machines, which consist of components, which consist of parts.
However, the relationships are as follows:
Machines (1) --> Components (N) - a machine is made up of various components
Components (N) --> Parts (N) - a component is made up of multiple parts, a part may be used in multiple components
Components (N) --> Components (N) - a component can also be made up of other components
Machines (N) --> Parts (N) - Some parts may also be directly assigned to a machine
Furthermore, both parts and components that are flagged as needs_welding=1 will have a price associated with them. These prices will change over time.
I'm not quite sure as to how to model the following aspects:
How to relate the Parts directly to the machine table
How to model the parent/child relationship between the components
How to attach prices to the items (kinda reminds me of an SCD in a DWH, but I cannot seem to patch it together)
A good solution for N->N mappings is to create a specific mapping table. So, for example, to map a Component to the Part(s) it is made of, you can create a table called something like MapComponentToItsParts, which has two columns: the first contains the ID of the component, the second contains the ID of the part. They should each be Foreign Keys to their respective tables. You can create similar tables such as MapComponentToSubComponent or MapMachineToPart.
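A minimal sketch of such mapping tables, using Python's built-in sqlite3 module. The table names MapComponentToItsParts and MapComponentToSubComponent come from the answer; the column names, the Component and Part table layouts, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
    CREATE TABLE Component (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Part      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- N:N between components and parts (column names are hypothetical).
    CREATE TABLE MapComponentToItsParts (
        component_id INTEGER NOT NULL REFERENCES Component(id),
        part_id      INTEGER NOT NULL REFERENCES Part(id),
        PRIMARY KEY (component_id, part_id)
    );

    -- N:N between components and their sub-components.
    CREATE TABLE MapComponentToSubComponent (
        component_id     INTEGER NOT NULL REFERENCES Component(id),
        sub_component_id INTEGER NOT NULL REFERENCES Component(id),
        PRIMARY KEY (component_id, sub_component_id),
        CHECK (component_id <> sub_component_id)
    );
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Component VALUES (?, ?)",
                 [(1, "gearbox"), (2, "shaft assembly")])
conn.executemany("INSERT INTO Part VALUES (?, ?)",
                 [(10, "bearing"), (11, "bolt")])
conn.executemany("INSERT INTO MapComponentToItsParts VALUES (?, ?)",
                 [(1, 11), (2, 10), (2, 11)])
conn.execute("INSERT INTO MapComponentToSubComponent VALUES (1, 2)")


def parts_of(conn, component_id):
    """Names of the parts mapped directly to one component."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM Part p
        JOIN MapComponentToItsParts m ON m.part_id = p.id
        WHERE m.component_id = ?
        ORDER BY p.name
        """,
        (component_id,),
    )
    return [name for (name,) in rows]
```

A MapMachineToPart table would follow the same pattern, with a machine ID in place of the component ID.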

Table with hierarchy or multiple tables?

I am designing a database model for an application, and I have a table Post whose rows each belong to some category. Category will logically be another table.
However, several categories belong to a super category (or domain, or area), and my question is this:
Should I create another table for super categories or domains, or model this hierarchy within the Category table, using a key column that points to the parent?
I hope I have described the problem clearly.
PS. I know that either solution would work, but are there any benefits to using the first over the second, or vice versa?
Thanks
It depends: if nearly each category has a parent, you could add a parent serial as a column. Then your category table will look like
+--+----+------+
|ID|Name|Parent|
+--+----+------+
The problem with this representation is that, as long as the hierarchy is not cyclic, some categories will have no parent. Furthermore, a category can only have one parent.
Therefore I would suggest using a category_hierarchy table. An additional table:
+-----+------+
|Child|Parent|
+-----+------+
The disadvantage of this approach is that nearly each category will be repeated. And therefore if nearly all categories have parents, the redundancy will approximately scale with that number. If relations however are quite sparse, one saves space. Furthermore using an intelligent join will prevent the second representation from taking long execution times. You can for instance define a view to handle such requests.
Furthermore there are situations where the second approach can improve speed. For instance if you don't need the hierarchy all the time (for instance when mapping serials to the category-name), lookups in the category table can be faster, simply because the table is more compact and thus more parts of the table will be cached.
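The separate category_hierarchy table, together with a view that hides the join, might look roughly like this sketch in Python's built-in sqlite3 module. The view name `category_with_parent` and the sample categories are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Compact lookup table: no parent column at all.
    CREATE TABLE category (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );

    -- One row per child/parent pair; root categories simply have no row here.
    CREATE TABLE category_hierarchy (
        child  INTEGER NOT NULL REFERENCES category(id),
        parent INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (child, parent)
    );

    -- Hypothetical view that presents the data like the single-table layout.
    CREATE VIEW category_with_parent AS
        SELECT c.id, c.name, p.name AS parent_name
        FROM category c
        LEFT JOIN category_hierarchy h ON h.child = c.id
        LEFT JOIN category p ON p.id = h.parent;

    -- Hypothetical sample data.
    INSERT INTO category VALUES (1, 'Science'), (2, 'Physics'), (3, 'Chemistry');
    INSERT INTO category_hierarchy VALUES (2, 1), (3, 1);
""")

rows = conn.execute(
    "SELECT name, parent_name FROM category_with_parent ORDER BY id"
).fetchall()
```

Lookups that only map an ID to a category name can query `category` directly and never touch the hierarchy table.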

How do I create a User-Defined Hierarchy that includes a Parent-Child Hierarchy?

Environment: SSAS 2005, BIDS 2008.
Data Schema:
DepartmentMaster
DepartmentNumber int (PARENT)
SubDepartmentNumber int (CHILD)
CategoryMaster:
DepartmentNumber int references DepartmentMaster.DepartmentNumber
CategoryNumber int
Functionality I desire:
Hierarchical drill down:
Department -> Category
Department -> Sub-Department -> Category
Accomplished through the following:
Category Dimension
Department Dimension
Parent-Child Hierarchy on Sub-Department and Department.
Why this is a problem:
I feel that Category really should be in the same Dimension and call it "Product" or "Item." Initially, without Sub-Department, this is how I had it set up:
Item Dimension
Department -> Category
Unfortunately, once a Parent entity was introduced at the department level, I could no longer configure the attribute relationships in a way that built correctly (or at all).
My question:
Is it possible to configure these relationships so that they are all in one dimension, giving me the hierarchies described above? If it is possible, should I do it? Is it not working well because I'm doing it wrong to begin with?

Does allowing a category to have multiple parents make sense? Are there alternatives?

Short question: How should product categories that appear under multiple categories be managed? Is it a bad practice to do so at all?
Background info:
We have a product database with categories like this:
Products
  -Arts and Crafts Supplies
    -Glue
    -Paper Clips
    -Construction Paper
  -Office Supplies
    -Glue
    -Paper Clips
Note that glue and paper clips are assigned to both categories. And although they appear in two different spots in this category tree, they have the same category ID in the database. Why? Two reasons:
Categories are assigned attributes - for example, a paper clip could have a weight, a material, a color, etc.
Products assigned to the glue category are displayed under arts and crafts and Office Supplies. Which is to be expected - they're the same actual category ID in the database.
This allows us to manage a single category and its attributes and assigned products, but place it at multiple places within the category tree.
We are using the nested set model, so the db structure we use to support this is:
Category
----------
CategoryID
CategoryName
CategoryTree
------------
CategoryTreeID
CategoryID
Lft
Rgt
So there's a 1:M between Category and CategoryTree because there can be multiple instances of a given category within the category tree.
Is there a simpler way to model this that would allow a product category to display under multiple categories?
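For reference, the Category / CategoryTree structure described above can be sketched with Python's built-in sqlite3 module as follows. The table and column names are the ones from the question; the ID values and the Lft/Rgt numbers are assumptions, assigned by walking the example tree depth-first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (
        CategoryID   INTEGER PRIMARY KEY,
        CategoryName TEXT NOT NULL
    );
    -- One row per placement of a category in the tree (nested set model).
    CREATE TABLE CategoryTree (
        CategoryTreeID INTEGER PRIMARY KEY,
        CategoryID     INTEGER NOT NULL REFERENCES Category(CategoryID),
        Lft            INTEGER NOT NULL,
        Rgt            INTEGER NOT NULL
    );
""")

conn.executemany("INSERT INTO Category VALUES (?, ?)", [
    (1, "Products"), (2, "Arts and Crafts Supplies"), (3, "Office Supplies"),
    (4, "Glue"), (5, "Paper Clips"), (6, "Construction Paper"),
])

# Glue (4) and Paper Clips (5) each appear twice, under both parents.
conn.executemany("INSERT INTO CategoryTree VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 16),   # Products
    (2, 2, 2, 9),    #   Arts and Crafts Supplies
    (3, 4, 3, 4),    #     Glue
    (4, 5, 5, 6),    #     Paper Clips
    (5, 6, 7, 8),    #     Construction Paper
    (6, 3, 10, 15),  #   Office Supplies
    (7, 4, 11, 12),  #     Glue
    (8, 5, 13, 14),  #     Paper Clips
])


def descendants(conn, tree_id):
    """Category names strictly inside the Lft/Rgt range of one tree node."""
    rows = conn.execute(
        """
        SELECT c.CategoryName
        FROM CategoryTree parent
        JOIN CategoryTree node
          ON node.Lft > parent.Lft AND node.Rgt < parent.Rgt
        JOIN Category c ON c.CategoryID = node.CategoryID
        WHERE parent.CategoryTreeID = ?
        ORDER BY node.Lft
        """,
        (tree_id,),
    )
    return [name for (name,) in rows]


def placement_count(conn, category_id):
    """How many places in the tree a single category occupies."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM CategoryTree WHERE CategoryID = ?",
        (category_id,),
    ).fetchone()
    return count
```

The 1:M relationship shows up as two CategoryTree rows pointing at the same Glue CategoryID.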
I don't see anything wrong with this as long as it is true that all Glue is appropriate for both Office Supplies and craft supplies.
What you have is a good way, though why not simplify the 2nd table like so:
Category
ID
Name
SubCategory
ID
CategoryID
SubCategoryID
Though for the future I would beware of sharing child categories between the two root categories. Sometimes it is better to create a unique categorization of products for consistency, which is easier to manage for you and potentially easier to navigate for the customer. Otherwise, you have the issue that if you're on the Glue page coming from office supplies, then do you show the other path as well? If not, you will have two identical pages, except for the path, which is an issue for SEO. If you do, then the user may get confused.
The most famous example of this is Google Mail, where the classification is done this way. Google is famous for the usability of their products ...
I believe words other than "parent" are preferable, since "parent" really suggests only an XToOne relationship...
Maybe you could say that a Product has many Categories, so the relationship would be ManyToMany. Only the display would start with Categories to reach the Products...
This would highlight a problem: if you don't limit the number of categories, and you display the categories with sub-categories and so on, you could end up with:
a huge category and product list, with many duplications
a big depth (probably unreadable)
The interesting part is highlighting the problem, then to imagine a solution that is fine for the end-user.
It may well be necessary for a category to have multiple parents. However, no matter what parent you found a category under, its subcategories should remain the same.
I've seen real systems that implemented precisely this logic and worked fine.
edit
To answer your question, I don't think the model I'm suggesting is as restrictive as you imagine. Basically, a given branch of the tree may be found under more than one parent branch, but wherever it is found, it has the same children. Nothing about this prevents you from cherry-picking some children of one branch and also making them children of another.
So, for example, you could include the glues category under both office supplies and hobby supplies, and if you added "Crazy Glue (Suppository Edition)" under glues, it would show up in both. If you have items that might be grouped together logically but need to be separated by their use, you can still do that. You might put mucilage and paste under the category of hobby adhesives, which goes under the hobby root, but not under the office root. Or you could do that and simultaneously have a combined category that's used internally by your buyers. What you can't do is forget to include that new type of glue in all of the relevant categories once you've added it wherever it belongs in your business model ontology.
In short, you lose very little with this restriction, but gain a bit of structure to help avoid the problem of having to manage each item individually.
edit
Assuming I've made a convincing case for the model itself, there's still the issue of implementation. There are lots of options, but here's one way to go:
There is a CatalogItem table containing a synthetic primary key, the label, optional description/detail text, and an optional SKU (or equivalent). You then have a many-to-many CatalogItemJoin table with child and parent IDs, both sides constrained to the CatalogItem table.
An item that appears as a parent is a category, so it should not have a SKU. An item that appears only as a child is a product, so it should have a SKU. It's fine for any item to have more than one parent; that just means that it's in multiple categories. Likewise, there's no problem with multiple children per parent; that would be the typical case of a category with a few products in it. However, given a category's ID, its children will be the same regardless of what parent category led you there. The other constraint is that you'll want to avoid loops.
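One way the CatalogItem / CatalogItemJoin design and the loop check could look, sketched with Python's built-in sqlite3 module. The two table names come from the answer; the column names, the helper functions `would_create_loop` and `add_link`, the SKU value, and the sample items are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CatalogItem (
        id          INTEGER PRIMARY KEY,  -- synthetic key
        label       TEXT NOT NULL,
        description TEXT,                 -- optional
        sku         TEXT                  -- NULL for categories
    );
    CREATE TABLE CatalogItemJoin (
        parent_id INTEGER NOT NULL REFERENCES CatalogItem(id),
        child_id  INTEGER NOT NULL REFERENCES CatalogItem(id),
        PRIMARY KEY (parent_id, child_id)
    );
""")


def would_create_loop(conn, parent_id, child_id):
    """True if linking parent -> child would make an item its own ancestor."""
    if parent_id == child_id:
        return True
    # Collect every descendant of the proposed child; if the proposed parent
    # is already among them, the new link would close a cycle.
    row = conn.execute(
        """
        WITH RECURSIVE descendants(id) AS (
            SELECT child_id FROM CatalogItemJoin WHERE parent_id = ?
            UNION
            SELECT j.child_id
            FROM CatalogItemJoin j JOIN descendants d ON j.parent_id = d.id
        )
        SELECT 1 FROM descendants WHERE id = ? LIMIT 1
        """,
        (child_id, parent_id),
    ).fetchone()
    return row is not None


def add_link(conn, parent_id, child_id):
    """Insert a parent/child link, refusing any link that would form a loop."""
    if would_create_loop(conn, parent_id, child_id):
        raise ValueError("link would create a loop")
    conn.execute("INSERT INTO CatalogItemJoin VALUES (?, ?)",
                 (parent_id, child_id))


# Hypothetical sample data: Glues sits under two parents.
conn.executemany("INSERT INTO CatalogItem VALUES (?, ?, ?, ?)", [
    (1, "Office Supplies", None, None),
    (2, "Hobby Supplies", None, None),
    (3, "Glues", None, None),
    (4, "Crazy Glue", None, "GL-001"),
])
for parent, child in [(1, 3), (2, 3), (3, 4)]:
    add_link(conn, parent, child)

parents_of_glues = [label for (label,) in conn.execute(
    """
    SELECT p.label
    FROM CatalogItemJoin j JOIN CatalogItem p ON p.id = j.parent_id
    WHERE j.child_id = 3
    ORDER BY p.label
    """
)]
```

The rule that categories have no SKU and products do is not enforced by this schema; it would have to be checked in application code or with triggers.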
