What is this data referencing anti-pattern called? - database

I have a question related to a kind of duplication I see in databases from time to time. To ask this question, I need to set the stage a bit:
Let's say I have a database of TV shows. Its primary table Content stores information at various levels of granularity (Show -> Season -> Episode), using a parent column to denote hierarchy:
+----+---------------------------+-------------+----------+
| ID | ContentName | ContentType | ParentId |
+----+---------------------------+-------------+----------+
| 1 | Friends | Show | [null] |
| 2 | Season 1 | Season | 1 |
| 3 | The Pilot | Episode | 2 |
| 4 | The One with the Sonogram | Episode | 2 |
+----+---------------------------+-------------+----------+
Maybe this isn't ideal, but let's say it's good enough to work with and we're not looking to change it.
Now let's say we need to build a table that defines air dates. We can set these at any level, and they must apply down the hierarchy (e.g., if set at the Season level, it applies to all episodes within that season; if set at the Show level, it applies to all seasons and episodes).
So the original air dates might look like this:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 71 | 3 | 1994-09-22 |
| 72 | 4 | 1994-09-29 |
+-------+-----------+------------+
Whereas the air date for a streaming service might look like:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 91 | 1 | 2015-01-01 |
+-------+-----------+------------+
Cool. Everything's fine so far; we're adhering to 4NF (I think!) and we can proceed to our business logic.
Now we get to my question. If we implement our business logic in such a way that disregards the referential hierarchy, and instead duplicates the air dates down the hierarchy, what is this anti-pattern called? e.g., Let's say I set an air date at the Show level like above, but the business logic finds all child elements and creates an entry for each one, resulting in:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 91 | 1 | 2015-01-01 |
| 92 | 2 | 2015-01-01 |
| 93 | 3 | 2015-01-01 |
| 94 | 4 | 2015-01-01 |
+-------+-----------+------------+
There are some pretty clear problems with this, but please note that my question is not how to fix this. Just, is there a specific term for it? I want to call it something like, "disregarding data relationship" or, "ignoring referential context". Maybe it's not strictly a database anti-pattern, since in my example there's an external actor inserting the excess rows.

Related

Postgres query chain structure data

Assuming there is a gigantic organization with a crazy way to manage. Each employee has one or multiple managers, managers are employees themselves who have one or multiple managers on top.
employee table
| id | name |managers_id|
| -------- | -------------- |-----------|
| 1 | Smith | 5,6 |
| 2 | Matt | 1 |
| 3 | Bob | 1,2 |
| 4 | Adam | 1,3 |
| 5 | Suzi | 6 |
| 6 | Emily | 23,25 |
| ... | ... | ... |
It is a one-way management chain, no loops, meaning it goes A-B-C-D, A-a-b-C-D etc, no such case as A-B-C-D-A
The query is to get the management chains, say C has two management chains on top:
A-B-C
A-a-b-C
C also has one chain below:
C-D
The level of C along the chains is not a matter.
In theory, there is no limitation on the number of levels, the chain can keep going indefinitely.
I was thinking about 'inheritance' but probably it is not the solution.
Any tips on how to design this postgres dababase, please? Thank you.

Traversing and Getting Nodes in Graph without Loop

I have a person table which keeps some personal info. like as table below.
+----+------+----------+----------+--------+
| ID | name | motherID | fatherID | sex |
+----+------+----------+----------+--------+
| 1 | A | NULL | NULL | male |
| 2 | B | NULL | NULL | female |
| 3 | C | 1 | 2 | male |
| 4 | X | NULL | NULL | male |
| 5 | Y | NULL | NULL | female |
| 6 | Z | 5 | 4 | female |
| 7 | T | NULL | NULL | female |
+----+------+----------+----------+--------+
Also I keep marriage relationships between people. Like:
+-----------+--------+
| HusbandID | WifeID |
+-----------+--------+
| 1 | 2 |
| 4 | 5 |
| 1 | 5 |
| 3 | 6 |
+-----------+--------+
With these information we can imagine the relationship graph. Like below;
Question is: How can I get all connected people by giving any of them's ID.
For example;
When I give ID=1, it should return to me 1,2,3,4,5,6.(order is not important)
Likewise When I give ID=6, it should return to me 1,2,3,4,5,6.(order is not important)
Likewise When I give ID=7, it should return to me 7.
Please attention : Person nodes' relationships (edges) may have loop anywhere of graph. Example above shows small part of my data. I mean; person and marriage table may consist thousands of rows and we do not know where loops may occur.
Smilar questions asked in :
PostgreSQL SQL query for traversing an entire undirected graph and returning all edges found
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=118319
But I can't code the working SQL. Thanks in advance. I am using SQL Server.
From SQL Server 2017 and Azure SQL DB you can use the new graph database capabilities and the new MATCH clause to answer queries like this, eg
SELECT FORMATMESSAGE ( 'Person %s (%i) has mother %s (%i) and father %s (%i).', person.userName, person.personId, mother.userName, mother.personId, father.userName, father.personId ) msg
FROM dbo.persons person, dbo.relationship hasMother, dbo.persons mother, dbo.relationship hasFather, dbo.persons father
WHERE hasMother.relationshipType = 'mother'
AND hasFather.relationshipType = 'father'
AND MATCH ( father-(hasFather)->person<-(hasMother)-mother );
My results:
Full script available here.
For your specific questions, the current release does not include transitive closure (the ability to loop through the graph n number of times) or polymorphism (find any node in the graph) and answering these queries may involve loops, recursive CTEs or temp tables. I have attempted this in my sample script and it works for your sample data but it's just an example - I'm not 100% it will work with other sample data.

Create/Update table in MS Access dynamically

EDIT:
Here's what I have: An Access database made up of 3 tables linked from SQL server. I need to create a new table in this database by querying the 3 source tables. Here are examples of the 3 tables I'm using:
PlanTable1
+------+------+------+------+---------+---------+
| Key1 | Key2 | Key3 | Key4 | PName | MainKey |
+------+------+------+------+---------+---------+
| 53 | 1 | 5 | -1 | Bikes | 536681 |
| 53 | 99 | -1 | -1 | Drinks | 536682 |
| 53 | 66 | 68 | -1 | Balls | 536683 |
+------+------+------+------+---------+---------+
SpTable
+----+---------+---------+
| ID | MainKey | SpName |
+----+---------+---------+
| 10 | 536681 | Wing1 |
| 11 | 536682 | Wing2 |
| 12 | 536683 | Wing3 |
+----+---------+---------+
LocTable
+-------+-------------+--------------+
| LocID | CenterState | CenterCity |
+--- ---+-------------+--------------+
| 10 | IN | Indianapolis |
| 11 | OH | Columbus |
| 12 | IL | Chicago |
+-------+-------------+--------------+
You can see the relationships between the tables. The NewMasterTable I need to create based off of these will look something like this:
NewMasterTable
+-------+--------+-------------+------+--------------+-------+-------+-------+
| LocID | PName | CenterState | Key4 | CenterCity | Wing1 | Wing2 | Wing3 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
| 10 | Bikes | IN | -1 | Indianapolis | 1 | 0 | 0 |
| 11 | Drinks | OH | -1 | Columbus | 0 | 1 | 0 |
| 12 | Balls | IL | -1 | Chicago | 0 | 0 | 1 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
The hard part for me is making this new table dynamic. In the future, rows may be added to the source tables. I need my NewMasterTable to reflect any changes/additions to the source. How do I go about building the NewMasterTable as described? Does this make any sort of sense?
Since an Access table is a necessary requirement, then probably the only way to go about it is to create a set of Update and Insert queries that are executed periodically. There is no built-in "dynamic" feature of Access that will monitor and update the table.
First, create the table. You could either 1) do this manually from scratch by defining the columns and constraints yourself, or 2) create a make-table query (i.e. SELECT... INTO) that generates most of the schema, then add any additional columns, edit necessary details and add appropriate indexes.
Define and save Update and Insert (and optional Delete) queries to keep the table synced. I'm not sharing actual code here, because that goes beyond your primary issue I think and requires specifics that you need to define. Due to some ambiguity with your key values (the field names and sample data still are not sufficient to reveal precise relationships and constraints), it is likely that you'll need multiple Update statements.
In particular, the "Wing" columns will likely require a transform statement.
You may not be able to update all columns appropriately using a single query. I recommend not trying to force such an "artificial" requirement. Multiple queries can actually be easier to understand and maintain.
In the event that you experience "query is not updateable" errors, you may need to define other "temporary" tables with appropriate indexes, into which you do initial inserts from the linked tables, then subsequent queries to update your master table from those.
Finally, and I think this is the key to solving your problem, you need to define some Access form (or other code) that periodically runs your set of "sync" queries. Access forms have a [Timer Interval] property and corresponding Timer event that fires periodically. Add VBA code in the Form_Timer sub that runs all your queries. I would suggest "wrapping" such VBA in a transaction and adding appropriate error handling and error logging, etc.

Valid Warehousing Strategy? Type6?

As a database user, I'm having problems interpreting the data in one of our tables at work. When I questioned the data team, the solution architects told me it was done this way on purpose because it is a "Type 6" table.
From my limited googling, I think a Type 6 should look like this:
+--------------+------------------+------------------+------------+------------+---------------------+
| Customer_Key | Customer_Attrib1 | Customer_Attrib2 | Start_Date | End_Date | Record Updated Date |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 1/1/2001 | 6/8/2004 | 6/9/2004 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 6/9/2004 | 4/11/2016 | 4/12/2016 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 4/12/2016 | 4/3/2017 | 4/4/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 4/4/2017 | 5/18/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 5/19/2017 | 12/31/9999 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
The activity in question is the Customer_Attrib1, how it changed from 1 to 2 on 5/18/2017.
I like this style because I can figure out what customer_attrib1 is at any point of time by using the start and end dates:
select customer_attrib1
from table
where customer_key=123
and '2017-03-01' between start_date and end_date;
However...
The table itself actually gets updated in arrears, to look like this:
+--------------+------------------+------------------+------------+------------+---------------------+
| Customer_Key | Customer_Attrib1 | Customer_Attrib2 | Start_Date | End_Date | Record Updated Date |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 1/1/2001 | 6/8/2004 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 6/9/2004 | 4/11/2016 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 4/12/2016 | 4/3/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 4/4/2017 | 5/18/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 5/19/2017 | 12/31/9999 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
Can you see how much trouble I have, if I want to go find what the customer_attrib1 was during March of 2016?
NOTE: There is a previous_customer_attrib1 column, but it also gets mass updated to the value of 1. I wanted to keep the table small enough to get the point across, which is why I didn't add it above.
The big question: Is this a valid warehousing strategy? Is this really what Type 6 is? Or is my solution architect wrong.
Follow up question: Would the answer be different if customer_attrib1 was a foreign key to another table?
Your first example looks like a plain ol Type 2 SCD. The second example looks like it is Type 1 (overwrite) on attribute 1 and Type 2 SCD on attribute 2.
Neither are a Type 6 as presented, which would be where you have a way to see both the history of the changes (in a Type 2 way, as per your first example) but also the current values, typically by holding a separate set of columns for the current values, or by linking to the current record. You mention the previous attrib 1 column, which is crucial to it being a Type 6. However you would not expect that to also be bulk updated, as otherwise you just get the one previous value, and you don't get to see any changes prior to that.
Different people refer to a Type 6 meaning different things. What you need in a type 6 is simply the value itself (which applies to the row at the time) and the current value (which is bulk updated when there is a change), along with (of course) the type 2 approach of creating new rows for each change.
To answer your question, yes, I can see how much trouble you have with the design that's been given to you. Its a valid strategy if and only if it meets the business requirements. These techniques are only there to help meet business needs.
If the attribute is a foreign key then it becomes a bit more tricky, and we'd need more info about how that foreign keyed table was tracking history to be able to answer whether that changes anything.

Ingredients for drink new table or string in column?

I am thinking about my application and I want to store data about drinks. Now I am thinking what is best that if I save ingredients just as nvarchar column in table with drink or if I create new table with ingredients and create relationship one to many? I want database to be just read-only and I want to have option filter by ingredients. So what´s best way for windows phone (for performance)? And if the new table would be the better choice, I should use EntitySet, EntityRef, am I right? And I would have for every ingredient new row in table? Let´s say I have 100drinks, in average that every drink has 4ingredients so I have in first table 100rows and in second cca 400rows? Thanks for help
Actually, both solutions proposed are wrong. A drink can have many ingredients and an ingredient can be used in many drinks. Hence, we have a many-to-many relationship here. The proper way to model this is the following (I'm adding data for it to be more understandable):
Ingredients (PK: Id)
+----+--------------------+
| Id | Name |
+----+--------------------+
| 1 | Water |
| 2 | Sugar |
| 3 | Coffe |
| 4 | Virgin Islands Tea |
| 5 | Ice |
+----+--------------------+
Drinks (PK: Id)
+----+-------------+
| Id | Name |
+----+-------------+
| 1 | Black Coffe |
| 2 | Tea |
| 3 | Ice Tea |
+----+-------------+
Drinks_Ingredients (PK: Drink_Id, Ingredient_Id)
+----------+---------------+------------+
| Drink_Id | Ingredient_Id | Proportion |
+----------+---------------+------------+
| 1 | 1 | 70 |
| 1 | 2 | 10 |
| 1 | 3 | 20 |
| 2 | 1 | 90 |
| 2 | 4 | 10 |
| 3 | 1 | 80 |
| 3 | 4 | 10 |
| 3 | 5 | 10 |
+----------+---------------+------------+
I'm adding this Proportion column to show you how to add data that is dependant on the pair of drink-ingredient. Now, if you're worried about the size of the tables it'll be quite small as the only tables that will have the more complex data types (varchars) will be the ingredients and drinks tables, which will have the minimum amount of records possible: one per each drink and one per each ingredient.
If you still have doubts keep looking at the example, you'll get it :)
I would do a table for ingredients with a description and an id, and store the ids in the drink table cause it's the elegant way to do it. For 100 drinks, you won't see a difference for the performance.

Resources