As a database user, I'm having problems interpreting the data in one of our tables at work. When I questioned the data team, the solution architects told me it was done this way on purpose because it is a "Type 6" table.
From my limited googling, I think a Type 6 should look like this:
+--------------+------------------+------------------+------------+------------+---------------------+
| Customer_Key | Customer_Attrib1 | Customer_Attrib2 | Start_Date | End_Date | Record Updated Date |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 1/1/2001 | 6/8/2004 | 6/9/2004 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 6/9/2004 | 4/11/2016 | 4/12/2016 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 1 | A | 4/12/2016 | 4/3/2017 | 4/4/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 4/4/2017 | 5/18/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 5/19/2017 | 12/31/9999 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
The activity in question is the Customer_Attrib1, how it changed from 1 to 2 on 5/18/2017.
I like this style because I can figure out what customer_attrib1 is at any point of time by using the start and end dates:
select customer_attrib1
from table
where customer_key=123
and '2017-03-01' between start_date and end_date;
However...
The table itself actually gets updated in arrears, to look like this:
+--------------+------------------+------------------+------------+------------+---------------------+
| Customer_Key | Customer_Attrib1 | Customer_Attrib2 | Start_Date | End_Date | Record Updated Date |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 1/1/2001 | 6/8/2004 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 6/9/2004 | 4/11/2016 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | A | 4/12/2016 | 4/3/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 4/4/2017 | 5/18/2017 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
| 123 | 2 | B | 5/19/2017 | 12/31/9999 | 5/19/2017 |
+--------------+------------------+------------------+------------+------------+---------------------+
Can you see how much trouble I have, if I want to go find what the customer_attrib1 was during March of 2016?
NOTE: There is a previous_customer_attrib1 column, but it also gets mass updated to the value of 1. I wanted to keep the table small enough to get the point across, which is why I didn't add it above.
The big question: Is this a valid warehousing strategy? Is this really what Type 6 is? Or is my solution architect wrong.
Follow up question: Would the answer be different if customer_attrib1 was a foreign key to another table?
Your first example looks like a plain ol Type 2 SCD. The second example looks like it is Type 1 (overwrite) on attribute 1 and Type 2 SCD on attribute 2.
Neither are a Type 6 as presented, which would be where you have a way to see both the history of the changes (in a Type 2 way, as per your first example) but also the current values, typically by holding a separate set of columns for the current values, or by linking to the current record. You mention the previous attrib 1 column, which is crucial to it being a Type 6. However you would not expect that to also be bulk updated, as otherwise you just get the one previous value, and you don't get to see any changes prior to that.
Different people refer to a Type 6 meaning different things. What you need in a type 6 is simply the value itself (which applies to the row at the time) and the current value (which is bulk updated when there is a change), along with (of course) the type 2 approach of creating new rows for each change.
To answer your question, yes, I can see how much trouble you have with the design that's been given to you. Its a valid strategy if and only if it meets the business requirements. These techniques are only there to help meet business needs.
If the attribute is a foreign key then it becomes a bit more tricky, and we'd need more info about how that foreign keyed table was tracking history to be able to answer whether that changes anything.
Related
I have a question related to a kind of duplication I see in databases from time to time. To ask this question, I need to set the stage a bit:
Let's say I have a database of TV shows. Its primary table Content stores information at various levels of granularity (Show -> Season -> Episode), using a parent column to denote hierarchy:
+----+---------------------------+-------------+----------+
| ID | ContentName | ContentType | ParentId |
+----+---------------------------+-------------+----------+
| 1 | Friends | Show | [null] |
| 2 | Season 1 | Season | 1 |
| 3 | The Pilot | Episode | 2 |
| 4 | The One with the Sonogram | Episode | 2 |
+----+---------------------------+-------------+----------+
Maybe this isn't ideal, but let's say it's good enough to work with and we're not looking to change it.
Now let's say we need to build a table that defines air dates. We can set these at any level, and they must apply down the hierarchy (e.g., if set at the Season level, it applies to all episodes within that season; if set at the Show level, it applies to all seasons and episodes).
So the original air dates might look like this:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 71 | 3 | 1994-09-22 |
| 72 | 4 | 1994-09-29 |
+-------+-----------+------------+
Whereas the air date for a streaming service might look like:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 91 | 1 | 2015-01-01 |
+-------+-----------+------------+
Cool. Everything's fine so far; we're adhering to 4NF (I think!) and we can proceed to our business logic.
Now we get to my question. If we implement our business logic in such a way that disregards the referential hierarchy, and instead duplicates the air dates down the hierarchy, what is this anti-pattern called? e.g., Let's say I set an air date at the Show level like above, but the business logic finds all child elements and creates an entry for each one, resulting in:
+-------+-----------+------------+
| airId | ContentId | AirDate |
+-------+-----------+------------+
| 91 | 1 | 2015-01-01 |
| 92 | 2 | 2015-01-01 |
| 93 | 3 | 2015-01-01 |
| 94 | 4 | 2015-01-01 |
+-------+-----------+------------+
There are some pretty clear problems with this, but please note that my question is not how to fix this. Just, is there a specific term for it? I want to call it something like, "disregarding data relationship" or, "ignoring referential context". Maybe it's not strictly a database anti-pattern, since in my example there's an external actor inserting the excess rows.
Let's say I have a report with following table in the body:
ID | Name | FirstName | GivenCity | RecordedCity | IsCorrectCity
1 | Gates | Bill | London | New York | No
2 | McCain | John | Brussels | Brussels | Yes
3 | Bullock | Lili | London | London | Yes
4 | Bravo | Johnny | Paris | Las Vegas | No
The column IsCorrectCity Basically includes an expression that checkes GivenCity and RecordedCity and returns a No if different or a Yes when equal.
Is it possible to add a report filter on the column IsCorrectCity (and how) so the users will be able to just select all records with No or Yes? I know this can be done with a parameter in the SQL query, but I would like to add it based on the expressions rather then adding more calculations and all to the query.
Here's a tutorial which explains how you can do it
Filtering Data Without Changing Dataset [SSRS]
I am trying to design a model for our future database of our toys and certain measurements that have to be done post-production. I have trouble grasping how to model this. I have tried multiple ways, but none of them seem optimal and in the end I've always lost out on the connectivity between entities.
What I need to achieve is some kind of meaningful relationship between the following:
A toy (with some trivial properties).
A series of toys (multiple toys can be related to one series and a toy can only belong to one series).
Measurement steps. There are currently 6 of these steps. Each step has its own input parameters and these vary in type as well as in number (eg. only 3 parameters for measurement step 1 and 10 parameters for measurement step 2).
With each series, a sequence of these measurement steps is defined. Duplicates of tests are allowed (eg. measurement step 1 > measurement step 4 > measurement step 1 is a valid sequence). The sequence along with the parameters must be stored somewhere for future reference.
Each toy goes through the sequence of measurements that is defined by its series. All of the results must be stored somewhere (for each individual toy).
If I split the measurement steps into their own tables I can't reference them conditionally (as foreign keys) to some other table.
If I try to serialize part of the data I lose the ability to make connections between individual measurement steps, measurement results (at least with queries) etc.
I know people here generally hate/don't answer these kinds of "discussion-like" questions, but I'd ask of you to at least point out what is a good practice in a system where I need to store this locally on a machine, but need a database to hold the data - to move towards serial-like data and just do general relationships where it is easy to do so or keep trying to normalize it as much as possible?
If measurements steps share most of attributes (or are of the same type, like what you called PARAMETERS), and I understood correctly your definitions, I would make something like this.
It could be a starting point.
+----------------------------+ +------------------------------+
| TOYS | | TOY_SERIES |
+-----+----------------------+ +---------+--------------------+
| PK | ID_TOY | +----------+ PK, FK1 | ID_S +--------+
| | | | +------------------------------+ |
| FK1 | ID_S +---------+ | | ... | |
+----------------------------+ | | | |
| | ... | | | | |
| | | | | | |
+-----+----------------------+ +---------+--------------------+ |
|
|
|
|
+------------------------------+ |
| BR_SER_MEAS | |
+---------+--------------------+ |
| PK, FK1 | ID_S +--------+
| | |
| PK, FK2 | ID_M +--------+
| | | |
| PK | ID_SEQ | |
| | | |
+---------+--------------------+ |
|
|
+------------------------------+ |
| MEASURE_STEPS | |
+------------------------------+ |
| PK ID_M +--------+
+------------------------------+
| PARAM_01 |
| ... |
| PARAM_10 |
| |
| |
+------------------------------+
EDIT:
Here's what I have: An Access database made up of 3 tables linked from SQL server. I need to create a new table in this database by querying the 3 source tables. Here are examples of the 3 tables I'm using:
PlanTable1
+------+------+------+------+---------+---------+
| Key1 | Key2 | Key3 | Key4 | PName | MainKey |
+------+------+------+------+---------+---------+
| 53 | 1 | 5 | -1 | Bikes | 536681 |
| 53 | 99 | -1 | -1 | Drinks | 536682 |
| 53 | 66 | 68 | -1 | Balls | 536683 |
+------+------+------+------+---------+---------+
SpTable
+----+---------+---------+
| ID | MainKey | SpName |
+----+---------+---------+
| 10 | 536681 | Wing1 |
| 11 | 536682 | Wing2 |
| 12 | 536683 | Wing3 |
+----+---------+---------+
LocTable
+-------+-------------+--------------+
| LocID | CenterState | CenterCity |
+--- ---+-------------+--------------+
| 10 | IN | Indianapolis |
| 11 | OH | Columbus |
| 12 | IL | Chicago |
+-------+-------------+--------------+
You can see the relationships between the tables. The NewMasterTable I need to create based off of these will look something like this:
NewMasterTable
+-------+--------+-------------+------+--------------+-------+-------+-------+
| LocID | PName | CenterState | Key4 | CenterCity | Wing1 | Wing2 | Wing3 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
| 10 | Bikes | IN | -1 | Indianapolis | 1 | 0 | 0 |
| 11 | Drinks | OH | -1 | Columbus | 0 | 1 | 0 |
| 12 | Balls | IL | -1 | Chicago | 0 | 0 | 1 |
+-------+--------+-------------+------+--------------+-------+-------+-------+
The hard part for me is making this new table dynamic. In the future, rows may be added to the source tables. I need my NewMasterTable to reflect any changes/additions to the source. How do I go about building the NewMasterTable as described? Does this make any sort of sense?
Since an Access table is a necessary requirement, then probably the only way to go about it is to create a set of Update and Insert queries that are executed periodically. There is no built-in "dynamic" feature of Access that will monitor and update the table.
First, create the table. You could either 1) do this manually from scratch by defining the columns and constraints yourself, or 2) create a make-table query (i.e. SELECT... INTO) that generates most of the schema, then add any additional columns, edit necessary details and add appropriate indexes.
Define and save Update and Insert (and optional Delete) queries to keep the table synced. I'm not sharing actual code here, because that goes beyond your primary issue I think and requires specifics that you need to define. Due to some ambiguity with your key values (the field names and sample data still are not sufficient to reveal precise relationships and constraints), it is likely that you'll need multiple Update statements.
In particular, the "Wing" columns will likely require a transform statement.
You may not be able to update all columns appropriately using a single query. I recommend not trying to force such an "artificial" requirement. Multiple queries can actually be easier to understand and maintain.
In the event that you experience "query is not updateable" errors, you may need to define other "temporary" tables with appropriate indexes, into which you do initial inserts from the linked tables, then subsequent queries to update your master table from those.
Finally, and I think this is the key to solving your problem, you need to define some Access form (or other code) that periodically runs your set of "sync" queries. Access forms have a [Timer Interval] property and corresponding Timer event that fires periodically. Add VBA code in the Form_Timer sub that runs all your queries. I would suggest "wrapping" such VBA in a transaction and adding appropriate error handling and error logging, etc.
I'm trying to make a database that will hold a table of objects, and these objects are comprised of objects from a second table. One table is a table of possible sets, and the second is a table of possible components. The table of sets has to include fields for each of its components, but each set has an unknown number of components. How do I make a table with fields (Component 1, Component 2, Component 3, ...) that are dependent on each set to decide how many of the fields it needs?
Is there a way to do this just using the Access interface or will I actually have to get into the code behind it?
I think it would also solve my problem if there were a way to make a field in a column that acted as an ArrayList so if anyone could think of how to do that please let me know.
Assuming that a component can be part of more than one set, what you need here is a many-to-many relationship.
In a database you don't do this with an arbitrary number of columns, you use a junction table.
When you need a tabular representation, you use a Pivot / Crosstab query.
Your data model could look like this:
Sets
+--------+----------+
| Set_ID | Set_Name |
+--------+----------+
| 1 | foo |
| 2 | bar |
+--------+----------+
Components
+--------------+----------------+
| Component_ID | Component_Name |
+--------------+----------------+
| 1 | aaa |
| 2 | bbb |
| 3 | ccc |
| 4 | ddd |
+--------------+----------------+
Junction table
+----------+----------------+
| f_Set_ID | f_Component_ID |
+----------+----------------+
| 1 | 2 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
+----------+----------------+
(f_ as in Foreign Key)