Database normalisation to 3NF

My original table, which is not normalised, looked like this: https://i.stack.imgur.com/NbKV4.png
After following the conditions of each normal form, I managed to separate the table into 3 forms, which look like this: https://i.stack.imgur.com/b414X.png https://i.stack.imgur.com/haz2Q.png https://i.stack.imgur.com/CpWXw.png
My aim is to get the database into 3NF. Is this the case?
If not, please give me some advice on any amendments needed. Thanks.

You went from
(product_id, product_name, product_cat, product_subcat, cost, stock)
to
products_details(product_id, product_name, stock)
products_categories(product_cat, product_subcat)
products_costs(cost_id, cost)
That's a good start, but in my opinion it is not enough, and it is not complete either. Here's a first solution…
product_details(product_id*, psub_id#, product_name, cost, stock)
product_categories(pcat_id*, pcat_name)
product_subcategories(psub_id*, pcat_id#, psub_name)
…where * indicates the primary key, and # the foreign key.
A second possible solution is
product_details(product_id*, pcat_id#, product_name, cost, stock)
product_categorisation(pcat_id*, pcat_parent#, pcat_name)
The latter is good only when not all categories have subcategories (not sure my writing is clear here, sorry).
Also note that I didn't put costs in a table of their own because I don't think they are that independent. However, I don't know the context you are modeling.
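As a rough sketch, the first solution could be written as SQLite DDL driven from Python. Only the table and column names come from the layout above; the column types, the NOT NULL/UNIQUE constraints and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Column types and constraints are assumed; the answer only lists column names.
SCHEMA = """
CREATE TABLE product_categories (
    pcat_id   INTEGER PRIMARY KEY,
    pcat_name TEXT NOT NULL UNIQUE
);
CREATE TABLE product_subcategories (
    psub_id   INTEGER PRIMARY KEY,
    pcat_id   INTEGER NOT NULL REFERENCES product_categories(pcat_id),
    psub_name TEXT NOT NULL
);
CREATE TABLE product_details (
    product_id   INTEGER PRIMARY KEY,
    psub_id      INTEGER NOT NULL REFERENCES product_subcategories(psub_id),
    product_name TEXT NOT NULL,
    cost         NUMERIC NOT NULL,
    stock        INTEGER NOT NULL
);
"""


def build_catalog():
    """Create the three tables in memory and load one made-up product."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO product_categories VALUES (1, 'Hardware')")
    conn.execute("INSERT INTO product_subcategories VALUES (10, 1, 'Screws')")
    conn.execute("INSERT INTO product_details VALUES (100, 10, 'M4 screw', 0.5, 2500)")
    return conn


def product_rows(conn):
    """Rebuild the original wide row by joining through the subcategory."""
    return conn.execute("""
        SELECT p.product_id, p.product_name, c.pcat_name, s.psub_name, p.cost, p.stock
        FROM product_details AS p
        JOIN product_subcategories AS s ON s.psub_id = p.psub_id
        JOIN product_categories AS c ON c.pcat_id = s.pcat_id
        ORDER BY p.product_id
    """).fetchall()


catalog = build_catalog()
rows = product_rows(catalog)
```

The category name is stored once and reached through the subcategory, so a product can never carry a category that disagrees with its subcategory.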

Related

Laravel (and not only) Many-to-many relationship Belongs-To-All

I have two tables. 'Products' and 'Discounts'.
Then I create a joining table 'discount_product' for Many-to-many relationship. So far so good.
Now if I want a discount to belong to ALL of the products, I have to insert as many rows into the joining table as I have products. Does that mean that with 10000+ products I'll have to insert 10000+ rows into the joining table for one discount? And that's only for one discount! What if I have 1000?
That's pushing me back to the old (wrong) way of doing it, where I just have a column 'product_ids' in the 'Discounts' table with something like '1|2|4|7|23|...' (or '*' for 'belongs to all') and a small piece of PHP code that checks whether a discount belongs to all or only some products. I know it's the wrong way of doing it. So is there a better way to do this properly?
Structure:
**products**
id
description
price
**discounts**
id
procent
value
**discount_product**
product_id
discount_id

I would propose changing some of the business logic.
If a discount has no rows in discount_product, it applies to all products.
If a discount has rows in discount_product, it applies only to the products listed there.
If you need to make sure that a discount is not applied to any product, add an is_active field to discounts.
These are just my thoughts.
I believe it is sometimes useful to denormalize a database for the sake of optimization, and I would do as you suggested with the product_ids field.
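A minimal sketch of the "no rows means all products" rule, using SQLite from Python rather than Laravel. The table and column names follow the structure in the question plus the is_active field; the column types, the sample data and the helper name discounts_for are assumptions.

```python
import sqlite3


def make_shop():
    """Build the three tables in memory with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT, price NUMERIC);
        CREATE TABLE discounts (
            id        INTEGER PRIMARY KEY,
            procent   NUMERIC,
            value     NUMERIC,
            is_active INTEGER NOT NULL DEFAULT 1
        );
        CREATE TABLE discount_product (
            product_id  INTEGER NOT NULL REFERENCES products(id),
            discount_id INTEGER NOT NULL REFERENCES discounts(id),
            PRIMARY KEY (product_id, discount_id)
        );
        INSERT INTO products (id, description, price)
            VALUES (1, 'pen', 2), (2, 'ink', 5), (3, 'pad', 3);
        INSERT INTO discounts (id, procent, value, is_active)
            VALUES (1, 10, NULL, 1), (2, 25, NULL, 1), (3, 50, NULL, 0);
        INSERT INTO discount_product (product_id, discount_id) VALUES (2, 2);
    """)
    return conn


def discounts_for(conn, product_id):
    """Ids of active discounts that apply to one product.

    A discount applies when it has no rows at all in discount_product
    (meaning "all products") or when it has a row for this product.
    """
    rows = conn.execute("""
        SELECT d.id
        FROM discounts AS d
        WHERE d.is_active = 1
          AND (
                NOT EXISTS (SELECT 1 FROM discount_product AS dp
                            WHERE dp.discount_id = d.id)
             OR EXISTS (SELECT 1 FROM discount_product AS dp
                        WHERE dp.discount_id = d.id AND dp.product_id = ?)
          )
        ORDER BY d.id
    """, (product_id,)).fetchall()
    return [r[0] for r in rows]


shop = make_shop()
```

Here discount 1 has no rows in discount_product and so applies everywhere, discount 2 is tied to product 2, and discount 3 is switched off through is_active. One thing to watch with this rule: deleting the last row of a restricted discount silently turns it into an "all products" discount.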

How to model a 'history' table from a join table

I have a need to track some history for a table that contains ids from other tables:
I want to track the status of the company_device table such that I can make entries to know when the status of the relationship changed (when a device was assigned to a company, and when it was unassigned, etc). The company_device table would only contain current, existing relationships. So I'd like to do 'something' like this:
But this won't work, because it requires there to be a record in company_device for the FK to be satisfied in the company_device_history table. For example, if I
insert into company_device values (1,1);
insert into company_device_history values (1,1,'Assigned',now());
Then I can't ever remove the record from company_device because of the foreign key constraint. So I've currently settled on this:
so I'm not restricted by the foreign key.
My question is: is there a better way to model this? I could add the status and effective_date to the company_device table and query based on status or effective_date, but that doesn't seem to be a good solution to me. I'd like to know how others might approach this.

When looking exclusively at the problem (that is, when modeling the nature of the business problem at hand), the only thing you need is one single table COMPANY_DEVICE_ASSIGNMENT with four columns C_ID, D_ID, FROM and TO, telling you that "device D_ID was assigned to company C_ID from FROM to TO".
Systems do exist that allow you to work on this basis, however none of them speak SQL (an excellent book with an in-depth treatment of the subject matter of temporal data, I'd even call it the "canonical" work, is "Time and Relational Theory"; Googling for it can't miss). If you do this in an SQL system, this "straightforward" approach is not going to get you very far. In that case, your practical options are limited by:
what temporal features are offered by the DBMS you want/can/must use
what features are supported by whatever modeling tool you want/can/must use to generate DDL.
As Neil's comment stated: the most recent version of the SQL standard had "temporal support" as its main novelty, and those features are absolutely relevant to your problem, but the products actually offering them are relatively few and far between.
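For illustration only, here is roughly what the single-table idea looks like in a plain SQL DBMS without temporal features (SQLite, driven from Python). The columns are named valid_from and valid_to because FROM and TO are reserved words in SQL; the types, the use of NULL for "still assigned" and the helper functions are assumptions. Nothing in this sketch prevents overlapping periods for the same device, which is part of why the approach only goes so far in SQL.

```python
import sqlite3


def make_db():
    """One table holding both current and past assignments."""
    conn = sqlite3.connect(":memory:")
    # valid_from / valid_to hold ISO-8601 text; valid_to is NULL while open.
    conn.execute("""
        CREATE TABLE company_device_assignment (
            c_id       INTEGER NOT NULL,
            d_id       INTEGER NOT NULL,
            valid_from TEXT NOT NULL,
            valid_to   TEXT,
            PRIMARY KEY (d_id, valid_from),
            CHECK (valid_to IS NULL OR valid_to > valid_from)
        )
    """)
    return conn


def assign(conn, c_id, d_id, at):
    """Open a new assignment period."""
    conn.execute(
        "INSERT INTO company_device_assignment (c_id, d_id, valid_from) VALUES (?, ?, ?)",
        (c_id, d_id, at),
    )


def unassign(conn, c_id, d_id, at):
    """Close the open period instead of deleting the row."""
    conn.execute(
        "UPDATE company_device_assignment SET valid_to = ? "
        "WHERE c_id = ? AND d_id = ? AND valid_to IS NULL",
        (at, c_id, d_id),
    )


def current_assignments(conn):
    """What the company_device table used to hold: open periods only."""
    return conn.execute(
        "SELECT c_id, d_id FROM company_device_assignment "
        "WHERE valid_to IS NULL ORDER BY c_id, d_id"
    ).fetchall()


def history(conn, d_id):
    """Every period recorded for one device, oldest first."""
    return conn.execute(
        "SELECT c_id, valid_from, valid_to FROM company_device_assignment "
        "WHERE d_id = ? ORDER BY valid_from",
        (d_id,),
    ).fetchall()


db = make_db()
assign(db, 1, 1, "2024-01-01T09:00:00")
unassign(db, 1, 1, "2024-02-01T09:00:00")
assign(db, 2, 1, "2024-02-01T09:00:00")
```

With this layout there is no separate history table and no foreign key from history back to a row that may be deleted; the current state is simply the rows whose valid_to is NULL.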

Database Design: Should I create one table or two for this scenario?

The scenario is Time Cards. Employees clock in and clock out on a TimeCardHeader table, but enter Details in a TimeCardDetail table. However, they can enter at least two different kinds of details... and this is my question. Do I create two tables representing each kind, or one table with a Boolean flag that interprets the meaning of the table?
Here are the fields (this example is small, others have many fields):
Id (PK)
Version
StartTime
EndTime
LaborDetailDescription
LaborType: Can be direct or indirect.
If LaborType is Indirect the remaining fields are these:
IndirectNumber (FK)
If LaborType is Direct the remaining fields are these:
JobNumber (FK)
JobType
DirectType: Can be Production or Setup
If DirectType is Production the remaining fields are these:
GoodQty
ScrapQty
If DirectType is Setup the remaining fields are these:
SetupPercent
So... Do I create one table with all of those fields, but when a type is set some fields are blank (which means code, reporting, queries, etc, will need to be interpreting the database), or do I create two tables DirectLaborDetail and IndirectLaborDetail and store the data neatly into the appropriate table? In this case, even DirectLabor is broken into DirectLaborSetup and DirectLaborProduction.
I am asking this question along a number of dimensions:
Theoretical purity according to database design principles.
Performance Issues.
Difficulty in Query creation (this would also include coding against it).
Any other consideration I may not have listed here.
EDIT: More detail added...
Option 1
/*I intentionally left out the type information*/
CREATE TABLE TimeCardDetail
(
Id,
Version,
TimeCardHeaderId, /*Not depicted here, FK*/
StartTime,
EndTime,
LaborDetailDescription,
LaborType, /*FK*/
IndirectId, /*FK*/
JobId, /*FK*/
DirectType, /*FK*/
GoodQty,
ScrapQty,
SetupPercent
);
Option 2
CREATE TABLE TimeCardDetail
(
Id,
Version,
StartTime,
EndTime,
LaborDetailDescription
);
CREATE TABLE DirectLaborDetail
(
Id,
Version,
TimeCardHeaderId, /*Not depicted here, FK*/
JobId, /*FK*/
DirectType, /*FK*/
GoodQty,
ScrapQty,
SetupPercent,
TimeCardDetailId /*FK*/
);
CREATE TABLE IndirectLaborDetail
(
Id,
Version,
TimeCardHeaderId, /*Not depicted here, FK*/
IndirectId, /*FK*/
TimeCardDetailId /*FK*/
);
I prefer this as a human being because I can see clearly the business meaning of the data, and yet at the same time, everything is cleanly in its place, no interpretation required. Queries become a bit more interesting because if I want to see all the detail for a specific TimeCardHeader, I need to look at two tables. But is that really a problem with today's computing power?
Option 3
Like Option 2 except we reverse the relationship...
CREATE TABLE TimeCardDetail
(
Id,
Version,
TimeCardHeaderId, /*Not depicted here, FK*/
StartTime,
EndTime,
Description,
LaborType, /*FK*/
FKId /*would link to the DirectLaborDetail or IndirectLaborDetail depending on LaborType*/
);
I don't like this option because the meaning of FKId depends on LaborType.

I would go for a single table with all the columns, some of which will be filled in and others left empty when not required. This solution will make your life easier.
The two-table solution is a good choice only if you think you will always query details of different LaborTypes separately, but even then you have to decide whether the gain in performance (two smaller tables are easier for the database to handle) is worth the loss in development effort (inserting into two tables, querying two tables, etc.).
About your points:
Theoretical purity. Not sure such a thing exists, but both approaches are theoretically valid. Practice will tell you which is best for your case.
Performance. Two tables will be smaller and faster to query, but then you have to maintain more code. Until you have millions or billions of rows, I wouldn't worry too much about performance. A single table can give you performance issues, but indexes, partitions and caches will help you anyway.
Difficulty of query creation. My suggestion is a table like this:
Id (PK)
Version
StartTime
EndTime
LaborDetailDescription
LaborType (FK)
IndirectNumber (FK)
JobNumber (FK)
JobType
DirectType (FK)
GoodQty
ScrapQty
SetupPercent
Use FKs for LaborType and DirectType as well, pointing to two small lookup tables, so that you store only LaborType_id and DirectType_id in your table. For the foreign keys that do not apply to a row (for example, there is no IndirectNumber for a Direct LaborType), just create a dummy record to maintain referential integrity. I think maintaining a table like this should be pretty simple; you will just need a couple of joins for the FKs.
Any other consideration. Maybe, but I think this is enough to start with for now.
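A minimal sketch of the single-table layout, in SQLite driven from Python. It differs from the suggestion above in one respect: fields that do not apply are left NULL rather than pointing at a dummy record. The CHECK constraint, the column types, the lookup ids and the sample rows are all assumptions; Version, JobType and the tables that IndirectNumber and JobNumber would reference are left out because the question does not describe them.

```python
import sqlite3

# Lookup ids are assumed: LaborType 1 = Indirect, 2 = Direct.
# The CHECK keeps each row consistent with its labor type, so readers of
# the table do not have to guess which columns are meaningful.
DDL = """
CREATE TABLE LaborType  (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE DirectType (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE TimeCardDetail (
    Id                     INTEGER PRIMARY KEY,
    StartTime              TEXT NOT NULL,
    EndTime                TEXT NOT NULL,
    LaborDetailDescription TEXT,
    LaborTypeId            INTEGER NOT NULL REFERENCES LaborType(Id),
    IndirectNumber         INTEGER,
    JobNumber              INTEGER,
    DirectTypeId           INTEGER REFERENCES DirectType(Id),
    GoodQty                INTEGER,
    ScrapQty               INTEGER,
    SetupPercent           REAL,
    CHECK (
        (LaborTypeId = 1 AND IndirectNumber IS NOT NULL
                         AND JobNumber IS NULL AND DirectTypeId IS NULL)
     OR (LaborTypeId = 2 AND IndirectNumber IS NULL
                         AND JobNumber IS NOT NULL AND DirectTypeId IS NOT NULL)
    )
);
INSERT INTO LaborType  VALUES (1, 'Indirect'), (2, 'Direct');
INSERT INTO DirectType VALUES (1, 'Production'), (2, 'Setup');
"""

tc = sqlite3.connect(":memory:")
tc.execute("PRAGMA foreign_keys = ON")
tc.executescript(DDL)
tc.execute(
    "INSERT INTO TimeCardDetail (Id, StartTime, EndTime, LaborTypeId, IndirectNumber) "
    "VALUES (1, '08:00', '09:00', 1, 7)"
)
tc.execute(
    "INSERT INTO TimeCardDetail (Id, StartTime, EndTime, LaborTypeId, JobNumber, "
    "DirectTypeId, GoodQty, ScrapQty) VALUES (2, '09:00', '12:00', 2, 501, 1, 40, 2)"
)
tc.execute(
    "INSERT INTO TimeCardDetail (Id, StartTime, EndTime, LaborTypeId, JobNumber, "
    "DirectTypeId, SetupPercent) VALUES (3, '12:00', '13:00', 2, 502, 2, 25)"
)


def labor_summary(conn):
    """All detail rows in one query: a join per lookup table, no UNION."""
    return conn.execute("""
        SELECT d.Id, lt.Name, dt.Name, d.GoodQty, d.SetupPercent
        FROM TimeCardDetail AS d
        JOIN LaborType AS lt ON lt.Id = d.LaborTypeId
        LEFT JOIN DirectType AS dt ON dt.Id = d.DirectTypeId
        ORDER BY d.Id
    """).fetchall()


summary = labor_summary(tc)
```

The LEFT JOIN is needed because indirect rows have no DirectType. If dummy lookup records were used instead of NULLs, plain inner joins would do.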

Snowflake schema: fact table with foreign key to a sub dimension?

Using the snowflake schema image from wikipedia:
http://en.wikipedia.org/wiki/File:Snowflake-schema-example.png
Would it ever make sense to have a "Brand_Id" foreign key in Fact_Sales as you do in Dim_Product? There is a many-to-one relationship of sales/brands just like sales/products or products/brands, so is there any logical reason not to? You may want to join directly to the Dim_Brand table.
I'm probably not seeing something obvious.

The type of relationship you're looking at is a has-a relationship.
A product has a brand. A sale has a product; it's the thing that was sold. But a sale does not have a brand. Or, a better way of saying this, you cannot sell a brand. (don't read too far into that one...)
So, no, you wouldn't want to add brand to sales.

If you are working in a dimensional model (the Star/Snowflake schema note in your question makes me think you are), then adding the BRAND_ID to the sales fact makes sense from a performance perspective, if the questions that the business is trying to answer are "what were the sales for brand X across all products in this time frame".
It also may be useful if the product dimension is a Type 1 SCD, and a product changes brands. You may want to preserve the prior sales as being of the "old" brand.
Keep in mind you are not doing entity-relationship modeling when you build a star/snowflake reporting schema. Questions of is-a or has-a aren't pertinent to a dimensional model.
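A small sketch of the Type 1 SCD point, using SQLite from Python. The table names follow the Wikipedia diagram; the reduced column set, the brand names and the numbers are made up for illustration.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
    CREATE TABLE Dim_Brand (Id INTEGER PRIMARY KEY, Brand TEXT NOT NULL);
    CREATE TABLE Dim_Product (
        Id           INTEGER PRIMARY KEY,
        Product_Name TEXT NOT NULL,
        Brand_Id     INTEGER NOT NULL REFERENCES Dim_Brand(Id)
    );
    CREATE TABLE Fact_Sales (
        Product_Id INTEGER NOT NULL REFERENCES Dim_Product(Id),
        Brand_Id   INTEGER NOT NULL REFERENCES Dim_Brand(Id),
        Units_Sold INTEGER NOT NULL
    );
    INSERT INTO Dim_Brand VALUES (1, 'Acme'), (2, 'Zenith');
    INSERT INTO Dim_Product VALUES (10, 'Widget', 1);
    INSERT INTO Fact_Sales VALUES (10, 1, 5);
    UPDATE Dim_Product SET Brand_Id = 2 WHERE Id = 10;
    INSERT INTO Fact_Sales VALUES (10, 2, 3);
""")
# The UPDATE above is the Type 1 change: the product's brand is overwritten
# in place, so the dimension no longer remembers that it was ever 'Acme'.


def units_by_brand_via_fact(conn):
    """Brand totals using the brand recorded on each sale."""
    return conn.execute("""
        SELECT b.Brand, SUM(f.Units_Sold)
        FROM Fact_Sales AS f
        JOIN Dim_Brand AS b ON b.Id = f.Brand_Id
        GROUP BY b.Brand
        ORDER BY b.Brand
    """).fetchall()


def units_by_brand_via_product(conn):
    """Brand totals using the product's current brand (snowflake path)."""
    return conn.execute("""
        SELECT b.Brand, SUM(f.Units_Sold)
        FROM Fact_Sales AS f
        JOIN Dim_Product AS p ON p.Id = f.Product_Id
        JOIN Dim_Brand AS b ON b.Id = p.Brand_Id
        GROUP BY b.Brand
        ORDER BY b.Brand
    """).fetchall()


via_fact = units_by_brand_via_fact(star)
via_product = units_by_brand_via_product(star)
```

Going through the fact table's own Brand_Id keeps the five units under the old brand, while going through Dim_Product attributes all eight units to the new one. The first query also needs one join fewer, which is the performance argument above.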

I think that would be nice as a way to cache the data... but in all honesty, you're probably better off just relying on the links as they are.
The reasoning is that the table already has a defined purpose: storing sales. Adding the brand of the products that the store sold is going to muddy the 'topic' or 'theme' of that table, which is recording the sales of a store.
Now, if you somehow had a product that could be sold under different brands (heck if I know how a package can have split personalities...), then yeah, it would make sense to a degree, but a more reasonable solution then is to give each product its own SKU code.

Should I separate redundant data in my database?

I have a database app that stores prices for things in different places. Each price has the following data associated with it:
price
date
product ID
country
price type (factory/wholesale/retail)
The last three items (pID, country, pricetype) can be thought of as one composite item describing the purpose of the price; there is a lot of redundancy in this data. So I'm thinking: separate those out into their own table to save space and simplify queries.
Normal:
Prices (price_id, price, date, product_id, country_id, pricetype_id)
vs:
Prices (price_id, price, date, descriptor_id)
Descriptors (descriptor_id, product_id, country_id, pricetype_id)
Is this worth the added programming effort required? Will it be more or less extensible/maintainable in the long run?

Is this worth the added programming effort required?
Yes
Will it be more or less extensible/maintainable in the long run?
More extensible and easier to maintain.
In general
You should always normalize to at least 3NF.
See this article: http://databases.about.com/od/specificproducts/a/normalization.htm

It depends on the amount of data you are expecting in that table. If you have no performance/storage problems, you don't need separate tables (for performance reasons).
On the other hand, you will get all disadvantages that come with redundancy. You have to check your data for inconsistencies etc.
But: Regardless of the design you choose, there's still time to change the road you're on.
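For reference, the Prices/Descriptors layout from the question could look roughly like this in SQLite, driven from Python. The column types, the UNIQUE constraint and the helper names descriptor_id_for and add_price are assumptions; the sample values are made up.

```python
import sqlite3

prices_db = sqlite3.connect(":memory:")
prices_db.execute("PRAGMA foreign_keys = ON")
prices_db.executescript("""
    CREATE TABLE Descriptors (
        descriptor_id INTEGER PRIMARY KEY,
        product_id    INTEGER NOT NULL,
        country_id    INTEGER NOT NULL,
        pricetype_id  INTEGER NOT NULL,
        UNIQUE (product_id, country_id, pricetype_id)
    );
    CREATE TABLE Prices (
        price_id      INTEGER PRIMARY KEY,
        price         NUMERIC NOT NULL,
        date          TEXT NOT NULL,
        descriptor_id INTEGER NOT NULL REFERENCES Descriptors(descriptor_id)
    );
""")


def descriptor_id_for(conn, product_id, country_id, pricetype_id):
    """Return the id for this combination, creating the row on first use."""
    key = (product_id, country_id, pricetype_id)
    # The UNIQUE constraint makes the insert a no-op when the row exists.
    conn.execute(
        "INSERT OR IGNORE INTO Descriptors (product_id, country_id, pricetype_id) "
        "VALUES (?, ?, ?)",
        key,
    )
    return conn.execute(
        "SELECT descriptor_id FROM Descriptors "
        "WHERE product_id = ? AND country_id = ? AND pricetype_id = ?",
        key,
    ).fetchone()[0]


def add_price(conn, price, date, product_id, country_id, pricetype_id):
    """Store one price observation and return the descriptor it points at."""
    did = descriptor_id_for(conn, product_id, country_id, pricetype_id)
    conn.execute(
        "INSERT INTO Prices (price, date, descriptor_id) VALUES (?, ?, ?)",
        (price, date, did),
    )
    return did


first = add_price(prices_db, 9.5, "2024-01-01", 1, 44, 3)
second = add_price(prices_db, 9.75, "2024-02-01", 1, 44, 3)
descriptor_count = prices_db.execute("SELECT COUNT(*) FROM Descriptors").fetchone()[0]
price_count = prices_db.execute("SELECT COUNT(*) FROM Prices").fetchone()[0]
```

Both prices share a single Descriptors row, which is the space saving the question is after. The cost is the extra lookup on every insert and an extra join on every read.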