I've been given an assignment for college, and one of the questions is to describe the importance of 3NF.
I understand that normalisation is meant to eliminate data redundancy.
Any help or resources would be appreciated.
Normalization is an important part of database design; it was defined in the 1970s by E. F. Codd. As you already know, it reduces the duplication of data in a table (relation) while preserving referential integrity: the information is the same, but presumably better organized. It has a cost, though: more tables, linked by foreign key relationships, which usually adds a layer of abstraction to the database.
Since you asked specifically about 3NF, the following must hold for a relation R (table):
R is in second normal form (2NF).
Every non-prime attribute of R is non-transitively dependent on every candidate key of R.
So the table must first satisfy First Normal Form (1NF) and Second Normal Form (2NF) before it can be in 3NF.
Informally, every non-key attribute should depend on "the key, the whole key, and nothing but the key". If the contents of a group of fields apply to more than one primary key value, so that the same values repeat across rows, that group should be put in another table.
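The 3NF condition can also be checked mechanically. Below is a minimal Python sketch, assuming functional dependencies (FDs) are written as (left-hand side, right-hand side) pairs of attribute sets. It uses the equivalent textbook formulation of 3NF: for every FD X -> A, either X is a superkey or A is prime (belongs to some candidate key). The employee schema and its FDs are hypothetical, chosen only to show one transitive dependency.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure X+: everything X determines under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force, small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


def is_3nf(schema, fds):
    """True if every FD X -> A has X as a superkey or A as a prime attribute."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(schema)
        for attr in rhs - lhs:  # trivial parts of an FD can never violate 3NF
            if not is_superkey and attr not in prime:
                return False
    return True


# Hypothetical schema: emp_id -> dept_id -> dept_name is a transitive chain.
employee = {"emp_id", "dept_id", "dept_name"}
employee_fds = [fd("emp_id", "dept_id"), fd("dept_id", "dept_name")]
print(is_3nf(employee, employee_fds))  # False

# Splitting off the department removes the transitive dependency.
print(is_3nf({"emp_id", "dept_id"}, [fd("emp_id", "dept_id")]))  # True
print(is_3nf({"dept_id", "dept_name"}, [fd("dept_id", "dept_name")]))  # True
```

The brute-force key search is exponential in the number of attributes, which is fine for classroom-sized tables but not for large schemas.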
For example, if you have an employee table that stores each employee's hometown, you may have many rows that repeat the hometown New York. The hometown can be moved to a separate Hometown table with a primary key and a column such as Name, related to the employee (through an EmployeeHometown table), where New York is listed only once (no duplicates).
It is much easier to check 5 hometowns in a separate table than to go through 100 employees and collect their hometowns.
I am trying to normalize the following table from UNF to 3NF. What do you do at the 1NF stage? My material says it's where you remove repeating columns or groups (e.g. ManagerID, ManagerName), which are considered repetitive because they lead to the same data.
The unnormalized table has the following columns:
CustomerRental(CustNo,CustName,PropNo,PAddress,RentStart,RentFinish,Rent,OwnerNo,OName)
There are no repeating columns/fields and each cell holds a single value, but there is no primary key. The functional dependencies I see in the table are:
{CustNo}->{Cname}
{PropNo}->{Paddress,RentStart,RentFinish,Rent,OwnerNo,Oname}
{CustNo,PropNo}->{Paddress,RentStart,RentFinish,Rent,OwnerNo,OName,CustName}
{OwnerNo,PropNo}->{Rent,Paddress,Oname,RentInfo}
The primary key I picked was the composite key CustNo + PropNo. Since the table has a primary key, it is in 1NF, correct? That is what I thought, but the model answer takes CustNo and CustName out of this table and puts them in their own table.
From there, I normalized it to 2NF. At this stage, you are supposed to ensure that all non-prime attributes are fully dependent on the primary key. That is not the case here. These are the functional dependencies in the table:
{OwnerNo}->{Oname}
{CustNo}->{CustName}
{PropNo}->{Paddress,Rent,OwnerNo,Oname}
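To see why the composite key CustNo + PropNo leaves the table short of 2NF, here is a small Python sketch that computes the candidate keys from the dependencies listed above and reports every non-prime attribute determined by only part of a key. Since the list does not say what RentStart and RentFinish depend on, the sketch assumes they depend on the CustNo + PropNo pair; change `rental_fds` if your FDs differ.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(key <= attrs for key in keys) and closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


def partial_dependencies(schema, fds):
    """Map each proper subset of a candidate key to the non-prime attributes it determines."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    found = {}
    for key in keys:
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                dependents = closure(subset, fds) - prime
                if dependents:
                    found[subset] = dependents
    return found


rental = {"CustNo", "CustName", "PropNo", "PAddress", "RentStart",
          "RentFinish", "Rent", "OwnerNo", "OName"}
rental_fds = [
    fd("CustNo", "CustName"),
    fd("PropNo", "PAddress Rent OwnerNo OName"),
    fd("OwnerNo", "OName"),
    fd("CustNo PropNo", "RentStart RentFinish"),  # assumption, not in the list above
]

print(candidate_keys(rental, rental_fds))
for part, attrs in partial_dependencies(rental, rental_fds).items():
    print(part, "->", sorted(attrs))
```

Each reported group is a partial dependency: CustName depends on CustNo alone and the property details depend on PropNo alone, which is why those attributes end up in their own tables.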
I moved these attributes out of the table to create three new tables in 2NF:
Customers(CustNo(PK),CustName)
Property(PropNo(PK),Paddress,City,Rent,OwnerNo,OwnerName)
Rentals(RentalNo(PK),CustNo,OwnerNo,PropNo,RentStart,RentFinish)
Now the main table, Rentals, is in 2NF. It has a primary key, RentalNo, and each of the non-prime attributes depends on it.
I think it still has a transitive dependency: OwnerNo can be found through PropNo. So, to comply with 3NF, OwnerNo has to move to its own table, giving these tables:
Customers(CustNo,CustName)
Property(PropNo,Paddress,City,Rent)
Owners(OwnerNo,OwnerName)
Rentals(RentalNo,CustNo,PropNo,RentStart,RentFinish)
Is this correct? I read that at the 1NF stage you are supposed to remove repeating columns (e.g. OwnerNo, OwnerName). Is this true? Why or why not?
My tables are shown in the attached picture, "Normalized Tables".
We don't normalize to a NF (normal form) by going through the lower NFs between it and 1NF. We use a proven algorithm for the NF we want. Find one in a published academic textbook. (If that doesn't describe the reference(s) you were told to use, find one that does and quote it.)
Pay close attention to the terms and steps. Details matter. E.g. you will need to know all the FDs (functional dependencies) that hold, not just some of them. E.g. whenever some FDs hold, all the ones generated by Armstrong's axioms hold. E.g. PKs (primary keys) are irrelevant; CKs (candidate keys) matter. E.g. every table has a CK. E.g. normalization to higher NFs does not change column names. So your question already does not reflect a correct process.
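As one example of such a textbook algorithm, here is a minimal Python sketch of the standard 3NF synthesis algorithm (commonly attributed to Bernstein): compute a minimal cover of the FDs, make one relation per left-hand side, add a candidate key if no relation contains one, and drop relations contained in another. It starts from the FDs rather than from a chosen PK, and it never renames columns. The CustomerRental FDs come from the previous question, with the added assumption that RentStart and RentFinish depend on CustNo + PropNo.

```python
def fd(lhs, rhs):
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Single-attribute right sides, no extraneous left-side attributes, no redundant FDs."""
    split = [(lhs, frozenset([a])) for lhs, rhs in fds for a in rhs]
    reduced = []
    for lhs, rhs in split:
        lhs = set(lhs)
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, split):
                lhs.discard(attr)  # attr is extraneous on the left-hand side
        reduced.append((frozenset(lhs), rhs))
    cover = list(dict.fromkeys(reduced))  # drop duplicates, keep order
    for dep in list(cover):
        others = [g for g in cover if g != dep]
        if dep[1] <= closure(dep[0], others):
            cover = others  # dep is implied by the remaining FDs
    return cover


def find_key(schema, fds):
    """Shrink the full schema down to one candidate key."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) == set(schema):
            key.discard(attr)
    return key


def synthesize_3nf(schema, fds):
    """Lossless, dependency-preserving decomposition into 3NF relations."""
    cover = minimal_cover(fds)
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    relations = list(groups.values())
    if not any(closure(rel, cover) == set(schema) for rel in relations):
        relations.append(find_key(schema, cover))
    # Drop any relation that is contained in (or duplicates) another one.
    return [rel for i, rel in enumerate(relations)
            if not any(rel < other for other in relations) and rel not in relations[:i]]


rental = {"CustNo", "CustName", "PropNo", "PAddress", "RentStart",
          "RentFinish", "Rent", "OwnerNo", "OName"}
rental_fds = [
    fd("CustNo", "CustName"),
    fd("PropNo", "PAddress Rent OwnerNo OName"),
    fd("OwnerNo", "OName"),
    fd("CustNo PropNo", "RentStart RentFinish"),  # assumption
]
for rel in synthesize_3nf(rental, rental_fds):
    print(sorted(rel))
```

With these FDs the sketch yields four relations: customers, properties (which keep OwnerNo as a reference to the owner), owners, and rentals keyed by CustNo + PropNo. The redundant FD PropNo -> OName is removed by the minimal cover because it follows from PropNo -> OwnerNo and OwnerNo -> OName.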
You really need to read and quote the reference(s) you were told to use to find out what they mean by "1NF", because "1NF" is in the eye of the beholder. Normalization to higher NFs works on any relation.
I'd like to make these tables match Inmon's 3NF approach (I'm working on the Northwind database):
Before
I noticed that the address fields keep repeating and aren't even really atomic, so I decided to put another table called "Address" in the diagram, like this:
After
Is this a valid approach?
Since the Address table stores all the addresses anyway, can I share it across all the other tables?
Thanks
Your solution is valid and is definitely more normalized.
However, it is not in 3NF yet. Strictly speaking, for a table to be in 3NF, you must not have any non-key inter-dependencies. In your example, there are such dependencies, between city and country for example. So every time someone enters Paris, they also need to enter France. This could lead to anomalies if someone accidentally enters Paris, Germany. For 3NF you would have to create an additional City table, which stores cities and their respective countries. City would be the key, country a non-key attribute. Address table would have a foreign key to city. I omitted Postcode and Region for brevity, but they would also need to be included in normalization. So, to make this 3NF you'd need a few more entities.
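The anomaly described above can be detected in existing data. The Python sketch below uses made-up address rows, groups them by City, and reports any city recorded with more than one Country. Data can only show that a dependency is broken; whether City -> Country should hold is a business rule, and the sketch, like the answer, assumes city names are unique.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the lhs values that appear with more than one rhs value."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[c] for c in lhs)].add(tuple(row[c] for c in rhs))
    return {key: values for key, values in seen.items() if len(values) > 1}


# Made-up rows for an Address table that still stores Country next to City.
addresses = [
    {"AddressID": 1, "Street": "1 Rue A", "City": "Paris", "Country": "France"},
    {"AddressID": 2, "Street": "2 Rue B", "City": "Paris", "Country": "Germany"},  # typo
    {"AddressID": 3, "Street": "3 Main St", "City": "Berlin", "Country": "Germany"},
]
print(fd_violations(addresses, ["City"], ["Country"]))

# After the 3NF split, Country is stored once per city in a City table,
# and Address only keeps a reference to the city.
cities = {"Paris": "France", "Berlin": "Germany"}
address_rows = [(1, "1 Rue A", "Paris"), (2, "2 Rue B", "Paris"), (3, "3 Main St", "Berlin")]
joined = [{"AddressID": a, "City": c, "Country": cities[c]} for a, _, c in address_rows]
print(fd_violations(joined, ["City"], ["Country"]))  # {}
```

After the split there is only one place where a city's country is recorded, so "Paris, Germany" can no longer be entered for a single address.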
This complexity is one reason why Kimball's star schema is more popular than Inmon's 3NF approach.
How normalized is this table?
Example SqlFiddle
I know that normalization itself has been discussed pretty thoroughly, but I was hoping to get some clarification on my understanding of it. As an example, I quickly drew a diagram in Access, and I believe that these relationships and tables all meet the 3NF criteria. There is a Projects table with the fields ProjNumber (PK), ProjName, and ProjDesc. Then there is an Assignments table with a compound key consisting of EmpID/ProjNumber and the fields HourlyBillingRate, NumOfHours, and TeamNum. Lastly, there is the Teams table, which consists of the fields TeamNum (PK), TeamName, and ProjNumber.
The ProjNumber fields in Assignments and Teams are both foreign keys referencing the Projects table, and the TeamNum field in Assignments is a foreign key referencing the Teams table's primary key. I'm not sure whether it's necessary to relate Assignments directly to the Teams table if I already have the ProjNumber foreign key, since that project would have an associated TeamNum.
The context of these tables is that there is a project to be done, a team responsible for carrying out that project, and the employees on that team, who are paid an hourly billing rate for the project they are working on.
The reason I use a compound key is that I wanted to handle the question "What if the employee works on multiple projects?" I couldn't make EmpID the sole primary key, so I chose a compound key: even if an employee works on multiple projects, the combination of the two will always be unique. I believe every field is necessary and fully relevant to its table's primary key.
Thoughts? Does it in fact fulfill the 3NF criteria?
It depends. Your diagram and discussion appear to assume that the primary key is the only candidate key in each of the tables. That appears not to be the case.
In the Assignments table, it looks as though EmpID and TeamNum form another candidate key, provided that TeamNum may not be NULL.
If we look at this table with EmpID, TeamNum as the key, then it is not in 2NF: ProjNumber is determined by TeamNum, which is not the whole key.
So the answer to your question turns on whether FDs are analyzed with respect to all candidate keys or just the declared primary key. I have seen tutorials on normalization that go both ways. I follow the one that considers all candidate keys, so the table is not in 2NF.
That is, unless I've misconstrued the FDs in your case, or Assignments.TeamNum can be NULL.
HOWEVER, your SQL Fiddle presentation is different. There, if there are several teams on one project and an employee is assigned to that project for a few hours, there isn't any way to tell which team the employee was on. The FDs in the SQL Fiddle example are not the same as the ones I infer from your diagram.
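The candidate-key claim can be checked with a small Python sketch that computes attribute closures. It assumes the FDs implied by the question's diagram: EmpID + ProjNumber determines the other Assignments columns, and TeamNum determines ProjNumber (each team works on one project, as in the Teams table). One caveat worth checking against your course's definitions: under the usual textbook definition an attribute is prime if it belongs to any candidate key, so ProjNumber comes out prime here, and TeamNum -> ProjNumber is then flagged by Boyce-Codd normal form (BCNF) rather than by 2NF or 3NF.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(key <= attrs for key in keys) and closure(attrs, fds) == set(schema):
                keys.append(attrs)
    return keys


assignments = {"EmpID", "ProjNumber", "HourlyBillingRate", "NumOfHours", "TeamNum"}
assignment_fds = [
    fd("EmpID ProjNumber", "HourlyBillingRate NumOfHours TeamNum"),
    fd("TeamNum", "ProjNumber"),  # assumed: a team belongs to one project
]

keys = candidate_keys(assignments, assignment_fds)
prime = set().union(*keys)
print(keys)
print(sorted(prime))  # ['EmpID', 'ProjNumber', 'TeamNum']
# TeamNum alone is not a superkey, so TeamNum -> ProjNumber breaks BCNF.
print(closure({"TeamNum"}, assignment_fds) == assignments)  # False
```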
I understand what cardinality is, so please don't explain that ;-)
I would like to know what the purpose of cardinality is in data modeling, and why I should care.
For example, in an ER model you create relationships and add the cardinality to them.
When am I going to use the cardinality later in the development process? Why should I care about it?
How, when and where do I use the cardinalities after I finish an ER model, for example?
Thanks :-)
Cardinalities tell you something important about table design. A 1:m relationship requires a foreign key column in the child table pointing back to the parent primary key column. A many-to-many relationship means a JOIN table with foreign keys pointing back to the two participants.
How, when and where do I use the cardinalities after I finish an ER model, for example?
When physically creating the database, the direction, NULL-ability and number of FKs depend on the cardinalities at both endpoints of the relationship in the ER diagram. The cardinalities may even "add" or "remove" some tables and keys.
For example:
A "1:N" relationship is represented as a NOT NULL FK from the "N" table to "1" table. You cannot do it in the opposite direction and retain the same meaning.
A "0..1:N" relationship is represented as a NULL-able FK from "N" to "0..1" table.
A "1:1" relationship is represented by two NOT NULL FKs (that are also keys) forming a circular reference [1], or by merging the two entities into a single physical table.
A "0..1:1" relationship is represented by two FKs, one of which is NULL-able (also under keys).
A "0..1:0..1" relationship is represented by two FKs, both NULL-able and under keys, or by a junction table with specially crafted keys.
An "M:N" relationship requires an additional (so-called "junction" or "link") table. A key of that table is a combination of the keys migrated from the two participating tables.
Not all cardinalities can be (easily) represented declaratively in the physical database, but fortunately those that can tend to be the most useful.
[1] This presents a chicken-and-egg problem when inserting new data, which is typically resolved by deferring constraint checking to the end of the transaction.
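A few of the mappings above can be tried out in SQLite through Python's standard sqlite3 module. In this sketch the table names are borrowed from the earlier Projects/Teams/Assignments question purely for illustration, and Employees.TeamNum is a hypothetical column added to show a NULL-able FK. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; the circular 1:1 case with deferred constraints is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless this is enabled
conn.executescript("""
CREATE TABLE Projects (ProjNumber INTEGER PRIMARY KEY, ProjName TEXT NOT NULL);

-- 1:N  each team belongs to exactly one project: NOT NULL FK on the "N" side
CREATE TABLE Teams (
    TeamNum    INTEGER PRIMARY KEY,
    ProjNumber INTEGER NOT NULL REFERENCES Projects(ProjNumber)
);

-- 0..1:N  an employee may have no team yet: NULL-able FK on the "N" side
CREATE TABLE Employees (
    EmpID   INTEGER PRIMARY KEY,
    TeamNum INTEGER REFERENCES Teams(TeamNum)
);

-- M:N  junction table whose key combines the keys migrated from both participants
CREATE TABLE Assignments (
    EmpID      INTEGER NOT NULL REFERENCES Employees(EmpID),
    ProjNumber INTEGER NOT NULL REFERENCES Projects(ProjNumber),
    PRIMARY KEY (EmpID, ProjNumber)
);
""")


def rejected(sql, params=()):
    """Return True if SQLite refuses the statement with a constraint error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


conn.execute("INSERT INTO Projects VALUES (1, 'Migration')")
conn.execute("INSERT INTO Teams VALUES (10, 1)")
results = {
    "team without project": rejected("INSERT INTO Teams VALUES (11, NULL)"),
    "team with unknown project": rejected("INSERT INTO Teams VALUES (12, 99)"),
    "employee without team": rejected("INSERT INTO Employees VALUES (100, NULL)"),
    "first assignment": rejected("INSERT INTO Assignments VALUES (100, 1)"),
    "duplicate assignment": rejected("INSERT INTO Assignments VALUES (100, 1)"),
}
for case, was_rejected in results.items():
    print(f"{case}: {'rejected' if was_rejected else 'accepted'}")
```

The "1" side of 1:N is enforced by NOT NULL plus the FK, the "0..1" side simply allows NULL, and the junction table's composite primary key stops the same employee/project pair from being recorded twice.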
Cardinality is a vital piece of information about a relationship between two entities. You need it for later models, when the actual table architecture is being modelled. Without knowing the relationship cardinality, one cannot model the tables and the key constraints between them.
For example, a car must have exactly 4 wheels and those wheels must be attached to exactly one car. Without cardinality, you could have a car with 3, 1, 0, 12, etc... wheels, which moreover could be shared among other cars. Of course, depending on the context, this can make sense, but it usually doesn't.
A data model is a set of constraints; without constraints, anything would be possible. Cardinality is a (special kind of) constraint. In most cultures, a marriage is a relation between exactly two persons. (In some cultures these persons must have different genders.)
The problem with data modelling is that you have to specify the constraints you wish to impose on the data. Some constraints (unique, foreign key) are more important, and less dependent on the problem domain, than others ("salary < 100000"). In most cases, cardinality will be somewhere in between crucial and bogus.
Suppose you are creating the data layer of an application and you decide to use an ORM, maybe Entity Framework.
There comes a point when you need to create your models and your model mappings. At that point you can pull out your ERD, review the cardinality you put on your diagram, and create the correct relationships so that the shape of your data layer matches the shape of your database.
Someone told me the following table doesn't satisfy second normal form (2NF), but I don't know why. I am new to database design. I have read some tutorials on 3NF, but I can't understand 2NF and 3NF well. I hope someone can explain it to me. Thank you.
+----+----+-----+
| pk | pk | row |
+----+----+-----+
| A  | B  | C   |
| A  | D  | C   |
| A  | E  | C   |
+----+----+-----+
Your question cannot be answered properly unless you state what dependencies are supposed to be satisfied here. You appear to have two attributes with the same name (pk), in which case this table doesn't even satisfy 1NF because it doesn't qualify as a relation.
About your example: that table doesn't satisfy second normal form (given your sample data, I presume that C depends only on A). Second normal form requires that:
No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key
(Wikipedia)
So C depends on A, which is a proper subset of your primary key. Your primary key is a special kind of superkey. (dportas points out that it can't be called a candidate key, since it's not minimal.)
Let's say more about second normal form. To make it easier to understand, let's transform your example a little: suppose there's a table CUSTOMER(customer_id, customer_name, address). A superkey is a subset of the attributes that uniquely determines a tuple. In this case, assuming customer_id is the only candidate key (two people may have the same name), there are 4 superkeys: (customer_id); (customer_id, customer_name); (customer_id, address); (customer_id, customer_name, address).
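The superkeys can be enumerated mechanically: a set of attributes is a superkey exactly when its closure under the FDs covers the whole table. This Python sketch assumes, as above, that the only dependency is customer_id -> customer_name, address.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def superkeys(schema, fds):
    """Every attribute subset whose closure is the whole schema."""
    return [set(combo)
            for size in range(1, len(schema) + 1)
            for combo in combinations(sorted(schema), size)
            if closure(combo, fds) == set(schema)]


customer = {"customer_id", "customer_name", "address"}
customer_fds = [(frozenset({"customer_id"}), frozenset({"customer_name", "address"}))]
for key in superkeys(customer, customer_fds):
    print(sorted(key))
```

Only (customer_id) is minimal, so it is the single candidate key; the other three are superkeys that are not candidate keys.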
Suppose you have chosen (customer_id, customer_name) as the primary key. That violates the 2NF rule, since customer_id alone uniquely determines a tuple, so address depends on only part of the key. Strictly speaking, the problem here comes from the choice of primary key (it's not a candidate key), though the same argument can be used to show the redundancy.
The third normal form states that:
Every non-prime attribute is non-transitively dependent on every candidate key in the table
Let's take an example. After changing the previous table to satisfy 2NF, we now have the table CUSTOMER(customer_id, customer_name, city, postal_code), with customer_id as the primary key.
Suppose that postal_code depends on the customer's city (that is, each city has exactly one postal code). This is where the table violates the third rule: postal_code depends on city, and city depends on customer_id. That means postal_code transitively depends on customer_id, so the table is not in third normal form.
To correct it, we need to eliminate the transitive dependency, so we split the table into 2 tables: CUSTOMER(customer_id, customer_name, city) and CITY(city, postal_code). This prevents the redundancy of having many tuples with the same city and postal_code.
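A quick way to convince yourself the split loses nothing is to project some rows onto the two new tables and join them back on city. In this Python sketch the customer rows and postal codes are made up, and, as above, each city is assumed to have one postal code. Because city is the key of CITY, the join is lossless in general, not just for this sample.

```python
def project(rows, columns):
    """Relational projection: distinct tuples of the chosen columns."""
    return {tuple(row[c] for c in columns) for row in rows}


# Made-up data for the original CUSTOMER(customer_id, customer_name, city, postal_code).
rows = [
    {"customer_id": 1, "customer_name": "Ann", "city": "CityA", "postal_code": "1000"},
    {"customer_id": 2, "customer_name": "Bob", "city": "CityA", "postal_code": "1000"},
    {"customer_id": 3, "customer_name": "Cho", "city": "CityB", "postal_code": "2000"},
]

customer = project(rows, ["customer_id", "customer_name", "city"])
city = project(rows, ["city", "postal_code"])

# Natural join of the two projections on the shared column, city.
rejoined = {(cid, name, c, code)
            for (cid, name, c) in customer
            for (c2, code) in city if c == c2}

print(len(city))  # 2: each city/postal_code pair is now stored once
print(rejoined == project(rows, ["customer_id", "customer_name", "city", "postal_code"]))  # True
```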