Need help in normalizing a table upto 3NF? - database

I want to normalize the table
So far i have done this:
is it my 3Nf correct?
Order(orderNo,orderDate,customerId)
Customer(customerId,customerName,customerCurrentAddress,customerPermanentAddress)
Product(productId,productName,Qty,categoryId)
Category(categoryId,categoryName)
Sub-Category(subCatId,subCatName,categoryId)
Given table here:
https://drive.google.com/file/d/1F7cQjjxz9rnY6RHaGtZXuwVGFEHBVgNo/view

You have to make a junction table between Order and Product in order to keep track of orders that contain more than one product.
Order(orderNo,orderDate,customerId)
Order_Details(orderNo,productId,Qty)
Customer(customerId,customerName,customerCurrentAddress,customerPermanentAddress)
Product(productId,productName,Qty,categoryId)
Category(categoryId,categoryName)
Sub-Category(subCatId,subCatName,categoryId)
Base on the definition from Wiki: a table is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table. Most of your tables only have 2 columns, so they satisfied this. For Customer(customerId,customerName,customerCurrentAddress,customerPermanentAddress), the candidate key is customerId, and the other columns completely depend on the whole candidate key, then it's OK.
For 3NF, Wiki said: all the attributes in a table are determined only by the candidate keys of that table and not by any non-prime attributes. As you see, your tables are all satisfied.

Related

Normalization and primary keys

In a given table if there is no primary key and even impossible to create a composite primary key then what is the normal form of that table ?
If its zero(0NF) adding a new column and making it primary key will convert this table to 1NF ?
Normal forms apply to relations, which are mathematical structures. Tables can be used to represent relations, but this requires some rules to ensure that the table doesn't contain more or less information than the corresponding relation.
In order for a table to represent a relation:
all rows and columns must be unique
the order they're in mustn't matter
all significant information must be represented as values in cells (i.e. fonts, highlighting, etc, mustn't matter)
every cell must contain one value (doesn't matter how simple or complex that value is)
Also, the relational model cares about candidate keys, not primary keys. A relation can have multiple candidate keys. A primary key is just a selected candidate key that is used by some disciplines (e.g. the entity-relationship model) or by some database management systems (e.g. for physical record ordering).
With all that said, I can now answer your question. If your table follows the rules and specifically the rows are all unique, then there will be at least one candidate key, on all the columns together at worst. If your table's rows aren't unique, then the table doesn't represent a relation and the normal forms don't apply. A surrogate key (like an auto-increment column) can be added to identify rows uniquely, but that isn't necessarily sufficient on its own to make a table represent a relation (1NF).
BTW, I suggest you avoid using "0NF" or "UNF". Non-relational tables don't have a level of normalization, so attaching any kind of "NF" to them is misleading.
As long as you are talking about tables, there is one further case that needs to be covered. It's the case of duplicate rows.
Duplicate rows are rows that are identical in appearance but not in row number. Such a table cannot have a primary key. Sometimes duplicate rows represent the same information. Sometimes not.
For example, consider a table with just four columns: customerid, productid, quentity, price. If a customer orders the same product twice, we'll have two identical rows, representing different inforation. Ths is not good.
Note that the corresonding thing cannot happen with relations. If two tuples in a relation have the same appearance, then they are the same tuple.
As to the other points, they are covered by excellent earlier answers.
before you wan to check for normalization your table must have a Primary key(the primary key is playing lead role in Relational DB,...).
1NF: says that all of your table attributes must be single valued.
Answer of Question 1 : In a given table if there is no primary key and even impossible to create a composite primary key then what is the normal form of that table ?
Answer : If it is no primary key in relation and if it is impossible to create a composite primiary key(According to me your question says ,even if combine all the column of row to make candidate key then also it will not able to identify your relationship uniquly(duplicate rows are there), hence it is not in any normal form.
Answer of Question 2:
If you add some column(having unique values in it) and if all the cell contains only one value then it is in 1NF.
Still if you need some clarification can ask in comment box.
0NF is not any form of normalization. refer C.J. Date or Henry korth(database management system book)
Hope this helps.

Database Normalization Process to 3NF CustomerRental for CustNo, PropNo, OwnerNo, etc

I am trying to normalize the following table. I want to go from the UNF form to 3NF form. I want to know, what do you do at the 1NF stage? It says it's where you remove the repetitive columns or groups (ex. ManagerID, ManagerName). This is considered repetitive because it's leads to the same data.
The Unnormalized data table has the following columns
CustomerRental(CustNo,CustName,PropNo,PAddress,RentStart,RentFinish,Rent,OwnerNo,OName)
There are no repeating columns/fields and each cell has a single value, but there is not a primary key. The functional dependencies I see in the table are:
{CustNo}->{Cname}
{PropNo}->{Paddress,RentStart,RentFinish,Rent,OwnerNo,Oname}
{CustNo,PropNo}->
{Paddress,RentStart,RentFinish,Rent,OwnerNo,OName,CustName}
{OwnerNo,PropNo}->{Rent,Paddress,Oname,RentInfo}
The primary key I picked was a composite key, CustNo + PropNo. Since it has a primary key, the table is in 1NF form, correct? This is what I thought, but the answer excludes CustNo and CustName from the table. They are in their own table.
From the above, I normalized it 2NF. At this stage, you are supposed to ensure that all non-prime attributes are fully dependent on the primary key. This is not the case. These are the functional dependencies in the table:
{OwnerNo}->{Oname}
{CustNo}->{CustName}
{PropNo}->{Paddress,Rent,OwnerNo,Oname}
I moved these values out of the table to create three new tables in 2NF form:
Customers(CustNo(PK),CustName)
Property(PropNo(PK),Paddress,City,Rent,OwnerNo,OwnerName)
Rentals(RentalNo(PK),CustNo,OwnerNo,PropNo,RentStart,RentFinish)
Now the main table, Rentals, is in 2NF form. It has a primary key, RentalNo, and each of the non-prime attributes depends on it.
I think that there is a transitive dependency on it. You can find OwnerNo through the PropNo. So, to make it comply with 3NF rules, you have to move the OwnerNo to its own table to create these tables:
Customers(CustNo,CustName)
Property(PropNo,Paddress,City,Rent)
Owners(OwnerNo,OwnerName)
Rentals(RentalNo,CustNo,PropNo,RentStart,RentFinish)
Is this correct? I read that at the 1NF stage, you are supposed to remove repetitive columns (ex. OwnerNo,OwnerName). Is this true? Why or why not?
The picture showing my tables is here:
Normalized Tables
We don't normalize to a NF (normal form) by going through lower NFs between it and 1NF. We use a proven algorithm for the NF we want. Find one in a published academic textbook. (If that doesn't describe the reference(s) you were told to use, find one that it does & quote it.)
Pay close attention to the terms and steps. Details matter. Eg you will need to know all the FDs (functional dependencies) that hold, not just some of them. Eg whenever some FDs hold, all the ones generated by Armstrong's axioms hold. Eg PKs (primary keys) are irrelevant, CKs (candidate keys) matter. Eg every table has a CK. Eg normalization to higher NFs does not change column names. So already your question does not reflect a correct process.
You really need to read & quote the reference(s) you were told to use in order to get to "1NF", because "1NF" is in the eye of the beholder. Normalization to higher NFs works on any relation.

Normalize table to 3rd normal form

This questions is obviously a homework question. I can't understand my professor and have no idea what he said during the election. I need to make step by step instructions to normalize the following table first into 1NF, then 2NF, then 3NF.
I appreciate any help and instruction.
Okay, I hope I remember all of them correctly, let's start...
Rules
To make them very short (and not very precise, just to give you a first idea of what it's all about):
NF1: A table cell must not contain more than one value.
NF2: NF1, plus all non-primary-key columns must depend on all primary key columns.
NF3: NF2, plus non-primary key columns may not depend on each other.
Instructions
NF1: find table cells containing more than one value, put those into separate columns.
NF2: find columns depending on less then all primary key columns, put them into another table which has only those primary key columns they really depend on.
NF3: find columns which depend on other non-primary-key columns, in addition to depending on the primary key. Put the dependent columns into another table.
Examples
NF1
a column "state" has values like "WA, Washington". NF1 is violated, because that's two values, abbreviation and name.
Solution: To fulfill NF1, create two columns, STATE_ABBREVIATION and STATE_NAME.
NF2
Imagine you've got a table with these 4 columns, expressing international names of car models:
COUNTRY_ID (numeric, primary key)
CAR_MODEL_ID (numeric, primary key)
COUNTRY_NAME (varchar)
CAR_MODEL_NAME (varchar)
The table may have these two data rows:
Row 1: COUNTRY_ID=1, CAR_MODEL_ID=5, COUNTRY_NAME=USA, CAR_MODEL_NAME=Fox
Row 2: COUNTRY_ID=2, CAR_MODEL_ID=5, COUNTRY_NAME=Germany, CAR_MODEL_NAME=Polo
That says, model "Fox" is called "Fox" in USA, but the same car model is called "Polo" in Germany (don't remember if that's actually true).
NF2 is violated, because the country name does not depend on both car model ID and country ID, but only on the country ID.
Solution: To fulfill NF2, move COUNTRY_NAME into a separate table "COUNTRY" with columns COUNTRY_ID (primary key) and COUNTRY_NAME. To get a result set including the country name, you'll need to connect the two tables using a JOIN.
NF3
Say you've got a table with these columns, expressing climatic conditions of states:
STATE_ID (varchar, primary key)
CLIME_ID (foreign key, ID of a climate zone like "desert", "rainforest", etc.)
IS_MOSTLY_DRY (bool)
NF3 is violated, because IS_MOSTLY_DRY only depends on the CLIME_ID (let's at least assume that), but not on the STATE_ID (primary key).
Solution: to fulfill NF3, put the column MOSTLY_DRY into the climate zone table.
Here are some thoughts regarding the actual table given in the exercise:
I apply the above mentioned NF rules without to challenge the primary key columns. But they actually don't make sense, as we will see later.
NF1 isn't violated, each cell holds just one value.
NF2 is violated by EMP_NM and all the phone numbers, because all of these columns don't depend on the full primary key. They all depend on EMP_ID (PK), but not on DEPT_CD (PK). I assume that phone numbers stay the same when an employee moves to another department.
NF2 is also violated by DEPT_NM, because DEPT_NM does not depend on the full primary key. It depends on DEPT_CD, but not on EMP_ID.
NF2 is also violated by all the skill columns, because they are not department- but only employee-specific.
NF3 is violated by SKILL_NM, because the skill name only depends on the skill code, which is not even part of the composite primary key.
SKILL_YRS violates NF3, because it depends on a primary key member (EMP_ID) and a non-primary key member (SKILL_CD). So it is partly dependent on a non-primary-key attribute.
So if you remove all columns which violate NF2 or NF3, only the primary key remains (EMP_ID and DEPT_CD). That remaining part violates the given business rules: this structure would allow an employee to work in multiple departments at the same time.
Let's review it from a distance. Your data model is about employees, departments, skills and the relationships between these entities. If you normalize that, you'll end up with one table for the employees (containing DEPT_CD as a foreign key), one for the departments, one for the skills, and another one for the relationship between employees and skills, holding the "skill years" for each tuple of EMP_ID and SKILL_CD (my teacher would have called the latter an "associative entity").
Looking at the first two rows in your table, and looking at which columns are tagged "PK" in that table, and assuming that "PK" stands for "Primary Key", and looking at the values that appear for those two columns in those two rows, I would recommend your professor to get the hell out of database teaching and not come back until he got himself educated properly on the subject.
This exercise cannot be taken seriously because the problem statement itself contains hopelessly contradictory information.
(Observe that as a consequence, there simply is not any such thing as a "good" or "right" answer to this question !!!)
Another oversimplified answer coming up.
In a 3NF relational table, every nonkey value is determined by the key, the whole key, and nothing but the key (so help me Codd ;)).
1NF: The key. This means that if you specify the key value, and a named column, there will be at most one value at the intersection of the row and the column. A multivalue, like a series of values separated by commas, is disallowed, because you can't get directly to the value with just a key and acolumn name.
2NF: The whole key. If a column that is not part of the key is determined by a proper subset of the key columns, then 2NF is being violated.
3NF: And nothing but the key. If a column is determined by some set of non key columns, then 3NF is being violated.
3NF satisfies only if it is in 2nd normal form and doesnot have any transitive dependency and all the non-key attributes should depend on the primary key.
Transitive dependency:
R=(A,B,C).
A->B AND B->C THEN A->C

What is the 2nd normal form?

My informal representation of these are:
1NF: The table is divided so that no item will appear more than once.
2NF: ?
3NF: Values can only be determined by the primary key.
I cannot make sense of it from the excerpts I found online or in my book. How do I differentiate between 1NF and 2NF?
A relation schema is in 2NF if every non-prime attribute is fully functionally dependent on every key.
Wikipedia says:
A table is in 2NF if and only if, it is in 1NF and every non-prime
attribute of the table is either dependent on the whole of a candidate
key, or on another non prime attribute.
To explain the concept, let's use a table for a inventory of toys adapted from Head First SQL:
TOY_ID| STORE_ID| INVENTORY| STORE_ADDRESS
The primary key is composed by the attributes TOY_ID and STORE_ID. If we analize the non-prime attribute INVENTORY we see that in depends on TOY_ID and STORE_ID at the same time. That's cool.
But on the other side, the non-prime attribue STORE_ADDRESS only depends on the STORE_ID attribute (i.e it's not related to the primary key attribute TOY_ID). That's a clear violation of 2NF, so to complain to the 2NF our schema must be like this:
An Inventory table: TOY_ID| STORE_ID| INVENTORY
and an Store table: STORE_ID| STORE_ADDRESS
Some columns are part of a key (primary or secondary). We'll call these prime attributes.
For second normal form we'll consider the non-prime attributes and see if they should be moved to another table. We might find that some attributes don't require the full key for us to be able to identify what value they hold for at least one candidate key. That is, there is a candidate key where we could still determine the value of that attribute given the candidate key even if the values in one column of that candidate key were erased.

database----database normalization

someone told me the following table isn't fit for the second database normalization. but i don't know why? i am a newbie of database design, i have read some tutorials of the 3NF. but to the 2NF and 3NF, i can't understand them well. expect someone can explain it for me. thank you,
+------------+-----------+-------------------+
pk pk row
+------------+-----------+-------------------+
A B C
+------------+-----------+-------------------+
A D C
+------------+-----------+-------------------+
A E C
+------------+-----------+-------------------+
Your question cannot be answered properly unless you state what dependencies are supposed to be satisfied here. You appear to have two attributes with the same name (pk), in which case this table doesn't even satisfy 1NF because it doesn't qualify as a relation.
About your example: that table doesn't fit the second database normalization (with your sample data, I presume that the C depends only on A). The second normalization form requires that:
No non-prime attribute in the table is functionally dependent on a
proper subset of a candidate key
(Wikipedia)
So the C depends on "A", which is a subset of your primary key. Your primary key is a special superkey. (dportas point out the fact that it can't be called candidate key, since it's not minimal).
Let's say more about the second normalization form. Transform your example a little for easy understanding, presume that there's a table CUSTOMER(customer_id, customer_name, address). A super key is a sub-set of your properties which uniquely determine a tube. In this case, there are 3 super key: (customer_id) ; (customer_id, customer_name) ; (customer_id, customer_name, address). (Customer name may be the same for 2 people)
In your case, you have determined (customer_id, customer_name) be the Primary Key. It violated the second form rules; since it only needs customer_id to determine uniquely a tube in your database. For the sake of theory accuration, the problem here raised from the choice of primary key(it's not a candidate key), though the same argument can be applied to show the redundance. You may find some useful example here.
The third normal form states that:
Every non-prime attribute is
non-transitively dependent on every
candidate key in the table
Let give it an example. Changing the previous table to fit the second form, now we have the table CUSTOMER(customer_id,customer_name, city, postal_code), with customer_id is primary key.
Clearly enough, "postal_code" depends on the "city" of customer. This is where it violated the third rule: postal_code depends on city, city depends on customer_id. That means postal_code transitively depends on customer_id, so that table doesn't fit the third normal form.
To correct it, we need to eliminate the transitive dependence. So we split the table into 2 table: CUSTOMER(customer_id, customer_name, city) and CITY(city, postal_code). This prevent the redundance of having too many tubes with the same city & postal_code.

Resources