someone told me the following table isn't fit for the second database normalization. but i don't know why? i am a newbie of database design, i have read some tutorials of the 3NF. but to the 2NF and 3NF, i can't understand them well. expect someone can explain it for me. thank you,
+------------+-----------+-------------------+
pk pk row
+------------+-----------+-------------------+
A B C
+------------+-----------+-------------------+
A D C
+------------+-----------+-------------------+
A E C
+------------+-----------+-------------------+
Your question cannot be answered properly unless you state what dependencies are supposed to be satisfied here. You appear to have two attributes with the same name (pk), in which case this table doesn't even satisfy 1NF because it doesn't qualify as a relation.
About your example: that table doesn't fit the second database normalization (with your sample data, I presume that the C depends only on A). The second normalization form requires that:
No non-prime attribute in the table is functionally dependent on a
proper subset of a candidate key
(Wikipedia)
So the C depends on "A", which is a subset of your primary key. Your primary key is a special superkey. (dportas point out the fact that it can't be called candidate key, since it's not minimal).
Let's say more about the second normalization form. Transform your example a little for easy understanding, presume that there's a table CUSTOMER(customer_id, customer_name, address). A super key is a sub-set of your properties which uniquely determine a tube. In this case, there are 3 super key: (customer_id) ; (customer_id, customer_name) ; (customer_id, customer_name, address). (Customer name may be the same for 2 people)
In your case, you have determined (customer_id, customer_name) be the Primary Key. It violated the second form rules; since it only needs customer_id to determine uniquely a tube in your database. For the sake of theory accuration, the problem here raised from the choice of primary key(it's not a candidate key), though the same argument can be applied to show the redundance. You may find some useful example here.
The third normal form states that:
Every non-prime attribute is
non-transitively dependent on every
candidate key in the table
Let give it an example. Changing the previous table to fit the second form, now we have the table CUSTOMER(customer_id,customer_name, city, postal_code), with customer_id is primary key.
Clearly enough, "postal_code" depends on the "city" of customer. This is where it violated the third rule: postal_code depends on city, city depends on customer_id. That means postal_code transitively depends on customer_id, so that table doesn't fit the third normal form.
To correct it, we need to eliminate the transitive dependence. So we split the table into 2 table: CUSTOMER(customer_id, customer_name, city) and CITY(city, postal_code). This prevent the redundance of having too many tubes with the same city & postal_code.
Related
I am trying to normalize the following table. I want to go from the UNF form to 3NF form. I want to know, what do you do at the 1NF stage? It says it's where you remove the repetitive columns or groups (ex. ManagerID, ManagerName). This is considered repetitive because it's leads to the same data.
The Unnormalized data table has the following columns
CustomerRental(CustNo,CustName,PropNo,PAddress,RentStart,RentFinish,Rent,OwnerNo,OName)
There are no repeating columns/fields and each cell has a single value, but there is not a primary key. The functional dependencies I see in the table are:
{CustNo}->{Cname}
{PropNo}->{Paddress,RentStart,RentFinish,Rent,OwnerNo,Oname}
{CustNo,PropNo}->
{Paddress,RentStart,RentFinish,Rent,OwnerNo,OName,CustName}
{OwnerNo,PropNo}->{Rent,Paddress,Oname,RentInfo}
The primary key I picked was a composite key, CustNo + PropNo. Since it has a primary key, the table is in 1NF form, correct? This is what I thought, but the answer excludes CustNo and CustName from the table. They are in their own table.
From the above, I normalized it 2NF. At this stage, you are supposed to ensure that all non-prime attributes are fully dependent on the primary key. This is not the case. These are the functional dependencies in the table:
{OwnerNo}->{Oname}
{CustNo}->{CustName}
{PropNo}->{Paddress,Rent,OwnerNo,Oname}
I moved these values out of the table to create three new tables in 2NF form:
Customers(CustNo(PK),CustName)
Property(PropNo(PK),Paddress,City,Rent,OwnerNo,OwnerName)
Rentals(RentalNo(PK),CustNo,OwnerNo,PropNo,RentStart,RentFinish)
Now the main table, Rentals, is in 2NF form. It has a primary key, RentalNo, and each of the non-prime attributes depends on it.
I think that there is a transitive dependency on it. You can find OwnerNo through the PropNo. So, to make it comply with 3NF rules, you have to move the OwnerNo to its own table to create these tables:
Customers(CustNo,CustName)
Property(PropNo,Paddress,City,Rent)
Owners(OwnerNo,OwnerName)
Rentals(RentalNo,CustNo,PropNo,RentStart,RentFinish)
Is this correct? I read that at the 1NF stage, you are supposed to remove repetitive columns (ex. OwnerNo,OwnerName). Is this true? Why or why not?
The picture showing my tables is here:
Normalized Tables
We don't normalize to a NF (normal form) by going through lower NFs between it and 1NF. We use a proven algorithm for the NF we want. Find one in a published academic textbook. (If that doesn't describe the reference(s) you were told to use, find one that it does & quote it.)
Pay close attention to the terms and steps. Details matter. Eg you will need to know all the FDs (functional dependencies) that hold, not just some of them. Eg whenever some FDs hold, all the ones generated by Armstrong's axioms hold. Eg PKs (primary keys) are irrelevant, CKs (candidate keys) matter. Eg every table has a CK. Eg normalization to higher NFs does not change column names. So already your question does not reflect a correct process.
You really need to read & quote the reference(s) you were told to use in order to get to "1NF", because "1NF" is in the eye of the beholder. Normalization to higher NFs works on any relation.
how normalized is this table?
Example SqlFiddle
So I know that the topic and definition of normalization itself has been pretty well discussed but I was hoping I could get some clarification on my understanding of normalization. An example is a diagram I drew out in Access real quick, from what I think, I believe that these relationships and tables themselves all fit in the 3NF criteria. There is a Projects table with the following fields ProjNumber(PK), ProjName, and ProjDesc. Then there is an Assignments table with a compound key consistent of EmpID/ProjNumber, with the fields HourlyBillingRate, NumOfHours, and TeamNum. And lastly is the Teams table, which consists of the fields TeamNum(PK), TeamName, ProjNumber.
The ProjNumber from Assignments and Teams are both foreign keys that relate back to the Projects table, and the TeamNum field from Assignments is a foreign key relating back to the Teams table primary key. I'm not too sure if it's necessary to directly relate back to the Teams table, if I have the ProjNumber foreign key because that project would have an associated TeamNum.
The context of these tables is that there is a project that has to be done, a team associated with carrying out that team, and then the employees that are on that team which are paid an hourly billing rate for that proj they are working under.
The reason I use a compound key, is I wanted to answer the question of "What is the employee works on multiple projects?", so I couldn't make EmpID the sole primary key, thus I chose to make it a compound key because even if the employee works on multiple projects, the combination of the two will always be unique. I believe that each field is necessary and relevant fully with their respective primary keys.
Thoughts? Does it in fact fulfill 3NF criteria?
It depends. Your diagram and discussion appear to assume that the primary key is the only candidate key in each of the tables. That appears not to be the case.
In the Assignments table, it looks as though EmpID and TeamNumber is another candidate key, provided that TeamNumber may not be NULL.
If we look at this table with EmpId, TeamNumber as the key, then it is not in 2NF. ProjNumber is determined by TeamNumber, which is not the whole key.
So now the answer to your question turns on whether FDs are analyzed with respect to all candidate keys or just the declared prmary key. I have seen tutorials on on normalization that go both ways. I follow the one that considers all candidate keys, so the table is not in 2NF.
Unless I've misconstrued the FDs in your case, or Assigment.TeamNumber can be NULL.
HOWEVER, your SQL Fiddle presentation is different. Now, if there are several teams on one project, and an employee is assigned to one project for a few hours, there isn't any way to tell what team the employee was on. The FDs are not the same in the SQL Fiddle example and in the implicatins I take from your diagram.
I have been learning Normalization from "Fundamentals of Database Systems by Elmasri and Navathe (6th edition)" and I am having trouble understanding the following part about 2NF.
The following image is an example given under 2NF in the textbook
The candidate key is {SSN,Pnumber}
The dependencies are
SSN,Pnumber -> hours, SSN -> ename, pnumber->pname, pnumber -> plocation
The formal Definition:
A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
for example in the above picture:
if suppose, I define an additional functional dependency SSN -> hours, then taking the two functional dependencies,
{SSN,Pnumber} -> hours and SSN -> hours
the relation wont be in 2NF, because now SSN ->hours is now a partial functional dependency as SSN is a proper subset for the given candidate key {SSN,Pnumber}.
Looking at the relation and its general definition on 2NF, i presume that the above relation is in 2NF
As far as my understanding goes and how i understand what 2NF is,
A relation is in 2NF if one cannot find a proper subset (prime attributes)
of the on the left hand side (candidate key) of a functional dependency
which defines the NPA(non prime attribute).
My first question is, Why is the above relation not in 2NF? (The textbook has considered the above relation as not in 2NF)
There is, however, a informal ways(steps as per the textbook where a normal person not knowing normalization can take to reduce redundancy) being defined at the beginning of this chapter which are:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
The guideline mentioned is as follows:
My second question is, If the above steps described are taken into account, and consider why the following relation is not in 2NF, do you assume the following functional dependencies, which are,
{SSN,Pnumber} -> Pname
{SSN,Pnumber} -> Plocation
{SSN,Pnumber} -> Ename
making the decomposition of the relation correct? If the functional dependencies assumed are incorrect, then what are the factors leading for the relation to not satisfy 2NF condition?
When looked at a general point of view ... because the table contains more than one primary attributes and the information stored is concerned with both employee and project information, one can point out that those need to be separated, as Pnumber is a primary attribute of the composite key, the redundancy can somehow be intuitively guessed. This is because the semantics of the attributes are known to us.
what if the attributes were replaced with A,B,C,D,E,F
My Third question is, Are functional dependencies pre-determined based on "functionalities of database and a database designer having domain knowledge of the attributes" ?
Because based on the data and relation state at a given point the functional dependencies can change which was valid in one state can go invalid at a certain state.In general this can be said for any non primary attribute determining non primary attribute.
The formal definition :
A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the
possible tuples that can form a relation state r of R. The constraint is
that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must
also have t1[Y] = t2[Y].
So won't predefining a functional dependency be wrong as on cannot generalize relation state at any given point?
Pardon me if my basic understanding of things is flawed to begin with.
Why is the above relation not in 2NF?
Your original/first/informal "definition" of 2NF is garbled and not helpful. Even the quote from the textbook is wrong since 2NF is not defined in terms of "the PK (primary key)" but rather in terms of all the CKs (candidate keys). (Their definition makes sense if there is only one CK.)
A table is in 2NF when there are no partial dependencies of non-prime attributes on CKs. Ie when no determinant of a non-prime attribute is a proper/smaller subset of a CK. Ie when every non-prime attribute is fully functionally dependent on every CK.
Here the only CK is {Ssn, Pnumber}. But there are FDs (functional dependencies) out of {Ssn} and {Pnumber}, both of which are smaller subsets of the CK. So the original table is not in 2NF.
If the above statement is taken into account, do you assume the following functional dependencies
so won't the same process of the decomposition shown based on the informal way alone be difficult each time such a case arrives?
A table holds the rows that make some predicate (statement template parameterized by column names) into a true proposition (statement). Given the business rules, only certain business situations can arise. Then given the table predicates, which give table values from a business situation, only certain database values can arise. That leads to certain tables having certain FDs.
However, given some FDs that hold, we can formally use Armstrong's axioms to get all other FDs that must also hold. So we can use both informal and formal ways to find which FDs hold and don't hold.
There are also shorthand rules that follow from the axioms. Eg if a set of attributes has a different subrow value in each tuple then so does every superset of it. Eg if a FD holds then every superset of its determinant determines every subset of its determined set. Eg every superset of a superkey is a superkey & no proper subset of a CK is a CK. There are also algorithms.
Are functional dependencies pre-determined based on "functionalities of database and a database designer having domain knowledge of the attributes" ?
When normalizing we are concerned with the FDs that hold no matter what the business situation is, ie what the database state is. Each table for each business can have its own particular FDs per the table predicate & the possible business situations.
PS Do "make sense" of formal things in terms of the real world when their definitions are in terms of the real world. Eg applying a predicate to all possible situations to get all possible table values. But once you have the necessary formal information, only use formal definitions and procedures. Eg determining that a FD holds for a table because it holds in every possible table value.
so would any general table be in 2NF based on a solo condition of a table having a composite primary key?
There are tables in 5NF (hence too all lower NFs) with all sorts of mixes of composite & non-composite CKs. PKs don't matter.
It is frequently wrongly said that having no composite CKs guarantees 2NF. A table without composite keys and where {} does not determine any attribute is in 2NF. But if {} determines an attribute then it's a proper/smaller subset of any/every CK with any attributes. {} determines an attribute when every row has to have the same value for that attribute.
Why is the above relation in 2NF?
EP1, EP2, and EP3 are in 2NF because, for each one, the key identifies the non-key. No part of any key identifies any part of any non-key. That is what is meant by for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
By contrast, you might say EMP_PROJ is over-specified. If ssn identifies, ename (as the text says it does), then the combination of {ssn, pnumber} is too much. There exists a subset of the key {ssn,pnumber} that identifies a part of the non-key, {ename}. That situation does not occur in a table conforming to 2NF, as EP1, EP2, and EP3 illustrate.
Are functional dependencies ... based on ... domain knowledge of the attributes?
Emphatically, yes! That's all they're based on. The DBMS is just a logic machine. The ideas of "employee" and "hours" don't exist for it. The database designer chooses to define tables that model some real-world universe of discourse, and imposes meaning on the columns. He gives names to the attributes (above) in X and Y. He decides which columns serve to identify a row based on what is true about the universe being modeled.
if a table has a composite primary key, regardless of the functional dependencies is not in 2NF?
No. Remember, 2NF is defined in terms of FDs. What could it mean to speak of conforming to 2NF "regardless" of them?
The number of columns in the key is immaterial. It's some set, X, identifying the complement, Y.
I'm not sure if I thoroughly understand your questions, but I'll give a try to explain.
Your first statement about 2NF:
a relation is in 2NF if one cannot find a proper subset on the left hand side of a functional dependency which defines the NPA
is correct, as well as your supposition
if {SSN,Pnumber} -> hours and SSN -> hours then this relation wont be in 2NF
because what that means that you could determine 'hours' from 'SSN' alone, so using the composite key {SSN,Pnumber} to determine 'hours' will be redundant, and thus violates the 2NF requirements.
What you call the left hand side of an FD is usually called a key. You use the key to find the related data. In order to save space (and reduce complexity), you should always try to find a minimal key, and break up larger tables into smaller ones if possible, so you do not have to save information in more places than necessary. This is what normalization to the normal forms is all about, and being studied for about half a century now, substantial theory on the matter has been developed, and some rules chrystalized from it, like 1NF, 2NF, 3NF etc.
Your second question confuses me a lot, because from what you are saying, it seems you already understands this.
Could there be some confusion about the FD's? From the figure, it seems to me as they are defined like this:
{SSN,Pnumber} -> hours
{SSN} -> ename
{Pnumber} -> Pname,Plocation
Just like the three lower tables are modeled, together they add up to the relation (table) modeled above.
So, in the first table, you would need the composite key {SSN,Pnumber} to access any data in the relation (search in the table), while that clearly is not necessary for most of the fields.
Now, I'm not sure about what purpose that table would fulfill in real life. While that is not formally necessary, as long as the FD's are given, it might be easier to imagine why the design will benefit from normalization.
So let's day it's about recording workhours per emplyee per project in some organization. SSN identifies the employee, (whose name also is stored as ename because it is easier to remember, but could be duplicate), Pnumber identifies the project, which name and location is also stored much for the same reason.
Then if you as a manager need to register that an employee worked another few hours on some project, you would use your manager app on your device, which in turn will update the tables seamlessly (you cannot expect managers to understand the logics of normalization)
Behind the scenes, however, it would amount to some query, in SQL that would be an 'INSERT' statement which added another row to the relevant table(s).
Now you can see that in the above table, you would have to insert all the six attributes, while with the normalized tables below, you will only need to add a row to table EP1,consisting of three attributes. In a large organization with thousands of employees delivering their worksheets every week, that will quickly become huge differences in storage requirements. That has a number of benefits, perhaps the most significant beeing search speed.
Your third question I don't understand at all, I'm afraid. In a way you could say FD's are predetermined once you have decided what data you will save in your database. The FD's are not dupposed to change. When modeled in the DB, they will not change. If you later find you will alter the design, then that will be new relations with new FD's.
The text you seem to be quoting from somewhere simply says that if you have the FD X -> Y (X gives or determines Y) then if you have any two tuples (records) in that relation (table) that have the same value of X, they must also hve the same value of Y. Or in our example, if Pnumber somewhere is given the value of 888, Pname is 'Battleship' and Plocation is 'Kitchen Sink', then if somewhere else (some other record) the Pnumber 888 is used then also there Pname must be 'Battleship' and Plocation must be 'Kitchen Sink' because Pname and Plocation is functionally dependant on Pnumber.
Now that was almost another chapter in your textbook, or what? Hope it helps, because it took me some time to write :-)
A table can be said to be in 2NF, if the primary key is composed of multiple columns, and that if for each row these columns were concatenated together into a single string, then the resulting column would qualify as the primary key. Alternatively a single column primary key will also qualify as 2NF.
In this case the same employee could have multiple phone numbers (PNUMBER), so a you cannot have a compound primary key that includes the phone number.
This questions is obviously a homework question. I can't understand my professor and have no idea what he said during the election. I need to make step by step instructions to normalize the following table first into 1NF, then 2NF, then 3NF.
I appreciate any help and instruction.
Okay, I hope I remember all of them correctly, let's start...
Rules
To make them very short (and not very precise, just to give you a first idea of what it's all about):
NF1: A table cell must not contain more than one value.
NF2: NF1, plus all non-primary-key columns must depend on all primary key columns.
NF3: NF2, plus non-primary key columns may not depend on each other.
Instructions
NF1: find table cells containing more than one value, put those into separate columns.
NF2: find columns depending on less then all primary key columns, put them into another table which has only those primary key columns they really depend on.
NF3: find columns which depend on other non-primary-key columns, in addition to depending on the primary key. Put the dependent columns into another table.
Examples
NF1
a column "state" has values like "WA, Washington". NF1 is violated, because that's two values, abbreviation and name.
Solution: To fulfill NF1, create two columns, STATE_ABBREVIATION and STATE_NAME.
NF2
Imagine you've got a table with these 4 columns, expressing international names of car models:
COUNTRY_ID (numeric, primary key)
CAR_MODEL_ID (numeric, primary key)
COUNTRY_NAME (varchar)
CAR_MODEL_NAME (varchar)
The table may have these two data rows:
Row 1: COUNTRY_ID=1, CAR_MODEL_ID=5, COUNTRY_NAME=USA, CAR_MODEL_NAME=Fox
Row 2: COUNTRY_ID=2, CAR_MODEL_ID=5, COUNTRY_NAME=Germany, CAR_MODEL_NAME=Polo
That says, model "Fox" is called "Fox" in USA, but the same car model is called "Polo" in Germany (don't remember if that's actually true).
NF2 is violated, because the country name does not depend on both car model ID and country ID, but only on the country ID.
Solution: To fulfill NF2, move COUNTRY_NAME into a separate table "COUNTRY" with columns COUNTRY_ID (primary key) and COUNTRY_NAME. To get a result set including the country name, you'll need to connect the two tables using a JOIN.
NF3
Say you've got a table with these columns, expressing climatic conditions of states:
STATE_ID (varchar, primary key)
CLIME_ID (foreign key, ID of a climate zone like "desert", "rainforest", etc.)
IS_MOSTLY_DRY (bool)
NF3 is violated, because IS_MOSTLY_DRY only depends on the CLIME_ID (let's at least assume that), but not on the STATE_ID (primary key).
Solution: to fulfill NF3, put the column MOSTLY_DRY into the climate zone table.
Here are some thoughts regarding the actual table given in the exercise:
I apply the above mentioned NF rules without to challenge the primary key columns. But they actually don't make sense, as we will see later.
NF1 isn't violated, each cell holds just one value.
NF2 is violated by EMP_NM and all the phone numbers, because all of these columns don't depend on the full primary key. They all depend on EMP_ID (PK), but not on DEPT_CD (PK). I assume that phone numbers stay the same when an employee moves to another department.
NF2 is also violated by DEPT_NM, because DEPT_NM does not depend on the full primary key. It depends on DEPT_CD, but not on EMP_ID.
NF2 is also violated by all the skill columns, because they are not department- but only employee-specific.
NF3 is violated by SKILL_NM, because the skill name only depends on the skill code, which is not even part of the composite primary key.
SKILL_YRS violates NF3, because it depends on a primary key member (EMP_ID) and a non-primary key member (SKILL_CD). So it is partly dependent on a non-primary-key attribute.
So if you remove all columns which violate NF2 or NF3, only the primary key remains (EMP_ID and DEPT_CD). That remaining part violates the given business rules: this structure would allow an employee to work in multiple departments at the same time.
Let's review it from a distance. Your data model is about employees, departments, skills and the relationships between these entities. If you normalize that, you'll end up with one table for the employees (containing DEPT_CD as a foreign key), one for the departments, one for the skills, and another one for the relationship between employees and skills, holding the "skill years" for each tuple of EMP_ID and SKILL_CD (my teacher would have called the latter an "associative entity").
Looking at the first two rows in your table, and looking at which columns are tagged "PK" in that table, and assuming that "PK" stands for "Primary Key", and looking at the values that appear for those two columns in those two rows, I would recommend your professor to get the hell out of database teaching and not come back until he got himself educated properly on the subject.
This exercise cannot be taken seriously because the problem statement itself contains hopelessly contradictory information.
(Observe that as a consequence, there simply is not any such thing as a "good" or "right" answer to this question !!!)
Another oversimplified answer coming up.
In a 3NF relational table, every nonkey value is determined by the key, the whole key, and nothing but the key (so help me Codd ;)).
1NF: The key. This means that if you specify the key value, and a named column, there will be at most one value at the intersection of the row and the column. A multivalue, like a series of values separated by commas, is disallowed, because you can't get directly to the value with just a key and acolumn name.
2NF: The whole key. If a column that is not part of the key is determined by a proper subset of the key columns, then 2NF is being violated.
3NF: And nothing but the key. If a column is determined by some set of non key columns, then 3NF is being violated.
3NF satisfies only if it is in 2nd normal form and doesnot have any transitive dependency and all the non-key attributes should depend on the primary key.
Transitive dependency:
R=(A,B,C).
A->B AND B->C THEN A->C
My informal representation of these are:
1NF: The table is divided so that no item will appear more than once.
2NF: ?
3NF: Values can only be determined by the primary key.
I cannot make sense of it from the excerpts I found online or in my book. How do I differentiate between 1NF and 2NF?
A relation schema is in 2NF if every non-prime attribute is fully functionally dependent on every key.
Wikipedia says:
A table is in 2NF if and only if, it is in 1NF and every non-prime
attribute of the table is either dependent on the whole of a candidate
key, or on another non prime attribute.
To explain the concept, let's use a table for a inventory of toys adapted from Head First SQL:
TOY_ID| STORE_ID| INVENTORY| STORE_ADDRESS
The primary key is composed by the attributes TOY_ID and STORE_ID. If we analize the non-prime attribute INVENTORY we see that in depends on TOY_ID and STORE_ID at the same time. That's cool.
But on the other side, the non-prime attribue STORE_ADDRESS only depends on the STORE_ID attribute (i.e it's not related to the primary key attribute TOY_ID). That's a clear violation of 2NF, so to complain to the 2NF our schema must be like this:
An Inventory table: TOY_ID| STORE_ID| INVENTORY
and an Store table: STORE_ID| STORE_ADDRESS
Some columns are part of a key (primary or secondary). We'll call these prime attributes.
For second normal form we'll consider the non-prime attributes and see if they should be moved to another table. We might find that some attributes don't require the full key for us to be able to identify what value they hold for at least one candidate key. That is, there is a candidate key where we could still determine the value of that attribute given the candidate key even if the values in one column of that candidate key were erased.