In some sources the definition of 2NF is given as:
A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
In others it is given as:
A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on any key of R.
Which is correct?
Should only the primary key be considered, or all candidate keys, when checking for partial dependencies?
A relation R is in second normal form if every non-prime attribute of R is fully dependent on each candidate key of R.
E. F. Codd, 1971, "Further Normalization of the Data Base Relational Model"
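The difference between "the primary key" and "each candidate key" shows up as soon as a relation has more than one key. Below is a minimal Python sketch (the example relation and all helper names are hypothetical, not taken from Codd's paper): it computes attribute closures, finds candidate keys by brute force, and reports proper subsets of the keys you pass in that determine a non-prime attribute. In the made-up relation R(A,B,C,D,E) with AB -> CDE, CD -> AB and C -> E, checking only the "primary key" AB finds nothing, while checking every candidate key finds C -> E.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def partial_dependencies(attrs, fds, keys_to_check):
    """Proper subsets of the given keys that determine some non-prime attribute."""
    # Prime attributes belong to *any* candidate key, whichever keys we check.
    prime = frozenset().union(*candidate_keys(attrs, fds))
    nonprime = frozenset(attrs) - prime
    found = []
    for key in keys_to_check:
        for size in range(len(key)):  # proper subsets only, including the empty set
            for combo in combinations(sorted(key), size):
                part = frozenset(combo)
                determined = closure(part, fds) & nonprime
                if determined:
                    found.append((part, determined))
    return found


# Hypothetical example: AB is chosen as the primary key, CD is another candidate key.
ATTRS = "ABCDE"
FDS = [fd("AB", "CDE"), fd("CD", "AB"), fd("C", "E")]
KEYS = candidate_keys(ATTRS, FDS)
print(["".join(sorted(k)) for k in KEYS])                   # ['AB', 'CD']
print(partial_dependencies(ATTRS, FDS, [frozenset("AB")]))  # []
print(partial_dependencies(ATTRS, FDS, KEYS))               # [(frozenset({'C'}), frozenset({'E'}))]
```

Brute-force key search is exponential in the number of attributes, which is fine for textbook-sized relations only.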
2NF is not particularly significant. With respect to functional dependencies the normal forms that matter are Elementary Key Normal Form and Boyce-Codd Normal Form.
What's the main point of Normalization?
I mean, if a relation is not in 2NF, it is because of a partial dependency, i.e. a non-key attribute is dependent on a part of a candidate key.
So, let's say, for a relation R(A,B,C) with FDs:
AB->C, B->C
Clearly, AB is the candidate key and B->C is the partial dependency.
Solution: Decompose the relation such that (B,C) forms a new relation with B as the key.
Now, if a relation is not in 3NF, it is because a non-key attribute is dependent on another non-key attribute. That is to say,
if the FDs for a relation R(A,B,C) are:
A->B,B->C
Clearly, A is the key and B->C shows transitive dependency, so not in 3NF.
Solution: Decompose the relation such that (B,C) forms a new relation with B as the key.
So, what's the exact difference?
I mean, why such a marked distinction? Essentially, in both cases the action is the same:
decompose the relation using the dependency, whether or not the determinant (B here) is PART of a key.
Why have separate terms like partial dependency or transitive dependency?
Why not just see if there exists a dependency wherein a non-prime attribute is determined by something which is NOT a candidate key (no matter whether it is part of a key or another non-prime attribute)?
Why can't we implement a method like this:
1NF -- having all elements in atomic form
X NF -- if there's any dependency of the form non_key -> non_prime_attribute(s), decompose the relation, with one of the new relations having this particular "non_key" as the key along with those non_prime_attributes.
BCNF -- where for all the dependencies of the form X->Y, X is a superkey?
Can we have such an NF condition format? Does it combine all the conditions?
So, what's the exact difference?
2NF is not 3NF & definitions of 2NF are not definitions of 3NF. There isn't any particular semantic or syntactic structural similarity that would leave some kind of "difference" other than that a 2NF relation can have the sort of problem FD (functional dependency) that violates 3NF that a 3NF relation doesn't have. You can find definitions all over the place. You almost give them correctly here yourself. But a NF (normal form) is a condition, not a process. What do you mean "actions are the same"? Being in 3NF implies being in 2NF, so naturally decomposing to 3NF also gives 2NF. But there are relations that are in 2NF but not in 3NF, and there may be decompositions for a relation to 2NF that don't get to 3NF. Those decompositions will involve a removal of all problem partial FDs that does not result in the removal of all problem transitive FDs.
(Because 3NF is always achievable and there are no other disadvantages compared to 2NF, 2NF isn't even useful. It's just a condition that was discovered first that is not as strong as 3NF.)
(3NF is frequently defined in terms of 2NF plus no transitive dependencies of non-prime attributes on CKs, but actually no such FDs implies no partial FDs of non-prime attributes on CKs, hence 2NF, so the first condition is redundant.)
Why not just see if there exists a dependency wherein a non-prime attribute is determined by something which is NOT a candidate key
Why should that condition be helpful? It is not a description of just getting rid of the problem FDs of 2NF & 3NF--that's what putting into 3NF does.
Getting rid of non-trivial FDs whose determinants are not superkeys happens to give BCNF. It implies 2NF & 3NF. But it is different from both of them. A BCNF relation exhibits no FD-based update anomalies. It is always achievable. However, 3NF is always achievable while "preserving FDs", whereas BCNF is not. There are cases where, in order to enforce via constraints on the components an FD that held in the original (and so holds in the view/query that rejoins them), we need an EQD (equality dependency) constraint. That says two column sets have the same set of subrow values, which is more expensive to enforce than a FD. Either you have BCNF & an EQD & fewer update anomalies or you have 3NF/EKNF & a FD & certain update anomalies.
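The 3NF-versus-BCNF trade-off can be seen on a standard textbook-style relation, chosen here for illustration: R(A,B,C) with AB -> C and C -> B. In this minimal Python sketch (helper names are hypothetical), R is in 3NF because B is prime, but not in BCNF because C is not a superkey. Decomposing on C -> B into (C,B) and (A,C) reaches BCNF, but AB -> C can no longer be checked inside either component. The tests examine only the listed FDs, which is enough for a whole relation but not for its components, so `project` computes the FDs that hold on each component by brute force.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def is_3nf(attrs, fds):
    """Each FD: superkey determinant, or only prime attributes added on the right."""
    attrs = frozenset(attrs)
    prime = frozenset().union(*candidate_keys(attrs, fds))
    return all(closure(lhs, fds) == attrs or (rhs - lhs) <= prime for lhs, rhs in fds)


def is_bcnf(attrs, fds):
    """Each FD: trivial, or its determinant is a superkey."""
    attrs = frozenset(attrs)
    return all(rhs <= lhs or closure(lhs, fds) == attrs for lhs, rhs in fds)


def project(fds, sub):
    """Non-trivial FDs implied on the attribute subset sub (brute force)."""
    sub = frozenset(sub)
    out = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            lhs = frozenset(combo)
            rhs = (closure(lhs, fds) & sub) - lhs
            if rhs:
                out.append((lhs, rhs))
    return out


ATTRS, FDS = "ABC", [fd("AB", "C"), fd("C", "B")]
print(is_3nf(ATTRS, FDS), is_bcnf(ATTRS, FDS))  # True False

# BCNF decomposition on C -> B: components (C, B) and (A, C).
kept = project(FDS, "CB") + project(FDS, "AC")
print(frozenset("C") <= closure(frozenset("AB"), kept))  # False: AB -> C is not preserved
```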
The NF that really matters is 5NF, which implies BCNF, with no update anomalies & with other benefits. (We might then decide to denormalize for performance reasons.)
PS Normalization to a given NF does not necessarily involve normalization to lower NFs.
It almost sounds as though you want to know why they called these two normal forms by different names instead of inventing just one form that covers both cases. If that's not the case, please ignore this answer.
Part of the answer is that the forms weren't discovered at the same time. And part of the answer is that the problem with 1NF that gave rise to 2NF is not the same as the problem with 2NF that gave rise to 3NF, even though they both exhibit harmful redundancy.
What might satisfy you a little more is BCNF. BCNF did not get a number of its own: it was defined after 3NF and named after Boyce and Codd. But BCNF has to be placed between 3NF and 4NF, because it is more restrictive than 3NF but less restrictive than 4NF. So it sits "out of sequence" in the naming, so to speak.
In BCNF, every (non-trivial) determinant is a candidate key. That seems to be what you are looking for. Any relation that is in 1NF and where every non-trivial determinant is a candidate key is also in 3NF, and therefore in 2NF: a violation of 2NF or 3NF always involves a non-prime attribute that depends on a set of attributes that is not a superkey, which BCNF rules out.
2NF and 3NF are essentially historical concepts and your question is a reasonable one. There is no real reason to apply them in practical database design because better tools exist today.
When it comes to teaching, there is possibly some justification for mentioning 2NF and 3NF. Doing so allows students to explore the concepts involved (as you have done) while also teaching them a bit about the origins and rationale of design theory. In school maths lessons I was taught long division and differentiation from first principles. No one uses those techniques in practice; they are just teaching aids.
Before checking for 2NF, the relation should be in 1NF. In simple words, a 2NF relation has only full dependencies of non-prime attributes on keys, and no partial dependencies. A dependency X -> Y is full if removing any attribute from X leaves a set that no longer determines Y. If Y is still determined after removing some attribute from X, the dependency is partial. For 3NF, the relation must first be in 2NF, and it must have no transitive dependencies: a key X must not determine a non-prime attribute Z by way of X -> Y and Y -> Z, where Y is not a key.
Solution for 2NF: create a new table for each partial dependency, keyed by its determinant (the part of the primary key involved). The determinant stays in the original table as a foreign key referencing the new table.
Solution for 3NF: create one relation for X -> Y and another for Y -> Z, with Y as the key of the second relation and as a foreign key in the first.
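The recipe above can be sketched in Python. This is a simplified sketch under stated assumptions: one key, FDs used only as listed (implied FDs and other candidate keys are ignored), and a hypothetical relation R(A,B,C,D,E) with key AB, AB -> C, B -> E (partial) and C -> D (transitive). The helper `split_off` is hypothetical; each split creates a table keyed by the determinant and leaves the determinant behind in the original table, where it acts as a foreign key.

```python
def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def split_off(attrs, key, fds, should_split):
    """Move each accepted FD's dependent attributes into a new table keyed by its determinant."""
    main = set(attrs)
    new_tables = []
    for lhs, rhs in fds:
        moved = (rhs - lhs - key) & main
        if moved and lhs <= main and should_split(lhs):
            new_tables.append((lhs, lhs | moved))  # (key of new table, its attributes)
            main -= moved                         # lhs stays in main as a foreign key
    return frozenset(main), new_tables


KEY = frozenset("AB")
FDS = [fd("AB", "C"), fd("B", "E"), fd("C", "D")]  # hypothetical R(A,B,C,D,E)

# 2NF step: the determinant is a proper subset of the key (partial dependency).
main, tables_2nf = split_off("ABCDE", KEY, FDS, lambda lhs: lhs < KEY)
# 3NF step: the determinant is neither part of the key nor contains it (transitive).
main, tables_3nf = split_off(main, KEY, FDS, lambda lhs: not lhs <= KEY and not KEY <= lhs)

print(sorted(main))  # ['A', 'B', 'C']
for table_key, table in tables_2nf + tables_3nf:
    print(sorted(table_key), sorted(table))
# ['B'] ['B', 'E']
# ['C'] ['C', 'D']
```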
I'm looking at a specific example of a relation with a composite primary key. Based on its functional dependencies, I know it is in 1NF. While normalizing it to 3NF I came across a situation I have not yet encountered. I followed the steps for all partial dependencies and transitive dependencies, but the last step of normalizing to 3NF requires you to create a relation that contains the primary key and all non-prime attributes dependent on it.
In my specific case, I have the primary key, but no full functional dependencies on it. Do I make a table containing only my composite primary key? Or do I not make one at all?
I am not confusing composite keys with primary keys. See my comment below for why I believe my question is different from that one.
It is perfectly legitimate to have a relation that consists of a composite key and no other attributes. It's not only theoretically valid, but also it happens in the real world.
In such situation, that relation is merely asserting the existence of something identified by the composite key. And it would be used by the user of the data to test for existence and not for the same kind of lookups that a relation with non key attributes is typically used for.
FDs (functional dependencies) have nothing to do with 1NF, no matter which of the various meanings for "1NF" you are using. So it's not clear what you're trying to say about 1NF. A relation by definition has a value for each attribute of each tuple. A thing like a relation with something like a "list of values" for some part like an attribute of some part like a tuple is not a relation so CKs (candidate keys) & FDs do not apply. If you define a "1NF relation" as one without certain data types (because of some fuzzy application-dependent received wisdom about "atomicity", or in Codd's sense of having no relation-valued attributes) then satisfaction does not depend on whether FDs hold on the design with that data type. (Moreover if the "normalized" "atomic"-attributed version of such a "non-1NF" "non-atomic"-attributed design satisfies a FD then the original has a certain constraint, but it's not a FD constraint.)
FDs that aren't partial are full. The only partial FDs that matter on the way to 2NF & 3NF are partial FDs of non-prime attributes on CKs. When these are gone you have 2NF. (From "followed the steps for all partial dependencies and transitive dependencies" it sounds like your plan is to decompose to 2NF then to 3NF.) Partial FDs just aren't mentioned in a definition of 3NF that requires 2NF. Also, definitions for 3NF and the common algorithm for putting a relation into 3NF just don't make use of partial FDs.
There can also be other partial FDs. They just don't matter. In particular, all the FDs of attributes on proper superkeys are partial. Just follow the definitions for determining what normal form(s) a relation is in and follow the algorithms for putting a relation into a normal form. This goes for all definitions and algorithms. There is no point in worrying about every property you notice that might be "bad".
PS You shouldn't put a relation into 3NF by first putting it into 2NF. That can exclude some good 3NF decompositions of the original from being found. Use an algorithm for 3NF. (The usual one for 3NF actually generates decompositions in the slightly stronger EKNF (Elementary Key Normal Form)).
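A common textbook form of the 3NF synthesis algorithm is: compute a minimal cover, make one schema per determinant, and add a schema holding a candidate key if no schema already contains one. The minimal Python sketch below assumes that form; the helper names and the example relation are hypothetical. For R(A,B,C,D) with A -> C and B -> D, no FD has the whole key AB on the left, so the last step adds a schema consisting of the key alone, the kind of key-only relation asked about in the question above.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def minimal_cover(fds):
    # 1. Single-attribute right-hand sides, dropping trivial parts.
    g = [(lhs, frozenset([a])) for lhs, rhs in fds for a in sorted(rhs - lhs)]
    # 2. Drop extraneous left-hand attributes.
    reduced = []
    for lhs, rhs in g:
        lhs = set(lhs)
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, g):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    # 3. Drop FDs implied by the remaining ones.
    cover = list(dict.fromkeys(reduced))
    for f in list(cover):
        rest = [h for h in cover if h != f]
        if f[1] <= closure(f[0], rest):
            cover = rest
    return cover


def synthesize_3nf(attrs, fds):
    groups = {}
    for lhs, rhs in minimal_cover(fds):
        groups.setdefault(lhs, set()).update(rhs)  # one schema per determinant
    schemas = list(dict.fromkeys(lhs | rhs for lhs, rhs in groups.items()))
    keys = candidate_keys(attrs, fds)
    if not any(k <= s for k in keys for s in schemas):
        schemas.append(keys[0])  # add a schema holding a candidate key
    # Drop schemas contained in another schema.
    return [s for s in schemas if not any(s < t for t in schemas)]


schemas = synthesize_3nf("ABCD", [fd("A", "C"), fd("B", "D")])
print(["".join(sorted(s)) for s in schemas])  # ['AC', 'BD', 'AB']
```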
From the Database Management Systems book: given the relation SNLRWH (each letter denotes an attribute) and the following functional dependencies:
S->SNLRWH (S is the PK)
R->W
My attempt:
First, it is not in 3NF: for the second FD, W is not contained in R, R is not a superkey, and W is not part of a key.
Second, is it in 2NF or not? If we examine the second FD, W is dependent on R, which in turn is not part of a key. I'm stuck.
2NF is violated if some proper subset of a candidate key appears as a determinant on the left hand side of one of your (non-trivial) dependencies. Ask yourself whether any of your determinants is a subset of a candidate key.
Usually 2NF is violated only when a relation has a composite key (a key with more than one attribute). It is technically possible for a relation with only simple keys (single-attribute keys) to violate 2NF if the empty set (∅) happens to be a determinant. Such cases are fairly unusual and rarely thought worthy of consideration because they are so obviously "wrong". For completeness, here's a fun example of that special case: consider a relation with attributes Circumference, Diameter and Pi, in which Circumference and Diameter are both candidate keys. The dependency in violation of 2NF is ∅ -> Pi, Pi being the ratio of the circumference to the diameter.
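This special case can be checked mechanically. The minimal Python sketch below (helper names are hypothetical) uses only the attributes and dependencies described above, with no sample rows: Circumference -> Diameter, Diameter -> Circumference and ∅ -> Pi. The empty set is a proper subset of every candidate key, and its closure contains the non-prime attribute Pi, so a partial dependency exists even though every key has a single attribute.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


ATTRS = frozenset({"Circumference", "Diameter", "Pi"})
FDS = [
    (frozenset({"Circumference"}), frozenset({"Diameter"})),
    (frozenset({"Diameter"}), frozenset({"Circumference"})),
    (frozenset(), frozenset({"Pi"})),  # empty determinant: Pi is the same in every tuple
]

keys = candidate_keys(ATTRS, FDS)
nonprime = ATTRS - frozenset().union(*keys)
violations = {
    (part, closure(part, FDS) & nonprime)
    for key in keys
    for size in range(len(key))  # proper subsets of the key, including the empty set
    for part in map(frozenset, combinations(sorted(key), size))
    if closure(part, FDS) & nonprime
}
print(sorted(sorted(k) for k in keys))  # [['Circumference'], ['Diameter']]
print(violations)                       # {(frozenset(), frozenset({'Pi'}))}
```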
2NF has to do with partial key dependencies. In order for a relation to fail the test for 2NF, the relation has to have at least one candidate key that has at least two columns (setting aside the unusual case where the empty set is a determinant).
Since your relation has only one candidate key, and that candidate key has only one column, you can't possibly have a partial key dependency. It passes the test for 2NF.
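The SNLRWH example can be checked with a short Python sketch (helper names are hypothetical; single letters stand for the attributes). It finds S as the only candidate key, finds no proper subset of a key that determines a non-prime attribute, so the relation is in 2NF, and shows that R -> W breaks 3NF because R is not a superkey and W is not prime.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


ATTRS = "SNLRWH"
FDS = [fd("S", "SNLRWH"), fd("R", "W")]
keys = candidate_keys(ATTRS, FDS)
prime = frozenset().union(*keys)

# 2NF: does any proper subset of a key determine a non-prime attribute?
partial = [
    part
    for key in keys
    for size in range(len(key))
    for part in map(frozenset, combinations(sorted(key), size))
    if closure(part, FDS) - prime
]
# 3NF: every FD needs a superkey determinant or only prime attributes on the right.
third = all(closure(l, FDS) == frozenset(ATTRS) or (r - l) <= prime for l, r in FDS)
print(keys, partial, third)  # [frozenset({'S'})] [] False
```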
I have a question:
Consider a relation R{A,B,C,D,E,F} with the following set of functional dependencies: {ABC->DEF, D->E, ABC->A}. A, B and C together form the primary key.
Can you explain to me why this is on 2nd NF? Thanks.
Can you explain to me why this is on 2nd NF?
I'm not quite sure what "why this is on 2nd NF" means. (Typo?) But the relation R is not in 3NF, because there's a transitive dependency: ABC->D, and D->E. So relation R must be in either 1NF or 2NF.
Relation R is in 2NF if and only if
it's in 1NF, and
there are no partial key dependencies.
ABC->A might look like a partial key dependency, but it's not, because "A" is a prime attribute. (ABC->A is a trivial dependency, because A is a subset of ABC.) The non-prime attributes are {DEF}. None of those attributes are functionally dependent on only part of any candidate key (a more general way of saying they're not functionally dependent on part of this relation's primary key).
So relation R is in 2NF.
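The same reasoning can be checked with a short Python sketch (helper names are hypothetical; single letters stand for the attributes). It finds ABC as the only candidate key, finds no proper subset of ABC that determines any of D, E or F, so R is in 2NF, and shows that D -> E breaks 3NF.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


ATTRS = "ABCDEF"
FDS = [fd("ABC", "DEF"), fd("D", "E"), fd("ABC", "A")]
keys = candidate_keys(ATTRS, FDS)
prime = frozenset().union(*keys)

# 2NF: proper subsets of a key that determine a non-prime attribute.
partial = [
    part
    for key in keys
    for size in range(len(key))
    for part in map(frozenset, combinations(sorted(key), size))
    if closure(part, FDS) - prime
]
# 3NF: trivial FDs such as ABC -> A pass because rhs - lhs is empty.
in_3nf = all(closure(l, FDS) == frozenset(ATTRS) or (r - l) <= prime for l, r in FDS)
print(["".join(sorted(k)) for k in keys], partial, in_3nf)  # ['ABC'] [] False
```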
According to the Boyce-Codd Normal Form definition,
A relation R with FDs F is in BCNF if, for all X -> A in F+:
- A is a subset of X (called a trivial FD), or
- X is a superkey for R.
“R is in BCNF if the only non-trivial FDs over R are key constraints.”
If R is in BCNF, then every field of every tuple records information that cannot be inferred using FDs alone.
What I don't understand are the two statements above about this normal form.
Can someone give me an example?
Thanks!
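The second statement can be illustrated with a few rows. In the minimal Python sketch below, the helper name and the data are hypothetical: R(A,B,C,D) has the FDs A -> BCD, BC -> AD and D -> B, so D -> B has a determinant that is not a superkey. Whenever two tuples share a D value, the B value of the later tuple can be inferred from the earlier one without looking at it. For an FD whose determinant is a superkey, no two tuples share the determinant, so nothing can be inferred that way.

```python
def inferable_cells(rows, lhs, rhs):
    """Cells whose value is forced by an earlier row that agrees on lhs (given lhs -> rhs holds)."""
    first_seen = {}
    cells = []
    for i, row in enumerate(rows):
        determinant = tuple(row[a] for a in lhs)
        if determinant in first_seen:
            cells.extend((i, a) for a in rhs)  # must equal the earlier row's value
        else:
            first_seen[determinant] = i
    return cells


# Hypothetical rows of R(A,B,C,D) satisfying A -> BCD, BC -> AD and D -> B.
rows = [
    {"A": 1, "B": "x", "C": "c1", "D": "d1"},
    {"A": 2, "B": "x", "C": "c2", "D": "d1"},
]
print(inferable_cells(rows, "D", "B"))    # [(1, 'B')]: row 1's B follows from row 0
print(inferable_cells(rows, "A", "BCD"))  # []: a key value never repeats
```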
Some prerequisite terms before I try to explain:
• Non-key attribute: An attribute that is not part of any candidate key is known as a non-key/non-prime attribute.
• Superkey: A set of attributes within a table whose values can be used to uniquely identify a tuple. A candidate key is a minimal set of attributes necessary to identify a tuple; this is also called a minimal superkey.
Now, BCNF is an advanced version of 3NF; it is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey of the table.
Consider a relation R(A,B,C,D).
The dependencies are:
A->BCD
BC->AD
D->B
So the candidate keys (minimal superkeys) are A, BC and CD (CD is a key because D->B, and then BC->AD).
But in the dependency D->B, D is not a superkey.
Hence the relation violates BCNF.
We can break this relation into R1 and R2 as:
R1(A,D,C) and R2(D,B) to get BCNF.
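The example can be checked mechanically with a minimal Python sketch (helper names are hypothetical). It tests only the listed FDs for R itself, which suffices for a BCNF test of the whole relation, and uses a brute-force projection to get the FDs that hold on each component. The closure computation also reports CD as a candidate key, since D -> B and then BC -> AD.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build an FD from single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            cand = frozenset(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == attrs:
                keys.append(cand)
    return keys


def is_bcnf(attrs, fds):
    """Each FD: trivial, or its determinant is a superkey."""
    attrs = frozenset(attrs)
    return all(rhs <= lhs or closure(lhs, fds) == attrs for lhs, rhs in fds)


def project(fds, sub):
    """Non-trivial FDs implied on the attribute subset sub (brute force)."""
    sub = frozenset(sub)
    out = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            lhs = frozenset(combo)
            rhs = (closure(lhs, fds) & sub) - lhs
            if rhs:
                out.append((lhs, rhs))
    return out


FDS = [fd("A", "BCD"), fd("BC", "AD"), fd("D", "B")]
print(["".join(sorted(k)) for k in candidate_keys("ABCD", FDS)])  # ['A', 'BC', 'CD']
print(is_bcnf("ABCD", FDS))                                        # False (D -> B)
print(is_bcnf("ACD", project(FDS, "ACD")), is_bcnf("BD", project(FDS, "BD")))  # True True
```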