Consider a relation R and a set of functional dependencies F that contains only one functional dependency: {X->A}.
Prove that R is in 3NF iff R is in BCNF.
So far, the <- direction is trivial by definition, but I am struggling with the -> direction. What do we know about the closure of F? From the definition, I need to check that every functional dependency Y->B in the closure of F is either trivial or has a superkey Y as its determinant. Are there some conclusions about the superkeys of R that I'm missing?
Here is a sketch of the proof.
The fact that a relation schema in BCNF is also in 3NF follows from the definition of 3NF (each determinant of a non-trivial dependency is a superkey or determines only prime attributes, and we know that each such determinant is a superkey since the schema is in BCNF).
So we must show that if the relation is in 3NF, then it is also in BCNF.
Now consider the only dependency, {X->A}. By the definition of 3NF, either X is a superkey, or A is prime.
In the first case, if X is a superkey, we know that the schema is also in BCNF.
So, we need to check only the case in which X is not a (super)key, and A is prime.
We can prove that this case is impossible, with the following steps.
We have only two possibilities: either X contains A, or it does not.
If X contains A, then this dependency is trivial and, since there are no other dependencies, X is a key. This violates our hypothesis, so we have a contradiction.
If, on the other hand, X does not contain A, then X is again a key, since it determines A, and this again contradicts our hypothesis.
Finally, note that in this proof I have assumed that there are no other attributes in R apart from X ∪ {A}; otherwise those other attributes should be present in any key of the relation, and there should be at least one other dependency involving them.
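As a sanity check on the statement (not a replacement for the proof), the claim can be brute-forced on small schemas, including schemas with attributes outside X ∪ {A}. The sketch below is an illustration only: the helper names (closure, subsets, candidate_keys, violations, claim_holds) are made up for this example, attributes are single letters, and the 3NF and BCNF tests apply the definitions directly to every dependency in the closure of F.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(attrs):
    items = sorted(attrs)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    superkeys = [s for s in subsets(attrs) if closure(s, fds) == attrs]
    return [k for k in superkeys if not any(o < k for o in superkeys)]

def violations(attrs, fds):
    """Non-trivial Y -> B in the closure of F whose determinant Y is not a superkey."""
    attrs = frozenset(attrs)
    found = []
    for y in subsets(attrs):
        y_plus = closure(y, fds)
        if y_plus != attrs:
            found.extend((y, b) for b in y_plus - y)
    return found

def is_bcnf(attrs, fds):
    return not violations(attrs, fds)

def is_3nf(attrs, fds):
    # Every violating dependency must at least determine a prime attribute.
    prime = set().union(*candidate_keys(attrs, fds))
    return all(b in prime for _, b in violations(attrs, fds))

def claim_holds(universe):
    """Check 3NF <=> BCNF for every single-dependency set F = {X -> A}."""
    for x in subsets(universe):
        for a in universe:
            fds = [(x, frozenset(a))]
            if is_3nf(universe, fds) != is_bcnf(universe, fds):
                return False
    return True

print(claim_holds("ABCD"))  # True
```

With two dependencies the equivalence can fail: for R(A,B,C) with AB->C and C->B, the same helpers report 3NF but not BCNF, so the single-dependency hypothesis matters.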
The definition of the Boyce–Codd normal form states that the determinants of all non-trivial functional dependencies have to be superkeys.
All the examples for relations in BCNF I found make use of candidate keys. I am looking for an example that actually has a superkey as determinant which is not a candidate key.
I fail to come up with a relation that only uses superkeys which can't be transformed to use candidate keys.
Let's say we have a relation with a candidate key and an additional functional dependency with a superkey as determinant.
R1(A,B,C)
{A}
A,B -> C
This additional FD is redundant because its determinant contains a candidate key that obviously determines the other attribute (A -> C).
Trying to build another example with two candidate keys is also useless.
R2(A,B,C,D)
{A,B},{B,C}
A,B,C -> D
This has the exact same problem as above.
I am actually wondering if there even is an example without candidate keys. But why would the definition be broader than necessary? Or are the definitions equivalent as the dependencies can always be transformed?
The point is that, when defining a normal form, we must express it in a general form, as a property of all the functional dependencies holding on a certain relation.
Instead, when we reason about a particular relation schema, we usually have only a subset of all the functional dependencies (since their number can be too large, being possibly exponential in the number of attributes). The particular set of dependencies used, usually denoted by the letter F, has a special property: it is a cover of all the dependencies holding in the relation, that is, from it we can derive all the dependencies of the relation by applying, in all possible ways, a set of axioms called Armstrong’s axioms.
F, the set of dependencies specified together with the attributes in a relational schema, can be given in different ways: for instance, in an exercise they can be given as input to the exercise, in real database design they can describe a set of constraints considered important for modelling a certain real-world domain, etc.
Even if they are extracted from the knowledge about a situation to be modeled through a database, they could contain dependencies implied by others already given, or could contain redundant attributes, etc.
For these reasons, it is considered an important first step in normalization to find the canonical cover of the given set of dependencies, that is, a cover constituted by a set of dependencies that: a) have only a single attribute on the right part; b) do not have superfluous attributes on the left part (i.e. attributes that can be removed while maintaining the property of being a cover); c) do not have redundant dependencies (i.e. dependencies that can be derived from the others through Armstrong’s axioms).
Now let’s consider the general definition of BCNF:
A relation schema R<T,F> is in BCNF if and only if for each non-trivial dependency X → Y of F+, X is a superkey.
Note that we are talking about the dependencies in F+, which is the closure of F, in other words, the set that contains all the dependencies holding in R and derived in some way from F. So if the relation R has a candidate key XK, obviously not only does XK → T hold, but for every superset S of XK we will have that S → T holds, and so the definition of the normal form must allow those dependencies.
Now, it is possible to prove, from the general definition of BCNF, the following theorem, which in some way simplifies it (and makes the test to check whether a relation is already in BCNF efficient):
Theorem: A relation schema R<T,F> is in BCNF if and only if for each non-trivial dependency X → Y of F, X is a superkey.
See the difference? We are now talking about F and not F+.
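The two tests can be compared on small examples. The sketch below is illustrative only: bcnf_by_closure applies the definition to every determinant in F+ by brute force, while bcnf_by_given_fds looks only at the dependencies listed in F, as the theorem allows. The function names and the two sample dependency sets are assumptions made for this example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_by_given_fds(attrs, fds):
    """Theorem: check only the dependencies listed in F."""
    attrs = frozenset(attrs)
    return all(rhs <= lhs or closure(lhs, fds) == attrs for lhs, rhs in fds)

def bcnf_by_closure(attrs, fds):
    """Definition: check every determinant of a non-trivial dependency in F+."""
    attrs = frozenset(attrs)
    items = sorted(attrs)
    for n in range(len(items) + 1):
        for combo in combinations(items, n):
            x = frozenset(combo)
            x_plus = closure(x, fds)
            # x determines something new but not everything: a violation
            if x_plus != x and x_plus != attrs:
                return False
    return True

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

good = [fd("A", "BC"), fd("AB", "C")]  # R(A,B,C) with key A
bad = [fd("AB", "C"), fd("B", "C")]    # B -> C has a non-superkey determinant

print(bcnf_by_given_fds("ABC", good), bcnf_by_closure("ABC", good))  # True True
print(bcnf_by_given_fds("ABC", bad), bcnf_by_closure("ABC", bad))    # False False
```

The first test needs one closure per listed dependency, while the second enumerates every subset of the attributes, which is where the efficiency gain comes from.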
And this theorem has the following corollary:
Corollary: A relation schema R<T,F> in which F is a canonical cover, is in BCNF if and only if for each non-trivial dependency X → A of F, X is a candidate key.
Since the dependencies in a canonical cover do not have superfluous attributes, if the relation is in BCNF every determinant (left-hand side of a functional dependency) is obviously a candidate key (not a generic superkey), and this explains the difference between the definition and the examples that we sometimes find in books.
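To make the corollary concrete, here is a minimal sketch that follows the three conditions a), b), c) of a canonical cover literally and then checks that every determinant is a candidate key. It assumes single-letter attributes; the helper names are invented for the example, and the key {A} of R1(A,B,C) from the question is encoded as the dependency A -> BC, which is an assumption about what the asker intended.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def canonical_cover(fds):
    # a) a single attribute on the right part
    work = [(lhs, frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # b) no superfluous attributes on the left part
    reduced = []
    for lhs, rhs in work:
        for attr in sorted(lhs):
            if rhs <= closure(lhs - {attr}, work):
                lhs = lhs - {attr}
        reduced.append((lhs, rhs))
    # c) no redundant dependencies (duplicates first, then derivable ones)
    cover = list(dict.fromkeys(reduced))
    for dep in list(cover):
        rest = [d for d in cover if d != dep]
        if dep[1] <= closure(dep[0], rest):
            cover = rest
    return cover

def is_candidate_key(key, attrs, fds):
    """A superkey that stops being one when any single attribute is dropped."""
    attrs = frozenset(attrs)
    if closure(key, fds) != attrs:
        return False
    return all(closure(key - {a}, fds) != attrs for a in key)

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

given = [fd("A", "BC"), fd("AB", "C")]  # key {A} plus the extra FD A,B -> C
cover = canonical_cover(given)
print([("".join(sorted(l)), "".join(r)) for l, r in cover])  # [('A', 'B'), ('A', 'C')]
print(all(is_candidate_key(l, "ABC", cover) for l, _ in cover))  # True
```

The superkey determinant AB of the extra dependency disappears in step b), which is why examples built from a canonical cover only ever show candidate keys.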
What's the main point of Normalization?
I mean, if a relation is not in 2NF, it is because of a partial dependency, i.e. a non-key attribute is dependent on a part of a candidate key.
So, let's say, for a relation R(A,B,C) with FDs:
AB->C, B->C
Clearly, AB is the candidate key and B->C is the partial dependency.
Solution: Decompose the relation such that (B,C) forms a new relation with B as the key.
Now, if a relation is not in 3NF, it is because a non-key attribute is dependent on another non-key attribute. That is to say,
if the FDs for a relation R(A,B,C) are:
A->B,B->C
Clearly, A is the key and B->C shows transitive dependency, so not in 3NF.
Solution: Decompose the relation such that (B,C) forms a new relation with B as the key.
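The two examples above can be told apart mechanically. The following sketch is only an illustration with invented helper names: it computes the candidate keys, then labels each dependency that determines a non-prime attribute from a non-superkey as "partial" when its determinant is a proper subset of a candidate key, and as "transitive" otherwise (a simplification that lumps every non-partial problem dependency under that label).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal superkeys, found by trying attribute sets in increasing size."""
    attrs = frozenset(attrs)
    keys = []
    for n in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), n):
            s = frozenset(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def classify(attrs, fds):
    """Label each problem FD of a non-prime attribute as partial or transitive."""
    attrs = frozenset(attrs)
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    labels = {}
    for lhs, rhs in fds:
        non_prime = rhs - prime - lhs
        if not non_prime or closure(lhs, fds) == attrs:
            continue  # nothing non-prime is determined, or lhs is a superkey
        kind = "partial" if any(lhs < k for k in keys) else "transitive"
        labels["".join(sorted(lhs)) + "->" + "".join(sorted(non_prime))] = kind
    return labels

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

print(classify("ABC", [fd("AB", "C"), fd("B", "C")]))  # {'B->C': 'partial'}
print(classify("ABC", [fd("A", "B"), fd("B", "C")]))   # {'B->C': 'transitive'}
```

In both runs the offending dependency is the same B->C; only the position of B relative to the candidate key changes the label.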
So, what's the exact difference?
I mean, why such a marked distinction? Essentially in both of the cases the action is same.
Decompose the relation using the dependency where the determinant (B here) is either PART of a key or not.
Why have separate terms like partial dependency or transitive dependency?
Why not just see whether there exists a dependency wherein a non-prime attribute is determined by something which is NOT a candidate key (no matter whether it is a partial key or another non-prime attribute)?
Why can't we implement a method like this:
1NF -- all elements are in atomic form
XNF -- if there is any dependency of the form non_key -> non_prime_attribute(s), decompose the relation so that one of the new relations has this particular "non_key" as its key, together with those non_prime_attributes
BCNF -- for all the dependencies of the form X->Y, X is a superkey
Can we have such NF condition format? Does it combine all the conditions?
So, what's the exact difference?
2NF is not 3NF & definitions of 2NF are not definitions of 3NF. There isn't any particular semantic or syntactic structural similarity that would leave some kind of "difference" other than that a 2NF relation can have the sort of problem FD (functional dependency) that violates 3NF that a 3NF relation doesn't have. You can find definitions all over the place. You almost give them correctly here yourself. But a NF (normal form) is a condition, not a process. What do you mean "actions are the same"? Being in 3NF implies being in 2NF, so naturally decomposing to 3NF also gives 2NF. But there are relations that are in 2NF but not in 3NF, and there may be decompositions of a relation to 2NF that don't get to 3NF. Those decompositions will involve a removal of all problem partial FDs that does not result in the removal of all problem transitive FDs.
(Because 3NF is always achievable and there are no other disadvantages compared to 2NF, 2NF isn't even useful. It's just a condition that was discovered first that is not as strong as 3NF.)
(3NF is frequently defined in terms of 2NF plus no transitive dependencies of non-prime attributes on CKs, but actually no such FDs implies no partial FDs of non-prime attributes on CKs, hence 2NF, so the first condition is redundant.)
Why not just see, if there exists a dependency wherein a non prime attribute is determined by a something which is NOT a candidate key
Why should that condition be helpful? It is not a description of just getting rid of the problem FDs of 2NF & 3NF--that's what putting into 3NF does.
Getting rid of non-trivial FDs whose determinants are not superkeys happens to give BCNF. It implies 2NF & 3NF. But it is different from both of them. A BCNF relation exhibits no FD-based update anomalies. BCNF is always achievable. However, 3NF is always achievable while "preserving FDs", whereas BCNF is not. There are cases where, in order for a FD that held in the original to be enforced in a view/query that gives it via constraints on its components, we need an EQD (equality dependency) constraint. That says two column sets have the same set of subrow values, which is more expensive to enforce than a FD. Either you have BCNF & an EQD & fewer update anomalies or you have 3NF/EKNF & a FD & certain update anomalies.
The NF that really matters is 5NF, which implies BCNF, with no update anomalies & with other benefits. (We might then decide to denormalize for performance reasons.)
PS Normalization to a given NF does not necessarily involve normalization to lower NFs.
It almost sounds as though you want to know why they called these two normal forms by different names instead of inventing just one form that covers both cases. If that's not the case, please ignore this answer.
Part of the answer is that the forms weren't discovered at the same time. And part of the answer is that the problem with 1NF that gave rise to 2NF is not the same as the problem with 2NF that gave rise to 3NF, even though they both exhibit harmful redundancy.
What might satisfy you a little more is BCNF. BCNF was defined after 3NF as a stricter version of it, before 4NF existed, but it was given a name rather than the next number. It sits between 3NF and 4NF, because it is more restrictive than 3NF but less restrictive than 4NF. So it looks "out of sequence", so to speak.
In BCNF, every (non-trivial) determinant is a superkey. That seems to be what you are looking for. I conjecture that any relation that is in 1NF and where every determinant is a superkey could be shown to be in 2NF and 3NF. But the proof is beyond me.
2NF and 3NF are essentially historical concepts and your question is a reasonable one. There is no real reason to apply them in practical database design because better tools exist today.
When it comes to teaching there is possibly some justification for mentioning 2NF and 3NF. Doing so allows students to explore the concepts involved (as you have done) while also teaching them a bit about the origins and rationale of design theory. In school maths lessons I was taught long division and differentiation from first principles. No one uses those techniques in practice, they are just teaching aids.
Before checking for 2NF, the relation should be in 1NF. In simple words, a relation in 2NF has only full dependencies of its non-key attributes on the key, with no partial dependencies. A full dependency means that if x determines y, then after removing any attribute from x, what remains no longer determines y. If some proper part of x still determines y, then it is a partial dependency. For 3NF, the relation must first be in 2NF; in addition, there must be no transitive dependencies, that is, no case where x determines z only because x determines y and y determines z.
Solution for 2NF: create a table for the partial dependencies, keyed by the part of the original primary key that acts as the determinant, and keep that part in the original relation as a foreign key to the new table.
Solution for 3NF: create one relation for x determines y and another for y determines z. Add keys to both relations.
If I have the following relation R = (A, B, C, D)
And the functional dependencies:
A -> B, B -> A, CDB -> A, CDA -> B
The candidate keys are CDA and CDB.
The third normal form says that there cannot be a functional dependency between non-prime attributes. A non-prime attribute is an attribute that doesn't occur in any of the candidate keys. Then that means that this relation is already in 3NF, since both A and B, which depend on each other, are part of one of the candidate keys, am I right?
If so, I have another question about BCNF. BCNF says that every determinant must be a candidate key. In this case, A and B are not candidate keys, so that violates BCNF, or am I missing something here?
Thanks.
If the FDs you have given are supposed to be a cover of the FDs satisfied by R then you are right to conclude that CDA and CDB must be candidate keys. (You didn't state that the FDs are a cover, and if they are not then there are other ways to satisfy the same dependencies, but I guess the intent of the question is that the candidate keys must be inferred only from what you are given.)
If CDA and CDB are in fact the candidate keys of R then you are right that R satisfies 3NF but not BCNF.
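The conclusion can be double-checked by brute force. The sketch below is illustrative and assumes the four FDs listed in the question are a cover of everything that holds in R; the helper names are invented for the example. It enumerates the candidate keys and then applies the 3NF and BCNF definitions to every non-trivial dependency in the closure.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(attrs):
    items = sorted(attrs)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    superkeys = [s for s in subsets(attrs) if closure(s, fds) == attrs]
    return [k for k in superkeys if not any(o < k for o in superkeys)]

def violations(attrs, fds):
    """Non-trivial Y -> B in the closure whose determinant Y is not a superkey."""
    attrs = frozenset(attrs)
    found = []
    for y in subsets(attrs):
        y_plus = closure(y, fds)
        if y_plus != attrs:
            found.extend((y, b) for b in y_plus - y)
    return found

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

attrs = "ABCD"
fds = [fd("A", "B"), fd("B", "A"), fd("BCD", "A"), fd("ACD", "B")]
keys = candidate_keys(attrs, fds)
prime = set().union(*keys)
bad = violations(attrs, fds)

print(sorted("".join(sorted(k)) for k in keys))  # ['ACD', 'BCD']
print(all(b in prime for _, b in bad))           # 3NF: True
print(not bad)                                   # BCNF: False
```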
I read the following example: a relation A(X,Y,Z,P,Q,R) with the following functional dependencies.
Why is this in 1NF?
Could anyone help me?
The diagram is not normal notation. I suppose that arrows point to determined attributes of FDs. I suppose that an arrow that doesn't come from a box means a FD with just one determinant attribute and an arrow that comes from a box means a FD with the boxed attributes as determinant attributes. Find out what the diagram notation means.
If so then the functional dependencies are Y → Z, XYZ → QR and P → QRX.
To show what normal form the relation is in we need to know what definitions you were given for normal forms. It happens that this relation is not in 2nd normal form. So it isn't in any higher normal form. So it is only in 1st normal form. So the only normal form definition we need to know is the one you were given for 2NF. That definition usually involves candidate keys. If so then we need to know what definition you were given for candidate key. The definition of 2NF can involve full and partial FDs. If yours does then we need to know what definitions of full and/or partial FD you were given. Give the definitions.
The only CK is PY because it determines every other attribute but no proper subset of it does. It is the only CK because there is no other such set of attributes. To justify this we need to reference the rules you were given for deriving FDs and CKs. (Eg this includes how we went from one FD list to another.) Give the rules.
But there are then also determinants Y (from Y → Z) and P (from P → QRX) that are proper subsets of that candidate key. So each of non-prime attributes Z, Q, R and X is partially dependent on a CK. But to be in 2NF there must be no non-prime attributes partially dependent on a candidate key. Ie every non-prime attribute must be fully functionally dependent on all CKs. So A is not in 2NF. So the highest normal form it is in is 1NF.
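Under the reading of the diagram used in this answer (Y → Z, XYZ → QR and P → QRX, which is itself a guess), the candidate key and the partial dependencies can be computed as follows. This is a rough sketch with invented helper names, not the only way to derive them.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal superkeys, found by trying attribute sets in increasing size."""
    attrs = frozenset(attrs)
    keys = []
    for n in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), n):
            s = frozenset(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

attrs = "XYZPQR"
fds = [fd("Y", "Z"), fd("XYZ", "QR"), fd("P", "QRX")]
keys = candidate_keys(attrs, fds)
prime = set().union(*keys)

# For every proper, non-empty part of a candidate key, collect the
# non-prime attributes that this part determines on its own.
partial = {}
for key in keys:
    for n in range(1, len(key)):
        for part in combinations(sorted(key), n):
            dependents = closure(part, fds) - prime
            if dependents:
                partial["".join(part)] = "".join(sorted(dependents))

print(["".join(sorted(k)) for k in keys])  # ['PY']
print(partial)                             # {'P': 'QRX', 'Y': 'Z'}
```

Any non-empty result in partial means that 2NF is violated.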
The picture doesn't make its meaning very clear in my opinion because it seems to be mixing two different notations for functional dependencies (FDs). Any answer will depend on how you want to interpret the diagram.
I'd hazard a guess that the diagram is supposed to indicate the following set of FDs: XY->Z, Y->QR, P->QRX. If that's correct then the possible candidate key respecting that set of FDs would be {Y,P}. If my interpretation of the diagram is correct then both Y and P are determinants in their own right. Since Y and P are proper subsets of a candidate key of A we can conclude that A violates 2NF and therefore the highest normal form that A can satisfy is 1NF.
Update: Your new picture specifies some dependencies. Collecting the determinant terms together we can summarize as:
P->XQR
XY->QR
Y->Z
I assume these are supposed to be the dependencies actually satisfied by A. On the left-hand side we have P, X and Y, so PXY will be a superkey of A. P->X, therefore the candidate key (minimal superkey) can only be PY. P->XQR and Y->Z are both FDs with determinants that are proper subsets of the candidate key (PY), and they determine non-prime attributes, which means those dependencies both violate 2NF. Recap: 2NF prohibits any FD in which a non-prime attribute depends on a proper subset of a candidate key. So 1NF is the highest normal form of A.
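The step from the superkey PXY to the candidate key PY can be sketched as a greedy reduction: drop one attribute at a time as long as the remainder still determines every attribute. The function name minimize is invented for this illustration, and a greedy pass like this returns one candidate key, not necessarily all of them.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def minimize(superkey, attrs, fds):
    """Drop attributes one at a time while the rest still determines everything."""
    attrs = frozenset(attrs)
    key = set(superkey)
    for a in sorted(superkey):
        if closure(key - {a}, fds) == attrs:
            key.discard(a)
    return frozenset(key)

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

attrs = "XYZPQR"
fds = [fd("P", "XQR"), fd("XY", "QR"), fd("Y", "Z")]

print(closure("PXY", fds) == frozenset(attrs))       # True: PXY is a superkey
print("".join(sorted(minimize("PXY", attrs, fds))))  # PY
```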
As per 1NF, no two rows of a relation may be identical and no column may hold more than one value in a row. This increases redundancy, as there will be columns with the same data repeating in many rows.
Name  ID  Course
A     1   Computer
B     2   Arts
C     3   Computer
Here the Course column has repeated values, but no row has a column holding two values. Hence it is in 1NF.
1NF has the least number of restrictions. So a relation in any other form, such as 2NF or 3NF, is by default also in 1NF.
Consider this analogy
1NF = Living Beings
2NF = Mammals
3NF = Humans
All mammals/2NF are by default living beings/1NF, and so on.
To satisfy the functional dependency X → Y, it is essential that each X value be associated with only one Y value. And thus it satisfies the 1NF criterion, which does not allow multiple values in a row for a column.
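The "each X value is associated with only one Y value" condition can be tested on concrete rows. The sketch below uses the small Name/ID/Course table from this answer; fd_holds is a name invented for the illustration, and it only checks one table instance, whereas a functional dependency is a constraint meant to hold for every valid instance.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows with equal lhs values always have equal rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        # Remember the first rhs seen for this lhs and compare later rows to it.
        if seen.setdefault(x, y) != y:
            return False
    return True

rows = [
    {"Name": "A", "ID": 1, "Course": "Computer"},
    {"Name": "B", "ID": 2, "Course": "Arts"},
    {"Name": "C", "ID": 3, "Course": "Computer"},
]

print(fd_holds(rows, ["ID"], ["Course"]))  # True
print(fd_holds(rows, ["Course"], ["ID"]))  # False
```

The second call fails because the Course value Computer appears with two different IDs.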
According to the definition of Boyce-Codd Normal Form,
Reln R with FDs F is in BCNF if, for all X -> A in F+
- A is a subset of X (called a trivial FD), or
- X is a superkey for R.
“R is in BCNF if the only non-trivial FDs over R are key constraints.”
If R in BCNF, then every field of every tuple records information that
cannot be inferred using FDs alone.
What I don't understand is the two statements above about the normal form.
Can someone give me an example?
Thanks!
Some Pre-requisite terms before I try to Explain:
• Non-key attribute: An attribute that is not part of any candidate key is known as non-key /non-prime attribute.
• Superkey: A set of attributes within a table whose values can be used to uniquely identify a tuple. A candidate key is a minimal set of attributes necessary to identify a tuple; this is also called a minimal superkey.
Now, BCNF is the advanced version of 3NF, stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a superkey of the table.
Consider a relation : R(A,B,C,D)
The dependencies are:
A->BCD
BC->AD
D->B
So, the candidate keys (or minimal superkeys) are A, BC and CD (CD is a key because D->B gives B, and then BC->AD gives A).
But in the dependency D->B, D is not a superkey.
Hence it violates BCNF.
We can break this relation into R1 and R2 as:
R1(A,D,C) and R2(D,B) to get BCNF.
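A quick way to check this decomposition is the usual subset test: a fragment is in BCNF when, for every subset X of its attributes, the closure of X restricted to the fragment is either X itself or the whole fragment. The sketch below is illustrative, with invented helper names, and also checks that the shared attribute D determines all of R2, which makes the join lossless.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def fragment_in_bcnf(fragment, fds):
    """Each subset X of the fragment determines only itself or the whole fragment."""
    fragment = frozenset(fragment)
    items = sorted(fragment)
    for n in range(len(items) + 1):
        for combo in combinations(items, n):
            x = frozenset(combo)
            reach = closure(x, fds) & fragment
            if reach != x and reach != fragment:
                return False
    return True

def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)

fds = [fd("A", "BCD"), fd("BC", "AD"), fd("D", "B")]

print(fragment_in_bcnf("ABCD", fds))                              # False
print(fragment_in_bcnf("ADC", fds), fragment_in_bcnf("DB", fds))  # True True

# Lossless join: the common attributes determine all of R2(D,B).
common = frozenset("ADC") & frozenset("DB")
print(frozenset("DB") <= closure(common, fds))                    # True
```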