5NF and the significance of trivial join dependency - database

This is the definition of 5NF from Navathe's Fundamentals of Database Systems, 6th Edition.
A relation schema R is in fifth normal form (5NF) (or project-join
normal form (PJNF)) with respect to a set F of functional, multivalued, and
join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
The definition of join dependency is:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn. Hence, for every such r we have *(πR1(r), πR2(r), ..., πRn(r)) = r, where * denotes the natural join of the listed projections.
This is where I have a problem understanding:
What does "for every nontrivial join dependency JD(R1,R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R." mean?
This is my attempt at understanding:
If I have a relation R(A, B, C), suppose the superkeys are AB and AC, and I decompose R into R1(A, C) and R2(A, B). Then R1 and R2 are superkeys of R (I might be wrong here). And since I can join R1 and R2 to form R, R has a non-trivial join dependency, but since each Ri is a superkey of R, R is in 5NF.
Thanks to #philipxy for suggesting edits. I have tried to make the question clearer. Also, if that definition of 5NF is wrong, where can I get the right definitions?
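The condition in the quoted JD definition can be tested mechanically on a single relation state. The following is a minimal Python sketch, not taken from the book: relations are modelled as lists of dicts, the helper names project, natural_join and satisfies_jd are made up for illustration, and both states of R(A, B, C) are hypothetical. It only tests one state at a time, whereas a JD is a constraint on every legal state.

```python
from functools import reduce

def project(rows, attrs):
    """Projection onto attrs with duplicates removed (set semantics)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of tuples; a tuple is a frozenset of (attr, value) pairs."""
    out = set()
    for t in left:
        for u in right:
            dt, du = dict(t), dict(u)
            # Tuples combine only if they agree on every shared attribute.
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def satisfies_jd(rows, components):
    """True if joining the projections onto each component gives back rows."""
    whole = project(rows, rows[0].keys())
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return joined == whole

AC, AB, ABC = {"A", "C"}, {"A", "B"}, {"A", "B", "C"}

# Two hypothetical states of R(A, B, C); AB and AC are unique in both.
state_1 = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 1, "C": 1}]
state_2 = [{"A": 1, "B": 1, "C": 1}, {"A": 1, "B": 2, "C": 2}]

print(satisfies_jd(state_1, [AC, AB]))   # True
print(satisfies_jd(state_2, [AC, AB]))   # False: the join adds (1, 1, 2) and (1, 2, 1)
print(satisfies_jd(state_2, [ABC, AB]))  # True: trivial JD, one component is all of R
```

The second state suggests that being able to project R onto R1 and R2 does not by itself mean the join gives R back; whether JD(R1, R2) holds is a separate constraint on R.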

Related

Database - Lossless Join Decomposition Criteria

On Wikipedia, it says:
The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies are in F+ (where F+ stands for the closure for every attribute or attribute sets in F):
R1 ∩ R2 → R1 or R1 ∩ R2 → R2
Unfortunately, I do not understand this criterion. It is known that the decomposition is lossless if the join of R1 and R2 is R, but how is this derivable from the criterion above?
That Wikipedia article is a mess.
A decomposition is lossless if and only if the components (which are projections of the original) join back to it.
The stuff you quote is not a definition of lossless decomposition. It is a sufficient condition for showing that a decomposition is lossless given some functional dependencies that hold in the original. If the condition is met then the join is lossless. It's not a necessary condition.
Some university html slides:
Decomposition
10 We'll make a more formal definition of lossless-join: [...]
11 In other words, a lossless-join decomposition is one in which, for any legal relation r, if we decompose r and then "recompose" r, we get what we started with--no more and no less.
A useful sufficient condition for Lossless-Join Decomposition during Normalization Using Functional Dependencies
Let R be a relation schema.
Let F be a set of functional dependencies on R.
Let R1 and R2 form a decomposition of R.
The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies are in F+:
1 R1 ∩ R2 → R1
2 R1 ∩ R2 → R2
The idea behind knowing that sufficient condition is that you just have to show something about the set of shared attributes & some functional dependencies to know the components join to the original and (equivalently) are a lossless decomposition.
Why is this true? Simply put, it ensures that the attributes involved in the natural join (R1 ∩ R2) are a candidate key for at least one of the two relations.
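The sufficient condition can be sketched in a few lines of Python: compute the attribute closure of R1 ∩ R2 under F and see whether it covers R1 or R2. The function names closure and lossless_by_fd_test, and the example schema R(A, B, C) with F = {A → B}, are assumptions made for illustration and do not come from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_by_fd_test(r1, r2, fds):
    """True if R1 ∩ R2 → R1 or R1 ∩ R2 → R2 follows from fds."""
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus

# Hypothetical schema R(A, B, C) with the single FD A -> B.
fds = [({"A"}, {"B"})]

print(lossless_by_fd_test({"A", "B"}, {"A", "C"}, fds))  # True: A -> AB
print(lossless_by_fd_test({"A", "B"}, {"B", "C"}, fds))  # False: B determines neither side
```

A False result only means this FD-based test does not establish losslessness; as noted above, the condition is sufficient, not necessary.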

Trivial join dependency

I'm having difficulty understanding how to "work" with join dependencies, and I would like to ask a question that will help me clarify things for myself.
Here's the simple definition from Wikipedia:
A table T is subject to a join dependency if T can always be recreated
by joining multiple tables each having a subset of the attributes of
T.
A trivial join dependency is defined as follows:
If one of the tables in the join has all the attributes of the table
T, the join dependency is called trivial.
My question is: If we decompose a relation R into a lossless decomposition, is it possible that every join dependency of R could be a trivial join dependency?
An example would be awesome.
If we decompose a relation R into a lossless decomposition, is it possible that the join dependency\ies of R would be a trivial join dependency\ies?
If you mean, if we decompose a relation R losslessly is it possible that all the JDs of R are trivial: yes.
Whenever all the JDs of R are trivial, you can decompose it losslessly, because by definition a JD is just a description of a lossless decomposition. And there are such relations. Every R, calling its attribute set S, satisfies the JDs *(S,S), *(S,S,S), etc. Some satisfy no other JDs. Some satisfy others but they're also trivial.
Eg: This R only satisfies *(S,S), *(S,S,S), etc:
x y
1 2
5 2
5 4
Eg: Say S = {x,y} and FD {x}->{y} holds, so *({x},S) holds. But say JD *({x},{y}) doesn't hold. Then the only way a JD can have sets unioning to S is if S is one of them. So R has only trivial JDs. But not just the ones using only S.
x y
1 2
5 2
6 4
If you mean, if we decompose a relation R losslessly into smaller components is it possible that all the JDs of R are trivial: no. Because by definition a trivial JD has one set that is all the attributes of R, ie has one component that is R, it doesn't decompose into components smaller than R.
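The two example tables above can be checked with a short Python sketch. Because the components {x} and {y} share no attribute, their natural join is just the cross product of the two projections, so the only candidate non-trivial JD, *({x},{y}), reduces to a cross-product comparison. The names r1, r2, fd_x_to_y and jd_x_y are made up for illustration.

```python
from itertools import product

# The two example relations, as sets of (x, y) tuples.
r1 = {(1, 2), (5, 2), (5, 4)}
r2 = {(1, 2), (5, 2), (6, 4)}

def fd_x_to_y(rows):
    """True if each x value appears with exactly one y value."""
    seen = {}
    return all(seen.setdefault(x, y) == y for x, y in rows)

def jd_x_y(rows):
    """True if *({x},{y}) holds: rows equals the cross product of its projections."""
    xs = {x for x, _ in rows}
    ys = {y for _, y in rows}
    return set(product(xs, ys)) == set(rows)

print(fd_x_to_y(r1), jd_x_y(r1))  # False False
print(fd_x_to_y(r2), jd_x_y(r2))  # True False
```

In both tables the non-trivial JD fails, so every JD that holds has S as one of its components; in the second table the FD {x}->{y} holds as well.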

Can someone tell me if this relation is in 3NF?

Consider a relation R(A, B, C, D, E) with the following functional dependencies: A->BC, D->CE, C->E
AD+ = ABCDE
Prime Attributes: AD
Non-Prime Attributes: BCE
Decomposed into 3NF but not BCNF
R1(A, B, C, D) R2(C,E)
It's been a long time since I've done this, but if I remember correctly, 3NF does not allow a non-key column to stay in a table if it is transitively dependent on the key through another column.
In this case, the only transitive dependency is A -> C -> E, which means E needs to be extracted from R.
This you have done to my understanding.
Something tells me you might need to extract C from R1, but that is probably only necessary in BCNF.
The third normal form in your case is the following:
R1 (A B C)
R2 (C E)
R3 (C D)
R4 (A D)
Note that this is the only way of decomposing your relations in third normal form preserving the dependencies, and that all the resulting dependencies are such that all the decomposed schemas are also in BCNF.
Finally, one can note that the same relation can be decomposed in BCNF in different ways by losing some functional dependency.
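The closure AD+ from the question and the BCNF claim about the four components can be checked by brute force. This is a minimal Python sketch under the FDs given in the question; the helper names closure and is_bcnf are made up, and the BCNF test simply tries every proper subset of a component, which is only practical for schemas this small.

```python
from itertools import combinations

# FDs from the question, written as (lhs, rhs) strings of attribute letters.
FDS = [("A", "BC"), ("D", "CE"), ("C", "E")]

def closure(attrs, fds=FDS):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(component, fds=FDS):
    """Every subset X of the component that determines anything in the
    component beyond itself must determine the whole component."""
    comp = set(component)
    for k in range(1, len(comp)):
        for xs in combinations(sorted(comp), k):
            determined = closure(xs, fds) & comp
            if determined != set(xs) and determined != comp:
                return False
    return True

print(sorted(closure("AD")))                  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("A")), sorted(closure("D")))  # neither A nor D alone is a key
print([is_bcnf(c) for c in ("ABC", "CE", "CD", "AD")])  # [True, True, True, True]
print(is_bcnf("ABCD"))                        # False: A -> BC but A is not a key of ABCD
```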

4NF, Multivalued Dependencies without Functional Dependencies

Sorry for asking a question one might consider basic.
Suppose we have a relation R(A,B,C,D,E) with multivalued dependencies:
A->>B
B->>D.
Relation R doesn't have any functional dependencies.
Next, suppose we decompose R into 4NF.
My considerations:
Since we don't have any functional dependencies, the only key is all attributes (A,B,C,D,E). There are two ways we can decompose our relation R:
R1(A,B) R2(A,C,D,E)
R3(B,D) R4(A,B,C,E)
My question is - are these 2 decompositions final? Looks like they are since there are no nontrivial multivalued dependencies left. Or am I missing something?
Relation R doesn't have any functional dependencies.
You mean, non-trivial FDs (functional dependencies). (There must always be trivial FDs.)
Assuming that the MVDs (multivalued dependencies) holding in R are those in the transitive closure of {A ↠ B, B ↠ D}:
In decomposition 1, R1(A,B) R2(A,C,D,E), we can reconstruct R as R1 JOIN R2 and both R1 & R2 are in 4NF and their join will satisfy A ↠ B. If some component contained all the attributes of the other MVD then we could further decompose it per that MVD. And we would know that, given some alleged values for all components, their alleged reconstruction of R by joining would satisfy both MVDs. But here there is no such component. So we can't further decompose. And we know that an alleged reconstruction of R by joining satisfies A ↠ B but we would still have to check whether it satisfies B ↠ D. We say that the MVD B ↠ D is "not preserved" and the decomposition to R1 & R2 "does not preserve MVDs".
In decomposition 2, R3(B,D) R4(A,B,C,E), we can reconstruct R as R3 JOIN R4 and both R3 & R4 are in 4NF and the join will satisfy B ↠ D. Now some component contains all the attributes of the other MVD so we can further decompose it per that MVD. And we know that, given some alleged values for all components, their alleged reconstruction of R by joining satisfies both MVDs. That component is R4, which we can further decompose, reconstructing it as AB JOIN ACE. And we know that an alleged reconstruction of R by joining satisfies both A ↠ B & B ↠ D. Because the MVDs in the original appear in a component, we say these decompositions "preserve MVDs".
PS 1 The 4NF decomposition must be to three components
MVDs always come in pairs. Suppose MVD X ↠ Y holds in a relation with attributes S, normalized to components XY & X(S-Y). Notice that S-XY is the set of non-X non-Y attributes, and X(S-Y) = X(S-XY). Then there is also an MVD X ↠ S-XY, normalized to components X(S-XY) & X(S-(S-XY)), ie X(S-XY) & XY, ie X(S-Y) & XY. Why? Notice that both MVDs give the same component pair. Ie both MVDs describe the same condition, that S = XY JOIN X(S-XY). So when an MVD holds, that partner holds too. We can write the condition expressed by each of the MVDs using the special explicit & symmetrical notation X ↠ Y | S-XY.
We say a JD (join dependency) of some components of S holds if and only if they join to S. So if S = XY JOIN X(S-Y) = XY JOIN X(S-XY) then the JD *{XY, X(S-XY)} holds. Ie the condition that both MVDs describe is that JD. So a certain MVD and a certain binary JD correspond. That's one way of seeing why normalizing an MVD away involves a 2-way join and why MVDs come in pairs. The JDs that cause a 4NF relation to not be in 5NF are those that do not correspond to MVDs.
Your example involves two MVDs that aren't partners & neither otherwise holds as a consequence of the other, so you know that the final form of a lossless decomposition will involve two joins, one for each MVD pair.
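The pairing of MVDs described in PS 1 can be seen on a small instance using the row-swap definition of an MVD: X ↠ Y holds when, for any two rows agreeing on X, the row that takes its Y values from the first and everything else from the second is also present. This is a minimal Python sketch; the function name mvd_holds and the three-attribute relation are hypothetical and are not the R(A,B,C,D,E) of the question.

```python
def mvd_holds(rows, x, y):
    """X ->> Y by the swap definition, checked on one relation state."""
    present = {frozenset(r.items()) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Take Y from t1 and every other attribute from t2.
                swapped = dict(t2)
                swapped.update({a: t1[a] for a in y})
                if frozenset(swapped.items()) not in present:
                    return False
    return True

# Hypothetical state over S = {A, B, C}: for A = 1, every B pairs with every C.
full = [{"A": 1, "B": b, "C": c} for b in (1, 2) for c in (1, 2)]
# The same state with the row (1, 2, 2) removed.
partial = [r for r in full if (r["B"], r["C"]) != (2, 2)]

print(mvd_holds(full, {"A"}, {"B"}), mvd_holds(full, {"A"}, {"C"}))        # True True
print(mvd_holds(partial, {"A"}, {"B"}), mvd_holds(partial, {"A"}, {"C"}))  # False False
```

In both states A ↠ B and its partner A ↠ C hold or fail together, which is what the notation A ↠ B | C expresses.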
PS 2 Ambiguity of "Suppose we have a relation with these multi-valued dependencies"
When decomposing per FDs (functional dependencies) we are usually given a canonical/minimal cover for the relation, ie a set in a certain form whose transitive closure under Armstrong's axioms (set of FDs that must consequently hold) holds all the FDs in the relation. This is frequently forgotten when we are told that some FDs hold. We must either be given a canonical/minimal cover for the relation or be given an arbitrary set and be told that the FDs that hold in the relation are the ones in its transitive closure. If we're just given a set of FDs that hold, we know that the ones in its transitive closure hold, but there might be others. So in general we can't normalize.
Here you give some MVDs that hold. But they aren't the only ones, because each has a partner. Moreover others might (and here do) consequently hold. (Eg X ↠ Y and Y ↠ Z implies X ↠ Z-Y holds.) But you don't say that they form a canonical or minimal cover. One way to get a canonical form for MVDs (a unique one per each transitive closure, hopefully more concise!) would be a minimal cover (one that can't lose any MVDs and still have the same transitive closure) augmented by the partner of each MVD. (Whereas for FDs the standard canonical form is minimal.) You also don't say "the MVDs that hold are those in the transitive closure of these". You just say that those MVDs hold. So maybe some not in the transitive closure do too. So your example can't be solved. We can guess that you probably mean that this is a minimal cover. (It's not canonical.) Or that the MVDs that hold in the relation are those in the transitive closure of the given ones. (Which in this case are then a minimal cover.)
A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ->> Y, X is a superkey, that is, X is either a candidate key or a superset of one.
In your first decomposition (1, with R1 and R2), B->>D is not preserved, so it is not a dependency-preserving decomposition; it is also not in 4NF, as A is not a superkey in the second table.
On the other hand, the second decomposition (2, with R3 and R4) is dependency preserving and lossless-join, with B and ACE as primary keys in the respective tables, but it is not in 4NF because the dependency A->>B exists in the second table and A is not a superkey. You have to decompose the second table further into two tables, which can be {A B} and {A C E}.
So if I follow your reasoning (suraj3), are R1(A,B) and R2(B,C,D,E) a correct decomposition? I think this will preserve the MVD B->>D.

Multivalued dependency confusion

I am struggling with the concepts of 4NF and Multivalued Dependencies (MVDs).
I am looking at a supplementary book for the course I am currently taking and one of the examples is below.
The book states the asterisks refer to a unique key or a composite attribute key.
Given: R(A*,B,C*) and the set {(A,B):R,(B,C):R} satisfies the
lossless decomposition property.
Does the multivalued dependency B->>C hold?
Is B definitely a unique key?
Is R in 4NF?
I understand lossless decomposition: if you take the natural join of the two sets above, you get back the original dataset, i.e. in this case A, B, C.
But I just cannot grasp how to take the given information and prove/confirm that B->>C holds or if it does not.
I emailed my professor about my confusion and he just told me to look over his notes (which I've obviously done numerous times) and it's gotten me nowhere.
Does the multivalued dependency B->>C hold?
You have been told some things about MVDs. One of them is probably:
A decomposition of R into (X, Y) and (X, R − Y) is a lossless-join
decomposition if and only if X ->> Y holds in R.
In your case R is {A,B,C}, X is {B}, Y is {A} and R - Y = {A,B,C} - {A} = {B,C}. So a decomposition of R into (B,A) and (B,C) is a lossless-join decomposition if and only if {B} ->> {A} holds in R. But we are given that the decomposition of R into (B,A) and (B,C) is a lossless-join decomposition. So {B} ->> {A} holds in R. Taking Y = {C} instead gives the same pair of components, so {B} ->> {C} holds as well.
Is B definitely a unique key?
I can't make sense of that.
Maybe you are trying to say that we are given that {A,C} is a CK (candidate key) of R but that there might be other CKs and you are trying to ask whether the lossless decomposition means that {B} must also be a CK of R. Let's look for a counterexample. Pick a simplest example. Suppose R is {(a,b,c1),(a,b,c2)} = {(a,b)} JOIN {(b,c1),(b,c2)}. This agrees with R CK {A,C} & R MVD {B} ->> {C}. But b appears with c1 and c2 so {B} does not functionally determine all other attributes so {B} is not a CK of R. So that CK and that MVD do not force that {B} is a CK.
Is R in 4NF?
You have been told some things about 4NF. One is probably that:
A Table is in 4NF if and only if, for every one of its non-trivial
multivalued dependencies X ->> Y, X is a superkey
The MVD {B} ->> {C} is non-trivial: {C} is not a subset of {B}, and {B,C} is not all of R. But to show whether R must be in 4NF or must not be in 4NF or that we can't tell, you are going to have to address possible sets of non-trivial MVDs that could hold in R and CKs that R could have.
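The counterexample state from the answer above can be checked directly. This is a minimal Python sketch with made-up names (R, ab, bc, rejoined, is_unique); tuples are written in (A, B, C) order. Uniqueness in a single state can refute a key but cannot prove one, so the {A,C} check only shows the state is consistent with that CK.

```python
# The counterexample state; tuples are (A, B, C).
R = {("a", "b", "c1"), ("a", "b", "c2")}

# Project onto (A, B) and (B, C), then join back on B.
ab = {(a, b) for a, b, _ in R}
bc = {(b, c) for _, b, c in R}
rejoined = {(a, b, c) for a, b in ab for b2, c in bc if b == b2}

def is_unique(rows, positions):
    """True if no two rows share the same values at the given positions."""
    keys = [tuple(row[i] for i in positions) for row in rows]
    return len(keys) == len(set(keys))

print(rejoined == R)          # True: this state decomposes losslessly into (A,B), (B,C)
print(is_unique(R, (0, 2)))   # True: consistent with {A,C} being a CK
print(is_unique(R, (1,)))     # False: {B} cannot be a CK
```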

Resources