Database - Lossless Join Decomposition Criteria

On Wikipedia, it says:
The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies are in F+ (where F+ stands for the closure for every attribute or attribute sets in F):
R1 ∩ R2 → R1 or R1 ∩ R2 → R2
Unfortunately, I do not understand this criterion. It is known that the decomposition is lossless if the join of R1 and R2 is R, but how does that follow from the criterion above?

That Wikipedia article is a mess.
A decomposition is lossless if and only if the components (which are projections of the original) join back to it.
The stuff you quote is not a definition of lossless decomposition. It is a sufficient condition for showing that a decomposition is lossless given some functional dependencies that hold in the original. If the condition is met then the join is lossless. It's not a necessary condition.
Some university html slides:
Decomposition
We'll make a more formal definition of lossless-join: [...]
In other words, a lossless-join decomposition is one in which, for any legal relation r, if we decompose r and then "recompose" r, we get what we started with--no more and no less.
A useful sufficient condition for Lossless-Join Decomposition during Normalization Using Functional Dependencies
Let R be a relation schema.
Let F be a set of functional dependencies on R.
Let R1 and R2 form a decomposition of R.
The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies are in F+:
1 R1 ∩ R2 → R1
2 R1 ∩ R2 → R2
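The point of knowing that sufficient condition is that you only have to show something about the set of shared attributes and some functional dependencies to know that the components join back to the original, i.e. (equivalently) that they form a lossless decomposition.
Why is this true? Simply put, it ensures that the attributes involved in the natural join (R1 ∩ R2) form a superkey (not necessarily a candidate key) of at least one of the two components. Each tuple of the other component can then match at most one tuple of that component, namely the projection of the original tuple it came from, so the join cannot produce spurious tuples.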
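To make the criterion concrete, here is a minimal Python sketch that computes an attribute closure and applies the test. The function names and the encoding (one-letter attributes, an FD as a `(lhs, rhs)` pair of strings) are my own assumptions, not something from the slides. A `False` result only means this particular sufficient condition does not apply; by itself it does not prove the decomposition lossy.

```python
def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD once its whole left-hand side has been derived.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_by_fd_test(r1, r2, fds):
    """Sufficient (not necessary) condition for a lossless binary decomposition:
    R1 ∩ R2 → R1 or R1 ∩ R2 → R2 is in F+, i.e. the shared attributes are a
    superkey of at least one component."""
    shared_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= shared_plus or set(r2) <= shared_plus


# Hypothetical example: R(A, B, C) with the single FD A → B.
F = [("A", "B")]
print(lossless_by_fd_test("AB", "AC", F))  # True: A+ = {A, B} covers R1
print(lossless_by_fd_test("AB", "BC", F))  # False: B+ = {B}, so the test says nothing
```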

Related

5NF and the significance of trivial join dependency

This is the definition of 5NF from the Navathe book, Fundamentals of Database Systems, 6th Edition:
A relation schema R is in fifth normal form (5NF) (or project-join normal form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
The definition of join dependency is:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, ..., Rn. Hence, for every such r we have *(πR1(r), πR2(r), ..., πRn(r)) = r
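The constraint *(πR1(r), ..., πRn(r)) = r can be checked mechanically against one state by projecting and re-joining. The sketch below is a hypothetical illustration (tuples are encoded as frozensets of `(attribute, value)` pairs, and all names are my own). Since a JD constrains every legal state, passing on one instance does not prove that the JD holds.

```python
def project(rel, attrs):
    """π_attrs(rel); each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Join tuples that agree on all shared attributes (Cartesian product if none)."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


def satisfies_jd(rel, components):
    """True iff *(π_R1(rel), ..., π_Rn(rel)) = rel for this one state."""
    parts = [project(rel, set(c)) for c in components]
    joined = parts[0]
    for p in parts[1:]:
        joined = natural_join(joined, p)
    return joined == rel


def make(attrs, rows):
    """Build a state from an attribute string and a list of value tuples."""
    return {frozenset(zip(attrs, row)) for row in rows}


lossy_state = make("ABC", [(1, "x", "p"), (1, "y", "q")])
full_state = make("ABC", [(1, "x", "p"), (1, "y", "q"), (1, "x", "q"), (1, "y", "p")])
print(satisfies_jd(lossy_state, ["AB", "AC"]))  # False: the join adds spurious tuples
print(satisfies_jd(full_state, ["AB", "AC"]))   # True
```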
This is where I have a problem understanding:
What does "for every nontrivial join dependency JD(R1,R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R." mean?
This is my attempt at understanding:
If I have a relation R(A, B, C), suppose its superkeys are AB and AC. If I decompose R into R1(A, C) and R2(A, B), then R1 and R2 are superkeys of R (I might be wrong here). And since I can join R1 and R2 to form R, R has a non-trivial join dependency, but since each Ri is a superkey of R, R is in 5NF.
Thanks to @philipxy for suggesting edits. I have tried to make the question clearer. Also, if that definition of 5NF is wrong, where can I find the right definition?

Dependency preservation, based on original functional dependencies or canonical cover?

Given these functional dependencies for
R: {A,B,C,D,E,F}
AC->EF
E->CD
C->ADEF
BDF->ACD
I got this as the canonical cover:
E->C
C->ADEF
BF->C
And then broke it down to Boyce-Codd Normal Form:
Relation 1: {C,A,D,E,F}
Relation 2: {B,F,C}
I figured that this is lossless and dependency preserving. But is that true? BDF->ACD from the original functional dependencies no longer appears in any of my relations, but if I work from my calculated canonical cover then all my functional dependencies are preserved.
So that question is: Is this decomposition to BCNF dependency preserving?
A decomposition preserves the dependencies if and only if the union of the projection of the dependencies on the decomposed relations is a cover of the dependencies of the relation.
So, to know whether a decomposition preserves the dependencies, it is not sufficient to check whether the dependencies of a particular cover have been preserved (e.g. by checking whether some decomposed relation contains all the attributes of the dependency). For instance, in a relation R(ABC) with a cover F = {A→B, B→C, C→A} one could think that in the decomposition R1(AB) and R2(BC) the dependency C→A is not preserved. But if you project F on AB you obtain A→B, B→A, and projecting it on BC you obtain B→C, C→B, so from their union you can also derive C→A.
The check is not simple, even though polynomial algorithms exist that perform this task (for instance, one is described in J. Ullman, Principles of Database Systems, Computer Science Press, 1983).
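As a minimal sketch of one commonly taught polynomial-time test (I am not claiming this is exactly the formulation in Ullman's book), the routine below decides whether an FD X → Y follows from the union of the projections without ever computing those projections. It grows Z from X using only what each component can "see": Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri). A decomposition preserves the dependencies if every FD of some cover passes. The names and the string encoding are assumptions.

```python
def closure(attrs, fds):
    """attrs+ under fds; an FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, components, fds):
    """Does fd follow from the union of the projections of fds onto the components?"""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for comp in map(set, components):
            # What Z ∩ Ri determines, restricted to the attributes of Ri.
            gain = closure(z & comp, fds) & comp
            if not gain <= z:
                z |= gain
                changed = True
    return set(rhs) <= z


# The example from the answer: R(ABC), F = {A→B, B→C, C→A}, components AB and BC.
F = [("A", "B"), ("B", "C"), ("C", "A")]
print(preserves(("C", "A"), ["AB", "BC"], F))            # True
print(all(preserves(fd, ["AB", "BC"], F) for fd in F))  # True
```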
Assuming the dependencies that you have given form a cover of the dependencies of the relation, the canonical cover that you have found is incorrect. In fact BF -> C cannot be derived from the original dependencies.
For this reason, your decomposition is not correct, since R2(BCF) is not in BCNF (actually, it is not in 2NF).
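The claim that BF -> C is not derivable can be checked by computing BF+ under the FDs given in the question. A minimal sketch (the FD encoding is my own):

```python
def closure(attrs, fds):
    """attrs+ under fds; an FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs as given in the question for R(A, B, C, D, E, F).
F = [("AC", "EF"), ("E", "CD"), ("C", "ADEF"), ("BDF", "ACD")]
print(sorted(closure("BF", F)))  # ['B', 'F']: C is not in BF+, so BF -> C is not in F+
```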
One possible canonical cover of R is the following:
BDF → C
C → A
C → E
C → F
E → C
E → D
Following the BCNF decomposition (analysis) algorithm, there are two possible decompositions in BCNF (depending on the dependencies chosen for elimination). One is:
R1 = (ACDEF)
R2 = (BC)
while the other is:
R1 = (ACDEF)
R3 = (BE)
(note that BC and BE are candidate keys of the original relation, together with BDF).
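A quick way to confirm those candidate keys is a minimal sketch like the one below (names are hypothetical). A set is a candidate key if its closure is all of R and no single attribute can be dropped; checking one-attribute removals suffices because any superset of a superkey is a superkey.

```python
def closure(attrs, fds):
    """attrs+ under fds; an FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(key, attrs, fds):
    """key is a superkey of attrs and is minimal."""
    everything = set(attrs)
    if closure(key, fds) != everything:
        return False
    return all(closure(set(key) - {a}, fds) != everything for a in key)


F = [("AC", "EF"), ("E", "CD"), ("C", "ADEF"), ("BDF", "ACD")]
for k in ["BC", "BE", "BDF", "BF", "BCD"]:
    print(k, is_candidate_key(k, "ABCDEF", F))
# prints: BC True / BE True / BDF True / BF False (not a superkey) / BCD False (not minimal)
```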
A cover of the dependencies in R1 is:
C → A
C → E
C → F
E → C
E → D
while in both R2 and R3 no non-trivial dependencies hold.
From this, we can conclude that neither decomposition preserves the dependencies; for instance, the following dependency (and all those derived from it) cannot be obtained:
BDF → C
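The same conclusion can be reproduced mechanically with the closure-based preservation test (a minimal sketch; the encoding and names are assumptions), using the canonical cover given in the answer:

```python
def closure(attrs, fds):
    """attrs+ under fds; an FD is a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, components, fds):
    """Grow Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri) from the FD's left side; test its right side."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for comp in map(set, components):
            gain = closure(z & comp, fds) & comp
            if not gain <= z:
                z |= gain
                changed = True
    return set(rhs) <= z


# Canonical cover from the answer.
G = [("BDF", "C"), ("C", "A"), ("C", "E"), ("C", "F"), ("E", "C"), ("E", "D")]
for comps in (["ACDEF", "BC"], ["ACDEF", "BE"]):
    print(comps, preserves(("BDF", "C"), comps, G))  # False for both
```

For BDF → C, Z starts as {B, D, F}; neither (DF)+ inside ACDEF nor B+ inside BC or BE adds anything, so C is never reached.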

Lossy decompositions

If a relation is decomposed into 2 sub relations such that the decomposition is lossy, can these sub-relations be in any normal form (3NF or BCNF) if the parent relation is in BCNF?
Consider a relation R(S,T,U,V) with the following functional dependencies:
S->T , T->U, U->V, V->S.
Now if I decompose the above relation into 2 relations R1 and R2 such that R1 ∩ R2 is empty, like R1(S,T) and R2(U,V), is the decomposition in BCNF?
I know that R1 can have functional dependencies S->T, T->S and R2 can have functional dependencies U->V and V->U which makes it look like BCNF.
My question was: do we consider the decomposition as BCNF even though it's not a valid decomposition? By not valid I mean lossy decomposition.
From your (unclear) question:
If a relation is decomposed into 2 sub relations such that the decomposition is lossy
From a comment:
I just asked if there is a possibility that the decomposed relations be in bcnf if the parent relation is in bcnf
If that's your question then the answer is yes, there is.
Consider a variable with CK (candidate key) {a} in BCNF that can hold this:
a b
1 2
3 4
Binary decomposition {{a},{b}} is lossy with components in BCNF.
(When trying to prove something wrong always check out some simple cases in case you can find a counterexample as proof.)
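Checking that simple case directly: with no shared attributes the natural join of the two components is a Cartesian product, which is where the extra tuples come from. A minimal sketch using the two-row state above:

```python
r = {(1, 2), (3, 4)}                 # the state shown above, attributes (a, b)
pi_a = {a for a, _ in r}             # π{a}(r) = {1, 3}
pi_b = {b for _, b in r}             # π{b}(r) = {2, 4}
# No attributes are shared, so the natural join degenerates to a Cartesian product.
rejoined = {(a, b) for a in pi_a for b in pi_b}
print(sorted(rejoined))  # [(1, 2), (1, 4), (3, 2), (3, 4)]
print(rejoined == r)     # False: two spurious tuples, so the decomposition is lossy
```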
do we consider the decomposition as BCNF even though it's not a valid decomposition? By not valid I mean lossy decomposition.
When we say that a decomposition is in a certain NF, that's short for saying that all its components are in that NF. But we only ever use this shorthand when the decomposition is lossless, ie when "decomposition" is understood to be short for "lossless decomposition", because lossy decompositions are not useful.

4NF, Multivalued Dependencies without Functional Dependencies

Sorry for asking what one might consider a basic question.
Suppose we have a relation R(A,B,C,D,E) with multivalued dependencies:
A->>B
B->>D.
Relation R doesn't have any functional dependencies.
Next, suppose we decompose R into 4NF.
My considerations:
Since we don't have any functional dependencies, the only key is all attributes (A,B,C,D,E). There are two ways we can decompose our relation R:
1. R1(A,B), R2(A,C,D,E)
2. R3(B,D), R4(A,B,C,E)
My question is - are these 2 decompositions final? Looks like they are since there are no nontrivial multivalued dependencies left. Or am I missing something?
Relation R doesn't have any functional dependencies.
You mean, non-trivial FDs (functional dependencies). (There must always be trivial FDs.)
Assuming that the MVDs (multivalued dependencies) holding in R are those in the transitive closure of {A ↠ B, B ↠ D}:
In 1, R1(A,B) & R2(A,C,D,E), we can reconstruct R as R1 JOIN R2, and their join will satisfy A ↠ B. If some component contained all the attributes of the other MVD then we could further decompose it per that MVD. And we would know that, given some alleged values for all components, their alleged reconstruction of R by joining would satisfy both MVDs. But here there is no such component, so we can't further decompose per B ↠ D. (R1 is in 4NF, but R2 is not: A ↠ D, which follows from A ↠ B & B ↠ D by MVD transitivity, holds in R2 with A not a superkey, so R2 could still be decomposed into AD & ACE. Either way no component contains both B & D.) And we know that an alleged reconstruction of R by joining satisfies A ↠ B but we would still have to check whether B ↠ D. We say that the MVD B ↠ D is "not preserved" and the decomposition to R1 & R2 "does not preserve MVDs".
In 2, R3(B,D) & R4(A,B,C,E), we can reconstruct R as R3 JOIN R4, and the join will satisfy B ↠ D. R3 is in 4NF, but R4 is not: it contains all the attributes of the other MVD, A ↠ B, with A not a superkey, so we can further decompose it per that MVD. And we know that, given some alleged values for all components, their alleged reconstruction of R by joining satisfies both MVDs. Decomposing R4 gives AB & ACE, so R is reconstructed as R3 JOIN AB JOIN ACE. And we know that an alleged reconstruction of R by joining satisfies both A ↠ B & B ↠ D. Because the MVDs in the original appear in a component, we say these decompositions "preserve MVDs".
PS 1 The 4NF decomposition must be to three components
MVDs always come in pairs. Suppose MVD X ↠ Y holds in a relation with attributes S, normalized to components XY & X(S-Y). Notice that S-XY is the set of non-X non-Y attributes, and X(S-Y) = X(S-XY). Then there is also an MVD X ↠ S-XY, normalized to components X(S-XY) & X(S-(S-XY)), ie X(S-XY) & XY, ie X(S-Y) & XY. Why? Notice that both MVDs give the same component pair. Ie both MVDs describe the same condition, that S = XY JOIN X(S-XY). So when an MVD holds, that partner holds too. We can write the condition expressed by each of the MVDs using the special explicit & symmetrical notation X ↠ Y | S-XY.
We say a JD (join dependency) of some components of S holds if and only if they join to S. So if S = XY JOIN X(S-Y) = XY JOIN X(S-XY) then the JD *{XY, X(S-XY)} holds. Ie the condition that both MVDs describe is that JD. So a certain MVD and a certain binary JD correspond. That's one way of seeing why normalizing an MVD away involves a 2-way join and why MVDs come in pairs. The JDs that cause a 4NF relation to not be in 5NF are those that do not correspond to MVDs.
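The MVD/binary-JD correspondence can be seen on a small state. The sketch below (hypothetical names; rows are tuples ordered like the attribute string) tests X ↠ Y by joining the projections on XY and X(S-Y). Because the partner X ↠ S-XY yields the same two components, both MVDs always get the same answer.

```python
def project(rows, attrs, keep):
    """Project rows (tuples ordered like attrs) onto the attributes in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def mvd_holds(rows, attrs, x, y):
    """X ↠ Y holds in this state iff it equals the join of its projections
    on XY and X(S-Y), i.e. iff the binary JD *{XY, X(S-Y)} holds."""
    xy = [a for a in attrs if a in x or a in y]
    x_rest = [a for a in attrs if a in x or a not in y]
    joined = set()
    for left in project(rows, attrs, xy):
        dl = dict(zip(xy, left))
        for right in project(rows, attrs, x_rest):
            dr = dict(zip(x_rest, right))
            # The only attributes shared by the two components are those of X.
            if all(dl[a] == dr[a] for a in x):
                merged = {**dl, **dr}
                joined.add(tuple(merged[a] for a in attrs))
    return joined == set(rows)


bad = {(1, "b1", "c1"), (1, "b2", "c2")}
good = bad | {(1, "b1", "c2"), (1, "b2", "c1")}
for rows in (bad, good):
    # A ↠ B and its partner A ↠ C always agree.
    print(mvd_holds(rows, "ABC", "A", "B"), mvd_holds(rows, "ABC", "A", "C"))
# bad: False False; good: True True
```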
Your example involves two MVDs that aren't partners & neither otherwise holds as a consequence of the other, so you know that the final form of a lossless decomposition will involve two joins, one for each MVD pair.
PS 2 Ambiguity of "Suppose we have a relation with these multi-valued dependencies"
When decomposing per FDs (functional dependencies) we are usually given a canonical/minimal cover for the relation, ie a set in a certain form whose transitive closure under Armstrong's axioms (set of FDs that must consequently hold) holds all the FDs in the relation. This is frequently forgotten when we are told that some FDs hold. We must either be given a canonical/minimal cover for the relation or be given an arbitrary set and be told that the FDs that hold in the relation are the ones in its transitive closure. If we're just given a set of FDs that hold, we know that the ones in its transitive closure hold, but there might be others. So in general we can't normalize.
Here you give some MVDs that hold. But they aren't the only ones, because each has a partner. Moreover others might (and here do) consequently hold. (Eg X ↠ Y and Y ↠ Z implies X ↠ Z-Y holds.) But you don't say that they form a canonical or minimal cover. One way to get a canonical form for MVDs (a unique one per each transitive closure, hopefully more concise!) would be a minimal cover (one that can't lose any MVDs and still have the same transitive closure) augmented by the partner of each MVD. (Whereas for FDs the standard canonical form is minimal.) You also don't say "the MVDs that hold are those in the transitive closure of these". You just say that those MVDs hold. So maybe some not in the transitive closure do too. So your example can't be solved. We can guess that you probably mean that this is a minimal cover. (It's not canonical.) Or that the MVDs that hold in the relation are those in the transitive closure of the given ones. (Which in this case are then a minimal cover.)
A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ->> Y, X is a superkey, that is, X is either a candidate key or a superset of one.
In your first decomposition (1, with R1 and R2), B->>D is not preserved, so it is not a dependency-preserving decomposition; it is also not in 4NF, because A is not a superkey of the second table.
On the other hand, the second decomposition (2, with R3 and R4) is dependency preserving and lossless-join, with B and ACE as primary keys of the respective tables, but it's not in 4NF because the A->>B dependency exists in the second table and A is not a superkey. You have to decompose the second table further into two tables, which can be {A B} and {A C E}.
So if I follow your reasoning (suraj3), is R1(A,B), R2(B,C,D,E) a correct decomposition? I think this will preserve the MVD B->>D.

Dependency Preserving Decomposition?

I am working on a textbook question and it asks the following:
Let R(A,B,C,D,E) be decomposed into relations with the following three sets of attributes:
{A,B,C} , {B,C,D}, {A,C,E}
For each of the following sets of functional dependencies, determine if the dependencies are preserved by the decomposition.
AC -> E and BC -> D
How do I solve this?
The textbook doesn't provide a clear enough explanation on dependency preserving.
R = {A,B,C,D,E} decomposed into R1 ={A,B,C} , R2 ={B,C,D} and R3 ={A,C,E}.
"determine if the dependencies are preserved by the decomposition."
Yes, they are: BC->D is preserved in R2 and AC->E is preserved in R3, since all the attributes of each dependency lie within that one component. Note: although a decomposition may be dependency-preserving, it is not necessarily in a higher normal form.
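The reasoning in that answer amounts to a simple containment check, sketched below with hypothetical names. It is only a sufficient condition: an FD whose attributes are not all inside one component may still be preserved through the union of the projections (as in the R(ABC), C→A example discussed earlier on this page), which needs the full closure-based test.

```python
components = ["ABC", "BCD", "ACE"]
fds = [("AC", "E"), ("BC", "D")]


def contained_in(fd, comps):
    """Components that contain every attribute of fd (sufficient for preservation)."""
    attrs = set(fd[0]) | set(fd[1])
    return [c for c in comps if attrs <= set(c)]


for lhs, rhs in fds:
    print(f"{lhs} -> {rhs}:", contained_in((lhs, rhs), components))
# AC -> E: ['ACE']
# BC -> D: ['BCD']
```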
There is an easy method to check whether a decomposition is dependency-preserving. Check this video.
