I'm having a difficulty to understand how to "work" with join dependencies, and I would like to ask a question that will help me clarify things for myself.
Here's the simple definition from Wikipedia:
A table T is subject to a join dependency if T can always be recreated
by joining multiple tables each having a subset of the attributes of
T.
A trivial join dependency is defined as follows:
If one of the tables in the join has all the attributes of the table
T, the join dependency is called trivial.
My question is: If we decompose a relation R into a lossless decomposition, is it possible that every join dependency of R could be a trivial join dependency?
An example would be awesome.
If we decompose a relation R into a lossless decomposition, is it possible that the join dependency\ies of R would be a trivial join dependency\ies?
If you mean, if we decompose a relation R losslessly is it possible that all the JDs of R are trivial: yes.
Whenever all the JDs of R are trivial, you can decompose it losslessly, because by definition a JD is just a description of a lossless decomposition. And there are such relations. Every R, calling its attribute set S, satisfies the JDs *(S,S), *(S,S,S), etc. Some satisfy no other FDs. Some satisfy others but they're also trivial.
Eg: This R only satisfies *(S,S), *(S,S,S), etc:
x y
1 2
5 2
5 4
Eg: Say S = {x,y} and FD {x}->{y} holds, so *({x},S} holds. But say JD *({x},{y}) doesn't hold. Then the only way a JD can have sets unioning to S is if S is one of them. So R has only trivial JDs. But not just the ones using only S.
x y
1 2
5 2
6 4
If you mean, if we decompose a relation R losslessly into smaller components is it possible that all the JDs of R are trivial: no. Because by definition a trivial JD has one set that is all the attributes of R, ie has one component that is R, it doesn't decompose into components smaller than R.
Related
In R(A,B,C, D),
let the dependencies be
A->B
B->C
C->D
D-> B
the decomposition of R into (A,B), (B,C) and (B,D) is lossless or dependency preserving?
My attempt : (A,B) and (B,C) can be combined lossless-ly because of B->C. However, for (A,B,C) and (B,D), B does not form a key for either. Hence the decomposition in lossy.
Also for dependency preserving, the relation (C-D) can't be gotten from any of the decomposed relations and hence the decomposition is not dependency preserving.
However the answer given is that the decomposition is both lossless and dependency preserving. So where am I wrong?
Also the key for relation R is only {A} isn't it?
You say:
However, for (A,B,C) and (B,D), B does not form a key for either. Hence the decomposition in lossy.
This is wrong, since B is a key for (B, D). We can see this by computing B+ from the original dependencies, assuming that they form a cover of the relation.
B+ = B
B+ = BC (for the dependency B->C)
B+ = BCD (for the dependency C->D)
so, since D is included in B+, we have that B->D can be derived from the original set of dependencies, and in the decomposition (B, D) B is a key (as it is D).
To be dependency preserving we must check that the union of all the projected dependencies of the decomposition is a cover of the original set of dependencies. Since three covers of the decomposed relations are, respectively, {A->B}, {C->B, B->C} and {D->B, B->D}, by uniting those three sets you can easily derive also D->C as well as C->D, so the dependencies are preserved.
Finally, yes {A} is the unique candidate key of the original relation.
This the definition of 5NF from Navathe Book of Fundamentals of Database Systems, 6th Edition.
A relation schema R is in fifth normal form (5NF) (or project-join
normal form (PJNF)) with respect to a set F of functional, multivalued, and
join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
The definition of join dependency is:
A join dependency (JD), denoted by JD(R1, R2, ..., Rn), specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a nonadditive join decomposition into R1, R2, ...,Rn. Hence, for every such r we have (πR1(r),πR2(r), ..., πRn(r)) = r
This is where I have a problem understanding:
What does "for every nontrivial join dependency JD(R1,R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R." mean?
This is my attempt at understanding:
If I have a relation R(A, B, C), suppose superkeys are AB and AC and if I decomposed R into R1(A,C) and R2(A, B), then R1 and R2 are superkeys of R(I might be wrong here). And since I can join R1 and R2 to form R, it means that R has a non-trivial join dependency but since each R i is a super key of R, R is in 5NF.
Thanks to #philipxy for suggesting edits. I have tried to make the question clearer. Also, if that definition of 5NF is wrong, where can I get the right definitions?
Hey all I have an assignment that says:
Let R(ABCD) be a relation with functional dependencies
A → B, C → D, AD → C, BC → A
Which of the following is a lossless-join decomposition of R into Boyce-Codd Normal Form (BCNF)?
I have been researching and watching videos on youtube and I cannot seem to find how to start this. I think I'm supposed to break it down to subschemas and then fill out a table to find which one is lossless, but I'm having trouble getting started with that. Any help would be appreciated!
Your question
Which of the following is a lossless-join decomposition of R into
Boyce-Codd Normal Form (BCNF)?
suggests that you have a set of options and you have to choose which one of those is a lossless decomposition but since you have not mentioned the options I would first (PART A) decompose the relation into BCNF ( first to 3NF then BCNF ) and then (PART B) illustrate how to check whether this given decomposition is a lossless-join decomposition or not. If you are just interested in knowing how to check whether a given BCNF decomposition is lossless or not jump directly to PART B of my answer.
PART A
To convert a relation R and a set of functional dependencies(FD's) into 3NF you can use Bernstein's Synthesis. To apply Bernstein's Synthesis -
First we make sure the given set of FD's is a minimal cover
Second we take each FD and make it its own sub-schema.
Third we try to combine those sub-schemas
For example in your case:
R = {A,B,C,D}
FD's = {A->B,C->D,AD->C,BC->A}
First we check whether the FD's is a minimal cover (singleton right-hand side , no extraneous left-hand side attribute, no redundant FD)
Singleton RHS: All the given FD's already have singleton RHS.
No extraneous LHS attribute: None of the FD's have extraneous LHS attribute that needs to e removed.
No redundant FD's: There is no redundant FD.
Hence the given set of FD's is already a minimal cover.
Second we make each FD its own sub-schema. So now we have - (the keys for each relation are in bold)
R1={A,D,C}
R2={B,C,A}
R3={C,D}
R4={A,B}
Third we see if any of the sub-schemas can be combined. We see that R1 and R2 already have all the attributes of R and hence R3 and R4 can be omitted. So now we have -
S1 = {A,D,C}
S2 = {B,C,A}
This is in 3NF. Now to check for BCNF we check if any of these relations (S1,S2) violate the conditions of BCNF (i.e. for every functional dependency X->Y the left hand side (X) has to be a superkey) . In this case none of these violate BCNF and hence it is also decomposed to BCNF.
PART B
When you apply Bernstein Synthesis as above to decompose R the decomposition is always dependency preserving. Now the question is, is the decomposition lossless? To check that we can follow the following method :
Create a table as shown in figure 1, with number of rows equal to the number of decomposed relations and number of column equal to the number of attributes in our original given R.
We put a in all the attributes that our present in the respective decomposed relation as in figure 1. Now we go through all the FD's {C->D,A->B,AD->C,BC->A} one by one and add a whenever possible. For example, first FD is C->D. Since both the rows in column C has a and there is an empty slot in second row of column D we put a a there as shown in the right part of the image. We stop as soon as one of the rows is completely filled with a which indicates that it is a lossless decomposition. If we go through all the FD's and none of the rows of our table get completely filled with a then it is a lossy decomposition.
Also, note if it is a lossy decomposition we can always make it lossless by adding one more relation to our set of decomposed relations consisting of all attributes of the primary key.
I suggest you see this video for more examples of this method. Also other way to check for lossless join decomposition which involves relational algebra.
Sorry for asking a question one might consider a basic one)
Suppose we have a relation R(A,B,C,D,E) with multivalued dependencies:
A->>B
B->>D.
Relation R doesn't have any functional dependencies.
Next, suppose we decompose R into 4NF.
My considerations:
Since we don't have any functional dependencies, the only key is all attributes (A,B,C,D,E). There are two ways we can decompose our relation R:
R1(A,B) R2(A,C,D,E)
R3(B,D) R4(A,B,C,E)
My question is - are these 2 decompositions final? Looks like they are since there are no nontrivial multivalued dependencies left. Or am I missing something?
Relation R doesn't have any functional dependencies.
You mean, non-trivial FDs (functional dependencies). (There must always be trivial FDs.)
Assuming that the MVDs (multivalued dependencies) holding in R are those in the transitive closure of {A ↠ B, B ↠ D}:
In 1 R1(A,B) R2(A,C,D,E), we can reconstruct R as R1 JOIN R2 and both R1 & R2 are in 4NF and their join will satisfy A ↠ B. If some component contained all the attributes of the other MVD then we could further decompose it per that MVD. And we would know that, given some alleged values for all components, their alleged reconstruction of R by joining would satisfy both MVDs. But here there is no such component. So we can't further decompose. And we know that an alleged reconstruction of R by joining satisfies A ↠ B but we would still have to check whether B ↠ D. We say that the MVD B ↠ D is "not preserved" and the decomposition to R1 & R2 "does not preserve MVDs".
In 2 R3(B,D) R4(A,B,C,E), we can reconstruct R as R3 JOIN R4 and both R3 & R4 are in 4NF and the join will satisfy B ↠ D. Now some component contains all the attributes of the other MVD so we can further decompose it per that MVD. And we know that, given some alleged values for all components, their alleged reconstruction of R by joining satisfies both MVDs. That component is R4, which we can further decompose, reconstructing as AB JOIN ACE. And we know that an alleged reconstruction of R by joining satisfies both A ↠ B & B ↠ D. Because the MVDs in the original appear in a component, we say these decompositions "preserve MVDs".
PS 1 The 4NF decomposition must be to three components
MVDs always come in pairs. Suppose MVD X ↠ Y holds in a relation with attributes S, normalized to components XY & X(S-Y). Notice that S-XY is the set of non-X non-Y attributes, and X(S-Y) = X(S-XY). Then there is also an MVD X ↠ S-XY, normalized to components X(S-XY) & X(S-(S-XY)), ie X(S-XY) & XY, ie X(S-Y) & XY. Why? Notice that both MVDs give the same component pair. Ie both MVDs describe the same condition, that S = XY JOIN X(S-XY). So when an MVD holds, that partner holds too. We can write the condition expressed by each of the MVDs using the special explicit & symmetrical notation X ↠ Y | S-XY.
We say a JD (join dependency) of some components of S holds if and only if they join to S. So if S = XY JOIN X(S-Y) = XY JOIN X(S-XY) then the JD *{XY, X(S-XY)} holds. Ie the condition that both MVDs describe is that JD. So a certain MVD and a certain binary JD correspond. That's one way of seeing why normalizing an MVD away involves a 2-way join and why MVDs come in pairs. The JDs that cause a 4NF relation to not be in 5NF are those that do not correspond to MVDs.
Your example involves two MVDs that aren't partners & neither otherwise holds as a consequence of the other, so you know that the final form of a lossless decomposition will involve two joins, one for each MVD pair.
PS 2 Ambiguity of "Suppose we have a relation with these multi-valued dependencies"
When decomposing per FDs (functional dependencies) we are usually given a canonical/minimal cover for the relation, ie a set in a certain form whose transitive closure under Armstrong's axioms (set of FDs that must consequently hold) holds all the FDs in the relation. This is frequently forgotten when we are told that some FDs hold. We must either be given a canonical/minimal cover for the relation or be given an arbitrary set and be told that the FDs that hold in the relation are the ones in its transitive closure. If we're just given a set of FDs that hold, we know that the ones in its transitive closure hold, but there might be others. So in general we can't normalize.
Here you give some MVDs that hold. But they aren't the only ones, because each has a partner. Moreover others might (and here do) consequently hold. (Eg X ↠ Y and Y ↠ Z implies X ↠ Z-Y holds.) But you don't say that they form a canonical or minimal cover. One way to get a canonical form for MVDs (a unique one per each transitive closure, hopefully more concise!) would be a minimal cover (one that can't lose any MVDs and still have the same transitive closure) augmented by the partner of each MVD. (Whereas for FDs the standard canonical form is minimal.) You also don't say "the MVDs that hold are those in the transitive closure of these". You just say that those MVDs hold. So maybe some not in the transitive closure do too. So your example can't be solved. We can guess that you probably mean that this is a minimal cover. (It's not canonical.) Or that the MVDs that hold in the relation are those in the transitive closure of the given ones. (Which in this case are then a minimal cover.)
A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ->> Y, X is a superkey—that is, X is either a candidate key or a superset.
In your first decomposition(1 with R1 and R2) B->>D is not satisfying so it's not dependency preserving decomposition as well as not in 4NF as A is not superkey in 2nd table.
On the other hand,second decomposition(2 with R3 and R4) is dependency preserving and lossless join with B and ACE as primary key in respective tables but it's not in 4NF because A->>B dependency exists in second table and A is not superkey, you have to decompose second table further in to two tables that can be {A B} and {A C E}.
So if I follow your reasoning (suraj3), are R1(A,B) and R2(B,C,D,E) correct decomposition? I think this will preserve the FD B->>D.
I am struggling with the concepts of 4NF and Multivalued Dependencies (MVDs).
I am looking at a supplementary book for the course I am currently taking and one of the examples is below.
The book states the asterisks refer to a unique key or a composite attribute key.
Given: R(A*,B,C*) and the set {(A,B):R,(B,C):R} satisfies the
lossless decomposition property.
Does the multivalued dependency B->>C hold?
Is B definitely a unique key?
Is R in 4NF?
I understand lossless decomposition - if you take the natural join of the two sets above - you are given the original dataset i.e in this case A,B,C.
But I just cannot grasp how to take the given information and prove/confirm that B->>C holds or if it does not.
I emailed my professor about my confusion and he just told me to look over his notes (which I've obviously done numerous times) and it's gotten me nowhere.
Does the multivalued dependency B->>C hold?
You have been told some things about MVDs. One of them is probably:
A decomposition of R into (X, Y) and (X, R − Y) is a lossless-join
decomposition if and only if X ->> Y holds in R.
In your case R is {A,B,C}, X is {B}, Y is {A} and R - Y = {A,B,C} - {A} = {B,C}. So a decomposition of R into (B,A) and (B,C) is a lossless-join decomposition if and only if {B,A} ->> {B,C} holds in R. But we are given that decomposition of R into (B,A) and (B,C) is a lossless-join decomposition. So {B,A} ->> {B,C} holds in R.
Is B definitely a unique key?
I can't make sense of that.
Maybe you are trying to say that we are given that {A,C} is a CK (candidate key) of R but that there might be other CKs and you are trying to ask whether the decompositionality means that {B} must also be a CK of R. Let's look for a counterexample. Pick a simplest example. Suppose R is {(a,b,c1),(a,b,c2)} = {(a,b)} JOIN {(b,c1),(b,c2)}. This agrees with R CK {A,C} & R MVD {B,A} ->> {B,C}. But b appears with c1 and c2 so {B} does not functionally determine all other attributes so {B} is not a CK of R. So that CK and that MVD do not force that {B} is a CK.
Is R in 4NF?
You have been told some things about 4NF. One is probably that:
A Table is in 4NF if and only if, for every one of its non-trivial
multivalued dependencies X ->> Y, X is a superkey
The MVD {B,A} ->> {B,C} is non-trivial. But to show whether R must be in 4NF or must not be in 4NF or that we can't tell, you are going to have to address possible sets of non-trivial MVDs that could hold in R and CKs that R could have.