Is this Lossy Join and Dependency Preserving - database

I am reading this topic Functional dependency and Normalization in Database Management Subject. I came across this example.
Relation R(A,B,C,D) Which one is Lossy join but Dependency Preserving BCNF Decomposition?
a. A ->B, B -> CD
b. A -> B, B -> C, C->D
c. AB -> C, C -> AD
d. A -> BCD
Now answer given is option C.
How can option C. be a lossy decomposition. if you do ABC union CAD = ABCD This satisfies first condition.
if we do ABC intersection CAD = AC which is perfectly fine, since in AC, C is key for (CAD) C -> AD decomposition. which also satisfies the second condition. Am i making any mistake in understanding this concept.

Usually for a Normalisation/decomposition exercise, you are given:
The full relation and its attributes. [yes: R(A, B, C, D)]
The Functional dependencies. [yes? it looks like a., b., c., d. are possible sets of Fun Deps.]
The proposed decomposition. [Often named R1, R2, etc. I don't see those. I can't interpret option d. to be proposing a decomposition.]
Perhaps your post has missed out part of the exercise? Perhaps the exercise wants you to decide which decomp preserves the dependencies in BCNF? (But results in a lossy join.)
[editted in response to Nikhil's comment] Note that the list of FD's alone doesn't amount to a decomposition: the FD C -> AD is short-hand for C -> A, C -> D. Does that mean two decomposing relations? No, because A and C are already in the FD AB -> C. So we have R1= (A, B, C), R2 = (C, D). But I don't know if that is what the exercise is asking. Think about it. What does option d. mean in terms of decompositions?
Perhaps the exercise is asking (for example): given a proposed decomposition into R1 = (A, B) and R2 = (B, C, D), which of the sets of FD's would give a lossy decomposition?
There's a worked example here: http://en.wikipedia.org/wiki/Lossless-Join_Decomposition.
It points to a previous q Lossless Join Property.
And there's further references.
By the way, options a., b., include the same Fun Deps as option d., by the transitivity of dependencies (Armstrong's Axioms http://en.wikipedia.org/wiki/Armstrong%27s_axioms see also http://en.wikipedia.org/wiki/Heath%27s_theorem). This is a clue.

Related

Dependency preservation, based of original functional dependencies or canonical cover?

Given these functional dependencies for
R: {A,B,C,D,E,F}
AC->EF
E->CD
C->ADEF
BDF->ACD
I got this as the canonical cover:
E->C
C->ADEF
BF->C
And then broke it down to Boyce Codd Normal Form:
Relation 1: {C,A,D,E,F}
Relation 2: {B,F,C}
I figured that this is lossless and dependency preserving? But is this true, since from the original functional dependencies BDF->ACD is no longer in any of my relations. But if I go from my calculated canonical cover then all my functional dependencies are preserved.
So that question is: Is this decomposition to BCNF dependency preserving?
A decomposition preserves the dependencies if and only if the union of the projection of the dependencies on the decomposed relations is a cover of the dependencies of the relation.
So, to know if a decomposition preserves or not the dependencies it is not sufficient to check if the dependencies of a particular cover have been preserved or not (for instance by looking if some decomposed relation has all the attributes of the dependency). For instance, in a relation R(ABC) with a cover F = {A→B, B→C, C→A} one could think that in the decomposition R1(AB) and R2(BC) the dependency C→A is not preserved. But if you project F on AB you obtain A→B, B→A, projecting it on BC you obtain B→C, C→B, so from their union you can derive also C→A.
The check is not simple, even if there exists polynomial algorithms that perform this task (for instance, one is described in J. Ullman, Principles of Database Systems, Computer Science Press, 1983).
Assuming the dependencies that you have given form a cover of the dependencies of the relation, the canonical cover that you have found is incorrect. In fact BF -> C cannot be derived from the original dependencies.
For this reason, your decomposition is not correct, since R2(BCF) is not in BCNF (actually, it is not in 2NF).
One possible canonical cover of R is the following:
BDF → C
C → A
C → E
C → F
E → C
E → D
Following the analysis algorithm, there are two possible decompositions in BCNF (according to the dependencies chosen for elimination). One is:
R1 = (ACDEF)
R2 = (BC)
while the other is:
R1 = (ACDEF)
R3 = (BE)
(note that BC and BE are candidate keys of the original relation, together with BDF).
A cover of the dependencies in R1 is:
C → A
C → E
C → F
E → C
E → D
while both in R2 and R3 no non-trivial dependencies hold.
From this, we can conclude that both decompositions do not preserve the dependencies; for instance the following dependency (and all those derived from it) cannot be obtained:
BDF → C

How to identify the corrects step to complete 3NF?

This is an example from a textbook:
Consider the relation R (A ,B ,C ,D ,E ) with FD’s AB -> C,
C -> B, and A -> D.
We get that the key is ABE and ACE. With decompositions: ABE+=ACE+=ABCDE.
How do you check minimality? I know that AB+=ABD and the textbook says that because AB+ does not include C. Then it is minimal. C+=AB and A+=AD are also minimal. But I do not know why. How do you check minimality?
Also, do we have to find all the FD's besides the ones given to check whether to perform 3-NF or not?
We then check if AB -> C can be split into A -> C and B -> C, we notice that these do not stand on their own so AB -> C is not splittable.
We are left with the final relations: S1(ABC), S2(BC), S3(AD) and the key (since not present) S4(ABE) (or S4(ABC)). We then remove S2 because it's a subset of S1.
If it is in 3NF and there are no violations, then why do they split the original relation into: S1(A, B, C), S2(A, D), and S4(A, B, E).
Book name and page: Ullman's Database Systems page 103
How do you check minimality?
The authors don't use the word minimality here. To check for the minimal basis, follow the procedure in the first two paragraphs of example 3.27. It boils down to
". . . verify that we cannot eliminate any of the given dependencies."
". . . verify that we cannot eliminate any attributes from a left side."
Also, do we have to find all the FD's besides the ones given to check whether to perform 3-NF or not?
That question doesn't really make sense. 3NF isn't something you perform. The example in the textbook has to do with the synthesis algorithm for 3NF schemas. The synthesis algorithm decomposes a relation R into relations that are all in at least 3NF.
The synthesis algorithm operates on the FDs you've been given. In an academic setting, as you might find in a textbook, the assumption is that you've been given enough information to solve the problem. In real-world applications, you might be given a set of FDs from a business analyst. Don't assume the analyst has given you enough information; look for more FDs.
We then check if AB -> C can be split into A -> C and B -> C, we notice that these do not stand on their own so AB -> C is not splittable.
No. You verify (not notice) that you can't eliminate any attributes from a left side. Eliminating A leaves B->C; eliminating B leaves A->C. Neither of these are implied by the three original FDs. So you can't eliminate any attributes from a left side.
If [the original relation] is in 3NF and there are no violations . . .
The original relation is not in 3NF. It's not even in 2NF. (A->D)

Decomposing into 2NF

I'm currently studying normalization. I know how to normalize the data for given unnormalized list.
But this one is little bit confusing me
Q. Decompose R{a,b,c,d,e,f} into 2NF using following functional dependencies.
a -> b,c,d,e,f
b,c -> a,d,e,f
b -> f
d -> e
For this my answer is:
R0 = a - > b,c
R1 = b,c - > a,d,e
R2 = b - > f
Can anyone help me with this?
There are two main issues:
There is a lot of redundancy in your FDs set. You'll often save yourself some time if you compute the minimal cover first.
The way you split the relation doesn't make sense, regardless of the normalization level. The candidate keys for this relation are A and BC; but in your answer you have all the keys together in R0 and nothing else, which is redundant (one key per table is enough) and useless (think about it, there is nothing you can query for in such a table!); and again you put all the keys together in R1, which is redundant as well.
A better way to decompose that relation would be
R1(B, F), R2(D, E), R3(A, B, C, D)
which satisfies both 2NF and 3NF.
BTW you should check out this Stanford course, it is really useful to understand normalization. Wikipedia pages are also well written.
EDIT to answer your question on the comments: a functional dependency means that the values on the RHS are determined by the values on the LHS. In this case we have
A -> BC
BC -> A
if you replace the letters with something more intuitive this is equivalent to:
post_id -> { post_title, post_date }
{ post_title, post_date } -> post_id
That is, if you know the post_id you can figure out both the post_title and the post_date; at the same time, if you know both the post_title and the post_date you can track back the post_id. This is the meaning of a circular dependency.
That said, in every relation all the FDs should be preserved so in R3 both BC -> D and A -> D hold, but you don't need ABC -> D which is not in your FDs set and it is clearly redundant. As a side, A -> D is redundant as well since you already have A -> BC, BC -> D. This is why I mentioned to compute the minimal cover first.

Database Relational Homework help

The Problem "Consider a relation R with five attributes ABCDE. You are given the following dependancies
A->B
BC->E
ED->A
List all the keys for R.
The teacher gave us the keys, Which are ACD,BCD,CDE
And we need to show the work to get to them.
The First two I solved.
For BCD, the transitive of 2 with 3 to get (BC->E)D->A => BCD->A.
and for ACD id the the transitive of 1 with 4 (BCD), to get (A->B)CD->A => ACD->A
But I can't figure out how to get CDE.
So it seems I did it wrong, after googling I found this answer
methodology to find keys:
consider attribute sets α containing: a. the determinant attributes of F (i.e. A, BC,
ED) and b. the attributes NOT contained in the determined ones (i.e. C,D). Then
do the attribute closure algorithm:
if α+ superset R then α -> R
Three keys: CDE, ACD, BCD
Source
From what I can tell, since C,D are not on the left side of the dependencies. The keys are left sides with CD pre-appended to them. Can anyone explain this to me in better detail as to why?
To get they keys, you start with one of the dependencies and using inference to extend the set.
Let me have a go with simple English, you can find formal definition the net easily.
e.g. start with 3).
ED -> A
(knowing E and D, I know A)
ED ->AB
(knowing E and D, I know A, by knowing A, I know B as well)
ED->AB
Still, C cannot be known, and I have used all the rules now except BC->E,
So I add C to the left hand side, i.e.
CDE ->AB
so, by knowing C,D and E, you will know A and B as well,
Hence CDE is a key for your relation ABCDE. You repeat the same process, starting with other rules until exhausted.

Question about relation normalization

Let's consider, for instance, the following relation:
R (A,B,C,D,E,F)
where the bold denotes that it is a primary key attribute
with
F = {AB->DE, D->E}
Now, this looks to be in the first normal form. It can't be on the third normal form as I have a transitive dependency and it cannot be in the second form as not all non-key attributes depend on the whole primary key.
So my questions are:
I don't know what to make of F and C. I don't have any functional dependency info on them! F doesn't depend on anything? If that is the case, I can't think of any solution to get R into the 2nd normal form without taking it out!
What about C? C also suffers from the problem of not being referred on the functional dependencies list. What to do about it?
My attempt to get R into the 2nd normal form would be something like:
R(A,B,D)
R' (D,E)
but as stated earlier, I don't have a clue of what to do of C and F. Are they redundant so I simply take them out and the above attempt is all I have to do to get it into the 2nd form (and 3rd!)?
Thanks
Given the definition of R that { A, B, C } is the primary key, then there is inherently a functional dependency:
ABC → ABCDEF
That says that the values of A, B and C inherently determine or control the values of D, E and F as well as the trivial fact that they determine their own values.
You have a few additional dependencies, identified by the set F (which is distinct from the attribute F - the notation is not very felicitous, and could be causing confusion*):
AB → DE
D → E
As you rightly diagnose, the system is in 1NF (because 1NF really means "it is a table"). It is not in 2NF or 3NF or BCNF etc because of the transitive dependency and because some of the attributes only depend on part of the key.
You are right that you will end up with the following two relations as part of your decomposition:
R1(D, E)
R2(A, B, D)
You also need the third relation:
R3(A, B, C, F)
From these, you can recreate the original relation R using joins. The set of relations { R1, R2, R3 } is a non-loss decomposition of the original relation R.
* If the F identifying the set of subsidiary functional dependencies is intended to be the same as the attribute F, then there is something very weird about the definition of that attribute. I'd need to see sample data for the relation R to have a chance of knowing how to interpret it.
I think the primary key of R is set wrong. If F isn't functionally related to anything it has to be a part of the key
So you have R( ABCF DE) which is now in the first normal form (with F = {AB->DE, D->E}) Now you can change it to the second normal form. DE isn't dependant on the whole key (partial dependency) so you put it in another relation to get to second normal form:
R( ABCF ) F = {}
R1( #AB DE) F = {AB->DE}
Now this relation doesn't have any transitive dependencies so it is already in third normal form.
F doesn't depend on anything?
No, you just haven't been given any explicit information about it in the form
{something -> F}
And essentially the same can be said for C. You're expected to infer the other dependencies by applying Armstrong's axioms. (Probably.)
Think about how to finish this:
Given R (A,B,C,D,E,F)
{ABC -> ?}
[Later . . . I see that Jonathan Leffler has broken the suspense, so I'll just finish this.]
{ABC -> DEF} (By definition) therefore,
{ABC -> F} (By decomposition. Here's where F and C come in. And this is your third relation. ).

Resources