I'm currently studying normalization. I know how to normalize data from a given unnormalized list,
but this one confuses me a little.
Q. Decompose R{a,b,c,d,e,f} into 2NF using following functional dependencies.
a -> b,c,d,e,f
b,c -> a,d,e,f
b -> f
d -> e
For this my answer is:
R0 = a -> b,c
R1 = b,c -> a,d,e
R2 = b -> f
Can anyone help me with this?
There are two main issues:
There is a lot of redundancy in your set of FDs. You'll often save yourself some time if you compute the minimal cover first.
The way you split the relation doesn't make sense, regardless of the normalization level. The candidate keys for this relation are A and BC; but in your answer you have all the keys together in R0 and nothing else, which is redundant (one key per table is enough) and useless (think about it, there is nothing you can query for in such a table!); and again you put all the keys together in R1, which is redundant as well.
A better way to decompose that relation would be
R1(B, F), R2(D, E), R3(A, B, C, D)
which satisfies both 2NF and 3NF.
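The claim that the candidate keys are A and BC can be checked mechanically with attribute closures. The following is a minimal sketch, not a definitive implementation; the function names `closure` and `candidate_keys` are illustrative, and the key search is a brute-force one that is only practical for small relations like this one.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: try attribute sets by increasing size, keep minimal ones."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

# FDs from the question, written as (left side, right side) strings.
FDS = [("A", "BCDEF"), ("BC", "ADEF"), ("B", "F"), ("D", "E")]

print(candidate_keys("ABCDEF", FDS))  # [('A',), ('B', 'C')]
print(sorted(closure("B", FDS)))      # ['B', 'F']
```

The second line of output shows why the original relation is not in 2NF: B alone, a proper part of the key BC, already determines the non-key attribute F.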
BTW you should check out this Stanford course, it is really useful to understand normalization. Wikipedia pages are also well written.
EDIT to answer your question on the comments: a functional dependency means that the values on the RHS are determined by the values on the LHS. In this case we have
A -> BC
BC -> A
if you replace the letters with something more intuitive this is equivalent to:
post_id -> { post_title, post_date }
{ post_title, post_date } -> post_id
That is, if you know the post_id you can figure out both the post_title and the post_date; at the same time, if you know both the post_title and the post_date you can track back the post_id. This is the meaning of a circular dependency.
That said, in every relation all the FDs should be preserved, so in R3 both BC -> D and A -> D hold, but you don't need ABC -> D, which is not in your set of FDs and is clearly redundant. As an aside, A -> D is redundant as well, since you already have A -> BC and BC -> D. This is why I suggested computing the minimal cover first.
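For reference, here is a rough sketch of how a minimal cover could be computed for the FDs in the question. It follows the usual three steps (split right sides, shrink left sides, drop redundant FDs); the function names are illustrative, and because a minimal cover is not unique, the exact result depends on the order in which FDs are examined.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split every right side into single attributes.
    cover = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset(attr))
            if fd not in cover:
                cover.append(fd)
    # Step 2: drop left-side attributes that are not needed.
    reduced = []
    for lhs, rhs in cover:
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
        if (lhs, rhs) not in reduced:
            reduced.append((lhs, rhs))
    # Step 3: drop FDs that follow from the remaining ones.
    result = list(reduced)
    for fd in reduced:
        rest = [other for other in result if other != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result

FDS = [("A", "BCDEF"), ("BC", "ADEF"), ("B", "F"), ("D", "E")]

for lhs, rhs in minimal_cover(FDS):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

With this ordering, A -> D does not survive step 3, which matches the remark above that it is implied by A -> BC and BC -> D.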
This is an example from a textbook:
Consider the relation R(A, B, C, D, E) with FDs AB -> C,
C -> B, and A -> D.
We get that the keys are ABE and ACE, with closures ABE+ = ACE+ = ABCDE.
How do you check minimality? I know that AB+=ABD and the textbook says that because AB+ does not include C. Then it is minimal. C+=AB and A+=AD are also minimal. But I do not know why. How do you check minimality?
Also, do we have to find all the FD's besides the ones given to check whether to perform 3-NF or not?
We then check if AB -> C can be split into A -> C and B -> C, we notice that these do not stand on their own so AB -> C is not splittable.
We are left with the final relations: S1(ABC), S2(BC), S3(AD) and the key (since not present) S4(ABE) (or S4(ABC)). We then remove S2 because it's a subset of S1.
If it is in 3NF and there are no violations, then why do they split the original relation into: S1(A, B, C), S2(A, D), and S4(A, B, E).
Book name and page: Ullman's Database Systems page 103
How do you check minimality?
The authors don't use the word minimality here. To check for the minimal basis, follow the procedure in the first two paragraphs of example 3.27. It boils down to
". . . verify that we cannot eliminate any of the given dependencies."
". . . verify that we cannot eliminate any attributes from a left side."
Also, do we have to find all the FD's besides the ones given to check whether to perform 3-NF or not?
That question doesn't really make sense. 3NF isn't something you perform. The example in the textbook has to do with the synthesis algorithm for 3NF schemas. The synthesis algorithm decomposes a relation R into relations that are all in at least 3NF.
The synthesis algorithm operates on the FDs you've been given. In an academic setting, as you might find in a textbook, the assumption is that you've been given enough information to solve the problem. In real-world applications, you might be given a set of FDs from a business analyst. Don't assume the analyst has given you enough information; look for more FDs.
We then check if AB -> C can be split into A -> C and B -> C, we notice that these do not stand on their own so AB -> C is not splittable.
No. You verify (not notice) that you can't eliminate any attributes from a left side. Eliminating A leaves B->C; eliminating B leaves A->C. Neither of these is implied by the three original FDs. So you can't eliminate any attributes from a left side.
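Both verification steps can be sketched with attribute closures. This is only an illustration of the two checks quoted above, with made-up function names, not the textbook's own procedure: an FD can be eliminated if its right side is in the closure of its left side under the other FDs, and a left-side attribute can be eliminated if the shortened left side still determines the right side.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def redundant_fds(fds):
    """FDs that are already implied by the other FDs."""
    found = []
    for fd in fds:
        others = [other for other in fds if other != fd]
        if set(fd[1]) <= closure(fd[0], others):
            found.append(fd)
    return found

def removable_lhs_attributes(fds):
    """(lhs, rhs, attr) triples where attr can be dropped from lhs."""
    found = []
    for lhs, rhs in fds:
        for attr in lhs:
            smaller = set(lhs) - {attr}
            if smaller and set(rhs) <= closure(smaller, fds):
                found.append((lhs, rhs, attr))
    return found

FDS = [("AB", "C"), ("C", "B"), ("A", "D")]

print(redundant_fds(FDS))             # []
print(removable_lhs_attributes(FDS))  # []
print(sorted(closure("A", FDS)))      # ['A', 'D']  so A -> C is not implied
print(sorted(closure("B", FDS)))      # ['B']       so B -> C is not implied
```

Both lists come back empty, so the three given FDs already form a minimal basis.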
If [the original relation] is in 3NF and there are no violations . . .
The original relation is not in 3NF. It's not even in 2NF. (A->D)
Consider R(A,B,C,D,E)
F = {BC->AE, A->D, D->C, ABD->E}.
I need to find all candidate key of the schema.
I know that BA, BC, BD are the keys, but I want to know how to discover them.
I saw some answers in "candidate keys from functional dependencies", but I didn't fully understand them.
From what they suggest, I got L={B}, M={A,C,D}, R={E}.
Now I need to add attributes from M one at a time to L.
I start with A and get BA. So BA->A, BA->B (trivial); because A->D, BA->D; and because D->C, BA->C.
But how do we get E?
adapting the answer from https://stackoverflow.com/a/14595217/3591273
Since we have the functional dependencies: BC->AE, A->D, D->C, ABD->E, we have the following superkeys:
ABCDE (All attributes is always a super key)
ABCD (We can get attribute E through ABD -> E)
ABC (Just add D through A -> D)
ABD (Just add C through D -> C)
AB (We can get D through A -> D, and then we can get C through D -> C)
BC (We can get A and E through BC -> AE, and then we can get D through A -> D)
BD (We can get C through D -> C, and then we can get AE through BC -> AE)
(One trick to notice here is that since B never appears on the right side of a functional dependency, every key must include B, i.e. attribute B is independent and cannot be derived from other attributes)
Now that we have all our superkeys, we can see that only the last
three are candidate keys: the first four can all be trimmed
down, but we cannot take any attribute away from the last three
superkeys and still have them remain superkeys.
so the minimal keys are AB, BC, BD
update
this was a reduction approach, i.e. successively reducing the trivial superkey by use of the functional dependencies, but one can take the opposite road and use an augmenting approach, i.e. start with the attributes that must be in every key and add other attributes according to the dependencies until further additions become superfluous
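The augmenting approach can be sketched in a few lines. This is an illustrative sketch with assumed function names: it starts from the attributes that never appear on a right side (here only B) and adds the remaining attributes in groups of increasing size, keeping each set whose closure is the whole relation and that does not contain a key found earlier.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    attributes = set(attributes)
    on_right_side = set().union(*(set(rhs) for _, rhs in fds))
    core = attributes - on_right_side      # must be part of every key
    optional = sorted(attributes - core)   # may or may not be needed
    keys = []
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

FDS = [("BC", "AE"), ("A", "D"), ("D", "C"), ("ABD", "E")]

print(["".join(sorted(key)) for key in candidate_keys("ABCDE", FDS)])
# ['AB', 'BC', 'BD']
```

This also shows where E comes from when starting with BA: once BA -> C has been derived, B and C are both in the closure, so BC -> AE fires and adds E.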
I am studying for my databases exam and have realized that my professor glossed over a section of the normalization lecture notes instead of teaching it, so I've been self-studying. There is an example without solutions in the notes, and I was wondering whether I have been doing it right:
Given Relation R = {A,B,C,D,E,F,G,H,I,J}
And functional dependencies:
A,B -> C
A -> D,E
B -> F
F -> G,H
D -> I,J
Determine the primary key
Decompose R so it is in 2NF then show it in 3NF.
So, I got the primary key to be (A, B, D, F)
And then I tried to convert it to 2NF and I got relations:
(ABC), (DIJ), (ADE), (BF), (FGH)
And I honestly have no idea if this is right or how to then put it in 3NF... or if I've just skipped 2NF and already put it in 3NF. Any help?
It appears to me that you have skipped 2NF and normalised the relation straight into 3NF :)
The primary key for the original relation should be (A,B) as by inference rules (transitivity, such as A->D,E and D->I,J therefore A->I,J) it determines all other attributes. From this point onwards we have that:
FD1: A,B -> C
FD2: A -> D,E (Partial)
FD3: B -> F (Partial)
FD4: F -> G,H
FD5: D -> I,J
2NF (No partial dependencies allowed)
Now we can decompose the relation into three relations, moving the partial FDs to separate relations but preserving the other FDs that depend on those partial FDs, such as FD5 (which depends on FD2) and FD4 (which depends on FD3). This gives us the following results:
R1(A,D,E,I,J) -- FD2, FD5 (transitive)
R2(B,F,G,H) -- FD3 FD4 (transitive)
R3(A,B,C) -- FD1
Next, to achieve 3NF, transitive dependencies have to be moved into separate relations in the same manner as for 2NF. This, in turn, results in the set of relations which you have already derived.
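The key and the partial dependencies can be double-checked with attribute closures. The sketch below uses illustrative function names and assumes (A,B) is the only candidate key, so every attribute outside it counts as non-key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper part of the key to the non-key attributes it determines."""
    key = set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) - key
            if dependents:
                found["".join(part)] = "".join(sorted(dependents))
    return found

FDS = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print("".join(sorted(closure("AB", FDS))))  # ABCDEFGHIJ
print(partial_dependencies("AB", FDS))      # {'A': 'DEIJ', 'B': 'FGH'}
```

The two entries correspond to R1(A,D,E,I,J) and R2(B,F,G,H) above; what is left, (A,B,C), is R3.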
Good luck with your exams!
I am reading this topic Functional dependency and Normalization in Database Management Subject. I came across this example.
Relation R(A,B,C,D) Which one is Lossy join but Dependency Preserving BCNF Decomposition?
a. A ->B, B -> CD
b. A -> B, B -> C, C->D
c. AB -> C, C -> AD
d. A -> BCD
Now answer given is option C.
How can option c. be a lossy decomposition? If you take ABC union CAD = ABCD, this satisfies the first condition.
If we take ABC intersection CAD = AC, that is perfectly fine, since C is a key for (CAD) by C -> AD, which also satisfies the second condition. Am I making a mistake in understanding this concept?
Usually for a Normalisation/decomposition exercise, you are given:
The full relation and its attributes. [yes: R(A, B, C, D)]
The Functional dependencies. [yes? it looks like a., b., c., d. are possible sets of Fun Deps.]
The proposed decomposition. [Often named R1, R2, etc. I don't see those. I can't interpret option d. to be proposing a decomposition.]
Perhaps your post has missed out part of the exercise? Perhaps the exercise wants you to decide which decomp preserves the dependencies in BCNF? (But results in a lossy join.)
[edited in response to Nikhil's comment] Note that the list of FDs alone doesn't amount to a decomposition: the FD C -> AD is shorthand for C -> A, C -> D. Does that mean two decomposing relations? No, because A and C are already in the FD AB -> C. So we have R1 = (A, B, C), R2 = (C, D). But I don't know if that is what the exercise is asking. Think about it. What does option d. mean in terms of decompositions?
Perhaps the exercise is asking (for example): given a proposed decomposition into R1 = (A, B) and R2 = (B, C, D), which of the sets of FD's would give a lossy decomposition?
There's a worked example here: http://en.wikipedia.org/wiki/Lossless-Join_Decomposition.
It points to a previous question, Lossless Join Property.
And there's further references.
By the way, options a., b., include the same Fun Deps as option d., by the transitivity of dependencies (Armstrong's Axioms http://en.wikipedia.org/wiki/Armstrong%27s_axioms see also http://en.wikipedia.org/wiki/Heath%27s_theorem). This is a clue.
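For a decomposition into exactly two relations, the lossless-join test can be sketched as follows: the join is lossless if the attributes the two relations share functionally determine all of R1 or all of R2. The decompositions below are the ones discussed in this answer, not ones stated in the exercise, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Two-relation test: the shared attributes must determine R1 or R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure

OPTION_A = [("A", "B"), ("B", "CD")]
OPTION_C = [("AB", "C"), ("C", "AD")]

# R1 = (A, B, C), R2 = (C, D): the shared attribute C determines all of R2.
print(is_lossless_binary("ABC", "CD", OPTION_C))  # True

# R1 = (A, B), R2 = (B, C, D): under option c., B alone determines nothing.
print(is_lossless_binary("AB", "BCD", OPTION_C))  # False

# The same split under option a.: B -> CD makes B a key of R2.
print(is_lossless_binary("AB", "BCD", OPTION_A))  # True
```

So whether a given FD set leads to a lossy join depends entirely on which decomposition is proposed, which is why the exercise cannot be settled from the FD sets alone.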
Let's consider, for instance, the following relation:
R (A,B,C,D,E,F)
where A, B and C are the primary key attributes
with
F = {AB->DE, D->E}
Now, this looks to be in the first normal form. It can't be in the third normal form, as I have a transitive dependency, and it cannot be in the second normal form, as not all non-key attributes depend on the whole primary key.
So my questions are:
I don't know what to make of F and C. I don't have any functional dependency info on them! F doesn't depend on anything? If that is the case, I can't think of any solution to get R into the 2nd normal form without taking it out!
What about C? C also suffers from the problem of not being referred on the functional dependencies list. What to do about it?
My attempt to get R into the 2nd normal form would be something like:
R(A,B,D)
R' (D,E)
but as stated earlier, I don't have a clue of what to do of C and F. Are they redundant so I simply take them out and the above attempt is all I have to do to get it into the 2nd form (and 3rd!)?
Thanks
Given the definition of R that { A, B, C } is the primary key, then there is inherently a functional dependency:
ABC → ABCDEF
That says that the values of A, B and C inherently determine or control the values of D, E and F as well as the trivial fact that they determine their own values.
You have a few additional dependencies, identified by the set F (which is distinct from the attribute F - the notation is not very felicitous, and could be causing confusion*):
AB → DE
D → E
As you rightly diagnose, the system is in 1NF (because 1NF really means "it is a table"). It is not in 2NF or 3NF or BCNF etc because of the transitive dependency and because some of the attributes only depend on part of the key.
You are right that you will end up with the following two relations as part of your decomposition:
R1(D, E)
R2(A, B, D)
You also need the third relation:
R3(A, B, C, F)
From these, you can recreate the original relation R using joins. The set of relations { R1, R2, R3 } is a non-loss decomposition of the original relation R.
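The non-loss property can be illustrated on a small made-up instance of R. The rows below are invented sample data chosen to satisfy AB → DE, D → E and the key {A, B, C}; the helper names `project` and `natural_join` are likewise only for illustration. Projecting onto R1, R2 and R3 and joining them back gives exactly the original rows, while a careless split that shares only D does not.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Join two sets of (attribute, value) tuples on their shared attributes."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

# Hypothetical sample data for R(A, B, C, D, E, F).
ROWS = [
    {"A": 1, "B": 1, "C": 1, "D": 10, "E": 100, "F": "x"},
    {"A": 1, "B": 1, "C": 2, "D": 10, "E": 100, "F": "y"},
    {"A": 2, "B": 1, "C": 1, "D": 20, "E": 100, "F": "x"},
    {"A": 2, "B": 2, "C": 1, "D": 10, "E": 100, "F": "z"},
]
ORIGINAL = {tuple(sorted(row.items())) for row in ROWS}

r1 = project(ROWS, "DE")
r2 = project(ROWS, "ABD")
r3 = project(ROWS, "ABCF")
rebuilt = natural_join(natural_join(r1, r2), r3)
print(rebuilt == ORIGINAL)  # True

# Contrast: splitting into (A, B, D) and (C, D, E, F) shares only D,
# which is not a key of either part, so the join invents extra rows.
lossy = natural_join(project(ROWS, "ABD"), project(ROWS, "CDEF"))
print(len(ORIGINAL), len(lossy))  # 4 7
```

A single instance cannot prove the decomposition is non-loss in general, but it shows what the join is expected to reproduce.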
* If the F identifying the set of subsidiary functional dependencies is intended to be the same as the attribute F, then there is something very weird about the definition of that attribute. I'd need to see sample data for the relation R to have a chance of knowing how to interpret it.
I think the primary key of R is set wrong. If F isn't functionally related to anything it has to be a part of the key
So you have R( ABCF DE), which is now in the first normal form (with F = {AB->DE, D->E}). Now you can change it to the second normal form. DE isn't dependent on the whole key (partial dependency), so you put it in another relation to get to second normal form:
R( ABCF ) F = {}
R1( #AB DE) F = {AB->DE}
Now this relation doesn't have any transitive dependencies so it is already in third normal form.
F doesn't depend on anything?
No, you just haven't been given any explicit information about it in the form
{something -> F}
And essentially the same can be said for C. You're expected to infer the other dependencies by applying Armstrong's axioms. (Probably.)
Think about how to finish this:
Given R (A,B,C,D,E,F)
{ABC -> ?}
[Later . . . I see that Jonathan Leffler has broken the suspense, so I'll just finish this.]
{ABC -> DEF} (By definition) therefore,
{ABC -> F} (By decomposition. Here's where F and C come in. And this is your third relation. ).