I am trying to understand the notion of data redundancy. Can someone please explain the difference between the notions "a relation schema is redundant" and "a relation schema is value redundant"? Below is the formal definition, which I don't quite get.
So far my understanding is that if some data in a relation can be derived using functional dependencies over that relation, that data is redundant. However, I don't know why they distinguish "redundant" and "value redundant". Many thanks in advance!
A schema is redundant for a sigma if some relation with that heading and satisfying the FDs in sigma has two equal subrows on the attributes of some FD in the closure of sigma. Eg: If X->Y and Y->Z are in sigma but X->Z is not then X->Z is nevertheless in the closure of sigma, so X->Z also has to hold. So if some relation satisfying sigma's FDs has two rows with the same (X,Y), (Y,Z) or (X,Z) value then the schema is redundant. Ie a schema is redundant when some satisfying relation actually exhibits certain (informally) "redundant" subrows per the closure of a sigma.
A schema is value-redundant for a sigma if some relation with that heading and satisfying the FDs in sigma has an element that when given a different value always gives a relation that doesn't satisfy the FDs in sigma. Ie it has an element value that given the rest of the element values must be that value. Eg in any of the above 3 cases of there being equal subrows (ie XY, YZ or XZ), the element in the determined subrow (ie respectively Y, Z or Z) has to have that value given the rest of the element values. Ie a schema is value-redundant when some satisfying relation actually exhibits a certain (informally) "redundant" subrow per a sigma.
Notice that redundancy is in terms of the closure of sigma but value-redundancy is in terms of just sigma.
The text will go on to show that a schema is redundant for a sigma if and only if it is value-redundant. So to determine redundancy, instead of having to (expensively) calculate the closure of sigma we can just use sigma per value-redundancy (in a less expensive way).
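Whether one particular FD such as X->Z is in the closure of sigma can be tested without enumerating the whole closure, by computing the closure of the attribute set X under sigma. A minimal sketch in Python, using the X->Y, Y->Z example above; the names `attribute_closure` and `implies` are illustrative only and do not come from the text:

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left-hand side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True when lhs -> rhs is in the closure of `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)


sigma = [("X", "Y"), ("Y", "Z")]   # X->Y and Y->Z, as in the example
print(implies(sigma, "X", "Z"))    # X->Z is not listed in sigma but follows from it
print(implies(sigma, "Z", "X"))    # Z->X does not follow
```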
Consider a database relation of student records as follows:
Student (I,G,P,M,S,Y,E,L,R,C)
(a) Show how to derive two candidate keys for Student, or justify why you cannot do so.
(b) What normal form is Student in? Show working that justifies your answer.
(c) If F contained MSY→LRCE instead of PMSY→LRCE, what would this imply about paper names (i.e., the values of M)?
(d) Find a minimal cover (i.e., an irreducible set of functional dependencies) for Student.
(e) Find a decomposition of Student into third normal form (3NF).
I am stuck on the first question, about the candidate keys. I know that the candidate keys must be subsets of (I,P,M,S,Y,L,R), since these attributes appear on the left-hand side of the functional dependencies above and determine all of the remaining attributes. We can remove M, which is determined by P, but then I got confused about how to make the remaining attributes minimal, especially with complex functional dependencies such as PMSY→LRCE. Thanks for any solutions and suggestions.
I won't do your homework, but here is a hint on (a):
F:IGPMSYELRC->IGPMSYELRC
always holds. By virtue of F:P->M you can remove M and get
F:IGPSYELRC->IGPMSYELRC
now apply F:R->C to get
F:IGPSYELR->IGPMSYELRC .
Repeat this until you cannot remove any more attributes from the left-hand side.
Then you have a candidate key.
Applying the FDs in a different order may yield other candidate keys.
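The shrinking procedure in this hint can be sketched in Python as follows. The full FD set F of Student is not reproduced here, so the sketch runs on a small toy schema R(A,B,C,D) with its own FDs; `attribute_closure` and `shrink_to_key` are hypothetical helper names. An attribute is dropped from the left-hand side whenever the remaining attributes still determine everything.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def shrink_to_key(all_attrs, fds, order=None):
    """Start from all attributes and drop each one that is not needed."""
    key = list(order or all_attrs)
    for attr in list(key):
        trial = [a for a in key if a != attr]
        # Keep the removal only if the rest still determines every attribute.
        if attribute_closure(trial, fds) >= set(all_attrs):
            key = trial
    return set(key)


# Toy FDs (an assumption for illustration, not the Student FDs).
toy_fds = [("A", "B"), ("C", "D"), ("AD", "C"), ("BC", "A")]
print(shrink_to_key("ABCD", toy_fds))                # tries A, B, C, D in turn
print(shrink_to_key("ABCD", toy_fds, order="DCBA"))  # a different order, a different key
```

Trying the attributes in a different order is what makes the procedure land on different candidate keys.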
Hey all I have an assignment that says:
Let R(ABCD) be a relation with functional dependencies
A → B, C → D, AD → C, BC → A
Which of the following is a lossless-join decomposition of R into Boyce-Codd Normal Form (BCNF)?
I have been researching and watching videos on YouTube, and I cannot work out how to start this. I think I'm supposed to break R down into sub-schemas and then fill out a table to find which decomposition is lossless, but I'm having trouble getting started with that. Any help would be appreciated!
Your question
"Which of the following is a lossless-join decomposition of R into Boyce-Codd Normal Form (BCNF)?"
suggests that you have a set of options and have to choose which one is a lossless decomposition. Since you have not mentioned the options, I will first (PART A) decompose the relation into BCNF (first to 3NF, then to BCNF) and then (PART B) illustrate how to check whether a given decomposition is a lossless-join decomposition. If you are only interested in how to check whether a given BCNF decomposition is lossless, jump directly to PART B of my answer.
PART A
To convert a relation R and a set of functional dependencies(FD's) into 3NF you can use Bernstein's Synthesis. To apply Bernstein's Synthesis -
First we make sure the given set of FD's is a minimal cover
Second we take each FD and make it its own sub-schema.
Third we try to combine those sub-schemas
For example in your case:
R = {A,B,C,D}
FD's = {A->B,C->D,AD->C,BC->A}
First we check whether the set of FD's is a minimal cover (singleton right-hand side, no extraneous left-hand side attribute, no redundant FD).
Singleton RHS: All the given FD's already have a singleton RHS.
No extraneous LHS attribute: None of the FD's has an extraneous LHS attribute that needs to be removed.
No redundant FD's: There is no redundant FD.
Hence the given set of FD's is already a minimal cover.
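The three minimal-cover conditions can be checked mechanically with attribute closures. A minimal sketch in Python, assuming FDs are written as (LHS, RHS) pairs of attribute strings; `attribute_closure` and `is_minimal_cover` are hypothetical helper names:

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_minimal_cover(fds):
    for i, (lhs, rhs) in enumerate(fds):
        # 1. Singleton right-hand side.
        if len(rhs) != 1:
            return False
        # 2. No extraneous LHS attribute: dropping one must lose the FD.
        for attr in lhs:
            reduced = [a for a in lhs if a != attr]
            if set(rhs) <= attribute_closure(reduced, fds):
                return False
        # 3. No redundant FD: the other FDs must not already imply it.
        others = fds[:i] + fds[i + 1:]
        if set(rhs) <= attribute_closure(lhs, others):
            return False
    return True


fds = [("A", "B"), ("C", "D"), ("AD", "C"), ("BC", "A")]
print(is_minimal_cover(fds))
```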
Second we make each FD its own sub-schema. So now we have (the left-hand side of each FD is a key of its relation):
R1={A,D,C}
R2={B,C,A}
R3={C,D}
R4={A,B}
Third we see if any of the sub-schemas can be combined. R3={C,D} is contained in R1 and R4={A,B} is contained in R2, hence R3 and R4 can be omitted. So now we have -
S1 = {A,D,C}
S2 = {B,C,A}
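This is in 3NF. Now to check for BCNF we check whether any of these relations (S1,S2) violates the condition of BCNF (i.e. for every non-trivial functional dependency X->Y that holds in the relation, the left-hand side (X) has to be a superkey of that relation). Here both do: C->D holds in S1 but C is not a superkey of S1 (the closure of C is only {C,D}), and A->B holds in S2 but A is not a superkey of S2 (the closure of A is only {A,B}). So S1 and S2 are in 3NF but not in BCNF. Splitting on the violating FD's gives {C,D} and {A,C} from S1, and {A,B} and {A,C} from S2, i.e. the BCNF decomposition {A,B}, {C,D}, {A,C}, at the cost of no longer preserving AD->C and BC->A.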
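The BCNF test for a sub-schema is easy to get wrong by eye, so it is worth running mechanically. The sketch below (Python, hypothetical helper names) takes every proper subset X of a sub-schema, computes its closure under the original FD's, and reports X when it determines some other attribute of the sub-schema without determining all of it:

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(sub_schema, fds):
    """List (X, Y) with X -> Y non-trivial inside sub_schema and X not a superkey of it."""
    sub = set(sub_schema)
    found = []
    for size in range(1, len(sub)):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            inside = attribute_closure(x, fds) & sub   # what X determines within the sub-schema
            if inside != x and inside != sub:
                found.append(("".join(combo), "".join(sorted(inside - x))))
    return found


fds = [("A", "B"), ("C", "D"), ("AD", "C"), ("BC", "A")]
print(bcnf_violations("ADC", fds))   # S1: reports C -> D
print(bcnf_violations("BCA", fds))   # S2: reports A -> B
print(bcnf_violations("AC", fds))    # {A,C}: nothing to report
```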
PART B
When you apply Bernstein's Synthesis as above to decompose R, the decomposition is always dependency preserving. Now the question is: is the decomposition lossless? To check that we can use the following method:
Create a table as shown in figure 1, with the number of rows equal to the number of decomposed relations and the number of columns equal to the number of attributes in the original relation R.
We put an "a" in every attribute that is present in the respective decomposed relation, as in figure 1. Now we go through all the FD's {C->D,A->B,AD->C,BC->A} one by one and add an "a" whenever possible. For example, the first FD is C->D. Since both rows have an "a" in column C, the first row has an "a" in column D, and there is an empty slot in the second row of column D, we put an "a" there, as shown in the right part of the image. We stop as soon as one of the rows is completely filled with "a", which indicates that it is a lossless decomposition. If we keep going through the FD's until nothing more can be added and none of the rows of our table gets completely filled with "a", then it is a lossy decomposition.
Also, note that if it is a lossy decomposition, we can always make it lossless by adding one more relation to our set of decomposed relations, consisting of all the attributes of the primary key.
I suggest you see this video for more examples of this method. There is also another way to check for a lossless-join decomposition, which involves relational algebra.
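The table-filling method can be sketched in Python as below. This is the textbook chase variant rather than a literal transcription of the figure: empty slots hold unique placeholder symbols, and two rows that agree on the left-hand side of an FD get their right-hand side symbols made equal, preferring "a". The function name `is_lossless` is hypothetical.

```python
def is_lossless(attrs, fds, decomposition):
    """Chase test: True when the decomposition has a lossless join under `fds`."""
    # One row per sub-schema: "a" where the sub-schema has the attribute,
    # otherwise a placeholder that is unique to that row and column.
    rows = [
        {col: "a" if col in sub else (i, col) for col in attrs}
        for i, sub in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    for col in rhs:
                        if r1[col] == r2[col]:
                            continue
                        # Make the two symbols equal, keeping "a" if either has it.
                        keep, drop = (r1[col], r2[col]) if r1[col] == "a" else (r2[col], r1[col])
                        for row in rows:
                            if row[col] == drop:
                                row[col] = keep
                        changed = True
    # Lossless exactly when some row ends up with "a" in every column.
    return any(all(row[col] == "a" for col in attrs) for row in rows)


fds = [("C", "D"), ("A", "B"), ("AD", "C"), ("BC", "A")]
print(is_lossless("ABCD", fds, ["ADC", "BCA"]))   # S1 and S2 from PART A
print(is_lossless("ABCD", fds, ["AB", "CD"]))     # no shared attribute to join on
```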
I am having a hard time understanding third normal form (3NF).
3NF: 2NF + no transitive dependencies
So, for example, if I have
A -> B
B -> C
then the above is a transitive dependency and hence won't be in 3NF?
Am I understanding it correctly?
But in the answer to "What exactly does database normalization do?" by paxdiablo, it says:
Third normal form (3NF) - 2NF and every non-key column in a table depends on nothing but the key. According to this, it will be in 3NF. Where am I going wrong?
A relation is in 3NF if it is in 2NF and:
either each attribute depends on a key,
or, if an attribute depends on a non-key, then it is prime.
(being prime means that it belongs to a key).
See for instance Wikipedia.
A relation is in Boyce-Codd normal form if only the first condition holds, that is:
each attribute depends on a key
So, in your example, if the relation has only the three attributes A, B and C and the two dependencies, it is not in 3NF, since C is not prime and depends on B, which is not a key. On the other hand, if there are other attributes, and C is a key or part of a key, then it could be in 3NF (but this depends on the other functional dependencies, which should satisfy the above conditions).
2NF says that each non-prime attribute depends on the whole of each candidate key, and not on part of it. For instance, if a relation has attributes A, B and C, the only key is AB, and B -> C, then this relation is not in 2NF.
The 2-part 3NF definition you are trying for is:
2NF holds and every non-prime attribute of R is non-transitively dependent on every superkey. (X transitively determines Z when there's a Y where X → Y and Y → Z and not Y → X.)
The other definition of 3NF is:
For every non-trivial FD X → Y, either X is a superkey or the attributes in Y but not in X are prime. (X → Y is trivial when X contains Y.)
Then BCNF is:
For every non-trivial FD X → Y, X is a superkey
See this answer.
If your example's only columns are A, B and C and your two FDs form a minimal cover then the only candidate key is A and C is dependent on a non-superkey so it is not in 3NF (or BCNF).
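The FD-based definitions of 3NF and BCNF above translate directly into a check. A minimal sketch in Python, assuming the listed FDs are the ones that hold; the helper names are hypothetical, and candidate keys are found by brute force over attribute subsets, which is only sensible for small schemas:

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            is_superkey = attribute_closure(candidate, fds) >= set(all_attrs)
            if is_superkey and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys


def violations(all_attrs, fds):
    """Return (BCNF violations, 3NF violations) among the listed FDs."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra or attribute_closure(lhs, fds) >= set(all_attrs):
            continue                 # trivial FD, or X is a superkey
        bcnf.append((lhs, rhs))      # X is not a superkey
        if not extra <= prime:
            third.append((lhs, rhs)) # and some attribute of Y - X is not prime
    return bcnf, third


example = [("A", "B"), ("B", "C")]
print(candidate_keys("ABC", example))
print(violations("ABC", example))    # B -> C shows up in both lists
```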
You are (mis)using terms so sloppily that your sentences don't mean anything. Learn the terms and how they are used in their definitions to refer to various things and use them that way in reference to appropriate things. And get your definitions from a (reputable) textbook.
Many books on RDBMS define a relation to be in fourth normal form (4NF) if for every non-trivial MVD X ->> Y, X is a superkey. However, I fail to understand how the determinant X can be a superkey when we say it multi-determines Y, i.e., Y can have multiple values for a given value of X. A superkey is expected to uniquely identify dependent attribute values, whereas Y can have multiple values for the same value of X in the MVD. The books give examples for only trivial MVDs.
I was learning database normalization, join dependencies and 5NF, and I had a hard time. Can anyone give me some practical examples of the multivalued dependency rule:
MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y).
Functional dependency / normalization theory and the normal forms up to and including BCNF were developed on the hypothesis of all data attributes (columns/types/...) being "atomic" in a certain sense. That "certain sense" has long been deprecated by now, but essentially it boiled down to the notion that "a single cell value in a table could not itself hold a multiplicity of values". Think of a textual CSV list of ISBN numbers, or a table appearing as a value in a cell of a table (truly nested tables), ...
Now imagine an example with courses, professors, and study books used as course material. Imagine all of that modeled in a single 3-column table which says that "Professor (P) teaches course (C) and uses book (B) as course material." If there can be more than one book (B) used for any given course (Cn) and there can be more than one course (C) taught by any given professor (Pn) and there can be more than one professor (P) teaching any given course (Cn), then this table is clearly all-key (key is the full set of attributes {P,C,B} ).
This means that this table satisfies BCNF.
But now imagine that there is a rule to the effect that "the set of books used for any given course (Cn) must be the same, regardless of which professor teaches it.".
In the days when normalization was developed to the form in which it is now commonly known, it was not allowed to have table columns (relation attributes) that were themselves tables (relations). (Because such a design was considered a violation of 1NF, a notion which is now considered suspect.)
Imagine for a moment that we are indeed allowed to model relation attributes to be of type relation. Then we could model our 3-column table (/relation) as follows: "Professor (P) teaches course (C) and uses THE SET OF BOOKS (SB) as course material.". Attribute SB would no longer be an ISBN number, as in the previous and more obvious design, but it would be a (probably unary) RELATION holding the entire set of ISBN numbers. If we draw our design like that, and we then consider our rule that "all professors use the same set of books for the same course", then we see that this rule is now expressible as an FD from (C) to (SB) !!! And this means that we have a violation of a lower NF on our hands !!!
4 and 5 NF arose out of this kind of problem (where the appearance of a single attribute value, here courseID (C), requires the appearance of A MULTITUDE of rows, here multiple (B) ISBN numbers). The problem was recognised quite early on, but the solution that is currently regarded as the best, relation-valued attributes (RVA's), was not yet recognised as a valid one. So 4 and 5 NF were created as "new and further normal forms", whereas the then-existing definitions of 2, 3 and BC NF would already have been sufficient for dealing with the situation at hand, had RVA's been recognised as a valid design approach.
To support that claim, let's look at what would be done to eliminate the NF violation in our {P,C,SB} design with the FD C->SB:
We would split the table into two separate tables {P,C} and {C,SB} with keys {P,C} and {C}, respectively. Both tables satisfy BCNF.
But we still have this SB attribute that holds a set of ISBN numbers. Dealing with this can be done by applying a technique like "UNGROUPING". Applying this to our {C,SB} table would get us a {C,B} table, where B are the ISBN book numbers (or whatever identifier you like to use in your database), and the key to the table is {C,B}. This is exactly the same design we would get if we eliminated the 4/5 NF violation !!!
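The effect of the split can be seen on a small example. The sketch below (Python) uses made-up professor names and ISBN placeholders, and hypothetical helper names. It projects a {P,C,B} table onto {P,C} and {C,B}, joins the two projections back on C, and compares the result with the original. The two agree exactly when every professor of a course uses the same set of books.

```python
def split(pcb):
    """Project {P,C,B} rows onto {P,C} and {C,B}."""
    pc = {(p, c) for p, c, _ in pcb}
    cb = {(c, b) for _, c, b in pcb}
    return pc, cb


def rejoin(pc, cb):
    """Natural join of {P,C} and {C,B} on the course column."""
    return {(p, c, b) for p, c in pc for c2, b in cb if c == c2}


def same_books_per_course(pcb):
    """True when splitting and rejoining gives back exactly the original rows."""
    return rejoin(*split(pcb)) == pcb


# Hypothetical data: both professors of "DB" use the same two books.
consistent = {
    ("Smith", "DB", "isbn-1"), ("Smith", "DB", "isbn-2"),
    ("Jones", "DB", "isbn-1"), ("Jones", "DB", "isbn-2"),
    ("Jones", "OS", "isbn-3"),
}
# Same data, except Jones no longer uses isbn-2 for "DB".
inconsistent = consistent - {("Jones", "DB", "isbn-2")}

print(same_books_per_course(consistent))
print(same_books_per_course(inconsistent))
```

In the split design the rule cannot be broken at all, because the books of a course are stored once in {C,B} rather than once per professor.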
You might also want to take a look at Multivalue Dependency violation?