Practical example of MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y)

I was learning about database normalization, join dependencies and 5NF, and I had a hard time. Can anyone give me some practical examples of the multivalued dependency rule:
MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y).

Functional dependency/normalization theory, and the normal forms up to and including BCNF, were developed on the hypothesis that all data attributes (columns/types/...) are "atomic" in a certain sense. That "certain sense" has long since been deprecated, but essentially it boiled down to the notion that "a single cell value in a table could not itself hold a multiplicity of values". Think of a textual CSV list of ISBN numbers, or a table appearing as a value in a cell of another table (truly nested tables), ...
Now imagine an example with courses, professors, and study books used as course material, all modeled in a single 3-column table which says that "Professor (P) teaches course (C) and uses book (B) as course material." If more than one book (B) can be used for any given course (Cn), more than one course (C) can be taught by any given professor (Pn), and more than one professor (P) can teach any given course (Cn), then this table is clearly all-key (its only key is the full set of attributes {P,C,B}).
This means that this table satisfies BCNF.
But now imagine that there is a rule to the effect that "the set of books used for any given course (Cn) must be the same, regardless of which professor teaches it". In dependency terms, that rule is the MVD C ↠ B.
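To make the rule concrete, here is a small Python sketch (the professor names, ISBN strings and the function name are hypothetical). It relies on the standard characterization that C ↠ B holds in {P,C,B} exactly when the table equals the join of its projections on {P,C} and {C,B}; dropping one of Bob's books breaks the rule.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (professor P, course C, book B)


def books_independent_of_professor(rel: Set[Triple]) -> bool:
    """C ->> B holds in R(P, C, B) iff R equals the join of its
    projections on {P, C} and {C, B}."""
    teaches = {(p, c) for p, c, _ in rel}
    uses = {(c, b) for _, c, b in rel}
    # Natural join of the two projections on the common attribute C.
    rejoined = {(p, c, b) for p, c in teaches for c2, b in uses if c == c2}
    return rejoined == rel


# Hypothetical data: Ann and Bob both teach DB with the same two books.
r: Set[Triple] = {
    ("Ann", "DB", "isbn-1"), ("Ann", "DB", "isbn-2"),
    ("Bob", "DB", "isbn-1"), ("Bob", "DB", "isbn-2"),
    ("Ann", "OS", "isbn-3"),
}
print(books_independent_of_professor(r))                               # True: rule respected
print(books_independent_of_professor(r - {("Bob", "DB", "isbn-2")}))  # False: Bob drops a book
```

The rejoined table always contains the original rows; the rule fails exactly when the join produces extra, "spurious" rows.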
In the days when normalization was developed to the form in which it is now commonly known, it was not allowed to have table columns (relation attributes) that were themselves tables (relations). (Because such a design was considered a violation of 1NF, a notion which is now considered suspect.)
Imagine for a moment that we are indeed allowed to give relation attributes a relation type. Then we could model our 3-column table (/relation) as follows: "Professor (P) teaches course (C) and uses THE SET OF BOOKS (SB) as course material." Attribute SB would no longer be an ISBN number, as in the previous and more obvious design, but a (probably unary) RELATION holding the entire set of ISBN numbers. If we draw our design like that and then consider our rule that "all professors use the same set of books for the same course", we see that this rule is now expressible as an FD from C to SB! The key of this design is {P,C}, so C->SB is a dependency on only part of the key, which means we have a violation of a lower NF (2NF) on our hands!
4NF and 5NF arose out of this kind of problem, where the appearance of a single attribute value (the course ID, C) requires the appearance of A MULTITUDE of rows (one per ISBN number, B). Such problems were recognised quite early on, but the solution that is currently regarded as the best one, relation-valued attributes (RVAs), was not recognised as a valid design. So 4NF and 5NF were created as "new and further normal forms", whereas the then-existing definitions of 2NF, 3NF and BCNF would already have been sufficient for dealing with the situation at hand, had RVAs been recognised as a valid design approach.
To support that claim, let's look at what would be done to eliminate the NF violation in our {P,C,SB} design with the FD C->SB:
We would split the table into two separate tables, {P,C} and {C,SB}, with keys {P,C} and {C}, respectively. Both tables satisfy BCNF.
But we still have this SB attribute that holds a set of ISBN numbers. We can deal with this by applying an operation such as UNGROUP. Applying it to our {C,SB} table gives us a {C,B} table, where B is the ISBN book number (or whatever identifier you like to use in your database), and the key of the table is {C,B}. This is exactly the same design we would get by eliminating the 4NF/5NF violation!
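A minimal Python sketch of that argument, assuming a frozenset of ISBN strings as a stand-in for the unary relation SB (the data and the function name rva_design are hypothetical). It groups the books per (P, C), checks the FD C -> SB, splits the table, ungroups {C,SB}, and compares the result with the {P,C}/{C,B} design.

```python
from typing import FrozenSet, Set, Tuple

Flat = Set[Tuple[str, str, str]]  # (P, C, B), where B is a single ISBN


def rva_design(flat: Flat):
    """Group books into a set-valued attribute SB, split on the FD C -> SB,
    then UNGROUP {C, SB} back into {C, B}."""
    # One row per (P, C); SB is the set of books that professor uses for that course.
    grouped: Set[Tuple[str, str, FrozenSet[str]]] = {
        (p, c, frozenset(b for p2, c2, b in flat if (p2, c2) == (p, c)))
        for p, c, _ in flat
    }
    pc = {(p, c) for p, c, _ in grouped}
    c_sb = {(c, sb) for _, c, sb in grouped}
    # C -> SB holds exactly when every course appears only once in {C, SB}.
    fd_c_sb = len({c for c, _ in c_sb}) == len(c_sb)
    c_b = {(c, b) for c, sb in c_sb for b in sb}  # UNGROUP
    return pc, c_sb, c_b, fd_c_sb


flat: Flat = {
    ("Ann", "DB", "isbn-1"), ("Ann", "DB", "isbn-2"),
    ("Bob", "DB", "isbn-1"), ("Bob", "DB", "isbn-2"),
    ("Ann", "OS", "isbn-3"),
}
pc, c_sb, c_b, fd_ok = rva_design(flat)
print(fd_ok)  # True: both professors use the same book set for DB
# Same two tables as the 4NF decomposition of the flat {P, C, B} table:
print(pc == {(p, c) for p, c, _ in flat}, c_b == {(c, b) for _, c, b in flat})
```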
You might also want to take a look at the question "Multivalue Dependency violation?".
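For an instance of MVD3 itself, here is a hypothetical example checked with a small Python sketch. Assume a table {prof, phone, course, hobby} where each phone number belongs to exactly one professor, and a professor's phones, courses and hobbies are mutually independent. Then prof ↠ phone holds, and phone ↠ course holds (a phone fixes its professor, whose courses are independent of their hobbies), so MVD3 says prof ↠ (course − phone), i.e. prof ↠ course. The mvd_holds helper is a brute-force check of the MVD definition written for this sketch, not a library function.

```python
from itertools import product
from typing import Dict, List, Set

Row = Dict[str, str]


def mvd_holds(rows: List[Row], x: Set[str], y: Set[str]) -> bool:
    """Check X ->> Y: inside every group of rows that agree on X, the Y values
    and the values of the remaining attributes must combine freely."""
    if not rows:
        return True
    y_only = sorted(y - x)
    rest = sorted(set(rows[0]) - x - set(y_only))
    groups: Dict[tuple, Set[tuple]] = {}
    for r in rows:
        key = tuple(r[a] for a in sorted(x))
        part = (tuple(r[a] for a in y_only), tuple(r[a] for a in rest))
        groups.setdefault(key, set()).add(part)
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        others = {p[1] for p in pairs}
        # Each group is a subset of ys x others; it equals that product iff the sizes match.
        if len(pairs) != len(ys) * len(others):
            return False
    return True


# Hypothetical data: (phones, courses, hobbies) per professor.
profs = {
    "Smith": ({"555-1", "555-2"}, {"DB", "OS"}, {"chess"}),
    "Jones": ({"555-3"}, {"DB"}, {"golf", "tennis"}),
}
rows: List[Row] = [
    {"prof": p, "phone": ph, "course": c, "hobby": h}
    for p, (phones, courses, hobbies) in profs.items()
    for ph, c, h in product(phones, courses, hobbies)
]

X, Y, Z = {"prof"}, {"phone"}, {"course"}
print(mvd_holds(rows, X, Y))      # premise 1: prof ->> phone
print(mvd_holds(rows, Y, Z))      # premise 2: phone ->> course
print(mvd_holds(rows, X, Z - Y))  # MVD3 conclusion: prof ->> course
```

All three lines print True. By contrast, course ↠ hobby does not hold here, because DB is taught by professors with different hobbies.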

Related

How to determine the candidate key by functional dependencies in relational database theory

Consider a database relation of student records as follows:
Student (I,G,P,M,S,Y,E,L,R,C)
(a) Show how to derive two candidate keys for Student, or justify why you cannot do so.
(b) What normal form is Student in? Show working that justifies your answer.
(c) If F contained MSY→LRCE instead of PMSY→LRCE, what would this imply about paper names? (i.e., the values of M)
(d) Find a minimal cover (i.e., an irreducible set of functional dependencies) for Student.
(e) Find a decomposition of Student into third normal form (3NF).
I'm stuck on the first question, about the candidate keys. I know that the candidate keys must be a subset of (I,P,M,S,Y,L,R), since these appear on the left-hand side of the functional dependencies above and determine all of the remaining attributes. We can remove M, which is determined by P, but then I was confused about how to reduce these attributes to a minimal set, especially with complex functional dependencies such as PMSY→LRCE. Thanks for any solutions and suggestions.
I won't do your homework, but here is a hint on (a):
F:IGPMSYELRC->IGPMSYELRC
always holds. By virtue of F:P->M you can remove M and get
F:IGPSYELRC->IGPMSYELRC
now apply F:R->C to get
F:IGPSYELR->IGPMSYELRC.
Repeat this until you cannot remove any more attributes from the left-hand side.
Then you have a candidate key.
Applying the FDs of F in a different order (i.e. trying different permutations of F) may yield other candidate keys.
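The same reduction can be sketched in Python with attribute closures. This sketch only uses the three FDs quoted in the thread (P→M, R→C, PMSY→LRCE); the question's full F is not shown, so the result is illustrative, not the answer to (a). The hypothetical `order` parameter plays the role of trying different permutations.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def reduce_to_key(all_attrs: str, fds: List[FD], order: str = "") -> FrozenSet[str]:
    """Start from all attributes and drop each one whose removal still determines everything."""
    key = set(all_attrs)
    for a in order or all_attrs:
        trial = key - {a}
        if closure(trial, fds) >= set(all_attrs):
            key = trial
    return frozenset(key)


# Only the FDs quoted in the thread; the question's full F is not shown.
F = [fd("P", "M"), fd("R", "C"), fd("PMSY", "LRCE")]
print("".join(sorted(reduce_to_key("IGPMSYELRC", F))))  # GIPSY
```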

Functional dependencies in case of nulls

Suppose we have a table with 3 columns A,B and C
A B C
---------------
1 2 3
2 4 5
4 6 7
n 5 n
Here 'n' means null.
Can we say that A -> B and A -> C? I know the definition of functional dependencies, but I'm confused about the case of null values.
If null is considered a value, then the answer is yes. A -> B, C holds in the given data. However, to be a value imposes certain requirements. All operators applicable to the domain (e.g. integers) like equality, addition, less than, and so on, must be well-defined in the presence of nulls.
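A minimal Python sketch of the "null is a value" reading: None stands for n, and because None == None is True in Python (unlike NULL = NULL in SQL), two nulls compare equal. fd_holds is a hypothetical helper that checks the FD definition directly.

```python
from typing import Any, Dict, List, Sequence


def fd_holds(rows: List[Dict[str, Any]], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """LHS -> RHS: rows that agree on LHS must agree on RHS (None compared like any value)."""
    seen: Dict[tuple, tuple] = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


table = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 4, "C": 5},
    {"A": 4, "B": 6, "C": 7},
    {"A": None, "B": 5, "C": None},  # 'n' in the question
]
print(fd_holds(table, ["A"], ["B", "C"]))  # True
```

Adding a second row with A = None but a different B would make the check fail, exactly as it would for two rows sharing any other A value.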
If null is not a value, then the answer is more complicated. Functional dependencies, strictly speaking, apply to relations. If a table represents a relation, then we can refer to functional dependencies in the table. However, a symbol that represents the absence of a value is metadata, not data. It allows multiple union-incompatible relations to be represented by a single table. In this case, we can't apply the concept of functional dependency to the table since it's not clear which relation we're talking about.
Further confusing things, SQL DBMSs don't handle nulls consistently. In some cases, they're handled like values, in others like the absence of values. If you want to understand and describe a table logically, the best option is to decompose it into a set of null-free relations, and then to analyze each of those parts independently.
In the case of your example table, we run into a problem if null isn't a value. The last row has no unique identifier (it can't be B:4 since another row has B:4 as well) and we can't determine anything from a lack of information. The example can't be decomposed into a set of relations without discarding that row.
If we change the last row to have B:5 instead, then we can decompose the table into two relations: R1 = {(A:1, B:2, C:3), (A:2, B:4, C:5), (A:4, B:6, C:7)} and R2 = {(B:2), (B:4), (B:6), (B:5)}. We can say that A -> B, C holds in R1; it cannot hold in R2, which has no A or C attribute.
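A sketch of that decomposition in Python, assuming (as in the modified example) that B is never null; `decompose` is a hypothetical helper, and None stands for null.

```python
from typing import Dict, List, Optional, Set, Tuple

Row = Dict[str, Optional[int]]


def decompose(rows: List[Row]) -> Tuple[Set[tuple], Set[tuple]]:
    """Split into null-free relations, assuming B is never null:
    R1 holds the complete (A, B, C) rows, R2 holds every B value."""
    r1 = {(r["A"], r["B"], r["C"]) for r in rows
          if r["A"] is not None and r["C"] is not None}
    r2 = {(r["B"],) for r in rows}
    return r1, r2


table: List[Row] = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 4, "C": 5},
    {"A": 4, "B": 6, "C": 7},
    {"A": None, "B": 5, "C": None},
]
r1, r2 = decompose(table)
# A -> B, C in R1: no A value is paired with two different (B, C) values.
print(len({a for a, _, _ in r1}) == len(r1))  # True
print(sorted(r2))                             # [(2,), (4,), (5,), (6,)]
```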

Generalisation of ER diagrams

Can anyone explain the difference between the "d" and "o" notations in generalization and specialization in ER diagrams? Do the two notations have the same meaning or different meanings?
An o is for overlapping, meaning an entity can belong to more than one subtype. In your example, an Assignment can involve a Grade and/or a Lab_Session.
A d is for disjoint, meaning an entity can't belong to more than one subtype. In your example, a Lecture can be only one of Enhancement, SpecialDegree or GeneralDegree lectures.
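The difference can be illustrated with a small Python check (the lecture and assignment IDs are made up): under d, no entity may appear in two subtype sets, while under o such overlap is allowed.

```python
from typing import Dict, Set


def is_disjoint(subtypes: Dict[str, Set[str]]) -> bool:
    """True if no entity appears in more than one subtype (the d constraint)."""
    seen: Set[str] = set()
    for members in subtypes.values():
        if seen & members:
            return False
        seen |= members
    return True


lecture_subtypes = {"Enhancement": {"L1"}, "SpecialDegree": {"L2"}, "GeneralDegree": {"L3"}}
assignment_subtypes = {"Grade": {"A1", "A2"}, "Lab_Session": {"A2"}}
print(is_disjoint(lecture_subtypes))     # True: fine under d
print(is_disjoint(assignment_subtypes))  # False: A2 is in both, allowed only under o
```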

What Happens when Cartesian Product is applied to Relations with same attribute name

I understand that the Cartesian product (X) of two relations does not require them to be UNION compatible. So, if there is an attribute called name in both relations R and S, where name in R is the first name and name in S is the last name,
how can the related values be identified by the following selection operation?
Q=RxS
I want to get the set of tuples whose firstname = lastname. So how am I supposed to write the selection statement?
σ Name=Name(Q)
Will this be a problem using the same attribute name in the selection operation?
The Cartesian product does not require attributes to be named differently. It only requires the relations to be named differently.
For example, D := A(id, name) X B(id, age) is perfectly valid, and the resulting relation is D(A.id, name, B.id, age).
In other words, the clashing attributes are automatically renamed by prepending the relation name, as part of the Cartesian product. This prepending is also why the relations must be named differently.
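A small Python sketch of that naming convention, assuming relations represented as a heading tuple plus a list of row tuples (`product` and `select_equal` are hypothetical helpers, and the names are made up). Only the clashing attribute is qualified, mirroring D(A.id, name, B.id, age), and the question's selection then becomes σ R.name=S.name(Q).

```python
from typing import List, Tuple

Heading = Tuple[str, ...]
Rows = List[tuple]


def product(r_name: str, r_head: Heading, r_rows: Rows,
            s_name: str, s_head: Heading, s_rows: Rows) -> Tuple[Heading, Rows]:
    """Cartesian product that qualifies clashing attribute names with the relation name."""
    if r_name == s_name:
        raise ValueError("the two relations must be named differently")
    clash = set(r_head) & set(s_head)
    head = (tuple(f"{r_name}.{a}" if a in clash else a for a in r_head)
            + tuple(f"{s_name}.{a}" if a in clash else a for a in s_head))
    return head, [t + u for t in r_rows for u in s_rows]


def select_equal(rel: Tuple[Heading, Rows], left: str, right: str) -> Tuple[Heading, Rows]:
    """Selection comparing two attributes of the same relation."""
    head, rows = rel
    i, j = head.index(left), head.index(right)
    return head, [t for t in rows if t[i] == t[j]]


# Hypothetical data: R holds first names, S holds last names.
R = ("rid", "name"), [(1, "Lee"), (2, "Ann")]
S = ("sid", "name"), [(10, "Lee"), (11, "Smith")]
Q = product("R", *R, "S", *S)
print(Q[0])                                    # ('rid', 'R.name', 'sid', 'S.name')
print(select_equal(Q, "R.name", "S.name")[1])  # [(1, 'Lee', 10, 'Lee')]
```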
Source:
- Database System Concepts 6th Edition, Chapter 6.1.1.6 The Cartesian-Product Operation, for the definition, and an example in Figure 6.8 Result of instructor × teaches.
It is correct that for Cartesian product the relations need not be UNION compatible.
But they still need to be compatible in another sense! Otherwise you get exactly the difficulties you point out. So the rule for Cartesian product is that there must be no attributes in common.
So if you have a clash of attributes, first you must rename the attributes before crossing.
See http://en.wikipedia.org/wiki/Relational_algebra on 'Natural Join'. (That defines Nat Join in terms of Rename, Cartesian product and Projection.)
From the point of view of learning the RA, I would think of Natural Join as the basic operation, and of Cartesian product as a degenerate form of it when there are no attributes in common. This is, for example, the approach that Date & Darwen take in their textbooks.
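The rename-then-join approach can be sketched in Python as follows (the helpers `rename` and `natural_join` and the sample data are hypothetical). Joining R and S directly matches on both id and name; after renaming S's attributes nothing is shared, so the natural join degenerates to the Cartesian product and an ordinary selection finds the matching names.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[Tuple[str, ...], Set[tuple]]  # (heading, body)


def rename(rel: Relation, mapping: Dict[str, str]) -> Relation:
    head, body = rel
    return tuple(mapping.get(a, a) for a in head), body


def natural_join(r: Relation, s: Relation) -> Relation:
    """Join on all common attributes; with none in common this is the Cartesian product."""
    r_head, r_body = r
    s_head, s_body = s
    common = [a for a in r_head if a in s_head]
    extra = [a for a in s_head if a not in r_head]
    body = set()
    for t in r_body:
        tr = dict(zip(r_head, t))
        for u in s_body:
            us = dict(zip(s_head, u))
            if all(tr[a] == us[a] for a in common):
                body.add(t + tuple(us[a] for a in extra))
    return tuple(r_head) + tuple(extra), body


R: Relation = (("id", "name"), {(1, "Lee"), (2, "Ann")})
S: Relation = (("id", "name"), {(10, "Lee"), (11, "Smith")})
# Rename first so that no attributes are shared; the join then degenerates to R x S.
crossed = natural_join(R, rename(S, {"id": "sid", "name": "last_name"}))
print(crossed[0])       # ('id', 'name', 'sid', 'last_name')
print(len(crossed[1]))  # 4
matches = {t for t in crossed[1] if t[1] == t[3]}
print(matches)          # {(1, 'Lee', 10, 'Lee')}
```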

Relational Algebra Query Troubles

I have a problem where I have two relations, one containing the attributes song_id, song_name and album_id, and the other containing album_id and album_name. I need to find the names of all the albums that do not have songs in the song relation. The problem is that I can only use Rename, Projection, Selection, Grouping (with sum, min, max, count), Cartesian Product, and Natural Join. I have spent a good amount of time working on this and would appreciate any help that points me in the right direction.
As @ErwinSmout pointed out, difference is generally an easy way to do it. But since you can't use it, there is a tricky workaround using counts. I'm assuming that every album_id present in the songs relation is also present in the albums relation.
PROJECT album_id from the songs relation (note that relational algebra's PROJECT is equivalent to SQL's SELECT DISTINCT). I'll call this relation song_albums. Now let's take the count of the albums relation, call it m, and the count of the new relation, call it n.
Take the Cartesian product of the albums relation and the song_albums relation. This new relation has m*n rows. If you now do a count grouped by album_name, each of the m album_names will have a count of n. Not very helpful.
But now we SELECT from that relation the rows where albums.album_id != song_albums.album_id. If you now do a count grouped by album_name, the count for the albums that were not in the original songs relation will still be n, while the albums that were in it will have a count of n - 1, because exactly one row (the one pairing the album with its own album_id in song_albums) has been removed.
Edit: As it turns out, this isn't a strictly relational-algebra solution. In SQL, a 1 x 1 table, such as the one containing n, can simply be treated as an integer and used in an equality comparison. However, according to Wikipedia, a selection must compare either two attributes of a relation, or an attribute and a constant value.
That obstacle can be dealt with by another, admittedly inelegant, Cartesian product: we can take the Cartesian product of the 1 x 1 relation containing n with our most recent relation. Now we can make a proper relational-algebra selection, since we have an attribute that is always equal to n.
Since this has gotten rather complex, here is a relational-algebra expression capturing the above English explanation:
Note that n is a 1 x 1 relation with an attribute named "count".
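The steps above can also be sketched in Python (the function name and data are hypothetical; sets of tuples stand in for relations, and collections.Counter stands in for the grouping operator). One deliberate difference: it groups by (album_id, album_name) instead of album_name alone, so that two albums with the same name are not merged.

```python
from collections import Counter
from typing import Set, Tuple

Album = Tuple[int, str]      # (album_id, album_name)
Song = Tuple[int, str, int]  # (song_id, song_name, album_id)


def albums_without_songs(albums: Set[Album], songs: Set[Song]) -> Set[str]:
    song_albums = {(aid,) for _, _, aid in songs}  # PROJECT album_id (duplicates vanish)
    n = len(song_albums)                           # the 1 x 1 "count" relation
    crossed = {(aid, name, said)                   # albums x song_albums: m*n rows
               for aid, name in albums for (said,) in song_albums}
    kept = {t for t in crossed if t[0] != t[2]}    # SELECT albums.album_id != song_albums.album_id
    counts = Counter((aid, name) for aid, name, _ in kept)             # GROUP BY with COUNT
    with_n = {(aid, name, c, n) for (aid, name), c in counts.items()}  # x the 1 x 1 relation
    return {name for _, name, c, count in with_n if c == count}        # SELECT count = n, PROJECT


albums = {(1, "Blue"), (2, "Red"), (3, "Green")}
songs = {(10, "a", 1), (11, "b", 1), (12, "c", 3)}
print(albums_without_songs(albums, songs))  # {'Red'}
```

Note that the trick returns nothing when the songs relation is empty (n = 0), because the Cartesian product is then empty too.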
It's impossible. The problem includes a negation, and in relational algebra, that can only be expressed using relational difference, which you're seemingly not allowed to use.
I'm curious to see what your teacher presents as the solution to this problem.

Resources