so im stuck in this exercise where I need to translate relational algebra (unary relational operations) expressions based on the Mondial III database to natural language and I need help for the last two and if I have any errors in the ones I answered. BTW i used 6 for sigma (SELECT operation) and |><| for the THETA JOIN operation (couldn't find the sigma or the real theta join operator on my keyboard sorry about that) Any help is much appreciated!Thanks in advance.
Here's the meaning for symbols :
SELECT :
Selects all tuples that satisfy the selection condition from a relation R :
6selection condition(R)
PROJECT : Produces a new relation with only some of the attributes of R, and removes duplicates tuples :
πattribute list(R)
THETHA JOIN : Produces all combinations of tuples from R1 and R2 that satisfy the join condition :
R1< |><|join condition >(R2)
πname(6elevation>1000(MOUNTAIN)) -> Find the name of all mountains whose elevation is higher than 1000.
6elevation>1000(6population>100000(CITY)) -> Select the city's tuples whose elevation is higher than 1000 with a population greater than 100000
6population>100000(6elevation>1000(CITY)) -> Select the city's tuples whose population is greater than 100000 with an elevation higher than 1000
COUNTRY|><|code=country(LANGUAGE) -> ?
πCountry.name(COUNTRY|><|code=country(6Language.name='English' AND percentage>50(LANGUAGE)) -> ?
The fourth expression returns all the informations about the countries together with all the languages spoken (the information about the country is repeated for each different language spoken).
The fifth expression return the name of all the countries where the prevalent language is English.
I'm having a difficulty to understand how to "work" with join dependencies, and I would like to ask a question that will help me clarify things for myself.
Here's the simple definition from Wikipedia:
A table T is subject to a join dependency if T can always be recreated
by joining multiple tables each having a subset of the attributes of
T.
A trivial join dependency is defined as follows:
If one of the tables in the join has all the attributes of the table
T, the join dependency is called trivial.
My question is: If we decompose a relation R into a lossless decomposition, is it possible that every join dependency of R could be a trivial join dependency?
An example would be awesome.
If we decompose a relation R into a lossless decomposition, is it possible that the join dependency\ies of R would be a trivial join dependency\ies?
If you mean, if we decompose a relation R losslessly is it possible that all the JDs of R are trivial: yes.
Whenever all the JDs of R are trivial, you can decompose it losslessly, because by definition a JD is just a description of a lossless decomposition. And there are such relations. Every R, calling its attribute set S, satisfies the JDs *(S,S), *(S,S,S), etc. Some satisfy no other FDs. Some satisfy others but they're also trivial.
Eg: This R only satisfies *(S,S), *(S,S,S), etc:
x y
1 2
5 2
5 4
Eg: Say S = {x,y} and FD {x}->{y} holds, so *({x},S} holds. But say JD *({x},{y}) doesn't hold. Then the only way a JD can have sets unioning to S is if S is one of them. So R has only trivial JDs. But not just the ones using only S.
x y
1 2
5 2
6 4
If you mean, if we decompose a relation R losslessly into smaller components is it possible that all the JDs of R are trivial: no. Because by definition a trivial JD has one set that is all the attributes of R, ie has one component that is R, it doesn't decompose into components smaller than R.
Sorry for asking a question one might consider a basic one)
Suppose we have a relation R(A,B,C,D,E) with multivalued dependencies:
A->>B
B->>D.
Relation R doesn't have any functional dependencies.
Next, suppose we decompose R into 4NF.
My considerations:
Since we don't have any functional dependencies, the only key is all attributes (A,B,C,D,E). There are two ways we can decompose our relation R:
R1(A,B) R2(A,C,D,E)
R3(B,D) R4(A,B,C,E)
My question is - are these 2 decompositions final? Looks like they are since there are no nontrivial multivalued dependencies left. Or am I missing something?
Relation R doesn't have any functional dependencies.
You mean, non-trivial FDs (functional dependencies). (There must always be trivial FDs.)
Assuming that the MVDs (multivalued dependencies) holding in R are those in the transitive closure of {A ↠ B, B ↠ D}:
In 1 R1(A,B) R2(A,C,D,E), we can reconstruct R as R1 JOIN R2 and both R1 & R2 are in 4NF and their join will satisfy A ↠ B. If some component contained all the attributes of the other MVD then we could further decompose it per that MVD. And we would know that, given some alleged values for all components, their alleged reconstruction of R by joining would satisfy both MVDs. But here there is no such component. So we can't further decompose. And we know that an alleged reconstruction of R by joining satisfies A ↠ B but we would still have to check whether B ↠ D. We say that the MVD B ↠ D is "not preserved" and the decomposition to R1 & R2 "does not preserve MVDs".
In 2 R3(B,D) R4(A,B,C,E), we can reconstruct R as R3 JOIN R4 and both R3 & R4 are in 4NF and the join will satisfy B ↠ D. Now some component contains all the attributes of the other MVD so we can further decompose it per that MVD. And we know that, given some alleged values for all components, their alleged reconstruction of R by joining satisfies both MVDs. That component is R4, which we can further decompose, reconstructing as AB JOIN ACE. And we know that an alleged reconstruction of R by joining satisfies both A ↠ B & B ↠ D. Because the MVDs in the original appear in a component, we say these decompositions "preserve MVDs".
PS 1 The 4NF decomposition must be to three components
MVDs always come in pairs. Suppose MVD X ↠ Y holds in a relation with attributes S, normalized to components XY & X(S-Y). Notice that S-XY is the set of non-X non-Y attributes, and X(S-Y) = X(S-XY). Then there is also an MVD X ↠ S-XY, normalized to components X(S-XY) & X(S-(S-XY)), ie X(S-XY) & XY, ie X(S-Y) & XY. Why? Notice that both MVDs give the same component pair. Ie both MVDs describe the same condition, that S = XY JOIN X(S-XY). So when an MVD holds, that partner holds too. We can write the condition expressed by each of the MVDs using the special explicit & symmetrical notation X ↠ Y | S-XY.
We say a JD (join dependency) of some components of S holds if and only if they join to S. So if S = XY JOIN X(S-Y) = XY JOIN X(S-XY) then the JD *{XY, X(S-XY)} holds. Ie the condition that both MVDs describe is that JD. So a certain MVD and a certain binary JD correspond. That's one way of seeing why normalizing an MVD away involves a 2-way join and why MVDs come in pairs. The JDs that cause a 4NF relation to not be in 5NF are those that do not correspond to MVDs.
Your example involves two MVDs that aren't partners & neither otherwise holds as a consequence of the other, so you know that the final form of a lossless decomposition will involve two joins, one for each MVD pair.
PS 2 Ambiguity of "Suppose we have a relation with these multi-valued dependencies"
When decomposing per FDs (functional dependencies) we are usually given a canonical/minimal cover for the relation, ie a set in a certain form whose transitive closure under Armstrong's axioms (set of FDs that must consequently hold) holds all the FDs in the relation. This is frequently forgotten when we are told that some FDs hold. We must either be given a canonical/minimal cover for the relation or be given an arbitrary set and be told that the FDs that hold in the relation are the ones in its transitive closure. If we're just given a set of FDs that hold, we know that the ones in its transitive closure hold, but there might be others. So in general we can't normalize.
Here you give some MVDs that hold. But they aren't the only ones, because each has a partner. Moreover others might (and here do) consequently hold. (Eg X ↠ Y and Y ↠ Z implies X ↠ Z-Y holds.) But you don't say that they form a canonical or minimal cover. One way to get a canonical form for MVDs (a unique one per each transitive closure, hopefully more concise!) would be a minimal cover (one that can't lose any MVDs and still have the same transitive closure) augmented by the partner of each MVD. (Whereas for FDs the standard canonical form is minimal.) You also don't say "the MVDs that hold are those in the transitive closure of these". You just say that those MVDs hold. So maybe some not in the transitive closure do too. So your example can't be solved. We can guess that you probably mean that this is a minimal cover. (It's not canonical.) Or that the MVDs that hold in the relation are those in the transitive closure of the given ones. (Which in this case are then a minimal cover.)
A Table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ->> Y, X is a superkey—that is, X is either a candidate key or a superset.
In your first decomposition(1 with R1 and R2) B->>D is not satisfying so it's not dependency preserving decomposition as well as not in 4NF as A is not superkey in 2nd table.
On the other hand,second decomposition(2 with R3 and R4) is dependency preserving and lossless join with B and ACE as primary key in respective tables but it's not in 4NF because A->>B dependency exists in second table and A is not superkey, you have to decompose second table further in to two tables that can be {A B} and {A C E}.
So if I follow your reasoning (suraj3), are R1(A,B) and R2(B,C,D,E) correct decomposition? I think this will preserve the FD B->>D.
Many books on RDBMS define a relation to be in 4th normal form if for every non-trivial MVD x ->> Y, X is a super key. However I fail to understand how can the determiner X be a super key when we say it multi-determines Y, i.e., Y can have multiple values for a given value of X. A super key is expected to uniquely identify dependent attribute values whereas Y can have multiple values for the same value of X in the MVD. The books give examples for only trivial MVDs.
I'm studying size estimation of logical query plans in order to select a physical query plan.
I was wondering what is the size of joining (natural join) a relation to itself?
e.g R(a,b) JOIN R(a,b), say total number of tuples is 100 and attributes a and b both has a distinct values of 20.
Will the join size (number of tuples in result) equal to 100?
I'm so confused!
To answer the question as asked:
Natural join of a relation to itself is the identity operation; you'll get exactly the tuples you started with (yes, 100 tuples in this case).
The equivalent SQL for what you ask is:
SELECT R1.a, R1.b FROM R AS R1, R As R2 WHERE R1.a = R2.a AND R1.b = R2.b
This is because RA's (Natural) Join always matches by attribute name.
What could be more sensible? What's to be confused about?