RA translation to natural language - database

I'm stuck on an exercise where I need to translate relational algebra expressions (unary relational operations) over the Mondial III database into natural language. I need help with the last two, and I'd like to know whether there are any errors in the ones I answered. By the way, I used 6 for sigma (the SELECT operation) and |><| for the THETA JOIN operation (I couldn't find sigma or the real theta join operator on my keyboard, sorry about that). Any help is much appreciated! Thanks in advance.
Here is the meaning of the symbols:
SELECT :
Selects all tuples that satisfy the selection condition from a relation R :
6selection condition(R)
PROJECT : Produces a new relation with only some of the attributes of R, and removes duplicate tuples :
πattribute list(R)
THETA JOIN : Produces all combinations of tuples from R1 and R2 that satisfy the join condition :
R1|><|join condition(R2)
πname(6elevation>1000(MOUNTAIN)) -> Find the name of all mountains whose elevation is higher than 1000.
6elevation>1000(6population>100000(CITY)) -> Select the city's tuples whose elevation is higher than 1000 with a population greater than 100000
6population>100000(6elevation>1000(CITY)) -> Select the city's tuples whose population is greater than 100000 with an elevation higher than 1000
COUNTRY|><|code=country(LANGUAGE) -> ?
πCountry.name(COUNTRY|><|code=country(6Language.name='English' AND percentage>50(LANGUAGE))) -> ?

The fourth expression returns all the information about each country together with every language spoken there (the information about the country is repeated for each different language spoken).
The fifth expression returns the names of all the countries where English is spoken by more than 50% of the population, that is, where English is the prevalent language.
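To check translations like these, it can help to run the expressions on a few rows. The following is a minimal sketch, assuming relations are modelled as lists of dictionaries; the rows are made up (they are not real Mondial data), and the helper names `select`, `theta_join` and `project` are only illustrative.

```python
# Toy rows, NOT real Mondial data; attribute names follow the question.
COUNTRY = [
    {"name": "Alphaland", "code": "AL"},
    {"name": "Betaland", "code": "BL"},
]
LANGUAGE = [
    {"country": "AL", "name": "English", "percentage": 80.0},
    {"country": "AL", "name": "French", "percentage": 20.0},
    {"country": "BL", "name": "English", "percentage": 30.0},
]


def select(pred, rel):
    """SELECT (sigma): keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def theta_join(r1, name1, r2, name2, cond):
    """THETA JOIN: all combinations of tuples from r1 and r2 that satisfy cond.

    Attributes are qualified as Relation.attr so that the two
    'name' attributes do not clash in the result.
    """
    out = []
    for a in r1:
        for b in r2:
            if cond(a, b):
                row = {f"{name1}.{k}": v for k, v in a.items()}
                row.update({f"{name2}.{k}": v for k, v in b.items()})
                out.append(row)
    return out


def project(attrs, rel):
    """PROJECT (pi): keep only attrs; using a set removes duplicate tuples."""
    return {tuple(t[a] for a in attrs) for t in rel}


def code_matches(c, l):
    return c["code"] == l["country"]


# Fourth expression: every country paired with each of its languages.
all_languages = theta_join(COUNTRY, "Country", LANGUAGE, "Language", code_matches)

# Fifth expression: names of countries where English exceeds 50%.
english_majority = select(
    lambda t: t["name"] == "English" and t["percentage"] > 50, LANGUAGE
)
result = project(
    ["Country.name"],
    theta_join(COUNTRY, "Country", english_majority, "Language", code_matches),
)
```

With these toy rows, the fourth expression repeats the Alphaland tuple once per language, and the fifth keeps only Alphaland.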

Related

Is multiplication allowed in relational algebra?

I have a relation
R
-------
cid    sid  gradepoint  credits
CS425  001  4.0         3
I need to calculate the GPA. There are more rows, but I believe if I just get this answered I should be ok with the rest. I need to do gradepoint * credits. How do I express this with a relational algebra expression?
My best guess is:
, but I'm not sure if I can multiply attributes with anything other than a constant.
Relational algebra doesn't address domain-specific operations. It neither includes nor excludes them, just as ordinary algebra neither includes nor excludes operations on relations.
If you allow multiplication by constants, you're already combining algebras (which is pretty much required for any practical application) so I see no reason to disallow multiplication between attributes.
Notice that if expressions like the one you are using are allowed, then it is projection that is doing the multiplying. Instead of its inputs being a relation value and attribute names, its inputs are a relation value and expressions of some sort that include names of operators whose inputs are values of attribute types. Your projection is parsing and multiplying, so it is a different operator from one that only accepts attribute names.
A projection that takes attribute expressions raises the question of how to implement it in an algebra whose projection only takes a relation value and attribute names. This is important in an academic setting because a question may want you to actually figure out how to do that, or because the difficulty of a question depends on the operators available. So find out what algebra you are supposed to use.
We can introduce an operator on attribute values even when we only have basic relation operators taking attribute names and relation values. Each such operator can be associated with a relation value that has an attribute for each operand and an attribute for the result. The relation holds the tuples where the result value is equal to the result of the operator called on the operand values. (The result is functionally dependent on the operands.)
So suppose we have the following table value Times holding tuples where left * right = result:
left  right  result
-------------------
0     0      0
1     0      0
...
0     1      0
1     1      1
2     1      2
...
If your calculated attribute is result then you want
/* tuples where for some credits & gradepoint,
course cid's student sid earned grade gradepoint and credits credits
and credits * gradepoint = result
*/
project cid, sid, result (
R natural join (rename left\credits right\gradepoint (Times))
)
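The expression above can be tried out with a small sketch. It assumes relations are lists of dictionaries, and it replaces the conceptually infinite Times relation with a finite slice that covers only the values used here; the helper names `rename`, `natural_join` and `project` are illustrative, not part of any particular algebra.

```python
def rename(rel, mapping):
    """RENAME: replace attribute names according to mapping (old -> new)."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in rel]


def natural_join(r1, r2):
    """NATURAL JOIN: combine tuples that agree on all common attributes."""
    out = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out


def project(attrs, rel):
    """PROJECT: keep only attrs; using a set removes duplicate tuples."""
    return {tuple(t[a] for a in attrs) for t in rel}


# The single row shown in the question.
R = [{"cid": "CS425", "sid": "001", "gradepoint": 4.0, "credits": 3}]

# Finite slice of Times (assumption: credits 0..6, grade points 0.0..4.0).
Times = [
    {"left": left, "right": right, "result": left * right}
    for left in range(0, 7)
    for right in (0.0, 1.0, 2.0, 3.0, 4.0)
]

# project cid, sid, result (R natural join (rename left\credits right\gradepoint (Times)))
weighted = project(
    ["cid", "sid", "result"],
    natural_join(R, rename(Times, {"left": "credits", "right": "gradepoint"})),
)
```

The multiplication itself never happens inside the algebra operators: the join simply looks up the row of Times whose operands match.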
See also: Relational algebra - recode column values
PS re algebra vs language: What is a reference to the "relational algebra" you are using? There are many. They even have different notions of what a "relation" is. Some so-called "algebras" are really languages because the expressions don't only represent the results of operators being called on values. Although it is possible for an algebra to have operand values that represent expressions and/or relation values that contain names for themselves.
PS re separation of concerns: You haven't said what the attribute name of the multiplication result is. If you're expecting it to be credits * gradepoint then you're also expecting projection to map expression-valued inputs to attribute names. Except you are expecting credits * gradepoint to be recognized as an expression with two attribute names and an operator name in one place but to be just one attribute name in another. There are solutions for these things in language design, eg in SQL optional quotes specialized for attribute names. But maybe you can see why simple things like an algebra operating on just attribute names & relation values with unique, unordered attribute names help us understand in terms of self-contained chunks.

Why are union, intersection and difference operations called boolean operations in relational algebra?

Why are union, intersection and difference operations of relational algebra called boolean operations?
I found them called that in the first line in section 5.4.1 Boolean operations (Section 5.4 is Relational Algebra and Datalog) in a book named A First Course in DATABASE SYSTEMS by Ullman & Widom.
The statement that you are citing is from Chapter 5 (Algebraic and Logical Query Languages) of the book “First Course in Database Systems” by Ullman and Widom (Pearson 2013).
In particular, sections 5.3 and 5.4 of that chapter cover the language Datalog, which can be used to work on a relational database and in which a relation is seen as a predicate, not a set (see Section 5.3.1).
In other words, a tuple (x1, x2, ..., xn) of R is seen as the fact that the relation R is true for the arguments specified (x1, x2, ..., xn). In this way one can transform the set operators discussed in the context of relational algebra (union, difference, intersection) into Datalog rules expressed through boolean operators like AND, NOT, etc.
In fact, you can see that in the same book, in section 2.4.4, they are called Set Operators (as they actually are), so I think the naming of “boolean operators” is due to the fact that they are strictly related to boolean operators and are discussed in the context of a logical view of a database.
The notion of "boolean" relational operations is an idiosyncratic ad hoc distinction/categorization in that book and chapter. The authors identify 3 common relation operators that "can each be expressed simply in Datalog". The next sections at the same level go on to each express some other relational operator. ("Selections can be somewhat more difficult to express in Datalog.") The 3 operators' treatments are similar to each other and different from others' in a certain context-specific "boolean" way. (So it's not a particularly helpful or deep distinction.)
5.4.1 Boolean Operations
The boolean operations of relational algebra--union, intersection, and set
difference--can each be expressed simply in Datalog.
To take the union R ∪ S, [...] As a result, each tuple from R and each tuple of S is put into the answer relation.
To take the intersection R ∩ S, [...] Then, a tuple is in the answer relation if and only if it is in both R and S.
To take the difference R - S, [...] Then, a tuple is in the answer relation if and only if it is in R but not in S.
So the "boolean" is because each relation operator corresponds to a certain boolean operator in that context:
UNION returns tuples that are in one operand OR the other
INTERSECTION returns tuples that are in one operand AND the other
DIFFERENCE returns tuples that are in one operand AND NOT the other
(Datalog also literally uses AND and AND NOT, but OR is implicit.)
More importantly, this can be put another way: If every base relation holds the tuples that make a true proposition (statement) from some associated predicate (sentence template parameterized by attribute names) then every result has a predicate made from its operand predicates, and its value holds the tuples that make a true proposition from that predicate:
U holds the tuples where U
V holds the tuples where V
U UNION V holds the tuples where U OR V
U INTERSECTION V holds the tuples where U AND V
U DIFFERENCE V holds the tuples where U AND NOT V
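A small sketch of this correspondence, assuming relation bodies are modelled as Python sets of tuples and the predicates as membership tests; the values are made up for illustration.

```python
# Two toy relation bodies over the same single attribute.
U = {(1,), (2,), (3,)}
V = {(2,), (3,), (4,)}

# Every tuple that could possibly appear in a result.
candidates = U | V


def u(t):
    """Predicate U: true exactly for the tuples held by U."""
    return t in U


def v(t):
    """Predicate V: true exactly for the tuples held by V."""
    return t in V


# Build each result from the boolean connective on the predicates...
union = {t for t in candidates if u(t) or v(t)}
intersection = {t for t in candidates if u(t) and v(t)}
difference = {t for t in candidates if u(t) and not v(t)}

# ...and compare with the set operator of the same name.
same_as_set_operators = (
    union == U | V and intersection == U & V and difference == U - V
)
```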
But more than that: it's not just that those operators correspond to some boolean propositional logic connectives/nonterminals. Every relational operator corresponds to a predicate logic connective/nonterminal, every query expression has an associated predicate, and every query result value holds the tuples that make a true proposition from that predicate:
U JOIN V holds the tuples where U AND V
U RESTRICT condition holds the tuples where U AND condition
U PROJECT A holds the tuples where FORSOME values for all attributes but A, U
U RENAME A A' holds the tuples where U with A replaced by A'
So the book ties those three operators to the syntax & semantics of Datalog, but really all operators can be tied to the syntax & semantics of predicate logic, which is the language/notation of precision in mathematics, science (including computer science) & engineering (including software engineering). In fact, that's how we know what a query means. So it's not really that those three operators are in some sense boolean so much as that all relational operators are logical: every relation expression corresponds to a predicate logic expression.
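The same idea can be checked for JOIN, RESTRICT and PROJECT on a small example. This is only a sketch under stated assumptions: attribute values come from a tiny finite domain so the predicates can be enumerated, and the predicates chosen for U and V are arbitrary.

```python
from itertools import product

DOMAIN = range(4)  # small finite domain so every candidate tuple can be tested

# U(a, b) holds the tuples where a < b; V(b, c) holds those where c == b + 1.
U = {(a, b) for a, b in product(DOMAIN, repeat=2) if a < b}
V = {(b, c) for b, c in product(DOMAIN, repeat=2) if c == b + 1}

# U JOIN V, computed the algebra way (match on the shared attribute b)...
join_algebra = {(a, b, c) for (a, b) in U for (b2, c) in V if b == b2}
# ...and the logic way: the tuples where U AND V.
join_logic = {
    (a, b, c)
    for a, b, c in product(DOMAIN, repeat=3)
    if (a, b) in U and (b, c) in V
}

# U RESTRICT b > 1: the tuples where U AND condition.
restrict_algebra = {(a, b) for (a, b) in U if b > 1}
restrict_logic = {
    (a, b) for a, b in product(DOMAIN, repeat=2) if (a, b) in U and b > 1
}

# U PROJECT a: the tuples where FORSOME b, U.
project_algebra = {(a,) for (a, b) in U}
project_logic = {(a,) for a in DOMAIN if any((a, b) in U for b in DOMAIN)}
```

In each pair, the first set is built by manipulating tuples and the second by testing a predicate on every candidate tuple; both give the same relation body.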
(The three operators are also frequently called the "set operators", since the set that is the body of the result relation arises from the sets that are the bodies of the operand relations according to the set operators of the same names. That might be a helpful way to group them in your mind or to remember them and their names, and it no doubt inspired their names. But in light of the correspondence between relation operators and predicate logic connectives/nonterminals, between relation expressions and predicate logic expressions, and between predicates and relation values, the fact that some relation operators are reminiscent of some set operators is totally irrelevant, just like the fact that some operators are reminiscent of boolean operators. After all, you can also make an algebra with "relations" whose bodies are bags instead of sets.)

Finding the max value among two tables without using the max function in relational algebra

Suppose I have two tables A{int m} and B{int m}, and I have to find the maximum m among the two tables using relational algebra, but I cannot use the max function. How can I do it? I think we can do it using a join, but I am not sure whether my guess is correct.
Note: this is an interview question.
Hmm, I'm puzzled why the question involves two tables. For the question as asked, I would just UNION the two (as StilesCrisis has done), then solve for a single table.
So: how to find the maximum m in a table using only NatJOIN? This is a simplified version of finding the top node on a table that holds a hierarchy (think assembly/component explosions or org charts).
The key idea is that we need to 'copy' the table into something with a different attribute name so that we can compare the tuples pair-wise. (This will therefore use the degenerate form of NatJOIN, aka cross-product.) See the example in "How can I find MAX with relational algebra?"
A NOT MATCHING
((A x (A RENAME m AS mm)) WHERE m < mm)
The subtrahend is all the tuples whose m is less than the m of some other tuple. The anti-join is all the tuples except those, ie the MAX. (Using NOT MATCHING is, I think, more understandable than MINUS, and it doesn't need the relations to be UNION-compatible. It's roughly equivalent to SQL NOT EXISTS.)
[I've used Tutorial D syntax, to avoid mucking about with greek letters.]
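The anti-join idea can be sketched in a few lines, assuming each single-attribute relation is modelled as a Python set of its m values; the values themselves are made up.

```python
# Toy values for A{m} and B{m}.
A = {3, 7, 5}
B = {9, 2}

# UNION the two relations first, then solve for a single relation.
AB = A | B

# AB x (AB RENAME m AS mm): every pair of values.
pairs = {(m, mm) for m in AB for mm in AB}

# WHERE m < mm, keeping m: the values that are smaller than some other value.
not_max = {m for (m, mm) in pairs if m < mm}

# AB NOT MATCHING (...): drop every value that has a larger partner.
maximum = AB - not_max
```

No aggregate function is involved; the maximum is simply the one value that never appears on the smaller side of a pair.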
SELECT M FROM (SELECT M FROM A UNION SELECT M FROM B) AS AB ORDER BY M DESC LIMIT 1
This doesn't use MAX, just plain vanilla SQL.

What Happens when Cartesian Product is applied to Relations with same attribute name

I understand that the Cartesian product (X) operation on two relations does not require them to be UNION compatible. So, suppose both relations R and S have an attribute called name, where name in R is the first name and name in S is the last name.
How can the related values be identified by a selection operation on the following?
Q=RxS
I want to get the set of tuples whose firstname=lastname. So how am I supposed to write the selection statement?
σ Name=Name(Q)
Will using the same attribute name in the selection operation be a problem?
Cartesian product does not require attributes to be named differently. It only requires relations to be named differently.
For example, D := A(id, name) X B(id, age) is perfectly valid, and the resulting relation is D(A.id, name, B.id, age).
In other words, the clashing attributes are automatically renamed by prepending the relation name, as part of the Cartesian product. This prepending is also the reason the relations are required to be named differently.
Source:
- Database System Concepts 6th Edition, Chapter 6.1.1.6 The Cartesian-Product Operation, for the definition, and an example in Figure 6.8 Result of instructor × teaches.
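A minimal sketch of that convention, assuming relations are lists of dictionaries and that only attributes present in both operands get qualified with the relation name; `cartesian_product` is a made-up helper and the rows are invented.

```python
def cartesian_product(r, r_name, s, s_name):
    """Cross product that qualifies clashing attributes as Relation.attr."""
    r_attrs = set().union(*(t.keys() for t in r))
    s_attrs = set().union(*(t.keys() for t in s))
    clashes = r_attrs & s_attrs

    def qualify(t, rel_name):
        # Prefix only the attributes that exist in both operands.
        return {
            (f"{rel_name}.{k}" if k in clashes else k): v for k, v in t.items()
        }

    return [{**qualify(a, r_name), **qualify(b, s_name)} for a in r for b in s]


# Toy version of the question: R.name is a first name, S.name a last name.
R = [{"name": "Ann"}, {"name": "Lee"}]
S = [{"name": "Lee"}, {"name": "Smith"}]

Q = cartesian_product(R, "R", S, "S")

# The selection then refers to the qualified names: sigma R.name = S.name (Q).
same_name = [t for t in Q if t["R.name"] == t["S.name"]]

# The textbook example: A(id, name) X B(id, age) gives D(A.id, name, B.id, age).
D = cartesian_product(
    [{"id": 1, "name": "x"}], "A", [{"id": 2, "age": 30}], "B"
)
```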
It is correct that for Cartesian product the relations need not be UNION compatible.
But they still need to be compatible! Otherwise there are exactly the difficulties you point out. So the rule for Cartesian product is that there must be no attributes in common.
So if you have a clash of attributes, you must rename the attributes before crossing.
See http://en.wikipedia.org/wiki/Relational_algebra on 'Natural Join'. (That defines Nat Join in terms of Rename, Cartesian product and Projection.)
From the point of view of learning the RA, I would think of Natural Join as the basic operation. And Cartesian product as a degenerate form when there are no attributes in common. This is for example the approach that Date & Darwen take in their textbooks.

Relational Algebra Query Troubles

I have a problem where I have two relations, one containing attributes song_id, song_name, album_id, and the other containing album_id and album_name. I need to find the names of all the albums that do not have songs in the song relation. The problem is I can only use Rename, Projection, Selection, Grouping (with sum, min, max, count), Cartesian Product, and Natural Join. I have spent a good amount of time working on this and would appreciate any help that points me in the right direction.
As @ErwinSmout pointed out, difference is generally an easy way to do it. But since you can't use it, there is a tricky workaround using counts. I'm assuming that every album_id present in the songs relation is also present in the albums relation.
PROJECT album_id from the songs relation (note that relational algebra's PROJECT is equivalent to SQL's SELECT DISTINCT). I'll call this relation song_albums. Now let's take the count of the albums relation, call this m, and take the count of the new relation, call this n.
Take the Cartesian product of the albums relation and the song_albums relation. This new relation has m*n rows. Now if you do a count, grouped by album_name, each of the m album_names will have a count of n. Not very helpful.
But now, we SELECT from the relation the rows where albums.album_id != song_albums.album_id. Now, if you do a count grouped by album_name, the count for those albums that were not in the original songs relation will be n, while those that were originally in there will have a count of n - 1, since the one row pairing the album with its own album_id has been removed (song_albums holds each album_id only once).
Edit: As it turns out, this isn't a strictly relational-algebra solution: In SQL, a 1 x 1 table, such as the one containing n can simply be treated as an integer and used in an equality comparison. However, according to Wikipedia, selection must make a comparison between either two attributes of a relation, or an attribute and a constant value.
This obstacle can be dealt with by another ill-recommended Cartesian product: we can take the Cartesian product of the 1 x 1 relation containing n with our most recent relation. Now we can make a proper relational-algebra selection since we have an attribute that is always equal to n.
Since this has gotten rather complex, here is a relational-algebra expression capturing the above English explanation:
Note that n is a 1 x 1 relation with an attribute named "count".
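The counting steps can be traced with a small sketch. It is not the author's expression; it assumes relations are lists of dictionaries, uses made-up rows, and groups by album_name as described above (which assumes album names are unique).

```python
from collections import Counter

# Toy data: album 2 ("Blue") has no songs.
albums = [
    {"album_id": 1, "album_name": "Red"},
    {"album_id": 2, "album_name": "Blue"},
    {"album_id": 3, "album_name": "Green"},
]
songs = [
    {"song_id": 10, "song_name": "Intro", "album_id": 1},
    {"song_id": 11, "song_name": "Outro", "album_id": 1},
    {"song_id": 12, "song_name": "Leaf", "album_id": 3},
]

# PROJECT album_id (songs): a set, so duplicates are removed.
song_albums = {s["album_id"] for s in songs}

# n: the count of song_albums (the 1 x 1 relation in the answer).
n = len(song_albums)

# Cartesian product, then SELECT albums.album_id != song_albums.album_id.
pairs = [
    (a["album_name"], sid)
    for a in albums
    for sid in song_albums
    if a["album_id"] != sid
]

# Count grouped by album_name.
counts = Counter(name for name, _ in pairs)

# Cross with n and SELECT count = n: only albums that lost no row remain.
albums_without_songs = {name for name, c in counts.items() if c == n}
```

One caveat of this sketch: if the songs relation is empty, then n is 0, the Cartesian product is empty, and no album is returned even though every album qualifies.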
It's impossible. The problem includes a negation, and in relational algebra that can only be expressed using relational difference, which you're seemingly not allowed to use.
I'm curious to see what your teacher presents as the solution to this problem.

Resources