Estimating the Size of Joining a Relation with Itself

I'm studying size estimation of logical query plans in order to select a physical query plan.
I was wondering: what is the size of the natural join of a relation with itself?
For example, R(a,b) JOIN R(a,b), where R has 100 tuples in total and attributes a and b each have 20 distinct values.
Will the join size (the number of tuples in the result) be equal to 100?
I'm so confused!

To answer the question as asked:
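Natural join of a relation with itself is the identity operation: every attribute is common to both operands, so each tuple can only match itself, and you get exactly the tuples you started with (yes, 100 tuples in this case).
The equivalent SQL for what you ask (assuming R, being a relation, has no duplicate rows) is:
SELECT R1.a, R1.b FROM R AS R1, R AS R2 WHERE R1.a = R2.a AND R1.b = R2.b
This is because the natural join of relational algebra (RA) always matches on every attribute name the two operands share.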
What could be more sensible? What's to be confused about?
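To make the set-semantics argument concrete, here is a minimal Python sketch that treats a relation as a set of tuples. The helper `natural_join` and the sample relation R are hypothetical, built only to match the numbers in the question (100 tuples, 20 distinct values each for a and b); this is not how a DBMS implements or estimates joins.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Natural join of two relations given as (attribute list, set of tuples)."""
    common = [a for a in r_attrs if a in s_attrs]
    out_attrs = list(r_attrs) + [a for a in s_attrs if a not in r_attrs]
    result = set()
    for r in r_rows:
        rd = dict(zip(r_attrs, r))
        for s in s_rows:
            sd = dict(zip(s_attrs, s))
            # Tuples match when they agree on every attribute name they share.
            if all(rd[a] == sd[a] for a in common):
                merged = {**sd, **rd}
                result.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, result

# Hypothetical R(a, b): 100 distinct tuples, 20 distinct values each for a and b.
R = {(i % 20, i // 5) for i in range(100)}
attrs, joined = natural_join(["a", "b"], R, ["a", "b"], R)
print(attrs, len(joined), joined == R)  # ['a', 'b'] 100 True
```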

Related

RA translation to natural language

So I'm stuck on this exercise where I need to translate relational algebra expressions (unary relational operations) based on the Mondial III database into natural language. I need help with the last two, and with any errors in the ones I answered. By the way, I used 6 for sigma (the SELECT operation) and |><| for the THETA JOIN operation, because I couldn't find the sigma or the real theta-join symbol on my keyboard, sorry about that. Any help is much appreciated! Thanks in advance.
Here's what the symbols mean:
SELECT:
Selects all tuples that satisfy the selection condition from a relation R:
6selection condition(R)
PROJECT: Produces a new relation with only some of the attributes of R, and removes duplicate tuples:
πattribute list(R)
THETA JOIN: Produces all combinations of tuples from R1 and R2 that satisfy the join condition:
R1 |><|join condition (R2)
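One rough way to read the SELECT and PROJECT definitions is as Python functions over a list of dict rows. The helpers `select` and `project` and the CITY rows are hypothetical (the real Mondial CITY table has more attributes); the sketch also shows that nesting the two CITY selections in either order returns the same tuples.

```python
def select(condition, relation):
    """sigma: keep only the tuples that satisfy the selection condition."""
    return [t for t in relation if condition(t)]

def project(attributes, relation):
    """pi: keep only the listed attributes; building a set removes duplicate tuples."""
    return {tuple(t[a] for a in attributes) for t in relation}

# Hypothetical CITY rows; the real Mondial CITY table has more attributes.
CITY = [
    {"name": "C1", "population": 250000, "elevation": 1500},
    {"name": "C2", "population": 50000, "elevation": 2000},
    {"name": "C3", "population": 900000, "elevation": 10},
    {"name": "C4", "population": 120000, "elevation": 1100},
]

# 6elevation>1000(6population>100000(CITY))
elev_of_pop = select(lambda t: t["elevation"] > 1000,
                     select(lambda t: t["population"] > 100000, CITY))
# 6population>100000(6elevation>1000(CITY))
pop_of_elev = select(lambda t: t["population"] > 100000,
                     select(lambda t: t["elevation"] > 1000, CITY))

print(elev_of_pop == pop_of_elev)              # True
print(sorted(project(["name"], elev_of_pop)))  # [('C1',), ('C4',)]
```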
πname(6elevation>1000(MOUNTAIN)) -> Find the names of all mountains whose elevation is higher than 1000.
6elevation>1000(6population>100000(CITY)) -> Select the tuples of cities whose elevation is higher than 1000 and whose population is greater than 100000.
6population>100000(6elevation>1000(CITY)) -> Select the tuples of cities whose population is greater than 100000 and whose elevation is higher than 1000.
COUNTRY|><|code=country(LANGUAGE) -> ?
πCountry.name(COUNTRY|><|code=country(6Language.name='English' AND percentage>50(LANGUAGE))) -> ?
The fourth expression returns all the information about each country together with every language spoken there (the country's information is repeated once for each language spoken). Countries with no matching LANGUAGE tuple do not appear at all.
The fifth expression returns the names of all countries where English is the majority language (spoken by more than 50% of the population).
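A minimal Python sketch of the fourth and fifth expressions, assuming a hypothetical `theta_join` helper and made-up COUNTRY and LANGUAGE rows (fictional countries, illustrative percentages). It shows each country being repeated once per language, and the fifth expression selecting first, then joining, then projecting.

```python
def theta_join(r1, name1, r2, name2, condition):
    """All combinations of tuples from r1 and r2 that satisfy condition.
    Attributes are prefixed with the relation name (e.g. "Country.name"),
    because COUNTRY and LANGUAGE both have a "name" attribute."""
    result = []
    for t1 in r1:
        for t2 in r2:
            if condition(t1, t2):
                row = {f"{name1}.{k}": v for k, v in t1.items()}
                row.update({f"{name2}.{k}": v for k, v in t2.items()})
                result.append(row)
    return result

# Hypothetical rows; the real Mondial tables have more attributes.
COUNTRY = [{"name": "Alpha", "code": "AL"}, {"name": "Beta", "code": "BE"}]
LANGUAGE = [
    {"country": "AL", "name": "English", "percentage": 80},
    {"country": "AL", "name": "Gaelic", "percentage": 20},
    {"country": "BE", "name": "English", "percentage": 40},
    {"country": "BE", "name": "Dutch", "percentage": 60},
]

# Fourth expression: COUNTRY |><|code=country (LANGUAGE)
all_info = theta_join(COUNTRY, "Country", LANGUAGE, "Language",
                      lambda c, l: c["code"] == l["country"])

# Fifth expression: select English-majority rows, join, then project the country name.
english_majority = [l for l in LANGUAGE
                    if l["name"] == "English" and l["percentage"] > 50]
names = {row["Country.name"] for row in theta_join(
    COUNTRY, "Country", english_majority, "Language",
    lambda c, l: c["code"] == l["country"])}

print(len(all_info), names)  # 4 {'Alpha'}
```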

Count in Relational Algebra

I need to query the number of apartments where all rental contracts are signed by occupants of the same nationality.
I tried something like this:
π numberapartments
γ nationality; numberapartments ← COUNT(a_id)
I also think I need some joins somewhere, but I don't know where.
How could I do this query?
Thanks.
You can find the schema here
Here are some questions to guide you through composing a query for homework like this.
When giving tables, say exactly what a row says about the business situation, in terms of its column values, when it is in the table. Do the same when describing a query result.
What is a query returning rows where
occupant O rents apartment A from date S to date E? Why?
O rents A from a date to a date? Why?
O rents A? Why?
O from nation N rents A? Why?
an occupant from N rents A? Why?
C = the # of nations where an occupant from one rents A? Why?
C = the # of nations where an occupant from one rents A AND C = 1? Why?
a # = the # of nations where an occupant from one rents A & that # = 1? Why?
(the # of nations where an occupant from one rents A) = 1? Why?
What rows are in
Rental?
Occupant?
the result of your desired query? Why?
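Working through the questions above step by step can be sketched in Python. The actual schema is not reproduced here, so the relation names RENTAL(o_id, a_id, start, end) and OCCUPANT(o_id, nationality) and their rows are assumptions; each comment names the question that the intermediate result answers. This version uses grouping and counting.

```python
from collections import defaultdict

# Hypothetical schema and rows (the question's real schema is not shown):
# RENTAL(o_id, a_id, start, end): occupant o_id rents apartment a_id from start to end.
# OCCUPANT(o_id, nationality): occupant o_id is from nation nationality.
RENTAL = {
    (1, "A1", "2020-01-01", "2020-12-31"),
    (2, "A1", "2021-01-01", "2021-12-31"),
    (3, "A2", "2020-01-01", "2020-06-30"),
    (4, "A2", "2020-07-01", "2020-12-31"),
    (5, "A3", "2022-01-01", "2022-12-31"),
}
OCCUPANT = {(1, "FR"), (2, "FR"), (3, "FR"), (4, "DE"), (5, "IT")}

# "O rents A": project away the dates.
rents = {(o, a) for (o, a, _s, _e) in RENTAL}
# "O from nation N rents A": natural join on o_id.
from_n_rents = {(o, n, a) for (o, a) in rents for (o2, n) in OCCUPANT if o == o2}
# "an occupant from N rents A": project away O.
nation_rents = {(n, a) for (_o, n, a) in from_n_rents}
# "C = the # of nations where an occupant from one rents A": group by A and count.
nations_per_apartment = defaultdict(int)
for _n, a in nation_rents:
    nations_per_apartment[a] += 1
# "(the # of nations ...) = 1": keep those apartments, then count them.
single_nation = {a for a, c in nations_per_apartment.items() if c == 1}
print(len(single_nation))  # 2
```

Under this reading, an apartment with no rental contracts at all never appears in RENTAL, so it is not counted.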
Re relational querying.
It isn't actually necessary to use counting or grouping to write your query. Queries of the form "rows where … all …" can typically be written using (some variant of) relational division or associated idioms.
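As an illustration of the no-grouping route, here is a Python sketch of a difference-based idiom (comparing a relation with a renamed copy of itself, rather than relational division proper). It assumes the same hypothetical RENTAL(o_id, a_id) and OCCUPANT(o_id, nationality) shapes with dates dropped: apartments rented to occupants of two different nations are subtracted from all rented apartments, and the requested number is then just the size of the result.

```python
# Hypothetical rows; RENTAL(o_id, a_id), OCCUPANT(o_id, nationality).
RENTAL = {(1, "A1"), (2, "A1"), (3, "A2"), (4, "A2"), (5, "A3")}
OCCUPANT = {(1, "FR"), (2, "FR"), (3, "FR"), (4, "DE"), (5, "IT")}

# (N, A) pairs: pi nationality, a_id (RENTAL join OCCUPANT)
nation_rents = {(n, a) for (o, a) in RENTAL for (o2, n) in OCCUPANT if o == o2}
# Join with a renamed copy of itself: apartments with renters from two different nations.
mixed = {a for (n1, a) in nation_rents for (n2, a2) in nation_rents
         if a == a2 and n1 != n2}
# All rented apartments minus the mixed ones.
rented = {a for (_n, a) in nation_rents}
single_nation = rented - mixed
print(sorted(single_nation), len(single_nation))  # ['A1', 'A3'] 2
```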

Non-First Normal Form natural join operation

I have 2 tables in non-first normal form:
What would be the result of the NATURAL JOIN operation on these two tables?
It is not exactly clear what your picture is supposed to represent. I'm going to assume that R1 is a relation with three attributes, A, B and X, and that R2 is a relation with three attributes, E, B and X.
The natural join matches tuples whose values for B and X are equal in both R1 and R2. What type of attribute is X? If X is a relation-valued attribute and the columns labelled C and D represent the tuples in X, then the relation values appear to be different in each case. (X in R1 and X in R2 happen to have some tuples in common, but as whole relation values they are different.)
So the result of the natural join would be an empty relation: heading {A, B, E, X}, but zero tuples.
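Since the picture is lost, here is a Python sketch under the answer's own assumptions: headings {A, B, X} and {E, B, X}, with X a relation-valued attribute modelled as a frozenset of (C, D) tuples. All values and the `natural_join` helper are hypothetical. The join compares X as a whole value, so two relation values that merely overlap do not match.

```python
# Hypothetical stand-ins for the lost picture; X is a frozenset of (C, D) tuples.
R1 = {("a1", "b1", frozenset({("c1", "d1"), ("c2", "d2")}))}  # heading A, B, X
R2 = {("e1", "b1", frozenset({("c1", "d1"), ("c3", "d3")}))}  # heading E, B, X

def natural_join(r1, r2):
    """Join on B and X; X matches only if the two relation values are equal."""
    return {(a, b, e, x)
            for (a, b, x) in r1
            for (e, b2, x2) in r2
            if b == b2 and x == x2}

result = natural_join(R1, R2)
print(len(result))  # 0: the X values share ("c1", "d1") but are not equal
```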

Query processing (natural join)

I need a little help.
Could anyone please answer this question?
Consider a join of 3 relations:
r natural join s natural join t
Since join is commutative and associative, the system could join r and s first, s and t first, or r and t first, and then join the remaining relation with the result. If the system is able to estimate accurately how large the result of a join will be without actually computing the join, should it choose first:
1. the join with the largest result, or
2. the join with the smallest result?
Why?
Choose the join with the smallest result. Doing it first reduces the amount of work left for the later joins, since every later join has to process that intermediate result.
Consider joining 10 relations of 1,000,000 tuples each, knowing that each join will produce on the order of (10^6)^2 tuples, and only then joining with a relation of 10 tuples (knowing that the final result will be only 10 tuples). Compare this with starting from the 10-tuple relation.
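The choice can be sketched as picking the pairwise join with the smallest estimated result. The sizes below are invented for illustration, and the cost model (total tuples materialized: the intermediate result plus the final result) is a deliberate simplification, not how any particular optimizer costs plans.

```python
# Hypothetical estimated sizes for the three possible first joins of r, s, t.
estimated = {
    ("r", "s"): 10**12,  # two million-tuple relations, nearly a cross product
    ("r", "t"): 10**7,
    ("s", "t"): 10,      # t is tiny and very selective
}
FINAL = 10  # the final result size does not depend on the join order

def plan_cost(first_pair):
    # Simplified cost model (an assumption): tuples materialized by the
    # intermediate result plus the final result.
    return estimated[first_pair] + FINAL

best = min(estimated, key=plan_cost)
worst = max(estimated, key=plan_cost)
print(best, plan_cost(best))    # ('s', 't') 20
print(worst, plan_cost(worst))  # ('r', 's') 1000000000010
```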

Finding the max value across two tables without using the MAX function in relational algebra

Suppose I have two tables, A{int m} and B{int m}, and I have to find the maximum m across both tables using relational algebra, but I cannot use the MAX function. How can I do it? I think we can do it using a join, but I'm not sure whether my guess is correct.
Note: this is an interview question.
Hmm, I'm puzzled why the question involves two tables. For the question as asked, I would just UNION the two (as StilesCrisis has done), then solve for a single table.
So: how do we find the maximum m in a table using only NatJOIN? This is a simplified version of finding the top node in a table that holds a hierarchy (think assembly/component explosions or org charts).
The key idea is that we need to 'copy' the table into something with a different attribute name so that we can compare the tuples pairwise. (This therefore uses the degenerate form of NatJOIN, aka cross product.) See the example at How can I find MAX with relational algebra?
A NOT MATCHING
((A JOIN (A RENAME { m AS mm })) WHERE m < mm)
The subtrahend is all tuples whose m is less than the m of some other tuple. The anti-join keeps all the tuples except those, i.e. the MAX. (I think using NOT MATCHING is both more understandable than MINUS and doesn't need the relations to be UNION-compatible. It's roughly equivalent to SQL NOT EXISTS.)
[I've used Tutorial D syntax, to avoid mucking about with Greek letters.]
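The same idiom can be traced in Python over sets of 1-tuples, applied to the UNION of A and B. The sample values are hypothetical; each step is commented with the relational operation it mirrors.

```python
# Relations as sets of 1-tuples; hypothetical sample values for A{m} and B{m}.
A = {(3,), (7,), (5,)}
B = {(9,), (2,)}

AB = A | B  # UNION the two tables first, then solve for a single table
# Cross product with a renamed copy (m AS mm): no common attribute names,
# so the natural join degenerates to a cross product.
cross = {(m, mm) for (m,) in AB for (mm,) in AB}
# WHERE m < mm: tuples whose m is smaller than some other tuple's m.
smaller = {(m, mm) for (m, mm) in cross if m < mm}
# NOT MATCHING on the common attribute m: keep AB tuples with no match in `smaller`.
dominated = {m for (m, _mm) in smaller}
maximum = {(m,) for (m,) in AB if m not in dominated}
print(maximum)  # {(9,)}
```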
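SELECT m FROM (SELECT m FROM A UNION SELECT m FROM B) AS ab ORDER BY m DESC LIMIT 1
This doesn't use MAX, just plain SQL. (LIMIT is accepted by e.g. MySQL, PostgreSQL and SQLite; standard SQL spells it FETCH FIRST 1 ROW ONLY.)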
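Both formulations can be checked against an in-memory SQLite database through Python's standard `sqlite3` module. The sample rows are hypothetical, and the second query is one possible NOT EXISTS rendering of the anti-join idiom, an assumption about how it might be translated into SQL rather than part of either answer.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample data for A and B.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE A (m INTEGER)")
con.execute("CREATE TABLE B (m INTEGER)")
con.executemany("INSERT INTO A VALUES (?)", [(3,), (7,), (5,)])
con.executemany("INSERT INTO B VALUES (?)", [(9,), (2,)])

# ORDER BY ... LIMIT 1 over the UNION of both tables.
top = con.execute(
    "SELECT m FROM (SELECT m FROM A UNION SELECT m FROM B) AS ab "
    "ORDER BY m DESC LIMIT 1"
).fetchone()[0]

# Anti-join idiom: keep each m for which no larger m exists.
anti = con.execute(
    "WITH ab AS (SELECT m FROM A UNION SELECT m FROM B) "
    "SELECT m FROM ab AS x WHERE NOT EXISTS "
    "(SELECT 1 FROM ab AS y WHERE x.m < y.m)"
).fetchall()

print(top, anti)  # 9 [(9,)]
con.close()
```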
