Why is join equivalent to intersect on tables with 0 attributes? - database

From Date's Introduction to Database Systems (8th ed., p. 186, edited slightly):
Consider {X1, X2, ..., Xm}, {Y1, Y2, ..., Yn}, {Z1, Z2, ...Zp} as
three composite attributes, X, Y, Z respectively. Then the natural
join of a and b, a JOIN B is a relation with the heading {X, Y, Z{
and body consisting of all the tuples {X x, Y y, Z z} such that a
tuple appears in a with X value x and Y value y and a tuple appears in
b with Y value y and Z value z.
If m = p = 0 (meaning a and b are of the same type), then a JOIN b degenerates to a INTERSECT b.
How does this degeneration work?
Image:

INTERSECT is just JOIN when the argument attribute sets are the same. TIMES is just JOIN when the argument attribute sets are disjoint. Both argument attribute sets empty is just a special case. (That is both INTERSECT & TIMES.)
But the empty sets for m = p = 0 in the quote aren't the attribute sets of the arguments, they are the sets of attributes that are unique to the left argument a and the right argument b. When these sets are empty the only attributes are common ones, ie the argument attribute sets are the same. The empty set for n = 0 is the common attributes, ie a & b are disjoint.
It is clearer to avoid this obscure business of "considering as composite attributes". For a with attribute set A and b with attribute set B, a JOIN b (has attribute set A U B and) holds the set of tuples t where there exist tuples ta & tb where ta IN a and tb IN b and t = ta ∪ tb. When A = B we have INTERSECT and when A ∩ B = {} we have TIMES.

Related

Access elements of group given by finite presentation in Magma Calculator

I have been trying to use the Magma Calculator: http://magma.maths.usyd.edu.au/calc/
Given a word u, in a finitely presented group, how do I declare g to be the group element represented by u?
Context:
I can define a group via a finite presentation. For example using the code:
G<a,b,c> := Group< a,b,c | a^2=b^2, b^2=c^2, c=a*b >;
If I then ask for the order of the group:
Order (G);
The correct answer of 8 is returned, so it does understand what G is.
However I want to know how to ask if two elements of the group are equal.
The problem is that a,b,c as well as G.1, G.2, G.3 denote elements of the free group generated by a,b,c. Similarly products of those symbols (and their inverses) represent words in the free group.
Thus
a*a^-1 eq a^-1*a;
returns true, as it is true in the free group, but
a^2 eq c^2;
returns false, even though it is true in the group G.
This is explained in https://magma.maths.usyd.edu.au/magma/handbook/text/851.
In the section "Operations on words" it says that:
"u eq v : Returns true if the words u and v are identical when freely reduced. NB When G is not a free group and false is returned, this does not imply that u and v do not represent the same element of G."
However Magma can do operations with group elements, including answering if two elements g,h, are the same:
g eq h;
The question is then, given a word u, in a finitely presented group, how do I declare g to be the group element represented by u?
Following the anwer by #Userulli I typed
G<a,b,c> := Group< a,b,c | a^2=b^2, b^2=c^2, c=a*b >;
u := a^2;
v := c^2;
g := ElementToSequence(G, u);
h := ElementToSequence(G, v);
g eq h;
and got the reply
>> g := ElementToSequence(G, u);
^
Runtime error in 'ElementToSequence': Bad argument types
Argument types given: GrpFP, GrpFPElt
>> h := ElementToSequence(G, v);
^
Runtime error in 'ElementToSequence': Bad argument types
Argument types given: GrpFP, GrpFPElt
>> g eq h;
^
User error: Identifier 'g' has not been declared or assigned
Have you tryied using the ElementToSequence method?
For example :
u := a^2;
g := ElementToSequence(u, G);
Then you can compare g with other elements in the group to see if they are the same:
v := c^2;
h := ElementToSequence(v, G);
h eq g; // this should return true
EDIT:
Stating the doc magma.maths.usyd.edu.au/magma/handbook/text/208 you need to use fields instead, so you should convert the group into a field, and then use ElementToSequence method to compare the two sequences?

BCNF and 4NF property

I read a statement "Relation R in BCNF with at-least one simple candidate key is also in 4NF"
I don't think that it is always true but I am not able to prove it.
Can someone please help ?
The statement is true and this is the sketch of the proof taken from the paper "Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases", by C.J.Date and R.Fagin, ACM TODS, Vol.17, No. 3, Sep. 1992.
A relation is in 4NF if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R. So, if a relation is in BCNF, but not in 4NF, then there must exists a nontrivial multivalued dependency (MVD) X →→ Y such that X is not the key. We will show that this is in contradiction with the fact that the relation is in BCNF and has a candidate key K constituted by a unique attribute (simple candidate key).
Consider the fact that, in a relation R(T), when we have a nontrivial MVD X →→ Y, (assuming, without loss of generality that X and Y are disjoint), then also the MVD dependency X →→ Z must hold in the same relation, with Z = T - X - Y (that is Z are all the other attributes of the relation). We can now prove that each candidate key must contain at least an attribute of Z and an attribute of Y (so it must contain at least 2 attributes!).
Since we have X →→ Y and X →→ Z, and X is not a candidate key, assume that the hypothesis is false, that is that there is a candidate K which does not contain a member of Y (and for symmetry, neither a member of Z). But, since K is a key, we have that K → Y, with K and Y disjoint.
Now, there is an inference rule that says that, in general, if V →→ W and U → W, where U and W are disjoint, then V → W.
Applying this rule to our case, since X →→ Y, and K → Y, we can say that X → Y. But this is a contradiction, since we have said that R is in BCNF, and X is not a candidate key.
In other words, if a relation is not in 4NF, than each key must have at least 2 attributes.
And given the initial hypothesis, that we have a relation in BCNF with at least a simple candidate key, for the previous lemma, the relation must be in 4NF (otherwise every key should be constituted by at least 2 attributes!).

Multi-Valued Dependencies?

I am having some trouble understanding Multi-Valued Dependencies. The definition being: A multivalued dependency exists when there are at least 3 attributes (like X,Y and Z) in a relation and for value of X there is a well defined set of values of Y and a well defined set of values of Z. However, the set of values of Y is independent of set Z and vice versa.
Suppose we have a relation R(A,B,C,D,E) that satisfies the MVD's
A →→ B and B →→ D
How does MVD play into A->B and B->D here? Honestly I'm not sure I really understand the definition after looking at example problems.
If R contains the tuples (0,1,2,3,4) and (0,5,6,7,8), what other tuples must
necessarily be in R? Identify one such tuple in the list below.
a) (0,5,2,3,8)
b) (0,5,6,3,8)
c) (0,5,6,7,4)
d) (0,1,6,3,4)
I would have thought AB is 0,1 and 0,5 and BD is 1,3 and 5,7. None of the answers have 0,1,3,5,7.
MVDs (multi-valued dependencies) have nothing to do with "at least 3 attributes". (You seem to be quoting the Wikipedia article but that informal definition is partly wrong and partly unintelligible.) You need to read and think through a clear, precise and correct definition. (Which you were probably given.)
MVD X ↠ Y mentions two subsets of the set S of all attributes, X & Y. There are lots of ways to define when a MVD holds in a relation but the simplest to state & envisage is probably that the two projections XY and X(S-Y) join to the original relation. Which also mentions a third subset, S-Y. Which is what the (binary) JD (join dependency) {XY, X(S-Y)} says.
Wikipedia (although that article is a mess):
A decomposition of R into (X, Y) and (X, R−Y) is a lossless-join decomposition if and only if X ↠ Y holds in R.
From this answer:
MVDs always come in pairs. Suppose MVD X ↠ Y holds in a relation with attributes S, normalized to components XY & X(S-Y). Notice that S-XY is the set of non-X non-Y attributes, and X(S-Y) = X(S-XY). Then there is also an MVD X ↠ S-XY, normalized to components X(S-XY) & X(S-(S-XY)), ie X(S-XY) & XY, ie X(S-Y) & XY. Why? Notice that both MVDs give the same component pair. Ie both MVDs describe the same condition, that S = XY JOIN X(S-XY). So when an MVD holds, that partner holds too. We can write the condition expressed by each of the MVDs using the special explicit & symmetrical notation X ↠ Y | S-XY.
For R =
A,B,C,D,E
0,1,2,3,4
0,5,6,7,8
...
A ↠ B tells us that the following join to R:
A,B A,C,D,E
0,1 0,2,3,4
0,5 0,6,7,8
... ...
so R has at least
A,B,C,D,E
0,1,2,3,4
0,1,6,7,8
0,5,2,3,4
0,5,6,7,8
of which two are given and two are new but not choices.
B ↠ D tells us that the following join to R:
B,D A,B,C,E
1,3 0,1,2,4
5,7 0,5,6,8
... ...
so R has at least
A,B,C,D,E
0,1,2,3,4
0,5,6,7,8
which we already know.
So we don't yet know whether any of the choices are in R. But we do now know R is
A,B,C,D,E
0,1,2,3,4
0,1,6,7,8
0,5,2,3,4
0,5,6,7,8
...
Repeating, A ↠ B adds no new tuples but B ↠ D now gives this join:
B,D A,B,C,E
1,3 0,1,2,4
1,7 0,1,6,8
5,3 0,5,2,4
5,7 0,5,6,8
... ...
And one of the tuples in that join is choice b) (0,5,6,3,8).
The way the question is phrased, they are probably expecting you to use a definition that they will have given you that is like another two in Wikipedia. One says that α ↠ β holds in R when
[...] if we denote by (x, y, z) the tuple having values for α, β, R − α − β collectively equal to x, y, z, then whenever the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.
(The only sense in which this gives the "formal" definition "in more simple words" is that this is also a definition. Because this isn't actually paraphrasing it, because this uses R − α − β whereas it uses R − β.)
By applying this rule to repeatedly generate further tuples for R starting from the given ones, we end up generating b) (0,5,6,3,8) much as we did above.
PS I would normally suggest that you review the (unsound) reasoning that led you to expect "AB is 0,1 and 0,5 and BD is 1,3 and 5,7" (whatever that means) or "0,1,3,5,7". But the "definition" you give (from Wikipedia) doesn't make any sense. So I suggest that you consider what you were doing with it.

How to find top 10 sum of each match?

I mean if I have several X and several Y
and I do a match like this:
X -[ W ]-> Y
With X and Y related by several W ( there can be several W between same pairs (X,Y) )
I want top ten X for each Y with the property sum(W.property)
If I return
return Y , sum(W.property) , X order by sum(W.property) desc Limit 10
I Just get 10 but I need for every Y,
Is there a way to do that?
MATCH X -[ W ]-> Y
WITH Y, sum(W.property) AS total, X
ORDER BY total DESC
WITH Y, collect({sum: total, X: X})[0..10] AS values
UNWIND values AS value
RETURN Y, value.sum, value.X
You can actually skip the UNWIND and just change that second WITH to a RETURN if you're OK with it returning it as an array. It would be a bit more efficient because you're not repeating values of Y over and over. If you were going to do that you could even change the map structure into an array like this:
collect([total, X])[0..10]

What's a good way to store this relation so I can answer queries of this form efficiently?

Excuse me if I get a little mathy for a second:
I have two sets, X and Y, and a many-to-many relation ℜ &subseteq; X&cross;Y.
For all x ∈ X, let xℜ = { y | (x,y) ∈ ℜ } &subseteq; Y, the subset of Y associated with x by ℜ.
For all y ∈ Y, let ℜy = { x | (x,y) ∈ ℜ } &subseteq; X, the subset of X associated with y by ℜ.
Define a query as a set of subsets of Y, Q &subseteq; ℘(Y).
Let the image of the query be the union of the subsets in Q:image(Q) = Uq∈Q q
Say an element of X x satisifies a query Q if for all q ∈ Q, q ∩ xℜ ≠ ∅, that is if all subsets in Q overlap with the subset of Y associated with x.
Define evidence of satisfaction of an element x of a query Q such that:evidence(x,Q) = xℜ ∩ image(Q)
That is, the parts of Y that are associated with x and were used to match some part of Q. This could be used to verify whether x satisfies Q.
My question is how should I store my relation ℜ so that I can efficiently report which x∈X satisfy queries, and preferably report evidence of satisfaction?
The relation isn't overly huge, as csv it's only about 6GB. I've got a couple ideas, neither of which I'm particularly happy with:
I could store { (x, xℜ) | ∀ x∈X } just in a flat file, then do O(|X||Q||Y|) work checking each x to see if it satisfies the query. This could be parallelized, but feels wrong.
I could store ℜ in a DB table indexed on Y, retrieve { (y, ℜy) | ∀ y∈image(Q) }, then invert it to get { (x, evidence(x,Q)) | ∀ x s.t. evidence(x,Q) ≠ ∅ }, then check just that to find the x that satisfy Q and the evidence. This seems a little better, but I feel like inverting it myself might be doing something I could ask my RDBMS to do.
How could I be doing this better?
I think #2 is the way to go. Also, if Q can be represented in CNF you can use several queries plus INTERSECT to get the RDBMS to do some of the heavy lifting. (Similarly with DNF and UNION.)
This also looks a bit a you want a "inverse index", which some RDBMS have support for. X = set of documents, Y = set of words, q = set of words matching the glob "a*c".
HTH

Resources