Relational Algebra - Intersection Properties

This investigation began with a test question, where I asserted
R ∩ S = R ∩ (R ∩ S)
In my research, I have not been able to find much information on the properties of intersection in relational algebra.
If this were set theory, I would assert
R ∩ (R ∩ S)
= (R ∩ R) ∩ S   [associativity]
= R ∩ S   [idempotence]
With relational algebra, we are instead operating on bags. I believe I can take these same steps with bags (idempotence seems trivial, and associativity is suggested on pg. 3 of http://comet.lehman.cuny.edu/stjohn/teaching/db/ullmanSlidesF00/slides7.pdf), but I cannot quite turn them into a formal proof.
Could anyone assist me in asserting (or disproving, by counterexample or otherwise) associativity and idempotence of the intersection in relational algebra?
Thank you very much.

The relational model is formally defined over sets, so the relational algebra is defined only over sets. But since relational databases extended the model to include multisets, or bags, the relational algebra has been similarly extended over multisets, and almost all modern database books describe this extension.
In particular, using the notation Opb to represent the variant of operator Op over multisets, we can define the operators ∩b, ∪b and −b as follows:
R ∩b S: if an element t appears n times in R and m times in S, then it appears min(n,m) times in R ∩b S;
R ∪b S: if an element t appears n times in R and m times in S, then it appears n+m times in R ∪b S;
R −b S: if an element t appears n times in R and m times in S, then it appears max(0,n−m) times in R −b S.
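These definitions can be tried out directly in Python, where collections.Counter behaves as a bag; a small sketch for illustration (note that Counter's + is exactly the additive union above, & takes minima, and - already saturates at zero):

from collections import Counter

# Two example bags: t1 occurs 3 times in R, twice in S, and so on.
R = Counter({'t1': 3, 't2': 1})
S = Counter({'t1': 2, 't3': 5})

assert (R & S) == Counter({'t1': 2})                    # min(n, m)
assert (R + S) == Counter({'t1': 5, 't2': 1, 't3': 5})  # n + m
assert (R - S) == Counter({'t1': 1, 't2': 1})           # max(0, n - m)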
You want to show that:
R ∩b S = R ∩b (R ∩b S)
As your correct derivation shows, this can be proved if we can prove the Associativity and Idempotence properties for the operator ∩b.
Idempotence:
R ∩b R = R
In this case, since each element appears the same number of times in both operands, that is m = n, it appears min(n, n) = n times in the result, i.e. the same number of times as in R.
Associativity:
(R ∩b S) ∩b T = R ∩b (S ∩b T)
Suppose that an element t appears n times in R, m times in S, and k times in T. Then in:
(R ∩b S) ∩b T
it will appear min(min(n,m), k) times in the result, and this is equal to min(n,m,k). On the other hand, in:
R ∩b (S ∩b T)
it will appear min(n, min(m,k)) times in the result, but this again is equal to min(n,m,k).
So the element appears the same number of times in both results, and the two results are equal.
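Since min is idempotent and associative, so is ∩b; here is a quick randomized check of both properties, and of the original identity, with Counter standing in for bags (a sanity check, not a proof):

import random
from collections import Counter

def random_bag():
    """A random bag over a small universe, zero counts stripped."""
    counts = {e: random.randint(0, 4) for e in 'abcde'}
    return Counter({e: n for e, n in counts.items() if n > 0})

for _ in range(1000):
    R, S, T = random_bag(), random_bag(), random_bag()
    assert (R & R) == R                    # idempotence
    assert ((R & S) & T) == (R & (S & T))  # associativity
    assert (R & (R & S)) == (R & S)        # the identity in question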

As philipxy has already commented, relations are sets, and therefore set theory applies. The link below gives a list of transformation equivalences, and the associative property of both union and intersection is listed there.
http://www.postgresql.org/message-id/attachment/32495/equivalenceRules.pdf
The question might be raised of how many relational algebra theorems are provably true in the realm of SQL databases, rather than in relational algebra itself. This is problematic, because any given implementation might have bugs, and there could even be flaws in the SQL model. I have tended to blindly assume that the results of relational algebra carry over into the practical world, but some deep thinkers sound a note of caution in that regard.

Related

Understanding "well founded" proofs in Coq

I'm writing a fixpoint that requires an integer to be incremented "towards" zero at every iteration. This is too complicated for Coq to recognize as a decreasing argument automatically, and I'm trying to prove that my fixpoint will terminate.
I have been copying (what I believe is) an example of a well-foundedness proof for a step function on Z from the standard library. (Here)
Require Import ZArith.Zwf.
Section wf_proof_wf_inc.
Variable c : Z.
Let Z_increment (z:Z) := (z + ((Z.sgn c) * (-1)))%Z.
Lemma Zwf_wf_inc : well_founded (Zwf c).
Proof.
unfold well_founded.
intros a.
(* Qed fails here: the goal shown below is still open. *)
Abort.
End wf_proof_wf_inc.
Running the script up to intros a creates the following context:
c : Z
wf_inc := fun z : Z => (z + Z.sgn c * -1)%Z : Z -> Z
a : Z
============================
Acc (Zwf c) a
My question is what does this goal actually mean?
I thought that the goal I'd have to prove for this would at least involve the step function that I want to show has the "well founded" property, "Z_increment".
The most useful explanation I have looked at is this, but I've never worked with the list type it uses, and it doesn't explain what is meant by terms like "accessible".
Basically, you don't need to do a well-founded proof; you just need to prove that your function decreases the (natural number) measure abs(z). More concretely, you can use the standard library's Z.abs_nat, or implement abs (z:Z) : nat := z_to_nat (z * Z.sgn z) (with z_to_nat some appropriate conversion to nat), and then use this as a measure with Function, something like Function foo z {measure abs z} := ....
The well-founded business is for showing relations are well-founded: the idea is that you can prove your function terminates by showing it "decreases" some well-founded relation R (think of it as <); that is, the definition of f x makes recursive subcalls f y only when R y x. For this to work, R has to be well-founded, which intuitively means it has no infinitely descending chains. CPDT's general recursion chapter has a really good explanation of how this really works.
How does this relate to what you're doing? The standard library proves that, for all lower bounds c, x < y is a well-founded relation on Z if it is additionally only applied to y >= c. I don't think this applies to you - instead you move towards zero, so you can just decrease abs z with the usual < relation on nats. The standard library already has a proof that this relation is well founded, and that's what Function ... {measure ...} uses.
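To see what the measure buys you, here is the same idea in executable form as a Python sketch (the names step and countdown are mine, purely illustrative): each iteration moves z one unit toward zero, so abs(z), a natural number, strictly decreases, and since the naturals have no infinite strictly decreasing chains the loop must terminate. The assert is exactly the inequality that a Function ... {measure ...} definition would ask you to prove.

def step(z: int) -> int:
    """Move z one unit toward zero (the role of the step function)."""
    return z - (1 if z > 0 else -1)

def countdown(z: int) -> int:
    """Count how many steps it takes to reach zero from z."""
    steps = 0
    while z != 0:
        nxt = step(z)
        # Termination measure: abs strictly decreases on every step.
        assert abs(nxt) < abs(z)
        z = nxt
        steps += 1
    return steps

assert countdown(5) == 5 and countdown(-7) == 7 and countdown(0) == 0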

BCNF Decomposition & 3NF

I'm working through problems in my textbook to get ready for my test, and I'm having a pretty tough time with figuring out this question.
Consider a relation
S(B,O,I,S,Q,D)
FDs: S->D, I->B, IS->Q, B->O
I need to do the BCNF decomposition, and then determine all of the keys of S.
I did the BCNF decomposition and determined that IS is a superkey, but I can't work out the rest of the decomposition or the remaining keys.
I also need to find a minimal basis for the given FDs, and use the 3NF synthesis algorithm to find a lossless-join, dependency-preserving decomposition of S into 3NF.
Any help is much appreciated, I am beyond confused here and am really struggling with this problem.
{I S} is the only key, and this is easy to show. The attributes I and S appear only in the left parts of functional dependencies, so they must belong to any key. And since they are already a (super)key, no other key exists.
The functional dependencies are already a minimal cover (or minimal base) since: a) every right part has only one attribute; b) in the dependency IS→Q no attribute on the left part is superfluous, and c) no dependency is redundant.
So the 3NF decomposition is:
R1 < (B O), { B → O } >
R2 < (B I), { I → B } >
R3 < (I S Q), { I S → Q } >
R4 < (D S), { S → D } >
which is equal to the result of the decomposition in BCNF.
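To check work like this mechanically, the attribute-closure algorithm is only a few lines. Here is a Python sketch (function and variable names are mine) confirming that {I, S} is a superkey and that neither attribute alone is:

def closure(attrs, fds):
    """Closure of a set of attributes under a set of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs of S(B, O, I, S, Q, D): S -> D, I -> B, IS -> Q, B -> O
fds = [({'S'}, {'D'}), ({'I'}, {'B'}), ({'I', 'S'}, {'Q'}), ({'B'}, {'O'})]
attrs = {'B', 'O', 'I', 'S', 'Q', 'D'}

assert closure({'I', 'S'}, fds) == attrs   # {I, S} is a superkey
assert closure({'I'}, fds) != attrs        # I alone is not
assert closure({'S'}, fds) != attrs        # S alone is not, so {I, S} is a key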

Determining k of LR(k) from this example?

I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:
Expression:
LogicalOrExpression
LogicalOrExpression ? Expression : LogicalOrExpression
LogicalOrExpression:
LogicalAndExpression
LogicalOrExpression || LogicalAndExpression
LogicalAndExpression:
EqualityExpression
LogicalAndExpression && RelationalExpression
EqualityExpression:
RelationalExpression
EqualityExpression EqualityOperator RelationalExpression
EqualityOperator:
==
!=
RelationalExpression:
AdditiveExpression
RelationalExpression RelationalOperator AdditiveExpression
RelationalOperator:
<
>
<=
>=
AdditiveExpression:
MultiplicativeExpression
AdditiveExpression AdditiveOperator MultiplicativeExpression
AdditiveOperator:
+
-
MultiplicativeExpression:
UnaryExpression
MultiplicativeExpression MultiplicativeOperator UnaryExpression
MultiplicativeOperator:
*
/
%
UnaryExpression:
PrimaryExpression
UnaryOperator UnaryExpression
UnaryOperator:
+
-
!
PrimaryExpression:
BoolLiteral // TERMINAL
IntegerLiteral // TERMINAL
Identifier // TERMINAL
( Expression )
I want to try shift/reduce parsing, and so I would like to know: what is the smallest k (if any) for which this grammar is LR(k)? (And, more generally, how can one determine k for an arbitrary grammar, if that is possible?)
The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.
All operator-precedence grammars are LR(1). It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently always LR(1).
FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG), for which it is easy to answer the question of whether they are LR(k) for some k, since the answer is always "yes, with k == 1". Just for precision, here are the properties. We use the normal convention where a grammar G is a 4-tuple (N, Σ, P, S): N is a set of non-terminals, Σ is a set of terminals, P is a set of productions, and S is the start symbol. We write V for N ∪ Σ. In any grammar, we have:
N ∩ Σ = ∅
S ∈ N
P ⊂ V+ × V*
The "context-free" requirement restricts P so every left-hand-side is a single non-terminal:
P &subset; Σ × V*
In an operator grammar, P is further restricted: no right-hand side is empty, and no right-hand side has two consecutive non-terminals:
P ⊂ N × (V+ − V*NNV*)
In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations Leads and Trails [2], where:
T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V
Then:
t1 ⋖ t2 iff ∃v such that t2 Leads v ∧ (N → V*t1vV*) ∈ P
t1 ⋗ t2 iff ∃v such that t1 Trails v ∧ (N → V*vt2V*) ∈ P
t1 ≐ t2 iff (N → V*t1t2V*) ∈ P ∨ (N → V*t1V′t2V*) ∈ P, where V′ is a single non-terminal
An intuitive way of thinking about those relations is this: normally when we do the derivations, we just substitute the RHS for the LHS, but suppose we substitute ⋖ RHS ⋗ instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding ≐ between any two consecutive terminals which have no intervening operator. From that, we just read off the relations.
Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.
Verifying that an operator grammar has mutually exclusive precedence relations is straightforward; Leads and Trails require a transitive closure over First and Last, which is roughly O(|G|^2) (it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over the productions of the grammar, which is O(|G|).
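To make the cost claim concrete, here is a Python sketch (a toy, using a classic expression grammar rather than the question's full grammar; all names are mine) that computes Leads and Trails by fixpoint iteration and then checks that the three precedence relations are mutually exclusive:

# A classic operator grammar: E -> E + T | T, T -> T * F | F,
# F -> ( E ) | id.  Dict keys are the non-terminals.
grammar = {
    'E': [['E', '+', 'T'], ['T']],
    'T': [['T', '*', 'F'], ['F']],
    'F': [['(', 'E', ')'], ['id']],
}
nts = set(grammar)

def edge_terminals(last):
    """Leads (last=False) / Trails (last=True): the terminals that can
    begin / end a string derived from each non-terminal."""
    table = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                seq = rhs[::-1] if last else rhs
                # In an operator grammar the edge terminal of a RHS sits
                # at position 0 or 1; an edge non-terminal contributes
                # everything it can derive.
                new = set(table[seq[0]]) if seq[0] in nts else {seq[0]}
                if seq[0] in nts and len(seq) > 1:
                    new.add(seq[1])
                if not new <= table[a]:
                    table[a] |= new
                    changed = True
    return table

leads, trails = edge_terminals(False), edge_terminals(True)

yields, takes, equal = set(), set(), set()      # the relations ⋖, ⋗, ≐
for a, rhss in grammar.items():
    for rhs in rhss:
        for x, y in zip(rhs, rhs[1:]):
            if x not in nts and y in nts:       # x ⋖ every Leads(y)
                yields |= {(x, t) for t in leads[y]}
            if x in nts and y not in nts:       # every Trails(x) ⋗ y
                takes |= {(t, y) for t in trails[x]}
            if x not in nts and y not in nts:   # adjacent terminals
                equal.add((x, y))
        for x, y, z in zip(rhs, rhs[1:], rhs[2:]):
            if x not in nts and y in nts and z not in nts:
                equal.add((x, z))               # terminals around one NT

# Floyd grammar precisely if the three relations never overlap:
assert not (yields & takes) and not (yields & equal) and not (takes & equal)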
From Donald Knuth's On the Translation of Languages from Left to Right, in the abstract:
It is shown that the problem of whether or not a grammar is LR(k) for some k is undecidable,
In other words,
Given a grammar G, "∃k. G ∊ LR(k)" is undecidable.
Therefore, the best we can do in general is to try constructing a parser for LR(0), then LR(1), LR(2), etc. Either you eventually succeed, or you give up at some point as k grows; the undecidability result means you can never conclude from failures alone that no suitable k exists.
This specific grammar
In this specific case, I happen to know that the grammar you give is LALR(1), which means it must therefore be LR(1). I know this because I have written LALR parsers for similar languages. It can't be LR(0) for obvious reasons (the grammar {A -> x, A -> A + x} is not LR(0)).

If a regular language only contains Kleene star, then is it possible that it comes from the concatenation of two non-regular languages?

I want to know: given a regular language L whose expression contains only the Kleene star operator (e.g. (ab)*), is it possible that L can be generated by the concatenation of two non-regular languages? I am trying to prove that L can only be generated by the concatenation of two regular languages.
Thanks.
This statement is false. Consider these two languages over Σ = {a}:
L1 = { a^n | n is a power of two } ∪ { ε }
L2 = { a^n | n is not a power of two } ∪ { ε }
Neither of these languages is regular (the first can be proven nonregular using the Myhill-Nerode theorem, and the second, which is closely related to the complement of L1, can also be proven nonregular).
However, I'm going to claim that L1L2 = a*. First, note that any string in the concatenation L1L2 has the form a^n and therefore is an element of a*. Next, take any string in a*; let it be a^n. If n is a power of two, then it can be formed as the concatenation of a^n from L1 and ε from L2. Otherwise, n isn't a power of two, and it can be formed as the concatenation of ε from L1 and a^n from L2. Therefore, L1L2 = a*, so the theorem you're trying to prove is false.
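Here is a finite sanity check of this construction, sketched in Python (lengths stand in for unary strings, since a^x concatenated with a^y is a^{x+y}; this checks the claim up to a bound, it is not a proof):

def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0

N = 200
# Represent a^n by its length n; 0 represents the empty string.
L1 = {n for n in range(N + 1) if is_power_of_two(n)} | {0}
L2 = {n for n in range(N + 1) if not is_power_of_two(n)} | {0}

# Every a^n with n <= N is obtainable as a concatenation from L1 . L2.
concat = {x + y for x in L1 for y in L2}
assert set(range(N + 1)) <= concat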
Hope this helps!

How can I prove if this language is regular or not?

L = { a^n b^n : n ≥ 1 } ∪ { a^n b^{n+2} : n ≥ 1 }
I'll give an approach and a sketch of a proof; there might be some holes in it that I believe you can fill in yourself.
The idea is to use the Myhill-Nerode theorem: show that there is an infinite number of equivalence classes of the indistinguishability relation R_L, and from the theorem derive that the language is irregular.
Define two types of sets:
G_j = { a^n b^k | n − k = j, k ≥ 1 } for each j in [−2, −1, 0, 1, ...]
H_j = { a^j } for each j in [0, 1, ...]
G_illegal = {a,b}* \ (∪_j G_j ∪ ∪_j H_j) [union over each j in the specified range]
It is easy to see that for each x in G_illegal, and for each z in {a,b}*: xz is not in L.
So, for every x,y in G_illegal and for each z in {a,b}*: xz in L <-> yz in L.
Also, for each z in {a,b}* - and for each x,y in some G_j [same j for both]:
if z contains an a, both xz and yz are not in L
if z = b^j, then xz = a^n b^k b^j, and since k + j = n, xz is in L. The same applies to y, so yz is in L.
if z = b^{j+2}, then xz = a^n b^k b^{j+2}, and since k + j + 2 = n + 2, xz is in L. The same applies to y, so yz is in L.
otherwise, z = b^i with i ≠ j and i ≠ j + 2, and both xz and yz are not in L.
So, for every j and for every x,y in G_j and for each z in {a,b}*: xz in L <-> yz in L.
Prove the same for every H_j using the same approach.
Also, it is easy to show that for each x in G_j ∪ H_j and for each y in G_illegal: for z = b^j, xz is in L and yz is not in L.
For x in G_j and y in H_i, take z = a b^{i+1}: it is easy to see that xz is not in L and yz is in L.
It is easy to see that for x, y in G_j and G_i respectively (j ≠ i), or x, y in H_j and H_i (j ≠ i): for z = b^j, xz is in L while yz is not.
We just proved that the sets we created are the equivalence classes of R_L from the Myhill-Nerode theorem, and since there are infinitely many of these sets [we have H_j and G_j for every j], we can derive from the theorem that the language L is irregular.
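As a sanity check of the distinguishing-extension argument, here is a small Python sketch (the membership test and the bound are mine): any two distinct prefixes a^i and a^j are separated by one of z = b^i or z = b^{i+2}, which already yields infinitely many equivalence classes.

import re

def in_L(s: str) -> bool:
    """L = { a^n b^n : n >= 1 } ∪ { a^n b^{n+2} : n >= 1 }"""
    m = re.fullmatch(r'(a+)(b+)', s)
    if not m:
        return False
    n, k = len(m.group(1)), len(m.group(2))
    return k == n or k == n + 2

# a^i and a^j (i != j) always land in different Nerode classes:
# at least one of the two extensions is accepted after exactly
# one of the two prefixes.
for i in range(1, 12):
    for j in range(1, 12):
        if i != j:
            assert any(in_L('a' * i + z) != in_L('a' * j + z)
                       for z in ('b' * i, 'b' * (i + 2)))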
You could just use the pumping lemma for regular languages. It basically says that if a language is regular, then there is an integer n > 0 such that any string s in the language with |s| ≥ n can be partitioned into xyz with |xy| ≤ n and |y| > 0, such that pumping the y part keeps the string in the language. That means that if for every such partition there is some i for which xy^iz is not in the language, then the language is not regular.
The proof goes like this; it is a kind of adversary argument. Suppose someone tells you that this language is regular. Then ask them for a number n > 0. You build a convenient string of length greater than n, and you give it to the adversary. The adversary partitions the string into x, y, z in any way they want, as long as |xy| ≤ n. Then you have to pump y (repeat it i times) until you find a string that is not in the language.
In this case, I tell you: give me n. You fix n. Then I tell you: take the string "a^n b^{n+2}", and split it. In any way that you split this string, you will always have y = a^k with k > 0, since you are forced to make |xy| ≤ n and my string begins with a^n. Here is the trick: you give the adversary a string such that any way they split it, they hand you a part that you can pump. So now we pump y, say, 0 times, and we get "a^m b^{n+2}" with m < n, which is not in your language. Done. We can also pump y 1 time, n times, n! times, whatever is needed to make the string leave the language.
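The adversary game can be played mechanically. Here is a Python sketch (the membership test is mine) that tries every legal split of a^n b^{n+2} and confirms that pumping y down to i = 0 always produces a string outside the language:

import re

def in_L(s: str) -> bool:
    """L = { a^n b^n : n >= 1 } ∪ { a^n b^{n+2} : n >= 1 }"""
    m = re.fullmatch(r'(a+)(b+)', s)
    if not m:
        return False
    n, k = len(m.group(1)), len(m.group(2))
    return k == n or k == n + 2

n = 10                       # the adversary's number
s = 'a' * n + 'b' * (n + 2)  # our convenient string
assert in_L(s)

# Every split s = xyz with |xy| <= n and |y| > 0 makes y all a's,
# and pumping y out (i = 0) yields a string outside L.
for xy in range(1, n + 1):           # |xy|
    for ylen in range(1, xy + 1):    # |y| >= 1
        x, y, z = s[:xy - ylen], s[xy - ylen:xy], s[xy:]
        assert set(y) == {'a'}
        assert not in_L(x + z)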
The proof of this theorem goes along the lines of: if you have a regular language, then you have an automaton with n states, for some fixed n. If a string has more than n characters, then it must go through some cycle in the automaton. If we name x the part of the string consumed before entering the cycle and y the part consumed inside the cycle, it is clear that we can pump y as many times as we want, because we can keep running around the cycle as many times as we want, and the resulting string has to be in the language, because it will be accepted by that automaton. To use the theorem to prove non-regularity, since we don't know what the supposed automaton looks like, we have to leave to the adversary the choice of n and of the position of the cycle inside the automaton (there will be no automaton, but you are saying to the adversary: dare to give me an automaton and I will show you it cannot exist).
