Definition of functional dependency, ambiguous "for all pairs"

for all pairs of tuples t1 and t2 such that
t1[A] = t2[A] then t1[B] = t2[B]
Can "a pair" also be a pair of the same tuple, meaning t1 = t2, or does it mean only two distinct tuples?

TL;DR The "pairs of" is redundant informal language. The informal language is trying to say that they have 2 names--a pair/pairing/duo/couple/twosome/dyad of names--and a value is sought for each one. The value associated with one can be the value associated with the other--they can name the same value.
You are to twice find a value for a name: a value to call by "t1" then a value to call by "t2". The formal notation is "EXISTS t1, t2 (...)" or "EXISTS t1 EXISTS t2 (...)".
So for each name you might or might not find a value; so you might get zero, one or both names naming values; and if both, the names might or might not end up with the same value; and if they do, you might or might not have got the value from the same tuple-valued element of the set that is the relation body.
From my answer at Determining if this data is really in 4th normal form? re MVDs (multivalued dependencies):
"There exist" says some values exist, and they don't have to be different. EXISTS followed by some name(s) says that there exist(s) some value(s) referred to by the name(s), for which a condition holds. Multiple names can refer to the same value. (FOR ALL can be expressed in terms of EXISTS.)
When such statements are given formally we say, "for all X" (universal quantification) or "there exists X" (existential quantification) where "X" is a name and we mean that "for all values" or "there exists a value" that you could use that name for in what follows. This is basic logic as used in mathematics, science and engineering.
They say "for all pairs of tuples", but they mean for all sequences that are a tuple-valued value followed by a tuple-valued value. "The first value" and "the second value" might be equal, i.e. be "the same value" even though there are two "values". The natural language is not clear; you have to learn what certain phrasings mean.
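A minimal Python sketch of that reading, offered as an illustration rather than a textbook definition. The helper name satisfies_fd, its distinct_only flag, and the sample rows are all made up here; the point is only that the quantification runs over every (t1, t2) drawn from relation x relation, so t1 and t2 may be the same tuple.

```python
from itertools import product

def satisfies_fd(relation, lhs, rhs, distinct_only=False):
    """Check lhs -> rhs: every (t1, t2) that agrees on lhs must agree on rhs.

    With distinct_only=False the pairs include t1 being t2 (the formal reading).
    With distinct_only=True those same-tuple pairs are skipped.
    """
    for t1, t2 in product(relation, repeat=2):
        if distinct_only and t1 is t2:
            continue
        if t1[lhs] == t2[lhs] and t1[rhs] != t2[rhs]:
            return False
    return True

# Hypothetical relation body: A -> B holds, A -> C does not.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 20},
    {"A": 2, "B": "y", "C": 10},
]

for lhs, rhs in [("A", "B"), ("A", "C")]:
    with_same = satisfies_fd(rows, lhs, rhs)
    without_same = satisfies_fd(rows, lhs, rhs, distinct_only=True)
    print(lhs, "->", rhs, with_same, without_same)
```

Both readings give the same verdict, because a tuple always agrees with itself on B, so the pair (t, t) can never violate the dependency.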
A free resource https://www.fecundity.com/logic/ :
forall x is an Open Education Resource (OER) introductory textbook in formal logic. It covers translation, proofs, and formal semantics for sentential and predicate logic.
A variant at https://open.umn.edu/opentextbooks/textbooks/1139 is forall x: Calgary.

Related

Venn diagram notation for all other sets except one

I'm trying to find a Venn diagram notation that can illustrate data that is only in a single set.
If I can select data from all the other sets, without knowing how many there are, then I can find the intersection of their complements, to select data only in the target set.
My current solution looks like this, but it assumes the existence of sets B and C.
The eventual diagram is expected to look like this:

One way to do it would be by using a system based on regions rather than sets. In your case, it would be the region that belongs to set A but does not belong to any other set. You can find the rationale to do that here. The idea is to express the region as a binary chain where 1 means "belongs to set n" and 0 means "does not belong to set n", where n is determined by the ordering of the sets.
In your example, you might define A as the last set, and therefore as the last bit. With three sets CBA, your region would be 001. The nice thing about this is that the leading zeroes can be naturally disregarded. Your region would be 1b, no matter how many sets there are (the b is for "binary").
You might even extend the idea by translating the number to another base. For instance, say that you want to express the region of elements belonging to set B only. With the same ordering as before, it would be 010 or 10b. But you can also express it as a decimal number and say "region 2". This expression would be valid if sets A and B exist, independently of the presence of any other set.
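The encoding can be sketched in a few lines of Python. The function name region_code and the convention of listing the sets from the last bit upward are assumptions made for this illustration, not part of any standard Venn notation.

```python
def region_code(member_of, ordering):
    """Encode a region as an integer bit mask.

    ordering lists the set names from the last (least significant) bit upward;
    member_of is the collection of sets the region lies inside.
    """
    code = 0
    for bit, name in enumerate(ordering):
        if name in member_of:
            code |= 1 << bit
    return code

# With the ordering CBA, set A owns the last bit.
only_a = region_code({"A"}, ["A", "B", "C"])
only_b = region_code({"B"}, ["A", "B", "C"])

# Adding more sets only adds leading zeroes, so the code for "A only" is unchanged.
only_a_with_more_sets = region_code({"A"}, ["A", "B", "C", "D", "E"])

print(format(only_a, "b"), only_a)  # binary digits, then the decimal form
print(format(only_b, "b"), only_b)
```

Because each set owns one fixed bit, the code for a region depends only on the sets it belongs to, which is what makes it independent of how many other sets exist.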

Is multiplication allowed in relational algebra?

I have a relation
R
-------
cid sid gradepoint credits
CS425 001 4.0 3
I need to calculate the GPA. There are more rows, but I believe if I just get this answered I should be ok with the rest. I need to do gradepoint * credits. How do I express this with a relational algebra expression?
My best guess is:
, but I'm not sure if I can multiply attributes with anything other than a constant.

Relational algebra doesn't address domain-specific operations. It neither includes nor excludes them, just like real algebra neither includes nor excludes operations on relations.
If you allow multiplication by constants, you're already combining algebras (which is pretty much required for any practical application) so I see no reason to disallow multiplication between attributes.

Notice that if expressions like you are using are allowed then it is projection that is doing the multiplying. Instead of its inputs being a relation value & attribute names, its inputs are a relation value and expressions of some sort that include names of operators whose inputs are values of attribute types. Your projection is parsing and multiplying. So it is a different operator than one that only accepts attribute names.
The projection that takes attribute expressions raises the question of its implementation given an algebra with projection only on a relation value and attribute names. This is important in an academic setting because a question may be wanting you to actually figure out how to do that, or because the difficulty of a question is dependent on the operators available. So find out what algebra you are supposed to use.
We can introduce an operator on attribute values when we only have basic relation operators taking attribute names and relation values. Each such operator can be associated with a relation value that has an attribute for each operand and an attribute for the result. The relation holds the tuples where the result value is equal to the result of the operator called on the operand values. (The result is functionally dependent on the operands.)
So suppose we have the following table value Times holding tuples where left * right = result:
left right result
-------------------
0 0 0
1 0 0
...
0 1 0
1 1 1
2 1 2
...
If your calculated attribute is result then you want
/* tuples where for some credits & gradepoint,
course cid's student sid earned grade gradepoint and credits credits
and credits * gradepoint = result
*/
project cid, sid, result (
    R natural join (rename left\credits right\gradepoint (Times))
)
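To see the expression above at work, here is a rough Python sketch that models relations as lists of dicts. The helpers rename, natural_join and project are hypothetical stand-ins for the algebra's operators, and Times is only a finite slice of the (conceptually infinite) multiplication relation, chosen to cover the values in R.

```python
def rename(relation, mapping):
    """Rename attributes according to mapping (old name -> new name)."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]

def natural_join(r, s):
    """Combine tuples that agree on every attribute name the two relations share."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

def project(relation, attrs):
    """Keep only attrs, dropping duplicate tuples."""
    out = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

R = [{"cid": "CS425", "sid": "001", "gradepoint": 4.0, "credits": 3}]

# Finite slice of Times: left in 0..5, right in 0.0, 0.5, ..., 4.0 (an assumption).
Times = [
    {"left": left, "right": half / 2, "result": left * (half / 2)}
    for left in range(6)
    for half in range(9)
]

weighted = project(
    natural_join(R, rename(Times, {"left": "credits", "right": "gradepoint"})),
    ["cid", "sid", "result"],
)
print(weighted)
```

The multiplication never happens inside project here; it is looked up by joining against Times, which is the point of treating the operator as a relation.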
See also: Relational algebra - recode column values
PS re algebra vs language: What is a reference to the "relational algebra" you are using? There are many. They even have different notions of what a "relation" is. Some so-called "algebras" are really languages because the expressions don't only represent the results of operators being called on values. Although it is possible for an algebra to have operand values that represent expressions and/or relation values that contain names for themselves.
PS re separation of concerns: You haven't said what the attribute name of the multiplication result is. If you're expecting it to be credits * gradepoint then you're also expecting projection to map expression-valued inputs to attribute names. Except you are expecting credits * gradepoint to be recognized as an expression with two attribute names and an operator name in one place but to be just one attribute name in another. There are solutions for these things in language design, eg in SQL optional quotes specialized for attribute names. But maybe you can see why simple things like an algebra operating on just attribute names & relation values with unique, unordered attribute names helps us understand in terms of self-contained chunks.

How to read ternary relationship in ORM(Object role modeling) diagram?

I have an example of allocation Result below. I know how to read binary relationships but this triple box marked red in the image confuses me.
When and in what order do we read those roles after slashes inside the box : in, award of?
I assume that we can read this diagram in 3 ways:
First box: Student has result of Grade for Unit.
Second box: Grade given to Student in Unit.
Third box: Unit grants to Student award of Grade (? this one makes no sense ?)
Can we read it any more ways?

This is not really a valid ORM diagram (as defined by Halpin), because:
all the fact types lack a reading (entity type names are provided instead),
no uniqueness constraints are shown (every fact type must have at least one uniqueness constraint),
role predicates are shown inside the role boxes (not an ORM practice),
the ternary is not shown as objectified, even though it (and the other fact types) have entity names,
the identifying roles must be mandatory, but this is not shown.
Here, Result is the name of an entity type. This entity type objectifies the ternary fact type, for which no name and no reading are supplied. If you wanted to name the fact type, a suitable name might be "Grading", in reference to the action in which the Result was assigned. The same problem applies to the other fact types; the creator of this diagram is confused as to the difference between a fact type and the object which may objectify that fact type. For example, the fact type "Student identify" is a noun, but the fact type (if it is to be named) should be called "Student Identification", naming the action not the objectification. Similarly for Grade award (Grade Coding), Unit identity (Unit Naming).
However, leaving aside these syntax differences, the possible intended readings for the Result fact type are "Student achieved Grade for Unit", "Grade was awarded to Student for Unit", and the like. In CQL, the objectification is indicated by saying "each Result is where some Student achieved some Grade for some Unit, that Grade was awarded to that Student for that Unit".
The predicate text inside the roles of the Result fact type is not well defined. The intention here is to apply these predicates to the relationship between the Result entity and the object which plays each respective role. These three binary fact types (called Link Fact Types) are not shown, but are implied by the objectification. I would suggest the following readings for these link fact types:
Result is awarded to Student/Student received Result
Result is of Grade/Grade applies to Result
Result is awarded for Unit/Unit received Result
Normally the link fact types and the associated readings are not shown, and the implicit readings provided by the tools are "involved in/is of", for example "Student is involved in Result", "Result is of Student". You can see why it can be better to provide custom readings instead.
I recommend you get a copy of Terry Halpin's book "Information Modeling and Relational Databases" and learn from that, because it's clear that your instructor has non-standard theory and practices.

What Happens when Cartesian Product is applied to Relations with same attribute name

I understand that the Cartesian product (X) operation does not need its two relations to be UNION compatible. So suppose there is an attribute called name in both relations R and S, where name in R is the first name and name in S is the second name.
How can the related values be identified by the following selection operation?
Q = R x S
I want to get the set of tuples where firstname = lastname. So how am I supposed to write the selection statement?
σ Name=Name(Q)
Will this be a problem using the same attribute name in the selection operation?

Cartesian product does not require attributes to be named differently. It only requires relations to be named differently.
For example, D := A(id, name) X B(id, age) is perfectly valid, and the resulting relation is D(A.id, name, B.id, age).
In other words, attributes that appear in both relations are automatically renamed by prepending the relation name, as part of the Cartesian product. This prepend operation also leads to the requirement that relations be named differently.
Source:
- Database System Concepts 6th Edition, Chapter 6.1.1.6 The Cartesian-Product Operation, for the definition, and an example in Figure 6.8 Result of instructor × teaches.
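A rough Python sketch of that qualifying convention, using lists of dicts as relations. The function cartesian_product and the sample names are hypothetical; the sketch only mimics the "prefix clashing attributes with the relation name" behaviour described above.

```python
def cartesian_product(r_name, r, s_name, s):
    """Cross r with s, prefixing attributes that appear in both with the relation name."""
    r_attrs = {a for t in r for a in t}
    s_attrs = {a for t in s for a in t}
    clash = r_attrs & s_attrs

    def qualify(rel_name, t):
        return {(f"{rel_name}.{a}" if a in clash else a): v for a, v in t.items()}

    return [{**qualify(r_name, t), **qualify(s_name, u)} for t in r for u in s]

# Hypothetical data: name is a first name in R and a last name in S.
R = [{"name": "Taylor"}, {"name": "James"}]
S = [{"name": "Taylor"}, {"name": "Smith"}]

Q = cartesian_product("R", R, "S", S)

# The selection then refers to the qualified names instead of the ambiguous Name=Name.
selected = [t for t in Q if t["R.name"] == t["S.name"]]
print(selected)
```

Under this convention the selection from the question would be written with qualified names, along the lines of σ R.name=S.name(Q).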

Correct that for Cartesian product the relations need not be UNION compatible.
But they still need to be compatible! Otherwise there are exactly the difficulties you point out. So the rule for Cartesian product is that there must be no attributes in common.
So if you have a clash of attributes, first you must rename the attributes before crossing.
See http://en.wikipedia.org/wiki/Relational_algebra on 'Natural Join'. (That defines Nat Join in terms of Rename, Cartesian product and Projection.)
From the point of view of learning the RA, I would think of Natural Join as the basic operation. And Cartesian product as a degenerate form when there are no attributes in common. This is for example the approach that Date & Darwen take in their textbooks.
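The rename-then-join approach can be sketched in Python as follows. The helpers rename and natural_join and the sample data are made up for illustration; they are not the API of any particular RA tool.

```python
def rename(relation, mapping):
    """Rename attributes according to mapping (old name -> new name)."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]

def natural_join(r, s):
    """Combine tuples that agree on every attribute name the two relations share."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

# Hypothetical data: name is a first name in R and a last name in S.
R = [{"name": "Taylor"}, {"name": "James"}]
S = [{"name": "Taylor"}, {"name": "Smith"}]

# Rename first so that no attribute is shared; the join then degenerates to a product.
Q = natural_join(rename(R, {"name": "firstname"}), rename(S, {"name": "lastname"}))
selected = [t for t in Q if t["firstname"] == t["lastname"]]

# Without the rename, the shared attribute makes the join match on name directly.
same_name = natural_join(R, S)

print(len(Q), selected, same_name)
```

With no attributes in common the join condition is vacuously true for every pair of tuples, which is why Cartesian product falls out as the degenerate case.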

Practical example of MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y)

I was learning database normalization, join dependencies and 5NF, and I had a hard time. Can anyone give me some practical examples of the multivalued dependency rule:
MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y).

Functional dependency / Normalization theory and the normal forms up to and including BCNF, were developed on the hypothesis of all data attributes (columns/types/...) being "atomic" in a certain sense. That "certain sense" has long been deprecated by now, but essentially it boiled down to the notion that "a single cell value in a table could not itself hold a multiplicity of values". Think, a textual CSV list of ISBN numbers, a table appearing as a value in a cell in a table (truly nested tables), ...
Now imagine an example with courses, professors, and study books used as course material. Imagine all of that modeled in a single 3-column table which says that "Professor (P) teaches course (C) and uses book (B) as course material." If there can be more than one book (B) used for any given course (Cn) and there can be more than one course (C) taught by any given professor (Pn) and there can be more than one professor (P) teaching any given course (Cn), then this table is clearly all-key (key is the full set of attributes {P,C,B} ).
This means that this table satisfies BCNF.
But now imagine that there is a rule to the effect that "the set of books used for any given course (Cn) must be the same, regardless of which professor teaches it.".
In the days when normalization was developed to the form in which it is now commonly known, it was not allowed to have table columns (relation attributes) that were themselves tables (relations). (Because such a design was considered a violation of 1NF, a notion which is now considered suspect.)
Imagine for a moment that we are indeed allowed to model relation attributes to be of type relation. Then we could model our 3-column table (/relation) as follows: "Professor (P) teaches course (C) and uses THE SET OF BOOKS (SB) as course material.". Attribute SB would no longer be an ISBN number, as in the previous and more obvious design, but it would be a (probably unary) RELATION holding the entire set of ISBN numbers. If we draw our design like that, and we then consider our rule that "all professors use the same set of books for the same course", then we see that this rule is now expressible as an FD from (C) to (SB) !!! And this means that we have a violation of a lower NF on our hands !!!
4NF and 5NF arose because this kind of problem (where the appearance of a single attribute value, courseID (C), causes a requirement for the appearance of A MULTITUDE of rows, one per (B) ISBN number) was recognised quite early on, but without the solution that is currently regarded as the best, relation-valued attributes (RVAs), being recognised as a valid one. So 4NF and 5NF were created as "new and further normal forms", where the then-existing definitions of 2NF, 3NF and BCNF would already have been sufficient for dealing with the situation at hand, had RVAs been recognised as a valid design approach.
To support that claim, let's look at what would be done to eliminate the NF violation in our {P,C,SB} design with the FD C->SB:
We would split the table into two separate tables {P,C} and {C,SB} with keys {P,C} and {C}, respectively. Both tables satisfy BCNF.
But we still have this SB attribute that holds a set of ISBN numbers. Dealing with this can be done by applying a technique like "UNGROUPING". Applying this to our {C,SB} table would get us a {C,B} table, where B are the ISBN book numbers (or whatever identifier you like to use in your database), and the key to the table is {C,B}. This is exactly the same design we would get if we eliminated the 4/5 NF violation !!!
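The steps above can be traced on a small, made-up data set. In this Python sketch the professor, course and book identifiers are placeholders, and group_books and fd_course_to_books are hypothetical helper names; relations are modelled as sets of tuples and the relation-valued attribute SB as a frozenset.

```python
# Hypothetical {P, C, B} table obeying "same books for a course, whoever teaches it".
pcb = {
    ("P1", "C1", "B1"), ("P1", "C1", "B2"),
    ("P2", "C1", "B1"), ("P2", "C1", "B2"),
    ("P2", "C2", "B3"),
}

def group_books(rows):
    """Build {P, C, SB}: one tuple per (P, C) with the set of books as a value."""
    books = {}
    for p, c, b in rows:
        books.setdefault((p, c), set()).add(b)
    return {(p, c, frozenset(bs)) for (p, c), bs in books.items()}

def fd_course_to_books(rows):
    """Check the FD C -> SB on a {P, C, SB} table."""
    seen = {}
    for _, c, sb in rows:
        if seen.setdefault(c, sb) != sb:
            return False
    return True

pc_sb = group_books(pcb)
fd_holds = fd_course_to_books(pc_sb)

# Decompose on the FD C -> SB, then ungroup SB back into individual books.
pc = {(p, c) for p, c, _ in pc_sb}
c_sb = {(c, sb) for _, c, sb in pc_sb}
cb = {(c, b) for c, sb in c_sb for b in sb}

# Joining {P, C} with {C, B} on C gives back the original table.
rejoined = {(p, c, b) for p, c in pc for c2, b in cb if c == c2}
print(fd_holds, rejoined == pcb)
```

On this data the {P,C} and {C,B} tables are the same pair one would reach by decomposing on the multivalued dependency from C to B, which is the claim being illustrated.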
You might also want to take a look at Multivalue Dependency violation?