Generalisation of ER diagrams - database

Can anyone explain the difference between using the "d" and "o" notations in generalization & specialization of ER diagrams? Do both notations give the same meaning, or different meanings?

An o is for overlapping, meaning an entity can belong to more than one subtype. In your example, an Assignment can involve a Grade and/or a Lab_Session.
A d is for disjoint, meaning an entity can't belong to more than one subtype. In your example, a Lecture can be only one of Enhancement, SpecialDegree or GeneralDegree lectures.
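If it helps to see the two constraints operationally, here is a minimal Python sketch (the entity identifiers and subtype memberships are made up for illustration) that checks whether a set of subtype memberships respects a disjoint ("d") specialization; an overlapping ("o") specialization simply allows what the check reports:

# "d" (disjoint): an entity may appear in at most one subtype.
# "o" (overlapping): an entity may appear in several subtypes at once.
def entities_in_more_than_one_subtype(subtypes):
    seen, clashes = set(), set()
    for members in subtypes.values():
        clashes |= seen & members
        seen |= members
    return clashes

# Lecture specialization drawn with "d": each lecture is exactly one kind.
lecture_subtypes = {
    "Enhancement":   {"L1"},
    "SpecialDegree": {"L2"},
    "GeneralDegree": {"L3"},
}
assert not entities_in_more_than_one_subtype(lecture_subtypes)

# Assignment specialization drawn with "o": A1 may be in both subtypes.
assignment_subtypes = {
    "Grade":       {"A1", "A2"},
    "Lab_Session": {"A1", "A3"},
}
assert entities_in_more_than_one_subtype(assignment_subtypes) == {"A1"}  # allowed under "o"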

Related

Why are union, intersection and difference operations called boolean operations in relational algebra?

Why are union, intersection and difference operations of relational algebra called boolean operations?
I found them called that in the first line of section 5.4.1, Boolean Operations (section 5.4 is Relational Algebra and Datalog), in the book A First Course in Database Systems by Ullman & Widom.
The statement that you are citing is from Chapter 5 (Algebraic and Logical Query Languages) of the book A First Course in Database Systems by Ullman and Widom (Pearson, 2013).
In particular, sections 5.3 and 5.4 of that chapter treat the language Datalog, which can be used to work on a relational database in which a relation is seen as a predicate, not a set (see section 5.3.1).
In other words, a tuple (x1, x2, ..., xn) of R is seen as the fact that the relation R is true for the specified arguments (x1, x2, ..., xn). In this way one can express the set operators discussed in the context of relational algebra (union, difference, intersection) as Datalog rules built from boolean operators such as AND and NOT.
In fact, you can see that in the same book, in section 2.4.4, they are called set operators (as they actually are), so I think the name "boolean operators" is due to the fact that they are strictly related to the boolean operators and are discussed in the context of a logical view of a database.
The notion of "boolean" relational operations is an idiosyncratic ad hoc distinction/categorization in that book and chapter. The authors identify 3 common relation operators that "can each be expressed simply in Datalog". The next sections at the same level go on to each express some other relational operator. ("Selections can be somewhat more difficult to express in Datalog.") The 3 operators' treatments are similar to each other and different from others' in a certain context-specific "boolean" way. (So it's not a particularly helpful or deep distinction.)
5.4.1 Boolean Operations
The boolean operations of relational algebra--union, intersection, and set
difference--can each be expressed simply in Datalog.
To take the union R ∪ S, [...] As a result, each tuple from R and each tuple of S is put into the answer relation.
To take the intersection R ∩ S, [...] Then, a tuple is in the answer relation if and only if it is in both R and S.
To take the difference R - S, [...] Then, a tuple is in the answer relation if and only if it is in R but not in S.
So the "boolean" is because each relation operator corresponds to a certain boolean operator in that context:
UNION returns tuples that are in one operand OR the other
INTERSECTION returns tuples that are in one operand AND the other
DIFFERENCE returns tuples that are in one operand AND NOT the other
(Datalog also literally uses AND and AND NOT, but OR is implicit.)
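To make the correspondence concrete, here is a small Python sketch (relations represented as sets of tuples, with invented data) in which each of the three operators is written as a comprehension whose condition is exactly the boolean connective above:

# Relations as sets of like-shaped tuples (a toy stand-in for R and S).
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

candidates = R | S  # the tuples we test the boolean conditions against

union        = {t for t in candidates if t in R or t in S}       # OR
intersection = {t for t in candidates if t in R and t in S}      # AND
difference   = {t for t in candidates if t in R and t not in S}  # AND NOT

assert union == R | S and intersection == R & S and difference == R - S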
More importantly, this can be put another way: If every base relation holds the tuples that make a true proposition (statement) from some associated predicate (sentence template parameterized by attribute names) then every result has a predicate made from its operand predicates, and its value holds the tuples that make a true proposition from that predicate:
U holds the tuples where U
V holds the tuples where V
U UNION V holds the tuples where U OR V
U INTERSECTION V holds the tuples where U AND V
U DIFFERENCE V holds the tuples where U AND NOT V
But more than that: it's not just that those three operators correspond to boolean propositional logic connectives/nonterminals; every relational operator corresponds to a predicate logic connective/nonterminal, every query expression has an associated predicate, and every query result value holds the tuples that make a true proposition from that predicate (a small sketch follows the list):
U JOIN V holds the tuples where U AND V
U RESTRICT condition holds the tuples where U AND condition
U PROJECT A holds the tuples where FORSOME values for all attributes but A, U
U RENAME A A' holds the tuples where U with A replaced by A'
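Here is a sketch of the same correspondence for these operators, with rows as Python dicts (the attribute names and data are invented); each result is a comprehension whose condition mirrors the predicate above:

# Rows as attribute-name -> value dicts; relations as lists of rows.
U = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
V = [{"B": "x", "C": 10}, {"B": "z", "C": 20}]

# U JOIN V: tuples satisfying U AND V (agreeing on the shared attribute B).
join = [{**u, **v} for u in U for v in V if u["B"] == v["B"]]

# U RESTRICT condition: tuples satisfying U AND condition.
restrict = [u for u in U if u["A"] > 1]

# U PROJECT A: keep A; FORSOME values of the other attributes, U holds.
project_A = [{"A": a} for a in {u["A"] for u in U}]

# U RENAME A A': U with attribute A replaced by A'.
rename = [{("A'" if k == "A" else k): v for k, v in u.items()} for u in U]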
So the book ties those three operators to the syntax & semantics of Datalog, but really all operators can be tied to the syntax & semantics of predicate logic, which is the language/notation of precision in mathematics, science (including computer science) & engineering (including software engineering). In fact, that's how we know what a query means. So it's not really that those three operators are in some sense boolean so much as that all relational operators are logical: every relation expression corresponds to a predicate logic expression.
(The three operators are also frequently called the "set operators", since the set that is the body of the result relation arises from the sets that are the bodies of the operand relations according to the set operators of the same names. That might be a helpful way to group them in your mind or remember them or their names, and it no doubt inspired those names. But in light of the correspondence between relation operators and predicate logic connectives/nonterminals, between relation expressions and predicate logic expressions, and between predicates and relation values, the fact that some relation operators are reminiscent of some set operators is totally irrelevant. Just as it is irrelevant that some operators are reminiscent of boolean operators, or that you can make an algebra with "relations" whose bodies are bags instead of sets.)

How to read a ternary relationship in an ORM (Object Role Modeling) diagram?

I have an example of an allocation Result below. I know how to read binary relationships, but the triple box marked in red in the image confuses me.
When and in what order do we read those roles after the slashes inside the box: in, award of?
I assume that we can read this diagram in 3 ways:
First box: Student has result of Grade for Unit.
Second box: Grade given to Student in Unit.
Third box: Unit grants to Student award of Grade (this one makes no sense?)
Can we read it any more ways?
This is not really a valid ORM diagram (as defined by Halpin), because:
all the fact types lack a reading (entity type names are provided instead),
no uniqueness constraints are shown (every fact type must have at least one uniqueness constraint),
role predicates are shown inside the role boxes (not an ORM practice),
the ternary is not shown as objectified, even though it (and the other fact types) have entity names,
the identifying roles must be mandatory, but this is not shown.
Here, Result is the name of an entity type. This entity type objectifies the ternary fact type, for which no name and no reading are supplied. If you wanted to name the fact type, a suitable name might be "Grading", in reference to the action in which the Result was assigned. The same problem applies to the other fact types; the creator of this diagram is confused about the difference between a fact type and the object which may objectify that fact type. For example, the fact type "Student identify" is given a noun-style name, but the fact type (if it is to be named) should be called "Student Identification", naming the action, not the objectification. Similarly for Grade award (Grade Coding) and Unit identity (Unit Naming).
However, leaving aside these syntax differences, the possible intended readings for the Result fact type are "Student achieved Grade for Unit", "Grade was awarded to Student for Unit", and the like. In CQL, the objectification is indicated by saying "each Result is where some Student achieved some Grade for some Unit, that Grade was awarded to that Student for that Unit"
The predicate text inside the roles of the Result fact type is not well defined. The intention here is to apply these predicates to the relationship between the Result entity and the object which plays each respective role. These three binary fact types (called link fact types) are not shown, but are implied by the objectification. I would suggest the following readings for these link fact types:
Result is awarded to Student/Student received Result
Result is of Grade/Grade applies to Result
Result is awarded for Unit/Unit received Result
Normally the link fact types and the associated readings are not shown, and the implicit readings provided by the tools are "involved in/is of", for example "Student is involved in Result", "Result is of Student". You can see why it can be better to provide custom readings instead.
I recommend you get a copy of Terry Halpin's book "Information Modeling and Relational Databases" and learn from that, because it's clear that your instructor's theory and practices are non-standard.

What happens when the Cartesian product is applied to relations with the same attribute name

I understand that the Cartesian product (×) operation on two relations does not require them to be union-compatible. So if there is an attribute called name in both relations R and S, where name in R is the first name and name in S is the last name,
how can the related values be identified by the following selection operation?
Q = R × S
I want to get the set of tuples whose first name equals the last name, so how am I supposed to write the selection statement?
σ Name=Name(Q)
Will using the same attribute name in the selection operation be a problem?
Cartesian product does not require attributes to be named differently. It only requires the relations to be named differently.
For example, D := A(id, name) X B(id, age) is perfectly valid, and the resulting relation is D(A.id, name, B.id, age).
In other words, the attributes are automatically renamed by prepending the relation name as part of the Cartesian product. This renaming is also what leads to the requirement that the relations be named differently.
Source:
- Database System Concepts, 6th Edition, section 6.1.1.6, The Cartesian-Product Operation, for the definition, and an example in Figure 6.8, Result of instructor × teaches.
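A small Python sketch of that renaming-by-prefixing behaviour (the relation contents and the helper function are invented for illustration): after the product, the two name attributes can be told apart, so the selection from the question becomes unambiguous.

# Rows as attribute -> value dicts; the product prefixes each attribute
# with its relation name, so R.name and S.name stay distinguishable.
R = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
S = [{"id": 7, "name": "Ada"}, {"id": 8, "name": "Hopper"}]

def product(r, r_name, s, s_name):
    return [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a in r for b in s
    ]

Q = product(R, "R", S, "S")

# sigma R.name = S.name (Q): the two attributes no longer clash.
matches = [t for t in Q if t["R.name"] == t["S.name"]]  # the Ada/Ada pairing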
It is correct that for the Cartesian product the relations need not be union-compatible.
But they still need to be compatible! Otherwise there are exactly the difficulties you point out. So the rule for Cartesian product is that there must be no attributes in common.
So if you have a clash of attributes, first you must rename the attributes before crossing.
See http://en.wikipedia.org/wiki/Relational_algebra on 'Natural Join'. (That defines Nat Join in terms of Rename, Cartesian product and Projection.)
From the point of view of learning the RA, I would think of natural join as the basic operation, and Cartesian product as a degenerate form for when there are no attributes in common. This is, for example, the approach that Date & Darwen take in their textbooks.
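Here is a sketch of that view (toy relations and invented attribute names): a generic natural join that matches on whatever attributes the operands share, and that degenerates into the plain Cartesian product when they share none.

# Natural join over rows-as-dicts: match on the shared attributes, if any.
def natural_join(r, s):
    return [
        {**a, **b}
        for a in r for b in s
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]

emp  = [{"dept": "cs", "emp": "Ada"}]
dept = [{"dept": "cs", "head": "Grace"}]
misc = [{"year": 2024}]

natural_join(emp, dept)  # joins on the shared attribute "dept"
natural_join(emp, misc)  # no shared attributes, so a plain Cartesian product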

Practical example of MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y)

I was learning database normalization, join dependencies and 5NF, and I had a hard time. Can anyone give me some practical examples of the multivalued dependency rule:
MVD3: (transitivity) If X ↠ Y and Y ↠ Z, then X ↠ (Z − Y).
Functional dependency / normalization theory and the normal forms up to and including BCNF were developed on the hypothesis that all data attributes (columns/types/...) are "atomic" in a certain sense. That "certain sense" has long since been deprecated, but essentially it boiled down to the notion that a single cell value in a table could not itself hold a multiplicity of values. Think of a textual CSV list of ISBN numbers, a table appearing as a value in a cell of a table (truly nested tables), ...
Now imagine an example with courses, professors, and study books used as course material. Imagine all of that modeled in a single 3-column table which says that "Professor (P) teaches course (C) and uses book (B) as course material." If there can be more than one book (B) used for any given course (Cn) and there can be more than one course (C) taught by any given professor (Pn) and there can be more than one professor (P) teaching any given course (Cn), then this table is clearly all-key (key is the full set of attributes {P,C,B} ).
This means that this table satisfies BCNF.
But now imagine that there is a rule to the effect that "the set of books used for any given course (Cn) must be the same, regardless of which professor teaches it".
In the days when normalization was developed to the form in which it is now commonly known, it was not allowed to have table columns (relation attributes) that were themselves tables (relations). (Because such a design was considered a violation of 1NF, a notion which is now considered suspect.)
Imagine for a moment that we are indeed allowed to model relation attributes to be of type relation. Then we could model our 3-column table (/relation) as follows: "Professor (P) teaches course (C) and uses THE SET OF BOOKS (SB) as course material." Attribute SB would no longer be an ISBN number, as in the previous and more obvious design, but a (probably unary) RELATION holding the entire set of ISBN numbers. If we draw our design like that, and we then consider our rule that "all professors use the same set of books for the same course", then we see that this rule is now expressible as an FD from (C) to (SB)!!! And this means that we have a violation of a lower NF on our hands!!!
4NF and 5NF arose out of this kind of problem, where the appearance of a single attribute value (courseID C) requires the appearance of a multitude of rows (multiple ISBN numbers B). The problem was recognised quite early on, but the solution that is currently regarded as the best (RVAs, relation-valued attributes) was not recognised as a valid one at the time. So 4NF and 5NF were created as "new and further normal forms", even though the then-existing definitions of 2NF, 3NF and BCNF would already have been sufficient for dealing with the situation at hand, provided RVAs had been recognised as a valid design approach.
To support that claim, let's look at what would be done to eliminate the NF violation in our {P,C,SB} design with the FD C->SB:
We would split the table into two separate tables {P,C} and {C,SB}, with keys {P,C} and {C} respectively. Both tables satisfy BCNF.
But we still have this SB attribute that holds a set of ISBN numbers. Dealing with this can be done by applying a technique like "UNGROUPING". Applying this to our {C,SB} table gets us a {C,B} table, where B is the ISBN book number (or whatever identifier you like to use in your database), and the key of the table is {C,B}. This is exactly the same design we would get if we eliminated the 4/5NF violation!!!
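A small Python sketch of that decomposition (the sample data is invented): the all-key {P,C,B} table, the per-course book sets implied by the rule, and the two tables whose join gives the original back.

# The all-key {P, C, B} table: professor P teaches course C using book B.
PCB = {
    ("prof1", "db", "isbn-1"), ("prof1", "db", "isbn-2"),
    ("prof2", "db", "isbn-1"), ("prof2", "db", "isbn-2"),
    ("prof1", "ai", "isbn-3"),
}

# The rule "the set of books per course is the same for every professor"
# is the FD C -> SB on the relation-valued design {P, C, SB}.
SB = {}
for p, c, b in PCB:
    SB.setdefault(c, set()).add(b)

# Decompose into {P, C} and {C, B} (the ungrouped form of {C, SB}).
PC = {(p, c) for p, c, b in PCB}
CB = {(c, b) for p, c, b in PCB}

# Joining them back reconstructs PCB exactly; the decomposition is lossless
# precisely because the rule above holds.
rejoined = {(p, c, b) for (p, c) in PC for (c2, b) in CB if c == c2}
assert rejoined == PCB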
You might also want to take a look at Multivalue Dependency violation?

OWL universal quantification

I am halfway through reading the OWL 2 primer and am having problems understanding universal quantification.
The example given is
EquivalentClasses(
:HappyPerson
ObjectAllValuesFrom( :hasChild :HappyPerson )
)
It says somebody is a happy person exactly if all their children are happy persons. But what if John Doe has no children? Can he be an instance of HappyPerson? What about his parent?
I also find this part very confusing; it says:
Hence, by our above statement, every childless person would be qualified as happy.
but wouldn't it violate the ObjectAllValuesFrom() constructor?
I think the primer actually does quite a good job at explaining this, particularly the following:
Natural language indicators for the usage of universal quantification are words like “only,” “exclusively,” or “nothing but.”
To simplify this a bit further, consider the expression you've given:
HappyPerson ≡ ∀ hasChild . HappyPerson
This says that a HappyPerson is someone who only has children who are also HappyPerson (are also happy). Logically, this actually says nothing about the existence of instances of happy children. It simply serves as a universal constraint on any children that may exist (note that this includes any instances of HappyPerson that don't have any children).
Compare this to the existential quantifier, exists (∃):
HappyPerson ≡ ∃ hasChild . HappyPerson
This says that a HappyPerson is someone who has at least one child that is also a HappyPerson. In contrast to (∀), this expression actually implies the existence of a happy child for every instance of a HappyPerson.
The answer, albeit initially unintuitive, lies in the interpretation/semantics of the ObjectAllValuesFrom OWL construct in first-order logic (actually, Description Logic). Fundamentally, the ObjectAllValuesFrom construct relates to the logical universal quantifier (∀), and the ObjectSomeValuesFrom construct relates to the logical existential quantifier (∃).
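To see the difference operationally, here is a tiny Python sketch (the individuals and the toy evaluator are invented; this is not an OWL API) that checks both restrictions over a fixed interpretation. Note how a childless individual satisfies the universal restriction vacuously:

# A toy interpretation: who has which children, and who is asserted happy.
has_child = {"alice": {"bob"}, "bob": set(), "carol": {"dave"}}
happy = {"bob"}

def all_values_from(x, prop, cls):
    # ObjectAllValuesFrom: every prop-successor of x is in cls (vacuously true if there are none).
    return all(y in cls for y in prop.get(x, set()))

def some_values_from(x, prop, cls):
    # ObjectSomeValuesFrom: at least one prop-successor of x is in cls.
    return any(y in cls for y in prop.get(x, set()))

all_values_from("alice", has_child, happy)   # True: her only child, bob, is happy
all_values_from("bob", has_child, happy)     # True: vacuously, bob has no children
some_values_from("bob", has_child, happy)    # False: no child exists at all
all_values_from("carol", has_child, happy)   # False: dave is not asserted happy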
I am facing the same kind of issue while reading the "OWL 2 Web Ontology Language Primer (Second Edition - 2012)" and I am not convinced that the answer by Sharky clarifies the issue.
On page 15, when introducing the universal quantifier ∀, the book states:
"Another property restriction, called universal quantification is used to describe a class of individuals for which all related individuals must be instances of a given class. We can use the following statement to indicate that somebody is a happy person exactly if all their children are happy persons."
[I omit the OWL statements in the different syntaxes; they can be found in the book.]
I think that a more formal, and maybe less ambiguous, representation of what the author states is
(1) HappyPerson = {x | ∀y (x HasChild y → y ∈ HappyPerson)}
I hope every reader understands this notation, because I find the notation used in the answer less clear (or maybe I am just not accustomed to it).
The book proceeds:
"... There is one particular misconception concerning the universal role restriction. As an example, consider the above happiness axiom. The intuitive reading suggests that in order to be happy, a person must have at least one happy child [my note: actually the definition states that every children should be happy, not just at least one, in order for his/her parents to be happy. This appears to be a lapsus of the author]. Yet, this is not the case: any individual that is not a “starting point” of the property hasChild is a class member of any class defined by universal quantification over hasChild. Hence, by our above statement, every childless person would be qualified as happy . ..."
That is, the author states that (assume '~' for logical NOT), given
(2) ChildlessPerson = {x | ~∃y (x HasChild y)}
then (1) and the meaning of ∀ imply
(3) ChildlessPerson ⊂ HappyPerson
This does not seem true to me.
If it were true, then every child who is him/herself a childless person would be happy, and so only some parents could be unhappy persons.
Consider this model:
Persons = {a,b,c}, HasChild = {(a,b)}, HappyPerson={a,b}
and c is unhappy (independently of the closed-world or open-world assumption). It is a possible model, which falsifies the author's thesis.
