Database: how to determine functional dependencies and whether a relation is in BCNF?

I am working on the question below, which asks me to identify the relation's normal form, so I have to list its functional dependencies. I worked out a solution but still have some questions about it.
In a final-year-project selection process, students are to select one research topic for his/her project. Students are allowed to select the same research topic. For each research topics, supervisors are assigned to supervise it. A supervisor may be supervising up to two different research topics and each research topic may be assigned to different supervisors. For each of the research topic a supervisor supervises, a consultation day is allocated for the student to meet and discuss with the supervisor.
This final-year-project selection information is stored in the following relational table:
FINALYEARPROJECT(supervisor, researchTopic, consultationDay, student)
These are the functional dependencies I listed:
student → researchTopic, consultationDay, supervisor
student is the candidate key.
supervisor, researchTopic, student → consultationDay
Question 1:
From student I can find all the attributes. But from (supervisor, researchTopic, student) I can also find consultationDay. However, (supervisor, researchTopic, student) is only a superkey, not a candidate key. So should it be listed as a dependency?
Question 2:
Assuming my dependencies are correct, I deduced that this relational table is in BCNF. However, my lecture notes say:
The definition of Boyce-Codd Normal Form (BCNF) states that a relation is in BCNF if and only if every determinant is a candidate key.
This is very different from what I found on the net (e.g. Wikipedia):
A relational schema R is in Boyce–Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions hold:
X → Y is a trivial functional dependency (Y ⊆ X)
X is a superkey for schema R
So, according to my lecture notes, with the dependencies I found, the table is not in BCNF, as (supervisor, researchTopic, student) is not a candidate key, only a superkey. However, according to Wikipedia's definition, the table is in BCNF, as all the determinants are superkeys. So is this table in BCNF?

Neither of the definitions of BCNF that you cite mentions which set of functional dependencies is used when checking whether the normal form is satisfied, but this is important.
You know that, given a set of functional dependencies (for instance, one found by reasoning over a problem), there are many equivalent sets, or more precisely many sets that are a cover of it; for instance, a minimal or canonical cover of a set of FDs is a cover with no redundant dependencies and no superfluous attributes, and with a single attribute on the right-hand side of each dependency.
So, actually, it is easy to prove that a definition that mentions superkeys, like the Wikipedia one, is equivalent to a definition that mentions candidate keys, like the one in your lecture notes, when the functional dependencies considered are those of a minimal cover. In fact, by the definition of a minimal cover, it contains no trivial dependencies, and no strict superkey (i.e. a superkey formed by a candidate key plus a non-empty set of attributes) can appear as the left-hand side of any dependency.
So, when checking for a normal form, it is always a good idea to first find a minimal cover of the given dependencies.
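To make this concrete, here is a minimal Python sketch (not part of the original answer) that computes attribute closures, a minimal cover and the candidate keys, and then applies both BCNF definitions. It assumes FDs are modeled as (left, right) pairs of frozensets and uses the two dependencies listed in the question; the function names are hypothetical. On the raw list, the superkey definition accepts the relation while the candidate-key definition rejects it; on the minimal cover, the two definitions agree.

```python
from itertools import combinations

# The relation and the FDs exactly as listed in the question.
R = frozenset({"supervisor", "researchTopic", "consultationDay", "student"})
FDS = [
    (frozenset({"student"}),
     frozenset({"researchTopic", "consultationDay", "supervisor"})),
    (frozenset({"supervisor", "researchTopic", "student"}),
     frozenset({"consultationDay"})),
]

def closure(attrs, fds):
    """Attribute closure X+ under fds (repeat until nothing changes)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def minimal_cover(fds):
    # 1. Single attribute on each right-hand side.
    split = [(lhs, frozenset({a})) for lhs, rhs in fds for a in rhs]
    # 2. Drop superfluous left-hand attributes.
    reduced = []
    for lhs, rhs in split:
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, split):
                lhs = lhs - {a}
        reduced.append((lhs, rhs))
    reduced = list(dict.fromkeys(reduced))  # remove exact duplicates
    # 3. Drop dependencies implied by the remaining ones.
    cover = list(reduced)
    for fd in reduced:
        rest = [f for f in cover if f != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            c = frozenset(combo)
            if not any(k <= c for k in keys) and closure(c, fds) == attrs:
                keys.append(c)
    return keys

def bcnf_superkey(attrs, fds):
    """Wikipedia style: each FD is trivial or has a superkey on the left."""
    return all(rhs <= lhs or closure(lhs, fds) == attrs for lhs, rhs in fds)

def bcnf_candidate_key(attrs, fds):
    """Lecture-notes style: every determinant is a candidate key."""
    keys = candidate_keys(attrs, fds)
    return all(lhs in keys for lhs, _ in fds)

cover = minimal_cover(FDS)
print(bcnf_superkey(R, FDS), bcnf_candidate_key(R, FDS))      # True False
print(bcnf_superkey(R, cover), bcnf_candidate_key(R, cover))  # True True
```

The minimal cover reduces the second dependency to student → consultationDay, which duplicates part of the first, so only dependencies with student on the left remain.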
As for the functional dependencies of your example, given the specification of the problem, it is not clear to me whether a student selects a research topic and can then go to any supervisor for that topic to discuss it, or is also assigned to a specific supervisor. Of course, the dependencies are different in the two cases.

students are to select one research topic
student -> research-topic
Because each student has only one research topic, and these are the only two attributes in this relation, we know student is unique, and thus a candidate key.
Students are allowed to select the same research topic.
That tells us research topic is not unique in the same relation, and cannot be a candidate key.
For each research topics, supervisors are assigned to supervise it.
research-topic -> supervisor
A supervisor may be supervising up to two different research topics
So supervisor is not unique in this relation, and cannot be a candidate key.
each research topic may be assigned to different supervisors.
OK, 1st revision:
research-topic, supervisor -> {}
Each researcher may have more than one topic, and each topic more than one researcher.
For each of the research topic a supervisor supervises, a consultation day is allocated for the student
2nd revision:
research-topic, supervisor, student -> consultation-day
This is a bit messy, perhaps intentionally so, to create a problem to solve. Since each student has only one research topic, "For each of the research topic" is a red herring. We can equally well say:
3rd revision:
research-topic, supervisor -> {}
and
supervisor, student -> consultation-day
It's unnecessary to put topic in the key of the 3rd relation because when the student meets with the supervisor, it will be on the student's only topic. If a student could have more than one topic, we'd have to add that to the relation, to know what's on the agenda on consultation-day.
to meet and discuss with the supervisor.
Call these three relations student, supervisor, and consultation. I leave it to you to write a join to produce {student, topic, supervisor, day}, and show that the natural join of student with supervisor produces only 1 row.
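As a starting point for that exercise, here is a minimal Python sketch that represents each relation as a list of dicts and implements the natural join by matching rows on the attribute names they share. The sample rows are invented for illustration, and the names STUDENT, SUPERVISOR, CONSULTATION, topic and day are assumptions based on the three revisions above.

```python
# Hypothetical sample data for the three relations of the 3rd revision.
STUDENT = [  # student -> topic
    {"student": "ann", "topic": "databases"},
    {"student": "bob", "topic": "compilers"},
]
SUPERVISOR = [  # all-key: (topic, supervisor)
    {"topic": "databases", "supervisor": "smith"},
    {"topic": "compilers", "supervisor": "jones"},
    {"topic": "compilers", "supervisor": "smith"},
]
CONSULTATION = [  # (supervisor, student) -> day
    {"supervisor": "smith", "student": "ann", "day": "Mon"},
    {"supervisor": "jones", "student": "bob", "day": "Tue"},
]

def natural_join(left, right):
    """Natural join of two relations represented as lists of dicts:
    pair rows that agree on every attribute name they share."""
    result = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                joined = {**l, **r}
                if joined not in result:  # relations are sets: no duplicates
                    result.append(joined)
    return result

view = natural_join(natural_join(STUDENT, SUPERVISOR), CONSULTATION)
for row in view:
    print(row)
```

Only student/supervisor pairs that actually have a consultation row survive the second join, so the result carries exactly the attributes {student, topic, supervisor, day}.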
All I have done is express the stated requirements as dependencies. Every dependency is minimally captured. That, in essence, is BCNF.
Your student table is not BCNF. Nowhere is it stated that students choose or are assigned a supervisor.


4th Normal Form of table not met

Below is a graph of a database to be used to manage university student enrolment and grades across multiple years. Below are the listed requirements for the database
Students must be able to be a part of a class
A class must teach a subject
Each class may have 0 or more courseworks
Each class will have one exam
Each class can be taught by more than one lecturer
Coursework can only be set by one lecturer
Coursework and exams can be marked by different staff than who set them, and the staff member marking it must be able to be identified and recorded.
It is necessary to specify whether an exam taken is being taken for the first time or is a resit
I think the database is now in 4th normal form, and is represented in the table below.
The key represents the primary key for that table, and a green arrow means it is a foreign key.
Can anyone spot any errors or suggest ways to improve it?
Not enough information here to tell whether you are satisfying any Normal Form or not. We can only guess at some dependencies.
For example, "Each class will have one exam" seems to be saying that class→exam. Your Exam table on the other hand satisfies the dependency examID→classID, which is not one of your requirements. I can't tell from your diagram if classID is a candidate key in the Exam table. It also looks like examTaken would not be in 4NF if the classID→examID is one of the dependencies to be satisfied.
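One way to probe the examID→classID versus class→exam point is to test candidate dependencies against sample rows. The Python sketch below is hypothetical (the rows and the fd_holds helper are invented for illustration): it reports whether any two rows agree on the left-hand attributes but differ on the right-hand ones. A sample instance can only refute a dependency, never prove it; the point is that a table keyed on examID alone will accept two exams for one class unless classID is also declared a key.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs.

    Checks one instance only: True means "not refuted here", not "holds".
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical Exam rows that a table keyed only on examID would accept.
EXAM = [
    {"examID": 1, "classID": "C1"},
    {"examID": 2, "classID": "C1"},  # a second exam for class C1
    {"examID": 3, "classID": "C2"},
]

print(fd_holds(EXAM, ["examID"], ["classID"]))  # True
print(fd_holds(EXAM, ["classID"], ["examID"]))  # False: C1 has two exams
```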
From a practical data modelling point of view 4NF is not very important. 5NF is more important. Is this homework? If so I'd suggest you write down the attributes and dependencies before you start drawing a diagram. You seem to have created far more attributes than are suggested by the statement of requirements.
Obviously the cardinality between coursework and courseworktaken cannot be 1:1.
(Why are some lines dotted and others not ?)

Bridging entity from subtype entity ERD design

I have two ERD examples involving subtypes. I cannot seem to find any definitive information online or in textbooks on connecting other entities to subtypes and how far you can inherit keys from subtypes, if at all. Those with good eyes may notice that I recently asked a similar question regarding subtypes, but it was for a different scenario and so far I only received a referral to another question that only explains the basics of subtypes which I do not need - I feel this is a more advanced topic to solve.
My specific issue is that I need to know whether the bridging entity called ENROLMENT is allowed to inherit the PK/FK from the STUDENT entity, a subtype of PATRON. If so, are the PatronNumber and/or StudentNumber attributes allowed?
The two ERD examples are slightly different. Version 1 uses PatronNumber from the Subtype Student. Version 2 includes another PK called StudentNumber. Is this ok to add as a PK and can ENROLMENT reference from this? Which is better, if any?
Cheers!
The first version is to be preferred, for the reason that with a single value, PatronNumber, you can obtain all the information about the student with a single join, while in the second case you need to perform two joins.
Imagine, for instance, that you need to know the names of all the students that are enrolled in course number 3: in the first case you can simply perform a join between Enrollment and Patron, while in the second case you need a join between Enrollment and Student and then between Student and Patron.
If your application requires explicitly a StudentNumber different from PatronNumber, you can simply add the attribute to the Student, and declare it unique.
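A minimal sketch of the first version in SQLite, driven from Python's standard sqlite3 module, may make the single-join argument concrete. The column Name, the column CourseNumber and the sample rows are assumptions for illustration: Student reuses PatronNumber as its primary key (and as a foreign key to Patron), StudentNumber is only declared UNIQUE, and Enrollment references Student through PatronNumber.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Patron (
    PatronNumber INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL
);
-- Version 1: the subtype reuses the supertype's key.
-- StudentNumber is optional and only declared UNIQUE (an alternate key).
CREATE TABLE Student (
    PatronNumber  INTEGER PRIMARY KEY REFERENCES Patron(PatronNumber),
    StudentNumber TEXT UNIQUE
);
CREATE TABLE Enrollment (
    PatronNumber INTEGER NOT NULL REFERENCES Student(PatronNumber),
    CourseNumber INTEGER NOT NULL,
    PRIMARY KEY (PatronNumber, CourseNumber)
);
""")
conn.executemany("INSERT INTO Patron VALUES (?, ?)",
                 [(1, "Ada"), (2, "Ben"), (3, "Cy")])
conn.executemany("INSERT INTO Student VALUES (?, ?)",
                 [(1, "S-001"), (2, "S-002")])
conn.executemany("INSERT INTO Enrollment VALUES (?, ?)",
                 [(1, 3), (2, 5)])

# Names of the students enrolled in course 3: a single join, Enrollment -> Patron.
names = [name for (name,) in conn.execute("""
    SELECT p.Name
    FROM Enrollment AS e
    JOIN Patron AS p ON p.PatronNumber = e.PatronNumber
    WHERE e.CourseNumber = 3
""")]
print(names)  # ['Ada']
```

Because Enrollment.PatronNumber must exist in Student, only students can enrol, yet the name lookup still needs just one join to Patron.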

ER Model (Chen notation)

My assignment is to draw an ER model (by hand) using Chen notation using the specifications below:
http://i57.tinypic.com/73ff2f.png
If you have questions about these specs, I'll play the role of the client who will resolve them.
The database will serve a university.
Students have id's, names and gpa's. They must have exactly one major, but they could have minors as well. Each major or minor is a department which has a unique name and a phone number. For each student with a minor, we record the date she signed up for it. Faculty members are associated with a unique department and have id's, names and office locations. Each internship is held by a particular student at a particular company and is supervised by a particular faculty member. We also keep track of the last term in which that student registered under that advisor for an internship at that company. Students may have many internships over time. A given faculty member may supervise many students at a given company, and she may supervise a given student at several companies. However, for a given student and company, there can be only one faculty advisor.
Students, Departments, Faculty and Companies should be your entity types. Internship should be a ternary relationship type. The specs should also lead you to some binary relationship types. Don't add any ingredients to this mix other than what appear in the specs.
Below is my work:
http://i60.tinypic.com/28rf7tf.jpg
Can anyone please help as I really need a better understanding of this (my professor is AWFUL at explaining this).
You missed (per your assignment's last paragraph) a department entity type. (Box.)
You missed 'Faculty members are associated with a unique department'. That's a relationship between those two entity types. (Diamond with lines to those boxes.)
You could have those major and minor entity types that are 1:1 with departments. (Your present boxes, each with a line to its own diamond, each with a line to department.) But (per your assignment's last paragraph not listing them as entities) you could have major being a relationship 'student [s] has a major in department [d]' and similarly for minor. (Lines from student to each of two diamonds, each with a line to department.) But the assignment actually says 'each major or minor is a department', so that's major as 'student [s] has major department [d]' and similarly for minor. (Same picture.)
Per your assignment's last paragraph you should make internship a ternary relationship. (Under Chen it's a relationship diamond (possibly with its own properties) formed by 3 lines to entity type rectangles (possibly with their own properties) rather than an entity box.) However, it's not clear exactly when your assignment considers that an internship holds. (It tells us what relationships hold; it's just not clear which one it wants to call interning.) (Although we can look for interpretations consistent with it being ternary.) One is 'student [s] interns at company [c] supervised by faculty member [f]'. But since 'for a given student and company, there can be only one faculty advisor' that notion of internship is more simply characterized by a binary relationship 'student [s] interns at company [c]'. But then you still need a relationship 'faculty member [f] advises student [s] at a company [c]'. So I will suggest that your assignment expects the former. We can add property term. (This is more reasonably called a relationship on student, company, faculty member and date; but E-RM considers relationships to be on entities. Although it all depends on your class's method's particulars.)
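If it helps to see what the 'one advisor per student and company' rule means once a diagram is mapped to tables, here is a hypothetical Python/SQLite sketch of the ternary reading 'student [s] interns at company [c] supervised by faculty member [f]', with the term as a property. The table names, column names and sample rows are assumptions, and a relational mapping is not itself a Chen diagram: the constraint shows up as the key (student_id, company), so a faculty member can supervise the same student at several companies, but a second advisor for the same student and company is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE Company (name TEXT PRIMARY KEY);
CREATE TABLE Faculty (id INTEGER PRIMARY KEY, name TEXT, office TEXT);
-- The ternary Internship relationship. Because a student and a company
-- determine the faculty advisor, (student_id, company) is the key.
CREATE TABLE Internship (
    student_id INTEGER NOT NULL REFERENCES Student(id),
    company    TEXT    NOT NULL REFERENCES Company(name),
    faculty_id INTEGER NOT NULL REFERENCES Faculty(id),
    last_term  TEXT,
    PRIMARY KEY (student_id, company)
);
INSERT INTO Student VALUES (1, 'Ann', 3.6);
INSERT INTO Company VALUES ('Acme'), ('Globex');
INSERT INTO Faculty VALUES (10, 'Lee', 'B-201'), (11, 'Kim', 'B-305');
""")

def add_internship(student_id, company, faculty_id, last_term):
    """Try to record an internship; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Internship VALUES (?, ?, ?, ?)",
                     (student_id, company, faculty_id, last_term))
        return True
    except sqlite3.IntegrityError:
        return False

first = add_internship(1, "Acme", 10, "Fall")       # accepted
other_co = add_internship(1, "Globex", 10, "Spring")  # same advisor, other company
second_adv = add_internship(1, "Acme", 11, "Spring")  # rejected: 2nd advisor at Acme
print(first, other_co, second_adv)  # True True False
```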
(The possibility of multiple reasonable variations is why you should propose a particular design fully handling a particular specification in a SO question.)
A problem with the E-R Model [sic] is that it introduces needless distinctions between entities, relationships and properties. There is really no distinction between a relationship instance and an entity. E.g.: here we could just as well have an internship be, per above, an entity in a 4-way relationship plus property. E.g.: your assignment says 'each major or minor is a department'. But a major or minor isn't a department. A major or minor could be considered a subject, which would be the subject after which a department is named or the subject of the degree offered by a department. Or we could just have relationships in which a department participates but the relationship is about that department's subject or name or degree being a major or minor.
(If an internship as relationship participated in its own relationships I don't know how your instructor's particular method would keep the further lines organized. Some methods add internship entities (box) 1:1 with relationships (diamond); then some methods specially associate the entity type with the relationship as a reification while some make the relationship 4-way by including the reified entity type. Eg 'internship [i] is student [s] at company [c] and ...'.)
(Correctly speaking there are entity types vs relationships and entities vs relationship instances. But the assignment talks of relationship "types".)
Re E-RM see this answer and this one. Also the E-RM wiki page section 'Entity–relationship modeling'. (Which correctly mentions misinterpretations of Chen's E-RM & E-RDs by some related modeling and diagramming methods and tools and even some presentations of E-RM itself. But the 'Overview' is nonsense.)
Re E-RM problems see this.

When designing the ER diagram for a database?

When we say each department is managed by an employee, does that imply that each department must be managed by an employee, and hence a total participation constraint?
Does that imply that each department must be managed by an employee and hence a total participation constraint?
Yes; in other words, it's a one-to-one relationship.
In my observation (based on question body and comments):
The relation is one-to-many, showing that an employee can be the manager of many departments.
None of the predicates shows a one-to-one relationship, since no predicate says that an employee can be the manager of only one department.
The difference (whether there is any difference at all is opinion-based, as the comments on this answer show):
Each department must be managed by an employee
Emphasizes a mandatory one-to-many relationship (is-managed-by)
Each department is managed by an employee
Emphasizes an optional one-to-many relationship.
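In relational terms, this distinction is commonly expressed through the nullability of the foreign key. The Python/SQLite sketch below is an illustration under assumptions (the Employee and Department tables, their columns and the sample rows are invented): a NOT NULL manager_id models total participation of Department, a nullable one models the optional reading, and the absence of a UNIQUE constraint on manager_id lets one employee manage several departments. Adding UNIQUE to manager_id would instead restrict it to one-to-one.

```python
import sqlite3

def make_db(mandatory):
    """Build Employee/Department; manager_id is NOT NULL only if mandatory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    not_null = "NOT NULL" if mandatory else ""
    conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(f"""
        CREATE TABLE Department (
            id         INTEGER PRIMARY KEY,
            manager_id INTEGER {not_null} REFERENCES Employee(id)
        )""")
    conn.execute("INSERT INTO Employee VALUES (1, 'Eve')")
    return conn

def accepts(conn, dept_id, manager_id):
    """True if the Department row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Department VALUES (?, ?)", (dept_id, manager_id))
        return True
    except sqlite3.IntegrityError:
        return False

mandatory, optional = make_db(True), make_db(False)
no_manager_mandatory = accepts(mandatory, 10, None)  # rejected: manager required
no_manager_optional = accepts(optional, 10, None)    # accepted: manager optional
same_manager_twice = (accepts(mandatory, 11, 1), accepts(mandatory, 12, 1))
print(no_manager_mandatory, no_manager_optional, same_manager_twice)
# False True (True, True)
```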
Hint:
Documenting data integrity constraints is most widely done using natural language, which often produces a quick dive into ambiguity. If you use plain English to express data integrity constraints, you'll inevitably hit the problem of how the English sentence maps, unambiguously, into the table structures. Different programmers (and users alike) will interpret such sentences differently, because they all try to convert these into something that will map into the database design. Programmers then code their perception of the constraint (not necessarily the specifier's).
A more formal approach is to use logic and set theory.

Normalization of a table (BCNF)

I'm trying to understand how to normalize a database, and one of the exercise given by our teacher was to normalize in BCNF this table:
Flight(**CityDeparture,CityArrival,Day**,NationDeparture,NationArrival)
where (CityDeparture,CityArrival,Day) is the primary key.
So I assumed that:
1) The city name is unique independently of the nation (no two nations can have a city with the same name, even if that is not true in reality); otherwise the primary key would be wrong.
2) The functional dependencies are
CityDeparture->NationDeparture
CityArrival->NationArrival
Meaning the table was not even in 2NF, so I decomposed it like so:
Flight(CityDeparture,CityArrival,Day)
there are no non-trivial FDs, so it is in BCNF, right?
CityD(**CityDeparture**,NationDeparture) CityDeparture->NationDeparture
is in BCNF because CityDeparture is a key
CityA(**CityArrival**,NationArrival) CityArrival->NationArrival
is in BCNF because CityArrival is a key.
I also considered the fact that CityA and CityD could be identical (unless every city has a different code for departure and arrival, e.g. New York has code 'AAA' if a flight leaves from there and code 'BBB' if a flight lands there), so one could just have a single City(Name,Nation) table that both CityDeparture and CityArrival would reference.
The decomposition should also be lossless because City.Name is a common attribute of both tables and is a key for City (I'm quite unsure about this).
When I showed this to my teacher, they just scored it 0 and told me to go read the book without further explanation. I have now read the book, and the articles I found linked around here, but I'm honestly clueless, so I'm asking for your advice! Any help would be appreciated.
1) The city name is unique independently of the nation (no two nations can have a city with the same name, even if that is not true in reality); otherwise the primary key would be wrong.
On the one hand, your reasoning here is correct. On the other hand, many (most?) textbook normalization exercises don't include keys at all. You're usually expected to derive all possible keys from the dependencies. Maybe your teacher expects you to ignore the existing key.
Another possibility is that your teacher wanted you to include the FD {CityDeparture, CityArrival, Day} -> {NationDeparture, NationArrival}.
Another possibility is that your teacher wanted you to explore the dependencies within the primary key. Are there any multivalued dependencies?
If your book includes an algorithm that you can do with pencil and paper (most of them do), try working through it that way. See what you get.
Your decomposition of
Flight(CityDeparture,CityArrival,Day,NationDeparture,NationArrival)
into
Flight(CityDeparture,CityArrival,Day)
CityD(CityDeparture,NationDeparture)
CityA(CityArrival,NationArrival)
does indeed give you BCNF.
Regarding the last step, the unification of CityD and CityA: This is not justified by your functional dependencies, and thus incorrect from a formal database perspective. It would be justified by further context knowledge. In practice, it would of course make sense in most settings.
Keep in mind that database normalization is a formal discipline, and so are its algorithms. Substitute artificial names for your relation, e.g., R(A,B,C,D,E) with the same keys and functional dependencies - the result must be same up to renaming.
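Following that advice, here is a minimal Python sketch that runs the exercise under artificial names: A=CityDeparture, B=CityArrival, C=Day, D=NationDeparture, E=NationArrival, with the FDs A→D and B→E assumed as given. It checks BCNF for each fragment by testing every attribute subset against its closure (exponential, but fine for five attributes), and checks losslessness step by step with the binary test: splitting a relation into R1 and R2 is lossless when the shared attributes determine all of R1 or all of R2. The helper names are hypothetical.

```python
from itertools import combinations

# Artificial names: A=CityDeparture, B=CityArrival, C=Day,
# D=NationDeparture, E=NationArrival.
R = frozenset("ABCDE")
F = [(frozenset("A"), frozenset("D")), (frozenset("B"), frozenset("E"))]

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_bcnf(fragment, fds):
    """Exact test: no proper subset X of the fragment determines some,
    but not all, of the fragment's other attributes (projected FDs)."""
    for size in range(1, len(fragment)):
        for combo in combinations(sorted(fragment), size):
            x = frozenset(combo)
            implied = closure(x, fds) & fragment
            if implied != x and implied != fragment:
                return False
    return True

def lossless_binary(r1, r2, fds):
    """A binary split is lossless iff the shared attributes determine r1 or r2."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common

print(is_bcnf(R, F))  # False: A -> D, but A is not a key of R
print([is_bcnf(frozenset(s), F) for s in ("ABC", "AD", "BE")])  # [True, True, True]
# Step 1: R -> AD + ABCE; step 2: ABCE -> ABC + BE.
print(lossless_binary(frozenset("AD"), frozenset("ABCE"), F))  # True
print(lossless_binary(frozenset("ABC"), frozenset("BE"), F))   # True
```

Nothing in A→D and B→E relates the AD and BE fragments to each other, which is the formal reason the merge into a single City table needs extra context knowledge.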
EDIT
This assumes that the primary key and the two functional dependencies CityDeparture->NationDeparture and CityArrival->NationArrival were given as part of the exercise - otherwise see Mike's answer.
