Normalize a simple library system - database

In this question I have to normalize this database but I can't understand all the attributes.
For example, what are Max-Num-Book and why there are Loan-status and loan-period
in the relation "
Max-reserved-book "?
Can any one help :) ?

It would appear from the question that Max-Reserved-Book is where information relating to book reservations is stored.
Reservations in this context would mean intent to loan, i.e. when do you want to loan the book (Date-Reserved) and for how long (Loan-Period).
Max-Num-Book appears to refer to the quantity of copies reserved, as the question states 'a person may borrow more than one book at a time.' It initially seemed like it might be a book/row identifier or a foreign key (as it's similar to the table name), but the ISBN field is probably intended for that use (possibly making a foreign key unnecessary).

Related

4th Normal Form of table not met

Below is a graph of a database to be used to manage university student enrolment and grades across multiple years. Below are the listed requirements for the database
Students must be able to be a part of a class
A class must teach a subject
Each class may have 0 or more courseworks
Each class will have one exam
Each class can be taught by more than one lecturer
Coursework can only be set by one lecturer
Coursework and exams can be marked by different staff than who set them, and the staff member marking it must be able to be identified and recorded.
It is necessary to specify whether an exam taken is being taken for the first time or is a resit
I think the database is now in 4th normal form, and is represented in the table below.
The key represents the primary key for that table, and a green arrow means it is a foreign key.
Can anyone spot any errors or suggest ways to improve it?
Not enough information here to tell whether you are satisfying any Normal Form or not. We can only guess at some dependencies.
For example, "Each class will have one exam" seems to be saying that class→exam. Your Exam table on the other hand satisfies the dependency examID→classID, which is not one of your requirements. I can't tell from your diagram if classID is a candidate key in the Exam table. It also looks like examTaken would not be in 4NF if the classID→examID is one of the dependencies to be satisfied.
From a practical data modelling point of view 4NF is not very important. 5NF is more important. Is this homework? If so I'd suggest you write down the attributes and dependencies before you start drawing a diagram. You seem to have created far more attributes than are suggested by the statement of requirements.
Obviously the cardinality between coursework and courseworktaken cannot be 1:1.
(Why are some lines dotted and others not ?)

ER diagram that implements a database for trainee

I edited and remade the ERD. I have a few more questions.
I included participation constraints(between trainee and tutor), cardinality constraints(M means many), weak entities (double line rectangles), weak relationships(double line diamonds), composed attributes, derived attributes (white space with lines circle), and primary keys.
Questions:
Apparently to reduce redundant attributes I should only keep primary keys and descriptive attributes and the other attributes I will remove for simplicity reasons. Which attributes would be redundant in this case? I am thinking start_date, end_date, phone number, and address but that depends on the entity set right? For example the attribute address would be removed from Trainee because we don't really need it?
For the part: "For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date."
Isn't "periods of employment: start date, end date" a composed attribute? because the dates are shown with the symbol ":" Also I believe I didn't make an attribute for "where they worked" which is location?
Also how is it possible to show previous companies (employers) when we already have an attribute employers and different start date? Because if you look at the Question Information it states start_date for employer twice and the second time it says start_date and end_date.
I labeled many attributes as primary keys but how am I able to distinguish from derived attribute, primary key, and which attribute would be redundant?
Is there a multivalued attribute in this ERD? Would salary and job held be a multivalued attribute because a employer has many salaries and jobs.
I believe I did the participation constraints (there is one) and cardinality constraints correctly. But there are sentences where for example "An instructor teaches at least a course. Each course is taught by only one instructor"; how can I write the cardinality constraint for this when I don't have a relationship between course and instructor?
Do my relationship names make sense because all I see is "has" maybe I am not correctly naming the actions of the relationships? Also I believe schedules depend on the actual entity so they are weak entities.... so does that make course entity set also a weak entity (I did not label it as weak here)?
For the company address I put a composed attribute, street num, street address, city... would that be correct? Also would street num and street address be primary keys?
Also I added the final mark attribute to courses and course_schedule is this in the right entity set? The statement for this attribute is "Each trainee identified by: unique code, social security number, name, address, a unique telephone number, the courses attended and the final mark for each course."
For this part: "We store in the database all classrooms available on the site" do i make a composed attribute that contains site information?
Question Information:
A trainee may be self-employed or employee in a company
Each trainee identified by:
unique code, social security number, name, address, a unique
telephone number, the courses attended and the final mark for each course.
If the trainee is an employee in a company: store the current company (employer), start date.
For each trainee we like to store (if any) also previous companies (employers) where they worked, periods of employment: start date and end date.
If a trainee is self-employed: store the area of expertise, and title.
For a trainee that works for a company: we store the salary and job
For each company (employer): name (unique), the address, a unique telephone number.
We store in the database all known companies in the
city.
We need also to represent the courses that each trainee is attending.
Each course has a unique code and a title.
For each course we have to store: the classrooms, dates, and times (start time, and duration in minutes) the course is held.
A classroom is characterized by a building name and a room number and the maximum places’ number.
A course is given in at least a classroom, and may be scheduled in many classrooms.
We store in the database all classrooms
available on the site.
We store in the database all courses given at least once in the company.
For each instructor we will store: the social security number, name, and birth date.
An instructor teaches at least a course.
Each course is taught by only one instructor.
All the instructors’ telephone numbers must also be stored (each instructor has at least a telephone number).
A trainee can be a tutor for one or many trainees for a specific
period of time (start date and end date).
For a trainee it is not mandatory to be a tutor, but it is mandatory to have a tutor
The attribute ‘Code’ will be your PK because it’s only use seems to be that of a Unique Identifier.
The relationship ‘is’ will work but having a reference to two tables like that can get messy. Also you have the reference to "Employers" in the Trainee table which is not good practice. They should really be combined. See my helpful hints section to see how to clean that up.
Company looks like the complete table of Companies in the area as your details suggest. This would mean table is fairly static and used as a reference in your other tables. This means that the attribute ‘employer’ in Employed would simply be a Foreign Key reference to the PK of a specific company in Company. You should draw a relationship between those two.
It seems as though when an employee is ‘employed’ they are either an Employee of a company or self-employed.
The address field in Company will be a unique address your current city, yes, as the question states the table is a complete list of companies in the city. However because this is a unique attribute you must have specifics like street address because simply adding the city name will mean all companies will have the same address which is forbidden in an unique field.
Some other helpful hints:
Stay away from adding fields with plurals on them to your diagram. When you have a plural field it often means you need a separate table with a Foreign Key reference to that table. For example in your Table Trainee, you have ‘Employers’. That should be a Employer table with a foreign key reference to the Trainee Code attribute. In the Employer Table you can combine the Self-employed and Employed tables so that there is a single reference from Trainee to Employer.
ERD Link http://www.imagesup.net/?di=1014217878605. Here's a quick ERD I created for you. Note the use of linker tables to prevent Many to Many relationships in the table. It's important to note there are several ways to solve this schema problem but this is just as I saw your problem laid out. The design is intended to help with normalization of the db. That is prevent redundant data in the DB. Hope this helps. Let me know if you need more clarification on the design I provided. It should be fairly self explanatory when comparing your design parameters to it.
Follow Up Questions:
If you are looking to reduce attributes that might be arbitrary perhaps phone_number and address may be ones to eliminate, but start and end dates are good for sorting and archival reasons when determining whether an entry is current or a past record.
Yes, periods_of_employment does not need to be stored as you can derive that information with start and end dates. Where they worked I believe is just meant to say previous employers, so no location but instead it’s meant that you should be able to get a list all the employers the trainee has had. You can get that with the current schema if you query the employer table for all records where trainee code equals requested trainee and sort by start date. The reason it states start_date twice is to let you know that for all ‘previous’ employers the record will have a start and end date. Hence the previous. However, for current employers the employment hasn't ended which means there will be no end_date so it will null. That’s what the problem was stating in my opinion.
To keep it simple PK’s are unique values used to reference a record within another table. Redundant values are values that you essentially don’t need in a table because the same value can be derived by querying another table. In this case most of your attributes are fine except for Final_Mark in the Course table. This is redundant because Course_Schedule will store the Final_Mark that was received. The Course table is meant to simply hold a list of all potential courses to be referenced by Course_Schedule.
There is no multivalued attributes in this design because that is bad practice Job and salary are singular and if and job or salary changes you would add a new record to the employer table not add to that column. Multivalued attributes make querying a db difficult and I would advise against it. That’s why I mentioned earlier to abstract all attributes with plurals into their own tables and use a foreign key reference.
You essentially do have that written here because Course_Schedule is a linker table meaning that it is meant to simplify relationships between tables so you don’t have many to many relationships.
All your relationships look right to me. Also since the schedules are linker tables and cannot exist without the supporting tables you could consider them weak entities. Course in this schema is a defined list of all courses available so can be independent of any other table. This by definition is not a weak entity. When creating this db you’d probably fill in the course table and it probably wouldn’t change after that, except rarely when adding or removing an available course option.
Yes, you can make address a composite attribute, and that would be right in your diagram. To be clear with your use of Primary key, just because an attribute is unique doesn’t make it a primary key. A table can have one and only one primary key so you must pick a column that you are certain will not be repeated. In this example you may think street number might be unique but what if one company leaves an address and another company moves into that spot. That would break that tables primary key. Typically a company name is licensed in a city or state so cannot be repeated. That would be a better choice for your primary key. You can also make composite primary keys, but that is a more advanced topic that I would recommend reading about at a later date.
Take final_mark out of courses. That’s table will contain rows of only courses, those courses won’t be linked to any trainee except by course_schedule table. The Final_Mark should only be in that table. If you add final_mark to Course table then, if you have 10 trainees in a course, You’d have 10 duplicate rows in the course table with only differing final_marks. Instead only hold the course_code and title that way you can assign different instructors, trainees and classrooms using the linker tables.
No composite attribute is needed using this schema. You have a Classroom table that will hold all available classrooms and their relevant information. You then use the Classroom_Schedule linker table to assign a given Classroom to a Course_Schedule. No attributes of Classroom can be broken down to simpler attributes.

When desiging the ER diagram for database?

When we say each department is managed by an employee , Does that imply that each department must be managed by an employee and hence a total participation constraint ?
Does that imply that each department must be managed by an employee
and hence a total participation constraint ?
Yes in other words it's a one to one relationship
In my observation (based on question body and comments):
The relation is one-to-many, showing that an employee can be the manager of many departments.
None of the predicates shows on-to-one relation, since there is no peripatetic saying that an employee can be manger of one department.
The difference: (it is opinion base to decide if there is any difference as comments of this answer shows)
Each department must be managed by an employee
Emphasis a mandatory one-to-many relation (is-managed-by)
Each department is managed by an employee
Emphasis an optional one-to-many relation.
Hint:
Documenting data integrity constraints is most widely done using natural language, which often produces a quick dive into ambiguity. If you use plain English to express
data integrity constraints, you’ll inevitably hit the problem of how the English sentence maps,
unambiguously, into the table structures.Different programmers (and users alike) will interpret such sentences differently, because they all try to convert these into something that will
map into the database design. Programmers then code their perception of the constraint (not
necessarily the specifier’s).
A formal manner will be using the logic and set theory.

Normalization of a table (BCNF)

I'm trying to understand how to normalize a database, and one of the exercise given by our teacher was to normalize in BCNF this table:
Flight(**CityDeparture,CityArrival,Day**,NationDeparture,NationArrival)
where (CityDeparture,CityArrival,Day) is the primary key.
So I assumed that:
1)The city name is unique independently from the nation (there can not be two nation with the same city, even if that is not true in reality), otherwise the primary key would be wrong.
2)The functional depencies are
CityDeparture->NationDeparture
CityArrival->NationArrival
Meaning the table was not even in 2NF, so I decomposed it like so:
Flight(CityDeparture,CityArrival,Day)
there are no non-banal FD so it is in BNCF, right?
CityD(**CityDeparture**,NationDeparture) CityDeparture->NationDeparture
is in BNCF because CityDeparture is key
CityA(**CityArrival**,NationArrival) CityArrival->NationArrival
is in BNCF because CityArrival is key.
I also considered the fact that CityA and CityD could be identical unless every city has a different code of departure/arrival(i.e. NewYork has code 'AAA' if a flight leaves from there and code 'BBB' if a flight lands there) so one could just have a single City(Name,Nation) table and both CityDeparture,CityArrival would reference it.
The decomposition should also be lossless because City.Name is a common attribute for both tables and is key for City (I'm quite unsure about this)
When I showed this to my teacher it just scored 0 and told me to go read the book without further explanation. Now I did read the book, and the articles I found linked around here but I'm honestly clueless, so I'm asking for your advice! Any help would be appreciated
1)The city name is unique independently from the nation (there can not be two nation with the same city, even if that is not true in reality), otherwise the primary key would be wrong.
On the one hand, your reasoning here is correct. On the other hand, many (most?) textbook normalization exercises don't include keys at all. You're usually expected to derive all possible keys from the dependencies. Maybe your teacher expects you to ignore the existing key.
Another possibility is that your teacher wanted you to include the FD {CityDeparture, CityArrival, Day} -> {NationDeparture, NationArrival}.
Another possibility is that your teacher wanted you to explore the dependencies within the primary key. Are there any multi-value dependencies?
If your book includes an algorithm that you can do with pencil and paper--most of them do--try working through it that way. See what you get.
Your decomposition of
Flight(CityDeparture,CityArrival,Day,NationDeparture,NationArrival)
into
Flight(CityDeparture,CityArrival,Day)
CityD(CityDeparture,NationDeparture)
CityA(CityArrival,NationArrival)
gives you indeed BCNF.
Regarding the last step, the unification of CityD and CityA: This is not justified by your functional dependencies, and thus incorrect from a formal database perspective. It would be justified by further context knowledge. In practice, it would of course make sense in most settings.
Keep in mind that database normalization is a formal discipline, and so are its algorithms. Substitute artificial names for your relation, e.g., R(A,B,C,D,E) with the same keys and functional dependencies - the result must be same up to renaming.
EDIT
This assumes that the primary key and the two functional dependencies CityDeparture->NationDeparture and CityArrival->NationArrival were given as part of the exercise - otherwise see Mike's answer.

What is a good KISS description of Boyce-Codd normal form?

What is a KISS (Keep it Simple, Stupid) way to remember what Boyce-Codd normal form is and how to take a unnormalized table and BCNF it?
Wikipedia's info: not terribly helpful for me.
Chris Date's definition is actually quite good, so long as you understand what he means:
Each attribute
Your data must be broken into separate, distinct attributes/columns/values which do not depend on any other attributes. Your full name is an attribute. Your birthdate is an attribute. Your age is not an attribute, it depends on the current date which is not part of your birthdate.
must represent a fact
Each attribute is a single fact, not a collection of facts. Changing one bit in an attribute changes the whole meaning. Your birthdate is a fact. Is your full name a fact? Well, in some cases it is, because if you change your surname your full name is different, right? But to a genealogist you have a surname and a family name, and if you change your surname your family name does not change, so they are separate facts.
about the key,
One attribute is special, it's a key. The key is an attribute that must be unique for all information in your data and must never change. Your full name is not a key because it can change. Your Social Insurance Number is not a key because they get reused. Your SSN plus birthdate is not a key, even if the combination can never be reused, because an attribute cannot be a combination of two facts. A GUID is a key. A number you increment and never reuse is a key.
the whole key,
The key alone must be sufficient [and necessary!] to identify your values; you cannot have the same data represented by different keys, nor can a subset of the key columns be sufficient to identify the fact.
Suppose you had an address book with a GUID key, name and address values. It is OK to have the same name appearing twice with different keys if they represent different people and are not the "same data".
If Mary Jones in accounting changes her name to Mary Smith, Mary Jones in Sales does not change her name as well.
On the other hand, if Mary Smith and John Smith have the same street address and it really is the same place, this is not allowed. You have to create a new key/value pair with the street address and a new key.
You are also not allowed to use the key for this new single street address as a value in the address book since now the same street address key would be represented twice.
Instead, you have to make a third key/value pair with values of the address book key and the street address key; you find a person's street address by matching their book key and address key in this group of values.
and nothing but the key
There must be nothing other than the key that identifies your values. For example, if you are allowed an address of "The Taj Mahal" (assuming there is only one) you are not allowed a city value in the same record,
since if you know the address you would also know the city. This would also open up the possibility of there being more than one Taj Mahal in a different city.
Instead, you have to again create a secondary Location key with unique values like the Taj, the White House in DC, and so on, and their cities.
Or forbid "addresses" that are unique to a city.
So help me, Codd.
Here are some helpful excerpts from the Wikipedia page on Third Normal Form:
Bill Kent defines Third Normal Form this way:
Each non-key attribute "must provide
a fact about the key, the whole key,
and nothing but the key."
Requiring that non-key attributes be
dependent on "the whole key" ensures
that a table is in 2NF; further
requiring that non-key attributes be
dependent on "nothing but the key"
ensures that the table is in 3NF.
Chris Date adapts Kent's mnemonic to define Boyce-Codd Normal Form:
"Each attribute must represent a fact
about the key, the whole key, and
nothing but the key." Here the
requirement is concerned with every
attribute in the table, not just
non-key attributes.
This comes into play when a table has multiple compound candidate keys, and an attribute within one candidate keys has a dependency on a part of another candidate key. Third Normal Form wouldn't prohibit this, because it excludes key attributes. But BCNF applies the rule to key attributes as well.
As for how to make a table satisfy BCNF, you need to represent the extra dependency, with another attribute and possibly by splitting attributes into another table.
I googled "boyce codd normal form" and after wikipedia this is the second result. My textbook gives a very simple definition in terms of relational database management systems:
The left side of every nontrivial FD must be a superkey.
-"Database Systems The Complete Book" by Garcia-Molina, Ullman and Widom.
The best informal answer I've read is that, in BCNF, every "arrow" in every functional dependency is an "arrow" out of a candidate key. I don't recall the source, but it was probably something Chris Date wrote.
Basically Boyce-Codd is "fifth normal form". It is visually recognizable by the existance of "Attributive entities" in the data model, for things like Types (e.g. roles, status, process state, location-type, phone-type, etc).
The attributive entities (sub-subtypes) are lists of finite sets of values that further categorize a class level entity. So you may have a phone-type ('mobile', ' desk', 'VOIP') email account type ('business', 'personal', 'gaming'), role (project manager, data modeler, super model) etc.
Another morphological clue is the existance of super-types, (aka. master-classes, super-classes, meta-entities) such as Parties (subtypes being company, person, etc.).
It's basically Taxonomy gone wild (..no the video is not that exciting) to the atomic or leaf-level; see Bill Karwin's comment above for a more technical explanation.
Boyce-Codd level models are essentially highly detailed logical models, derived from more simplistic business-based conceptual models. **They are typically NOT implemented ver batim in the PHYSICAL model, because PDM optimization for performance (or functional simplicity) may result in the super-types and attributive entities being managed as drop-down lists in UIs, or in behind the scenes logic in the application, or in database constraints and methods to enforce referential integrity. (i.e. they may end up as look-up tables in the PDM schema, or they may be handled by code and not represented in the database).
So - why do them if they may not end up in the PDM? For the same reason you build a good 3NF model before you 'optimize', so that the database structure reflects the real world and is hence more stable than the typical kludges we inherit and have to do heroic acts to make work as our business/clients requirements change.
Often times it is easiest to listen to your gut and this will come naturally. Generally speaking, if you meet 3NF you have met BCNF. This doesn't cover detailed analysis of an ERD or have examples but there are thirteen rules according to Codd. I find it best to follow these rules but always remember there is no one correct way to do things so follow them loosely. So regarding the RDBMS, here are the rules:
http://www.87android.com/12-rules-of-relational-database-model-by-codd/
This may not answer the question directly, but if you are asking about how to get to BCNF or an easy way to remember it then you don't understand normalization well enough. That is of no concern though. Relational databases take many forms and very few are done well. The best thing you can do is know what it means to be relational, follow the rules above, and do not worry about the level of normalization. The process of normalization eliminates the duplication of data. Each level more so by moving into migration of functional dependencies. Keep that in mind and you will be fine, your gut and intellect will do the rest.

Resources