Entity Relationship Diagram. How does the IS A relationship translate into tables?

Entity Relationship Diagram. How does the IS A relationship translate into tables? - database

I was simply wondering, how an ISA relationship in an ER diagram would translate into tables in a database.
Would there be 3 tables? One for person, one for student, and one for Teacher?
Or would there be 2 tables? One for student, and one for teacher, with each entity having the attributes of person + their own?
Or would there be one table with all 4 attributes and some of the squares in the table being null depending on whether it was a student or teacher in the row?
NOTE: I forgot to add this, but there is full coverage for the ISA relationship, so a person must be either a studen or a teacher.

Assuming the relationship is mandatory (as you said, a person has to be a student or a teacher) and disjoint (a person is either a student or a teacher, but not both), the best solution is with 2 tables, one for students and one for teachers.
If the participation is instead optional (which is not your case, but let's put it for completeness), then the 3 tables option is the way to go, with a Person(PersonID, Name) table and then the two other tables which will reference the Person table, e.g.
Student(PersonID, GPA), with PersonID being PK and FK referencing Person(PersonID).
The 1 table option is probably not the best way here, and it will produce several records with null values (if a person is a student, the teacher-only attributes will be null and vice-versa).
If the disjointness is different, then it's a different story.

there are 4 options you can use to map this into an ER,
option 1
Person(SIN,Name)
Student(SIN,GPA)
Teacher(SIN,Salary)
option 2 Since this is a covering relationship, option 2 is not a good match.
Student(SIN,Name,GPA)
Teacher(SIN,Name,Salary)
option 3
Person(SIN,Name,GPA,Salary,Person_Type)
person type can be student/teacher
option 4
Person(SIN,Name,GPA,Salary,Student,Teacher) Student and Teacher are bool type fields, it can be yes or no,a good option for overlapping
Since the sub classes don't have much attributes, option 3 and option 4 are better to map this into an ER

This answer could have been a comment but I am putting it up here for the visibility.
I would like to address a few things that the chosen answer failed to address - and maybe elaborate a little on the consequences of the "two table" design.
The design of your database depends on the scope of your application and the type of relations and queries you want to perform. For example, if you have two types of users (student and teacher) and you have a lot of general relations that all users can part take, regardless of their type, then the two table design may end up with a lot of "duplicated" relations (like users can subscribe to different newsletters, instead of having one M2M relationship table between "users" and newsletters, you'll need two separate tables to represent that relation). This issue worsens if you have three different types of users instead of two, or if you have an extra layer of IsA in your hierarchy (part-time vs full-time students).
Another issue to consider - the types of constraints you want to implement. If your users have emails and you want to maintain a user-wide unique constraint on emails, then the implementation is trickier for a two-table design - you'll need to add an extra table for every unique constraint.
Another issue to consider is just duplications, generally. If you want to add a new common field to users, you'll need to do it multiple times. If you have unique constraints on that common field, you'll need a new table for that unique constraint too.
All of this is not to say that the two table design isn't the right solution. Depending on the type of relations, queries and features you are building, you may want to pick one design over the other, like is the case for most design decisions.

It depends entirely on the nature of the relationships.
IF the relationship between a Person and a Student is 1 to N (one to many), then the correct way would be to create a foreign key relationship, where the Student has a foreign key back to the Person's ID Primary Key Column. Same thing for the Person to Teacher relationship.
However, if the relationship is M to N (many to many), then you would want to create a separate table containing those relationships.
Assuming your ERD uses 1 to N relationships, your table structure ought to look something like this:
CREATE TABLE Person
(
sin bigint,
name text,
PRIMARY KEY (sin)
);
CREATE TABLE Student
(
GPA float,
fk_sin bigint,
FOREIGN KEY (fk_sin) REFERENCES Person(sin)
);
and follow the same example for the Teacher table. This approach will get you to 3rd Normal Form most of the time.

Related

Relational Database: When do we need to add more entities?

We had a discussion today related to W3 lecture case study about how many entities we need for each situation. And I have some confusion as below:
Case 1) An employee is assigned to be a member of a team. A team with more than 5 members will have a team leader. The members of the team elect the team leader. List the entity(s) which you can identify in the above statement? In this cases, if we don't create 2 entities for above requirement, we need to add two more attributes for each employee which can lead to anomaly issues later. Therefore, we need to have 2 entities as below:
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK teamId&employeeId) -> 2 entities
Case 2) The company also introduced a mentoring program, whereby a new employee will be paired with someone who has been in the company longer." How many entity/ies do you need to model the mentoring program?
The Answer from Lecturer is 1. With that, we have to add 2 more attributes for each Employee, mentorRole (Mentor or Mentee) and pairNo (to distinguish between different pairs and to know who mentors whom), doesn't it?
My question is why can't we create a new Entity named MENTORING which will be similar to TEAM in Q1? And why we can only do that if this is a many-many relationship?
EMPLOYEE (PK is employeeId) (0-M)----------------(0-1) TEAM (PK is pairNo&employeeId) -> 2 entities
Thank you in advance

First of all, about terminology: I use entity to mean an individual person, thing or event. You and I are two distinct entities, but since we're both members of StackOverflow, we're part of the same entity set. Entity sets are contrasted with value sets in the ER model, while the relational model has no such distinction.
While you're right about the number of entity sets, there's some issues with your implementation. TEAM's PK shouldn't be teamId, employeeId, it should be only teamId. The EMPLOYEE table should have a teamId foreign key (not part of the PK) to indicate team membership. The employeeId column in the TEAM table could be used to represent the team leader and is dependent on the teamId (since each team can have only one leader at most).
With only one entity set, we would probably represent team membership and leadership as:
EMPLOYEE(employeeId PK, team, leader)
where team is some team name or number which has to be the same for team members, and leader is a true/false column to indicate whether the employee in that row is the leader of his/her team. A problem with this model is that we can't ensure that a team has only one leader.
Again, there's some issues with the implementation. I don't see the need to identify pairs apart from the employees involved, and having a mentorRole (mentor or mentee) indicates that the association will be recorded for both mentor and mentee. This is redundant and creates an opportunity for inconsistency. If the goal was to represent a one-to-one relationship, there are better ways. I suggest a separate table MENTORING(menteeEmployeeId PK, mentorEmployeeId UQ) (or possibly a unique but nullable mentorEmployeeId in the EMPLOYEE table, depending on how your DBMS handles nulls in unique indexes).
The difference between the two cases is that teams can have any number of members and one leader, which is most effectively implemented by identifying teams separately from employees, whereas mentorship is a simpler association that is sufficiently identified by either of the two people involved (provided you consistently use the same role as identifier). You could create a separate entity set for mentoring, with relationships to the employees involved - it might look like my MENTORING table but with an additional surrogate key as PK, but there's no need for the extra identifier.
And why we can only do that if this is a many-many relationship?
What do you mean? Your examples don't contain a many-to-many relationship and we don't create additional entity sets for many-to-many relationships. If you're thinking of so-called "bridge" tables, you've got some concepts mixed up. Entity sets aren't tables. An entity set is a set of values, a table represents a relation over one or more sets of values. In Chen's original method, all relationships were represented in separate tables. It's just that we've gotten used to denormalizing simple one-to-one and one-to-many relationships into the same tables as entity attributes, but we can't do the same for many-to-many binary relationships or ternary and higher relationships in general.

foreign key or null value to link two long distanced tables

Consider the following database example :
A clinichas many articles
The relationship between supplier and article is many to many ( (1,n) - (1,n))
Let's say I have the clinic's id and I want to retrieve all it's suppliers, what's the best way to do it? is it by creating a "null" article for each supplier in article_supplier OR by creating a foreign key in supplier that references the appropriate clinic?
The latter solution may seem the simplest and easiest but what happens when there is a big chain of tables? Do I keep adding a foreign key each time I need a list of something? e.g :
list of medicines a clinic uses
list of prescriptions a doctor gave
...
If it makes a difference, I am using Laravel's Eloquent ORM

A table represents a relationship among values. (Some values identify entities.) A database can't be used until we are told what each table--base or query result--means: what business/application relationship its rows satisfy. (Ie, what is its (characteristic) predicate.) What are the relationships for your tables?
To query we express a relationship in terms of base relationships then express its table in terms of the corresponding base tables. Eg a join returns rows satisfying one relationship and another. So again, we need to know tables' relationships in business/application terms.
Cardinalities & constraints are properties of relationships given what situations can arise. They aren't needed to update or query. They can guide design & are used for integrity.
When, given a clinic, you talk about "its" suppliers, you do not say what you mean. "Has", "its", "for", "references", "appropriate"--all mean nothing--they refer to related entities, but they don't say how they are related in terms of the business/application. This design contains no explicit relationship on clinics & articles. If it did, you've said nothing by which we could put the right rows in or see the rows & know about the situation. Still, you could then derive clinic-supplier rows where the supplier is "for" some article a given clinic "has". But is that the relationship you mean?? Eg if you want pairs where the clinic is allowed to "have" the supplier "for" some articles, that's a new relationship/table that cannot be derived from what you have.
creating a "null" article for each supplier in article_supplier
That relationship/table is a certain combination of article_supplier & the relationship/table just described. But it is simpler to just have those two.
creating a foreign key in supplier that references the appropriate clinic
That would mean that if a supplier "has" more than one clinic then there can be rows in the new version that differ only by all the other columns; normalization theory says that's worse than the original design and a clinic_supplier relationship/table. And it means that if a supplier "has" no clinic you would need something like a nullable FK (foreign key).
So you likely want a clinic_supplier table. But you should post a new question in which you actually say what business/application relationships you are talking about.
Your question & the following are essentially duplicates in that the basic principles/notions to obviously apply to answer them are all the same:
How do I find relations between tables that are long-distance related?
Required to join 2 tables with their FKs in a 3rd table
Best Solution - Ternary or Binary Relationship
You need to read an information modeling & database design textbook.

In your design :
Supplier is a table.
Articles is a table.
Since Supplier supplies articles, their relationship is expressed in terms of a table that links them. So, there is a supplier_article table.
Clinic is a table.
You mention Clinic uses articles.
By the same principle as above, the relationship between Clinic and Article should be expressed in the form of a table. This should be clinic_article table.
Though you haven't mentioned, you must look at other dependencies that arise. For example, a clinic performs procedures and procedures use articles.
by creating a foreign key in supplier that references the appropriate
clinic
This would assume that one supplier can supply only to one clinic forever. Even if this is true today, it won't be true tomorrow, because you might want to plan for multiple clinics in the same location or using the same application or database.

Implementating Generalizations When At The Logical Design Stage Of Database Design?

I am designing a database and as i do not have much experience in this subject, i am faced with a problem which i do not know how to go about solving.
In my conceptual model i have an object known as "Vehicle" which the customer orders and the stock system monitors. This supertype has two subtypes "Motorcar" and "Motorcycle". The user can order one or the other or even both.
Now that i am at the logical design stage, i need to know how i can have the system allow for two different types of products. The problem i have is that if i put each of the objects separate attributes into the same relation, then i will have columns that are of no use to some objects.
For example, if i just have a generic table holding both "Motorcars" and "Motorcycles" which i call "Vehicles" and all of their attributes, the cars will not need some of the motorcycle attributes and the motorcycle will not need all of the car attributes.
Is there a way to solve this issue?

The decision will need to be guided by the amount of shared information. I would start by identifying all the attributes and the rules about them.
If the majority of information is shared, you might not split into multiple tables. On the other hand, you can always split tables and then join into a view for ease of use.
For instance, you might have a vehicle table with only share information, and then a motorcar table with a foreign key to the vehicles table and a motorcycle table with a foreign key to the vehicles table. There is a certain difficulty ensuring that you don't have a motorocar row AND a motorcycle row referring to the same vehicle, and so there are other possibilities to mitigate that - but all that is unnecessary if the majority of information is common, you just have unused columns in a single vehicles table. You can even enforce with constraints to ensure that columns are NULL for types where they should not be filled in.

What's more readable naming conventions for lookup tables?

We always name lookup tables - such as Countries,Cities,Regions ... etc - as below :
EntityName_LK OR LK_EntityName ( Countries_LK OR LK_Countries )
But I ask if any one have more better naming conversions for lookup tables ?
Edit:
We think to make postfix or prefix to solve like a conflict :
if we have User tables and lookup table for UserTypes (ID-Name) and we have a relation many to many between User & UserTypes that make us a table which we can name it like Types_For_User that may make confusion between UserTypes & Types_For_User So we like to make lookup table UserTypes to be like UserTypesLK to be obvious to all

Before you decide you need the "lookup" moniker, you should try to understand why you are designating some tables as "lookups" and not others. Each table should represent an entity unto itself.
What happens when a table that was designated as a "lookup" grows in scope and is no longer considered a "lookup"? You are either left with changing the table name which can be onerous or leaving it as is and having to explain to everyone that a given table isn't really a "lookup".
A common scenario mentioned in the comments related to a junction table. For example, suppose a User can have multiple "Types" which are expressed in a junction table with two foreign keys. Should that table be called User_UserTypes? To this scenario, I would first say that I prefer to use the suffix Member on the junction table. So we would have Users, UserTypes, UserTypeMembers. Secondly, the word "type" in this context is quite generic. Does a UserType really mean a Role? The term you use can make all the difference. If UserTypes are really Roles, then our table names become Users, Roles, RoleMembers which seems quite clear.

Here are two concerns for whether to use a prefix or suffix.
In a sorted list of tables, do you want the LK tables to be together or do you want all tables pertaining to EntityName to appear together
When programming in environments with auto-complete, are you likely to want to type "LK" to get the list of tables or the beginning of EntityName?
I think there are arguments for either, but I would choose to start with EntityName.

Every table can become a lookup table.
Consider that a person is a lookup in an Invoice table.
So in my opinion, tables should just be named the (singular) entity name, e.g. Person, Invoice.
What you do want is a standard for the column names and constraints, such as
FK_Invoice_Person (in table invoice, link to person)
PersonID or Person_ID (column in table invoice, linking to entity Person)
At the end of the day, it is all up to personal preference (if you can get away with dictating it) or team standards.
updated
If you have lookups that pertain only to entities, like Invoice_Terms which is a lookup from a list of 4 scenarios, then you could name it as Invoice_LK_Terms which would make it appear by name grouped under Invoice. Another way is to have a single lookup table for simple single-value lookups, separated by the function (table+column) it is for, e.g.
Lookups
Table | Column | Value

There is only one type of table and I don't believe there is any good reason for calling some tables "lookup" tables. Use a naming convention that works equally for every table.

One area where table naming conventions can help is data migration between environments. We often have to move data in lookup tables (which constrain values which may appear in other tables) along with schema changes, as these allowed value lists change. Currently we don't name lookup tables differently, but we are considering it to prevent the migration guy asking "which tables are lookup tables again?" every time.

Modelling inheritance in a database

For a database assignment I have to model a system for a school. Part of the requirements is to model information for staff, students and parents.
In the UML class diagram I have modelled this as those three classes being subtypes of a person type. This is because they will all require information on, among other things, address data.
My question is: how do I model this in the database (mysql)?
Thoughts so far are as follows:
Create a monolithic person table that contains all the information for each type and will have lots of null values depending on what type is being stored. (I doubt this would go down well with the lecturer unless I argued the case very convincingly).
A person table with three foreign keys which reference the subtypes but two of which will be null - in fact I'm not even sure if that makes sense or is possible?
According to this wikipage about django it's possible to implement the primary key on the subtypes as follows:
"id" integer NOT NULL PRIMARY KEY REFERENCES "supertype" ("id")
Something else I've not thought of...
So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
Links to articles/blog posts or previous questions are more than welcome.
Thanks for your time!
UPDATE
Alright thanks for the answers everyone. I already had a separate address table so that's not an issue.
Cheers,
Adam

4 tables staff, students, parents and person for the generic stuff.
Staff, students and parents have forign keys that each refer back to Person (not the other way around).
Person has field that identifies what the subclass of this person is (i.e. staff, student or parent).
EDIT:
As pointed out by HLGM, addresses should exist in a seperate table, as any person may have multiple addresses. (However - I'm about to disagree with myself - you may wish to deliberately constrain addresses to one per person, limiting the choices for mailing lists etc).

Well I think all approaches are valid and any lecturer who marks down for shoving it in one table (unless the requirements are specific to say you shouldn't) is removing a viable strategy due to their own personal opinion.
I highly recommend that you check out the documentation on NHibernate as this provides different approaches for performing the above. Which I will now attempt to poorly parrot.
Your options:
1) One table with all the data that has a "delimiter" column. This column states what kind of person the person is. This is viable in simple scenarios and (seriously) high performance where the joins will hurt too much
2) Table per class which will lead to duplication of columns but will avoid joins again, so its simple and a lil faster (although only a lil and indexing mitigates this in most scenarios).
3) "Proper" inheritence. The normalised version. You are almost there but your key is in the wrong place IMO. Your Employee table should contain a PersonId so you can then do:
select employee.id, person.name from employee inner join person on employee.personId = person.personId
To get all the names of employees where name is only specified on the person table.

I would go for #3.
Your goal is to impress a lecturer, not a PM or customer. Academics tend to dislike nulls and might (subconciously) penalise you for using the other methods (which rely on nulls.)
And you don't necessarily need that django extension (PRIMARY KEY ... REFERENCES ...) You could use an ordinary FOREIGN KEY for that.

"So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
"
Methods 1 and 3 are good. The differences are mostly in what your use cases are.
1) adaptability -- which is easier to change? Several separate tables with FK relations to the parent table.
2) performance -- which requires fewer joins? One single table.
Rats. No design accomplishes both.
Also, there's a third design in addition to your mono-table and FK-to-parent.
Three separate tables with some common columns (usually copy-and-paste of the superclass columns among all subclass tables). This is very flexible and easy to work with. But, it requires a union of the three tables to assemble an overall list.

OO databases go through the same stuff and come up with pretty much the same options.
If the point is to model subclasses in a database, you probably are already thinking along the lines of the solutions I've seen in real OO databases (leaving fields empty).
If not, you might think about creating a system that doesn't use inheritance in this way.
Inheritance should always be used quite sparingly, and this is probably a pretty bad case for it.
A good guideline is to never use inheritance unless you actually have code that does different things to the field of a "Parent" class than to the same field in a "Child" class. If business code in your class doesn't specifically refer to a field, that field absolutely shouldn't cause inheritance.
But again, if you are in school, that may not match what they are trying to teach...

The "correct" answer for the purposes of an assignment is probably #3 :
Person
PersonId Name Address1 Address2 City Country
Student
PersonId StudentId GPA Year ..
Staff
PersonId StaffId Salary ..
Parent
PersonId ParentId ParentType EmergencyContactNumber ..
Where PersonId is always the primary key, and also a foreign key in the last three tables.
I like this approach because it makes it easy to represent the same person having more than one role. A teacher could very well also be a parent, for example.

I suggest five tables
Person
Student
Staff
Parent
Address
WHy - because people can have multiple addesses and people can also have multiple roles and the information you want for staff is different than the information you need to store for parent or student.
Further you may want to store name as last_name, Middle_name, first_name, Name_suffix (like jr.) instead of as just name. Belive me you willwant to be able to search on last_name! Name is not unique, so you will need to make sure you have a unique surrogate primary key.
Please read up about normalization before trying to design a database. Here is a source to start with:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx

Super type Person should be created like this:
CREATE TABLE Person(PersonID int primary key, Name varchar ... etc ...)
All Sub types should be created like this:
CREATE TABLE IF NOT EXISTS Staffs(StaffId INT NOT NULL ,
PRIMARY KEY (StaffId) ,
CONSTRAINT FK_StaffId FOREIGN KEY (StaffId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Students(StudentId INT NOT NULL ,
PRIMARY KEY (StudentId) ,
CONSTRAINT FK_StudentId FOREIGN KEY (StudentId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Parents(PersonID INT NOT NULL ,
PRIMARY KEY (PersonID ) ,
CONSTRAINT FK_PersonID FOREIGN KEY (PersonID ) REFERENCES Person(PersonId)
)
Foreign key in subtypes staffs,students,parents adds two conditions:
Person row cannot be deleted unless corresponding subtype row will
not be deleted. For e.g. if there is one student entry in students
table referring to Person table, without deleting student entry
person entry cannot be deleted, which is very important. If Student
object is created then without deleting Student object we cannot
delete base Person object.
All base types have foreign key "not null" to make sure each base
type will have base type existing always. For e.g. If you create
Student object you must create Person object first.