How to implement a super class, sub class relationship in the database? - database

If I have a class called animal, dog and fish is the subclass.
The animal have attribute called "color".
Dog have the attribute called "tail length", and the fish don't have this attribute.
Fish have the attribute called "weight", the dog don't have this attribute.
So, I want to design a database to store this information. What should I do? Here is some ideas:
Idea 1:
Making an animal table, and the table have type, to find what kind of animal, if it is a dog, just get the result from dog table.
Animal:
color:String
type:int
Type:
Dog:0
Fish:1
Dog:
TailLength:int
Fish:
Weight:int
Idea 2:
Store only Dog table and Fish table in the database, remove the animal table.
Dog:
Color: String
TailLength: int
Fish:
Color: String
Weight: int

The two approaches you mentioned:
One table representing objects in the entire inheritance hierarchy, with all the columns you'd need for the entire hierarchy plus a "type" column to tell you which subclass a particular object is.
One table for each concrete class in your inheritance hierarchy, with duplicated schema.
can be supplemented by two others:
One table for each class in your inheritance hierarchy – you now have an Animal table, and subclasses have tables with foreign keys that point to the common set of data in Animal.
Generic schema – have a table to store objects, and an attribute table to support any set of attributes attached to that object.
Each approach has pros and cons. There's a good rundown of them here:
http://www.agiledata.org/essays/mappingObjects.html#ComparingTheStrategies
Also take a look at these SO topics:
Something like inheritance in database design
Help me to connect inheritance and relational concepts
Object-oriented-like structures in relational databases
How to do Inheritance Modeling in Relational Databases?
How do you effectively model inheritance in a database?
Finally, it should be noted that there are object-oriented databases (aka object databases, or OODBMSes) out there that represent objects more naturally in the database, and could easily solve this problem, though I don't think they're as frequently used in the industry. Here are some links that describe such DBs as compared to relational (and other) DBs, though they won't give you an entirely objective (heh) view on the matter:
What is the difference between graph-based databases and object-oriented databases?
"In defense of the RDBMS", Gavin King, 2007
"The relational database needs no defense" (in support of OODBMSes), Ted Neward, 2007
Object Oriented Database - why most of the companies do not use them

You could try it like this:
Animal
PK animal_id
FK animal_type
STRING animal_name (eg. 'Lassie')
AnimalTypes
PK animal_type
STRING animal_type_name (eg. 'Dog')
AnimalAttributes
PK attribute_id
STRING attribute_name (eg. 'tail length')
AnimalToAttributes
PK id
FK animal_id
FK attribute_id
INTEGER value (eg. 20)
This way you can have one or many attributes per animal (it's up to you to choose).

Use a one to zero or one relationship As you note, In database schema design language the tables are called class - sub-class or superclass
Create Table Animal
(animalId Integer Primary Key Not null,
Other columns generic to all animals)
Create Table Birds
(BirdId Integer Primary Key Not Null
references Animal(AnimalId),
-- other columns)

Related

Data modeling: Relationship between tables

I'm developing a Pokémon app to learning how work pokemon internally. My question is what is best option to model?
A little context:
A Pokémon has types (Water, Fire, Grass...)
A type has effectiveness and weaknesses.
Example:
Efectiveness
Weaknesses
Water
Fire
Grass
Fire
Grass
Water
1.- Can i model a table with type but how can relation them?
2.- I can use graph teory, but how do i represent it in database?
This development was born out of the same curiosity.
Thanks!
Create table with relation, but i think is not efficient
This is a many-to-many relationship. Multipe types can have the same strength, and a type can have multiple strengths. Many to many relationship require an intermediate bridge table. And in this case, it's a loop back to the same table, as a type is relating to a type. I don't know how to represent ERD symbols in this dialog box, but this should give you the basic picture. Three tables are pictured below, Pokemon, Types and TypeModifiers:
Pokemon TypeModifiers
------- -------------
[ type id ] --> Types <--- [base type id
[type id] <---- modifier type id
strength-or-weakness code]
Create a foreign key on TypeModifiers.base_type_id to point to Types.type_id
Create another foreign key on TypeModifiers.modifier_type_id to also point to Types.type_id
This will allow you to model any possible relationship between a type and all other types. You could do strengths and weaknesses as separate tables, but it's simpler on queries if you combine them and differentiate with a column value, e.g. 'S' vs. 'W'.

What is the right way to use Associative Entity?

This is the description:
Draw an Entity-Relationship diagram for Poke-Hospital which provides
medical service to pokemon.
Each pokemon has an appointment with one of the nurse Joys. In
addition to recording the name, type and trainer of each pokemon, the
system needs to keep track of the multiple types of sickness being
diagnosed for the pokemon. During an appointment, the nurse will
always prescribe medicine. It is required to record the date, time and
dosage of the medicine. A pokemon may need to take more than one
medicine at a time. Each medicine is stored with its name, brand and
cost of purchase. There is no restriction on the amount of medicine to
be prescribed by any nurse.
Within an appointment, a pokemon may need to undergo procedures such
as a surgery and/or diagnosis. Each procedure requires different type
of rooms and a list of equipment. The date, time and the actual room
of the procedure need to be recorded.
A procedure may be performed by more than one nurse. A nurse is
involved in the procedure based on the training skills that she has
completed. Not all nurses are qualified to perform procedures.
Name, pager number as well as office number for each nurse most be
known. Your diagram should show the entities, relationships and their
attributes, and the cardinality of any relationships. Mark the best
primary key for each entity by underlining it.
This is my solution:
Here are my questions:
Should I use Have Appointment as associative entity?
Should I remove 2 relationships Undergo and Prescribe and connect 2
entities Procedure and Appointment Medicine directly to Have
Appointment associative entity? Will the ERD still right then?
If it's wrong, what about the same as question 2 and I turn the Have
Appointment associative entity into a relationship?
I feel really confused about the difference between using associative entity with a relationship (like in this post Enrollment with Teach and Teacher: When to use Associative entities?) and using ternary relationship (connect Teacher directly to Enrollment relationship instead of changing Enrollment to an associative entity and have the Teach relationship).
Should I use Have Appointment as associative entity?
No, I believe it should be a regular entity set. You gave it its own identity - the ID primary key - which I agree with, but that should've corresponded with a change in element type. Associative entity sets (AES) are relationships first, which means they're identified by the (keys of the) entity sets that they relate.
This is a topic that's widely confused, since AES in the entity-relationship model are different than in the network data model. The latter is intuitively more familiar to developers, since it's essentially a model based on records and pointers, but since it only supports directed binary relationships, anything more complicated - many-to-many relationships as well as ternary and higher relationships - need to be represented as AES. In this model, AES are identified by a surrogate ID, since composite keys generally aren't supported either.
The entity-relationship model supports n-ary relationships and composite keys, and so doesn't need AES nearly as frequently. One situation that can't be represented by regular entity sets and n-ary relationships is when a relationship needs to be the subject of a further relationship.
For example, let's look at the relationship between Procedure and Nurse to represent the nurses involved in a procedure.
I prefer the look-across convention for cardinality indicators - a nurse can perform 0 or more procedures, while a procedure requires 1 or more nurses. Anyway, the relationship Perform here is identified by the composite primary key (ProcedureID, NurseID).
Now, if we wanted to track the equipment used by each nurse in the performance of the procedure, we might think a simple ternary relationship would do the trick:
but that relationship would be identified by (ProcedureID, NurseID, EquipmentID), preventing us from recording nurses that assisted in the procedure without using any equipment. What we need is two separate relationships:
(ProcedureID, NurseID)
((ProcedureID, NurseID), EquipmentID)
with an FK constraint from the second to the first to prevent nurses not assisting in the procedure from handling the equipment.
Back to Have Appointment - it's not a relationship between pokemon and nurses (a pokemon can see the same nurse multiple times), it's an event that involves pokemon, nurses, procedures and medicine. It's best handled as a regular entity set with relationships to the other four. As for identity, I imagine a pokemon or nurse can only have one appointment at a time, so we could choose (PokemonID, DateTime) or (NurseID, DateTime) as a natural key. However, in practice we usually identify events by a surrogate ID since events span an interval which most DBMSs can't handle effectively as a primary key.
Should I remove 2 relationships Undergo and Prescribe and connect 2 entities Procedure and Appointment Medicine directly to Have Appointment associative entity? Will the ERD still right then?
No, I think you should add relationships between Pokemon and Have Appointment, and between Nurse and Have Appointment, after converting the AES to a regular entity set.
If it's wrong, what about the same as question 2 and I turn the Have Appointment associative entity into a relationship?
Answered above.

Entity Relationship Diagram. How does the IS A relationship translate into tables?

I was simply wondering, how an ISA relationship in an ER diagram would translate into tables in a database.
Would there be 3 tables? One for person, one for student, and one for Teacher?
Or would there be 2 tables? One for student, and one for teacher, with each entity having the attributes of person + their own?
Or would there be one table with all 4 attributes and some of the squares in the table being null depending on whether it was a student or teacher in the row?
NOTE: I forgot to add this, but there is full coverage for the ISA relationship, so a person must be either a studen or a teacher.
Assuming the relationship is mandatory (as you said, a person has to be a student or a teacher) and disjoint (a person is either a student or a teacher, but not both), the best solution is with 2 tables, one for students and one for teachers.
If the participation is instead optional (which is not your case, but let's put it for completeness), then the 3 tables option is the way to go, with a Person(PersonID, Name) table and then the two other tables which will reference the Person table, e.g.
Student(PersonID, GPA), with PersonID being PK and FK referencing Person(PersonID).
The 1 table option is probably not the best way here, and it will produce several records with null values (if a person is a student, the teacher-only attributes will be null and vice-versa).
If the disjointness is different, then it's a different story.
there are 4 options you can use to map this into an ER,
option 1
Person(SIN,Name)
Student(SIN,GPA)
Teacher(SIN,Salary)
option 2 Since this is a covering relationship, option 2 is not a good match.
Student(SIN,Name,GPA)
Teacher(SIN,Name,Salary)
option 3
Person(SIN,Name,GPA,Salary,Person_Type)
person type can be student/teacher
option 4
Person(SIN,Name,GPA,Salary,Student,Teacher) Student and Teacher are bool type fields, it can be yes or no,a good option for overlapping
Since the sub classes don't have much attributes, option 3 and option 4 are better to map this into an ER
This answer could have been a comment but I am putting it up here for the visibility.
I would like to address a few things that the chosen answer failed to address - and maybe elaborate a little on the consequences of the "two table" design.
The design of your database depends on the scope of your application and the type of relations and queries you want to perform. For example, if you have two types of users (student and teacher) and you have a lot of general relations that all users can part take, regardless of their type, then the two table design may end up with a lot of "duplicated" relations (like users can subscribe to different newsletters, instead of having one M2M relationship table between "users" and newsletters, you'll need two separate tables to represent that relation). This issue worsens if you have three different types of users instead of two, or if you have an extra layer of IsA in your hierarchy (part-time vs full-time students).
Another issue to consider - the types of constraints you want to implement. If your users have emails and you want to maintain a user-wide unique constraint on emails, then the implementation is trickier for a two-table design - you'll need to add an extra table for every unique constraint.
Another issue to consider is just duplications, generally. If you want to add a new common field to users, you'll need to do it multiple times. If you have unique constraints on that common field, you'll need a new table for that unique constraint too.
All of this is not to say that the two table design isn't the right solution. Depending on the type of relations, queries and features you are building, you may want to pick one design over the other, like is the case for most design decisions.
It depends entirely on the nature of the relationships.
IF the relationship between a Person and a Student is 1 to N (one to many), then the correct way would be to create a foreign key relationship, where the Student has a foreign key back to the Person's ID Primary Key Column. Same thing for the Person to Teacher relationship.
However, if the relationship is M to N (many to many), then you would want to create a separate table containing those relationships.
Assuming your ERD uses 1 to N relationships, your table structure ought to look something like this:
CREATE TABLE Person
(
sin bigint,
name text,
PRIMARY KEY (sin)
);
CREATE TABLE Student
(
GPA float,
fk_sin bigint,
FOREIGN KEY (fk_sin) REFERENCES Person(sin)
);
and follow the same example for the Teacher table. This approach will get you to 3rd Normal Form most of the time.

a layman's term for identifying relationship

There are couples of questions around asking for difference / explanation on identifying and non-identifying relationship in relationship database.
My question is, can you think of a simpler term for these jargons? I understand that technical terms have to be specific and unambiguous though. But having an 'alternative name' might help students relate more easily to the concept behind.
We actually want to use a more layman term in our own database modeling tool, so that first-time users without much computer science background could learn faster.
cheers!
I often see child table or dependent table used as a lay term. You could use either of those terms for a table with an identifying relationship
Then say a referencing table is a table with a non-identifying relationship.
For example, PhoneNumbers is a child of Users, because a phone number has an identifying relationship with its user (i.e. the primary key of PhoneNumbers includes a foreign key to the primary key of Users).
Whereas the Users table has a state column that is a foreign key to the States table, making it a non-identifying relationship. So you could say Users references States, but is not a child of it per se.
I think belongs to would be a good name for the identifying relationship.
A "weak entity type" does not have its own key, just a "partial key", so each entity instance of this weak entity type has to belong to some other entity instance so it can be identified, and this is an "identifying relationship". For example, a landlord could have a database with apartments and rooms. A room can be called kitchen or bathroom, and while that name is unique within an apartment, there will be many rooms in the database with the name kitchen, so it is just a partial key. To uniquely identify a room in the database, you need to say that it is the kitchen in this particular apartment. In other words, the rooms belong to apartments.
I'm going to recommend the term "weak entity" from ER modeling.
Some modelers conceptualize the subject matter as being made up of entities and relationships among entities. This gives rise to Entity-Relationship Modeling (ER Modeling). An attribute can be tied to an entity or a relationship, and values stored in the database are instances of attributes.
If you do ER modeling, there is a kind of entity called a "weak entity". Part of the identity of a weak entity is the identity of a stronger entity, to which the weak one belongs.
An example might be an order in an order processing system. Orders are made up of line items, and each line item contains a product-id, a unit-price, and a quantity. But line items don't have an identifying number across all orders. Instead, a line item is identified by {item number, order number}. In other words, a line item can't exist unless it's part of exactly one order. Item number 1 is the first item in whatever order it belongs to, but you need both numbers to identify an item.
It's easy to turn an ER model into a relational model. It's also easy for people who are experts in the data but know nothing about databases to get used to an ER model of the data they understand.
There are other modelers who argue vehemently against the need for ER modeling. I'm not one of them.
Nothing, absolutely nothing in the kind of modeling where one encounters things such as "relationships" (ER, I presume) is "technical", "precise" or "unambiguous". Nor can it be.
A) ER modeling is always and by necessity informal, because it can never be sufficient to capture/express the entire definition of a database.
B) There are so many different ER dialects out there that it is just impossible for all of them to use exactly the same terms with exactly the same meaning. Recently, I even discovered that some UK university that teaches ER modeling, uses the term "entity subtype" for the very same thing that I always used to name "entity supertype", and vice-versa !
One could use connection.
You have Connection between two tables, where the IDs are the same.
That type of thing.
how about
Association
Link
Correlation

Modelling inheritance in a database

For a database assignment I have to model a system for a school. Part of the requirements is to model information for staff, students and parents.
In the UML class diagram I have modelled this as those three classes being subtypes of a person type. This is because they will all require information on, among other things, address data.
My question is: how do I model this in the database (mysql)?
Thoughts so far are as follows:
Create a monolithic person table that contains all the information for each type and will have lots of null values depending on what type is being stored. (I doubt this would go down well with the lecturer unless I argued the case very convincingly).
A person table with three foreign keys which reference the subtypes but two of which will be null - in fact I'm not even sure if that makes sense or is possible?
According to this wikipage about django it's possible to implement the primary key on the subtypes as follows:
"id" integer NOT NULL PRIMARY KEY REFERENCES "supertype" ("id")
Something else I've not thought of...
So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
Links to articles/blog posts or previous questions are more than welcome.
Thanks for your time!
UPDATE
Alright thanks for the answers everyone. I already had a separate address table so that's not an issue.
Cheers,
Adam
4 tables staff, students, parents and person for the generic stuff.
Staff, students and parents have forign keys that each refer back to Person (not the other way around).
Person has field that identifies what the subclass of this person is (i.e. staff, student or parent).
EDIT:
As pointed out by HLGM, addresses should exist in a seperate table, as any person may have multiple addresses. (However - I'm about to disagree with myself - you may wish to deliberately constrain addresses to one per person, limiting the choices for mailing lists etc).
Well I think all approaches are valid and any lecturer who marks down for shoving it in one table (unless the requirements are specific to say you shouldn't) is removing a viable strategy due to their own personal opinion.
I highly recommend that you check out the documentation on NHibernate as this provides different approaches for performing the above. Which I will now attempt to poorly parrot.
Your options:
1) One table with all the data that has a "delimiter" column. This column states what kind of person the person is. This is viable in simple scenarios and (seriously) high performance where the joins will hurt too much
2) Table per class which will lead to duplication of columns but will avoid joins again, so its simple and a lil faster (although only a lil and indexing mitigates this in most scenarios).
3) "Proper" inheritence. The normalised version. You are almost there but your key is in the wrong place IMO. Your Employee table should contain a PersonId so you can then do:
select employee.id, person.name from employee inner join person on employee.personId = person.personId
To get all the names of employees where name is only specified on the person table.
I would go for #3.
Your goal is to impress a lecturer, not a PM or customer. Academics tend to dislike nulls and might (subconciously) penalise you for using the other methods (which rely on nulls.)
And you don't necessarily need that django extension (PRIMARY KEY ... REFERENCES ...) You could use an ordinary FOREIGN KEY for that.
"So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
"
Methods 1 and 3 are good. The differences are mostly in what your use cases are.
1) adaptability -- which is easier to change? Several separate tables with FK relations to the parent table.
2) performance -- which requires fewer joins? One single table.
Rats. No design accomplishes both.
Also, there's a third design in addition to your mono-table and FK-to-parent.
Three separate tables with some common columns (usually copy-and-paste of the superclass columns among all subclass tables). This is very flexible and easy to work with. But, it requires a union of the three tables to assemble an overall list.
OO databases go through the same stuff and come up with pretty much the same options.
If the point is to model subclasses in a database, you probably are already thinking along the lines of the solutions I've seen in real OO databases (leaving fields empty).
If not, you might think about creating a system that doesn't use inheritance in this way.
Inheritance should always be used quite sparingly, and this is probably a pretty bad case for it.
A good guideline is to never use inheritance unless you actually have code that does different things to the field of a "Parent" class than to the same field in a "Child" class. If business code in your class doesn't specifically refer to a field, that field absolutely shouldn't cause inheritance.
But again, if you are in school, that may not match what they are trying to teach...
The "correct" answer for the purposes of an assignment is probably #3 :
Person
PersonId Name Address1 Address2 City Country
Student
PersonId StudentId GPA Year ..
Staff
PersonId StaffId Salary ..
Parent
PersonId ParentId ParentType EmergencyContactNumber ..
Where PersonId is always the primary key, and also a foreign key in the last three tables.
I like this approach because it makes it easy to represent the same person having more than one role. A teacher could very well also be a parent, for example.
I suggest five tables
Person
Student
Staff
Parent
Address
WHy - because people can have multiple addesses and people can also have multiple roles and the information you want for staff is different than the information you need to store for parent or student.
Further you may want to store name as last_name, Middle_name, first_name, Name_suffix (like jr.) instead of as just name. Belive me you willwant to be able to search on last_name! Name is not unique, so you will need to make sure you have a unique surrogate primary key.
Please read up about normalization before trying to design a database. Here is a source to start with:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
Super type Person should be created like this:
CREATE TABLE Person(PersonID int primary key, Name varchar ... etc ...)
All Sub types should be created like this:
CREATE TABLE IF NOT EXISTS Staffs(StaffId INT NOT NULL ,
PRIMARY KEY (StaffId) ,
CONSTRAINT FK_StaffId FOREIGN KEY (StaffId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Students(StudentId INT NOT NULL ,
PRIMARY KEY (StudentId) ,
CONSTRAINT FK_StudentId FOREIGN KEY (StudentId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Parents(PersonID INT NOT NULL ,
PRIMARY KEY (PersonID ) ,
CONSTRAINT FK_PersonID FOREIGN KEY (PersonID ) REFERENCES Person(PersonId)
)
Foreign key in subtypes staffs,students,parents adds two conditions:
Person row cannot be deleted unless corresponding subtype row will
not be deleted. For e.g. if there is one student entry in students
table referring to Person table, without deleting student entry
person entry cannot be deleted, which is very important. If Student
object is created then without deleting Student object we cannot
delete base Person object.
All base types have foreign key "not null" to make sure each base
type will have base type existing always. For e.g. If you create
Student object you must create Person object first.

Resources