Postgresql inheritance based database design - database

I'm developing a simple babysitter application that has 2 types of users: a 'Parent' and the 'Babysitter'. I'm using postgresql as my database but I'm having trouble working out my database design.
The 'Parent' and the 'Babysitter' entities have attributes that can be generalized, for example: username, password, email, ... Those attributes could be
placed into a parent entity called 'User'. They both also have their own attributes, for example: Babysitter -> age.
In terms of OOP things are very clear for me, just extend the user class and you are good to go but in DB design things are differently.
Before posting this question I roamed around the internet for a good week looking for insight into this 'issue'. I did find a lot of information but
it seemed to me that there was a lot a disagreement. Here are some of the posts I've read:
How do you effectively model inheritance in a database?: Table-Per-Type (TPT), Table-Per-Hierarchy (TPH) and Table-Per-Concrete (TPC) VS 'Forcing the RDb into a class-based requirements is simply incorrect.'
https://dba.stackexchange.com/questions/75792/multiple-user-types-db-design-advice:
Table: `users`; contains all similar fields as well as a `user_type_id` column (a foreign key on `id` in `user_types`
Table: `user_types`; contains an `id` and a `type` (Student, Instructor, etc.)
Table: `students`; contains fields only related to students as well as a `user_id` column (a foreign key of `id` on `users`)
Table: `instructors`; contains fields only related to instructors as well as a `user_id` column (a foreign key of `id` on `users`)
etc. for all `user_types`
https://dba.stackexchange.com/questions/36573/how-to-model-inheritance-of-two-tables-mysql/36577#36577
When to use inherited tables in PostgreSQL?: Inheritance in postgresql does not work as expected for me and a bunch of other users as the original poster points out.
I am really confused about which approach I should take. Class-table-inheritance (https://stackoverflow.com/tags/class-table-inheritance/info) seems like the most correct in
my OOP mindset but I would very much appreciate and updated DB minded opinion.

The way that I think of inheritance in the database world is "can only be one kind of." No other relational modeling technique works for that specific case; even with check constraints, with a strict relational model, you have the problem of putting the wrong "kind of" person into the wrong table. So, in your example, a user can be a parent or a babysitter, but not both. If a user can be more than one kind-of user, then inheritance is not the best tool to use.
The instructor/student relationship really only works well in the case where students cannot be instructors or vice-versa. If you have a TA, for example, it's better to model that using a strict relational design.
So, back to the parent-babysitter, your table design might look like this:
CREATE TABLE user (
id SERIAL,
full_name TEXT,
email TEXT,
phone_number TEXT
);
CREATE TABLE parent (
preferred_payment_method TEXT,
alternate_contact_info TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE babysitter (
age INT,
min_child_age INT,
preferred_payment_method TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE parent_babysitter (
parent_id INT REFERENCES parent(id),
babysitter_id INT REFERENCES babysitter(id),
PRIMARY KEY(parent_id, babysitter_id)
);
This model allows users to be "only one kind of" user - a parent or a babysitter. Notice how the primary key definitions are left to the child tables. In this model, you can have duplicated ID's between parent and babysitter, though this may not be a problem depending on how you write your code. (Note: Postgres is the only ORDBMS I know of with this restriction - Informix and Oracle, for example, have inherited keys on inherited tables)
Also see how we mixed the relational model in - we have a many-to-many relationship between parents and babysitters. That way we keep the entities separated, but we can still model a relationship without weird self-referencing keys.

All the options can be roughly represented by following cases:
base table + table for each class (class-table inheritance, Table-Per-Type, suggestions from the dba.stackexchange)
single table inheritance (Table-Per-Hierarchy) - just put everything into the single table
create independent tables for each class (Table-Per-Concrete)
I usually prefer option (1), because (2) and (3) are not completely correct in terms of DB design.
With (2) you will have unused columns for some rows (like "age" will be empty for Parent). And with (3) you may have duplicated data.
But you also need to think in terms of data access. With option (1) you will have the data spread over few tables, so to get Parent, you will need to use join operations to select data from both User and Parent tables.
I think that's the reason why options (2) and (3) exist - they are easier to use in terms of SQL queries (no joins are needed, you just select the data you need from one table).

Related

Database theory: best way to have a "flags" table which could apply to many entities?

I'm building a data model for a new project I'm developing. Part of this data model involves entities having "flags".
A flag is simply a boolean value - either an entity has the flag or it does not have the flag. To that end I have a table simply called "flags" that has an ID, a string name, and a description. (An example of a flag might be "is active" or "should be displayed" or "belongs to group".)
So for example, any user in my users table could have none, one, or many flags. So I create a userFlags bridge table with user ID and flag ID. If the table contains a row for the given flag ID and user ID, that user has that flag.
Ok, so now I add another entity - say "section". Each section can also have flags. So I create a sectionFlags table to accommodate this.
Now I have another entity - "content", so again, "contentFlags".
And so on.
My final data model has basically two tables per entity, one to hold the entity and one for flags.
While this certainly works, it seems like there may be a better way to design my model, so I don't have to have so many bridge tables. One idea I had was a master "hasFlags" table with flag ID, item ID and item type. The item type could be an enumerated field only accepting values corresponding to known entities. The only problem there is that my foreign key for the entity will not work because each "item ID" could refer to a different entity. (I have actually used this technique in other data models, and while it certainly works, you lose referential integrity as well as things like cascade updates.)
Or, perhaps my data model is fine as-is and that's just the nature of the beast.
Any more-advanced experienced DB devs care to chime in?
The many-to-many relationships are one way to do it (and possibly faster than what I'm about to suggest because they can use integer key indexes).
The other way to do this is with a polymorphic relationship.
Your entity-to-flag table needs 2 columns as well as the foreign key link to the flag table;
other_key integer not null
other_type varchar(...) not null
And in those fields you store the foreign key of the relation in the integer and the type of the relation in the varchar. Full-on ORMs that support this sometimes store the class name of the foreign relation in the type column, to aid with object loading.
The downside here is that the integer can't be a true foreign key as it will contain duplicates from many tables. It also makes your querying a bit more interesting per-join than the many-to-many tables, but it does allow you to generalise your join in code.

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table? And effectively, the record_type and record_id together are a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value, and all the thing we would be logging are the same data type, strings).
Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually1 is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I heard people calling it "generic" or "polymorphic" FKs.
1 As opposed to FOREIGN KEYs built-into the DBMS.
Actually, I think 'Anti-Pattern' is a pretty good name for this set up, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users' photos, etc, and show why it's bad. Then I'll explain why it might not ne too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table as well as any other tables. If User 1234 is being liked, you couldn't insert it unless there was a PHOTO with ID 1234 and so on. For this reason, all RDBMS's that I know of will not let a Foreign Key be defined with multiple Primary keys - after all, Primary means 'only one' amongst other things.
So you'ld have to create the LIKES table with no relational integrity. This may not be a bad thinbg sometimes, but in this case I'd think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
However, for a table like LOGS which probably isn't central to the functionality of my system and I'm only doing some ad-hoc querying from to check what's been happening, then I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admin's running ad-hoc queries against it then it might be OK.
Cheers -

database design: a 'code' table that get referenced by other entities

I am building a database as a simple exercise, it could be hosted on any database server, so I am trying to keep things as much standard as possible. Basically what I would like to do is a 'code' table that get referenced by other entities. I explain:
xcode
id code
r role
p property
code
r admin
r staff
p title
....
then I would like to have some view like:
role (select * from code where xcode='r')
r admin
r staff
property (select * from code where xcode='p')
p title
then, suppose we have an entity
myentity
id - 1
role - admin (foreign key to role)
title - title (foreign key to property)
Obviously I cannot create foreign key to a view, but this is to tell the idea I have in mind. How can I reflect such behaviour using whenever possible, standard sql syntax, then as a second option, database additional features like trigger ecc... ?
Because if I tell that role and title in myentity are foreign key to 'code', instead of the views, nothing would stop me to insert a role in title field.
I have worked on systems with a single table for all codes and others with one table per code. I definitely prefer the latter approach.
The advantages of a table per code are:
Foreign keys. As you have already spotted it is not possible to enforce compliance to permitted values through foreign keys with a single table. Using check constraints is an alternative approach but it has a higher maintenance cost.
Performance. Code lookups are not normally a performance bottle neck, but it undoubtedly helps the optimizer to make sensible decisions about execution paths if it knows it is retrieving records from a table with four rows rather than four hundred.
Code groups. Sometimes we want to organise a code into sub-divisions, usually to make it easier to render complex lists of values. If we have a table per code we have more flexibility when it comes to structure.
In addition I notice that you want to be able to deploy "on any database server". In that case avoid triggers. Triggers are usually bad news in most scenarios, but they have product-specific syntax.
What you are trying to do is in most cases an anti pattern and design mistake. Just create the different tables instead of views.
There are some rare cases where this kind of design makes sense. In this kind include the xcode field in the primary key/ foreign key. So your entity will look like this:
myentity
id - 1
role_xcode
role - admin (foreign key to role)
title_xcode
title - title (foreign key to property)
You then can create check constraints to enforce role_xcode='r' and title_xcode='p'
(sorry I don't know if they are standard, they do exist in oracle and are so simple that I'd expect them on other rdbms's as well)

How to design DB table / schema with ease?

Is there a simple method to decide on what fields and indexes are needed for each table in an app you design?
For example, if it is a webapp that simply lets people create lists (any number of lists, and users can create "things to do" list or "shopping" list), and the user can assign other users to edit the list, and whether the list is viewable publicly or to only certain users, how can the tables be design so that it is very accurate and designed quickly? What about the indexes?
I did that in college and then revisited the question some time ago and have a method, but would like to find out if there are standard and good ways to do it out in the field.
Database design is hard ...
As with many things in life, it's a series of tradeoffs. The first thing you need to decide is what DBMS you will use, (MySQL, SQL Server, Oracle, PostgreSQL, one of the "Object-oriented" databases, etc.
Then you need to decide on normalization v. insane numbers of JOINs to get to your data. Questions like "how much logic will I implement in triggers, stored procedures, in app code, etc" need to be addressed.
There is no "Quick'n'Easy" way to design anything but the most trivial of databases.
'Course, that's just my experience. YMWV.
it is beyond the scope of this answer to fully explain database design
I generally break my design into three parts (part 1 and 2 happen up front, while 3 is usually near the project end)
1) create the tables based on relationships (parent/child/etc)
2) create fields based on content (parent has x atributes, etc)
3) create indexes last based on how you select data from your tables
Haven't heard of any formal approaches to this problem but there are rules of thumb. All nouns and business objects become tables, normalized of course. And I'd think the attributes sort of speak for themselves. I guess?
As for indexes, it just comes with working with the data. Any column that's joined off of deserves an index (maybe even clustered). It's very... depends. But there are patterns. But other than optimizing for joins, many indexes are directly related to how the data is used, and this isn't something that can be provided by rule of thumb. Like if you look up users by pk and elsewhere by last_name, last_name deserves an index.
I think the solution is a subjective one. When I have to design tables I look at the Java object that will represent that particular data model and go from there. You'll find a lot of frameworks (Django, CakePHP, RoR) have you develop the model and the frameworks will build the corresponding tables.
So I would suggest evaluating what functionality and data you need to store and develop your tables from that. Also look into whether the tool set you have at your disposal offers to generate the tables for you from the object structure.
I would go for the straightforward (almost) normalized design:
CREATE TABLE lists (
listid serial,
name varchar,
ownerid int references users(userid)
)
CREATE TABLE list_items (
listid int references lists(listid),
value varchar,
date datetime
)
CREATE TABLE permissions (
permissionid serial,
description varchar,
)
CREATE TABLE list_permissions (
listid int references lists(listid),
permissionid int references permissions(permissionid)
userid int references users(userid)
)
CREATE TABLE users (
userid serial,
name varchar
)
Which indexes to create would depend on what are the actual most used queries and how are they performing. For instance, if you query a lot on the lists and list_items (likely) you'd want an index on listid and on name, if you'll be searching by name.
Just some ideas. Hope they're helpful.
I'd try not to lock yourself in if you're still trying to see what works.
Just from your description, you'd want a table for your users' information, as well as:
tbl_lists:
ID_list (primary key)
UserID (foreign key to list owner)
ListName
tbl_listItems:
ID_listItem (primary key)
ListID (foreign key to list)
ItemDescription
tbl_permissions:
ID_permission (primary key)
ListID
UserID (foreign key to user you're granting permission to)
PermissionTypeID (what kind of permission)
tbl_permissionTypes:
ID_permissionType (primary key)
Description ("can view", "can edit", etc.)
The more flexible you can make things while you're designing, the better. You can optimize later.
If you want to keep things very simple and are not too concerned with normalizing. You could create one big table that stores the main object your webapp is based around, ex: lists, and have other smaller supporting tables link to the big table, ex: tbl_listType, tbl_permission, tbl_list_items).
Then when you write queries, you almost certainly include the main table and you can link in other supporting tables for more granular details.

Modelling inheritance in a database

For a database assignment I have to model a system for a school. Part of the requirements is to model information for staff, students and parents.
In the UML class diagram I have modelled this as those three classes being subtypes of a person type. This is because they will all require information on, among other things, address data.
My question is: how do I model this in the database (mysql)?
Thoughts so far are as follows:
Create a monolithic person table that contains all the information for each type and will have lots of null values depending on what type is being stored. (I doubt this would go down well with the lecturer unless I argued the case very convincingly).
A person table with three foreign keys which reference the subtypes but two of which will be null - in fact I'm not even sure if that makes sense or is possible?
According to this wikipage about django it's possible to implement the primary key on the subtypes as follows:
"id" integer NOT NULL PRIMARY KEY REFERENCES "supertype" ("id")
Something else I've not thought of...
So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
Links to articles/blog posts or previous questions are more than welcome.
Thanks for your time!
UPDATE
Alright thanks for the answers everyone. I already had a separate address table so that's not an issue.
Cheers,
Adam
4 tables staff, students, parents and person for the generic stuff.
Staff, students and parents have forign keys that each refer back to Person (not the other way around).
Person has field that identifies what the subclass of this person is (i.e. staff, student or parent).
EDIT:
As pointed out by HLGM, addresses should exist in a seperate table, as any person may have multiple addresses. (However - I'm about to disagree with myself - you may wish to deliberately constrain addresses to one per person, limiting the choices for mailing lists etc).
Well I think all approaches are valid and any lecturer who marks down for shoving it in one table (unless the requirements are specific to say you shouldn't) is removing a viable strategy due to their own personal opinion.
I highly recommend that you check out the documentation on NHibernate as this provides different approaches for performing the above. Which I will now attempt to poorly parrot.
Your options:
1) One table with all the data that has a "delimiter" column. This column states what kind of person the person is. This is viable in simple scenarios and (seriously) high performance where the joins will hurt too much
2) Table per class which will lead to duplication of columns but will avoid joins again, so its simple and a lil faster (although only a lil and indexing mitigates this in most scenarios).
3) "Proper" inheritence. The normalised version. You are almost there but your key is in the wrong place IMO. Your Employee table should contain a PersonId so you can then do:
select employee.id, person.name from employee inner join person on employee.personId = person.personId
To get all the names of employees where name is only specified on the person table.
I would go for #3.
Your goal is to impress a lecturer, not a PM or customer. Academics tend to dislike nulls and might (subconciously) penalise you for using the other methods (which rely on nulls.)
And you don't necessarily need that django extension (PRIMARY KEY ... REFERENCES ...) You could use an ordinary FOREIGN KEY for that.
"So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
"
Methods 1 and 3 are good. The differences are mostly in what your use cases are.
1) adaptability -- which is easier to change? Several separate tables with FK relations to the parent table.
2) performance -- which requires fewer joins? One single table.
Rats. No design accomplishes both.
Also, there's a third design in addition to your mono-table and FK-to-parent.
Three separate tables with some common columns (usually copy-and-paste of the superclass columns among all subclass tables). This is very flexible and easy to work with. But, it requires a union of the three tables to assemble an overall list.
OO databases go through the same stuff and come up with pretty much the same options.
If the point is to model subclasses in a database, you probably are already thinking along the lines of the solutions I've seen in real OO databases (leaving fields empty).
If not, you might think about creating a system that doesn't use inheritance in this way.
Inheritance should always be used quite sparingly, and this is probably a pretty bad case for it.
A good guideline is to never use inheritance unless you actually have code that does different things to the field of a "Parent" class than to the same field in a "Child" class. If business code in your class doesn't specifically refer to a field, that field absolutely shouldn't cause inheritance.
But again, if you are in school, that may not match what they are trying to teach...
The "correct" answer for the purposes of an assignment is probably #3 :
Person
PersonId Name Address1 Address2 City Country
Student
PersonId StudentId GPA Year ..
Staff
PersonId StaffId Salary ..
Parent
PersonId ParentId ParentType EmergencyContactNumber ..
Where PersonId is always the primary key, and also a foreign key in the last three tables.
I like this approach because it makes it easy to represent the same person having more than one role. A teacher could very well also be a parent, for example.
I suggest five tables
Person
Student
Staff
Parent
Address
WHy - because people can have multiple addesses and people can also have multiple roles and the information you want for staff is different than the information you need to store for parent or student.
Further you may want to store name as last_name, Middle_name, first_name, Name_suffix (like jr.) instead of as just name. Belive me you willwant to be able to search on last_name! Name is not unique, so you will need to make sure you have a unique surrogate primary key.
Please read up about normalization before trying to design a database. Here is a source to start with:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
Super type Person should be created like this:
CREATE TABLE Person(PersonID int primary key, Name varchar ... etc ...)
All Sub types should be created like this:
CREATE TABLE IF NOT EXISTS Staffs(StaffId INT NOT NULL ,
PRIMARY KEY (StaffId) ,
CONSTRAINT FK_StaffId FOREIGN KEY (StaffId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Students(StudentId INT NOT NULL ,
PRIMARY KEY (StudentId) ,
CONSTRAINT FK_StudentId FOREIGN KEY (StudentId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Parents(PersonID INT NOT NULL ,
PRIMARY KEY (PersonID ) ,
CONSTRAINT FK_PersonID FOREIGN KEY (PersonID ) REFERENCES Person(PersonId)
)
Foreign key in subtypes staffs,students,parents adds two conditions:
Person row cannot be deleted unless corresponding subtype row will
not be deleted. For e.g. if there is one student entry in students
table referring to Person table, without deleting student entry
person entry cannot be deleted, which is very important. If Student
object is created then without deleting Student object we cannot
delete base Person object.
All base types have foreign key "not null" to make sure each base
type will have base type existing always. For e.g. If you create
Student object you must create Person object first.

Resources