Database table circular reference between master and child table - database

I have two tables
person
person_photos
with one-to-many relationship (i.e. each person can have list of photos)
e.g.
person {
person_id number, <<THIS IS PK>>
person_name varchar,
other_columns...
}
person_photos {
person_photo_id number,<<THIS IS PK>>
person_id number, <<THIS IS FK>>
photo blob
}
I want one of the photo marked as default. Is it ok to have reference to the default photo in master table
i.e.
person {
person_id number,<<THIS IS PK>>
person_name varchar,
other_columns...
default_person_photo_id number <<Reference to child table>>
}
This basically creates the circular reference between two table.
Is there any issue with this approach?
Or any other better way of doing it?
Note:
I can introduce one column in person_photo table to mark which one is default however I primarily introducing this default photo id in master table to avoid getting that information by joinin the photo table
I can also create a mapping table, but I would like go with that approach only if there is any issue circular design

Part of this depends on what RDBMS you are using. If you are using one without partial unique indexes (like MySQL for example) then then that is probably the best way you can do it.
On the other hand if you can have partial unique indexes then you can do as follows:
Remove person.default_person_photo_id
Add a boolean person_photos.is_default fild
CREATE UNIQUE INDEX default_person_photos_idx ON person_photos(person_id) WHERE is_default
Then you can never have more than one, and if you search for a photo based on a person_id where is_default then the index can be used, possibly saving you a join.
So in answer to your question without knowing your rdbms's capabilities, I can't say you there is a better way, and you certainly aren't doing anything wrong. But for some RDBMSs there is a better way.

Related

Postgresql inheritance based database design

I'm developing a simple babysitter application that has 2 types of users: a 'Parent' and the 'Babysitter'. I'm using postgresql as my database but I'm having trouble working out my database design.
The 'Parent' and the 'Babysitter' entities have attributes that can be generalized, for example: username, password, email, ... Those attributes could be
placed into a parent entity called 'User'. They both also have their own attributes, for example: Babysitter -> age.
In terms of OOP things are very clear for me, just extend the user class and you are good to go but in DB design things are differently.
Before posting this question I roamed around the internet for a good week looking for insight into this 'issue'. I did find a lot of information but
it seemed to me that there was a lot a disagreement. Here are some of the posts I've read:
How do you effectively model inheritance in a database?: Table-Per-Type (TPT), Table-Per-Hierarchy (TPH) and Table-Per-Concrete (TPC) VS 'Forcing the RDb into a class-based requirements is simply incorrect.'
https://dba.stackexchange.com/questions/75792/multiple-user-types-db-design-advice:
Table: `users`; contains all similar fields as well as a `user_type_id` column (a foreign key on `id` in `user_types`
Table: `user_types`; contains an `id` and a `type` (Student, Instructor, etc.)
Table: `students`; contains fields only related to students as well as a `user_id` column (a foreign key of `id` on `users`)
Table: `instructors`; contains fields only related to instructors as well as a `user_id` column (a foreign key of `id` on `users`)
etc. for all `user_types`
https://dba.stackexchange.com/questions/36573/how-to-model-inheritance-of-two-tables-mysql/36577#36577
When to use inherited tables in PostgreSQL?: Inheritance in postgresql does not work as expected for me and a bunch of other users as the original poster points out.
I am really confused about which approach I should take. Class-table-inheritance (https://stackoverflow.com/tags/class-table-inheritance/info) seems like the most correct in
my OOP mindset but I would very much appreciate and updated DB minded opinion.
The way that I think of inheritance in the database world is "can only be one kind of." No other relational modeling technique works for that specific case; even with check constraints, with a strict relational model, you have the problem of putting the wrong "kind of" person into the wrong table. So, in your example, a user can be a parent or a babysitter, but not both. If a user can be more than one kind-of user, then inheritance is not the best tool to use.
The instructor/student relationship really only works well in the case where students cannot be instructors or vice-versa. If you have a TA, for example, it's better to model that using a strict relational design.
So, back to the parent-babysitter, your table design might look like this:
CREATE TABLE user (
id SERIAL,
full_name TEXT,
email TEXT,
phone_number TEXT
);
CREATE TABLE parent (
preferred_payment_method TEXT,
alternate_contact_info TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE babysitter (
age INT,
min_child_age INT,
preferred_payment_method TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE parent_babysitter (
parent_id INT REFERENCES parent(id),
babysitter_id INT REFERENCES babysitter(id),
PRIMARY KEY(parent_id, babysitter_id)
);
This model allows users to be "only one kind of" user - a parent or a babysitter. Notice how the primary key definitions are left to the child tables. In this model, you can have duplicated ID's between parent and babysitter, though this may not be a problem depending on how you write your code. (Note: Postgres is the only ORDBMS I know of with this restriction - Informix and Oracle, for example, have inherited keys on inherited tables)
Also see how we mixed the relational model in - we have a many-to-many relationship between parents and babysitters. That way we keep the entities separated, but we can still model a relationship without weird self-referencing keys.
All the options can be roughly represented by following cases:
base table + table for each class (class-table inheritance, Table-Per-Type, suggestions from the dba.stackexchange)
single table inheritance (Table-Per-Hierarchy) - just put everything into the single table
create independent tables for each class (Table-Per-Concrete)
I usually prefer option (1), because (2) and (3) are not completely correct in terms of DB design.
With (2) you will have unused columns for some rows (like "age" will be empty for Parent). And with (3) you may have duplicated data.
But you also need to think in terms of data access. With option (1) you will have the data spread over few tables, so to get Parent, you will need to use join operations to select data from both User and Parent tables.
I think that's the reason why options (2) and (3) exist - they are easier to use in terms of SQL queries (no joins are needed, you just select the data you need from one table).

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table? And effectively, the record_type and record_id together are a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value, and all the thing we would be logging are the same data type, strings).
Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually1 is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I heard people calling it "generic" or "polymorphic" FKs.
1 As opposed to FOREIGN KEYs built-into the DBMS.
Actually, I think 'Anti-Pattern' is a pretty good name for this set up, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users' photos, etc, and show why it's bad. Then I'll explain why it might not ne too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table as well as any other tables. If User 1234 is being liked, you couldn't insert it unless there was a PHOTO with ID 1234 and so on. For this reason, all RDBMS's that I know of will not let a Foreign Key be defined with multiple Primary keys - after all, Primary means 'only one' amongst other things.
So you'ld have to create the LIKES table with no relational integrity. This may not be a bad thinbg sometimes, but in this case I'd think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
However, for a table like LOGS which probably isn't central to the functionality of my system and I'm only doing some ad-hoc querying from to check what's been happening, then I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admin's running ad-hoc queries against it then it might be OK.
Cheers -

Organizing database tables - large number of properties

I have a database that stores some users in it. Each user has its account settings, privacy settings and lots of other properties to set. The number of those properties started to grow and I could end up with 30 properties or so.
Till now, I used to keep it in "UserInfo" table having User and UserInfo related as One-To-Many (keeping a log of all changes). Putting it in a single "UserInfo" table doesn't sound nice and, at least in the database model, it would look messy. What's the solution?
Separating privacy settings, account settings and other "groups" of settings in separate tables and have 1-1 relations between UserInfo and each group of settings table is one solution, but would that be too slow (or much slower) when retrieving the data? I guess all data would not be presented on a single page at the same moment. So maybe having one-to-many relationships to each table is a solution too (keeping log of each group separately)?
If it's only 30 properties, I'd recommend just creating 30 columns. That's not too much for a modern database to handle.
But I would guess that if you ahve 30 properties today, you will continue to invent new properties as time goes on, and the number of columns will keep growing. Restructuring your table to add columns every day may become time-consuming as you get lots of rows.
For an alternative solution check out this blog for a nifty solution for storing lots of dynamic attributes in a "schemaless" way: How FriendFeed Uses MySQL.
Basically, collect all the properties into some format and store it in a single TEXT column. The format is semi-structured, that is your application can separate the properties if needed but you can also add more at any time, or even have different properties per row. XML or YAML or JSON are example formats, or some object serialization format supported by your application code language.
CREATE TABLE Users (
user_id SERIAL PRIMARY KEY,
user_proerties TEXT
);
This makes it hard to search for a given value in a given property. So in addition to the TEXT column, create an auxiliary table for each property you want to be searchable, with two columns: values of the given property, and a foreign key back to the main table where that particular value is found. Now you have can index the column so lookups are quick.
CREATE TABLE UserBirthdate (
user_id BIGINT UNSIGNED PRIMARY KEY,
birthdate DATE NOT NULL,
FOREIGN KEY (user_id) REFERENCES Users(user_id),
KEY (birthdate)
);
SELECT u.* FROM Users AS u INNER JOIN UserBirthdate b USING (user_id)
WHERE b.birthdate = '2001-01-01';
This means as you insert or update a row in Users, you also need to insert or update into each of your auxiliary tables, to keep it in sync with your data. This could grow into a complex chore as you add more auxiliary tables.

What's more readable naming conventions for lookup tables?

We always name lookup tables - such as Countries,Cities,Regions ... etc - as below :
EntityName_LK OR LK_EntityName ( Countries_LK OR LK_Countries )
But I ask if any one have more better naming conversions for lookup tables ?
Edit:
We think to make postfix or prefix to solve like a conflict :
if we have User tables and lookup table for UserTypes (ID-Name) and we have a relation many to many between User & UserTypes that make us a table which we can name it like Types_For_User that may make confusion between UserTypes & Types_For_User So we like to make lookup table UserTypes to be like UserTypesLK to be obvious to all
Before you decide you need the "lookup" moniker, you should try to understand why you are designating some tables as "lookups" and not others. Each table should represent an entity unto itself.
What happens when a table that was designated as a "lookup" grows in scope and is no longer considered a "lookup"? You are either left with changing the table name which can be onerous or leaving it as is and having to explain to everyone that a given table isn't really a "lookup".
A common scenario mentioned in the comments related to a junction table. For example, suppose a User can have multiple "Types" which are expressed in a junction table with two foreign keys. Should that table be called User_UserTypes? To this scenario, I would first say that I prefer to use the suffix Member on the junction table. So we would have Users, UserTypes, UserTypeMembers. Secondly, the word "type" in this context is quite generic. Does a UserType really mean a Role? The term you use can make all the difference. If UserTypes are really Roles, then our table names become Users, Roles, RoleMembers which seems quite clear.
Here are two concerns for whether to use a prefix or suffix.
In a sorted list of tables, do you want the LK tables to be together or do you want all tables pertaining to EntityName to appear together
When programming in environments with auto-complete, are you likely to want to type "LK" to get the list of tables or the beginning of EntityName?
I think there are arguments for either, but I would choose to start with EntityName.
Every table can become a lookup table.
Consider that a person is a lookup in an Invoice table.
So in my opinion, tables should just be named the (singular) entity name, e.g. Person, Invoice.
What you do want is a standard for the column names and constraints, such as
FK_Invoice_Person (in table invoice, link to person)
PersonID or Person_ID (column in table invoice, linking to entity Person)
At the end of the day, it is all up to personal preference (if you can get away with dictating it) or team standards.
updated
If you have lookups that pertain only to entities, like Invoice_Terms which is a lookup from a list of 4 scenarios, then you could name it as Invoice_LK_Terms which would make it appear by name grouped under Invoice. Another way is to have a single lookup table for simple single-value lookups, separated by the function (table+column) it is for, e.g.
Lookups
Table | Column | Value
There is only one type of table and I don't believe there is any good reason for calling some tables "lookup" tables. Use a naming convention that works equally for every table.
One area where table naming conventions can help is data migration between environments. We often have to move data in lookup tables (which constrain values which may appear in other tables) along with schema changes, as these allowed value lists change. Currently we don't name lookup tables differently, but we are considering it to prevent the migration guy asking "which tables are lookup tables again?" every time.

Modelling inheritance in a database

For a database assignment I have to model a system for a school. Part of the requirements is to model information for staff, students and parents.
In the UML class diagram I have modelled this as those three classes being subtypes of a person type. This is because they will all require information on, among other things, address data.
My question is: how do I model this in the database (mysql)?
Thoughts so far are as follows:
Create a monolithic person table that contains all the information for each type and will have lots of null values depending on what type is being stored. (I doubt this would go down well with the lecturer unless I argued the case very convincingly).
A person table with three foreign keys which reference the subtypes but two of which will be null - in fact I'm not even sure if that makes sense or is possible?
According to this wikipage about django it's possible to implement the primary key on the subtypes as follows:
"id" integer NOT NULL PRIMARY KEY REFERENCES "supertype" ("id")
Something else I've not thought of...
So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
Links to articles/blog posts or previous questions are more than welcome.
Thanks for your time!
UPDATE
Alright thanks for the answers everyone. I already had a separate address table so that's not an issue.
Cheers,
Adam
4 tables staff, students, parents and person for the generic stuff.
Staff, students and parents have forign keys that each refer back to Person (not the other way around).
Person has field that identifies what the subclass of this person is (i.e. staff, student or parent).
EDIT:
As pointed out by HLGM, addresses should exist in a seperate table, as any person may have multiple addresses. (However - I'm about to disagree with myself - you may wish to deliberately constrain addresses to one per person, limiting the choices for mailing lists etc).
Well I think all approaches are valid and any lecturer who marks down for shoving it in one table (unless the requirements are specific to say you shouldn't) is removing a viable strategy due to their own personal opinion.
I highly recommend that you check out the documentation on NHibernate as this provides different approaches for performing the above. Which I will now attempt to poorly parrot.
Your options:
1) One table with all the data that has a "delimiter" column. This column states what kind of person the person is. This is viable in simple scenarios and (seriously) high performance where the joins will hurt too much
2) Table per class which will lead to duplication of columns but will avoid joins again, so its simple and a lil faster (although only a lil and indexing mitigates this in most scenarios).
3) "Proper" inheritence. The normalised version. You are almost there but your key is in the wrong place IMO. Your Employee table should contain a PersonId so you can then do:
select employee.id, person.name from employee inner join person on employee.personId = person.personId
To get all the names of employees where name is only specified on the person table.
I would go for #3.
Your goal is to impress a lecturer, not a PM or customer. Academics tend to dislike nulls and might (subconciously) penalise you for using the other methods (which rely on nulls.)
And you don't necessarily need that django extension (PRIMARY KEY ... REFERENCES ...) You could use an ordinary FOREIGN KEY for that.
"So for those who have modelled inheritance in a database before; how did you do it? What method do you recommend and why?
"
Methods 1 and 3 are good. The differences are mostly in what your use cases are.
1) adaptability -- which is easier to change? Several separate tables with FK relations to the parent table.
2) performance -- which requires fewer joins? One single table.
Rats. No design accomplishes both.
Also, there's a third design in addition to your mono-table and FK-to-parent.
Three separate tables with some common columns (usually copy-and-paste of the superclass columns among all subclass tables). This is very flexible and easy to work with. But, it requires a union of the three tables to assemble an overall list.
OO databases go through the same stuff and come up with pretty much the same options.
If the point is to model subclasses in a database, you probably are already thinking along the lines of the solutions I've seen in real OO databases (leaving fields empty).
If not, you might think about creating a system that doesn't use inheritance in this way.
Inheritance should always be used quite sparingly, and this is probably a pretty bad case for it.
A good guideline is to never use inheritance unless you actually have code that does different things to the field of a "Parent" class than to the same field in a "Child" class. If business code in your class doesn't specifically refer to a field, that field absolutely shouldn't cause inheritance.
But again, if you are in school, that may not match what they are trying to teach...
The "correct" answer for the purposes of an assignment is probably #3 :
Person
PersonId Name Address1 Address2 City Country
Student
PersonId StudentId GPA Year ..
Staff
PersonId StaffId Salary ..
Parent
PersonId ParentId ParentType EmergencyContactNumber ..
Where PersonId is always the primary key, and also a foreign key in the last three tables.
I like this approach because it makes it easy to represent the same person having more than one role. A teacher could very well also be a parent, for example.
I suggest five tables
Person
Student
Staff
Parent
Address
WHy - because people can have multiple addesses and people can also have multiple roles and the information you want for staff is different than the information you need to store for parent or student.
Further you may want to store name as last_name, Middle_name, first_name, Name_suffix (like jr.) instead of as just name. Belive me you willwant to be able to search on last_name! Name is not unique, so you will need to make sure you have a unique surrogate primary key.
Please read up about normalization before trying to design a database. Here is a source to start with:
http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx
Super type Person should be created like this:
CREATE TABLE Person(PersonID int primary key, Name varchar ... etc ...)
All Sub types should be created like this:
CREATE TABLE IF NOT EXISTS Staffs(StaffId INT NOT NULL ,
PRIMARY KEY (StaffId) ,
CONSTRAINT FK_StaffId FOREIGN KEY (StaffId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Students(StudentId INT NOT NULL ,
PRIMARY KEY (StudentId) ,
CONSTRAINT FK_StudentId FOREIGN KEY (StudentId) REFERENCES Person(PersonId)
)
CREATE TABLE IF NOT EXISTS Parents(PersonID INT NOT NULL ,
PRIMARY KEY (PersonID ) ,
CONSTRAINT FK_PersonID FOREIGN KEY (PersonID ) REFERENCES Person(PersonId)
)
Foreign key in subtypes staffs,students,parents adds two conditions:
Person row cannot be deleted unless corresponding subtype row will
not be deleted. For e.g. if there is one student entry in students
table referring to Person table, without deleting student entry
person entry cannot be deleted, which is very important. If Student
object is created then without deleting Student object we cannot
delete base Person object.
All base types have foreign key "not null" to make sure each base
type will have base type existing always. For e.g. If you create
Student object you must create Person object first.

Resources