Database design: a 'code' table that gets referenced by other entities

I am building a database as a simple exercise. It could be hosted on any database server, so I am trying to keep things as standard as possible. Basically, what I would like to have is a 'code' table that gets referenced by other entities. To explain:
xcode
id code
r role
p property
code
r admin
r staff
p title
....
then I would like to have some view like:
role (select * from code where xcode='r')
r admin
r staff
property (select * from code where xcode='p')
p title
then, suppose we have an entity
myentity
id - 1
role - admin (foreign key to role)
title - title (foreign key to property)
Obviously I cannot create a foreign key to a view, but this conveys the idea I have in mind. How can I get this behaviour using standard SQL syntax wherever possible and, as a second option, additional database features such as triggers, etc.?
If I declare role and title in myentity as foreign keys to 'code' instead of to the views, nothing would stop me from inserting a role into the title field.

I have worked on systems with a single table for all codes and others with one table per code. I definitely prefer the latter approach.
The advantages of a table per code are:
Foreign keys. As you have already spotted, it is not possible to enforce compliance with permitted values through foreign keys with a single table. Using check constraints is an alternative approach, but it has a higher maintenance cost.
Performance. Code lookups are not normally a performance bottleneck, but it undoubtedly helps the optimizer make sensible decisions about execution paths if it knows it is retrieving records from a table with four rows rather than four hundred.
Code groups. Sometimes we want to organise a code into sub-divisions, usually to make it easier to render complex lists of values. If we have a table per code we have more flexibility when it comes to structure.
In addition, I notice that you want to be able to deploy "on any database server". In that case avoid triggers. Triggers are usually bad news in most scenarios, and on top of that they have product-specific syntax.
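As a minimal sketch of the table-per-code approach, the example below uses Python's built-in sqlite3 module so it can be run without a server. SQLite is only a stand-in here, and the column names and sample rows are assumptions based on the question. The DDL sticks to plain primary and foreign keys.

```python
import sqlite3


def build_table_per_code_db():
    """Create one lookup table per code plus an entity that references them."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is requested.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE role     (code VARCHAR(20) PRIMARY KEY);
        CREATE TABLE property (code VARCHAR(20) PRIMARY KEY);
        CREATE TABLE myentity (
            id    INTEGER PRIMARY KEY,
            role  VARCHAR(20) NOT NULL REFERENCES role(code),
            title VARCHAR(20) NOT NULL REFERENCES property(code)
        );
        INSERT INTO role (code) VALUES ('admin'), ('staff');
        INSERT INTO property (code) VALUES ('title');
    """)
    return conn


def try_insert(conn, entity_id, role, title):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO myentity (id, role, title) VALUES (?, ?, ?)",
                (entity_id, role, title),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout a role value placed in the title column (or the reverse) is rejected by the foreign keys alone.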

What you are trying to do is in most cases an anti-pattern and a design mistake. Just create separate tables instead of views.
There are some rare cases where this kind of design makes sense. In those cases, include the xcode field in the primary key and in the foreign keys. Your entity will then look like this:
myentity
id - 1
role_xcode
role - admin (foreign key to role)
title_xcode
title - title (foreign key to property)
You can then create check constraints to enforce role_xcode='r' and title_xcode='p'.
(Sorry, I don't know if they are standard; they do exist in Oracle and are so simple that I'd expect them in other RDBMSs as well.)
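A rough sketch of this composite-key variant follows, again using Python's sqlite3 module as a stand-in engine. The column types and sample rows are assumptions. Each entity column is paired with an xcode column, the pair is a foreign key into the single code table, and a check constraint pins the xcode value.

```python
import sqlite3


def build_single_code_table_db():
    """Single 'code' table; xcode is part of both the primary and foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE code (
            xcode CHAR(1)     NOT NULL,
            code  VARCHAR(20) NOT NULL,
            PRIMARY KEY (xcode, code)
        );
        CREATE TABLE myentity (
            id          INTEGER PRIMARY KEY,
            role_xcode  CHAR(1)     NOT NULL CHECK (role_xcode = 'r'),
            role        VARCHAR(20) NOT NULL,
            title_xcode CHAR(1)     NOT NULL CHECK (title_xcode = 'p'),
            title       VARCHAR(20) NOT NULL,
            FOREIGN KEY (role_xcode, role)   REFERENCES code (xcode, code),
            FOREIGN KEY (title_xcode, title) REFERENCES code (xcode, code)
        );
        INSERT INTO code (xcode, code)
        VALUES ('r', 'admin'), ('r', 'staff'), ('p', 'title');
    """)
    return conn


def try_insert(conn, row):
    """row = (id, role_xcode, role, title_xcode, title); True if accepted."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO myentity (id, role_xcode, role, title_xcode, title) "
                "VALUES (?, ?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The check constraint stops a 'p' code from being used as a role, and the composite foreign key stops a value that is not registered under 'r' from being stored in the role column.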

Related

Database theory: best way to have a "flags" table which could apply to many entities?

I'm building a data model for a new project I'm developing. Part of this data model involves entities having "flags".
A flag is simply a boolean value - either an entity has the flag or it does not have the flag. To that end I have a table simply called "flags" that has an ID, a string name, and a description. (An example of a flag might be "is active" or "should be displayed" or "belongs to group".)
So for example, any user in my users table could have none, one, or many flags. So I create a userFlags bridge table with user ID and flag ID. If the table contains a row for the given flag ID and user ID, that user has that flag.
Ok, so now I add another entity - say "section". Each section can also have flags. So I create a sectionFlags table to accommodate this.
Now I have another entity - "content", so again, "contentFlags".
And so on.
My final data model has basically two tables per entity, one to hold the entity and one for flags.
While this certainly works, it seems like there may be a better way to design my model, so I don't have to have so many bridge tables. One idea I had was a master "hasFlags" table with flag ID, item ID and item type. The item type could be an enumerated field only accepting values corresponding to known entities. The only problem there is that my foreign key for the entity will not work because each "item ID" could refer to a different entity. (I have actually used this technique in other data models, and while it certainly works, you lose referential integrity as well as things like cascade updates.)
Or, perhaps my data model is fine as-is and that's just the nature of the beast.
Any more-advanced experienced DB devs care to chime in?

The many-to-many relationships are one way to do it (and possibly faster than what I'm about to suggest because they can use integer key indexes).
The other way to do this is with a polymorphic relationship.
Your entity-to-flag table needs 2 columns as well as the foreign key link to the flag table;
other_key integer not null
other_type varchar(...) not null
And in those fields you store the foreign key of the relation in the integer and the type of the relation in the varchar. Full-on ORMs that support this sometimes store the class name of the foreign relation in the type column, to aid with object loading.
The downside here is that the integer can't be a true foreign key as it will contain duplicates from many tables. It also makes your querying a bit more interesting per-join than the many-to-many tables, but it does allow you to generalise your join in code.
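To make the trade-off concrete, here is a small sketch of such a polymorphic bridge table, run through Python's sqlite3 module. The table name entity_flags, the type labels 'user' and 'section', and the sample rows are all hypothetical. Only flag_id is a real foreign key; other_key is a plain integer.

```python
import sqlite3


def build_polymorphic_flags_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE flags    (id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL UNIQUE);
        CREATE TABLE users    (id INTEGER PRIMARY KEY, name VARCHAR(50) NOT NULL);
        CREATE TABLE sections (id INTEGER PRIMARY KEY, title VARCHAR(50) NOT NULL);
        CREATE TABLE entity_flags (
            flag_id    INTEGER     NOT NULL REFERENCES flags(id),
            other_key  INTEGER     NOT NULL,  -- id in users OR sections, not a real FK
            other_type VARCHAR(20) NOT NULL CHECK (other_type IN ('user', 'section')),
            PRIMARY KEY (flag_id, other_key, other_type)
        );
        INSERT INTO flags (id, name) VALUES (1, 'is active'), (2, 'should be displayed');
        INSERT INTO users (id, name) VALUES (7, 'alice');
        INSERT INTO sections (id, title) VALUES (7, 'news');
        INSERT INTO entity_flags (flag_id, other_key, other_type)
        VALUES (1, 7, 'user'), (2, 7, 'section');
    """)
    return conn


def flags_for(conn, other_type, other_key):
    """One generic query serves every entity type."""
    rows = conn.execute(
        "SELECT f.name FROM entity_flags ef "
        "JOIN flags f ON f.id = ef.flag_id "
        "WHERE ef.other_type = ? AND ef.other_key = ? "
        "ORDER BY f.name",
        (other_type, other_key),
    ).fetchall()
    return [name for (name,) in rows]
```

Note that user 7 and section 7 share the same other_key and are told apart only by other_type, and that nothing prevents a row pointing at a user id that does not exist.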

Postgresql inheritance based database design

I'm developing a simple babysitter application that has 2 types of users: a 'Parent' and the 'Babysitter'. I'm using postgresql as my database but I'm having trouble working out my database design.
The 'Parent' and the 'Babysitter' entities have attributes that can be generalized, for example: username, password, email, ... Those attributes could be placed into a parent entity called 'User'. They both also have their own attributes, for example: Babysitter -> age.
In terms of OOP things are very clear to me: just extend the user class and you are good to go. In DB design, though, things are different.
Before posting this question I roamed around the internet for a good week looking for insight into this 'issue'. I did find a lot of information, but it seemed to me that there was a lot of disagreement. Here are some of the posts I've read:
How do you effectively model inheritance in a database?: Table-Per-Type (TPT), Table-Per-Hierarchy (TPH) and Table-Per-Concrete (TPC) VS 'Forcing the RDb into a class-based requirements is simply incorrect.'
https://dba.stackexchange.com/questions/75792/multiple-user-types-db-design-advice:
Table: `users`; contains all similar fields as well as a `user_type_id` column (a foreign key on `id` in `user_types`)
Table: `user_types`; contains an `id` and a `type` (Student, Instructor, etc.)
Table: `students`; contains fields only related to students as well as a `user_id` column (a foreign key of `id` on `users`)
Table: `instructors`; contains fields only related to instructors as well as a `user_id` column (a foreign key of `id` on `users`)
etc. for all `user_types`
https://dba.stackexchange.com/questions/36573/how-to-model-inheritance-of-two-tables-mysql/36577#36577
When to use inherited tables in PostgreSQL?: Inheritance in postgresql does not work as expected for me and a bunch of other users as the original poster points out.
I am really confused about which approach I should take. Class-table-inheritance (https://stackoverflow.com/tags/class-table-inheritance/info) seems like the most correct in my OOP mindset, but I would very much appreciate an updated, DB-minded opinion.

The way that I think of inheritance in the database world is "can only be one kind of." No other relational modeling technique works for that specific case; even with check constraints, with a strict relational model, you have the problem of putting the wrong "kind of" person into the wrong table. So, in your example, a user can be a parent or a babysitter, but not both. If a user can be more than one kind-of user, then inheritance is not the best tool to use.
The instructor/student relationship really only works well in the case where students cannot be instructors or vice-versa. If you have a TA, for example, it's better to model that using a strict relational design.
So, back to the parent-babysitter, your table design might look like this:
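-- "user" is a reserved word in PostgreSQL, so the table name has to be quoted.
CREATE TABLE "user" (
    id SERIAL,
    full_name TEXT,
    email TEXT,
    phone_number TEXT
);
-- Each child table inherits the columns above and declares its own primary key.
CREATE TABLE parent (
    preferred_payment_method TEXT,
    alternate_contact_info TEXT,
    PRIMARY KEY (id)
) INHERITS ("user");
CREATE TABLE babysitter (
    age INT,
    min_child_age INT,
    preferred_payment_method TEXT,
    PRIMARY KEY (id)
) INHERITS ("user");
-- Plain relational many-to-many link between the two kinds of user.
CREATE TABLE parent_babysitter (
    parent_id INT REFERENCES parent(id),
    babysitter_id INT REFERENCES babysitter(id),
    PRIMARY KEY (parent_id, babysitter_id)
);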
This model allows users to be "only one kind of" user - a parent or a babysitter. Notice how the primary key definitions are left to the child tables. In this model, you can have duplicated IDs between parent and babysitter, though this may not be a problem depending on how you write your code. (Note: Postgres is the only ORDBMS I know of with this restriction; Informix and Oracle, for example, have inherited keys on inherited tables.)
Also see how we mixed the relational model in - we have a many-to-many relationship between parents and babysitters. That way we keep the entities separated, but we can still model a relationship without weird self-referencing keys.

All the options can be roughly represented by the following cases:
(1) base table + table for each class (class-table inheritance, Table-Per-Type, suggestions from the dba.stackexchange)
(2) single table inheritance (Table-Per-Hierarchy) - just put everything into a single table
(3) create independent tables for each class (Table-Per-Concrete)
I usually prefer option (1), because (2) and (3) are not completely correct in terms of DB design.
With (2) you will have unused columns for some rows (for example, "age" will be empty for Parent). And with (3) you may have duplicated data.
But you also need to think in terms of data access. With option (1) you will have the data spread across several tables, so to get a Parent you will need join operations to select data from both the User and Parent tables.
I think that's the reason why options (2) and (3) exist - they are easier to use in terms of SQL queries (no joins are needed, you just select the data you need from one table).
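A small sketch of option (1) and the join it requires, run through Python's sqlite3 module. SQLite is only a stand-in engine, and the table layout, column names, and sample rows are assumptions for illustration. Each subtype table uses the base table's id as both its primary key and a foreign key.

```python
import sqlite3


def build_class_table_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Base table: attributes shared by every kind of user.
        CREATE TABLE users (
            id       INTEGER PRIMARY KEY,
            username VARCHAR(50)  NOT NULL UNIQUE,
            email    VARCHAR(100) NOT NULL
        );
        -- One table per subtype, keyed by the base table's id.
        CREATE TABLE parent (
            user_id                  INTEGER PRIMARY KEY REFERENCES users(id),
            preferred_payment_method VARCHAR(30)
        );
        CREATE TABLE babysitter (
            user_id INTEGER PRIMARY KEY REFERENCES users(id),
            age     INTEGER NOT NULL
        );
        INSERT INTO users (id, username, email)
        VALUES (1, 'pat', 'pat@example.com'), (2, 'sam', 'sam@example.com');
        INSERT INTO parent (user_id, preferred_payment_method) VALUES (1, 'cash');
        INSERT INTO babysitter (user_id, age) VALUES (2, 19);
    """)
    return conn


def load_babysitters(conn):
    """Reading a full babysitter needs a join across base and subtype tables."""
    return conn.execute(
        "SELECT u.id, u.username, u.email, b.age "
        "FROM users u JOIN babysitter b ON b.user_id = u.id "
        "ORDER BY u.id"
    ).fetchall()
```

Nothing in this sketch stops the same user id from appearing in both parent and babysitter; if that matters, a type column on the base table is one common way to restrict it.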

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup, where an enum in one of the fields refers to the name of one of the other tables, and record_type and record_id together effectively form a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value and all the things we would be logging are of the same data type: strings.)

Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually [1] is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I have heard people call it "generic" or "polymorphic" FKs.
[1] As opposed to FOREIGN KEYs built into the DBMS.

Actually, I think 'Anti-Pattern' is a pretty good name for this setup, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users, photos, etc., and show why it's bad. Then I'll explain why it might not be too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table as well as to any other tables. If User 1234 is being liked, you couldn't insert the row unless there was also a PHOTO with ID 1234, and so on. For this reason, no RDBMS that I know of will let a single foreign key reference multiple primary keys - after all, Primary means 'only one' amongst other things.
So you'd have to create the LIKES table with no referential integrity. This may not be a bad thing sometimes, but in this case I think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
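A minimal sketch of that layout, using Python's sqlite3 module as a stand-in engine. The snake_case names are hypothetical, and two details are assumptions added here rather than taken from the answer: the liking user is kept as a required column, and a check constraint requires exactly one target column to be filled in.

```python
import sqlite3


def build_likes_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY);
        CREATE TABLE photos (id INTEGER PRIMARY KEY);
        CREATE TABLE likes (
            id               INTEGER PRIMARY KEY,
            liked_by_user_id INTEGER NOT NULL REFERENCES users(id),
            liked_user_id    INTEGER REFERENCES users(id),   -- nullable target
            photo_id         INTEGER REFERENCES photos(id),  -- nullable target
            -- Assumed extra rule: exactly one target must be set.
            CHECK ((liked_user_id IS NOT NULL AND photo_id IS NULL)
                OR (liked_user_id IS NULL AND photo_id IS NOT NULL))
        );
        INSERT INTO users (id) VALUES (1), (2);
        INSERT INTO photos (id) VALUES (10);
    """)
    return conn


def try_like(conn, liked_by, liked_user=None, photo=None):
    """Return True if the like is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO likes (liked_by_user_id, liked_user_id, photo_id) "
                "VALUES (?, ?, ?)",
                (liked_by, liked_user, photo),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Each target column now has a real foreign key, so a like pointing at a photo or user that does not exist is rejected by the database itself.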
However, for a table like LOGS, which probably isn't central to the functionality of my system and which I only query ad hoc to check what's been happening, I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern, but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admins running ad-hoc queries against it, then it might be OK.
Cheers -

Single column/primary key only table for referential integrity?

Maybe I'm going about this wrong, but I'm working on a database design for one of my projects.
I have an entity with a classification column which groups entities into convenient categories for the user. These classifications are predefined and unchangeable by the user (at least that's the current design).
I'm trying to decide if I should have an 'EntityClassification' table which contains simply an 'Id' column as the primary key with no other information, in order to have an enforced relationship between Entity:Classification -> EntityClassification:Id.
I don't plan to have a name/description column in EntityClassification, since my current thought is that I'll need to support localization of these predefined names, which will be done with static string tables like resource files downloaded to the client based on their country/language. There really isn't any other data associated with this EntityClassification that I would want, and a table seems like it might be overkill.
Is this a common/recommended practice for this type of problem? We're using SQL Server 2008 and don't have an enum datatype for the database, which would seem to be what I'm really trying to achieve.

You should have the table with name and description, not only for end-user display but also for internal documentation, so that when the users say 'my query based on this classification doesn't work!' someone hired in the future will know which ID they're talking about.

Do you just want to ensure that the values in Entity:Classification are restricted to your pre-determined list? If so a check constraint might be what you need.
Such constraints aren't as flexible as foreign keys: to alter the checked values we have to drop and recreate the constraint, but then you say there are no plans to change the values so that shouldn't matter.
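For illustration, a check constraint of this kind could look like the sketch below, which runs the DDL through Python's sqlite3 module rather than SQL Server. The table name, column names, and the permitted values 1 to 3 are made up for the example.

```python
import sqlite3


def build_entity_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entity (
            id             INTEGER PRIMARY KEY,
            name           VARCHAR(50) NOT NULL,
            -- The permitted list lives in the constraint, not in a lookup table.
            classification INTEGER NOT NULL
                CONSTRAINT chk_entity_classification
                CHECK (classification IN (1, 2, 3))
        );
    """)
    return conn


def try_insert(conn, name, classification):
    """Return True if the row is accepted, False if the check rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO entity (name, classification) VALUES (?, ?)",
                (name, classification),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Naming the constraint makes it easier to drop and recreate later if the list of values ever does change.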

How to design DB table / schema with ease?

Is there a simple method to decide on what fields and indexes are needed for each table in an app you design?
For example, if it is a webapp that simply lets people create lists (any number of lists, and users can create "things to do" list or "shopping" list), and the user can assign other users to edit the list, and whether the list is viewable publicly or to only certain users, how can the tables be design so that it is very accurate and designed quickly? What about the indexes?
I did that in college and then revisited the question some time ago and have a method, but would like to find out if there are standard and good ways to do it out in the field.

Database design is hard ...
As with many things in life, it's a series of tradeoffs. The first thing you need to decide is what DBMS you will use (MySQL, SQL Server, Oracle, PostgreSQL, one of the "Object-oriented" databases, etc.).
Then you need to decide on normalization v. insane numbers of JOINs to get to your data. Questions like "how much logic will I implement in triggers, stored procedures, in app code, etc" need to be addressed.
There is no "Quick'n'Easy" way to design anything but the most trivial of databases.
'Course, that's just my experience. YMWV.

it is beyond the scope of this answer to fully explain database design
I generally break my design into three parts (part 1 and 2 happen up front, while 3 is usually near the project end)
1) create the tables based on relationships (parent/child/etc)
2) create fields based on content (parent has x attributes, etc)
3) create indexes last based on how you select data from your tables

Haven't heard of any formal approaches to this problem but there are rules of thumb. All nouns and business objects become tables, normalized of course. And I'd think the attributes sort of speak for themselves. I guess?
As for indexes, it just comes with working with the data. Any column that's joined off of deserves an index (maybe even clustered). It's very... depends. But there are patterns. But other than optimizing for joins, many indexes are directly related to how the data is used, and this isn't something that can be provided by rule of thumb. Like if you look up users by pk and elsewhere by last_name, last_name deserves an index.
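The last_name case can be sketched with Python's sqlite3 module. The users table, its columns, and the index name are hypothetical, and the query plan text is specific to SQLite; other engines have their own EXPLAIN output.

```python
import sqlite3


def build_users_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id         INTEGER PRIMARY KEY,   -- lookups by pk are already indexed
            first_name VARCHAR(50),
            last_name  VARCHAR(50)
        );
        -- Added because the application also looks users up by last_name.
        CREATE INDEX idx_users_last_name ON users (last_name);
    """)
    return conn


def plan_for_last_name_lookup(conn):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, first_name FROM users WHERE last_name = ?",
        ("Smith",),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)
```

Checking the plan before and after adding an index is a cheap way to confirm that the index is actually used by the query it was created for.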

I think the solution is a subjective one. When I have to design tables I look at the Java object that will represent that particular data model and go from there. You'll find a lot of frameworks (Django, CakePHP, RoR) have you develop the model and the frameworks will build the corresponding tables.
So I would suggest evaluating what functionality and data you need to store and develop your tables from that. Also look into whether the tool set you have at your disposal offers to generate the tables for you from the object structure.

I would go for the straightforward (almost) normalized design:
-- users comes first because the other tables reference it
CREATE TABLE users (
    userid serial primary key,
    name varchar
);
CREATE TABLE lists (
    listid serial primary key,
    name varchar,
    ownerid int references users(userid)
);
CREATE TABLE list_items (
    listid int references lists(listid),
    value varchar,
    -- timestamp is assumed here: serial suggests PostgreSQL, which has no datetime type
    date timestamp
);
CREATE TABLE permissions (
    permissionid serial primary key,
    description varchar
);
CREATE TABLE list_permissions (
    listid int references lists(listid),
    permissionid int references permissions(permissionid),
    userid int references users(userid)
);
Which indexes to create would depend on which queries are actually used most and how they perform. For instance, if you query lists and list_items a lot (which is likely), you'd want an index on listid, and one on name if you'll be searching by name.
Just some ideas. Hope they're helpful.

I'd try not to lock yourself in if you're still trying to see what works.
Just from your description, you'd want a table for your users' information, as well as:
tbl_lists:
ID_list (primary key)
UserID (foreign key to list owner)
ListName
tbl_listItems:
ID_listItem (primary key)
ListID (foreign key to list)
ItemDescription
tbl_permissions:
ID_permission (primary key)
ListID
UserID (foreign key to user you're granting permission to)
PermissionTypeID (what kind of permission)
tbl_permissionTypes:
ID_permissionType (primary key)
Description ("can view", "can edit", etc.)
The more flexible you can make things while you're designing, the better. You can optimize later.
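One possible translation of this outline into DDL, together with a permission lookup, is sketched below using Python's sqlite3 module. The tbl_users table, the column types, and the sample rows are assumptions; the outline only says a users table is needed.

```python
import sqlite3


def build_list_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE tbl_users (
            ID_user  INTEGER PRIMARY KEY,
            UserName VARCHAR(50) NOT NULL
        );
        CREATE TABLE tbl_lists (
            ID_list  INTEGER PRIMARY KEY,
            UserID   INTEGER NOT NULL REFERENCES tbl_users(ID_user),
            ListName VARCHAR(100) NOT NULL
        );
        CREATE TABLE tbl_permissionTypes (
            ID_permissionType INTEGER PRIMARY KEY,
            Description       VARCHAR(50) NOT NULL UNIQUE
        );
        CREATE TABLE tbl_permissions (
            ID_permission    INTEGER PRIMARY KEY,
            ListID           INTEGER NOT NULL REFERENCES tbl_lists(ID_list),
            UserID           INTEGER NOT NULL REFERENCES tbl_users(ID_user),
            PermissionTypeID INTEGER NOT NULL
                REFERENCES tbl_permissionTypes(ID_permissionType)
        );
        INSERT INTO tbl_users (ID_user, UserName) VALUES (1, 'owner'), (2, 'guest');
        INSERT INTO tbl_lists (ID_list, UserID, ListName) VALUES (1, 1, 'shopping');
        INSERT INTO tbl_permissionTypes (ID_permissionType, Description)
        VALUES (1, 'can view'), (2, 'can edit');
        INSERT INTO tbl_permissions (ListID, UserID, PermissionTypeID) VALUES (1, 2, 2);
    """)
    return conn


def has_permission(conn, user_id, list_id, description):
    """True if the user was granted the named permission on the list."""
    row = conn.execute(
        "SELECT 1 FROM tbl_permissions p "
        "JOIN tbl_permissionTypes t ON t.ID_permissionType = p.PermissionTypeID "
        "WHERE p.UserID = ? AND p.ListID = ? AND t.Description = ?",
        (user_id, list_id, description),
    ).fetchone()
    return row is not None
```

This lookup only covers explicit grants; whether the list owner is implicitly allowed everything is a design decision the outline leaves open.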

If you want to keep things very simple and are not too concerned with normalizing, you could create one big table that stores the main object your webapp is based around (e.g. lists), and have other smaller supporting tables link to the big table (e.g. tbl_listType, tbl_permission, tbl_list_items).
Then when you write queries, you almost certainly include the main table and you can link in other supporting tables for more granular details.
