How to design DB table / schema with ease?

How to design DB table / schema with ease? - database

Is there a simple method to decide on what fields and indexes are needed for each table in an app you design?
For example, if it is a webapp that simply lets people create lists (any number of lists, and users can create "things to do" list or "shopping" list), and the user can assign other users to edit the list, and whether the list is viewable publicly or to only certain users, how can the tables be design so that it is very accurate and designed quickly? What about the indexes?
I did that in college and then revisited the question some time ago and have a method, but would like to find out if there are standard and good ways to do it out in the field.

Database design is hard ...
As with many things in life, it's a series of tradeoffs. The first thing you need to decide is what DBMS you will use, (MySQL, SQL Server, Oracle, PostgreSQL, one of the "Object-oriented" databases, etc.
Then you need to decide on normalization v. insane numbers of JOINs to get to your data. Questions like "how much logic will I implement in triggers, stored procedures, in app code, etc" need to be addressed.
There is no "Quick'n'Easy" way to design anything but the most trivial of databases.
'Course, that's just my experience. YMWV.

it is beyond the scope of this answer to fully explain database design
I generally break my design into three parts (part 1 and 2 happen up front, while 3 is usually near the project end)
1) create the tables based on relationships (parent/child/etc)
2) create fields based on content (parent has x atributes, etc)
3) create indexes last based on how you select data from your tables

Haven't heard of any formal approaches to this problem but there are rules of thumb. All nouns and business objects become tables, normalized of course. And I'd think the attributes sort of speak for themselves. I guess?
As for indexes, it just comes with working with the data. Any column that's joined off of deserves an index (maybe even clustered). It's very... depends. But there are patterns. But other than optimizing for joins, many indexes are directly related to how the data is used, and this isn't something that can be provided by rule of thumb. Like if you look up users by pk and elsewhere by last_name, last_name deserves an index.

I think the solution is a subjective one. When I have to design tables I look at the Java object that will represent that particular data model and go from there. You'll find a lot of frameworks (Django, CakePHP, RoR) have you develop the model and the frameworks will build the corresponding tables.
So I would suggest evaluating what functionality and data you need to store and develop your tables from that. Also look into whether the tool set you have at your disposal offers to generate the tables for you from the object structure.

I would go for the straightforward (almost) normalized design:
CREATE TABLE lists (
listid serial,
name varchar,
ownerid int references users(userid)
)
CREATE TABLE list_items (
listid int references lists(listid),
value varchar,
date datetime
)
CREATE TABLE permissions (
permissionid serial,
description varchar,
)
CREATE TABLE list_permissions (
listid int references lists(listid),
permissionid int references permissions(permissionid)
userid int references users(userid)
)
CREATE TABLE users (
userid serial,
name varchar
)
Which indexes to create would depend on what are the actual most used queries and how are they performing. For instance, if you query a lot on the lists and list_items (likely) you'd want an index on listid and on name, if you'll be searching by name.
Just some ideas. Hope they're helpful.

I'd try not to lock yourself in if you're still trying to see what works.
Just from your description, you'd want a table for your users' information, as well as:
tbl_lists:
ID_list (primary key)
UserID (foreign key to list owner)
ListName
tbl_listItems:
ID_listItem (primary key)
ListID (foreign key to list)
ItemDescription
tbl_permissions:
ID_permission (primary key)
ListID
UserID (foreign key to user you're granting permission to)
PermissionTypeID (what kind of permission)
tbl_permissionTypes:
ID_permissionType (primary key)
Description ("can view", "can edit", etc.)
The more flexible you can make things while you're designing, the better. You can optimize later.

If you want to keep things very simple and are not too concerned with normalizing. You could create one big table that stores the main object your webapp is based around, ex: lists, and have other smaller supporting tables link to the big table, ex: tbl_listType, tbl_permission, tbl_list_items).
Then when you write queries, you almost certainly include the main table and you can link in other supporting tables for more granular details.

Related

How to model a 'history' table from a join table

I have a need to track some history for a table that contains ids from other tables:
I want to track the status of the company_device table such that I can make entries to know when the status of the relationship changed (when a device was assigned to a company, and when it was unassigned, etc). The company_device table would only contain current, existing relationships. So I'd like to do 'something' like this:
But this won't work, because it requires there to be a record in company_device for the FK to be satisfied in the company_device_history table. For example, if I
insert into company_device values (1,1);
insert into company_device_history values (1,1,'Assigned',now());
Then I can't ever remove the record from company_device because of the foreign key constraint. So I've currently settled on this:
so I'm not restricted by the foreign key.
My question is : is there a better way to model this? I could add the status and effective_date to the company_device table and query based on status or effective_date, but that doesn't seem to be a good solution to me. I'd like to know how others might approach this.

When looking exclusively at the problem (that is, when modeling the nature of the business problem at hand), the only thing you need is one single table COMPANY_DEVICE_ASSIGNMENT with four columns C_ID, D_ID, FROM and TO, telling you that "device D_ID was assigned to company C_ID from FROM to TO".
Systems do exist that allow you to work on this basis, however none of them speak SQL (an excellent book with an in-depth treatment of the subject matter of temporal data, I'd even call it the "canonical" work, is "Time and Relational Theory - Googling for it can't miss). If you do this in an SQL system, this "straightforward" approach is not going to get you very far. In that case, your practical options are limited by :
what temporal features are offered by the DBMS you want/can/must use
what features are supported by whatever modeling tool you want/can/must use to generate DDL.
As Neil's comment stated : the most recent version of the SQL standard had "temporal support" as its main novelty, and they are absolutely relevant to your problem, but the products actually offering these features are relatively few and far between.

Postgresql inheritance based database design

I'm developing a simple babysitter application that has 2 types of users: a 'Parent' and the 'Babysitter'. I'm using postgresql as my database but I'm having trouble working out my database design.
The 'Parent' and the 'Babysitter' entities have attributes that can be generalized, for example: username, password, email, ... Those attributes could be
placed into a parent entity called 'User'. They both also have their own attributes, for example: Babysitter -> age.
In terms of OOP things are very clear for me, just extend the user class and you are good to go but in DB design things are differently.
Before posting this question I roamed around the internet for a good week looking for insight into this 'issue'. I did find a lot of information but
it seemed to me that there was a lot a disagreement. Here are some of the posts I've read:
How do you effectively model inheritance in a database?: Table-Per-Type (TPT), Table-Per-Hierarchy (TPH) and Table-Per-Concrete (TPC) VS 'Forcing the RDb into a class-based requirements is simply incorrect.'
https://dba.stackexchange.com/questions/75792/multiple-user-types-db-design-advice:
Table: `users`; contains all similar fields as well as a `user_type_id` column (a foreign key on `id` in `user_types`
Table: `user_types`; contains an `id` and a `type` (Student, Instructor, etc.)
Table: `students`; contains fields only related to students as well as a `user_id` column (a foreign key of `id` on `users`)
Table: `instructors`; contains fields only related to instructors as well as a `user_id` column (a foreign key of `id` on `users`)
etc. for all `user_types`
https://dba.stackexchange.com/questions/36573/how-to-model-inheritance-of-two-tables-mysql/36577#36577
When to use inherited tables in PostgreSQL?: Inheritance in postgresql does not work as expected for me and a bunch of other users as the original poster points out.
I am really confused about which approach I should take. Class-table-inheritance (https://stackoverflow.com/tags/class-table-inheritance/info) seems like the most correct in
my OOP mindset but I would very much appreciate and updated DB minded opinion.

The way that I think of inheritance in the database world is "can only be one kind of." No other relational modeling technique works for that specific case; even with check constraints, with a strict relational model, you have the problem of putting the wrong "kind of" person into the wrong table. So, in your example, a user can be a parent or a babysitter, but not both. If a user can be more than one kind-of user, then inheritance is not the best tool to use.
The instructor/student relationship really only works well in the case where students cannot be instructors or vice-versa. If you have a TA, for example, it's better to model that using a strict relational design.
So, back to the parent-babysitter, your table design might look like this:
CREATE TABLE user (
id SERIAL,
full_name TEXT,
email TEXT,
phone_number TEXT
);
CREATE TABLE parent (
preferred_payment_method TEXT,
alternate_contact_info TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE babysitter (
age INT,
min_child_age INT,
preferred_payment_method TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE parent_babysitter (
parent_id INT REFERENCES parent(id),
babysitter_id INT REFERENCES babysitter(id),
PRIMARY KEY(parent_id, babysitter_id)
);
This model allows users to be "only one kind of" user - a parent or a babysitter. Notice how the primary key definitions are left to the child tables. In this model, you can have duplicated ID's between parent and babysitter, though this may not be a problem depending on how you write your code. (Note: Postgres is the only ORDBMS I know of with this restriction - Informix and Oracle, for example, have inherited keys on inherited tables)
Also see how we mixed the relational model in - we have a many-to-many relationship between parents and babysitters. That way we keep the entities separated, but we can still model a relationship without weird self-referencing keys.

All the options can be roughly represented by following cases:
base table + table for each class (class-table inheritance, Table-Per-Type, suggestions from the dba.stackexchange)
single table inheritance (Table-Per-Hierarchy) - just put everything into the single table
create independent tables for each class (Table-Per-Concrete)
I usually prefer option (1), because (2) and (3) are not completely correct in terms of DB design.
With (2) you will have unused columns for some rows (like "age" will be empty for Parent). And with (3) you may have duplicated data.
But you also need to think in terms of data access. With option (1) you will have the data spread over few tables, so to get Parent, you will need to use join operations to select data from both User and Parent tables.
I think that's the reason why options (2) and (3) exist - they are easier to use in terms of SQL queries (no joins are needed, you just select the data you need from one table).

Linking an address table to multiple other tables

I have been asked to add a new address book table to our database (SQL Server 2012).
To simplify the related part of the database, there are three tables each linked to each other in a one to many fashion: Company (has many) Products (has many) Projects and the idea is that one or many addresses will be able to exist at any one of these levels. The thinking is that in the front-end system, a user will be able to view and select specific addresses for the project they specify and more generic addresses relating to its parent product and company.
The issue now if how best to model this in the database.
I have thought of two possible ideas so far so wonder if anyone has had a similar type of relationship to model themselves and how they implemented it?
Idea one:
The new address table will additionally contain three fields: companyID, productID and projectID. These fields will be related to the relevant tables and be nullable to represent company and product level addresses. e.g. companyID 2, productID 1, projectID NULL is a product level address.
My issue with this is that I am storing the relationship information in the table so if a project is ever changed to be related to a different product, the data in this table will be incorrect. I could potentially NULL all but the level I am interested in but this will make getting parent addresses a little harder to get
Idea two:
On the address table have a typeID and a genericID. genericID could contain the IDs from the Company, Product and Project tables with the typeID determining which table it came from. I am a little stuck how to set up the necessary constraints to do this though and wonder if this is going to get tricky to deal with in the future
Many thanks,

I will suggest using Idea one and preventing Idea two.
Second Idea is called Polymorphic Association anti pattern
Objective: Reference Multiple Parents
Resulting side effect: Using dual-purpose foreign key will violating first normal form (atomic issue), loosing referential integrity
Solution: Simplify the Relationship
The simplification of the relationship could be obtained in two ways:
Having multiple null-able forging keys (idea number 1): That will be
simple and applicable if the tables(product, project,...) that using
the relation are limited. (think about when they grow up to more)
Another more generic solution will be using inheritance. Defining a
new entity as the base table for (product, project,...) to satisfy
Addressable. May naming it organization-unit be more rational. Primary key of this organization_unit table will be the primary key of (product, project,...). Other collections like Address, Image, Contract ... tables will have a relation to this base table.

It sounds like you could use Junction tables http://en.wikipedia.org/wiki/Junction_table.
They will give you the flexibility you need to maintain your foreign key restraints, as well as share addresses between levels or entities if that is desired.
One for Company_Address, Product_Address, and Project_Address

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table? And effectively, the record_type and record_id together are a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value, and all the thing we would be logging are the same data type, strings).

Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually1 is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I heard people calling it "generic" or "polymorphic" FKs.
1 As opposed to FOREIGN KEYs built-into the DBMS.

Actually, I think 'Anti-Pattern' is a pretty good name for this set up, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users' photos, etc, and show why it's bad. Then I'll explain why it might not ne too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table as well as any other tables. If User 1234 is being liked, you couldn't insert it unless there was a PHOTO with ID 1234 and so on. For this reason, all RDBMS's that I know of will not let a Foreign Key be defined with multiple Primary keys - after all, Primary means 'only one' amongst other things.
So you'ld have to create the LIKES table with no relational integrity. This may not be a bad thinbg sometimes, but in this case I'd think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
However, for a table like LOGS which probably isn't central to the functionality of my system and I'm only doing some ad-hoc querying from to check what's been happening, then I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admin's running ad-hoc queries against it then it might be OK.
Cheers -

database design: a 'code' table that get referenced by other entities

I am building a database as a simple exercise, it could be hosted on any database server, so I am trying to keep things as much standard as possible. Basically what I would like to do is a 'code' table that get referenced by other entities. I explain:
xcode
id code
r role
p property
code
r admin
r staff
p title
....
then I would like to have some view like:
role (select * from code where xcode='r')
r admin
r staff
property (select * from code where xcode='p')
p title
then, suppose we have an entity
myentity
id - 1
role - admin (foreign key to role)
title - title (foreign key to property)
Obviously I cannot create foreign key to a view, but this is to tell the idea I have in mind. How can I reflect such behaviour using whenever possible, standard sql syntax, then as a second option, database additional features like trigger ecc... ?
Because if I tell that role and title in myentity are foreign key to 'code', instead of the views, nothing would stop me to insert a role in title field.

I have worked on systems with a single table for all codes and others with one table per code. I definitely prefer the latter approach.
The advantages of a table per code are:
Foreign keys. As you have already spotted it is not possible to enforce compliance to permitted values through foreign keys with a single table. Using check constraints is an alternative approach but it has a higher maintenance cost.
Performance. Code lookups are not normally a performance bottle neck, but it undoubtedly helps the optimizer to make sensible decisions about execution paths if it knows it is retrieving records from a table with four rows rather than four hundred.
Code groups. Sometimes we want to organise a code into sub-divisions, usually to make it easier to render complex lists of values. If we have a table per code we have more flexibility when it comes to structure.
In addition I notice that you want to be able to deploy "on any database server". In that case avoid triggers. Triggers are usually bad news in most scenarios, but they have product-specific syntax.

What you are trying to do is in most cases an anti pattern and design mistake. Just create the different tables instead of views.
There are some rare cases where this kind of design makes sense. In this kind include the xcode field in the primary key/ foreign key. So your entity will look like this:
myentity
id - 1
role_xcode
role - admin (foreign key to role)
title_xcode
title - title (foreign key to property)
You then can create check constraints to enforce role_xcode='r' and title_xcode='p'
(sorry I don't know if they are standard, they do exist in oracle and are so simple that I'd expect them on other rdbms's as well)