Database best practices - database

I have a table which stores comments, the comment can either come from another user, or another profile which are separate entities in this app.
My original thinking was that the table would have both user_id and profile_id fields, so if a user submits a comment, it sets the user_id and leaves the profile_id blank.
Is this right or wrong? Is there a better way?

Whatever is the best solution depends IMHO on more than just the table, but also how this is used elsewhere in the application.
Assuming that the comments are all associated with some other object, let's say you extract all the comments from that object. In your proposed design, extracting all the comments requires selecting from just one table, which is efficient. But that is extracting the comments without extracting the information about the poster of each comment. Maybe you don't want to show it, or maybe it is already cached in memory.
But what if you had to retrieve information about the poster while retrieving the comments? Then you have to join with two different tables, and now the resulting record set is getting polluted with a lot of NULL values (for a profile comment, all the user fields will be NULL). The code that has to parse this result set also could get more complex.
Personally, I would probably start with the fully normalized version, and then denormalize when I start seeing performance problems.
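To make the NULL issue concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names (users, profiles, comments, name, title) are my own assumptions, not taken from the question. It shows the two-column design and what the result set looks like once you join to both poster tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE profiles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE comments (
    id         INTEGER PRIMARY KEY,
    body       TEXT NOT NULL,
    user_id    INTEGER REFERENCES users(id),     -- NULL for profile comments
    profile_id INTEGER REFERENCES profiles(id)   -- NULL for user comments
);
INSERT INTO users    VALUES (1, 'alice');
INSERT INTO profiles VALUES (1, 'Acme Corp');
INSERT INTO comments VALUES (1, 'first',  1, NULL);
INSERT INTO comments VALUES (2, 'second', NULL, 1);
""")

# Both poster tables must be LEFT JOINed; one side is always NULL per row.
rows = conn.execute("""
    SELECT c.body, u.name, p.title, COALESCE(u.name, p.title) AS poster
    FROM comments c
    LEFT JOIN users    u ON u.id = c.user_id
    LEFT JOIN profiles p ON p.id = c.profile_id
    ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

COALESCE hides the NULLs for a single display column, but every extra poster field you select needs the same treatment.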
There is also a completely different possible solution to the problem, but this depends on whether or not it makes sense in the domain. What if there are other places in the application where a user and a profile can be used interchangeably? What if a User is just a special kind of a Profile? Then I think that the problem should be solved generally in the user/profile tables. For example (some abbreviated pseudo-sql):
create table AbstractProfile (ID primary key, type ) -- type can be 'user' or 'profile'
create table User(ProfileID primary key references AbstractProfile , ...)
create table Profile(ProfileID primary key references AbstractProfile , ...)
Then any place in your application where a user or a profile can be used interchangeably, you can reference the AbstractProfile ID.
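A runnable version of that pseudo-sql might look like the sketch below (Python with sqlite3). The names abstract_profile, user_account, login, title and poster_id are illustrative assumptions. The comment table only ever references the supertype, so a single foreign key covers both kinds of poster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE abstract_profile (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('user', 'profile'))
);
CREATE TABLE user_account (
    profile_id INTEGER PRIMARY KEY REFERENCES abstract_profile(id),
    login      TEXT NOT NULL
);
CREATE TABLE profile (
    profile_id INTEGER PRIMARY KEY REFERENCES abstract_profile(id),
    title      TEXT NOT NULL
);
CREATE TABLE comment (
    id        INTEGER PRIMARY KEY,
    poster_id INTEGER NOT NULL REFERENCES abstract_profile(id),
    body      TEXT NOT NULL
);
INSERT INTO abstract_profile VALUES (1, 'user'), (2, 'profile');
INSERT INTO user_account VALUES (1, 'alice');
INSERT INTO profile      VALUES (2, 'Acme Corp');
""")

def add_comment(poster_id, body):
    """Return True if the comment was stored, False if the FK rejected it."""
    try:
        conn.execute(
            "INSERT INTO comment (poster_id, body) VALUES (?, ?)",
            (poster_id, body),
        )
        return True
    except sqlite3.IntegrityError:
        return False

ok_user = add_comment(1, "posted by a user")
ok_profile = add_comment(2, "posted by a profile")
ok_missing = add_comment(99, "poster does not exist")
```

Here ok_user and ok_profile succeed, while the comment with the unknown poster is rejected by the foreign key.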

If the comments are general for several objects you could create a table for each object:
user_comments (user_id, comment_id)
profile_comments (profile_id, comment_id)
Then you do not have to have any empty columns in your comments table. It will also make it easy to add new comment-source-objects in the future without touching the comments table.
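As a rough sketch of this layout (Python with sqlite3; the users, profiles and comments tables and their columns are assumed for illustration), each link table carries a real foreign key, and reading the poster back takes one join path per source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users    (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE profiles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, body  TEXT NOT NULL);
CREATE TABLE user_comments (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    comment_id INTEGER PRIMARY KEY REFERENCES comments(id)
);
CREATE TABLE profile_comments (
    profile_id INTEGER NOT NULL REFERENCES profiles(id),
    comment_id INTEGER PRIMARY KEY REFERENCES comments(id)
);
INSERT INTO users    VALUES (1, 'alice');
INSERT INTO profiles VALUES (1, 'Acme Corp');
INSERT INTO comments VALUES (1, 'first'), (2, 'second');
INSERT INTO user_comments    VALUES (1, 1);
INSERT INTO profile_comments VALUES (1, 2);
""")

# One branch per comment source; the comments table itself has no NULL columns.
rows = conn.execute("""
    SELECT c.body, u.name AS poster
    FROM comments c
    JOIN user_comments uc ON uc.comment_id = c.id
    JOIN users u          ON u.id = uc.user_id
    UNION ALL
    SELECT c.body, p.title
    FROM comments c
    JOIN profile_comments pc ON pc.comment_id = c.id
    JOIN profiles p          ON p.id = pc.profile_id
    ORDER BY 1
""").fetchall()
```

One thing this sketch does not prevent is the same comment_id appearing in both link tables; that would need an extra rule if it matters.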

Another way to solve this is to always denormalize (copy) the name of the commenter onto the comment and also store a reference back to the commenter via a type and an id field. That way you have a unified comments table on which you can search, sort and trim quickly. The drawback is that there isn't any real FK relationship between a comment and its owner.
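A minimal sketch of that denormalized table, again with Python's sqlite3 and assumed column names (commenter_type, commenter_id, commenter_name). Listing comments needs no join at all, but as the last insert shows, nothing stops a comment from pointing at a commenter that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id             INTEGER PRIMARY KEY,
    body           TEXT NOT NULL,
    commenter_type TEXT NOT NULL CHECK (commenter_type IN ('user', 'profile')),
    commenter_id   INTEGER NOT NULL,  -- id in the user or profile table; no FK possible
    commenter_name TEXT NOT NULL      -- denormalized copy of the display name
);
CREATE INDEX comments_by_commenter ON comments (commenter_type, commenter_id);
INSERT INTO comments (body, commenter_type, commenter_id, commenter_name) VALUES
    ('first',  'user',    1,   'alice'),
    ('second', 'profile', 1,   'Acme Corp'),
    ('third',  'user',    999, 'nobody');
""")

# Listing with names: a single-table query.
listing = conn.execute(
    "SELECT body, commenter_name FROM comments ORDER BY id"
).fetchall()

# Finding everything by one commenter uses the (type, id) index.
by_alice = conn.execute(
    "SELECT body FROM comments WHERE commenter_type = 'user' AND commenter_id = 1"
).fetchall()
```

If a commenter is renamed, the copied names have to be updated by the application; that is part of the trade-off.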

In the past I have used a centralized comments table and had a field for the fk_table it is referencing.
eg:
comments(id,fk_id,fk_table,comment_text)
That way you can use UNION queries to concatenate the data from several sources.
SELECT c.comment_text FROM comments c JOIN user u ON u.id=c.fk_id WHERE c.fk_table='user'
UNION ALL
SELECT c.comment_text FROM comments c JOIN profile p ON p.id=c.fk_id WHERE c.fk_table='profile'
This ensures that you can expand the number of objects that have comments without creating redundant tables.

Here's another approach, which allows you to maintain referential integrity through foreign keys, manage centrally, and provide the highest performance using standard database tools such as indexes and if you really need, partitioning etc:
create table actor_master_table(
type char(1) not null, /* e.g. 'u' or 'p' for user / profile */
id varchar(20) not null, /* e.g. 'someuser' or 'someprofile' */
primary key(type, id)
);
create table user(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'u'),
foreign key (type, id) references actor_master_table(type, id)
);
create table profile(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'p'),
foreign key (type, id) references actor_master_table(type, id)
);
create table comment(
creator_type char(1) not null,
creator_id varchar(20) not null,
comment text not null,
foreign key(creator_type, creator_id) references actor_master_table(type, id)
);
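To see the composite foreign keys at work, here is a cut-down sketch in Python with sqlite3. I renamed the user table to app_user (my own choice, to avoid a name that is reserved in some databases) and left out the profile table for brevity; the CHECK is on the type column, which is what pins each subtype table to its own kind of actor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE actor_master_table (
    type TEXT NOT NULL CHECK (type IN ('u', 'p')),
    id   TEXT NOT NULL,
    PRIMARY KEY (type, id)
);
CREATE TABLE app_user (
    type TEXT NOT NULL CHECK (type = 'u'),
    id   TEXT NOT NULL,
    PRIMARY KEY (type, id),
    FOREIGN KEY (type, id) REFERENCES actor_master_table(type, id)
);
CREATE TABLE comment (
    creator_type TEXT NOT NULL,
    creator_id   TEXT NOT NULL,
    comment      TEXT NOT NULL,
    FOREIGN KEY (creator_type, creator_id)
        REFERENCES actor_master_table(type, id)
);
INSERT INTO actor_master_table VALUES ('u', 'someuser'), ('p', 'someprofile');
INSERT INTO app_user VALUES ('u', 'someuser');
""")

def try_sql(sql):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

profile_comment = try_sql("INSERT INTO comment VALUES ('p', 'someprofile', 'hi')")
ghost_comment = try_sql("INSERT INTO comment VALUES ('u', 'ghost', 'hi')")
wrong_subtype = try_sql("INSERT INTO app_user VALUES ('p', 'someprofile')")
```

The comment by an existing profile is accepted, the comment by an unknown user fails the foreign key, and putting a 'p' actor into the user table fails the CHECK.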

Related

Friends within an application that can be both application users and non-users - how to design the data model?

I am writing an application in which application users can add friends with whom they can later share things inside the app. The thing is that they can add friends who are already using the application (easy case) but they can also add people who are not yet using the application but might start doing it one day (the tricky part). In the latter case they can simply add a record in the app and then share it via email but the record still needs to be stored in a database. An important thing to mention is that whenever a user adds a record for a friend a record is added for him/her as well (because they are sharing stuff I need to add information about user's part as well).
If friends could only be application users, then I would simply have a User table and a junction table defining the many-to-many relationship between users. Then, the record table could look somewhat like this:
CREATE TABLE IF NOT EXISTS Record (
record_id INT NOT NULL, -- primary key
record_col VARCHAR(45) NULL, -- some column describing the record
added_by_id VARCHAR(45) NOT NULL, -- references user_id from the User table
added_for_id VARCHAR(45) NOT NULL -- references user_id from the User table
);
However, since one can add friends who are not application users and can then add records for them I think I need an additional table to store information about friends:
CREATE TABLE IF NOT EXISTS Friend (
friend_id INT NOT NULL, -- primary key
friend_col VARCHAR(45) NULL, -- some column describing the friend
friend_of_user_id VARCHAR(45) NOT NULL, -- id of the user who added this friend
is_user BIT(1) NOT NULL, -- boolean indicating if this friend has already created an account
friend_user_id INT NULL -- if above is true, user_id of this friend would go here
);
But then how do I handle this in the Record table? I could have some records added for application users and some for non-users so for some of them the foreign key would reference user_id and for the rest friend_id. This doesn't seem right to me...
I could also try putting everyone (both application users and non-users) in a single table but then for the same physical person (let it be John Smith) I would have potentially multiple records added by different people who are friends with John and a record representing the actual John Smith being application user. This seems nasty.
What am I missing here? This doesn't seem like an unusual thing to do and yet I cannot figure out a proper solution.

Building comment system for different types of entities

I'm building a comment system in PostgreSQL where I can comment (as well as "liking" them) on different entities that I already have (such as products, articles, photos, and so on). For the moment, I came up with this:
(note: the foreign key between comment_board and product/article/photo is very loose here. ref_id is just storing the id, which is used in conjunction with the comment_board_type to determine which table it is)
Obviously, this doesn't seem like good data integrity. What can I do to give it better integrity? Also, I know every product/article/photo will need a comment_board. Could that mean I implement a comment_board_id to each product/article/photo entity such as this?:
I do recognize this SO solution, but it made me second-guess supertypes and the complexities of it: Database design - articles, blog posts, photos, stories
Any guidance is appreciated!
I ended up just pointing the comments directly to the product/photo/article tables. Here is what I came up with in total:
CREATE TABLE comment (
id SERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT (now()),
updated_at TIMESTAMP WITH TIME ZONE,
account_id INT NOT NULL REFERENCES account(id),
text VARCHAR NOT NULL,
-- commentable sections
product_id INT REFERENCES product(id),
photo_id INT REFERENCES photo(id),
article_id INT REFERENCES article(id),
-- constraint to make sure this comment appears in only one place
CONSTRAINT comment_entity_check CHECK(
(product_id IS NOT NULL)::INT
+
(photo_id IS NOT NULL)::INT
+
(article_id IS NOT NULL)::INT
= 1
)
);
CREATE TABLE comment_likes (
id SERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT (now()),
updated_at TIMESTAMP WITH TIME ZONE,
account_id INT NOT NULL REFERENCES account(id),
comment_id INT NOT NULL REFERENCES comment(id),
-- comments can only be liked once by an account.
UNIQUE(account_id, comment_id)
);
Resulting in:
This makes it so that I have to do one less join to an intermediary table. Also, it lets me add a field and update the constraints easily.

Relating a child table to multiple parent tables

I am trying to figure out the best way to relate these tables together. Suppose I have the following tables:
tblPerson
tblGroup
tblResource
Each row in each of these tables can have multiple email addresses associated with them so I would want a separate table and relate it back.
Are there methods to have a single table (tblEmail) relate back to each of the tables? I thought of using a uniqueidentifier field in each of the parent tables and using that as a key in the email table. It would be guaranteed unique. I just wouldn't be able to create a FK in the email table to preserve integrity. That is manageable though.
Is there a fancy way to do this? I am creating these tables in SQL 2008 R2.
Thank you
Karl
While it may be tempting to try and use a single email table with a ParentType (Person/Group/Resource) and ParentID, this is dangerous and means you can't have the relationship defined in SQL (unless there's some feature I'm unaware of?).
If you want to have referential integrity in SQL you really need to create 3 tables, one for each parent table.
CREATE TABLE dbo.PersonEmail (
ID int IDENTITY PRIMARY KEY,
PersonID int,
EmailAddress varchar(500)
)
CREATE TABLE dbo.GroupEmail (
ID int IDENTITY PRIMARY KEY,
GroupID int,
EmailAddress varchar(500)
)
CREATE TABLE dbo.ResourceEmail (
ID int IDENTITY PRIMARY KEY,
ResourceID int,
EmailAddress varchar(500)
)
If you think you might extend your Email table to later include a DisplayName, and perhaps a BounceCount and others, create a table for Email and create many-to-many join tables to link them to Person/Group/Resource.
Be aware that edits might impact multiple links, you'll have to decide how you want to handle that.
This is a core part of SQL. In a proper relational design, you don't relate email addresses to persons, groups, or resources -- you relate the persons, groups, and resources TO the email.
So, with an email table of:
CREATE TABLE dbo.tblEmail (
emailID int IDENTITY PRIMARY KEY,
email varchar(500)
)
If you only need one email per entity, you would just add an emailID column to each of the other tables that model something that may need an email.
ALTER TABLE dbo.tblPerson
ADD emailID int REFERENCES dbo.tblEmail(emailID);
ALTER TABLE dbo.tblGroup
ADD emailID int REFERENCES dbo.tblEmail(emailID);
ALTER TABLE dbo.tblResource
ADD emailID int REFERENCES dbo.tblEmail(emailID);
If you need multiple email addresses per entity, you need an additional table that maps each emailID to the set of individual addresses belonging to it. (I wouldn't do this unless you have a technical reason to handle the addresses individually, such as a bulk-email system where you want to avoid duplicates if someone uses the same email for their own use and their organization's use.)
CREATE TABLE dbo.tblEmail (
emailID int IDENTITY PRIMARY KEY
)
CREATE TABLE dbo.tblEmailAddress (
eAddrID int IDENTITY PRIMARY KEY,
eAddr varchar(500)
)
CREATE TABLE dbo.tblEmailSet (
emailID int REFERENCES dbo.tblEmail(emailID),
eAddrID int REFERENCES dbo.tblEmailAddress(eAddrID)
)
In order to, say, return a list of all emails to any Person, Group, or Resource named "Smith", you'd run the query below:
SELECT DISTINCT A.eAddr
FROM (
SELECT emailID FROM dbo.tblPerson WHERE Name = 'Smith'
UNION
SELECT emailID FROM dbo.tblGroup WHERE Name = 'Smith'
UNION
SELECT emailID FROM dbo.tblResource WHERE Name = 'Smith'
) AS PGR
INNER JOIN dbo.tblEmailSet AS S
ON PGR.emailID = S.emailID
INNER JOIN dbo.tblEmailAddress AS A
ON S.eAddrID = A.eAddrID
That ugly UNION, btw, is one of the reasons why you really don't want to do this unless you have a technical need to retrieve the data uniquely. While I've done this sort of many-to-many-to-many join on occasion, in this particular instance it's kind of a "code smell" and an indicator that instead of tracking "People", "Groups", and "Resources", you should be tracking "Contacts" with a "type" indicator to tell if a contact is a Person, a Group, or a Resource.
(Or maybe you never need to grab a bunch of email addresses, and just want a single table of emails you can check for whitelisting...)
So you want to have possibly multiple Emails per Person/Group/Resource, and you want all those emails in one table, am I correct ?
To do that, I would create a table dbo.EmailAddress such as this :
CREATE TABLE dbo.EmailAddress
(
EmailID BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY
,EmailAddress VARCHAR(250) NOT NULL
,CONSTRAINT UK_EmailAddress UNIQUE(EmailAddress) --to ensure that you never insert the same email address twice
)
Then I would create the relation between your Person/Group/Resource and your emails using another table :
CREATE TABLE dbo.EmailAddressParentXRef
(
EmailID BIGINT NOT NULL REFERENCES dbo.EmailAddress(EmailID) --same type as dbo.EmailAddress.EmailID
,ParentTypeID INT NOT NULL
,PersonID INT NULL REFERENCES dbo.tblPerson(PersonID)
,GroupID INT NULL REFERENCES dbo.tblGroup(GroupID)
,ResourceID INT NULL REFERENCES dbo.tblResource(ResourceID)
,CONSTRAINT UK_EmailID_ParentTypeID UNIQUE(EmailID,ParentTypeID) --to make sure you don't put the same EmailID for the same type of Parent (e.g. EmailID=12 twice for a Person)
)
There you would have referential integrity + some checks to avoid duplicates when you load data. Note that I didn't put a check to make sure you actually fill in either the PersonID, GroupID or ResourceID. This can be added in different ways, but if you understand the principle of this table, you shouldn't load any line without those references (or they will just be useless).
A lot more checks can be added based on this, to take care of every type of duplication/error you might create when loading the data, but you get the point.
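One possible way to add the missing check is sketched below in Python with sqlite3. I replaced the numeric ParentTypeID with a text parent_type and left out the foreign keys to the parent tables to keep it short; both are my simplifications, not part of the design above. The CHECK ties the type to exactly one filled-in parent column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE email_parent_xref (
    email_id    INTEGER NOT NULL,
    parent_type TEXT    NOT NULL,
    person_id   INTEGER,
    group_id    INTEGER,
    resource_id INTEGER,
    UNIQUE (email_id, parent_type),
    -- exactly one parent column is set, and it matches parent_type
    CHECK (
        (parent_type = 'person'   AND person_id   IS NOT NULL
                                  AND group_id IS NULL AND resource_id IS NULL) OR
        (parent_type = 'group'    AND group_id    IS NOT NULL
                                  AND person_id IS NULL AND resource_id IS NULL) OR
        (parent_type = 'resource' AND resource_id IS NOT NULL
                                  AND person_id IS NULL AND group_id IS NULL)
    )
)
""")

def link(email_id, parent_type, person_id=None, group_id=None, resource_id=None):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO email_parent_xref VALUES (?, ?, ?, ?, ?)",
            (email_id, parent_type, person_id, group_id, resource_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

valid = link(1, "person", person_id=7)
no_parent = link(2, "group")
mismatch = link(3, "person", group_id=4)
two_parents = link(4, "resource", person_id=1, resource_id=2)
duplicate = link(1, "person", person_id=8)
```

Only the first call goes through; the others fail either the CHECK or the UNIQUE constraint.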

How to solve the A/B key problem?

I have a table my_table with these fields: id_a, id_b.
So this table basically can reference either a row from table_a with id_a, or a row from table_b with id_b. If I reference a row from table_a, id_b is NULL. If I reference a row from table_b, id_a is NULL.
Currently I feel this is my only/best option I have, so in my table (which has a lot more other fields, btw) I will live with the fact that always one field is NULL.
If you care what this is for: If id_a is specified, I'm linking to a "data type" record set in my meta database, that specifies a particular data type, like varchar(40), for example. But if id_b is specified, I'm linking to a relationship definition recordset that specifies details about a relationship (whether it's 1:1, 1:n, linking what, with which constraints, etc.). The fields are named a little bit better, of course ;) ...I just tried to simplify it down to the problem.
Edit: If it matters: MySQL, latest version. But don't want to constrain my design to MySQL specific code, as much as possible.
Are there better solutions?
A and B are disjoint subtypes in your model.
This can be implemented like this:
refs (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (type, id),
CHECK (type IN ('A', 'B'))
)
table_a (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (type, id) REFERENCES refs (type, id) ON DELETE CASCADE,
CHECK (type = 'A'),
…)
table_b (
type CHAR(1) NOT NULL, id INT NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (type, id) REFERENCES refs (type, id) ON DELETE CASCADE,
CHECK (type = 'B'),
…)
my_table (
type CHAR(1) NOT NULL, ref INT NOT NULL,
FOREIGN KEY (type, ref) REFERENCES refs (type, id) ON DELETE CASCADE,
CHECK (type IN ('A', 'B')),
…)
Table refs contains all instances of both A and B. It serves no other purpose except policing referential integrity; it won't even participate in the joins.
Note that MySQL versions before 8.0.16 parse CHECK constraints but do not enforce them, so on those versions you will need to watch your types.
You also should not delete the records from table_a and table_b directly: instead, delete the records from refs which will trigger ON DELETE CASCADE.
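The cascade behaviour can be tried out with a small sketch in Python's sqlite3 (SQLite types instead of MySQL ones, table_b omitted, and the data is made up). Deleting one row from refs removes the matching table_a row and every my_table row that pointed at it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for FKs and cascades in SQLite
conn.executescript("""
CREATE TABLE refs (
    type TEXT NOT NULL CHECK (type IN ('A', 'B')),
    id   INTEGER NOT NULL,
    PRIMARY KEY (type, id)
);
CREATE TABLE table_a (
    type TEXT NOT NULL CHECK (type = 'A'),
    id   INTEGER NOT NULL PRIMARY KEY,
    FOREIGN KEY (type, id) REFERENCES refs (type, id) ON DELETE CASCADE
);
CREATE TABLE my_table (
    type TEXT NOT NULL,
    ref  INTEGER NOT NULL,
    FOREIGN KEY (type, ref) REFERENCES refs (type, id) ON DELETE CASCADE
);
INSERT INTO refs     VALUES ('A', 1), ('A', 2);
INSERT INTO table_a  VALUES ('A', 1), ('A', 2);
INSERT INTO my_table VALUES ('A', 1), ('A', 1), ('A', 2);
""")

# Delete through refs only; the subtype row and the references follow.
conn.execute("DELETE FROM refs WHERE type = 'A' AND id = 1")

remaining_a = conn.execute("SELECT id FROM table_a").fetchall()
remaining_links = conn.execute("SELECT type, ref FROM my_table").fetchall()
```

After the delete, only the rows for id 2 are left in both table_a and my_table.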
Create a parent "super-type" table for both A and B. Reference that table in my_table.
Yes, there are better solutions.
However, since you didn't describe what you're allowed to change, it's difficult to know which alternatives could be used.
Principally, this "exclusive-or" kind of key reference means that A and B are actually two subclasses of a common superclass. You have several ways to change the A and B tables to unify them into a single table.
One of which is to simply merge the A and B table into a big table.
Another of which is to have a superclass table with the common features of A and B as well as a subtype flag that says which subtype it is. This still involves a join with the subclass table, but the join has an explicit discriminator, and can be done "lazily" by the application rather than in the SQL.
I see no problem with your solution. However, I think you should add CHECK constraints to make sure that exactly one of the fields is null.
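A sketch of such a CHECK, written out in plain boolean logic so it is not tied to one database, and run here through Python's sqlite3 (the extra id column is just an assumed surrogate key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE my_table (
    id   INTEGER PRIMARY KEY,
    id_a INTEGER,
    id_b INTEGER,
    -- exactly one of the two references must be filled in
    CHECK ((id_a IS NOT NULL AND id_b IS NULL) OR
           (id_a IS NULL AND id_b IS NOT NULL))
)
""")

def insert_ref(id_a, id_b):
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute(
            "INSERT INTO my_table (id_a, id_b) VALUES (?, ?)", (id_a, id_b)
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    insert_ref(1, None),     # only A
    insert_ref(None, 2),     # only B
    insert_ref(None, None),  # neither
    insert_ref(1, 2),        # both
]
```

The first two inserts are accepted and the last two are rejected. Whether your database enforces the CHECK at all is something to verify first.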
You know, it's hard to tell if there are any better solutions since you've stripped the question of all vital information. With the tiny amount that's still there I'd say that most better solutions involve getting rid of my_table.

Bidirectional foreign key constraint

I'm thinking of designing a database schema similar to the following:
Person (
PersonID int primary key,
PrimaryAddressID int not null,
...
)
Address (
AddressID int primary key,
PersonID int not null,
...
)
Person.PrimaryAddressID and Address.PersonID would be foreign keys to the corresponding tables.
The obvious problem is that it's impossible to insert anything into either table. Is there any way to design a working schema that enforces every Person having a primary address?
"I believe this is impossible. You cannot create an Address Record until you know the ID of the person and you cannot insert the person record until you know an AddressId for the PrimaryAddressId field."
On the face of it, that claim seems SO appealing. However, it is quite preposterous.
This is a very common kind of problem that the SQL DBMS vendors have been trying to attack for perhaps decades already.
The key is that all constraint checking must be "deferred" until both inserts are done. That can be achieved under different forms. Database transactions may offer the possibility to do something like "SET deferred constraint checking ON", and you're done (were it not for the fact that in this particular example, you'd likely have to mess very hard with your design in order to be able to just DEFINE the two FK constraints, because one of them simply ISN'T a 'true' FK in the SQL sense !).
Trigger-based solutions as described here achieve essentially the same effect, but those are exposed to all the maintenance problems that exist with application-enforced integrity.
In their work, Chris Date & Hugh Darwen describe what is imo the true solution to the problem : multiple assignment. That is, essentially, the possibility to compose several distinct update statements and have the DBMS act upon it as if that were one single statement. Implementations of that concept do exist, but you won't find any that talks SQL.
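For what it's worth, the deferred-checking idea mentioned above can be tried in SQLite, which accepts DEFERRABLE INITIALLY DEFERRED on foreign keys. The sketch below uses Python's sqlite3 with a simplified version of the question's schema (snake_case names are mine). Support for deferrable constraints varies between DBMSs, so treat this as an illustration, not a portable recipe.

```python
import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    person_id          INTEGER PRIMARY KEY,
    primary_address_id INTEGER NOT NULL
        REFERENCES address(address_id) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL
        REFERENCES person(person_id) DEFERRABLE INITIALLY DEFERRED
);
""")

def add_person(person_id, address_id, insert_address=True):
    """Insert a person and (optionally) the address in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO person VALUES (?, ?)", (person_id, address_id))
        if insert_address:
            conn.execute("INSERT INTO address VALUES (?, ?)", (address_id, person_id))
        conn.execute("COMMIT")  # deferred FKs are checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False

with_address = add_person(1, 10)
without_address = add_person(2, 20, insert_address=False)
person_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The first call commits both rows even though each references the other; the second fails at COMMIT because the person's primary address never shows up, and the rollback leaves only one person in the table.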
This is a perfect example of many-to-many relationship. To resolve that you should have intermediate PERSON_ADDRESS table. In other words;
PERSON table
person_id (PK)
ADDRESS table
address_id (PK)
PERSON_ADDRESS
person_id (FK) <= PERSON
address_id (FK) <= ADDRESS
is_primary (BOOLEAN - Y/N)
This way you can assign multiple addresses to a PERSON and also reuse ADDRESS records in multiple PERSONs (for family members, employees of the same company etc.). Using the is_primary field in the PERSON_ADDRESS table, you can identify whether that person_address combination is a primary address for a person.
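Note that is_primary by itself does not stop a person from having two primary addresses. One possible way to enforce "at most one" is a partial unique index, which is my addition and not part of the design above; SQLite and PostgreSQL support them, other databases may not. A sketch in Python with sqlite3, using 0/1 instead of Y/N:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person  (person_id  INTEGER PRIMARY KEY);
CREATE TABLE address (address_id INTEGER PRIMARY KEY);
CREATE TABLE person_address (
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    address_id INTEGER NOT NULL REFERENCES address(address_id),
    is_primary INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (person_id, address_id)
);
-- Uniqueness applies only to rows flagged as primary.
CREATE UNIQUE INDEX one_primary_per_person
    ON person_address (person_id) WHERE is_primary = 1;
INSERT INTO person  VALUES (1);
INSERT INTO address VALUES (10), (11);
INSERT INTO person_address VALUES (1, 10, 1);
""")

def link(person_id, address_id, is_primary):
    """Return True if the link row was accepted, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO person_address VALUES (?, ?, ?)",
            (person_id, address_id, is_primary),
        )
        return True
    except sqlite3.IntegrityError:
        return False

second_primary = link(1, 11, 1)
secondary = link(1, 11, 0)
```

The second primary address is rejected while the same address as a non-primary link is fine. This still does not guarantee that every person has at least one primary address.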
We mark the primary address in our address table and then have triggers that enforce that only one record per person can have it (but one record must have it). If you change the primary address, it will update the old primary address as well as the new one. If you delete a primary address and other addresses exist, it will promote one of them (based on a series of rules) to the primary address. If the address is inserted and is the first address inserted, it will mark that one automatically as the primary address.
The second FK (PersonId from Address to Person) is too restrictive, IMHO. Are you suggesting that one address can only have a single person?
From your design, it seems that an address can apply to only one person, so just use the PersonID as the key to the address table, and drop the AddressID key field.
I know I'll probably be crucified or whatever, but here goes...
I've done it like this for my "particular very own unique and non-standard" business need ( =( God I'm starting to sound like SQL DDL even when I speak).
Here's an example:
CREATE TABLE IF NOT EXISTS PERSON(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
ADDRESS_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT PERSON_UQ UNIQUE KEY (ADDRESS_ID, ...));
INSERT INTO PERSON(ID, DESCRIPTION)
VALUES (1, 'GOVERNMENT');
CREATE TABLE IF NOT EXISTS ADDRESS(
ID INT,
CONSTRAINT PRIMARY KEY (ID),
PERSON_ID INT NOT NULL DEFAULT 1,
DESCRIPTION VARCHAR(255),
CONSTRAINT ADDRESS_UQ UNIQUE KEY (PERSON_ID, ...),
CONSTRAINT ADDRESS_PERSON_FK FOREIGN KEY (PERSON_ID) REFERENCES PERSON(ID));
INSERT INTO ADDRESS(ID, DESCRIPTION)
VALUES (1, 'ABANDONED HOUSE AT THIS ADDRESS');
ALTER TABLE PERSON ADD CONSTRAINT PERSON_ADDRESS_FK FOREIGN KEY (ADDRESS_ID) REFERENCES ADDRESS(ID);
<...life goes on... whether you provide an address or not to the person and vice versa>
I defined one table, then the other table referencing the first, and then altered the first to add the reference to the second (which didn't exist at the time of the first table's creation). It's not meant for a particular database; if I need it I just try it and if it works then I use it, if not then I try to avoid having that need in the design (I can't always control that, sometimes the design is handed to me as-is). If you have an address without a person then it belongs to the "government" person. If you have a "homeless person" then it gets the "abandoned house" address. I run a process to determine which houses have no users.
