Database design - should two projects share the same table? - database

Background:
Two projects (A & B) that are being designed at the same time both need a new table (called DocumentStore) to store documents/files in Postgres.
But the business logic around document storage differs between projects A & B, which means the relationships around DocumentStore differ as well.
Let's make this a bit more concrete, see example below:
The Document storage table structure looks the same without constraints/ foreign Keys:
Table DocumentStore
DocUUID //unique Id for this document, PK, FK to other table depends on project
fileName //file name
fileType //file type
FileContent //store file as blob
In project A, DocumentStore.DocUUID references Email.EmailUUID:
Note there is a one to many relationship between Email -> DocumentStore via the FK.
Table Email
EmailUUID //PK
subject
title
...
In project B, DocumentStore.DocUUID references Letter.LetterUUID:
Note there is a one to many relationship between Letter -> DocumentStore via the FK.
Table Letter
LetterUUID //PK
UserId
rightId
...
Email and Letter are completely different because the business logic differs.
My questions are:
1. Should I share this DocumentStore table between project A & B?
2. If the answer to 1. is yes, then how? Through inheritance under Postgres?
3. If the answer to 1. is no, should I create two tables with the same structure but different table names and different foreign keys, one for project A and one for project B?

Only one of those fk constraints works per column of the same instance of the table. You would have to add one column for each fk. Or have two documentstore tables.
As you clarified, the same row in documentstore belongs to either a letter or an email, but only to a single one of those, while each letter / email can have multiple documents.
Hence my new advice: stick with table design you have now, but create two separate tables. There is no gain in having them in the same table. The fact that both tables share the same structure is no good reason to share the data.
You can have schema_a.documentstore and schema_b.documentstore inheriting from master.documentstore. That would mainly be useful if you have use cases dealing with all rows in both tables at once. Be sure to read the chapter about limitations of inheritance in Postgres. In particular, it won't allow you to define a single fk constraint:
A serious limitation of the inheritance feature is that indexes
(including unique constraints) and foreign key constraints only apply
to single tables, not to their inheritance children. This is true on
both the referencing and referenced sides of a foreign key constraint.
Related answers with code examples:
Find out which schema based on table values
Create a table of two types in PostgreSQL
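The two-table advice above can be sketched roughly as follows. This is only an illustration under assumptions: SQLite (via Python's sqlite3) stands in for Postgres, the names documentstore_a / documentstore_b replace schema_a.documentstore / schema_b.documentstore, and the email_uuid / letter_uuid FK columns are hypothetical additions so that one email or letter can own several documents.

```python
import sqlite3

# Assumptions: SQLite stands in for Postgres, and email_uuid / letter_uuid are
# hypothetical FK columns added so one parent row can own many documents.
SCHEMA = """
CREATE TABLE email  (email_uuid  TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE letter (letter_uuid TEXT PRIMARY KEY, user_id INTEGER);

CREATE TABLE documentstore_a (          -- project A: documents of emails
    doc_uuid     TEXT PRIMARY KEY,
    email_uuid   TEXT NOT NULL REFERENCES email (email_uuid),
    file_name    TEXT,
    file_type    TEXT,
    file_content BLOB
);

CREATE TABLE documentstore_b (          -- project B: documents of letters
    doc_uuid     TEXT PRIMARY KEY,
    letter_uuid  TEXT NOT NULL REFERENCES letter (letter_uuid),
    file_name    TEXT,
    file_type    TEXT,
    file_content BLOB
);
"""


def build_db():
    """Create an in-memory database with foreign key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Run an INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

Each table carries exactly one foreign key, so a document in documentstore_a can only point at an existing email, and several documents may point at the same email.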

The same table is fine. Then you have a choice: you may add a 'type' column if you need to differentiate inside the same table (which I think you don't really need), or you build associative tables to the other things, like this:
Doc_email
----------
DocUUID
EmailUUID
and
Doc_letter
-----------
DocUUID
LetterUUID
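A minimal sketch of the associative-table idea, assuming SQLite through Python's sqlite3 in place of Postgres. The snake_case names and the choice of doc_uuid as the primary key of each link table (so a document hangs off at most one email or letter) are assumptions of this sketch, not part of the answer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE email  (email_uuid  TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE letter (letter_uuid TEXT PRIMARY KEY, user_id INTEGER);
CREATE TABLE document_store (
    doc_uuid TEXT PRIMARY KEY, file_name TEXT, file_type TEXT, file_content BLOB
);
-- doc_uuid as primary key: a document is linked to at most one email (assumption)
CREATE TABLE doc_email (
    doc_uuid   TEXT PRIMARY KEY REFERENCES document_store (doc_uuid),
    email_uuid TEXT NOT NULL REFERENCES email (email_uuid)
);
CREATE TABLE doc_letter (
    doc_uuid    TEXT PRIMARY KEY REFERENCES document_store (doc_uuid),
    letter_uuid TEXT NOT NULL REFERENCES letter (letter_uuid)
);
"""


def documents_for_email(conn, email_uuid):
    """Return the file names of all documents attached to one email."""
    rows = conn.execute(
        "SELECT d.file_name FROM document_store AS d "
        "JOIN doc_email AS de ON de.doc_uuid = d.doc_uuid "
        "WHERE de.email_uuid = ? ORDER BY d.file_name",
        (email_uuid,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)
conn.execute("INSERT INTO email VALUES ('e1', 'invoice')")
conn.executemany(
    "INSERT INTO document_store (doc_uuid, file_name) VALUES (?, ?)",
    [("d1", "a.pdf"), ("d2", "b.pdf")],
)
conn.executemany("INSERT INTO doc_email VALUES (?, ?)", [("d1", "e1"), ("d2", "e1")])
print(documents_for_email(conn, "e1"))
```

Note that nothing in this sketch stops the same document from appearing in both doc_email and doc_letter; if that matters, it would need an extra rule.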

Related

Is it more common to use table_id or id in database design

I have a situation where I would like to know whether it is more commonplace to use table_id or just id. (In my opinion, using table_ would cause slight confusion as to whether it is a foreign key.) Which do people prefer, and is there really any difference between the two? Or should it just be left up to picking one and being consistent?
There are two main currents in terms of naming columns in tables:
Schema Namespace
This strategy is the traditional strategy that was conceived by teams documenting the "data dictionary" of a database in the 70s. The idea is that the name itself of the column tells you which table it belongs to across the whole schema or database. For example, CLIENT_NAME would represent the name of the client in the CLIENT table.
There are variations of this strategy where a limited number of letters are assigned as prefixes (especially for M:N relationship tables) because at the time column names were limited to 6 or 8 characters in many databases. For example, the date of purchase of a car by a client could take the form CLI_CAR_DATE, CLICAR_DATE, or even CLCADT.
Examples:
A primary key "id" column of the entity table "car" would be named CAR_ID.
A foreign key on a child table "document" that points to "car" would take the same form: CAR_ID. This allows the use of natural joins; however, it should be pointed out that there are compelling reasons to avoid natural joins at all cost, that are not discussed here.
Foreign keys on a table "transfer" that has multiple (two) relationships (seller and buyer) with "person" pollute this strategy. They could be named PERSON_BUYER_ID and PERSON_SELLER_ID, because both cannot have the same name PERSON_ID; this no longer allows natural joins (good).
Table Namespace
In this strategy (that is newer) column names do not include the name of the entity they belong to, but only their property name. This strategy aligns more with object design, and produces shorter names (i.e. less typing). The name of the table must be indicated when mentioning a column. For example, you would need to say the column NAME on the table CLIENT.
Examples:
A primary key "id" column of the entity table "car" would be named ID.
A foreign key on a child table "document" that points to "car" would take the form: CAR_ID; this is the same solution as the previous strategy.
Foreign keys on a table "transfer" that has multiple (two) relationships (seller and buyer) with "person" could be named: BUYER_ID and SELLER_ID. They could follow the longer names as the previous strategy, but the goal here is typically to have shorter names so the app source code gets easier to write and to debug.
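To make the table-namespace examples concrete, here is a small sketch using SQLite through Python's sqlite3. The sample rows and the helper name parties are made up for illustration; only the column naming (id, buyer_id, seller_id) comes from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE transfer (
    id        INTEGER PRIMARY KEY,
    buyer_id  INTEGER NOT NULL REFERENCES person (id),
    seller_id INTEGER NOT NULL REFERENCES person (id)
);
INSERT INTO person   (id, name) VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO transfer (id, buyer_id, seller_id) VALUES (10, 1, 2);
""")


def parties(conn, transfer_id):
    """Return (buyer name, seller name) for one transfer, or None if absent."""
    return conn.execute(
        "SELECT buyer.name, seller.name FROM transfer "
        "JOIN person AS buyer  ON buyer.id  = transfer.buyer_id "
        "JOIN person AS seller ON seller.id = transfer.seller_id "
        "WHERE transfer.id = ?",
        (transfer_id,),
    ).fetchone()


print(parties(conn, 10))
```

Because every column is just "id" or "<role>_id", the table alias carries the context in the query, which is the trade-off this strategy accepts.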
Summary
I personally like the second one, but there are teams who adhere to each strategy and there's no clear winner. I lean towards the second one because [I think] the first one suffers from longer names (more typing), longer SQL (more errors), cryptic names (they don't play well with ORMs and app objects), and foreign keys that cannot follow the strategy well. In fact, virtually all the primary keys in my databases are named ID regardless of the specific entities.
But on the flip side, some teams value very highly the idea of knowing the table name of a column by just looking at it. And this is great for big databases (with 200-1000 relational fact tables) that can become quite complex, especially for new members of a team.
But above all, pick one and be consistent.

Cannot make relation between composite key of primary key and foreign key

I have two tables:
"Projects", which has three (3) fields: a composite key of two (2) fields, Donor_Source & Project_Number, plus Project title.
Please note that Donor_Source field is indexed as Yes(Duplicates OK) and Project_Number field is indexed as Yes(No Duplicates).
It has to be this way because a donor can support multiple projects.
Lastly there is also the PRF_Table, it has many fields but since I want to relate it to the Project table, I made two fields that are used as foreign keys of Projects table:
Please note that both fields of the foreign key are indexed as: NO.
As I was trying to relate the two tables, I managed to relate the project field from both tables but could not relate the donor source field of both tables:
As can be seen from the picture above, I managed to get many:1 relation between PRF_Table & Project, which is correct. PRF_Table can have many records on a specific project, but that project is listed only once in the Project table
The problem arises when trying to relate the Donor_Source field: I always get an indeterminate relation (something that I want to avoid). I guess the problem might be that the Donor_Source field in the Project table, although indexed, can still have duplicates, and it of course has duplicates in the PRF_Table.
What should I do in order to get many:1 relation (PRF_Table:Projects)?
All fields in a compound key must be addressed to create referential integrity.
Thus, you must:
Create field Agrmnt_ID in PRF_Table and include that in the relation to the junction table.
Include field Donor_Source in PRF_Table in the relation to table Projects.
You are not required to create a field for Agrmnt_ID in your PRF_Table to have referential integrity. What you are doing so far between PRF_Table and PRF-PO_Junction_Table is fine.
Regarding the link between Projects and PRF_Table, it appears that your intentions are for each record in Projects to be able to relate to multiple records in PRF_Table. If so, then your solution is to change your Primary Key in Projects and consequently your relationship between the two tables.
1. In table Projects, remove your current composite Primary Key and create a single AutoNumber field (e.g. named ProjectID) as your Primary Key.
2. Now, in the Projects table, create a unique index on the Donor_Source and Project_Number fields (a composite, unique index). This gives you the same effect as your current composite Primary Key: each Donor can be on multiple Projects, but the same Donor can't be on the same Project more than once.
3. Now, create the same field in PRF_Table that you created as your new Primary Key in Projects in step 1 (e.g. ProjectID).
4. Create your relationship between your new Primary Key in Projects and your new field in PRF_Table. This will allow every Project/Donor record in Projects to have multiple records in PRF_Table.
Composite Primary Keys are most useful in junction tables, like how you are using one with PRF-PO_Junction_Table. In any other link, however, you want to try to have a single Primary Key field and use a unique composite index to enforce uniqueness across two or more fields.
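The surrogate-key steps above could look roughly like this. It is a sketch under assumptions: SQLite (through Python's sqlite3) stands in for Access, INTEGER PRIMARY KEY AUTOINCREMENT plays the AutoNumber role, and the prf_id column and the helper functions are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (
    project_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the AutoNumber role
    donor_source   TEXT NOT NULL,
    project_number TEXT NOT NULL,
    project_title  TEXT,
    UNIQUE (donor_source, project_number)              -- composite unique index
);
CREATE TABLE prf_table (
    prf_id     INTEGER PRIMARY KEY AUTOINCREMENT,      -- hypothetical key column
    project_id INTEGER NOT NULL REFERENCES projects (project_id)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_project(conn, donor_source, project_number, title):
    """Insert a project; return its generated project_id, or None if rejected."""
    try:
        cur = conn.execute(
            "INSERT INTO projects (donor_source, project_number, project_title) "
            "VALUES (?, ?, ?)",
            (donor_source, project_number, title),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def add_prf(conn, project_id):
    """Insert a PRF row for a project; return False if the project does not exist."""
    try:
        conn.execute("INSERT INTO prf_table (project_id) VALUES (?)", (project_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

The single-column foreign key gives the many:1 relation from PRF_Table to Projects, while the unique index keeps the donor/project pair from repeating.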

Data structure using master id

I have a database with tables A and B in a one-to-many relationship. So one entity in A can be assigned to multiple and differing entities in B. A and B each have their own specific fields, but there are also fields and workflows related to either A or B, which are basically the same data but related only to either A or B.
As an example, an entity in A can have multiple comments for differing reasons and so can entities in B. Since there can be multiple comments for a single record I have to have a related comment table outside of tables A and B. I didn't want to have two comment tables, one for A and a separate table for B, so I set up a MasterID table that is related to both A and B and has referential integrity enforced. This means that when I want to add a record in A or B, I have to make sure that a MasterID already exists in the MasterID table. There are other tables that have the same type of functionality, comments is just one example, but if I didn't use a MasterID I'd have to create multiple tables each for A and B.
So my question is, is this the correct way to do this? Is there another way? The front-end will be in Access so I'm running into a little bit of trouble making sure a MasterID is created right before creating a new record in A or B.
MasterID(MasterID)
TableA(TableAID, FK_MasterID)
TableB(TableBID, FK_MasterID, FK_TableAID)
Comments(CommentID, MasterID, Comment)
Thanks for any help.
From a pure data design standpoint, you are on the right track, but not quite. You can use an entity-subtyping approach in which A and B are subtypes of another entity (MasterID). It is this supertype entity which attracts comments. However, for this to be true subtyping, the PK of A and the PK of B would be the FK to MasterID.
The way you've designed your tables, they have two candidate keys. If you eliminate the redundant candidate keys, then you have a standard entity-subtyping pattern, which is a legitimate and commonly used design approach.
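A rough sketch of that subtyping pattern, assuming SQLite through Python's sqlite3 rather than Access. The table name master and the helper functions are hypothetical; the point is that the primary key of each subtype table is itself the foreign key to the supertype.

```python
import sqlite3

SCHEMA = """
CREATE TABLE master (master_id INTEGER PRIMARY KEY);
CREATE TABLE table_a (
    master_id INTEGER PRIMARY KEY REFERENCES master (master_id)
);
CREATE TABLE table_b (
    master_id  INTEGER PRIMARY KEY REFERENCES master (master_id),
    table_a_id INTEGER NOT NULL REFERENCES table_a (master_id)
);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    master_id  INTEGER NOT NULL REFERENCES master (master_id),
    comment    TEXT NOT NULL
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def new_master(conn):
    """Create the supertype row first and return its generated id."""
    return conn.execute("INSERT INTO master DEFAULT VALUES").lastrowid


def add_a(conn):
    master_id = new_master(conn)
    conn.execute("INSERT INTO table_a (master_id) VALUES (?)", (master_id,))
    return master_id


def add_b(conn, table_a_id):
    master_id = new_master(conn)
    conn.execute(
        "INSERT INTO table_b (master_id, table_a_id) VALUES (?, ?)",
        (master_id, table_a_id),
    )
    return master_id


def add_comment(conn, master_id, text):
    conn.execute(
        "INSERT INTO comments (master_id, comment) VALUES (?, ?)", (master_id, text)
    )
```

Wrapping "create the master row, then the subtype row" in one helper is one way to deal with the ordering problem mentioned in the question; in Access the equivalent would live in form or VBA logic.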
Based on my understanding of the problem, I think this is too much complexity for little value. If I understand you correctly, you have a situation like the picture and you want to make the key for Comment unique.
Creating a fourth table could work but it adds unnecessary complexity.
What you could do instead is make the key for the Comments table a compound key of two columns: one is a sequence number and the other is a character field indicating the parent table. So you get keys like (A,1), (A,2), (B,3), (A,4), (B,5), etc.
This way you don't need the master table, and you don't need FKs in Table A or B.

How to implement this data structure in SQL tables

I have a problem that can be summarized as follows:
Assume that I am implementing an employee database. Depending on each person's position, different fields should be filled in. So, for example, if the employee is a software engineer, I have the following columns:
Name
Family
Language
Technology
CanDevelopWeb
And if the employee is a business manager I have the following columns:
Name
Family
FieldOfExpertise
MaximumContractValue
BonusRate
And if the employee is a salesperson then some other columns and so on.
How can I implement this in database schema?
One way that I thought is to have some related tables:
CoreTable:
Name
Family
Type
And if type is one then the employee is a software developer and hence the remaining information should be in table SoftwareDeveloper:
Language
Technology
CanDevelopWeb
For business Managers I have another table with columns:
FieldOfExpertise
MaximumContractValue
BonusRate
The problem with this structure is that I am not sure how to make relationship between tables, as one table has relationship with several tables on one column.
How to enforce relational integrity?
There are a few schools of thought here.
(1) store nullable columns in a single table and only populate the relevant ones (check constraints can enforce integrity here). Some people don't like this because they are afraid of NULLs.
(2) your multi-table design where each type gets its own table. Tougher to enforce with DRI but probably trivial with application or trigger logic.
The only problem with either of those, is as soon as you add a new property (like CanReadUpsideDown), you have to make schema changes to accommodate for that - in (1) you need to add a new column and a new constraint, in (2) you need to add a new table if that represents a new "type" of employee.
(3) EAV, where you have a single table that stores property name and value pairs. You have less control over data integrity here, but you can certainly constrain the property names to certain strings. I wrote about this here:
What is so bad about EAV, anyway?
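Option (1) above, a single table with nullable columns guarded by check constraints, might look something like this. It is a sketch using SQLite through Python's sqlite3; the 'dev' / 'mgr' type codes, the snake_case column names, and the try_insert helper are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE employee (
    employee_id            INTEGER PRIMARY KEY,
    name                   TEXT NOT NULL,
    family                 TEXT NOT NULL,
    employee_type          TEXT NOT NULL CHECK (employee_type IN ('dev', 'mgr')),
    language               TEXT,
    technology             TEXT,
    can_develop_web        INTEGER,
    field_of_expertise     TEXT,
    maximum_contract_value NUMERIC,
    bonus_rate             NUMERIC,
    -- only the columns that belong to the row's type may be populated
    CHECK (
        (employee_type = 'dev' AND language IS NOT NULL
            AND field_of_expertise IS NULL
            AND maximum_contract_value IS NULL
            AND bonus_rate IS NULL)
        OR
        (employee_type = 'mgr' AND field_of_expertise IS NOT NULL
            AND language IS NULL
            AND technology IS NULL
            AND can_develop_web IS NULL)
    )
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, **columns):
    """Insert one employee from keyword arguments; return False if a CHECK fails."""
    names = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    try:
        conn.execute(
            f"INSERT INTO employee ({names}) VALUES ({marks})",
            tuple(columns.values()),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

As the answer notes, every new property means another column and an edit to the constraint, which is the maintenance cost of this option.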
You are describing one ("class per table") of the 3 possible strategies for implementing the category (aka. inheritance, generalization, subclass) hierarchy.
The correct "propagation" of PK from the parent to child tables is naturally enforced by straightforward foreign keys between them, but ensuring both presence and the exclusivity of the child rows is another matter. It can be done (as noted in the link above), but the added complexity is probably not worth it and I'd generally recommend handling it at the application level.
I would add a field called EmployeeId in the EmployeeTable
I'd get rid of Type
For BusinessManager table and SoftwareDeveloper for example, I'll add EmployeeId
From here, you can then proceed to create Foreign Keys from BusinessManager, SoftwareDeveloper table to Employee
To expand on your approach with the core table: create a surrogate key based on an identity column. This will create a unique employee id for each employee (which also helps you distinguish between employees with the same name).
The foreign keys preserve your referential integrity. You wouldn't necessarily need EmployeeTypeId as someone else mentioned as you could filter on existence in the SoftwareDeveloper or BusinessManagers tables. The column would instead act as a cached data point for easier querying.
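The column types in the sample code below are placeholders (assumptions, not requirements), so adjust them to your data. Each foreign key also needs its own name.
create table EmployeeType(
EmployeeTypeId int -- assumed type
, EmployeeTypeName varchar(50) -- assumed type
, constraint PK_EmployeeType primary key (EmployeeTypeId)
);
create table Employees(
EmployeeId int identity(1,1)
, Name nvarchar(100) -- assumed type
, Family nvarchar(100) -- assumed type
, EmployeeTypeId int
, constraint PK_Employees primary key (EmployeeId)
, constraint FK_Employees_EmployeeType foreign key (EmployeeTypeId) references EmployeeType(EmployeeTypeId)
);
create table SoftwareDeveloper(
EmployeeId int
, Language nvarchar(50) -- assumed type
, Technology nvarchar(50) -- assumed type
, CanDevelopWeb bit -- assumed type
, constraint FK_SoftwareDeveloper_Employees foreign key (EmployeeId) references Employees(EmployeeId)
);
create table BusinessManagers(
EmployeeId int
, FieldOfExpertise nvarchar(100) -- assumed type
, MaximumContractValue decimal(18,2) -- assumed type
, BonusRate decimal(5,2) -- assumed type
, constraint FK_BusinessManagers_Employees foreign key (EmployeeId) references Employees(EmployeeId)
);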
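The remark about filtering on existence in the SoftwareDeveloper or BusinessManagers tables can be illustrated with a small sketch. It uses SQLite through Python's sqlite3 instead of SQL Server, with snake_case names, made-up sample rows, and a hypothetical helper employee_kinds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL, family TEXT NOT NULL
);
CREATE TABLE software_developer (
    employee_id INTEGER PRIMARY KEY REFERENCES employees (employee_id),
    language TEXT, technology TEXT, can_develop_web INTEGER
);
CREATE TABLE business_managers (
    employee_id INTEGER PRIMARY KEY REFERENCES employees (employee_id),
    field_of_expertise TEXT, maximum_contract_value NUMERIC, bonus_rate NUMERIC
);
INSERT INTO employees VALUES (1, 'Ann', 'Lee'), (2, 'Raj', 'Patel'), (3, 'Kim', 'Cho');
INSERT INTO software_developer VALUES (1, 'Go', 'Backend', 0);
INSERT INTO business_managers  VALUES (2, 'Retail', 50000, 0.1);
""")


def employee_kinds(conn):
    """Derive each employee's kind from which detail table holds a row for them."""
    return conn.execute("""
        SELECT e.name,
               CASE
                   WHEN EXISTS (SELECT 1 FROM software_developer AS s
                                WHERE s.employee_id = e.employee_id)
                       THEN 'SoftwareDeveloper'
                   WHEN EXISTS (SELECT 1 FROM business_managers AS b
                                WHERE b.employee_id = e.employee_id)
                       THEN 'BusinessManager'
               END AS kind
        FROM employees AS e
        ORDER BY e.employee_id
    """).fetchall()


print(employee_kinds(conn))
```

An employee with no detail row comes back with a kind of None, which is one reason a cached type column can still be convenient for querying.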
No existing SQL engine has solutions that make life easy on you in this situation.
Your problem is discussed at some length in "Practical Issues in Database Management", in the chapter on "entity subtyping". Recommended reading, and not only for this particular chapter.
The proper solution, from a logical design perspective, would be similar to yours, but for the "type" column in the core table. You don't need that, since you can derive the 'type' from which non-core table the employee appears in.
What you need to look at is the business rules, aka data constraints, that will ensure the overall integrity (aka consistency) of the data (of course whether any of these actually apply is something your business users, not me, should tell you) :
Each named employee must have exactly one job, and thus some job detail somewhere. iow : (1) no named employees without any job detail whatsoever and (2) no named employees with >1 job detail.
(3) All job details must be for a named employee.
Of these, (3) is the only one you can implement declaratively if you are using an SQL engine. It's just a regular FK from the non-core tables to the core table.
(1) and (2) could be defined declaratively in standard SQL, using either CREATE ASSERTION or a CHECK CONSTRAINT involving references to other tables than the one the CHECK CONSTRAINT is defined on, but neither of those constructs are supported by any SQL engine I know.
One more thing about why [including] the 'type' column is a rather poor choice to make : it changes how constraint (3) must be formulated. For example, you can no longer say "all business managers must be named employees", but instead you'd have to say "all business managers are named employees whose type is <type here>". Iow, the "regular FK" to your core table has now become a reference to a VIEW on your core table, something you might want to declare as, say,
CREATE TABLE BUSMANS ... REFERENCES (SELECT ... FROM CORE WHERE TYPE='BM');
or
CREATE VIEW BM AS (SELECT ... FROM CORE WHERE TYPE='BM');
CREATE TABLE BUSMANS ... REFERENCES BM;
Once again something SQL doesn't allow you to do.
You can use all fields in the same table, but you'll need an extra table named Employee_Type (for example), where you put Developer, Business Manager, and so on, each with a unique ID. So your relation will be employee_type_id in the Employee table.
Using PHP or ASP you can control what field you want to show depending the employee_type_id (or text) in a drop-down menu.
You are on the right track. You can set up PK/FK relationships from the general person table to each of the specialized tables. You should add a personID to all the tables to use for the relationship, as you do not want to set up a relationship on name: it cannot be a PK because it is not unique. Also, names change, so they are a very poor choice for an FK relationship, since a name change could cause many records to need to change. It is important to use separate tables rather than one because some of those things are in a one-to-many relationship. A developer, for instance, may have many different technologies, and that sort of thing should NEVER be stored in a comma-delimited list.
You could also set up a trigger to enforce that records can only be added to a specialty table if the main record has a particular personType. However, be wary of doing this, as you will have people who change roles over time. Do you want to lose the history of what the person knew when he was a developer when he gets promoted to manager? Then if he decides to step back down to development (a frequent occurrence), you would have to recreate his old record.
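For what it's worth, the trigger idea could be sketched like this, assuming SQLite through Python's sqlite3. The table names, the 'developer' type value, and the trigger name are all hypothetical, and trigger syntax differs between database engines.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    person_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    person_type TEXT NOT NULL
);
CREATE TABLE developer (
    person_id INTEGER PRIMARY KEY REFERENCES person (person_id),
    language  TEXT
);
-- Reject developer rows unless the matching person row has type 'developer'.
CREATE TRIGGER developer_requires_type
BEFORE INSERT ON developer
WHEN (SELECT person_type FROM person WHERE person_id = NEW.person_id)
     IS NOT 'developer'
BEGIN
    SELECT RAISE(ABORT, 'person is not a developer');
END;
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_developer(conn, person_id, language):
    """Try to add a developer detail row; return False if the trigger or FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO developer (person_id, language) VALUES (?, ?)",
            (person_id, language),
        )
        return True
    except sqlite3.DatabaseError:  # parent class of sqlite3.IntegrityError
        return False
```

This only guards inserts. It does nothing about a person whose type changes later, which is exactly the role-change concern raised above.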

Database design - defining a basic many-to-one relationship

This is a basic database design question. I want a table (or multiple tables) defining relationships between customers. I want it so PrimaryCustomer can be linked to multiple SecondaryCustomers, and can have many SecondaryCustomers with the same relationship.
PrimaryCustomerID RelationshipID SecondaryCustomerID
1) If the primary key is {PrimaryCustomerID} then I can only have one linked customer of any kind.
2) If the primary key is {PrimaryCustomerID, RelationshipID}, then I can only have one linked customer for each relationship type.
3) If the primary key is {PrimaryCustomerID, RelationshipID, SecondaryCustomerID}, then I can have whatever I like, but having all columns as the primary key seems completely wrong.
What's the right way to set things up?
A third alternative might be for the key to be (PrimaryCustomerId, SecondaryCustomerId), which would make sense if only one type of relationship is permitted per pair of customers. What keys to implement should be defined by what dependencies you need to represent in the table so that the table accurately represents the reality you are modelling. There's nothing wrong in principle with compound keys or all-key tables.
Number 3 is the right way to go for this data model. Linking tables often have all the columns in a join as all they do is link to other tables.
If a customer can only be linked to one primary customer then you can use a simple recursive relationship in the customer table itself.
CustomerID as PK
PrimaryCustomerID as FK to CustomerID
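A short sketch of that recursive relationship, using SQLite through Python's sqlite3. The sample customers and the helper secondaries_of are invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_id         INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,
    -- NULL for a primary customer, otherwise the id of its primary
    primary_customer_id INTEGER REFERENCES customer (customer_id)
);
"""


def secondaries_of(conn, primary_id):
    """Return the names of all customers linked to the given primary customer."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE primary_customer_id = ? ORDER BY customer_id",
        (primary_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO customer (customer_id, name, primary_customer_id) VALUES (?, ?, ?)",
    [(1, "Acme", None), (2, "Acme East", 1), (3, "Acme West", 1)],
)
print(secondaries_of(conn, 1))
```

This shape has no room for a relationship type and allows only one primary per customer, so it fits only the restricted case the answer describes.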
Nothing wrong with No 3.
If you need to prevent reverse-relationship duplicates, you can use
ALTER TABLE CustomerRelationship
ADD CONSTRAINT chk_id CHECK (PrimaryCustomerId < SecondaryCustomerId);
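Putting option 3 and the check constraint together might look like the sketch below, assuming SQLite through Python's sqlite3 (where the constraint is declared inside CREATE TABLE rather than added with ALTER TABLE). The snake_case names and the link helper are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer_relationship (
    primary_customer_id   INTEGER NOT NULL,
    relationship_id       INTEGER NOT NULL,
    secondary_customer_id INTEGER NOT NULL,
    -- all-key table: the three columns together form the primary key
    PRIMARY KEY (primary_customer_id, relationship_id, secondary_customer_id),
    -- blocks the reversed pair (2, 1) once (1, 2) is the allowed direction
    CONSTRAINT chk_id CHECK (primary_customer_id < secondary_customer_id)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def link(conn, primary_id, relationship_id, secondary_id):
    """Try to record a relationship; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer_relationship VALUES (?, ?, ?)",
            (primary_id, relationship_id, secondary_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

One side effect worth knowing: the check forces the primary customer to always have the lower id, which may or may not suit the data.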

Resources