Modeling a hierarchical data structure - database

20120304 - Streamlined question
Suppose we have entities R, D and E and these relational cardinalities:
R < n:m > D
D < 1:n > E
The mapping of this specification is straightforward, but we have another requirement:
R < n:m > E
Side condition: E1 might only get 'assigned' to an R1, if E1 is related to some D1 and this D1 is related to the R1.
Unfortunately, even if E2 is related to a D2, which is related to an R2 - E2 might not be related to R2.
I'm in search of a relational DB model.
A model, which doesn't require multiple updates if a D gets detached from an Ra and reattached to another Rb. In this case, all Es of D need to get detached from Ra and attached to Rb too.
20120305 - Workaround?
A friend proposed creating an entity DxR which links its D and its R by means of a tuple (D,R). Then create a relation
DxR < n:m > E
Hm...
20120302 - Original question
System is composed of top level zones (Z). A zone may have several regions (R).
So called departments (D) may be assigned to regions. One department may get assigned to more than one region, only if each region belongs to a different zone.
Finally, employees (E) belong to one and only one department.
Employees may get assigned to a region only if the employee's department belongs to the region.
Important: An employee need not belong to all regions its department belongs to.
Assume that in the following diagram E1 belongs to D1. E1 should also belong to R1, but not to R2, although D1 belongs to both R1 and R2:
    Z             Z
  __|___       ___|___
 R1     R     R2      R
  \___________/
        D1
Q: Please propose a relational DB table structure that models the above specification.

This question is very specific in one sense and some people might argue that it is too localized. There is, however, one more generally applicable idea that might be useful to other people in the future, so it isn't necessarily true that the question is too specific.
The really interesting part of these business rules is this one: (my emphasis added)
One department may get assigned to more than one region, only if each
region belongs to a different zone.
Here is a schema that captures almost all of the stated business rules declaratively without having to resort to any triggers.
create table ZONE
( ID int not null
, NAME varchar(50) not null
, constraint PK_ZONE primary key clustered (ID)
)
create table REGION
( ZONE_ID int not null
, REGION_ID int not null
, NAME varchar(50) not null
, constraint PK_REGION primary key clustered (ZONE_ID, REGION_ID)
, constraint FK_REGION__ZONE foreign key (ZONE_ID)
references ZONE (ID)
)
create table DEPARTMENT
( ID int not null
, NAME varchar(50) not null
, constraint PK_DEPARTMENT primary key clustered (ID)
)
create table EMPLOYEE
( ID int not null
, NAME varchar(50) not null
, DEPT_ID int not null
, constraint PK_EMPLOYEE primary key clustered (ID)
, constraint FK_EMPLOYEE__DEPARTMENT foreign key (DEPT_ID)
references DEPARTMENT (ID)
)
The above tables are pretty obvious. However, there is one particular quirk: The REGION table has a compound primary key that includes the FK to ZONE. This is useful for propagating the constraint about departments having to be distinct within a zone.
Assigning departments to regions requires an intersection table:
create table DEPT_ASGT -- Department Assignment
( REGION_ID int not null
, DEPT_ID int not null
, ZONE_ID int not null
, constraint PK_DEPT_ASGT primary key clustered (REGION_ID, DEPT_ID)
, constraint FK_DEPT_ASGT__REGION foreign key (ZONE_ID, REGION_ID)
references REGION (ZONE_ID, REGION_ID)
, constraint FK_DEPT_ASGT__DEPARTMENT foreign key (DEPT_ID)
references DEPARTMENT (ID)
, constraint UN_DEPT_ASGT__ZONES unique nonclustered (ZONE_ID, DEPT_ID)
)
This intersection table is pretty normal insofar as it has a foreign key to each of the tables that it links. What is special about this intersection table is the unique constraint. This is what enforces the rule that a department can't be in two different regions within the same zone.
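As a rough illustration of how that unique constraint behaves, here is a minimal sketch that runs a cut-down version of REGION and DEPT_ASGT in SQLite through Python's sqlite3 module. SQLite is only a stand-in for the SQL Server schema above, and the helper name assign and the sample ids are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
create table REGION (
  ZONE_ID   int not null,
  REGION_ID int not null,
  primary key (ZONE_ID, REGION_ID)
);
create table DEPT_ASGT (
  REGION_ID int not null,
  DEPT_ID   int not null,
  ZONE_ID   int not null,
  primary key (REGION_ID, DEPT_ID),
  foreign key (ZONE_ID, REGION_ID) references REGION (ZONE_ID, REGION_ID),
  unique (ZONE_ID, DEPT_ID)
);
insert into REGION (ZONE_ID, REGION_ID) values (1, 10), (1, 11), (2, 20);
""")


def assign(region_id, dept_id, zone_id):
    """Hypothetical helper: try to assign a department to a region, report success."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "insert into DEPT_ASGT (REGION_ID, DEPT_ID, ZONE_ID) values (?, ?, ?)",
                (region_id, dept_id, zone_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = assign(10, 1, 1)       # department 1 into region 10 (zone 1)
other_zone = assign(20, 1, 2)  # same department, region in a different zone
same_zone = assign(11, 1, 1)   # second region in zone 1: violates unique (ZONE_ID, DEPT_ID)
wrong_zone = assign(10, 2, 2)  # region 10 is not in zone 2: violates the compound FK
```

The last call shows why ZONE_ID travels with REGION_ID in the foreign key: the zone stored in DEPT_ASGT cannot disagree with the zone the region really belongs to.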
Lastly, we need to map employees into departments and into regions. This requires another intersection table:
create table EMP_ASGT -- Employee Assignment
( REGION_ID int not null
, DEPT_ID int not null
, EMPLOYEE_ID int not null
, constraint PK_EMP_ASGT primary key clustered (REGION_ID, DEPT_ID, EMPLOYEE_ID)
, constraint FK_EMP_ASGT__DEPT_ASGT foreign key (REGION_ID, DEPT_ID)
references DEPT_ASGT (REGION_ID, DEPT_ID)
, constraint FK_EMP_ASGT__EMPLOYEE foreign key (EMPLOYEE_ID) references EMPLOYEE (ID)
)
You will note that the EMPLOYEE table has a foreign key to DEPARTMENT - That enforces the rule that each employee can belong to only one department. The EMP_ASGT table adds the details about which regions the employee participates in. Since an employee may not be involved in every region that his or her department is assigned to, the EMP_ASGT table connects employees to just those regions where they have some involvement.
Here is the one place where a trigger or some other procedural logic is needed. You need to make sure that EMPLOYEE.DEPT_ID stays consistent with the records in EMP_ASGT. You could try to push this into the declarative referential integrity by making the PK of EMPLOYEE a compound of ID and DEPT_ID, but that would force you to decide whether you want to violate 3NF or make your employee department changes a procedurally ugly mess. At the end of the day, a little trigger to make sure that EMP_ASGT doesn't disagree with EMPLOYEE.DEPT_ID would be much less trouble.
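One possible shape for such a trigger, sketched in SQLite via Python's sqlite3 module rather than in SQL Server's trigger syntax. The trigger name EMP_ASGT_DEPT_CHECK and the helper assign_employee are assumptions for illustration, and the tables are trimmed to the columns the check needs. It only guards inserts into EMP_ASGT; a complete solution would also have to cover updates to EMPLOYEE.DEPT_ID and to EMP_ASGT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table EMPLOYEE (ID int primary key, DEPT_ID int not null);
create table EMP_ASGT (
  REGION_ID   int not null,
  DEPT_ID     int not null,
  EMPLOYEE_ID int not null references EMPLOYEE (ID),
  primary key (REGION_ID, DEPT_ID, EMPLOYEE_ID)
);
-- Reject an assignment whose DEPT_ID is not the employee's own department.
create trigger EMP_ASGT_DEPT_CHECK
before insert on EMP_ASGT
for each row
begin
  select raise(abort, 'EMP_ASGT.DEPT_ID disagrees with EMPLOYEE.DEPT_ID')
  where not exists (
    select 1 from EMPLOYEE
    where ID = new.EMPLOYEE_ID and DEPT_ID = new.DEPT_ID
  );
end;
insert into EMPLOYEE (ID, DEPT_ID) values (1, 100);
""")


def assign_employee(region_id, dept_id, employee_id):
    """Hypothetical helper: insert an assignment, report whether it was accepted."""
    try:
        with conn:
            conn.execute(
                "insert into EMP_ASGT (REGION_ID, DEPT_ID, EMPLOYEE_ID) values (?, ?, ?)",
                (region_id, dept_id, employee_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False


matching = assign_employee(10, 100, 1)    # employee 1 really is in department 100
mismatched = assign_employee(10, 200, 1)  # department 200 is not the employee's department
```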

Related

SQL Server choosing foreign key from another foreign key or unique key

In my scenario I have a table tblCity with two foreign key columns, CompanyRef and BranchRef, which together are unique.
I would add one unique key, [ID], to use as the primary key.
In another table, tblCustomer, I need to reference tblCity with a foreign key.
My question: do I really need that ID column, or should I use the two foreign keys as the primary key? In the second case, must I use three columns as the foreign key (CompanyRef, BranchRef, CityRef) in tblCustomer?
Which one of these methods is right for my problem?
So, just to make things clear a little bit in your question (I hope I got it right):
tblCity
CityId INT -- is part of the separate PK option
CompanyRef INT, FK -> tblCompany
BranchRef INT, FK -> tblBranch
tblCustomer
CustomerId INT -- not interesting here
CityRef INT FK -> tblCity -- is part of the separate PK option
CompanyRef INT -- part of the alternative
BranchRef INT -- part of the alternative
I can't tell which one is best performance-wise (that's more a DBA question), but from a developer perspective, I would advise having a single-column PK for City:
City sounds like a quite generic concept. It might be needed in the future, so dragging two columns in each other table referencing it, means that each JOIN will be on those two columns.
The final solution could look like this:
tblCity
CityId INT PRIMARY KEY IDENTITY(1, 1),
CompanyRef INT, FK -> tblCompany
BranchRef INT, FK -> tblBranch
UNIQUE (CompanyRef, BranchRef) -- acts as a constraint, but also an index
tblCustomer
CustomerId INT
CityRef INT FK -> tblCity
Side note: Hungarian notation seems quite discouraged these days - see this very popular question and its answers.
Also, I would advise keeping the same column name for the same thing. E.g.
CompanyRef -> CompanyId (or whatever the PK is named)
BranchRef -> BranchId
You need to create a relationship.
Which keys to use depends on the type of relationship you need:
primary key and foreign key = one to many
primary key and primary key = one to one
foreign key and foreign key = many to many

about indexes created on primary keys referenced as foreign keys in a different table (SQLite3)

Scenario for my database design is as follows: people visit matchmakers who network with each other and propose matches. For example, person A visits matchmaker X, and person B visits matchmaker Y, where A not equals B and no constraint on X, Y i.e. they can be the same or different.
create table matchmaker ( id TEXT primary key, address TEXT );
create table people ( id TEXT primary key, name TEXT, gender TEXT, matchmaker_id TEXT,
foreign key(matchmaker_id) references matchmaker(id));
create table married_couples ( id1 TEXT, id2 TEXT,
foreign key (id1) references people(id),
foreign key (id2) references people(id));
Then, for faster database access:
create index matchmaker_index on matchmaker(id);
create index people_index on people(id);
My question is based on the following query to generate tuples of matchmaker pairs with people they've paired.
select a.id, b.id, e.id1, e.id2
from matchmaker as a, matchmaker as b,
people as c, people as d,
married_couples as e
where e.id1 = c.id and c.id = a.id and
e.id2 = d.id and d.id = b.id;
For the query above, will the two matchmaker_index and people_index suffice or,
is there a need for two more (other) indexes as below?
create index matchmaker_people_index on people(id, matchmaker_id);
create index married_couples_index on married_couples(id1, id2);
additional info:
1) matchmaker has 20074 unique entries;
2) people has 20494819 unique entries;
3) married_couples ?? (i don't have this information yet, but it's going to be big)
Also, it's possible that married_couples will have duplicate entries. So,
after creating the relevant indexes, will run query to delete duplicates as below:
delete from married_couples
where rowid not in ( select min(rowid)
from married_couples
group by id1, id2);
SQLite generates indexes automatically for columns declared primary key or unique. FAQ, see question 7. So these two indexes
create index matchmaker_index on matchmaker(id);
create index people_index on people(id);
are duplicates. Drop them.
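A quick way to see this for yourself is to ask SQLite which indexes exist on the table. The sketch below uses Python's sqlite3 module and the matchmaker table from the question; the variable names are just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table matchmaker ( id TEXT primary key, address TEXT )")

# PRAGMA index_list yields one row per index; the second column is the index name.
before = [row[1] for row in conn.execute("PRAGMA index_list('matchmaker')")]

# Adding the explicit index from the question creates a second index on the same column.
conn.execute("create index matchmaker_index on matchmaker(id)")
after = sorted(row[1] for row in conn.execute("PRAGMA index_list('matchmaker')"))
```

Here before already contains the automatic index that backs the primary key, and after lists both it and the redundant matchmaker_index.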
This index
create index matchmaker_people_index on people(id, matchmaker_id);
might help.
Also, it's possible that married_couples will have duplicate entries.
Remove that possibility. Add a primary key and a CHECK constraint.
create table married_couples (
id1 TEXT, id2 TEXT,
foreign key (id1) references people(id),
foreign key (id2) references people(id),
primary key (id1, id2),
check (id1 < id2)
);
The primary key provides an index, too.
When you're talking about marriages, a marriage between Mike and Mindy is a duplicate of a marriage between Mindy and Mike. The CHECK constraint prevents that kind of subtle duplication. It also prevents people from marrying themselves.
I'm not sure what you might want to do about the woman who married the Eiffel Tower.
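To make the effect of the primary key and the CHECK constraint concrete, here is a small sketch using Python's sqlite3 module. The foreign keys to people are left out to keep it short, and the helper marry is a made-up name. Note that the caller is responsible for passing the two ids in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table married_couples (
  id1 TEXT, id2 TEXT,
  primary key (id1, id2),
  check (id1 < id2)
)""")


def marry(a, b):
    """Hypothetical helper: record a couple, report whether the row was accepted."""
    try:
        with conn:
            conn.execute("insert into married_couples (id1, id2) values (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        return False


ok = marry("Mike", "Mindy")        # 'Mike' < 'Mindy', accepted
repeat = marry("Mike", "Mindy")    # same pair again: primary key violation
swapped = marry("Mindy", "Mike")   # reversed order: CHECK (id1 < id2) fails
selfie = marry("Mike", "Mike")     # same person twice: CHECK fails as well
```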
You can simplify the original query quite a lot. Learning ANSI joins is a good idea.
select id1, id2, c.matchmaker_id m_id1, d.matchmaker_id m_id2
from married_couples a
inner join people c
on c.id = a.id1
inner join people d
on d.id = a.id2;

Can a foreign key of a table be a part of the composite primary key of the same table?

Suppose I have two tables: A (with columns: a,b,c,d) and B (with columns: x,y,z). Now, (a,b) together make the primary key for table A and x is the primary key of table B. Is it possible to make b a foreign key of table A that refers x from table B?
Please reply ASAP!
Thanks in advance! :-)
Yes, there is no issue with that. A classic example (using MySQL for demonstration purposes) is a database table holding a number of companies and another holding employees which can work for any one of those companies:
create table companies (
id int primary key,
name varchar(20));
create table employees (
id int,
c_id int,
name varchar(20),
primary key (id, c_id),
foreign key (c_id) references companies(id));
insert into companies (id, name) values (1, 'ABC');
insert into companies (id, name) values (2, 'DEF');
insert into companies (id, name) values (3, 'HIJ');
insert into employees (id, c_id, name) values (101, 1, 'Allan');
insert into employees (id, c_id, name) values (102, 1, 'Bobby');
insert into employees (id, c_id, name) values (101, 2, 'Carol');
insert into employees (id, c_id, name) values (101, 3, 'David');
Note that the primary key for employees is a composite key made up of the employee ID and company ID. Note also that the company ID is a foreign key constraint on the primary key of companies, the exact situation (functionally) that you asked about.
The query showing who works for what company shows this in action:
select c.id, c.name, e.id, e.name
from companies c, employees e
where c.id = e.c_id
order by c.id, e.id
c.id c.name e.id e.name
---- ------ ---- ------
1 ABC 101 Allan
1 ABC 102 Bobby
2 DEF 101 Carol
3 HIJ 101 David
Can a column in a composite primary key also be a foreign key referencing a primary key of another table? Of course it can. The important question is, when is this a good idea?
The most common scenario is probably the intersection or junction table. Customers can have more than one Address (Shipping, Billing, etc) and Addresses can have more than one Customer using them. So the table CUSTOMER_ADDRESS has a primary key which references both the CUSTOMER and ADDRESS primary keys (and for bonus points the ADDRESS_TYPE reference data table too).
My examples use Oracle 12c syntax:
create table customer_address
( customer_id number(38,0) not null
, address_id number(38,0) not null
, address_type_code varchar2(3) not null
, constraint customer_address_pk primary key
(customer_id, address_id, address_type_code)
, constraint customer_address_customer_fk foreign key
(customer_id) references customer(customer_id)
, constraint customer_address_address_fk foreign key
(address_id) references address(address_id)
, constraint customer_address_type_fk foreign key
(address_type_code) references address_type(address_type_code)
);
The second scenario occurs when the primary key of the child table comprises the parent key and an identifier (usually a number) which is only unique within the parent key. For instance, an Order has an Order Header and some Order Lines. The Order is identified by the Order Header ID and its lines are identified by a monotonically incrementing number. The ORDER_LINE table may look like this:
create table order_line
( order_header_id number(38,0) not null
, order_line_no number(38,0) not null
, product_id number(38,0) not null
, qty number(38,0) not null
, constraint order_line_pk primary key
(order_header_id, order_line_no)
, constraint order_line_header_fk foreign key
(order_header_id) references order_header(order_header_id)
, constraint order_line_product_fk foreign key
(product_id) references product(product_id)
);
Note that we could model ORDER_LINE as another intersection table, with a primary key of (order_header_id, product_id) and relegate order_line_no to the status of ordinary attribute: it depends on the business rules we must represent.
This second scenario is rarer than you might think: composite primary keys are pretty rare in real life. For instance, I think the model presented in that other answer is weak. The chances are we will need to use Employee as a foreign key for a number of relationships (e.g. Manager, Assignment, Sales). Using a composite key for foreign keys is clumsy (more typing!). Furthermore, when we drill into these models we often find that one of the key columns is a natural key rather than a primary key, and so might be subject to change. Cascading changes to natural key columns in composite foreign keys is a PITN.
Hence it is common practice to use a surrogate (or synthetic) primary key, say using a sequence or identity column, and enforce the natural key with a unique constraint. The latter step is often forgotten but it is crucial to maintaining data integrity. Given a situation in which we need to store details of Employees from several Companies, including the Companies' Employee Identifier, we might have an EMPLOYEE table like this:
create table employee
( employee_id number(38,0) generated always as identity
, company_id number(38,0) not null
, company_employee_id varchar2(128) not null
, name varchar2(128) not null
, constraint employee_pk primary key
(employee_id)
, constraint employee_uk unique
(company_id, company_employee_id)
, constraint employee_company_fk foreign key
(company_id) references company(company_id)
);
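For readers without an Oracle instance to hand, the same idea can be tried out in SQLite through Python's sqlite3 module. This is only a sketch under that substitution: integer primary key stands in for the identity column, the foreign key to company is omitted, and the helper hire is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table employee (
  employee_id         integer primary key,   -- surrogate key, assigned by SQLite
  company_id          integer not null,
  company_employee_id text    not null,
  name                text    not null,
  unique (company_id, company_employee_id)   -- the natural key is still enforced
)""")


def hire(company_id, company_employee_id, name):
    """Hypothetical helper: return the new surrogate id, or None if the natural key is taken."""
    try:
        with conn:
            cur = conn.execute(
                "insert into employee (company_id, company_employee_id, name) values (?, ?, ?)",
                (company_id, company_employee_id, name),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


allan = hire(1, "101", "Allan")      # first employee 101 at company 1
carol = hire(2, "101", "Carol")      # same company-issued id, different company: fine
duplicate = hire(1, "101", "Bobby")  # company 1 already has an employee 101: rejected
```

Other tables can now reference the single employee_id column, while the unique constraint keeps the companies' own identifiers from colliding.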
One situation where it is common to find composite primary keys cascaded to dependent tables is in data warehouses and other VLDBs. Here the composite key columns form part of a denormalization strategy to support Partitioning schemes and/or efficient access paths.

Database design - composite key relationship issue

I had posted a similar question before, but this is more specific. Please have a look at the following diagram:
The explanation of this design is as follows:
Bakers produce many Products
The same Product can be produced by more than one Baker
Bakers change their pricing from time-to-time for certain (of their) Products
Orders can be created, but not necessarily finalised
The aim here is to allow the store manager to create an Order "Basket" based on whatever goods are required, and also allow the system being created to determine the best price at that time based on what Products are contained within the Order.
I therefore envisaged the ProductOrders table to initially hold the productID and associated orderID, whilst maintaining a null (undetermined) value for bakerID and pricingDate, as that would be determined and updated by the system, which would then constitute a finalised order.
Now that you have an idea of what I am trying to do, please advise me on how best to set these relationships up.
Thank you!
If I understand correctly, an unfinalised order is not yet assigned a baker / pricing (meaning when an order is placed, no baker has yet been selected to bake the product).
In which case, the order is probably placed against the Products Table and then "Finalized" against the BakersProducts table.
A solution could be to give ProductsOrders 2 separate "ProductID's", one being for the original ordered ProductId (i.e. Non Nullable) - say ProductId, and the second being part of the Foreign key to the assigned BakersProducts (say ProductId2). Meaning that in ProductsOrders, the composite foreign keys BakerId, ProductId2 and PricingDate are all nullable, as they will only be set once the order is Finalized.
In order to remove this redundancy, what you might also consider is using surrogate keys instead of the composite keys. This way BakersProducts would have a surrogate PK (e.g. BakersProductId) which would then be referenced as a nullable FK in ProductsOrders. This would also avoid the confusion with the Direct FK in ProductsOrders to Product.ProductId (which from above, was the original Product line as part of the Order).
HTH?
Edit:
CREATE TABLE dbo.BakersProducts
(
BakerProductId int identity(1,1) not null, -- New Surrogate PK here
BakerId int not null,
ProductId int not null,
PricingDate datetime not null,
Price money not null,
StockLevel bigint not null,
CONSTRAINT PK_BakerProducts PRIMARY KEY(BakerProductId),
CONSTRAINT FK_BakerProductsProducts FOREIGN KEY(ProductId) REFERENCES dbo.Products(ProductId),
CONSTRAINT FK_BakerProductsBaker FOREIGN KEY(BakerId) REFERENCES dbo.Bakers(BakerId),
CONSTRAINT U_BakerProductsPrice UNIQUE(BakerId, ProductId, PricingDate) -- Unique Constraint mimics the original PK for uniqueness ... could also use a unique index
)
CREATE TABLE dbo.ProductOrders
(
OrderId INT NOT NULL,
ProductId INT NOT NULL, -- This is the original Ordered Product set when order is created
BakerProductId INT NULL, -- This is nullable and gets set when Order is finalised with a baker
OrderQuantity BIGINT NOT NULL,
CONSTRAINT FK_ProductsOrdersBakersProducts FOREIGN KEY(BakerProductId) REFERENCES dbo.BakersProducts(BakerProductId)
-- ... other keys here
)
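To show how the nullable surrogate FK might be used, here is a hedged sketch in SQLite via Python's sqlite3 module. The tables are trimmed versions of the ones above, and the finalise helper with its "cheapest offer wins" rule is purely an assumption for illustration; a real system would also have to take PricingDate and stock levels into account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table BakersProducts (
  BakerProductId integer primary key,   -- surrogate key
  BakerId        int  not null,
  ProductId      int  not null,
  PricingDate    text not null,
  Price          real not null,
  unique (BakerId, ProductId, PricingDate)
);
create table ProductOrders (
  OrderId        int not null,
  ProductId      int not null,          -- product chosen when the order is created
  BakerProductId int null references BakersProducts (BakerProductId),
  OrderQuantity  int not null
);
insert into BakersProducts (BakerId, ProductId, PricingDate, Price) values
  (1, 500, '2012-03-01', 2.50),
  (2, 500, '2012-03-01', 2.10);
insert into ProductOrders (OrderId, ProductId, OrderQuantity) values (1, 500, 12);
""")


def finalise(order_id):
    """Hypothetical rule: point each unfinalised line at the cheapest offer for its product."""
    with conn:
        conn.execute(
            """
            update ProductOrders
               set BakerProductId = (select bp.BakerProductId
                                       from BakersProducts bp
                                      where bp.ProductId = ProductOrders.ProductId
                                      order by bp.Price
                                      limit 1)
             where OrderId = ? and BakerProductId is null
            """,
            (order_id,),
        )


def baker_product_of(order_id):
    row = conn.execute(
        "select BakerProductId from ProductOrders where OrderId = ?", (order_id,)
    ).fetchone()
    return row[0]


before = baker_product_of(1)  # NULL until the order is finalised
finalise(1)
after = baker_product_of(1)   # now references one BakersProducts row
```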

How to store the following SQL data optimally in SQL Server 2008

I am creating a page where people can post articles. When the user posts an article, it shows up on a list, like the related questions on Stack Overflow (when you add a new question). It's fairly simple.
My problem is that I have 2 types of users. 1) Unregistered private users. 2) A company.
Unregistered users need to type in their name, email and phone, whereas company users just need to type in their company name/password. Fairly simple.
I need to reduce the excess database usage and try to optimize the database and build the tables effectively.
Now to my problem in hand:
So I have one table with the information about the companies, ID (guid), Name, email, phone etc.
I was thinking about making one table called articles that contained ArticleID, Headline, Content and Publishing date.
One table with the information about the unregistered users, ID, their name, email and phone.
How do I tie the articles table to the company/unregistered users table? Is it good to make an integer that contains 2 values, 1=Unregistered user and 2=Company, and then one field with an ID number pointing to the specified user/company? It looks like you need a lot of extra code to query the database. Performance? How could I then return the article along with the contact information? You should also be able to return all the articles from a specific company.
So Table company would be:
ID (guid), company name, phone, email, password, street, zip, country, state, www, description, contact person and a few more that i don't have here right now.
Table Unregistered user:
ID (guid), name, phone, email
Table article:
ID (int/guid/short guid), headline, content, published date, is_company, id_to_user
Is there a better approach?
The qualities I am looking for are: performance, easy to query and easy to maintain (adding new fields, indexes etc)
Theory
The problem you described is called Table Inheritance in data modeling theory. In Martin Fowler's book the solutions are:
single table inheritance: a single table that contains all fields.
class table inheritance: one table per class, with table for abstract classes.
concrete table inheritance: one table per non-abstract class, abstract members are repeated in each concrete table
So from a theory and industry practice point of view all three solutions are acceptable: one table Posters with NULLable columns (i.e. single table inheritance), three tables Posters, Companies and Persons (i.e. class table inheritance) and two tables Companies and Persons (i.e. concrete table inheritance).
Now, to pros and cons.
Cost of NULL columns
The record structure is discussed in Inside the Storage Engine: Anatomy of a record:
NULL bitmap
two bytes for count of columns in the record
variable number of bytes to store one bit per column in the
record, regardless of whether the
column is nullable or not (this is
different and simpler than SQL Server
2000 which had one bit per nullable
column only)
So if you have at least one NULLable column, you pay the cost of the NULL bitmap in each record, at least 3 bytes. But the cost is identical if you have 1 or 8 columns! The 9th column will add a byte to the NULL bitmap in each record. The formula is described in Estimating the Size of a Clustered Index: 2 + ((Num_Cols + 7) / 8)
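The formula is easy to check with a few column counts. The function below is just that formula written out, with the division taken as integer division; the name null_bitmap_bytes is made up, and num_cols is the total number of columns in the record, as in the quoted description.

```python
def null_bitmap_bytes(num_cols: int) -> int:
    """Bytes used by the NULL bitmap: 2 for the column count plus one bit per column."""
    return 2 + (num_cols + 7) // 8


# Cost for a few record widths, keyed by column count.
sizes = {n: null_bitmap_bytes(n) for n in (1, 8, 9, 16, 17)}
```

With these inputs the cost stays at 3 bytes up to 8 columns, becomes 4 bytes from the 9th column, and 5 bytes from the 17th.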
Performance Driving Factor
In a database system there is really only one factor that drives performance: the amount of data scanned. How large are the records scanned by a query plan, and how many records does it have to scan? So to improve the performance you need to:
narrow the records: reduce the data size, covering include indexes, vertical partitioning
reduce the number of records scanned: indexes
reduce the number of scans: eliminate joins
Now in order to analyze these criteria, there is something missing in your post: the prevalent data access pattern, ie. the most common query that the database will be hit with. This is driven by how you display your posts on the site. Consider these possible approaches:
posts front page: like SO, a page of recent posts with header, excerpt, time posted and author basic information (name, gravatar). To get this page displayed you need to join Posts with authors, but you only need the author name and gravatar. Both single table inheritance and class table inheritance would work, but concrete table inheritance would fail. This is because you cannot afford for such a query to do conditional joins (ie. join the articles posted to either Companies or Persons), such a query will be less than optimal.
posts per author: users have to login first and then they'll see their own posts (this is common for non-public post oriented sites, think incident tracking for instance). For such a design, all three table inheritance schemes would work.
Conclusion
There are some general performance considerations (ie. narrow the data) to consider, but the critical information is missing: how are you going to query the data, your access pattern. The data model has to be optimized for that access pattern:
Which fields from Companies and Persons will be displayed on the landing page of the site (ie. the most often and performance critical query) ? You don't want to join 5 tables to show those fields.
Are some Company/Person information fields only needed on the user information page? Perhaps partition the table vertically into CompaniesExtra and PersonsExtra tables. Or use an index that will cover the frequently used fields (this approach simplifies code and is easier to keep consistent, at the cost of data duplication)
PS
Needless to say, don't use guids for ids. Unless you're building a distributed system, they are a horrible choice for reasons of excessive width. Fragmentation is also a potential problem, but that can be alleviated by use of sequential guids.
Ideally, if you could use an ORM (as mentioned by TFD), I would do so. Since you have not commented on that, and you always come back to the "performance" question, I assume you would not like to use one.
Using pure SQL, the approach I would suggest would be to have table structure as below:
ArticleOwner [ID (guid)]
Company [ID (guid) - PK as well as FK to ArticleOwner.ID,
company name, phone, email, password, street, zip, ...]
UnregisteredUser [ID (guid) - PK as well as FK to ArticleOwner.ID,
name, phone, email]
Article = [ID (int/guid/short guid), headline, content, published date,
ArticleOwnerID - FK to ArticleOwner.ID]
Let's see the usages:
INSERT: the overhead is the need to add a row to the ArticleOwner table for each Company/UU. This is not an operation that happens often, so there is no need to optimize its performance
SELECT:
Company/UU: well, it is easy to search for both UU and Company, since you do not need to JOIN to any other table, as all the info about the required object is in one table
Articles of one Company/UU: again, you just need to filter on the GUID of the Company/UU, and there you go: SELECT (list fields) FROM Article WHERE ArticleOwnerID = @AOID
Also consider that one day you might need to support multiple Owners per Article. With the parent table approach above (or the one mentioned by Vincent) you will just need to introduce a relation table, whereas with the solution of two NULL-able FK constraints, one to each Owner table, you are kind of stuck.
Performance:
Are you sure you have performance problem? What is your target?
One thing I can recommend, looking at your model with regard to performance, is not to use GUIDs as the clustered index (which is the default for a PK), because your INSERT statements will then be inserting data randomly into the table.
Alternatives are:
use Sequential GUID instead (see: What are the performance improvement of Sequential Guid over standard Guid?)
use both INTEGER and GUID. This is a somewhat complicated approach and might be overkill for the simple model you have, but the result is that you always JOIN tables in SELECTs on INTEGER instead of GUID, which is much faster.
So if you are so hot on performance, you might try to do the following:
ArticleOwner (ID (int identity) - PK, UID (guid) - UC)
Company [ID (int) - PK as well as FK to ArticleOwner.ID,
UID (guid) - UC as well as FK to ArticleOwner.UID, company name, ...]
...
Article = [ID (int/guid/short guid), headline, content, published date,
ArticleOwnerID - FK to ArticleOwner.ID (int)]
To INSERT a user (Company/UU) you do the following:
Having a UID (maybe a sequential one) from the code, you INSERT into the ArticleOwner table. You get back the autogenerated integer ID.
You insert all the data into Company/UU, including the integer ID that you have just received.
ArticleOwner.ID will be an integer, so searching on it will be faster than on UID, especially when you have an index on it.
This is a common OO programming problem that should not be solved in the SQL domain. It should be handled by your ORM
Make two classes in your program code as required and let your ORM map them to a suitable SQL representation. For performance a single table with nulls will do; the only overhead is the discriminator column
Some examples: Hibernate inheritance
I would suggest the super-type Author for Person and Organization sub-types.
Note that AuthorID serves as the primary and the foreign key at the same time for Person and Organization tables.
So first let's create tables:
CREATE TABLE Author(
AuthorID integer IDENTITY NOT NULL
,AuthorType char(1)
,Phone varchar(20)
,Email varchar(128) NOT NULL
);
ALTER TABLE Author ADD CONSTRAINT pk_Author PRIMARY KEY (AuthorID);
CREATE TABLE Article (
ArticleID integer IDENTITY NOT NULL
,AuthorID integer NOT NULL
,DatePublished date
,Headline varchar(100)
,Content varchar(max)
);
ALTER TABLE Article ADD
CONSTRAINT pk_Article PRIMARY KEY (ArticleID)
,CONSTRAINT fk1_Article FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID) ;
CREATE TABLE Person (
AuthorID integer NOT NULL
,FirstName varchar(50)
,LastName varchar(50)
);
ALTER TABLE Person ADD
CONSTRAINT pk_Person PRIMARY KEY (AuthorID)
,CONSTRAINT fk1_Person FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID);
CREATE TABLE Organization (
AuthorID integer NOT NULL
,OrgName varchar(40)
,OrgPassword varchar(128)
,OrgCountry varchar(40)
,OrgState varchar(40)
,OrgZIP varchar(16)
,OrgContactName varchar(100)
);
ALTER TABLE Organization ADD
CONSTRAINT pk_Organization PRIMARY KEY (AuthorID)
,CONSTRAINT fk1_Organization FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID);
When inserting into Author you have to capture the auto-incremented id and then use it to insert the rest of data into person or organization, depending on AuthorType. Each row in Author has only one matching row in Person or Organization, not in both. Here is an example of how to capture the AuthorID.
-- Insert into table and return the auto-incremented AuthorID
INSERT INTO Author ( AuthorType, Phone, Email )
OUTPUT INSERTED.AuthorID
VALUES ( 'P', '789-789-7899', 'dudete@mmahoo.com' );
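The same capture-then-insert pattern, sketched from application code with Python's sqlite3 module. SQLite is only a stand-in here: cursor.lastrowid plays the role of the OUTPUT clause, integer primary key replaces IDENTITY, and the helper add_person with its sample values is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table Author (
  AuthorID   integer primary key,   -- SQLite assigns the value automatically
  AuthorType char(1),
  Phone      varchar(20),
  Email      varchar(128) not null
);
create table Person (
  AuthorID  integer primary key references Author (AuthorID),
  FirstName varchar(50),
  LastName  varchar(50)
);
""")


def add_person(first_name, last_name, phone, email):
    """Hypothetical helper: insert the super-type row, then the sub-type row, in one transaction."""
    with conn:
        cur = conn.execute(
            "insert into Author (AuthorType, Phone, Email) values ('P', ?, ?)",
            (phone, email),
        )
        author_id = cur.lastrowid  # the auto-generated AuthorID
        conn.execute(
            "insert into Person (AuthorID, FirstName, LastName) values (?, ?, ?)",
            (author_id, first_name, last_name),
        )
    return author_id


new_id = add_person("Pat", "Example", "555-0100", "pat@example.com")
```

Wrapping both inserts in one transaction means an Author row is never left behind without its Person row if the second insert fails.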
Here are a few examples of how to query authors:
-- Return all authors (org and person)
SELECT *
FROM dbo.Author AS a
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
-- Return all organization-authors
SELECT *
FROM dbo.Author AS a
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
-- Return all person-authors
SELECT *
FROM dbo.Author AS a
JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID ;
And now all articles with authors.
-- Return all articles with author information
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
There are two ways to return all articles belonging to organizations. The first example returns only columns from the Organization table, while the second one has columns from the Person table too, with NULL values.
-- (1) Return all articles belonging to organizations
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID;
-- (2) Return all articles belonging to organizations
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID
WHERE AuthorType = 'O';
And to return all articles belonging to a specific organization, again two methods.
-- (1) Return all articles belonging to a specific organization
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID
WHERE c.OrgName = 'somecorp';
-- (2) Return all articles belonging to a specific organization
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID
WHERE c.OrgName = 'somecorp';
To make queries simpler, you could package some of this into a view or two.
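For instance, a view over organization-owned articles might look something like the sketch below. The view name is made up, and ArticleID is an assumed key column of dbo.Article, so adjust both to the real schema.

```sql
-- Sketch of a view over organization-owned articles.
-- dbo.vOrganizationArticle and x.ArticleID are assumptions.
CREATE VIEW dbo.vOrganizationArticle
AS
SELECT x.ArticleID
     , a.AuthorID
     , a.Phone
     , a.Email
     , c.OrgName
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
GO

-- The "specific organization" query then shrinks to:
SELECT *
FROM dbo.vOrganizationArticle
WHERE OrgName = 'somecorp';
```

The columns are listed explicitly because a view cannot expose two columns with the same name, which SELECT * over these joins would produce for AuthorID.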
Just as a reminder, it is common for an article to have several authors, so a many-to-many table Article_Author would be in order.
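A minimal sketch of such a junction table, assuming dbo.Article has a key column named ArticleID (the constraint names are made up as well):

```sql
-- Sketch: many-to-many link between articles and authors.
-- ArticleID is an assumed key column of dbo.Article.
CREATE TABLE dbo.Article_Author
( ArticleID int NOT NULL
, AuthorID  int NOT NULL
, CONSTRAINT pk_Article_Author PRIMARY KEY ( ArticleID, AuthorID )
, CONSTRAINT fk1_Article_Author FOREIGN KEY ( ArticleID ) REFERENCES dbo.Article( ArticleID )
, CONSTRAINT fk2_Article_Author FOREIGN KEY ( AuthorID ) REFERENCES dbo.Author( AuthorID )
);
```

With this table in place, the AuthorID column on Article becomes redundant, and the article queries would join through Article_Author instead.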
My preference is to use a table that acts like a super table to both.
ArticleOwner = (ID (guid), company name, phone, email)
company = (ID, password)
unregistereduser = (ID)
article = (ID (int/guid/short guid), headline, content, published date, owner)
Then querying the database will require a JOIN on the 3 tables but this way you do not have the null fields.
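In SQL Server terms, the layout above could be sketched roughly as follows. All data types, lengths, and the exact column names are guesses filled in for illustration.

```sql
-- Sketch of the super-table layout; types and lengths are assumptions.
CREATE TABLE dbo.ArticleOwner
( ID          uniqueidentifier NOT NULL PRIMARY KEY
, CompanyName nvarchar(100)    NOT NULL
, Phone       nvarchar(30)     NULL
, Email       nvarchar(300)    NULL
);

-- Each subtype shares its key with the super table.
CREATE TABLE dbo.Company
( ID           uniqueidentifier NOT NULL PRIMARY KEY REFERENCES dbo.ArticleOwner ( ID )
, PasswordHash nvarchar(100)    NOT NULL
);

CREATE TABLE dbo.UnregisteredUser
( ID uniqueidentifier NOT NULL PRIMARY KEY REFERENCES dbo.ArticleOwner ( ID )
);

CREATE TABLE dbo.Article
( ID            int              NOT NULL IDENTITY(1,1) PRIMARY KEY
, Headline      nvarchar(200)    NOT NULL
, Content       nvarchar(max)    NULL
, PublishedDate datetime         NOT NULL
, OwnerID       uniqueidentifier NOT NULL REFERENCES dbo.ArticleOwner ( ID )
);
```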
I'd suggest instead of two tables create one table Poster.
It's ok to have some fields empty if they are not applicable to one kind of poster.
Poster:
ID (guid), type, name, phone, email, password
where type is 1 for a company and 2 for an unregistered user.
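A possible shape for that single table, as a sketch only: the column types, the PosterType name, and the constraint name are assumptions.

```sql
-- Sketch of the single-table option; types and names are assumptions.
CREATE TABLE dbo.Poster
( ID           uniqueidentifier NOT NULL PRIMARY KEY
, PosterType   tinyint          NOT NULL  -- 1 = company, 2 = unregistered user
, Name         nvarchar(100)    NOT NULL
, Phone        nvarchar(30)     NULL
, Email        nvarchar(300)    NULL
, PasswordHash nvarchar(100)    NULL      -- empty when not applicable
, CONSTRAINT CK_Poster_PosterType CHECK ( PosterType IN ( 1, 2 ) )
);
```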
OR
Keep your users and companies separate, but require each company to have a user in users table. That table should have a CompanyID field. I think it would be more logical and elegant.
An interesting approach would be to use the Node model followed by Drupal, where everything is effectively a Node and all other data is stored in a secondary table. It's highly flexible, as evidenced by the widespread use of Drupal in large publishing and discussion sites.
The layout would be something like this:
Node
    ID
    Type (User, Guest, Article)
    TypeID (PKey of related data)
    Created
    Modified
Article
    ID
    Field1
    Field2
    Etc.
User
    ID
    Field1
    Field2
    Etc.
Guest
    ID
    Field1
    Field2
    Etc.
It's an alternative option with some good benefits, the greatest being flexibility.
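As a loose sketch (this is not Drupal's actual schema), the Node table and a lookup of article nodes might look like this. The column types, the NodeType name, and the constraint names are assumptions.

```sql
-- Sketch of a Node table; types and constraint names are assumptions.
CREATE TABLE dbo.Node
( ID       int         NOT NULL IDENTITY(1,1) PRIMARY KEY
, NodeType varchar(10) NOT NULL
, TypeID   int         NOT NULL   -- key of the row in the related table
, Created  datetime    NOT NULL DEFAULT ( GETDATE() )
, Modified datetime    NULL
, CONSTRAINT CK_Node_NodeType CHECK ( NodeType IN ( 'User', 'Guest', 'Article' ) )
, CONSTRAINT UK_Node_NodeType_TypeID UNIQUE ( NodeType, TypeID )
);

-- Resolve all article nodes to their article rows.
SELECT n.ID AS NodeID, n.Created, n.Modified, a.*
FROM dbo.Node AS n
JOIN dbo.Article AS a ON a.ID = n.TypeID
WHERE n.NodeType = 'Article';
```

One trade-off to keep in mind: because TypeID points at a different table depending on NodeType, it cannot be declared as an ordinary foreign key, so that part of the integrity has to be enforced elsewhere.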
I'm not convinced you need to distinguish between companies and persons; only registered and unregistered authors.
I added this for clarity. You could simply use a check constraint on the Authors table to limit the values to U and R.
Create Table dbo.AuthorRegisteredStates
(
Code char(1) not null Primary Key Clustered
, Name nvarchar(15) not null
, Constraint UK_AuthorRegisteredState Unique ( [Name])
)
Insert dbo.AuthorRegisteredStates(Code, Name) Values('U', 'Unregistered')
Insert dbo.AuthorRegisteredStates(Code, Name) Values('R', 'Registered')
GO
The key in any database system is data integrity. So, we want to ensure that usernames are unique and, perhaps, that Names are unique. Do you want to allow two people with the same name to publish an article? How would the reader differentiate them? Notice that I don't care whether the Author represents a company or person. If someone is registering a company or a person, they can put in a first name and last name if they want. However, what is required is that everyone enter a name (think of it as a display name). We would never search for authors based on anything other than name.
Create Table dbo.Authors
(
Id int not null identity(1,1) Primary Key Clustered
, AuthorStateCode char(1) not null
, Name nvarchar(100) not null
, Email nvarchar(300) null
, Username nvarchar(20) not null
, PasswordHash nvarchar(50) not null
, FirstName nvarchar(25) null
, LastName nvarchar(25) null
...
, Address nvarchar(max) null
, City nvarchar(40) null
...
, Website nvarchar(max) null
, Constraint UK_Authors_Name Unique ( [Name] )
, Constraint UK_Authors_Username Unique ( [Username] )
, Constraint FK_Authors_AuthorRegisteredStates
Foreign Key ( AuthorStateCode )
References dbo.AuthorRegisteredStates ( Code )
-- optional. if you really wanted to ensure that an author that was unregistered
-- had a firstname and lastname. However, I'd recommend enforcing this in the GUI
-- if anywhere as it really does not matter if they
-- enter a first name and last name.
-- All that matters is whether they are registered and entered a name.
, Constraint CK_Authors_RegisteredWithFirstNameLastName
Check ( AuthorStateCode = 'R' Or ( AuthorStateCode = 'U' And FirstName Is Not Null And LastName Is Not Null ) )
)
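If you would rather skip the lookup table and go with the plain check constraint mentioned earlier, a sketch of it could be as short as this (the constraint name is made up):

```sql
-- Sketch: limit AuthorStateCode to 'U' and 'R' without a lookup table.
Alter Table dbo.Authors
    Add Constraint CK_Authors_AuthorStateCode
    Check ( AuthorStateCode In ( 'U', 'R' ) )
```

In that case the FK_Authors_AuthorRegisteredStates constraint would simply be left out of the table definition.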
Can a single author publish two articles on the same date and time? If not (as I've guessed here), then we add a unique constraint. The question is whether you might need to identify an article. What information might you be given to locate an article besides the general date it was published?
Create Table dbo.Articles
(
Id int not null identity(1,1) Primary Key Clustered
, AuthorId int not null
, PublishedDate datetime not null
, Headline nvarchar(200) not null
, Content nvarchar(max) null
...
, Constraint UK_Articles_PublishedDate Unique ( AuthorId, PublishedDate )
, Constraint FK_Articles_Authors
Foreign Key ( AuthorId )
References dbo.Authors ( Id )
)
In addition, I would add an index on PublishedDate to improve searches by date.
Create Index IX_Articles_PublishedDate On dbo.Articles ( PublishedDate )
I would also enable full-text search to search on the contents of articles.
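Something along these lines should do for the text search, as a sketch. The catalog name is made up, and PK_Articles is a hypothetical index name: the table definition above lets SQL Server generate the primary key name, so you would need to name the key or look the generated name up first.

```sql
-- Sketch: full-text index on article text.
-- ArticlesCatalog and PK_Articles are hypothetical names.
Create Fulltext Catalog ArticlesCatalog
GO

Create Fulltext Index On dbo.Articles ( Headline, Content )
    Key Index PK_Articles
    On ArticlesCatalog
GO

-- Find articles whose content mentions a word.
Select Id, Headline
From dbo.Articles
Where Contains( Content, N'database' )
```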
I think concerns about "empty space" are probably premature optimization. The effect on performance will be nil. This is a case where a small amount of denormalizing costs you nothing in terms of performance and gains you in terms of development. However, if it really concerned you, you could move the address information into a 1:1 table like so:
Create Table dbo.AuthorAddresses
(
AuthorId int not null Primary Key Clustered
, Street nvarchar(max) not null
, City nvarchar(40) not null
...
, Constraint FK_AuthorAddresses_Authors
Foreign Key ( AuthorId )
References dbo.Authors( Id )
)
This will add a small amount of complexity to your middle-tier. As always, the question is whether eliminating some empty space is worth the cost in terms of coding and testing. Whether you store this information as columns in your Authors table or in a separate table, the effect on performance will be nil.
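To give an idea of that extra complexity, reading an author together with the optional address would then need an outer join, roughly like this:

```sql
-- Sketch: authors with their optional 1:1 address row.
Select a.Id, a.Name, addr.Street, addr.City
From dbo.Authors As a
    Left Join dbo.AuthorAddresses As addr
        On addr.AuthorId = a.Id
```

Authors without an address still come back, with Street and City as NULL.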
I have solved similar problems by an approach similar to this:
Company -> CompanyArticles -> Articles
User -> UserArticles -> Articles
CompanyArticles contains a mapping from Company to an Article
UserArticles contains a mapping from User to Article
Article doesn't know anything about who created it.
By inverting the dependencies here you end up not overloading the meaning of foreign keys, having unused foreign keys, or creating a super table.
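The two mapping tables could be sketched like this. The column names follow the query in this answer, while the int types and the composite keys are assumptions. Note that user is a reserved word in several dialects, hence the quoting.

```sql
-- Sketch of the two mapping tables; types and keys are assumptions.
CREATE TABLE companyarticles
( company_id int NOT NULL REFERENCES company ( company_id )
, article_id int NOT NULL REFERENCES articles ( article_id )
, PRIMARY KEY ( company_id, article_id )
);

CREATE TABLE userarticles
( user_id    int NOT NULL REFERENCES "user" ( user_id )
, article_id int NOT NULL REFERENCES articles ( article_id )
, PRIMARY KEY ( user_id, article_id )
);
```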
Getting all articles and contact information would look like:
SELECT articles.article_id, name, phone, email FROM
user
JOIN userarticles on user.user_id = userarticles.user_id
JOIN articles on userarticles.article_id = articles.article_id
UNION
SELECT articles.article_id, name, phone, email FROM
company
JOIN companyarticles on company.company_id = companyarticles.company_id
JOIN articles on companyarticles.article_id = articles.article_id

Resources