Normalisation issues regarding a relationship - database

I am looking for some guidance regarding a library database I am currently creating.
The Situation
I have an Accounts Table that stores all of the user's information.
I have a Books Table that stores all of the book's information.
I have a MyBooks Table that attempts to store the books that a user has taken out from the library.
The user can borrow many books and a book can be borrowed by many users (given that the book is free to be taken).
My Approach
The problem is that I believe my approach is not normalised.
ERD I designed prior to implementing the database.
It leads to this within the table, which allows the table to store multiple books that belong to one user.
However, I have been told there should be a linking table somewhere, because the BookID entries can just keep growing and could fill the table very quickly and make it slower, though I'm not sure whether that is true.
Here is my approach when I created the database using sqlite3 in Python 3 and it achieved the results I wanted, but at a possible cost of normalisation?
from sqlite3 import connect
conn = connect('LibrarySystem.db')
c = conn.cursor()
c.execute("""CREATE TABLE Accounts (
user_id INTEGER PRIMARY KEY,
email_address NVARCHAR(320) NOT NULL DEFAULT '',
password VARCHAR(60) NOT NULL DEFAULT '',
staff_mode INTEGER NOT NULL DEFAULT 0,
my_booksID INTEGER NOT NULL DEFAULT 0,
FOREIGN KEY(my_booksID) REFERENCES MyBooks(my_bookID)
)""")
conn.commit()
c.execute("""CREATE TABLE Books (
bookID INTEGER PRIMARY KEY,
title VARCHAR(100) NOT NULL DEFAULT '',
author VARCHAR(100) NOT NULL DEFAULT '',
genre VARCHAR(100) NOT NULL DEFAULT '',
issued INTEGER NOT NULL DEFAULT 0,
FOREIGN KEY(genre) REFERENCES Genres(genre)
)""")
conn.commit()
c.execute("""CREATE TABLE MyBooks (
my_booksID INTEGER NOT NULL DEFAULT '',
bookID INTEGER NOT NULL DEFAULT '',
date_issued TIMESTAMP NOT NULL DEFAULT '',
return_date TIMESTAMP NOT NULL DEFAULT '',
FOREIGN KEY(bookID) REFERENCES Books(bookID)
)""")
conn.commit()
Are there any normalisation issues with my approach?

In the table Accounts there is a column my_booksID referencing my_booksID of MyBooks. Why?
Do you plan to have a new row in Accounts for the same user every time they take a book?
Instead, you should have a column user_id in MyBooks referencing user_id in Accounts.
This way you make MyBooks the linking table between Accounts and Books.
When a user takes a book out from the library, you will add a new row in MyBooks with the user_id of the user and the bookID of the book.
Also, SQLite has no VARCHAR or TIMESTAMP storage classes; declared types like these are only mapped to a type affinity (check Datatypes In SQLite Version 3).
In place of VARCHAR use TEXT. For TIMESTAMP, use TEXT if you want to store dates in the format YYYY-MM-DD (the ISO-8601 form that SQLite's date and time functions understand), or INTEGER if you want to store dates as unix timestamps.
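As a minimal sketch of the suggestion above, assuming trimmed-down Accounts and Books tables and two hypothetical helpers (borrow and books_of, neither of which is in the question), MyBooks could carry user_id and bookID so that each loan is one row:

```python
import sqlite3


def build_library(conn: sqlite3.Connection) -> None:
    """Create Accounts, Books and MyBooks, with MyBooks as the linking table."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Accounts (
            user_id       INTEGER PRIMARY KEY,
            email_address TEXT NOT NULL
        );
        CREATE TABLE Books (
            bookID INTEGER PRIMARY KEY,
            title  TEXT NOT NULL
        );
        -- One row per loan: a user can have many rows, and so can a book.
        CREATE TABLE MyBooks (
            user_id     INTEGER NOT NULL REFERENCES Accounts(user_id),
            bookID      INTEGER NOT NULL REFERENCES Books(bookID),
            date_issued TEXT NOT NULL DEFAULT (date('now')),  -- YYYY-MM-DD
            return_date TEXT                                   -- NULL until returned
        );
    """)


def borrow(conn: sqlite3.Connection, user_id: int, book_id: int) -> None:
    """Record that a user has taken out a book (hypothetical helper)."""
    conn.execute(
        "INSERT INTO MyBooks (user_id, bookID) VALUES (?, ?)",
        (user_id, book_id),
    )


def books_of(conn: sqlite3.Connection, user_id: int) -> list:
    """Return the titles a user has borrowed, sorted alphabetically."""
    rows = conn.execute(
        "SELECT b.title FROM MyBooks m "
        "JOIN Books b ON b.bookID = m.bookID "
        "WHERE m.user_id = ? ORDER BY b.title",
        (user_id,),
    )
    return [row[0] for row in rows]
```

With this layout the Accounts table no longer needs a my_booksID column, and a loan for a user or book that does not exist is rejected by the foreign keys.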

Related

Building comment system for different types of entities

I'm building a comment system in PostgreSQL where I can comment (as well as "liking" them) on different entities that I already have (such as products, articles, photos, and so on). For the moment, I came up with this:
(note: the foreign key between comment_board and product/article/photo is very loose here. ref_id is just storing the id, which is used in conjunction with the comment_board_type to determine which table it is)
Obviously, this doesn't seem like good data integrity. What can I do to give it better integrity? Also, I know every product/article/photo will need a comment_board. Could that mean I implement a comment_board_id to each product/article/photo entity such as this?:
I do recognize this SO solution, but it made me second-guess supertypes and the complexities of it: Database design - articles, blog posts, photos, stories
Any guidance is appreciated!
I ended up just pointing the comments directly to the product/photo/article fields. Here is what I came up with in total:
CREATE TABLE comment (
id SERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT (now()),
updated_at TIMESTAMP WITH TIME ZONE,
account_id INT NOT NULL REFERENCES account(id),
text VARCHAR NOT NULL,
-- commentable sections
product_id INT REFERENCES product(id),
photo_id INT REFERENCES photo(id),
article_id INT REFERENCES article(id),
-- constraint to make sure this comment appears in only one place
CONSTRAINT comment_entity_check CHECK(
(product_id IS NOT NULL)::INT
+
(photo_id IS NOT NULL)::INT
+
(article_id IS NOT NULL)::INT
= 1
)
);
CREATE TABLE comment_likes (
id SERIAL PRIMARY KEY,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT (now()),
updated_at TIMESTAMP WITH TIME ZONE,
account_id INT NOT NULL REFERENCES account(id),
comment_id INT NOT NULL REFERENCES comment(id),
-- comments can only be liked once by an account.
UNIQUE(account_id, comment_id)
);
Resulting in:
This makes it so that I have to do one less join to an intermediary table. Also, it lets me add a field and update the constraints easily.
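The "exactly one target" constraint can be tried out quickly. This is only a sketch under assumptions: it uses SQLite through Python's sqlite3 instead of PostgreSQL, leaves out the foreign keys to product/photo/article, and try_insert is a hypothetical helper. In SQLite, IS NOT NULL already evaluates to 0 or 1, so the ::INT cast is not needed there.

```python
import sqlite3

SCHEMA = """
CREATE TABLE comment (
    id         INTEGER PRIMARY KEY,
    text       TEXT NOT NULL,
    product_id INTEGER,
    photo_id   INTEGER,
    article_id INTEGER,
    -- each IS NOT NULL test yields 0 or 1; the sum must be exactly 1
    CHECK ((product_id IS NOT NULL)
         + (photo_id   IS NOT NULL)
         + (article_id IS NOT NULL) = 1)
);
"""


def make_comment_db() -> sqlite3.Connection:
    """Open an in-memory database holding only the comment table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, text, product_id=None, photo_id=None, article_id=None) -> bool:
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO comment (text, product_id, photo_id, article_id) "
            "VALUES (?, ?, ?, ?)",
            (text, product_id, photo_id, article_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A comment that points at one entity is accepted, while one that points at two entities, or at none, is refused by the database itself.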

Normalize 3 database tables

Hello, I have a problem separating my 3 payment types: CASH, CREDIT, BANK
Each of them has different details.
The details are user defined, which means that for each payment type the user inputs the matching details (for example credit card details, bank details, or cash details such as currency).
Business Process: The user will choose his payment type in a combobox:
Then the user will input the details of that payment type.
This is what I've tried:
PaymentType(PaymentType_ID(PK), PaymentTypes)
...
.....
......
.........
Then I'm stuck and don't know how to continue. Please help me, and please explain your answer so that I don't have to ask the same question again when I'm faced with a similar situation.
***I can't merge all of them into 1 table because they have different columns; each has its own specific details...
All three payment types have a few things in common. They all have an account number, an amount, a timestamp, a payment type, and some kind of transaction identifier. All the common attributes go in one table. (Some of the data types are deliberately naive, because they're application-dependent, and I don't know your application.)
create table payment_types (
payment_type_code char(2) primary key,
payment_type varchar(8) not null unique
);
insert into payment_types values
('Ca', 'Cash'),('Cr', 'Credit'),('Ba', 'Bank');
create table payments (
transaction_id integer primary key,
account_code varchar(5) not null, -- references accounts, not shown
amount_usd numeric(5,2) not null,
payment_type_code char(2) not null references payment_types (payment_type_code),
transaction_timestamp timestamp not null default current_timestamp,
unique (transaction_id, payment_type_code)
);
The unique constraint on {transaction_id, payment_type_code} lets SQL use that pair of columns as the target for a foreign key constraint. That's crucial to keeping the rows from the several tables from getting mixed up.
Each payment has different attributes, depending on the payment type. And each payment can be of only one type.
create table payment_cash (
transaction_id integer primary key,
payment_type_code char(2) not null default 'Ca' check (payment_type_code = 'Ca'),
foreign key (transaction_id, payment_type_code)
references payments (transaction_id, payment_type_code),
other_cash_columns char(1) not null
);
create table payment_credit (
transaction_id integer primary key,
payment_type_code char(2) not null default 'Cr' check (payment_type_code = 'Cr'),
foreign key (transaction_id, payment_type_code)
references payments (transaction_id, payment_type_code),
other_credit_columns char(1) not null
);
create table payment_bank (
transaction_id integer primary key,
payment_type_code char(2) not null default 'Ba' check (payment_type_code = 'Ba'),
foreign key (transaction_id, payment_type_code)
references payments (transaction_id, payment_type_code),
other_bank_columns char(1) not null
);
The default value and check constraint for payment_type_code make it impossible, for example, to insert credit details for a cash payment. That would be possible--and it would be a Bad Thing--if the foreign key constraint used only the transaction id.
As a general rule, you don't cascade updates or deletes for financial transactions. Instead, correct errors by inserting a compensating transaction.
To make this more friendly to users and application code, create three updatable views that join the payments table to the detail. How to make them updatable depends on your dbms.
create view credit_payments_all as
select p.transaction_id, p.account_code, p.amount_usd,
p.payment_type_code, p.transaction_timestamp,
c.other_credit_columns
from payments p
inner join payment_credit c on c.transaction_id = p.transaction_id
-- Rules, triggers, stored procedures, functions, or whatever you need
-- to make this view updatable.
Then any code that needs to insert a credit transaction can just insert into the view credit_payments_all.
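To see the composite foreign key and the check constraint working together, here is a small sketch. It makes several assumptions: SQLite via Python's sqlite3 stands in for the real dbms, the payment_types lookup table is replaced by a simple CHECK, card_last4 is a made-up detail column, and add_credit_details is a hypothetical helper.

```python
import sqlite3


def make_payments_db() -> sqlite3.Connection:
    """Create the payments supertype and one subtype table (credit)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE payments (
            transaction_id    INTEGER PRIMARY KEY,
            amount_usd        NUMERIC NOT NULL,
            payment_type_code TEXT NOT NULL
                CHECK (payment_type_code IN ('Ca', 'Cr', 'Ba')),
            -- target for the two-column foreign key below
            UNIQUE (transaction_id, payment_type_code)
        );
        CREATE TABLE payment_credit (
            transaction_id    INTEGER PRIMARY KEY,
            payment_type_code TEXT NOT NULL DEFAULT 'Cr'
                CHECK (payment_type_code = 'Cr'),
            card_last4        TEXT NOT NULL,  -- hypothetical detail column
            FOREIGN KEY (transaction_id, payment_type_code)
                REFERENCES payments (transaction_id, payment_type_code)
        );
    """)
    return conn


def add_credit_details(conn, transaction_id, card_last4, type_code="Cr") -> bool:
    """Return True if the credit details were stored, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO payment_credit "
            "(transaction_id, payment_type_code, card_last4) VALUES (?, ?, ?)",
            (transaction_id, type_code, card_last4),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Credit details for a cash transaction fail either way: with type code 'Cr' the foreign key finds no matching (transaction_id, 'Cr') row in payments, and with type code 'Ca' the check constraint on payment_credit refuses the row.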

Large sample database for HSQLDB?

I'm taking a database class and I'd like to have a large sample database to experiment with. My definition of large here is that there's enough data in the database so that if I try a query that's very inefficient, I'll be able to tell by the amount of time it takes to execute. I've googled for this and not found anything that's HSQLDB specific, but maybe I'm using the wrong keywords. Basically I'm hoping to find something that's already set up, with the tables, primary keys, etc. and normalized and all that, so I can try things out on a somewhat realistic database. For HSQLDB I guess that would just be the .script file. Anyway if anybody knows of any resources for this I'd really appreciate it.
You can use the MySQL Sakila database schema and data (open source, on MySQL web site), but you need to modify the schema definition. You can delete the view and trigger definitions, which are not necessary for your experiment. For example:
CREATE TABLE country (
country_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
country VARCHAR(50) NOT NULL,
last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (country_id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
modified:
CREATE TABLE country (
country_id SMALLINT GENERATED BY DEFAULT AS IDENTITY,
country VARCHAR(50) NOT NULL,
last_update TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (country_id)
)
Some MySQL DDL syntax is supported in the MYS syntax mode of HSQLDB, for example AUTO_INCREMENT is translated to IDENTITY, but others need manual editing. The data is mostly compatible, apart from some binary strings.
You need to access the database with a tool that reports the query time. The HSQLDB DatabaseManager does this when the query output is in Text mode.

Database best practices

I have a table which stores comments, the comment can either come from another user, or another profile which are separate entities in this app.
My original thinking was that the table would have both user_id and profile_id fields, so if a user submits a comment, it sets user_id and leaves profile_id blank.
Is this right or wrong? Is there a better way?
Whatever is the best solution depends IMHO on more than just the table, but also how this is used elsewhere in the application.
Assuming that the comments are all associated with some other object, let's say you extract all the comments for that object. In your proposed design, extracting all the comments requires selecting from just one table, which is efficient. But that is extracting the comments without extracting the information about the poster of each comment. Maybe you don't want to show it, or maybe it is already cached in memory.
But what if you had to retrieve information about the poster while retrieving the comments? Then you have to join with two different tables, and now the resulting record set is getting polluted with a lot of NULL values (for a profile comment, all the user fields will be NULL). The code that has to parse this result set also could get more complex.
Personally, I would probably start with the fully normalized version, and then denormalize when I start seeing performance problems.
There is also a completely different possible solution to the problem, but this depends on whether or not it makes sense in the domain. What if there are other places in the application where a user and a profile can be used interchangeably? What if a User is just a special kind of a Profile? Then I think that the problem should be solved generally in the user/profile tables. For example (some abbreviated pseudo-sql):
create table AbstractProfile (ID primary key, type ) -- type can be 'user' or 'profile'
create table User(ProfileID primary key references AbstractProfile , ...)
create table Profile(ProfileID primary key references AbstractProfile , ...)
Then any place in your application where a user or a profile can be used interchangeably, you can reference the AbstractProfile ID.
If the comments are general for several objects you could create a table for each object:
user_comments (user_id, comment_id)
profile_comments (profile_id, comment_id)
Then you do not have to have any empty columns in your comments table. It will also make it easy to add new comment-source-objects in the future without touching the comments table.
Another way to solve this is to always denormalize (copy) the name of the commenter onto the comment and also store a reference back to the commenter via a type and an id field. That way you have a unified comments table which you can search, sort and trim quickly. The drawback is that there isn't any real FK relationship between a comment and its owner.
In the past I have used a centralized comments table and had a field for the fk_table it is referencing.
eg:
comments(id,fk_id,fk_table,comment_text)
That way you can use UNION queries to concatenate the data from several sources.
SELECT c.comment_text FROM comments c JOIN user u ON u.id=c.fk_id WHERE c.fk_table='user'
UNION ALL
SELECT c.comment_text FROM comments c JOIN profile p ON p.id=c.fk_id WHERE c.fk_table='profile'
This ensures that you can expand the number of objects that have comments without creating redundant tables.
Here's another approach, which allows you to maintain referential integrity through foreign keys, manage centrally, and provide the highest performance using standard database tools such as indexes and if you really need, partitioning etc:
create table actor_master_table(
type char(1) not null, /* e.g. 'u' or 'p' for user / profile */
id varchar(20) not null, /* e.g. 'someuser' or 'someprofile' */
primary key(type, id)
);
create table user(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'u'),
foreign key (type, id) references actor_master_table(type, id)
);
create table profile(
type char(1) not null,
id varchar(20) not null,
...
check (type = 'p'),
foreign key (type, id) references actor_master_table(type, id)
);
create table comment(
creator_type char(1) not null,
creator_id varchar(20) not null,
comment text not null,
foreign key(creator_type, creator_id) references actor_master_table(type, id)
);
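A quick way to check that this layout really protects the comment table is to run it. The sketch below assumes SQLite through Python's sqlite3, leaves out the user and profile tables, and uses a hypothetical post_comment helper; only the master table and the comment table are created.

```python
import sqlite3


def make_actor_db() -> sqlite3.Connection:
    """Create the actor master table and a comment table that references it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for FK enforcement in SQLite
    conn.executescript("""
        CREATE TABLE actor_master_table (
            type TEXT NOT NULL CHECK (type IN ('u', 'p')),  -- user / profile
            id   TEXT NOT NULL,
            PRIMARY KEY (type, id)
        );
        CREATE TABLE comment (
            creator_type TEXT NOT NULL,
            creator_id   TEXT NOT NULL,
            comment      TEXT NOT NULL,
            FOREIGN KEY (creator_type, creator_id)
                REFERENCES actor_master_table (type, id)
        );
    """)
    return conn


def post_comment(conn, creator_type, creator_id, text) -> bool:
    """Return True if the comment was stored, False if its creator is unknown."""
    try:
        conn.execute(
            "INSERT INTO comment (creator_type, creator_id, comment) "
            "VALUES (?, ?, ?)",
            (creator_type, creator_id, text),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A comment is only accepted when the (type, id) pair of its creator exists in the master table, so a user id cannot be passed off as a profile id.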

How to store the following SQL data optimally in SQL Server 2008

I am creating a page where people can post articles. When the user posts an article, it shows up on a list, like the related questions on Stack Overflow (when you add a new question). It's fairly simple.
My problem is that I have 2 types of users. 1) Unregistered private users. 2) A company.
The unregistered users need to type in their name, email and phone, whereas the company users just need to type in their company name/password. Fairly simple.
I need to reduce the excess database usage and try to optimize the database and build the tables effectively.
Now to my problem in hand:
So I have one table with the information about the companies, ID (guid), Name, email, phone etc.
I was thinking about making one table called articles that contained ArticleID, Headline, Content and Publishing date.
One table with the information about the unregistered users, ID, their name, email and phone.
How do I tie the articles table to the company/unregistered users table? Is it good to make an integer that takes 2 values, 1=Unregistered user and 2=Company, and then one field with an ID number pointing to the specified user/company? It looks like you need a lot of extra code to query the database. Performance? How could I then return the article along with the contact information? You should also be able to return all the articles from a specific company.
So Table company would be:
ID (guid), company name, phone, email, password, street, zip, country, state, www, description, contact person and a few more that i don't have here right now.
Table Unregistered user:
ID (guid), name, phone, email
Table article:
ID (int/guid/short guid), headline, content, published date, is_company, id_to_user
Is there a better approach?
The qualities I am looking for are: performance, easy to query, and easy to maintain (adding new fields, indexes etc).
Theory
The problem you described is called Table Inheritance in data modeling theory. In Martin Fowler's book the solutions are:
single table inheritance: a single table that contains all fields.
class table inheritance: one table per class, with table for abstract classes.
concrete table inheritance: one table per non-abstract class, abstract members are repeated in each concrete table
So from a theory and industry practice point of view all three solutions are acceptable: one table Posters with NULLable columns (ie. single table inheritance), three tables Posters, Companies and Persons (ie. class table inheritance) and two tables Companies and Persons (ie. concrete table inheritance).
Now, to pros and cons.
Cost of NULL columns
The record structure is discussed in Inside the Storage Engine: Anatomy of a record:
NULL bitmap
two bytes for count of columns in the record
variable number of bytes to store one bit per column in the record, regardless of whether the column is nullable or not (this is different and simpler than SQL Server 2000 which had one bit per nullable column only)
So if you have at least one NULLable column, you pay the cost of the NULL bitmap in each record, at least 3 bytes. But the cost is identical if you have 1 or 8 columns! The 9th column will add a byte to the NULL bitmap in each record. The formula is described in Estimating the Size of a Clustered Index: 2 + ((Num_Cols + 7) / 8), where only the integer part of the division is kept.
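The formula is easy to check with a few values. In this small sketch the function name null_bitmap_bytes is made up for illustration, and the division is taken as integer division, which is what makes 1 and 8 columns cost the same:

```python
def null_bitmap_bytes(num_cols: int) -> int:
    """NULL bitmap size per record: 2 + ((Num_Cols + 7) / 8), integer part only."""
    if num_cols < 1:
        raise ValueError("a record needs at least one column")
    return 2 + (num_cols + 7) // 8


for cols in (1, 8, 9):
    print(cols, null_bitmap_bytes(cols))
# prints:
# 1 3
# 8 3
# 9 4
```

So the bitmap only grows by one byte for every eight columns added to the table.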
Performance Driving Factor
In a database system there is really only one factor that drives performance: the amount of data scanned, that is, how large the records scanned by a query plan are and how many records it has to scan. So to improve the performance you need to:
narrow the records: reduce the data size, covering include indexes, vertical partitioning
reduce the number of records scanned: indexes
reduce the number of scans: eliminate joins
Now in order to analyze these criteria, there is something missing in your post: the prevalent data access pattern, ie. the most common query that the database will be hit with. This is driven by how you display your posts on the site. Consider these possible approaches:
posts front page: like SO, a page of recent posts with header, excerpt, time posted and author basic information (name, gravatar). To get this page displayed you need to join Posts with authors, but you only need the author name and gravatar. Both single table inheritance and class table inheritance would work, but concrete table inheritance would fail. This is because you cannot afford for such a query to do conditional joins (ie. join the articles posted to either Companies or Persons), such a query will be less than optimal.
posts per author: users have to login first and then they'll see their own posts (this is common for non-public post oriented sites, think incident tracking for instance). For such a design, all three table inheritance schemes would work.
Conclusion
There are some general performance considerations (ie. narrow the data) to consider, but the critical information is missing: how are you going to query the data, your access pattern. The data model has to be optimized for that access pattern:
Which fields from Companies and Persons will be displayed on the landing page of the site (ie. the most often and performance critical query) ? You don't want to join 5 tables to show those fields.
Are some Company/Person information fields only needed on the user information page? Perhaps partition the table vertically into CompaniesExtra and PersonsExtra tables. Or use an index that will cover the frequently used fields (this approach simplifies code and is easier to keep consistent, at the cost of data duplication)
PS
Needless to say, don't use guids for ids. Unless you're building a distributed system, they are a horrible choice for reasons of excessive width. Fragmentation is also a potential problem, but that can be alleviated by use of sequential guids.
Ideally, if you could use an ORM (as mentioned by TFD), I would do so. Since you have not commented on that, and you always come back to the "performance" question, I assume you would rather not use one.
Using pure SQL, the approach I would suggest would be to have table structure as below:
ArticleOwner [ID (guid)]
Company [ID (guid) - PK as well as FK to ArticleOwner.ID,
company name, phone, email, password, street, zip, ...]
UnregisteredUser [ID (guid) - PK as well as FK to ArticleOwner.ID,
name, phone, email]
Article = [ID (int/guid/short guid), headline, content, published date,
ArticleOwnerID - FK to ArticleOwner.ID]
Let's see usages:
INSERT: the overhead is the need to add a row to the ArticleOwner table for each Company/UU. This operation does not happen often, so there is no need to optimize its performance.
SELECT:
Company/UU: well, it is easy to search for both UU and Company, since you do not need to JOIN to any other table, as all the info about the required object is in one table
Articles of one Company/UU: again, you just need to filter on the GUID of the Company/UU, and there you go: SELECT (list fields) FROM Article WHERE ArticleOwnerID = @AOID
Also consider that one day you might need to support multiple Owners per Article. With the parent table approach above (or the one mentioned by Vincent) you will just need to introduce a relation table, whereas with the solution that uses 2 NULL-able FK constraints, one to each Owner table, you are kind of stuck.
Performance:
Are you sure you have performance problem? What is your target?
One thing I can recommend, looking at your model with regard to performance, is not to use GUIDs as the clustered index (which is the default for a PK), because your INSERT statements will basically be inserting data at random positions in the table.
Alternatives are:
use Sequential GUID instead (see: What are the performance improvement of Sequential Guid over standard Guid?)
use both INTEGER and GUID. This is a somewhat complicated approach and might be overkill for the simple model you have, but the result is that you always JOIN tables in SELECTs on INTEGER instead of GUID, which is much faster.
So if you are so hot on performance, you might try to do the following:
ArticleOwner (ID (int identity) - PK, UID (guid) - UC)
Company [ID (int) - PK as well as FK to ArticleOwner.ID,
UID (guid) - UC as well as FK to ArticleOwner.UID, company name, ...]
...
Article = [ID (int/guid/short guid), headline, content, published date,
ArticleOwnerID - FK to ArticleOwner.ID (int)]
To INSERT a user (Company/UU) you do the following:
Having a UID (maybe a sequential one) from the code, you INSERT into the ArticleOwner table. You get back the autogenerated integer ID.
You insert all the data into Company/UU, including the integer ID that you have just received.
ArticleOwner.ID will be an integer, so searching on it will be faster than on UID, especially when you have an index on it.
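The two insert steps above could look roughly like this. It is only a sketch: SQLite via Python's sqlite3 stands in for SQL Server, the tables are cut down to a couple of columns, add_company is a hypothetical helper, and the cursor's lastrowid plays the role of the autogenerated integer ID that the server would hand back.

```python
import sqlite3
import uuid


def make_owner_db() -> sqlite3.Connection:
    """Create the parent table and one child table (Company)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE ArticleOwner (
            ID  INTEGER PRIMARY KEY,    -- autogenerated integer key
            UID TEXT NOT NULL UNIQUE    -- GUID supplied by the application
        );
        CREATE TABLE Company (
            ID           INTEGER PRIMARY KEY REFERENCES ArticleOwner(ID),
            company_name TEXT NOT NULL
        );
    """)
    return conn


def add_company(conn: sqlite3.Connection, company_name: str) -> int:
    """Insert the owner row first, then the company row that reuses its ID."""
    uid = str(uuid.uuid4())  # step 1: the UID comes from the code
    cur = conn.execute("INSERT INTO ArticleOwner (UID) VALUES (?)", (uid,))
    owner_id = cur.lastrowid  # the autogenerated integer ID
    # step 2: the child row is keyed by the ID just received
    conn.execute(
        "INSERT INTO Company (ID, company_name) VALUES (?, ?)",
        (owner_id, company_name),
    )
    return owner_id
```

Articles would then store only the integer owner ID, and all joins run on that integer rather than on the GUID.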
This is a common OO programming problem that should not be solved in the SQL domain. It should be handled by your ORM
Make two classes in your program code as required and let your ORM map them to a suitable SQL representation. For performance a single table with nulls will do; the only overhead is the discriminator column.
Some examples: Hibernate inheritance
I would suggest the super-type Author for Person and Organization sub-types.
Note that AuthorID serves as the primary and the foreign key at the same time for Person and Organization tables.
So first let's create tables:
CREATE TABLE Author(
AuthorID integer IDENTITY NOT NULL
,AuthorType char(1)
,Phone varchar(20)
,Email varchar(128) NOT NULL
);
ALTER TABLE Author ADD CONSTRAINT pk_Author PRIMARY KEY (AuthorID);
CREATE TABLE Article (
ArticleID integer IDENTITY NOT NULL
,AuthorID integer NOT NULL
,DatePublished date
,Headline varchar(100)
,Content varchar(max)
);
ALTER TABLE Article ADD
CONSTRAINT pk_Article PRIMARY KEY (ArticleID)
,CONSTRAINT fk1_Article FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID) ;
CREATE TABLE Person (
AuthorID integer NOT NULL
,FirstName varchar(50)
,LastName varchar(50)
);
ALTER TABLE Person ADD
CONSTRAINT pk_Person PRIMARY KEY (AuthorID)
,CONSTRAINT fk1_Person FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID);
CREATE TABLE Organization (
AuthorID integer NOT NULL
,OrgName varchar(40)
,OrgPassword varchar(128)
,OrgCountry varchar(40)
,OrgState varchar(40)
,OrgZIP varchar(16)
,OrgContactName varchar(100)
);
ALTER TABLE Organization ADD
CONSTRAINT pk_Organization PRIMARY KEY (AuthorID)
,CONSTRAINT fk1_Organization FOREIGN KEY (AuthorID) REFERENCES Author(AuthorID);
When inserting into Author you have to capture the auto-incremented id and then use it to insert the rest of data into person or organization, depending on AuthorType. Each row in Author has only one matching row in Person or Organization, not in both. Here is an example of how to capture the AuthorID.
-- Insert into table and return the auto-incremented AuthorID
INSERT INTO Author ( AuthorType, Phone, Email )
OUTPUT INSERTED.AuthorID
VALUES ( 'P', '789-789-7899', 'dudete@mmahoo.com' );
Here are a few examples of how to query authors:
-- Return all authors (org and person)
SELECT *
FROM dbo.Author AS a
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
-- Return all organization authors
SELECT *
FROM dbo.Author AS a
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
-- Return all person-authors
SELECT *
FROM dbo.Author AS a
JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
And now all articles with authors.
-- Return all articles with author information
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID ;
There are two ways to return all articles belonging to organizations. The first example returns only columns from the Organization table, while the second one has columns from the Person table too, with NULL values.
-- (1) Return all articles belonging to organizations
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID;
-- (2) Return all articles belonging to organizations
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID
WHERE AuthorType = 'O';
And to return all articles belonging to a specific organization, again two methods.
-- (1) Return all articles belonging to a specific organization
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID
WHERE c.OrgName = 'somecorp';
-- (2) Return all articles belonging to a specific organization
SELECT *
FROM dbo.Article AS x
JOIN dbo.Author AS a ON a.AuthorID = x.AuthorID
LEFT JOIN dbo.Person AS p ON a.AuthorID = p.AuthorID
LEFT JOIN dbo.Organization AS c ON c.AuthorID = a.AuthorID
WHERE c.OrgName = 'somecorp';
To make queries simpler, you could package some of this into a view or two.
Just as a reminder, it is common for an article to have several authors, so a many-to-many table Article_Author would be in order.
My preference is to use a table that acts like a super table to both.
ArticleOwner = (ID (guid), company name, phone, email)
company = (ID, password)
unregistereduser = (ID)
article = (ID (int/guid/short guid), headline, content, published date, owner)
Then querying the database will require a JOIN on the 3 tables but this way you do not have the null fields.
I'd suggest instead of two tables create one table Poster.
It's ok to have some fields empty if they are not applicable to one kind of poster.
Poster:
ID (guid), type, name, phone, email, password
where type is 1 for company, 2 - for unregistered user.
OR
Keep your users and companies separate, but require each company to have a user in users table. That table should have a CompanyID field. I think it would be more logical and elegant.
An interesting approach would be to use the Node model followed by Drupal, where everything is effectively a Node and all other data is stored in a secondary table. It's highly flexible, as is evidenced by the widespread use of Drupal in large publishing and discussion sites.
The layout would be something like this:
Node
ID
Type (User, Guest, Article)
TypeID (PKey of related data)
Created
Modified
Article
ID
Field1
Field2
Etc.
User
ID
Field1
Field2
Etc.
Guest
ID
Field1
Field2
Etc.
It's an alternative option with some good benefits. The greatest being flexibility.
I'm not convinced you need to distinguish between companies and persons; only registered and unregistered authors.
I added this for clarity. You could simply use a check constraint on the Authors table to limit the values to U and R.
Create Table dbo.AuthorRegisteredStates
(
Code char(1) not null Primary Key Clustered
, Name nvarchar(15) not null
, Constraint UK_AuthorRegisteredState Unique ( [Name])
)
Insert dbo.AuthorRegisteredStates(Code, Name) Values('U', 'Unregistered')
Insert dbo.AuthorRegisteredStates(Code, Name) Values('R', 'Registered')
GO
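The check-constraint alternative mentioned above could be sketched like this. The assumptions are that SQLite through Python's sqlite3 stands in for SQL Server, that the Authors table is cut down to three columns, and that add_author is a hypothetical helper:

```python
import sqlite3


def make_authors_db() -> sqlite3.Connection:
    """Create a trimmed Authors table whose state code is limited by a CHECK."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Authors (
            Id              INTEGER PRIMARY KEY,
            AuthorStateCode TEXT NOT NULL
                CHECK (AuthorStateCode IN ('U', 'R')),  -- Unregistered / Registered
            Name            TEXT NOT NULL UNIQUE
        )
    """)
    return conn


def add_author(conn: sqlite3.Connection, name: str, state_code: str) -> bool:
    """Return True if the author row was accepted, False if a constraint failed."""
    try:
        conn.execute(
            "INSERT INTO Authors (AuthorStateCode, Name) VALUES (?, ?)",
            (state_code, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The lookup table has the advantage that the codes carry a readable name and can be extended with a plain INSERT, while the check constraint keeps everything in one table.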
The key in any database system is data integrity. So, we want to ensure that usernames are unique and, perhaps, that Names are unique. Do you want to allow two people with the same name to publish an article? How would the reader differentiate them? Notice that I don't care whether the Author represents a company or person. If someone is registering a company or a person, they can put in a first name and last name if they want. However, what is required is that everyone enter a name (think of it as a display name). We would never search for authors based on anything other than name.
Create Table dbo.Authors
(
Id int not null identity(1,1) Primary Key Clustered
, AuthorStateCode char(1) not null
, Name nvarchar(100) not null
, Email nvarchar(300) null
, Username nvarchar(20) not null
, PasswordHash nvarchar(50) not null
, FirstName nvarchar(25) null
, LastName nvarchar(25) null
...
, Address nvarchar(max) null
, City nvarchar(40) null
...
, Website nvarchar(max) null
, Constraint UK_Authors_Name Unique ( [Name] )
, Constraint UK_Authors_Username Unique ( [Username] )
, Constraint FK_Authors_AuthorRegisteredStates
Foreign Key ( AuthorStateCode )
References dbo.AuthorRegisteredStates ( Code )
-- optional. if you really wanted to ensure that an author that was unregistered
-- had a firstname and lastname. However, I'd recommend enforcing this in the GUI
-- if anywhere as it really does not matter if they
-- enter a first name and last name.
-- All that matters is whether they are registered and entered a name.
, Constraint CK_Authors_RegisteredWithFirstNameLastName
Check ( AuthorStateCode = 'R' Or ( AuthorStateCode = 'U' And FirstName Is Not Null And LastName Is Not Null ) )
)
Can a single author publish two articles on the same date and time? If not (as I've guessed here), then we add a unique constraint. The question is whether you might need to identify an article. What information might you be given to locate an article besides the general date it was published?
Create Table dbo.Articles
(
Id int not null identity(1,1) Primary Key Clustered
, AuthorId int not null
, PublishedDate datetime not null
, Headline nvarchar(200) not null
, Content nvarchar(max) null
...
, Constraint UK_Articles_PublishedDate Unique ( AuthorId, PublishedDate )
, Constraint FK_Articles_Authors
Foreign Key ( AuthorId )
References dbo.Authors ( Id )
)
In addition, I would add an index on PublishedDate to improve searches by date.
Create Index IX_Articles_PublishedDate On dbo.Articles ( PublishedDate )
I would also enable free text search to search on the contents of articles.
I think concerns about "empty space" are probably premature optimization. The effect on performance will be nil. This is a case where a small amount of denormalizing costs you nothing in terms of performance and gains you in terms of development. However, if it really concerned you, you could move the address information into 1:1 table like so:
Create Table dbo.AuthorAddresses
(
AuthorId int not null Primary Key Clustered
, Street nvarchar(max) not null
, City nvarchar(40) not null
...
, Constraint FK_AuthorAddresses_Authors
Foreign Key ( AuthorId )
References dbo.Authors( Id )
)
This will add a small amount of complexity to your middle-tier. As always, the question is whether the elimination of some empty space exceeds the cost in terms of coding and testing. Whether you store this information as columns in your Authors table or in a separate table, the effect on performance will be nil.
I have solved similar problems by an approach similar to this:
Company -> CompanyArticles -> Articles
User -> UserArticles -> Articles
CompanyArticles contains a mapping from Company to an Article
UserArticles contains a mapping from User to Article
Article doesn't know anything about who created it.
By inverting the dependencies here you end up not overloading the meaning of foreign keys, having unused foreign keys, or creating a super table.
Getting all articles and contact information would look like:
SELECT name, phone, email FROM
user
JOIN userarticles on user.user_id = userarticles.user_id
JOIN articles on userarticles.article_id = articles.article_id
UNION
SELECT name, phone, email FROM
company
JOIN companyarticles on company.company_id = companyarticles.company_id
JOIN articles on companyarticles.article_id = articles.article_id
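For a runnable version of this idea, here is a sketch with assumed details: SQLite via Python's sqlite3, a headline column on articles (not specified above), UNION ALL so that no rows are merged, and a hypothetical articles_with_contacts helper that also returns the headline next to the contact data.

```python
import sqlite3


def make_articles_db() -> sqlite3.Connection:
    """Create articles, the two author tables and the two mapping tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (article_id INTEGER PRIMARY KEY, headline TEXT NOT NULL);
        CREATE TABLE user     (user_id    INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT);
        CREATE TABLE company  (company_id INTEGER PRIMARY KEY, name TEXT, phone TEXT, email TEXT);
        -- The mapping tables point at both sides; articles knows nothing about authors.
        CREATE TABLE userarticles (
            user_id    INTEGER NOT NULL REFERENCES user(user_id),
            article_id INTEGER NOT NULL REFERENCES articles(article_id)
        );
        CREATE TABLE companyarticles (
            company_id INTEGER NOT NULL REFERENCES company(company_id),
            article_id INTEGER NOT NULL REFERENCES articles(article_id)
        );
    """)
    return conn


QUERY = """
SELECT articles.headline, user.name, user.email
FROM user
JOIN userarticles ON user.user_id = userarticles.user_id
JOIN articles ON userarticles.article_id = articles.article_id
UNION ALL
SELECT articles.headline, company.name, company.email
FROM company
JOIN companyarticles ON company.company_id = companyarticles.company_id
JOIN articles ON companyarticles.article_id = articles.article_id
ORDER BY 1
"""


def articles_with_contacts(conn: sqlite3.Connection) -> list:
    """Return (headline, author name, author email) rows sorted by headline."""
    return conn.execute(QUERY).fetchall()
```

Adding a third kind of author would mean one new author table and one new mapping table, with no change to articles.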

Resources