Database - table contain only primary and foreign keys - database

Can a table contain only a primary key and 1 or more foreign keys? Or it will violate the normalization design?
For example:
PK: SKILL_NAME
FK: SKILL_ID
FK: EMPL_ID

Yes. This would be the typical structure, for instance, for an association/junction table that implements a n-m relationship between two entities.
That said, almost all tables I create also have:
createdAt -- insertion time for the record
createdBy -- who inserted the record
And sometimes:
createdOn -- the system/database where the record was created

Your example may not be a good one because i don't understand the scenario that skill_name is the PK, instead of skill_id.
What you mentioned is actually similar to EAV data model, i see some open source produce (magento) using it. But it's too normalized and need denormalized caching table for the performance.

Related

Mapping database structure from SQL Server to DynamoDB

I am thinking about using a NoSQL database to scale database reads. Please see the relational database structure below:
CREATE TABLE Person(
ID uniqueidentifier not null,
Name varchar(100),
DateOfBirth datetime)
CREATE TABLE Sport (
ID uniqueidentifier not null,
Description varchar(50)) -- e.g. Football; Tennis; Badminton etc
CREATE TABLE PersonPlaysSport (
PersonID uniqueidentifier FOREIGN KEY REFERENCES Person(ID),
SportID uniqueidentifier FOREIGN KEY REFERENCE Sport (ID),
primary key (PersonID, SportID)
In the example above a Person Plays many Sports. In my real application; I have many-to-many relationships like this that do not perform well.
How would these be stored in a NoSQL document database (DynamoDB)?
Disclaimer - I'm not familiar with DynamoDb, but have used several other NoSql databases
The common approach is to choose the most important subject entity as the root of the document (in your case, I would say this is Person)
A document is then created for each person, and will include the "person centric" view of all associated entities (i.e. linked sports):
Joe (Person, Keyed on a natural, or surrogate id).
+ Fields of Joe (Date of Birth, etc)
+ SportsPlayed: (Collection)
--> Golf (Sport)
--> Tennis (Sport)
If it becomes important to view the relationship from a Sport centric approach (e.g. you need to know which persons are 'subscribed' to which Sport):
You could attempt a secondary index on Person.Sport, if the NoSql database allows this. This would allow for queries like "Who plays Golf?", although this approach is often frowned upon in NoSql terms.
Alternatively, and preferably, create a second collection of documents, this time keyed by Sport:
Golf (Sport)
- Joe
- Jim
...
etc. Obviously there's extra work to be done in keeping both sets of documents up to date when a change is made to a Person, a Sport, or the relationship between them, however the benefit is high performance on the read side - only a single document needs to be retrieved to pull the entire entity graph - In SQL terms, this would have required a Query joining 3 distinct tables.

In a junction table, should I use a Primary key and a unique constraint, or a compound/composite key?

I have read through handfuls of what would seem to make this a duplicate question. But reading through all of these has left me uncertain. I'm hoping to get an answer based on the absolute example below, as many questions/answers trail off into debates back and forth.
If I have:
dbo.Book
--------
BookID PK int identity(1,1)
dbo.Author
----------
AuthorID PK int identity(1,1)
Now I have two choices for a simple junction table:
dbo.BookAuthor
--------------
BookID CPK and FK
AuthorID CPK and FK
The above would be a compound/composite key on both FKs, as well as set up the FK relationships for both columns - also using Cascade on delete.
OR
dbo.BookAuthor
--------------
RecordID PK int identity(1,1)
BookID FK
AuthorID FK
Foreign key relationships on BookID and AuthorID, along with Cascade on delete. Also set up a unique constraint on BookID and AuthorID.
I'm looking for a simple answer as to why one method is better than another in the ABOVE particular example. The answers that I'm reading are very detailed, and I was just about to settle on a compound key, but then watched a video where the example used an Identity column like my first example.
It seems this topic is slightly torn in half, but my gut is telling me that I should just use a composite key.
What's more efficient for querying? It seems having a PK identity column along with setting up a unique constraint on the two columns, AND the FK relationships would be more costly, even if a little.
This is something I've always remembered from my database course way back in college. We were covering the section from the textbook on "Entity Design" and it was talking about junction tables... we called them intersect tables or intersection relations. I was actually paying attention in class that day. The professor said, in his experience, a many-to-many junction table almost always indicates an unidentified missing entity. These entities almost always end up with data of their own.
We were given an example of Student and Course entities. For a student to take a course, you need to junction between those two. What you actually have as a result is a new entity: an Enrollment. The additional data in this case would be things like Credit Type (audit vs regular) or Final Grade.
I remember that advice to this day... but I don't always follow it. What I will do in this situation is stop, and make sure to go back to the stakeholders on the issue and work with them on what data points we might still be missing in this junction. If we really can't find anything, then I'll use the compound key. When we do find data, we think of a better name and it gets a surrogate key.
Update in 2020
I still have the textbook, and by amazing coincidence both it and this question were brought to my attention within a few hours of each other. So for the curious, it was Chapter 5, section 6, of the 7th edition of this book:
https://www.amazon.com/Database-Processing-Fundamentals-Design-Implementation-dp-9332549958/dp/9332549958/
As a staunch proponent of, and proselytizer for, the benefits of surrogate keys, I none-the-less make an exception for all-key join tables such as your first example. One of the benefits of surrogate keys is that engines are generally optimized for joining on single integer fields, as the default and most common circumstance.
Your first proposal still obtains this benefit, but also has a 50% greater fan-put on each index level, reducing both the overall size and height of the indices on the join table. Although the performance benefits of this are likely negligible for anything smaller than a massive table it is best practice and comes at no cost.
When I might opt for the other design is if the relation were to accrue additional columns. At that point it is no longer strictly a join table.
I prefer the first design, using Composite Keys. Having an identity column on the junction table does not give you an advantage even if the parent tables have them. You won't be querying the BookAuthor using the identity column, instead you would query it using the BookID and AuthorID.
Also, adding an identity would allow for duplicate BookID-AuthorID combination, unless you put a constraint.
Additionally, if your primary key is (BookID, AuthorID), you need to an index on AuthorID, BookID). This will help if you want to query the the books written by an author.
Using composite key would be my choice too. Here's why:
Less storage overhead
Let's say you would use a surrogate key. Since you'd probably gonna want to query all authors for a specific book and vica versa you'd need indexes starting with both BookId and AuthorId. For performance reasons you should include the other column in both indexes to prevent a clustered key lookup. You'd probably would want to make one of them a unique to make sure no duplicate BookId/AuthorId combinations are added to the table.
So as a net result:
The data is stored 3 times instead of 2 times
2 unique constraints are to be validated instead of 1
Querying a junction table referencing table
Even if you'd add a table like Contributions (AuthorId, BookId, ...) referencing the junction table. Most queries won't require the junction table to be touched at all. E.g.: to find all contribution of a specific author would only involve the author and contributions tables.
Depending on the amount of data in the junction table, a compound key might end up causing poor performance over an auto generated sequential primary key.
The primary key is the clustered index for the table, which means that it determines the order in which rows are stored on disc. If the primary key's values are not generated sequentially (e.g. it is a composite key comprised of foreign keys from tables where rows do not fall in the same order as the junction table's rows, or it is a GUID or other random key) then each time a row is added to the junction table a reshuffle of the junction table's rows will be necessary.
You probably should use the compound/composite key. This way you are fully relational - one author can write many books and one book can have multiple authors.

Separate tables for online and in-store purchases?

I'm developing a system for a retailer and I've hit a bit of a conundrum when it comes to deciding how to represent the orders in the database. The schema for my Order table so far is as follows:
Id - PK
AccountId - FK (Nullable)
ShippingAddressId - FK (Nullable)
BillingAddressId - FK (Nullable)
ShippingMethod - (Nullable)
Type - (Nullable)
Status
Date
SubTotal
Tax
Total
My problem is I'm not sure whether I should represent online purchases and in-store purcahses in separate tables or not. If I were to store them in the same table, all non-nullable fields would be the only ones applicable for in-store purchases.
Another design pattern that crossed my mind is something like this:
Online order table:
PurchaseId - PK, FK
AccountId - FK
ShippingAddressId - FK
BillingAddressId - FK
ShippingMethod
Type
Purchase table:
Id - PK
Status
Date
SubTotal
Tax
Total
And for in-store purchases, there would simply be no reference from the online orders table.
Thoughts?
I would make a second table for location, with a primary key and location information. That could be online as well. Then use a foriegn key in your main table. You would then just fill the fields require for the application you are doing(in store, or online). This would also allow For the business to grow to more locations just by simply adding it into the location table.
I'm going with the original design. Likely more maintainable and efficient as well.
Your second design is very close to an Entity Sub-typing pattern. If the primary key of your online order table was the foreign key to your purchase table then you would have entity sub-typing.
Your original design is a practical design for the physical implementation of your database because it is simple to use. Entity sub-typing would be the preferred design at the logical level because it clearly represents your rules about which predicates (columns) belong to which logical tables.
Some people would also use the entity sub-typing pattern for their physical model too because they have an aversion to nulls.

How to add column in table 1 to be type of table 2?

I define table 1.
Now i want to add new table - and one of the column of the second table need to be the first table -
How can i do it ?
If you want one column of a table to reference another table, then your best bet is probably to go read up on the concept of keys, primary keys, and foreign keys in database design.
For example, in a database of companies and employees, you might have 2 tables like this:
Company (c_id, name, city)
Employee (e_id, c_id, name)
In the Company table, c_id would be a primary key. In the Employee table, c_id would be a foreign key referencing Company. This would allow you to do queries like
SELECT E.name
FROM Employee as E, Company as C
WHERE E.c_id = C.c_id AND C.name = "IBM"
which would return the names of employees who work at IBM.
Links:
http://en.wikipedia.org/wiki/Primary_key
http://en.wikipedia.org/wiki/Foreign_key
Why cant you go for a foreign relationship.
for eg : Table1 (ID,ForeignKeyId, other columns)
Table2 (ID,other columns)
ForigenKeyId will be the primary key of Table2
If you really need table as a column, you should read http://msdn.microsoft.com/en-us/library/ms175010.aspx for a solution. However, this is highly unlikely that you really need table column datatype, as it is primarily used for temporary storage.
If you don't know primary-foreign keys relationships stuff, you should take some time learning relational databases or have someone design a database schema for you based on business entities and your application needs. Otherwise you will end up with a design which is completely unmaintainable and mid term it will backfire on you.
If you need a quick reading on PK/FK topic, please read http://www.functionx.com/sqlserver2005/Lesson13.htm. It should give you a knowledge required to tackle with this particular issue.

Designing a conditional database relationship in SQL Server

I have three basic types of entities: People, Businesses, and Assets. Each Asset can be owned by one and only one Person or Business. Each Person and Business can own from 0 to many Assets. What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
My initial plan is to have two nullable foreign keys in the Assets table, one for People and one for Businesses. One of these values will be null, while the other will point to the owner. The problem I see with this setup is that it requires application logic in order to be interpreted and enforced. Is this really the best possible solution or are there other options?
Introducing SuperTypes and SubTypes
I suggest that you use supertypes and subtypes. First, create PartyType and Party tables:
CREATE TABLE dbo.PartyType (
PartyTypeID int NOT NULL identity(1,1) CONSTRAINT PK_PartyType PRIMARY KEY CLUSTERED
Name varchar(32) CONSTRAINT UQ_PartyType_Name UNIQUE
);
INSERT dbo.PartyType VALUES ('Person'), ('Business');
SuperType
CREATE TABLE dbo.Party (
PartyID int identity(1,1) NOT NULL CONSTRAINT PK_Party PRIMARY KEY CLUSTERED,
FullName varchar(64) NOT NULL,
BeginDate smalldatetime, -- DOB for people or creation date for business
PartyTypeID int NOT NULL
CONSTRAINT FK_Party_PartyTypeID FOREIGN KEY REFERENCES dbo.PartyType (PartyTypeID)
);
SubTypes
Then, if there are columns that are unique to a Person, create a Person table with just those:
CREATE TABLE dbo.Person (
PersonPartyID int NOT NULL
CONSTRAINT PK_Person PRIMARY KEY CLUSTERED
CONSTRAINT FK_Person_PersonPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to people
);
And if there are columns that are unique to Businesses, create a Business table with just those:
CREATE TABLE dbo.Business (
BusinessPartyID int NOT NULL
CONSTRAINT PK_Business PRIMARY KEY CLUSTERED
CONSTRAINT FK_Business_BusinessPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to businesses
);
Usage and Notes
Finally, your Asset table will look something like this:
CREATE TABLE dbo.Asset (
AssetID int NOT NULL identity(1,1) CONSTRAINT PK_Asset PRIMARY KEY CLUSTERED,
PartyID int NOT NULL
CONSTRAINT FK_Asset_PartyID FOREIGN KEY REFERENCES dbo.Party (PartyID),
AssetTag varchar(64) CONSTRAINT UQ_Asset_AssetTag UNIQUE
);
The relationship the supertype Party table shares with the subtype tables Business and Person is "one to zero-or-one". Now, while the subtypes generally have no corresponding row in the other table, there is the possibility in this design of having a Party that ends up in both tables. However, you may actually like this: sometimes a person and a business are nearly interchangeable. If not useful, while a trigger to enforce this will be fairly easily done, the best solution is probably to add the PartyTypeID column to the subtype tables, making it part of the PK & FK, and put a CHECK constraint on the PartyTypeID.
The beauty of this model is that when you want to create a column that has a constraint to a business or a person, then you make the constraint to the appropriate table instead of the party table.
Also, if desired, turning on cascade delete on the constraints can be useful, as well as an INSTEAD OF DELETE trigger on the subtype tables that instead delete the corresponding IDs from the supertype table (this guarantees no supertype rows that have no subtype rows present). These queries are very simple and work at the entire-row-exists-or-doesn't-exist level, which in my opinion is a gigantic improvement over any design that requires checking column value consistency.
Also, please notice that in many cases columns that you would think should go in one of the subtype tables really can be combined in the supertype table, such as social security number. Call it TIN (taxpayer identification number) and it works for both businesses and people.
ID Column Naming
The question of whether or not to call the column in the Person table PartyID, PersonID, or PersonPartyID is your own preference, but I think it's best to call these PersonPartyID or BusinessPartyID—tolerating the cost of the longer name, this avoids two types of confusion. E.g., someone unfamiliar with the database sees BusinessID and doesn't know this is a PartyID, or sees PartyID and doesn't know it is restricted by foreign key to just those in the Business table.
If you want to create views for the Party and Business tables, they can even be materialized views since it's a simple inner join, and there you could rename the PersonPartyID column to PersonID if you were truly so inclined (though I wouldn't). If it's of great value to you, you can even make INSTEAD OF INSERT and INSTEAD OF UPDATE triggers on these views to handle the inserts to the two tables for you, making the views appear completely like their own tables to many application programs.
Making Your Proposed Design Work As-Is
Also, I hate to mention it, but if you want to have a constraint in your proposed design that enforces only one column being filled in, here is code for that:
ALTER TABLE dbo.Assets
ADD CONSTRAINT CK_Asset_PersonOrBusiness CHECK (
CASE WHEN PersonID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN BusinessID IS NULL THEN 0 ELSE 1 END = 1
);
However, I don't recommend this solution.
Final Thoughts
A natural third subtype to add is organization, in the sense of something that people and businesses can have membership in. Supertype and subtype also elegantly solve customer/employee, customer/vendor, and other problems similar to the one you presented.
Be careful not to confuse "Is-A" with "Acts-As-A". You can tell a party is a customer by looking in your order table or viewing the order count, and may not need a Customer table at all. Also don't confuse identity with life cycle: a rental car may eventually be sold, but this is a progression in life cycle and should be handled with column data, not table presence--the car doesn't start out as a RentalCar and get turned into a ForSaleCar later, it's a Car the whole time. Or perhaps a RentalItem, maybe the business will rent other things too. You get the idea.
It may not even be necessary to have a PartyType table. The party type can be determined by the presence of a row in the corresponding subtype table. This would also avoid the potential problem of the PartyTypeID not matching the subtype table presence. One possible implementation is to keep the PartyType table, but remove PartyTypeID from the Party table, then in a view on the Party table return the correct PartyTypeID based on which subtype table has the corresponding row. This won't work if you choose to allow parties to be both subtypes. Then you would just stick with the subtype views and know that the same value of BusinessID and PersonID refer to the same party.
Further Reading On This Pattern
Please see A Universal Person and Organization Data Model for a more complete and theoretical treatment.
I recently found the following articles to be useful for describing some alternate approaches for modeling inheritance in a database. Though specific to Microsoft's Entity Framework ORM tool, there's no reason you couldn't implement these yourself in any DB development:
Table Per Hierarchy
Table Per Type (this is what I advocate above as the only fully normalized method of implementing inheritance in a database)
Table Per Concrete Class
Or a more brief overview of these three ways: How to choose an inheritance strategy
P.S. I have switched, more than once, my opinion on column naming of IDs in subtype tables, due to having more experience under my belt.
You don't need application logic to enforce this. The easiest way is with a check constraint:
(PeopleID is null and BusinessID is not null) or (PeopleID is not null and BusinessID is null)
You can have another entity from which Person and Business "extend". We call this entity Party in our current project. Both Person and Business have a FK to Party (is-a relationship). And Asset may have also a FK to Party (belongs to relationship).
With that said, if in the future an Asset can be shared by multiple instances, is better to create m:n relationships, it gives flexibility but complicates the application logic and the queries a bit more.
ErikE's answer gives a good explanation on how to go about the supertype / subtype relationship in tables and is likely what I'd go for in your situation, however, it doesn't really address the question(s) you've posed which are also interesting, namely:
What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
...are there other options?
For those I recommend this blog entry on TechTarget which has an excerpt from excerpt from "A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008" by Eric Johnson and Joshua Jones which addresses 3 possible options.
In summary they are:
Supertype Table - Almost matches what you've proposed, have a table with some fields that will always be null when others are filled. Good when only a couple of fields aren't shared. So depending on how different Business and People are you could possibly combine them into one table, Owners perhaps, and then just have OwnerID in your Asset table.
Subtype Tables - Basically the opposite of what Supertype tables are and is what you have just now. Here we have lots of unique fields and maybe one or two the same so we just have the repeated fields appear in each table. As you are finding this isn't really suitable for your situation.
Supertype and Subtype Tables - A combination of both of the above where the matching fields are placed in a single table and the unique ones in separate tables and matching IDs are used to join the record from one table to the other. This matches ErikE's proposed solution and, as mentioned, is the one I would favour as well.
Sadly it doesn't go on to explain which, if any, are best practice but it is certainly a good read to get an idea of the options that are out there.
YOu can enforce the logic with a trigger instead. Then no matter how the record is changed, only one of the fileds will be filled in.
You could also have a PeopleAsset table and a BusinessAsset table, but stillwould have the problem of enforcing that only one of them has a record.
An asset would have a foreign key to the owning person, and you should setup an association table to link assets and businesses. As said in other comments, you can use triggers and/or constraints to ensure that the data stays in a consistent state. ie. when you delete a business, delete the lines in your association table.
Table People, Businesses both can use UUID as primary key, and union both to a view for sql join purpose.
so you can simply use one foreign key column in Assets relation to both People and Businesses, because UUID is nearly unique. And you can simply query like:
select * from Assets
join view_People_Businesses as v on v.id = Assets.fk

Resources