Let's consider the tables:
CREATE TABLE [dbo].[User] (
[Id] INT PRIMARY KEY,
...
);
CREATE TABLE [dbo].[UserInfo_1] (
[Id] INT PRIMARY KEY,
[UserId] INT,
...,
CONSTRAINT FK_UserId FOREIGN KEY ([UserId]) REFERENCES [dbo].[User] (Id)
);
CREATE TABLE [dbo].[UserInfo_2] (
[UserId] INT PRIMARY KEY,
...,
CONSTRAINT FK_UserId FOREIGN KEY ([UserId]) REFERENCES [dbo].[User] (Id)
);
What are the procs and cons of using FOREIGN KEY for UserInfo_1 and UserInfo_2 tables? Also in terms of ORM.
I don't think there would be a con for using Foreign Keys on any form of tables. In fact, I make sure to have a primary key on all tables I use, especially on temps and variable tables since I know I will be joining and filtering with them.
Now your first table User and UserInfo_1 is a one to many relationship. Meaning a single User can have many different UserInfo_1 associated with it.
the second one of User and UserInfo_2 is a one to one relationship. In which a single User can only ever have one UserInfo_2 associated with it.
In terms of performance, since they are Indexed they would perform relatively the same, depending upon your filtering and what plan was cached in the query plan. though you may not entirely run into issues with cached query plans as EF utilizes ad-hock statements, though EF does run up the cached plan memory and that is typically recommended to be disabled when using EF.
One-to-One
I am a fan of one to one relationship, especially in a Domain Driven Design aspect, and when implemented correctly. If each of your rows for User is going to require information from UserInfo_2, then I would theoretically keep them on the User table. Now if you know you will not be querying that information much or not all Users will require the columns on that table, or if your main table is fairly large I would keep it as a one to one relationship.
I personally like to use System Versioning. I have tables which contain certain columns which typically update and columns which almost never update. Those that I know update on a daily/weekly bases I have them congregated on a one to one relationship to the main table that should almost never update. But each business needs and scenario is different. Not any one design fits all situations.
Benefits of Indexing
When you create a Foreign Key, you are creating an Index in the database. This will allow for you to perform faster queries. The SQL optimizer will utilize the Index to better find what you are looking for. Without the index, your query plan will turn into a table scan, which is a row by row search. When doing a row by row search, you can seriously slow your system down as your table grows.
Should you choose to create your tables without an Index, or in this case a Foreign Key index, you might find issues with the ORM aspect. When you query your database from EF, you will call your DbSet. If you had proper Foreign Key connections with your two tables, EF can utilize the .Include to join the two tables searching for what you need. Otherwise you would be forced to utilize two queries into the database for both tables.
In a project I worked on one time, a developer did that. He did not properly attach a Foreign Key connection between two objects and then didn't understand why EF would not properly return his values when he used the .Include and wasn't very fast. He thought it was EF's fault and had to do two queries to obtain the information he needed.
Well User > UserInfo_1 is a one-to-many relationship, as UserInfo_1.UserID is not a key. And EF 6 doesn't support alternate keys. EF Core does, so you could make it a key.
But the simplest design is always to collapse 1-1 relationships into a single table. In EF Core you can still have a main Entity Type and one or more separate Owned Entity Types. But on the database it's typically better to have them in a single table.
The second-simplest is to have have both tables have the same key columns.
Related
I have been tasked at looking at an MSSQL db to see if I can enhance performance anywhere. Most of the DB has clustered-indexes using the standard ID Identity primary key field.
In the area I am concerned about I have found where two tables are not linked via a standard primary key -> foreign key relationship. Instead there is an nvarchar(50) field that corresponds in both tables as the contents are the same. So an inner join can be done without a concrete relationship. Here a non-clustered index has been created.
I have queried why we are not doing it the same way as everywhere else in the DB, and explained that I thought this could be where things are being significantly slowed down. I have been told that it won't be an issue because there is a non-clustered index on the field in question. So I should look elsewhere.
So I thought I would ask what people think. Will a non-clustered index, setup on an nvarchar(50) column that is used to join two tables (the two tables in question have non concrete relationship between each other) cause any performance issues, especially when comparing to a primary key, system generated clustered-index.
Would be grateful for any advise here.
I've read a lot of tips and tutorials about normalization but I still find it hard to understand how and when we need normalization. So right now I need to know if this database design for an electricity monitoring system needs to be normalized or not.
So far I have one table with fields:
monitor_id
appliance_name
brand
ampere
uptime
power_kWh
price_kWh
status (ON/OFF)
This monitoring system monitors multiple appliances (TV, Fridge, washing machine) separately.
So does it need to be normalized further? If so, how?
Honestly, you can get away without normalizing every database. Normalization is good if the database is going to be a project that affects many people or if there are performance issues and the database does OLTP. Database normalization in many ways boils down to having larger numbers of tables themselves with fewer columns. Denormalization involves having fewer tables with larger numbers of columns.
I've never seen a real database with only one table, but that's ok. Some people denormalize their database for reporting purposes. So it isn't always necessary to normalize a database.
How do you normalize it? You need to have a primary key (on a column that is unique or a combination of two or more columns that are unique in their combined form). You would need to create another table and have a foreign key relationship. A foreign key relationship is a pair of columns that exist in two or more tables. These columns need to share the same data type. These act as a map from one table to another. The tables are usually separated by real-world purpose.
For example, you could have a table with status, uptime and monitor_id. This would have a foreign key relationship to the monitor_id between the two tables. Your original table could then drop the uptime and status columns. You could have a third table with Brands, Models and the things that all models have in common (e.g., power_kWh, ampere, etc.). There could be a foreign key relationship to the first table based on model. Then the brand column could be eliminated (via the DDL command DROP) from the first table as this third table will have it relating from the model name.
To create new tables, you'll need to invoke a DDL command CREATE TABLE newTable with a foreign key on the column that will in effect be shared by the new table and the original table. With foreign key constraints, the new tables will share a column. The tables will have less information in them (fewer columns) when they are highly normalized. But there will be more tables to accommodate and store all the data. This way you can update one table and not put a lock on all the other columns in a denormalized database with one big table.
Once new tables have the data in the column or columns from the original table, you can drop those columns from the original table (except for the foreign key column). To drop columns, you need to invoke DDL commands (ALTER TABLE originalTable, drop brand).
In many ways, performance will be improved if you try to do many reads and writes (commit many transactions) on a database table in a normalized database. If you use the table as a report, and want to present all the data as it is in the table normally, normalized the database will hurt the peformance.
By the way, normalizing the database can prevent redundant data. This can make the database consume less storage space and use less memory.
It is nice to have our database normalize.It helps us to have a efficient data because we can prevent redundancy here and also saves memory usages. On normalizing tables we need to have a primary key in each table and use this to connect to another table and when the primary key (unique in each table) is on another table it is called the foreign key (use to connect to another table).
Sample you already have this table :
Table name : appliances_tbl
-inside here you have
-appliance_id : as the primary key
-appliance_name
-brand
-model
and so on about this appliances...
Next you have another table :
Table name : appliance_info_tbl (anything for a table name and must be related to its fields)
-appliance_info_id : primary key
-appliance_price
-appliance_uptime
-appliance_description
-appliance_id : foreign key (so you can get the name of the appliance by using only its id)
and so on....
You can add more table like that but just make sure that you have a primary key in each table. You can also put the cardinality to make your normalizing more understandable.
For example, we have table A, and table B which have a many-to-many relationship. An intersection table, Table C stores A.id and B.id along with a value that represents a relationship between the two. Or as a concrete example, imagine stackexchange which has a user account, a forum, and a karma score. Or, a student, a course, and a grade. If table A and B are very large, table C can and probably will grow monstrously large very quickly(in fact lets just assume it does). How do we go about dealing with such an issue? Is there a better way to design the tables to avoid this?
There is no magic. If some rows are connected and some aren't, this information has to be represented somehow, and the "relational" way of doing it is a "junction" (aka "link") table. Yes, a junction table can grow large, but fortunately databases are very capable of handling huge amounts of data.
There are good reasons for using junction table versus comma-separated list (or similar), including:
Efficient querying (through indexing and clustering).
Enforcement of referential integrity.
When designing a junction table, ask the following questions:
Do I need to query in only one direction or both?1
If one direction, just create a composite PRIMARY KEY on both foreign keys (let's call them PARENT_ID and CHILD_ID). Order matters: if you query from parent to children, PK should be: {PARENT_ID, CHILD_ID}.
If both directions, also create a composite index in the opposite order, which is {CHILD_ID, PARENT_ID} in this case.
Is the "extra" data small?
If yes, cluster the table and cover the extra data in the secondary index as necessary.2
I no, don't cluster the table and don't cover the extra data in the secondary index.3
Are there any additional tables for which the junction table acts as a parent?
If yes, consider whether adding a surrogate key might be worthwhile to keep child FKs slim. But beware that if you add a surrogate key, this will probably eliminate the opportunity for clustering.
In many cases, answers to these questions will be: both, yes and no, in which case your table will look similar to this (Oracle syntax below):
CREATE TABLE JUNCTION_TABLE (
PARENT_ID INT,
CHILD_ID INT,
EXTRA_DATA VARCHAR2(50),
PRIMARY KEY (PARENT_ID, CHILD_ID),
FOREIGN KEY (PARENT_ID) REFERENCES PARENT_TABLE (PARENT_ID),
FOREIGN KEY (CHILD_ID) REFERENCES CHILD_TABLE (CHILD_ID)
) ORGANIZATION INDEX COMPRESS;
CREATE UNIQUE INDEX JUNCTION_TABLE_IE1 ON
JUNCTION_TABLE (CHILD_ID, PARENT_ID, EXTRA_DATA) COMPRESS;
Considerations:
ORGANIZATION INDEX: Oracle-specific syntax for what most DBMSes call clustering. Other DBMSes have their own syntax and some (MySQL/InnoDB) imply clustering and user cannot turn it off.
COMPRESS: Some DBMSes support leading-edge index compression. Since clustered table is essentially an index, compression can be applied to it as well.
JUNCTION_TABLE_IE1, EXTRA_DATA: Since extra data is covered by the secondary index, DBMS can get it without touching the table when querying in the direction from child to parents. Primary key acts as a clustering key so the extra data is naturally covered when querying from a parent to the children.
Physically, you have just two B-Trees (one is the clustered table and the other is the secondary index) and no table heap at all. This translates to good querying performance (both parent-to-child and child-to-parent directions can be satisfied by a simple index range scan) and fairly small overhead when inserting/deleting rows.
Here is the equivalent MS SQL Server syntax (sans index compression):
CREATE TABLE JUNCTION_TABLE (
PARENT_ID INT,
CHILD_ID INT,
EXTRA_DATA VARCHAR(50),
PRIMARY KEY (PARENT_ID, CHILD_ID),
FOREIGN KEY (PARENT_ID) REFERENCES PARENT_TABLE (PARENT_ID),
FOREIGN KEY (CHILD_ID) REFERENCES CHILD_TABLE (CHILD_ID)
);
CREATE UNIQUE INDEX JUNCTION_TABLE_IE1 ON
JUNCTION_TABLE (CHILD_ID, PARENT_ID) INCLUDE (EXTRA_DATA);
Note that MS SQL Server automatically clusters tables, unless PRIMARY KEY NONCLUSTERED is specified.
1 In other words, do you only need to get "children" of given "parent", or you might also need to get parents of given child.
2 Covering allows the query to be satisfied from the index alone, and avoids expensive double-lookup that would otherwise be necessary when accessing data through a secondary index in the clustered table.
3 This way, the extra data is not repeated (which would be expensive, since it's big), yet you avoid the double-lookup and replace it with (cheaper) table heap access. But, beware of clustering factor that can destroy the performance of range scans in heap-based tables!
I designed a database to use GUIDs for the UserID but I added UserId as a foreign key to many other tables, and as I plan to have a very large number of database inputs I must design this well.
I also use ASP Membership tables (not profiles only membership, users and roles).
So currently I use a GUID as PK and as FK in every other table, this is maybe bad design?
What I think is maybe better and is the source of my question, should I add in Users table UserId(int) as a primary key and use this field as a foreign key to other tables and user GUID UserId only to reference aspnet_membership?
aspnet_membership{UserID(uniqueIdentifier)}
Users{
UserID_FK(uniqueIdentifier) // FK to aspnet_membership table
UserID(int) // primary key in this table --> Should I add this
...
}
In other tables user can create items in tables and I always must add UserId for example:
TableA{TableA_PK(int)..., CreatedBy(uniqueIdentifier)}
TableB{TableB_PK(int)..., CreatedBy(uniqueIdentifier)}
TableC{TableC_PK(int)..., CreatedBy(uniqueIdentifier)}
...
Ultimately the answer is that it really depends.
Microsoft have documented the performance differences of each here. While the article differs slightly to your situation, as you have to use a UNIQUEIDENTIFIER to link back to asp membership, many of the discussion points still apply.
If you have to create your own users table anyway it would make more sense to have your own int primary key, and use the GUID as a foreign key. It keeps separate entities separate. What if at some point in the future you wanted to add a different membership to a user? You would then need to update a primary key, which would have to cascade to any tables referencing this and could be quite a performance hit. If it is just a unique column in your users table it is a simple update.
All binary datatypes (uniqueidetifier is binary(16)) are fine as foreign keys. But GUID might cause another small problem as primary key. By default primary key is clustered, but GUID is generated randomly not sequentially as IDENTITY, so it will frequently split IO pages, that cause performance degradation a little.
I'd go ahead with your current design.
If you're worried because of inner joins performance in your queries because of comparing strings instead of ints, nowadays there is no difference at all.
I have three basic types of entities: People, Businesses, and Assets. Each Asset can be owned by one and only one Person or Business. Each Person and Business can own from 0 to many Assets. What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
My initial plan is to have two nullable foreign keys in the Assets table, one for People and one for Businesses. One of these values will be null, while the other will point to the owner. The problem I see with this setup is that it requires application logic in order to be interpreted and enforced. Is this really the best possible solution or are there other options?
Introducing SuperTypes and SubTypes
I suggest that you use supertypes and subtypes. First, create PartyType and Party tables:
CREATE TABLE dbo.PartyType (
PartyTypeID int NOT NULL identity(1,1) CONSTRAINT PK_PartyType PRIMARY KEY CLUSTERED
Name varchar(32) CONSTRAINT UQ_PartyType_Name UNIQUE
);
INSERT dbo.PartyType VALUES ('Person'), ('Business');
SuperType
CREATE TABLE dbo.Party (
PartyID int identity(1,1) NOT NULL CONSTRAINT PK_Party PRIMARY KEY CLUSTERED,
FullName varchar(64) NOT NULL,
BeginDate smalldatetime, -- DOB for people or creation date for business
PartyTypeID int NOT NULL
CONSTRAINT FK_Party_PartyTypeID FOREIGN KEY REFERENCES dbo.PartyType (PartyTypeID)
);
SubTypes
Then, if there are columns that are unique to a Person, create a Person table with just those:
CREATE TABLE dbo.Person (
PersonPartyID int NOT NULL
CONSTRAINT PK_Person PRIMARY KEY CLUSTERED
CONSTRAINT FK_Person_PersonPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to people
);
And if there are columns that are unique to Businesses, create a Business table with just those:
CREATE TABLE dbo.Business (
BusinessPartyID int NOT NULL
CONSTRAINT PK_Business PRIMARY KEY CLUSTERED
CONSTRAINT FK_Business_BusinessPartyID FOREIGN KEY REFERENCES dbo.Party (PartyID)
ON DELETE CASCADE,
-- add columns unique to businesses
);
Usage and Notes
Finally, your Asset table will look something like this:
CREATE TABLE dbo.Asset (
AssetID int NOT NULL identity(1,1) CONSTRAINT PK_Asset PRIMARY KEY CLUSTERED,
PartyID int NOT NULL
CONSTRAINT FK_Asset_PartyID FOREIGN KEY REFERENCES dbo.Party (PartyID),
AssetTag varchar(64) CONSTRAINT UQ_Asset_AssetTag UNIQUE
);
The relationship the supertype Party table shares with the subtype tables Business and Person is "one to zero-or-one". Now, while the subtypes generally have no corresponding row in the other table, there is the possibility in this design of having a Party that ends up in both tables. However, you may actually like this: sometimes a person and a business are nearly interchangeable. If not useful, while a trigger to enforce this will be fairly easily done, the best solution is probably to add the PartyTypeID column to the subtype tables, making it part of the PK & FK, and put a CHECK constraint on the PartyTypeID.
The beauty of this model is that when you want to create a column that has a constraint to a business or a person, then you make the constraint to the appropriate table instead of the party table.
Also, if desired, turning on cascade delete on the constraints can be useful, as well as an INSTEAD OF DELETE trigger on the subtype tables that instead delete the corresponding IDs from the supertype table (this guarantees no supertype rows that have no subtype rows present). These queries are very simple and work at the entire-row-exists-or-doesn't-exist level, which in my opinion is a gigantic improvement over any design that requires checking column value consistency.
Also, please notice that in many cases columns that you would think should go in one of the subtype tables really can be combined in the supertype table, such as social security number. Call it TIN (taxpayer identification number) and it works for both businesses and people.
ID Column Naming
The question of whether or not to call the column in the Person table PartyID, PersonID, or PersonPartyID is your own preference, but I think it's best to call these PersonPartyID or BusinessPartyID—tolerating the cost of the longer name, this avoids two types of confusion. E.g., someone unfamiliar with the database sees BusinessID and doesn't know this is a PartyID, or sees PartyID and doesn't know it is restricted by foreign key to just those in the Business table.
If you want to create views for the Party and Business tables, they can even be materialized views since it's a simple inner join, and there you could rename the PersonPartyID column to PersonID if you were truly so inclined (though I wouldn't). If it's of great value to you, you can even make INSTEAD OF INSERT and INSTEAD OF UPDATE triggers on these views to handle the inserts to the two tables for you, making the views appear completely like their own tables to many application programs.
Making Your Proposed Design Work As-Is
Also, I hate to mention it, but if you want to have a constraint in your proposed design that enforces only one column being filled in, here is code for that:
ALTER TABLE dbo.Assets
ADD CONSTRAINT CK_Asset_PersonOrBusiness CHECK (
CASE WHEN PersonID IS NULL THEN 0 ELSE 1 END
+ CASE WHEN BusinessID IS NULL THEN 0 ELSE 1 END = 1
);
However, I don't recommend this solution.
Final Thoughts
A natural third subtype to add is organization, in the sense of something that people and businesses can have membership in. Supertype and subtype also elegantly solve customer/employee, customer/vendor, and other problems similar to the one you presented.
Be careful not to confuse "Is-A" with "Acts-As-A". You can tell a party is a customer by looking in your order table or viewing the order count, and may not need a Customer table at all. Also don't confuse identity with life cycle: a rental car may eventually be sold, but this is a progression in life cycle and should be handled with column data, not table presence--the car doesn't start out as a RentalCar and get turned into a ForSaleCar later, it's a Car the whole time. Or perhaps a RentalItem, maybe the business will rent other things too. You get the idea.
It may not even be necessary to have a PartyType table. The party type can be determined by the presence of a row in the corresponding subtype table. This would also avoid the potential problem of the PartyTypeID not matching the subtype table presence. One possible implementation is to keep the PartyType table, but remove PartyTypeID from the Party table, then in a view on the Party table return the correct PartyTypeID based on which subtype table has the corresponding row. This won't work if you choose to allow parties to be both subtypes. Then you would just stick with the subtype views and know that the same value of BusinessID and PersonID refer to the same party.
Further Reading On This Pattern
Please see A Universal Person and Organization Data Model for a more complete and theoretical treatment.
I recently found the following articles to be useful for describing some alternate approaches for modeling inheritance in a database. Though specific to Microsoft's Entity Framework ORM tool, there's no reason you couldn't implement these yourself in any DB development:
Table Per Hierarchy
Table Per Type (this is what I advocate above as the only fully normalized method of implementing inheritance in a database)
Table Per Concrete Class
Or a more brief overview of these three ways: How to choose an inheritance strategy
P.S. I have switched, more than once, my opinion on column naming of IDs in subtype tables, due to having more experience under my belt.
You don't need application logic to enforce this. The easiest way is with a check constraint:
(PeopleID is null and BusinessID is not null) or (PeopleID is not null and BusinessID is null)
You can have another entity from which Person and Business "extend". We call this entity Party in our current project. Both Person and Business have a FK to Party (is-a relationship). And Asset may have also a FK to Party (belongs to relationship).
With that said, if in the future an Asset can be shared by multiple instances, is better to create m:n relationships, it gives flexibility but complicates the application logic and the queries a bit more.
ErikE's answer gives a good explanation on how to go about the supertype / subtype relationship in tables and is likely what I'd go for in your situation, however, it doesn't really address the question(s) you've posed which are also interesting, namely:
What would be the best practice for storing this type of conditional relationship in Microsoft SQL Server?
...are there other options?
For those I recommend this blog entry on TechTarget which has an excerpt from excerpt from "A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008" by Eric Johnson and Joshua Jones which addresses 3 possible options.
In summary they are:
Supertype Table - Almost matches what you've proposed, have a table with some fields that will always be null when others are filled. Good when only a couple of fields aren't shared. So depending on how different Business and People are you could possibly combine them into one table, Owners perhaps, and then just have OwnerID in your Asset table.
Subtype Tables - Basically the opposite of what Supertype tables are and is what you have just now. Here we have lots of unique fields and maybe one or two the same so we just have the repeated fields appear in each table. As you are finding this isn't really suitable for your situation.
Supertype and Subtype Tables - A combination of both of the above where the matching fields are placed in a single table and the unique ones in separate tables and matching IDs are used to join the record from one table to the other. This matches ErikE's proposed solution and, as mentioned, is the one I would favour as well.
Sadly it doesn't go on to explain which, if any, are best practice but it is certainly a good read to get an idea of the options that are out there.
YOu can enforce the logic with a trigger instead. Then no matter how the record is changed, only one of the fileds will be filled in.
You could also have a PeopleAsset table and a BusinessAsset table, but stillwould have the problem of enforcing that only one of them has a record.
An asset would have a foreign key to the owning person, and you should setup an association table to link assets and businesses. As said in other comments, you can use triggers and/or constraints to ensure that the data stays in a consistent state. ie. when you delete a business, delete the lines in your association table.
Table People, Businesses both can use UUID as primary key, and union both to a view for sql join purpose.
so you can simply use one foreign key column in Assets relation to both People and Businesses, because UUID is nearly unique. And you can simply query like:
select * from Assets
join view_People_Businesses as v on v.id = Assets.fk