Database table design thoughts

I have a database structure issue I am looking for some opinions on.
Let's say there is a scenario where users will use an application to request materials.
There is the need to track who the requester is.
There are three possible "types" of requester: an individual (Person), a Department, or the Supplier of the materials itself.
In addition, the Supplier must also be related to the request in its role as supplier.
So the idea is that the Request table has a RequestedByID FK. But the data for each kind of requester is structured so differently that, if they were merged into a single table to relate back to, it would have to be a completely denormalized table (people have different properties than departments and suppliers).
I have some ideas on how I might handle this but thought the SO community would have some great insight.
Thanks for any and all help.
EDIT:
Pseudo structure:
Request
    RequestID
    RequesterID
Department
    DepartmentID
    DepField1
    DepField2
Person
    PersonID
    PersonField1
    PersonField2
Supplier
    SupplierID
    SuppField1
    SuppField2
Department, Person, and Supplier all have separate tables because their properties differ quite a bit. But each of them can serve as the requester of a Request (RequesterID). What is the best way to accomplish this without one denormalized table holding all the different possible requesters?
Hope this helps.

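You need what is known in ER modeling as inheritance (a.k.a. category, subtype, generalization hierarchy, etc.), something like this: a REQUESTER supertype with PERSON, DEPARTMENT and SUPPLIER subtypes, all referenced through a single REQUEST table.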
This way, it's easy to have different fields and FKs per requester kind while still having only one REQUEST table. Essentially, you can vary the requester without being forced to also vary the request.
There are generally three ways to represent inheritance in the physical database. What you have tried is essentially strategy #1 (merging all classes into a single table), but I'd recommend strategy #3 (every class in a separate table).
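A minimal sketch of strategy #3 for the requester case, using SQLite through Python's built-in sqlite3 module as a stand-in for whatever database is actually in use. All table and column names (requester, person, department, supplier, request, supplier_id) are hypothetical, adapted from the pseudo structure in the question. Each subtype table reuses the supertype's key as both its primary key and its foreign key, so REQUEST needs only one FK for the requester, plus an ordinary FK for the supplier of the materials.

```python
import sqlite3

# Strategy #3: one table per class. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE requester (
    requester_id INTEGER PRIMARY KEY
);
-- Each subtype reuses the supertype key as both PK and FK.
CREATE TABLE person (
    requester_id  INTEGER PRIMARY KEY REFERENCES requester(requester_id),
    person_field1 TEXT
);
CREATE TABLE department (
    requester_id INTEGER PRIMARY KEY REFERENCES requester(requester_id),
    dep_field1   TEXT
);
CREATE TABLE supplier (
    requester_id INTEGER PRIMARY KEY REFERENCES requester(requester_id),
    supp_field1  TEXT
);
-- One REQUEST table: one FK for whoever requested, one for the supplier.
CREATE TABLE request (
    request_id   INTEGER PRIMARY KEY,
    requester_id INTEGER NOT NULL REFERENCES requester(requester_id),
    supplier_id  INTEGER REFERENCES supplier(requester_id)
);
""")

conn.execute("INSERT INTO requester VALUES (1), (2)")
conn.execute("INSERT INTO person VALUES (1, 'alice')")
conn.execute("INSERT INTO supplier VALUES (2, 'acme')")
conn.execute("INSERT INTO request VALUES (10, 1, 2)")  # a person orders from supplier 2
conn.execute("INSERT INTO request VALUES (11, 2, 2)")  # supplier 2 is also the requester

# Which kind of requester made each request? One LEFT JOIN per subtype.
rows = conn.execute("""
SELECT r.request_id,
       CASE WHEN p.requester_id IS NOT NULL THEN 'person'
            WHEN d.requester_id IS NOT NULL THEN 'department'
            WHEN s.requester_id IS NOT NULL THEN 'supplier' END
FROM request r
LEFT JOIN person     p ON p.requester_id = r.requester_id
LEFT JOIN department d ON d.requester_id = r.requester_id
LEFT JOIN supplier   s ON s.requester_id = r.requester_id
ORDER BY r.request_id
""").fetchall()

# The FK rejects a request from a requester that does not exist.
try:
    conn.execute("INSERT INTO request VALUES (12, 99, NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The query shows the cost of this layout: finding out what kind of requester made a request takes one LEFT JOIN per subtype. Nothing in this sketch stops one requester_id from appearing in two subtype tables; a type column on requester is one way to close that gap.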

You could have two different IDs: RequesterID and RequesterTypeID. RequesterTypeID would just be 1, 2, or 3 for Person, Department, and Supplier, respectively, and RequesterTypeID paired with RequesterID would together make a multi-attribute primary key.
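The type-plus-ID pairing can be sketched like this, again in SQLite via Python's sqlite3 with hypothetical names. The requester_type lookup table and the TARGETS dictionary are assumptions of this sketch. The database can validate the type, but it cannot declare a foreign key for requester_id, because the target table depends on the type, so the application has to dispatch to the right table itself.

```python
import sqlite3

# Hypothetical schema: request stores a (type, id) pair instead of a single FK.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE requester_type (
    requester_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
INSERT INTO requester_type VALUES (1, 'Person'), (2, 'Department'), (3, 'Supplier');

CREATE TABLE person     (person_id     INTEGER PRIMARY KEY, person_field1 TEXT);
CREATE TABLE department (department_id INTEGER PRIMARY KEY, dep_field1    TEXT);
CREATE TABLE supplier   (supplier_id   INTEGER PRIMARY KEY, supp_field1   TEXT);

CREATE TABLE request (
    request_id        INTEGER PRIMARY KEY,
    requester_type_id INTEGER NOT NULL REFERENCES requester_type(requester_type_id),
    requester_id      INTEGER NOT NULL  -- no FK possible: target table varies by type
);
""")

# Application-side knowledge of where each requester type lives.
TARGETS = {
    1: ("person", "person_id"),
    2: ("department", "department_id"),
    3: ("supplier", "supplier_id"),
}

def requester_row(request_id):
    """Fetch the requester row for a request by dispatching on its type."""
    type_id, requester_id = conn.execute(
        "SELECT requester_type_id, requester_id FROM request WHERE request_id = ?",
        (request_id,),
    ).fetchone()
    table, key = TARGETS[type_id]
    # Identifiers cannot be bound as parameters; they come from the fixed dict above.
    return conn.execute(
        f"SELECT * FROM {table} WHERE {key} = ?", (requester_id,)
    ).fetchone()

conn.execute("INSERT INTO person VALUES (1, 'alice')")
conn.execute("INSERT INTO department VALUES (1, 'purchasing')")
# Same requester_id 1, different types: the pair identifies the requester.
conn.execute("INSERT INTO request VALUES (10, 1, 1)")
conn.execute("INSERT INTO request VALUES (11, 2, 1)")
# Nothing stops a dangling reference: there is no supplier 99.
conn.execute("INSERT INTO request VALUES (12, 3, 99)")
dangling = requester_row(12)  # None
```

The last lines show the main weakness of this pattern: a request can point at a requester that does not exist. Adding a new kind of requester means one more row in requester_type, one more table, and one more TARGETS entry.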

What Jack Radcliffe suggested is probably the best option, so I'll just add an alternative:
You might also consider having three request tables: one for people's requests, one for suppliers' requests, and one for departments' requests. Then you don't need to store the RequesterTypeID explicitly, since you can deduce it from the name of the table. You can then recreate Jack Radcliffe's table as a view by "uniting" (UNION-ing) the three individual tables.
Also, if you implement Jack Radcliffe's approach, you can create three views to simulate the three tables I've mentioned. Then you can use whichever table or view is best for each situation, and switching from approach A to approach B is really easy too.
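A sketch of the three-table alternative, with a view that reassembles a single request relation, again in SQLite via Python's sqlite3 with hypothetical names. The requester type is not stored anywhere: each branch of the UNION ALL supplies it as a constant, based on which table the row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One request table per requester kind (hypothetical names).
CREATE TABLE person_request     (request_id INTEGER PRIMARY KEY, person_id     INTEGER NOT NULL);
CREATE TABLE department_request (request_id INTEGER PRIMARY KEY, department_id INTEGER NOT NULL);
CREATE TABLE supplier_request   (request_id INTEGER PRIMARY KEY, supplier_id   INTEGER NOT NULL);

-- The type is implied by the source table and re-derived as a constant here.
CREATE VIEW request AS
    SELECT request_id, 1 AS requester_type_id, person_id AS requester_id FROM person_request
    UNION ALL
    SELECT request_id, 2, department_id FROM department_request
    UNION ALL
    SELECT request_id, 3, supplier_id FROM supplier_request;
""")
conn.execute("INSERT INTO person_request VALUES (10, 1)")
conn.execute("INSERT INTO supplier_request VALUES (11, 7)")

# Columns: request_id, requester_type_id, requester_id
rows = conn.execute("SELECT * FROM request ORDER BY request_id").fetchall()
```

Each table has its own primary key here, so nothing guarantees that request_id is unique across all three. The reverse direction works the same way: with a single request table, a view such as `SELECT request_id, requester_id AS person_id FROM request WHERE requester_type_id = 1` simulates person_request.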

What I like about Jack Radcliffe's idea is that if you store the requester types in a separate table, or make the SQL statement generic enough to handle any type number passed in by the application, the scheme can be extended, e.g. to a manufacturer, entity, subsidiary, etc.
However you choose to do it, the expansion will entail some overhead.

Related

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know whether it's more efficient to have FirstName and LastName in each of these tables, or to have another table, Persons, with each of the other tables linked to it by PersonID.
Personally, I prefer the second option: although it is trickier to implement, I think it's cleaner, especially from an object-oriented point of view. Would it add unnecessary overhead to the database?
In case it helps, I would like to use SQL Server and the ADO.NET Entity Framework.
As you've explicitly mentioned OO and that you're using Entity Framework, perhaps it's worth approaching the problem from how the framework is intended to work, rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance: Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies you could pick from.
As for adding unnecessary overhead to the database, I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly, and because it has to cope with the general case, it doesn't always produce the most efficient SQL. If performance is a problem after your application is built, working and correct, you can revisit it and fix the most inefficient parts then.
If the same person can appear in more than one of the mentioned tables, then yes, you should separate the shared data out into a Persons table.
If you are only tracking which role each person has (e.g. Student vs. Teacher), then you might consider having just three tables: Persons, Roles, and a bridge table PersonRoles.
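A sketch of the Persons / Roles / PersonRoles layout, in SQLite via Python's sqlite3 (the DDL would look much the same on SQL Server). Column names and sample rows are hypothetical. The composite primary key on the bridge table stops the same role from being assigned twice, and a person's name is stored once however many roles they hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (
    person_id  INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE roles (
    role_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
-- Bridge table: one row per (person, role); the composite PK blocks duplicates.
CREATE TABLE person_roles (
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    role_id   INTEGER NOT NULL REFERENCES roles(role_id),
    PRIMARY KEY (person_id, role_id)
);
INSERT INTO persons VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO roles VALUES (1, 'Student'), (2, 'Teacher'), (3, 'Admin');
-- Ann is both a teacher and an admin; her name is still stored once.
INSERT INTO person_roles VALUES (1, 2), (1, 3), (2, 1);
""")

rows = conn.execute("""
SELECT p.first_name, r.name
FROM persons p
JOIN person_roles pr ON pr.person_id = p.person_id
JOIN roles r ON r.role_id = pr.role_id
ORDER BY p.first_name, r.name
""").fetchall()
```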
On the other hand, if each role has its own unique fields, then you should carry on as you are and keep each of these tables separate, with a foreign key of PersonID.
If the attributes (e.g. First Name, Last Name, Gender) of these entities (Students, Teachers, Admins and Personnel) are exactly the same, then you could make a single table for all of them, with a PersonType or Role attribute added to distinguish each person's role. However, if the entities have a lot of different attributes, it would be better to create separate tables; otherwise you will have normalization problems.
Yes, that is a very bad way of structuring a DB. The DB structure should be designed based on normalization.
Please check the normal forms.
You should avoid duplicate data as much as possible; otherwise the queries will become slower.
And the main problem comes when you are trying to get data that is associated with more than one or two tables.

When I should use one to one relationship?

Sorry for the noob question, but is there any real need to use one-to-one relationships between tables in a database? You can implement all the necessary fields inside one table. Even if the data becomes very large, you can list the columns you need in the SELECT statement instead of using SELECT *. When do you really need this separation?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table, with the "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting off the "0..1" portion into a separate table might bring some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so the "cut-off point" where this strategy becomes viable can vary considerably.
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so that only the narrower, more frequently used portion of the rows gets cached; more of them can then fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only whole rows. By splitting the row, you allow a lock to be taken on only one of its halves.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
My 2 cents.
I work in a place where we all develop one large application, and everything is a module. For example, we have a users table, a module that adds Facebook details for a user, and another module that adds Twitter details for a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds its own table with a 1:1 relationship to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...);
create table users_twdata ( id int primary key, ..., constraint users foreign key ...);
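A runnable version of this module pattern, in SQLite via Python's sqlite3. The module columns (fb_handle, tw_handle) and the ON DELETE CASCADE clauses are hypothetical, chosen only for illustration in place of the "..." above. Because each module's table hangs off users by the shared id, unplugging a module amounts to dropping its table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Each module owns a 1:1 extension table whose PK is also the FK to users.
CREATE TABLE users_fbdata (
    id        INTEGER PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    fb_handle TEXT            -- hypothetical module column
);
CREATE TABLE users_twdata (
    id        INTEGER PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    tw_handle TEXT            -- hypothetical module column
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO users_fbdata VALUES (1, 'ann.fb');
INSERT INTO users_twdata VALUES (1, '@ann'), (2, '@bob');
""")

# Unplugging the Facebook module: drop its table; users itself is untouched.
conn.execute("DROP TABLE users_fbdata")
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Deleting a user cascades into the remaining module tables.
conn.execute("DELETE FROM users WHERE id = 2")
tw_ids = [uid for (uid,) in conn.execute("SELECT id FROM users_twdata")]
```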
If you place two one-to-one tables in one, it's likely you'll have semantic issues. For example, if every device has one remote controller, it doesn't sound quite right to place the device and the remote controller, with their bunch of characteristics, in one table. You might even have to spend time figuring out whether a certain attribute belongs to the device or to the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes and only 4 of them are used occasionally, it makes sense to break the table into two tables for performance reasons.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!
If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in the future, if the separate data also needs to be related to some other entity.
The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time, so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is a contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.
First, I think it is a question of modelling and of defining what constitutes a separate entity. Suppose you have customers with one and only one address. Of course you could implement everything in a single customer table, but if in the future you allow a customer to have two or more addresses, you will need to refactor that (not a problem, but make it a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, that you have customers with a single address each, but this time having an address is optional. Of course you could implement that as a bunch of NULL-able columns such as ZIP, state, street. But suppose that, given that an address exists, the state is mandatory but the ZIP is not. How do you model that in a single table? You could use a constraint on the customer table, but it is much easier to split the address into another table and make the foreign key NULL-able. That way your model says much more explicitly that the address entity is optional, and that ZIP is an optional attribute of that entity.
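A sketch of this optional-address design, in SQLite via Python's sqlite3 with hypothetical names. Inside address, state is NOT NULL and zip is NULL-able; the address as a whole is optional because customer.address_id is a NULL-able foreign key. The UNIQUE on address_id is an added assumption that keeps each address tied to at most one customer, i.e. 1 to 0..1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Within an address, state is mandatory and zip is optional.
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    street     TEXT,
    state      TEXT NOT NULL,
    zip        TEXT
);
-- The address itself is optional: a NULL-able FK (UNIQUE keeps it 1 to 0..1).
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address_id  INTEGER UNIQUE REFERENCES address(address_id)
);
INSERT INTO address VALUES (1, 'Main St', 'OH', NULL);
INSERT INTO customer VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# An address without a state is rejected by the address table itself.
try:
    conn.execute("INSERT INTO address VALUES (2, 'Elm St', NULL, '12345')")
    stateless_rejected = False
except sqlite3.IntegrityError:
    stateless_rejected = True

rows = conn.execute("""
SELECT c.name, a.state, a.zip
FROM customer c LEFT JOIN address a ON a.address_id = c.address_id
ORDER BY c.customer_id
""").fetchall()
```

For comparison, the single-table version would need a CHECK along the lines of `(street IS NULL AND state IS NULL AND zip IS NULL) OR state IS NOT NULL`, which states the same rule far less directly.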
Not very often.
You may find some benefit if you need to implement some security, so that some users can see some of the columns (table1) but not others (table2).
Of course, some databases (Oracle) allow you to do this kind of security within the same table, but others may not.
You are referring to database normalization. One example I can think of, in an application that I maintain, is Items. The application allows the user to sell many different types of items (e.g. InventoryItems, NonInventoryItems, ServiceItems, etc.). While I could store all of the fields required by every item in one Items table, it is much easier to maintain a base Item table that contains the fields common to all items, plus separate tables for each item type (e.g. Inventory, NonInventory, etc.) which contain the fields specific to that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below is an article on normalization.
http://support.microsoft.com/kb/283878
As with all design questions the answer is "it depends."
There are a few considerations:
How large will the table get (both in terms of fields and rows)? It can be inconvenient, from both a maintenance and a programming perspective, to house your users' names and passwords with other, less commonly used data.
Fields in the combined table which have constraints could become cumbersome to manage over time. For example, if a trigger needs to fire for a specific field, it's going to fire for every update to the table, regardless of whether that field was affected.
How certain are you that the relationship will be 1:1? As this question points out, things can get complicated quickly.
Another use case: you might import data from some source and update it daily, e.g. information about books, and then add data of your own about some of the books. Then it makes sense to keep the imported data in a separate table from your own data.
I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.
The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff member and/or a customer. Staff and customer inherit the person attributes. The advantage is that if a person is both a staff member AND a customer, their details are stored only once, in the generic person table. The child tables hold details specific to staff and customers.
In my programming experience I have encountered this in only one situation: when there is both a 1-to-many and a 1-to-1 relationship between the same two entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time, so the relationship between the concepts of Car and Driver would be 1 to 1. (I borrowed this example from Steve Fenton's answer.)
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.
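Both relationships can be held in one place, sketched here in SQLite via Python's sqlite3 with hypothetical names: an assignment table records the full car/driver history (the 1-to-many or many-to-many side), and two partial unique indexes make the "current" subset 1 to 1. Partial indexes are a technique swapped in for this sketch (SQLite and PostgreSQL support them; SQL Server calls them filtered indexes); a UNIQUE current_driver_id column on the car table would be another way to model the current driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE car    (car_id    INTEGER PRIMARY KEY);
CREATE TABLE driver (driver_id INTEGER PRIMARY KEY);
-- Full history: over time a car has many drivers and a driver many cars.
CREATE TABLE assignment (
    car_id     INTEGER NOT NULL REFERENCES car(car_id),
    driver_id  INTEGER NOT NULL REFERENCES driver(driver_id),
    started_at TEXT    NOT NULL,
    is_current INTEGER NOT NULL DEFAULT 0 CHECK (is_current IN (0, 1))
);
-- Among current rows only, each car and each driver may appear at most once.
CREATE UNIQUE INDEX one_current_driver_per_car
    ON assignment(car_id) WHERE is_current = 1;
CREATE UNIQUE INDEX one_current_car_per_driver
    ON assignment(driver_id) WHERE is_current = 1;
INSERT INTO car VALUES (1), (2);
INSERT INTO driver VALUES (7);
INSERT INTO assignment VALUES (1, 7, '2024-01-01', 0);  -- past: drove car 1
INSERT INTO assignment VALUES (2, 7, '2024-02-01', 1);  -- now drives car 2
""")

# Driver 7 cannot also be the current driver of car 1.
try:
    conn.execute("INSERT INTO assignment VALUES (1, 7, '2024-03-01', 1)")
    double_booked = True
except sqlite3.IntegrityError:
    double_booked = False

current = conn.execute(
    "SELECT driver_id FROM assignment WHERE car_id = ? AND is_current = 1", (2,)
).fetchone()
history = conn.execute(
    "SELECT COUNT(*) FROM assignment WHERE driver_id = 7"
).fetchone()[0]
```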
Another use case might be when the maximum number of columns in a database table is exceeded. Then you could join another table through a one-to-one (OneToOne) relationship.

Use one table or multiple tables for multiple client software system?

This question may answer itself, but it is also a question of best practices.
I am designing an application that allows users (companies) to create an account. Those users are placed in a table "Shop_table". Each shop has dynamic data, but the tables would be the same for each shop, like shop_employees, shop_info, shop_data.
Would it be more effective to have specific tables for each shop, or should I just link their data by the shop id?
For example:
shop: Dunkins with id:1
shop: Starbucks with id:2
would dunkins have its own dunkins_shop_employees, dunkins_shop_info, dunkins_shop_data tables
and Starbucks have its own starbucks_shop_employees , starbucks_shop_info , starbucks_shop_data
or would I have one table each for shop_employees, shop_info, shop_data and link by id 1 or 2, etc.?
Definitely one table for each entity with a field to identify the company.
If all the companies have the same information there is no need to create tables for each, and if you did your queries will become a nightmare.
Do you really want a load of UNION queries in order to get any aggregate data across companies? You will also have to modify all queries in your DB as soon as another company (and therefore multiple tables) are added.
Define your tables independently: model the entities you want to store and don't think about who they belong to.
You should have only one table ( for each shop_info etc.. )
Creating similar tables is a maintenance nightmare. You will need to create similar foreign keys, similar constraints, similar indexes, etc.
If your concern is privacy, this should be controlled in your application. Your application should always add a WHERE clause based on who is logged in or querying.
If you absolutely need to, you can create views with a WHERE clause on shop_id and give various people rights on the views only. This would only make sense if you had a big customer who wanted some SQL-level query ability.
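A sketch of the one-table-per-entity design, in SQLite via Python's sqlite3 with hypothetical names. Every row carries shop_id, the application scopes each query with a parameterized WHERE clause, and aggregates across shops need no UNIONs. SQLite has no GRANT, so the per-shop view only illustrates the shape; on SQL Server you would grant SELECT on such a view to the relevant database user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE shop (shop_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One shared table for all shops; shop_id records which shop owns each row.
CREATE TABLE shop_employees (
    employee_id INTEGER PRIMARY KEY,
    shop_id     INTEGER NOT NULL REFERENCES shop(shop_id),
    name        TEXT NOT NULL
);
CREATE INDEX shop_employees_by_shop ON shop_employees(shop_id);
-- Per-shop view (rights would be granted on this view, not the base table).
CREATE VIEW dunkins_employees AS
    SELECT employee_id, name FROM shop_employees WHERE shop_id = 1;
INSERT INTO shop VALUES (1, 'Dunkins'), (2, 'Starbucks');
INSERT INTO shop_employees (shop_id, name) VALUES (1, 'Ann'), (1, 'Bob'), (2, 'Cy');
""")

def employees_for(current_shop_id):
    """Every query is scoped to the shop of whoever is logged in."""
    return [name for (name,) in conn.execute(
        "SELECT name FROM shop_employees WHERE shop_id = ? ORDER BY name",
        (current_shop_id,))]

# Aggregates across all shops need no UNIONs.
counts = conn.execute(
    "SELECT shop_id, COUNT(*) FROM shop_employees GROUP BY shop_id ORDER BY shop_id"
).fetchall()
```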

Many tables to a single row in relational database

Consider we have a database that has a table, which is a record of a sale. You sell both products and services, so you also have a product and service table.
Each sale can either be a product or a service, which leaves the options for designing the database to be something like the following:
Add columns for each type, ie. add Service_id and Product_id to Invoice_Row, both columns of which are nullable. If they're both null, it's an ad-hoc charge not relating to anything, but if one of them is satisfied then it is a row relating to that type.
Add a weird string/id based system, for instance: Type_table, Type_id. This would be a string/varchar and integer respectively, the former would contain for example 'Service', and the latter the id within the Service table. This is obviously loose coupling and horrible, but is a way of solving it so long as you're only accessing the DB from code, as such.
Abstract out the concept of "something that is chargeable" with new tables, of which Product and Service are now specializations, and on the Invoice_Row table link to something like ChargeableEntity_id. However, the ChargeableEntity table here would essentially be redundant, as it too would need some way to link to an abstract "backend" table, which brings us all the way back around to the same problem.
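Option 1 above (two nullable FK columns) can be tightened with a CHECK constraint, sketched here in SQLite via Python's sqlite3 with names adapted from the question. The CHECK is an addition of this sketch: it allows at most one of product_id and service_id to be set, so both NULL still means an ad-hoc charge. Unlike the Type_table/Type_id string approach, both columns keep real foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE service (service_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice_row (
    invoice_row_id INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES product(product_id),
    service_id     INTEGER REFERENCES service(service_id),
    amount         NUMERIC NOT NULL,
    -- At most one target; both NULL means an ad-hoc charge.
    CHECK (product_id IS NULL OR service_id IS NULL)
);
INSERT INTO product VALUES (1, 'Widget');
INSERT INTO service VALUES (1, 'Installation');
INSERT INTO invoice_row VALUES (1, 1, NULL, 9.5), (2, NULL, 1, 40), (3, NULL, NULL, 5);
""")

# A row pointing at both a product and a service violates the CHECK.
try:
    conn.execute("INSERT INTO invoice_row VALUES (4, 1, 1, 10)")
    both_rejected = False
except sqlite3.IntegrityError:
    both_rejected = True

kinds = conn.execute("""
SELECT invoice_row_id,
       CASE WHEN product_id IS NOT NULL THEN 'product'
            WHEN service_id IS NOT NULL THEN 'service'
            ELSE 'ad-hoc' END
FROM invoice_row
ORDER BY invoice_row_id
""").fetchall()
```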
Which way would you choose, or what are the other alternatives to solving this problem?
What you are essentially asking is how to achieve polymorphism in a relational database. There are many approaches to this problem (as you yourself demonstrate). One solution is to use "table per class" inheritance. In this setup, there is a parent table (akin to your "chargeable item") that contains a unique identifier and the fields common to both products and services. There are two child tables, products and services: each contains the unique identifier for that entity and the fields specific to it.
One benefit to this approach over others is you don't end up with one table with many nullable columns that essentially becomes a dumping ground to describe anything ("schema-less").
One downside is as your inheritance hierarchy grows, the number of joins needed to grab all the data for an entity also grows.
I believe it depends on use case(s).
You could put the common columns in one table and put the product- and service-specific columns in their own tables. The catch here is that you need to join things.
Alternatively, you maintain two separate tables, one for Product and another for Service, and use application logic to determine which table to insert into. Getting all sales then essentially means the union of all product sales and all service sales.
Personally, I would go for approach 2, to avoid joins and inserting into two tables whenever a sale is made.

Table "Inheritance" in SQL Server

I am currently looking at restructuring our contact management database, and I wanted to hear people's opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address; however, in the address table you must store the contact's type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub-tables (in this case, one per contact type). From there you enforce integrity so that a sub-table row must have a master equivalent where its type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
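One common way to enforce this kind of table inheritance, sketched in SQLite via Python's sqlite3. The names and details are hypothetical and not taken from the linked article, though the idea carries over to SQL Server. The master table exposes UNIQUE (contact_id, contact_type); each sub-table pins its type with a CHECK and references the master through that composite key, so a sub-table row cannot attach to a master row of another type, while address references contact_id alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE contact (
    contact_id   INTEGER PRIMARY KEY,
    contact_type TEXT NOT NULL CHECK (contact_type IN ('Person', 'Company')),
    UNIQUE (contact_id, contact_type)  -- target for the composite FKs below
);
CREATE TABLE person (
    contact_id   INTEGER PRIMARY KEY,
    contact_type TEXT NOT NULL DEFAULT 'Person' CHECK (contact_type = 'Person'),
    birthday     TEXT,
    FOREIGN KEY (contact_id, contact_type) REFERENCES contact(contact_id, contact_type)
);
CREATE TABLE company (
    contact_id   INTEGER PRIMARY KEY,
    contact_type TEXT NOT NULL DEFAULT 'Company' CHECK (contact_type = 'Company'),
    reg_number   TEXT,
    FOREIGN KEY (contact_id, contact_type) REFERENCES contact(contact_id, contact_type)
);
-- Addresses join on contact_id alone: no type column needed.
CREATE TABLE address (
    address_id INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES contact(contact_id),
    street     TEXT
);
INSERT INTO contact VALUES (1, 'Person'), (2, 'Company');
INSERT INTO person (contact_id, birthday) VALUES (1, '1990-05-01');
INSERT INTO company (contact_id, reg_number) VALUES (2, 'X-42');
INSERT INTO address (contact_id, street) VALUES (1, 'Main St'), (2, 'Dock Rd');
""")

# Contact 2 is a Company, so it cannot also get a Person sub-row.
try:
    conn.execute("INSERT INTO person (contact_id) VALUES (2)")
    wrong_type_rejected = False
except sqlite3.IntegrityError:
    wrong_type_rejected = True

rows = conn.execute("""
SELECT c.contact_type, a.street
FROM address a JOIN contact c ON c.contact_id = a.contact_id
ORDER BY a.address_id
""").fetchall()
```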
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 2005 and 2008, should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the specific information relating to that report type. Each of these tables has the same primary key as the master and also references the master.
My idea in splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid a single table with a huge number of columns, which gets very difficult to maintain as you're constantly adding columns.
My design with table inheritance gave me segmented data and expandability without being difficult to maintain. The only thing I had to do was write a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a slightly more complicated save method.
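A sketch of such a save method, in SQLite via Python's sqlite3. The report and pressure_test_report tables and the save_pressure_report function are hypothetical. The point is to run both INSERTs in one transaction, so that a failure in the detail row also rolls back the master row; `with conn:` commits on success and rolls back on an exception. SQL Server would use an IDENTITY column with SCOPE_IDENTITY() (or an OUTPUT clause) where this sketch uses AUTOINCREMENT and lastrowid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE report (
    report_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    customer    TEXT NOT NULL,
    report_date TEXT NOT NULL
);
-- One detail table per report type, sharing the master's primary key.
CREATE TABLE pressure_test_report (
    report_id        INTEGER PRIMARY KEY REFERENCES report(report_id),
    max_pressure_bar REAL NOT NULL
);
""")

def save_pressure_report(customer, report_date, max_pressure_bar):
    """Hypothetical save method: writes master and detail rows atomically."""
    with conn:  # commit on success, roll back if either INSERT fails
        cur = conn.execute(
            "INSERT INTO report (customer, report_date) VALUES (?, ?)",
            (customer, report_date))
        report_id = cur.lastrowid  # key generated for the master row
        conn.execute(
            "INSERT INTO pressure_test_report (report_id, max_pressure_bar) VALUES (?, ?)",
            (report_id, max_pressure_bar))
    return report_id

rid = save_pressure_report("ACME", "2024-03-01", 12.5)

# The detail row violates NOT NULL, so the master row is rolled back too.
try:
    save_pressure_report("Beta", "2024-03-02", None)
    failed = False
except sqlite3.IntegrityError:
    failed = True
report_count = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]
```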
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then, for the more specific fields (say, birthday for people and department for position#company), create separate tables that are associated with that contact.
Contact Table
--------------
ContactId
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking about things, though: instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand. If you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc.
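A sketch of that functional view, in SQLite via Python's sqlite3 with hypothetical names: the birthday lives in its own table keyed by contact, so the birthday-card query is a plain join that does not care whether a contact is a person or a company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT);
-- Optional, feature-specific data lives in its own table keyed by contact.
CREATE TABLE contact_birthday (
    contact_id INTEGER PRIMARY KEY REFERENCES contact(contact_id),
    birthday   TEXT NOT NULL
);
INSERT INTO contact VALUES (1, 'Ann', '555-0100'), (2, 'Acme Ltd', '555-0199');
INSERT INTO contact_birthday VALUES (1, '1990-05-01');
""")

# Birthday cards: every contact that has a birthday, whatever kind it is.
card_list = conn.execute("""
SELECT c.name, b.birthday
FROM contact c JOIN contact_birthday b ON b.contact_id = c.contact_id
""").fetchall()
```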
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table? I.e., a person has an address, name, etc.; then, as each use case (student, teacher) presents itself, a PersonType table links an entry in the person table to n types, with subsequent new tables (teacher, alien, singer) added as the system evolves.
