DB design for master file in enterprise software

DB design for master file in enterprise software - database

I want to write an enterprise software and now I'm in the DB design phase. The software will have some master data such as Suppliers, Customers, Inventories, Bankers...
I considering 2 options:
Put each of these on one separate table. The advantage: the table will have all necessary information for that kind of master file (Customer: name, address,.../Inventory: Type, Manufacturer, Condition...). Disadvantage: Not flexible. When I want to have a new type of master data, such as Insurer, I have to design another table.
Put all in one table and this table have foreign key to another table which have type of each kind of master data (table 1: id, data_type, code, name, address....; table 2: data_type, data_type_name). Advantage: flexible - if I want more master data such as Insurer, I just put in table 2: code: 002, name: Insurer, and then put detail each insurer into table 1). Disadvantage: table 1 must have sufficient field to store all kind of information including: customer name, address, account, inventory's manufacturer, inventory's quality...).
So which method do you usually do (or you think work better).
Thank you very much

I would advise creating separate tables for each entity type - it will be a lot easier to maintain in the future when you discover things you want to add for one entity type that don't apply to the others. If all of the entities (Suppliers, Customers, etc) are going to have the same fields and the only difference is their type then you could theoretically use one table. However, I would expect that there would be enough differences between the entity types that it would be worth creating separate tables for each. If there are several fields in common (e.g. address information) you could create a table for the common elements and have a foreign key in the individual tables to the table with the common data (e.g. AddressID).

logically, each "master" entity should be in its own table
if you don't, you'll find joins will become very painful, and your generic lookup table will accumulate all kinds of useless fields

Related

A Master Category Table Where Records Have Various Categories OR There Should Be A Table For Each Category Type

Recently I encountered an application, Where a Master Table is maintained which contain the data of more than 20 categories. For e.g. it has some categories named as Country,State and City.
So my question is, it is better to move out this category as a separate table and fetching out the data through joins or Everything should be inside a single table.
P.S. In future categories count might increase to 50+ or more than it.
P.S. application based on EF6 + Sql Server.
Edited Version
I just want to know that in above scenario what should be the best approach, one should go with single table with proper indexing or go by the DB normalization approach, putting each category into a separate Table and maintaning relationship through fk's.

Normally, categories are put into separate tables. This conforms more closely with normalized database structures and the definition of entities. In particular, it allows for proper foreign key relationships to be defined. That is a big win for data integrity.
Sometimes categories are put into a single table. This can, of course, be confusing; consider, for instance, "Florida, Massachusetts" or "Washington, Iowa" (these are real places).
Putting categories in one table has one major advantage: all the text is in a single location. That can be very handy for internationalization efforts. To be honest, that is the situation where I have seen this used.

When I should use one to one relationship?

Sorry for that noob question but is there any real needs to use one-to-one relationship with tables in your database? You can implement all necessary fields inside one table. Even if data becomes very large you can enumerate column names that you need in SELECT statement instead of using SELECT *. When do you really need this separation?

1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting-off the "0..1" portion into a separate table might save some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so a "cut-off point" where this strategy becomes viable can vary considerably.
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end-up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion or rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only the whole rows. By splitting the row, you are allowing a lock to take place on only one of its halfs.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?

My 2 cents.
I work in a place where we all develop in a large application, and everything is a module. For example, we have a users table, and we have a module that adds facebook details for a user, another module that adds twitter details to a user. We could decide to unplug one of those modules and remove all its functionality from our application. In this case, every module adds their own table with 1:1 relationships to the global users table, like this:
create table users ( id int primary key, ...);
create table users_fbdata ( id int primary key, ..., constraint users foreign key ...)
create table users_twdata ( id int primary key, ..., constraint users foreign key ...)

If you place two one-to-one tables in one, its likely you'll have semantics issue. For example, if every device has one remote controller, it doesn't sound quite good to place the device and the remote controller with their bunch of characteristics in one table. You might even have to spend time figuring out if a certain attribute belongs to the device or the remote controller.
There might be cases, when half of your columns will stay empty for a long while, or will not ever be filled in. For example, a car could have one trailer with a bunch of characteristics, or might have none. So you'll have lots of unused attributes.
If your table has 20 attributes, and only 4 of them are used occasionally, it makes sense to break the table into 2 tables for performance issues.
In such cases it isn't good to have everything in one table. Besides, it isn't easy to deal with a table that has 45 columns!

If data in one table is related to, but does not 'belong' to the entity described by the other, then that's a candidate to keep it separate.
This could provide advantages in future, if the separate data needs to be related to some other entity, also.

The most sensible time to use this would be if there were two separate concepts that would only ever relate in this way. For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. I accept that this is contrived example to demonstrate the point.
Another reason is that you want to specialize a concept in different ways. If you have a Person table and want to add the concept of different types of Person, such as Employee, Customer, Shareholder - each one of these would need different sets of data. The data that is similar between them would be on the Person table, the specialist information would be on the specific tables for Customer, Shareholder, Employee.
Some database engines struggle to efficiently add a new column to a very large table (many rows) and I have seen extension-tables used to contain the new column, rather than the new column being added to the original table. This is one of the more suspect uses of additional tables.
You may also decide to divide the data for a single concept between two different tables for performance or readability issues, but this is a reasonably special case if you are starting from scratch - these issues will show themselves later.

First, I think it is a question of modelling and defining what consist a separate entity. Suppose you have customers with one and only one single address. Of course you could implement everything in a single table customer, but if, in the future you allow him to have 2 or more addresses, then you will need to refactor that (not a problem, but take a conscious decision).
I can also think of an interesting case not mentioned in other answers where splitting the table could be useful:
Imagine, again, you have customers with a single address each, but this time it is optional to have an address. Of course you could implement that as a bunch of NULL-able columns such as ZIP,state,street. But suppose that given that you do have an address the state is not optional, but the ZIP is. How to model that in a single table? You could use a constraint on the customer table, but it is much easier to divide in another table and make the foreign_key NULLable. That way your model is much more explicit in saying that the entity address is optional, and that ZIP is an optional attribute of that entity.

not very often.
you may find some benefit if you need to implement some security - so some users can see some of the columns (table1) but not others (table2)..
of course some databases (Oracle) allow you to do this kind of security in the same table, but some others may not.

You are referring to database normalization. One example that I can think of in an application that I maintain is Items. The application allows the user to sell many different types of items (i.e. InventoryItems, NonInventoryItems, ServiceItems, etc...). While I could store all of the fields required by every item in one Items table, it is much easier to maintain to have a base Item table that contains fields common to all items and then separate tables for each item type (i.e. Inventory, NonInventory, etc..) which contain fields specific to only that item type. Then, the item table would have a foreign key to the specific item type that it represents. The relationship between the specific item tables and the base item table would be one-to-one.
Below, is an article on normalization.
http://support.microsoft.com/kb/283878

As with all design questions the answer is "it depends."
There are few considerations:
how large will the table get (both in terms of fields and rows)? It can be inconvenient to house your users' name, password with other less commonly used data both from a maintenance and programming perspective
fields in the combined table which have constraints could become cumbersome to manage over time. for example, if a trigger needs to fire for a specific field, that's going to happen for every update to the table regardless of whether that field was affected.
how certain are you that the relationship will be 1:1? As This question points out, things get can complicated quickly.

Another use case can be the following: you might import data from some source and update it daily, e.g. information about books. Then, you add data yourself about some books. Then it makes sense to put the imported data in another table than your own data.

I normally encounter two general kinds of 1:1 relationship in practice:
IS-A relationships, also known as supertype/subtype relationships. This is when one kind of entity is actually a type of another entity (EntityA IS A EntityB). Examples:
Person entity, with separate entities for Accountant, Engineer, Salesperson, within the same company.
Item entity, with separate entities for Widget, RawMaterial, FinishedGood, etc.
Car entity, with separate entities for Truck, Sedan, etc.
In all these situations, the supertype entity (e.g. Person, Item or Car) would have the attributes common to all subtypes, and the subtype entities would have attributes unique to each subtype. The primary key of the subtype would be the same as that of the supertype.
"Boss" relationships. This is when a person is the unique boss or manager or supervisor of an organizational unit (department, company, etc.). When there is only one boss allowed for an organizational unit, then there is a 1:1 relationship between the person entity that represents the boss and the organizational unit entity.

The main time to use a one-to-one relationship is when inheritance is involved.
Below, a person can be a staff and/or a customer. The staff and customer inherit the person attributes. The advantage being if a person is a staff AND a customer their details are stored only once, in the generic person table. The child tables have details specific to staff and customers.

In my time of programming i encountered this only in one situation. Which is when there is a 1-to-many and an 1-to-1 relationship between the same 2 entities ("Entity A" and "Entity B").
When "Entity A" has multiple "Entity B" and "Entity B" has only 1 "Entity A"
and
"Entity A" has only 1 current "Entity B" and "Entity B" has only 1 "Entity A".
For example, a Car can only have one current Driver, and the Driver can only drive one car at a time - so the relationship between the concepts of Car and Driver would be 1 to 1. - I borrowed this example from #Steve Fenton's answer
Where a Driver can drive multiple Cars, just not at the same time. So the Car and Driver entities are 1-to-many or many-to-many. But if we need to know who the current driver is, then we also need the 1-to-1 relation.

Another use case might be if the maximum number of columns in the database table is exceeded. Then you could join another table using OneToOne

Many tables to a single row in relational database

Consider we have a database that has a table, which is a record of a sale. You sell both products and services, so you also have a product and service table.
Each sale can either be a product or a service, which leaves the options for designing the database to be something like the following:
Add columns for each type, ie. add Service_id and Product_id to Invoice_Row, both columns of which are nullable. If they're both null, it's an ad-hoc charge not relating to anything, but if one of them is satisfied then it is a row relating to that type.
Add a weird string/id based system, for instance: Type_table, Type_id. This would be a string/varchar and integer respectively, the former would contain for example 'Service', and the latter the id within the Service table. This is obviously loose coupling and horrible, but is a way of solving it so long as you're only accessing the DB from code, as such.
Abstract out the concept of "something that is chargeable" for with new tables, of which Product and Service now are an abstraction of, and on the Invoice_Row table you would link to something like ChargeableEntity_id. However, the ChargeableEntity table here would essentially be redundant as it too would need some way to link to an abstract "backend" table, which brings us all the way back around to the same problem.
Which way would you choose, or what are the other alternatives to solving this problem?

What you are essentially asking is how to achieve polymorphism in a relational database. There are many approaches (as you yourself demonstrate) to this problem. One solution is to use "table per class" inheritance. In this setup, there will be a parent table (akin to your "chargeable item") that contains a unique identifier and the fields that are common to both products and services. There will be two child tables, products and goods: Each will contain the unique identifier for that entity and the fields specific to it.
One benefit to this approach over others is you don't end up with one table with many nullable columns that essentially becomes a dumping ground to describe anything ("schema-less").
One downside is as your inheritance hierarchy grows, the number of joins needed to grab all the data for an entity also grows.

I believe it depends on use case(s).
You could put the common columns in one table and put product and service specific columns in its own tables.Here the deal is that you need to join stuff.
Else if you maintain two separate tables, one for Product and another for Sale. You use application logic to determine which table to insert into. And getting all sales will essentially mean , union of getting all products and getting all sale.
I would go for approach 2 personally to avoid joins and inserting into two tables whenever a sale is made.

Database Design for Asset Management

I'm developing an Asset management application.
Looking through the excel tracker that was being used previously, I was able to identify some attributes that were common to all categories of assets (basically non-technical attributes such as Purchase Order No. , Warranty Info etc.) for which I think I will make a separate table.
But when storing technical-attributes, there are many categories of assets for which I need only one or two additional attributes to be stored.
Should a make a single table for all these attributes and store NULLs wherever applicable or should I make a separate table each category containing just the asset ID and the addition columns? Which approach is better/more pragmatic?
Is cluttering the database with too many tables ok? I have around 10 such categories.

There are 3 known approaches to this:
Single table
In this model, you have a single table with all known columns, and allow them to be null for types that don't have that attribute. This gives you a simple database, and fairly simple SQL, but doesn't allow support for common features that relational databases give you, like insisting on non-null columns for a data type, or creating unique indices where that makes sense.
It also tends to lead to messy SQL, with developers forgetting over time what columns mean, so you could get a column being used for multiple purposes.
It does make it easy to join to other tables - so if you have an asset and a purchase related to that asset, the "purchase" table joins to the "asset" table on "assetID".
Table per subtype
In this case, you build a table for each subtype, and enforce the data characteristics of that subtype with not null, unique etc.
This creates a clearer separation of subtypes, and is less likely to degrade into big ball of mud, but makes joins very hard - to join from "purchase" to "asset", you have to know which table holds that particular asset.
Common table for common fields, table per subtype
In this model, you have a single table for the fields that are common between subtypes - you say you've identified this already - and have further tables for each subtype to store the unique attributes.
This solves the joining problem between "asset" and "purchase", keeps the data pretty self-describing.
It does mean client logic needs to implement the "join asset_master to asset_subtype" issue.
I prefer option 3 - it's the best trade-off between maintainability and managability.

Databases should be able to handle lots of columns and lots of tables, so both approaches should work from that perspective.
If you don't have any additional requirements, I'd use the single table approach. It is the easiest, and the only thing you are loosing is the ability to put not null constraints on the fields that exist only form some categories

What's more readable naming conventions for lookup tables?

We always name lookup tables - such as Countries,Cities,Regions ... etc - as below :
EntityName_LK OR LK_EntityName ( Countries_LK OR LK_Countries )
But I ask if any one have more better naming conversions for lookup tables ?
Edit:
We think to make postfix or prefix to solve like a conflict :
if we have User tables and lookup table for UserTypes (ID-Name) and we have a relation many to many between User & UserTypes that make us a table which we can name it like Types_For_User that may make confusion between UserTypes & Types_For_User So we like to make lookup table UserTypes to be like UserTypesLK to be obvious to all

Before you decide you need the "lookup" moniker, you should try to understand why you are designating some tables as "lookups" and not others. Each table should represent an entity unto itself.
What happens when a table that was designated as a "lookup" grows in scope and is no longer considered a "lookup"? You are either left with changing the table name which can be onerous or leaving it as is and having to explain to everyone that a given table isn't really a "lookup".
A common scenario mentioned in the comments related to a junction table. For example, suppose a User can have multiple "Types" which are expressed in a junction table with two foreign keys. Should that table be called User_UserTypes? To this scenario, I would first say that I prefer to use the suffix Member on the junction table. So we would have Users, UserTypes, UserTypeMembers. Secondly, the word "type" in this context is quite generic. Does a UserType really mean a Role? The term you use can make all the difference. If UserTypes are really Roles, then our table names become Users, Roles, RoleMembers which seems quite clear.

Here are two concerns for whether to use a prefix or suffix.
In a sorted list of tables, do you want the LK tables to be together or do you want all tables pertaining to EntityName to appear together
When programming in environments with auto-complete, are you likely to want to type "LK" to get the list of tables or the beginning of EntityName?
I think there are arguments for either, but I would choose to start with EntityName.

Every table can become a lookup table.
Consider that a person is a lookup in an Invoice table.
So in my opinion, tables should just be named the (singular) entity name, e.g. Person, Invoice.
What you do want is a standard for the column names and constraints, such as
FK_Invoice_Person (in table invoice, link to person)
PersonID or Person_ID (column in table invoice, linking to entity Person)
At the end of the day, it is all up to personal preference (if you can get away with dictating it) or team standards.
updated
If you have lookups that pertain only to entities, like Invoice_Terms which is a lookup from a list of 4 scenarios, then you could name it as Invoice_LK_Terms which would make it appear by name grouped under Invoice. Another way is to have a single lookup table for simple single-value lookups, separated by the function (table+column) it is for, e.g.
Lookups
Table | Column | Value

There is only one type of table and I don't believe there is any good reason for calling some tables "lookup" tables. Use a naming convention that works equally for every table.

One area where table naming conventions can help is data migration between environments. We often have to move data in lookup tables (which constrain values which may appear in other tables) along with schema changes, as these allowed value lists change. Currently we don't name lookup tables differently, but we are considering it to prevent the migration guy asking "which tables are lookup tables again?" every time.