I have a table called Documents which stores filenames, notes etc. about documents. The relevant fields are:
Document_ID - Autonumber
Document_Type_ID - FK to a lookup table Document_Types
Table_Unique_ID
The Table_Unique_ID relates to any of the other IDs used in other tables and we know which table by the relevant Document_Type_ID.
E.g.
Document_Type_ID = 1 relates to the Projects table, so a document record with a Table_Unique_ID of 1357 and Document_Type_ID of 1 means it relates to Project_ID = 1357.
Document_Type_ID = 2 relates to the Sites table, so a document record with Table_Unique_ID of 1357 and Document_Type_ID of 2 means a Site_ID of 1357
and so on.
This allows for great flexibility for what types of documents we hold for various record in any table, Projects, Sites, Contacts etc. rather than creating separate tables (Project_Documents, Site_Documents etc.).
BUT it's been pointed out that data integrity is harder (or impossible) to impose using traditional simple PK/FK relationships, since that 1357 could relate to either Projects or Sites.
Currently data integrity is handled by user interface checks.
The question is, can triggers or stored procedures help when inserting Document records or when deleting the 'other' records (Projects, Contacts etc.)?
If so, I would really appreciate being pointed in the right direction.
What is your primary objective? Data integrity, or flexibility and a clean design? These are conflicting interests. If you absolutely have to enforce integrity without triggers, you'll have to have an uglier design. Compromises always have to be made in database design. You'll get the data-integrity elitists harping about how bad this design is, but at the end of the day is more normalization worth it?
Related
Recently I encountered an application, Where a Master Table is maintained which contain the data of more than 20 categories. For e.g. it has some categories named as Country,State and City.
So my question is, it is better to move out this category as a separate table and fetching out the data through joins or Everything should be inside a single table.
P.S. In future categories count might increase to 50+ or more than it.
P.S. application based on EF6 + Sql Server.
Edited Version
I just want to know that in above scenario what should be the best approach, one should go with single table with proper indexing or go by the DB normalization approach, putting each category into a separate Table and maintaning relationship through fk's.
Normally, categories are put into separate tables. This conforms more closely with normalized database structures and the definition of entities. In particular, it allows for proper foreign key relationships to be defined. That is a big win for data integrity.
Sometimes categories are put into a single table. This can, of course, be confusing; consider, for instance, "Florida, Massachusetts" or "Washington, Iowa" (these are real places).
Putting categories in one table has one major advantage: all the text is in a single location. That can be very handy for internationalization efforts. To be honest, that is the situation where I have seen this used.
Inventory Items :
Paper Size
-----
A0
A1
A2
etc
Paper Weight
------------
80gsm
150gsm etc
Paper mode
----------
Colour
Bw
Paper type
-----------
glass
silk
normal
Tabdividers and tabdivider Type
--------
Binding and Binding Types
--
Laminate and laminate Types
--
Such Inventory items and these all needs to be stored in invoice table
How do you store them in Database using proper RDBMS.
As per my opinion for each list a master table and retrieval with JOINS. However this may be a little bit complex adding too many tables into the database.
This normalisation is having bit of problem when storing all this information against a Invoice. This is causing too many columns in invoice table.
Other way putting all of them into a one table with more columns and then each row will be a combination of them.. (hacking algorithm 4 list with 4 items over 24 records which will have reference ID).
Which one do you think the best and why!!
Your initial idea is correct. And anyone claiming that four tables is "a little bit complex" and/or "too many tables" shouldn't be doing database work. This is what RDBMS's are designed (and tuned) to do.
Each of these 4 items is an individual property of something so they can't simply be put, as is, into a table that merges them. As you had thought, you start with:
PaperSize
PaperWeight
PaperMode
PaperType
These are lookup tables and hence should have non-auto-incrementing ID fields.
These will be used as Foreign Key fields for the main paper-based entities.
Or if they can only exist in certain combinations, then there would need to be a relationship table to capture/manage what those valid combinations are. But those four paper "properties" would still be separate tables that Foreign Key to the relationship table. Some people would put an separate ID field on that relationship table to uniquely identify the combination via a single value. Personally, I wouldn't do that unless there was a technical requirement such as Replication (or some other process/feature) that required that each table had a single-field key. Instead, I would just make the PK out of the four ID fields that point to those paper "property" lookup tables. Then those four fields would still go into any paper-based entities. At that point the main paper entity tables would look about the same as they would if there wasn't the relationship table, the difference being that instead of having 4 FKs of a single ID field each, one to each of the paper "property" tables, there would be a single FK of 4 ID fields pointing back to the PK of the relationship table.
Why not jam everything into a single table? Because:
It defeats the purpose of using a Relational Database Management System to flatten out the data into a non-relational structure.
It is harder to grow that structure over time
It makes finding all paper entities of a particular property clunkier
It makes finding all paper entities of a particular property slower / less efficient
maybe other reasons?
EDIT:
Regarding the new info (e.g. Invoice Table, etc) that wasn't in the question when I was writing the above, that should be abstracted via a Product/Inventory table that would capture these combinations. That is what I was referring to as the main paper entities. The Invoice table would simply refer to a ProductID/InventoryID (just as an example) and the Product/Inventory table would have these paper property IDs. I don't see why these properties would be in an Invoice table.
EDIT2:
Regarding the IDs of the "property" lookup tables, one reason that they should not be auto-incrementing is that their values should be taken from Enums in the app layer. These lookup tables are just a means of providing a "data dictionary" so that the database layer can have insight into what these values mean.
In database design what are the feelings of tuple vs referencing table for small pieces of data?
For instance, supposing you are designing a schema involving office management. You want to record what department each employee belongs to, but are otherwise uninterested in any information relating to departments. So do you have department as a string/char/varchar/etc in your EMPLOYEE table, or have it instead be a foreign key, relating a DEPARTMENT table.
If the DEPARTMENT table is recording nothing other than department names, one would normally want to combine this with the EMPLOYEE table. But if this is contained in the EMPLOYEE table you cannot guarantee that some users will call HR "HumanResourses", some may call it "H-R", some may call it "human resources", etc. Having it as a foreign key guarantees that it can be only one thing. Also, if other information is ever to be added about departments, it would be easy if it is in a table of its own.
So what do people think about it? Naturally more tables and referencing is also likely to have a negative impact on performance. My question specifically is asked with Oracle 11g in mind, but I doubt that the type of rdms involved has much bearing on this design consideration.
If you use the related table, then you don't have the performance problem of updating 1,000,000 records because the Personnel Department became the Human Resources department.
You have another option. Create the table and use it as a lookup for data entry. But store the information in the main table.
However, I prefer the option of using the related table for the departments and storing the ID for the department and the employee in a join table that has the ids and start and endates. Over time employees tend to move from one department to another. It is helpful for reporting to be able to tell what department they were in when. You need to consider how the data will be used over time and in reporting when designing this sort of thing. Short-sighted designs are hard to fix later.
Your concern about having too many tables is really unfounded. Databases are designed to have many tables and to use joins. If you index correctly, there will not be preformance implications for most databases. And you know what,I know of realtional database with many many tables that have terrabytes of data that perform just fine.
You only have to worry about the performance impact of this sort of thing if you're dealing with truly massive datasets. For any regular office environment system like this, prefer the normalized schema.
I'm developing an Asset management application.
Looking through the excel tracker that was being used previously, I was able to identify some attributes that were common to all categories of assets (basically non-technical attributes such as Purchase Order No. , Warranty Info etc.) for which I think I will make a separate table.
But when storing technical-attributes, there are many categories of assets for which I need only one or two additional attributes to be stored.
Should a make a single table for all these attributes and store NULLs wherever applicable or should I make a separate table each category containing just the asset ID and the addition columns? Which approach is better/more pragmatic?
Is cluttering the database with too many tables ok? I have around 10 such categories.
There are 3 known approaches to this:
Single table
In this model, you have a single table with all known columns, and allow them to be null for types that don't have that attribute. This gives you a simple database, and fairly simple SQL, but doesn't allow support for common features that relational databases give you, like insisting on non-null columns for a data type, or creating unique indices where that makes sense.
It also tends to lead to messy SQL, with developers forgetting over time what columns mean, so you could get a column being used for multiple purposes.
It does make it easy to join to other tables - so if you have an asset and a purchase related to that asset, the "purchase" table joins to the "asset" table on "assetID".
Table per subtype
In this case, you build a table for each subtype, and enforce the data characteristics of that subtype with not null, unique etc.
This creates a clearer separation of subtypes, and is less likely to degrade into big ball of mud, but makes joins very hard - to join from "purchase" to "asset", you have to know which table holds that particular asset.
Common table for common fields, table per subtype
In this model, you have a single table for the fields that are common between subtypes - you say you've identified this already - and have further tables for each subtype to store the unique attributes.
This solves the joining problem between "asset" and "purchase", keeps the data pretty self-describing.
It does mean client logic needs to implement the "join asset_master to asset_subtype" issue.
I prefer option 3 - it's the best trade-off between maintainability and managability.
Databases should be able to handle lots of columns and lots of tables, so both approaches should work from that perspective.
If you don't have any additional requirements, I'd use the single table approach. It is the easiest, and the only thing you are loosing is the ability to put not null constraints on the fields that exist only form some categories
My question is related to ServiceASpecificField and ServiceBSpecificField. I feel that these two fields are placed inappropriately because for all records of service A for all subscribers in SubscriberServiceMap table, ServiceBSpecificField will have null value and vice versa.
If I move these two fields in Subscribers table, then I will have another problem. All those subscribers who only avail service A will have null value in Subscribers.ServiceBSpecificField.
So what should be done ideally?
place check constraint on Service_A and _B tables like:
alter table Service_A add constraint chk_A check (ServiceID = 1);
alter table Service_B add constraint chk_B check (ServiceID = 2);
then jou can join like
select *
from SubscriberService as x
left join Service_A as a on (a.SubscriberID = x.SubscriberID and a.ServiceID = x.ServiceID)
left join Service_B as b on (b.SubscriberID = x.SubscriberID and b.ServiceID = x.ServiceID)
An easy way to do this is to ask yourself: Do the values of these columns vary according to the Subscription (SubscriberServiceMap table) or the Service?
If every subscriber of "Service A" has the same value for ServiceASpecificField, only then must you move this to the Services table.
How many such fields do you anticipate? ServiceASpecificField, ServiceBSpecificField, C, D... and so forth? If the number is sizable, you could go for an EAV model, which would involve the creation of another table.
This is a simple supertype-subtype issue which you can solve at 5NF, you do not need EAV or improved EAV or 6NF (the full and final correct EAV) for this. Since the value of ServiceAColumn is dependent on the specific subscriber's subscription to the service, then it has to be in the Associative table.
▶Normalised Data Model◀ (inline links do not work on some browsers/versions.)
Readers who are not familiar with the Relational Modelling Standard may find ▶IDEF1X Notation◀ useful.
This is an ordinary Relational Supertype-Subtype structure. This one is Exclusive: a Service is exclusively one Subtype.
The Relations and Subtypes are more explicit and more controlled in this model than in other answers. Eg. the FK Relations are specific to the Service Subtype, not the Service Supertype.
The Discriminator, which identifies which Subtype any Supertype row is, is the ServiceType. The ServiceType does not need to be repeated in the Subtypes, we known which subtype it is by the subtype table.
Unless you have millions of Services, a short code is a more appropriate PK than a meaningless number.
Other
You can lose the Id column in SubscriberService because it is 100% redundant and serves no purpose.
the PK for SubscriberService is (SubscriberId, ServiceId), unless you want duplicate rows.
Please change the column names: Subscriber.Id to SubscriberId; Service.Id to ServiceId. Never use Id as a column name. For PKs and FKs, alway use the full column name. The relevance of that will become clear to you when you start coding.
Sixth Normal Form or EAV
Adding columns and tables when adding new services which have new attributes, is well, necessary in a Relational database, and you retain a lot of control and integrity.
If you don't "want" to add new tables per new service then yes, go with EAV or 6NF, but make sure you have the normal controls (type safety) and Data and Referential Integrity available in Relational databases. EAV is often implemented without proper Relational controls and Integrity, which leads to many, many problems. Here is a question/answer on that subject. If you do go with that, and the Data Models in that question are not explanatory enough, let me know and I will give you a Data Model that is specific to your requirement (the DM I have provided above is pure 5NF because that is the full requirement for your original question).
If the value of ServiceSpecificField depends both on service and subscriber and for all subscriber-service pairs the type of the field - is the same (as I see in your example - varchar(50) for both fields), then I would update the SubscriberSerivceMap table only:
table SubscriberSerivceMap:
Id
SubscriberId
ServiceId
SpecificField
Example of such table:
Id SubscriberId Service Id SpecifiedField
1 1 1 sub1_serv1
2 1 2 sub1_serv2
3 2 1 sub2_serv1
4 2 2 sub2_serv2