How to Handle Optional Columns - database

My question is related to ServiceASpecificField and ServiceBSpecificField. I feel that these two fields are placed inappropriately because for all records of service A for all subscribers in SubscriberServiceMap table, ServiceBSpecificField will have null value and vice versa.
If I move these two fields in Subscribers table, then I will have another problem. All those subscribers who only avail service A will have null value in Subscribers.ServiceBSpecificField.
So what should be done ideally?

place check constraint on Service_A and _B tables like:
alter table Service_A add constraint chk_A check (ServiceID = 1);
alter table Service_B add constraint chk_B check (ServiceID = 2);
then jou can join like
select *
from SubscriberService as x
left join Service_A as a on (a.SubscriberID = x.SubscriberID and a.ServiceID = x.ServiceID)
left join Service_B as b on (b.SubscriberID = x.SubscriberID and b.ServiceID = x.ServiceID)

An easy way to do this is to ask yourself: Do the values of these columns vary according to the Subscription (SubscriberServiceMap table) or the Service?
If every subscriber of "Service A" has the same value for ServiceASpecificField, only then must you move this to the Services table.
How many such fields do you anticipate? ServiceASpecificField, ServiceBSpecificField, C, D... and so forth? If the number is sizable, you could go for an EAV model, which would involve the creation of another table.

This is a simple supertype-subtype issue which you can solve at 5NF, you do not need EAV or improved EAV or 6NF (the full and final correct EAV) for this. Since the value of ServiceAColumn is dependent on the specific subscriber's subscription to the service, then it has to be in the Associative table.
▶Normalised Data Model◀ (inline links do not work on some browsers/versions.)
Readers who are not familiar with the Relational Modelling Standard may find ▶IDEF1X Notation◀ useful.
This is an ordinary Relational Supertype-Subtype structure. This one is Exclusive: a Service is exclusively one Subtype.
The Relations and Subtypes are more explicit and more controlled in this model than in other answers. Eg. the FK Relations are specific to the Service Subtype, not the Service Supertype.
The Discriminator, which identifies which Subtype any Supertype row is, is the ServiceType. The ServiceType does not need to be repeated in the Subtypes, we known which subtype it is by the subtype table.
Unless you have millions of Services, a short code is a more appropriate PK than a meaningless number.
Other
You can lose the Id column in SubscriberService because it is 100% redundant and serves no purpose.
the PK for SubscriberService is (SubscriberId, ServiceId), unless you want duplicate rows.
Please change the column names: Subscriber.Id to SubscriberId; Service.Id to ServiceId. Never use Id as a column name. For PKs and FKs, alway use the full column name. The relevance of that will become clear to you when you start coding.
Sixth Normal Form or EAV
Adding columns and tables when adding new services which have new attributes, is well, necessary in a Relational database, and you retain a lot of control and integrity.
If you don't "want" to add new tables per new service then yes, go with EAV or 6NF, but make sure you have the normal controls (type safety) and Data and Referential Integrity available in Relational databases. EAV is often implemented without proper Relational controls and Integrity, which leads to many, many problems. Here is a question/answer on that subject. If you do go with that, and the Data Models in that question are not explanatory enough, let me know and I will give you a Data Model that is specific to your requirement (the DM I have provided above is pure 5NF because that is the full requirement for your original question).

If the value of ServiceSpecificField depends both on service and subscriber and for all subscriber-service pairs the type of the field - is the same (as I see in your example - varchar(50) for both fields), then I would update the SubscriberSerivceMap table only:
table SubscriberSerivceMap:
Id
SubscriberId
ServiceId
SpecificField
Example of such table:
Id SubscriberId Service Id SpecifiedField
1 1 1 sub1_serv1
2 1 2 sub1_serv2
3 2 1 sub2_serv1
4 2 2 sub2_serv2

Related

Database theory: best way to have a "flags" table which could apply to many entities?

I'm building a data model for a new project I'm developing. Part of this data model involves entities having "flags".
A flag is simply a boolean value - either an entity has the flag or it does not have the flag. To that end I have a table simply called "flags" that has an ID, a string name, and a description. (An example of a flag might be "is active" or "should be displayed" or "belongs to group".)
So for example, any user in my users table could have none, one, or many flags. So I create a userFlags bridge table with user ID and flag ID. If the table contains a row for the given flag ID and user ID, that user has that flag.
Ok, so now I add another entity - say "section". Each section can also have flags. So I create a sectionFlags table to accommodate this.
Now I have another entity - "content", so again, "contentFlags".
And so on.
My final data model has basically two tables per entity, one to hold the entity and one for flags.
While this certainly works, it seems like there may be a better way to design my model, so I don't have to have so many bridge tables. One idea I had was a master "hasFlags" table with flag ID, item ID and item type. The item type could be an enumerated field only accepting values corresponding to known entities. The only problem there is that my foreign key for the entity will not work because each "item ID" could refer to a different entity. (I have actually used this technique in other data models, and while it certainly works, you lose referential integrity as well as things like cascade updates.)
Or, perhaps my data model is fine as-is and that's just the nature of the beast.
Any more-advanced experienced DB devs care to chime in?
The many-to-many relationships are one way to do it (and possibly faster than what I'm about to suggest because they can use integer key indexes).
The other way to do this is with a polymorphic relationship.
Your entity-to-flag table needs 2 columns as well as the foreign key link to the flag table;
other_key integer not null
other_type varchar(...) not null
And in those fields you store the foreign key of the relation in the integer and the type of the relation in the varchar. Full-on ORMs that support this sometimes store the class name of the foreign relation in the type column, to aid with object loading.
The downside here is that the integer can't be a true foreign key as it will contain duplicates from many tables. It also makes your querying a bit more interesting per-join than the many-to-many tables, but it does allow you to generalise your join in code.

Linking an address table to multiple other tables

I have been asked to add a new address book table to our database (SQL Server 2012).
To simplify the related part of the database, there are three tables each linked to each other in a one to many fashion: Company (has many) Products (has many) Projects and the idea is that one or many addresses will be able to exist at any one of these levels. The thinking is that in the front-end system, a user will be able to view and select specific addresses for the project they specify and more generic addresses relating to its parent product and company.
The issue now if how best to model this in the database.
I have thought of two possible ideas so far so wonder if anyone has had a similar type of relationship to model themselves and how they implemented it?
Idea one:
The new address table will additionally contain three fields: companyID, productID and projectID. These fields will be related to the relevant tables and be nullable to represent company and product level addresses. e.g. companyID 2, productID 1, projectID NULL is a product level address.
My issue with this is that I am storing the relationship information in the table so if a project is ever changed to be related to a different product, the data in this table will be incorrect. I could potentially NULL all but the level I am interested in but this will make getting parent addresses a little harder to get
Idea two:
On the address table have a typeID and a genericID. genericID could contain the IDs from the Company, Product and Project tables with the typeID determining which table it came from. I am a little stuck how to set up the necessary constraints to do this though and wonder if this is going to get tricky to deal with in the future
Many thanks,
I will suggest using Idea one and preventing Idea two.
Second Idea is called Polymorphic Association anti pattern
Objective: Reference Multiple Parents
Resulting side effect: Using dual-purpose foreign key will violating first normal form (atomic issue), loosing referential integrity
Solution: Simplify the Relationship
The simplification of the relationship could be obtained in two ways:
Having multiple null-able forging keys (idea number 1): That will be
simple and applicable if the tables(product, project,...) that using
the relation are limited. (think about when they grow up to more)
Another more generic solution will be using inheritance. Defining a
new entity as the base table for (product, project,...) to satisfy
Addressable. May naming it organization-unit be more rational. Primary key of this organization_unit table will be the primary key of (product, project,...). Other collections like Address, Image, Contract ... tables will have a relation to this base table.
It sounds like you could use Junction tables http://en.wikipedia.org/wiki/Junction_table.
They will give you the flexibility you need to maintain your foreign key restraints, as well as share addresses between levels or entities if that is desired.
One for Company_Address, Product_Address, and Project_Address

Best approach to avoid Too many columns and complexity in database design

Inventory Items :
Paper Size
-----
A0
A1
A2
etc
Paper Weight
------------
80gsm
150gsm etc
Paper mode
----------
Colour
Bw
Paper type
-----------
glass
silk
normal
Tabdividers and tabdivider Type
--------
Binding and Binding Types
--
Laminate and laminate Types
--
Such Inventory items and these all needs to be stored in invoice table
How do you store them in Database using proper RDBMS.
As per my opinion for each list a master table and retrieval with JOINS. However this may be a little bit complex adding too many tables into the database.
This normalisation is having bit of problem when storing all this information against a Invoice. This is causing too many columns in invoice table.
Other way putting all of them into a one table with more columns and then each row will be a combination of them.. (hacking algorithm 4 list with 4 items over 24 records which will have reference ID).
Which one do you think the best and why!!
Your initial idea is correct. And anyone claiming that four tables is "a little bit complex" and/or "too many tables" shouldn't be doing database work. This is what RDBMS's are designed (and tuned) to do.
Each of these 4 items is an individual property of something so they can't simply be put, as is, into a table that merges them. As you had thought, you start with:
PaperSize
PaperWeight
PaperMode
PaperType
These are lookup tables and hence should have non-auto-incrementing ID fields.
These will be used as Foreign Key fields for the main paper-based entities.
Or if they can only exist in certain combinations, then there would need to be a relationship table to capture/manage what those valid combinations are. But those four paper "properties" would still be separate tables that Foreign Key to the relationship table. Some people would put an separate ID field on that relationship table to uniquely identify the combination via a single value. Personally, I wouldn't do that unless there was a technical requirement such as Replication (or some other process/feature) that required that each table had a single-field key. Instead, I would just make the PK out of the four ID fields that point to those paper "property" lookup tables. Then those four fields would still go into any paper-based entities. At that point the main paper entity tables would look about the same as they would if there wasn't the relationship table, the difference being that instead of having 4 FKs of a single ID field each, one to each of the paper "property" tables, there would be a single FK of 4 ID fields pointing back to the PK of the relationship table.
Why not jam everything into a single table? Because:
It defeats the purpose of using a Relational Database Management System to flatten out the data into a non-relational structure.
It is harder to grow that structure over time
It makes finding all paper entities of a particular property clunkier
It makes finding all paper entities of a particular property slower / less efficient
maybe other reasons?
EDIT:
Regarding the new info (e.g. Invoice Table, etc) that wasn't in the question when I was writing the above, that should be abstracted via a Product/Inventory table that would capture these combinations. That is what I was referring to as the main paper entities. The Invoice table would simply refer to a ProductID/InventoryID (just as an example) and the Product/Inventory table would have these paper property IDs. I don't see why these properties would be in an Invoice table.
EDIT2:
Regarding the IDs of the "property" lookup tables, one reason that they should not be auto-incrementing is that their values should be taken from Enums in the app layer. These lookup tables are just a means of providing a "data dictionary" so that the database layer can have insight into what these values mean.

Many tables to a single row in relational database

Consider we have a database that has a table, which is a record of a sale. You sell both products and services, so you also have a product and service table.
Each sale can either be a product or a service, which leaves the options for designing the database to be something like the following:
Add columns for each type, ie. add Service_id and Product_id to Invoice_Row, both columns of which are nullable. If they're both null, it's an ad-hoc charge not relating to anything, but if one of them is satisfied then it is a row relating to that type.
Add a weird string/id based system, for instance: Type_table, Type_id. This would be a string/varchar and integer respectively, the former would contain for example 'Service', and the latter the id within the Service table. This is obviously loose coupling and horrible, but is a way of solving it so long as you're only accessing the DB from code, as such.
Abstract out the concept of "something that is chargeable" for with new tables, of which Product and Service now are an abstraction of, and on the Invoice_Row table you would link to something like ChargeableEntity_id. However, the ChargeableEntity table here would essentially be redundant as it too would need some way to link to an abstract "backend" table, which brings us all the way back around to the same problem.
Which way would you choose, or what are the other alternatives to solving this problem?
What you are essentially asking is how to achieve polymorphism in a relational database. There are many approaches (as you yourself demonstrate) to this problem. One solution is to use "table per class" inheritance. In this setup, there will be a parent table (akin to your "chargeable item") that contains a unique identifier and the fields that are common to both products and services. There will be two child tables, products and goods: Each will contain the unique identifier for that entity and the fields specific to it.
One benefit to this approach over others is you don't end up with one table with many nullable columns that essentially becomes a dumping ground to describe anything ("schema-less").
One downside is as your inheritance hierarchy grows, the number of joins needed to grab all the data for an entity also grows.
I believe it depends on use case(s).
You could put the common columns in one table and put product and service specific columns in its own tables.Here the deal is that you need to join stuff.
Else if you maintain two separate tables, one for Product and another for Sale. You use application logic to determine which table to insert into. And getting all sales will essentially mean , union of getting all products and getting all sale.
I would go for approach 2 personally to avoid joins and inserting into two tables whenever a sale is made.

What's more readable naming conventions for lookup tables?

We always name lookup tables - such as Countries,Cities,Regions ... etc - as below :
EntityName_LK OR LK_EntityName ( Countries_LK OR LK_Countries )
But I ask if any one have more better naming conversions for lookup tables ?
Edit:
We think to make postfix or prefix to solve like a conflict :
if we have User tables and lookup table for UserTypes (ID-Name) and we have a relation many to many between User & UserTypes that make us a table which we can name it like Types_For_User that may make confusion between UserTypes & Types_For_User So we like to make lookup table UserTypes to be like UserTypesLK to be obvious to all
Before you decide you need the "lookup" moniker, you should try to understand why you are designating some tables as "lookups" and not others. Each table should represent an entity unto itself.
What happens when a table that was designated as a "lookup" grows in scope and is no longer considered a "lookup"? You are either left with changing the table name which can be onerous or leaving it as is and having to explain to everyone that a given table isn't really a "lookup".
A common scenario mentioned in the comments related to a junction table. For example, suppose a User can have multiple "Types" which are expressed in a junction table with two foreign keys. Should that table be called User_UserTypes? To this scenario, I would first say that I prefer to use the suffix Member on the junction table. So we would have Users, UserTypes, UserTypeMembers. Secondly, the word "type" in this context is quite generic. Does a UserType really mean a Role? The term you use can make all the difference. If UserTypes are really Roles, then our table names become Users, Roles, RoleMembers which seems quite clear.
Here are two concerns for whether to use a prefix or suffix.
In a sorted list of tables, do you want the LK tables to be together or do you want all tables pertaining to EntityName to appear together
When programming in environments with auto-complete, are you likely to want to type "LK" to get the list of tables or the beginning of EntityName?
I think there are arguments for either, but I would choose to start with EntityName.
Every table can become a lookup table.
Consider that a person is a lookup in an Invoice table.
So in my opinion, tables should just be named the (singular) entity name, e.g. Person, Invoice.
What you do want is a standard for the column names and constraints, such as
FK_Invoice_Person (in table invoice, link to person)
PersonID or Person_ID (column in table invoice, linking to entity Person)
At the end of the day, it is all up to personal preference (if you can get away with dictating it) or team standards.
updated
If you have lookups that pertain only to entities, like Invoice_Terms which is a lookup from a list of 4 scenarios, then you could name it as Invoice_LK_Terms which would make it appear by name grouped under Invoice. Another way is to have a single lookup table for simple single-value lookups, separated by the function (table+column) it is for, e.g.
Lookups
Table | Column | Value
There is only one type of table and I don't believe there is any good reason for calling some tables "lookup" tables. Use a naming convention that works equally for every table.
One area where table naming conventions can help is data migration between environments. We often have to move data in lookup tables (which constrain values which may appear in other tables) along with schema changes, as these allowed value lists change. Currently we don't name lookup tables differently, but we are considering it to prevent the migration guy asking "which tables are lookup tables again?" every time.

Resources