Eliminating redundant relationships when modeling Header/Detail relationships? - data-modeling

I've got a model that looks something like this:
One Account has many Branches, and each Statement is generated for one Account. The model is redundant because the Account (the AccountID on the header) can be inferred from the BranchID on a transaction (a statement will always have one or more transactions).
Should the AccountID be removed from the StatementHeader, or is this level of redundancy OK? Or is there a better solution?

If you have the StatementHeader then it should carry the AccountID to maintain referential integrity.
However it may be better to remove the StatementHeader completely and move the StatementDate into the Statement record. This would make things cleaner and make the model better describe what you want.
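For illustration, a minimal sketch of what that collapsed model could look like; the table and column names here are assumed from the question (LineNumber in particular is a made-up identifier), and the transaction-level columns are left out:

create table Statement (
    StatementID   integer not null,
    LineNumber    integer not null,  -- assumed identifier for each transaction row
    BranchID      integer not null references Branch (BranchID),
    StatementDate date    not null,  -- moved down from the former StatementHeader
    primary key (StatementID, LineNumber)
);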

As a statement is historical, and usually read-only, data, some redundancy is fine. I agree with Richard Harrison and would move both [AccountID] and [StatementDate] into the [Statement] table; my reasoning being that you say an account has many branches, so you will be generating a statement for an Account.
Storing all this data in the same place will reduce joins and speed up reporting, which I assume is the reason for this database.

Sometimes, (real or perceived) redundancy is a consequence of a business rule. In this case, the business rule is: "a statement that is issued to an account shall contain only transactions for branches that belong to that particular account."
To enforce that rule, you could try to come up with a database schema that makes it impossible to violate it, or enforce it explicitly with a constraint or trigger. And that seems to be easier with StatementHeader.AccountID. In Oracle, you could write something like this:
create or replace trigger statement_has_unique_account
before insert or update on Statement
referencing old as old new as new
for each row
declare
  m integer;
  n integer;
begin
  -- account that owns the branch on this transaction row
  select b.AccountID
    into m
    from Branch b
   where b.ID = :new.BranchID;
  -- account on the statement header
  select s.AccountID
    into n
    from StatementHeader s
   where s.ID = :new.StatementID;
  if m <> n then
    raise_application_error(-20000, 'No way!');
  end if;
end;
Without AccountID in StatementHeader, you'd have to compare against all the other AccountIDs from all other Statement rows that share the same StatementID, resulting in a more complicated sequence of statements.
So I would keep AccountID as a foreign key in StatementHeader and enforce the business rule explicitly with a trigger.

Related

Abstracting Differing Implementations With A Single Table

The problem I am trying to solve is not an overly complicated one, but it is one that I would like to solve more elegantly than I currently do.
Problem:
We do business with multiple companies. For the sake of argument, let's say each company produces motor vehicles. Each company has a differing implementation (i.e. data that must be persisted into a database). When a customer orders a car, you have no way of knowing what type of car they might buy, so it is desirable to have a single lookup table called 'Vehicles' that establishes the relationship between the CustomerId, a unique VehicleId (internal to our database and globally unique), and some sort of composite key which would be unique in one of the many CompanyX_Vehicle tables.
An example would be:
Top level lookup table:
VehicleId
CustomerId
CompanyId
CompanyVehicleId
CompanyAVehicle Table:
CompanyAVehicleId ------> Part of composite key
CompanyId ------> Part of composite key
...... unique implementation and persistence requirements.
CompanyBVehicle Table:
CompanyBVehicleId ------> Part of composite key
CompanyId ------> Part of composite key
...... unique implementation and persistence requirements.
I have to disable foreign key enforcement for obvious reasons; however, in code (in this case C# with EF), I can perform a single query and eagerly include the necessary data from the correct CompanyXVehicle table.
Alternatively, I can omit any kind of relationship and just perform two queries each and every time: one to get the company and CompanyVehicle IDs, and then a call into the necessary table.
However I have a feeling there is a better alternative to either of these solutions. Does anyone have a suggestion on how to tackle this particular problem?
I'll put an answer... so this can be closed out (eventually, and if no one else posts a better answer).
While there are several ways to do this, I prefer the DRY method, which is:
Base Class/Table has the PK and all the scalars that are the same.
A different sub-class(table) for the different "types" of entities. This would have the scalars that are unique to the type.
Animal(Table)
AnimalSurrogateKey (int or guid)
Species (lookup table FK int)
Birthdate (datetime, null)
Dog(Table)
ParentAnimalSurrogateKey (int) PK,FK
BarkDecibels (int)
Snake(Table)
ParentAnimalSurrogateKey (int) PK,FK
ScaleCount (int)
Something like that.
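For concreteness, a sketch of those tables in T-SQL (the types and constraints are assumed, not prescribed):

create table dbo.Animal (
    AnimalSurrogateKey int identity(1,1) primary key,
    Species            int not null,   -- FK to a species lookup table
    Birthdate          datetime null
);

create table dbo.Dog (
    ParentAnimalSurrogateKey int primary key
        references dbo.Animal (AnimalSurrogateKey),
    BarkDecibels             int not null
);

create table dbo.Snake (
    ParentAnimalSurrogateKey int primary key
        references dbo.Animal (AnimalSurrogateKey),
    ScaleCount               int not null
);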
ORMs can handle this. Hand-rolled/manual ORM can handle it too.
You can query for general information about the "Animals".
Or you'll have multiple queries to get all the sub-type information.
If you needed to query basic information about just Dogs, it would be:
select AnimalSurrogateKey, Species, Birthdate
from dbo.Animal a
where exists (select null from dbo.Dog d
              where d.ParentAnimalSurrogateKey = a.AnimalSurrogateKey)
..........
The key is to follow an established "pattern" for dealing with these scenarios. Most problems have already been thought out, and a future developer will thank you for mentioning in the comments/documentation "we implemented the blah blah blah pattern".
APPEND:
(using info from http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server)
That is a great article going through the scenarios. Again, you'll have to judge whether EF is "good enough" or not. If it isn't, then you can do your ORM manually, and (to get around the multiple-queries concern) maybe test a query like this:
select p.PersonID , p.PersonTypeID , s.EnrollmentDate , t.HireDate , par.DifficultyScore
from dbo.People p
left join dbo.Students s on p.PersonID = s.PersonID
left join dbo.Teachers t on p.PersonID = t.PersonID
left join dbo.Parents par on p.PersonID = par.PersonID
And then you can manually do your ORM to "switch/case" off of PersonTypeID and create your subclasses using the data unique to each subclass (noting that for rows where the type doesn't match, you will have null values; e.g. if your subtype is "Student", then par.DifficultyScore will be null for that row).
At some point, you're gonna have to POC (proof of concept) your choice. You have a problem, there are a handful of ways to deal with it, and you have to test them. EF may be good enough; it may not be. That's why I go POCO first: so I can fall back to ado.net/old-school/IDataReaders if EF isn't performing well enough.
Good luck.

Database Constraints

I have a database with three tables: TEAM, PLAYER, CONTRACT.
TEAM(teamID, name)
PLAYER(playerID, name)
CONTRACT(contractID, playerID, teamID, dateOfSigning, expirationDate)
In this database, I want the constraint that a player can't have multiple contracts at the same time. Note that I want expired contracts to remain registered in my database.
For example:
CONTRACT(1, 1, 1, 01/01/2000, 01/01/2005)
CONTRACT(2, 1, 1, 01/01/2001, 01/01/2003)
So, my player has a contract from 01/01/2000 to 01/01/2005 and another contract from 01/01/2001 to 01/01/2003. This should not be possible.
Two different contracts for a player do not overlap if and only if one starts after the other finishes. So they overlap if and only if NOT (one starts after the other finishes). The constraint you want is that no pair of distinct CONTRACT rows for the same player satisfies the overlap condition:
CONTRACT(c1.contractID, playerID, c1.teamID, c1.dateOfSigning, c1.expirationDate)
AND CONTRACT(c2.contractID, playerID, c2.teamID, c2.dateOfSigning, c2.expirationDate)
AND c1.contractID <> c2.contractID
AND NOT(c1.dateOfSigning > c2.expirationDate
OR c2.dateOfSigning > c1.expirationDate)
This means the following set of rows is empty:
SELECT c1.contractID, c2.contractID
FROM CONTRACT c1
JOIN CONTRACT c2
ON c1.playerID = c2.playerID
AND c1.contractID <> c2.contractID
WHERE NOT(c1.dateOfSigning > c2.expirationDate
OR c2.dateOfSigning > c1.expirationDate)
If DBMSes supported SELECTs in CHECK then you could have a CONTRACT constraint:
CHECK (SELECT CASE
WHEN EXISTS (SELECT 1 FROM (...))
THEN 1
ELSE 0 END)
But they don't so you have to test this SELECT in a trigger or stored procedure. (As explained in another answer.) Arbitrary constraint checking is not well-supported by DBMSes.
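For example, a sketch of such a trigger in T-SQL, using the question's table (the trigger name is made up; the concurrency warnings later in this section still apply before relying on this):

create trigger trg_contract_no_overlap
on CONTRACT
after insert, update
as
begin
    if exists (
        select 1
        from inserted i
        join CONTRACT c
          on c.playerID = i.playerID
         and c.contractID <> i.contractID
        where not (c.dateOfSigning > i.expirationDate
                or i.dateOfSigning > c.expirationDate)
    )
    begin
        -- reject the whole statement if any overlapping pair exists
        raiserror ('Overlapping contract for player', 16, 1);
        rollback transaction;
    end
end;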
You can reduce computation by keeping expired contracts in a different table than active ones. (They aren't going to overlap with new ones.) But I have just used the table you gave.
This is not directly enforceable through declarative constraints. Unfortunately, current DBMSes don't offer a "unique range" constraint (to my knowledge at least).
You are left with two options:
Either enforce this through your business logic (usually triggers or stored procedures). But be careful about concurrency (you may need to employ locking to avoid race conditions between concurrent transactions fiddling with related dates).
Or decrease the granularity of time. For example, if you divide the year to months, and declare which months are "occupied" by the given contract (by having separate row for each month), then you can easily (and declaratively!) enforce no duplicates can exist for the same month.
NOTE: The latter can be even more granular, e.g. days, but that will produce more rows in the database and cause problems with leap years etc. - you'll need to find the right balance for your particular case.
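Worth noting: PostgreSQL (9.2 and later) does offer this declaratively via exclusion constraints. A sketch against the question's table, assuming the two date columns are of type date:

create extension if not exists btree_gist;  -- lets = and && share one gist index

alter table CONTRACT
  add constraint contract_no_overlap
  exclude using gist (
      playerID with =,
      daterange(dateOfSigning, expirationDate, '[]') with &&
  );

Any two rows with the same playerID and overlapping (inclusive) date ranges are rejected; other DBMSes still lack an equivalent.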
Alternatively, you could make playerID unique in the contract table and add another table to keep a history of contracts. A player can then only have one contract at a time but could have had many.
In the business layer you need to decide whether the contract being inserted should overwrite the current contract. If so, archive and remove the current contract from the contract table and insert the new one.
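A sketch of that split, reusing the question's columns; the UNIQUE on playerID is what enforces "one active contract at a time":

create table CONTRACT (
    contractID     integer primary key,
    playerID       integer not null unique references PLAYER (playerID),
    teamID         integer not null references TEAM (teamID),
    dateOfSigning  date    not null,
    expirationDate date    not null
);

create table CONTRACT_HISTORY (   -- archived contracts; playerID not unique here
    contractID     integer primary key,
    playerID       integer not null references PLAYER (playerID),
    teamID         integer not null references TEAM (teamID),
    dateOfSigning  date    not null,
    expirationDate date    not null
);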

using dummy row with NOT NULL to solve DEFAULT NULL

I know having DEFAULT NULLs is not a good practice, but I have many optional lookup values which are FKs in the system. To solve this issue, here is what I am doing: I use NOT NULL for every FK/lookup column, and the first row in every lookup table (PK id = 1) is a dummy row with just "none" in all the columns. This way I can use NOT NULL in my schema and, if needed, reference the "none" row (PK = 1) for FKs which do not have any lookup value.
Is this a good design, or are there other workarounds?
EDIT:
I have:
Neighborhood table
Postal table.
Every neighborhood has a city, so the FK can be NOT NULL.
But not every postal code belongs to a neighborhood. Some do, some don't, depending on the country. So if I use NOT NULL for the FK between postal and neighborhood then I will be screwed, as there has to be some value entered. So what I am doing, in essence, is having a row in every table be a dummy row just to link the FKs.
This way row one in neighborhood table will be:
n_id = 1
name = none
etc...
In postal table I can have:
postal_code = 3456A3
FK (city) = Moscow
FK (neighborhood_id)=1 as a NOT NULL.
If I don't have a dummy row in the neighborhood lookup table then I have to declare the FK (neighborhood_id) as a DEFAULT NULL column and store blanks in the table. This is one example, but there are a huge number of values which would then be blank in many tables.
Is this a good design, or are there other workarounds?
ISNULL or COALESCE and LEFT JOIN
Often "None" is an option like any other in a list of options. It may be totally reasonable to have a special row for it; it simplifies things. It may be especially practical if you link other information to options, e.g. a human-readable name.
You can always use left joins to join postal codes that may not exist:
select * from table_a
left join table_b
on table_a.postalcode_id = table_b.postalcode_id
This will select rows whether or not postalcode_id is null. When you use magic numbers to designate nulls, queries become less readable.
Clear:
select count(*) from table_a where postalcode_id is null;
Not so clear:
select count(*) from table_a where postalcode_id = 1;
Using nulls makes your queries handle the null cases explicitly, and it also self-documents the intention that nulls are being handled.
This seems like a simple case of premature optimization in a database:
If your schema is something like this, then I don't see a problem. Some postal codes are in a neighborhood, some aren't. That's a good case for a nullable column.
The advice about avoiding nulls is about avoiding information that does not belong in the table. For instance, if you had another five columns which only pertained to postal codes in a neighborhood, then those columns would be null for postal codes which were not in a neighborhood. This would be a good reason to have a second, parallel table for postal codes in a neighborhood, which could contain these other five columns.
More importantly, if performance is a concern, then the solution is to try it both ways, test the performance, and see which performs best. This performance consideration would then compete with the simplicity and readability of the design, and performance might win.
An example to illustrate the issue: I started with an Object-Role Modeling model, the same one I used to produce the earlier ER diagram. However, I created a subtype of PostalCode and added two more mandatory roles to the subtype:
This can produce an ER model very similar to the first:
But this model fails to show that there are columns which are mandatory whenever the PostalCode is a NeighborhoodPostalCode. The following model does show that:
I would say that if you have a set of optional columns which are mandatory under certain circumstances, then you should create a "subtype" which always has those columns NOT NULL. However, if you simply have random columns which may randomly be not null, then keep them as NULL columns in the main table.
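A sketch of that subtype idea against the question's example (column names and types are assumed):

create table PostalCode (
    postal_code varchar(10) primary key,
    city_id     integer not null references City (city_id)
);

-- Only postal codes that belong to a neighborhood get a row here, so all
-- of its columns can be NOT NULL.
create table NeighborhoodPostalCode (
    postal_code     varchar(10) primary key references PostalCode (postal_code),
    neighborhood_id integer not null references Neighborhood (n_id)
    -- ...plus the columns that are mandatory only for this subtype
);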

How to Handle Optional Columns

My question is related to ServiceASpecificField and ServiceBSpecificField. I feel that these two fields are placed inappropriately because for all records of service A for all subscribers in SubscriberServiceMap table, ServiceBSpecificField will have null value and vice versa.
If I move these two fields in Subscribers table, then I will have another problem. All those subscribers who only avail service A will have null value in Subscribers.ServiceBSpecificField.
So what should be done ideally?
Place check constraints on the Service_A and Service_B tables, like:
alter table Service_A add constraint chk_A check (ServiceID = 1);
alter table Service_B add constraint chk_B check (ServiceID = 2);
Then you can join like:
select *
from SubscriberService as x
left join Service_A as a on (a.SubscriberID = x.SubscriberID and a.ServiceID = x.ServiceID)
left join Service_B as b on (b.SubscriberID = x.SubscriberID and b.ServiceID = x.ServiceID)
An easy way to do this is to ask yourself: Do the values of these columns vary according to the Subscription (SubscriberServiceMap table) or the Service?
If every subscriber of "Service A" has the same value for ServiceASpecificField, only then must you move this to the Services table.
How many such fields do you anticipate? ServiceASpecificField, ServiceBSpecificField, C, D... and so forth? If the number is sizable, you could go for an EAV model, which would involve the creation of another table.
This is a simple supertype-subtype issue which you can solve at 5NF; you do not need EAV, improved EAV, or 6NF (the full and final correct EAV) for this. Since the value of ServiceAColumn is dependent on the specific subscriber's subscription to the service, it has to be in the Associative table.
▶Normalised Data Model◀ (inline links do not work on some browsers/versions.)
Readers who are not familiar with the Relational Modelling Standard may find ▶IDEF1X Notation◀ useful.
This is an ordinary Relational Supertype-Subtype structure. This one is Exclusive: a Service is exclusively one Subtype.
The Relations and Subtypes are more explicit and more controlled in this model than in other answers. Eg. the FK Relations are specific to the Service Subtype, not the Service Supertype.
The Discriminator, which identifies which Subtype any Supertype row is, is the ServiceType. The ServiceType does not need to be repeated in the Subtypes; we know which subtype a row is by the subtype table it is in.
Unless you have millions of Services, a short code is a more appropriate PK than a meaningless number.
Other
You can lose the Id column in SubscriberService because it is 100% redundant and serves no purpose.
The PK for SubscriberService is (SubscriberId, ServiceId), unless you want duplicate rows.
Please change the column names: Subscriber.Id to SubscriberId; Service.Id to ServiceId. Never use Id as a column name. For PKs and FKs, always use the full column name. The relevance of that will become clear to you when you start coding.
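Since the linked data model may not render for everyone, here is a rough DDL sketch of the structure this answer describes; the types and the exact subtype columns are assumed, not taken from the diagram:

create table Service (
    ServiceId   char(4) primary key,   -- short code rather than a meaningless number
    ServiceType char(1) not null       -- Discriminator: which subtype this row is
);

create table ServiceA (                -- subtype: one row per Service of type A
    ServiceId char(4) primary key references Service (ServiceId)
);

create table SubscriberServiceA (      -- associative table, specific to subtype A
    SubscriberId          integer     not null references Subscriber (SubscriberId),
    ServiceId             char(4)     not null references ServiceA (ServiceId),
    ServiceASpecificField varchar(50) not null,
    primary key (SubscriberId, ServiceId)
);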
Sixth Normal Form or EAV
Adding columns and tables when adding new services which have new attributes is, well, necessary in a Relational database, and you retain a lot of control and integrity.
If you don't "want" to add new tables per new service then yes, go with EAV or 6NF, but make sure you have the normal controls (type safety) and Data and Referential Integrity available in Relational databases. EAV is often implemented without proper Relational controls and Integrity, which leads to many, many problems. Here is a question/answer on that subject. If you do go with that, and the Data Models in that question are not explanatory enough, let me know and I will give you a Data Model that is specific to your requirement (the DM I have provided above is pure 5NF because that is the full requirement for your original question).
If the value of ServiceSpecificField depends on both service and subscriber, and for all subscriber-service pairs the type of the field is the same (as I see in your example: varchar(50) for both fields), then I would update the SubscriberServiceMap table only:
table SubscriberServiceMap:
Id
SubscriberId
ServiceId
SpecificField
Example of such table:
Id  SubscriberId  ServiceId  SpecificField
1   1             1          sub1_serv1
2   1             2          sub1_serv2
3   2             1          sub2_serv1
4   2             2          sub2_serv2

Is it a good idea to duplicate columns across tables in order to enforce check constraints?

I'm faced with a situation where I have two tables: A and B. B has a foreign key to A.
A has a column "Detailed" which identifies whether or not its children in B require their "Details" section to be filled out.
If I have my lean structure, there is no way for me to determine whether a record in B needs to have its "Details" section filled out (i.e. not null) without joining to A. Thus, the only way for me to prevent somebody from inserting or updating these records into an invalid state is to have a trigger that joins with A and checks its "Detailed" column.
My feeling is that constraints are better than triggers, as they are more like facts about data, in addition to filters, whereas triggers are only filters.
I could get around this by duplicating the "Detailed" column in B and then having a check constraint: (Detailed = 'Y' AND Details IS NOT NULL) OR (Detailed = 'N')
Thoughts on the best way to approach this?
All the tools you mentioned (constraints and triggers) are just a way to enforce the data consistency in the database.
Simple business rules, like "always having a reference", "not having a NULL" etc are enforceable with the constraints.
More complex business rules, like the one you mention here, should be enforced using triggers.
Constraints are not "better" or "worse" than triggers: they are just a shortcut for the rules you need to implement often.
For your task, just implement a trigger.
However, in Oracle, neither constraints nor triggers are implemented in a purely set-based way: they are called in a loop for each record affected by a DML operation.
The most efficient way would be to create a package that serves as a single entry point for all DML against your table, and check the Details in that package.
You're right to enforce this on the database level and Quassnoi's points are all good. In addition, you might want to investigate having the API for this operation reference an updatable join view of the two tables and implement the constraint through that.
In an ideal world, Oracle and other DBMSs would support "assertions":
create assertion details_required_when_detailed as
check (not exists(select null
                  from a join b on ...
                  where a.detailed = 'Y'
                  and b.details is null
                 ));
They don't, though (and not without good reason: it would be very hard to implement them in a performant manner!).
As Quassnoi suggests, triggers can be used instead - but you need to be aware of the dangers in a multi-user environment. To be sure of enforcing consistency you need to take out locks when checking data to ensure that this doesn't happen:
(Assume A record 1 currently has detailed='N', but all associated B records have details not null).
user1> Update A set detailed = 'Y' where a_id=1;
That works, because all the associated B rows have details not null.
user2> Update B set details = null where a_id=1;
That works, because user1 hasn't committed yet, so user2's trigger sees detailed='N'.
user1> commit;
user2> commit;
Now you have corrupt data. To prevent that, the trigger on B needs to select the A row "for update".
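A sketch (Oracle) of what that looks like; the column names a_id and keyfield are assumed for the FK between B and A:

create or replace trigger b_details_check
before insert or update on B
for each row
declare
  l_detailed A.detailed%type;
begin
  -- Lock the parent A row: a concurrent session updating A.detailed must
  -- now wait until this transaction commits, closing the race described above.
  select a.detailed
    into l_detailed
    from A a
   where a.keyfield = :new.a_id
     for update;

  if l_detailed = 'Y' and :new.details is null then
    raise_application_error(-20001, 'Details must be filled in when A is detailed');
  end if;
end;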
I would enforce a rule like this in the UI. If your business rules become more complicated, you are going to have your hands full with lots of redundant columns in your model in order to enforce all rules in the database schema.
If the DETAILED field were duplicated on B you could use a foreign key to enforce it, e.g. (B.KEYFIELD, B.DETAILED) REFERENCES (A.KEYFIELD, A.DETAILED). I'm not crazy about duplicating the field on B, but on the other hand it seems that you have data related to B which lives on A. If you could remove DETAILED from A and push it down to B, the design might be cleaner. Some additional details might help.
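A sketch of that approach (names assumed from the discussion): A exposes (KEYFIELD, DETAILED) as a referenceable key, B carries the duplicated column, and the check becomes local to B:

alter table A add constraint a_key_detailed_uq unique (keyfield, detailed);

alter table B add constraint b_a_fk
    foreign key (a_id, detailed) references A (keyfield, detailed);

alter table B add constraint b_details_chk
    check ((detailed = 'Y' and details is not null) or detailed = 'N');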
Share and enjoy.
