Database Constraints - database

I have a database with three tables: TEAM, PLAYER, CONTRACT.
TEAM(teamID, name)
PLAYER(playerID, name)
CONTRACT(contractID, playerID, teamID, dateOfSigning, expirationDate)
In this database, I want a constraint that prevents a player from having multiple contracts at the same time. Note that expired contracts must remain registered in the database.
For example:
CONTRACT(1,1,1, 01/01/2000, 01/01/2005)
CONTRACT(2,1,1, 01/01/2001, 01/01/2003)
So my player has a contract from 01/01/2000 to 01/01/2005 and another contract from 01/01/2001 to 01/01/2003. This should not be possible.

Two different contracts for a player do not overlap if and only if one starts after the other finishes. So they overlap if and only if NOT(one starts after the other finishes). The constraint you want is that no pair of contract rows satisfies the condition that they overlap:
CONTRACT(c1.contractID, playerID, c1.teamID, c1.dateOfSigning, c1.expirationDate)
AND CONTRACT(c2.contractID, playerID, c2.teamID, c2.dateOfSigning, c2.expirationDate)
AND c1.contractID <> c2.contractID
AND NOT(c1.dateOfSigning > c2.expirationDate
OR c2.dateOfSigning > c1.expirationDate)
This means the following set of rows is empty:
SELECT c1.contractID, c2.contractID
FROM CONTRACT c1
JOIN CONTRACT c2
ON c1.playerID = c2.playerID
AND c1.contractID <> c2.contractID
WHERE NOT(c1.dateOfSigning > c2.expirationDate
OR c2.dateOfSigning > c1.expirationDate)
If DBMSes supported subqueries in CHECK, you could declare a CONTRACT constraint requiring that set to be empty:
CHECK (NOT EXISTS (SELECT 1 FROM (...)))
But most don't, so you have to test this SELECT in a trigger or stored procedure. (As explained in another answer.) Arbitrary constraint checking is not well-supported by DBMSes.
You can reduce computation by keeping expired contracts in a different table than active ones. (They aren't going to overlap with new ones.) But I have just used the table you gave.
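As a rough illustration of the trigger approach, here is a minimal sketch that runs the overlap test before each INSERT. It assumes SQLite driven from Python's sqlite3 module, dates stored as ISO text (YYYY-MM-DD) so that string comparison matches date order, and a hypothetical trigger name contract_no_overlap; none of this comes from the question. It only covers INSERT (an UPDATE would need a similar trigger) and does not address concurrent writers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CONTRACT (
    contractID     INTEGER PRIMARY KEY,
    playerID       INTEGER NOT NULL,
    teamID         INTEGER NOT NULL,
    dateOfSigning  TEXT NOT NULL,   -- assumed ISO text, e.g. '2000-01-01'
    expirationDate TEXT NOT NULL
);

-- Hypothetical trigger: reject a new row whose period overlaps an
-- existing contract of the same player (same test as the SELECT above).
CREATE TRIGGER contract_no_overlap
BEFORE INSERT ON CONTRACT
WHEN EXISTS (
    SELECT 1 FROM CONTRACT c
    WHERE c.playerID = NEW.playerID
      AND NOT (c.dateOfSigning > NEW.expirationDate
               OR NEW.dateOfSigning > c.expirationDate)
)
BEGIN
    SELECT RAISE(ABORT, 'overlapping contract');
END;
""")


def add_contract(contract_id, player_id, team_id, signed, expires):
    """Return True if the row was inserted, False if it was rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO CONTRACT VALUES (?, ?, ?, ?, ?)",
                (contract_id, player_id, team_id, signed, expires),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

With this in place, the second contract from the question's example is rejected, while a contract that starts after the first one expires is accepted.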

This is generally not directly enforceable through declarative constraints. Unfortunately, most current DBMSes don't offer a "unique range" constraint (PostgreSQL's exclusion constraints are a notable exception).
You are left with two options:
Either enforce this through your business logic (usually triggers or stored procedures). But be careful about concurrency (you may need to employ locking to avoid race conditions between concurrent transactions modifying related dates).
Or decrease the granularity of time. For example, if you divide the year into months and record which months are "occupied" by a given contract (with a separate row for each month), then you can easily (and declaratively!) enforce that no duplicates exist for the same month.
NOTE: The latter can be even more granular, e.g. days, but that will produce more rows in the database and cause problems with leap years etc. - you'll need to find the right balance for your particular case.
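The month-granularity idea can be sketched as follows. This is only an illustration under assumptions: SQLite via Python's sqlite3, a hypothetical table CONTRACT_MONTH with months stored as 'YYYY-MM' text, and hypothetical helpers months_between and occupy. The only enforcement is the declarative UNIQUE (playerID, month) constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE CONTRACT_MONTH (
    contractID INTEGER NOT NULL,
    playerID   INTEGER NOT NULL,
    month      TEXT NOT NULL,     -- assumed format 'YYYY-MM'
    UNIQUE (playerID, month)      -- a player's month can be occupied once
)""")


def months_between(first, last):
    """Yield 'YYYY-MM' strings from first to last, inclusive."""
    year, month = (int(part) for part in first.split("-"))
    end_year, end_month = (int(part) for part in last.split("-"))
    while (year, month) <= (end_year, end_month):
        yield f"{year:04d}-{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def occupy(contract_id, player_id, first, last):
    """Insert one row per month; on a clash insert nothing and return False."""
    rows = [(contract_id, player_id, m) for m in months_between(first, last)]
    try:
        with conn:  # one transaction: all months or none
            conn.executemany("INSERT INTO CONTRACT_MONTH VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False
```

The trade-off mentioned in the note shows up directly: a five-year contract already produces 60 rows at month granularity.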

You could make playerID unique in the contract table and add another table to keep a history of contracts. A player can then have only one current contract but could have had many in the past.
In the business layer you need to decide if the contract being inserted should overwrite the current contract. If so, archive and remove the current contract from the contract table and insert the new one.

Related

Need help in developing DB logic

This is a mini-project of mine, an airline reservation system; let's call the airline FlyMi. I have a database (not decided which one; a friend of mine wants to go with MongoDB). Anyhoo, this is my requirement:
I have a table which has details of each flight: flight number, schedule, etc. I'm going to use this table to perform various operations: booking, cancellation, modification.
This is where I'm stuck: for the desktop app and the web application, I'm offering an option to select seats. This means I've got to keep track of which seats are booked and which ones are not. Assume I have a UI which shows seats as red (booked) or green (not booked), and all of this for each and every flight. My question is: what do you think would be the most efficient way to track seat bookings for each flight in that airline?
Current idea: keep a table named passenger with all the details such as name, address, etc., which keeps track of all passengers, and maintain a passenger ID such that the first 4 characters are the flight ID and the last 2 characters are the seat number they have chosen, with a random number in between (I say random because I think it is immaterial here). So, for any flight, if I have to find the number of unbooked seats, I will have to scan through every passenger who has booked and check who has booked on that flight. I think this is really inefficient. What is the most efficient logic to do this?
Don't use "smart keys".
This is a bad idea called "smart keys" or "encoding information in keys".
See this answer which contains this excerpt:
Despite it now being easy to implement a Smart Key, it is hard to recommend that you create one of your own that isn't a natural key, because they tend to eventually run into trouble, whatever their advantages, because it makes the databases harder to refactor, imposes an order which is difficult to change and may not be optimal for your queries, requires a string comparison if the Smart Key includes non-numeric characters, and is less effective than a composite key in helping range-based aggregations. It also violates the basic relational guideline that every column should store atomic values
Smart Keys also tend to outgrow their original coding constraints
(Notice that seat locations are typically identified by smart keys in that they are a row number and a count across a row. But they are also typically visibly physically permanently bolted into that formation. And imagine if they were labelled and rearranged.)
Educate yourself about database design.
Just describe your business in the most straightforward terms. That is how relational model databases & DBMSs work.
Find enough fill-in-the-[named-]blanks sentence templates to describe your business situations:
"customer [cid] has name [firstname] [lastname]
AND customer [cid] has a phone number [phonenumber] of type [type] ..."
"customer [cid] can use credit card #[card_no]"
"seat [seatid] is at row [row] and column [column]"
"seat [seatid] is booked"
"seat [seatid] is temporarily committed to an unfinished booking"
...
For each such parameterized sentence template (aka predicate) have a base table where the names of the blanks/parameters are column names. Each row in a table states the statement (proposition) got from filling in the blanks per its column values; each row not in a table states NOT the statement from filling in the blanks per its column values.
Then for each table find every functional dependency (FD) that holds. (When a predicate can be expressed in the form "... AND column = F(column1,...)" then we say that column set {column1,...} functionally determines column column and that FD set → column holds.) Then identify every candidate key (CK). (A superkey is a column set that functionally determines every column. Ie that is unique, ie where each subrow of values for those columns appears only in one row of a table. A CK is a superkey that doesn't contain a smaller superkey.) Then find every join dependency (JD). (Some predicates say "... AND ..." for some number of ANDs & "..."s. There is a JD when the table for each predicate "..." would look like what you get from taking only its columns from the original table.) Note that every FD comes with an associated (binary) JD.
Then normalize your tables to fifth normal form (5NF). This means decomposing (ie replacing a table in which JD "... AND ..." holds by tables whose predicates are the "..."s) until each JD that holds is implied by the CKs (ie must hold when the JDs from the FDs from the CKs hold.) (For performance reasons one can also then denormalize by combining to base tables that aren't in 5NF.)
See this answer and this one.
Then we query by describing the rows we want. We do this by connecting base table predicates with logical operators (ie AND, OR, NOT, FOR SOME, FOR ALL etc) and function calls to give the predicates for the tables we want and/or by connecting base table names by relation operators (ie JOIN, UNION, MINUS/EXCEPT, PROJECT/SELECT, RENAME/AS) to give the values of the tables we want and/or both (eg RESTRICT/WHERE).
The JOIN of two tables holds the rows that make a true statement from (ie has as its predicate) the AND of their predicates; the UNION corresponds to the OR and the MINUS/EXCEPT to the AND NOT; PROJECT/SELECT of columns of a table puts FOR SOME all-other-columns before its predicate; RESTRICT/WHERE puts AND condition after its predicate; and the RENAME/AS of a column renames that parameter in its predicate. So a table expression corresponds to a predicate: a table (base table or query result) value contains the rows that make a true statement from its (base table's or query expression's) predicate.
See this answer.
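Applied to the seat-tracking question, the predicate approach might look like the sketch below. The table names flight_seat and booking, their columns, and the sample rows are all hypothetical, and SQLite via Python's sqlite3 is assumed purely for convenience. Each table has one predicate, written as a comment, and the free seats are "seat exists on the flight AND NOT seat is booked", which is an EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "flight [flight_id] has a seat [seat_no]"
CREATE TABLE flight_seat (
    flight_id TEXT NOT NULL,
    seat_no   TEXT NOT NULL,
    PRIMARY KEY (flight_id, seat_no)
);

-- "seat [seat_no] on flight [flight_id] is booked by passenger [passenger_id]"
CREATE TABLE booking (
    flight_id    TEXT NOT NULL,
    seat_no      TEXT NOT NULL,
    passenger_id INTEGER NOT NULL,
    PRIMARY KEY (flight_id, seat_no),   -- a seat is booked at most once per flight
    FOREIGN KEY (flight_id, seat_no) REFERENCES flight_seat (flight_id, seat_no)
);

-- Made-up sample data.
INSERT INTO flight_seat VALUES ('FM01', '1A'), ('FM01', '1B'), ('FM01', '2A');
INSERT INTO booking VALUES ('FM01', '1B', 42);
""")


def free_seats(flight_id):
    """Seats that exist on the flight AND are NOT booked (EXCEPT = AND NOT)."""
    rows = conn.execute(
        """
        SELECT seat_no FROM flight_seat WHERE flight_id = ?
        EXCEPT
        SELECT seat_no FROM booking WHERE flight_id = ?
        ORDER BY seat_no
        """,
        (flight_id, flight_id),
    )
    return [row[0] for row in rows]
```

No information is encoded in the passenger ID here; the flight and seat are ordinary columns, so finding unbooked seats does not require scanning and parsing every passenger key.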
The same goes for constraints, which are true statements that collectively describe the application situations and database states that can arise given the situations that can arise and the base table predicates.
See this answer.

How to manage required data for a table

I have a table in which 3 required records must be present for the application that uses it to function correctly. For example, if this is a linked list tree table, in my case, there are three top level groups that must be present. When I start using this table all future groups must always be under a group and no other top-level group can be created.
Group TL1
Group A
Group AA
Group TL2
Group B
Group B1
Group B1B
Group TL3
Group C
Note Group TL1, TL2, and TL3 must always be present, or else the data integrity is broken for the application's requirement.
What is the best way to insert/guard the required top level groups?
One idea I have is to have required data inserted upon table creation and have a function that checks for the presence of the required data. However, I also don't want to always check for their existence as it seems excessive and inefficient.
Your replies are greatly appreciated.
Depending on the database, you can try things like:
1) Having a foreign key constraint that prevents the update/deletion if it is broken - if row A references row Important by foreign key, then you cannot delete row Important as the database will check the constraint and prevent it.
2) Having a trigger that runs before deletes/updates/inserts (as needed), ensures that the important rows are present and prevents the action (or inserts) if it would violate this.
I would look into the options offered by whichever database flavour you use.
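A minimal sketch of the foreign key idea (option 1), assuming SQLite via Python's sqlite3 and hypothetical table names grp and required_grp: a small side table references each required top-level group, so the database itself refuses to delete them. This does not stop new top-level groups from being created; that part would need a trigger as in option 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE grp (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES grp (id)
);

-- Hypothetical guard table: each row pins one required group.
CREATE TABLE required_grp (
    grp_id INTEGER PRIMARY KEY REFERENCES grp (id)
);

INSERT INTO grp VALUES (1, 'TL1', NULL), (2, 'TL2', NULL), (3, 'TL3', NULL);
INSERT INTO grp VALUES (4, 'A', 1);
INSERT INTO required_grp VALUES (1), (2), (3);
""")


def try_delete(group_id):
    """Return True if the group was deleted, False if a constraint blocked it."""
    try:
        with conn:
            conn.execute("DELETE FROM grp WHERE id = ?", (group_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Deleting an ordinary group such as 'A' succeeds, while deleting TL1, TL2 or TL3 fails with a foreign key violation, without any application-side existence check.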

Database design - system default items and custom user items

This question applies to any database table design, where you would have system default items and custom user defaults of the same type (ie user can add his own custom items/settings).
Here is an example with invoicing and payment terms. By default an invoice can have payment terms of DueOnReceipt, NET10, NET15, NET30 (this is the default for all users!), so you would have two tables, "INVOICE" and "PAYMENT_TERM":
INVOICE
Id
...
PaymentTermId
PAYMENT_TERM (System default)
Id
Name
Now what is the best way to allow a user to store their own custom "PaymentTerms" and why? (ie user can use system default payment terms OR user's own custom payment terms that he created/added)
Option 1) Add UserId to PaymentTerm, set userid for the user that has added the custom item and system default userid set to null.
INVOICE
Id
...
PaymentTermId
PaymentTerm
Id
Name
UserId (System Default, UserId=null)
Option 2) Add a flag to Invoice "IsPaymentTermCustom" and Create a custom table "PAYMENT_TERM_CUSTOM"
INVOICE
Id
...
PaymentTermId
PaymentTermCustomId
IsPaymentTermCustom (True for custom, otherwise false for system default)
PaymentTerm
Id
Name
PAYMENT_TERM_CUSTOM
Id
Name
UserId
Then check via SQL query whether the user is using a custom payment term: if IsPaymentTermCustom=True, the user is using a custom payment term; otherwise it is a system default.
Option 3) ????
...
As a general rule:
Prefer adding columns to adding tables
Prefer adding rows to adding columns
Generally speaking, the considerations are:
Effects of adding a table
Requires the most changes to the app: You're supporting a new kind of "thing"
Requires more complicated SQL: You'll have to join to it somehow
May require changes to other tables to add a foreign key column referencing the new table
Impacts performance because more I/O is needed to join to and read from the new table
Note that I am not saying "never add tables". Just know the costs.
Effects of adding a column
Can be expensive to add a column if the table is large (it can take hours for the ALTER TABLE ADD COLUMN to complete, and during this time the table will be locked, effectively bringing your site "down"), but this is a one-time thing
The cost to the project is low: Easy to code/maintain
Usually requires minimal changes to the app - it's a new aspect of a thing, rather than a new thing
Will perform with negligible performance difference. Will not be measurably worse, but may be a lot faster depending on the situation (if having the new column avoids joining or expensive calculations).
Effects of adding rows
Zero: If your data model can handle your new business idea by just adding more rows, that's the best option
(Pedants kindly refrain from making comments such as "there is no such thing as 'zero' impact", or "but there will still be more disk used for more rows" etc - I'm talking about material impact to the DB/project/code)
To answer the question: Option 1 is best (i.e. add a column to the payment option table).
The reasoning is based on the guidelines above and this situation is a good fit for those guidelines.
Further,
I would also store "standard" payment options in the same table, but with a NULL userid; that way you only have to add new payment options when you really have one, rather than for every customer even if they use a standard one.
It also means your invoice table does not need changing, which is a good thing - it means minimal impact to that part of your app.
It seems to me that there are merely "Payment Terms" and "Users". The decision of what are the "Default" payment terms is a business rule, and therefore would be best represented in the business layer of your application.
Assuming that you would like to have a set of pre-defined "default" payment terms present in your application from the start, these would already be present in the payment terms table. However, I would put a reference table in between USERS and PAYMENT TERMS:
USERS:
user-id
user_name
USER_PAYMENT_TERMS:
userID
payment_term_id
PAYMENT_TERMS:
payment_term_id
payment_term
Your business layer should offer up to the user (or more likely, the administrator) through a GUI the ability to:
Assign 0 to many payment term options to a particular user (some users may not want one of the defaults to even be available, for example).
Add custom payment terms, which then become available for assignment to one or more users (but which avoids the creation of duplicate payment terms by different users)
Allow a custom payment term to be assigned to more than one user (say the user's company has a unique payment process which requires all of their users to use a payment term other than one of the defaults: create the custom term once, and assign it to all users).
Your application business layer would establish rules governing access to payment terms, which could then be accessed by your user interface.
Your UI would then (again, likely through an administrator function) allow the set up of one or more payment terms in addition to the standards you describe, and then make them available to one or more users through something like a checked list box (for example).
Option 1 is definitely better for the following reasons:
Correctness
You can implement a database constraint for uniqueness of the payment term name
You can implement a foreign key constraint from Invoice to PaymentTerm
Ease of Use
Conducting queries will be much simpler because you will always join from Invoice to PaymentTerm rather than requiring a more complex join. Most of the time when you select you will not care if it is an inbuilt or custom payment term. The optimizer will have an easier time with a normal join instead of one that depends on another column to decide which table to join.
Easier to display a list of PaymentTerms coming from one table
We use Option 1 in our data model quite a lot.
Part of the problem, as I see it, is that different payment terms lead to different calculations, too. If I were still in the welding supply business, I'd want to add "2% 10 NET 30", which would mean a 2% discount if the payment is made in full within 10 days; otherwise, net 30.
Setting that issue aside, I think ownership of the payment terms makes sense. Assume that the table of users (not shown) includes the user "system" as, say, user_id 0.
create table payment_terms (
payment_term_id integer primary key,
payment_term_owner_id integer not null references users (user_id),
payment_term_desc varchar(30) not null unique
);
insert into payment_terms values (1, 0, 'Net 10');
insert into payment_terms values (2, 0, 'Net 15');
...
insert into payment_terms values (5, 1, '2% 10, Net 30');
This keeps foreign keys simple, and it makes it easy to select payment terms at run time for presentation in the user interface.
Be very careful here. You probably want to store the description, not the ID number, with your invoices. (It's unique; you can set a foreign key reference to it.) If you store only the ID number, updating a user's custom description might subtly corrupt all the data that references it.
For example, let's say that the user created a custom payment term number 5, '2% 10, Net 30'. You store the ID number 5 in your table of invoices. Then the user decides that things will be different starting today, and updates that description to '2% 10, Net 20'. Now on all your past invoices, the arithmetic no longer matches the payment terms.
Your auditor will kill you. Twice.
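To illustrate how the ownership design makes run-time selection easy, here is a small sketch. It assumes SQLite via Python's sqlite3, the "system" user as user_id 0 as described above, and made-up users 'alice' and 'bob'; the helper terms_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE payment_terms (
    payment_term_id       INTEGER PRIMARY KEY,
    payment_term_owner_id INTEGER NOT NULL REFERENCES users (user_id),
    payment_term_desc     VARCHAR(30) NOT NULL UNIQUE
);
-- Made-up users; user 0 plays the role of the "system" owner.
INSERT INTO users VALUES (0, 'system'), (1, 'alice'), (2, 'bob');
INSERT INTO payment_terms VALUES (1, 0, 'Net 10');
INSERT INTO payment_terms VALUES (2, 0, 'Net 15');
INSERT INTO payment_terms VALUES (5, 1, '2% 10, Net 30');
""")

SYSTEM_USER_ID = 0  # assumption: the system user has user_id 0


def terms_for(user_id):
    """Terms a user may pick: the system defaults plus the user's own."""
    rows = conn.execute(
        "SELECT payment_term_desc FROM payment_terms "
        "WHERE payment_term_owner_id IN (?, ?) "
        "ORDER BY payment_term_id",
        (SYSTEM_USER_ID, user_id),
    )
    return [row[0] for row in rows]
```

A single query against one table serves both users with custom terms and users who only see the defaults.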
You'll want to prevent ordinary users from deleting rows owned by the system user. There are several ways to do that.
Use a BEFORE DELETE trigger.
Add another table with foreign key references to the rows owned by the system user.
Restrict all access through stored procedures that prevent deleting system rows.
(And flags are almost never the best idea.)
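The BEFORE DELETE trigger option could look roughly like this. It is a sketch assuming SQLite via Python's sqlite3, a hypothetical trigger name protect_system_terms, and the system user as owner id 0. As written it blocks everyone, so a real setup would need some separate path for administrators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_terms (
    payment_term_id       INTEGER PRIMARY KEY,
    payment_term_owner_id INTEGER NOT NULL,   -- 0 is assumed to be the system user
    payment_term_desc     VARCHAR(30) NOT NULL UNIQUE
);

-- Hypothetical trigger: refuse to delete rows owned by the system user.
CREATE TRIGGER protect_system_terms
BEFORE DELETE ON payment_terms
WHEN OLD.payment_term_owner_id = 0
BEGIN
    SELECT RAISE(ABORT, 'system payment terms cannot be deleted');
END;

INSERT INTO payment_terms VALUES (1, 0, 'Net 10');
INSERT INTO payment_terms VALUES (5, 1, '2% 10, Net 30');
""")


def delete_term(term_id):
    """Return True if the row was deleted, False if the trigger refused."""
    try:
        with conn:
            conn.execute(
                "DELETE FROM payment_terms WHERE payment_term_id = ?", (term_id,)
            )
        return True
    except sqlite3.DatabaseError:
        return False
```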
Applying general rules of database design to the problem at hand:
one table for system payment terms
one table for user payment terms
a view that is the union of the two above
Now you can join invoice on the view of payment terms.
Benefits:
No flag columns
No nulls
You separate system defaults from user data
Things become straightforward for the db

Calculated Measure aggregating on certain cells only

I'm trying to figure out how I can create a calculated measure that produces a count of only unique facts in my fact table. My fact table basically stores events from a historical perspective. But I need the measure to filter out redundant events.
Using sales as an example(Since all material around OLAP always uses sales in examples):
The fact table stores sales EVENTS. When a sale is first made it has a unique sales reference which is a column in the fact table. A unique sale however can be amended(Items added or returned) or completely canceled. The fact table stores these changes to a sale as different rows.
If I create a count measure using SSAS I get a count of all sales events, which means a unique sale will be counted multiple times, once for every change made to it (which in some reports is desirable). However, I also want a measure that produces a count of unique sales rather than events, but not just based on counting unique sales references. If the user filters by date then they should see unique sales that still exist on that date (if a sale was canceled by that date it should not be represented in the count at all).
How would I do this in MDX/SSAS? It seems like I need have a count query work from a subset from a query that finds the latest change to a sale based on the time dimension.
In SQL it would be something like:
SELECT COUNT(*) FROM SalesFacts FACT1 WHERE Event <> 'Cancelled' AND
Timestamp = (SELECT MAX(Timestamp) FROM SalesFacts FACT2 WHERE FACT1.SalesRef=FACT2.SalesRef)
Is it possible, or even performant, to have subqueries in MDX?
In SSAS, create a measure that is based on the unique transaction ID (The sales number, or order number) then make that measure a 'DistinctCount' aggregate function in the properties window.
Now it should count distinct order numbers, under whichever dimension slice it finds itself under.
The posted query might be rewritten like this (note that, unlike the original, this still counts a cancelled sale if it has earlier non-cancelled rows):
SELECT COUNT(DISTINCT SalesRef)
FROM SalesFacts
WHERE Event <> 'Cancelled'
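The two queries are easy to compare on a tiny made-up data set. The sketch below assumes SQLite via Python's sqlite3; the event names 'Created' and 'Amended' and all rows are invented for illustration. Sale S2 is created and later cancelled, so the original "latest event" query excludes it, while the DISTINCT version still sees its 'Created' row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesFacts (SalesRef TEXT, Event TEXT, Timestamp TEXT);
-- Made-up rows: S1 is created then amended, S2 is created then cancelled.
INSERT INTO SalesFacts VALUES
    ('S1', 'Created',   '2011-01-01'),
    ('S1', 'Amended',   '2011-01-05'),
    ('S2', 'Created',   '2011-01-02'),
    ('S2', 'Cancelled', '2011-01-03');
""")

# Query from the question: count sales whose latest event is not a cancellation.
latest_event_count = conn.execute("""
    SELECT COUNT(*) FROM SalesFacts FACT1
    WHERE Event <> 'Cancelled'
      AND Timestamp = (SELECT MAX(Timestamp) FROM SalesFacts FACT2
                       WHERE FACT1.SalesRef = FACT2.SalesRef)
""").fetchone()[0]

# Rewritten query: count sales that have any non-cancelled row.
distinct_count = conn.execute("""
    SELECT COUNT(DISTINCT SalesRef) FROM SalesFacts
    WHERE Event <> 'Cancelled'
""").fetchone()[0]

print(latest_event_count, distinct_count)  # prints: 1 2
```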
A simple answer would be just to have a 'sales count' column in your fact view / DSV query that supplies a 1 for an 'initial' event, a zero for all subsequent revisions to the event and a -1 if the event is cancelled. This 'journalling' approach plays nicely with incremental fact table loads.
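A rough sketch of the journalling column, shown in plain SQL rather than SSAS: summing the column up to a given date yields the number of sales still alive on that date. SQLite via Python's sqlite3 is assumed, and the column names SalesCount and EventDate, the helper live_sales, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesFacts (
    SalesRef   TEXT,
    Event      TEXT,
    EventDate  TEXT,
    SalesCount INTEGER   -- +1 initial event, 0 amendment, -1 cancellation
);
-- Made-up rows for illustration.
INSERT INTO SalesFacts VALUES
    ('S1', 'Created',   '2011-01-01',  1),
    ('S1', 'Amended',   '2011-01-05',  0),
    ('S2', 'Created',   '2011-01-02',  1),
    ('S2', 'Cancelled', '2011-01-10', -1);
""")


def live_sales(as_of):
    """Number of sales that exist (created, not yet cancelled) on a date."""
    return conn.execute(
        "SELECT COALESCE(SUM(SalesCount), 0) FROM SalesFacts WHERE EventDate <= ?",
        (as_of,),
    ).fetchone()[0]
```

Because the measure is a plain SUM, it aggregates like any other additive measure when sliced by date.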
Another approach, probably more useful in the long run, would be to have an Events dimension: you could then expose a calculated measure that was the count of the members in that dimension non-empty over a given measure in your fact table. However for sales this is essentially a degenerate dimension (a dimension based on a fact table) and might get very large. This may be inappropriate.
Sometimes the requirements may be more complicated. If you slice by time, do you need to know all the distinct events that existed then, even if they were later cancelled? That starts to get tricky: there's a recent post on Chris Webb's blog where he talks about one (slightly hairy) solution:
http://cwebbbi.wordpress.com/2011/01/22/solving-the-events-in-progress-problem-in-mdx-part-2role-playing-measure-groups/

Should I consolidate these database tables?

I have an event calendar application with a SQL database behind it, and right now I have 3 tables to represent the events:
Table 1: Holiday
Columns: ID, Date, Name, Location, CalendarID
Table 2: Vacation
Columns: Id, Date, Name, PersonId, WorkflowStatus
Table 3: Event
Columns: Id, Date, Name, CalendarID
So I have "generic events" which go into the Event table, and special events like holidays and vacations that go into these separate tables. I am debating consolidating these into a single table and just leaving columns like Location and PersonId blank for the generic events.
Table 1: Event:
Columns : Id, Date, Name, Location, PersonId, WorkflowStatus
Does anyone see any strong positives or negatives to each option? Obviously there will be records with columns that don't necessarily apply, but there is overlap between these three tables.
Either way you construct it, the application will have to cope with variant types. In such a situation I recommend that you use a single representation in the DBMS because the alternative is to require a multiplicity of queries.
So it becomes a question of where you stick the complexity and even in a huge organization, it's really hard to generate enough events to worry about DBMS optimization. Application code is more flexible than hardwired schemata. This is a matter of preference.
If it were my decision, I'd condense them into one table. I'd add a column called "EventType" and set it as you import the data into the new table to specify the type of event.
That way, you only need to index one table instead of three (if you feel indexes are required), the data is all in one table, and the queries to get the data out would be a little more concise because you wouldn't need to union all three tables together to see what one person has done. I don't see any downside to having it all in one table (although there will probably be one that someone will bring up that I haven't thought of).
How about sub-typing special events to an Event supertype? This way it is easy to later add any new special events.
Data integrity is the biggest downside of putting them in one table. Since these all appear to be fields that would be required, you lose the ability to require them all by default and would have to write a trigger to make sure that data integrity was maintained properly (Yes, this must be maintained in the database and not, as some people believe, by the application. Unless of course you want to have data integrity problems.)
Another issue is that these are the events you need now; there may be more, and more specialized, events in the future, and breaking code for one type of event because you added another specialized field that only applies to something else is a big risk. When you make a change to add some required vacation information, will you be sure to check that it doesn't break the application concerning holidays? Or worse, not error out but show information you didn't want? Are you going to look at the actual screen every time? Unit testing just of code may not pick up this type of thing, especially if someone was foolish enough to use select * or fail to specify columns in an insert. And frankly not every organization actually has a really thorough automated test process in place (it could be less risk if you do).
I personally would tend to go with Damir Sudarevic's solution. An event table for all the common fields (making it easy to at least get a list of all events) and specialized tables for the fields not held in common, making it simpler to write code that affects only one event and allowing the database to maintain its integrity.
Keep them in 3 separate tables and do a UNION ALL in a view if you need to merge the data into one resultset for consumption. How you store the data on disk need not be identical to how you need to consume the data so long as the performance is adequate.
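A minimal sketch of such a view, assuming SQLite via Python's sqlite3. The view name AllEvents, the EventType literal column, and the sample rows are hypothetical; the three tables follow the columns listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Holiday  (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT,
                       Location TEXT, CalendarID INTEGER);
CREATE TABLE Vacation (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT,
                       PersonId INTEGER, WorkflowStatus TEXT);
CREATE TABLE Event    (Id INTEGER PRIMARY KEY, Date TEXT, Name TEXT,
                       CalendarID INTEGER);

-- Hypothetical view: only the common columns, tagged with their source table.
CREATE VIEW AllEvents AS
    SELECT 'Holiday'  AS EventType, Id, Date, Name FROM Holiday
    UNION ALL
    SELECT 'Vacation' AS EventType, Id, Date, Name FROM Vacation
    UNION ALL
    SELECT 'Event'    AS EventType, Id, Date, Name FROM Event;

-- Made-up sample rows.
INSERT INTO Holiday  VALUES (1, '2011-12-25', 'Christmas', 'US', 1);
INSERT INTO Vacation VALUES (1, '2011-07-04', 'Summer break', 7, 'Approved');
INSERT INTO Event    VALUES (1, '2011-03-01', 'Team meeting', 1);
""")

rows = conn.execute(
    "SELECT EventType, Name FROM AllEvents ORDER BY Date"
).fetchall()
```

Storage stays in three tables with their own required columns, while consumers that only need a combined calendar read from the view.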
As you have it now there are no columns that do not apply for any of the presented entities. If you were to merge the 3 tables into one you'd have to add a field at the very least to know which columns to expect to be populated and reduce your performance. Now when you query for a holiday alone you go to a subset of the data that you would have to sift through / index to get at the same data in a merged storage table.
If you did not already have these tables defined you could consider creating one table with the following signature...
create table EventBase (
Id int PRIMARY KEY,
Date date,
Name varchar(50)
)
...and, say, the holiday table with the following signature.
create table holiday (
Id int PRIMARY KEY,
EventId int,
Location varchar(50),
CalendarId int
)
...and join the two when you needed to do so. Choosing between this and the 3 separate tables you already have depends on how you plan on using the tables and volume but I would definitely not throw all into a single table as is and make things less clear to someone looking at the table definition with no other initiation.
Or combine the common fields and separate out the unique ones:
Table 1: EventCommon
Columns: EventCommonID, Date, Name
Table 2: EventOrHoliday
Columns: EventCommonID, CalendarID, isHoliday
Table 3: Vacation
Columns: EventCommonID, PersonId, WorkflowStatus
with 1->many relationships between EventCommon and the other 2.

Resources