How do you handle "special-case" data when modeling a database?

Our organization provides a variety of services to our clients (e.g., web hosting, tech support, custom programming, etc...). There's a page on our website that lists all available services and their corresponding prices. This was static data, but my boss wants it all pulled from a database instead.
There are about 100 services listed. Only two of them, however, have a non-numeric value for "price" (specifically, the strings "ISA" and "cost + 8%" - I really don't know what they're supposed to mean, so don't ask me).
I'd hate to make the "price" column a varchar just because of these two listings. My current approach is to create a special "price_display" field, which is either blank or contains the text to display in place of the price. This solution feels too much like a dirty hack though (it would needlessly complicate the queries), so is there a better solution?

Consider that this column is a price displayed to the customer that can contain anything.
You'd be inviting grief if you try to make it a numeric column. You're already struggling with two non-conforming values, and tomorrow your boss might want more...
PRICE ON APPLICATION!
CALL US FOR TODAYS SPECIAL!!
You get the idea.
If you really need a numeric column then call it internalPrice or something, and put your numeric constraints on that column instead.

When I have had to do this sort of thing in the past I used:
Price    Unit   Display
10.00    item   null
100.00   box    null
null     null   "Call for Pricing"
Price would be decimal datatype (any exact numeric, not float or real), unit and display would be some type of string data type.
I then used a CASE statement to display either the price per unit or the display text. I also put a constraint or trigger on the display column so that it must be null unless price is null. A constraint or trigger should also require a value in unit if price is not null.
This way you can calculate prices for an order where possible and leave them out when the price is not specified, while still displaying both. I'd also put in a business rule to make sure the order could not be totalled until the call for pricing was resolved (which means you would also need a way to insert the special pricing into the order details rather than just pulling it from the price table).
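As a rough sketch of the Price / Unit / Display layout described above, here is one way it could look in SQLite, driven from Python. The table name `service_price` and the sample rows are made up for illustration, and SQLite's `NUMERIC` stands in for the exact decimal type the answer recommends (a server database would use `DECIMAL`). The two `CHECK` constraints mirror the rules above, and the `CASE` expression picks what to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE service_price (      -- hypothetical table name
        name    TEXT PRIMARY KEY,
        price   NUMERIC,              -- stand-in for DECIMAL, which SQLite lacks
        unit    TEXT,
        display TEXT,
        -- display must be null unless price is null
        CHECK (display IS NULL OR price IS NULL),
        -- a numeric price requires a unit
        CHECK (price IS NULL OR unit IS NOT NULL)
    )
""")
conn.executemany(
    "INSERT INTO service_price VALUES (?, ?, ?, ?)",
    [
        ("widget", 10.00, "item", None),
        ("crate", 100.00, "box", None),
        ("custom", None, None, "Call for Pricing"),
    ],
)

# Show the price per unit when there is one, otherwise the display text.
rows = conn.execute("""
    SELECT name,
           CASE WHEN price IS NOT NULL
                THEN printf('%.2f per %s', price, unit)
                ELSE display
           END AS shown
    FROM service_price
    ORDER BY name
""").fetchall()

# A row with both a price and a display text violates the first CHECK.
try:
    conn.execute("INSERT INTO service_price VALUES ('bad', 5, 'item', 'oops')")
    bad_row_rejected = False
except sqlite3.IntegrityError:
    bad_row_rejected = True
```

Rows with a null price can simply be filtered out (`WHERE price IS NOT NULL`) when computing order totals.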

Ask yourself...
Will I be adding these values? Will I be sorting by price? Will I need to convert to other currency values?
OR
Will I just be displaying this value on a web page?
If this is just a laundry list and not used for computation the simplest solution is to store price as a string (varchar).

Perhaps use a 'type' indicator in the main table, with one child table allowing numeric price and another with character values. These could be combined into one table, but I generally avoid that. You could also use an intermediate link table with a quantity if you ever want to base price on quantity purchased.

Lots of choices:
1. All prices stored as varchars
2. Prices stored numerically and an extra price_display field that overrides the number if populated
3. Prices stored numerically and an extra price_display field for display purposes, populated manually or by a trigger when the numeric price is updated (duplication of data, and it could get out of sync - yuk)
4. Store special-case negative prices that map to special situations (simply yuk!!)
5. varchar price, a prefix key field to a table of available prefixes ('cost +', ...), a suffix key field to a table of available suffixes, and a type field key to a list of types for the value in price ('$', '%', 'description'). Useful if you'd need to write complex queries against prices in the future.
I'd probably go for 2 as a pragmatic solution, and an extension of 5 if I needed something very general for a generic pricing system.

If this is the extent of your data model, then a varchar field is fine. Your normal prices - decimal as they may be - are probably useless for calculations anyway. How do you compare $10/GB for "data transfer" and $25/month for "domain hosting"?
Your data model for this particular project isn't about pricing, but about displaying pricing. Design with that in mind.
Of course - if you're storing the price a particular customer paid for a particular project, or trying to figure out what to charge a particular customer - then you have a different (more complex) domain. And you'll need a different model to support that.

Since at least one of the alternate prices has a number involved, what about a Price column and a PriceType column? The normal entries could hold a number for the dollar value and the type 'dollar', and the others could be 8 and 'PercentOverCost', and null and 'ISA' (for the Price and PriceType columns respectively).
You should probably have a PriceType table for validation, and a PriceTypeID column, if you go this route.
This would allow other types of pricing to be added in the future (unit pricing, foreign currency), give you a number, and also make it easier to know what type of pricing you are dealing with.
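A minimal sketch of the Price / PriceType idea, again using SQLite from Python. The `Service` table, the service names, and the `format_price` helper are assumptions made for the example; only the three type names come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE PriceType (
        PriceTypeID INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Service (             -- hypothetical table and rows
        Name        TEXT PRIMARY KEY,
        Price       NUMERIC,
        PriceTypeID INTEGER NOT NULL REFERENCES PriceType(PriceTypeID)
    );
    INSERT INTO PriceType VALUES (1, 'dollar'), (2, 'PercentOverCost'), (3, 'ISA');
    INSERT INTO Service VALUES
        ('Web hosting', 25, 1),
        ('Parts', 8, 2),
        ('Special', NULL, 3);
""")

def format_price(price, price_type):
    """Turn a (Price, PriceType) pair into the text shown to the customer."""
    if price_type == "dollar":
        return f"${price:.2f}"
    if price_type == "PercentOverCost":
        return f"cost + {price}%"
    return price_type  # types without a number, such as 'ISA'

listing = [
    (name, format_price(price, ptype))
    for name, price, ptype in conn.execute("""
        SELECT s.Name, s.Price, t.Name
        FROM Service s JOIN PriceType t USING (PriceTypeID)
        ORDER BY s.Name
    """)
]
```

The numeric column stays numeric, and the formatting rule for each type lives in one place.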

http://books.google.co.in/books?id=0f9oLxovqIMC&pg=PA352&lpg=PA352&dq=How+do+you+handle+%E2%80%9Cspecial-case%E2%80%9D+data+when+modeling+a+database%3F&source=bl&ots=KxN9eRgO9q&sig=NqWPvxceNJPoyZzVS4AUtE-FF5c&hl=en&ei=V3RlSpDtI4bVkAWkzbHNDg&sa=X&oi=book_result&ct=result&resnum=3

Related

Database / table structure - similar entries vs. too much normalization?

I have designed this relational database that is keeping track of various assets and their owners over time. One of the most important pieces of analysis I want to do is to track the value of those assets over time: expected original cost, actual original cost, actual cost, etc. So I have been putting data relative to a cost / value in a separate table called “Support_Value”. To complicate things, some of the assets I’m tracking are in countries with foreign currencies, so I’m collecting cost / value data in US Dollars but also in local currencies (“LC”), which ends up doubling the number of columns I have in this table. I also use this table as a way to keep track of the value of the asset owners themselves in a similar fashion.
- The columns of this table are the following:
My initial plan was to carve out separate tables: (1) one for the various “qualities” of entries relative to cost and value (i.e. “planned”, “upper” bound, “lower” bound, “estimated” by analysts, and “actual”) and (2) another one for currencies. But I realize this is likely to break, as it doesn’t allow an initial “planned” cost to be subsequently revised unless we make it explicit by creating new columns for the revisions, and there can be more than one revision. So it is still not perfect.
What I’m now envisaging is to create a different value table that would have the following columns:
ID (PK representing individual instances of cost / value estimates)
Currency (FK to my currency table)
Asset (FK to my assets table) - i.e. what this cost or value is referring to
Date (FK to my date table) - i.e. to track revisions actually
Type (i.e. “cost" or “value")
Quality (i.e. “planned”, “upper”, “lower”, “estimated”, “actual”)
Valuation - i.e. the actual absolute amount in the currency designated in the second column
What do you think of this approach? Is this an improvement?
Thanks for any suggestion you could have!
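For illustration only, the proposed value table could be sketched as below in SQLite from Python. The table name `AssetValue` and the sample rows are invented, and the foreign keys are reduced to plain columns to keep the example self-contained. Because each revision is just another dated row, the most recent "planned" cost falls out of an ordinary query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AssetValue (          -- hypothetical table name
        ID        INTEGER PRIMARY KEY,
        Currency  TEXT NOT NULL,       -- would be an FK to the currency table
        Asset     INTEGER NOT NULL,    -- would be an FK to the assets table
        Date      TEXT NOT NULL,       -- ISO date; would be an FK to the date table
        Type      TEXT NOT NULL CHECK (Type IN ('cost', 'value')),
        Quality   TEXT NOT NULL CHECK (Quality IN
                      ('planned', 'upper', 'lower', 'estimated', 'actual')),
        Valuation NUMERIC NOT NULL
    );
    INSERT INTO AssetValue (Currency, Asset, Date, Type, Quality, Valuation) VALUES
        ('USD', 1, '2020-01-01', 'cost', 'planned', 1000),
        ('USD', 1, '2020-06-01', 'cost', 'planned', 1200),  -- a revision
        ('USD', 1, '2021-01-01', 'cost', 'actual',  1150);
""")

# Latest revision of the planned cost for asset 1, in US Dollars.
latest_planned = conn.execute("""
    SELECT Date, Valuation
    FROM AssetValue
    WHERE Asset = ? AND Type = 'cost' AND Quality = 'planned' AND Currency = 'USD'
    ORDER BY Date DESC
    LIMIT 1
""", (1,)).fetchone()
```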

Both approaches are fine.
But if you think you may need additional similar columns,
then the second approach is more extensible.
Your second approach does look overnormalized, though;
I suggest splitting the "Quality" column back into its parts.
Something like:
"ID"
"Currency"
"Asset"
"Date"
"Type"
"Planned"
"Lower"
"Upper"
"Estimated"
"Actual"
"Valuation"
Cheers.

Database Design (Inventory DB)

I'm looking to design an inventory database that tracks a snack bar. As this would be a single person/computer access and need to be easily movable to another system, I plan to use SQLite as the DB engine. The basic concept is to track inventory bought from a wholesale warehouse such as Sams Club, and then keep track of the inventory.
The main obstacle I'm trying to overcome is how to track bulk vs individual items in the products database. For example if a bulk item is purchased, let us say a 24 pack of coke, how do I maintain in the product database, the bulk item and that it contains 24 of the individual items. The solution would be fairly easy if all bulk items only contained multiple of 1 item, but in variety packs, such as a carton of chips that contains 5 different individual items all with separate UPCs, the solution becomes a bit more difficult.
So far I have come up with the multiple pass approach where the DB would be scanned multiple times to obtain all of the information.
Product_Table
SKU: INT
Name: TEXT
Brand: TEXT
PurchasePrice: REAL
UPC: BIGINT
DESC: TEXT
BULK: BOOLEAN
BulkList: TEXT // comma separated list of SKUs for each individual item
BulkQty: TEXT // comma separated list corresponding to each SKU above representing the quantity
Transaction_Table
SKU: INT
Qty: INT
// Other stuff but that is the essential
When I add a bulk item to the inventory (a positive-quantity transaction), it should instead add all of its individual items, as I can't think of any time I would keep the bulk item in stock to sell. I would like to keep the bulk items in the database, however, to help with receiving them and adding them into the inventory.

One way to do it is to create a 1:N mapping between bulk objects and their contents:
create table bulk_item (
    bulk_product_id integer not null,
    item_product_id integer not null,
    qty integer not null,
    primary key(bulk_product_id, item_product_id),
    foreign key(bulk_product_id) references product(sku),
    foreign key(item_product_id) references product(sku)
);
A comma-separated list is certainly fine, though it makes certain queries harder (such as finding all bulk objects that contain a given SKU).
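Here is a small sketch of how the bulk_item mapping might be used from Python with SQLite (the engine the question mentions). The product rows and the `expand` helper are hypothetical. It expands a received bulk SKU into its individual items, and also shows the "which bulk packs contain this SKU" query that a comma-separated list would make awkward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE product (sku INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE bulk_item (
        bulk_product_id INTEGER NOT NULL REFERENCES product(sku),
        item_product_id INTEGER NOT NULL REFERENCES product(sku),
        qty             INTEGER NOT NULL,
        PRIMARY KEY (bulk_product_id, item_product_id)
    );
    -- Made-up sample data: one variety carton holding two kinds of chips.
    INSERT INTO product VALUES
        (1, 'Chips variety carton'), (2, 'Plain chips'), (3, 'BBQ chips');
    INSERT INTO bulk_item VALUES (1, 2, 12), (1, 3, 8);
""")

def expand(sku, qty):
    """Return (item_sku, item_qty) pairs for receiving qty units of sku."""
    rows = conn.execute(
        "SELECT item_product_id, qty * ? FROM bulk_item "
        "WHERE bulk_product_id = ? ORDER BY item_product_id",
        (qty, sku),
    ).fetchall()
    return rows or [(sku, qty)]  # not a bulk product: receive it as-is

# Every bulk product that contains SKU 3.
containing_bbq = conn.execute(
    "SELECT bulk_product_id FROM bulk_item WHERE item_product_id = 3"
).fetchall()
```

Receiving two cartons would then be `expand(1, 2)`, and each resulting pair can be written to the transaction table as its own row.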

I have to both agree and disagree with jspcal. I agree with the "bulk_item" table, but I would not say that it's "fine" to use a comma separated list. I suspect that they were only being polite and would not endorse a design that isn't in first normal form.
The design that jspcal has suggested is commonly called "Bill of Materials" and is the only sane way to approach a problem like composite products.
In order to use this effectively with your transaction table, you should include a transaction type code along with the SKU and quantity. There are different reasons why your stock in any given SKU might go up or down. The most common are receiving new stock and customers buying stock. However, there are other things like manual inventory adjustments to take into consideration clerical errors and shrinkage. There are also stock conversions like when you decide to bust up a variety pack into individual products for sale. Don't think you can count on whether the quantity is positive or negative to give you enough information to be able to make sense of your inventory levels and how (and why) they've changed.
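To make the transaction type idea concrete, here is a minimal sketch in Python with SQLite. The table name `inventory_txn`, the four type codes, and the sample rows are assumptions chosen for the example, not a prescribed set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory_txn (       -- hypothetical table name
        id   INTEGER PRIMARY KEY,
        sku  INTEGER NOT NULL,
        qty  INTEGER NOT NULL,         -- signed: positive adds stock, negative removes it
        type TEXT NOT NULL CHECK (type IN
                 ('RECEIVE', 'SALE', 'ADJUST', 'CONVERT'))
    );
    INSERT INTO inventory_txn (sku, qty, type) VALUES
        (2, 24, 'RECEIVE'),
        (2, -5, 'SALE'),
        (2, -1, 'ADJUST'),   -- e.g. shrinkage found during a stock count
        (2, -3, 'SALE');
""")

# Current stock level for SKU 2.
on_hand = conn.execute(
    "SELECT SUM(qty) FROM inventory_txn WHERE sku = 2"
).fetchone()[0]

# The same movements broken down by reason.
by_type = dict(conn.execute(
    "SELECT type, SUM(qty) FROM inventory_txn WHERE sku = 2 GROUP BY type"
))
```

The sign of `qty` alone could not distinguish the sale from the adjustment; the type code can.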

How to handle an immutable table referencing mutable tables?

In making a pretty standard online store in .NET, I've run into a bit of an architectural conundrum regarding my database. I have a table "Orders", referenced by a table "OrderItems". The latter references a table "Products".
Now, the orders and orderitems tables are in most aspects immutable, that is, an order created and its orderitems should look the same no matter when you're looking at the tables (for instance, printing a receipt for an order for bookkeeping each year should yield the same receipt the customer got at the time of the order).
I can think of two ways of achieving this behavior, one of which is in use today:
1. Denormalization, where values such as price of a product are copied to the orderitem table.
2. Making referenced tables immutable. The code that handles products could create a new product whenever a value such as the price is changed. Mutable tables referencing the products table would have their references updated, whereas the immutable ones would be fine and dandy with their old references.
What is your preferred way of doing this? Is there a better, more clever way of doing this?

It depends. I'm working on a fairly complex piece of enterprise software that includes a kind of document management and auditing and is used in pharmacy.
Normally, primitive values are denormalized. For instance, if you just need the state of the customer at the time the order was created, I would store it on the order.
There is always more complex data that needs to be available for almost any point in time. There are two approaches: you create a history of the data, or you implement a revision control system, which is almost the same.
The history means that every state that ever existed is stored as a separate record, in the same or another table.
I implemented a revision control system, where I split records into two tables: one for the actual item, let's say a product, and the other for its versions. This way I can reference the product as a whole, or any specific version of it, because each has its own primary key.
This system is used for many entities. I can safely reference an object under revision control from the audit trail, for instance, or from other immutable records. At the beginning such a system seems more complex, but in the end it is very straightforward and solves many problems at once.

Storing the price in both the Product table and the OrderItem table is NOT denormalizing if the price can change over time. Normalization rules say that every "fact" should be recorded only once in the database. But in this case, just because both numbers are called "price" doesn't make them the same thing. One is the current price, the other is the price as of the date of the sale. These are very different things. Just like "customer zip code" and "store zip code" are completely different fields; the fact that both might be called "zip code" for short does not make them the same thing. Personally, I have a strong aversion to giving fields that hold different data the same name because it creates confusion. I would not call them both "Price": I would call one "Current_Price" and the other "Sale_Price" or something like that.
Not keeping the price at the time of the sale is clearly wrong. If we need to know this -- which we almost surely do -- then we need to save it.
Duplicating the entire product record for every sale or every time the price changes is also wrong. You almost surely have constant data about a product, like description and supplier, that does not change every time the price changes. If you duplicate the product record, you will be duplicating all this data, which definitely IS denormalization. This creates many potential problems. Like, if someone fixes a spelling error in the product description, we might now have the new record saying "4-slice toaster" while the old record says "4-slice taster". If we produce a report and sort on the description, they'll get separated and look like different products. Etc.
If the only data that changes about the product and that you care about is the price, then I'd just post the price into the OrderItem record.
If there's lots of data that changes, then you want to break the Product table into two tables: One for the data that is constant or whose history you don't care about, and another for data where you need to track the history. Like, have a ProductBase table with description, vendor, stock number, shipping weight, etc.; and a ProductMutable table with our cost, sale price, and anything else that routinely changes. You probably also want an as-of date, or at least an indication of which is current. The primary key of ProductMutable could then be Product_id plus As_of_date, or if you prefer simple sequential keys for all tables, fine, it at least has a reference to product_id. The OrderItem table references ProductMutable, NOT ProductBase. We find ProductBase via ProductMutable.
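The ProductBase / ProductMutable split could be sketched as follows, using SQLite from Python. Column choices beyond those named above, the sample data, and the use of a composite key (Product_id plus As_of_date) are assumptions for the example. The order item points at the ProductMutable row that was current at the time of sale, and the description is reached through it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE ProductBase (
        Product_id  INTEGER PRIMARY KEY,
        Description TEXT NOT NULL
    );
    CREATE TABLE ProductMutable (
        Product_id INTEGER NOT NULL REFERENCES ProductBase(Product_id),
        As_of_date TEXT NOT NULL,
        Sale_price NUMERIC NOT NULL,
        PRIMARY KEY (Product_id, As_of_date)
    );
    CREATE TABLE OrderItem (
        Order_id   INTEGER NOT NULL,
        Product_id INTEGER NOT NULL,
        As_of_date TEXT NOT NULL,
        Qty        INTEGER NOT NULL,
        FOREIGN KEY (Product_id, As_of_date)
            REFERENCES ProductMutable(Product_id, As_of_date)
    );
    -- Made-up sample data, including the misspelling from the answer.
    INSERT INTO ProductBase VALUES (1, '4-slice taster');
    INSERT INTO ProductMutable VALUES (1, '2020-01-01', 30), (1, '2021-01-01', 35);
    INSERT INTO OrderItem VALUES (100, 1, '2020-01-01', 2);
    -- Fixing the typo touches one row and is reflected on every old order.
    UPDATE ProductBase SET Description = '4-slice toaster' WHERE Product_id = 1;
""")

# Reprint order 100: the price is the one in force when the order was placed.
receipt = conn.execute("""
    SELECT b.Description, m.Sale_price, oi.Qty
    FROM OrderItem oi
    JOIN ProductMutable m
      ON m.Product_id = oi.Product_id AND m.As_of_date = oi.As_of_date
    JOIN ProductBase b ON b.Product_id = m.Product_id
    WHERE oi.Order_id = 100
""").fetchone()
```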

I think Denormalization is the way to go.
Also, Product should not have a price (when it changes from time to time, and when price means a different value to different people: retailers, customers, bulk sellers, etc.).
You could also have a price history table where it contains ProductID, FromDate, ToDate, Price, IsActive - to maintain the price history for a product.
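A possible shape for such a price history table, sketched with SQLite from Python. The conventions used here are assumptions: dates are ISO strings, ToDate is inclusive, and a NULL ToDate marks the price that is still open-ended. The `price_on` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PriceHistory (
        ProductID INTEGER NOT NULL,
        FromDate  TEXT NOT NULL,   -- ISO dates compare correctly as text
        ToDate    TEXT,            -- NULL while the price is still current
        Price     NUMERIC NOT NULL,
        IsActive  INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (ProductID, FromDate)
    );
    INSERT INTO PriceHistory VALUES
        (1, '2020-01-01', '2020-12-31', 30, 0),
        (1, '2021-01-01', NULL,         35, 1);
""")

def price_on(product_id, day):
    """Return the price in force on the given ISO date, or None if there was none."""
    row = conn.execute("""
        SELECT Price FROM PriceHistory
        WHERE ProductID = ?
          AND FromDate <= ?
          AND (ToDate IS NULL OR ToDate >= ?)
    """, (product_id, day, day)).fetchone()
    return row[0] if row else None
```

An order can then either copy the result of `price_on` at checkout or store the date and look the price up later.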

Should I make specification table referenceable?

Since I know there are lots of expert database core designers here, I decided to ask this question on stackoverflow.
I'm developing a website whose main concern is to index every product that is available in the real world, like digital cameras, printers, refrigerators, and so on. As we know, each product has its own specifications. For example, a digital camera has its weight, lens, speed of shutter, etc. Each Specification has a type. For example, price (I see it like a spec) is a number.
I think the most standard way is to create whatever specs are needed for a specified product with its proper type and assign it to the product. So for each separate product PRICE has to be created and the type number should be set on it.
So here is my question: is it possible to have a table for specs with all specs in it, so that, for example, PRICE with type number has already been created, and I just need to search for the price in the table and assign it to the product? The problem with this method is that I don't see a good way to prevent the user from creating duplicate entries. He has to be able to find the spec he needs (if it's been added before), and I also want him to know that the spec he finds is actually the one he needed, since there may be some specs with the same name but different type and usage. If he doesn't find it, he will create it.
Any ideas?
---------------------------- UPDATE ----------------------------
My question is not about db flexibility. I think that in the second method users will mess the specs table up! They will create thousands of duplicate entries, and I also think they won't find their proper specs.

I have just finished answering Dynamic Table Generation, which discusses a similar problem. Take a look at the observation pattern. If you replace "observation" by "specification" and "subject" by "product" you may find this model useful -- you will not need Report and Rep_mm_Obs tables.

My suggested data model based on your requirements:
SPECIFICATIONS table
SPECIFICATION_ID, pk
SPECIFICATION_DESCRIPTION
This allows you to have numerous specifications, without being attached to an item.
ITEM_SPECIFICATION_XREF table
ITEM_ID, pk, fk to ITEMS table
SPECIFICATION_ID, pk, fk to SPECIFICATIONS table
VALUE, pk
Benefits:
Making the primary key to be a composite ensures the set of values will be unique throughout the table. Blessing or curse, an item with a given specification could have values 0.99 and 1.00 - these would be valid.
This setup allows for a specification to be associated with 0+ items.
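The model above might be sketched like this in SQLite from Python. The columns of the ITEMS table and the sample rows are assumptions, since the answer only names the other two tables. The composite primary key rejects an exact duplicate but, as noted, allows two different values for the same item and specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE ITEMS (ITEM_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
    CREATE TABLE SPECIFICATIONS (
        SPECIFICATION_ID          INTEGER PRIMARY KEY,
        SPECIFICATION_DESCRIPTION TEXT NOT NULL
    );
    CREATE TABLE ITEM_SPECIFICATION_XREF (
        ITEM_ID          INTEGER NOT NULL REFERENCES ITEMS(ITEM_ID),
        SPECIFICATION_ID INTEGER NOT NULL REFERENCES SPECIFICATIONS(SPECIFICATION_ID),
        VALUE            TEXT NOT NULL,
        PRIMARY KEY (ITEM_ID, SPECIFICATION_ID, VALUE)
    );
    -- Made-up sample data.
    INSERT INTO ITEMS VALUES (1, 'Digital camera');
    INSERT INTO SPECIFICATIONS VALUES (1, 'Price');
    INSERT INTO ITEM_SPECIFICATION_XREF VALUES (1, 1, '0.99'), (1, 1, '1.00');
""")

# Re-inserting an identical (item, specification, value) row hits the primary key.
try:
    conn.execute("INSERT INTO ITEM_SPECIFICATION_XREF VALUES (1, 1, '0.99')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

value_count = conn.execute(
    "SELECT COUNT(*) FROM ITEM_SPECIFICATION_XREF WHERE ITEM_ID = 1"
).fetchone()[0]
```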

How do you manage "pick lists" in a database

I have an application with multiple "pick list" entities, such as those used to populate the choices of dropdown selection boxes. These entities need to be stored in the database. How does one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?

In the past I've created a table that has the name of the list and the acceptable values, then queried it to display the list. I also include an underlying value, so you can return a display value for the list, and a bound value that may be much uglier (a small int for normalized data, for instance).
CREATE TABLE PickList(
    ListName varchar(15),
    Value varchar(15),
    Display varchar(15),
    Primary Key (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.

It depends on various things:
if they are immutable and non relational (think "names of US States") an argument could be made that they should not be in the database at all: after all they are simply formatting of something simpler (like the two character code assigned). This has the added advantage that you don't need a round trip to the db to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. In case of localized display, so you need a different formatting for each culture, then you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.

Well, you could do something like this:
PickListContent
IdList  IdPick  Text
1       1       Apples
1       2       Oranges
1       3       Pears
2       1       Dogs
2       2       Cats
and optionally..
PickList
Id  Description
1   Fruit
2   Pets

I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering based on type. While it works, it has invariably created headaches down the line. For example, you may find that something you presumed to be a simple pick list is not so simple and requires an extra field; do you now split this data into an additional table or extend your master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity, and it makes it easier to interpret the data in the database when you're not using the application.

We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.

Create one table for lists and one table for list_options.
-- Put in the name of the list
insert into lists (id, name) values (1, 'Country in North America');
-- Put in the values of the list
insert into list_options (id, list_id, value_text) values
    (1, 1, 'Canada'),
    (2, 1, 'United States of America'),
    (3, 1, 'Mexico');

To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
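The ordering trick described above can be tried out with a small SQLite table from Python. The table name `country` and the sample rows are just for illustration; the default of 9 follows the example in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (             -- hypothetical pick-list table
        country_id    INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        display_order INTEGER NOT NULL DEFAULT 9
    );
    INSERT INTO country (name, display_order) VALUES
        ('Mexico', 9), ('United States', 1), ('Brazil', 9), ('Canada', 2);
""")

# Pinned entries come first; everything else falls back to alphabetical order.
options = [name for _id, name in conn.execute(
    "SELECT country_id, name FROM country ORDER BY display_order, name"
)]
```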

Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID), which you join to the list_option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value.
The query would look something like:
select
    b.option_name,
    b.option_value
from
    list a,
    list_option b
where
    a.name = 'States'
    and a.list_id = b.list_id
order by
    b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).

Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... or, if it changes often, you can just copy the answer value to the answer table.

If you don't mind using strings for the actual values, you can simply give each list a different list_id and populate a single table with:
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item

We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.

I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the first option as it makes updating lists easier (at least in my opinion).

Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model, which you simply want to persist in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .net world), or, depending on the data, you could store each list manually in its own table. There are situations where it makes good sense to put it all in the same table, but consider this only if you feel it really makes good sense. Normally, putting different data in different tables makes it a lot easier to (later) understand what is going on.

There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, "give me the number of records that picked value X").
I personally prefer option 1, it seems "cleaner" to me.

You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist - it needs an additional attribute, for instance - you can change just that picklist table with little effect on the rest of your schema. In the single table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate table solution.

This has served us well:
SQL> desc aux_values;
 Name                                      Type
 ----------------------------------------- ------------
 VARIABLE_ID                               VARCHAR2(20)
 VALUE_SEQ                                 NUMBER
 DESCRIPTION                               VARCHAR2(80)
 INTEGER_VALUE                             NUMBER
 CHAR_VALUE                                VARCHAR2(40)
 FLOAT_VALUE                               FLOAT(126)
 ACTIVE_FLAG                               VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.