Since I know there are lots of expert database designers here, I decided to ask this question on Stack Overflow.
I'm developing a website whose main concern is to index every product available in the real world: digital cameras, printers, refrigerators, and so on. As we know, each product has its own specifications. A digital camera, for example, has its weight, lens, shutter speed, etc. Each specification has a type. For example, price (I see it as a spec) is a number.
I think the most standard way is to create whatever specs a given product needs, each with its proper type, and assign them to the product. So for each separate product, a PRICE spec has to be created and given the type "number".
So here is my question: is it possible to have a single table holding all specs, so that, for example, PRICE (with type "number") has already been created, and a user just needs to search for it in the table and assign it to the product? The problem with this method is that I don't see a good way to prevent users from creating duplicate entries. A user has to be able to find the spec he needs (if it has been added before), and he also needs to know that the spec he finds is actually the one he needed, since there may be specs with the same name but different types and usages. If he doesn't find it, he will create it.
Any ideas?
---------------------------- UPDATE ----------------------------
My question is not about DB flexibility. I think that with the second method users will mess up the specs table: they will create thousands of duplicate entries, and I also think they won't find their proper specs.
I have just finished answering Dynamic Table Generation, which discusses a similar problem. Take a look at the observation pattern. If you replace "observation" with "specification" and "subject" with "product" you may find this model useful -- you will not need the Report and Rep_mm_Obs tables.
My suggested data model based on your requirements:
SPECIFICATIONS table
SPECIFICATION_ID, pk
SPECIFICATION_DESCRIPTION
This allows you to have numerous specifications, without being attached to an item.
ITEM_SPECIFICATION_XREF table
ITEM_ID, pk, fk to ITEMS table
SPECIFICATION_ID, pk, fk to SPECIFICATIONS table
VALUE, pk
Benefits:
Making the primary key a composite ensures that each (item, specification, value) combination is unique throughout the table. Blessing or curse, an item with a given specification could have values 0.99 and 1.00 - these would be valid.
This setup allows for a specification to be associated with 0+ items.
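As a rough sketch in DDL (assuming an existing ITEMS table keyed by ITEM_ID; types and lengths are illustrative, not prescriptive):

CREATE TABLE SPECIFICATIONS (
    SPECIFICATION_ID          INT PRIMARY KEY,
    SPECIFICATION_DESCRIPTION VARCHAR(200) NOT NULL
);

CREATE TABLE ITEM_SPECIFICATION_XREF (
    ITEM_ID          INT NOT NULL REFERENCES ITEMS(ITEM_ID),
    SPECIFICATION_ID INT NOT NULL REFERENCES SPECIFICATIONS(SPECIFICATION_ID),
    VALUE            VARCHAR(100) NOT NULL,
    -- composite PK: the same item/spec/value triple cannot appear twice
    PRIMARY KEY (ITEM_ID, SPECIFICATION_ID, VALUE)
);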
In our database design we have a couple of tables that describe different objects but are of the same basic type. As describing the actual tables and what each column does would take a long time, I'm going to simplify it with a similarly structured example based on a job database.
So say we have the following tables:
These tables have no connections to each other but share identical columns. So the first step was to unify the identical columns and introduce a unique personId:
Now we have the "header" columns in Person, which are linked to the more specific job tables in a 1-to-1 relation, using the personId PK as the FK. In our use case a person can only ever have one job, so personId is also unique across the Taxi driver, Programmer and Construction worker tables.
While this structure works, we now have the use case where the application has a personId and wants the data from the respective job table. This brings us to the problem that we can't immediately know what kind of job the person with this personId is doing.
A few options we came up with to solve this issue:
Deal with it in the backend
This means leaving the architecture as it is and looking for the right table in the backend code. This could mean looking through every table present and/or constructing a semi-complicated join select (sketched below) in which we have to sift through all columns to find the ones that are filled.
All in all: possible, but it means a lot of unnecessary selects. We would also like to keep such database-oriented logic in the actual database.
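For illustration, the kind of join select option 1 implies could look like this (table and column names are assumed from the example above, not from a real schema):

-- Determine which child table holds a row for a given person by
-- outer-joining every child table and checking which join matched.
SELECT p.personId,
       CASE
           WHEN t.personId  IS NOT NULL THEN 'taxi driver'
           WHEN pr.personId IS NOT NULL THEN 'programmer'
           WHEN c.personId  IS NOT NULL THEN 'construction worker'
       END AS jobType
FROM Person p
LEFT JOIN TaxiDriver         t  ON t.personId  = p.personId
LEFT JOIN Programmer         pr ON pr.personId = p.personId
LEFT JOIN ConstructionWorker c  ON c.personId  = p.personId
WHERE p.personId = 42;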
Using a Type Field
This means adding a Type column to the Person table, filled for example with numbers that determine the correct child table, like:
So you could add a 0 in Type if it's a taxi driver, a 1 if it's a programmer and so on...
While this greatly reduces the amount of backend logic, we then have to make sure that the numbers used in the Type field are known in the backend and never change.
Use separate IDs for each table
That means every job gets its own ID column (which has to be nullable) in Person, like:
Now it's easy to find out which job each person has due to the others having an empty ID.
So my question is: which one of these designs is best practice? Am I missing an obvious solution here?
Bill Karwin made a good explanation on a problem similar to this one. https://stackoverflow.com/a/695860/7451039
We've now decided to go with the second option because it seems to come with the fewest drawbacks, as described by the other commenters and posters. As there was no actual answer presenting the second option as a solution, I will try to summarize our reasoning:
Against Option 1:
There is no way to distinguish the type by looking at the parent table. As a result the backend would have to contain all the logic, including scanning all the tables for the one that contains the ID. While you can compress most of that logic into a single big join select, it would still be a lot more logic than the other options require.
Against Option 3:
As @yuri-g said, this one is technically not possible, as the separate IDs could not be set up as primary keys. They would have to be nullable, and as a result could not reliably be used as unique keys, essentially rendering the parent table useless, as one of the reasons for it was to have a unique personId across the tables.
Against a single table containing all columns:
For smaller use cases like the one I described in the question this might be viable, but we are talking about a bunch of tables, each with roughly 2-6 columns. This option would turn into a column mess really quickly.
Against a flat design with a key-value table:
Our properties have completely different data types, different constraints and foreign key relations. All of this would be difficult or impossible in this design.
Against custom database objects containing the child-specific properties:
While this option, which @Matthew McPeak suggested, might be viable for a lot of people, our database design has never really used objects, so introducing them to the mix would likely cause more confusion than it would help us.
In favor of the second option:
This option is easy to use in our table-oriented database structure, makes it easy to identify the proper child table, and does not need a lot of reworking to introduce, especially since we already have something similar to a Type table that we can easily use for this purpose.
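For illustration, a rough sketch of the chosen design in DDL (table and column names are assumptions based on the description above, not our actual schema):

CREATE TABLE JobType (
    typeId      INT PRIMARY KEY,        -- 0 = taxi driver, 1 = programmer, ...
    description VARCHAR(50) NOT NULL
);

CREATE TABLE Person (
    personId INT PRIMARY KEY,
    name     VARCHAR(100) NOT NULL,
    typeId   INT NOT NULL REFERENCES JobType(typeId)
);

-- One child table per job; personId is both PK and FK, so a person
-- can have at most one row in each child table.
CREATE TABLE Programmer (
    personId          INT PRIMARY KEY REFERENCES Person(personId),
    favouriteLanguage VARCHAR(50)
);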
The third option, as you describe it, is impossible: no RDBMS (at least none that I personally know of) allows NULLs in a PK (even a composite one).
Second is realistic.
And yes, the first would take up to N queries polling the child tables to determine the actual type (where N is the number of types).
Although you won't escape with one query in the second case either: there will always be two of them, because you can't JOIN until you know exactly what you should be joining.
So basically there are flaws in your design, and you should consider other options.
Like denormalization: inline the non-shared attributes into the parent table anyway, with the fields becoming NULL for non-corresponding types.
Or a flexible, flat list of attribute-value pairs related through the primary key (yes, losing schema enforcement is the trade-off).
Or switch to a column-oriented DB: this is a use case for one.
So, because I have 4 different product types (books, magazines, gifts, food), I can't just put all products in one "products" table without having a bunch of NULL values. So I decided to break each product type out into its own table, but I know this is just wrong (https://c1.staticflickr.com/1/742/23126857873_438655b10f_b.jpg).
I also tried creating an EAV model for this (https://c2.staticflickr.com/6/5734/23479108770_8ae693053a_b.jpg), but I got stuck, as I'm not sure how to link the publishers and authors tables.
I know this question has been asked a lot, but I don't understand ANY of the answers I've seen. I think this is because I'm a very visual learner, which makes it hard to follow what's being talked about when not a lot of information is given.
Your model is on the right track, except that the product name should be sufficient; you don't need gift name, book name, etc. What you put in those tables is the information specific to that type of product which the other products don't need. The Product table contains all the common fields. I would use productID in the child tables rather than renaming it giftID, magazineID, etc.; it is easier to remember what things are called when you are consistent in naming them.
Now, to be practical: you put as much as you can into the Product table, especially if you are going to do calculations. I prefer the child tables in this specific case to hold what is mostly display information. So Product contains the product name, the cost, the type of product, the units the product is sold in, etc.: the stuff generally needed to calculate the cost of an order or to report on what was ordered. There may be one or two fields that can contain NULLs, but it simplifies the calculation-type queries so much that it might be worth it.
The meat of the descriptive details, though, goes in the child table for the type of product. These are usually referenced only when displaying the product in the shopping area, and only one at a time, so you can use the product type to join to just the one child table you need for display. So while the order cares about the product number, name and cost calculations, it probably doesn't need to go line by line describing the book's ISBN or the megapixels in a camera. But the description page of the product does need those things.
This approach is not purely relational, although it mostly is, but it groups the information by the meaning of the data and how it will be used, which makes the database easier to understand and query. I am a big fan of relational tables because databases just work better when they hit at least third normal form, but sometimes you can go too far for practicality's sake, so the meaning of the data and the way you group it for use (and not just for the user interface, but for later reporting as well) is almost always one of my considerations in design.
Breaking each product type into its own table is fine: let the child tables use the same ID as the parent Product table, and create views for the child tables that join with Product.
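A minimal sketch of that idea, assuming a Product table with productId, name and price columns (all names here are illustrative):

-- Shared primary key: the child table reuses the parent's ID.
CREATE TABLE Book (
    productId INT PRIMARY KEY REFERENCES Product(productId),
    isbn      VARCHAR(20),
    author    VARCHAR(100)
);

-- A view that joins the child back to the parent, so the application
-- can read a book together with its common fields in one place.
CREATE VIEW BookView AS
SELECT p.productId, p.name, p.price, b.isbn, b.author
FROM Product p
JOIN Book b ON b.productId = p.productId;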
Your case is a classic case of types and subtypes. This is often called class/subclass in object modeling and generalization/specialization in ER modeling. It's a well understood pattern. There are known techniques for dealing with this pattern.
Visit the following tags, and read the description under each tag's info tab (presented as "learn more"). Also look over the questions grouped under these tags.
single-table-inheritance class-table-inheritance shared-primary-key
If you want to read about this in more depth, use these buzzwords to search for articles on the web.
You've already discovered and discarded single table inheritance on your own. Other answers have pointed you at shared primary key. Class table inheritance involves a single table for generalized data as well as the four specialized tables. Shared primary key is generally used in conjunction with class table inheritance.
I would like to create a database for MTG cards I own. What would the design be?
I would like to store the following information about each card:
1. Name of card.
2. Set the card belongs to.
3. Condition of card.
4. Price it sold for.
5. Price I bought it for.
Here is a little info about MTG cards in general:
1. Every card has a name.
2. Every card belongs to a set.
3. A card may have a foil version of itself.
4. Card name, the set it belongs to, and whether it's foil or not together make it unique.
5. A card may be in multiple sets.
6. A set has multiple cards.
The gimmick is that in my collection I may have several copies of the same card, but with different conditions, purchase prices, or sold prices.
I will have another collection of mtg cards that have been sold on eBay. This collection will have the price/condition/date/whether it was a "Buy It Now" or Bid, etc.
The idea is to find out what price I should sell my cards based on the eBay collection.
It's not a programming question, it's a modeling question. Anyone who is programming but not modeling is a coder, not a programmer; that's just a step above data entry. Modeling is a fundamental aspect of programming, as it deals directly with abstraction, and abstraction is the real genius of computer programming.
Normalization and database design is an excellent way for someone to become better at programming in general as normalization is also an abstraction process.
Abstraction is arguably the most difficult aspect of computer programming, particularly since it requires a person both to be especially pedantic and literal (in order to work properly with the steadfast stubbornness and idiocy that is a computer) and to handle and work in a very high-level, abstract space.
For example, the arguments in design meetings are not over language syntax.
So, that said, I have updated the schema in minor ways to address the changes.
create table card (
    card_key numeric not null primary key,
    name varchar(256) not null,
    foil varchar(1) not null); -- "Y" if it's foil, "N" if it is not.

create table "set" ( -- quoted because SET is a reserved word in most SQL dialects
    set_key numeric not null primary key,
    name varchar(256) not null);

create table cardset (
    card_key numeric not null references card(card_key),
    set_key numeric not null references "set"(set_key),
    primary key (card_key, set_key)); -- a card can be listed in a set only once

create table condition (
    condition_key numeric not null primary key,
    alias varchar(64),
    description varchar(256));

create table saletype (
    saletype_key numeric not null primary key,
    alias varchar(64),
    description varchar(256));

create table singlecard (
    singlecard_key numeric not null primary key,
    card_key numeric not null references card(card_key),
    condition_key numeric not null references condition(condition_key),
    purchase_date date,
    purchase_price numeric,
    saletype_key numeric references saletype(saletype_key), -- null until the card is sold
    sell_date date,
    sell_price numeric,
    notes varchar(4000));
A more detailed explanation.
The card table is the concept of the card, as opposed to an actual card. You can have a card row without having any actual cards in hand. It models the details of the card that are common to all copies. Obviously MTG cards have very many details (color, text, as someone mentioned), but these are likely not important to this kind of model, since it exists to track actual cards for the sake of collecting and sale. But if there were any desire to add other attributes, such as card rarity, the card table would be the place to put them.
The set table is for the sets. I don't know what a set is beyond what is posited here (there is also a casual reference to a series; I don't know if they are related or not). Sets have a name and are used to group cards. So, we have a "set" table.
The cardset table is the many-to-many joiner table. Since a set can have several cards, and a card can belong to several sets, the model needs something to represent that relationship. This is a very common pattern in relational databases, but it is also non-obvious to novices.
There are two simple lookup tables, condition and saletype. These two tables are here for normalization purposes and let the user standardize their terms for these two categories of data. Each has an "alias" and a "description". The alias is the short English version: "Good", "Poor", "Auction", "Buy it now"; the description is the longer English text: "Poor cards show signs of wear, bending, and rub marks". Obviously someone doing this for their own purposes likely does not need the description, but it's there out of habit.
Finally, the meat of the system is the singlecard table. The singlecard table represents an actual, in-your-hand card. It models all of the characteristics that make each physical card different from the others. An individual card is not a member of a set (at least not per the description); rather, that's a higher-level concept (such as how it was published: all "Hero: Bartek the Axe Wielder" cards are part of the "Dark Mysteries" and "Clowns of Death" sets, or whatever). So the single card needs only a reference to its parent card table, which holds the common card characteristics.
This single card references the card's condition and how it was sold via foreign keys to the appropriate tables. It also carries the other necessary data, such as the dates and prices.
Based on what was given, this should meet the basic need.
It would be a good exercise to remodel this yourself. Start with your most basic needs and the best model you understand how to make. Then contrast it with what I've written here, and use that book to try to understand how your simpler design becomes this design.
Note that there is no way to actually enforce that a card is a member of ANY set, or that a set has any cards. That will be an application-logic problem. This is one of the issues with many-to-many joiner tables: they can model the relationship, but they cannot require that it exists.
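As a usage example against this schema (illustrative only), listing the cards still in hand with their condition:

-- Unsold cards in hand, with name, foil flag and condition.
SELECT c.name, c.foil, cond.alias AS condition_alias, sc.purchase_price
FROM singlecard sc
JOIN card c         ON c.card_key = sc.card_key
JOIN condition cond ON cond.condition_key = sc.condition_key
WHERE sc.sell_date IS NULL;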
OK, this isn't really a programming question as such; it's very high-level, and you haven't indicated what database you'll be using or what you've tried.
However, just to give you a few points, the first list (in your question) almost certainly represents most of the information that you'll need to store in your database.
What you need to figure out is which combination of those fields marks one card as unique from another.
If, as you say, the date of purchase and the cost can vary, then in the database you choose you would need to build an index based upon those fields; this will give you the ability to store more than one instance of the same card.
I would read up on relational databases. If you're really lost, I suggest picking up a copy of "SQL For Dummies". SQL is the language that most database providers use, and the book has step-by-step instructions and tutorials for building your own databases.
I suggest you look at the data files from www.mtgjson.com.
Merely by seeing what field types they selected, and by reading the comments and documentation, you are likely to avoid many caveats.
For instance, you will see how they handle duplicate names, cards that are related to each other (one being the flipped, rotated, or melded-together version of another), and many more little nuances.
My software went into production a few days ago, and now I want to discuss the database structure a bit.
The software collects data about ships, currently 174 details for each ship. Each detail can be a text value, a long text value, a number (of a specified length, with or without a specified number of decimals), a date, a date with time, a boolean field, a menu with many values, a list of data, and more.
I solved the problem with the following tables:
Ship:
- ID - smallint, Autoincrement identity
- IMO - int, A number that does not change for the life of the ship
ShipDetailType:
- ID - smallint, Autoincrement identity
- Description - nvarchar(200), The description of the value the field contains
- Position - smallint, The position of the field in the data input form
- ShipDetailGroup_ID - smallint, A key to the group the field belongs to in the data input form
- Type - varchar(4), The type of the field as mentioned above
ShipDetailGroup:
- ID - smallint, Autoincrement identity
(snip...)
ShipMenuPresetValue:
- ID - smallint, Autoincrement identity
- ShipDetailType_ID - smallint, A key to the detail the values belongs to
- Value - nvarchar(100), The values preset in the menu type detail
ShipTextDetail:
- ID - smallint, Autoincrement identity
- Ship_ID - smallint, A Key to the ship the detail belongs to
- ShipDetailType_ID - smallint, a Key to the detail type of the value
- Text - nvarchar(500), the field containing the detail's value
- ModifiedDate - smalldatetime
- User_ID - smallint, A key to the user table
ShipTextDetailHistory:
(snip...)
This table has the same structure as ShipTextDetail and records every change to the details.
There are other tables for the list detail type, each with the fields required for that list, ...
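For concreteness, ShipTextDetail as described above corresponds roughly to this DDL (SQL Server syntax; the name of the user table is an assumption):

CREATE TABLE ShipTextDetail (
    ID                SMALLINT IDENTITY PRIMARY KEY,
    Ship_ID           SMALLINT NOT NULL REFERENCES Ship(ID),
    ShipDetailType_ID SMALLINT NOT NULL REFERENCES ShipDetailType(ID),
    [Text]            NVARCHAR(500),  -- bracketed: TEXT is a reserved word
    ModifiedDate      SMALLDATETIME,
    User_ID           SMALLINT NOT NULL REFERENCES Users(ID) -- user table name assumed
);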
I just read these articles: http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx and http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
They say that this is not the right way to handle the problem.
My customer has a management GUI for the details and groups, as he changes the detail descriptions and adds more details over time.
The data input form is built dynamically by reading the structure from the DetailGroups and DetailTypes; each detail type generates a specific input control.
The comments suggest that another way of solving this is to dynamically create and remove columns from the table.
What do you think?
Diagram Screenshot: http://img24.imageshack.us/my.php?image=66604496uk3.png
I would refactor your code if:
Your customer complained
You found something that didn't work
You found a way that the code couldn't handle a change you knew was going to happen in the future.
You remembered to write unit tests that will allow you to refactor, right?
As far as the structure you have there, I've seen structures like it before. It's a little cumbersome, but it is standard in many places. One thing to remember is that while it's possible to dynamically add and remove columns from databases, the internal storage mechanism of the database doesn't necessarily expect you to be adding and removing these columns continuously. But I don't think this is very relevant compared to the points above, which boil down to: does it work?
I've seen this approach before, and it has presented loads of performance issues once the data volume has grown. The kinds of problems you'll encounter come when you need to return multiple items and use multiple criteria in your WHERE clause. You join back and forth between Ship and ShipTextDetail to get all your select columns, maybe 10-20 times; you then do the same for your criteria, maybe 2-3 times. Now you have a query with so many joins that it runs really slowly. Next you "pre-cook" some of the data to improve performance, i.e. you drag common data out into a fixed table structure, and you've returned to a semi-normalised model.
My recommendation would be this: you know the information for those 174 fields; those are your core attributes. Your customer may add to that list and may change the descriptions of the fields, but it's still a really good starting point. Create a proper data model based around those, and then build in an extensibility mechanism, as you have already done, but only for the new fields. The metadata (the descriptions of the fields) can reside in another table, or potentially in a resource file (useful for internationalisation?), which gives some flexibility for existing fields.
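In sketch form (ShipExtraDetail and the example columns are invented for illustration): the known attributes become real, typed, constrainable columns, and only later additions go through the generic mechanism:

-- Core attributes as real columns.
CREATE TABLE Ship (
    ID     SMALLINT PRIMARY KEY,
    IMO    INT NOT NULL UNIQUE,
    Name   NVARCHAR(200) NOT NULL,
    Owner  NVARCHAR(200),
    Weight DECIMAL(12,2)
    -- ...columns for the rest of the 174 known details...
);

-- The generic mechanism survives, but only for fields added later.
CREATE TABLE ShipExtraDetail (
    Ship_ID           SMALLINT NOT NULL REFERENCES Ship(ID),
    ShipDetailType_ID SMALLINT NOT NULL REFERENCES ShipDetailType(ID),
    Value             NVARCHAR(500),
    PRIMARY KEY (Ship_ID, ShipDetailType_ID)
);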
I agree with Joe: you may not have problems if your DB is small, i.e. fewer than 1000 ships, and your selects are simple, although with 174 attributes to choose from this doesn't appear likely. I think you should convert some of the "obvious" fields first; i.e. I'd assume you have a Ship.Name, Ship.Owner, Ship.Weight, Ship.Registration ...
Good Luck.
I've done similar things, but there are a couple of problems with this specific implementation:
You are storing numbers, booleans, dates, etc. as strings. This might be less than ideal. An alternative is to implement separate classes (inheriting from a base) for the different data types, then store the values in tables made for their data type (see the sketch after this list).
Do the properties that you track change very frequently? Are they a different set per tanker? If not, it might be better to create objects rather than property bags to store all the data. Those objects can then be persisted to the database.
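Regarding the first point, one way to keep native types is a value table per data type; a sketch, with table and column names invented for illustration:

-- One value table per data type keeps each value in its native type.
CREATE TABLE ShipNumericDetail (
    Ship_ID           SMALLINT NOT NULL REFERENCES Ship(ID),
    ShipDetailType_ID SMALLINT NOT NULL REFERENCES ShipDetailType(ID),
    Value             DECIMAL(18,4) NOT NULL,
    PRIMARY KEY (Ship_ID, ShipDetailType_ID)
);

CREATE TABLE ShipDateDetail (
    Ship_ID           SMALLINT NOT NULL REFERENCES Ship(ID),
    ShipDetailType_ID SMALLINT NOT NULL REFERENCES ShipDetailType(ID),
    Value             DATE NOT NULL,
    PRIMARY KEY (Ship_ID, ShipDetailType_ID)
);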
From a performance standpoint, either approach will be fine. How many ships could there possibly be? All the data is going to fit into RAM on any server.
This is a scenario I've seen in multiple places over the years; I'm wondering if anyone else has run across a better solution than I have...
My company sells a relatively small number of products, however the products we sell are highly specialized (i.e. in order to select a given product, a significant number of details must be provided about it). The problem is that while the amount of detail required to choose a given product is relatively constant, the kinds of details required vary greatly between products. For instance:
Product X might have identifying characteristics like (hypothetically)
'Color',
'Material'
'Mean Time to Failure'
but Product Y might have characteristics
'Thickness',
'Diameter'
'Power Source'
The problem (one of them, anyway) in creating an order system that utilizes both Product X and Product Y is that an Order Line has to refer, at some point, to what it is "selling". Since Product X and Product Y are defined in two different tables - and denormalization of products using a wide table scheme is not an option (the product definitions are quite deep) - it's difficult to see a clear way to define the Order Line in such a way that order entry, editing and reporting are practical.
Things I've Tried In the Past
Create a parent table called 'Product' with columns common to Product X and Product Y, then using 'Product' as the reference for the OrderLine table, and creating a FK relationship with 'Product' as the primary side between the tables for Product X and Product Y. This basically places the 'Product' table as the parent of both OrderLine and all the disparate product tables (e.g. Products X and Y). It works fine for order entry, but causes problems with order reporting or editing since the 'Product' record has to track what kind of product it is in order to determine how to join 'Product' to its more detailed child, Product X or Product Y. Advantages: key relationships are preserved. Disadvantages: reporting, editing at the order line/product level.
Create 'Product Type' and 'Product Key' columns at the Order Line level, then use some CASE logic or views to determine the customized product to which the line refers. This is similar to item (1), without the common 'Product' table. I consider it a more "quick and dirty" solution, since it completely does away with foreign keys between order lines and their product definitions. Advantages: quick solution. Disadvantages: same as item (1), plus lost RI.
Homogenize the product definitions by creating a common header table and using key/value pairs for the customized attributes (OrderLine [n] <- [1] Product [1] <- [n] ProductAttribute). Advantages: key relationships are preserved; no ambiguity about product definition. Disadvantages: reporting (retrieving a list of products with their attributes, for instance), data typing of attribute values, performance (fetching product attributes, inserting or updating product attributes etc.)
If anyone else has tried a different strategy with more success, I'd sure like to hear about it.
Thank you.
The first solution you describe is the best if you want to maintain data integrity, and if you have relatively few product types and seldom add new product types. This is the design I'd choose in your situation. Reporting is complex only if your reports need the product-specific attributes. If your reports need only the attributes in the common Products table, it's fine.
The second solution you describe is called "Polymorphic Associations" and it's no good. Your "foreign key" isn't a real foreign key, so you can't use a DRI constraint to ensure data integrity. OO polymorphism doesn't have an analog in the relational model.
The third solution you describe, involving storing an attribute name as a string, is a design called "Entity-Attribute-Value" and you can tell this is a painful and expensive solution. There's no way to ensure data integrity, no way to make one attribute NOT NULL, no way to make sure a given product has a certain set of attributes. No way to restrict one attribute against a lookup table. Many types of aggregate queries become impossible to do in SQL, so you have to write lots of application code to do reports. Use the EAV design only if you must, for instance if you have an unlimited number of product types, the list of attributes may be different on every row, and your schema must accommodate new product types frequently, without code or schema changes.
Another solution is "Single-Table Inheritance." This uses an extremely wide table with a column for every attribute of every product. Leave NULLs in columns that are irrelevant to the product on a given row. This effectively means you can't declare an attribute as NOT NULL (unless it's in the group common to all products). Also, most RDBMS products have a limit on the number of columns in a single table, or the overall width in bytes of a row. So you're limited in the number of product types you can represent this way.
Hybrid solutions exist. For instance, you can store common attributes normally, in columns, but product-specific attributes in an Entity-Attribute-Value table. Or you could store product-specific attributes in some other structured way, like XML or YAML, in a BLOB column of the Products table. But these hybrid solutions suffer because now some attributes must be fetched in a different way than others.
The ultimate solution for situations like this is to use a semantic data model, using RDF instead of a relational database. This shares some characteristics with EAV but is much more ambitious. All metadata is stored in the same way as data, so every object is self-describing, and you can query the list of attributes for a given product just as you would query data. Special products exist, such as Jena or Sesame, implementing this data model and a special query language that is different from SQL.
There's no magic bullet that you've overlooked.
You have what are sometimes called "disjoint subclasses". There's the superclass (Product) with two subclasses (ProductX) and (ProductY). This is a problem that -- for relational databases -- is Really Hard. [Another hard problem is Bill of Materials. Another hard problem is Graphs of Nodes and Arcs.]
You really want polymorphism, where OrderLine is linked to a subclass of Product, but doesn't know (or care) which specific subclass.
You don't have too many choices for modeling. You've pretty much identified the bad features of each. This is pretty much the whole universe of choices.
Push everything up to the superclass. That's the uni-table approach where you have Product with a discriminator (type="X" and type="Y") and a million columns. The columns of Product are the union of columns in ProductX and ProductY. There will be nulls all over the place because of unused columns.
Push everything down into the subclasses. In this case, you'll need a view which is the union of ProductX and ProductY. That view is what's joined to create a complete order. This is like the first solution, except it's built dynamically and doesn't optimize well.
Join Superclass instance to subclass instance. In this case, the Product table is the intersection of ProductX and ProductY columns. Each Product has a reference to a key either in ProductX or ProductY.
There isn't really a bold new direction. In the relational database world-view, those are the choices.
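To make the third choice concrete, here is a minimal sketch (names assumed; this variant has each subclass row share the superclass key, a common alternative to having Product point at the subclass):

CREATE TABLE Product (
    product_id   INT PRIMARY KEY,
    product_type CHAR(1) NOT NULL CHECK (product_type IN ('X', 'Y')),
    name         VARCHAR(100) NOT NULL
);

CREATE TABLE ProductX (
    product_id INT PRIMARY KEY REFERENCES Product(product_id),
    color      VARCHAR(30),
    material   VARCHAR(30),
    mean_time_to_failure_hours INT
);

CREATE TABLE ProductY (
    product_id   INT PRIMARY KEY REFERENCES Product(product_id),
    thickness    DECIMAL(8,2),
    diameter     DECIMAL(8,2),
    power_source VARCHAR(30)
);

-- OrderLine references only the superclass and never needs to know the subtype.
CREATE TABLE OrderLine (
    order_id   INT NOT NULL,
    line_no    INT NOT NULL,
    product_id INT NOT NULL REFERENCES Product(product_id),
    PRIMARY KEY (order_id, line_no)
);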
If, however, you elect to change the way you build application software, you can get out of this trap. If the application is object-oriented, you can do everything with first-class, polymorphic objects. You have to map to and from the kind-of-clunky relational storage; this happens twice: once when you fetch data from the database to create objects, and once when you persist objects back to the database.
The advantage is that you can describe your processing succinctly and correctly. As objects, with subclass relationships.
The disadvantage is that your SQL devolves to simplistic bulk fetches, updates and inserts.
This becomes an advantage when the SQL is isolated into an ORM layer and managed as a kind of trivial implementation detail. Java programmers use iBatis (or Hibernate or TopLink or Cocoon), Python programmers use SQLAlchemy or SQLObject. The ORM does the database fetches and saves; your application directly manipulates Orders, Lines and Products.
This might get you started. It will need some refinement.
Table Product ( id PK, name, price, units_per_package)
Table Product_Attribs (id FK ref Product, AttribName, AttribValue)
Which would allow you to attach a list of attributes to the products. -- This is essentially your option 3
If you know a max number of attributes, You could go
Table Product (id PK, name, price, units_per_package, attrName_1, attrValue_1 ...)
Which would of course de-normalize the database, but make queries easier.
I prefer the first option because
It supports an arbitrary number of attributes.
Attribute names can be stored in another table, and referential integrity enforced so that those damn Canadians don't stick a "colour" in there and break reporting.
Does your product line ever change?
If it does, then creating a table per product will cost you dearly, and the key/value pairs idea will serve you well. That's the kind of direction down which I am naturally drawn.
I would create tables like this:
Attribute(attribute_id, description, is_listed)
-- contains values like "colour", "width", "power source", etc.
-- "is_listed" tells us if we can get a list of valid values:
AttributeValue(attribute_id, value)
-- lists of valid values for different attributes.
Product (product_id, description)
ProductAttribute (product_id, attribute_id)
-- tells us which attributes apply to which products
Order (order_id, etc)
OrderLine (order_id, order_line_id, product_id)
OrderLineProductAttributeValue (order_line_id, attribute_id, value)
-- tells us things like: order line 999 has "colour" of "blue"
The SQL to pull this together is not trivial, but it's not too complex either... and most of it will be written once and kept (either in stored procedures or your data access layer).
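For example (illustrative only, using the tables above), pulling the attribute values captured on the lines of one order might look like:

-- All attribute values chosen on the lines of a given order, with a
-- flag showing whether the value comes from a validated list.
SELECT ol.order_line_id,
       a.description AS attribute,
       olav.value,
       a.is_listed
FROM OrderLine ol
JOIN OrderLineProductAttributeValue olav ON olav.order_line_id = ol.order_line_id
JOIN Attribute a ON a.attribute_id = olav.attribute_id
WHERE ol.order_id = 999;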
We do similar things with a number of types of entity.
Chris and AJ: Thanks for your responses. The product line may change, but I would not term it "volatile".
The reason I dislike the third option is that it sacrifices the metadata for the product attribute values. It essentially turns columns into rows, losing most of the advantages of a database column in the process (data type, default value, constraints, foreign key relationships, etc.).
I've actually been involved in a past project where the product definition was done in this way. We essentially created a full product/product attribute definition system (data types, min/max occurrences, default values, 'required' flags, usage scenarios etc.) The system worked, ultimately, but came with a significant cost in overhead and performance (e.g. materialized views to visualize products, custom "smart" components to represent and validate data entry UI for product definition, another "smart" component to represent the product instance's customizable attributes on the order line, blahblahblah).
Again, thanks for your replies!