Merge data object table with associated attributes table in a view - sql-server

Here's the setup: I have several tables that hold information for data objects which have the potential to have various and sundry bits of data associated with them. Each of these tables has an associated attributes table, which holds 3 bits of information:
the id (integer) of the row the attribute is associated with
a short attribute name ( < 50 chars )
a value (varchar)
The object table will have any number of columns of varying data types, but will always have an integer primary key. If possible, I would like to set up a view that will allow me to select a row from the object table, and all of its associated attributes at one go.
****EDIT****
Ideally, the form I'd like this to take is having columns in the view with the names of the matched attribute from the attributes table, and the value as the value of the attribute.
So for example, if I have table Foo with columns 'Bar', 'Bat', and 'Baz' the view would have those columns, and additionally, columns for any attributes that a row might have.
****END EDIT****
Now, I know (or think I do) that SQL doesn't allow using variables as an alias for a column name. Is there a clean, practical way of doing what I want, or am I chasing a pipe dream?
The obvious solution is to handle all of this in the application code, but I'm curious if it can be done in SQL.

The answer depends on what you are actually seeking. Will the output of the view have one row per attribute per object or one column per attribute per object? If the former, then I'm not sure why you need a view:
Select ...
From ObjectTable
Join AttributeTable
On AttributeTable.Id = ObjectTable.Id
However, I suspect what you want is the later or something like:
Select ...
, ... As Attribute1
, ... As Attribute2
, ... As Attribute3
...
From ObjectTable
In this scenario, the columns that would be generated are not known at execution because the attribute names are dynamic. This is commonly known as a dynamic crosstab. In general, the SQL language is not designed for dynamic column generation. The only way to do this in T-SQL is to use some fugly dynamic SQL. Thus, it is better done in a reporting tool or in middle-tier code.

It sounds like you want a view for each of your 'object' tables as well as its 'attributes' table. Correct me if I am wrong in my reading. It's not clear what your intentions are with 'using variables as an alias for a column name'. Were you hoping to merge ALL your objects, with their different columns, into one view?
Suggest create one view per entity table, and join to its relevant 'attributes' table.
Question though - why is there one matching attributes table for each entity table? Why are they split out? Perhaps you've made the question simpler or obfuscated, so perhaps my question is rhetorical.
CREATE VIEW Foo AS
SELECT O.ID
,O.EverythingElse
,A.ShortName
,A.SomeVarcharValue
FROM
ObjectTable AS O --customer, invoice, whathaveyou
INNER JOIN
ObjectAttribute AS A ON A.ObjectID = O.ID
To consume from this, you could:
SELECT * FROM Foo WHERE ID = 4 OR
SELECT * FROM Foo WHERE ShortName = 'Ender'

Related

WHERE clause partly based on a field value from another table: is there a better way than exec with a dynamic string?

I have a table with about 50,000 records (a global index of corporate and government bonds).
I would like the user to be able to filter this master index firstly into a smaller subset index (based on permanent logic), and then apply further run time criteria that vary each time.
For example, let's say the user wanted to start from one of many subset indices of bonds, let's say of government bonds only, rather than government and corporate bonds, and also only wanted the US$ government bond index specifically. This would be a permanently defined subset of the master index, with a where clause something like "[Level1]='Government' AND [Currency]='USD' AND [CountryCode]='US'"
At run time, the user would additionally request additional criteria, say for example "AND [IssueSize] > 1,000,000,000 AND [Yield] > 0.0112".
I initially thought of having a separate table that stored the different criteria for these permanent sub-indices as where clauses, for example it might have columns "IndexCode, IndexLogic", and using the example above the values would be "UST", "[Level1]='Government' and [Currency]='USD' AND [CountryCode]='US'", and there would be dozens of rows in this table defining commonly used bond indices.
I had originally thought of creating a dynamic string at run-time, where the user supplies their choice of sub-index code ('UST' in the example above), which then adds the relevant where conditions, and any additional criteria passed as separate parameters, and then doing an exec(#tsql) type command. I had also thought of perhaps having a where clause that was a function call, but this seems very inefficient?
Is the dynamic string method the best way of doing this, or is there a better way involving some kind of 'eval' function equivalent which can take a field value and use that as a where clause?
The problem here is you don't know in advance what the filtered index is.
A solution I have used in this instance, where the filtered index can often change is to grab the definition of the filter back into the client app, and use that to dynamically generate the SQL batch. You can also do this with dynamic SQL in a stored procedure:
SELECT ISNULL(
(SELECT i.filter_definition
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID(#tablename) AND
i.name = #indexname AND has_filter = 1),
'(1=1)');
You pass the table name, and the index name, and you get back the exact definition for the index. This has the benefit of if the index is dropped, the condition becomes (1=1) i.e. every row. You can change this to (1=0) to return nothing instead.
Then you concat this into your dynamic query like so:
SELECT *
FROM table
WHERE regular_conditions_here
AND concated_filter_here
If you are joining other tables, I would advise you to subquery the filter, otherwise you many get column clash as there are no aliases.
SELECT *
FROM (SELECT * FROM table WHERE concated_filter_here) table
JOIN othertables etc
WHERE regular_conditions_here

Dynamic columns in database tables vs EAV

I'm trying to decide which way to go if I have an app that needs to be able to change the db schema based on the user input.
For example, if I have a "car" object that contains car properties, like year, model, # of doors etc, how do I store it in the DB in such a way, that the user should be able to add new properties?
I read about EAV tables and they seem right for this thing, but the problem is that queries will get pretty complicated when I try to get a list of cars filtered by a set of properties.
Could I generate the tables dynamically instead? I see that Sqlite has support for ADD COLUMN, but how fast is it when the table reaches many records? And it looks like there's no way to remove a column. I have to create a new table without the column I want to remove, and copy the data from the old table. That's certainly slow on large tables :(
I will assume that SQLite (or another relational DBMS) is a requirement.
EAVs
I have worked with EAVs and generic data models, and I can say that the data model is very messy and hard to work with in the long run.
Lets say that you design a datamodel with three tables: entities, attributes, and _entities_attributes_:
CREATE TABLE entities
(entity_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attributes
(attribute_id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE entity_attributes
(entity_id INTEGER, attribute_id INTEGER, value TEXT,
PRIMARY KEY(entity_id, attribute_id));
In this model, the entities table will hold your cars, the attributes table will hold the attributes that you can associate to your cars (brand, model, color, ...) and its type (text, number, date, ...), and the _entity_attributes_ will hold the values of the attributes for a given entity (for example "red").
Take into account that with this model you can store as many entities as you want and they can be cars, houses, computers, dogs or whatever (ok, maybe you need a new field on entities, but it's enough for the example).
INSERTs are pretty straightforward. You only need to insert a new object, a bunch of attributes and its relations. For example, to insert a new entity with 3 attributes you will need to execute 7 inserts (one for the entity, three more for the attributes, and three more for the relations.
When you want to perform an UPDATE, you will need to know what is the entity that you want to update, and update the desired attribute joining with the relation between the entity and its attributes.
When you want to perform a DELETE, you will also need to need to know what is the entity you want to delete, delete its attributes, delete the relation between your entity and its attributes and then delete the entity.
But when you want to perform a SELECT the thing becomes nasty (you need to write really difficult queries) and the performance drops horribly.
Imagine a data model to store car entities and its properties as in your example (say that we want to store brand and model). A SELECT to query all your records will be
SELECT brand, model FROM cars;
If you design a generic data model as in the example, the SELECT to query all your stored cars will be really difficult to write and will involve a 3 table join. The query will perform really bad.
Also, think about the definition of your attributes. All your attributes are stored as TEXT, and this can be a problem. What if somebody makes a mistake and stores "red" as a price?
Indexes are another thing that you could not benefit of (or at least not as much as it would be desirable), and they are very neccesary as the data stored grows.
As you say, the main concern as a developer is that the queries are really hard to write, hard to test and hard to maintain (how much would a client have to pay to buy all red, 1980, Pontiac Firebirds that you have?), and will perform very poorly when the data volume increases.
The only advantage of using EAVs is that you can store virtually everything with the same model, but is like having a box full of stuff where you want to find one concrete, small item.
Also, to use an argument from authority, I will say that Tom Kyte argues strongly against generic data models:
http://tkyte.blogspot.com.es/2009/01/this-should-be-fun-to-watch.html
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
Dynamic columns in database tables
On the other hand, you can, as you say, generate the tables dynamically, adding (and removing) columns when needed. In this case, you can, for example create a car table with the basic attributes that you know that you will use and then add columns dynamically when you need them (for example the number of exhausts).
The disadvantage is that you will need to add columns to an existing table and (maybe) build new indexes.
This model, as you say, also has another problem when working with SQLite as there's no direct way to delete columns and you will need to do this as stated on http://www.sqlite.org/faq.html#q11
BEGIN TRANSACTION;
CREATE TEMPORARY TABLE t1_backup(a,b);
INSERT INTO t1_backup SELECT a,b FROM t1;
DROP TABLE t1;
CREATE TABLE t1(a,b);
INSERT INTO t1 SELECT a,b FROM t1_backup;
DROP TABLE t1_backup;
COMMIT;
Anyway, I don't really think that you will need to delete columns (or at least it will be a very rare scenario). Maybe someone adds the number of doors as a column, and stores a car with this property. You will need to ensure that any of your cars have this property to prevent from losing data before deleting the column. But this, of course depends on your concrete scenario.
Another drawback of this solution is that you will need a table for each entity you want to store (one table to store cars, another to store houses, and so on...).
Another option (pseudo-generic model)
A third option could be to have a pseudo-generic model, with a table having columns to store id, name, and type of the entity, and a given (enough) number of generic columns to store the attributes of your entities.
Lets say that you create a table like this:
CREATE TABLE entities
(entity_id INTEGER PRIMARY KEY,
name TEXT,
type TEXT,
attribute1 TEXT,
attribute1 TEXT,
...
attributeN TEXT
);
In this table you can store any entity (cars, houses, dogs) because you have a type field and you can store as many attributes for each entity as you want (N in this case).
If you need to know what the attribute37 stands for when type is "red", you would need to add another table that relates the types and attributes with the description of the attributes.
And what if you find that one of your entities needs more attributes? Then simply add new columns to the entities table (attributeN+1, ...).
In this case, the attributes are always stored as TEXT (as in EAVs) with it's disadvantages.
But you can use indexes, the queries are really simple, the model is generic enough for your case, and in general, I think that the benefits of this model are greater than the drawbacks.
Hope it helps.
Follow up from the comments:
With the pseudo-generic model your entities table will have a lot of columns. From the documentation (https://www.sqlite.org/limits.html), the default setting for SQLITE_MAX_COLUMN is 2000. I have worked with SQLite tables with over 100 columns with great performance, so 40 columns shouldn't be a big deal for SQLite.
As you say, most of your columns will be empty for most of your records, and you will need to index all of your colums for performance, but you can use partial indexes (https://www.sqlite.org/partialindex.html). This way, your indexes will be small, even with a high number of rows, and the selectivity of each index will be great.
If you implement a EAV with only two tables, the number of joins between tables will be less than in my example, but the queries will still be hard to write and maintain, and you will need to do several (outer) joins to extract data, which will reduce performance, even with a great index, when you store a lot of data. For example, imagine that you want to get the brand, model and color of your cars. Your SELECT would look like this:
SELECT e.name, a1.value brand, a2.value model, a3.value color
FROM entities e
LEFT JOIN entity_attributes a1 ON (e.entity_id = a1.entity_id and a1.attribute_id = 'brand')
LEFT JOIN entity_attributes a2 ON (e.entity_id = a2.entity_id and a2.attribute_id = 'model')
LEFT JOIN entity_attributes a3 ON (e.entity_id = a3.entity_id and a3.attribute_id = 'color');
As you see, you would need one (left) outer join for each attribute you want to query (or filter). With the pseudo-generic model the query will be like this:
SELECT name, attribute1 brand, attribute7 model, attribute35 color
FROM entities;
Also, take into account the potential size of your _entity_attributes_ table. If you can potentially have 40 attributes for each entity, lets say that you have 20 not null for each of them. If you have 10,000 entities, your _entity_attributes_ table will have 200,000 rows, and you will be querying it using one huge index. With the pseudo-generic model you will have 10,000 rows and one small index for each column.
It all depends on the way in which your application needs to reason about the data.
If you need to run queries which need to do complicated comparisons or joins on data whose schema you don't know in advance, SQL and the relational model are rarely a good fit.
For instance, if your users can set up arbitrary data entities (like "car" in your example), and then want to find cars whose engine capacity is greater than 2000cc, with at least 3 doors, made after 2010, whose current owner is part of the "little old ladies" table, I'm not aware of an elegant way of doing this in SQL.
However, you could achieve something like this using XML, XPath etc.
If your application has a set on data entities with known attributes, but users can extend those attributes (a common requirement for products like bug trackers), "add column" is a good solution. However, you may need to invent a custom query language to allow users to query those columns. For instance, Atlassian Jira's bug tracking solution has JQL, a SQL-like language for querying bugs.
EAV is great if your task is to store and then show data. However, even moderately complex queries become very hard in an EAV schema - imagine how you'd execute my made up example above.
For your use case, a document oriented database like MongoDB would do great.
Another option that I haven't seen mentioned above is to use denormalized tables for the extended attributes. This is a combination of the pseudo-generic model and the dynamic columns in database tables. Instead of adding columns to existing tables, you add columns or groups of columns into new tables with FK indexes to the source table. Of course, you'll want a good naming convention (car, car_attributes_door, car_attributes_littleOldLadies)
Your selection problem becomes that of applying a LEFT OUTER JOIN to include the extended attributes that you want to include.
Slower than normalized, but not as slow as EAV.
Adding new extended attributes becomes a problem of adding a new table.
Harder than EAV, easier/faster than modifying table schema.
Deleting attributes becomes a problem of dropping whole tables.
Easier/faster than modifying table schema.
These new attributes can be strongly typed.
As good as modifying table schema, faster than EAV or generic columns.
The biggest advantage to this approach that I can see is that deleting unused attributes is quite easy compared to any of the others via a single DROP TABLE command. You also have the option to later normalize often-used attributes into larger groups or into the main table using a single ALTER TABLE process rather than one for each new column you were adding as you added them, which helps with the slow LEFT OUTER JOIN queries.
The biggest disadvantage is that you're cluttering up your table list, which admittedly is often not a trivial concern. That and I'm not sure how much better LEFT OUTER JOIN's actually perform than EAV table joins. It's definitely closer to EAV join performance than normalized table performance.
If you're doing a lot of comparisons/filters of values that benefit greatly from strongly typed columns, but you add/remove these columns frequently enough to make modifying a huge normalized table intractable, this seems like a good compromise.
I would try EAV.
Adding columns based on user input doesn't sounds nice to me and you can quickly run out of capacity. Queries on very flat table can also be a problem. Do you want to create hundreds of indexes?
Instead of writing every thing to one table, I would store as many as possible common properties (price, name , color, ...) in the main table and those less common properties in an "extra" attributes table. You can always balance them later with a little effort.
EAV can performance well for small to middle sized data set. Since you want to use SQLlite, I guess it's not be a problem.
You may also want to avoid "over" normalizing your data. With the cheap storage
we currently have, you can use one table to store all "Extra" attributes, instead of two:
ent_id, ent_name, ...
ent_id, attr_name, attr_type, attr_value ...
People against EAV will say its performance is poor on large database. It's sure that it won't performance as well as normalized structure but you don't want to change structure on a 3TB table either.
I have a low quality answer, but possible, that came from HTML tags that are like : <tag width="10px" height="10px" ... />
In this dirty way you will have just one column as a varchar(max) for all properties say it Props column and you will store data in it like this:
Props
------------------------------------------------------------
Model:Model of car1|Year:2010|# of doors:4
Model:Model of car2|NewProp1:NewValue1|NewProp2:NewValue2
In this way all works will go to the programming code in business layer with using some functions like concatCustom that get an array and return a string and unconcatCustom that get a string and return an array.
For more validity of special characters like ':' and '|', I suggest '#:#' and '#|#' or something more rare for splitter part.
In a similar way you can use a text or binary field and store an XML data in the column.

Relational database design tables with common base

I have a lot trouble finding the best design solution for this situation. I have two tables with a common base. Currently I have designed it like this: I have an order table (the common base):
[order_table]
order_id
order_type
company
created
I have another table with reference to the order table:
[product_order]
order_id fk
product_id
quantity
price
I have second table with reference to the order table:
[special_order]
order_id fk
description
price_estimate
color
size
Both tables share the same order_id which i like. I often have to do large queries on order_table using the information available in that table lets say 'company = 200'. But for each result I also need its data from product_order or special_order depending on which type it is. So the only optimal solution I see is to left joining the query with both tables on order_id and filter the information afterwards. The only other option I see is to add the common columns to each table, but then I would have a lot of reorganizing afterwards to get them in correct order.
Is there a better way to organize the data?
So those extra tables are extra attributes to a specific order-id (1:1)?
I'd consider adding all the fields to the common tables, or at least the fields from the most used sub-table.
If not appropriate, you may want to add "Type" to the common table and let a trigger manage insert/delete of related records to avoid the fuzz with orphans etc.
Use views with your left joins (wouldn't inner be better?) to fetch the different types.

Alternative to multi-valued fields in MS Access

Related question: Multivalued Fields a Good Idea?
I know that multi-valued fields are similar to many-to-many relationship. What is the best way to replace multi-valued fields in an MS Access application?
I have an application that has multi-valued fields. I am not sure how exactly to do away with those and implement exactly same logic in the form of fields that are single-valued?
What would be the implementation in terms of table-relationships when I want to move a multi-valued relationship to single-valued one.
Thank you.
The following is probably far more detailed than you need, but it is intended for a beginner. Let us say you have a table, MainTable:
ID -> Numeric, primary key
Title -> Text
Surname -> Text
Address -> Text
Country -> Numeric
You will probably want a list of titles and countries from which to select.
In the case of Title, it would not be the worst thing to store the information in a field in a table, because you have a single column and the data is unlikely to change, and you probably will not be creating a query using the data.
Country is a different story, conventionally you would store a number and have a look-up table. It is the kind of situation where people are tempted to use a multi-value field.
However, convention is a lot easier. Add another table for country:
ID -> Numeric, primary key
Country -> Text
You might like to call the related field in the main table CountryID. You can now create a relationship in the relationship window showing how Country relates to MainTable:
You can see that Enforce Referential Integrity is selected, which means that you must have null or a country from the countries table in the CountryID field.
To view the data, you can create a query:
SELECT
MainTable.ID,
MainTable.Title,
MainTable.Surname,
MainTable.Address,
Country.Country
FROM Country
INNER JOIN MainTable
ON Country.ID = MainTable.CountryID;
But the main point is to have a form that allows data entry. You can create a form using the wizards, but after that, you either need to right-click CountryID and change it to a combobox or add a combobox or listbox using the wizard. Option 2 is probably the easiest. Here are most of the steps from the wizard:
You now have a dropdown list of countries on your form.
See also: create form to add records in multiple tables
In Access 2010, there are new ways of adding values to combos when the user enters data that does not exist in the table of possible values. In previous versions (although I am not sure about 2007), you would use the Not In List event to add items to a look-up table, in 2010, you have a choice of adding a List Items Edit form to the property sheet.
The question is rather odd since it asks about a single-value field, but also asks about table-relationships. In a very strict interpretation, a multi-value field (MVF) could be replaced with a single TextBox filled with comma-separated items... no table-relationships required. Instead, I assume by "single-value" field that the question means standard fields in a multi-table relationship, in which each field of each related row has a single value. But each primary record can still be related to multiple rows in the related value table, which preserves the whole purpose of the MVF.
Consider the database outline below to illustrate a possible replacement for the MVF. I'm not including every possible property or how to create basic object, just what's necessary for creating desired behavior--assuming enough knowledge of Access to "fill in the blanks" and no fear of basic code or SQL.
The basic structure consists of three tables: 1) Primary table, 2) "value list" table, 3) junction table used to define many-to-many relationship.
Customer Table
CustomerID: autonumber, primary
CustomerName: short text
Codes Table
Code: short text, length 5, primary key
Description: short text
[Customer Codes] Table
CustomerID: long
Code: short text
Create relationships between the tables. This will require appropriate indexes defined on the tables (not detailed here).
Customer Table to [Customer Codes] Table
CustomerID -> CustomerID fields (with enforce integrity enabled)
Code Table to [Customer Codes] Table
Code -> Code fields (with enforce integrity enabled)
(A separate ID field could also be created for the values table [e.g. Codes Table] table, but that just complicates later queries and controls, etc. In such a case, the junction table would contain another ID field and not the value directly.)
In a standard VBA module, create a function like
Public Function GetCodeList(ByVal CustomerID As Integer) As String
Dim sSQL As String
Dim qry As QueryDef
Dim rs As Recordset2
sSQL = "PARAMETERS [CustID] LONG;" & _
" SELECT * FROM [Customer Codes] WHERE [CustomerID] = [CustID]"
Set qry = CurrentDb.CreateQueryDef("", sSQL)
qry.Parameters("CustID") = CustomerID
Set rs = qry.OpenRecordset(dbOpenForwardOnly, dbReadOnly)
Dim sCodes As String
sCodes = ""
Dim bFirst As Boolean
bFirst = True
Do Until rs.EOF
sCodes = sCodes & IIf(bFirst, "", ",") & rs("Code")
bFirst = False
rs.MoveNext
Loop
rs.Close
qry.Close
GetCodeList = sCodes
End Function
Useful queries of the primary table without duplicate rows will require creating some sort of aggregate queries (i.e. Group By, Count, etc.). Simple selection could be done in a single query, for example
SELECT Customer.CustomerID, Customer.[CustomerName], GetCodeList([CustomerID]) AS Codes, Count(Customer.CustomerID) AS CountOfID
FROM Customer LEFT JOIN [Customer Codes] ON Customer.ID = [Customer Codes].CustomerID
WHERE ((([Customer Codes].Code)="Current" Or ([Customer Codes].Code)="Free"))
GROUP BY Customer.ID, Customer.[CustomerName], GetCodeList([ID]);
More complicated selection may require multiple queries, one to first select the proper records, then another to join the primary table to the first query. But honestly, these types of queries are no more complicated than what can be required to select on Multi-Valued Fields. In fact, the query syntax for MVF is non-standard and can get rather complicated and confusing, even more than having a junction table and many-to-many relationships. Behind the scenes, Access is essentially doing the same thing as I've outlined, but because it hides so much detail it make some queries even more difficult.
Regarding multi-value presentation and selection on a form, mimicking the multi-valued ComboBox exactly is not possible--mainly because the basic Access Combobox does not have a multi-selection option with the ability to show checkboxes. However, one can populate a non-bound ListBox with property [Multi Select] = Simple. On the Form_Load event, add available values (e.g. Code from the example table) to the listbox using ListBox.AddItem method. Then in the Form_Current, Form_AfterUpdate, Form_Undo events, one can add code to show and/or save the selected values. This requires more code which is probably beyond the scope here.
Technically the question asked about "moving" a MVF to another implementation. The gist is to populate the values table (e.g. Code table in the example) with the same values in the MVF list. This could be a manual process, but depends on how the MVF's ComboBox is populated. Then write a query which copies each MFV into the junction table (e.g. [Customer Code]) for the same primary record, something like
INSERT INTO [Customer Codes] ( CustomerID, Code )
SELECT Customer.CustomerID, Customer.TestMVF.Value
FROM Customers
WHERE (((Customers.TestMVF.Value) Is Not Null));
A full implementation is definitely not a simple task overall, but if you find too many problems with MVFs or if you are wanting to migrate to another database, this kind of change will be necessary to understand.
There is no replacement for MVFs in an Access database. Some query techniques can mimic MVFs but you may find MVF functionality to be superior. 1. It is fast and very easy to implement. No code and no SQL. 2. It is visual and therefore it is intuitive for the user. There are some things that you cannot do with an MVF so you really need to decide what is more important.

How do you manage "pick lists" in a database

I have an application with multiple "pick list" entities, such as used to populate choices of dropdown selection boxes. These entities need to be stored in the database. How do one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the Name of the list and the acceptable values, then queried it to display the list. I also include a underlying value, so you can return a display value for the list, and a bound value that may be much uglier (a small int for normalized data, for instance)
CREATE TABLE PickList(
ListName varchar(15),
Value varchar(15),
Display varchar(15),
Primary Key (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.
It depends on various things:
if they are immutable and non relational (think "names of US States") an argument could be made that they should not be in the database at all: after all they are simply formatting of something simpler (like the two character code assigned). This has the added advantage that you don't need a round trip to the db to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. In case of localized display, so you need a different formatting for each culture, then you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList IdPick Text
1 1 Apples
1 2 Oranges
1 3 Pears
2 1 Dogs
2 2 Cats
and optionally..
PickList
Id Description
1 Fruit
2 Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering out based on type. While it works, it has invariably created headaches down the line. For example you may find that something you presumed to be a simple pick list is not so simple and requires an extra field, do you now split this data into an additional table or extend you master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity and it makes it easier to interpret the data in the database when you're not using the application
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
Create one table for lists and one table for list_options.
# Put in the name of the list
insert into lists (id, name) values (1, "Country in North America");
# Put in the values of the list
insert into list_options (id, list_id, value_text) values
(1, 1, "Canada"),
(2, 1, "United States of America"),
(3, 1, "Mexico");
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_ option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID) which you join to the list_ option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value. (BTW, whenever I try to post the words "list" and "option" connected with an underscore, the preview window goes a little wacky. That's why I put a space there.)
The query would look something like:
select
b.option_name,
b.option_value
from
list a,
list_option b
where
a.name="States"
and
a.list_id = b.list_id
order by
b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... of if it is often changing, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id in value and populate a single table with :
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item
We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the initial option as it makes updating lists easier (at least in my opinion).
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model but you really want to persist it in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .net world) or depending on the data you could store it manually in a table each - there are situations where it would make good sense to put it all in the same table but do consider this only if you feel it makes really good sense. Normally putting different data in different tables makes it a lot easier to (later) understand what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, give me the number of records that picked value X".
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist - it needs and additional attribute for instance - you can change just that picklist table with little effect on the rest of your schema. In the single table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate table solution.
This has served us well:
SQL> desc aux_values;
Name Type
----------------------------------------- ------------
VARIABLE_ID VARCHAR2(20)
VALUE_SEQ NUMBER
DESCRIPTION VARCHAR2(80)
INTEGER_VALUE NUMBER
CHAR_VALUE VARCHAR2(40)
FLOAT_VALUE FLOAT(126)
ACTIVE_FLAG VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.

Resources