What is the best way to keep this schema clear? - sql-server

Currently I'm working on a RFID project where each tag is attached to an object. An object could be a person, a computer, a pencil, a box or whatever it comes to the mind of my boss.
And of course each object have different attributes.
So I'm trying to have a table tags where I can keep a register of each tag in the system (registration of the tag). And another tables where I can relate a tag with and object and describe some other attributes, this is what a have done. (No real schema just a simplified version)
Suddenly, I realize that this schema could have the same tag in severals tables.
For example, the tag 123 could be in C and B at the same time. Which is impossible because each tag just could be attached to just a single object.
To put it simple I want that each tag could not appear more than once in the database.
My current approach
What I really want
Update:
Yeah, the TagID is chosen by the end user. Moreover the TagID is given by a Tag Reader and the TagID is a 128-bit number.
New Update:
The objects until now are:
-- Medicament(TagID, comercial_name, generic_name, amount, ...)
-- Machine(TagID, name, description, model, manufacturer, ...)
-- Patient(TagID, firstName, lastName, birthday, ...)
All the attributes (columns or whatever you name it) are very different.
Update after update
I'm working on a system, with RFID tags for a hospital. Each RFID tag is attached to an object in order keep watch them and unfortunately each object have a lot of different attributes.
An object could be a person, a machine or a medicine, or maybe a new object with other attributes.
So, I just want a flexible and cleaver schema. That allow me to introduce new object's types and also let me easily add new attributes to one object. Keeping in mind that this system could be very large.
Examples:
Tag(TagID)
Medicine(generic_name, comercial_name, expiration_date, dose, price, laboratory, ...)
Machine(model, name, description, price, buy_date, ...)
Patient(PatientID, first_name, last_name, birthday, ...)
We must relate just one tag for just one object.
Note: I don't really speak (or also write) really :P sorry for that. Not native speaker here.

You can enforce these rules using relational constraints. Check out the use of a persisted column to enforce the constraint Tag:{Pencil or Computer}. This model gives you great flexibility to model each child table (Person, Machine, Pencil, etc.) and at same time prevent any conflicts between tag. Also good that we dont have to resort to triggers or udfs via check constraints to enforce the relation. The relation is built into the model.
create table dbo.TagType (TagTypeID int primary key, TagTypeName varchar(10));
insert into dbo.TagType
values(1, 'Computer'), (2, 'Pencil');
create table dbo.Tag
( TagId int primary key,
TagTypeId int references TagType(TagTypeId),
TagName varchar(10),
TagDate datetime,
constraint UX_Tag unique (TagId, TagTypeId)
)
go
create table dbo.Computer
( TagId int primary key,
TagTypeID as 1 persisted,
CPUType varchar(25),
CPUSpeed varchar(25),
foreign key (TagId, TagTypeID) references Tag(TagId, TagTypeID)
)
go
create table dbo.Pencil
( TagId int primary key,
TagTypeId as 2 persisted,
isSharp bit,
Color varchar(25),
foreign key (TagId, TagTypeID) references Tag(TagId, TagTypeId)
)
go
-----------------------------------------------------------
-- create a new tag of type Pencil:
-----------------------------------------------------------
insert into dbo.Tag(TagId, TagTypeId, TagName, TagDate)
values(1, 2, 'Tag1', getdate());
insert into dbo.Pencil(TagId, isSharp, Color)
values(1, 1, 'Yellow');
-----------------------------------------------------------
-- try to make it a Computer too (fails FK)
-----------------------------------------------------------
insert into dbo.Computer(TagId, CPUType, CPUSpeed)
values(1, 'Intel', '2.66ghz')

Have a Tag Table with PK identity insert of TagID.
This will ensure that each TagID only shows up once no matter what...
Then in the Tag Table have a TagType column that can either be free form (TableName) or better yet have a TagType table with entries A,B,C and then have a FK in Tag pointing TagType.
I would move the Tag attributes into Table A,B,C to minimize extra data in Tag or have a series of Junction Tables between Tag and A,B, and C
EDIT:
Assuming the TagID is created when the object is created this will work fine (Insert into Tag first to get TagID and capture it using IDENTITY_INSERT)
This assumes users cannot edit the TagID itself.
If users can choose the TagID then still use a Tag Table with the TagID but have another field called DisplayID where the user can type in a number. Just put on a unique constraint on Tag.DisplayID....
EDIT:
What attributes are you needing and are they nullable? If they are different for A, B, and C then it is cleaner to put them in A, B, and C especially if there might be some for A and B but not C...

talked with Raz to clear up what he's trying to do. What he's wanting is a flexable way to store attributes related to tags. Tags can one of multiple types of objects, and each object has a specific list of attributes. he also wants to be able to add objects/attributes without having to change the schema. here's the model i came up with:

if each tag can only be in a, b, or c only once, i'd just combine a, b, and c into one table. it'd be easier to give you a better idea of how to build your schema if you gave an example of exactly what you're wanting to collect.
to me, from what i've read, it sounds like you have a list of tags, and a list of objects, and you need to assign a tag to an object. if that is the case, i'd have a tags table, and objects table, and a ObjectTag table. in the object tab table you would have a foreign key to the tag table and a foreign key to the object table. then you make a unique index on the tag foreign key and now you've enforced your requirement of only using a tag once.

I would tackle this using your original structures. Relational databases are a lot better at aggregating/combining atomic data than they are at parsing complex data structures.
Keep the design of each "tag-able" object type in its own table. Data types, check constraints, default values, etc. are still easily implemented this way. Also, continue to define a FK from each object table to the Tags table.
I'm assuming you already have this in place, but if you place a unique constraint on the TagId column in each of the object tables (A, B, C, etc.) then you can guarantee uniqueness within that object type.
There are no built-in SQL Server constraints to guarantee uniqueness among all the object types, if implemented as separate tables. So, you will have to make your own validation. An INSTEAD OF trigger on your object tables can do this cleanly.
First, create a view to access the TagId list across all your object tables.
CREATE VIEW TagsInUse AS
SELECT A.TagId FROM A
UNION
SELECT B.TagId FROM B
UNION
SELECT C.TagId FROM C
;
Then, for each of your object tables, define an INSTEAD OF trigger to test your TagId.
CREATE TRIGGER dbo.T_IO_Insert_TableA ON dbo.A
INSTEAD OF INSERT
AS
IF EXISTS (SELECT 0 FROM dbo.TagsInUse WHERE TagId = inserted.TagId)
BEGIN;
--The tag(s) is/are already in use. Create the necessary notification(s).
RAISERROR ('You attempted to re-use a TagId. This is not allowed.');
ROLLBACK
END;
ELSE
BEGIN;
--The tag(s) is/are available, so proceed with the INSERT.
INSERT INTO dbo.A (TagId, Attribute1, Attribute2, Attribute3)
SELECT i.TagId, i.Attribute1, i.Attribute2, i.Attribute3
FROM inserted AS i
;
END;
GO
Keep in mind that you can also (and probably should) encapsulate that IF EXISTS test in a T-SQL function for maintenance and performance reasons.
You can write supplementary stored procedures for doing things like finding what object type a TagId is associated with.
Pros
You are still taking advantage of SQL Server's data integrity features, which are all quite fast and self-documenting. Don't underestimate the usefulness of data types.
The view is an encapsulation of the domain that must be unique without combining the underlying sets of attributes. Now, you won't have to write any messy code to decipher the object's type. You can base that determination by which table contains the matching tag.
Your options remain open...
Because you didn't store everything in an EAV-friendly nvarchar(300) column, you can tweak the data types for whatever makes the most sense for each attribute.
If you run into any performance issues, you can index the view.
You (or your DBA) can move the object tables to different file groups on different disks if you need to balance things out and help with parallel disk I/O. Think of it as a form of horizontal partitioning. For example, if you have 8 times as many RFID tags applied to medicine containers as you have for patients, you can place the medicine table on a different disk without having to create the partitioning function that you would need for a monolithic table (one table for all types).
If you need to eventually partition your tables vertically (for archiving data onto a read-only partition), you can more easily create a partitioning function for each object type. This would be useful where the business rules do
Most importantly, implementing different business rules based on object type is much simpler. You don't have to implement any nasty conditional logic like "IF type = 'needle' THEN ... ELSE IF type = 'patient' THEN ... ELSE IF....". If you need to apply different rules, then apply them to the relevant object table without having to test a "type" value.
Cons
Triggers have to be maintained. However, this would have to be done in your application anyway, so you are performing the same data integrity checking at the database. That means that you will have no extra network overhead and this will be available for any application that uses this database.

What you're describing is a classical "table-per-type" ORM mapping. Entity Framework has built-in support of this, which you should look into.
Otherwise, I don't think most databases have easy integrity constraints that are enforced over primary keys of multiple tables.
However, is there any reason why you can't just use a single tags table to hold all the fields? Use a type field to hold the type of object. NULL all the irrelevant fields -- this way they don't consume disk space. You'll end up with far fewer tables (only one) that you can maintain as one single coherent object; it also makes you write far fewer SQL queries to work on tags that may span multiple object types.
Implementing it as one single table also saves you disk space because you can implement tiers of inheritance -- for example, "patient" and "doctor" and "nurse" can be three different object types, each having similar fields (e.g. firstname, lastname etc.) and some unique fields. Right now you'll need three tables, with duplicated fields.
It is also simpler when you add an object type. Before, you need to add a new table, and duplicate some SQL statements that span multiple object types. Now you only need to add new fields to the same table (maybe reuse some). The SQL you need to change are far fewer.
The only reason why you won't go with one single table is when the number of fields make a row too large to fit inside a SQL-Server page (which I believe is 8K). Then SQL will complain and won't allow you to add any more fields. The solution, in this case, is to adopt an ORM tool (like Entity Framework), and then "reuse" fields. For example, if "Field1" is only used by object type #1, there is no reason why object type #3 can't use it to store something as well. You only need to be able to distinguish it in your programs.

You could have the Tags table such that it can have a pointer to any of those tables, and could include a Type that tells you which of the tables it is
Tags
-
ID
Type (A,B, or C)
A (nullable)
B (nullable)
C (nullable)
A
-
ID
(other attributes)

Related

Postgresql inheritance based database design

I'm developing a simple babysitter application that has 2 types of users: a 'Parent' and the 'Babysitter'. I'm using postgresql as my database but I'm having trouble working out my database design.
The 'Parent' and the 'Babysitter' entities have attributes that can be generalized, for example: username, password, email, ... Those attributes could be
placed into a parent entity called 'User'. They both also have their own attributes, for example: Babysitter -> age.
In terms of OOP things are very clear for me, just extend the user class and you are good to go but in DB design things are differently.
Before posting this question I roamed around the internet for a good week looking for insight into this 'issue'. I did find a lot of information but
it seemed to me that there was a lot a disagreement. Here are some of the posts I've read:
How do you effectively model inheritance in a database?: Table-Per-Type (TPT), Table-Per-Hierarchy (TPH) and Table-Per-Concrete (TPC) VS 'Forcing the RDb into a class-based requirements is simply incorrect.'
https://dba.stackexchange.com/questions/75792/multiple-user-types-db-design-advice:
Table: `users`; contains all similar fields as well as a `user_type_id` column (a foreign key on `id` in `user_types`
Table: `user_types`; contains an `id` and a `type` (Student, Instructor, etc.)
Table: `students`; contains fields only related to students as well as a `user_id` column (a foreign key of `id` on `users`)
Table: `instructors`; contains fields only related to instructors as well as a `user_id` column (a foreign key of `id` on `users`)
etc. for all `user_types`
https://dba.stackexchange.com/questions/36573/how-to-model-inheritance-of-two-tables-mysql/36577#36577
When to use inherited tables in PostgreSQL?: Inheritance in postgresql does not work as expected for me and a bunch of other users as the original poster points out.
I am really confused about which approach I should take. Class-table-inheritance (https://stackoverflow.com/tags/class-table-inheritance/info) seems like the most correct in
my OOP mindset but I would very much appreciate and updated DB minded opinion.
The way that I think of inheritance in the database world is "can only be one kind of." No other relational modeling technique works for that specific case; even with check constraints, with a strict relational model, you have the problem of putting the wrong "kind of" person into the wrong table. So, in your example, a user can be a parent or a babysitter, but not both. If a user can be more than one kind-of user, then inheritance is not the best tool to use.
The instructor/student relationship really only works well in the case where students cannot be instructors or vice-versa. If you have a TA, for example, it's better to model that using a strict relational design.
So, back to the parent-babysitter, your table design might look like this:
CREATE TABLE user (
id SERIAL,
full_name TEXT,
email TEXT,
phone_number TEXT
);
CREATE TABLE parent (
preferred_payment_method TEXT,
alternate_contact_info TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE babysitter (
age INT,
min_child_age INT,
preferred_payment_method TEXT,
PRIMARY KEY(id)
) INHERITS(user);
CREATE TABLE parent_babysitter (
parent_id INT REFERENCES parent(id),
babysitter_id INT REFERENCES babysitter(id),
PRIMARY KEY(parent_id, babysitter_id)
);
This model allows users to be "only one kind of" user - a parent or a babysitter. Notice how the primary key definitions are left to the child tables. In this model, you can have duplicated ID's between parent and babysitter, though this may not be a problem depending on how you write your code. (Note: Postgres is the only ORDBMS I know of with this restriction - Informix and Oracle, for example, have inherited keys on inherited tables)
Also see how we mixed the relational model in - we have a many-to-many relationship between parents and babysitters. That way we keep the entities separated, but we can still model a relationship without weird self-referencing keys.
All the options can be roughly represented by following cases:
base table + table for each class (class-table inheritance, Table-Per-Type, suggestions from the dba.stackexchange)
single table inheritance (Table-Per-Hierarchy) - just put everything into the single table
create independent tables for each class (Table-Per-Concrete)
I usually prefer option (1), because (2) and (3) are not completely correct in terms of DB design.
With (2) you will have unused columns for some rows (like "age" will be empty for Parent). And with (3) you may have duplicated data.
But you also need to think in terms of data access. With option (1) you will have the data spread over few tables, so to get Parent, you will need to use join operations to select data from both User and Parent tables.
I think that's the reason why options (2) and (3) exist - they are easier to use in terms of SQL queries (no joins are needed, you just select the data you need from one table).

Relational database design considering partly dependent information?

I strongly having the feeling, I don't see the wood for the trees, so I need your help.
Think of the following two tables:
create table Category (Category_ID integer,
Category_Desc nvarchar2(500));
create table Text (Text_Id integer,
Text nvarchar2(1000),
Category_Id integer references Category.Category_Id);
This code follows no proper syntax, it's just to get an idea of the problem.
Consider the idea to save text parts for certain categories to use them in an interface, like messages ("You can't do that!", "Do this!",...), but also to create notes for other objects, e. g. like orders ("Important customer! Prioritize this order!").
Now for my question. Some of this text bits bring some more information with them, like if you add the "Important customer" note to an order, also the Order.Prio_Flag is set.
Now this is a very special information, only considering text used by the category Order_Note. I don't want to add this to the Text table, since most of the entries are not affected by this and the table would get more and more crowded by special cases for only the least part of its content.
I get the feeling, the design is flawed, but I also don't want a table for every category and keep this as general as possible.
Keep in mind, this is a simplified view of the problem.
TL:DR: How do I add information to a table's content without adding new attributes, because the new attribute would only be filled for the least number of entries.
Subtyping and dependent attributes are easy to do in a relational database. For example, if some Texts are important and need to have a dependent attribute (e.g. DisplayColor), you could add the following table to your schema:
CREATE TABLE ImportantText (
Text_Id integer NOT NULL ,
Display_Color integer NOT NULL ,
PRIMARY KEY (Text_Id),
CONSTRAINT ImportantTextSubtypeOfText
FOREIGN KEY (Text_Id) REFERENCES Text (Text_Id)
ON DELETE CASCADE ON UPDATE CASCADE
);
Many people think foreign key constraints establish relationships between entities. That's not what they're for. They're CONSTRAINTS, i.e. they limit the values in a column to be a subset of another column. In this way, a subtyping relation is established which can record additional properties.
In the table above, any element of ImportantText must be an element of Text, and will have all the attributes of Text (since it must be recorded in the Text table), as well as the additional attributes of ImportantText.

Database design, huge number of parameters, denormalise?

Given the table tblProject. This has a myriad of properties. For example, width, height etc etc. Dozens of them.
I'm adding a new module which lets you specify settings for your project for mobile devices. This is a 1-1 relationship, so all the mobile settings should be stored in tblProject. However, the list is getting huge, there will be some ambiguity amongst properties (IE, I will have to prefix all mobile fields with MOBILE so that Mobile_width isn't confused with width).
How bad is it to denormalise and store the mobile settings in another table? Or a better way to store the settings? The properties and becoming unwieldly and hard to modify/find in the table.
I want to respond to #Alexander Sobolev's suggestion and provide my own.
#Alexander Sobolev suggests an EAV model. This trades maximum flexibility, for poor performance and complexity as you need to join multiple times to get all values for an entity. The way you typically work around those issues is keeping all the entity meta data in memory (i.e. tblProperties) so you don't join to it at runtime. And, denormalize the values (i.e. tblProjectProperties) as a CLOB (i.e. XML) off the root table. Thus you only use the values table for querying and sorting, but not to actually retrieve the data. Also you usually end up caching the actual entities by ID as well so you don't have the expense of deserialization each time. Issues you run into the are cache invalidation of the entities and their meta data. So overall a non trivial approach.
What I would do instead is create a separate table, perhaps more than one depending on your data, with a discriminator/type column:
create table properties (
root_id int,
type_id int,
height int
width int
...etc...
)
Make the unique a combination of root_id and type_id, where type_id would be representative of mobile for instance - assuming a separate lookup table in my example.
There is nothing bad in storing mobile section in other table. This could even carry some economy, this depends on how much this information is used.
You can store in another table or use even more complicated version with three tables. One is your tblProject, one is tblProperties and one is tblProjectProperties.
create table tblProperties (
id int autoincrement(1,1) not null,
prop_name nvarchar(32),
prop_description nvarchar(1024)
)
create table tblProjectProperties
(
ProjectUid int not null,
PropertyUid int not null,
PropertyValue nvarchar(256)
)
with foreign key tblProjectProperties. ProjectUid -> tblProject.uid
and foreign key tblProjectProperties.propertyUid -> tblProperties.id
Thing is if you have different types of projects wich use different properties, you have no need to store all these unused null and store only properties you really need for given project. Above schema gives you some flexibility. You can create some views for different project types and use it to avoid too much joins in user selects.

What is the best way to keep changes history to database fields?

For example I have a table which stores details about properties. Which could have owners, value etc.
Is there a good design to keep the history of every change to owner and value. I want to do this for many tables. Kind of like an audit of the table.
What I thought was keeping a single table with fields
table_name, field_name, prev_value, current_val, time, user.
But it looks kind of hacky and ugly. Is there a better design?
Thanks.
There are a few approaches
Field based
audit_field (table_name, id, field_name, field_value, datetime)
This one can capture the history of all tables and is easy to extend to new tables. No changes to structure is necessary for new tables.
Field_value is sometimes split into multiple fields to natively support the actual field type from the original table (but only one of those fields will be filled, so the data is denormalized; a variant is to split the above table into one table for each type).
Other meta data such as field_type, user_id, user_ip, action (update, delete, insert) etc.. can be useful.
The structure of such records will most likely need to be transformed to be used.
Record based
audit_table_name (timestamp, id, field_1, field_2, ..., field_n)
For each record type in the database create a generalized table that has all the fields as the original record, plus a versioning field (additional meta data again possible). One table for each working table is necessary. The process of creating such tables can be automated.
This approach provides you with semantically rich structure very similar to the main data structure so the tools used to analyze and process the original data can be easily used on this structure, too.
Log file
The first two approaches usually use tables which are very lightly indexed (or no indexes at all and no referential integrity) so that the write penalty is minimized. Still, sometimes flat log file might be preferred, but of course functionally is greatly reduced. (Basically depends if you want an actual audit/log that will be analyzed by some other system or the historical records are the part of the main system).
A different way to look at this is to time-dimension the data.
Assuming your table looks like this:
create table my_table (
my_table_id number not null primary key,
attr1 varchar2(10) not null,
attr2 number null,
constraint my_table_ak unique (attr1, att2) );
Then if you changed it like so:
create table my_table (
my_table_id number not null,
attr1 varchar2(10) not null,
attr2 number null,
effective_date date not null,
is_deleted number(1,0) not null default 0,
constraint my_table_ak unique (attr1, att2, effective_date)
constraint my_table_pk primary key (my_table_id, effective_date) );
You'd be able to have a complete running history of my_table, online and available. You'd have to change the paradigm of the programs (or use database triggers) to intercept UPDATE activity into INSERT activity, and to change DELETE activity into UPDATing the IS_DELETED boolean.
Unreason:
You are correct that this solution similar to record-based auditing; I read it initially as a concatenation of fields into a string, which I've also seen. My apologies.
The primary differences I see between the time-dimensioning the table and using record based auditing center around maintainability without sacrificing performance or scalability.
Maintainability: One needs to remember to change the shadow table if making a structural change to the primary table. Similarly, one needs to remember to make changes to the triggers which perform change-tracking, as such logic cannot live in the app. If one uses a view to simplify access to the tables, you've also got to update it, and change the instead-of trigger which would be against it to intercept DML.
In a time-dimensioned table, you make the strucutural change you need to, and you're done. As someone who's been the FNG on a legacy project, such clarity is appreciated, especially if you have to do a lot of refactoring.
Performance and Scalability: If one partitions the time-dimensioned table on the effective/expiry date column, the active records are in one "table", and the inactive records are in another. Exactly how is that less scalable than your solution? "Deleting" and active record involves row movement in Oracle, which is a delete-and-insert under the covers - exactly what the record-based solution would require.
The flip side of performance is that if the application is querying for a record as of some date, partition elimination allows the database to search only the table/index where the record could be; a view-based solution to search active and inactive records would require a UNION-ALL, and not using such a view requires putting the UNION-ALL in everywhere, or using some sort of "look-here, then look-there" logic in the app, to which I say: blech.
In short, it's a design choice; I'm not sure either's right or either's wrong.
In our projects we usually do it this way:
You have a table
properties(ID, value1, value2)
then you add table
properties_audit(ID, RecordID, timestamp or datetime, value1, value2)
ID -is an id of history record(not really required)
RecordID -points to the record in original properties table.
when you update properties table you add new record to properties_audit with previous values of record updated in properties. This can be done using triggers or in your DAL.
After that you have latest value in properties and all the history(previous values) in properties_audit.
I think a simpler schema would be
table_name, field_name, value, time, userId
No need to save current and previous values in the audit tables. When you make a change to any of the fields you just have to add a row in the audit table with the changed value. This way you can always sort the audit table on time and know what was the previous value in the field prior to your change.

How do you manage "pick lists" in a database

I have an application with multiple "pick list" entities, such as used to populate choices of dropdown selection boxes. These entities need to be stored in the database. How do one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the Name of the list and the acceptable values, then queried it to display the list. I also include a underlying value, so you can return a display value for the list, and a bound value that may be much uglier (a small int for normalized data, for instance)
CREATE TABLE PickList(
ListName varchar(15),
Value varchar(15),
Display varchar(15),
Primary Key (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.
It depends on various things:
if they are immutable and non relational (think "names of US States") an argument could be made that they should not be in the database at all: after all they are simply formatting of something simpler (like the two character code assigned). This has the added advantage that you don't need a round trip to the db to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. In case of localized display, so you need a different formatting for each culture, then you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList IdPick Text
1 1 Apples
1 2 Oranges
1 3 Pears
2 1 Dogs
2 2 Cats
and optionally..
PickList
Id Description
1 Fruit
2 Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering out based on type. While it works, it has invariably created headaches down the line. For example you may find that something you presumed to be a simple pick list is not so simple and requires an extra field, do you now split this data into an additional table or extend you master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity and it makes it easier to interpret the data in the database when you're not using the application
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
Create one table for lists and one table for list_options.
# Put in the name of the list
insert into lists (id, name) values (1, "Country in North America");
# Put in the values of the list
insert into list_options (id, list_id, value_text) values
(1, 1, "Canada"),
(2, 1, "United States of America"),
(3, 1, "Mexico");
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_ option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID) which you join to the list_ option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value. (BTW, whenever I try to post the words "list" and "option" connected with an underscore, the preview window goes a little wacky. That's why I put a space there.)
The query would look something like:
select
b.option_name,
b.option_value
from
list a,
list_option b
where
a.name="States"
and
a.list_id = b.list_id
order by
b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... of if it is often changing, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id in value and populate a single table with :
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item
We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the initial option as it makes updating lists easier (at least in my opinion).
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model but you really want to persist it in the database? You could use an OR mapper like linq2sql or nhibernate (assuming you're in the .net world) or depending on the data you could store it manually in a table each - there are situations where it would make good sense to put it all in the same table but do consider this only if you feel it makes really good sense. Normally putting different data in different tables makes it a lot easier to (later) understand what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, give me the number of records that picked value X".
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist - it needs and additional attribute for instance - you can change just that picklist table with little effect on the rest of your schema. In the single table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate table solution.
This has served us well:
SQL> desc aux_values;
Name Type
----------------------------------------- ------------
VARIABLE_ID VARCHAR2(20)
VALUE_SEQ NUMBER
DESCRIPTION VARCHAR2(80)
INTEGER_VALUE NUMBER
CHAR_VALUE VARCHAR2(40)
FLOAT_VALUE FLOAT(126)
ACTIVE_FLAG VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.

Resources