I strongly have the feeling that I can't see the wood for the trees, so I need your help.
Think of the following two tables:
create table Category (Category_Id integer primary key,
Category_Desc nvarchar2(500));
create table Text (Text_Id integer primary key,
Text nvarchar2(1000),
Category_Id integer references Category (Category_Id));
This code is only a sketch to convey the idea of the problem.
Consider the idea of saving text parts for certain categories to use them in an interface, such as messages ("You can't do that!", "Do this!", ...), but also to create notes for other objects, e.g. orders ("Important customer! Prioritize this order!").
Now for my question. Some of these text bits carry additional information: for example, if you add the "Important customer" note to an order, Order.Prio_Flag is also set.
This is very specific information that only concerns text used by the category Order_Note. I don't want to add it to the Text table, since most entries are not affected by it, and the table would become more and more crowded with special cases that apply to only a small part of its content.
I get the feeling the design is flawed, but I also don't want a table for every category, and I want to keep this as general as possible.
Keep in mind, this is a simplified view of the problem.
TL;DR: How do I add information to a table's content without adding new attributes, given that a new attribute would only be filled for a small number of entries?
Subtyping and dependent attributes are easy to do in a relational database. For example, if some Texts are important and need to have a dependent attribute (e.g. Display_Color), you could add the following table to your schema:
CREATE TABLE ImportantText (
Text_Id integer NOT NULL ,
Display_Color integer NOT NULL ,
PRIMARY KEY (Text_Id),
CONSTRAINT ImportantTextSubtypeOfText
FOREIGN KEY (Text_Id) REFERENCES Text (Text_Id)
ON DELETE CASCADE ON UPDATE CASCADE
);
Many people think foreign key constraints establish relationships between entities. That's not what they're for. They're CONSTRAINTS, i.e. they limit the values in a column to be a subset of another column. In this way, a subtyping relation is established which can record additional properties.
In the table above, any element of ImportantText must be an element of Text, and will have all the attributes of Text (since it must be recorded in the Text table), as well as the additional attributes of ImportantText.
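As a minimal sketch of the subtype table above, the following uses Python's built-in sqlite3 module with an in-memory database. The sample rows, the colour value and the helper name insert_important are assumptions made for illustration; the Text table is reduced to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Text (
    Text_Id INTEGER PRIMARY KEY,
    Text    TEXT NOT NULL
);
CREATE TABLE ImportantText (
    Text_Id       INTEGER NOT NULL PRIMARY KEY,
    Display_Color INTEGER NOT NULL,
    FOREIGN KEY (Text_Id) REFERENCES Text (Text_Id)
        ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO Text VALUES (1, 'Do this!'), (2, 'Important customer!');
INSERT INTO ImportantText VALUES (2, 16711680);
""")

# Every Text row, with the subtype attribute where one exists (NULL otherwise).
rows = conn.execute("""
    SELECT t.Text_Id, t.Text, i.Display_Color
    FROM Text t
    LEFT JOIN ImportantText i ON i.Text_Id = t.Text_Id
    ORDER BY t.Text_Id
""").fetchall()


def insert_important(text_id, color):
    """Return True if the subtype row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ImportantText (Text_Id, Display_Color) VALUES (?, ?)",
                (text_id, color),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling insert_important(99, 0) is rejected because no Text row with id 99 exists, which is the subset constraint described above.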
The question is which database design I should apply in this situation:
main table:
ID | name | number_of_parameters | parameters
parameters table:
parameter | parameter | parameter
The number of elements in the parameters table does not change. The number_of_parameters cell defines how many parameters tables should be stored in the next cell.
I have trouble moving from object thinking to database design. In object terms, one row has as many parameters objects as number_of_parameters says.
I hope the description of the requirements is clear. What is the correct way to design such a database? If someone can provide some SQL statements to achieve it, that would be nice, but the main goal of this question is to understand how to build such an architecture.
I want to use SQLite to create this database.
The relational way is to have two tables. The main table has an ID, name and as many other universally-present parameters as possible. The parameters table contains a mapping from an ID in the main table to a parameter name and a parameter value; the main table ID should be a foreign key, and the combination of ID and name should be unique.
The number of parameters can be found by just counting the number of rows with a particular ID.
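A rough sketch of this two-table layout in SQLite, driven from Python's sqlite3 module, might look as follows. The table names item and item_parameter, the column names and the sample data are assumptions, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the reference
conn.executescript("""
CREATE TABLE item (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE item_parameter (
    item_id INTEGER NOT NULL REFERENCES item (id),
    name    TEXT NOT NULL,
    value   TEXT,
    UNIQUE (item_id, name)      -- one value per parameter name and item
);
INSERT INTO item VALUES (1, 'first'), (2, 'second');
INSERT INTO item_parameter VALUES
    (1, 'width', '10'), (1, 'height', '20'), (2, 'width', '5');
""")

# number_of_parameters is derived by counting rows instead of being stored.
counts = conn.execute("""
    SELECT i.id, i.name, COUNT(p.name) AS number_of_parameters
    FROM item i
    LEFT JOIN item_parameter p ON p.item_id = i.id
    GROUP BY i.id, i.name
    ORDER BY i.id
""").fetchall()
```

Because the count is computed, it cannot drift out of sync with the rows it describes.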
If you can serialize the data while saving to the database and deserialize it when you read the record back, it will work. You can get the total number of objects in the serialized container, save that count to the number_of_parameters field, and save the serialized data in the parameters field.
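One way to sketch the serialization approach is with JSON as the format, using Python's json and sqlite3 modules. The table name item and the helpers save_item and load_item are hypothetical, and JSON is only one possible choice of format.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        id                   INTEGER PRIMARY KEY,
        name                 TEXT NOT NULL,
        number_of_parameters INTEGER NOT NULL,
        parameters           TEXT NOT NULL   -- JSON-encoded list of parameter groups
    )
""")


def save_item(name, params):
    """Serialize params to JSON and store it together with its length."""
    cur = conn.execute(
        "INSERT INTO item (name, number_of_parameters, parameters) VALUES (?, ?, ?)",
        (name, len(params), json.dumps(params)),
    )
    return cur.lastrowid


def load_item(item_id):
    """Read one record back and deserialize its parameters."""
    name, count, blob = conn.execute(
        "SELECT name, number_of_parameters, parameters FROM item WHERE id = ?",
        (item_id,),
    ).fetchone()
    return name, count, json.loads(blob)
```

The trade-off is that the database cannot query or constrain individual parameters stored this way.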
There isn't one perfect correct way, but if you want to use a relational database, you preferably have relational tables.
If you have a key-value database, you place your serialized data as a document attached to your key.
If you want a hybrid solution that is both human-editable and single-table, you can serialize your data to a human-readable format such as YAML, which sees heavy use in the configuration of open source projects.
I am redesigning a few databases into one encompassing database, and I have noticed that the previous designer(s) of the old databases liked to store categories in their own tables. For example, say that there is a table boats(bid: integer, bname: string, color: integer), and in the application there is a drop-down box allowing the user to specify the colour of the boat; then there is a table color(cid: integer, cname: string). I would not have included the color table, and would just have put the colours as strings in the boats table. I realize that this decreases redundant storage of colour names, but is the added run-time cost of joining the boat table with the colour table "worth it"? Also, the drop-downs are populated with SELECT cname FROM color statements, while I would have defined a view on SELECT DISTINCT color FROM boats to populate the drop-downs.
The example is simple, but this happens multiple times in the system I am redesigning, even for categories with only two options. This has resulted in many tables with only 2 fields. Some only have 1 field (I haven't figured out what those are for yet, but I think they are only to populate the drop-downs, and the actual tables contain the values as well).
I would personally keep them in their own table if this were my DB.
If you get into a situation where the requirement is that boats A, B and C can only come in silver and black, then you will be thankful that you did. I've seen these types of requests bubble up down the road in a lot of projects.
If you are just concerned about the query complexity you could create a view that joins the information you need so you only need to query it once and with no JOIN.
If you are worried about the performance implications of the JOIN then I would look at creating the appropriate indexes or possibly an indexed view.
Good luck!
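The view and index suggestions above could be sketched like this in SQLite via Python's sqlite3 module. The view name boats_with_color, the index name and the sample boats are assumptions. Indexed views are a feature of some other database systems and are not shown here, since SQLite does not offer them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color (
    cid   INTEGER PRIMARY KEY,
    cname TEXT NOT NULL UNIQUE
);
CREATE TABLE boats (
    bid   INTEGER PRIMARY KEY,
    bname TEXT NOT NULL,
    color INTEGER NOT NULL REFERENCES color (cid)
);
-- Index on the joining column, so the lookup stays cheap as boats grows.
CREATE INDEX boats_color_idx ON boats (color);
-- The view hides the JOIN from application queries.
CREATE VIEW boats_with_color AS
    SELECT b.bid, b.bname, c.cname AS color_name
    FROM boats b
    JOIN color c ON c.cid = b.color;
INSERT INTO color VALUES (1, 'silver'), (2, 'black');
INSERT INTO boats VALUES (10, 'Interlake', 1), (11, 'Clipper', 2);
""")

# The application queries the view as if it were a single table.
boats = conn.execute(
    "SELECT bid, bname, color_name FROM boats_with_color ORDER BY bid"
).fetchall()
```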
When you know a column should have a limited set of values, you should tell the dbms to enforce that limited set. The three most common ways to deal with that kind of requirement are
ignore it,
set a foreign key reference to a table of colors, and
use a CHECK() constraint against a list of colors.
Of those three, setting a foreign key to a table of colors tends to make life easiest.
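To compare the second and third options, here is a small sketch using Python's sqlite3 module. The table names boats_fk and boats_check and the helper is_accepted are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE colors (color TEXT PRIMARY KEY);
INSERT INTO colors VALUES ('red'), ('orange'), ('yellow');

-- Option 2: foreign key reference to a table of colors.
CREATE TABLE boats_fk (
    boat_id    INTEGER PRIMARY KEY,
    hull_color TEXT NOT NULL REFERENCES colors (color)
);
-- Option 3: CHECK() constraint against a fixed list of colors.
CREATE TABLE boats_check (
    boat_id    INTEGER PRIMARY KEY,
    hull_color TEXT NOT NULL
        CHECK (hull_color IN ('red', 'orange', 'yellow'))
);
""")


def is_accepted(table, color):
    """Try to insert a boat with the given colour; report whether the dbms allowed it.

    `table` is interpolated directly, so pass only the fixed names defined above.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f"INSERT INTO {table} (hull_color) VALUES (?)", (color,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Both variants reject an unknown colour. With the foreign key, allowing a new colour is a plain INSERT into colors, whereas the CHECK() list can only be changed by altering the table definition.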
I realize that this decreases redundant storage of colour names, but
is the added run-time cost of joining the boat table with the colour
table "worth it"?
This is a different issue. First, storing foreign key values is a form of data integrity, not a form of redundancy. Keys exist for two reasons: 1) to identify things in the real world, and 2) to be stored in other tables. (Because the thing the key identifies is relevant to the other table.)
Second, if you identify colors by assigning an arbitrary id number to them, you have to use a JOIN to get human-readable information. But colors, like many attributes, carry their identity with them. If you use the color's name itself ("red", "orange", etc.) or use a human-readable code for the name ("R", "O", etc.) you don't need a join. You do still need a table of colors (or a CHECK() constraint), because the column in boats has a limited set of values, and the dbms should enforce use of that limited set of values.
So you could do something like this.
create table hull_colors (
color varchar(10) primary key
);
create table boats (
boat_id integer primary key,
registered_name varchar(35) not null,
hull_color varchar(10) not null references hull_colors (color)
);
-- Add further colors as needed.
insert into hull_colors values ('red'), ('orange'), ('yellow');
Both those tables are in 5NF.
It is generally better to have a normalized database.
However, in your example, you can use a Categories(ID, Type, Name) table and store the colors as ( 3, "Color", "Blue" ), ( 4, "Color", "Red" ), ... This way, you can store several categories in the same table while still keeping them distinct. Populating a drop-down list then requires a simple select of the form select ID, Name from Categories where Type = 'Color'.
EDIT: Note that this solution violates the first rule of database normalization, as @Catcall said. A 3NF table would be Colors(ID, Name). This way, you can refer to a certain color using its ID.
Using select distinct color from boats to populate a drop-down list has many disadvantages. For example, if the Boats table contains no records, your select will return nothing and the drop-down control will not be populated with any value. Another problem arises when you have fields containing 'Red' and 'red' or similar.
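Both problems can be reproduced with a short sketch in Python's sqlite3 module. The helper names dropdown_from_boats and dropdown_from_colors and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE boats (
    bid   INTEGER PRIMARY KEY,
    bname TEXT NOT NULL,
    color TEXT NOT NULL           -- free-text colour, no lookup table
);
INSERT INTO colors (name) VALUES ('Blue'), ('Red');
""")


def dropdown_from_boats():
    """Drop-down values derived from whatever happens to be stored in boats."""
    return [r[0] for r in conn.execute(
        "SELECT DISTINCT color FROM boats ORDER BY color")]


def dropdown_from_colors():
    """Drop-down values taken from the dedicated lookup table."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM colors ORDER BY name")]


# Problem 1: with no boats yet, the derived drop-down is empty.
empty_case = dropdown_from_boats()

# Problem 2: inconsistent spelling shows up as two separate choices.
conn.executemany(
    "INSERT INTO boats (bname, color) VALUES (?, ?)",
    [("A", "Red"), ("B", "red")],
)
mixed_case = dropdown_from_boats()
```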
It sounds like those are lookup tables, so that if the end user wants to add an additional color they can add it to the database and it will then propagate to the UI. This also gets into normalization. If there is only one place where colors are referenced, then the lookup table is not really necessary. However, if there are multiple tables where colors are referenced for different things, then the lookup table will save you a huge headache down the road.
I am looking at a problem which would involve users uploading lists of records with various field structures into an application. The 2nd part of this would be to also allow the users to specify fields to capture information.
This is a step beyond anything I've done up to this point, where I would have designed a static RDBMS structure myself. In some respects all records will be treated the same, so there will be some common fields required for each. Almost all queries will be run on these common fields.
My first thought would be to dynamically generate a new table for each import and another for each data capture field spec. Then have a master table with a GUID for every record in the application, along with the common fields, plus fields that specify the name of the table the data was imported to and the name of the table with the data capture fields.
Further information (metadata?) about the fields in the dynamically generated tables could be stored in XML or in a 'property' table.
This would mean that as users log into the application I would be dynamically choosing which table of data to present to the user, and there would be a large number of tables in the database if it were, say, not only multiuser but also multitenant.
My question is: are there other methods for solving this kind of variable field issue? Am I going down an inadvisable path here?
I believe that EAV would require me to have a table defining the fields for each import / data capture spec and then another table with the import - field - values data, and that seems impractical.
I hate storing XML in the database, but this is a perfect example of when it makes sense. Store the user imports in XML initially. As your data schema matures, you can later decide which tables to persist for your larger clients. When the users pick which fields they want to query, that's when you come back and build a solid schema.
What kind is each field? Could the type of field be different for each record?
I am working on a program now that does something like this, and the way we handle it is basically a record table which points to a recordfield table. The recordfield table contains all of the fields, along with the name of the actual column in the database that holds each field. We then have a recorddata table, which is where all the data goes for each record. It also stores a record_id telling which record each row belongs to.
With this approach, if every column for the record is of the same type, we don't need to add new columns to the data table; if a record has more fields, or fields of a different type, we add columns to the data table as appropriate.
I think this is what you are talking about.. correct me if I'm wrong.
I think that one additional table for each type of user defined field for the table that the user can add the fields to is a good way to go.
Say you load your records into user_records(id), that table would have an id column which is a foreign key in the user defined fields tables.
User-defined string fields would go in user_records_string(id, name), where id is a foreign key to user_records(id), and name is a string, or a foreign key to a list of user-defined string fields.
Searching on them requires joining them in to the base table, probably with a sub-select to filter down to one field based on the user meta-data, so that the right field can be added to the query.
To simulate the user creating multiple tables, you can have a foreign key in the user_records table that points at a table list, and filter on that when querying for a single table.
This would allow your schema to be static while allowing the user to arbitrarily add fields and tables.
I have an application with multiple "pick list" entities, such as those used to populate the choices of dropdown selection boxes. These entities need to be stored in the database. How does one persist these entities in the database?
Should I create a new table for each pick list? Is there a better solution?
In the past I've created a table that has the name of the list and the acceptable values, then queried it to display the list. I also include an underlying value, so you can return a display value for the list, and a bound value that may be much uglier (a small int for normalized data, for instance).
CREATE TABLE PickList(
ListName varchar(15),
Value varchar(15),
Display varchar(15),
Primary Key (ListName, Display)
)
You could also add a sortOrder field if you want to manually define the order to display them in.
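As a rough illustration, the PickList table with an added sort column could be queried like this using Python's sqlite3 module. The SortOrder column definition, the NOT NULL constraints, the sample rows and the helper pick_list are assumptions layered on the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PickList (
    ListName  VARCHAR(15) NOT NULL,
    Value     VARCHAR(15) NOT NULL,
    Display   VARCHAR(15) NOT NULL,
    SortOrder INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ListName, Display)
);
INSERT INTO PickList VALUES
    ('Priority', '3', 'High',   1),
    ('Priority', '2', 'Medium', 2),
    ('Priority', '1', 'Low',    3),
    ('Color',    'R', 'Red',    1);
""")


def pick_list(list_name):
    """Return (bound value, display value) pairs for one list, in display order."""
    return conn.execute(
        "SELECT Value, Display FROM PickList "
        "WHERE ListName = ? ORDER BY SortOrder, Display",
        (list_name,),
    ).fetchall()
```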
It depends on various things:
if they are immutable and non relational (think "names of US States") an argument could be made that they should not be in the database at all: after all they are simply formatting of something simpler (like the two character code assigned). This has the added advantage that you don't need a round trip to the db to fetch something that never changes in order to populate the combo box.
You can then use an Enum in code and a constraint in the DB. If the display is localized, so that you need different formatting for each culture, you can use XML files or other resources to store the literals.
if they are relational (think "states - capitals") I am not very convinced either way... but lately I've been using XML files, database constraints and javascript to populate. It works quite well and it's easy on the DB.
if they are not read-only but rarely change (i.e. typically cannot be changed by the end user but only by some editor or daily batch), then I would still consider the opportunity of not storing them in the DB... it would depend on the particular case.
in other cases, storing in the DB is the way (think of the tags of StackOverflow... they are "lookup" but can also be changed by the end user) -- possibly with some caching if needed. It requires some careful locking, but it would work well enough.
Well, you could do something like this:
PickListContent
IdList IdPick Text
1 1 Apples
1 2 Oranges
1 3 Pears
2 1 Dogs
2 2 Cats
and optionally..
PickList
Id Description
1 Fruit
2 Pets
I've found that creating individual tables is the best idea.
I've been down the road of trying to create one master table of all pick lists and then filtering based on type. While it works, it has invariably created headaches down the line. For example, you may find that something you presumed to be a simple pick list is not so simple and requires an extra field. Do you now split this data into an additional table or extend your master list?
From a database perspective, having individual tables makes it much easier to manage your relational integrity, and it makes it easier to interpret the data in the database when you're not using the application.
We have followed the pattern of a new table for each pick list. For example:
Table FRUIT has columns ID, NAME, and DESCRIPTION.
Values might include:
15000, Apple, Red fruit
15001, Banana, yellow and yummy
...
If you have a need to reference FRUIT in another table, you would call the column FRUIT_ID and reference the ID value of the row in the FRUIT table.
Create one table for lists and one table for list_options.
-- Put in the name of the list
insert into lists (id, name) values (1, 'Country in North America');
-- Put in the values of the list
insert into list_options (id, list_id, value_text) values
(1, 1, 'Canada'),
(2, 1, 'United States of America'),
(3, 1, 'Mexico');
To answer the second question first: yes, I would create a separate table for each pick list in most cases. Especially if they are for completely different types of values (e.g. states and cities). The general table format I use is as follows:
id - identity or UUID field (I actually call the field xxx_id where xxx is the name of the table).
name - display name of the item
display_order - small int of order to display. Default this value to something greater than 1
If you want you could add a separate 'value' field but I just usually use the id field as the select box value.
I generally use a select that orders first by display order, then by name, so you can order something alphabetically while still adding your own exceptions. For example, let's say you have a list of countries that you want in alpha order but have the US first and Canada second you could say "SELECT id, name FROM theTable ORDER BY display_order, name" and set the display_order value for the US as 1, Canada as 2 and all other countries as 9.
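The ordering trick can be sketched as follows with Python's sqlite3 module. The table name country, the default of 9 and the sample rows are assumptions chosen to mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    country_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    display_order INTEGER NOT NULL DEFAULT 9   -- everything is "ordinary" by default
);
INSERT INTO country (name) VALUES ('Mexico'), ('Brazil');
-- Exceptions get a lower display_order so they float to the top.
INSERT INTO country (name, display_order) VALUES
    ('Canada', 2), ('United States', 1);
""")

# Exceptions first, then everything else alphabetically.
names = [row[0] for row in conn.execute(
    "SELECT name FROM country ORDER BY display_order, name")]
```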
You can get fancier, such as having an 'active' flag so you can activate or deactivate options, or setting a 'x_type' field so you can group options, description column for use in tooltips, etc. But the basic table works well for most circumstances.
Two tables. If you try to cram everything into one table then you break normalization (if you care about that). Here are examples:
LIST
---------------
LIST_ID (PK)
NAME
DESCR
LIST_OPTION
----------------------------
LIST_OPTION_ID (PK)
LIST_ID (FK)
OPTION_NAME
OPTION_VALUE
MANUAL_SORT
The list table simply describes a pick list. The list_option table describes each option in a given list. So your queries will always start with knowing which pick list you'd like to populate (either by name or ID), which you join to the list_option table to pull all the options. The manual_sort column is there just in case you want to enforce a particular order other than by name or value.
The query would look something like:
select
b.option_name,
b.option_value
from
list a,
list_option b
where
a.name = 'States'
and
a.list_id = b.list_id
order by
b.manual_sort asc
You'll also want to create an index on list.name if you think you'll ever use it in a where clause. The pk and fk columns will typically automatically be indexed.
And please don't create a new table for each pick list unless you're putting in "relationally relevant" data that will be used elsewhere by the app. You'd be circumventing exactly the relational functionality that a database provides. You'd be better off statically defining pick lists as constants somewhere in a base class or a properties file (your choice on how to model the name-value pair).
Depending on your needs, you can just have an options table that has a list identifier and a list value as the primary key.
select optionDesc from Options where 'MyList' = optionList
You can then extend it with an order column, etc. If you have an ID field, that is how you can reference your answers back... or, if it changes often, you can just copy the answer value to the answer table.
If you don't mind using strings for the actual values, you can simply give each list a different list_id value and populate a single table with:
item_id: int
list_id: int
text: varchar(50)
Seems easiest unless you need multiple things per list item
We actually created entities to handle simple pick lists. We created a Lookup table, that holds all the available pick lists, and a LookupValue table that contains all the name/value records for the Lookup.
Works great for us when we need it to be simple.
I've done this in two different ways:
1) unique tables per list
2) a master table for the list, with views to give specific ones
I tend to prefer the first option, as it makes updating lists easier (at least in my opinion).
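For comparison, the second approach (a master table with one view per list) might be sketched like this using Python's sqlite3 module. The table name pick_list_master, the view names fruit and pet, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pick_list_master (
    list_name TEXT    NOT NULL,
    item_id   INTEGER NOT NULL,
    item_name TEXT    NOT NULL,
    PRIMARY KEY (list_name, item_id)
);
-- One view per list, so callers see what looks like a dedicated table.
CREATE VIEW fruit AS
    SELECT item_id AS id, item_name AS name
    FROM pick_list_master WHERE list_name = 'fruit';
CREATE VIEW pet AS
    SELECT item_id AS id, item_name AS name
    FROM pick_list_master WHERE list_name = 'pet';
INSERT INTO pick_list_master VALUES
    ('fruit', 1, 'Apples'), ('fruit', 2, 'Oranges'), ('fruit', 3, 'Pears'),
    ('pet',   1, 'Dogs'),   ('pet',   2, 'Cats');
""")

fruit = conn.execute("SELECT id, name FROM fruit ORDER BY id").fetchall()
pets = conn.execute("SELECT id, name FROM pet ORDER BY id").fetchall()
```

Updates still have to go to the master table, which is one reason the per-list tables can feel easier to maintain.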
Try turning the question around. Why do you need to pull it from the database? Isn't the data part of your model, which you simply want to persist in the database? You could use an OR mapper like LINQ to SQL or NHibernate (assuming you're in the .NET world), or, depending on the data, you could store it manually with one table each. There are situations where it would make good sense to put it all in the same table, but consider this only if you feel it really makes sense. Normally, putting different data in different tables makes it a lot easier to understand (later) what is going on.
There are several approaches here.
1) Create one table per pick list. Each of the tables would have the ID and Name columns; the value that was picked by the user would be stored based on the ID of the item that was selected.
2) Create a single table with all pick lists. Columns: ID; list ID (or list type); Name. When you need to populate a list, do a query "select all items where list ID = ...". Advantage of this approach: really easy to add pick lists; disadvantage: a little more difficult to write group-by style queries (for example, "give me the number of records that picked value X").
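To give a feel for the group-by style query under the single-table approach, here is a sketch using Python's sqlite3 module. The table names pick_item and survey, the column fruit_id and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Approach 2: every pick list lives in one table, distinguished by list_id.
CREATE TABLE pick_item (
    id      INTEGER PRIMARY KEY,
    list_id INTEGER NOT NULL,
    name    TEXT NOT NULL
);
-- A table whose rows store the ID of the item the user picked.
CREATE TABLE survey (
    id       INTEGER PRIMARY KEY,
    fruit_id INTEGER NOT NULL REFERENCES pick_item (id)
);
INSERT INTO pick_item VALUES (1, 1, 'Apples'), (2, 1, 'Oranges'), (3, 2, 'Dogs');
INSERT INTO survey (fruit_id) VALUES (1), (1), (2);
""")

# Number of records that picked each value of list 1; the extra list_id
# filter is what the shared table adds compared with a dedicated table.
counts = conn.execute("""
    SELECT p.name, COUNT(s.id)
    FROM pick_item p
    LEFT JOIN survey s ON s.fruit_id = p.id
    WHERE p.list_id = 1
    GROUP BY p.id, p.name
    ORDER BY p.name
""").fetchall()
```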
I personally prefer option 1, it seems "cleaner" to me.
You can use either a separate table for each (my preferred), or a common picklist table that has a type column you can use to filter on from your application. I'm not sure that one has a great benefit over the other generally speaking.
If you have more than 25 or so, organizationally it might be easier to use the single table solution so you don't have several picklist tables cluttering up your database.
Performance might be a hair better using separate tables for each if your lists are very long, but this is probably negligible provided your indexes and such are set up properly.
I like using separate tables so that if something changes in a picklist (it needs an additional attribute, for instance) you can change just that picklist table with little effect on the rest of your schema. In the single table solution, you will either have to denormalize your picklist data, pull that picklist out into a separate table, etc. Constraints are also easier to enforce in the separate table solution.
This has served us well:
SQL> desc aux_values;
Name Type
----------------------------------------- ------------
VARIABLE_ID VARCHAR2(20)
VALUE_SEQ NUMBER
DESCRIPTION VARCHAR2(80)
INTEGER_VALUE NUMBER
CHAR_VALUE VARCHAR2(40)
FLOAT_VALUE FLOAT(126)
ACTIVE_FLAG VARCHAR2(1)
The "Variable ID" indicates the kind of data, like "Customer Status" or "Defect Code" or whatever you need. Then you have several entries, each one with the appropriate data type column filled in. So for a status, you'd have several entries with the "CHAR_VALUE" filled in.
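A small sketch of how such a table might be used, written with Python's sqlite3 module and SQLite column types instead of the Oracle ones shown above. The primary key, the 'Y'/'N' convention for ACTIVE_FLAG, the sample rows and the helper active_values are assumptions, since the original only shows the column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aux_values (
    variable_id   VARCHAR(20) NOT NULL,
    value_seq     INTEGER     NOT NULL,
    description   VARCHAR(80),
    integer_value INTEGER,
    char_value    VARCHAR(40),
    float_value   REAL,
    active_flag   VARCHAR(1)  NOT NULL DEFAULT 'Y',  -- assumed 'Y'/'N' convention
    PRIMARY KEY (variable_id, value_seq)              -- assumed key
);
INSERT INTO aux_values
    (variable_id, value_seq, description, char_value, active_flag)
VALUES
    ('Customer Status', 1, 'Active customer', 'ACTIVE', 'Y'),
    ('Customer Status', 2, 'On hold',         'HOLD',   'Y'),
    ('Customer Status', 3, 'Legacy status',   'OLD',    'N');
""")


def active_values(variable_id):
    """Return (char_value, description) for the active entries of one variable."""
    return conn.execute(
        "SELECT char_value, description FROM aux_values "
        "WHERE variable_id = ? AND active_flag = 'Y' ORDER BY value_seq",
        (variable_id,),
    ).fetchall()
```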