Database design best practice: one table for all web form drop-down options, or a separate table for each drop-down? - sql-server

I'm looking for the best-practice approach here. I have a web page that has several drop-down options. The drop-downs are not related; they are for miscellaneous values (location, building codes, etc.). The database right now has a table for each set of options (e.g. a table for building codes, a table for locations, etc.). I'm wondering if I could just combine them all into one table (called ListOptions) and then query that one table.
Location Table
LocationID (int)
LocatValue (nvarchar(25))
LocatDescription (nvarchar(25))
BuildingCode Table
BCID (int)
BCValue (nvarchar(25))
BCDescription (nvarchar(25))
Instead of the above, is there any reason why I can't do this?
ListOptions Table
ID (int)
listValue (nvarchar(25))
listDescription (nvarchar(25))
groupID (int) //where groupid corresponds to Location, Building Code, etc
Now, when I query the table, I can pass to the query the groupID to pull back the other values I need.
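Something like this is what I have in mind (the @GroupID parameter name is just illustrative):
SELECT ID, listValue, listDescription
FROM ListOptions
WHERE groupID = @GroupID
ORDER BY listValue;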

Putting them in one table is an antipattern. These are different lookups, and you cannot enforce referential integrity in the database (which is the correct place to enforce it, as applications are often not the only way data gets changed) unless they are in separate tables. Data integrity is FAR more important than saving a few minutes of development time when you need an additional lookup.
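As a sketch (the Employee table is a made-up example of a table that references the lookup), separate tables let the database itself guarantee validity:
CREATE TABLE Employee (
    EmployeeID int IDENTITY PRIMARY KEY,
    LocationID int NOT NULL REFERENCES Location(LocationID)  -- only real locations allowed
);
-- With a combined ListOptions table, a foreign key can only check that the ID exists
-- somewhere in ListOptions; it cannot check that the row belongs to the Location group.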

If you plan to reference the values later via foreign keys, it's better to use separate tables.
But why do you need an "all in one" table? Which problem does it solve?

You could do this.
I believe this is your master data, and it won't have a huge number of rows, so it shouldn't create performance problems.
Secondly, why would you want to do it once your app is up and running? It should have been thought about earlier. The tables might be used in a lot of places, and changing them could mean a lot of coding and, most importantly, testing.
Can you shed further light on your requirements?

You can keep them in separate tables and have your stored procedure return one set of data with a "datatype" key that signifies which set of values goes with which option.
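A rough sketch of what that procedure's query could look like, using the table names from the question:
SELECT 'Location' AS DataType, LocationID AS ID, LocatValue AS Value, LocatDescription AS Description
FROM Location
UNION ALL
SELECT 'BuildingCode', BCID, BCValue, BCDescription
FROM BuildingCode;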
However, I would urge you to consider a much different approach. This suggestion is based on years of building data driven websites. If these drop-down options don't change very often then why not build server-side include files instead of querying the database. We did this with most of our websites. Think about it, each time the page is presented you query the database for the same list of values... that data hardly ever changes.
In cases when that data did have a tendency to change, we simply added a routine to the back-end admin that rebuilt the server-side include file whenever an add, change, or delete was done to one of the lookup values. This reduced database I/Os and sped up the load time of all our websites.
We had approximately 600 websites on the same server, all using the same instance of SQL Server (separate databases); our total server database I/Os were drastically reduced.
Edit:
We simply built SSI that looked like this...
<option value="1">Blue</option>
<option value="2">Red</option>
<option value="3">Green</option>

With a single table it would be easy to add new groups instead of creating new tables, but for best-practice concerns you should also have a group table so you can name those groups in the database for future maintenance.
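A minimal sketch of that layout (column sizes copied from the question; the names are just suggestions):
CREATE TABLE ListGroups (
    GroupID int IDENTITY PRIMARY KEY,
    GroupName nvarchar(50) NOT NULL   -- 'Location', 'Building Code', ...
);

CREATE TABLE ListOptions (
    ID int IDENTITY PRIMARY KEY,
    listValue nvarchar(25) NOT NULL,
    listDescription nvarchar(25) NULL,
    groupID int NOT NULL REFERENCES ListGroups(GroupID)
);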

The best practice depends on your requirements.
Do the values of location and building vary frequently? Where do the values come from? Are they imported from external data? Do other tables refer to the single table (so that you need a two-field key to properly join the tables)?
For example, I use a single table with heterogeneous data for constants or configuration values.
But if the data vary often or are imported from an external source, I prefer to use separate tables.

Related

Database Is-a relationship

My problem relates to DB schema development and is as follows.
I am developing a purchasing module, which I want to use for purchasing both items and SERVICES.
Following is my EER diagram (note that a service has very few specialized attributes – max 2).
My question is whether to keep products and services in two tables or just in one table.
One table option –
Reduces complexity, as I will only need to specify an item id that refers to the item table, which will have an "item_type" field to identify whether it's a product or a service.
Two table option –
I will have to reference product or service separately everywhere I want to refer to them, and will have to keep an "item_type" field in every table that refers to either a product or a service.
Currently planning to use option 1, but want to know expert opinion on this matter. Highly appreciate your time and advice. Thanks.
I'd certainly go with the "two tables" option. You see, you have to distinguish Products and Services, so you may either use switch(item_type) { ... } in your program or entirely distinct code paths for Product and for Service. And if a need for updating the DB schema arises, the switch is harder to maintain.
The second reason is NULLs. I'd advise avoiding them as much as you can — they create more problems than they solve. With two tables you can declare all fields non-NULL and forget about NULL-processing. With the one-table option, you have to manually write code to ensure that if item_type=product, then Product-specific fields are not NULL, and Service-specific ones are, and that if item_type=service, then Service-specific fields are not NULL, and Product-specific ones are. That's not quite pleasant work, and the DBMS can't do it for you (there is no NOT NULL IF another_field = value column constraint in SQL or anything like this).
Go with two tables. It's easier to support. I once saw a DB where everything, every single piece of data went in just two tables — there were pages and pages of code to make sure that necessary fields are not NULL.
If I were to implement this, I would go for the two-table option. It's kind of like the first rule of schema normalization: remove multi-valued attributes. Using item_type is not recommended. Once you create separate tables you don't need item_type; you can just use the foreign key relationship.
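A minimal sketch of the two-table option under assumed names (the CHECK constraint ensures a purchase line points at exactly one of the two):
CREATE TABLE Product (
    ProductId int IDENTITY PRIMARY KEY,
    Name nvarchar(100) NOT NULL
    -- product-specific columns can all be declared NOT NULL
);

CREATE TABLE Service (
    ServiceId int IDENTITY PRIMARY KEY,
    Name nvarchar(100) NOT NULL
    -- the few service-specific columns go here
);

CREATE TABLE PurchaseItem (
    PurchaseItemId int IDENTITY PRIMARY KEY,
    ProductId int NULL REFERENCES Product(ProductId),
    ServiceId int NULL REFERENCES Service(ServiceId),
    -- exactly one of the two references must be set
    CHECK ((ProductId IS NOT NULL AND ServiceId IS NULL)
        OR (ProductId IS NULL AND ServiceId IS NOT NULL))
);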
Consider reading this article :
http://en.wikipedia.org/wiki/Database_normalization
It should help.

Indicating primary/default record in database

I've been struggling with how I should indicate that a certain record in a database is the "fallback" or default entry. I've also been struggling with how to reduce my problem to a simple problem statement. I'm going to have to provide an example.
Suppose that you are building a very simple shipping application. You'll take orders and will need to decide which warehouse to ship them from.
Let's say that you have a few cities that have their own dedicated warehouses*; if an order comes in from one of those cities, you'll ship from that city's warehouse. If an order comes in from any other city, you want to ship from a certain other warehouse. We'll call that certain other warehouse the fallback warehouse.
You might decide on a schema like this:
Warehouses
WarehouseId
Name
WarehouseCities
WarehouseId
CityName
The solution must enforce zero or one fallback warehouses.
You need a way to indicate which warehouse should be used if there aren't any warehouses specified for the city in question. If it really matters, you're doing this on SQL Server 2008.
EDIT: To be clear, all valid cities are NOT present in the WarehouseCities table. It is possible for an order to be received for a City not listed in WarehouseCities. In such a case, we need to be able to select the fallback warehouse.
If any number of default warehouses were allowed, or if I was assigning default warehouses to, say, states, I would use a DefaultWarehouse table. I could use such a table here, but I would need to limit it to exactly one row, which doesn't feel right.
How would you indicate the fallback warehouse?
*Of course, in this example we discount the possibility that there might be multiple cities with the same name. The country you are building this application for rigorously enforces a uniqueness constraint on all city names.
I understand your problem, but have questions about parts of it, so I'll be a bit more general.
If at all possible I would store warehouse/backup warehouse data with your inventory data (either hanging directly off warehouses, or, if it's product specific, off the inventory tables).
If the setup has to be calculated through your business logic, then the records should hang off the order/order_item table.
In terms of how to implement the structure in SQL, I'll assume that all orders ship out of a single warehouse and that the shipping must be hung off the orders table (but the ideas should be applicable elsewhere):
The older way to enforce zero/one backup warehouses would be to hang a Warehouse_Source record off the Orders table and include an "IsPrimary" field or a "ShippingPriority", then add a composite unique index that includes OrderID and IsPrimary/ShippingPriority.
If you will only ever have one backup warehouse, you could add ShippingSource_WareHouseID and ShippingSource_Backup_WareHouseID fields to the order, although this isn't the route I would go.
In SQL 2008 and up we have the wonderful addition of Filtered Indexes. These allow you to add a WHERE clause to your index -- resulting in a more compact index. It also has the added benefit of allowing you to accomplish some things that could only be done through triggers in the past.
You could put a Unique filtered index on OrderID & IsPrimary/ShippingPriority (WHERE IsPrimary = 0).
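As a rough illustration of that idea on SQL Server 2008 (table and column names are assumptions):
CREATE TABLE Warehouse_Source (
    OrderID int NOT NULL,
    WarehouseID int NOT NULL,
    IsPrimary bit NOT NULL
);

-- At most one backup (non-primary) shipping source per order
CREATE UNIQUE INDEX UX_WarehouseSource_Backup
    ON Warehouse_Source (OrderID)
    WHERE IsPrimary = 0;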
Add a comment or such if you want me to explain further.
re: how I should indicate that a certain record in a database is the "fallback" or default entry
use another column, isFallback, holding a binary value. I'm assuming your fallback warehouse won't have any cities associated with it.
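For instance, on SQL Server 2008 a filtered unique index can enforce the zero-or-one rule on that column (a sketch; the index name is made up):
ALTER TABLE Warehouses ADD IsFallback bit NOT NULL DEFAULT 0;

-- Allows zero rows or exactly one row with IsFallback = 1
CREATE UNIQUE INDEX UX_Warehouses_SingleFallback
    ON Warehouses (IsFallback)
    WHERE IsFallback = 1;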
As I see it, the fallback warehouse after all is just another warehouse and if I understood, every record in WarehouseCities has a reference to one record in Warehouses:
WarehouseCities(*)...(1)Warehouses
Which means that if there are a hundred cities without a dedicated warehouse, they will all reference the id of a specific fallback warehouse. So I don't see any problem (which makes me think I didn't understand the problem); the model even looks well defined.
Now you could identify whether a warehouse is a fallback warehouse with an attribute like type_warehouse on Warehouses.
EDIT after comment
Assuming there is only one fallback warehouse for the cities not present in WarehouseCities, I suggest keeping the fallback warehouse as just another warehouse and keeping its Id (WarehouseId) as an application parameter (in a parameters table, maybe?). Of course, this solution is programmatic and not tied to your database platform.

How to insert values from multiple tables into another table?

I'm new to databases. I have 4 tables in total: 3 tables are populated automatically when the user logs on to Facebook. I want the values of the primary keys from those tables to be populated into the 4th table. How do I do this? I need help soon!
This is how the tables look:
table:attributes
fb_user : fb_uid, birthday, gender, email.
company_master : com_id, com_name.
position_master : pos_id, pos_name.
And the 4th table goes like this:
[table]:[attributes]
work_history : work_id, fb_uid, com_id, pos_id.
fb_uid, pos_id and com_id are primary keys.
How do I perform this using fewer database operations? Is there any way to use triggers to optimize this?
Firstly, what type of database are you using? Secondly, this seems to be a database design issue. You really should use a single unique primary key across all tables instead of using different primaries and mapping them. Since you're using Facebook, it would make sense to use the Facebook id as the primary key for all tables and then store the other ids as unique fields. This would also allow you to easily use useful features like joins to retrieve data from multiple tables at once. If this isn't practical, because, for example, you're using multiple logins (Facebook, Google, etc.) for the same user, you would then want to have a lookup table like you suggest as the driving table and use it to help populate the others. Ideally you want to minimize redundant data as much as possible to reduce the risk of data inconsistencies. If you are new to databases you should do some reading on database design and database normalization. A good design will help with scalability and prevent a lot of headaches and frustration.
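If you do keep the current design, the insert itself can be done in one statement. A rough sketch (the @ placeholders and the lookups by name are just illustrative assumptions):
INSERT INTO work_history (fb_uid, com_id, pos_id)
SELECT u.fb_uid, c.com_id, p.pos_id
FROM fb_user u
CROSS JOIN company_master c
CROSS JOIN position_master p
WHERE u.fb_uid = @fb_uid
  AND c.com_name = @company_name
  AND p.pos_name = @position_name;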

SQL Server: One Table with 400 Columns or 40 Tables with 10 Columns?

I am using SQL Server 2005 Express and Visual Studio 2008.
I have a database which has a table with 400 columns. Things were (just about) manageable until I had to perform bi-directional sync between several databases.
I am wondering what the arguments are for and against using a 400-column table versus 40 tables with 10 columns each.
The table is not normalised and comprises mainly nvarchar(64) columns and some TEXT columns (there are no specific datatypes as it was converted from text files).
There is one other table that links to this table in a 1-1 relationship (i.e. one entry relates to one entry in the 400-column table).
The table is a list of files that contain parameters that are "plugged" into an application.
I look forward to your replies.
Thank you
Based on your process description I would start with something like this (a table sketch follows the list below). The model is simplified, does not capture history, etc. -- but it is a good starting point. Note: parameter = property.
- Setup is a collection of properties. One setup can have many properties, one property belongs to one setup only.
- Machine can have many setups, one setup belongs to one machine only.
- Property is of a specific type (temperature, run time, spindle speed), there can be many properties of a certain type.
- Measurement and trait are types of properties. Measurement is a numeric property, like speed. Trait is a descriptive property, like color or some text.
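A minimal table sketch of that model (column choices are assumptions, not a finished design):
CREATE TABLE Machine (
    MachineID int IDENTITY PRIMARY KEY,
    Name nvarchar(64) NOT NULL
);

CREATE TABLE Setup (
    SetupID int IDENTITY PRIMARY KEY,
    MachineID int NOT NULL REFERENCES Machine(MachineID)  -- one setup belongs to one machine
);

CREATE TABLE PropertyType (
    PropertyTypeID int IDENTITY PRIMARY KEY,
    Name nvarchar(64) NOT NULL   -- temperature, run time, spindle speed, ...
);

CREATE TABLE Property (
    PropertyID int IDENTITY PRIMARY KEY,
    SetupID int NOT NULL REFERENCES Setup(SetupID),                      -- one property belongs to one setup
    PropertyTypeID int NOT NULL REFERENCES PropertyType(PropertyTypeID),
    MeasurementValue decimal(18, 4) NULL,   -- used when the property is a measurement (numeric)
    TraitValue nvarchar(256) NULL           -- used when the property is a trait (descriptive)
);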
For having a wide table:
Quick to report on as it's presumably denormalized and so no joins are needed.
Easy to understand for end-consumers as they don't need to hold a data model in their heads.
Against having a wide table:
Probably need to have multiple composite indexes to get good query performance
More difficult to maintain data consistency i.e. need to update multiple rows when data changes if that data is on multiple rows
As you're having to update multiple rows and maintain multiple indexes, concurrent performance for updates may become an issue as locks escalate.
You might end up with records with loads of nulls in columns if the attribute isn't relevant to the entity on that row which can make handling results awkward.
If lazy developers do a SELECT * from the table you end up dragging loads of data across the network, so you generally have to maintain suitable subset views.
So it all really depends on what you're doing. If the main purpose of the table is OLAP reporting and updates are infrequent and affect few rows then perhaps a wide, denormalized table is the right thing to have. In an OLTP environment then it's probably not and you should prefer narrower tables. (I generally design in 3NF and then denormalize for query performance as I go along.)
You could always take the approach of normalizing and providing a wide-view for readers if that's what they want to see.
Without knowing more about the situation it's not really possible to say more about the pros and cons in your particular circumstance.
Edit:
Given what you've said in your comments, have you considered just having a long & skinny name=value pair table, so you'd just have UserId, PropertyName, PropertyValue columns? You might want to add some other meta-attributes too: timestamp, version, or whatever. SQL Server is quite efficient at handling these sorts of tables, so don't discount a simple solution like this out of hand.
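A sketch of what such a table could look like (the extra meta-attribute column is just an example):
CREATE TABLE UserProperty (
    UserId int NOT NULL,
    PropertyName nvarchar(64) NOT NULL,
    PropertyValue nvarchar(max) NULL,
    LastUpdated datetime NOT NULL DEFAULT GETDATE(),   -- example meta-attribute
    CONSTRAINT PK_UserProperty PRIMARY KEY (UserId, PropertyName)
);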

Database design - do I need one or two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track of whether an application uses a database or a bug tracking system, so right now I have these fields in the Applications table:
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases:
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Application that use bug tracking" (although I guess either approach could generate this data.)
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get that statistics like this:
select
    count(*) as TotalApplications,
    count(Database_ID) as UsesDatabase,
    count(BugTracking_ID) as UsesBugTracking
from
    Applications
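If you want the percentages directly, a variant of the same query (NULLIF just guards against an empty table):
select
    100.0 * count(Database_ID) / nullif(count(*), 0) as PercentUsingDatabase,
    100.0 * count(BugTracking_ID) / nullif(count(*), 0) as PercentUsingBugTracking
from
    Applications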
Why not get rid of the two "uses" fields and simply let a NULL value in the _ID fields indicate that the record does not use that feature (bug tracking or database)?
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
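A rough sketch of those join tables (assuming Applications also has an integer id primary key):
CREATE TABLE ApplicationDatabases (
    application_id int NOT NULL REFERENCES Applications(id),
    database_id int NOT NULL REFERENCES Databases(id),
    PRIMARY KEY (application_id, database_id)
);

CREATE TABLE ApplicationBugTracking (
    application_id int NOT NULL REFERENCES Applications(id),
    bugtracking_id int NOT NULL REFERENCES BugTracking(id),
    PRIMARY KEY (application_id, bugtracking_id)
);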
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).
