I'm writing an application that will generate inspections for our locations. Basically, think of them as health inspection forms. Each "inspection" will have a series of questions and answers. The answers can be numeric (1, 2, 3, 4, 5 - which will represent their point values), multiple choice ('Yes', 'No') that will map to points (1 for yes, 0 for no), or flat text answers that will not map to points but might be used by the application layer for averaging. So, for example, we could have a field for "Sauce Temperature" which carries no points but could be used for reporting down the road.
Questions can be reused on multiple inspection forms but can have different point values. So can answers.
I'm having trouble figuring out the schema for this. My instinct says EAV would be a good way to go, but the more I think about it, the more I'm thinking more of a data warehouse model would be better.
Particularly, I'm having a problem figuring out the best way to map the min_points, max_points and no_points to each question/answer. This is where I am thinking I'm going to have to use EAV. I'm kind of stuck on it actually. If it was a survey or something where there were no points, or the same point value for each answer, it would be pretty simple. Question table, answer table, some boilerplate tables for input type and so forth. But since each question MAY have a point value, and that point value may change depending on which location is using that question, I'm not sure how to proceed.
So, the example questions are as follows:
Was the food hot [Yes, No] Possible points = 5 (5 for yes, 0 for no)
Was the food tasty [1,2,3,4,5] Possible points = 5 (1 for 1, 2 for 2, etc)
Was the manager on duty [Yes, No] Possible points = 5 (5 for yes, 0 for no)
Was the building clean [1,2,3,4,5] Possible Points = 10 (2 for 1, 4 for 2, 6 for 3, etc)
Was the staff professional [Yes, No] Possible Points = 5 (5 for yes, 0 for no)
Freezer Temp [numerical text input]
Manager on duty [text input]
Since all the answers can have different data types and point values I'm not sure how to build out the database for them.
I'm thinking (other tables, names and other important details left out or changed for brevity):
CREATE TABLE IF NOT EXISTS inspection(
    id mediumint(8) unsigned not null auto_increment PRIMARY KEY,
    store_id mediumint(8) unsigned not null,
    inspection_id mediumint(8) unsigned not null,
    date_created datetime,
    date_modified timestamp,
    INDEX IDX_STORE(store_id),
    INDEX IDX_inspection(inspection_id),
    FOREIGN KEY (store_id) REFERENCES store (store_id) ON DELETE CASCADE,
    FOREIGN KEY (inspection_id) REFERENCES inspection (id) ON DELETE CASCADE);
CREATE TABLE IF NOT EXISTS input_type(
    input_type_id tinyint(4) unsigned not null auto_increment PRIMARY KEY,
    input_type_name varchar(255),
    date_created datetime,
    date_modified timestamp);
CREATE TABLE IF NOT EXISTS inspection_question(
    question_id mediumint(8) unsigned not null auto_increment PRIMARY KEY,
    question text,
    input_type_id tinyint(4) unsigned,  -- matches input_type.input_type_id
    date_created datetime,
    date_modified timestamp);
CREATE TABLE IF NOT EXISTS inspection_option(
    -- types here are just placeholders
    option_id mediumint(8) unsigned not null auto_increment PRIMARY KEY,
    value varchar(255));
But here's where I'm kind of stuck. I'm not sure how to build the question answers tables to account for points, no points, and different data types.
Also, I know I'll need mapping tables for stores to inspections and so forth, but I've left those all off for now, since it's not important to the question.
So, should I make a table for answers where all possible answers (built from either the options table or entered as text) are stored, and then a mapping table that maps an "answer" to a "question" (for any particular inspection) and stores the points there?
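Something like this, maybe (just a rough sketch of that idea; the names and types are placeholders):

-- Answers are stored once; points live on the question/answer pairing
-- for a particular inspection form, not on the answer itself.
CREATE TABLE IF NOT EXISTS answer(
    answer_id mediumint(8) unsigned not null auto_increment PRIMARY KEY,
    answer_text varchar(255));   -- 'Yes', 'No', '1'..'5', or free text

CREATE TABLE IF NOT EXISTS inspection_question_answer(
    inspection_id mediumint(8) unsigned not null,
    question_id mediumint(8) unsigned not null,
    answer_id mediumint(8) unsigned not null,
    points decimal(5,2) null,    -- null when the answer carries no points
    PRIMARY KEY (inspection_id, question_id, answer_id));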
I'm just not thinking right. I could use some help.
There’s no right or wrong answer here, I’m just tossing out some ideas and discussion points.
I would propose that the basic “unit” isn’t the question, but the pair of question + answer type (e.g. 1-5, text, or whatever). Seems to me that Was the food hot / range 1 to 5 and Was the food hot / text description are so very different you’d go nuts trying to relate a question with two (or more) answer types (let alone answer keys for those answers--ignore that for now, I pick up on that later). Call the pair a QnA item. You may end up with a lot of similar pairs, but hey, it's what you've got to work with.
So you have a “pool” of QnA items. How are they selected for use? Are specific forms (or questionnaires) built from items in the pool, or are they randomly selected every time a questionnaire is filled out? Are forms specifically related to location, or might a form be used at any location? How fussy are they at building their forms/questionnaires? How the QnA items are collected/associated with one another and/or their ultimate results is pretty important, and you should work it all out before you start writing code, unless you really like rewriting code.
Given a defined QnA item, you should also have an “answer key” for that item - a means by which a given answer (based on the item's answer type) is measured: zero, value, value * 2, whatever. This apparently can vary from usage to usage (questionnaire to questionnaire? Does it differ based on the location at which the questionnaire is presented? If so, how or why?). Are there standardized answer key algorithms (always zero, always value * 2, etc.) or are these also extremely free-form? Determining how they are used/associated with the QnA items will be essential for proper modeling.
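To make the idea concrete, a very rough sketch (purely illustrative; the names are mine, and the points column could just as easily be a reference to a scoring algorithm):

-- A QnA item is a question paired with an answer type.
CREATE TABLE qna_item (
    qna_item_id int not null primary key,
    question_text varchar(500) not null,
    answer_type varchar(20) not null);       -- e.g. 'YES_NO', 'RANGE_1_5', 'TEXT'

-- An answer key defines how a given answer to a QnA item is scored on a
-- particular questionnaire, so the same item can be worth different points
-- on different forms (or at different locations).
CREATE TABLE answer_key (
    questionnaire_id int not null,
    qna_item_id int not null,
    answer_value varchar(100) not null,      -- 'Yes', 'No', '1'..'5', ...
    points decimal(5,2) null,                -- null for unscored answers
    PRIMARY KEY (questionnaire_id, qna_item_id, answer_value));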
Related
I'm designing the database for an application in which the user is presented with questions, and he must answer them. Think of it either as a questionnaire or as a quiz game, the concept applies to both. I plan to have:
a table with the questions
a table with the possible answers, each of them linked to the question it belongs to with a foreign key (let's keep things simple and assume it's a 1:many relationship, where answers cannot be shared between questions)
a table with the answers that users provided (with foreign keys to the question, the answer and the user ID)
Since many of the questions will be common cases, like yes/no, I decided I'd specify a "question type" enumeration to each question. If the application sees a yes/no question, for example, it means there are no answers in the database, and the application will automatically add the two answers, "Yes" and "No". This saves me hundreds or thousands of useless rows in the answers table.
However, I'm not sure how I should define the table to record user answers. Without the special types of questions, I'd just record the question ID, the answer ID and the user ID, which means "user X answered Y to question Z". However, "yes/no" questions would not have a matching answer in the table, so I can't use the answer ID.
Even making the answers shareable between questions (by making a many-to-many relationship between questions and answers) is not a good solution. Sure, it would allow me to define "Yes" and "No" as regular answers, but then applications should be aware that a "yes/no" question uses answers (say) 7 and 8 - or, when creating a "yes/no" question answers 7 and 8 should be bound to that question. But this means that these "special" answers' IDs must be hardcoded somewhere else. Also, this would not scale well should I add more special types of question in the future.
How should I proceed? Ideally, I need to store in each row of my "user answers" table either a fixed value or a foreign key to the answers table. Is there a better solution than using two columns, one of which is NULL?
I'm using SQL Server, if that matters.
Based on your description, I think I'd go the route of adding another column to the table and making the FK column nullable.
You'd probably have only a few choices for those special questions, so a nullable TINYINT datatype would cut it, and it is only 1 extra byte for your answer row. If this extra column happens to raise the number of columns to more than a multiple of eight (say you go from 8 to 9, or 16 to 17), then you pay another extra byte for the growth of the null bitmap. But it's 2 extra bytes per row, worst case.
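For illustration only (the names are made up; assuming SQL Server, as you mentioned), that might look something like:

-- Either answer_id points at a stored answer row, or special_answer holds
-- the coded value for yes/no-style questions; the other column stays NULL.
CREATE TABLE user_answers (
    user_id INT NOT NULL,
    question_id INT NOT NULL,
    answer_id INT NULL,          -- FK to the answers table; NULL for special question types
    special_answer TINYINT NULL, -- e.g. 1 = Yes, 0 = No; NULL otherwise
    CONSTRAINT PK_user_answers PRIMARY KEY (user_id, question_id),
    CONSTRAINT FK_user_answers_answer FOREIGN KEY (answer_id) REFERENCES answers (answer_id)
);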
I would like to create a database for MTG cards I own. What would the design be?
I would like to store the following information about each card:
1. Name of card.
2. Set the card belongs to.
3. Condition of card.
4. Price it sold for.
5. Price I bought it for.
Here is a little info about MTG cards in general:
1. Every card has a name.
2. Every card belongs to a set.
3. A card may have a foil version of itself.
4. Card name, set it belongs to, and whether it's foil or not makes it unique.
5. A card may be in multiple sets.
6. A set has multiple cards.
The gimmick is that in my collection I may have several copies of the same card but with different conditions, or purchased price, or sold price may be different.
I will have another collection of MTG cards that have been sold on eBay. This collection will have the price/condition/date/whether it was a "Buy It Now" or Bid, etc.
The idea is to find out what price I should sell my cards based on the eBay collection.
It's not a programming question, it's a modeling question. Anyone who is programming but not modeling is a coder, not a programmer. That's just a step above data entry. Modeling is a fundamental aspect of programming, as it deals directly with abstraction, and abstraction is the real genius of computer programming.
Normalization and database design is an excellent way for someone to become better at programming in general as normalization is also an abstraction process.
Abstraction is arguably the most difficult aspect of computer programming, particularly since it requires a person both to be especially pedantic and literal (in order to properly work with the steadfast stubbornness and idiocy that is a computer) and to handle and work in a very high-level and abstract space.
For example, the arguments in design meetings are not over language syntax.
So, that said, I have updated the schema in minor ways to address the changes.
create table card (
card_key numeric not null primary key,
name varchar(256) not null,
foil varchar(1) not null); -- "Y" if it's foil, "N" if it is not.
create table set (
set_key numeric not null primary key,
name varchar(256) not null);
create table cardset (
card_key numeric not null references card(card_key),
set_key numeric not null references set(set_key));
create table condition (
condition_key numeric not null primary key,
alias varchar(64),
description varchar(256));
create table saletype (
saletype_key numeric not null primary key,
alias varchar(64),
description varchar(256));
create table singlecard (
singlecard_key numeric not null primary key,
card_key numeric not null references card(card_key),
condition_key numeric not null references condition(condition_key),
purchase_date date,
purchase_price numeric,
saletype_key numeric references saletype(saletype_key),
sell_date date,
sell_price numeric,
notes varchar(4000));
A more detailed explanation.
The card table is the concept of the card vs. an actual card. You can have a card row without having any actual cards in hand. It models any details of the card that are common to all copies. Obviously MTG cards have very many details (color text, as someone mentioned), but these are likely not important to this kind of model, since this is to track actual cards for the sake of collecting and sale. But if there were any desire to add any other attributes, like card rarity, the 'card' table would be the place to put them.
The set table is for the sets. I don't know what a set is, only what is posited here (there is also casual reference to a series, I don't know if they are related or not). Sets have a name, and are used to group cards. So, we have a 'set' table.
The cardset table is the many-to-many joiner table. Since a set can have several cards, and a card can belong to several sets, the model needs something to represent that relationship. This is a very common pattern in relational databases, but it is also non-obvious to novices.
There are two simple lookup tables, the condition and saletype tables. These two tables are here for normalization purposes and let the user standardize their terms for these two categories of data. They each have an 'alias' and a 'description'. The alias is the short English version: 'Good', 'Poor', 'Auction', 'Buy it now', while the description is the longer English text: 'Poor cards show signs of wear, bending, and rub marks'. Obviously someone doing this for their own purposes likely does not need the description, but it's there as a habit.
Finally, the meat of the system is the singlecard table. The singlecard table represents an actual, in-your-hand card. It models all of the characteristics that make each actual card different from the others. An individual card is not a member of a set (at least not from the description); rather, that's a higher-level concept (such as how it was published -- all "Hero: Bartek the Axe Wielder" cards are part of the "Dark Mysteries" and "Clowns of Death" sets, or whatever). So, the single card need only reference its parent card table, which carries the common card characteristics.
This single card has the references to the card's condition and how it was sold via the foreign keys to the appropriate tables. It also has the other data, such as the dates and prices that were necessary.
Based on what was given, this should meet the basic need.
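As a quick illustration of how the pieces join up (just a sketch against the tables above), listing the unsold cards still in hand with their condition:

-- Unsold cards still in hand, with name, foil flag and condition.
select c.name,
       c.foil,
       cond.alias as card_condition,
       sc.purchase_date,
       sc.purchase_price
from singlecard sc
join card c on c.card_key = sc.card_key
join condition cond on cond.condition_key = sc.condition_key
where sc.sell_date is null;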
It would be a good exercise to remodel this yourself. Start with your most basic needs and the best model that you know how to make. Then contrast it with what I've written here, and then use that book to perhaps try and understand how whatever your simple design may have been becomes this design.
Note that there is no way to actually enforce that a card is a member of ANY set, or that a set has any cards. That will be an application logic problem. This is one of the issues with many-to-many joiner tables: they can model the relationship, but they cannot enforce it.
Ok, this isn't really a programming question as such; it's very high-level and you haven't indicated what database you'll be using and what you've tried.
However, just to give you a few points, the first list (in your question) almost certainly represents most of the information that you'll need to store in your database.
What you need to figure out is which combination of those fields can be used to mark a card uniquely from another.
If, as you say, the Date of Purchase and the Cost can vary, then in the database you choose you would need to make an index based upon those fields; this will give you the ability to store more than one instance of the same card.
I would read up on relational databases. If you're really lost, I suggest picking up a copy of 'SQL for Dummies'; SQL is the language that most database providers use, and the book has step-by-step instructions and tutorials for building your own databases.
I suggest you look at data file from www.mtgjson.com
By merely seeing what field types they selected and reading comments and documentation you will be likely to avoid many caveats.
For instance, you will see how they handle duplicate names, cards that are related to each other (such as one being a flipped, rotated, or melded-together version of another), and many, many more little nuances.
My colleagues don't like an auto-generated int serial number from the database and want to use a string primary key like:
"camera0001"
"camera0002"
As a camera may be deleted, I cannot use "total number of cameras + 1" for the id of a new camera.
If you were me, how would you generate this kind of key in your program?
PS: I think an auto-generated serial number as primary key is OK, I just don't like arguing with my colleagues.
Don't do it like "camera0001"! Argue it out; that is a horrible design mistake.
try one of these:
http://en.wikipedia.org/wiki/Database_normalization
http://www.datamodel.org/NormalizationRules.html
just google: database normalization
Each column in a database should only contain one piece of information. Keep the ID and the type in different columns. You can display them together if you wish, but do not store them together! You will have to constantly split them, and it will make simple queries difficult. The string will take a lot of space on disk and in cache memory, and if it is an FK it will waste space there too.
Have a pure numeric auto-increment ID column and a type column that is a foreign key to a table that contains a description, like:
Table1
YourID int auto id PK
YourType char(1) fk
TypeTable
YourType char(1) PK
Description varchar(100)
Table1
YourID YourType YourData....
1 C xyz
2 C abc
3 R dffd
4 C fg
TypeTable
YourType Description
C Camera
R Radio
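In SQL that design is roughly the following (a sketch only; the names are the placeholders above, and MySQL-style syntax is assumed):

-- Type lookup table: one row per product type.
create table TypeTable (
    YourType char(1) not null primary key,   -- 'C' = Camera, 'R' = Radio, ...
    Description varchar(100) not null);

-- Main table: a plain numeric surrogate key plus an FK to the type.
create table Table1 (
    YourID int not null auto_increment primary key,
    YourType char(1) not null,
    YourData varchar(100),
    foreign key (YourType) references TypeTable (YourType));

-- Display "camera0001"-style labels without ever storing them:
select concat(lower(t.Description), lpad(d.YourID, 4, '0')) as display_id
from Table1 d
join TypeTable t on t.YourType = d.YourType;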
I don't agree that a sequence number is always the best key. When there is a natural primary key available, I prefer it to a sequence number. If, say, your cameras are identified by some reasonably short model name or code, like you identify your "Super Duper Professional Camera Model 3" as "SDPC3" in the catalog and all, that "SDPC3" would, in my opinion, be an excellent choice for a primary key.
But that doesn't sound like what your colleagues want to do here. They want to take a product category, "camera", that of course no one expects to be unique, and then make it unique by tacking on a sequence number. This gives you the worst of both worlds: it's hard to generate, it's a long string which makes it slower to process, and it's still meaningless: no one is going to remember that "camera0002904" is the 3 megapixel camera with the blue case while "camera0002905" is the 4 megapixel camera with the red case. No one is going to consistently remember that sort of thing anyway, so you're not going to use these values as useful display values for the user.
If you are absolutely forced to do something like this, I'd say make two fields: One for the category, and one for the sequence number. If they want them concatenated together for some display, fine. Preferably make the sequence number unique across categories so it can be the primary key by itself, but if necessary you can assign sequence numbers within the category. MySQL will do this automatically; most databases will require you to write some code to do it. (Post again if you want discussion on how.) Oh, and I wouldn't have anyone type in "camera" for the category. This should be a look-up table of legal values, and then post the primary key of this look-up table into the product record. Otherwise you're going to have "camera" and "Camera" and "camrea" and dozens of other typos and variations.
Have a table with your serial number counters, increment it, and insert your record (see the sketch below).
OR
Set the Id to 'camera' + LPAD(RIGHT(MAX(ID), 4) + 1, 4, '0')
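For the counter-table option, a rough sketch (MySQL-style syntax assumed; the names are invented):

-- One counter row per key prefix.
create table id_counter (
    prefix varchar(20) not null primary key,   -- e.g. 'camera'
    last_number int not null);

-- Bump the counter and build the key; with InnoDB the UPDATE locks the row,
-- so concurrent inserts don't hand out the same number.
START TRANSACTION;
UPDATE id_counter SET last_number = last_number + 1 WHERE prefix = 'camera';
SELECT CONCAT(prefix, LPAD(last_number, 4, '0')) AS new_id
FROM id_counter WHERE prefix = 'camera';
COMMIT;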
Since I know there are lots of expert database core designers here, I decided to ask this question on stackoverflow.
I'm developing a website whose main concern is to index every product that is available in the real world, like digital cameras, printers, refrigerators, and so on. As we know, each product has its own specifications. For example, a digital camera has its weight, lens, shutter speed, etc. Each specification has a type. For example, price (I see it as a spec) is a number.
I think the most standard way is to create whatever specs are needed for a specified product with their proper types and assign them to the product. So, for each separate product, PRICE has to be created and the type "number" set on it.
So here is my question: is it possible to have a table for specs with all specs in it, so that, for example, PRICE with the type "number" has been created before and you just need to search for the price spec in the table and assign it to the product? The problem with this method is that I don't see a good way to prevent the user from creating duplicate entries. He has to be able to find the spec he needs (if it's been added before), and I also want him to know that the spec he finds is actually the one he needed, since there may be some specs with the same name but different type and usage. If he doesn't find it, he will create it.
Any ideas?
---------------------------- UPDATE ----------------------------
My question is not about DB flexibility. I think that in the second method users will mess the specs table up! They will create thousands of duplicate entries, and I also think they won't find their proper specs.
I have just finished answering Dynamic Table Generation
which discusses a similar problem. Take a look at the observation pattern. If you replace "observation" by "specification" and "subject" by "product" you may find this model useful -- you will not need Report and Rep_mm_Obs tables.
My suggested data model based on your requirements:
SPECIFICATIONS table
SPECIFICATION_ID, pk
SPECIFICATION_DESCRIPTION
This allows you to have numerous specifications, without being attached to an item.
ITEM_SPECIFICATION_XREF table
ITEM_ID, pk, fk to ITEMS table
SPECIFICATION_ID, pk, fk to SPECIFICATIONS table
VALUE, pk
Benefits:
Making the primary key to be a composite ensures the set of values will be unique throughout the table. Blessing or curse, an item with a given specification could have values 0.99 and 1.00 - these would be valid.
This setup allows for a specification to be associated with 0+ items.
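Spelled out as DDL, that's roughly the following (generic syntax; the ITEMS table is assumed to already exist):

CREATE TABLE SPECIFICATIONS (
    SPECIFICATION_ID int NOT NULL PRIMARY KEY,
    SPECIFICATION_DESCRIPTION varchar(200) NOT NULL
);

CREATE TABLE ITEM_SPECIFICATION_XREF (
    ITEM_ID int NOT NULL REFERENCES ITEMS (ITEM_ID),
    SPECIFICATION_ID int NOT NULL REFERENCES SPECIFICATIONS (SPECIFICATION_ID),
    VALUE varchar(200) NOT NULL,
    -- the composite primary key keeps each (item, spec, value) triple unique
    PRIMARY KEY (ITEM_ID, SPECIFICATION_ID, VALUE)
);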
My software went in production some days ago and now I want to argue a bit about the database structure.
The software collects data about ships, currently 174 details for each ship, each detail can be a text value, a long text value, a number (of a specified length, with or without a specified number of decimals), a date, a date with time, a boolean field, a menu with many values, a list of data and more.
I solved the problem with the following tables
Ship:
- ID - smallint, Autoincrement identity
- IMO - int, A number that does not change for the life of the ship
ShipDetailType:
- ID - smallint, Autoincrement identity
- Description - nvarchar(200), The description of the value the field contains
- Position - smallint, The position of the field in the data input form
- ShipDetailGroup_ID - smallint, A key to the group the field belongs to in the data input form
- Type - varchar(4), The type of the field as mentioned above
ShipDetailGroup
- ID - smallint, Autoincrement identity
(snip...)
ShipMenuPresetValue
- ID - smallint, Autoincrement identity
- ShipDetailType_ID - smallint, A key to the detail the values belongs to
- Value - nvarchar(100), The values preset in the menu type detail
ShipTextDetail
- ID - smallint, Autoincrement identity
- Ship_ID - smallint, A Key to the ship the detail belongs to
- ShipDetailType_ID - smallint, a Key to the detail type of the value
- Text - nvarchar(500), the field containing the detail's value
- ModifiedDate - smalldatetime
- User_ID - smallint, A key to the user table
ShipTextDetailHistory
(snip...)
This table is the same as the ShipTextDetail and contains every change to the details.
Other tables for the list detail type, each with the specified fields required for the list, ...
I just read this article: http://thedailywtf.com/Articles/The_Inner-Platform_Effect.aspx and http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
The articles say that this is not the right way to handle the problem.
My customer has a management GUI for the details and groups, as he changes the detail descriptions and adds more details.
The data input form is dynamically built by reading the structure from the DetailGroups and DetailTypes, each detail type generates a specified input control.
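Presumably the form-building query looks something like this (a sketch against the tables above):

-- Read the form definition: groups, then fields in display order.
-- The Type column drives which input control gets generated.
SELECT g.ID AS GroupID,
       t.ID AS DetailTypeID,
       t.Description,
       t.Type,
       t.Position
FROM ShipDetailType t
JOIN ShipDetailGroup g ON g.ID = t.ShipDetailGroup_ID
ORDER BY g.ID, t.Position;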
The comments suggest that another way of solving this is to dynamically create and remove columns from the table.
What do you think?
Diagram Screenshot: http://img24.imageshack.us/my.php?image=66604496uk3.png
I would refactor your code if:
- Your customer complained
- You found something that didn't work
- You found a way that the code couldn't handle a change you knew was going to happen in the future
You remembered to write unit tests that will allow you to refactor, right?
As far as the structure you have there, I've seen structures like it before. It's a little cumbersome, but it is standard in many places. One thing to remember is that while it's possible to dynamically add and remove columns from databases, the internal storage mechanism of the database doesn't necessarily expect you to be adding and removing these columns continuously. But I don't think this is very relevant compared to the points above, which boil down to: does it work?
I've seen this approach before, and it's presented loads of performance issues once the data volume has grown. The kind of problems you'll encounter come when you need to return multiple items and use multiple criteria in your WHERE clause. You join back and forth between Ship and ShipTextDetail to get all your select columns - maybe you have to do that 10/20 times? Then you do the same for your criteria, maybe 2-3 times. Now you have a query with so many joins it runs really slowly. Next you 'pre-cook' some of the data to improve performance, i.e. you drag out common data into a fixed table structure - and you've returned to a semi-normalised model.
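To make that concrete, pulling just three attributes into one row already looks something like this (illustrative only; the detail type IDs and the 'Flag' attribute are invented):

-- One join back to ShipTextDetail per attribute in the select list,
-- and more again for every attribute used as a criterion.
SELECT s.IMO,
       d1.Text AS ShipName,
       d2.Text AS Owner,
       d3.Text AS Flag
FROM Ship s
JOIN ShipTextDetail d1 ON d1.Ship_ID = s.ID AND d1.ShipDetailType_ID = 1   -- 'Name'
JOIN ShipTextDetail d2 ON d2.Ship_ID = s.ID AND d2.ShipDetailType_ID = 2   -- 'Owner'
JOIN ShipTextDetail d3 ON d3.Ship_ID = s.ID AND d3.ShipDetailType_ID = 3   -- 'Flag'
WHERE d1.Text LIKE 'Ever%';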
My recommendation would be this: you know the information for 174 fields; those are your core attributes. Your customer may add to that list, and may change the description of the fields, but it's still a really good starting point. Create a proper data model based around those, and then build in an extensibility mechanism, as you have already done, but only for the new fields. The metadata - the descriptions of the fields - can reside in another table, or potentially in a resource file (useful for internationalisation?), and that gives some flexibility for existing fields.
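As a sketch of that hybrid (the columns after Ship.Name/Owner/Weight are invented for illustration; SQL Server syntax is assumed, given the nvarchar/smalldatetime types above):

-- Core, known attributes get real columns...
CREATE TABLE Ship (
    ID smallint IDENTITY PRIMARY KEY,
    IMO int NOT NULL,
    Name nvarchar(200),
    Owner nvarchar(200),
    Weight decimal(12,2)
    -- ...the rest of the known 174 fields as ordinary columns...
);

-- ...and only the customer-added fields go through the generic mechanism.
CREATE TABLE ShipExtraDetail (
    Ship_ID smallint NOT NULL REFERENCES Ship (ID),
    ShipDetailType_ID smallint NOT NULL REFERENCES ShipDetailType (ID),
    Value nvarchar(500),
    PRIMARY KEY (Ship_ID, ShipDetailType_ID)
);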
I agree with Joe: you may not have problems if your DB is small, i.e. <1000 ships and your selects are simple, although with 174 attributes to choose from this doesn't appear likely. I think you should change some of the 'obvious' fields first, i.e. I'd assume you have Ship.Name, Ship.Owner, Ship.Weight, Ship.Registration ...
Good Luck.
I've done similar things, but there are a couple of problems with this specific implementation:
You are storing numbers, booleans, dates, etc. as strings. This might be less than ideal. An alternative is to implement separate classes (inheriting from a base) for the different data types and then store them in tables made for their data type (see the sketch at the end).
Do the properties that you track change very frequently? Are they a different set per tanker? If not, it might be better to make objects rather than property bags to store all the data. Those objects can then be persisted to the database.
From a performance standpoint, either approach will be fine. How many ships could there possibly be? All the data is going to fit into RAM on any server.
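For the first point, a per-type variant might look like this (sketch only; it mirrors the existing ShipTextDetail layout, with one table per data type):

-- Numeric details keep their real type instead of being stored as strings;
-- similar tables would exist for dates, booleans, etc.
CREATE TABLE ShipNumberDetail (
    ID smallint IDENTITY PRIMARY KEY,
    Ship_ID smallint NOT NULL,
    ShipDetailType_ID smallint NOT NULL,
    Value decimal(18,4),
    ModifiedDate smalldatetime,
    User_ID smallint
);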