Database design for form with many fields [closed] - sql-server

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed last year.
A form contains 30-40 different fields (some more might be added in the future, so this has to be possible) and I need to find a good data structure to save them in a database (MS SQL Server). These fields can have different types like checkboxes (true/false), selects (1 to x possible values to choose from) and text fields. Every form can be identified by a composite key of the operation number (string) and a revision (int).
Here is an example, just think of some more of these types of fields:
[Image: example form]
I would try it this way:
- table form
- operation_number nvarchar(255) PK
- revision int PK
- created_at datetime
------------------------------------------------
This will be used for all true/false checkboxes:
- table checkbox_option
- id int PK
- form_operation_number nvarchar(255) FK (for form.operation_number)
- form_revision int FK (for form.revision)
- value bit
------------------------------------------------
This will be used for all text fields (see "Locations" and "Additional comments" in the example):
- table text_option
- id int PK
- form_operation_number nvarchar(255) FK (for form.operation_number)
- form_revision int FK (for form.revision)
- value varchar(max)
------------------------------------------------
This will be used for all selects:
- table select_option_name
- id int PK
- form_operation_number nvarchar(255) FK (for form.operation_number)
- form_revision int FK (for form.revision)
- name string (the text for the label, like "color" in the example)
------------------------------------------------
- table select_option_junction
- select_option_name_id int PK FK (for select_option_name.id)
- select_option_values_id int PK FK (for select_option_values.id)
------------------------------------------------
Table for all the options of a select field:
- table select_option_values
- id int PK
- value nvarchar(255)
Does this seem like a good design or have I forgotten something important? Maybe you have a better idea? Thank you.

Regarding "id int PK": you will see much advice encouraging this design choice. I recommend against it.
Begin by going a little deeper than you have so far. Determine the domain of each column. Decide on its relationship to the form. Any column for which more than one value may appear in the form (as, say, order items in a sales order) goes in another table, related to the form by a foreign key.
Figure out what values in any row uniquely identify the row, i.e., distinguish it from all others. That is a natural key. By using natural keys, you're forced to understand the cardinality of the data, and you enable the DBMS to enforce it.
Your color column, for example, would have a table of all possible colors, a domain table. You could use the color name as the primary key, because no two colors have the same name. Then, in your form table, you have a color column with a foreign key to the colors table. If the user somehow tried to insert a form-table row with a color not in the colors table, the DBMS would reject it with a foreign key violation.
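A minimal sketch of the colors domain table, run through Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; T-SQL types and syntax differ slightly). The form table is trimmed to its key plus a hypothetical color column. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def build_color_db() -> sqlite3.Connection:
    """Domain table keyed by its natural key (the color name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE colors (
            name TEXT PRIMARY KEY  -- natural key: no two colors share a name
        );
        CREATE TABLE form (
            operation_number TEXT NOT NULL,
            revision INTEGER NOT NULL,
            color TEXT REFERENCES colors (name),  -- hypothetical column, FK to the domain
            PRIMARY KEY (operation_number, revision)
        );
        INSERT INTO colors (name) VALUES ('red'), ('green'), ('blue');
    """)
    return conn

def try_insert_form(conn: sqlite3.Connection, op: str, rev: int, color: str) -> bool:
    """True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO form VALUES (?, ?, ?)", (op, rev, color))
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_color_db()
print(try_insert_form(conn, "OP-1", 1, "red"))     # True
print(try_insert_form(conn, "OP-2", 1, "purple"))  # False: foreign key violation
print(conn.execute("SELECT color FROM form").fetchall())  # [('red',)]
```

The form row stores the readable value "red" directly, with no lookup needed to interpret it.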
If OTOH you choose to use id int PK as the primary key for your colors table, nothing prevents, say, "blue" from appearing twice in the table, perhaps ids 7 and 17. To prevent that, you have to do the same analysis needed to define natural keys. Adding the surrogate id key just creates more work.
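The same point as a hedged sketch (sqlite3 standing in for SQL Server; table names are illustrative): with only a surrogate id, a second "blue" is accepted, and preventing it still requires declaring the natural key UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Surrogate key only: nothing stops 'blue' from being stored twice.
conn.execute("CREATE TABLE colors_surrogate (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO colors_surrogate (id, name) VALUES (7, 'blue')")
conn.execute("INSERT INTO colors_surrogate (id, name) VALUES (17, 'blue')")  # accepted

# To prevent it, the natural key must still be identified and declared UNIQUE.
conn.execute("""CREATE TABLE colors_guarded (
                    id   INTEGER PRIMARY KEY,
                    name TEXT NOT NULL UNIQUE)""")
conn.execute("INSERT INTO colors_guarded (id, name) VALUES (7, 'blue')")
try:
    conn.execute("INSERT INTO colors_guarded (id, name) VALUES (17, 'blue')")
    second_blue_rejected = False
except sqlite3.IntegrityError:
    second_blue_rejected = True

def count_name(table: str, name: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table} WHERE name = ?", (name,)).fetchone()[0]

print(count_name("colors_surrogate", "blue"))  # 2
print(second_blue_rejected)                     # True
```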
The natural key is also easier to work with.
With an id primary key, your form table has only an opaque number in the color column. If you use the natural key, you'll see "red" instead of, say, 11.
Someone will warn you that natural keys can change. A person's name, for example, may change. That's usually evidence that the column isn't actually a natural key. More important: if you lack a solid example in the data, any design effort to compensate for the possibility that stable data will change will likely prove insufficient (and wrong) when that change does arrive, if it does.
There are cases where no natural key exists, or seems not to. Sometimes, as with grocery items or stocks, the indistinguishable items can be accounted for by counting them in a quantity column. For rows that change over time (and are kept over time), adding a date to the key often works.
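A sketch of adding a date to the key, assuming a hypothetical product_price history table (not from the original) in sqlite3: the product alone repeats over time, but (product, effective_date) does not, and "the value as of a given date" becomes a simple query.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Hypothetical history table: a product repeats, (product, effective_date) does not.
conn.execute("""CREATE TABLE product_price (
                    product        TEXT NOT NULL,
                    effective_date TEXT NOT NULL,  -- ISO-8601, so text order = date order
                    price          NUMERIC NOT NULL,
                    PRIMARY KEY (product, effective_date))""")

def add_price(product: str, day: date, price: float) -> bool:
    try:
        conn.execute("INSERT INTO product_price VALUES (?, ?, ?)",
                     (product, day.isoformat(), price))
        return True
    except sqlite3.IntegrityError:  # same product on the same date: duplicate key
        return False

def price_on(product: str, day: date):
    """Price in effect on `day`: the latest row dated on or before it."""
    row = conn.execute("""SELECT price FROM product_price
                          WHERE product = ? AND effective_date <= ?
                          ORDER BY effective_date DESC LIMIT 1""",
                       (product, day.isoformat())).fetchone()
    return row[0] if row else None

add_price("apple", date(2023, 1, 1), 0.50)
add_price("apple", date(2023, 6, 1), 0.55)
print(price_on("apple", date(2023, 3, 15)))  # 0.5
```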
Occasionally, though, you're adding something completely new to the database, something that no human being can be bothered to name. That's when you'll need to invent a key.

Related

Normalization and primary keys

In a given table, if there is no primary key and it is impossible even to create a composite primary key, what is the normal form of that table?
If it is zero (0NF), will adding a new column and making it the primary key convert this table to 1NF?
Normal forms apply to relations, which are mathematical structures. Tables can be used to represent relations, but this requires some rules to ensure that the table doesn't contain more or less information than the corresponding relation.
In order for a table to represent a relation:
all rows and columns must be unique
the order they're in mustn't matter
all significant information must be represented as values in cells (i.e. fonts, highlighting, etc, mustn't matter)
every cell must contain one value (doesn't matter how simple or complex that value is)
Also, the relational model cares about candidate keys, not primary keys. A relation can have multiple candidate keys. A primary key is just a selected candidate key that is used by some disciplines (e.g. the entity-relationship model) or by some database management systems (e.g. for physical record ordering).
With all that said, I can now answer your question. If your table follows the rules and specifically the rows are all unique, then there will be at least one candidate key, on all the columns together at worst. If your table's rows aren't unique, then the table doesn't represent a relation and the normal forms don't apply. A surrogate key (like an auto-increment column) can be added to identify rows uniquely, but that isn't necessarily sufficient on its own to make a table represent a relation (1NF).
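The "all the columns together at worst" argument can be checked mechanically. This Python sketch (the function name candidate_keys is hypothetical) finds the minimal column sets whose values are unique in a sample of rows. Data can only refute a key, never prove one, so treat the result as candidates rather than the schema's actual keys.

```python
from itertools import combinations

def candidate_keys(columns, rows):
    """Minimal column sets whose projected values are unique across `rows`."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(range(len(columns)), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller unique set, so it is not minimal
            projected = [tuple(r[i] for i in combo) for r in rows]
            if len(set(projected)) == len(projected):
                keys.append(combo)
    return [tuple(columns[i] for i in k) for k in keys]

cols = ("itemSubscribed", "emailAddress")
print(candidate_keys(cols, [(1, "a#b.com"), (1, "b#c.com"), (3, "a#b.com")]))
# [('itemSubscribed', 'emailAddress')]  all columns together, the worst case
print(candidate_keys(cols, [(1, "a#b.com"), (1, "a#b.com")]))
# []  duplicate rows: no key at all
print(candidate_keys(("id",) + cols, [(1, 1, "a#b.com"), (2, 1, "a#b.com")]))
# [('id',)]
```

The third call illustrates the caveat above: a surrogate id makes every row unique, yet the business data still repeats.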
BTW, I suggest you avoid using "0NF" or "UNF". Non-relational tables don't have a level of normalization, so attaching any kind of "NF" to them is misleading.
As long as you are talking about tables, there is one further case that needs to be covered. It's the case of duplicate rows.
Duplicate rows are rows that are identical in appearance but not in row number. Such a table cannot have a primary key. Sometimes duplicate rows represent the same information. Sometimes not.
For example, consider a table with just four columns: customerid, productid, quantity, price. If a customer orders the same product twice, we'll have two identical rows representing different information. This is not good.
Note that the corresponding thing cannot happen with relations. If two tuples in a relation have the same appearance, then they are the same tuple.
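A sketch of that problem and one way out, in sqlite3. The orderid column is a hypothetical addition: once each line records which order it belongs to, the two purchases stop looking identical and (orderid, productid) can serve as the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key: two separate orders of the same product are stored as identical rows.
conn.execute("CREATE TABLE order_line_bag (customerid INT, productid INT, quantity INT, price NUMERIC)")
conn.executemany("INSERT INTO order_line_bag VALUES (?, ?, ?, ?)", [(1, 42, 2, 9.99)] * 2)

# Hypothetical fix: record the order each line belongs to.
conn.execute("""CREATE TABLE order_line (
                    orderid    INT NOT NULL,
                    customerid INT NOT NULL,
                    productid  INT NOT NULL,
                    quantity   INT NOT NULL,
                    price      NUMERIC NOT NULL,
                    PRIMARY KEY (orderid, productid))""")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?, ?)",
                 [(100, 1, 42, 2, 9.99), (101, 1, 42, 2, 9.99)])

def row_counts(table: str):
    """(all rows, distinct rows); a gap means duplicates a relation would collapse."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct = conn.execute(f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {table})").fetchone()[0]
    return total, distinct

print(row_counts("order_line_bag"))  # (2, 1)
print(row_counts("order_line"))      # (2, 2)
```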
As to the other points, they are covered by excellent earlier answers.
Before you check for normalization, your table must have a primary key (the primary key plays a leading role in a relational DB, ...).
1NF says that all of your table's attributes must be single-valued.
Answer to Question 1: In a given table, if there is no primary key and it is impossible even to create a composite primary key, what is the normal form of that table?
Answer: If the relation has no primary key and it is impossible to create a composite primary key (as I read your question, even combining all the columns of a row into a candidate key would not identify rows uniquely, because there are duplicate rows), then it is not in any normal form.
Answer to Question 2:
If you add a column with unique values in it, and every cell contains only one value, then it is in 1NF.
If you still need clarification, you can ask in the comments.
0NF is not a normal form. Refer to C.J. Date or Henry Korth (database management system books).
Hope this helps.

Primary keys & database normalization [duplicate]

Closed 11 years ago.
Possible Duplicate:
Should each and every table have a primary key?
I've been working on a school project about Database normalization.
I need help normalizing a table that has no primary key.
The table I'm having difficulty with is a table for subscriptions, and its structure is like this:
itemSubscribed  emailAddress
--------------  ------------
1               a#b.com
1               b#c.com
1               a#b.com
2               x#z.com
2               aaa#b.com
3               a#b.com
Notice that itemSubscribed and emailAddress values may repeat, so neither can be a primary key.
This structure will work fine with my code for I can send an email to all item X subscribers when there's an update in item X but my teacher requires a normalized database and 1NF must have a primary key.
If I create an autogenerated primary key just for the sake of having a primary key, I can't proceed with 3NF, because it requires that all columns depend on the primary key, which is not the case.
Should I create an autogenerated primary key? Am I missing something about 3NF?
A table with repeating rows does not represent a relation. A relation is a set of tuples. A set never has the same element in it more than once. A bag is like a set, but can have multiple instances of elements that look identical.
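The set/bag distinction in plain Python, using the subscription rows from the question: a Python set behaves like a relation, while collections.Counter behaves like a bag (multiset).

```python
from collections import Counter

rows = [
    (1, "a#b.com"), (1, "b#c.com"), (1, "a#b.com"),
    (2, "x#z.com"), (2, "aaa#b.com"), (3, "a#b.com"),
]

relation = set(rows)  # a set: the repeated (1, 'a#b.com') collapses to one element
bag = Counter(rows)   # a bag: it remembers how many times each row occurs

print(len(rows), len(relation))  # 6 5
print(bag[(1, "a#b.com")])       # 2
```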
In the table you give us, I presume that itemSubscribed is a count, and that the two rows that have itemSubscribed equal to one with the same emailAddress describe different events.
But that is in your mind, and not visible in the data.
You are going to get into trouble with this table. In particular, there is no way to distinguish between an erroneous duplicate entry, and two valid entries that look alike.
Are you allowed to have the same e-mail address subscribed to one item multiple times? If not your natural key is obvious: itemSubscribed and emailAddress. Even if you chose to have an artificial primary key in this case, you'd probably want a unique index across the two columns.
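A sketch of both options from this answer in sqlite3 (a stand-in for SQL Server): the natural composite primary key, and an artificial key plus a UNIQUE constraint across the two columns. Either way, a second identical subscription is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural key: one subscription per (item, email) pair.
    CREATE TABLE subscription (
        itemSubscribed INTEGER NOT NULL,
        emailAddress   TEXT    NOT NULL,
        PRIMARY KEY (itemSubscribed, emailAddress)
    );
    -- Alternative: artificial key plus a unique constraint on the same pair.
    CREATE TABLE subscription_surrogate (
        id             INTEGER PRIMARY KEY,
        itemSubscribed INTEGER NOT NULL,
        emailAddress   TEXT    NOT NULL,
        UNIQUE (itemSubscribed, emailAddress)
    );
""")

def subscribe(table: str, item: int, email: str) -> bool:
    """True if stored, False if the (item, email) pair already exists."""
    try:
        conn.execute(f"INSERT INTO {table} (itemSubscribed, emailAddress) VALUES (?, ?)",
                     (item, email))
        return True
    except sqlite3.IntegrityError:
        return False

print(subscribe("subscription", 1, "a#b.com"))  # True
print(subscribe("subscription", 1, "a#b.com"))  # False: duplicate pair
```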
In answer to your question, yes, it is really bad not to have a primary key. The database must have a way to identify a specific record. Suppose you wanted to update one of the two identical "1 a#b.com" records below but not the other. How would you do that without a primary key?
itemSubscribed  emailAddress
1               a#b.com
1               b#c.com
1               a#b.com
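To make the update problem concrete, a sqlite3 sketch: the WHERE clause can only name visible column values, and both identical rows match them, so the update changes two rows. (SQLite tables carry a hidden rowid, which is itself a system-generated surrogate key; the visible data alone still cannot tell the rows apart.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscription_nokey (itemSubscribed INTEGER, emailAddress TEXT)")
conn.executemany("INSERT INTO subscription_nokey VALUES (?, ?)",
                 [(1, "a#b.com"), (1, "b#c.com"), (1, "a#b.com")])

# Attempt to change only ONE of the two identical rows.
cur = conn.execute("""UPDATE subscription_nokey
                      SET emailAddress = 'new#b.com'
                      WHERE itemSubscribed = 1 AND emailAddress = 'a#b.com'""")
print(cur.rowcount)  # 2: both rows changed
```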
In a database class, I would fail you if you had any table without a primary key, it is that critical to database design.
Now I suspect that you would not want to actually have the data as shown unless you had other columns that were different. Why do you really want two records with the same item subscribed and the same email address? It is better to have a PK or unique index to prevent this sort of bad data. I suspect you really have a natural key of both fields and currently just have bad data.

Composite primary keys and influence on natural/surrogate keys usage [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have a fairly simple question about natural/surrogate key usage in a well-defined context, one that comes up often and that I'm going to illustrate.
Let's assume you are designing the DB schema for a product using SQL Server 2005 as DBMS. For the sake of simplicity let's say there are only two entities involved, which have been mapped to 2 tables, Master and Slave.
Assume that:
We can have 0..n Slave entries for a single Master's row;
Column set (A, B, C, D) in Master is the only candidate for primary key;
Column B in Master is subject to changes over time;
A, B, C, D are a mix of varchar, decimal and bigint columns.
The question is: how would you design keys/constraints/references for those tables?
Would you rather (argumenting your choice):
Implement a composite natural key on Master on (A, B, C, D), and a related composite foreign key on Slave, or
Introduce a surrogate key K on Master, let say an IDENTITY(1,1) column with a related (single column) foreign key on Slave, adding a UNIQUE constraint on Master's (A, B, C, D), or
Use a different approach.
As for me I'd go with option 2), mainly because of assumption 3) and performance-wise, but I'd like to hear someone else's opinion (since there is quite an open debate on the topic).
I'd go for option 2. Keep it simple.
It ticks the boxes (narrow, numeric, unchanging, strictly monotonically increasing) for a useful clustered index (which is the default for PKs in SQL Server).
You need to force the uniqueness on A,B,C,D, though, to preserve data integrity, as noted.
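A sketch of option 2 in sqlite3, standing in for SQL Server (in T-SQL the key would be an IDENTITY(1,1) column). The column types for A, B, C, D are a guess, since the question only says they mix varchar, decimal and bigint. Changing B touches a single Master row because Slave references only the surrogate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE master (
        k INTEGER PRIMARY KEY,  -- surrogate; in SQL Server an IDENTITY(1,1) column
        a TEXT NOT NULL,        -- types for a..d are assumptions
        b NUMERIC NOT NULL,
        c INTEGER NOT NULL,
        d TEXT NOT NULL,
        UNIQUE (a, b, c, d)     -- the natural key is still enforced
    );
    CREATE TABLE slave (
        id INTEGER PRIMARY KEY,
        master_k INTEGER NOT NULL REFERENCES master (k),  -- narrow, single-column FK
        payload TEXT
    );
""")

def insert_master(a, b, c, d) -> bool:
    """Insert a Master row; False if (a, b, c, d) already exists."""
    try:
        conn.execute("INSERT INTO master (a, b, c, d) VALUES (?, ?, ?, ?)", (a, b, c, d))
        return True
    except sqlite3.IntegrityError:
        return False

insert_master("x", 1.5, 10, "y")
k = conn.execute("SELECT k FROM master WHERE a = 'x'").fetchone()[0]
conn.execute("INSERT INTO slave (master_k, payload) VALUES (?, 'child')", (k,))

# B changes: one Master row is updated; Slave still points at the same k.
conn.execute("UPDATE master SET b = 2.5 WHERE k = ?", (k,))
```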
There is nothing conceptually wrong with option 1, but as soon as you require more indexes on "master", the wide clustered key becomes a liability (in SQL Server every nonclustered index carries the clustered key as its row locator), or it takes more work to determine which index is best as the clustered one.
Edit: in case of any confusion, the choice of which index is clustered is separate from the choice of key.
Your assumption (3) tends to suggest option (2) because it is inconvenient and potentially time consuming to deal with cascading updates of the primary key of Master when B changes.
Of course it depends on how often this will occur: if it is something that you expect to happen "all the time" then it suggests (A,B,C,D) is a poor choice of primary key; on the other hand, if it will only rarely happen, then (A,B,C,D) may be a good choice of primary key, and having those columns in Slave may have some advantages (no need to join to Master all the time to find out those column values).
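For contrast, a sketch of option 1 with ON UPDATE CASCADE (supported by both SQLite and SQL Server), again in sqlite3 with assumed column types: changing B in Master rewrites every referencing Slave row, which is the cost weighed above against having A, B, C, D available in Slave without a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE master (
        a TEXT NOT NULL, b NUMERIC NOT NULL, c INTEGER NOT NULL, d TEXT NOT NULL,
        PRIMARY KEY (a, b, c, d)              -- composite natural key
    );
    CREATE TABLE slave (
        id INTEGER PRIMARY KEY,
        a TEXT NOT NULL, b NUMERIC NOT NULL, c INTEGER NOT NULL, d TEXT NOT NULL,
        FOREIGN KEY (a, b, c, d) REFERENCES master (a, b, c, d)
            ON UPDATE CASCADE                 -- key changes propagate to children
    );
""")
conn.execute("INSERT INTO master VALUES ('x', 1.5, 10, 'y')")
for _ in range(3):
    conn.execute("INSERT INTO slave (a, b, c, d) VALUES ('x', 1.5, 10, 'y')")

# B changes: the cascade rewrites every Slave row that carries the old key.
conn.execute("UPDATE master SET b = 2.5 WHERE a = 'x'")

def slave_b_values():
    """Slave can read B directly, with no join to Master."""
    return [row[0] for row in conn.execute("SELECT b FROM slave ORDER BY id")]

print(slave_b_values())  # [2.5, 2.5, 2.5]
```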
Either 1,2 or 3. There isn't necessarily enough information to determine whether a surrogate is necessary or how useful it might be. Are any of the compound key attributes also part of some key or constraint in the Slave table? Is there some other key of Master that could be used as a foreign key? The fact that a key value may change shouldn't be the deciding factor because any key value may need to change - surrogates are no exception.
"there is quite an open debate on the topic"
Unfortunately, much of that debate is based on the mistaken assumption that you need to choose between either a surrogate or a natural key. As your option 2 rightly suggests you can use both as the need arises. One is not a substitute for the other because simple keys and compound keys on different attributes obviously mean different things in your data model and enforce different constraints on your data.
Three.
My first suggestion would be to drop the composite keys and add a new field for an automatic key, unrelated to the other fields, to both the master and detail tables.
It could be an integer autoincrement key or a Globally Unique Identifier (GUID). Keeping composite keys in any brand of SQL server is difficult, and sometimes unnecessarily so.
But if you need to keep the composite key in the master table, you may still wonder how to deal with the slave table's primary key. Many developers take the fields of the master table's primary key, put them in the slave / detail table, and add an additional consecutive integer key. But, as you mention, if you have to change the key of the master table and keep the existing detail rows, you get into trouble with the referential integrity constraints.
Summary: I suggest adding a new key field to the slave table that is not related to the master table, plus foreign key fields in the slave / detail table that reference the master table. Keep the detail table's primary key and its foreign key to the master table independent.

How to model a custom type in a relational database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I'm fairly new to database design, but I understand the fundamentals. I'm creating a relational database and I'd like to do something similar to creating a reusable type or class. For example, let's say I have a Customer table and an Item table. Customer and Item are related by a standard 1-to-many relationship, so Item has a column called CustomerId.
I'd also like to have multiple "notes" for each Customer and each Item. In a normal OOP model, I'd just create a Note class and create instances of it whenever I needed. Of course, a relational database is different. I was thinking about having a Note table, and I'd like a 1-to-many relationship between Customer and Note, as well as Item and Note. The problem then is that the Note table will have to have a column for each other table that wishes to use this "type". (see example below)
I also thought that instead, I could create an intermediate table between Note and Customer/Item (or others). This would allow me to avoid having extra columns in Note for each table referencing it, so note could remain unchanged as I add more tables that require notes. I'm thinking that this is the better solution. (see example)
How is this sort of situation usually handled? Am I close to correct? I'd appreciate any advice on how to design my database to have the sort of functionality I've described above.
Yes, your concluding example is correct, and should be the way to go.
You model a "complex type" in relational databases by creating tables. You can consider the table as a class: in fact ORM solutions often map a class directly to a table. An instance of the custom type is a row in its table, and the instance can be referenced by the value of the primary key.
You can use your custom "complex type" for fields in other tables by using the same data type as the primary key of the "complex type", and enforcing the relationship with a foreign key constraint:
Let's build a complex type for "countries":
CREATE TABLE countries (
iso_code char(2) NOT NULL,
name varchar(100) NOT NULL,
population bigint,
PRIMARY KEY (iso_code)
);
And let's add a couple of "country" instances:
INSERT INTO countries VALUES ('IE', 'Republic of Ireland', 4470700);
INSERT INTO countries VALUES ('US', 'United States of America', 310403000);
Now we're going to use our complex "countries" type in a "users" table:
CREATE TABLE users (
id int NOT NULL, -- primitive type
name varchar(50) NOT NULL, -- primitive type
age int, -- primitive type
country char(2), -- complex type
PRIMARY KEY (id),
FOREIGN KEY (country) REFERENCES countries (iso_code)
);
With the above model, we are guaranteed that the country field of the users table can only be a valid country, and nothing but a valid country.
In addition, using a junction table, as you suggested, is also a suitable approach to deal with that kind of polymorphic relationship. You may be interested in checking out the following Stack Overflow posts for some further reading on this topic:
How can you represent inheritance in a database?
Possible to do a MySQL foreign key to one of two possible tables?
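A sketch of the junction-table design the question arrives at, in sqlite3; all table and column names are illustrative assumptions. The note table stays unchanged as owners are added, and each new owner type gets its own junction table. The UNIQUE on note_id makes each junction 1-to-many (a note belongs to at most one customer), although nothing here stops the same note from being attached to both a customer and an item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (
        item_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)  -- 1-to-many
    );
    -- The reusable "type": a note knows nothing about its owner.
    CREATE TABLE note (note_id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    -- One junction table per owner type.
    CREATE TABLE customer_note (
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        note_id     INTEGER NOT NULL UNIQUE REFERENCES note (note_id),
        PRIMARY KEY (customer_id, note_id)
    );
    CREATE TABLE item_note (
        item_id INTEGER NOT NULL REFERENCES item (item_id),
        note_id INTEGER NOT NULL UNIQUE REFERENCES note (note_id),
        PRIMARY KEY (item_id, note_id)
    );
    INSERT INTO customer VALUES (1, 'Acme');
    INSERT INTO item VALUES (10, 1);
""")

def add_note(junction: str, owner_col: str, owner_id: int, body: str) -> None:
    """Create a note, then link it to its owner through the given junction table."""
    note_id = conn.execute("INSERT INTO note (body) VALUES (?)", (body,)).lastrowid
    conn.execute(f"INSERT INTO {junction} ({owner_col}, note_id) VALUES (?, ?)",
                 (owner_id, note_id))

def notes_for_customer(customer_id: int) -> list:
    return [row[0] for row in conn.execute(
        """SELECT n.body FROM note n
           JOIN customer_note cn ON cn.note_id = n.note_id
           WHERE cn.customer_id = ? ORDER BY n.note_id""", (customer_id,))]

add_note("customer_note", "customer_id", 1, "first note")
add_note("customer_note", "customer_id", 1, "second note")
add_note("item_note", "item_id", 10, "item note")
print(notes_for_customer(1))  # ['first note', 'second note']
```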
I think it is best to add a Note field to the Customer table and the Item table. In this note field (a foreign key) you can store the id of the note that belongs to the Customer / Item. To be able to attach multiple notes to a Customer or Item, you could add a Notes table, attach individual Notes to that "Notes" table, and attach it in turn to your Customer / Item table.

What's the best way to handle one-to-one relationships in SQL? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Let's say I've got Alpha things that may or may not be or be related to Bravo or Charlie things.
These are one-to-one relationships: No Alpha will relate to more than one Bravo. And no Bravo will relate to more than one Alpha.
I've got a few goals:
a system that's easy to learn and maintain.
data integrity enforced within my database.
a schema that matches the real-world, logical organization of my data.
classes/objects within my programming that map well to database tables (à la Linq to SQL)
speedy read and write operations
effective use of space (few null fields)
I've got three ideas…
PK = primary key
FK = foreign key
NU = nullable
One table with many nullable fields (flat file)…
Alphas
--------
PK AlphaId
AlphaOne
AlphaTwo
AlphaThree
NU BravoOne
NU BravoTwo
NU BravoThree
NU CharlieOne
NU CharlieTwo
NU CharlieThree
Many tables with zero nullable fields…
Alphas
--------
PK AlphaId
AlphaOne
AlphaTwo
AlphaThree
Bravos
--------
FK PK AlphaId
BravoOne
BravoTwo
BravoThree
Charlies
--------
FK PK AlphaId
CharlieOne
CharlieTwo
CharlieThree
Best (or worst) of both: Lots of nullable foreign keys to many tables…
Alphas
--------
PK AlphaId
AlphaOne
AlphaTwo
AlphaThree
NU FK BravoId
NU FK CharlieId
Bravos
--------
PK BravoId
BravoOne
BravoTwo
BravoThree
Charlies
--------
PK CharlieId
CharlieOne
CharlieTwo
CharlieThree
What if an Alpha must be either Bravo or Charlie, but not both?
What if instead of just Bravos and Charlies, Alphas could also be any of Deltas, Echos, Foxtrots, or Golfs, etc…?
EDIT: This is a portion of the question: Which is the best database schema for my navigation?
If you want each Alpha to be related to by only one Bravo I would vote for the possibility with using a combined FK/PK:
Bravos
--------
FK PK AlphaId
BravoOne
BravoTwo
BravoThree
This way one and only one Bravo may refer to your Alphas.
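A sketch of the combined FK/PK in sqlite3 (standing in for SQL Server, with names following the question's Alphas/Bravos): because AlphaId is the Bravos primary key, a second Bravo for the same Alpha violates the key, and a Bravo for a nonexistent Alpha violates the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Alphas (AlphaId INTEGER PRIMARY KEY, AlphaOne TEXT);
    CREATE TABLE Bravos (
        AlphaId  INTEGER PRIMARY KEY REFERENCES Alphas (AlphaId),  -- FK and PK at once
        BravoOne TEXT
    );
    INSERT INTO Alphas VALUES (1, 'a1'), (2, 'a2');
""")

def add_bravo(alpha_id: int, value: str) -> bool:
    try:
        conn.execute("INSERT INTO Bravos VALUES (?, ?)", (alpha_id, value))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_bravo(1, "b1"))   # True
print(add_bravo(1, "b2"))   # False: Alpha 1 already has its Bravo
print(add_bravo(99, "b3"))  # False: there is no Alpha 99
```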
If the Bravos and Charlies have to be mutually exclusive, the simplest method would probably be to create a discriminator field:
Alpha
--------
PK AlphaId
PK AlphaType NOT NULL IN ("Bravo", "Charlie")
AlphaOne
AlphaTwo
AlphaThree
Bravos
--------
FK PK AlphaId
FK PK AlphaType == "Bravo"
BravoOne
BravoTwo
BravoThree
Charlies
--------
FK PK AlphaId
FK PK AlphaType == "Charlie"
CharlieOne
CharlieTwo
CharlieThree
This way the AlphaType field forces the records to always belong to exactly one subtype.
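A runnable sketch of the discriminator pattern in sqlite3, with one adjustment that is an assumption rather than part of the answer: AlphaId alone is Alpha's primary key, and UNIQUE (AlphaId, AlphaType) serves as the target for the composite foreign keys, because a primary key on (AlphaId, AlphaType) would let the same AlphaId appear once as a Bravo and once as a Charlie. CHECK constraints stand in for the "IN" and "==" notation. This guarantees at most one subtype row per Alpha; it does not force one to exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Alpha (
        AlphaId   INTEGER PRIMARY KEY,
        AlphaType TEXT NOT NULL CHECK (AlphaType IN ('Bravo', 'Charlie')),
        AlphaOne  TEXT,
        UNIQUE (AlphaId, AlphaType)  -- target for the composite FKs below
    );
    CREATE TABLE Bravos (
        AlphaId   INTEGER PRIMARY KEY,
        AlphaType TEXT NOT NULL DEFAULT 'Bravo' CHECK (AlphaType = 'Bravo'),
        BravoOne  TEXT,
        FOREIGN KEY (AlphaId, AlphaType) REFERENCES Alpha (AlphaId, AlphaType)
    );
    CREATE TABLE Charlies (
        AlphaId    INTEGER PRIMARY KEY,
        AlphaType  TEXT NOT NULL DEFAULT 'Charlie' CHECK (AlphaType = 'Charlie'),
        CharlieOne TEXT,
        FOREIGN KEY (AlphaId, AlphaType) REFERENCES Alpha (AlphaId, AlphaType)
    );
""")

def try_exec(sql: str, params=()) -> bool:
    """Run one statement; False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

def add_alpha(alpha_id, alpha_type):
    return try_exec("INSERT INTO Alpha (AlphaId, AlphaType) VALUES (?, ?)", (alpha_id, alpha_type))

def add_bravo(alpha_id):
    return try_exec("INSERT INTO Bravos (AlphaId) VALUES (?)", (alpha_id,))

def add_charlie(alpha_id):
    return try_exec("INSERT INTO Charlies (AlphaId) VALUES (?)", (alpha_id,))

print(add_alpha(1, "Bravo"))  # True
print(add_alpha(2, "Delta"))  # False: CHECK rejects unknown types
print(add_bravo(1))           # True
print(add_charlie(1))         # False: Alpha 1 is a Bravo, so (1, 'Charlie') has no parent
```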
I'm assuming you will be using SQL Server 2000 / 2005. I have a standard pattern for 1-to-1 relationships which I use, which is not too dissimilar to your 2nd idea, but here are the differences:
Every entity must have its own primary key first, so your Bravo, Charlie, etc tables should define their own surrogate key, in addition to the foreign key column for the Alpha table. You are making your domain model quite inflexible by specifying that the primary key of one table must be exactly the same as the primary key of another table. The entities therefore become very tightly coupled, and one entity cannot exist without another, which is not a business rule that needs to be enforced within database design.
Add a foreign key constraint between the AlphaID columns in the Bravo and Charlie tables to the primary key column on the Alpha table. This gives you 1-to-many, and also allows you to specify whether the relationship is mandatory simply by setting the nullability of the FK column (something that isn't possible in your current design).
Add a unique key constraint to tables Bravo, Charlie, etc on the AlphaID column. This creates a 1-to-1 relationship, with the added benefit that the unique key also acts as an index which can help to speed up queries that retrieve rows based on the foreign key value.
The major benefit of this approach is that change is easier:
Want 1-to-many back? Drop the relevant unique key, or just change it to a normal index
Want Bravo to exist independently of Alpha? You've already got the surrogate key, all you do is set the AlphaID FK column to allow NULLs
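A sketch of this pattern in sqlite3 (names follow the answer; the DBMS is a stand-in). One behavioral difference matters for the "allow NULLs" step: SQLite's UNIQUE constraint allows many NULLs, but a SQL Server UNIQUE constraint allows only one NULL. In SQL Server 2008 and later, a filtered unique index such as `CREATE UNIQUE INDEX UX_Bravo_AlphaID ON Bravo (AlphaID) WHERE AlphaID IS NOT NULL` (index name hypothetical) is the usual way to let several independent Bravos exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Alpha (AlphaID INTEGER PRIMARY KEY, AlphaOne TEXT);
    CREATE TABLE Bravo (
        BravoID  INTEGER PRIMARY KEY,                 -- Bravo's own surrogate key
        AlphaID  INTEGER REFERENCES Alpha (AlphaID),  -- nullable: Bravo may stand alone
        BravoOne TEXT,
        CONSTRAINT UQ_Bravo_AlphaID UNIQUE (AlphaID)  -- turns 1-to-many into 1-to-1
    );
    INSERT INTO Alpha VALUES (1, 'a1');
""")

def add_bravo(alpha_id, value: str) -> bool:
    try:
        conn.execute("INSERT INTO Bravo (AlphaID, BravoOne) VALUES (?, ?)", (alpha_id, value))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_bravo(1, "b1"))    # True
print(add_bravo(1, "b2"))    # False: UQ_Bravo_AlphaID
print(add_bravo(None, "x"))  # True: a Bravo with no Alpha
print(add_bravo(None, "y"))  # True in SQLite; a second NULL fails a SQL Server UNIQUE constraint
```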
Personally, I've had lots of success with your second model, using a PK/FK on a single column.
I have never had a situation where all Alphas were required to have a record in a Bravo or Charlie table. I've always dealt with 1 <-> 0..1, never 1 <-> 1.
As for your last question, that's just that many more tables.
One more approach is having 3 tables for storing the 3 entities and having a separate table for storing the relations.
You could have a join table that specifies an Alpha and a related ID. You can then add another column specifying whether it is an ID for Bravo, Charlie or whatever. This keeps the column creep down on Alpha but does add some complexity to joining queries.
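A sketch of this join-table approach in sqlite3; the AlphaRelation table and its columns are hypothetical. It shows the trade-off: RelatedId cannot carry a foreign key because its target table depends on RelatedType, so a dangling id is accepted, and every lookup has to pick its join based on the type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Alpha   (AlphaId INTEGER PRIMARY KEY);
    CREATE TABLE Bravo   (BravoId INTEGER PRIMARY KEY, BravoOne TEXT);
    CREATE TABLE Charlie (CharlieId INTEGER PRIMARY KEY, CharlieOne TEXT);
    -- One row per relation; RelatedType says which table RelatedId points into.
    CREATE TABLE AlphaRelation (
        AlphaId     INTEGER NOT NULL REFERENCES Alpha (AlphaId),
        RelatedType TEXT    NOT NULL CHECK (RelatedType IN ('Bravo', 'Charlie')),
        RelatedId   INTEGER NOT NULL,  -- no FK possible: the target table varies
        PRIMARY KEY (AlphaId, RelatedType)
    );
    INSERT INTO Alpha VALUES (1), (2);
    INSERT INTO Bravo VALUES (10, 'b');
    INSERT INTO Charlie VALUES (20, 'c');
    INSERT INTO AlphaRelation VALUES (1, 'Bravo', 10), (1, 'Charlie', 20);
""")

TARGETS = {  # hypothetical mapping: RelatedType -> (table, key column, value column)
    "Bravo": ("Bravo", "BravoId", "BravoOne"),
    "Charlie": ("Charlie", "CharlieId", "CharlieOne"),
}

def related_value(alpha_id: int, related_type: str):
    table, key, col = TARGETS[related_type]  # the join target depends on the type
    row = conn.execute(
        f"SELECT t.{col} FROM AlphaRelation r JOIN {table} t ON t.{key} = r.RelatedId "
        "WHERE r.AlphaId = ? AND r.RelatedType = ?", (alpha_id, related_type)).fetchone()
    return row[0] if row else None

def relate(alpha_id: int, related_type: str, related_id: int) -> bool:
    try:
        conn.execute("INSERT INTO AlphaRelation VALUES (?, ?, ?)",
                     (alpha_id, related_type, related_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(related_value(1, "Bravo"))  # b
print(relate(2, "Bravo", 999))    # True: nothing checks that Bravo 999 exists
print(related_value(2, "Bravo"))  # None
```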
I have an example working pretty well so far that fits your model:
I have Charlie and Bravo tables that have the foreign key alpha_id from Alpha. It's like your first example, except alpha is not the primary key; bravo_id and charlie_id are.
I use alpha_id in every table that needs to refer to those entities. So, to avoid a query that may be slow because it searches both Bravo and Charlie to find which one an Alpha is, I created an AlphaType table, and the Alpha table has its id (alpha_type_id) as a foreign key. That way I can know programmatically which AlphaType I am dealing with, without joining tables that may have zillions of records. In T-SQL:
-- For example's sake, let's treat the id as a CHAR.
-- Pardon any mistakes; I don't have the exact code here,
-- but you can get the idea.
SELECT
    CASE alpha_type_id
        WHEN 'B' THEN [Bravo].[Name]    -- column reference, not a quoted string
        WHEN 'C' THEN [Charlie].[Name]
        ELSE NULL
    END AS [Name]
FROM ...
You raise a lot of questions that make it hard to select any of your proposed solutions without a lot more clarification on the exact problem you are trying to solve. Consider not just my clarification questions, but the criteria that you will use to evaluate my questions, as an indication of the amount of detail required to solve your problem:
a system that's easy to learn and maintain.
What "system" will be easy to learn and maintain? The source code of your app, or the app's data via its end-user interface?
data integrity enforced within my database.
What do you mean by "enforced within my database"? Does this mean you cannot by any means control data integrity any other way, i.e. the project requires only DB-based data integrity rules?
a schema that matches the real-world, logical organization of my data.
Can you provide us the real world, logical organization to which you are referring? It's impossible to infer it from your three examples of the data you are trying to store -- i.e. suppose all three of your structures are completely wrong. How would we know that unless we know the real-world spec?
classes/objects within my programming that map well to database tables (à la Linq to SQL)
This requirement sounds like your hand is being forced to build this with LINQ to SQL; is that the case?
speedy read and write operations
What is "speedy"? .03 seconds? 3 seconds? 30 minutes? It's unclear because you're not specifying the data size and type of operations to which you are referring.
effective use of space (few null fields)
Effective use of space has nothing to do with the number of null fields. If you mean a normalized database structure, that will depend again on the real-world specs and other design elements of the application that have not been provided in the question.
I'd go with option 1 unless I had a significant reason not to. It might not cost you as much space as you think, esp. if you are using varchars in Bravo. Don't forget that splitting it will cost you for foreign keys, secondary identity and needed indexes.
A place where you might run into trouble is if Bravo is unlikely to be needed (<10%) AND you need to quickly query by one of its fields, so you index it.
I would create a supertype / subtype relationship.
THINGS
------
PK ThingId
ALPHAS
------
FK ThingId (not null, identifying, exported from THINGS)
AlphaCol1
AlphaCol2
AlphaCol3
BRAVOS
------
FK ThingId (not null, identifying, exported from THINGS)
BravoCol1
BravoCol2
BravoCol3
CHARLIES
--------
FK ThingId (not null, identifying, exported from THINGS)
CharlieCol1
CharlieCol2
CharlieCol3
So, for example, an alpha that has a charlie but not a bravo:
insert into things values (1);
insert into alphas values (1,'alpha col 1',5,'blue');
insert into charlies values (1,'charlie col 1',17,'Y');
Note that you can't create more than one charlie for the alpha: if you tried to create two charlies with a ThingId of 1, the second insert would get a unique index/constraint violation.
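The answer's example, made runnable in sqlite3 (a stand-in DBMS; the column types are guesses, since the answer does not give them). Each subtype's ThingId is both its primary key and an identifying foreign key to THINGS, so a second charlie for ThingId 1 fails on the key, and a charlie for a ThingId not in THINGS fails on the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE things (ThingId INTEGER PRIMARY KEY);
    -- Subtypes: ThingId is the primary key AND an identifying FK to things.
    CREATE TABLE alphas (
        ThingId INTEGER PRIMARY KEY REFERENCES things (ThingId),
        AlphaCol1 TEXT, AlphaCol2 INTEGER, AlphaCol3 TEXT);
    CREATE TABLE bravos (
        ThingId INTEGER PRIMARY KEY REFERENCES things (ThingId),
        BravoCol1 TEXT, BravoCol2 INTEGER, BravoCol3 TEXT);
    CREATE TABLE charlies (
        ThingId INTEGER PRIMARY KEY REFERENCES things (ThingId),
        CharlieCol1 TEXT, CharlieCol2 INTEGER, CharlieCol3 TEXT);
    -- The answer's example: an alpha that has a charlie but not a bravo.
    insert into things values (1);
    insert into alphas values (1, 'alpha col 1', 5, 'blue');
    insert into charlies values (1, 'charlie col 1', 17, 'Y');
""")

def add_charlie(thing_id: int) -> bool:
    try:
        conn.execute("insert into charlies values (?, 'another charlie', 0, 'N')", (thing_id,))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_charlie(1))  # False: ThingId 1 already has a charlie
print(add_charlie(2))  # False: there is no thing 2
```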

Resources