My company requires that each table have a uniqueidentifier. However, most of the lookup tables have a numerical code for each entry. For example, there is a list of privileges: 10-None, 20-View, 30-Edit and so on. The code is essential and must be unique. Also, the text of the code ("None", "Edit", etc.) needs to be unique, so I already have two separate fields that each need to be unique. Now I have to add a guid column on top of that, making three separate fields in the table that each need to be unique. This example uses a very simple list, but the code value is essential, and some of these tables effectively equate to an enum, so I need both the code and the text. Having three separate fields, each with a unique index, seems contrary to normal table design.
Is there a more common practice to avoid this?
Thank you.
I do not know why you would want to change that setup, even though you already have several unique values. We had a similar practice at my last employer: even though we had uniquely identifying fields, everything also had a unique identifier field.
This was a good thing, because infrequent errors in the program would cause duplicates to be entered erroneously in the supposedly unique fields, but never in the identifier fields. This usually kept our program from crashing or returning incorrect data, as we always joined on the identifier rather than on the 'unique' fields that were not supposed to have duplicates.
Indexes are placed on fields that are typically used when sorting or linking tables. For example, let's say you have a "normalized" environment of Product Orders. You would have one table for Customers, with a Customer ID and other info, and you would have one table with Orders, which would contain the Customer ID. You would index Customer ID in both tables, because you're going to frequently join those tables.
As for the unique identifier itself, you don't need to index it for it to be unique.
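To make that concrete, here's a rough sketch of what the privileges lookup from the question could look like. The table and constraint names are my own invention, and I'm assuming SQL Server since you mention uniqueidentifier:

CREATE TABLE Privilege
(
    PrivilegeID uniqueidentifier NOT NULL
        CONSTRAINT DF_Privilege_ID DEFAULT NEWID()
        CONSTRAINT PK_Privilege PRIMARY KEY,
    PrivilegeCode int NOT NULL
        CONSTRAINT UQ_Privilege_Code UNIQUE,    -- 10, 20, 30, ...
    PrivilegeName varchar(50) NOT NULL
        CONSTRAINT UQ_Privilege_Name UNIQUE     -- 'None', 'View', 'Edit', ...
);

Yes, that is three unique indexes, but on a small lookup table they cost almost nothing, and each one enforces a real rule: the GUID is what you join on, while the code and the text are natural keys kept honest by their own constraints.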
I am creating a database where we want to combine data from several sites into one database. I now have an issue with the unique constraint for the samplepoints table. For each site the samplepointname must be unique. In the old system I enforced this with a unique constraint. The problem in the new system is that the siteIDs are not stored in the table with samplepoints, because these are inherited from the parent of samplepoints (projects).
Can I create a unique constraint that includes the siteID stored in its parent, or should I create a siteID field in the table itself?
I'm a bit confused by some of the phrasing of the question, so I'm going to lay out some clarifying assumptions based on what I think is my best read of it. Hopefully these assumptions actually match your situation.
In the original configuration, you had:
a single site
represented by a single pair of tables named "project" and "samplepoints"
a unique constraint over a field named "samplepointname"
a field named "siteID" in in a table named "project"
it had previously been unnecessary to add "siteID" to "samplepoints" because there was only one row in "project" and that one row's single "siteID" was always implied throughout the table "samplepoints"
And in the new configuration you have the following changes:
multiple sites
one row for each site in the table "projects"
a unique value for each field "siteID" in "projects"
You've stated that the field "samplepointname" within each site must be unique, but not globally. So I'm going to work with that.
Given these assumptions, you almost certainly will not merely want but need to add "siteID" to your table "samplepoints". This is because you can no longer simply read from "projects" and "samplepoints" at the same time without either joining them or adding a WHERE clause to filter down to the relevant site.
In fact, if your table "samplepoints" has already been populated without "siteID", you may well need to obtain the original tables from all of the different sites, empty the consolidated table, and repopulate it so that "siteID" correctly represents each independent site.
After you've added the new field "siteID", you'll remove the UNIQUE constraint on "samplepointname" and replace it with the composite index below. If you don't remove it, all names will need to be unique across all sites rather than just within each site.
If you're simply executing commands directly, this will create that index:
CREATE UNIQUE INDEX unique_samplepointnames ON samplepoints (siteID, samplepointname);
The index name "unique_samplepointnames" is just an identifier; it can be whatever you wish, but that's my suggestion, as it's clear and describes the purpose.
Rather than "UNIQUE" being a constraint on the column, "UNIQUE" is here a constraint on the index. Any further options on how the index is created are just optimization.
Where should you draw the line when normalising data, in terms of data duplication? I.e. would you say that two employees who share the same birthday, or who have the same timestamp for a shift, count as data duplication and should therefore be placed into another table?
Birth date has a full and non-transitive dependency on a person, which means it should be stored in the same table where you keep your employees; that complies with third normal form (3NF).
Work shifts are not an attribute of an employee, which means they are a different entity that stands in a relation to the employee entity.
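A minimal sketch of that layout, with hypothetical names:

CREATE TABLE employee
(
    employee_id serial not null,
    full_name   varchar(100) not null,
    birth_date  date not null,          -- attribute of the employee: stays here (3NF)
    constraint employee_pk primary key (employee_id)
);

CREATE TABLE shift
(
    shift_id    serial not null,
    employee_id integer not null references employee (employee_id),
    shift_start timestamp not null,
    shift_end   timestamp not null,
    constraint shift_pk primary key (shift_id)
);

-- Two employees sharing a birth_date, or two shifts with the same
-- shift_start, is not duplication; each row records a different fact.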
There is no particular 'limit' when applying normalisation to data, since the main restriction for every relational database table is to have a unique primary key. Hence, if all other columns contain the same data but the primary key is still different, it is a different row of the table.
The actual restrictions can come in two forms. One is the programming or systematic approach, where the restriction on what kind of data can be input is enforced by a program which interacts with the database, or by an already-defined script handed to the administrator of the database.
The other, more database-oriented approach is to create primary keys composed of multiple columns. That way a row is unique only if the combination of values in those columns is unique. It should be noted that a primary key is not necessarily the same as a unique key, which should be different for every instance.
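For instance, a composite primary key over a hypothetical shift table could look like this:

CREATE TABLE shift_assignment
(
    employee_id integer not null,
    shift_start timestamp not null,
    constraint shift_assignment_pk primary key (employee_id, shift_start)
);

-- The same employee_id can repeat, and the same shift_start can repeat,
-- but the combination of the two must be unique.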
You have misunderstood what normalization does.
Two attributes having the same value (e.g. two employees having the same birthday) is not redundancy.
Rather, having the same attribute in two tables (i.e. two tables each having a birthday column, thereby repeating every employee's birthday information) is.
Normalization is a quality decision and denormalization is a performance decision. For my school projects, my teachers recommended normalizing at least to 3NF, so that may be a good guideline.
I have the luxury of designing a database from scratch. When designing columns to act as unique keys, should I just use unique integers, or should I attempt to make the values interpretable? So if I had a lookup table of ward names in a hospital, should the id column contain unique codes that in some way relate to the name of the ward, or just unique integers?
Resist the temptation to overload the id values with meaning. Use other attributes to store the info you're considering stuffing into the id.
Overloading the id with "meaning" is bad because:
If the data being stuffed into the ID changes, so must your ID, and IDs should never change
If the data type of the data changes, you'll have a problem, for example:
If your ID is numeric, and the stuffed info changes from numeric to text, you'll have big problems
If the stuffed data changes from a simple field to a one-to-many child, your model will break
What you believe has "important" meaning now may not be important in the future. Then your "specially encoded" data will become useless and a burden, even a serious restriction
What currently "identifies" a product may change as the business evolves
I have seen this idea attempted many times, never successfully. In every case, the idea was scrapped and surrogate IDs were introduced to replace the magic IDs, with all the risk and development cost associated with that task.
In my career, I have seen most of the problems listed above actually happen.
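For the ward example from the question, here's a sketch of the safer layout (column names are hypothetical): a meaningless integer id, with the meaningful code and name stored as ordinary unique attributes.

CREATE TABLE ward
(
    ward_id   serial not null,              -- meaningless surrogate; never changes
    ward_code varchar(10) not null unique,  -- human-facing code; free to change
    ward_name varchar(100) not null unique,
    constraint ward_pk primary key (ward_id)
);

If the business later renames or recodes a ward, only ward_code or ward_name changes; every foreign key pointing at ward_id is untouched.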
You should not be using a lookup table. Make your tables InnoDB and use referential integrity to join tables together. Your id columns should always be set as primary keys and should be set to auto increment. Never try to make up your own ids. You should really look at some tutorials on referential integrity and learn how to associate tables with other tables.
What are the adverse effects of having too many lookup tables in the database?
I have to incorporate many enumerations, based on the applications.
What would experts advise?
Initially you have to ask yourself "how many is too many?". If there is a logical relation between two tables, there has to be an FK.
If you don't need the related tables anywhere else within the database, you could consider removing them and using a CHECK constraint with an IN clause to enforce data validity. However, this means altering the table for every new value in the enumeration.
My personal advice is to keep the FKs and the tables. It's a clear solution, and the database is much easier to maintain if there is descriptive text available for all those numbers.
Let me tell you how awful it is to have too few lookup tables. The original designers at one place I worked decided to put all lookups into one table and to define what each lookup was for using a typeid. This caused almost all queries to hit this table to get the descriptive lookup value, creating a performance jam.
Further, without separate lookups, the fields that took the typeid were not constrained to the values appropriate to that field, because a foreign key can only reference the whole table, not a subset of it. So the field that stored the clientid might accidentally contain the value for a user group. This caused data integrity problems and made reporting much more difficult, as we had to interpret values that didn't make sense in context. There is no prize for using too few tables; in fact, it is often an anti-pattern in database design.
Create 1000 lookup tables if that is what you need.
Like Florian, I much prefer having tons of foreign keys to having CHECK ... IN (..) constraints, for a simple reason: you can add new values just by inserting records into your lookup tables.
Maintaining CHECK ... IN () is a much bigger problem. Imagine this scenario:
CREATE TABLE street
(
    id serial not null,
    st_type varchar(20) not null,
    st_name varchar(100) not null,
    constraint street_pk primary key (id),
    constraint street_type_check check (st_type in ('STREET','AVENUE','SQUARE'))
);
You have 1000 rows with those types checked, correct? If you need to add another one, you will need to drop the constraint and recreate it.
If you take an item off that list, like SQUARE, what will happen to the rows already committed (and checked at the moment of insertion) that have that type? They will still keep an invalid type.
Tables and Foreign Keys are easier to maintain and keep track of.
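For comparison, here's a sketch of the same street example done with a lookup table and a foreign key instead of the CHECK:

CREATE TABLE street_type
(
    st_type varchar(20) not null,
    constraint street_type_pk primary key (st_type)
);

INSERT INTO street_type VALUES ('STREET'), ('AVENUE'), ('SQUARE');

CREATE TABLE street
(
    id serial not null,
    st_type varchar(20) not null,
    st_name varchar(100) not null,
    constraint street_pk primary key (id),
    constraint street_type_fk foreign key (st_type) references street_type (st_type)
);

-- Adding a new type is just one INSERT into street_type;
-- nothing has to be dropped or recreated.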
The whole point of lookup data is that there is a finite list of valid identifiers for a specific field. If those specific fields are used in procedures or WHERE clauses to determine the correct process path, or to limit the select list, then there is no such thing as too many lookups.
If it is not a finite list of identifiers for a specific process or WHERE clause, then it should not be a lookup value.
Two types of fields come to mind which might be considered lookup values but don't necessarily need to be:
City and Province/State:
There is a finite list of these, but because there are so many, you might not want to make a lookup table for them.
I have ten or more (I don't know exactly how many) tables that have a column named foo with the same datatype.
How can I tell SQL that the values across all the tables should be unique?
I mean, if I have the value "1" in table1, I should NOT be able to have the value "1" in table2.
Have a common IDs table which these ten tables reference. That will work well in that it will ensure unique IDs, but it doesn't by itself stop someone from reusing an ID across the tables if they really want to.
What I mean is that a common IDs table ensures you don't create duplicates on insert (by also inserting each ID into the common table), but the only way to guarantee it never happens is by building the business rules into the system or by placing check constraints that cross-reference the other tables (which would ensure uniqueness, but degrade performance).
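A sketch of that approach, with hypothetical names: each of the ten tables takes its foo from one shared table, and a foreign key guarantees the value exists there.

CREATE TABLE all_foo
(
    foo int not null,
    constraint all_foo_pk primary key (foo)
);

CREATE TABLE table1
(
    foo int not null,
    constraint table1_pk primary key (foo),
    constraint table1_foo_fk foreign key (foo) references all_foo (foo)
    -- other columns for table1 go here
);

-- table2 through table10 follow the same pattern.
-- Note: this guarantees every foo exists in all_foo, but on its own it does
-- not stop table1 and table2 from reusing the same value; that still needs
-- insert logic or cross-table checks, as described above.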
The question is phrased vaguely; if you need to generate values for a column that are unique among several tables, use row GUIDs or a common ID generator table; if you need to enforce uniqueness on field values that are already there, use triggers.
Generally, if you generate the values, you don't need to enforce anything. The generation logic, if done right, will take care of that. If you are inserting, say, user input, then you can and should enforce uniqueness during insertion. As a validation rule or something.
You can define the field as a GUID (or a UNIQUEIDENTIFIER in SQL Server). Then it will always be unique no matter what.
How about setting a check constraint on each table such that ID % 10 = N (where N is the table number, from 0-9), and using IDENTITY(N,10) each time?
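A sketch of that idea in SQL Server syntax for two of the tables (extend the pattern to the others):

CREATE TABLE table0
(
    ID int IDENTITY(0, 10) NOT NULL PRIMARY KEY,
    CONSTRAINT CK_table0_ID CHECK (ID % 10 = 0)   -- generates 0, 10, 20, ...
);

CREATE TABLE table1
(
    ID int IDENTITY(1, 10) NOT NULL PRIMARY KEY,
    CONSTRAINT CK_table1_ID CHECK (ID % 10 = 1)   -- generates 1, 11, 21, ...
);

-- The two ranges can never collide, so uniqueness across the tables
-- falls out of the numbering scheme itself.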
I would suggest that possibly your design is flawed. Why are these separate tables? It would be better to put them in one table with one id field and another field to identify whatever is making these separate tables (customer id, for instance). Then you can read about partitioning tables if you want them to be split by customer for performance reasons.