Should I add a unique constraint to a UUID column? - database

I'm adding a UUID column to one of my tables so I can easily provision API keys. Should I bother adding a unique constraint to the column? I don't want to have duplicate API keys, but on the other hand, the odds of a collision when generating the UUID values are infinitesimal.

I think you need to consider whether you are going to join tables on this column or filter on it. If so, add the unique constraint: in most databases it is backed by an index, so it will also help retrieve rows by UUID faster.
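A minimal sketch of what the constraint buys you, using Python's built-in sqlite3 module with an in-memory database as a stand-in for whatever database the asker uses. The table and column names (accounts, api_key) are hypothetical.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (              -- hypothetical table name
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        api_key TEXT NOT NULL UNIQUE     -- the constraint is backed by an index
    )
    """
)

key = str(uuid.uuid4())
conn.execute("INSERT INTO accounts (name, api_key) VALUES (?, ?)", ("alice", key))

# Re-using the same key (a bug, a bad import, a copy-paste) is rejected.
try:
    conn.execute("INSERT INTO accounts (name, api_key) VALUES (?, ?)", ("bob", key))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookups by key can use the index that backs the constraint.
owner = conn.execute(
    "SELECT name FROM accounts WHERE api_key = ?", (key,)
).fetchone()[0]
```

A random collision is not the realistic risk here; application bugs and data loads that re-insert an existing key are, and the constraint catches those too.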

Related

Primary key and entity id difference

I am a little confused about the primary key and the ID in a database table.
I want to identify the rows in my table by some logical id like pay_xyz123, pay_xyz124, order_xyz, etc. If I use this format as the pk, which is then a string, would it affect performance?
Or should I use auto-increment numbers as the pk and keep ids like pay_xyz123 as a unique key? What would be the best approach?
Edit: the logical id's can be a long string say 15 characters
In general, I prefer having synthetic ids (i.e. auto-increment/serial/identity column) for such tables. This has some advantages:
You can update the entity id easily, because it is an attribute and not used for foreign key references.
Integers are (slightly) more efficient for indexing purposes.
A synthetic id hides entity name information in the tables that reference the key.
It also allows things like soft deletes -- where deletion is by a flag rather than removing the row -- with an insert using the same id. Of course, you have to adjust the uniqueness constraint to allow this.
Of course, there is a slight overhead to storing the auto-incremented key. This increases the size of the base table. Usually string names are longer (as in your example), so this is more than offset by having a reduced length in the rows that refer to the entity.
If you are using your key for foreign keys into other tables, I personally would use a numeric auto increment id. You can still place an alternate index on your logical key and even a unique constraint on the logical key if the business rules warrant.
The downside, in my opinion, of using the logical key for foreign keys, is that if the logical key changes, then you have to update all of the foreign keys.
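The layout both answers describe can be sketched as follows, again using Python's sqlite3 with an in-memory database as an assumed stand-in. The payments and refunds tables and their columns are hypothetical; the point is that the foreign key targets the integer id, so the logical id can change without touching referencing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE payments (
        id         INTEGER PRIMARY KEY,      -- synthetic key
        logical_id TEXT NOT NULL UNIQUE      -- e.g. 'pay_xyz123'
    );
    CREATE TABLE refunds (
        id         INTEGER PRIMARY KEY,
        payment_id INTEGER NOT NULL REFERENCES payments (id)
    );
    INSERT INTO payments (id, logical_id) VALUES (1, 'pay_xyz123');
    INSERT INTO refunds (id, payment_id) VALUES (1, 1);
    """
)

# Changing the logical id is a single-row update; refunds is not touched.
conn.execute("UPDATE payments SET logical_id = 'pay_abc999' WHERE id = 1")

renamed = conn.execute(
    """
    SELECT p.logical_id
    FROM refunds AS r
    JOIN payments AS p ON p.id = r.payment_id
    WHERE r.id = 1
    """
).fetchone()[0]
```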

Create persistent key using two unique id's as the source

I'm extracting data from a system that uses uniqueidentifier as the field type for its primary keys.
On the system I'm extracting from, I've been given access to a single table that's been derived. That table has been made by joining one table to a one to many table resulting in me needing to use two of these uniqueidentifier columns to get uniqueness.
Is there a way for me to create a simple persistent key using these two columns?
The only idea I have at the moment is to create an identity column on my table, and upsert any future extractions (daily) into my table.
Is there a better method than this?
You can add a unique constraint over both columns, which makes the pair what is known as a 'composite key':
ALTER TABLE dbo.yourtablename
ADD CONSTRAINT uq_yourtablename UNIQUE(column1,column2);

Creating a Unique SQL Table Constraint with Primary Key

We are trying to enforce a unique table constraint on certain data tables in SQL Server. I have it working, but I am running into a few issues. I want the table to be ordered by primary key, but if I include the primary key in the index keys, the constraint no longer enforces uniqueness, because every row obviously has a unique ID since it's a primary key.
If I remove the ID from the indexed keys, it works as it is supposed to but it no longer sorts by Primary Key anymore, which is what I want. It sorts by another one of the columns.
How do I include the primary key in the constraint so I can use it for sorting, but have it be ignored when checking the table constraint for uniqueness(ie, it should still not allow a new record to be written if all other info is the same other than ID)?
UPDATE: How do I handle a situation where a table has more columns than can be put into an index? Can I not enforce no duplicate entries in these?
A relational database is built on set theory and predicate logic, and in set theory there is no difference between the sets A {1,2,3} and B {2,3,1}.
That is why no RDBMS guarantees that results come back in any particular order.
You get the rows in the order you want only when you explicitly provide an ORDER BY in the SELECT statement.
So leave the ID out of the unique constraint, and do the sorting either in the front end or by adding an ORDER BY clause to your query.
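A small sketch of that separation, assuming Python's sqlite3 with an in-memory database in place of SQL Server. The readings table and its columns are hypothetical. The unique constraint covers only the business columns, and the ordering comes from the query, not from the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (                 -- hypothetical table
        id     INTEGER PRIMARY KEY,
        sensor TEXT NOT NULL,
        taken  TEXT NOT NULL,
        UNIQUE (sensor, taken)              -- id deliberately left out
    )
    """
)
rows = [(1, "b", "2020-01-02"), (2, "a", "2020-01-03"), (3, "c", "2020-01-01")]
conn.executemany("INSERT INTO readings (id, sensor, taken) VALUES (?, ?, ?)", rows)

# Same business data under a new id is still a duplicate.
try:
    conn.execute(
        "INSERT INTO readings (id, sensor, taken) VALUES (4, 'a', '2020-01-03')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ordering is requested explicitly in the query.
ordered_ids = [r[0] for r in conn.execute("SELECT id FROM readings ORDER BY id")]
```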

When having an identity column is not a good idea?

In tables where you need only 1 column as the key, and the values in that column can be integers, when shouldn't you use an identity field?
Conversely, for the same table and column, when would you generate the values manually rather than use an autogenerated value for each record?
I guess that would be the case when there are lots of inserts and deletes on the table. Am I right? What other situations are there?
If you have already settled on the surrogate side of the Great Primary Key Debacle then I can't find a single reason not to use identity keys. The usual alternatives are guids (they have many disadvantages, primarily from size and randomness) and application layer generated keys. But creating a surrogate key in the application layer is a little bit harder than it seems and also does not cover non-application related data access (i.e. batch loads, imports, other apps etc). The one special case is distributed applications, where guids and even sequential guids may offer a better alternative to site id + identity keys.
I suppose if you are creating a many-to-many linking table, where both fields are foreign keys, you don't need an identity field.
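As a rough illustration of such a linking table, here is a sketch in Python's sqlite3 (an in-memory database, assumed only for demonstration). The students, courses, and enrollments names are hypothetical. The two foreign keys together form the primary key, so no separate identity column is needed to keep pairs unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students (id),
        course_id  INTEGER NOT NULL REFERENCES courses (id),
        PRIMARY KEY (student_id, course_id)   -- the pair is the key
    );
    INSERT INTO students VALUES (1, 'Ann');
    INSERT INTO courses VALUES (10, 'Databases');
    INSERT INTO enrollments VALUES (1, 10);
    """
)

# Enrolling the same student in the same course twice is rejected.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
```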
Nowadays I imagine that most ORMs expect there to be an identity field in every table. In general, it is a good practice to provide one.
I'm not sure I understand enough about your context, but I interpret your question to be:
"If I need the database to create a unique column (for whatever reason), when shouldn't it be a monotonically increasing integer (identity) column?"
In those cases, there's no reason to use anything other than the facility provided by the DBMS for the purpose; in your case (SQL Server?) that's an identity.
Except:
If you'll ever need to merge the table with data from another source, use a GUID, which will prevent duplicate keys from colliding.
If you need to merge databases it's a lot easier if you don't have to regenerate keys.
One case of not wanting an identity field would be in a one to one relationship. The secondary table would have as its primary key the same value as the primary table. The only reason to have an identity field in that situation would seem to be to satisfy an ORM.
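A sketch of the one-to-one case, assuming Python's sqlite3 with an in-memory database and hypothetical users and user_profiles tables. The secondary table's primary key is also its foreign key to the primary table, which by itself limits each user to one profile row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users (id),  -- shared key
        bio     TEXT
    );
    INSERT INTO users VALUES (1, 'Ann');
    INSERT INTO user_profiles VALUES (1, 'hello');
    """
)

# A second profile for the same user violates the primary key.
try:
    conn.execute("INSERT INTO user_profiles VALUES (1, 'again')")
    second_profile_rejected = False
except sqlite3.IntegrityError:
    second_profile_rejected = True

# A profile for a user that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO user_profiles VALUES (2, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```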
You cannot (normally) specify values when inserting into identity columns, so for example if the column "id" was specified as an identity the following SQL would fail:
INSERT INTO MyTable (id, name) VALUES (1, 'Smith')
In order to perform this sort of insert you need to set IDENTITY_INSERT ON for that table. It is not intended to be left on, and only one table per session can have it on at any point in time.
If I need a surrogate, I would either use an IDENTITY column or a GUID column depending on the need for global uniqueness.
If there is a natural primary key, or the primary key is defined as a unique combination of other foreign keys, then I typically do not have an IDENTITY, nor do I use it as the primary key.
There is an exception, which is snapshot configuration tables which I am tracking with an audit trigger. In this case, there is usually a logical "primary key" (usually date of the snapshot and natural key of the row - like a cost center or gl account number for which the row is a configuration record), but instead of using the natural "primary key" as the primary key, I add an IDENTITY and make that the primary key and make a unique index or constraint on the date and natural key. Although theoretically the date and natural key shouldn't change, in these tables, if a user does that instead of adding a new row and deleting the old row, I want the audit (which reflects a change to a row identified by its primary key) to really reflect a change in the row - not the disappearance of a key and the appearance of a new one.
I recently implemented a Suffix Trie in C# that could index novels, and then allow searches to be done extremely fast, linear to the size of the search string. Part of the requirements (this was a homework assignment) was to use offline storage, so I used MS SQL, and needed a structure to represent a Node in a table.
I ended up with the following structure : NodeID Character ParentID, etc, where the NodeID was a primary key.
I didn't want this to be done as an autoincrementing identity for two main reasons.
How do I get the value of a NodeID after I add it to the database/data table?
I wanted more control when it came to generating my own IDs.

primary key on very small table

I have a very small table, with at most 5 records, that holds some labels. I am using Postgres.
The structure is as follows:
id - smallint
label - varchar(100)
The table will be used mainly to reference the rows from other tables. The question is if it's really necessary to have a primary key on id or to have just an index on the id or have them both?
I did read about indexes and primary keys and I understand that this depends quite a lot on what's the table going to be used for:
Tables with no Primary Key
Edit: I was going to ask about having a primary key or an index or have them both. I edited the question.
It is always good practice to have a primary key column. The typical scenario where it is needed is updating or deleting a single row: having a PK makes that much easier and safer.
Yes, a primary key is not only good practice -- it's crucial. A table that lacks a unique key fails to be in First Normal Form.
You must declare a PRIMARY KEY or UNIQUE constraint if you want other tables to reference this one with a foreign key.
In most RDBMS brands, both PRIMARY KEY and UNIQUE constraints implicitly create an index on the column(s). If it doesn't do this implicitly, you may be required to define the index yourself before you can declare the constraint.
Yes, you will need a primary key on the id field, since you do not want two labels that share the same id.
You also want an index, to speed up the search/lookup process in this table (although for small tables there is less performance gain). The sequence will just help you fill in the next ID; it does not prevent you from changing a previous value into one that already exists.
There are very few reasons for creating an index instead of a primary key. As Bill Karwin said, you won't save resources at all. And, as you may have already guessed, there is no need at all to create a new index if you have the primary key.
In some cases it may be hard to find a candidate key. But that doesn't seem to be the case here, and leaving the key out clearly goes against good practice.
By the way, as your table is so small, most queries will use a full table scan even if there is an index. Don't worry if you see a full table scan.
From the developer's point of view, PRIMARY KEY is just a combination of NOT NULL and UNIQUE INDEX on one of the columns.
A UNIQUE INDEX is good if:
You need to enforce uniqueness. Using index is the most efficient way to do that.
You need to perform a SELECT, UPDATE or DELETE with a WHERE condition that is selective on the indexed field, that is, the number of rows affected by the query is much less than the total number of rows (say, 10 rows of 2,000,000).
A UNIQUE INDEX is bad if:
You don't need uniqueness on this field, of course :) But you'd better have one unique index in the table, so that each record is identifiable.
You need fast INSERTS
You need fast UPDATEs and DELETEs of the indexed value with a WHERE condition that is not selective on the indexed field, that is, the number of rows affected by the query is comparable to the total number of rows (say, 1,500,000 rows of 2,000,000).
Given that you are going to have a little table, I'd advise you to create a PRIMARY KEY on it.
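For the two-column label table in the question, the advice above might look like the sketch below. It uses Python's sqlite3 with an in-memory database as a stand-in for Postgres, which is an assumption made only so the example runs anywhere. The referencing items table is hypothetical. No separate index is declared, because the primary key already provides one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE labels (
        id    SMALLINT PRIMARY KEY,          -- unique, indexed, referenceable
        label VARCHAR(100) NOT NULL
    );
    CREATE TABLE items (                     -- hypothetical referencing table
        id       INTEGER PRIMARY KEY,
        label_id SMALLINT NOT NULL REFERENCES labels (id)
    );
    INSERT INTO labels VALUES (1, 'draft'), (2, 'published');
    INSERT INTO items VALUES (100, 1);
    """
)

# Two labels cannot share an id.
try:
    conn.execute("INSERT INTO labels VALUES (1, 'duplicate')")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

# Other tables cannot point at a label that does not exist.
try:
    conn.execute("INSERT INTO items VALUES (101, 99)")
    unknown_label_rejected = False
except sqlite3.IntegrityError:
    unknown_label_rejected = True
```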
