I'm extracting data from a system that uses uniqueidentifier as the field type for its primary keys.
On the system I'm extracting from, I've been given access to a single derived table. That table was made by joining one table to a one-to-many table, so I need two of these uniqueidentifier columns together to get uniqueness.
Is there a way for me to create a simple persistent key using these two columns?
The only idea I have at the moment is to create an identity column on my table, and upsert any future extractions (daily) into my table.
Is there a better method than this?
You can add what is known as a 'composite key'.
ALTER TABLE dbo.yourtablename
ADD CONSTRAINT uq_yourtablename UNIQUE(column1,column2);
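As a rough sketch of how the identity idea from the question and the unique constraint can work together: the identity column acts as the simple persistent key, while the constraint on the two uniqueidentifier columns stops the daily load from storing the same pair twice. All names below (dbo.ExtractedRows, dbo.StagingRows, ParentId, ChildId, Payload) are made up for illustration and do not come from the question.

```sql
-- Hypothetical local copy of the extracted data.
CREATE TABLE dbo.ExtractedRows
(
    RowKey   int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ExtractedRows PRIMARY KEY CLUSTERED,
    ParentId uniqueidentifier NOT NULL,
    ChildId  uniqueidentifier NOT NULL,
    Payload  nvarchar(200) NULL,
    CONSTRAINT UQ_ExtractedRows_ParentId_ChildId UNIQUE (ParentId, ChildId)
);

-- Hypothetical staging table holding today's extraction.
CREATE TABLE dbo.StagingRows
(
    ParentId uniqueidentifier NOT NULL,
    ChildId  uniqueidentifier NOT NULL,
    Payload  nvarchar(200) NULL
);

-- Refresh rows whose key pair is already known ...
UPDATE t
SET    Payload = s.Payload
FROM   dbo.ExtractedRows AS t
JOIN   dbo.StagingRows   AS s
  ON   s.ParentId = t.ParentId
 AND   s.ChildId  = t.ChildId;

-- ... and insert the pairs that have not been seen before.
INSERT INTO dbo.ExtractedRows (ParentId, ChildId, Payload)
SELECT s.ParentId, s.ChildId, s.Payload
FROM   dbo.StagingRows AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM  dbo.ExtractedRows AS t
                   WHERE t.ParentId = s.ParentId
                     AND t.ChildId  = s.ChildId);
```

Existing rows keep their RowKey across loads, so other tables can reference that single int instead of the pair of GUIDs.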
My boss has assigned a SQL task to me and as I am new to SQL, I am struggling to know where to start.
Task: Create a Customer table to hold the data written in the #Customer temporary table in the PopulateCustomers stored procedure. This table will also need to have a unique id to ensure multiple instances of the populate functionality can be run concurrently.
I know how to create a table in SQL and I am guessing I can look in the PopulateCustomer stored procedure to know what data will be written in the temp Customer table in order to create columns for the Customer table.
But what I am really struggling with is the concept of a unique Id for a database table. I immediately thought of a primary key for each row in the table. My boss said no to that, and I didn't want to push further so as not to come across as a newbie.
I have tried to google this myself, and all I keep coming up with is pages that tell me about identifiers vs primary keys. But nothing ever tells me about a table having its own unique ID unless it's in reference to the rows within the table each having an identifier or primary key. This is leading me to think that I am not searching for the right keyword for what this functionality is.
The closest thing I found was here. http://sqlservercodebook.blogspot.com/2008/03/check-if-temporary-table-exists.html
This query looks to me like it's creating a temp table with an id.
CREATE TABLE #temp(id INT)
I have not pasted any of my work queries because I really want to research myself and figure this out. I just want to make sure I am looking in the right direction with what term I need to search for to find out how to create a table that has a unique ID. Or maybe I have misinterpreted the task and there is no such thing.
What I got from your story is that you need a table with a unique id that is generated automatically, and that this id should be the primary key.
This table can be created like:
create table example
(
id int identity(1,1) primary key clustered,
other_data varchar(200)
)
The key terms here are:
identity - makes the id column auto-increment
primary key - makes SQL Server enforce that this column is unique
clustered - makes SQL Server physically organize the table's data by this column (which makes searching by it faster)
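To see the auto-generated id in action, a minimal usage sketch might look like the following. It assumes the example table above has already been created; the inserted text is just a placeholder value.

```sql
-- Assumes the example table defined above already exists.
-- The id column is omitted, so SQL Server generates it.
INSERT INTO example (other_data) VALUES ('first row');

-- SCOPE_IDENTITY() returns the last identity value generated
-- in the current session and scope.
SELECT SCOPE_IDENTITY() AS new_id;
```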
I'm adding a UUID column to one of my tables so I can easily provision API keys. Should I bother adding a unique constraint to the column? I don't want to have duplicate API keys, but on the other hand, the odds of a collision when generating the UUID values are infinitesimal.
I think you need to consider whether you are going to join tables on this column or use it in other operations such as filters. If so, it is worth creating a unique key on the UUID column: most databases enforce a unique constraint with a unique index, and that index also makes lookups by the UUID faster.
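A minimal sketch of such a constraint, written in SQL Server syntax as an assumption (the question does not say which database is in use). The table dbo.ApiClient and its columns are hypothetical names chosen for illustration.

```sql
-- Hypothetical table: one row per API client.
CREATE TABLE dbo.ApiClient
(
    Id     int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ApiClient PRIMARY KEY CLUSTERED,
    Name   nvarchar(80) NOT NULL,
    -- NEWID() generates the key; the unique constraint guards against
    -- duplicates and gives lookups by ApiKey an index to use.
    ApiKey uniqueidentifier NOT NULL
        CONSTRAINT DF_ApiClient_ApiKey DEFAULT NEWID()
        CONSTRAINT UQ_ApiClient_ApiKey UNIQUE
);

INSERT INTO dbo.ApiClient (Name) VALUES (N'first client');

-- Typical lookup when a request arrives with an API key.
-- @ApiKey stands in for the key sent by the caller.
DECLARE @ApiKey uniqueidentifier = NEWID();
SELECT Id, Name
FROM   dbo.ApiClient
WHERE  ApiKey = @ApiKey;
```

The constraint costs little here because the column needs an index anyway to be searched efficiently.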
In SQL Server I have the following lookup table that holds degree levels:
create table dbo.DegreeLevel
(
Id int identity not null
constraint PK_DegreeLevel_Id primary key clustered (Id),
Name nvarchar (80) not null
constraint UQ_DegreeLevel_Name unique (Name)
)
Should I use identity on the ID?
When should I use identity or a simple int in a lookup table?
After dealing with multiple environments where we move changes from one environment to the next, I'd say not to use identity columns on lookup tables.
Here's why: if you need to reference an ID as a "magic #", you need consistency. Ideally, you wouldn't ever reference a magic #, but in reality, that is not what is commonly done. And it's a pain to correct when the IDs are out of sync. And it's really not much more effort to insert the table's data with an ID.
In a lookup table, having a "normal" Id INT might be better, because it gives you the ability to pick and choose the Id values. You get to define which values you have, and what they mean.
Identity is very useful for actual data tables, where you just need to know that you have a good, unique ID value - but you don't really care about what that value is.
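As an illustration of the "normal" Id approach, here is a sketch of a variant of the DegreeLevel table from the question without identity. The Id values and the level names other than 'Bachelor' are assumptions picked for the example.

```sql
-- Variant of the question's lookup table: Id is a plain int,
-- so every environment can be loaded with the same values.
create table dbo.DegreeLevel
(
    Id int not null
        constraint PK_DegreeLevel_Id primary key clustered (Id),
    Name nvarchar (80) not null
        constraint UQ_DegreeLevel_Name unique (Name)
);

-- The Ids are chosen explicitly and stay stable across environments.
insert into dbo.DegreeLevel (Id, Name)
values (1, N'Bachelor'),
       (2, N'Master'),
       (3, N'Doctorate');
```

Because the script fixes each Id, code that refers to a level by number means the same thing in development, test, and production.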
I guess it comes down to whether or not you have a natural candidate to use in the clustered index...
If you already have a property that can uniquely identify the row, then it's definitely worth considering whether adding an identity column is the right move.
If you don't have a natural candidate, then you'd need to invent a value and in this case using an identity column or sequence is probably easier than hand-rolling something.
As an example of having a natural key, imagine a 'DegreeModule' table where each module had a 4-character reference code that was printed on course materials (e.g. U212)
In this case, I would definitely skip creating an internal identifier and use the natural identifier as primary key...
create table dbo.DegreeModule
(
Reference char(4) not null primary key clustered,
Name nvarchar(80) not null
constraint UQ_DegreeModule_Name unique (Name)
/* .. plus FK's for stuff like parent degree, prerequisites,etc .. */
)
When you specify the identity property on an integer column of a table, the column becomes an auto-incrementing integer column. If you want your lookup table to create the id value automatically when you insert a row, use identity. If you want to create it yourself, just define the column as int.
A table can have only one identity column.
You cannot insert explicit values into an identity column unless you SET IDENTITY_INSERT ON for that table, and the values of an identity column cannot be updated at all.
If you are going to use an object-relational mapping (ORM) tool, refer to its documentation. In that case, you will most probably want to let the ORM handle the primary key, and you should not use identity.
If you have no specific requirements for primary key generation, then using identity here is fine. Specific requirements may be: primary keys follow special format, primary keys should be globally unique, primary keys are imported from other database, e.g. by insert into DegreeLevel values (1, 'Bachelor') etc.
My question is specifically about sql-server, but it can probably be answered by anyone with any database background.
If I want table A to have a 1:1 relationship with table B on a certain column, should I somehow modify the CREATE TABLE statement to identify this relationship or is this something that is not done at all (and rather it is handled by logic)?
EDIT
The second part of my question is: what is the point of embedding this into the code? why not just handle it logically on selects/updates?
All you need to do is have the column in Table A be a foreign key to the primary key of Table B:
create table TableB (
Id int primary key identity(1,1),
Name varchar(255))
create table TableA (
Id int primary key identity(1,1),
Name varchar(255),
TableBRelation int unique,
foreign key (TableBRelation) references TableB (Id))
The SQL may not be perfect but you should be able to get the idea.
As for why you would want to do this in the database rather than just application logic:
Other databases or developers may try to access your database. Do you want them to be able to create invalid data that may break your application? No. That's one of the points of referential integrity.
At some point, somebody is going to have to maintain your application. Defining your keys at the database level will clearly identify relationships between your data rather than requiring the developer to dig through your application code.
To create a 1:1 relationship, make the B table column both a foreign key and unique. This ensures that there can be only one row in table B that matches the PK field in table A, and that way you effectively get a 1:1 relationship...
You can set up a foreign key and add a constraint for it to be unique. This would set up a 1:1 relationship between your tables.
In tables where you need only one column as the key, and values in that column can be integers, when should you not use an identity field?
Conversely, in the same table and column, when would you generate its values manually rather than use an autogenerated value for each record?
I guess that it would be the case when there are lots of inserts and deletes on the table. Am I right? What other situations could there be?
If you have already settled on the surrogate side of the Great Primary Key Debacle, then I can't find a single reason not to use identity keys. The usual alternatives are GUIDs (they have many disadvantages, primarily from size and randomness) and application-layer generated keys. But creating a surrogate key in the application layer is a little bit harder than it seems, and it also does not cover non-application data access (i.e. batch loads, imports, other apps, etc.). The one special case is distributed applications, where GUIDs and even sequential GUIDs may offer a better alternative to site id + identity keys.
I suppose if you are creating a many-to-many linking table, where both fields are foreign keys, you don't need an identity field.
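A sketch of such a linking table, where the two foreign keys together form the primary key and no identity column is needed. The Student and Module tables and all column names are hypothetical.

```sql
-- Hypothetical parent tables.
create table dbo.Student
(
    StudentId int identity(1,1) not null
        constraint PK_Student primary key clustered,
    Name nvarchar(80) not null
);

create table dbo.Module
(
    ModuleId int identity(1,1) not null
        constraint PK_Module primary key clustered,
    Name nvarchar(80) not null
);

-- Linking table: the pair (StudentId, ModuleId) is the key,
-- so the same enrolment cannot be recorded twice.
create table dbo.StudentModule
(
    StudentId int not null,
    ModuleId  int not null,
    constraint PK_StudentModule
        primary key clustered (StudentId, ModuleId),
    constraint FK_StudentModule_Student
        foreign key (StudentId) references dbo.Student (StudentId),
    constraint FK_StudentModule_Module
        foreign key (ModuleId) references dbo.Module (ModuleId)
);
```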
Nowadays I imagine that most ORMs expect there to be an identity field in every table. In general, it is a good practice to provide one.
I'm not sure I understand enough about your context, but I interpret your question to be:
"If I need the database to create a unique column (for whatever reason), when shouldn't it be a monotonically increasing integer (identity) column?"
In those cases, there's no reason to use anything other than the facility provided by the DBMS for the purpose; in your case (SQL Server?) that's an identity.
Except:
If you'll ever need to merge the table with data from another source, use a GUID, which will prevent duplicate keys from colliding.
If you need to merge databases it's a lot easier if you don't have to regenerate keys.
One case of not wanting an identity field would be in a one to one relationship. The secondary table would have as its primary key the same value as the primary table. The only reason to have an identity field in that situation would seem to be to satisfy an ORM.
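A minimal sketch of that one-to-one layout, with hypothetical Person and PersonDetail tables: the secondary table's primary key is also its foreign key, so it needs no identity of its own.

```sql
-- Primary table: generates the key.
create table dbo.Person
(
    PersonId int identity(1,1) not null
        constraint PK_Person primary key clustered,
    Name nvarchar(80) not null
);

-- Secondary table: reuses the primary table's key value.
-- The primary key allows at most one detail row per person,
-- and the foreign key requires that person to exist.
create table dbo.PersonDetail
(
    PersonId int not null,
    Notes nvarchar(200) null,
    constraint PK_PersonDetail primary key clustered (PersonId),
    constraint FK_PersonDetail_Person
        foreign key (PersonId) references dbo.Person (PersonId)
);
```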
You cannot (normally) specify values when inserting into identity columns, so for example if the column "id" was specified as an identity, the following SQL would fail:
INSERT INTO MyTable (id, name) VALUES (1, 'Smith')
In order to perform this sort of insert you need to have IDENTITY_INSERT on for that table. It is not intended to be left on normally, and within a session it can be on for at most one table at a time.
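For completeness, a sketch of what turning the setting on around that insert might look like. The definition of MyTable is an assumption here, since the original only shows the insert statement.

```sql
-- Assumed definition of the table used in the insert above.
CREATE TABLE dbo.MyTable
(
    id   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    name varchar(50) NOT NULL
);

-- Allow explicit id values for this table in this session only.
SET IDENTITY_INSERT dbo.MyTable ON;

-- An explicit column list is required while the setting is on.
INSERT INTO dbo.MyTable (id, name) VALUES (1, 'Smith');

-- Switch it back off so later inserts use generated values again.
SET IDENTITY_INSERT dbo.MyTable OFF;

INSERT INTO dbo.MyTable (name) VALUES ('Jones');
```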
If I need a surrogate, I would either use an IDENTITY column or a GUID column depending on the need for global uniqueness.
If there is a natural primary key, or the primary key is defined as a unique combination of other foreign keys, then I typically do not have an IDENTITY, nor do I use it as the primary key.
There is an exception, which is snapshot configuration tables which I am tracking with an audit trigger. In this case, there is usually a logical "primary key" (usually date of the snapshot and natural key of the row - like a cost center or gl account number for which the row is a configuration record), but instead of using the natural "primary key" as the primary key, I add an IDENTITY and make that the primary key and make a unique index or constraint on the date and natural key. Although theoretically the date and natural key shouldn't change, in these tables, if a user does that instead of adding a new row and deleting the old row, I want the audit (which reflects a change to a row identified by its primary key) to really reflect a change in the row - not the disappearance of a key and the appearance of a new one.
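A sketch of that snapshot layout, using a hypothetical cost center configuration table: the identity column is the primary key that the audit trail follows, and a unique constraint still protects the logical key.

```sql
-- Hypothetical snapshot configuration table.
CREATE TABLE dbo.CostCenterConfig
(
    -- Surrogate key: stays the same even if a user edits
    -- the snapshot date or the cost center on an existing row.
    ConfigId     int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CostCenterConfig PRIMARY KEY CLUSTERED,
    SnapshotDate date NOT NULL,
    CostCenter   varchar(20) NOT NULL,
    BudgetLimit  decimal(18,2) NOT NULL,
    -- Logical "primary key": still enforced, but as a unique constraint.
    CONSTRAINT UQ_CostCenterConfig_SnapshotDate_CostCenter
        UNIQUE (SnapshotDate, CostCenter)
);
```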
I recently implemented a Suffix Trie in C# that could index novels, and then allow searches to be done extremely fast, linear to the size of the search string. Part of the requirements (this was a homework assignment) was to use offline storage, so I used MS SQL, and needed a structure to represent a Node in a table.
I ended up with the following structure : NodeID Character ParentID, etc, where the NodeID was a primary key.
I didn't want this to be done as an autoincrementing identity for two main reasons.
How do I get the value of a NodeID after I add it to the database/data table?
I wanted more control when it came to generating my own IDs.
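One way to address both points inside SQL Server, offered only as a sketch and not as what the poster actually did, is a sequence object (available from SQL Server 2012): the ID is fetched before the insert, so its value is known up front and the caller decides when IDs are handed out. The table layout and the sequence name below are assumptions.

```sql
-- Hypothetical node table; NodeID is a plain int, not an identity.
CREATE TABLE dbo.Node
(
    NodeID      int NOT NULL
        CONSTRAINT PK_Node PRIMARY KEY CLUSTERED,
    [Character] nchar(1) NOT NULL,
    ParentID    int NULL
        CONSTRAINT FK_Node_Parent REFERENCES dbo.Node (NodeID)
);

-- Hypothetical sequence that hands out node IDs.
CREATE SEQUENCE dbo.NodeIdSequence AS int
    START WITH 1
    INCREMENT BY 1;

-- Fetch the next ID first, then use it in the insert.
DECLARE @NodeID int = NEXT VALUE FOR dbo.NodeIdSequence;

INSERT INTO dbo.Node (NodeID, [Character], ParentID)
VALUES (@NodeID, N'a', NULL);

-- @NodeID is already known here, e.g. to use as a child's ParentID.
SELECT @NodeID AS NewNodeID;
```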