Do you need a primary key if duplicates are allowed? - sql-server

This might be too subjective, but it's been puzzling me for some time.
If you have a Fact table that allows duplicates with 10 dimensions that do not, do you really need a primary key?
Why Are There Duplicates?
It's a bit tricky, but ideally each duplicate is actually a valid record. The source system that records them simply provides no unique identifier to tell them apart. We don't own that system, so there is no way to ever change it.
Data
The data arrives in batches, and each batch only includes the previous day's worth of records. Therefore, in the event of a republish, we just drop the entire day's worth of records and republish the new day of records, without using a primary key.
This is how I would fix bad data.
Generate A Primary Key Already
I can, but if it's never used and there is no way to validate whether a duplicate is legitimate, why do it?

SQL Server database tables do not require a primary key.
Some database engines may well create a hidden key in the background, though (MySQL's InnoDB does, for example).

Yes, SQL Server doesn't need a primary key. What it mostly needs is a CLUSTERED index, because if the table has other NONCLUSTERED indexes, every one of them uses the clustered key to point to the data. A primary key is a good candidate for the clustered key. So if it's short and you have other indexes, that is a reason to create it.
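As a rough sketch of the setup discussed above: a fact table with no primary key, a non-unique clustered index on the batch date, and a republish done by deleting and reloading one day. The table name, columns and date are made up for illustration and are not from the original question.

```sql
-- Hypothetical fact table: duplicates are allowed, so no primary key is declared.
CREATE TABLE dbo.FactSales (
    LoadDate    date           NOT NULL,  -- the day the batch covers
    CustomerKey int            NOT NULL,
    ProductKey  int            NOT NULL,
    Amount      decimal(18, 2) NOT NULL
);

-- A clustered index does not have to be unique, nor does it have to be a primary key.
CREATE CLUSTERED INDEX CIX_FactSales_LoadDate
    ON dbo.FactSales (LoadDate);

-- Two identical rows are both accepted.
INSERT INTO dbo.FactSales (LoadDate, CustomerKey, ProductKey, Amount)
VALUES ('20200115', 1, 100, 9.99);
INSERT INTO dbo.FactSales (LoadDate, CustomerKey, ProductKey, Amount)
VALUES ('20200115', 1, 100, 9.99);

-- Republish of one day: remove that day's rows and load the corrected batch.
BEGIN TRANSACTION;

DELETE FROM dbo.FactSales
WHERE LoadDate = '20200115';

INSERT INTO dbo.FactSales (LoadDate, CustomerKey, ProductKey, Amount)
VALUES ('20200115', 1, 100, 12.50);

COMMIT TRANSACTION;
```

Clustering on the batch date is only one possible choice; it keeps each day's rows together, which suits the delete-by-day pattern.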

Related

Necessary to update indexes if the primary key data type change?

I have a large table with many rows. I want to change the primary key from int to bigint.
My question is do I have to update/rebuild the indexes? Or is that automatically done behind the scenes?
I assume you are dropping and recreating the PK for the data type change and it is a clustered index. In this case, the non-clustered indexes are rebuilt automatically when the clustered primary key is dropped and again when it's recreated.
With a large table, you could manually drop the non-clustered indexes first and recreate afterwards. That way, they are only rebuilt once and save some time.
In principle, the only thing you need to do is to issue ALTER TABLE ALTER COLUMN .... SQL Server will fix the implicit primary key index as part of the type change.
In practice you will need to plan for such a change.
In some circumstances the change can be done online by issuing WITH (ONLINE=ON). (The default is to do the change "offline", blocking access to the table while the change is carried out.) However, depending on the type of index, the change might not appear to be entirely "online".
You need enough disk space (the old and new key will exist on disk at the same time).
Are there foreign keys referencing the PK? FK ints to PK bigints are allowed, but generally a very bad idea.
If this is a production system the only responsible thing to do is to test it. If there is no test system then copy a subset of the base table to a parallel table on the production server and do the change. Check if normal production queries can be executed against the table during the change.
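A minimal sketch of the drop-and-recreate approach from the first answer, assuming a hypothetical dbo.Orders table with a clustered primary key named PK_Orders and one non-clustered index (all names are illustrative). The non-clustered index is dropped first so that it is built only once, at the end.

```sql
-- Hypothetical starting point.
CREATE TABLE dbo.Orders (
    OrderId      int           NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    CustomerName nvarchar(100) NOT NULL
);
CREATE NONCLUSTERED INDEX IX_Orders_CustomerName
    ON dbo.Orders (CustomerName);

-- 1. Drop the non-clustered index so it is not rebuilt twice.
DROP INDEX IX_Orders_CustomerName ON dbo.Orders;

-- 2. Drop the clustered primary key (the table becomes a heap).
ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;

-- 3. Change the data type.
ALTER TABLE dbo.Orders ALTER COLUMN OrderId bigint NOT NULL;

-- 4. Recreate the primary key, then the non-clustered index.
ALTER TABLE dbo.Orders
    ADD CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId);
CREATE NONCLUSTERED INDEX IX_Orders_CustomerName
    ON dbo.Orders (CustomerName);
```

Any foreign keys that reference the primary key would have to be dropped before step 2 and recreated afterwards; they are left out here to keep the sketch short.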

Dropping and recreating unexpected primary keys

I have a tool which uses SQL scripts to apply changes to a customer database. Often this involves changing a column definition (datatype etc). The problem is that often there are primary keys applied by the user that we don't know about (and they don't remember), which trips up the process (e.g. when changing columns belonging to the indexes or primary keys).
The requirement given to me is that this update process should be 'seamless', with no human involvement to prepare the ground. I have also researched this on this forum, and as far as I can see my particular question has not yet been asked.
I know how to disable and then later rebuild all indexes on a database, and even those only in certain tables, but if the index is on a primary key I still can't change any column that is part of the primary key unless I explicitly drop the PK by name, and later recreate it explicitly, which means I have to know about it at code-time. I can probably write a query to find the name of the primary key on a table if one is there, but how to know how to recreate it?
How can I, using Transact-SQL (or PL/SQL), detect, drop and then recreate the primary keys on given tables, without knowing at code time what they are or what columns belong to them? The key is that the tool cannot know in advance what the primary keys are on any given table, nor what they comprise. The SQL code must handle this itself.
Better still would be to detect if a known column belongs to a primary key, then drop and later recreate that after I have changed the column.
This needs to be done in both Oracle and SQL Server, ideally purely with SQL code.
TIA
I really don't understand why would a customer define his own primary keys for the tables? Moreover, I don't understand why would you let them? In my world, if customer changes schema in any way, this automatically means end of support for them.
I will strongly advise against dropping and recreating primary keys on production database. Any number of bad things can happen, leading to data loss.
And it's not just the PKs, you will have to drop the foreign key constraints first. And FKs may reference not only the PKs but the unique constraints as well, so you have to deal with those as well.
Your best bet would be to create a new table with the required schema, copy the data, drop the original table and rename the new one. Of course, you will have to handle the FKs, but it's easier. Check this link for an example:
http://sqlblog.com/blogs/john_paul_cook/archive/2009/09/17/script-to-create-all-foreign-keys.aspx
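For the detection part of the question, a sketch of a SQL Server catalog query that returns the primary key's name, whether it is clustered, and its columns in key order, which is the information needed to recreate it later. The table name dbo.Orders is a placeholder. On the Oracle side the equivalent information lives in the data dictionary views (USER_CONSTRAINTS and USER_CONS_COLUMNS), which is not shown here.

```sql
-- Lists the primary key of one table: constraint name, index type, and key columns.
SELECT kc.name              AS constraint_name,
       i.type_desc          AS index_type,        -- CLUSTERED or NONCLUSTERED
       c.name               AS column_name,
       ic.key_ordinal       AS key_position,      -- 1-based position in the key
       ic.is_descending_key AS is_descending
FROM sys.key_constraints AS kc
JOIN sys.indexes AS i
    ON  i.object_id = kc.parent_object_id
    AND i.index_id  = kc.unique_index_id
JOIN sys.index_columns AS ic
    ON  ic.object_id = i.object_id
    AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON  c.object_id = ic.object_id
    AND c.column_id = ic.column_id
WHERE kc.type = 'PK'
  AND kc.parent_object_id = OBJECT_ID(N'dbo.Orders')  -- placeholder table name
ORDER BY ic.key_ordinal;
```

Adding a filter on c.name would answer the narrower question of whether a known column belongs to the primary key. The result could then feed dynamically built ALTER TABLE ... DROP CONSTRAINT and ADD CONSTRAINT statements, with the foreign key caveats from the answer above still applying.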

SQL Server Performance Suggestion

I have been creating database tables using only a primary key of the datatype int and I have always had great performance but need to setup merge replication with updatable subscribers.
The tables use a typical primary key, data type int, and identity increment. Setting up merge replication, I have to add the rowguid to all tables with a newsequentialid() function for the default value. I noticed that the rowguid has indexable on and was wondering if I needed the primary key anymore?
Is it okay to have 2 indexes, the primary key int and the rowguid? What is the best layout for a merge replication table? Do I keep the int id for easy row referencing and just remove the index but keep the primary key? Not sure what route to take, Thanks.
Remember that if you remove the int id column and replace it with a GUID, you may need to rework a good deal of your data and your queries. And do you really want to do queries like:
select * from orders where customer_id = '2053995D-4EFE-41C0-8A04-00009890024A'
Remember if your ids are exposed to any users (often in the case of a customer because the customer table often has no natural key since names are not unique), they will find the guid daunting for doing research.
There is nothing wrong in an existing system with having both. In a new system, you could plan to not use the ints, but there is a great risk of introducing bugs if you try to remove them in a system already using them.
The only downside of replacing the integer primary key with the guid (that I know of) is that GUIDs are larger, so the btree (index space used) will be larger and if you have foreign keys to this table (which you'd also need to change) a lot more space may end up being used across (potentially) many tables.
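A small sketch of the "keep both" layout: the int identity stays as the clustered primary key and the rowguid column gets its own unique index. Table and constraint names are illustrative assumptions, and this is written by hand rather than being what the replication setup generates.

```sql
CREATE TABLE dbo.Customers (
    -- Existing surrogate key, kept for joins and easy row referencing.
    CustomerId int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED,
    Name       nvarchar(100)      NOT NULL,
    -- Column required by merge replication.
    rowguid    uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Customers_rowguid DEFAULT NEWSEQUENTIALID()
);

-- Second index: unique, non-clustered, on the GUID.
CREATE UNIQUE NONCLUSTERED INDEX UX_Customers_rowguid
    ON dbo.Customers (rowguid);
```

Existing queries and foreign keys keep using CustomerId, so nothing has to be rewritten around the GUID.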

Should each and every table have a primary key?

I'm creating a database table and I don't have a logical primary key assigned to it. Should each and every table have a primary key?
Short answer: yes.
Long answer:
You need your table to be joinable on something
If you want your table to be clustered, you need some kind of a primary key.
If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?
In MySQL, the InnoDB storage engine always creates a primary key if you didn't specify it explicitly, thus making an extra column you don't have access to.
Note that a primary key can be composite.
If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.
Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.
And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.
See this article in my blog for why you should always create a unique index on unique data:
Making an index UNIQUE
P.S. There are some very, very special cases where you don't need a primary key.
Mostly they include log tables which don't have any indexes for performance reasons.
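The many-to-many point above can be sketched as follows, with made-up Student and Course tables. The composite primary key on the link table is what rejects a second row describing the same link.

```sql
CREATE TABLE dbo.Student (
    StudentId int NOT NULL CONSTRAINT PK_Student PRIMARY KEY
);
CREATE TABLE dbo.Course (
    CourseId int NOT NULL CONSTRAINT PK_Course PRIMARY KEY
);

-- Link table: the primary key covers both columns involved in the link.
CREATE TABLE dbo.StudentCourse (
    StudentId int NOT NULL
        CONSTRAINT FK_StudentCourse_Student REFERENCES dbo.Student (StudentId),
    CourseId  int NOT NULL
        CONSTRAINT FK_StudentCourse_Course REFERENCES dbo.Course (CourseId),
    CONSTRAINT PK_StudentCourse PRIMARY KEY (StudentId, CourseId)
);

INSERT INTO dbo.Student (StudentId) VALUES (1);
INSERT INTO dbo.Course (CourseId) VALUES (10);
INSERT INTO dbo.StudentCourse (StudentId, CourseId) VALUES (1, 10);

-- Inserting the same link again fails with a primary key violation.
INSERT INTO dbo.StudentCourse (StudentId, CourseId) VALUES (1, 10);
```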
Always best to have a primary key. This way it meets first normal form and allows you to continue along the database normalization path.
As stated by others, there are some reasons not to have a primary key, but most tables will not be harmed if there is a primary key.
Disagree with the suggested answer. The short answer is: NO.
The purpose of the primary key is to uniquely identify a row on the table in order to form a relationship with another table. Traditionally, an auto-incremented integer value is used for this purpose, but there are variations to this.
There are cases though, for example logging time-series data, where the existence of such a key is simply not needed and just takes up memory. Making a row unique is simply... not required!
A small example:
Table A: LogData
Columns: DateAndTime, UserId, AttribA, AttribB, AttribC etc...
No Primary Key needed.
Table B: User
Columns: Id, FirstName, LastName etc.
Primary Key (Id) needed in order to be used as a "foreign key" to LogData table.
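The small example above might look like this in T-SQL. The column types and the index are assumptions added for illustration; only the table and column names come from the example.

```sql
-- User needs a primary key because LogData refers to it.
-- (USER is a reserved word in T-SQL, hence the brackets.)
CREATE TABLE dbo.[User] (
    Id        int          NOT NULL CONSTRAINT PK_User PRIMARY KEY,
    FirstName nvarchar(50) NOT NULL,
    LastName  nvarchar(50) NOT NULL
);

-- LogData has a foreign key to User but no primary key of its own.
CREATE TABLE dbo.LogData (
    DateAndTime datetime2(3)  NOT NULL,
    UserId      int           NOT NULL
        CONSTRAINT FK_LogData_User REFERENCES dbo.[User] (Id),
    AttribA     nvarchar(100) NULL,
    AttribB     nvarchar(100) NULL,
    AttribC     nvarchar(100) NULL
);

-- A plain, non-unique index can still support time-range queries.
CREATE INDEX IX_LogData_DateAndTime
    ON dbo.LogData (DateAndTime);
```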
Pretty much any time I've created a table without a primary key, thinking I wouldn't need one, I've ended up going back and adding one. I now create even my join tables with an auto-generated identity field that I use as the primary key.
Except for a few very rare cases (possibly a many-to-many relationship table, or a table you temporarily use for bulk-loading huge amounts of data), I would go with the saying:
If it doesn't have a primary key, it's not a table!
Marc
Just add it. You will be sorry later if you don't (when selecting, deleting, linking, etc.).
Will you ever need to join this table to other tables? Do you need a way to uniquely identify a record? If the answer is yes, you need a primary key. Assume your data is something like a customer table that has the names of the people who are customers. There may be no natural key, because you need the addresses, emails, phone numbers, etc. to determine whether this Sally Smith is different from that Sally Smith, and you will be storing that information in related tables, as a person can have multiple phones, addresses, emails, etc. Suppose Sally Smith marries John Jones and becomes Sally Jones. If you don't have an artificial key on the table, when you update the name, you just changed 7 Sally Smiths to Sally Jones even though only one of them got married and changed her name. And of course in this case, without an artificial key, how do you know which Sally Smith lives in Chicago and which one lives in LA?
You say you have no natural key, therefore you don't have any combination of fields that is unique either, which makes the artificial key critical.
I have found that anytime I don't have a natural key, an artificial key is an absolute must for maintaining data integrity. If you do have a natural key, you can use that as the key field instead. But personally, unless the natural key is one field, I still prefer an artificial key and a unique index on the natural key. You will regret it later if you don't put one in.
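The Sally Smith problem can be sketched like this, using a hypothetical Customer table with an identity column as the artificial key. The names and cities are taken from the example above; everything else is an assumption.

```sql
CREATE TABLE dbo.Customer (
    CustomerId int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Customer PRIMARY KEY,   -- artificial key
    FirstName  nvarchar(50) NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    City       nvarchar(50) NOT NULL
);

-- On a fresh table these rows get CustomerId 1 and 2.
INSERT INTO dbo.Customer (FirstName, LastName, City)
VALUES (N'Sally', N'Smith', N'Chicago');
INSERT INTO dbo.Customer (FirstName, LastName, City)
VALUES (N'Sally', N'Smith', N'Los Angeles');

-- Without a key, the only available predicate is the name,
-- which would rename every Sally Smith at once:
--   UPDATE dbo.Customer SET LastName = N'Jones'
--   WHERE FirstName = N'Sally' AND LastName = N'Smith';

-- With the artificial key, exactly one row changes (the Chicago one).
UPDATE dbo.Customer
SET LastName = N'Jones'
WHERE CustomerId = 1;
```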
It is a good practice to have a PK on every table, but it's not a MUST. Most probably you will need a unique index, and/or a clustered index (which is PK or not) depending on your need.
Check out the Primary Keys and Clustered Indexes sections on Books Online (for SQL Server)
"PRIMARY KEY constraints identify the column or set of columns that have values that uniquely identify a row in a table. No two rows in a table can have the same primary key value. You cannot enter NULL for any column in a primary key. We recommend using a small, integer column as a primary key. Each table should have a primary key. A column or combination of columns that qualify as a primary key value is referred to as a candidate key."
But then check this out also: http://www.aisintl.com/case/primary_and_foreign_key.html
To make it future proof you really should. If you want to replicate it you'll need one. If you want to join it to another table your life (and that of the poor fools who have to maintain it next year) will be so much easier.
I am maintaining an application created by an offshore development team. I am now having all kinds of issues in the application because the original database schema did not contain PRIMARY KEYS on some tables. So please don't let other people suffer because of your poor design. It is always a good idea to have primary keys on tables.
Late to the party but I wanted to add my two cents:
Should each and every table have a primary key?
If you are talking about "Relational Algebra", the answer is Yes. Modelling data this way requires the entities and tables to have a primary key. The problem with relational algebra (apart from the fact there are like 20 different, mismatching flavors of it), is that it only exists on paper. You can't build real world applications using relational algebra.
Now, if you are talking about databases from real world apps, they partially/mostly adhere to the relational algebra, by taking the best of it and by overlooking other parts of it. Also, database engines offer massive non-relational functionality nowadays (it's 2020 now). So in this case the answer is No. In any case, 99.9% of my real world tables have a primary key, but there are justifiable exceptions. Case in point: event/log tables (multiple indexes, but not a single key in sight).
Bottom line, in transactional applications that follow the entity/relationship model it makes a lot of sense to have primary keys for almost (if not) all of the tables. If you ever decide to skip the primary key of a table, make sure you have a good reason for it, and you are prepared to defend your decision.
I know that in order to use certain features of the gridview in .NET, you need a primary key in order for the gridview to know which row needs updating/deleting. General practice should be to have a primary key or primary key cluster. I personally prefer the former.
I'd like to find something official like this - 15.6.2.1 Clustered and Secondary Indexes - MySQL.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
So, why not create a primary key, or something like it, yourself? Besides, an ORM cannot see this hidden ID, meaning that you cannot use it in your code.
I always have a primary key, even if in the beginning I don't have a purpose in mind yet for it. There have been a few times when I eventually need a PK in a table that doesn't have one and it's always more trouble to put it in later. I think there is more of an upside to always including one.
If you are using Hibernate, it's not possible to create an Entity without a primary key. This can cause problems if you are working with an existing database which was created with plain SQL/DDL scripts and no primary key was added.
In short, no. However, you need to keep in mind that certain client access CRUD operations require it. For future proofing, I tend to always utilize primary keys.

Impact of changing a Unique key in SQL Server 2005

What is the impact of changing a Unique key in SQL Server 2005?
I have a table with one primary key (ID int) and a composite unique key on 4 fields.
But due to the nature of my project, one of the fields of the composite key keeps changing.
Does anyone see any problem in changing a field of the composite key that often?
There is maintenance involved, since all nonclustered keys point to either the clustered key or, if you have a heap (a table without a clustered key), to the row.
Since the clustered key holds all the data for the table (in essence it is the table), whenever you make changes to the nonclustered key the clustered key will be updated, and vice versa.
The index will need some reorganisation.
This is part of the C in ACID: When your UPDATE completes everything is done and dusted.
Also, any indexed views using the data will need to be updated too, again part of the "C".
If it's not clustered then this is about it.
I wouldn't worry about it too much unless it's happening many times a second...
I would just be sure to add some code to be watchful of unique constraint violations. You shouldn't run into a problem, but if you're changing it that often, I would say that you run a greater risk.
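One way to be watchful of unique constraint violations is a TRY...CATCH block around the update, sketched here against a made-up table. Error numbers 2627 and 2601 are the ones SQL Server raises for a unique constraint and a unique index violation respectively; the table, column names and values are assumptions. The syntax is kept to what SQL Server 2005 supports.

```sql
CREATE TABLE dbo.Item (
    Id int NOT NULL CONSTRAINT PK_Item PRIMARY KEY,
    A  int NOT NULL,
    B  int NOT NULL,
    C  int NOT NULL,
    D  int NOT NULL,
    CONSTRAINT UQ_Item_ABCD UNIQUE (A, B, C, D)   -- composite unique key
);

INSERT INTO dbo.Item (Id, A, B, C, D) VALUES (1, 1, 1, 1, 1);
INSERT INTO dbo.Item (Id, A, B, C, D) VALUES (2, 1, 1, 1, 2);

BEGIN TRY
    -- Changing D on row 2 makes it collide with row 1.
    UPDATE dbo.Item SET D = 1 WHERE Id = 2;
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627)
        PRINT 'Unique key violation, update skipped: ' + ERROR_MESSAGE();
    ELSE
    BEGIN
        -- Any other error is re-raised to the caller.
        DECLARE @msg nvarchar(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END
END CATCH;
```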
