SQL Server insert performance with and without primary key - sql-server

Summary: I have a table populated via the following:
insert into the_table (...) select ... from some_other_table
Running the above query with no primary key on the_table is ~15x faster than running it with a primary key, and I don't understand why.
The details: I think this is best explained through code examples.
I have a table:
create table the_table (
a int not null,
b smallint not null,
c tinyint not null
);
If I add a primary key, this insert query is terribly slow:
alter table the_table
add constraint PK_the_table primary key(a, b);
-- Inserting ~880,000 rows
insert into the_table (a,b,c)
select a,b,c from some_view;
Without the primary key, the same insert is about 15x faster. However, if I populate the_table without a primary key and then add the primary key constraint afterwards, adding the constraint only takes a few seconds. This really makes no sense to me.
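For clarity, the faster sequence is simply the same statements in the opposite order:
-- load the heap first, then build the key once over the finished data
insert into the_table (a,b,c)
select a,b,c from some_view;

alter table the_table
add constraint PK_the_table primary key(a, b);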
More info:
The estimated execution plan shows 0% total query time spent on the clustered index insert
SQL Server 2008 R2 Developer edition, 10.50.1600
Any ideas?

Actually, it's not as clear-cut as Ryk suggests.
It can actually be faster to add data to a table with an index than to a heap.
Read this article - as far as I am aware it's quite well regarded:
http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx
Bear in mind it's written by a SQL Server MVP and a Microsoft Regional Director.
Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting".

I think if you create a simple clustered primary key made up of a single auto-incrementing column, inserts into such a table will be faster. Most likely the composite primary key is the cause of the slow inserts: with a composite key, inserted rows may not be appended to the end of the table but instead have to be placed somewhere in the middle of the existing physical order of rows, which adds to the insert time and hence makes the INSERTs slower. So, to speed up inserts in your case, use a single auto-incrementing column as the primary key.
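A sketch of that suggestion applied to the table from the question (the surrogate column and constraint names here are invented):
-- swap the composite clustered key for an ever-increasing surrogate
alter table the_table drop constraint PK_the_table;   -- only if the composite PK is already in place

alter table the_table add id int identity(1,1) not null;

alter table the_table
add constraint PK_the_table_id primary key clustered (id);

-- keep the original business key unique, but as a nonclustered constraint
alter table the_table
add constraint UQ_the_table_a_b unique nonclustered (a, b);
The original insert into the_table (a,b,c) select ... statement stays the same; the identity value is generated automatically.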

This is a good question, but a pretty crappy question too. Before you ask why an index slows down inserts, do you know what an index is?
If not, I suggest you read up on it. A clustered index is a B-tree (balanced tree), so every insert has to... wait for it... balance the tree. Hence clustered inserts are slower than inserts into a heap. If you don't know what a heap is, then I suggest you stop using SQL Server until you understand the basics. Otherwise you are attempting to use a product with no idea what you are doing - basically driving a truck on the highway, blindfolded, thinking you are riding a bike. Unexpected results...
So when you create a clustered index after a table is populated, your 'heap' has some statistics to use, and SQL can basically optimise a few things. The process is much more complicated than that, but in some cases you will find that creating a clustered index after the fact can be a lot slower than simply inserting into the indexed table in the first place. It all has to do with key types, number of columns, types of columns, etc. This is unfortunately not a topic that fits into one answer; it is more a whole course and a few books by itself. Looking at your table above, it is a VERY simple table with ~7-byte rows. In this instance a create-index after the insert will be faster, but chuck in a few varchar(250)s etc. and the ball game changes.
If you didn't know: a clustered index (if your table has one) IS your table.
Hope this helps.

Related

Concurrent inserts into table with non-clustered GUID primary key

I have this table on Microsoft SQL Server 2012:
CREATE TABLE [dbo].[Addresses_line_format]
(
address_id UNIQUEIDENTIFIER NOT NULL
CONSTRAINT pk_addresses_line_format PRIMARY KEY NONCLUSTERED,
country_id UNIQUEIDENTIFIER NOT NULL
CONSTRAINT fk_address_single_line_country FOREIGN KEY REFERENCES Countries (country_id)
ON UPDATE NO ACTION
ON DELETE NO ACTION,
address_line NVARCHAR(255) NOT NULL,
district_line NVARCHAR(255) NOT NULL
)
With 3,362,817 records in it.
Our application consumes messages from a queue, with 10 concurrent consumers. Each consumer inserts a line into this table, using the following statement:
INSERT [dbo].[Addresses_line_format] ([address_id], [country_id], [address_line], [district_line])
VALUES (#0, #1, #2, #3)
Looking at statistics, the average elapsed time for this query is 16 seconds, which is obviously way too much.
I'm wondering if this is because of how heap tables handle inserts, as described here, or do you have any other ideas about what is causing this?
I tried changing the PK to be clustered, but without any noticeable performance improvements.
Queries against the table are always performed using the following:
SELECT country_id, address_line, district_line
FROM Addresses_line_format
WHERE address_id = #1
Well, if that GUID isn't the clustered key - what IS the clustered key on that table? It should have one - a well chosen clustered key speeds up operations - even inserts and deletes! See Kimberly Tripp's blog post The Clustered Index Debate Continues... for a great explanation and more background.
When you read Kim Tripp's blog post and all her other articles on the subject, it's clear that a good clustering key is narrow, unique, static and ever-increasing - perfect fits for an INT or BIGINT identity column.
Earlier versions of SQL Server (before 2000/2005) did in fact have insert hotspots if all the inserts were happening in a single spot; those negative impacts have since been removed and are no longer a problem, so using an INT IDENTITY column as your clustering key is a nearly optimal choice for most cases.
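A rough sketch of that advice for the table in the question (the new column and index names here are invented):
-- keep the nonclustered PK on the GUID, but cluster on an ever-increasing key
ALTER TABLE dbo.Addresses_line_format
    ADD address_seq BIGINT IDENTITY(1,1) NOT NULL;

-- building this rebuilds the heap once; the existing nonclustered PK is rebuilt to point at the new clustered key
CREATE UNIQUE CLUSTERED INDEX CIX_Addresses_line_format_address_seq
    ON dbo.Addresses_line_format (address_seq);
The lookup query by address_id keeps using the nonclustered primary key, so read performance should be essentially unchanged.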
16 seconds for one row is far too much. Judging by the order of magnitude of the problem, this is not an issue of bad index keys or too many indexes - all of that plays out in the millisecond range. Investigate the actual execution plan. You can also use SQL Profiler to trace what is being executed and how long it takes. Waiting and blocking might also be the reason it takes so long.
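If blocking is suspected, a quick check is a sketch like the following, run while the inserts are slow (the session_id > 50 filter is just a rough way to skip system sessions):
-- who is currently waiting, on what, and who is blocking them
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,     -- milliseconds
       r.command,
       r.status
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50;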

Non Clustered Index not working sql server

I have a table that doesn't have any primary key; the data is already there. I have created a non-clustered index, but when I run a query, the actual execution plan does not show the index being used. I think the non-clustered index is not working. What could be the reason? Please help me.
First of all - why isn't there a primary key?? If it doesn't have a primary key, it's not a table - just add one! That will help on so many levels....
Secondly: even if you have an index, SQL Server query optimizer will always look at your query to decide whether it makes sense to use the index (or not). If you select all columns, and a large portion of the rows, then using an index is pointless.
So things to avoid are:
SELECT * FROM dbo.YourTable is almost guaranteed not to use any indices
if you don't have a good WHERE clause in your query
if your index is on a column that doesn't really select a small percentage of data; an index on a boolean column, or a Gender column with at most three different values doesn't help at all
Without knowing a lot more about your table structure, the data contained in those tables, the number of rows, and what kind of queries you're executing, no one can really answer your question - it's just way too broad....
Update: if you want to create a clustered index on a column that is different from your primary key, do these steps (a T-SQL sketch follows below):
1) First, design your table
2) Then open up the index designer - create a new, clustered index on a column of your choice. Mind you - this is NOT the primary key!
3) After that, you can put your primary key on the ID column - it will create an index, but that index is not clustered!
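In T-SQL the same steps look roughly like this (the table and column names are placeholders):
-- cluster on a column of your choice, keep the primary key nonclustered
CREATE CLUSTERED INDEX CIX_YourTable_SomeColumn
    ON dbo.YourTable (SomeColumn);

ALTER TABLE dbo.YourTable
    ADD CONSTRAINT PK_YourTable PRIMARY KEY NONCLUSTERED (ID);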
Without having any more information I'd guess that the reason is that the table is too small for an index seek to be worth it.
If your table has less than a few thousand rows then SQL Server will almost always choose to do a table / index scan regardless of the indexes on that table simply because an index scan is in fact faster.
An index scan in itself doesn't necessarily indicate a performance problem - is the query actually slow?

Sql Server 2005 novice query

I am a complete beginner with SQL Server 2005 and I am learning it from an online tutorial; here are some of my questions:
1: What is the difference between Select * from XYZ and Select ALL * from XYZ?
2: The purpose of a clustered index is to make searching easier by physically sorting the table [as far as I know :-)]. Let's say we have a primary key column on a table - is it still good to create a clustered index on the table, given that we already have a column that is sorted?
3: Why can we create 1 clustered index + 249 non-clustered indexes = 250 indexes on a table? I understand the requirement of 1 clustered index, but why 249? Why not more than 249?
No difference - SELECT ALL is the default, as opposed to SELECT DISTINCT.
Opinion varies. For performance reasons Clustered indexes should ideally be small, stable, unique, and monotonically increasing. Primary keys should also be stable and unique so there is an obvious fit there. However clustered indexes are well suited for range queries. Looking up individual records by PK can perform well if the PK is nonclustered so some authors suggest not "wasting" the clustered index on the PK.
In SQL Server 2008 you can create up to 999 NCIs on a table. I can't imagine ever doing so, but I think the limit was raised because, with "filtered indexes", there could potentially be a viable case for that many. Indexes add a cost to data modification operations though, as the changes need to be propagated to multiple places, so I would imagine only largely read-only (e.g. reporting) databases ever reach even double figures of non-clustered, non-filtered indexes.
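For illustration, a filtered index (SQL Server 2008 and later) looks like the sketch below; the table, columns, and predicate are made up:
-- only index the small slice of rows that the relevant queries actually target
CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (CustomerID, OrderDate)
    WHERE Status = 'Open';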
For 3:
Every time you insert or delete a record in the table, ALL indexes must be updated. If you have too many indexes, that takes too long.
If your table has more than 5-6 indexes, I think you need to take the time and check for yourself.

SQL Server "Write Once" Table Clustered Index

I have a fairly unique table in a SQL Server database that doesn't follow 'typical' usage conventions and am looking for some advice regarding the clustered index.
This is a made-up example, but follows the real data pretty closely.
The table has a 3 column primary key, which are really foreign keys to other tables, and a fourth field that contains the relevant data. For this example, let's say that the table looks like this:
CREATE TABLE [dbo].[WordCountsForPage](
[AuthorID] [int] NOT NULL,
[BookID] [int] NOT NULL,
[PageNumber] [int] NOT NULL,
[WordCount] [int] NOT NULL
)
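The three-column primary key itself (not shown in the DDL above) would look roughly like this:
ALTER TABLE [dbo].[WordCountsForPage]
    ADD CONSTRAINT PK_WordCountsForPage
    PRIMARY KEY CLUSTERED (AuthorID, BookID, PageNumber);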
So, we have a somewhat hierarchical primary key, with the unique data being that fourth field.
In the real application, there are a total of 2.8 Billion possible records, but that's all. The records are created on the fly as the data is calculated over time, and realistically probably only 1/4 of those records will ever actually be calculated. They are stored in the DB since the calculation is an expensive operation, and we only want to do it once for each unique combination.
Today, the data is read thousands of times a minute, but (at least for now) there are also hundreds of inserts per minute as the table populates itself (and this will continue for quite some time). I would say that there are 10 reads for every insert (today).
I am wondering if we are taking a performance hit on all of those inserts because of the clustered index.
The clustered index makes sense "long term" since the table will eventually become read-only, but it will take some time to get there.
I suppose I could make the index non-clustered during the heavy insert period, and change it to clustered as the table becomes populated, but how do you determine when the cross-over point would be (and how can I notify myself in the future that the 'time has come')?
What I really need is a convertible index that crosses over from non-clustered to clustered at some magical time in the future.
Any suggestions for how to handle this one?
Actually, I would not bother trying to have a non-clustered index first and then converting it to a clustered one later on (that alone is a really messy affair!).
As The Queen Of Indexing, Kimberly Tripp, explains in her post The Clustered Index Debate Continues..., having a clustered index on a table can actually improve your INSERT performance!
Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing.
A heap is a table which has no clustered index defined on it.
Considering this, and the effort and trouble it takes to go from heap to a table with a clustered index - I wouldn't even bother. Just define your indices, and start using that table!

Primary Key: Slow Inserts?

Defining a column to be a primary key in a table on SQL Server - will this make inserts slower?
I ask because I understand this is the case for indexes.
The table has millions of records.
No, not necessarily! Sounds counter-intuitive, but read this quote from Kim Tripp's blog post:
Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing.
So actually, having a good clustered index (e.g. on an INT IDENTITY column, if at all possible) does speed things up - even inserts, updates and deletes!
Primary keys are automatically indexed, clustered if possible and failing that non-clustered.
So in that sense inserts are slightly affected, but of course having no primary key would usually be much much worse, assuming the table needs a primary key.
First measure, identify a problem and then try to optimize. Optimizing away primary keys is a very bad idea in general.
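To illustrate the default behaviour (the table names here are made up): the first form below gets a clustered index on the key automatically, while the second keeps the primary key nonclustered.
-- PRIMARY KEY becomes a clustered index by default when the table has no clustered index yet
CREATE TABLE dbo.Demo_Clustered (id INT NOT NULL PRIMARY KEY);

-- explicitly nonclustered primary key, leaving the clustered index free for another column
CREATE TABLE dbo.Demo_Nonclustered (id INT NOT NULL PRIMARY KEY NONCLUSTERED);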
Not enough to create a perceptible performance hit, and the benefits far outweigh the very minor performance cost. There are very, very few scenarios where you should not put a primary key on a table.
The really quick answer is:
Yes.
Primary keys are always indexed (and SQL will attempt a clustered index). Indexes make inserts slower, clustered indexes even more so.
Depending on what your table is used for, you may have a couple of options.
If you do a lot of bulk inserts followed by reads, you can remove the primary key, insert into a heap (if you have SQL 2008 this can be minimally logged to run even faster), then re-add the key and wait for the index to build.
As an addendum to that, you can also insert using an ORDER BY clause which will keep the inserted rows in correct order for the clustered index. This will really only help if you are inserting millions of rows at once from a source that is already ordered, though.
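A sketch of that combination (the table, column, and staging-source names are placeholders; on SQL Server 2008, TABLOCK is one of the conditions for the insert to be minimally logged):
-- presorted bulk load: rows arrive in clustered-key order, table lock taken for minimal logging
INSERT INTO dbo.TargetTable WITH (TABLOCK) (KeyCol, Payload)
SELECT KeyCol, Payload
FROM dbo.StagingTable
ORDER BY KeyCol;   -- matches the clustered index key of TargetTable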
Yes, adding a primary key to a table will slow inserts (which is OK, because not adding a primary key to your table will speed up your application's eventual catastrophic failure).
If what you're doing is creating a new table and then inserting millions of records into it, there is nothing wrong with initially creating the table without a primary key, inserting all the records, and then creating the primary key. Or use an alternative tool to perform a bulk insert.
Yes, inserts are slowed, especially with several clients doing inserts simultaneously, and more so if your key is sequentially increasing (all inserts occur at the right-most nodes of the index tree, in most database implementations, or at the last page of the table for e.g. clustered SQL Server indices -- both of which scenarios cause resource contention).
That said, SELECTs using the primary key are sped up quite a bit, and the integrity of your key is guaranteed. Do the right thing first (define primary keys everywhere). Second, measure to see whether you cannot meet your performance targets, and whether that is caused by your data integrity constraints. Only then consider workarounds.
No, not necessarily
Regardless, that is not why you would define a primary key on a table.
You define a primary key when it is REQUIRED by the domain model.
