I have two tables T_A and T_B.
Both are empty.
Both have a clustered index on them.
Recovery model is set to SIMPLE.
The INSERT...SELECT meets the requirements for minimal logging. See
http://msdn.microsoft.com/en-us/library/ms191244.aspx
I need to import data into them from staging tables.
Both staging tables contain a large amount of data.
If I perform the following T-SQL blocks individually, each takes 2 to 3 minutes to finish. The total time is about 5 to 6 minutes.
BEGIN TRAN
INSERT INTO T_A WITH(TABLOCK) SELECT * FROM SRC_A WITH(NOLOCK);
COMMIT TRAN
BEGIN TRAN
INSERT INTO T_B WITH(TABLOCK) SELECT * FROM SRC_B WITH(NOLOCK);
COMMIT TRAN
To make it faster, I open two sessions in SSMS and execute the two blocks in parallel. To my surprise, each session takes about 10 to 12 minutes to finish, so together the total time is more than doubled. The wait type shown is PAGEIOLATCH_SH, which points to a disk I/O bottleneck. What I don't understand is that even if the two sessions have to wait on each other for I/O, they should not wait that long. Can anyone help explain this?
My story does not end there. I then removed the clustered index on both tables and ran the two blocks in parallel, each in a different session. This time each took about 1 minute to finish, and the total time was about 1 minute since they ran in parallel. Great! But the nightmare came when I tried to create the clustered indexes again.
If I create the clustered indexes individually, each takes about 4 minutes to finish, so the total is about 8 minutes. This defeats my purpose of improving performance.
Then I tried creating the clustered indexes on the two tables in parallel, each in a different session. This time it was the worst: one took 12 minutes to finish and the other took 25 minutes.
From my test results, my best choice is to go back to square one: execute the two transactions sequentially with the clustered indexes in place.
Has anyone experienced a similar situation, and what is the best practice to make this faster?
When you create the clustered index after inserting the records, SQL Server has to rebuild the table in the background anyway, so it is faster to insert the records directly into a table that already has the clustered index in place.
Also, disable any nonclustered indexes while inserting and rebuild them afterwards; building an index on a filled table is faster than updating it for each insert. Remember to set the MAXDOP option to 0 when creating the indexes.
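A minimal sketch of that disable/load/rebuild pattern, using the tables from the question and a hypothetical nonclustered index name (only disable nonclustered indexes; disabling the clustered index makes the table unreadable):

-- IX_T_A_Example is a placeholder nonclustered index name.
ALTER INDEX IX_T_A_Example ON dbo.T_A DISABLE;

-- Load with TABLOCK so the insert can be minimally logged under SIMPLE recovery.
INSERT INTO dbo.T_A WITH (TABLOCK)
SELECT * FROM dbo.SRC_A WITH (NOLOCK);

-- Rebuilding re-enables the index; MAXDOP = 0 lets SQL Server use all available CPUs.
ALTER INDEX IX_T_A_Example ON dbo.T_A REBUILD WITH (MAXDOP = 0);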
Bulk inserts are also a lot faster than an INSERT INTO statement.
I use the 'SQL Server Import and Export Wizard' for copying large amounts of data and it seems to be way faster (the wizard uses bulk statements). If necessary, you can try to find the statement the wizard uses and run it yourself.
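For reference, a hedged BULK INSERT sketch; it only applies if the staging data is exported to a flat file first (bcp or the wizard can do that), and the file path, terminators, and batch size below are made-up examples:

-- Placeholder file path and options; TABLOCK again allows minimal logging.
BULK INSERT dbo.T_A
FROM 'C:\exports\src_a.dat'
WITH (TABLOCK, BATCHSIZE = 100000, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');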
Related
I have a long-running stored procedure with a lot of statements. After analyzing it, I identified a few statements that take most of the time. Those statements are all UPDATE statements.
Looking at the execution plan, the query scans the source table in parallel in a few seconds and then passes the rows to a Gather Streams operator, which then passes them on to the update.
This is somewhat similar to the post below, and we see the same behavior with index creation statements too, which are also slow.
https://brentozar.com/archive/2019/01/why-do-some-indexes-create-faster-than-others/
The table has 60 million records and is a heap, as we do a lot of data loads, updates, and deletes.
Reading the source is not a problem, as it completes in a few seconds, but the actual update, which happens serially, takes most of the time.
A few suggestions to try:
If you have indexes on the target table, dropping them before the insert and recreating them after should improve insert performance.
Add a WITH (TABLOCK) hint to the table you are inserting into (INSERT INTO [Table] WITH (TABLOCK) ...); this lets SQL Server lock the table exclusively and allows the insert itself to run in parallel (see the sketch after this list).
Alternatively, if that doesn't yield an improvement, try adding a MAXDOP 1 hint to the query.
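A sketch of those two suggestions, with made-up table and column names:

-- TABLOCK on the target takes an exclusive table lock, which also allows
-- the insert itself to run in parallel.
INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT col1, col2
FROM dbo.SourceTable;

-- Fallback if parallelism itself is what hurts: force a serial plan instead.
INSERT INTO dbo.TargetTable WITH (TABLOCK)
SELECT col1, col2
FROM dbo.SourceTable
OPTION (MAXDOP 1);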
How often do you UPDATE the rows in this heap?
Because, unlike clustered indexes, heaps use a RID to find specific rows. The catch is that (unless you specifically rebuild the heap) when an update moves a row, the old location remains and now points to the new location instead, increasing the number of lookups needed each time you touch that row.
I don't really think that is what is being affected here, but could you possibly see what happens to the update times if you add a clustered index to the table?
Also, I assume you don't have some heavy trigger on the table doing a bunch of extra work as well, right?
Additionally, since you are referring to an article by Brent Ozar: he advocates breaking updates into batches of no more than 4,000 rows at a time, as that has proven to be the fastest and stays safely below the roughly 5,000-lock threshold at which an update can escalate to an exclusive table lock.
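A sketch of that batching pattern, with placeholder table and column names and a placeholder "still needs updating" filter:

WHILE 1 = 1
BEGIN
    -- Update at most 4,000 rows per iteration to stay under the lock escalation threshold.
    UPDATE TOP (4000) t
    SET    t.SomeColumn = s.SomeColumn
    FROM   dbo.TargetHeap AS t
    INNER JOIN dbo.SourceTable AS s
            ON s.ID = t.ID
    WHERE  t.SomeColumn <> s.SomeColumn;   -- placeholder condition marking rows not yet updated

    IF @@ROWCOUNT = 0
        BREAK;
END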
I ran the Performance – Top Queries by Total IO report (I am trying to improve this process).
The top query is this code:
DECLARE @LeadsVS3 AS TT_LEADSMERGE
DECLARE @LastUpdateDate DATETIME
SELECT @LastUpdateDate = MAX(updatedate)
FROM [BUDatamartsource].[dbo].[salesforce_lead]
INSERT INTO @LeadsVS3
SELECT
    Lead_id,
    (more columns…)
    OrderID__c,
    City__c
FROM
    [ReplicatedVS3].[dbo].[Lead]
WHERE
    UpdateDate > @LastUpdateDate
(the code is a piece of a larger SP)
This is in a job that runs every 15 minutes... Other than running the job less frequently is there any other improvement I could make?
Try a local temp ('hash') table like #LeadsVS3 instead; it is faster than a user-defined table type variable in most cases.
There is also another trick you can use.
In cases where you always pull all 'recent' rows, you can end up blocked on a single row, the latest one, waiting for it to commit. You can sacrifice a small window, e.g. one minute, and simply ignore records from the last minute (current datetime - 1 minute). You pick those rows up in the next run and save yourself any transaction (or replication) lock waits.
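A sketch combining both ideas (the temp table and the one-minute lag); the column data types are guesses and the column list is abbreviated the same way as in the question:

CREATE TABLE #LeadsVS3
(
    Lead_id    INT,
    -- ...more columns, matching TT_LEADSMERGE...
    OrderID__c INT,
    City__c    NVARCHAR(100)
);

DECLARE @LastUpdateDate DATETIME;

SELECT @LastUpdateDate = MAX(updatedate)
FROM [BUDatamartsource].[dbo].[salesforce_lead];

INSERT INTO #LeadsVS3 (Lead_id, OrderID__c, City__c)
SELECT Lead_id, OrderID__c, City__c
FROM [ReplicatedVS3].[dbo].[Lead]
WHERE UpdateDate > @LastUpdateDate
  AND UpdateDate <= DATEADD(MINUTE, -1, GETDATE());   -- skip the newest minute; it is picked up next run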
The execution plan that you posted appears to be the estimated execution plan (the actual execution plan includes the actual number of rows). Without the actual plan it's impossible to tell what's really going on.
The obvious improvement would be to add a covering nonclustered index on Lead.leadid that includes the other columns in your SELECT statement. Right now you're scanning the widest possible index (your clustered index) to retrieve a presumably small percentage of records. Turning that clustered index scan into a nonclustered seek will be huge.
On that same note, you could make that index a filtered index that only includes records for dates greater than your last UpdateDate. Then set up a regular SQL Agent job that periodically rebuilds it to filter on a more current date.
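A hedged sketch of such a filtered covering index; the index name, the INCLUDE list, and the filter date are placeholders (a filtered index needs a literal predicate, which is exactly why the periodic rebuild job is suggested). It is keyed on UpdateDate here, since that is the column the query filters on:

CREATE NONCLUSTERED INDEX IX_Lead_UpdateDate_Recent
ON [ReplicatedVS3].[dbo].[Lead] (UpdateDate)
INCLUDE (Lead_id, OrderID__c, City__c)   -- plus the other selected columns
WHERE UpdateDate > '20240101';           -- literal placeholder; the rebuild job advances this date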
Other things you can do to increase insert performance:
Drop any constraints and/or indexes before the insert, then rebuild them after.
Use smaller data types.
We have a SQL Server 2008 R2 table that tracks incremented unique key values, like transaction numbers, etc. It's like a bad version of sequence objects. There are about 300 of these unique keys in the table.
My problem is that the table grows by several hundred thousand rows every day because it keeps the previously used numbers.
My issue is that we have to clean out the table once a week or performance suffers. I want to use a TRUNCATE and then kick off the SP that generates the next incremented value for each of the 300 keys. This works with a run time of about 5 minutes, but during this time the system is trying to use the table and throws errors because there is no data.
Is there any way to lock the table, preventing user access, truncate and then lift the lock?
TRUNCATE will automatically lock the whole table. A DELETE statement will use row locking, which will not interfere with your user queries. You might want to think about purging old records during off hours.
This will require cooperation from the readers. If you want to avoid using a highly blocking isolation level like serializable, you can use sp_getapplock and sp_releaseapplock to protect the table during the regeneration process. https://msdn.microsoft.com/en-us/library/ms189823.aspx
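A sketch of the writer side, with hypothetical table and procedure names; for this to work, the readers cooperate by taking the same application lock in Shared mode before querying the key table:

BEGIN TRAN;

EXEC sp_getapplock
     @Resource    = 'KeyTableRegen',
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Transaction',
     @LockTimeout = 60000;        -- wait up to 60 seconds for readers to finish

TRUNCATE TABLE dbo.KeyTable;      -- hypothetical key table
EXEC dbo.usp_RegenerateKeys;      -- hypothetical SP that reseeds the ~300 keys

COMMIT TRAN;                      -- releases the app lock because the owner is the transaction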
An alternative might be to build your new set in another table and then use sp_rename to swap them out.
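A sketch of the rename swap, with all names (and the parameterized regeneration procedure) invented for illustration; note that sp_rename does not move permissions or references to the old name, so those need checking:

-- Build the new key set while readers keep using the current table.
EXEC dbo.usp_RegenerateKeys @TargetTable = 'dbo.KeyTable_New';   -- hypothetical SP and parameter

-- Swap the tables; each rename is a quick metadata change.
BEGIN TRAN;
EXEC sp_rename 'dbo.KeyTable', 'KeyTable_Old';
EXEC sp_rename 'dbo.KeyTable_New', 'KeyTable';
COMMIT TRAN;

DROP TABLE dbo.KeyTable_Old;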
We have a lot of operations that time out in our site log.
After installing Redgate SQL Monitor on the server, we figured out that we have many blocked processes and sometimes deadlocks.
With Redgate we realized the problem is a stored procedure. It's a simple stored procedure that just increases the view count of a product (a simple update):
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
    @PID int
AS
BEGIN
    UPDATE dbo.stProduct
    SET ViewCount = ViewCount + 1
    WHERE ID = @PID
END
When that stored procedure is turned off, everything is fine, but sometimes the blocked process error still comes back.
This table (product) doesn't have any triggers, but it is like the heart of the system and has 12,000 records.
It has 3 indexes (1 clustered and 2 nonclustered) and many statistics.
We don't use any explicit transactions. The blocked processes mostly happen on the update query.
How can I figure out where the problem is?
Sorry for my bad English.
Thanks
Edit:
I think the problem isn't the SP; it's the updates on the product table (that's my opinion). It's a large table. I still get blocked processes when the SP is off, but fewer.
We have a lot of selects and updates on this table.
I also rewrote the increase-view-count logic with LINQ to SQL, but I still get blocked processes just like when the SP is on.
Edit 2:
I set up Profiler and captured all the queries on the product table.
There are 530 selects (most with joins to another 2 tables) and 25 updates per minute (on the product table only).
For now, [SP_IncreaseProductView] is off, because when it is on we get blocked processes and operation-timed-out errors about every 10 seconds and the web site stops.
After that (setting the SP to off), blocked processes still occur, but only roughly 50 per day.
I would go with Ingaz's second solution, but further optimizations or simplifications can be made:
1) Store the view count in a separate 1:1 product table. This is particularly useful when some queries do not need the view count.
2) Make the view count redundant:
- keep the view count in the product table and read it from there
- also define the view count in another table (just ProductID and ViewCount columns)
- let the application update the second table directly and asynchronously
- have a job update the product table based on the data in the second table
This ensures that locking affects the product table much less than independent updates would.
This is expected: I suppose you have a lot of updates against a single small (12K rows) table.
You can:
Work around your problem
Put a ROWLOCK hint in your UPDATE
Change the database option to READ_COMMITTED_SNAPSHOT
Be warned though: both can create other problems. A short sketch of the two options follows.
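-- Option 1: ask for row-level locks in the hot update (@PID as in SP_IncreaseProductView).
UPDATE dbo.stProduct WITH (ROWLOCK)
SET ViewCount = ViewCount + 1
WHERE ID = @PID;

-- Option 2: let readers use row versions instead of blocking behind writers.
-- The database name is a placeholder; watch tempdb usage afterwards.
ALTER DATABASE [YourDatabase]
SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;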
A more complex recipe to eliminate blocking completely
Not for the faint of heart.
Create a table dbo.stProduct_Increment.
Modify your [SP_IncreaseProductView] to INSERT into the increment table.
Create a periodic task that UPDATEs your dbo.stProduct and clears the increment table.
Create a view that combines stProduct and stProduct_Increment.
Point all SELECT statements that currently read stProduct at the new view. A sketch of these steps is below.
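A sketch of those steps; everything except stProduct and SP_IncreaseProductView (which come from the question) is a made-up name, and error handling is left out:

CREATE TABLE dbo.stProduct_Increment
(
    ProductID INT NOT NULL,
    Hits      INT NOT NULL DEFAULT (1)
);
GO

-- The procedure now only appends, so it no longer contends with readers or writers of stProduct.
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
    @PID INT
AS
BEGIN
    INSERT INTO dbo.stProduct_Increment (ProductID) VALUES (@PID);
END
GO

-- View that adds the not-yet-merged increments to the stored count;
-- point the SELECTs that need ViewCount at this view instead of the table.
CREATE VIEW dbo.vw_stProduct
AS
SELECT p.ID,
       -- ...other stProduct columns as needed...
       p.ViewCount + ISNULL(i.Hits, 0) AS ViewCount
FROM dbo.stProduct AS p
LEFT JOIN (SELECT ProductID, SUM(Hits) AS Hits
           FROM dbo.stProduct_Increment
           GROUP BY ProductID) AS i
       ON i.ProductID = p.ID;
GO

The periodic job then folds SUM(Hits) per product into stProduct inside a short transaction and deletes the rows it just merged.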
I have a SQL Server database where I am deleting rows from three tables A, B, and C in batches, with some conditions, through a SQL script scheduled in a SQL job. The job runs for 2 hours because the tables hold a large amount of data. While the job is running, my front-end application is not accessible (it gives timeout errors), since the application inserts and updates data in these same tables A, B, and C.
Is it possible for the front-end application to run in parallel without any issues while the SQL script is running? I have checked the locks on the tables and SQL Server is acquiring page locks. Can READ COMMITTED SNAPSHOT or SNAPSHOT isolation levels, or converting page locks to row locks, help here? I need advice.
Split the operation into two phases. In the first phase, collect the primary keys of the rows to delete:
create table #TempList (ID int);
insert #TempList
select ID
from YourTable
In the second phase, use a loop to delete those rows in small batches:
while 1 = 1
begin
    delete top (1000)
    from YourTable
    where ID in (select ID from #TempList);

    if @@rowcount = 0
        break;
end
The smaller batches will allow your front end applications to continue in between them.
I suspect that SQL Server at some point escalates to a table lock, and that makes the table inaccessible both for reading and updating.
To optimize locking and concurrency when dealing with large deletes, use batches. Start with 5,000 rows at a time (to prevent lock escalation) and monitor how it behaves and whether it needs further tuning up or down. 5,000 is a "magic number", but it is low enough that the lock manager doesn't consider escalating to a table lock, and large enough to still perform well.
Whether timeouts happen or not depends on other factors as well, but this will surely reduce them, if not eliminate them altogether. If the timeouts happen on read operations, you should be able to get rid of them. Another approach, of course, is to increase the command timeout value on the client.
Snapshot (optimistic) isolation is an option as well, READ COMMITTED SNAPSHOT more precisely, but it won't help with updates from other sessions. Also, beware of version store growth (in tempdb). It is best combined with the proposed batch approach to keep the transactions small.
Also, switch to bulk-logged recovery for the duration of the delete if the database is normally in full recovery. Switch back as soon as it finishes, and take a backup.
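The mechanics of that switch, as a sketch with a placeholder database name and backup path:

ALTER DATABASE [YourDatabase] SET RECOVERY BULK_LOGGED;

-- ...run the batched delete here...

ALTER DATABASE [YourDatabase] SET RECOVERY FULL;
BACKUP LOG [YourDatabase] TO DISK = N'D:\Backups\YourDatabase_log.trn';   -- placeholder path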
Almost forgot: if it's the Enterprise edition of SQL Server, partition your table; then you can just switch the partition out. It is almost instantaneous and the clients will never notice it.
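A heavily simplified sketch of that partition switch, assuming the table is already partitioned so the rows to remove sit in their own partition, and that a staging table with an identical structure exists on the same filegroup; all names and the partition number are placeholders:

-- Metadata-only: the whole partition moves out of the main table almost instantly.
ALTER TABLE dbo.YourTable
SWITCH PARTITION 2 TO dbo.YourTable_Staging;

-- Discard the old rows without ever blocking readers of YourTable.
TRUNCATE TABLE dbo.YourTable_Staging;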