I am performing a simple insert of a few hundred rows, e.g.:
INSERT INTO Foo
SELECT * FROM Bar
The table has a handful of secondary indexes. With these indexes disabled the query runs nearly instantly. With the secondary indexes enabled, the query takes seconds to run, with a relatively high subtree cost.
The issue is that for every secondary index, the database performs:
Physical Operation: Table spool
Logical Operation: Lazy spool
where it caches:
all columns in the destiation table (when it only needs the values it needs)
multiple times values (rather than just once)
While it may be interesting to know why SQL Server (2008 R2 SP2) thinks it needs to do this, what i really need is a way to make inserting 100 rows in a live server not take six seconds.
The really, really, horrible part is that every for every table spool, SQL Server caches the value of every column, every time:
Which is just burning logical IO.
Without these problematic index updates, the complete import of 60,000 rows happens in a second or two
With these indexes, the complete import takes literally dozens of minute
Steps to reproduce
Of course, my real AuditLog table contains 4M rows. But we can reproduce the exact same operators, with a high subtree cost, using an empty AuditLog table:
CREATE TABLE [dbo].[AuditLog](
[AuditLogID] [int] IDENTITY(216,1) NOT NULL,
[ChangeDate] [datetime] NOT NULL CONSTRAINT [DF_AuditLog_ChangeDate] DEFAULT (getdate()),
[RowGUID] [uniqueidentifier] NOT NULL,
[ChangeType] [varchar](50) NOT NULL,
[TableName] [varchar](128) NOT NULL,
[FieldName] [varchar](128) NOT NULL,
[OldValue] [varchar](max) NULL,
[NewValue] [varchar](max) NULL,
[SystemUser] [varchar](128) NULL CONSTRAINT [DF_AuditLog_SystemUser] DEFAULT (suser_sname()),
[Username] [varchar](128) NOT NULL CONSTRAINT [DF_AuditLog_Username] DEFAULT (user_name()),
[Hostname] [varchar](50) NOT NULL CONSTRAINT [DF_AuditLog_Hostname] DEFAULT (host_name()),
[AppName] [varchar](128) NULL CONSTRAINT [DF_AuditLog_AppName] DEFAULT (app_name()),
[UserGUID] [uniqueidentifier] NULL,
[TagGUID] [uniqueidentifier] NULL,
[Tag] [varchar](max) NULL,
[timestamp] [timestamp] NOT NULL,
CONSTRAINT [PK_AuditLog] PRIMARY KEY CLUSTERED ([AuditLogID] ASC)
)
And we have the painful indexes:
SET ANSI_PADDING OFF
GO
/****** Object: Index [IX_AuditLog_ChangeDate] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_ChangeDate] ON [dbo].[AuditLog]
(
[ChangeDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [IX_AuditLog_FieldName] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_FieldName] ON [dbo].[AuditLog]
(
[FieldName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [IX_AuditLog_LastRowActionByTable] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_LastRowActionByTable] ON [dbo].[AuditLog]
(
[TableName] ASC,
[ChangeType] ASC,
[RowGUID] ASC,
[UserGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [IX_AuditLog_RowGUID] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_RowGUID] ON [dbo].[AuditLog]
(
[RowGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [IX_AuditLog_RowInsertedByUserGUID] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_RowInsertedByUserGUID] ON [dbo].[AuditLog]
(
[ChangeType] ASC,
[RowGUID] ASC,
[UserGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [IX_AuditLog_RowLastModifiedByUserGUID] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_RowLastModifiedByUserGUID] ON [dbo].[AuditLog]
(
[RowGUID] ASC,
[ChangeDate] ASC,
[UserGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
SET ANSI_PADDING ON
GO
/****** Object: Index [IX_AuditLog_TableName] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_TableName] ON [dbo].[AuditLog]
(
[TableName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [IX_AuditLog_TagGUID] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_TagGUID] ON [dbo].[AuditLog]
(
[TagGUID] ASC,
[RowGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
GO
/****** Object: Index [IX_AuditLog_UserGUID] Script Date: 11/17/2016 2:58:43 PM ******/
CREATE NONCLUSTERED INDEX [IX_AuditLog_UserGUID] ON [dbo].[AuditLog]
(
[ChangeDate] ASC,
[UserGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
And then we create our insert:
INSERT INTO AuditLog(
RowGUID,
ChangeType,
UserGUID,
TableName,
FieldName,
TagGUID,
Tag)
SELECT
'E5E31EDD-7D39-47FD-BCFF-4B7044AC433D',
'INSERTED',
'4A2FDACD-0209-403B-ADBC-1B8A68E90350', --UserGUID
'Customers', --TableName
'', --FieldName
'7A74267D-64F9-44D7-A1D7-1490A66136BF', --TagGUID
'Contoso'
FROM (
--A dummy derived table that lets us select the above row 100 times
SELECT TOP 400 (a.Number * 256) + b.Number AS Number
FROM (
SELECT number FROM master..spt_values WHERE type = 'P' AND number <= 255) a (Number),
(SELECT number FROM master..spt_values WHERE type = 'P' AND number <= 255) b (Number)
) dt
Wait Times You Ask?
| Wait Type | Wait Time (s) | Wait Count |
|----------------|---------------|------------|
| IO_COMPLETION | 4.55 s | 211 |
| WRITELOG | 0.79 s | 37 |
| PAGEIOLATCH_UP | 0.36 s | 1 |
| PAGELATCH_UP | 0.09 s | 2 |
| PAGEIOLATCH_EX | 0.07 s | 4 |
4.55s of a 6s execution in IO_COMPLETION:
Occurs while waiting for I/O operations to complete. This wait type generally represents non-data page I/Os. Data page I/O completion waits appear as PAGEIOLATCH_* waits.
Non-redundant indexes you say?
| Index Name | Columns | Index Entry Size |
|---------------------------------------|------------------------------------------|--------------------------|
| IX_AuditLog_ChangeDate | ChangeDate | 12 bytes per entry |
| IX_AuditLog_UserGUID | ChangeDate, UserGUID | 28 bytes per entry |
| IX_AuditLog_FieldName | FieldName | 4 bytes per entry (avg) |
| IX_AuditLog_TableName | TableName | 13 bytes per entry (avg) |
| IX_AuditLog_LastRowActionByTable | TableName, ChangeType, RowGUID, UserGUID | 52 bytes per entry (avg) |
| IX_AuditLog_RowGUID | RowGUID | 20 bytes per entry |
| IX_AuditLog_RowLastModifiedByUserGUID | RowGUID, ChangeDate, UserGUID | 44 bytes per entry |
| IX_AuditLog_RowInsertedByUserGUID | ChangeType, RowGUID, UserGUID | 43 bytes per entry (avg) |
| IX_AuditLog_TagGUID | TagGUID, RowGUID | 36 bytes per entry |
No Sort Warnings
SQL Server Profiler results for the batch
Duration: 7,401 ms
Reads: 233,597
Writes: 17,077
CPU: 1,141 ms
No sort warnings. Nor is there any Attention, Bitmap Warning, Execution Warning, Hash Warning, Missing Column Statistics, Missing Join Predicate, Sort Warning, User Error Message.
Indexes were all rebuilt. All statistics were updated.
You have problem with overlaping and redundant indexes.
This qry will help: T-SQL for finding Redundant Indexes
Related
I have database MS SQL Server 2008 R2 with loading 2000 transactions per second. Tables are:
'lion_Tasks'(uid_obj, order_new, __usn_field_order_new, and more 40 fields)
and
'lion_Tasks_Changes_Parts'(uid_task_cp, uid_user_cp, __usn_entity_cp and more 20 fields )
Web-server get 10.000 objects, in which only one field has changed 'order_new'. Only 3 fields need to be updated in the database. These objects I pass in a table parameter to the stored procedure:
I tried to rewrite this stored procedure several times without any performance changes, it still trips out 10-20 seconds, current version is:
CREATE PROCEDURE dbo.lion_UpdateTasksNewOrder
( #Table TasksNewOrderTableType READONLY)
WITH RECOMPILE AS
BEGIN
DECLARE #ErrorCode int
SET #ErrorCode = -1
UPDATE TasksTable
SET
order_new = t.ORDER_NEW,
__usn_field_order_new = t.USN_ORDER_NEW
FROM dbo.lion_Tasks TasksTable
INNER JOIN #Table t
ON t.UUID_TASK = TasksTable.uid_obj
IF( ##ERROR != 0 )
BEGIN
SET #ErrorCode = -1
GOTO Cleanup
END
UPDATE TasksTableCP
SET
__usn_entity_cp = tcp.USN_ENTITY
FROM dbo.lion_Tasks_Changes_Parts TasksTableCP
INNER JOIN #Table tcp
ON tcp.UUID_TASK = TasksTableCP.uid_task_cp AND tcp.UUID_USER = TasksTableCP.uid_user_cp
IF( ##ERROR != 0 )
BEGIN
SET #ErrorCode = -1
GOTO Cleanup
END
RETURN 0
Cleanup:
RETURN #ErrorCode
END
The temp-database is located on the SSD disk. Maybe someone tell me what to do for acceleration?
The estimated execution plan is:
https://www.brentozar.com/pastetheplan/?id=Byq2JzTwm
Table script:
USE [lion_data]
GO
/****** Object: Table [dbo].[lion_Tasks_Changes_Parts] Script Date: 09/05/2018 11:31:53 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[lion_Tasks_Changes_Parts](
[uid_obj_cp] [uniqueidentifier] NOT NULL,
[uid_task_cp] [uniqueidentifier] NOT NULL,
[uid_user_cp] [uniqueidentifier] NOT NULL,
[_order_cp] [int] NOT NULL,
[uid_marker_cp] [uniqueidentifier] NOT NULL,
[date_begin_cp] [datetime] NULL,
[date_end_cp] [datetime] NULL,
[readed_cp] [int] NOT NULL,
[collapsed_cp] [int] NOT NULL,
[__usn_entity_cp] [int] NOT NULL,
[__usn_field_order_cp] [int] NOT NULL,
[__usn_field_uid_marker_cp] [int] NOT NULL,
[__usn_field_date_begin_cp] [int] NOT NULL,
[__usn_field_date_end_cp] [int] NOT NULL,
[__usn_field_readed_cp] [int] NOT NULL,
[__usn_field_collapsed_cp] [int] NOT NULL,
[__usn_field_list_tags_cp] [int] NULL,
[Contacts] [nvarchar](max) NULL,
[__usn_field_contacts_cp] [int] NULL,
[uid_user_marker] [uniqueidentifier] NULL,
[__usn_field_uid_user_marker] [int] NULL,
[focus_cp] [int] NULL,
[__usn_field_focus_cp] [int] NULL,
CONSTRAINT [PK_lion_Tasks_Changes_Parts] PRIMARY KEY CLUSTERED
(
[uid_obj_cp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Scripts for indexes:
USE [lion_data]
GO
/****** Object: Index [PK_lion_Tasks] Script Date: 09/05/2018 11:28:44 ******/
ALTER TABLE [dbo].[lion_Tasks] ADD CONSTRAINT [PK_lion_Tasks] PRIMARY KEY CLUSTERED
(
[uid_obj] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [lion_data]
GO
/****** Object: Index [IX_lion_Tasks_CP_Uid_Task] Script Date: 09/05/2018 11:29:17 ******/
CREATE NONCLUSTERED INDEX [IX_lion_Tasks_CP_Uid_Task] ON [dbo].[lion_Tasks_Changes_Parts]
(
[uid_task_cp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [lion_data]
GO
/****** Object: Index [IX_lion_Tasks_CP_Uid_User] Script Date: 09/05/2018 11:29:30 ******/
CREATE NONCLUSTERED INDEX [IX_lion_Tasks_CP_Uid_User] ON [dbo].[lion_Tasks_Changes_Parts]
(
[uid_user_cp] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
I have two tables, Bins and Locations, the first with 3 million records, the second with 30 million. I am trying to match Bin for Location with the following code
Create Table #tbl_locations (LocationID int not null, Lat float not null, Lon float not null)
Create Table #tbl_bins (BinID int not null, MinLat float not null, MaxLat float not null, MinLon float not null, MaxLon float not null)
CREATE NONCLUSTERED INDEX [IX_tbl_locations] ON [#tbl_locations] ([Lat] ASC, [Lon] ASC)
ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_tbl_bins1] ON [#tbl_bins] ([MinLat] ASC, [MaxLat] ASC)
ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_tbl_bins2_ABC] ON [#tbl_bins] ([MinLon] ASC, [MaxLon] ASC)
ON [PRIMARY]
Select L.LocationID, C.BinID
From #tbl_bins C
Inner Join #tbl_locations L
ON (L.Lat Between C.MinLat And C.MaxLat)
AND (L.Lon Between C.MinLon And C.MaxLon)
Unfortunately the performance is extremely bad, I have already tried to index the different fields but that did not help enough. Everything still takes more than 10 minutes to run.
Any idea of how can I make this perform better? Maybe a better matching algorithm? SQL Server 2012 SP3.
BinID and LocationID already have a PRIMARY KEY CLUSTERED created on them.
When I check the Execution Plan I see that the JOIN is performed by a NESTED LOOP.
The output of STATISTICS IO is
Table '#tbl_locations_________________________________________________________________________________________________________000000000283'.
Scan count 2631070, logical reads 25575057, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#tbl_bins________________________________________________________________________________________________________000000000284'.
Scan count 17, logical reads 14741, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Thanks a lot!
Edit - if instead of Temp Tables I use real ones, this would be the schema exported
/****** Object: Table [dbo].[tbl_Bins] Script Date: 5/30/2017 3:49:05 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_Bins](
[BinID] [int] NOT NULL,
[MinLat] [float] NOT NULL,
[MaxLat] [float] NOT NULL,
[MinLon] [float] NOT NULL,
[MaxLon] [float] NOT NULL,
CONSTRAINT [PK_tbl_Bins] PRIMARY KEY CLUSTERED
(
[BinID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[tbl_Locations] Script Date: 5/30/2017 3:49:05 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[tbl_Locations](
[ExposureID] [int] NOT NULL,
[Accgrpid] [int] NOT NULL,
[LocID] [int] NOT NULL,
[Lat] [float] NOT NULL,
[Lon] [float] NOT NULL,
CONSTRAINT [PK_tbl_locations] PRIMARY KEY CLUSTERED
(
[ExposureID] ASC,
[Accgrpid] ASC,
[LocID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Index [IX_tbl_bins1] Script Date: 5/30/2017 3:49:05 PM ******/
CREATE NONCLUSTERED INDEX [IX_tbl_bins1] ON [dbo].[tbl_Bins]
(
[MinLat] ASC,
[MaxLat] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [IX_tbl_bins2_ABC] Script Date: 5/30/2017 3:49:05 PM ******/
CREATE NONCLUSTERED INDEX [IX_tbl_bins2_ABC] ON [dbo].[tbl_Bins]
(
[MinLon] ASC,
[MaxLon] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [IX_tbl_locations] Script Date: 5/30/2017 3:49:05 PM ******/
CREATE NONCLUSTERED INDEX [IX_tbl_locations] ON [dbo].[tbl_Locations]
(
[Lat] ASC,
[Lon] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
I have found a way to improve this.
SELECT L.LocationID, C.BinID
FROM #tbl_bins C
INNER JOIN #tbl_locations L
ON (L.Lat Between C.MinLat And C.MaxLat)
AND (L.Lon Between C.MinLon And C.MaxLon)
WHERE FLOOR(C.MinLat * 1000 ) <= FLOOR(L.Lat*1000)
AND FLOOR(C.MinLon * 1000 ) <= FLOOR(L.Lon*1000)
AND CEILING(C.MaxLat * 1000 ) >= CEILING(L.Lat*1000)
AND CEILING(C.MaxLon * 1000 ) >= CEILING(L.Lon*1000)
This helps the query by reducing the amounts of Bins to join to 1.
Create a covering index:
CREATE INDEX IX_tbl_lat_long_locations ON [#tbl_locations] (Lat, Lon, LocationID)
Having such an index means the table doesn't need to be accessed in order to retrieve the LocationID.
Make it a CLUSTERED index if you can.
I have a table with 24 milion rows.
I want to run this query:
select r1.userID, r2.userID, sum(r1.rate * r2.rate) as sum
from dbo.Ratings as r1
join dbo.Ratings as r2
on r1.movieID = r2.movieID
where r1.userID <= r2.userID
group by r1.userID, r2.userID
As I tested, it took 24 hours to produce 0.02 percent of the final result.
How can I speed it up?
Here is the definition of the table:
CREATE TABLE [dbo].[Ratings](
[userID] [int] NOT NULL,
[movieID] [int] NOT NULL,
[rate] [real] NOT NULL,
PRIMARY KEY CLUSTERED
(
[userID] ASC,
[movieID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_RatingsMovies] ON [dbo].[Ratings]
(
[movieID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [IX_RatingsUsers] ON [dbo].[Ratings]
(
[userID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Here is the execution plan:
The workaround I suggested was to create a "reverse" index:
CREATE INDEX IX_Ratings_Reverse on Ratings(movieid, userid) include(rate);
and then force SQL Server to use it:
select r1.userID, r2.userID, sum(r1.rate * r2.rate) as sum
from dbo.Ratings as r1 join dbo.Ratings as r2
with (index(IX_Ratings_Reverse))
on r1.movieID = r2.movieID
where r1.userID <= r2.userID group by r1.userID, r2.userID
There are two things that might help.
1) Change the order of columns in your clustered index to MovieID,UserID. This would group all the same MovieID's together first, which might change your Hash Match to an Inner Loop, and improve the performance of the JOIN.
2) Change the [IX_RatingsMovies] index to INCLUDE UserID and Rate. The more I think about it, I think this is less likely than my first suggestion to help. But it's possible.
I have a large table in a SQl Server 2008 database, it has about 570 million records.
Every day we run a batch job that takes a file of 200,000 or so transaction records, does a group by and sum against this data and inserts it into the large table.
Recently I have experimented with changing the clustered index of the large table to an identity int column, which has brought the insert down from 3 hours to one hour, but I am still puzzled why this simple query should take so long to run (regardless of the size of the table)
This is the table with 570 million rows
CREATE TABLE [dbo].[POINTS_EARNED](
[POINTS_EARNED_ID]int identity not null,
[CARD_ID] [int] NOT NULL,
[CYCLE_ID] [int] NOT NULL,
[POINTS_CODE] [int] NOT NULL,
[NO_POINTS] [int] NULL,
[ACCOUNT_ID] [int] NOT NULL,
[CREATED_DATE] [datetime] NULL,
[CREATED_BY] [varchar](20) NULL,
[LAST_MODIFIED_DATE] [datetime] NULL,
[LAST_MODIFIED_BY] [varchar](20) NULL,
[DELETED] [bit] NULL,
CONSTRAINT [PK_POINTS_EARNED] PRIMARY KEY CLUSTERED
(
[POINTS_EARNED_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
It also has some constraints (defaults and foreign keys) and indexes, and I am wondering if these are what is causing the problem.
The actual SQL that takes an hour to run is:
insert into points_earned (
card_id,
cycle_id,
points_code,
no_points,
account_id
)
select pe.card_id, pe.cycle_id, pe.points_code, sum(pe.no_points),pe.account_id
from #points_earned pe
group by pe.card_id, pe.cycle_id, pe.points_code,pe.account_id
and the temp table #points_earned has about 200,000 rows, and has the following structure (with no indexes)
create table #points_earned (
card_id int,
cycle_id int,
points_code int,
card_type varchar(5),
no_points int,
account_id int
)
So, I would like some opinions on whether I should
Add indexes on the temp table
Drop non clustered indexes on the large table before adding the data, then recreating them
Any other options?
Update - as requested a bit more info
- The select statement runs without the insert in 2 seconds, so this doesn't appear to be the problem, so probably don't need o worry about indexing the temp table
Indexes, (update) trigger, and constraints are:
CREATE NONCLUSTERED INDEX [IDX_CYCLE_ID] ON [dbo].[POINTS_EARNED]
(
[CYCLE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_ACCOUNT_ID] ON [dbo].[POINTS_EARNED]
(
[ACCOUNT_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_ACCOUNT_ID_POINTS_CODE] ON [dbo].[POINTS_EARNED]
(
[ACCOUNT_ID] ASC,
[POINTS_CODE] ASC
)
INCLUDE ( [CARD_ID],
[CYCLE_ID],
[NO_POINTS]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [RELATION_151_FK] ON [dbo].[POINTS_EARNED]
(
[CARD_ID] ASC,
[CYCLE_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [RELATION_152_FK] ON [dbo].[POINTS_EARNED]
(
[POINTS_CODE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Trigger [update_points_earned] Script Date: 09/13/2013 13:20:54 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [dbo].[update_points_earned] ON [dbo].[POINTS_EARNED]
FOR UPDATE
AS
BEGIN
UPDATE points_earned
SET Last_Modified_By = USER,
Last_Modified_Date = GETDATE()
FROM
points_earned t,
inserted i
WHERE
t.card_id = i.card_id AND
t.cycle_id = i.cycle_id AND
t.points_code = i.points_code AND
t.account_id = i.account_id
END
GO
/****** Object: Default [DF_POINTS_EARNED_ACCOUNT_ID] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_ACCOUNT_ID] DEFAULT ((0)) FOR [ACCOUNT_ID]
GO
/****** Object: Default [DF_POINTS_EARNED_CREATED_DATE] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_CREATED_DATE] DEFAULT (getdate()) FOR [CREATED_DATE]
GO
/****** Object: Default [DF_POINTS_EARNED_CREATED_BY] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_CREATED_BY] DEFAULT (user_name()) FOR [CREATED_BY]
GO
/****** Object: Default [DF_POINTS_EARNED_DELETED] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] ADD CONSTRAINT [DF_POINTS_EARNED_DELETED] DEFAULT ((0)) FOR [DELETED]
GO
/****** Object: ForeignKey [FK_POINTS_E_REFERENCE_CYCLE_CA] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] WITH CHECK ADD CONSTRAINT [FK_POINTS_E_REFERENCE_CYCLE_CA] FOREIGN KEY([CARD_ID], [CYCLE_ID])
REFERENCES [dbo].[CYCLE_CARD] ([CARD_ID], [CYCLE_ID])
GO
ALTER TABLE [dbo].[POINTS_EARNED] CHECK CONSTRAINT [FK_POINTS_E_REFERENCE_CYCLE_CA]
GO
/****** Object: ForeignKey [FK_POINTS_E_REFERENCE_POINTS_C] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] WITH NOCHECK ADD CONSTRAINT [FK_POINTS_E_REFERENCE_POINTS_C] FOREIGN KEY([POINTS_CODE])
REFERENCES [dbo].[POINTS_CODE] ([POINTS_CODE])
GO
ALTER TABLE [dbo].[POINTS_EARNED] CHECK CONSTRAINT [FK_POINTS_E_REFERENCE_POINTS_C]
GO
/****** Object: ForeignKey [FK_POINTS_EARNED_REF_ACCOUNT] Script Date: 09/13/2013 13:20:54 ******/
ALTER TABLE [dbo].[POINTS_EARNED] WITH NOCHECK ADD CONSTRAINT [FK_POINTS_EARNED_REF_ACCOUNT] FOREIGN KEY([ACCOUNT_ID])
REFERENCES [dbo].[ACCOUNT] ([ACCOUNT_ID])
GO
ALTER TABLE [dbo].[POINTS_EARNED] CHECK CONSTRAINT [FK_POINTS_EARNED_REF_ACCOUNT]
Edit 2, query plan for the insert statement
|--Sequence
|--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[IDX_CYCLE_ID]), SET:([POINTS_EARNED_ID1040] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[CYCLE_ID1041] = [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID]) WITH ORDERED PREFETCH)
| |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| |--Table Spool
| |--Clustered Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[PK_POINTS_EARNED]), SET:([Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[card_id] as [pe].[card_id]),[Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[cycle_id] as [pe].[cycle_id]),[Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[points_code] as [pe].[points_code]),[Progressive_Points].[dbo].[POINTS_EARNED].[NO_POINTS] = [Expr1006],[Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID] = RaiseIfNullInsert([tempdb].[dbo].[#points_earned].[account_id] as [pe].[account_id]),[Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] = [Expr1007],[Progressive_Points].[dbo].[POINTS_EARNED].[CREATED_DATE] = [Expr1008],[Progressive_Points].[dbo].[POINTS_EARNED].[CREATED_BY] = [Expr1009],[Progressive_Points].[dbo].[POINTS_EARNED].[DELETED] = [Expr1010],[Progressive_Points].[dbo].[POINTS_EARNED].[LAST_MODIFIED_DATE] = NULL,[Progressive_Points].[dbo].[POINTS_EARNED].[LAST_MODIFIED_BY] = NULL) WITH UNORDERED PREFETCH)
| |--Compute Scalar(DEFINE:([Expr1008]=getdate(), [Expr1009]=CONVERT_IMPLICIT(varchar(20),user_name(),0), [Expr1010]=(0)))
| |--Compute Scalar(DEFINE:([Expr1007]=getidentity((1243867498),(8),NULL)))
| |--Top(ROWCOUNT est 0)
| |--Parallelism(Gather Streams)
| |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1062]=(0) THEN NULL ELSE [Expr1063] END))
| |--Hash Match(Aggregate, HASH:([pe].[card_id], [pe].[cycle_id], [pe].[points_code], [pe].[account_id]), RESIDUAL:([tempdb].[dbo].[#points_earned].[card_id] as [pe].[card_id] = [tempdb].[dbo].[#points_earned].[card_id] as [pe].[card_id] AND [tempdb].[dbo].[#points_earned].[cycle_id] as [pe].[cycle_id] = [tempdb].[dbo].[#points_earned].[cycle_id] as [pe].[cycle_id] AND [tempdb].[dbo].[#points_earned].[points_code] as [pe].[points_code] = [tempdb].[dbo].[#points_earned].[points_code] as [pe].[points_code] AND [tempdb].[dbo].[#points_earned].[account_id] as [pe].[account_id] = [tempdb].[dbo].[#points_earned].[account_id] as [pe].[account_id]) DEFINE:([Expr1062]=COUNT_BIG([tempdb].[dbo].[#points_earned].[no_points] as [pe].[no_points]), [Expr1063]=SUM([tempdb].[dbo].[#points_earned].[no_points] as [pe].[no_points])))
| |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pe].[card_id], [pe].[cycle_id], [pe].[points_code], [pe].[account_id]))
| |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#points_earned] AS [pe]))
|--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[IX_ACCOUNT_ID]), SET:([POINTS_EARNED_ID1042] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[ACCOUNT_ID1043] = [Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID]) WITH ORDERED PREFETCH)
| |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| |--Table Spool
|--Assert(WHERE:(CASE WHEN [Expr1050] IS NULL THEN (0) ELSE NULL END))
| |--Nested Loops(Left Semi Join, OUTER REFERENCES:([Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID], [Expr1068]) WITH UNORDERED PREFETCH, DEFINE:([Expr1050] = [PROBE VALUE]))
| |--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[IX_ACCOUNT_ID_POINTS_CODE]), SET:([POINTS_EARNED_ID1044] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[CARD_ID1045] = [Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID],[CYCLE_ID1046] = [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID],[POINTS_CODE1047] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE],[NO_POINTS1048] = [Progressive_Points].[dbo].[POINTS_EARNED].[NO_POINTS],[ACCOUNT_ID1049] = [Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID]) WITH UNORDERED PREFETCH)
| | |--Table Spool
| |--Clustered Index Seek(OBJECT:([Progressive_Points].[dbo].[ACCOUNT].[PK_ACCOUNT]), SEEK:([Progressive_Points].[dbo].[ACCOUNT].[ACCOUNT_ID]=[Progressive_Points].[dbo].[POINTS_EARNED].[ACCOUNT_ID]) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1054] IS NULL THEN (0) ELSE NULL END))
| |--Nested Loops(Left Semi Join, OUTER REFERENCES:([Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID], [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID], [Expr1070]) WITH UNORDERED PREFETCH, DEFINE:([Expr1054] = [PROBE VALUE]))
| |--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[RELATION_151_FK]), SET:([POINTS_EARNED_ID1051] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[CARD_ID1052] = [Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID],[CYCLE_ID1053] = [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID]) WITH ORDERED PREFETCH)
| | |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| | |--Table Spool
| |--Row Count Spool
| |--Index Seek(OBJECT:([Progressive_Points].[dbo].[CYCLE_CARD].[IDX_NCLST_CARD_ID_CYCLE_ID]), SEEK:([Progressive_Points].[dbo].[CYCLE_CARD].[CARD_ID]=[Progressive_Points].[dbo].[POINTS_EARNED].[CARD_ID] AND [Progressive_Points].[dbo].[CYCLE_CARD].[CYCLE_ID]=[Progressive_Points].[dbo].[POINTS_EARNED].[CYCLE_ID]) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1057] IS NULL THEN (0) ELSE NULL END))
|--Merge Join(Left Semi Join, MANY-TO-MANY MERGE:([Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE])=([Progressive_Points].[dbo].[POINTS_CODE].[POINTS_CODE]), RESIDUAL:([Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE]=[Progressive_Points].[dbo].[POINTS_CODE].[POINTS_CODE]))
|--Index Insert(OBJECT:([Progressive_Points].[dbo].[POINTS_EARNED].[RELATION_152_FK]), SET:([POINTS_EARNED_ID1055] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID],[POINTS_CODE1056] = [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE]) WITH ORDERED PREFETCH)
| |--Sort(ORDER BY:([Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_CODE] ASC, [Progressive_Points].[dbo].[POINTS_EARNED].[POINTS_EARNED_ID] ASC))
| |--Table Spool
|--Index Scan(OBJECT:([Progressive_Points].[dbo].[POINTS_CODE].[POINTS_CODES_PK]), ORDERED FORWARD)
OK, here's what I would do:
Check to see if you need both indexes [IX_ACCOUNT_ID_POINTS_CODE] and [IX_ACCOUNT_ID] as they may be redundant.
Before you do the INSERT, Disable the Trigger and drop the Foreign Keys.
Do the INSERT setting the fields normally set by the Trigger, and insuring that the FK Column's values are valid.
Re-Enable the trigger, and re-create the Foreign Keys WITH NOCHECK.
I would leave the indexes on as you are inserting less than 0.2% of the total row count so it's probably faster to update them in-place rather than to drop and rebuild them.
instead of deleting 200k rows from the massive table in one shot, try chunking it. E.g.:
while (1=1)
begin
delete top(1000) from #points_earned
output deleted.* into points_earned
if ##rowcount=0 break
end
Regarding your TABLE, there's 3 considerations that affects your performance for each record you add :
(1) Your Indexes
(2) Your Trigger
(3) Your Foreign Keys
If you can afford it, apply the appropriate architecture for your TABLE, like PARTITION TABLE, and PARTITION INDEXES within appropriate SAS Drives.
Otherwise, on similar situation, with dozen of thousands records updated every minute, i use the technique of BULK/INSERT, with another TABLE (ex. [POINTS_EARNED_TMP] on a separate Database within the same Instance (*).
Add the record with the Trigger [POINTS_EARNED_TMP].
Then, from another BULK, you set your procedure with no triggers and BULK/INSERT your data from [POINTS_EARNED_TMP] to [POINTS_EARNED] (including the USER, and DATE Update).
At least, the Trigger performance is avoided, and the #TMP within the same Instance is avoided too.
(*) Using another Database is mainly for maintenance reason.
BULK gives so far, amazing results compare to the INSERT TO.
EDIT:
We're in the process of moving server and I've just tested this on the new server. There's no performance problem there. This seems to be down to an underpowered, badly organised server.
One of our processes suddenly ran very slowly last night. The slow step was tracked down to an update statement on a table that was admittedly not too cleverly indexed.
So today I've added indexes to all the tables involved, but I'm still getting terrible performance.
I really don't understand it - possibly I'm still doing something less than smart.
Any suggestions welcomed.
update is as follows:
update test_HDM_RTT
set patient_district_no = b.legacy_number
from test_HDM_RTT a
inner join PHD.migration.PatScope b
on a.patient_pas_no = b.TrustNumber
patscope is 2474147 rows, test_hdm_rtt is 815278
definition of tables:
CREATE TABLE [dbo].[test_HDM_RTT](
[pk_episode_id] [int] NULL,
[pk_event_id] [int] NOT NULL,
[activity_date] [datetime] NULL,
[activity_datetime] [datetime] NULL,
[activity_subtype1] [nvarchar](50) NULL,
[activity_subtype1_code] [nvarchar](50) NULL,
[activity_subtype2] [nvarchar](50) NULL,
[activity_subtype2_code] [nvarchar](50) NULL,
[activity_type] [nvarchar](50) NULL,
[activity_type_code] [nvarchar](50) NULL,
[clock_start_date] [datetime] NULL,
[clock_stop_date] [datetime] NULL,
[dir_code] [nvarchar](10) NULL,
[div_code] [nvarchar](10) NULL,
[episode_id_ext] [nvarchar](50) NULL,
[episode_id_appt] [nvarchar](50) NULL,
[episode_id_ref] [nvarchar](50) NULL,
[episode_id_ref_medway] [nvarchar](50) NULL,
[episode_id_wl] [nvarchar](50) NULL,
[erod] [datetime] NULL,
[nhs_number] [nvarchar](20) NULL,
[patient_id] [int] NULL,
[patient_district_no] [nvarchar](20) NULL,
[patient_pas_no] [nvarchar](50) NULL,
[pathway_id] [nvarchar](50) NULL,
[pct_code] [nvarchar](10) NULL,
[ref_source_code] [nvarchar](10) NULL,
[rtt_episode_id] [nvarchar](50) NULL,
[rtt_outcome_code] [nvarchar](50) NULL,
[rtt_outcome_desc] [nvarchar](50) NULL,
[rtt_start_date] [datetime] NULL,
[rtt_start_ind] [nvarchar](10) NULL,
[rtt_stop_date] [datetime] NULL,
[site_code] [nvarchar](10) NULL,
[spec_natcode] [nvarchar](10) NULL,
[spec_pascode] [nvarchar](10) NULL,
[transfer_text] [nvarchar](100) NULL,
[op_rtt_count] [int] NULL,
[app_rec_date] [datetime] NULL,
[cons_code] [varchar](10) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
USE [PIP]
/****** Object: Index [pk_event_id_clustered] Script Date: 03/06/2013 14:46:52 ******/
CREATE CLUSTERED INDEX [pk_event_id_clustered] ON [dbo].[test_HDM_RTT]
(
[pk_event_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [PIP]
/****** Object: Index [idx_episode_id_appt] Script Date: 03/06/2013 14:46:52 ******/
CREATE NONCLUSTERED INDEX [idx_episode_id_appt] ON [dbo].[test_HDM_RTT]
(
[episode_id_appt] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [PIP]
/****** Object: Index [idx_episode_id_ref] Script Date: 03/06/2013 14:46:52 ******/
CREATE NONCLUSTERED INDEX [idx_episode_id_ref] ON [dbo].[test_HDM_RTT]
(
[episode_id_ref] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [PIP]
/****** Object: Index [idx_episode_id_wl] Script Date: 03/06/2013 14:46:52 ******/
CREATE NONCLUSTERED INDEX [idx_episode_id_wl] ON [dbo].[test_HDM_RTT]
(
[episode_id_wl] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [PIP]
/****** Object: Index [patient_pas_no] Script Date: 03/06/2013 14:46:52 ******/
CREATE NONCLUSTERED INDEX [patient_pas_no] ON [dbo].[test_HDM_RTT]
(
[patient_pas_no] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
and
USE [PHD]
GO
/****** Object: Table [migration].[PatScope] Script Date: 03/06/2013 14:47:57 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [migration].[PatScope](
[RID] [varchar](7) NOT NULL,
[Number] [varchar](17) NOT NULL,
[TrustNumber] [varchar](10) NULL,
[NumberType] [nvarchar](10) NULL,
[legacy_number] [varchar](10) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
USE [PHD]
/****** Object: Index [TrustNoClustered] Script Date: 03/06/2013 14:47:57 ******/
CREATE CLUSTERED INDEX [TrustNoClustered] ON [migration].[PatScope]
(
[TrustNumber] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [PHD]
/****** Object: Index [TrustNo] Script Date: 03/06/2013 14:47:57 ******/
CREATE NONCLUSTERED INDEX [TrustNo] ON [migration].[PatScope]
(
[TrustNumber] ASC,
[Number] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
USE [PHD]
/****** Object: Index [TrustNumber_legacy_lookup] Script Date: 03/06/2013 14:47:57 ******/
CREATE UNIQUE NONCLUSTERED INDEX [TrustNumber_legacy_lookup] ON [migration].[PatScope]
(
[TrustNumber] ASC,
[legacy_number] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
personally, I would only update if the value is not the same as existing. That should speed up the query
update test_HDM_RTT
set patient_district_no = b.legacy_number
from test_HDM_RTT a
inner join PHD.migration.PatScope b
on a.patient_pas_no = b.TrustNumber
where a.patient_district_no <> b.legacy_number
I would also check out the EXPLAIN results (ctrl + l) your query may be using the wrong index.
Show Execution Plan & see if the plan uses indexes. If it doesn't, you can force it to use a particular index:
SET ANSI_NULLS OFF
GO
update test_HDM_RTT
set patient_district_no = b.legacy_number
from test_HDM_RTT a
inner join PHD.migration.PatScope b WITH (INDEX(TrustNumber_legacy_lookup))
on a.patient_pas_no = b.TrustNumber
where a.patient_district_no <> b.legacy_number
You'll want to see if you need to force an index on test_HDM_RTT instead because it looks like it's already going to do an index scan of TrustNumber_legacy_lookup to get its data.