Mssql/Perl DBI performance woes

Mssql/Perl DBI performance woes - sql-server

Im writing an application that does a lot of large inserts to a remote mssql server with Perl DBI. Not sure if it will be 05 or 08 sql server yet, but I've timed it in both so far and performance is similar. Basically, there are a large number of rows that need to be inserted are turning out to be the bottleneck by far. I have tried multi-row inserts in 08 ( and the sub-select UNION ALL trick in 05), changing whether the inserts are fired off during a previous fetch or after, using execut_array() on single-row inserts, all these with/without binding params.
Psuedocode:
select data query
while fetchrow {
do lots of calculations
construct insert
1) do inserts here
}
2) or do inserts here
The activity monitor on the sql server averages the multi-row inserts at 70ms a piece. The queries themselves have been limited to 58ish rows a piece because there are 36 fields on the insert, and it easily hits the 2100 parameter limit.
Is there anything obvious I am overlooking? Any other method I could try to improve the times? Ignoring issues like latency or hardware, I feel like there has to be another improvement to my Perl workflow or the queries themselves. (I was looking into sql server bcp, bulk inserts, etc).
Thanks for any advice

execute_array will not make much difference because it is not implemented in DBD::ODBC (because I've never got around to it). As such it is really DBI just calling execute repeatedly for you.
I don't understand the 58ish rows, 36 parameters and hitting 2100 parameter limit although I can see 58 * 36 is pretty much 2100. If your insert needs 36 parameters just prepare it before the select and execute it in the while loop - that should be faster since the SQL does not need to be passed/parsed each insert.
Apart from that, it you can disable AutoCommit and commit at the end or regularly but less than once per insert it should be a lot faster. You may also have indexes/triggers etc on the table which can slow inserts down so you could look at disabling them until afterwards.

I guess Perl/DBI is a requirement (because this isn't running on Windows perhaps)?
Any possibility of using BCP (you would write an intermediate text file) or SSIS (you might be able to do everything in SSIS) or even ADO.NET SqlBulkCopy (in some kind of .NET environment)? All these are specifically designed for large loads.

Related

Slow query operations on Azure database

I have 1.2 million rows in Azure data table. The following command:
DELETE FROM _PPL_DETAIL WHERE RunId <> 229
is painfully slow.
There is an index on RunId.
I am deleting most of the data.
229 is a small number of records.
It has been running for an hour now
Should it take this long?
I am pretty sure it will finish.
Is there anything I can do to make operations like this faster?
The database does have a PK, although it is a dummy PK (not used). I already saw that as an optimization need to help this problem, but it still takes way too long (SQL Server treats a table without a PK differently -- much less efficient). It is still taking 1+ hour.

How about trying something like below
BEGIN TRAN
SELECT * INTO #T FROM _PPL_DETAIL WHERE RunId = 229
TRUNCATE TABLE _PPL_DETAIL
INSERT INTO _PPL_DETAIL
SELECT * FROM #T
COMMIT TRAN

Without knowing what database tier is using the database where that statment runs it is not easy to help you. However, let us tell you how the system works so that you can make this determination with a bit more investigation by yourself.
Currently the log commit rate is limited by the tier the database has. Deletes are fundamentally limited on the ability to write out log records (and replicate them to multiple machines in case your main machine dies). When you select records, you don't have to go over the network to N machines and you may not even need to go to the local disk if the records are preserved in memory, so selects are generally expected to be faster than inserts/updates/deletes because of the need to harden log for you. You can read about the specific limits for different reservation sizes are here: DTU Limits and vCore Limits.
One common problem is to do individual operations in a loop (like a cursor or driven from the client). This implies that each statement has a single row updated and thus has to harden each log record serially because the app has to wait for the statement to return before submitting the next statement. You are not hitting that since you are running a big delete as a single statement. That could be slow for other reasons such as:
Locking - if you have other users doing operations on the table, it could block the progress of the delete statement. You can potentially see this by looking at sys.dm_exec_requests to see if your statement is blocking on other locks.
Query Plan choice. If you have to scan a lot of rows to delete a small fraction, you could be blocked on the IO to find them. Looking at the query plan shape will help here, as will set statistics time on (We suggest you change the query to do TOP 100 or similar to get a sense of whether you are doing lots of logical read IOs vs. actual logical writes). This could imply that your on-disk layout is suboptimal for this problem. The general solutions would be to either pick a better indexing strategy or to use partitioning to help you quickly drop groups of rows instead of having to delete all the rows explicitly.
An additional strategy to have better performance with deletes is to perform batching.

As I know SQL Server had a change and the default DOP is 1 on their servers, so if you run the query with OPTION(MAXDOP 0) could help.
Try this:
DELETE FROM _PPL_DETAIL
WHERE RunId <> 229
OPTION (MAXDOP 0);

TSQL : Is replace better than ltrim/rtrim

Is replace is better than ltrim/rtrim.
I have no spaces between the words, because I am running it on key column.
update [db14].[dbo].[S_item_60M]
set [item_id]=ltrim(rtrim([item_id]))
Item_id having non-clustered index
Shall I disable index for better performance?
Windows 7, 24GB RAM , SQL Server 2014
This query was running for 20 hours and then I canceled it. I am thinking to run Replace instead of ltrim/rtrim for performance reasons.
SSMS studio crashed.
Now I can see it running in Activity Monitor
Error Log says FlushCache: cleaned up 66725 bufs with 25872 writes in 249039 ms (avoided 11933 new dirty bufs) for db 7:0
Please guide and suggest me.

The throughput of bulk updates does not depend on a single call per row to ltrim or rtrim. You arbitrarily pick some highly visible element of your query and consider it responsible for bad performance. Look at the query plan to see what's being done physically. Also, make yourself familiar with bulk update techniques (such as dropping and recreating indexes).
Note, that contrary to popular belief a bulk update with all rows in one statement is usually the fastest option. This strategy can cause blocking and high log usage. But is usually has the best throughput because the optimizer can optimize all the DML that you are executing in one plan. If splitting DML into chunks was almost always a good idea SQL Server would just do it automatically as part of the plan.

I don't think REPLACE versus LTRIM/TRIM is the long pole in the tent performance wise. Do you have concurrent activity against the table during the update? I suggest you perform this operation during a maintenance window to avoid blocking with other queries.
If a lot of rows will be updated (more than 10% or so) I suggest you drop (or disable) the non-clustered index on item_id column, perform the update, and then create (or enable) the index afterward. Specify the TABLOCKX locking hint.

If there are some rows which already have no spaces, exclude them from the UPDATE by using a WHERE clause such as CHARINDEX(' ',item_id)<>0. But the most important advice (already posted above by gvee) is to do the UPDATE in batches (if you have a key which you can use for paging). Another aproach (possibly better if you have enough space) would be to use an operation that can be minimally logged (in the bulk-logged or simple recovery model): use a SELECT INTO another table and then rename that table.

Bulk Copy from small table to larger one in SQL Server 2005

I'm a newbie in SQL Server and have the following dilemma:
I have two tables with the same structure. Call it runningTbl and finalTbl.
runningTbl contains about 600 000 to 1 million rows every 15 minutes.
After doing some data cleanup in runningTbl I want to move all the records to finalTbl.
finalTbl currently has about 38 million rows.
The above process needs to be repeated every 15-20 minutes.
The problem is that the moving of data from runningTbl to finalTbl is taking way longer than 20 minutes at times..
Initially when the tables were small it took anything from 10 seconds to 2 minutes to copy.
Now it just takes too long.
Any one that can assist with this? SQL query to follow..
Thanks

There are a number of things that you will need to do in order to get the most efficient method of copying the data. So far you are on the right track but you have a long way to go. I would suggest you first look at your indexes. There may be optimizations there that can help. Next, make sure you don't have triggers on this table that could cause a slowdown. Next, change the logging level (if that is permutable).
There is a bunch more help here (from Microsoft):
http://msdn.microsoft.com/en-us/library/ms190421(v=SQL.90).aspx
Basically you are on the right track using BCP. This is actually Microsoft's recommendation:
To bulk-copy data from one instance of SQL Server to another, use bcp to export the table data into a data file. Then use one of the bulk import methods to import the data from the file to a table. Perform both the bulk export and bulk import operations using either native or Unicode native format.
When you do this though, you need to also consider the possibility of dropping your indexes if there is too much data being brought in (based upon the type of index you use). If you use a clustered index, it may also be a good idea to order your data before import. Here is more information (including the source of the above quote):
http://msdn.microsoft.com/en-US/library/ms177445(v=SQL.90).aspx

For starters : one of the things I've learned over the years is that MSSQL does a great job at optimizing all kinds of operations but to do so heavily relies on the statistics for all tables involved. Hence, I would suggest to run "UPDATE STATISTICS processed_logs" & "UPDATE STATISTICS unprocessed_logs" before running the actual inserts; even on a large table these things don't take all that long.
Apart from that, based on the query above, a lot depends on the indexes of the target table. I'm assuming the target table has its clustered index (or PRIMARY KEY) on (at least) UnixTime, if not you'll create major data-fragmentation when you squeeze more and more data in-between the already existing records. To work around this you could try defragmenting the target table once in a while (can be done online, but takes a long time), but making the clustered index (or PK) so that data is always appended to the end of the table would be the better approach; well, at least in my opinion.

I suggest that you should have a window service and use timer and a boolean variable. Once your request is sent to server set the bool to high bit and the timer event should not execute code until the bit is low.

How to do very fast inserts to SQL Server 2008

I have a project that involves recording data from a device directly into a sql table.
I do very little processing in code before writing to sql server (2008 express by the way)
typically i use the sqlhelper class's ExecuteNonQuery method and pass in a stored proc name and list of parameters that the SP expects.
This is very convenient, but i need a much faster way of doing this.
Thanks.

ExecuteNonQuery with an INSERT statement, or even a stored procedure, will get you into thousands of inserts per second range on Express. 4000-5000/sec are easily achievable, I know this for a fact.
What usually slows down individual updates is the wait time for log flush and you need to account for that. The easiest solution is to simply batch commit. Eg. commit every 1000 inserts, or every second. This will fill up the log pages and will amortize the cost of log flush wait over all the inserts in a transaction.
With batch commits you'll probably bottleneck on disk log write performance, which there is nothing you can do about it short of changing the hardware (going raid 0 stripe on log).
If you hit earlier bottlenecks (unlikely) then you can look into batching statements, ie. send one single T-SQL batch with multiple inserts on it. But this seldom pays off.
Of course, you'll need to reduce the size of your writes to a minimum, meaning reduce the width of your table to the minimally needed columns, eliminate non-clustered indexes, eliminate unneeded constraints. If possible, use a Heap instead of a clustered index, since Heap inserts are significantly faster than clustered index ones.
There is little need to use the fast insert interface (ie. SqlBulkCopy). Using ordinary INSERTS and ExecuteNoQuery on batch commits you'll exhaust the drive sequential write throughput much faster than the need to deploy bulk insert. Bulk insert is needed on fast SAN connected machines, and you mention Express so it's probably not the case. There is a perception of the contrary out there, but is simply because people don't realize that bulk insert gives them batch commit, and its the batch commit that speeds thinks up, not the bulk insert.
As with any performance test, make sure you eliminate randomness, and preallocate the database and the log, you don't want to hit db or log growth event during test measurements or during production, that is sooo amateurish.

bulk insert would be the fastest since it is minimally logged
.NET also has the SqlBulkCopy Class

Here is a good way to insert a lot of records using table variables...
...but best to limit it to 1000 records at a time because table variables are "in Memory"
In this example I will insert 2 records into a table with 3 fields -
CustID, Firstname, Lastname
--first create an In-Memory table variable with same structure
--you could also use a temporary table, but it would be slower
declare #MyTblVar table (CustID int, FName nvarchar(50), LName nvarchar(50))
insert into #MyTblVar values (100,'Joe','Bloggs')
insert into #MyTblVar values (101,'Mary','Smith')
Insert into MyCustomerTable
Select * from #MyTblVar

This is typically done by way of a BULK INSERT. Basically, you prepare a file and then issue the BULK INSERT statement and SQL Server copies all the data from the file to the table with the fast method possible.
It does have some restrictions (for example, there's no way to do "update or insert" type of behaviour if you have possibly-existing rows to update), but if you can get around those, then you're unlikely to find anything much faster.

If you mean from .NET then use SqlBulkCopy

Things that can slow inserts include indexes and reads or updates (locks) on the same table. You can speed up situations like yours by avoiding both and inserting individual transactions to a separate holding table with no indexes or other activity. Then batch the holding table to the main table a little less frequently.

It can only really go as fast as your SP will run. Ensure that the table(s) are properly indexed and if you have a clustered index, ensure that it has a narrow, unique, increasing key. Ensure that the remaining indexes and constraints (if any) do not have a lot of overhead.
You shouldn't see much overhead in the ADO.NET layer (I wouldn't necessarily use any other .NET library above SQLCommand). You may be able to use ADO.NET Async methods in order to queue several calls to the stored proc without blocking a single thread in your application (this potentially could free up more throughput than anything else - just like having multiple machines inserting into the database).
Other than that, you really need to tell us more about your requirements.

Intriguing SQL Server performance-tuning problem

I have been working on a stored procedure performance problem for over a week now and is related to my other post on Stackoverflow here. Let me give you some background information.
We have a nightly process which runs and is started by a stored procedure which calls many many many other stored procedures. Lots of the called stored procedures call others, etc. I have looked at some of the called procs and there is all sorts of frightnening complicated stuff in there such as XML string processing, unnecessary over-use of cursors, NOLOCK hints over-used, rare use of set-based processing, etc - the list goes on, it's quite horrendous.
This nightly process in our production environment takes on average 1:15 to run. It sometimes takes 2 hours to run which is unacceptable. I have created a test environment on identical hardware to production and run the proc. It took 45 minutes the first time I ran it. If I restore the database to the exact same point and run it again, it takes longer: indeed, if I repeat this action several times (restoring and re-running), the proc takes progressively longer until it plateaus at around 2 hours. This really puzzles me because I restore the database to the exact same point every time. There are no other user databases on the server.
I thought of two lines of investigation to pursue:
Query plans and parameter spoofing
Tempdb
As a test, I restarted SQL Server to clear out both the cache and tempdb and re-ran the proc with the same database restore. The proc took 45 minutes. I repeated this several times to ensure that it was repeatable - again it took 45 minutes each time. I then embarked on several tests to try and isolate the puzzling increase in run times when SQL Server does not get restarted:
Run the initial stored procedure WITH RECOMPILE
Before running the procedure, executre DBCC FREEPROCCACHE to clear out the procedure cache
Before running the procedure, execute CHECKPOINT followed by DBCC DROPCLEANBUFFERS to ensure that the cache was empty and clean
Executed the following script to ensure all stored procedures were marked for recompilation:
DECLARE #proc_schema SYSNAME
DECLARE #proc_name SYSNAME
DECLARE prcCsr CURSOR local
FOR SELECT specific_schema,
specific_name
FROM INFORMATION_SCHEMA.routines
WHERE routine_type = 'PROCEDURE'
OPEN prcCsr
FETCH NEXT FROM prcCsr INTO #proc_schema, #proc_name
DECLARE #stmt NVARCHAR(MAX)
WHILE ##FETCH_STATUS = 0
BEGIN
SET #stmt = N'exec sp_recompile ''[' + #proc_schema + '].['
+ #proc_name + ']'''
-- PRINT #stmt -- DEBUG
EXEC ( #stmt
)
FETCH NEXT FROM prcCsr INTO #proc_schema, #proc_name
END
In all the above tests, the procedure takes longer and longer to run with the same database restore. I am really at a loss now as to what to try. Looking into the code at this point is an option but realistically its going to take 3-6 months to get that optimised as there is lots of room for improvement there. What I am really interested in getting to the bottom of, is why does the proc execution time get longer each time when a database restore has been performed even when the procedure and buffer caches have been cleaned?
I did also investigate tempdb, and try and clear out old tables in there as described in my other stackoverflow post, but I am unable to manually clear out temp tables that were created from table variables manually and they don't seem to want to disappear on their own (even after leaving them for 24 hours).
Any insight or suggestions for further testing would be greatly appreciated. I am running SQL Server 2005 SP3 64-bit Enterprise edition on a Windows 2003 R2 Ent. edition cluster.
Regards,
Mark.

One thing that could cause this is if the process is leaking XML documents. That would cause SQL Server to use more memory, and parts of that might be written to a page file on disk, causing the process to slow down.
Code that creates an XML document looks like:
EXEC sp_xml_preparedocument #idoc OUTPUT, #strXML
It leaks if there is no corresponding:
EXEC sp_xml_removedocument #idoc
XML documents are COM objects stored outside the configured SQL Server memory. Even if you set SQL Server to use max 5 GB, leaking XML documents grows memory usage beyond that.

Reviewing all posts to-date and your related question, it certainly sounds like your strongest lead is the mystery behind those tempdb objects. Some leading questions:
After a fresh start, after the process is run how many objects are in tempdb? Is it the same number after every fresh start?
Do the numbers grow after “successive” runs? Do they grow at the same rate?
Can you determine if they occupy space?
For that matter, your tempdb files grow with each successive run of your process?
I followed the links, but didn’t find any reference discussion the actual problem. You might want to raise the issue on the Microsoft SQL Technet forums here -- they can be pretty good with the abstract stuff. (If all else fails, you can open a case with MS technical support. It might take days, but odds are very good that they will figure things out. And if it is an MS bug, they refund your money!)
You've said that rewriting the code is not an option. However, if temp table abuse is a factor, identifying and refactoring those parts of the code first might help a lot. To find which those may be, run SQL Profiler while your process executes. This kind of work is, alas, subjective and highly iterative (meaning you hardly ever get just the right set of counters on the first pass). Some thoughts:
Start with tracking SP:Started, to track which stored proedures are being called.
SQL Profiler can be used to group data; it’s awkward and I’m not sure how to describe it in mere text, but configured properly you’ll get a Profiler display showing the number of times each procedures was. Ideally, this would show the most frequenly called procs, and you can analyze them for temp table abuse and refactor as necessary.
If nothing jumps out there, you can trace SP:StmtStarting and do the same thing for individual statements. The problem here is that in a 2+/- hour spaghetti-code run, you might run out of disk space, and analyzing 100s of MB of trace data can be a nightmare. (Hint: load it in a table, build indexes, then carefully delete out the cruft.) Again, the goal would be to identify overly used/abused temp table code to be refactored.

Mark-
So it might take 3-6 months to totally re-write this procedure, but that doesn't mean you can't do some relatively quick performance optimization.
Some of the routines I have to support run 30hrs+, I would be ecstatic to get them to run in 2hrs!! The kind of optimization that you do on these routines is a little different than your normal OLTP database:
Capture a trace of the entire process, making sure to capture SP:StmtCompleted and SQL:StmtCompleted events. Make sure to put a filter on Duration (>10ms or something) to eliminate all the quick, unimportant statements.
Pull this trace into a table, and do some filtering/sorting/grouping, focusing on Duration and Reads. You will likely end up with one of two situations:
(A) A handful of individual queries/statements are responsible for the bulk of the time of the procedure (good news)
(B) A whole lot of similar statements each take a short amount of time, but together they add up to a long time.
In scenario (A), just focus your attention on these queries. Optimize them using indexes, or using other standard techniques. I highly recommend Dan Tow's book "SQL Tuning" for a powerful technique to optimize queries, especially messy ones with complicated joins.
In scenario (B), step back a bit and look at the set of statements as a whole. Are they all similar in some way? Can you add an index on a key, common table that will improve them all? Can you eliminate a loop that executes 10,000 dynamic queries, and instead do a single set-based query?
Still two other possibilities, I suppose:
(C) 15,000 totally different dynamic SQL statements, each requiring its own painstaking optimization. In this case, try to focus on server-level optimizations, such as I/O based improvements that will benefit them all.
(D) Something else weird going on with TempDB or something mis-configured on the server. Not much else I can say here, other than find the problem, and fix it!
Hope this helps.

Can you try the following scenario on the test server:
Make two copies of the database on the server: [A] and [B]. [A] is the database in question, [B] is the copy.
Restart server
Run your process
Drop the database [A]
Rename [B] to [A]
Run your process
This would be like a hot database swap. If the second run takes longer, something on the server level is happening (tempdb, memory, I/O, etc). If the second run takes about the same time, then the problem is on the database level (locks, index fragmentation, etc).
Good luck!

Run the following script at start of test and then after each iteration:
select sum(single_pages_kb) as sum_bp_kb
, sum(multi_pages_kb) as sum_va_kb
, type
from sys.dm_os_memory_clerks
group by type
having sum(single_pages_kb+multi_pages_kb) > 16
order by sum(single_pages_kb+multi_pages_kb) desc
select sum(total_pages), type_desc
from tempdb.sys.allocation_units
group by type_desc;
select * from sys.dm_os_performance_counters
where counter_name in (
'Log Truncations'
,'Log Growths'
,'Log Shrinks'
,'Data File(s) Size (KB)'
,'Log File(s) Size (KB)'
,'Active Temp Tables');
If the results are not self-evident, you can post them somewhere and place a link here, I can look into them and see if something strikes as odd.

What does the overall process do, what is the purpose of the operation being performed?
I would assume that executing the process results in data modification within the database. Is this the case?
If this is the case, then each time you run the process, the data begin considered is different and so different execution plan production is a possibility and so too are differing execution times.
Assuming that modification to the database data is occuring then you should also investigate:
Updating relevant database statistics
between each process run.
Reviewing the level of index
fragmentation between each process
run and determine if defragmentation could prove benificial.

Apparently you want to try anything except what you really have to do which is fix the process. Start by getting rid of the cursors. If it takes two hours right now, without the cursors I'll bet you can get it down to less than ten minutes.

I would log information into a log_table and the time it took to run each steps...that will help you narrow down the issue and also help you progressively improve the process by tackling it one at time (from improving procs that take the longest).
Best way is to simply insert at the beginning and the end of each proc.

Cursors are not peformance boosters, others address that. (not your decision)
Look into the temp tables use/management. Are they global temp tables or session/local temp tables? The fact that they are hanging around looks interesting. The tempdb is locked when temp tables are created which might be part of the issue.
Local temp tables (#mytable syntax) should go away when the session goes out of scope, but you SHOULD have dropped these (release early) to free up resources.
Use of local temp tables in transaction then cancel without COMMIT/ROLLBACK can increase locking in tempdb causing performance issues.
Speaking of transactions - this will cause locks on syscolumns, sysindexes etc. if temp tables are created in transactions - thus other exeuctions are blocked from using the same query.
Use of temp tables created by calling procedures in the called procedures points to logic need - rethink and try to use relational structures instead.
IF you need temp tables (to eliminate cursors :) then avoid SELECT INTO - to avoid system objects locks.
Use of global temp tables (##myglobaltable syntax) should be avoided as multiple sessions accessing can be and issue (the table hangs around until all sessions clear), and for me at least, makes no additive logical value proposition (look into the use of a permanent table instead). Question if global, are there blocking procedures?
Are there a lot of sparse temp tables (grow with large data, but have smaller data sets in them?)
Microsoft SQL Server Book Online,
“Consider using table variables instead of temporary tables. Temporary tables are useful in cases when indexes need to be created explicitly on them, or when the table values need to be visible across multiple stored procedures or functions. In general, table variables contribute to more efficient query processing.”
Of course if the temp table needs indexes, tabel variables are not an option.

I don't have the answer but some ideas of what I would do to isolate issues like this.
First, I would take snapshots of sys.dm_os_wait_stats before and after each execution. You subtract the 2 snapshots (get a deltas) and see if any particular WAIT is prominent or gets worse with each run. An easy way to calculate deltas is to copy the sys.dm_os_wait_stats values into Excel worksheets and use VLOOKUP() to subtract corresponding values. I've used this investigation technique hundreds of times. You don't know what aspect SQL Server is hung up on?! Let SQL Server "tell" you via sys.dm_os_wait_stats !
The other thing I might try is to adjust the behavior of the loop to understand if the subsequent slower executions exhibit constant throughput for all records from beginning to end or does it only slow down for particular sproc(s) in INFORMATION_SCHEMA.routines ... 2 techniques for exploring this is:
1) Add a "top N" clause the SQL SELECT such as "top 100" or "top 1000" (create an artificial limit) to see if you get subsequent slowdowns for all record count scenarios ... or ... do you only get the slowdowns when the cursor resultset is large enough to include the offending sproc.
2) Instead of adding "top N", you can add more print statements (instrumentation) to calculate the throughput as it is processing.
Of course, you can do combination of both.
Maybe these diagnostics will get you closer to the root cause.
Edited to add: Btw, SQL2008 has a new performance monitor that makes it easy to "eyeball" the numbers of sys.dm_os_wait_stats. However for SQL2005, you'll have to manually calculate the deltas via Excel or a script.

These are long shots:
Quickly look through all of the
stored procedures for things that are
unusual and SQL Server should not
really be doing, for example sending
email or writing files, etc. SQL trying to send email to a non-exist email server could cause delays.
The other thing to keep in mind is
that as you restore the database
before each test possibly your disk
is getting more fragmented (not
really sure about this though). So
that may explain why run times get longer each time until they plateau.

Firstly, thanks to everyone for some really great help. I much appreciate your time and expertise in helping me to solve this very strange issue. I have an update.
I started a server-side trace to try and isolate the stored procs that were running slower between iterations. What I found surprised me. 96 stored procedures are involved in the process. Most of these stored procedures ran slower the second time around - about 50 of them. The rest were very quick to run and didn't influence the overall time at all, and in fact some of these ran a little quicker (as would be expected).
I failed over the database instance to another node in my cluster and ran the tests there with the exact same results - so I can rule out any OS differences between cluster nodes - when building the clusters I was very conscious to build them identically.
1100 temp tables get created during the process and persist after it has finished - these are all table variables and I found a way to remove them. Running sp_recompile on every proc and function in the database caused all the temp tables to get cleared up. However this did not improve the run times at all. The only thing that helps the run times is a restart of the SQL Server service. Unfortunately I am out of time now to investigate this further - I have other work to do, but would like to persist with it. Perhaps I will come back to it later if I get a spare few hours. In the meantime however, I have to admit defeat with no solution and no bounty to give.
Thanks again everyone.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight