I'm working on a huge piece of SQL code and unfortunately it has a CURSOR which contains two more nested CURSORs (three cursors in total inside one stored procedure), and it handles millions of rows that need to be DELETEd, UPDATEd and INSERTed. This takes a whole lot of time because of the row-by-row execution, and I wish to convert it to a SET-based approach.
Many articles say the use of CURSORs is not recommended and that the alternative is to use WHILE loops instead. So I tried and replaced the three CURSORs with three WHILE loops, nothing more. Though I get the same result, there is no improvement in performance; it took the same time as it did with the CURSORs.
Below is the basic structure of the code I'm working on (I will try to keep it as simple as possible), with comments describing what each part is supposed to do.
DECLARE @projects TABLE (
    ProjectID INT,
    fieldA INT,
    fieldB INT,
    fieldC INT,
    fieldD INT)

INSERT INTO @projects
SELECT ProjectID, fieldA, fieldB, fieldC, fieldD
FROM ProjectTable

DECLARE projects1 CURSOR LOCAL FOR /*First cursor - fetch each project from ProjectTable*/
    SELECT ProjectID FROM @projects
OPEN projects1
FETCH NEXT FROM projects1 INTO @ProjectID
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        BEGIN TRAN

        DELETE FROM T_PROJECTGROUPSDATA td
        WHERE td.ID = @ProjectID

        DECLARE datasets CURSOR FOR /*Second cursor - get the CollectionDate from datasetsTable for every project fetched by the cursor above*/
            SELECT DataID, GroupID, CollectionDate
            FROM datasetsTable
            WHERE datasetsTable.projectID = @ProjectID /*let's say this fetches ten records for a single ProjectID*/
        OPEN datasets
        FETCH NEXT FROM datasets INTO @DataID, @GroupID, @CollectionDate
        WHILE @@FETCH_STATUS = 0
        BEGIN
            DECLARE period CURSOR FOR /*Third cursor - process the records from another table called T_PERIODS with the @CollectionDate fetched above*/
                SELECT ID, dbo.fn_GetEndOfPeriod(ID)
                FROM T_PERIODS
                WHERE DATEDIFF(dd, @CollectionDate, dbo.fn_GetEndOfPeriod(ID)) >= 0 /*let's say this fetches 20 records for the single @CollectionDate fetched above*/
                ORDER BY [YEAR], [Quarter]
            OPEN period
            FETCH NEXT FROM period INTO @PeriodID, @EndDate
            WHILE @@FETCH_STATUS = 0
            BEGIN
                IF EXISTS (some conditions No - 1)
                BEGIN
                    BREAK
                END

                IF EXISTS (some conditions No - 2)
                BEGIN
                    FETCH NEXT FROM period INTO @PeriodID, @EndDate
                    CONTINUE
                END

                /*get the appropriate ID from the T_UPLOADS table for the current ProjectID and PeriodID fetched*/
                SET @UploadID = (SELECT ID FROM T_UPLOADS u WHERE u.project_savix_ID = @ProjectID AND u.PERIOD_ID = @PeriodID AND u.STATUS = 3)

                /*update some fields in the T_UPLOADS table for the current ProjectID and PeriodID fetched*/
                UPDATE T_UPLOADS
                SET fieldA = mp.fieldA, fieldB = mp.fieldB
                FROM @projects mp
                WHERE T_UPLOADS.ID = @UploadID AND mp.ProjectID = @ProjectID

                /*insert some records into the T_PROJECTGROUPSDATA table for the current ProjectID and PeriodID fetched*/
                INSERT INTO T_PROJECTGROUPSDATA tpd (fieldA, fieldB, fieldC, fieldD, uploadID)
                SELECT fieldA, fieldB, fieldC, fieldD, @UploadID
                FROM @projects
                WHERE tpd.DataID = @DataID

                FETCH NEXT FROM period INTO @PeriodID, @EndDate
            END
            CLOSE period
            DEALLOCATE period

            FETCH NEXT FROM datasets INTO @DataID, @GroupID, @CollectionDate, @Status, @Createdate
        END
        CLOSE datasets
        DEALLOCATE datasets
        COMMIT
    END TRY
    BEGIN CATCH
        /*Error handling*/
        IF @@TRANCOUNT > 0
            ROLLBACK
    END CATCH

    FETCH NEXT FROM projects1 INTO @ProjectID, @FAID
END
CLOSE projects1
DEALLOCATE projects1

SELECT 1 AS success
Please suggest any methods to rewrite this code to follow a SET-based approach.
Until the table structure and sample data with the expected results are provided, here are a few quick things I see that can be improved (some of these have already been mentioned by others above):
A WHILE loop is also a cursor. So, changing it into a WHILE loop is not going to make things any faster.
Use a LOCAL FAST_FORWARD cursor unless you need to backtrack over a record. This would make the execution much faster.
Yes, I agree that a SET-based approach would be the fastest in most cases; however, if you must store an intermediate result set somewhere, I would suggest using a temp table instead of a table variable. A temp table is the 'lesser evil' of these two options. Here are a few reasons why you should try to avoid using a table variable:
Since SQL Server does not have any prior statistics on the table variable while building the execution plan, it will always assume that only one record will be returned by the table variable, and accordingly the storage engine will assign only that much memory for the execution of the query. But in reality there could be millions of records in the table variable during execution. If that happens, SQL Server is forced to spill the data to disk during execution (and you will see lots of PAGEIOLATCH waits in sys.dm_os_wait_stats), making the queries much slower.
One way to get around this issue is to add the statement-level hint OPTION (RECOMPILE) at the end of each query where a table variable is used (a minimal sketch follows this list). This forces SQL Server to construct the execution plan of those queries at runtime, so the under-sized memory grant can be avoided. The downside is that SQL Server will no longer be able to take advantage of an already cached execution plan for that stored procedure and will require a recompilation every time, which degrades performance to some extent. So, unless you know that the data in the underlying table changes frequently or the stored procedure itself is executed infrequently, this approach is generally not recommended by Microsoft MVPs.
Blindly replacing a cursor with a WHILE loop is not a recommended option: it will not improve your performance and might even impact it negatively.
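As a minimal sketch of that hint, the join below is made up purely to show where OPTION (RECOMPILE) goes; @projects is the table variable from the question:

SELECT u.ID, u.fieldA, u.fieldB
FROM T_UPLOADS u
INNER JOIN @projects p ON p.ProjectID = u.project_savix_ID
OPTION (RECOMPILE); -- plan is compiled at runtime using the actual row count of @projects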
When you define a cursor using a plain DECLARE C CURSOR, you are in fact creating a SCROLL cursor, which specifies that all fetch options (FIRST, LAST, PRIOR, NEXT, RELATIVE, ABSOLUTE) are available.
When you only need FETCH NEXT as the scroll option, you can declare the cursor as FAST_FORWARD.
Here is the quote about the FAST_FORWARD cursor from the Microsoft docs:
Specifies that the cursor can only move forward and be scrolled from
the first to the last row. FETCH NEXT is the only supported fetch
option. All insert, update, and delete statements made by the current
user (or committed by other users) that affect rows in the result set
are visible as the rows are fetched. Because the cursor cannot be
scrolled backward, however, changes made to rows in the database after
the row was fetched are not visible through the cursor. Forward-only
cursors are dynamic by default, meaning that all changes are detected
as the current row is processed. This provides faster cursor opening
and enables the result set to display updates made to the underlying
tables. While forward-only cursors do not support backward scrolling,
applications can return to the beginning of the result set by closing
and reopening the cursor.
So you can declare your cursors using DECLARE <CURSOR NAME> FAST_FORWARD FOR ... and you will get a noticeable improvement.
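For example, the outer cursor from the question could simply be declared as:

DECLARE projects1 CURSOR LOCAL FAST_FORWARD FOR
    SELECT ProjectID FROM @projects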
I think all the cursor code above can be simplified to something like this:
DROP TABLE IF EXISTS #Source;
SELECT DISTINCT p.ProjectID,p.fieldA,p.fieldB,p.fieldC,p.fieldD,u.ID AS [UploadID]
INTO #Source
FROM ProjectTable p
INNER JOIN DatasetsTable d ON d.ProjectID = p.ProjectID
INNER JOIN T_PERIODS s ON DATEDIFF(DAY,d.CollectionDate,dbo.fn_GetEndOfPeriod(s.ID)) >= 0
INNER JOIN T_UPLOADS u ON u.project_savix_ID = p.ProjectID AND u.PERIOD_ID = s.ID AND u.STATUS = 3
WHERE NOT EXISTS (some conditions No - 1)
AND NOT EXISTS (some conditions No - 2)
;
UPDATE u SET u.fieldA = s.fieldA, u.fieldB = s.fieldB
FROM T_UPLOADS u
INNER JOIN #Source s ON s.UploadID = u.ID
;
INSERT INTO T_PROJECTGROUPSDATA (fieldA,fieldB,fieldC,fieldD,uploadID)
SELECT DISTINCT s.fieldA,s.fieldB,s.fieldC,s.fieldD,s.UploadID
FROM #Source s
;
DROP TABLE IF EXISTS #Source;
Also, it would be nice to know the details of "some conditions No", as the query can differ depending on them.
Background
This question is a follow-up to a previous question. To give you context here as well, I would like to summarize the previous question: I wanted a methodology for executing selections without sending their results to the client. The goal was to measure performance without eating up a lot of resources by sending millions of rows. I am only interested in the time needed to execute those queries and not in the time needed to send the results to the client app. Since I intend to optimize the queries, their results will not change at all, but the methodology will change, and I want to be able to compare the methodologies.
Current knowledge
In my other question several ideas were presented. One idea was to select the count of the records and put it into a variable; however, that changed the query plan significantly and the results were not accurate in terms of performance. The idea of using a temporary table was presented as well, but creating a temporary table and inserting into it is difficult if we do not know in advance which query we will be measuring, and it also introduces a lot of white noise, so, even though the idea was creative, it was not ideal for my problem. Finally, Vladimir Baranov came up with the idea of creating as many variables as there are columns returned by the selection. This was a great idea, but I refined it further by creating a single variable of nvarchar(max) and selecting all my columns into it. The idea works great, except for a few problems. I have a solution for most of the problems, but I would like to share them, so I will describe them regardless; do not misunderstand me though, I have a single question.
Problem1
If I have a @container variable and I do @container = columnname inside each selection, then I will have conversion problems.
Solution1
Instead of just doing @container = columnname, I need to do @container = cast(columnname as nvarchar(max)).
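A minimal sketch of what the rewritten selection looks like (SomeTable and its columns are made-up names):

DECLARE @container nvarchar(max);

SELECT
    @container = CAST(ColumnA AS nvarchar(max)),
    @container = CAST(ColumnB AS nvarchar(max))
FROM SomeTable; -- every row is processed, but nothing is streamed to the client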
Problem2
I will need to convert <whatever> as something into @container = cast(<whatever> as nvarchar(max)) for each column in the selection, but not for sub-selections, and I will need a general solution that handles CASE WHEN and parentheses; I do not want to have any instances of @container = anywhere except to the left of the columns of the main selection.
Solution2
Since I am clueless about regular expressions, I can solve this by iterating over the query string until I find the FROM of the main query; each time I find an opening parenthesis I do nothing until it is closed, I find the indexes where @container = should be inserted and where as [customname] should be removed, and then I apply all of that to the query string from right to left. This will be long and inelegant code.
Question
Is it possible to make sure that all my main columns, but nothing else, start with @container = and end without as [Customname]?
This is much too long for a comment but I'd like to add my $.02 to the other answers and share the scripts I used to test the suggested methods.
I like @MartinSmith's TOP 0 solution but am concerned that it could result in a different execution plan shape in some cases. I didn't see this in the tests I ran, but I think you'll need to verify the plan is about the same as the unmolested query for each query you test. However, the results of my tests suggest the number of columns and/or data types might skew performance with this method.
The SQLCLR method in @VladimirBaranov's answer should provide the exact plan the app code generates (assuming identical SET options for the test). There will still be some slight overhead (YMMV) from SqlClient consuming results within the SQLCLR proc, but there will be less server overhead with this method compared to returning results back to the calling application.
The SSMS discard results method I suggested in my first comment will incur more overhead than the other methods, but it does include the server-side work SQL Server performs not only in running the query but also in filling buffers for the returned result. Whether or not this additional SQL Server work should be taken into account depends on the purpose of the test. For unit-level performance tests, I prefer to execute tests using the same API as the app code.
I captured server-side performance of these 3 methods with @MartinSmith's original query. The averages over 1,000 iterations on my machine were:
test method cpu_time duration logical_reads
SSMS discard 53031.000000 55358.844000 7190.512000
TOP 0 52374.000000 52432.936000 7190.527000
SQLCLR 49110.000000 48838.532000 7190.578000
I did the same with a trivial query returning 10,000 rows and 2 columns (int and nvarchar(100)) from a user table:
test method cpu_time duration logical_reads
SSMS discard 4204.000000 9245.426000 402.004000
TOP 0 2641.000000 2752.695000 402.008000
SQLCLR 1921.000000 1878.579000 402.000000
Repeating the same test but with a varchar(100) column instead of nvarchar(100):
test method cpu_time duration logical_reads
SSMS discard 3078.000000 5901.023000 402.004000
TOP 0 2672.000000 2616.359000 402.008000
SQLCLR 1750.000000 1798.098000 402.000000
Below are the scripts I used for testing:
Source code for the SQLCLR proc like @VladimirBaranov suggested:
public static void ExecuteNonQuery(string sql)
{
    // run the batch over the context connection and discard any results
    using (var connection = new SqlConnection("Context Connection=true"))
    using (var command = new SqlCommand(sql, connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
    }
}
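For completeness, the compiled assembly then has to be registered and exposed as the dbo.ExecuteNonQuery procedure used in the test script below; roughly like this (the assembly path and class name are assumptions, since they are not shown above):

CREATE ASSEMBLY TestHarness
FROM 'C:\clr\TestHarness.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE PROCEDURE dbo.ExecuteNonQuery
    @sql nvarchar(max)
AS EXTERNAL NAME TestHarness.StoredProcedures.ExecuteNonQuery;
GO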
Extended Events (XE) trace to capture the actual server-side timings and resource usage:
CREATE EVENT SESSION [test] ON SERVER
ADD EVENT sqlserver.sql_batch_completed(SET collect_batch_text=(1))
ADD TARGET package0.event_file(SET filename=N'QueryTimes')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF);
GO
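The session then just needs to be started before the tests run and stopped afterwards to flush the file target:

ALTER EVENT SESSION [test] ON SERVER STATE = START;
GO
-- ... run the test batches ...
ALTER EVENT SESSION [test] ON SERVER STATE = STOP;
GO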
User table create and load:
CREATE TABLE dbo.Foo(
FooID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY
, Bar1 nvarchar(100)
, Bar2 varchar(100)
);
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t10k AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c CROSS JOIN t10 AS d)
INSERT INTO dbo.Foo WITH (TABLOCKX)
SELECT num, REPLICATE(N'X', 100), REPLICATE('X', 100)
FROM t10k;
GO
SQL script run from SSMS with the discard results query option to run 1000 iterations of the test with the 3 different methods:
SET NOCOUNT ON;
GO
--return and discard results
SELECT v.*,
o.name
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 v.*,
o.name
FOR XML PATH(''))
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT v.*,
o.name
FROM master..spt_values AS v
JOIN sys.objects o
ON o.object_id % NULLIF(v.number, 0) = 0;
'
GO 1000
--return and discard results
SELECT FooID, Bar1
FROM dbo.Foo;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 FooID, Bar1
FOR XML PATH(''))
FROM dbo.Foo;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar1
FROM dbo.Foo
';
GO 1000
--return and discard results
SELECT FooID, Bar2
FROM dbo.Foo;
GO 1000
--TOP 0
DECLARE @X NVARCHAR(MAX);
SELECT @X = (SELECT TOP 0 FooID, Bar2
FOR XML PATH(''))
FROM dbo.Foo;
GO 1000
--SQLCLR ExecuteNonQuery
EXEC dbo.ExecuteNonQuery @sql = N'
SELECT FooID, Bar2
FROM dbo.Foo
';
GO 1000
I would try to write a single CLR function that runs as many queries as you need to measure. It could have a parameter with the text(s) of the queries to run, or the names of stored procedures to run.
You have a single request to the server. Everything is done locally on the server. No network overhead. You discard the query results in the .NET CLR code, without using explicit temp tables, by using ExecuteNonQuery for each query that you need to measure.
Don't change the query that you are measuring. The optimizer is complex, and changes to the query may have various effects on performance.
Also, use SET STATISTICS TIME ON and let the server measure the time for you. Fetch what the server has to say, parse it, and send it back in the format that suits you.
I think the results of SET STATISTICS TIME ON / OFF are the most reliable and accurate and have the least amount of noise.
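The basic pattern is simply this (the query in the middle is just a placeholder for whatever is being measured):

SET STATISTICS TIME ON;

SELECT FooID, Bar1 FROM dbo.Foo; -- the query under test (any query goes here)

SET STATISTICS TIME OFF;
-- CPU time and elapsed time for parse/compile and execution are returned as
-- informational messages, which the calling code can capture and parse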
I am working on a client's database and there are about 1 million rows that need to be deleted due to a bug in the software. Is there an efficient way to delete them besides:
DELETE FROM table_1 where condition1 = 'value' ?
Here is a structure for a batched delete, as suggested above. Do not try to delete 1M rows in one go...
The size of the batch and the WAITFOR delay are obviously quite variable and depend on your server's capabilities, as well as your need to mitigate contention. You may need to manually delete some rows, measure how long they take, and adjust your batch size to something your server can handle. As mentioned above, anything over 5000 rows can cause lock escalation (which I was not aware of).
This would be best done after hours... but 1M rows is really not a lot for SQL Server to handle. If you watch your Messages tab in SSMS, it may take a while for the print output to show up, but it will appear after several batches; just be aware it won't update in real time.
Edit: Added a stop time, @MAXRUNTIME and @BSTOPATMAXTIME. If you set @BSTOPATMAXTIME to 1, the script will stop on its own at the desired time, say 8:00 AM. This way you can schedule it nightly to start at, say, midnight, and it will stop before production hours at 8 AM.
Edit: This answer is pretty popular, so I have added RAISERROR in lieu of PRINT, per the comments.
DECLARE @BATCHSIZE INT, @WAITFORVAL VARCHAR(8), @ITERATION INT, @TOTALROWS INT, @MAXRUNTIME VARCHAR(8), @BSTOPATMAXTIME BIT, @MSG VARCHAR(500)

SET DEADLOCK_PRIORITY LOW;

SET @BATCHSIZE = 4000
SET @WAITFORVAL = '00:00:10'
SET @MAXRUNTIME = '08:00:00' -- 8AM
SET @BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET @ITERATION = 0 -- LEAVE THIS
SET @TOTALROWS = 0 -- LEAVE THIS

WHILE @BATCHSIZE > 0
BEGIN
    -- IF @BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
    IF CONVERT(VARCHAR(8), GETDATE(), 108) >= @MAXRUNTIME AND @BSTOPATMAXTIME = 1
    BEGIN
        RETURN
    END

    DELETE TOP (@BATCHSIZE)
    FROM SOMETABLE
    WHERE 1=2 -- replace with your delete condition

    SET @BATCHSIZE = @@ROWCOUNT
    SET @ITERATION = @ITERATION + 1
    SET @TOTALROWS = @TOTALROWS + @BATCHSIZE
    SET @MSG = 'Iteration: ' + CAST(@ITERATION AS VARCHAR) + ' Total deletes:' + CAST(@TOTALROWS AS VARCHAR)

    RAISERROR (@MSG, 0, 1) WITH NOWAIT
    WAITFOR DELAY @WAITFORVAL
END
BEGIN TRANSACTION
DoAgain:
DELETE TOP (1000)
FROM <YourTable>
IF @@ROWCOUNT > 0
GOTO DoAgain
COMMIT TRANSACTION
Maybe this solution from Uri Dimant will help:
WHILE 1 = 1
BEGIN
DELETE TOP(2000)
FROM Foo
WHERE <predicate>;
IF @@ROWCOUNT < 2000 BREAK;
END
(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)
Here is something I have used:
If the bad data is mixed in with the good:
INSERT INTO #table
SELECT columns
FROM old_table
WHERE statement to exclude bad rows
TRUNCATE old_table
INSERT INTO old_table
SELECT columns FROM #table
Not sure how good this would be, but what about doing it like below (provided table_1 is a standalone table, i.e. not referenced by any other table):
create a duplicate of table_1, say table_1_dup
INSERT INTO table_1_dup SELECT * FROM table_1 WHERE condition1 <> 'value';
DROP TABLE table_1;
EXEC sp_rename 'table_1_dup', 'table_1';
If you cannot afford to get the database out of production while repairing, do it in small batches. See also: How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table
If you are in a hurry and need the fastest way possible:
take the database out of production
drop all non-clustered indexes and triggers
delete the records (or if the majority of records is bad, copy+drop+rename the table)
(if applicable) fix the inconsistencies caused by the fact that you dropped triggers
re-create the indexes and triggers
bring the database back in production
Using SQL 2005 / 2008
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
Here is the example using cursor:
DECLARE @VisitorID int
DECLARE @FirstName varchar(30), @LastName varchar(30)

-- declare cursor called ActiveVisitorCursor
DECLARE ActiveVisitorCursor CURSOR FOR
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1

-- Open the cursor
OPEN ActiveVisitorCursor

-- Fetch the first row of the cursor and assign its values into variables
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName

-- perform action whilst a row was found
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC MyCallingStoredProc @VisitorID, @FirstName, @LastName

    -- get next row of cursor
    FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
END

-- Close the cursor to release locks
CLOSE ActiveVisitorCursor

-- Free memory used by cursor
DEALLOCATE ActiveVisitorCursor
Now here is an example of how we can get the same result without using a cursor:
/* Here is an alternative approach */
-- Create a temporary table, note the IDENTITY
-- column that will be used to loop through
-- the rows of this table
CREATE TABLE #ActiveVisitors (
RowID int IDENTITY(1, 1),
VisitorID int,
FirstName varchar(30),
LastName varchar(30)
)
DECLARE @NumberRecords int, @RowCounter int
DECLARE @VisitorID int, @FirstName varchar(30), @LastName varchar(30)
-- Insert the resultset we want to loop through
-- into the temporary table
INSERT INTO #ActiveVisitors (VisitorID, FirstName, LastName)
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Get the number of records in the temporary table
SET @NumberRecords = @@ROWCOUNT
--You can also use: SELECT @NumberRecords = COUNT(*) FROM #ActiveVisitors
SET @RowCounter = 1
-- loop through all records in the temporary table
-- using the WHILE loop construct
WHILE @RowCounter <= @NumberRecords
BEGIN
SELECT @VisitorID = VisitorID, @FirstName = FirstName, @LastName = LastName
FROM #ActiveVisitors
WHERE RowID = @RowCounter
EXEC MyCallingStoredProc @VisitorID, @FirstName, @LastName
SET @RowCounter = @RowCounter + 1
END
-- drop the temporary table
DROP TABLE #ActiveVisitors
"NEVER use Cursors" is a wonderful example of how damaging simple rules can be. Yes, they are easy to communicate, but when we remove the reason for the rule so that we can have an "easy to follow" rule, then most people will just blindly follow the rule without thinking about it, even if following the rule has a negative impact.
Cursors, at least in SQL Server / T-SQL, are greatly misunderstood. It is not accurate to say "Cursors affect performance of SQL". They certainly have a tendency to, but a lot of that has to do with how people use them. When used properly, Cursors are faster, more efficient, and less error-prone than WHILE loops (yes, this is true and has been proven over and over again, regardless of who argues "cursors are evil").
First option is to try to find a set-based approach to the problem.
If logically there is no set-based approach (e.g. needing to call EXEC per each row), and the query for the Cursor is hitting real (non-Temp) Tables, then use the STATIC keyword which will put the results of the SELECT statement into an internal Temporary Table, and hence will not lock the base-tables of the query as you iterate through the results. By default, Cursors are "sensitive" to changes in the underlying Tables of the query and will verify that those records still exist as you call FETCH NEXT (hence a large part of why Cursors are often viewed as being slow). Using STATIC will not help if you need to be sensitive of records that might disappear while processing the result set, but that is a moot point if you are considering converting to a WHILE loop against a Temp Table (since that will also not know of changes to underlying data).
If the query for the cursor is only selecting from temporary tables and/or table variables, then you don't need to prevent locking as you don't have concurrency issues in those cases, in which case you should use FAST_FORWARD instead of STATIC.
I think it also helps to specify the three options of LOCAL READ_ONLY FORWARD_ONLY, unless you specifically need a cursor that is not one or more of those. But I have not tested them to see if they improve performance.
Assuming that the operation is not eligible for being made set-based, then the following options are a good starting point for most operations:
DECLARE [Thing1] CURSOR LOCAL READ_ONLY FORWARD_ONLY STATIC
FOR SELECT columns
FROM Schema.ReadTable(s);
DECLARE [Thing2] CURSOR LOCAL READ_ONLY FORWARD_ONLY FAST_FORWARD
FOR SELECT columns
FROM #TempTable(s) and/or @TableVariables;
You can do a WHILE loop; however, you should seek to achieve a more set-based operation, as anything iterative in SQL is subject to performance issues.
http://msdn.microsoft.com/en-us/library/ms178642.aspx
Common Table Expressions would be a good alternative, as @Neil suggested. Here's an example from AdventureWorks:
WITH cte_PO AS
(
SELECT [LineTotal]
,[ModifiedDate]
FROM [AdventureWorks].[Purchasing].[PurchaseOrderDetail]
),
minmax AS
(
SELECT MIN([LineTotal]) as DayMin
,MAX([LineTotal]) as DayMax
,[ModifiedDate]
FROM cte_PO
GROUP BY [ModifiedDate]
)
SELECT * FROM minmax ORDER BY ModifiedDate
Here are the top few lines of what it returns:
DayMin DayMax ModifiedDate
135.36 8847.30 2001-05-24 00:00:00.000
129.8115 25334.925 2001-06-07 00:00:00.000
Recursive Queries using Common Table Expressions.
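As a small illustrative example, a recursive CTE can replace a simple counting loop entirely within one statement:

WITH Numbers AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM Numbers
    WHERE n < 100
)
SELECT n
FROM Numbers
OPTION (MAXRECURSION 100);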
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
This depends on what you do with the cursor.
Almost everything can be rewritten using set-based operations, in which case the loops are performed inside the query plan and, since they involve no context switches, are much faster.
However, there are some things SQL Server is just not good at, like computing cumulative values or joining on date ranges.
These kinds of queries can be made faster using a CURSOR:
Flattening timespans: SQL Server
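For illustration, a cursor-based running total looks roughly like this (dbo.Ledger and its columns are made-up names):

DECLARE @ID int, @Value decimal(18,2), @RunningTotal decimal(18,2);
DECLARE @Totals TABLE (ID int, RunningTotal decimal(18,2));
SET @RunningTotal = 0;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID, Value FROM dbo.Ledger ORDER BY ID;

OPEN c;
FETCH NEXT FROM c INTO @ID, @Value;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @Value;
    INSERT INTO @Totals (ID, RunningTotal) VALUES (@ID, @RunningTotal);
    FETCH NEXT FROM c INTO @ID, @Value;
END
CLOSE c;
DEALLOCATE c;

SELECT ID, RunningTotal FROM @Totals;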
But again, this is quite a rare exception, and normally a set-based approach performs better.
If you posted your query, we could probably optimize it and get rid of a CURSOR.
Depending on what you want it for, you may be able to use a tally table.
Jeff Moden has an excellent article on tally tables Here
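As a rough sketch of the idea, a tally table of sequential numbers lets you replace a row-by-row loop with a single set-based query (the table and date range below are just examples):

-- one-time build of a tally table holding the numbers 1 to 10,000
SELECT TOP (10000) IDENTITY(int, 1, 1) AS N
INTO dbo.Tally
FROM sys.all_columns a CROSS JOIN sys.all_columns b;

-- e.g. expand a date range into one row per day, with no WHILE loop
SELECT DATEADD(DAY, t.N - 1, '20200101') AS TheDate
FROM dbo.Tally t
WHERE t.N <= DATEDIFF(DAY, '20200101', '20200331') + 1;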
Don't use a cursor, instead look for a set-based solution. If you can't find a set-based solution... still don't use a cursor! Post details of what you are trying to achieve, someone will be able to find a set-based solution for you.
There may be some scenarios where one can use tally tables. They can be a good alternative to loops and cursors, but remember that they cannot be applied in every case. A well-explained case can be found here.
On SQL Server 2008, I have a slow-running update query that has been running for 3 hours. Is there any way to get any statistics about this execution while it is running (say, how many rows have changed so far, or anything else)? So that I can decide between canceling the query to optimize it, or letting it finish.
Or else, should I have done something before execution? If so, what can be done before executing an update query to get information about it while it is running (of course, without affecting performance too much)?
An update for clarification:
The update statement mentioned in the question is, for example:
UPDATE myTable SET col_A = 'value1' WHERE col_B = 'value2'
In other words, it is a single query that updates the whole table.
What can you do now?
You could try running a separate query using the WITH(NOLOCK) table hint to see how many rows have been updated.
e.g. if the update statement is:
UPDATE MyTable
SET MyField = 'UPDATEDVALUE'
WHERE ID BETWEEN 1 AND 10000000
You could run this:
SELECT COUNT(*)
FROM MyTable WITH (NOLOCK)
WHERE ID BETWEEN 1 AND 10000000
AND MyField = 'UPDATEDVALUE'
What can you do in future?
You could do the update in batches and output progress as it goes, e.g. use a loop to update the records in chunks of, say, 1000 (an arbitrary value for the purpose of explanation). After each chunk completes, print out the progress (assuming you are running it from SSMS), e.g.:
DECLARE @RowCount INTEGER
SET @RowCount = 1
DECLARE @Message VARCHAR(500)
DECLARE @TotalRowsUpdated INTEGER
SET @TotalRowsUpdated = 0

WHILE (@RowCount > 0)
BEGIN
    UPDATE TOP (1000) MyTable
    SET MyField = 'UPDATEDVALUE'
    WHERE ID BETWEEN 1 AND 10000000
    AND MyField <> 'UPDATEDVALUE'

    SELECT @RowCount = @@ROWCOUNT
    SET @TotalRowsUpdated = @TotalRowsUpdated + @RowCount
    SELECT @Message = CAST(GETDATE() AS VARCHAR) + ' : ' + CAST(@TotalRowsUpdated AS VARCHAR) + ' records updated in total'
    RAISERROR (@Message, 0, 1) WITH NOWAIT
END
Using RAISERROR like this ensures progress messages are printed out immediately. If you used PRINT instead, the messages would be buffered and not output immediately, so you wouldn't see real-time progress.
From http://www.sqlnewsgroups.net/group/microsoft.public.sqlserver.server/topic19776.aspx
"You should be able to do a dirty read to produce a rowcount of rows
satisfying the update clause (assuming that value wasn't already used)."
If you set your query to use READ UNCOMMITTED, you should be able to see rows that have been updated by your statement with a SELECT statement having the proper criteria. I'm no expert on this... so it's just my guess.
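A minimal sketch of that approach, using the same example table as above:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT COUNT(*)
FROM MyTable
WHERE ID BETWEEN 1 AND 10000000
  AND MyField = 'UPDATEDVALUE'; -- counts rows already touched by the in-flight update (dirty read)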
It would really help us if you posted the query so that we can look at it and try to help you.
One way to see what is going on is to use SQL Server Profiler to trace the updates/inserts.
Without knowing exactly what you're doing I can't say for sure, but if you put the update in a procedure and use the PRINT command to output status updates at specific steps, these messages are output during the procedure's run time in the Messages tab next to the Results tab.