Using SQL Server 2005 / 2008
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
Here is an example using a cursor:
DECLARE @VisitorID int
DECLARE @FirstName varchar(30), @LastName varchar(30)
-- declare cursor called ActiveVisitorCursor
DECLARE ActiveVisitorCursor CURSOR FOR
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Open the cursor
OPEN ActiveVisitorCursor
-- Fetch the first row of the cursor and assign its values into variables
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
-- perform an action while a row was found
WHILE @@FETCH_STATUS = 0
BEGIN
EXEC MyCallingStoredProc @VisitorID, @FirstName, @LastName
-- get next row of cursor
FETCH NEXT FROM ActiveVisitorCursor INTO @VisitorID, @FirstName, @LastName
END
-- Close the cursor to release locks
CLOSE ActiveVisitorCursor
-- Free memory used by cursor
DEALLOCATE ActiveVisitorCursor
And here is an example of how we can get the same result without using a cursor:
/* Here is an alternative approach */
-- Create a temporary table; note the IDENTITY
-- column that will be used to loop through
-- the rows of this table
CREATE TABLE #ActiveVisitors (
RowID int IDENTITY(1, 1),
VisitorID int,
FirstName varchar(30),
LastName varchar(30)
)
DECLARE @NumberRecords int, @RowCounter int
DECLARE @VisitorID int, @FirstName varchar(30), @LastName varchar(30)
-- Insert the resultset we want to loop through
-- into the temporary table
INSERT INTO #ActiveVisitors (VisitorID, FirstName, LastName)
SELECT VisitorID, FirstName, LastName
FROM Visitors
WHERE Active = 1
-- Get the number of records in the temporary table
SET @NumberRecords = @@ROWCOUNT
-- You can also use: SET @NumberRecords = (SELECT COUNT(*) FROM #ActiveVisitors)
SET @RowCounter = 1
-- loop through all records in the temporary table
-- using the WHILE loop construct
WHILE @RowCounter <= @NumberRecords
BEGIN
SELECT @VisitorID = VisitorID, @FirstName = FirstName, @LastName = LastName
FROM #ActiveVisitors
WHERE RowID = @RowCounter
EXEC MyCallingStoredProc @VisitorID, @FirstName, @LastName
SET @RowCounter = @RowCounter + 1
END
-- drop the temporary table
DROP TABLE #ActiveVisitors
"NEVER use Cursors" is a wonderful example of how damaging simple rules can be. Yes, they are easy to communicate, but when we remove the reason for the rule so that we can have an "easy to follow" rule, then most people will just blindly follow the rule without thinking about it, even if following the rule has a negative impact.
Cursors, at least in SQL Server / T-SQL, are greatly misunderstood. It is not accurate to say "Cursors affect performance of SQL". They certainly have a tendency to, but a lot of that has to do with how people use them. When used properly, Cursors are faster, more efficient, and less error-prone than WHILE loops (yes, this is true and has been proven over and over again, regardless of who argues "cursors are evil").
The first option is to try to find a set-based approach to the problem.
If logically there is no set-based approach (e.g. needing to call EXEC per each row), and the query for the Cursor is hitting real (non-Temp) Tables, then use the STATIC keyword which will put the results of the SELECT statement into an internal Temporary Table, and hence will not lock the base-tables of the query as you iterate through the results. By default, Cursors are "sensitive" to changes in the underlying Tables of the query and will verify that those records still exist as you call FETCH NEXT (hence a large part of why Cursors are often viewed as being slow). Using STATIC will not help if you need to be sensitive of records that might disappear while processing the result set, but that is a moot point if you are considering converting to a WHILE loop against a Temp Table (since that will also not know of changes to underlying data).
If the query for the cursor is only selecting from temporary tables and/or table variables, then you don't need to prevent locking as you don't have concurrency issues in those cases, in which case you should use FAST_FORWARD instead of STATIC.
I think it also helps to specify the three options of LOCAL READ_ONLY FORWARD_ONLY, unless you specifically need a cursor that is not one or more of those. But I have not tested them to see if they improve performance.
Assuming that the operation is not eligible for being made set-based, then the following options are a good starting point for most operations:
DECLARE [Thing1] CURSOR LOCAL READ_ONLY FORWARD_ONLY STATIC
FOR SELECT columns
FROM Schema.ReadTable(s);
DECLARE [Thing2] CURSOR LOCAL READ_ONLY FORWARD_ONLY FAST_FORWARD
FOR SELECT columns
FROM #TempTable(s) and/or @TableVariable(s);
You can do a WHILE loop; however, you should seek to achieve a more set-based operation, as anything iterative in SQL is subject to performance issues.
http://msdn.microsoft.com/en-us/library/ms178642.aspx
Common Table Expressions would be a good alternative as @Neil suggested. Here's an example from AdventureWorks:
WITH cte_PO AS
(
SELECT [LineTotal]
,[ModifiedDate]
FROM [AdventureWorks].[Purchasing].[PurchaseOrderDetail]
),
minmax AS
(
SELECT MIN([LineTotal]) as DayMin
,MAX([LineTotal]) as DayMax
,[ModifiedDate]
FROM cte_PO
GROUP BY [ModifiedDate]
)
SELECT * FROM minmax ORDER BY ModifiedDate
Here are the top few lines of what it returns:
DayMin DayMax ModifiedDate
135.36 8847.30 2001-05-24 00:00:00.000
129.8115 25334.925 2001-06-07 00:00:00.000
Recursive Queries using Common Table Expressions.
I have to use a forward cursor, but I don't want to suffer poor performance. Is there a faster way I can loop without using cursors?
This depends on what you do with the cursor.
Almost everything can be rewritten using set-based operations, in which case the loops are performed inside the query plan and, since they involve no context switches, are much faster.
However, there are some things SQL Server is just not good at, like computing cumulative values or joining on date ranges.
These kinds of queries can be made faster using a CURSOR:
Flattening timespans: SQL Server
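The link above covers the date-range case; to make the cumulative-value case concrete, here is a minimal sketch (the dbo.Ledger table and its columns are hypothetical). On SQL Server 2005/2008 there is no SUM(...) OVER (ORDER BY ...), so the set-based alternatives are a triangular self-join or a correlated subquery, both of which re-scan ever larger portions of the table, while a single forward pass with a cursor reads each row once:
-- Running total in one linear pass (hypothetical dbo.Ledger table)
DECLARE @EntryDate datetime, @Amount money, @RunningTotal money;
SET @RunningTotal = 0;
DECLARE ledger_cursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR SELECT EntryDate, Amount
FROM dbo.Ledger
ORDER BY EntryDate;
OPEN ledger_cursor;
FETCH NEXT FROM ledger_cursor INTO @EntryDate, @Amount;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @Amount;
-- emit or store @RunningTotal here, e.g. INSERT into a results table
FETCH NEXT FROM ledger_cursor INTO @EntryDate, @Amount;
END
CLOSE ledger_cursor;
DEALLOCATE ledger_cursor;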
But again, this is quite a rare exception, and normally a set-based approach performs better.
If you posted your query, we could probably optimize it and get rid of a CURSOR.
Depending on what you want it for, you may be able to use a tally table.
Jeff Moden has an excellent article on tally tables Here
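To give a flavour of the technique (a sketch only; the table and the date-range example below are illustrative, not from the article): a tally table is simply a table of sequential integers that you join against instead of looping. For example, producing one row per day in a date range with no loop at all:
-- Build a numbers (tally) table once
CREATE TABLE dbo.Tally (N int NOT NULL PRIMARY KEY);
INSERT INTO dbo.Tally (N)
SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b;
-- One row per day in a range, as a single set-based query
DECLARE @StartDate datetime, @EndDate datetime;
SET @StartDate = '20080101';
SET @EndDate = '20081231';
SELECT DATEADD(day, N - 1, @StartDate) AS TheDate
FROM dbo.Tally
WHERE N <= DATEDIFF(day, @StartDate, @EndDate) + 1;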
Don't use a cursor, instead look for a set-based solution. If you can't find a set-based solution... still don't use a cursor! Post details of what you are trying to achieve, someone will be able to find a set-based solution for you.
There may be some scenarios where one can use tally tables. They can be a good alternative to loops and cursors, but remember that they cannot be applied in every case. A well-explained case can be found here.
I have the following IF statement in my stored procedure:
IF @parameter2 IS NOT NULL
BEGIN
-- delete the existing data
DELETE FROM #tab
-- fetch data to #tab
INSERT INTO #tab EXECUTE sp_executesql @getValueSql, N'@parameter nvarchar(MAX)', @parameter2
SET @value2 = (SELECT * FROM #tab)
IF @value2 = @parameter2
RETURN 5
END
ELSE
RETURN 5
This is to check whether the @parameter2 value already exists in the database. The trouble I have is that I have to do this for up to 10 parameters. I am wondering if it would be faster to just copy the statement and repeat the code for all the possible parameters. That would mean I have 10 almost identical IF statements. The other option I see is inserting all the parameters into #tab and looping through them with a CURSOR or WHILE. I am concerned about the speed of looping because, as far as I know, loops are pretty slow.
As mentioned in comments, it would be about the same.
Indeed, cursors are slow. But the reasons they are slow are not avoided by typing a multitude of IFs into your code. You are essentially using a cursor in an "unrolled" form; the problems stay the same: all the execution plans of your procedure will be re-created, and the optimizer will not have the chance to use a better, set-based plan.
So, your options are:
Either use a cursor or a multitude of IFs. For the sake of readability and ease, I would recommend the cursor. The performance will be the same.
If possible, re-write the code of @getValueSql to operate not on one value of @parameterN, but rather on a parameter table with N rows, one for each parameter value you're interested in. This is probably hard or impossible to do, but it is the only way to increase performance; see the sketch below.
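With a SQL Server 2008 table-valued parameter, that could look something like this (all names here are hypothetical, and it assumes you can rewrite the query behind @getValueSql):
-- One-time setup: a table type that carries all the parameter values at once
CREATE TYPE dbo.ParameterList AS TABLE (Value nvarchar(4000) NOT NULL);
GO
CREATE PROCEDURE dbo.CheckValuesExist
@Params dbo.ParameterList READONLY
AS
BEGIN
SET NOCOUNT ON;
-- One set-based probe instead of 10 separate IF blocks
SELECT p.Value
FROM @Params AS p
WHERE EXISTS (SELECT 1 FROM dbo.SomeTable AS t WHERE t.SomeColumn = p.Value);
END
The caller fills the table type once (from ADO.NET, a DataTable maps onto it directly) and makes a single call, so the optimizer sees one set-based query instead of N lookups.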
However, I should also mention that the cursor drawbacks won't be very noticeable over just 10 iterations, except maybe if you have exceptionally complex and nested queries. Remember, "premature optimization is the root of all evil". Most probably you don't need to worry about this.
I observed a strange thing inside a stored procedure with a SELECT on a table variable. It always returns the value (on subsequent iterations) that was fetched in the first iteration of the cursor. Here is some sample code that demonstrates this.
DECLARE @id AS INT;
DECLARE @outid AS INT;
DECLARE sub_cursor CURSOR FAST_FORWARD
FOR SELECT [TestColumn]
FROM testtable1;
OPEN sub_cursor;
FETCH NEXT FROM sub_cursor INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
DECLARE @Log TABLE (LogId BIGINT NOT NULL);
PRINT 'id: ' + CONVERT (VARCHAR (10), @id);
INSERT INTO Testtable2 (TestColumn)
OUTPUT inserted.[TestColumn] INTO @Log
VALUES (@id);
IF @@ERROR = 0
BEGIN
SELECT TOP 1 @outid = LogId
FROM @Log;
PRINT 'Outid: ' + CONVERT (VARCHAR (10), @outid);
INSERT INTO [dbo].[TestTable3] ([TestColumn])
VALUES (@outid);
END
FETCH NEXT FROM sub_cursor INTO @id;
END
CLOSE sub_cursor;
DEALLOCATE sub_cursor;
However, while I was posting the code on SO and trying various combinations, I observed that removing TOP from the line below gives me the right values out of the table variable inside the cursor:
SELECT TOP 1 @outid = LogId FROM @Log;
which would make it like this:
SELECT @outid = LogId FROM @Log;
I am not sure what is happening here. I thought TOP 1 on a table variable should work, thinking that a new table is created on every iteration of the loop. Can someone throw light on table variable scoping and lifetime?
Update: I have the solution to circumvent the strange behavior here.
As a solution, I have declared the table at the top, before the loop, and I delete all its rows at the beginning of each iteration.
There are numerous things a bit off with this code.
First off, you roll back your embedded transaction on error, but I never see you commit it on success. As written, this will leak a transaction, which could cause major issues for you in the following code.
What might be confusing you about the @Log table situation is that SQL Server doesn't use the same variable scoping and lifetime rules as C++ or other standard programming languages. Even when declaring your table variable in the cursor block you will only get a single @Log table, which then lives for the remainder of the batch, and which gets multiple rows inserted into it.
As a result, your use of TOP 1 is not really meaningful, since there's no ORDER BY clause to impose any sort of deterministic ordering on the table. Without that, you get whatever order SQL Server sees fit to give you, which in this case appears to be the insertion order, giving you the first inserted element of that log table every time you run the SELECT.
If you truly want only the last ID value, you will need to provide some real ordering criterion for your @Log table -- some form of autonumber or date field alongside the data column that can be used to provide the proper ordering for what you want to do.
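A minimal sketch of that fix, combining the ordering column with the poster's own workaround of declaring the table variable once, before the loop, and clearing it on each iteration:
-- Declared once, before OPEN sub_cursor; Seq gives a deterministic order
DECLARE @Log TABLE (Seq int IDENTITY(1, 1), LogId BIGINT NOT NULL);
-- Inside the loop, in place of the original INSERT / SELECT:
DELETE FROM @Log; -- table variables cannot be truncated
INSERT INTO Testtable2 (TestColumn)
OUTPUT inserted.[TestColumn] INTO @Log (LogId)
VALUES (@id);
SELECT TOP 1 @outid = LogId
FROM @Log
ORDER BY Seq DESC; -- deterministically the most recently inserted row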
I'm getting started with my first use of a cursor in a stored procedure in SQL Server 2008. I've done some preliminary reading and I understand that they have significant performance limitations. In my current case I think they're necessary (I want to run multiple stored procedures for each stock symbol in a symbols table).
Edit:
The sprocs I'll be calling on each symbol will for the most part be insert operations to calculate symbol-dependent values, such as the 5-day moving average, average daily volume, and ATR (average true range). Most of these values will be calculated from data in a daily pricing and volume table. I'd like to streamline the retrieval of values that would otherwise be retrieved redundantly... for example, I'd like to get each symbol's daily pricing and volume data into a table variable; that temp table will then be passed to the stored procedure that calls each of the aggregate functions I just mentioned. Hope that makes sense...
So my initial "outer loop" cursor- based stored procedure is below.. it times out after several minutes, without returning anything to the output window.
ALTER PROCEDURE dbo.sprocSymbolDependentAggsDriver2
AS
DECLARE @symbol nchar(10)
DECLARE symbolCursor CURSOR
STATIC FOR
SELECT Symbol FROM tblSymbolsMain ORDER BY Symbol
OPEN symbolCursor
FETCH NEXT FROM symbolCursor INTO @symbol
WHILE @@FETCH_STATUS = 0
SET @symbol = @symbol + ': Test.'
FETCH NEXT FROM symbolCursor INTO @symbol
CLOSE symbolCursor
DEALLOCATE symbolCursor
When I run it without the @symbol local variable and eliminate the assignment to it in the WHILE loop, it seems to run OK. Is there a clear violation of performance best practices within that assignment? Thanks.
"In my current case I think they're necessary (I want to run multiple
stored procedures for each stock symbol in a symbols table."
Cursors are rarely necessary.
From your example above, I think a simple WHILE loop will easily take the place of your cursor. Adapted from SQL Cursors - How to avoid them (one of my favorite SQL bookmarks)
-- Create a temporary table...
CREATE TABLE #Symbols (
RowID int IDENTITY(1, 1),
Symbol nvarchar(max)
)
DECLARE @NumberRecords int, @RowCount int
DECLARE @Symbol nvarchar(max)
-- Get your data that you want to loop over
INSERT INTO #Symbols (Symbol)
SELECT Symbol
FROM tblSymbolsMain
ORDER BY Symbol
-- Get the number of records you just grabbed
SET @NumberRecords = @@ROWCOUNT
SET @RowCount = 1
-- Just do a WHILE loop. No cursor necessary.
WHILE @RowCount <= @NumberRecords
BEGIN
SELECT @Symbol = Symbol
FROM #Symbols
WHERE RowID = @RowCount
EXEC <myProc1> @Symbol
EXEC <myProc2> @Symbol
EXEC <myProc3> @Symbol
SET @RowCount = @RowCount + 1
END
DROP TABLE #Symbols
You don't really need all that explicit cursor jazz to build a string. Here is probably a more efficient way to do it:
DECLARE @symbol NVARCHAR(MAX) = N'';
SELECT @symbol += ': Test.'
FROM dbo.tblSymbolsMain
ORDER BY Symbol;
Though I suspect you actually wanted to see the names of the symbol, e.g.
DECLARE @symbol NVARCHAR(MAX) = N'';
SELECT @symbol += N':' + Symbol
FROM dbo.tblSymbolsMain
ORDER BY Symbol;
One caveat is that while you will typically observe the rows being concatenated in order, the order is not actually guaranteed. So if you want to stick with the cursor, at least declare it as follows:
DECLARE symbolCursor CURSOR
LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR
...
Also it seems to me like NCHAR(10) is not sufficient to hold the data you're trying to stuff into it, unless you only have one row (which is why I chose NVARCHAR(MAX) above).
And I agree with Abe... it is quite possible you don't need to fire a stored procedure for every row in the cursor, but to suggest ways around that (which will almost certainly be more efficient), we'd have to understand what those stored procedures actually do.
You need a BEGIN ... END here:
WHILE @@FETCH_STATUS = 0 BEGIN
SET @symbol = @symbol + ': Test.'
FETCH NEXT FROM symbolCursor INTO @symbol
END
Also try DECLARE symbolCursor CURSOR LOCAL READ_ONLY FORWARD_ONLY instead of STATIC to improve performance.
After reading all the suggestions, I ended up doing an old trick and it worked miracles!
I had a cursor which was taking almost 3 minutes to run, while the query inside it ran instantly. I have other databases with more complex cursors that take only a second or less, so I ruled out a general problem with using cursors. My solution:
Detach the database in question, but ensure you tick Update Statistics.
Attach the database and check performance
This seems to help optimize all the performance parameters without the detailed effort. I am using SQL Express 2008 R2.
Would like to know your experience.
I have a system which requires I have IDs on my data before it goes to the database. I was using GUIDs, but found them to be too big to justify the convenience.
I'm now experimenting with implementing a sequence generator which basically reserves a range of unique ID values for a given context. The code is as follows:
ALTER PROCEDURE [dbo].[Sequence.ReserveSequence]
@Name varchar(100),
@Count int,
@FirstValue bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON;
-- Ensure the parameters are valid
IF (@Name IS NULL OR @Count IS NULL OR @Count < 0)
RETURN -1;
-- Reserve the sequence
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION
-- Get the sequence ID, and the last reserved value of the sequence
DECLARE @SequenceID int;
DECLARE @LastValue bigint;
SELECT TOP 1 @SequenceID = [ID], @LastValue = [LastValue]
FROM [dbo].[Sequences]
WHERE [Name] = @Name;
-- Ensure the sequence exists
IF (@SequenceID IS NULL)
BEGIN
-- Create the new sequence
INSERT INTO [dbo].[Sequences] ([Name], [LastValue])
VALUES (@Name, @Count);
-- The first reserved value of a sequence is 1
SET @FirstValue = 1;
END
ELSE
BEGIN
-- Update the sequence
UPDATE [dbo].[Sequences]
SET [LastValue] = @LastValue + @Count
WHERE [ID] = @SequenceID;
-- The sequence start value will be the last previously reserved value + 1
SET @FirstValue = @LastValue + 1;
END
COMMIT TRANSACTION
END
The 'Sequences' table is just an ID, Name (unique), and the last allocated value of the sequence. Using this procedure I can request N values in a named sequence and use these as my identifiers.
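For reference, the table looks something like this (a sketch; the exact types are a guess based on the procedure above):
CREATE TABLE [dbo].[Sequences] (
[ID] int IDENTITY(1, 1) PRIMARY KEY,
[Name] varchar(100) NOT NULL UNIQUE,
[LastValue] bigint NOT NULL
);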
This works great so far - it's extremely quick since I don't have to constantly ask for individual values, I can just use up a range of values and then request more.
The problem is that at extremely high frequency, calling the procedure concurrently can sometimes result in a deadlock. I have only seen this occur during stress testing, but I'm worried it'll crop up in production. Are there any notable flaws in this procedure, and can anyone recommend a way to improve it? It would be nice to do this without transactions, for example, but I do need it to be 'thread safe'.
MS themselves offer a solution and even they say it locks/deadlocks.
If you want to add some lock hints then you'd reduce concurrency for your high loads
Options:
You could develop against the "Denali" CTP which is the next release
Use IDENTITY and the OUTPUT clause like everyone else (a sketch follows below)
Adopt/modify the solutions above
On DBA.SE there is "Emulate a TSQL sequence via a stored procedure": see dportas' answer which I think extends the MS solution.
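A sketch of the IDENTITY + OUTPUT option (the table is hypothetical; this works if you can let the database assign IDs at insert time rather than reserving them up front):
-- The table hands out the IDs; OUTPUT returns them to the caller in one round trip
CREATE TABLE dbo.Items (
ItemID bigint IDENTITY(1, 1) PRIMARY KEY,
Payload nvarchar(100) NOT NULL
);
GO
INSERT INTO dbo.Items (Payload)
OUTPUT inserted.ItemID, inserted.Payload
VALUES (N'first'), (N'second'); -- row constructors need SQL Server 2008+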
I'd recommend sticking with the GUIDs if, as you say, this is mostly about composing data ready for a bulk insert (it's simpler than what I present below).
As an alternative, could you work with a restricted count? Say, 100 ID values at a time? In that case, you could have a table with an IDENTITY column, insert into that table, return the generated ID (say, 39), and then your code could assign all values between 3900 and 3999 (e.g. multiply up by your assumed granularity) without consulting the database server again.
Of course, this could be extended to allocating multiple blocks in a single call, provided that you're okay with some IDs potentially going unused. E.g. you need 638 IDs, so you ask the database to assign you 7 new ID values (which implies that you've allocated 700 values), use the 638 you want, and the remaining 62 never get assigned.
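A sketch of that block-allocation idea (names are hypothetical; note that the block size must be a constant that every caller agrees on):
-- One row per reserved block; the IDENTITY value *is* the block number
CREATE TABLE dbo.IdBlocks (
BlockID int IDENTITY(1, 1) PRIMARY KEY,
ReservedAt datetime NOT NULL DEFAULT GETDATE()
);
GO
CREATE PROCEDURE dbo.ReserveIdBlock
@FirstValue bigint OUTPUT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @BlockSize int, @BlockID int;
SET @BlockSize = 100; -- the granularity; every caller must use the same value
INSERT INTO dbo.IdBlocks DEFAULT VALUES;
SET @BlockID = SCOPE_IDENTITY();
-- Block 39 with size 100 covers 3900..3999, matching the example above
SET @FirstValue = CAST(@BlockID AS bigint) * @BlockSize;
END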
Can you get some kind of deadlock trace? For example, enable trace flag 1222 as shown here. Duplicate the deadlock. Then look in the SQL Server log for the deadlock trace.
Also, you might inspect what locks are taken out in your code by inserting a call to exec sp_lock or select * from sys.dm_tran_locks immediately before the COMMIT TRANSACTION.
Most likely you are observing a conversion deadlock. To avoid them, you want to make sure that your table is clustered and has a PK, but this advice is specific to 2005 and 2008 R2, and they can change the implementation, rendering this advice useless. Google up "Some heap tables may be more prone to deadlocks than identical tables with clustered indexes".
Anyway, if you observe an error during stress testing, it is likely that sooner or later it will occur in production as well.
You may want to use sp_getapplock to serialize your requests. Google up "Application Locks (or Mutexes) in SQL Server 2005". Also I described a few useful ideas here: "Developing Modifications that Survive Concurrency".
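As a sketch of the sp_getapplock approach applied to the procedure above (the resource name is arbitrary; the SELECT/UPDATE body between the lock and the COMMIT stays exactly as in the original):
BEGIN TRANSACTION;
-- Serialize all callers on an application-defined resource name.
-- sp_getapplock returns >= 0 on success and < 0 on timeout or failure.
DECLARE @lockResult int;
EXEC @lockResult = sp_getapplock
@Resource = 'Sequence.ReserveSequence',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 5000;
IF @lockResult < 0
BEGIN
ROLLBACK TRANSACTION;
RETURN -1;
END
-- ... read and update the Sequences row here, as in the original procedure ...
COMMIT TRANSACTION; -- the COMMIT also releases the transaction-owned applock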
I thought I'd share my solution. It doesn't deadlock, nor does it produce duplicate values. An important difference between this and my original procedure is that it doesn't create the sequence if it doesn't already exist:
ALTER PROCEDURE [dbo].[ReserveSequence]
(
@Name nvarchar(100),
@Count int,
@FirstValue bigint OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
IF (@Count <= 0)
BEGIN
SET @FirstValue = NULL;
RETURN -1;
END
DECLARE @Result TABLE ([LastValue] bigint)
-- Update the sequence's last value, and capture the updated value
UPDATE [Sequences]
SET [LastValue] = [LastValue] + @Count
OUTPUT INSERTED.LastValue INTO @Result
WHERE [Name] = @Name;
-- Select the first value
SELECT TOP 1 @FirstValue = [LastValue] + 1 FROM @Result;
END
Simple problem, but perhaps no simple solution; at least I can't think of one off the top of my head, but then I'm not the best at finding the best solutions.
I have a stored proc, this stored proc does (in a basic form) a select on a table, envision this:
SELECT * FROM myTable
Okay, simple enough, except the table name it needs to search on isn't known, so we ended up with something pretty similar to this:
-- Just to give some context to the variables I'll be using
DECLARE @metaInfoID AS INT
SET @metaInfoID = 1
DECLARE @metaInfoTable AS VARCHAR(200)
SELECT @metaInfoTable = MetaInfoTableName FROM MetaInfos WHERE MetaInfoID = @metaInfoID
DECLARE @sql AS VARCHAR(200)
SET @sql = 'SELECT * FROM ' + @metaInfoTable
EXEC (@sql)
So I recognize this is ultimately bad, and I can see immediately where I can perform a SQL injection attack. So the question is: is there a way to achieve the same results without constructing dynamic SQL, or am I going to have to be super, super careful in my client code?
You have to use dynamic sql if you don't know the table name up front. But yes, you should validate the value before attempting to use it in an SQL statement.
e.g.
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @metaInfoTable)
BEGIN
-- Execute the SELECT * FROM @metaInfoTable dynamic sql
END
This will make sure a table with that name exists. There is an overhead to doing this, obviously, as you're querying INFORMATION_SCHEMA. You could instead validate that @metaInfoTable contains only certain characters:
-- only run dynamic sql if the table name contains only 0-9, a-z, A-Z, underscores or spaces
-- (enclose the table name in square brackets, in case it does contain spaces)
IF @metaInfoTable NOT LIKE '%[^0-9a-zA-Z_ ]%'
BEGIN
-- Execute the SELECT * FROM @metaInfoTable dynamic sql
END
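Either way, when you do execute the dynamic SQL, a safer sketch is to bracket the validated name with QUOTENAME (which also escapes any embedded ] characters) and run it via sp_executesql:
DECLARE @sql nvarchar(400);
SET @sql = N'SELECT * FROM ' + QUOTENAME(@metaInfoTable) + N';';
EXEC sp_executesql @sql;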
Given the constraints described, I'd suggest two ways, with slight variations in performance and architecture.
Choose At the Client & Re-Architect
I'd suggest that you should consider a small re-architecture as much as possible to force the caller/client to decide which table to get its data from. It's a code smell to hold table names in another table.
I am assuming here that @MetaInfoID is being passed from a web app, data access block, etc. That's where the logic of which table to perform the SELECT on should be housed. I'd say that the client should know which stored procedure (GetCustomers or GetProducts) to call based on that @MetaInfoID. Create new methods in your DAL like GetCustomersMetaInfo(), GetProductsMetaInfo() and GetInvoicesMetaInfo() which call into their appropriate sprocs (with no dynamic SQL needed, and no maintenance of a meta table in the DB).
Perhaps try to re-architect the system a little bit.
In SQL Server
If you absolutely have to do this lookup in the DB, and depending on the number of tables that you have, you could perform a handful of IF statements (as many as needed) like:
IF @MetaInfoID = 1
SELECT * FROM Customers
IF @MetaInfoID = 2
SELECT * FROM Products
-- etc
That would probably become a nightmare to maintain, though.
Perhaps you could write a stored procedure for each MetaInfo. That way, you gain the advantage of pre-compilation, and no SQL injection can occur here. (Imagine if someone sabotaged the MetaInfoTableName column.)
IF @MetaInfoID = 1
EXEC GetAllCustomers
IF @MetaInfoID = 2
EXEC GetAllProducts