Getting info about running update query - sql-server

On Sql Server 2008, I have a slow-running update query that has been working for 3 hours. Is there any way to get any statistics (say, how many rows have changed so far or something else) about this execution (of course, while executing) ? So that I can make a decision between canceling query and optimize it or let it finish.
Or else, should I have done something before execution ? If so, what can be done before execution for getting info about an update query while it is running. (of course, without affecting the performance largely)
an update for clarifying:
the update statement mentioned in question is (e.g.):
UPDATE myTable SET col_A = 'value1' WHERE col_B = 'value2'
In other words, it is a single query that updates whole table

What can you do now?
You could try running a separate query using the WITH(NOLOCK) table hint to see how many rows have been updated.
e.g. if the update statement is:
UPDATE MyTable
SET MyField = 'UPDATEDVALUE'
WHERE ID BETWEEN 1 AND 10000000
You could run this:
SELECT COUNT(*)
FROM MyTable WITH (NOLOCK)
WHERE ID BETWEEN 1 AND 10000000
AND MyField = 'UPDATEDVALUE'
What can you do in future?
You could do the update in batches, and output progress as it goes. e.g. use a loop to update the records in chunks of say 1000 (arbitary value for point of explanation). After each update chunk completes, print out the progress (assuming you are running it from SSMS) e.g.
DECLARE #RowCount INTEGER
SET #RowCount = 1
DECLARE #Message VARCHAR(500)
DECLARE #TotalRowsUpdated INTEGER
SET #TotalRowsUpdated = 0
WHILE (#RowCount > 0)
BEGIN
UPDATE TOP (1000) MyTable
SET MyField = 'UPDATEDVALUE'
WHERE ID BETWEEN 1 AND 10000000
AND MyField <> 'UPDATEDVALUE'
SELECT #RowCount = ##ROWCOUNT
SET #TotalRowsUpdated = #TotalRowsUpdated + #RowCount
SELECT #Message = CAST(GETDATE() AS VARCHAR) + ' : ' + CAST(#TotalRowsUpdated AS VARCHAR) + ' records updated in total'
RAISERROR (#Message, 0, 1) WITH NOWAIT
END
Using RAISERROR like this ensures progress messages are printed out immediately. If you used PRINT instead, the messages are spooled up and not output immediately so you wouldn't get to see real-time progress.

From http://www.sqlnewsgroups.net/group/microsoft.public.sqlserver.server/topic19776.aspx
"You should be able to do a dirty read to produce a rowcount of rows
satisfying the update clause (assuming that value wasn't already used)."
If you set your query to use "READ UNCOMMITTED" you should be able to see rows that have been updated by your statement with a select statement having the proper criteria. I'm no expert on this...so just my guess.

It would really help us if you posted the query so that we can look at it and try to help you.
One way to see what is going on is to use SQL Profiler to profile the updates / inserts.

Without knowning exactly what you're doing I can't say for sure, but if you put the update in a procedure then use the PRINT command to output status updates at specific steps, these messages are output during the procedures run time in the messages tab that's next to the results tab.

Related

How to convert Row by row execution in to SET based approach in SQL

I'm working on a huge SQL code and unfortunately it has a CURSOR which handles another two nested CURSORS within it (totally three cursors inside a stored procedure), which handles millions of data to be DELETE,UPDATE and INSERT. This takes a whole lot of time because of row by row execution and I wish to modify this in to SET based approach
From many articles it shows use of CURSORs is not recommend and the alternate is to use WHILE loops instead, So I tried and replaced the three CUROSRs with three WHILE loops nothing more, though I get the same result but there is no improvement in performance, it took the same time as it took for CUROSRs.
Below is the basic structure of the code I'm working on (i Will try to put as simple as possible) and I will put the comments what they are supposed to do.
declare #projects table (
ProjectID INT,
fieldA int,
fieldB int,
fieldC int,
fieldD int)
INSERT INTO #projects
SELECT ProjectID,fieldA,fieldB,fieldC, fieldD
FROM ProjectTable
DECLARE projects1 CURSOR LOCAL FOR /*First cursor - fetch the cursor from ProjectaTable*/
Select ProjectID FROM #projects
OPEN projects1
FETCH NEXT FROM projects1 INTO #ProjectID
WHILE ##FETCH_STATUS = 0
BEGIN
BEGIN TRY
BEGIN TRAN
DELETE FROM T_PROJECTGROUPSDATA td
WHERE td.ID = #ProjectID
DECLARE datasets CURSOR FOR /*Second cursor - this will get the 'collectionDate'field from datasetsTable for every project fetched in above cursor*/
Select DataID, GroupID, CollectionDate
FROM datasetsTable
WHERE datasetsTable.projectID = #ProjectID /*lets say this will fetch ten records for a single projectID*/
OPEN datasets
FETCH NEXT FROM datasets INTO #DataID, #GroupID, #CollectionDate
WHILE ##FETCH_STATUS = 0
BEGIN
DECLARE period CURSOR FOR /*Third Cursor - this will process the records from another table called period with above fetched #collectionDate*/
SELECT ID, dbo.fn_GetEndOfPeriod(ID)
FROM T_PERIODS
WHERE DATEDIFF(dd,#CollectionDate,dbo.fn_GetEndOfPeriod(ID)) >= 0 /*lets say this will fetch 20 records for above fetched single #CollectionDate*/
ORDER BY [YEAR],[Quarter]
OPEN period
FETCH NEXT FROM period INTO #PeriodID, #EndDate
WHILE ##FETCH_STATUS = 0
BEGIN
IF EXISTS (some conditions No - 1 )
BEGIN
BREAK
END
IF EXISTS (some conditions No - 2 )
BEGIN
FETCH NEXT FROM period INTO #PeriodID, #EndDate
CONTINUE
END
/*get the appropirate ID from T_uploads table for the current projectID and periodID fetched*/
SET #UploadID = (SELECT ID FROM T_UPLOADS u WHERE u.project_savix_ID = #ProjectID AND u.PERIOD_ID = #PeriodID AND u.STATUS = 3)
/*Update some fields in T_uploads table for the current projectID and periodID fetched*/
UPDATE T_uploads
SET fieldA = mp.fieldA, fieldB = mp.fieldB
FROM #projects mp
WHERE T_UPLOADS.ID = #UploadID AND mp.ProjectID = #ProjectID
/*Insert some records in T_PROJECTGROUPSDATA table for the current projectID and periodID fetched*/
INSERT INTO T_PROJECTGROUPSDATA tpd ( fieldA,fieldB,fieldC,fieldD,uploadID)
SELECT fieldA,fieldB,fieldC,fieldD,#UploadID
FROM #projects
WHERE tpd.DataID = #DataID
FETCH NEXT FROM period INTO #PeriodID, #EndDate
END
CLOSE period
DEALLOCATE period
FETCH NEXT FROM datasets INTO #DataID, #GroupID, #CollectionDate, #Status, #Createdate
END
CLOSE datasets
DEALLOCATE datasets
COMMIT
END TRY
BEGIN CATCH
Error handling
IF ##TRANCOUNT > 0
ROLLBACK
END CATCH
FETCH NEXT FROM projects1 INTO #ProjectID, #FAID
END
CLOSE projects1
DEALLOCATE projects1
SELECT 1 as success
I request you to suggest any methods to rewrite this code to follow the SET based approach.
Until the table structure and expected result sample data is not provided, here are a few quick things I see that can be improved (some of those are already mentioned by others above):
WHILE Loop is also a cursor. So, changing into to while loop is not
going make things any faster.
Use LOCAL FAST_FORWARD cursor unless you have need to back track a record. This would make the execution much faster.
Yes, I agree that having a SET based approach would be the fastest in most cases, however if you must store intermediate resultset somewhere, I would suggest using a temp table instead of a table variable. Temp table is 'lesser evil' between these 2 options. Here are a few reason why you should try to avoid using a table variable:
Since SQL Server would not have any prior statistics on the table variable during building on Execution Plan, it will always consider that only one record would be returned by the table variable during construction of the execution plan. And accordingly Storage Engine would assign only as much RAM memory for execution of the query. But in reality, there could be millions of records that the table variable might hold during execution. If that happens, SQL Server would be forced spill the data to hard disk during execution (and you will see lots of PAGEIOLATCH in sys.dm_os_wait_stats) making the queries way slower.
One way to get rid of the above issue would be by providing statement level hint OPTION (RECOMPILE) at the end of each query where a table value is used. This would force SQL Server to construct the Execution Plan of those queries each time during runtime and the less memory allocation issue can be avoided. However the downside of this is: SQL Server will no longer be able to take advantage of an already cached execution plan for that stored procedure, and would require recompilation every time, which would deteriorate the performance by some extent. So, unless you know that data in the underlying table changes frequently or the stored procedure itself is not frequently executed, this approach is not recommended by Microsoft MVPs.
Replacing Cursor with While blindly, is not a recommended option, hence it would not impact your performance and might even have negative impact on the performance.
When you define the cursor using Declare C Cursor in fact you are going to create a SCROLL cursor which specifies that all fetch options (FIRST, LAST, PRIOR, NEXT, RELATIVE, ABSOLUTE) are available.
When you need just Fetch Next as scroll option, you can declare the cursor as FAST_FORWARD
Here is the quote about FAST_FORWARD cursor in Microsoft docs:
Specifies that the cursor can only move forward and be scrolled from
the first to the last row. FETCH NEXT is the only supported fetch
option. All insert, update, and delete statements made by the current
user (or committed by other users) that affect rows in the result set
are visible as the rows are fetched. Because the cursor cannot be
scrolled backward, however, changes made to rows in the database after
the row was fetched are not visible through the cursor. Forward-only
cursors are dynamic by default, meaning that all changes are detected
as the current row is processed. This provides faster cursor opening
and enables the result set to display updates made to the underlying
tables. While forward-only cursors do not support backward scrolling,
applications can return to the beginning of the result set by closing
and reopening the cursor.
So you can declare your cursors using DECLARE <CURSOR NAME> FAST_FORWARD FOR ... and you will get noticeable improvements
I think all the cursors code above can be simplified to something like this:
DROP TABLE IF EXISTS #Source;
SELECT DISTINCT p.ProjectID,p.fieldA,p.fieldB,p.fieldC,p.fieldD,u.ID AS [UploadID]
INTO #Source
FROM ProjectTable p
INNER JOIN DatasetsTable d ON d.ProjectID = p.ProjectID
INNER JOIN T_PERIODS s ON DATEDIFF(DAY,d.CollectionDate,dbo.fn_GetEndOfPeriod(s.ID)) >= 0
INNER JOIN T_UPLOADS u ON u.roject_savix_ID = p.ProjectID AND u.PERIOD_ID = s.ID AND u.STATUS = 3
WHERE NOT EXISTS (some conditions No - 1)
AND NOT EXISTS (some conditions No - 2)
;
UPDATE u SET u.fieldA = s.fieldA, u.fieldB = s.fieldB
FROM T_UPLOADS u
INNER JOIN #Source s ON s.UploadID = u.ID
;
INSERT INTO T_PROJECTGROUPSDATA (fieldA,fieldB,fieldC,fieldD,uploadID)
SELECT DISTINCT s.fieldA,s.fieldB,s.fieldC,s.fieldD,s.UploadID
FROM #Source s
;
DROP TABLE IF EXISTS #Source;
Also it would be nice to know "some conditions No" details as query can differ depends on that.

Why is my UPDATE stored procedure executed multiple times?

I use stored procedures to manage a warehouse. PDA scanners scan the added stock and send it in bulk (when plugged back) to a SQL database (SQL Server 2016).
The SQL database is fairly remote (other country), so there's sometimes delay in some queries, but this particular one is problematic: even if the stock table is fine, I had some problems with updating the occupancy of the warehouse spots. The PDA tracks the added items in every spot as a SMALLINT, then send back this value to the stored procedure below.
PDA "send_spots" query:
SELECT spot, added_items FROM spots WHERE change=1
Stored procedure:
CREATE PROCEDURE [dbo].[update_spots]
#spot VARCHAR(10),
#added_items SMALLINT
AS
BEGIN
BEGIN TRAN
UPDATE storage_spots
SET remaining_capacity = remaining_capacity - #added_items
WHERE storage_spot=#spot
IF ##ROWCOUNT <> 1
BEGIN
ROLLBACK TRAN
RETURN - 1
END
ELSE
BEGIN
COMMIT TRAN
RETURN 0
END
END
GO
If the remaining_capacity value is 0, the PDAs can't add more items to it on next round. But with this process, I had negative values because the query allegedly ran two times (so subtracted #added_items two times).
Is there a way for that to be possible? How could I fix it? From what I understood the transaction should be cancelled (ROLLBACK) if the affected rows are != 1, but maybe that's something else.
EDIT: current solution with the help of #Zero:
CREATE PROCEDURE [dbo].[update_spots]
#spot VARCHAR(10),
#added_racks SMALLINT
AS
BEGIN
-- Recover current capacity of the spot
DECLARE #orig_capacity SMALLINT
SELECT TOP 1
#orig_capacity = remaining_capacity
FROM storage_spots
WHERE storage_spot=#spot
-- Test if double is present in logs by comparing dates (last 10 seconds)
DECLARE #is_double BIT = 0
SELECT #is_double = CASE WHEN EXISTS(SELECT *
FROM spot_logs
WHERE log_timestamp >= dateadd(second, -10, getdate()) AND storage_spot=#spot AND delta=#added_racks)
THEN 1 ELSE 0 END
BEGIN
BEGIN TRAN
UPDATE storage_spots
SET remaining_capacity= #orig_capacity - #added_racks
WHERE storage_spot=#spot
IF ##ROWCOUNT <> 1 OR #is_double <> 0
-- If double, rollback UPDATE
ROLLBACK TRAN
ELSE
-- If no double, commit UPDATE
COMMIT TRAN
-- write log
INSERT INTO spot_logs
(storage_spot, former_capacity, new_capacity, delta, log_timestamp, double_detected)
VALUES
(#spot, #orig_capacity, #orig_capacity-#added_racks, #added_racks, getdate(), #is_double)
END
END
GO
I was thinking about possible causes (and a way to trace them) and then it hit me - you have no value validation!
Here's a simple example to illustrate the problem:
Spot | capacity
---------------
x1 | 1
Update spots set capacity = capacity - 2 where spot = 'X1'
Your scanner most likely gave you higher quantity than you had capacity to take in.
I am not sure how your business logic goes, but you need to perform something in lines of
Update spots set capacity = capacity - #added_items where spot = 'X1' and capacity >= #added_items
if ##rowcount <> 1;
rollback;
EDIT: few methods to trace your problem without implementing validation:
Create a logging table (with timestamp, user id (user, that is connected to DB), session id, old value, new value, delta value (added items).
Option 1:
Log all updates that change value from positive to negative (at least until you figure out the problem).
The drawback of this option is that it will not register double calls that do not result in a negative capacity.
Option 2: (logging identical updates):
Create script that creates a global temporary table and deletes records, from that table timestamps older than... let's say 10 minutes once every minute or so (play around with numbers).
This temporary table should hold the data passed to your update statement so 'spot', 'added_items' + 'timestamp' (for tracking).
Now the crucial part: When you call your update statement check if a similar record exists in the temporary table (same spot and added_items and where current timestamp is between timestamp and [timestamp + 1 second] - again play around with numbers). If a record like that exist log that update, if not add it to temporary table.
This will register identical updates that go within 1 second of each other (or whatever time-frame you choose).
I found here an alternative SQL query that does the update the way I need, but with a temporary value using DECLARE. Would it work better in my case, or is my initial query correct?
Initial query:
UPDATE storage_spots
SET remaining_capacity = remaining_capacity - #added_items
WHERE storage_spot=#spot
Alternative query:
DECLARE #orig_capacity SMALLINT
SELECT TOP 1 #orig_capacity = remaining_capacity
FROM storage_spots
WHERE spot=#spot
UPDATE Products
SET remaining_capacity = #orig_capacity - #added_items
WHERE spot=#spot
Also, should I get rid of the ROLLBACK/COMMIT instructions?

Deleting 1 millions rows in SQL Server

I am working on a client's database and there is about 1 million rows that need to be deleted due to a bug in the software. Is there an efficient way to delete them besides:
DELETE FROM table_1 where condition1 = 'value' ?
Here is a structure for a batched delete as suggested above. Do not try 1M at once...
The size of the batch and the waitfor delay are obviously quite variable, and would depend on your servers capabilities, as well as your need to mitigate contention. You may need to manually delete some rows, measuring how long they take, and adjust your batch size to something your server can handle. As mentioned above, anything over 5000 can cause locking (which I was not aware of).
This would be best done after hours... but 1M rows is really not a lot for SQL to handle. If you watch your messages in SSMS, it may take a while for the print output to show, but it will after several batches, just be aware it won't update in real-time.
Edit: Added a stop time #MAXRUNTIME & #BSTOPATMAXTIME. If you set #BSTOPATMAXTIME to 1, the script will stop on it's own at the desired time, say 8:00AM. This way you can schedule it nightly to start at say midnight, and it will stop before production at 8AM.
Edit: Answer is pretty popular, so I have added the RAISERROR in lieu of PRINT per comments.
DECLARE #BATCHSIZE INT, #WAITFORVAL VARCHAR(8), #ITERATION INT, #TOTALROWS INT, #MAXRUNTIME VARCHAR(8), #BSTOPATMAXTIME BIT, #MSG VARCHAR(500)
SET DEADLOCK_PRIORITY LOW;
SET #BATCHSIZE = 4000
SET #WAITFORVAL = '00:00:10'
SET #MAXRUNTIME = '08:00:00' -- 8AM
SET #BSTOPATMAXTIME = 1 -- ENFORCE 8AM STOP TIME
SET #ITERATION = 0 -- LEAVE THIS
SET #TOTALROWS = 0 -- LEAVE THIS
WHILE #BATCHSIZE>0
BEGIN
-- IF #BSTOPATMAXTIME = 1, THEN WE'LL STOP THE WHOLE JOB AT A SET TIME...
IF CONVERT(VARCHAR(8),GETDATE(),108) >= #MAXRUNTIME AND #BSTOPATMAXTIME=1
BEGIN
RETURN
END
DELETE TOP(#BATCHSIZE)
FROM SOMETABLE
WHERE 1=2
SET #BATCHSIZE=##ROWCOUNT
SET #ITERATION=#ITERATION+1
SET #TOTALROWS=#TOTALROWS+#BATCHSIZE
SET #MSG = 'Iteration: ' + CAST(#ITERATION AS VARCHAR) + ' Total deletes:' + CAST(#TOTALROWS AS VARCHAR)
RAISERROR (#MSG, 0, 1) WITH NOWAIT
WAITFOR DELAY #WAITFORVAL
END
BEGIN TRANSACTION
DoAgain:
DELETE TOP (1000)
FROM <YourTable>
IF ##ROWCOUNT > 0
GOTO DoAgain
COMMIT TRANSACTION
Maybe this solution from Uri Dimant
WHILE 1 = 1
BEGIN
DELETE TOP(2000)
FROM Foo
WHERE <predicate>;
IF ##ROWCOUNT < 2000 BREAK;
END
(Link: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b5225ca7-f16a-4b80-b64f-3576c6aa4d1f/how-to-quickly-delete-millions-of-rows?forum=transactsql)
Here is something I have used:
If the bad data is mixed in with the good-
INSERT INTO #table
SELECT columns
FROM old_table
WHERE statement to exclude bad rows
TRUNCATE old_table
INSERT INTO old_table
SELECT columns FROM #table
Not sure how good this would be but what if you do like below (provided table_1 is a stand alone table; I mean no referenced by other table)
create a duplicate table of table_1 like table_1_dup
insert into table_1_dup select * from table_1 where condition1 <> 'value';
drop table table_1
sp_rename table_1_dup table_1
If you cannot afford to get the database out of production while repairing, do it in small batches. See also: How to efficiently delete rows while NOT using Truncate Table in a 500,000+ rows table
If you are in a hurry and need the fastest way possible:
take the database out of production
drop all non-clustered indexes and triggers
delete the records (or if the majority of records is bad, copy+drop+rename the table)
(if applicable) fix the inconsistencies caused by the fact that you dropped triggers
re-create the indexes and triggers
bring the database back in production

Stored Procedure Does Not Fire Last Command

On our SQL Server (Version 10.0.1600), I have a stored procedure that I wrote.
It is not throwing any errors, and it is returning the correct values after making the insert in the database.
However, the last command spSendEventNotificationEmail (which sends out email notifications) is not being run.
I can run the spSendEventNotificationEmail script manually using the same data, and the notifications show up, so I know it works.
Is there something wrong with how I call it in my stored procedure?
[dbo].[spUpdateRequest](#packetID int, #statusID int output, #empID int, #mtf nVarChar(50)) AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE #id int
SET #id=-1
-- Insert statements for procedure here
SELECT A.ID, PacketID, StatusID
INTO #act FROM Action A JOIN Request R ON (R.ID=A.RequestID)
WHERE (PacketID=#packetID) AND (StatusID=#statusID)
IF ((SELECT COUNT(ID) FROM #act)=0) BEGIN -- this statusID has not been entered. Continue
SELECT ID, MTF
INTO #req FROM Request
WHERE PacketID=#packetID
WHILE (0 < (SELECT COUNT(ID) FROM #req)) BEGIN
SELECT TOP 1 #id=ID FROM #req
INSERT INTO Action (RequestID, StatusID, EmpID, DateStamp)
VALUES (#id, #statusID, #empID, GETDATE())
IF ((#mtf IS NOT NULL) AND (0 < LEN(RTRIM(#mtf)))) BEGIN
UPDATE Request SET MTF=#mtf WHERE ID=#id
END
DELETE #req WHERE ID=#id
END
DROP TABLE #req
SELECT #id=##IDENTITY, #statusID=StatusID FROM Action
SELECT TOP 1 #statusID=ID FROM Status
WHERE (#statusID<ID) AND (-1 < Sequence)
EXEC spSendEventNotificationEmail #packetID, #statusID, 'http:\\cpweb:8100\NextStep.aspx'
END ELSE BEGIN
SET #statusID = -1
END
DROP TABLE #act
END
Idea of how the data tables are connected:
From your comments I get you do mainly C# development. A basic test is to make sure the sproc is called with the exact same arguments you expect
PRINT '#packetID: ' + #packetID
PRINT '#statusID: ' + #statusID
EXEC spSendEventNotificationEmail #packetID, #statusID, 'http:\\cpweb:8100\NextStep.aspx'
This way you 1. know that the exec statement is reached 2. the exact values
If this all works than I very good candidate is that you have permission to run the sproc and your (C#?) code that calls it doesn't. I would expect that an error is thrown tough.
A quick test to see if the EXEC is executed fine is to do an insert in a dummy table after it.
Update 1
I suggested to add PRINT statements but indeed as you say you cannot (easily) catch them from C#. What you could do is insert the 2 variables in a log table that you newly create. This way you know the exact values that flow from the C# execution.
As to the why it now works if you add permissions I can't give you a ready answer. SQL security is not transparent to me either. But its good to research yourself a but further. Do you have to add both guest and public?
It would also help to see what's going inside spSendEventNotificationEmail. Chances are good that sproc is using a resource where it didn't have permission before. This could be an object like a table or maybe another sproc. Security is heavily dependent on context/settings and not an easy problem to tackle with a Q/A site like SO.

How can I get number of deleted records?

I have a stored procedure which deletes certain records. I need to get the number of deleted records.
I tried to do it like this:
DELETE FROM OperationsV1.dbo.Files WHERE FileID = #FileID
SELECT ##ROWCOUNT AS DELETED;
But DELETED is shown as 0, though the appropriate records are deleted. I tried SET NOCOUNT OFF; without success. Could you please help?
Thanks.
That should work fine. The setting of NOCOUNT is irrelevant. This only affects the n rows affected information sent back to the client and has no effect on the workings of ##ROWCOUNT.
Do you have any statements between the two that you have shown? ##ROWCOUNT is reset after every statement so you must retrieve the value immediately with no intervening statements.
I use this code snippet when debugging stored procedures to verify counts after operations, such as a delete:
DECLARE #Msg varchar(30)
...
SELECT #Msg = CAST(##ROWCOUNT AS VARCHAR(10)) + ' rows affected'
RAISERROR (#Msg, 0, 1) WITH NOWAIT
START TRANSACTION;
SELECT #before:=(SELECT count(*) FROM OperationsV1.dbo.Files);
DELETE FROM OperationsV1.dbo.Files WHERE FileID = #FileID;
SELECT #after:=(SELECT count(*) FROM OperationsV1.dbo.Files);
COMMIT;
SELECT #before-#after AS DELETED;
I don't know about SQL server, but in MySQL SELECT count(*) FROM ... is an extremely cheap operation.

Resources