I have a database where data is processed in some kind of batches, where each batch may contain even a million records. I am processing data in a console application, and when I'm done with a batch, I mark it as Done (to avoid reading it again in case it does not get deleted), delete it and move on to a next batch.
I have the following simple stored procedure which deletes processed "batches" of data
CREATE PROCEDURE [dbo].[DeleteBatch]
(
#BatchId bigint
)
AS
SET XACT_ABORT ON
BEGIN TRANSACTION
DELETE FROM table1 WHERE BatchId = #BatchId
DELETE FROM table2 WHERE BatchId = #BatchId
DELETE FROM table3 WHERE BatchId = #BatchId
COMMIT
RETURN ##Error
I am using NHibernate with command timeout value 10 minutes, and the DeleteBatch procedure call times out occasionally.
Actually I don't want to wait for DeleteBatch to complete. I already have marked the batch as Done, so I want to go processing a next batch or maybe even exit my console application, if there are no more pending batches.
I am using Microsoft SQL Express 2012.
Is there any simple solution to tell the SQL server - "launch DeleteBatch and run it asynchronously even if I disconnect, and I don't even need the result of the procedure"?
It would also be great if I could set a lower processing priority for DeleteBatch because other queries are more important than DeleteBatch.
I dont know much about NHibernate. But if you were or can use ADO.NET in this scenario then you can implement asynchronous database operations easliy using the SqlCommand.BeginExecuteNonQuery Method in C#. This method starts the process of asynchronously executing a Transact-SQL statement or stored procedure that does not return rows, so that other tasks can run concurrently while the statement is executing.
EDIT: If you really want to exit from your console app before the db operation ends then you will have to manually create threads in your code and perform the db operation in those threads. Now when you close your console app these threads would still be alive because Threads created using System.Thread.Thread are foreground threads by default. But having said that it is also important to consider how many threads you will create. In your case you would have to assign 1 thread for each batch. If number of batches is very large then large number of threads would need to be created which would inturn eat a large amount of your CPU resources and would even freeze your OS for a long time.
Another simple solution I could suggest is to insert the BatchIds into some database table. Create an INSERT TRIGGER on that table. This trigger would then call a stored proc with BatchId as its parameter and would perform the required tasks.
Hope it helps.
What if your console application were, instead of trying to delete the batch, just write the batch id into a "BatchIdsToDelete" table. Then, you could use an agent job running every x minutes/seconds or whatever, to delete the top x percent records for a given batch id, and maybe sleeping a little before tackling the next x percent.
Maybe worth having a look at that?
Look at this article which explains how to do reliable asynchronous procedure execution, code included. IS based on Service Broker.
the problem with trying to use .NEt async features (like BeginExecute, or task etc) is that the call is unreliable: if the process exits before the procedure completes the execution is canceled in the server as the session is disconnected.
But you need to also look at the task itself, why is the deletion taking +10 minutes? is it blocked by contention? are you missing indexes on BatchId? Use the Performance Troubleshooting Flowchart.
Late to the party, but if someone else has this problem use SQLCMD. With express you are limited in the number of users (I think 2, but it may have changed since I the last time I did much with express). You can have sqlcmd, run queries, stored procedures ...
And you can kick off the sqlcmd with Windows Scheduler. A script, an outlook rule ...
I used it to manage like 3 or 4 thousand SQL Server Express instances, with their nightly maintenance scheduled with the Windows Scheduler.
You could also create and run a PowerShell script, it's more versatile and probably a more widely used than sqlcmd.
I needed a same thing..
After searching for long time I found the solution
Its d easiest way
SqlConnection connection = new SqlConnection();
connection.ConnectionString = "your connection string";
SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder(connection.ConnectionString);
builder.AsynchronousProcessing = true;
SqlConnection newSqlConn = new SqlConnection(builder.ConnectionString);
newSqlConn.Open();
SqlCommand cmd = new SqlCommand(storeProcedureName, newSqlConn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.BeginExecuteNonQuery(null, null);
Ideally SQLConnection object should take an optional parameter / property, URL of a web service, be that WCF or WebApi, or something yet to be named, and if the user wishes to, notify user of execution advance and / or completion status by calling this URL with well known message.
Theoretically DBConnection is extensible object one is free to implement. However, it will take some review of what really can be and needs to be done, before this approach can be said feasible.
Related
I want to automate some DB scaling in my Azure SQL database.
This can be easily initiated using this:
ALTER DATABASE [myDatabase]
MODIFY (EDITION ='Standard', SERVICE_OBJECTIVE = 'S3', MAXSIZE = 250 GB);
But that command returns instantly, whilst the resize takes a few 10s of seconds to complete.
We can check the actual current size using the following, which doesn't update until the change is complete:
SELECT DATABASEPROPERTYEX('myDatabase', 'ServiceObjective')
So naturally I wanted to combine this with a WHILE loop and a WAITFOR DELAY, in order to create a stored procedure that will change the DB size, and not return until the change has completed.
But when I wrote that stored procedure (script below) and ran it, I get the following error every time (at about the same time that the size change completes):
A severe error occurred on the current command. The results, if any, should be discarded.
The resize succeeds, but I get errors instead of a cleanly finishing stored procedure call.
Various things I've already tested:
If I separate the "Initiate" and the "WaitLoop" sections, and start the WaitLoop in a separate connection, after initiation but before completion, then that also gives the same error.
Adding a TRY...CATCH block doesn't help either.
Removing the stored procedure aspect, and just running the code directly doesn't fix it either
My interpretation is that the Resize isn't quite as transparent as one might hope, and that connections created before the resize completes get corrupted in some sense.
Whatever the exact cause, it seems to me that this stored procedure just isn't achievable at all; I'll have to do the polling from my external process - opening new connections each time. It's not an awful solution, but it is less pleasant than being able to encapsulate the whole thing in a single stored procedure. Ah well, such is life.
Question:
Before I give up on this entirely ... does anyone have an alternative explanation or solution for this error, which would thus allow a single stored procedure call to change the size and then not return until that sizeChange actually completed?
Initial stored procedure code (simplified to remove parameterisation complexity):
CREATE PROCEDURE [trusted].[sp_ResizeAzureDbToS3AndWaitForCompletion]
AS
ALTER DATABASE [myDatabase]
MODIFY (EDITION ='Standard', SERVICE_OBJECTIVE = 'S3', MAXSIZE = 250 GB);
WHILE ((SELECT DATABASEPROPERTYEX('myDatabase', 'ServiceObjective')) != 'S3')
BEGIN
WAITFOR DELAY '00:00:05'
END
RETURN 0
Whatever the exact cause, it seems to me that this stored procedure
just isn't achievable at all; I'll have to do the polling from my
external process - opening new connections each time.
Yes this is correct. As described here when you change the service objective of a database
A new compute instance is created with the requested service tier and
compute size... the database remains online during this step, and
connections continue to be directed to the database in the original
compute instance ... [then] existing connections to the database in
the original compute instance are dropped. Any new connections are
established to the database in the new compute instance.
The bolded text will kill your stored procedure execution. You need to do this check externally
I have a Perl code which connects with database and scan data from different different table. I face a problem if I lose my connection: it roll back all the transaction. How could I make the Perl script resume the connection and start the process from where the interruption took place? Can I use the Perl to resume the connection or any thing other technique to start the process from where the interruption took place if so could any one guide me with the steps please.
It is actually required because of we have lots of data and takes 1 week to scan all the data and insert in specific table, in between if we run database offline backup it disconnect all the connection and whatever transaction happens it roll back and need to run once again from the beginning.
We can commit the transaction whatever done but challenge is how we can start process from where the interruption took place so we don't require to run from the beginning.
Relying on a DB connection to be consistently open for over a day is the wrong approach.
A possible solution involves:
1) Connect to the DB to create a DB Handle. Use an infinite loop and sleep to wait until you have a good db handle. Put this into a subroutine.
2) Put the individual requests for individual tables in a data structure like an array
Execute separate Queries in separate statements in a loop (.
Check if the dbh is stale before each individual request. Clean and recreate the handle if necessary. Use the subroutine from 1)
Handle breakdowns during a request in EVAL blocks using the "redo" statement to make sure no statement gets skipped.
3) Keep the data between requests either in memory or any non-SQL storage like a Key/Value Store ( Redis, )
4) Compute whatever needs computation
5) When you have all data for your commit transaction, do.
This solution assumes you don't care about changes between reading and committing back. If you do, you need to LOCK your affected tables first. You probably don't want to lock a table for a week though.
I've got several SQL Server Agent jobs that should run sequentially. To keep a nice overview of the jobs that should execute I have created a main job that calls the other jobs with a call to EXEC msdb.dbo.sp_start_job N'TEST1'. The sp_start_job finishes instantly (Job Step 1), but then I want my main job to wait until job TEST1 has finished before calling the next job.
So I have written this small script that starts executing right after the job is called (Job Step 2), and forces the main job to wait until the sub job has finished:
WHILE 1 = 1
BEGIN
WAITFOR DELAY '00:05:00.000';
SELECT *
INTO #jobs
FROM OPENROWSET('SQLNCLI', 'Server=TESTSERVER;Trusted_Connection=yes;',
'EXEC msdb.dbo.sp_help_job #job_name = N''TEST1'',
#execution_status = 0, #job_aspect = N''JOB''');
IF NOT (EXISTS (SELECT top 1 * FROM #jobs))
BEGIN
BREAK
END;
DROP TABLE #jobs;
END;
This works well enough. But I got the feeling smarter and/or safer (WHILE 1 = 1?) solutions should be possible.
I'm curious about the following things, hope you can provide me with some insights:
What are the problems with this approach?
Can you suggest a better way to do this?
(I posted this question at dba.stackexchange.com as well, to profit from the less-programming-more-dba'ing point of view too.)
If you choose to poll a table, then you'd need to look at msdb.dbo.sysjobhistory and wait until the run_status is not 4. Still gonna be icky though.
Perhaps a different approach would be for the last step of the jobs, fail or success, to make an entry back on the "master" job server that the process has completed and then you simply look locally. Might also make tracking down "what the heck happened" easier by consolidating starts and stops at a centralized job server.
A third and much more robust approach would be to use something like Service Broker to handle communicating and signaling between processes. That'll require much more setup but it'd be the most mechanism for communicating between processes.
No problem with the approach. I was doing somewhat like your requirement only and i used sysjobhistory table from msdb to see the run status because of some other reasons.
Coming back to your question, Please refer msdb.dbo.sp_start_job stored procedure using the same approach and its been used by one default Microsoft BizTalk job 'MessageBox_Message_ManageRefCountLog_BizTalkMsgBoxDb' to call another dependent default biztalk job 'MessageBox_Message_Cleanup_BizTalkMsgBoxDb'. Even there is one stored procedure in BizTalk messagebox to check the status of job. Please refer 'int_IsAgentJobRunning' in BizTalk messagebox.
If a user inserts rows into a table, i would like SQL Server to perform some additional processing - but not in the context of the user's transaction.
e.g. The user gives read access to a folder:
UPDATE Folders SET ReadAccess = 1
WHERE FolderID = 7
As far as the user is concerned i want that to be the end of the atomic operation. In reality i have to now go find all child files and folders and give them ReadAccess.
EXECUTE SynchronizePermissions
This is a potentially lengthy operation (over 2s). i want this lengthy operation to happen "later". It can happen 0 seconds later, and before the carbon-unit has a chance to think about it the asynchronous update is done.
How can i run this required operation asychronously when it's required (i.e. triggered)?
The ideal would be:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTEASYNCHRONOUS SynchronizePermissions
or
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTE SynchronizePermissions WITH(ASYNCHRONOUS)
Right now this happens as a trigger:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTE SynchronizePermissions
and the user is forced to wait the 3 seconds every time they make a change to the Folders table.
i've thought about creating a Scheduled Task on the user, that runs every minute, and check for an PermissionsNeedSynchronizing flag:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
UPDATE SystemState SET PermissionsNeedsSynchronizing = 1
The scheduled task binary can check for this flag, run if the flag is on:
DECLARE #FlagValue int
SET #FlagValue = 0;
UPDATE SystemState SET #FlagValue = PermissionsNeedsSynchronizing+1
WHERE PermissionsNeedsSynchronizing = 1
IF #FlagValue = 2
BEGIN
EXECUTE SynchronizePermissions
UPDATE SystemState SET PermissionsNeedsSynchronizing = 0
WHERE PermissionsNeedsSynchronizing = 2
END
The problem with a scheduled task is:
- the fastest it can run is every 60 seconds
- it's suffers from being a polling solution
- it requires an executable
What i'd prefer is a way that SQL Server could trigger the scheduled task:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTE SynchronizePermissionsAsychronous
CREATE PROCEDURE dbo.SynchronizePermissionsAsychronous AS
EXECUTE sp_ms_StartWindowsScheduledTask #taskName="SynchronousPermissions"
The problem with this is:
- there is no sp_ms_StartWinodowsScheduledTask system stored procedure
So i'm looking for ideas for better solutions.
Update: The previous example is a problem, that has has no good solution, for five years now. A problem from 3 years ago, that has no good solution is a table that i need to update a meta-data column after an insert/update. The metadata takes too long to calculate in online transaction processing, but i am ok with it appearing 3 or 5 seconds later:
CREATE TRIGGER dbo.UpdateFundsTransferValues FOR INSERT, UPDATE AS
UPDATE FundsTransfers
SET TotalOrderValue = (SELECT ....[snip]....),
TotalDropValue = (SELECT ....,[snip]....)
WHERE FundsTransfers.FundsTransferID IN (
SELECT i.FundsTransferID
FROM INSERTED i
)
And the problem that i'm having today is a way to asychronously update some metadata after a row has been transitionally inserted or modified:
CREATE TRIGGER dbo.UpdateCDRValue FOR INSERT, UPDATE AS
UPDATE LCDs
SET CDRValue = (SELECT ....[snip]....)
WHERE LCDs.LCDGUID IN (
SELECT i.LCDGUID
FROM INSERTED i
)
Update 2: i've thought about creating a native, or managed, dll and using it as an extended stored procedure. The problem with that is:
you can't script a binary
i'm now allowed to do it
Use a queue table, and have a different background process pick things up off the queue and process them. The trigger itself is by definition a part of the user's transaction - this is precisely why they are often discouraged (or at least people are warned to not use expensive techniques inside triggers).
Create a SQL Agent job and run it with sp_start_job..it shouldn't wait for completion
However you need the proper permission to run jobs
Members of SQLAgentUserRole and SQLAgentReaderRole can only start jobs
that they own. Members of SQLAgentOperatorRole can start all local
jobs including those that are owned by other users. Members of
sysadmin can start all local and multiserver jobs.
The problem with this approach is that if the job is already running it can't be started until it is finished
Otherwise go with the queue table that Aaron suggested, it is cleaner and better
We came across this problem some time ago, and I figured out a solution that works beautifully. I do have a process running in the background-- but just like you, I didn't want it to have to poll every 60 seconds.
Here are the steps:
(1) Our trigger doesn't run the db update itself. It merely throws a "flag file" into a folder that is monitored by the background process.
(2) The background process monitors that folder using Windows Change Notification (this is the really cool part, because you don't have to poll the folder-- your process sleeps until Windows notifies it that a file has appeared). Whenever the background process is awoken by Windows, it runs the db update. Then it deletes the flag file(s), goes to sleep again and tells Windows to wake it up when another file appears in the folder.
This is working exactly as you described: the triggered update runs shortly after the main database event, and voila, the user doesn't have to wait the extra few seconds. I just love it.
You don't necessarily need to compile your own executable to do this: many scripting languges can use Windows Change Notification. I wrote the background process in Perl and it only took a few minutes to get it working.
I am running a bunch of database migration scripts. I find myself with a rather pressing problem, that business is waking up and expected to see their data, and their data has not finished migrating. I also took the applications offline and they really need to be started back up. In reality "the business" is a number of companies, and therefore I have a number of scripts running SPs in one query window like so:
EXEC [dbo].[MigrateCompanyById] 34
GO
EXEC [dbo].[MigrateCompanyById] 75
GO
EXEC [dbo].[MigrateCompanyById] 12
GO
EXEC [dbo].[MigrateCompanyById] 66
GO
Each SP calls a large number of other sub SPs to migrate all of the data required. I am considering cancelling the query, but I'm not sure at what point the execution will be cancelled. If it cancels nicely at the next GO then I'll be happy. If it cancels mid way through one of the company migrations, then I'm screwed.
If I cannot cancel, could I ALTER the MigrateCompanyById SP and comment all the sub SP calls out? Would that also prevent the next one from running, whilst completing the one that is currently running?
Any thoughts?
One way to acheive a controlled cancellation is to add a table containing a cancel flag. You can set this flag when you want to cancel exceution and your SP's can check this at regular intervals and stop executing if appropriate.
I was forced to cancel the script anyway.
When doing so, I noted that it cancels after the current executing statement, regardless of where it is in the SP execution chain.
Are you bracketing the code within each migration stored proc with transaction handling (BEGIN, COMMIT, etc.)? That would enable you to roll back the changes relatively easily depending on what you're doing within the procs.
One solution I've seen, you have a table with a single record having a bit value of 0 or 1, if that record is 0, your production application disallows access by the user population, enabling you to do whatever you need to and then set that flag to 1 after your task is complete to enable production to continue. This might not be practical given your environment, but can give you assurance that no users will be messing with your data through your app until you decide that it's ready to be messed with.
you can use this method to report execution progress of your script.
the way you have it now is every sproc is it's own transaction. so if you cancel the script you will get it update only partly up to the point of the last successfuly executed sproc.
you cna however put it all in a singel transaction if you need all or nothign update.