How we can make database resumable in perl script - database

I have a Perl code which connects with database and scan data from different different table. I face a problem if I lose my connection: it roll back all the transaction. How could I make the Perl script resume the connection and start the process from where the interruption took place? Can I use the Perl to resume the connection or any thing other technique to start the process from where the interruption took place if so could any one guide me with the steps please.
It is actually required because of we have lots of data and takes 1 week to scan all the data and insert in specific table, in between if we run database offline backup it disconnect all the connection and whatever transaction happens it roll back and need to run once again from the beginning.
We can commit the transaction whatever done but challenge is how we can start process from where the interruption took place so we don't require to run from the beginning.

Relying on a DB connection to be consistently open for over a day is the wrong approach.
A possible solution involves:
1) Connect to the DB to create a DB Handle. Use an infinite loop and sleep to wait until you have a good db handle. Put this into a subroutine.
2) Put the individual requests for individual tables in a data structure like an array
Execute separate Queries in separate statements in a loop (.
Check if the dbh is stale before each individual request. Clean and recreate the handle if necessary. Use the subroutine from 1)
Handle breakdowns during a request in EVAL blocks using the "redo" statement to make sure no statement gets skipped.
3) Keep the data between requests either in memory or any non-SQL storage like a Key/Value Store ( Redis, )
4) Compute whatever needs computation
5) When you have all data for your commit transaction, do.
This solution assumes you don't care about changes between reading and committing back. If you do, you need to LOCK your affected tables first. You probably don't want to lock a table for a week though.

Related

Launch stored procedure and continue running it even if disconnected

I have a database where data is processed in some kind of batches, where each batch may contain even a million records. I am processing data in a console application, and when I'm done with a batch, I mark it as Done (to avoid reading it again in case it does not get deleted), delete it and move on to a next batch.
I have the following simple stored procedure which deletes processed "batches" of data
CREATE PROCEDURE [dbo].[DeleteBatch]
(
#BatchId bigint
)
AS
SET XACT_ABORT ON
BEGIN TRANSACTION
DELETE FROM table1 WHERE BatchId = #BatchId
DELETE FROM table2 WHERE BatchId = #BatchId
DELETE FROM table3 WHERE BatchId = #BatchId
COMMIT
RETURN ##Error
I am using NHibernate with command timeout value 10 minutes, and the DeleteBatch procedure call times out occasionally.
Actually I don't want to wait for DeleteBatch to complete. I already have marked the batch as Done, so I want to go processing a next batch or maybe even exit my console application, if there are no more pending batches.
I am using Microsoft SQL Express 2012.
Is there any simple solution to tell the SQL server - "launch DeleteBatch and run it asynchronously even if I disconnect, and I don't even need the result of the procedure"?
It would also be great if I could set a lower processing priority for DeleteBatch because other queries are more important than DeleteBatch.
I dont know much about NHibernate. But if you were or can use ADO.NET in this scenario then you can implement asynchronous database operations easliy using the SqlCommand.BeginExecuteNonQuery Method in C#. This method starts the process of asynchronously executing a Transact-SQL statement or stored procedure that does not return rows, so that other tasks can run concurrently while the statement is executing.
EDIT: If you really want to exit from your console app before the db operation ends then you will have to manually create threads in your code and perform the db operation in those threads. Now when you close your console app these threads would still be alive because Threads created using System.Thread.Thread are foreground threads by default. But having said that it is also important to consider how many threads you will create. In your case you would have to assign 1 thread for each batch. If number of batches is very large then large number of threads would need to be created which would inturn eat a large amount of your CPU resources and would even freeze your OS for a long time.
Another simple solution I could suggest is to insert the BatchIds into some database table. Create an INSERT TRIGGER on that table. This trigger would then call a stored proc with BatchId as its parameter and would perform the required tasks.
Hope it helps.
What if your console application were, instead of trying to delete the batch, just write the batch id into a "BatchIdsToDelete" table. Then, you could use an agent job running every x minutes/seconds or whatever, to delete the top x percent records for a given batch id, and maybe sleeping a little before tackling the next x percent.
Maybe worth having a look at that?
Look at this article which explains how to do reliable asynchronous procedure execution, code included. IS based on Service Broker.
the problem with trying to use .NEt async features (like BeginExecute, or task etc) is that the call is unreliable: if the process exits before the procedure completes the execution is canceled in the server as the session is disconnected.
But you need to also look at the task itself, why is the deletion taking +10 minutes? is it blocked by contention? are you missing indexes on BatchId? Use the Performance Troubleshooting Flowchart.
Late to the party, but if someone else has this problem use SQLCMD. With express you are limited in the number of users (I think 2, but it may have changed since I the last time I did much with express). You can have sqlcmd, run queries, stored procedures ...
And you can kick off the sqlcmd with Windows Scheduler. A script, an outlook rule ...
I used it to manage like 3 or 4 thousand SQL Server Express instances, with their nightly maintenance scheduled with the Windows Scheduler.
You could also create and run a PowerShell script, it's more versatile and probably a more widely used than sqlcmd.
I needed a same thing..
After searching for long time I found the solution
Its d easiest way
SqlConnection connection = new SqlConnection();
connection.ConnectionString = "your connection string";
SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder(connection.ConnectionString);
builder.AsynchronousProcessing = true;
SqlConnection newSqlConn = new SqlConnection(builder.ConnectionString);
newSqlConn.Open();
SqlCommand cmd = new SqlCommand(storeProcedureName, newSqlConn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.BeginExecuteNonQuery(null, null);
Ideally SQLConnection object should take an optional parameter / property, URL of a web service, be that WCF or WebApi, or something yet to be named, and if the user wishes to, notify user of execution advance and / or completion status by calling this URL with well known message.
Theoretically DBConnection is extensible object one is free to implement. However, it will take some review of what really can be and needs to be done, before this approach can be said feasible.

How to only allow one update/insert to a table if row is outdated

I have a file stored on disk that can be access across multiple servers in a web farm. This file is updated as necessary based on data changes in the database. I have a database table that stores a row with a URI for this file and some hashes based off of some database tables. If the hashes don't match their respective tables, then the file need to be regenerated and a new row needs to be inserted.
How do I make it so that only 1 client regenerates this file and inserts a row?
The easiest but worst solution (because of locks) is to:
BEGIN TRANSACTION
SELECT ROW FROM TABLE (lock the table for the remainder of the transaction)
IF ROW IS OUT OF DATE:
REGENERATE FILE
INSERT ROW INTO TABLE
DO SOME STUFF WITH FILE (30s)
COMMIT TRANSACTION
However, if multiple clients execute this code, all of the subsequent clients sit for a long time while the "DO SOME STUFF WITH FILE" processes.
Is there a better way to handle this? Maybe changing the way I process the file before the commit to make it faster? I've been stumped on this for a couple days.
It sounds like you need to do your file processing asynchronously, so the file process is spun off and the transaction completes in a timely manner. There are a few ways to do that, but the easiest might be to replace the "do stuff with file" with a "insert a record into the table This_File_Needs_To_Be_Updated, then run a job every few minutes that updates each record in that table. Or HERE is some code that generates a job on the fly. Or see THIS question on Stack Overflow.
The answer depends on the details of file level processing.
If you just swap the database and file operations, you risk corruption of the file or busy waiting (depending on how exactly you open it, and what your code does when a concurrent open is rejected). Busy waiting would definitely be worse than waiting on a database lock from a throughput (or any other) perspective.
If your file processing really takes so long as to be frequently causing queueing of requests, the only solutions are to add more powerful hardware or optimize file level processing.
For example, if the file only reflects the data in the database, you might get away with not updating it at all, and having a background process that periodically regenerates its content based on the data in the database. You might need to add versioning that makes sure that whoever reads the file is not receiving stale data. If the file pointed to by the URL has a new name every time, you might need an error handler that makes sure that GET requests are not habitually receiving a 404 response on new files.

Synchronizing sqlite database from memory to file

I'm writing an application which must log information pretty frequently, say, twice in a second. I wish to save the information to an sqlite database, however I don't mind to commit changes to the disk once every ten minutes.
Executing my queries when using a file-database takes to long, and makes the computer lag.
An optional solution is to use an in-memory database (it will fit, no worries), and synchronize it to the disk from time to time,
Is it possible? Is there a better way to achieve that (can you tell sqlite to commit to disk only after X queries?).
Can I solve this with Qt's SQL wrapper?
Let's assume you have an on-disk database called 'disk_logs' with a table called 'events'. You could attach an in-memory database to your existing database:
ATTACH DATABASE ':memory:' AS mem_logs;
Create a table in that database (which would be entirely in-memory) to receive the incoming log events:
CREATE TABLE mem_logs.events(a, b, c);
Then transfer the data from the in-memory table to the on-disk table during application downtime:
INSERT INTO disk_logs.events SELECT * FROM mem_logs.events;
And then delete the contents of the existing in-memory table. Repeat.
This is pretty complicated though... If your records span multiple tables and are linked together with foreign keys, it might be a pain to keep these in sync as you copy from an in-memory tables to on-disk tables.
Before attempting something (uncomfortably over-engineered) like this, I'd also suggest trying to make SQLite go as fast as possible. SQLite should be able to easily handly > 50K record inserts per second. A few log entries twice a second should not cause significant slowdown.
If you're executing each insert within it's own transaction - that could be a significant contributor to the slow-downs you're seeing. Perhaps you could:
Count the number of records inserted so far
Begin a transaction
Insert your record
Increment count
Commit/end transaction when N records have been inserted
Repeat
The downside is that if the system crashes during that period you risk loosing the un-committed records (but if you were willing to use an in-memory database, than it sounds like you're OK with that risk).
A brief search of the SQLite documentation turned up nothing useful (it wasn't likely and I didn't expect it).
Why not use a background thread that wakes up every 10 minutes, copies all of the log rows from the in-memory database to the external database (and deletes them from the in-memory database). When your program is ready to end, wake up the background thread one last time to save the last logs, then close all of the connections.

database record locking

I have a server application, and a database. Multiple instances of the server can run at the same time, but all data comes from the same database (on some servers it is postgresql, in other cases ms sql server).
In my application, there is a process that is performed which can take hours. I need to ensure that this process is only executed one at a time. If one server is processing, no other server instance can process until the first one has completed.
The process depends on one table (let's call it 'ProcessTable'). What I do is, before any server starts the hour-long process, I set a boolean flag in the ProcessTable which indicates that this record is 'locked' and is being processed (not all records in this table are processed / locked, so I need to specifically mark each record which is needed by the process). So when the next server instance comes along while the previous instance is still processing, it sees the boolean flags and throws an exception.
The problem is, that 2 server instances might both be activated at nearly the same time, and when both check the ProcessTable, there may not be any flags set, but both servers are actually in the process of 'setting' the flags but since the transaction hasn't yet commited for either process, neither process will see the locking done by the other process. This is because the locking mechanism itself may take a few seconds, so there is that window of opportunity where 2 servers might still be able to process at the same time.
It appears that what I need is a single record in my 'Settings' table which should store a boolean flag called 'LockInProgress'. So before even a server can lock the needed records in the ProcessTable, it first must make sure that it has full rights to do the locking by checking the 'LockInProgress' column in the Settings table.
So my question is, how do I prevent two servers from both modifying that LockInProgress column in the settings table, at the same time... or am I going about this in the wrong manner?
Please note that I need to support both postgresql and ms sql server as some servers use one database, and some servers use the other.
Thanks in advance...
How about obtaining a lock on the record first and then update the record to show "locked". This would avoid the 2nd instance to get a lock successfully and thereby the update of record fails.
The point is to make sure the lock and update as one atomic step.
Make a stored procedure that hands out the lock, and run it under 'serializable' isolation. This will guarantee that one and only one process can get at the resource at any given time.
Note that this means that the second process trying to get at the lock will block until the first process releases it. Also, if you have to get multiple locks in this manner, make sure that the design of the process guarantees that the locks will be acquired and released in the same order. This will avoid deadlock situations where two processes hold resources while waiting for each other to release locks.
Unless you can't deal with your other processes blocking this would probably be easier to implement and more robust than attempting to implement 'test and set' semantics.
I've been thinking about this, and I think this is the simplest way of doing things; I just execute a command like this:
update settings set settingsValue = '333' where settingsKey = 'ProcessLock' and settingsValue = '0'
'333' would be a unique value which each server process gets (based on date/time, server name, + random value etc).
If no other process has locked the table, then the settingsValue would be = to 0, and that statement would adjust the settingsValue.
If another process has already locked the table, then that statement becomes a no-op, and nothing get's modified.
I then immediately commit the transaction.
Finally, I requery the table for the settingsValue, and if it is the correct value, then our lock succeeded and we continue on, otherwise an exception is thrown, etc. When we're done with the lock, we reset the value back down to 0.
Since I'm using SERIALIZATION transaction mode, I can't see this causing any issues... please correct me if I'm wrong.

Sql Server 2005 - manage concurrency on tables

I've got in an ASP.NET application this process :
Start a connection
Start a transaction
Insert into a table "LoadData" a lot of values with the SqlBulkCopy class with a column that contains a specific LoadId.
Call a stored procedure that :
read the table "LoadData" for the specific LoadId.
For each line does a lot of calculations which implies reading dozens of tables and write the results into a temporary (#temp) table (process that last several minutes).
Deletes the lines in "LoadDate" for the specific LoadId.
Once everything is done, write the result in the result table.
Commit transaction or rollback if something fails.
My problem is that if I have 2 users that start the process, the second one will have to wait that the previous has finished (because the insert seems to put an exclusive lock on the table) and my application sometimes falls in timeout (and the users are not happy to wait :) ).
I'm looking for a way to be able to have the users that does everything in parallel as there is no interaction, except the last one: writing the result. I think that what is blocking me is the inserts / deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to remove the exclusive lock on the "LoadData" table (is it possible to force SqlServer to only lock rows and not table ?) when the Insert is finished, but without ending the transaction.
Any suggestion?
Look up SET TRANSACTION ISOLATION LEVEL READ COMMITTED SNAPSHOT in Books OnLine.
Transactions should cover small and fast-executing pieces of SQL / code. They have a tendancy to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow thus locking out the other users from querying or updating the same row / page / table.
Why not forget the transaction, and handle processing errors in another way? Is your data integrity truely being secured by the transaction, or can you do without it?
if you're sure that there is no issue with cioncurrent operations except the last part, why not start the transaction just before those last statements, Whichever they are that DO require isolation), and commit immediately after they succeed.. Then all the upfront read operations will not block each other...

Resources