I maintain an MVC application which incorporates some long-running batch processes for sending newsletters, generating reports, etc.
I had previously encountered a lot of issues with deadlocks, where one of these long-running queries might be holding a lock on a row that then needs to be updated by another process.
The solution I originally came up with was a scheduled task that creates database snapshots, like so:
CREATE DATABASE MyDatabase_snapshot_[yyyyMMddHHmmss]... AS SNAPSHOT OF MyDatabase
My application then has some logic which finds the latest available snapshot and uses it for the read-only connection for the long-running processes, or anywhere else a read-only connection is required.
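The "find the latest snapshot" logic could look roughly like this. It is a minimal sketch that assumes snapshot names follow exactly the MyDatabase_snapshot_yyyyMMddHHmmss pattern from the CREATE DATABASE statement above. The helper names are hypothetical. On a real server the candidate names could come from sys.databases, where source_database_id is non-NULL for database snapshots.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # Python spelling of yyyyMMddHHmmss

# Candidate names could be fetched on the server with, for example:
#   SELECT name FROM sys.databases WHERE source_database_id = DB_ID(N'MyDatabase')


def snapshot_name(db: str, created: datetime) -> str:
    """Hypothetical helper: build a name in the <db>_snapshot_yyyyMMddHHmmss format."""
    return f"{db}_snapshot_{created.strftime(TIMESTAMP_FORMAT)}"


def latest_snapshot(db: str, names: Iterable[str]) -> Optional[str]:
    """Return the newest snapshot of `db` among `names`, or None if there is none."""
    pattern = re.compile(rf"^{re.escape(db)}_snapshot_(\d{{14}})$")
    best_name: Optional[str] = None
    best_time: Optional[datetime] = None
    for name in names:
        match = pattern.match(name)
        if match is None:
            continue  # not a snapshot of this database
        try:
            created = datetime.strptime(match.group(1), TIMESTAMP_FORMAT)
        except ValueError:
            continue  # 14 digits, but not a valid timestamp
        if best_time is None or created > best_time:
            best_name, best_time = name, created
    return best_name
```

Parsing the timestamp, rather than relying on the order in which the server returns names, keeps the choice independent of how the list was queried.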
The current setup is perfectly functional and reliable. However, being dependent on that scheduled task doesn't make me happy. I can imagine that at some stage in the future, if someone else is looking after this project, it could easily become a source of confusing issues: for example, if the database were moved to another server and the snapshot-creation scheduled task wasn't set up correctly.
I've since realised I could achieve a similar result by using snapshot transaction isolation, and avoid all the extra complexity of managing the creation and cleanup of the database snapshots.
However, I'm now wondering whether there may be any performance drawbacks to doing this with transactions versus continuing to use the static snapshots.
Consider the following scenario.
The system periodically sends personalised job lists to approximately 20K subscribers. For each of these subscribers it does database lookups to create the matching jobs list.
What it has been doing is looping through the full recipient list and, for each recipient:
1. Open a connection to the snapshot db
2. Run the query to find matching jobs
3. Close the snapshot db connection
If instead it does the following:
1. Open a connection to the normal (non-snapshot) database
2. Create a snapshot-isolated transaction
3. Run the query to find matching jobs
4. Close the transaction
5. Close the database connection
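The per-recipient steps above can be sketched as follows. This does not answer the performance question. It only shows the structure and the snapshot semantics the approach relies on. To keep it runnable without SQL Server, it swaps in SQLite in WAL mode: there, a read transaction sees a consistent snapshot while writers carry on, which only approximates SQL Server's SNAPSHOT isolation. In SQL Server, step 2 would be SET TRANSACTION ISOLATION LEVEL SNAPSHOT followed by BEGIN TRANSACTION, after ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON. The jobs table and all function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


def open_connection(path: str) -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so transactions are started/ended explicitly below
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL").fetchone()  # readers get snapshots, writers are not blocked
    return conn


def jobs_for_recipient(path: str, keyword: str) -> List[str]:
    """One iteration of the per-recipient loop, following steps 1-5."""
    conn = open_connection(path)  # 1. open a connection to the normal database
    try:
        conn.execute("BEGIN")  # 2. start the transaction (SQL Server: SET TRANSACTION ISOLATION LEVEL SNAPSHOT; BEGIN TRAN)
        rows = conn.execute(  # 3. run the query to find matching jobs
            "SELECT title FROM jobs WHERE title LIKE ? ORDER BY id",
            (f"%{keyword}%",),
        ).fetchall()
        conn.execute("COMMIT")  # 4. close the transaction
    finally:
        conn.close()  # 5. close the connection (an unfinished transaction is rolled back)
    return [title for (title,) in rows]


def reader_sees_snapshot(path: str) -> Tuple[int, int, int]:
    """Row counts seen by a reader before, during and after a concurrent insert."""
    reader, writer = open_connection(path), open_connection(path)
    try:
        reader.execute("BEGIN")
        before = reader.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]  # snapshot starts here
        writer.execute("INSERT INTO jobs (title) VALUES ('Report writer')")  # writer is not blocked
        during = reader.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]  # still the old snapshot
        reader.execute("COMMIT")
        after = reader.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]  # now sees the new row
    finally:
        reader.close()
        writer.close()
    return before, during, after


def demo() -> Tuple[List[str], Tuple[int, int, int]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        setup = open_connection(path)
        setup.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT)")
        setup.executemany("INSERT INTO jobs (title) VALUES (?)", [("SQL developer",), ("Web designer",)])
        setup.close()
        return jobs_for_recipient(path, "SQL"), reader_sees_snapshot(path)
```

The reader keeps seeing two rows until it ends its transaction, even though the writer committed a third row in the meantime.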
Does this actually translate to more work for the database server?
Specifically, I'm wondering what's involved in step #2.
Removing complexity from the application is a good thing, but not at the expense of performance, especially since this process is already quite server-intensive and takes a long time to run.
Related
TL;DR: Is it possible to create a fast, temporary "fork" of a database (like a snapshot transaction) without any locks, given that I know for a fact that the changes will never be committed and will always be rolled back?
Details:
I'm currently working with SQL Server and am trying to implement a feature where the user can try all sorts of things (in the application) that are never persisted in the database.
My first instinct was to (mis)use snapshot transactions for that to basically "fork" the database into a short lived (under 15min) user-specific context. The rest of the application wouldn't even have to know that all the actions the user performs will later be thrown away (I currently persist the connection across requests - it's a web application).
The problem is that there are situations where the snapshot transaction blocks and waits for other transactions to complete. My guess is that this happens because SQL Server has to make sure it can merge the data if one of the open transactions commits, but in my case I know for a fact that I will never commit the changes from this transaction and will always throw the data away (note that not everything happens in this transaction; there are other things a user can do that happen on a different connection and are persisted).
Are there other ideas that don't involve cloning the database (too large/slow) or updating/changing the schema of all tables (I'd like to avoid "poisoning" the schema with the implementation detail of the "try out" feature)?
No. SQL Server has copy-on-write Database Snapshots, but the snapshots are read-only. So where a SNAPSHOT transaction acquires regular exclusive locks when it modifies the database, a Database Snapshot would just give you an error.
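The point that a transaction you always intend to roll back still takes ordinary write locks can be shown with a small sketch. It swaps in SQLite, whose write lock covers the whole database rather than individual rows, so it is only a coarse analogue of SQL Server's behaviour. The settings table and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def try_out_then_discard(path: str) -> Tuple[bool, str]:
    """Hypothetical "try it out" session whose changes are always rolled back.

    Returns (other_writer_was_blocked, value_after_rollback_and_other_write).
    """
    session = sqlite3.connect(path, isolation_level=None)
    # timeout=0 makes the second writer fail immediately instead of waiting for the lock
    other = sqlite3.connect(path, isolation_level=None, timeout=0)
    try:
        session.execute("BEGIN")
        session.execute("UPDATE settings SET value = 'experimental' WHERE name = 'mode'")
        try:
            other.execute("UPDATE settings SET value = 'persisted' WHERE name = 'mode'")
            blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            blocked = True
        session.execute("ROLLBACK")  # the try-out changes are always thrown away
        # Only after the rollback releases the lock can the other connection write.
        other.execute("UPDATE settings SET value = 'persisted' WHERE name = 'mode'")
        value = other.execute("SELECT value FROM settings WHERE name = 'mode'").fetchone()[0]
    finally:
        session.close()
        other.close()
    return blocked, value


def demo() -> Tuple[bool, str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tryout.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
        setup.execute("INSERT INTO settings VALUES ('mode', 'normal')")
        setup.commit()
        setup.close()
        return try_out_then_discard(path)
```

Knowing in advance that the session will roll back does not change what the engine must do while the transaction is open: other writers still have to wait.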
There are storage technologies that can create a writable copy-on-write storage snapshot, like NetApp. You would run a command to create a new LUN that is a snapshot of an existing LUN, present it to your server as a disk, mount its volume in a folder or drive letter, and attach the files you find there as a database. This is often done for cloning across environments to refresh dev/test with prod data without having to copy all the data. But it seems like way too much infrastructure work for your use case.
I have an application that has been in production, with its own database, for more than 10 years.
I'm currently developing a new application (a kind of reporting application) that only needs read access to the database.
In order not to be too tightly coupled to the database, and to be able to use a newer DAL (Entity Framework 6 Code First), I decided to start from a new empty database, and I only added the tables and columns I need (with different names from the production ones).
Now I need some way to regularly update the new database from the production database (ideally almost immediately).
I hesitated to ask this question on http://dba.stackexchange.com because I'm not necessarily limited to using SQL Server for the job (I can develop and run a custom application if needed).
I've already done some searching and found these partial solutions:
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the different table/column names will be problematic. So I could use it to create a smaller database that SQL Server replicates automatically, but I would still need to replicate that database to my new one (though it might keep my production database from being too stressed?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
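The custom-job option could be sketched roughly like this. It is a minimal sketch under several assumptions. SQLite stands in for SQL Server, and both tables sit in one database for brevity. The table and column names (tblCustomer, CustId, CustName, Customers) are hypothetical. LastEditDate is stored as ISO-8601 text so that string comparison orders it correctly. Deleted rows are not picked up by this approach, because a deleted row has no LastEditDate left to query.

```python
import sqlite3
from typing import Optional


def sync_changed_rows(conn: sqlite3.Connection, last_sync: Optional[str]) -> Optional[str]:
    """Copy rows edited since `last_sync` from the production-shaped table into the
    reporting-shaped table and return the new watermark (hypothetical names throughout)."""
    rows = conn.execute(
        "SELECT CustId, CustName, LastEditDate FROM tblCustomer "
        "WHERE ? IS NULL OR LastEditDate > ? ORDER BY LastEditDate",
        (last_sync, last_sync),
    ).fetchall()
    with conn:  # apply the whole run in one transaction: all rows or none
        conn.executemany(
            # column renaming happens here: CustId -> Id, CustName -> Name
            "INSERT OR REPLACE INTO Customers (Id, Name) VALUES (?, ?)",
            [(cust_id, name) for cust_id, name, _ in rows],
        )
    # The newest LastEditDate copied becomes the starting point for the next run.
    return rows[-1][2] if rows else last_sync
```

The job would persist the returned watermark between runs and pass it back in on the next one.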
Do you have any advice, or maybe other solutions that I didn't foresee?
Thanks
I think transactional replication is better than using triggers.
Triggers would use too many resources on the source server/database, because a trigger fires on every DML operation.
Transactional replication could be scheduled as a SQL job and run a few times a day/night, or as part of a nightly scheduled job. It really depends on how busy the source DB is.
There is one more thing you could try: database mirroring, though that depends on your SQL Server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones, but I can think of a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
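A minimal sketch of the view idea, using SQLite as a stand-in so it can be run without SQL Server. The view definition itself would look much the same in T-SQL. The names tblCustomer, CustId, CustName and Customers are hypothetical.

```python
import sqlite3


def create_reporting_view(conn: sqlite3.Connection) -> None:
    # The replicated table keeps its production name (tblCustomer); the view exposes the
    # reporting names and doubles as documentation of where each column comes from.
    conn.execute(
        """
        CREATE VIEW Customers AS
        SELECT CustId   AS Id,    -- tblCustomer.CustId
               CustName AS Name   -- tblCustomer.CustName
        FROM tblCustomer
        """
    )
```

The reporting application (or its Entity Framework model) then maps to Customers, while replication keeps working on the unchanged production table.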
I'm gonna throw this out there and say that I'd use transaction log shipping. You can even set the secondary DBs to read-only. It needs some setting up (full recovery model and transaction log backups), but then you can automatically restore the transaction logs to the secondary database and be hands-off with it; the secondary database will be as current as your last transaction log backup.
Depending on how current the data needs to be: if you only need it refreshed daily, you can set up something that takes your daily backups and restores them to the secondary.
In the end, we went for the trigger solution. We don't have that many changes a day (maybe 500, 1,000 at most), and it didn't put too much pressure on the current database. Thanks for your advice.
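A trigger-based sync between differently named tables might look roughly like this. It is a sketch, not the poster's actual implementation. SQLite stands in for SQL Server, both tables live in one database, and all names are hypothetical. SQLite triggers fire once per row and use NEW/OLD. SQL Server DML triggers fire once per statement and read the inserted/deleted pseudo-tables, and they can write to another database on the same instance using three-part names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tblCustomer (CustId INTEGER PRIMARY KEY, CustName TEXT);  -- production-shaped
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);            -- reporting-shaped

CREATE TRIGGER trg_customer_ins AFTER INSERT ON tblCustomer
BEGIN
    INSERT OR REPLACE INTO Customers (Id, Name) VALUES (NEW.CustId, NEW.CustName);
END;

CREATE TRIGGER trg_customer_upd AFTER UPDATE ON tblCustomer
BEGIN
    -- delete + insert also copes with a changed key
    DELETE FROM Customers WHERE Id = OLD.CustId;
    INSERT INTO Customers (Id, Name) VALUES (NEW.CustId, NEW.CustName);
END;

CREATE TRIGGER trg_customer_del AFTER DELETE ON tblCustomer
BEGIN
    DELETE FROM Customers WHERE Id = OLD.CustId;
END;
"""


def create_synced_schema(conn: sqlite3.Connection) -> None:
    """Create both tables plus the triggers that keep Customers in step with tblCustomer."""
    conn.executescript(SCHEMA)
```

Unlike the LastEditDate job, the delete trigger also propagates deletions.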
I am designing a solution for a SQL Server 2012 database for the following scenario:
The database contains about 1M records with some simple parent child relationships between 4 or 5 tables
There is a 24x7 high load of reads on the database
Once a day we receive a batch of about 1,000 inserts, updates and deletes that should be merged into the database; occasionally this number could be higher.
Apart from the daily batch there are no other writers to the database
There are a few 'special' requirements:
Readers should not experience any significant latency due to these updates
The entire batch should be processed atomically from the reader's perspective. The reader should not see a partially processed batch.
If the update fails halfway we need to rollback all changes of the batch
Processing of the batch itself is not time-critical, with a simple implementation it now takes up to a few minutes which is just fine.
The options I am thinking of are:
Wrap a single database transaction around the entire update batch (this could be a large transaction), and use snapshot isolation to allow readers to read the original data while the update is running.
Use partition switching. It seems like this feature was designed with this kind of use case in mind. The downside seems to be that before we can start processing the batch, we need to create a copy of all the original data.
Switch the entire database. We could create a copy of the entire database, process the batch in this copy, and then redirect all clients to this database (e.g. by changing their connection string). This should even allow us to make the database read-only, and possibly even create multiple copies of the database for scalability.
Which of these options, or another, would best fit this scenario and why?
The transaction strategy will block and cause latency.
Partition switching is not really going to solve your problem, as you should consider it the same as doing the work against the database as you have it today (so the rollback/insert would still be blocking), although the blocking could be isolated to just part of your data rather than all of it.
Your best bet is to use two databases and switch connection strings.
Or use one database with two sets of tables, and use views or stored procedures that are swapped to point at the "active" tables. You could still have disk contention issues, but from a locking perspective you would be fine.
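The "two sets of tables plus a swappable view" option can be sketched as follows. SQLite stands in for SQL Server, and the names items_a, items_b, active_set and items are hypothetical. The batch is built in the inactive table, which readers never query. Only a short final transaction repoints the view, which in SQL Server could be an ALTER VIEW. If building the inactive copy fails, the active data is untouched, which gives the all-or-nothing behaviour from the readers' point of view.

```python
import sqlite3
from typing import Iterable, Tuple

TABLES = ("items_a", "items_b")  # hypothetical pair of identically shaped tables

SETUP = """
CREATE TABLE items_a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items_b (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE active_set (name TEXT NOT NULL);
INSERT INTO active_set (name) VALUES ('items_a');
CREATE VIEW items AS SELECT id, name FROM items_a;  -- readers only ever query this view
"""


def _run_in_transaction(conn: sqlite3.Connection, work) -> None:
    conn.execute("BEGIN")
    try:
        work()
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def apply_batch(conn: sqlite3.Connection,
                upserts: Iterable[Tuple[int, str]],
                deletes: Iterable[int]) -> str:
    """Build the next version in the inactive table, then repoint the view.
    Expects an autocommit connection (isolation_level=None). Returns the new active table."""
    active = conn.execute("SELECT name FROM active_set").fetchone()[0]
    staging = TABLES[1] if active == TABLES[0] else TABLES[0]  # fixed names, safe to format in

    def build() -> None:  # phase 1: readers never see this table
        conn.execute(f"DELETE FROM {staging}")
        conn.execute(f"INSERT INTO {staging} (id, name) SELECT id, name FROM {active}")
        conn.executemany(f"INSERT OR REPLACE INTO {staging} (id, name) VALUES (?, ?)", upserts)
        conn.executemany(f"DELETE FROM {staging} WHERE id = ?", [(i,) for i in deletes])

    def swap() -> None:  # phase 2: short transaction that switches what readers see
        conn.execute("DROP VIEW items")
        conn.execute(f"CREATE VIEW items AS SELECT id, name FROM {staging}")
        conn.execute("UPDATE active_set SET name = ?", (staging,))

    _run_in_transaction(conn, build)
    _run_in_transaction(conn, swap)
    return staging
```

The previously active table is left intact after the swap, so switching back is just another repoint of the view.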
I've outgrown the SQL Server custom actions available in WiX, so I'm taking the bold step of creating my own using Deployment Tools Foundation. I want to be a good citizen and make sure that mine support rollback. But what's the best way of doing it?
I need to support SQL Server 2005 and later, all editions.
The problem, as I see it, is that Windows Installer works in two phases: it does the work, storing undo information as it goes. Then, when all the pieces are in place it either commits (deleting the undo information) or does a rollback.
This means that standard transactions won't do the job. They would have to be completed inside my Execute custom action, and I wouldn't get a chance to roll them back later.
I've considered taking a copy-only backup of the database that I can restore in the rollback action if necessary, but I think this approach, whilst simple, has shortcomings. I don't know how big our databases will get, for example, so I can't guarantee that there will be space available to hold the backup on the target machine. Also, backup and restore can take a while to complete, and I don't want typical installs (where rollback doesn't happen) to be unnecessarily slow.
So that brings me to my current favoured idea: make sure the Distributed Transaction Coordinator is started, initialise a distributed transaction before making changes, and then either commit it or roll it back in the appropriate custom actions.
It seems I can use the members of the TransactionInterop class to export a cookie that will enable me to share the transaction between my different custom actions.
Can anyone with experience of this kind of thing say if it is likely to work?
Some database/instance operations cannot be done inside a transaction (eg. CREATE/ALTER/DROP ENDPOINT), and other operations cannot be done inside a distributed transaction (eg. SAVE TRANSACTION). So you won't be able to do them at all in your proposed plan. Also your DB upgrade scripts will have to all work correctly when run inside an uncommitted transaction.
I would say that there are fewer risks in going down the backup/restore path (or, alternatively, creating a database snapshot and restoring from it on rollback, with the drawback of requiring Enterprise Edition).
Another option is to have an undo script for every do script run during the upgrade, and have the undo scripts run during rollback to remove the effects of the installation. I understand that this is a hard problem: it probably doubles the number of scripts that have to be developed (and tested...) and requires some serious developer discipline.
I've done quite a few installers with SQL scripts over the years, and I've come to the opinion that this is only suited to simple databases, like "here's my VB app with a local MSDE / MySQL database" or "here's my local store for code table lookups and temporary commits while we wait to sync it somewhere else".
Once you get into industrial-strength, heavy-lifting enterprise app situations, I like to move the DB configuration out of the installer and into the application as a first-run step. You can do a lot heavier lifting with C# there and not be constrained by MSI.
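The do/undo pairing could be organised roughly like this. It is a minimal sketch with SQLite standing in for SQL Server and hypothetical example scripts. Each do-script commits on its own, as it would in the installer's Execute phase. The returned undo log is what a later Rollback custom action would replay in reverse. Undo scripts are written defensively (IF EXISTS) so that re-running them is harmless.

```python
import sqlite3
from typing import List, Sequence, Tuple

# (do_script, undo_script) pairs; the scripts used with this are hypothetical examples.
Step = Tuple[str, str]


def run_with_undo(conn: sqlite3.Connection, steps: Sequence[Step]) -> List[str]:
    """Run each do-script in order. If one fails, undo the completed steps in reverse
    order and re-raise. On success, return the undo log for a later Rollback action."""
    undo_log: List[str] = []
    for do_sql, undo_sql in steps:
        try:
            conn.executescript(do_sql)
        except sqlite3.Error:
            rollback(conn, undo_log)
            raise
        undo_log.append(undo_sql)
    return undo_log


def rollback(conn: sqlite3.Connection, undo_log: List[str]) -> None:
    # Undo in reverse order so later changes are removed before the ones they depend on.
    for undo_sql in reversed(undo_log):
        conn.executescript(undo_sql)
```

This makes the doubled script count concrete: every forward step needs a tested counterpart.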
I have set up transactional replication between two SQL Servers on different ends of a relatively slow VPN connection. The setup is your standard "load snapshot immediately" kind of thing where the first thing it does after initializing the subscription is to drop and recreate all tables on the subscriber side and then start doing a BCP of all the data. The problem is that there are a few tables with several million rows in them, and the process either a) takes a REALLY long time or b) just flat out fails. The messages I keep getting when I look in Replication Monitor are:
The process is running and is waiting for a response from the server.
Query timeout expired
Initializing
It then tries to restart the bulk loading process (skipping any BCP files that it has already loaded).
I am currently stuck where it just keeps doing this over and over again. It's been running for a couple days now.
My questions are:
Is there something I could do to improve this situation given that the network connection is so slow? Maybe some setting or something? I don't mind waiting a long time as long as the process doesn't keep timing out.
Is there a better way to do this? Perhaps make a backup, zip it, copy it over and then restore? If so, how would the replication process know where to pick up when it starts applying the transactions, since updates will be occurring between the time I make the backup and get it restored and running on the other side.
Yes.
You can apply the initial snapshot manually.
It's been a while for me, but the link (into BOL) has alternatives to setting up the subscriber.
Edit: From BOL How-tos, Initialize a Transactional Subscriber from a Backup
In SQL Server 2005, there is a "compress snapshot" option that allows you to reduce the total size of the snapshot. When applied over a network, the snapshot files travel compressed to the subscriber, where they are then expanded.
I think you can easily estimate the potential speed gain by comparing the sizes of standard and compressed snapshots.
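To put numbers on that comparison, transfer time is roughly the size in bytes * 8 divided by the link speed in bits per second. This back-of-the-envelope sketch uses zlib on a sample BCP file purely as a stand-in for measuring real snapshot sizes (SQL Server uses its own CAB-based compression, so actual ratios will differ). It ignores protocol overhead and the time needed to expand the files on the subscriber.

```python
import zlib
from typing import Tuple


def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Time to push size_bytes over a link of link_mbps megabits per second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


def compressed_estimate(sample: bytes, snapshot_bytes: int, link_mbps: float) -> Tuple[float, float]:
    """(uncompressed, compressed) transfer times, extrapolating zlib's ratio on a sample."""
    ratio = len(zlib.compress(sample)) / len(sample)
    return (transfer_seconds(snapshot_bytes, link_mbps),
            transfer_seconds(round(snapshot_bytes * ratio), link_mbps))
```

For example, a 2 GB snapshot over a 2 Mbit/s VPN works out to 8,000 seconds uncompressed, before any retries after a timeout.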
By the way, there is a (quite) similar question here for merge replication, but I think that at the snapshot level there is no difference.