Reliable alternative to replication for continuous data sync between two databases - sql-server

I have one central database and 25 client databases, all with the same schema.
Whenever changes are made to certain tables in the central database, I want those changes to flow down to the client databases.
The databases are SQL Server Express, so I cannot use replication.
The solution I have today is to keep track of the changes in the central database; a program then writes these changes to text files and sends them down to the client databases, where another program reads the files and updates the client database.
There are three problems with this:
1. The files get lost or arrive in a jumbled order, which messes up the client data.
2. The process is slow.
3. The programs are sometimes shut down, so the whole sync flow stops.
Is there a reliable alternative that is fast and secure?
I wonder how banking software is built... it never loses transactions and it is fast.

Add an UpdateDate column to all the entities that need to be replicated. At each client add a linked server to the central repository. Now, every 5 minutes or so, poll your central repository for changes using the last UpdateDate of a client entity and grab the delta.
Then use MERGE, or INSERT and UPDATE, to merge the data on the client; a minimal sketch of that pull-and-merge step is below. That's a very reliable way of doing homebrew replication. To keep track of deleted rows you would either want to mark them as deleted or have another table that records the entity kind and its key, again combined with UpdateDate for replication.
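For illustration, here is a minimal sketch of that step for one table, assuming a linked server named CENTRALSRV, a central database CentralDb, and a table dbo.Customer carrying the UpdateDate column (all of these names are placeholders). MERGE requires SQL Server 2008 or later; on older versions use separate INSERT and UPDATE statements. Note that Express has no SQL Server Agent, so the 5-minute poll would typically be scheduled with Windows Task Scheduler.

    -- Find the newest row we already have locally.
    DECLARE @LastSync datetime;
    SELECT @LastSync = ISNULL(MAX(UpdateDate), '19000101') FROM dbo.Customer;

    -- Pull the delta from the central repository over the linked server
    -- and merge it into the local copy.
    MERGE dbo.Customer AS target
    USING (
        SELECT CustomerID, Name, Email, UpdateDate
        FROM CENTRALSRV.CentralDb.dbo.Customer
        WHERE UpdateDate > @LastSync
    ) AS source
    ON target.CustomerID = source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET Name = source.Name,
                   Email = source.Email,
                   UpdateDate = source.UpdateDate
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, Name, Email, UpdateDate)
        VALUES (source.CustomerID, source.Name, source.Email, source.UpdateDate);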
Update
Then you mention transactions and banking software. When you do your replication via files, we ain't talkin' about no transactional replication here, not by a long shot.
If you need transactional consistency you need to subscribe to the transaction flow of the data warehouse.

I don't want to be unhelpful and you haven't given any background about your business needs, but you have to decide if your priority is really "fast and secure" or if it's actually "cheap". Replicating changes between multiple databases in a reliable, consistent way is not easy (as you know) and it's highly unlikely that you will be able to develop a solution yourself that has the features, stability and performance of SQL Server replication.
SQL Express can be a replication subscriber, by the way, so it's not clear why it doesn't meet your needs. But if it doesn't, you should estimate the cost to your business (or customer) of dealing with issues caused by an unreliable solution: your time, business downtime, finding and correcting incorrect data, customer complaints, lost business etc. Then compare that to the cost of 25 SQL Server licenses (you should certainly be able to get a good discount when you order that volume), additional hardware (if any) and the costs of training, consulting and/or learning how to use replication. Then extrapolate those costs over 5 years or so. You may find that it's cheaper just to buy the solution you need. And of course buying the full SQL Server edition means you get a lot of other new features that might be useful to you.
If you (or your boss) are really determined to get something for nothing, you might want to investigate PostgreSQL or MySQL. They both have free replication solutions that seem to be widely enough used to be reliable for many companies. Of course, you then need to calculate the costs of switching to a new database platform.

If you have one central database and 25 clients, you can easily do it with one (yes, only one) SQL Server licence for the main database. Subscribers to this database can run SQL Express. As long as users only access the client databases, you are not even obliged to buy SQL CALs.
Back to banking software, be sure that they are paying good money for their server licenses! So don't be surprised if these are reliable and fast ...

Related

Sql Server distribution and configuration for best performance

I want to design and implement an enterprise application with Silverlight, using a SQL Server database. Many users will run SQL queries against that database.
How can I configure the SQL Server database for best performance?
How can I distribute the SQL Server database between several servers for best performance?
And what technologies can I use in SQL Server for best performance?
In addition to replication you can use mirroring or log shipping for this. Note that I am talking only about scaling out reads, not writes. So reports etc. can be run from the copies of the database, but writes must go to the main copy (unless you are using merge replication, which is frightening to me). There are some caveats of course.
With database mirroring, you can use the secondary as a read-only reporting source by taking a snapshot. There are limits here to how many databases you can mirror and there is of course maintenance to manage the snapshots. It is not quite true distribution of resources here, but it can be helpful to offload some of the load. In the next version of SQL Server (Denali), you will be able to set secondaries as read-only, so you can avoid the maintenance of snapshots.
With log shipping, you can essentially keep a stale version of the database around for reporting, and replace it periodically by restoring logs to it. You have a lot more flexibility here compared to replication or mirroring, as you can actually define a delay (like every 6 hours or once a day, you refresh the copy) - which can also serve as a "recover from a shoot-yourself-in-the-foot" scenario. The downside is that to restore a new copy of the database you need to kick all the current users out, as the database needs to be in single user mode in order to recover.
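As a rough sketch of the restore step in such a refresh (database and file names are made up): WITH STANDBY is what leaves the copy readable between restores, and any existing connections have to be cleared first (for example by switching the database to single-user/restricted mode), which is the "kick all the current users out" part.

    -- Restore the next log backup onto the reporting copy, keeping it readable afterwards.
    RESTORE LOG SalesDB_Report
    FROM DISK = N'\\backupshare\SalesDB\SalesDB_log_0600.trn'
    WITH STANDBY = N'D:\SQLData\SalesDB_Report_undo.dat';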
Those are just a couple of ideas for helping scale out reads, but deep down I agree with @gbn - are you solving a problem you don't have yet? It's one thing to design for scalability, but it's very easy to step over that line and completely over-engineer.
Well, SQL Server doesn't really have a load balancing mechanism in and of itself. What it does support, however, is an active/passive node configuration and also replication.
We are using the replication strategy in one application I support. You can read more about it here:
http://msdn.microsoft.com/en-us/library/ms151198.aspx
In our configuration, we basically have a transactional database and a reporting database. We replicate the data from our transactional DB to the reporting DB. Any reporting is done against this reporting DB, so that we don't slow down work being done on the transactional DB due to some long running report.
Note that the replication isn't truly real time. In other words, there's some time involved in replicating the data from the transactional to the reporting DB, albeit a very small amount of time. But replication is certainly one strategy you could consider if you are trying to balance workload.
Another thing you might consider is partitioning large tables for better performance; a rough sketch of what that involves is below.
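For illustration only (the table, column and boundary values are invented, and note that table partitioning is an Enterprise-only feature in SQL Server 2005/2008): a partition function defines the boundaries, a partition scheme maps partitions to filegroups, and the table is then created on the scheme.

    -- Monthly partitions on OrderDate; all partitions mapped to PRIMARY for simplicity.
    CREATE PARTITION FUNCTION pfOrderDate (datetime)
    AS RANGE RIGHT FOR VALUES ('20110101', '20110201', '20110301');

    CREATE PARTITION SCHEME psOrderDate
    AS PARTITION pfOrderDate ALL TO ([PRIMARY]);

    -- The primary key includes the partitioning column so the clustered index is aligned.
    CREATE TABLE dbo.Orders (
        OrderID   int IDENTITY(1,1) NOT NULL,
        OrderDate datetime NOT NULL,
        Amount    money NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY (OrderID, OrderDate)
    ) ON psOrderDate (OrderDate);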
As gbn pointed out in his comment though, it's better to determine if you actually need these strategies before implementing them, because they add a lot of complexity and maintenance efforts, which may not even be needed. It's important to properly analyze how much data you think you will have, and how much activity will be occurring against that data to determine if strategies such as the ones I just described are even needed.
Also, you can refer to this link for some other helpful information and some links to whitepapers you may find helpful:
http://social.msdn.microsoft.com/Forums/en/sqldisasterrecovery/thread/05cf41b7-c558-44bf-86c6-12f5c2b2ffe2

Best solution for reporting database

Here is the situation:
There is a transaction intensive database - used for both routine transactions and reports.
I was wondering if I could isolate these two workloads into 2 independent databases, so reports could run off one database and all the transactions could occur in the other. This would improve performance of the OLTP SQL database.
I have gone over a few options such as mirroring, log shipping, replication, snapshots and clustering - but would like to discuss the best possible strategy for the desired result.
Please advise the best solution to implement this strategy, or any other thoughts/suggestion you may have.
I am thinking this is a classic textbook case of separation of frontend and backend database.
For the projects and people I have worked with, there was a strong agreement that the two should be separated.
In one case, there were three tiers of databases:
1. Frontend transaction databases
2. A middle summary repository, for reference by the frontend transactions
3. A backend information repository
The frontend transaction speed was so critical, even that layer was dissected into multiple databases, one database per manufacturing area. The transactions were performed by equipment requiring very fast response.
Data from the frontend databases was used, together with customer- and management-oriented databases, to construct records for the backend reporting repository at an hourly frequency, because management needed short information latency for their operational and engineering decisions. If we could perform the information compilation at 15 minute intervals, we would have done it. Depending on the project, that backend repository could be either Oracle or Sybase IQ.
However, the frontend transactions performed by the equipment needed to refer to some meta information. The response time required by the equipment could not be put at risk by someone running a huge ad hoc query on the backend repository, which happened frequently.
So, a middle layer bridging database was created, which consists of nightly abstracts of information from the backend repository.
Schema designed with commonality-keys
Schema design is very important, to optimise the response and performance of all the databases. You have to ensure your database records are commonality-key-indexed and discrete-time-indexed.
For a manufacturing plant filled with robots and equipment, divided into manufacturing areas, each area has a frontend transaction database, and each area database needs to have a commonality-key dispatcher. When a piece of equipment needs to perform a batch of operations, the beginOp event requests a discrete key from the dispatcher. An operation cycle may take seconds, or days, or weeks. Every time a piece of equipment needs to perform a transaction on its state of operation, it includes that commonality-key. An operation can have sub-operations and sub-sub-operations, etc. - but each such operation is required to obtain a commonality-key from its area dispatcher.
The commonality-key dispatcher is simply the beginOp table in the database with an auto-increment key. Any equipment sharing the same begun operation is able to infer/obtain that commonality-key from the table, thanks to a meticulous process-sequencing strategy.
For areas where we could ensure that no two operations on the whole floor could start within the same 100 milliseconds, there was no need for a dispatcher because we could simply use the date-time of the beginOp event, where the datetime function of the database server is the natural/spontaneous key dispatcher.
The reason for this discussion on commonality-key is because the transaction response required is so quick, you do not want pieces of equipment to have to communicate with each other unnecessarily just to tell each other they are recording events of the same operation. The robots and equipment simply perform the transactions with the commonality-key they are holding.
The hourly compilation of information for insertion into the backend repository conveniently uses the composite-key of commonality+area, to construct the hierarchy of events.
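To make the dispatcher idea concrete, here is a hypothetical T-SQL sketch (all table and column names are mine, not the poster's): the dispatcher is just an auto-increment table, and inserting a row hands out the commonality-key that the equipment then attaches to every event it records.

    -- The dispatcher: one row per begun operation, keyed by an identity column.
    CREATE TABLE dbo.BeginOp (
        OpKey     int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        AreaCode  varchar(10) NOT NULL,
        StartedAt datetime NOT NULL DEFAULT GETDATE()
    );

    -- Events recorded by the equipment, tagged with the commonality-key.
    CREATE TABLE dbo.OpEvent (
        OpKey     int NOT NULL REFERENCES dbo.BeginOp (OpKey),
        EventTime datetime NOT NULL,
        EventData varchar(100) NOT NULL
    );

    -- A piece of equipment requests a key at the start of an operation cycle...
    DECLARE @OpKey int;
    INSERT INTO dbo.BeginOp (AreaCode) VALUES ('AREA01');
    SET @OpKey = SCOPE_IDENTITY();

    -- ...and includes that key in every transaction it performs for the operation.
    INSERT INTO dbo.OpEvent (OpKey, EventTime, EventData)
    VALUES (@OpKey, GETDATE(), 'beginOp');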
Frontend piping database
OK, this is really extreme. In some areas, the transactions were so frequent that we had a FIFO database, so we introduced a fourth tier of databases. For optimal transaction response, we had to keep a database size below 1GB. A transaction-piping process existed to empty old transactions into the fourth-tier databases. I found that it was easier (and gave better response) to create a pool of new databases, so that every time a database's size reached 1GB it was moved out and immediately replaced with a new database from the pool - leaving the machines performing the hourly compilation to join up the databases. That left us depending on an existing metadata database to house the commonality-key dispatcher table and some other metadata tables.
In retrospect, one might think the commonality-key dispatcher table and metadata tables could have been housed in the middle-tier bridging database, but because the database management processes were automated and cookie-cutter, it was cleaner to create a new process than to modify the process managing the mid-tier bridging database. Those management routines were used across the world, so you cannot willy-nilly change them without causing havoc to the financial performance of the company or offending the respective data layer architects maintaining them.
It took a lot of organisational skill for the managers to pull all this together. So transactional data design is not simply a technical skill but also a process-planning skill, involving a whole lot of people head-butting each other until you get it right.
What you ask for is totally standard - OLAP and OLTP do not mix in heavy load scenarios.
You use SQL Server. Look into SSAS (SQL Server Analysis Services) for building cubes (a different approach than SQL) that you can then report against.
If you do not want that, then mirroring is the next best solution - you can put a mirror online in read-only mode for reporting, and it also gives you a backup to activate if the main server fails ;) Always good.
Clustering is a non-issue - it will allow you to move the database to another node, but it does not solve the performance issue at all. Log file shipping and replication are good too, though I would go with mirroring: a read-only copy for reporting, loading the data into SSAS.
We have a read/write cluster which replicates (using transactional replication) to "read only" servers (not physically read-only, the web app just performs reads on them). We do the same for reporting and this scales pretty well.
We have multiple sites, 32+ servers and a couple of reporting servers in this configuration with very high volume of inserts, updates and reads.
We primarily use Reporting Services for internal reporting. Reporting doesn't affect our core business, which I guess is your main concern.

Would it ever be wise to have a SQL server per web server?

I'm wondering if, under the circumstances that
You get lots more reads than writes
Your SQL server of choice is cheap/free and offers a fast mirroring/replication service
Your database isn't insanely large
rather than having a separate SQL Server box, it would be better to have an instance of SQL Server on each web server getting instant updates from the master. This way there would be no network latency when doing all the read queries, but there would be a per-box performance hit as the SQL instance has to execute. Would this be better overall for performance? Are there any other pros/cons that might come up?
Your SQL Server should always be on a different box to the webserver, of that there is no question.
How many DB servers and webservers you have, and how they mirror (or otherwise) is up to how you scale your application.
You have SQL Server on a different machine because it needs (and deserves) a lot of RAM.
It's quite a common architectural pattern to have read-only replicas of a database. We accept some degree of staleness in them; perhaps they are even only updated once a day.
The general rule will be that multiple copies introduce complexity in terms of operations and management, and tend to introduce the possibility of inconsistent data - almost inevitably the copies will not be perfectly in step (or the cost of making them so will be too high).
An example: what happens if your replication processing breaks a bit, so that some, but not all, copies become stale? Now your users start to see radically different views of the world. How much might that matter to you? If it's a site with low-value data (e.g. celebrity sightings in London suburbs) then perhaps that's fine. If it's on-hand inventory, and being out of date means that your customers can't place orders, then maybe you care rather more.
My advice: things that sound simple at a boxes-on-paper sort of level don't always work out that way when you're sitting in an operations room at 3 AM. Be very sure that you can easily operate your solution.
How would your SQL Server be cheap/free? I should have said the licensing costs for this setup would be crippling. At retail prices you're looking at $6,000 per server. See also Jeff's comments about costs. Scale out the web servers by all means, but not your SQL Server until it's pretty much on its knees.
You might instead want to think about a distributed cache like Velocity or NCache.
Either way, run your site first with one SQL server and see how it copes with the load, then think about mirroring/replication across servers, otherwise you're just optimising prematurely. Measure first!
An immediate con is that there is no distributed lock co-ordinator in SQL Server so you can get merge conflicts as updates can change the same row on two different servers at the same time.
Depending on the size of the database and the disks in the web servers, you may find your network latency is smaller than the disk latency you would start suffering, as the web server disks will not usually be as performant as the disk array you give to the database. If you wanted that kind of performance, you would be buying it per web server.
Replication performance is not without latency either; the distribution of the transactions isn't 'free', and careful maintenance of the transaction log would have to be planned to ensure you did not get log fragmentation (too many VLFs, virtual log files, within the transaction log), which kills replication performance.
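For what it's worth, one common (if undocumented) way to eyeball that kind of fragmentation is DBCC LOGINFO: each row it returns represents one virtual log file, so a row count running into the thousands is a warning sign. The database name below is a placeholder.

    USE SalesDB;   -- hypothetical database name
    DBCC LOGINFO;  -- one row per VLF in this database's transaction log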

SQL Server Replication, Distributor

I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.
The server and all the remote sites all live on a fairly decent VPN. The server is 2005, and it's not being pushed very hard at the moment. Just a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of 2000/2005.
I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. With 200 publications you're talking about 200 publisher agents, 200 subscription agents, plus the distributor maintenance. Most sites complain about the maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully: what is going to be your upgrade story? And how are you going to implement a schema change?
The largest replication deployment I heard of (some years ago) had I believe 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200 replication sites project is way more ambitious than you realize.
I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.
If there is need to adjust the replication down the road, having the central server initiate a pull will be much easier to administrate than adjusting 200 sites to accomplish the same thing. Also, that would naturally manage the load, rather than some scheme to prevent, say, 100 remote sites all connecting at once.
Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.
From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.
How much data you are looking to push, and the schedule of your updates, will determine the most appropriate replication method for you to use. For example, if you are looking to update all Subscriptions at the same time then, depending on how much data you need to push, Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific intervals. Your network may even be able to support near real-time replication; conducting a small test of your environment will determine this for you. For example, set up the Publisher, a local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and replication latency. (A sketch of what adding one push subscription looks like follows the list below.)
Things to consider:
- How much data is to be moved across the network? Size in KB and record volume.
- The physical location of your sites.
- The suitability of your network: speed, capacity etc.
- You may wish to consider using a dedicated Distributor.
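To give a feel for the per-site administration involved, here is a rough sketch (SQL Server 2005 syntax) of adding one push subscription at a remote publisher so that its published table is pushed up to the central subscriber. The server, database and publication names are placeholders, and this assumes the Distributor and the publication itself are already configured; in practice you would script this from a list of all 200 sites rather than running it by hand.

    -- Run at the remote site, in the publication database.
    EXEC sp_addsubscription
        @publication = N'SiteDataPub',
        @subscriber = N'CENTRALSRV',
        @destination_db = N'CentralDB',
        @subscription_type = N'Push',
        @sync_type = N'automatic',
        @article = N'all';

    -- Each push subscription also gets its own Distribution Agent job.
    EXEC sp_addpushsubscription_agent
        @publication = N'SiteDataPub',
        @subscriber = N'CENTRALSRV',
        @subscriber_db = N'CentralDB',
        @subscriber_security_mode = 1;  -- connect to the subscriber using Windows authentication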

Backing up SQL Database for Reports

I'm looking for some help/suggestions for backing up two large databases to one server dedicated to reports. The situation is:
My company has two databases for its internal website. One for the UK and one for Europe. Both are mirrored for DR.
I have a server based in Europe which is dedicated to Microsoft Reporting Services, where we run reports based on the data collected in those two databases.
We do not want to point reporting services to the live databases for performance/security reasons so we currently backup both databases on a daily basis and restore them to our Reporting Services server.
However, this means we are putting a strain on our networks by backing up the entire databases, and the data is only current as of midnight the previous day.
Our aim is to have the data no more than 15 minutes out of date. It has been suggested we look at log shipping, so I wondered if anyone has experience in setting this up, what the pros and cons are, and whether there is a better alternative?
Any help would be greatly appreciated,
Thanks
We developed a similar environment. We used mirroring to get the data off to our reporting server and created an automated routine to create snapshots of the database every 15 minutes. These snapshots take only 1 to 2 seconds to create in our environment and give us a read-only copy of the database. Let me know if you would like me to go into deeper detail.
Note we are running Enterprise on both servers.
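In case it helps, the core of such a routine is just a database snapshot, roughly as below (database names, logical file name and path are invented; the snapshot is taken against the mirror, and database snapshots require Enterprise edition, which the poster notes they are running). Reports then point at the snapshot, and the routine drops and recreates it on each cycle.

    -- Drop the previous snapshot (any connections using it must be closed first)...
    IF DB_ID('SalesDB_ReportSnap') IS NOT NULL
        DROP DATABASE SalesDB_ReportSnap;

    -- ...and create a fresh one against the mirrored database.
    CREATE DATABASE SalesDB_ReportSnap
    ON ( NAME = SalesDB_Data,                              -- logical data file name of SalesDB
         FILENAME = 'D:\Snapshots\SalesDB_ReportSnap.ss' )
    AS SNAPSHOT OF SalesDB;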
Log shipping is a great solution for this. We've got articles about it over at SQLServerPedia's Log Shipping section, and I've got a video tutorial on there talking you through your different options. One thing to keep in mind about log shipping is that when the restores happen, your users will be kicked out of the reporting database.
Replication doesn't have that problem, but replication is nowhere near "set-it-and-forget-it" - it's time-intensive to manage, and isn't quite as reliable as you'd like it to be. In addition, you may have to make schema modifications in order to use replication. Log shipping is more automatic & stable, but at the cost of kicking users out at restore time.
You can minimize that by having two log shipping schedules - one for daytime during business hours, and one for the rest. During business hours, you only restore the data once per hour (or less), and the rest of the time you do it every 15 minutes.
You should look at replication as an alternative to backups.
I would recommend that you look into using Transactional Replication.
It sounds as though you are looking to implement a scenario that is similar to what we are currently implementing ourselves.
We use Transactional Replication (albeit in real time; you would most likely wish to synchronize your environment on a less frequent schedule) to offload a copy of our live production database to another server for reporting purposes.
Offloading reporting data is a common replication scenario and is described here in the Microsoft Replication documentation.
http://msdn.microsoft.com/en-us/library/ms151784.aspx
Brent is right that there is indeed an element of configuration required with Replication, along with security considerations that would need to be addressed. However, there are a number of key advantages to using Replication in my opinion, including:
- Reduced latency in comparison to log shipping.
- The ability to publish only the articles (tables) that are required for reporting.
- Reduced storage requirements.
- Less data being published means less network traffic.
- Access to your reporting data/database at all times.
For example, in our environment, we decided to replicate only the specific tables (articles) from our production database that we actually require for reporting.
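As a rough sketch of that selective approach (the publication and table names are placeholders, and the database is assumed to already be enabled for publication with a Distributor configured), only the required tables are added as articles:

    -- Create a transactional publication containing only the reporting tables.
    EXEC sp_addpublication
        @publication = N'ReportingPub',
        @repl_freq = N'continuous',
        @status = N'active';

    EXEC sp_addarticle
        @publication = N'ReportingPub',
        @article = N'Orders',
        @source_owner = N'dbo',
        @source_object = N'Orders';

    EXEC sp_addarticle
        @publication = N'ReportingPub',
        @article = N'OrderTotals',
        @source_owner = N'dbo',
        @source_object = N'OrderTotals';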
I hope what I have described is clear and makes sense but please do feel free to contact me if you have any queries.
