Is SQL replication the answer? - sql-server

For our application(desktop in .net), we want to have 2 databases in 2 different remote places(different countries).Is it possible to use replication to keep the data in sync in both the databases while application changes data?. What other strategies can be used? Should the sync happen instantaneously or, at a scheduled time? What if we decide to keep one database 'readonly'?
thanks

You need to go back to your requirements I think.
Does data need to be shared between two sites?
Can both sites update the same data?
What's the minimum acceptable time for an update in one location to be visible in another?
Do you need failover/disaster recovery capability?
Do you actually need two databases? (e.g is it for capacity, for failover or simply because the network link between the two sites is slow? etc)
Any other requirements around data access/visibility?
Real-time replication is one solution, an overnight extract-transform-load process could be another. It really depends on your requirements.

I think the readonly question is key. If one database is readonly then you can use mirroring to sync them, assuming you have a steady connection.
What is the bandwidth and reliability of connection between the sites?
If updates are happening at both locations (on the same data) then Merge Replication is a possibility. It's really designed for mobile apps where users in the field have some subset of the data and conflicts may need to be resolved at replication time.
High level explanation of the various replication types in SQL Server including the new Sync Framework in SQL Server 2008 can be found here: http://msdn.microsoft.com/en-us/library/ms151198.aspx
-Krip

Related

Options for a secondary SQL database

I have a VM in Azure running a single SQL Server instance.
I also have recently setup Power BI to refresh from this source at 1am every morning. Unfortunately, this refresh is causing performance issues, where all queries/operations are timing out due to stress.
What are my options regarding a secondary DB for reporting purposes? Main requirements are ease of maintenance and cost (dont need anything enterprise level).
Things that come to mind:
Secondary DB on same VM. Use replication to mirror data
Another cheap VM. Use replication
Use sql server availability sets, connect to read only replica
SQL data warehouse
Can anyone provide some guidance, or ask questions that may help find my answer?
Thanks.
I think Always ON availability group with secondary read-only replica will be best suited for your needs.
Building a separate DW for reporting purpose will be an overkill, as your reporting needs are satisfied from current database already, except for performance.
Transactional replication could be of help here. But, it also needs lot of knowledge on setup and maintenance.
I can think of several options, but in general this sounds like a canonical OLTP vs. OLAP issue, or a call for data warehouse, but since you are on the budget, let's consider low cost options.
Assuming the databases are small (GBs not TBs), I would separate operational and reporting instances either to be on the same machine if it is a pretty beefy machine, or better have two VMs so you can manage capacity separately.
I would consider replication from one instance to another.
Can you boost your VM resources during the period of the Power BI refresh only?
That's one of the key benefits of Azure - you can scale up and down and save money. How long does the refresh take? Who is using your DB at 1am?
I guess for a VM it's difficult to do this so you'd need to migrate to SQL Azure rather than a VM

Best approach to sync multiple clients with server

I am working on an application which will run on 5 different machine. Data of each machine will be stored on the same machine. I have to sync data of all machine with a centralized database. I am unable to find answer to this question: What will be the best approach for this?
On Each machine a schedule sql job will sync the data with centralized server
Server will request each machine to sync the data
Kindly advice with advantages and disadvantages.
You can configure sql server to use replication and publish a replica to each of the 5 machines.
You have to evaluate if you need all changes in all machines if yes you need merge replication, they have other structures and options such as filters to only have specific data in your remote machines.
Microsoft has solved this problem for you - they have developed the Sync framework. I used this around 4 years ago, and it was rather good - but as #Juan says in his comment, it's now no longer actively developed by Microsoft, and the Open Source version seems to be pretty quiet.
There are two obvious options.
You can use (merge) replication if you want to solve this as a database issue. This is the simplest solution - replication simply copies the data from your clients to the target database, and Microsoft deals with all the scheduling, conflict resolution etc.
However, often there is some data or business logic between the clients - a common problem is that there may be a conflict between primary keys or lookup values between your clients. For instance, if you're using integers for your primary keys, and machine 1 and machine 2 both assign "1000" as the primary key for a record in the "sales" table, replication becomes tricky.
Another common scenario is that there is some business logic that must be enforced which is simple on a single machine, but hard when dealing with multiple occasionally connected machines. For instance, if a customer places an order, the application may want to check that the total order value doesn't exceed the customer's credit limit - but what if there is another transaction pending on a separate client?
In those cases, you may want to create a service-oriented synchronization design.
Microsoft provided WCF for this. Whilst more flexible, it's probably much more effort to implement. In this model, you probably treat the local database on your clients as a cache, and you push messages to the central machine to reflect user interaction.

Simplest solution for high availability of SQL server 2008?

I have a bunch of SQL servers which I periodically performs maintainance on (Windows Update patches etc.). Now I want to the database online 24/7 and need to implement one of the high-availability solutions for SQL server.
The solutions needs to be cheap and simple to use.
I have no problems tweaking the connection strings for the clients of the database, so currently I'm looking into database mirroring with manual failovers when taking down a partner instance for patching etc.
Is this the best thing to do or are there other options which doesn't involve setting up a failover cluster?
The servers are virtualized with a fully redundant storage solution.
Any tips are appreciated, thanks in advance!
Mirroring with a PARTNER-server would probably be the cheapest solution (you can skip the PARTNER-server if you plan to switch manually).
Failover requires shared disks (NAS) aswell as cluster-capable Windows-licenses (very expensive).
I'm not sure about replication, or how it differs from mirroring, but my research I did gave the conclusion that mirroring was the one for me. However I don't mind some downtime when doing upgrades, I just keep mirrored instances of the database in case of severe hardware failure.
It might be that replication is for a complete instance of an SQL-server, whilst mirroring is done per database. In my case, I have 2 production servers, that both replicates it's databases to a third, backup-server for disaster-recovery. I think that wouldn't have been possible with replication.
The four high availability solutions I'm aware of are:
Failover cluster
Log shipping
Mirroring
Replication
Log shipping is probably not 24/7, so that leaves three. Serverfault is definitely a better place to ask about their relative merits.
For auto failover I would choose mirroring. You can build a 2nd database connection string into your app and whenever the preferred isnt available it will default to the backup - therefore giving your app 24/7. This has its downsides though, once 'flipped' to the mirror you have to either accept that this is the way it is until another maintenance job requires the mirror to shift back again or you have to manually swap the mirror over.
In order for this to be truly 24/7 you will need to enable auto rather than manual, maybe you will need a witness server to make the decision... There are lots of factors to include in the choice - are you working with servers on different sites, clustering, multiple web/app servers ... ?
As previous answers have suggested, https://serverfault.com/search?q=sql+mirroring will have people who have made just this choice, ready to help you in much more detail
A big benefit of mirroring is that providing the mirror server has no other activity it is license free, the live server license transfers over if the mirror takes over. Full details on SQL licensing pages at microsoft.com

SQL Server Replication, Distributor

I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.
The server and all the remote sites all live on a fairly decent VPN. The server is 2005, and it's not being pushed very hard at the moment. Just a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of 2000/2005.
I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. 200 publications you're talking 200 publisher agents, 200 subscription agents, plus the distributor maintenance. Most sites complain about maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully, what is going to be your upgrade story? And how are you going to implement a schema change?
The largest replication deployment I heard of (some years ago) had I believe 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200 replication sites project is way more ambitious than you realize.
I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.
If there is need to adjust the replication down the road, having the central server initiate a pull will be much easier to administrate than adjusting 200 sites to accomplish the same thing. Also, that would naturally manage the load, rather than some scheme to prevent, say, 100 remote sites all connecting at once.
Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.
From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.
Dependent on how much data you are looking to push and also the schedule of your updates will determine the most appropriate Replication Method for you to use. For example, if you looking to update all Subscriptions at the same time then dependent on how much data you need to push Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific determined intervals. Your network may even be able to support near real-time replication however conducting a small test of your environment will determine this for you. For example, setup the Publisher, local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and Replication Latency.
Things to consider:
How much data is to be moved across
the network? Size in Kb and record
volume.
Consider the physical location of
your sites
What is the suitability of your
network? Seed, capacity etc.
You may wish to consider using a
dedicated Distributor.

Caching to a local SQL instance on a web server

I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantages are that this caching and persistense layers adds complexity in the code that queries the database, packages the data and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master sql server database to local database instances installed on each web server. The web server application would use normal sql/ORM techniques to instantiate objects. Here, we can still sustain a master database outage and we would not have to code up specialized caching code and could instead use nHibernate to handle the persistence.
This seems like a more elegant solution and would like to see what others think or if anyone else has any alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
Just some addition to what RickNZ proposed above..
Since your master data which you are caching currently won't change so frequently and probably over some maintenance window, here is what should you do first on database side:
Create a SNAPSHOT replication for the master tables which you want to cache. Adding new entities will be equally easy.
On all the webservers, install SQL Express and subscribe to this Publication.
Since, this is not a frequently changing data, you can rest assure, no much server resource usage issue minus network trips for master data.
All your caching which was available via previous mechanism is still availbale minus all headache which comes when you add new entities.
Next, you can leverage .NET mechanisms as suggested above. You won't face memcached cluster failure unless your webserver itself goes down. There is a lot availble in .NET which a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric is exactly what you are looking for. (AKA "Velocity"). From the introductory documentation:
Windows Server AppFabric provides a
distributed in-memory application
cache platform for developing
scalable, available, and
high-performance applications.
AppFabric fuses memory across multiple
computers to give a single unified
cache view to applications.
Applications can store any
serializable CLR object without
worrying about where the object gets
stored. Scalability can be achieved by
simply adding more computers on
demand. The cache also allows for
copies of data to be stored across the
cluster, thus protecting data against
failures. It runs as a service
accessed over the network. In
addition, Windows Server AppFabric
provides seamless integration with
ASP.NET that enables ASP.NET session
objects to be stored in the
distributed cache without having to
write to databases. This increases
both the performance and scalability
of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

Resources