Is there anyway to identify by how much time we can get a deadlock when we enable Snapshot replication for our subscriber, we have a subscriber considered as Azure SQL Database, and we have 8 databases one of them consists of 200GB in size. Is there any way to identify the downtime for genereating snapshots?
Related
A SQL Server that serves as a DW for reporting purposes (OLAP) is also used directly by users to perform direct ad-hoc queries. User queries add a lot of extra load on this server because of number and concurrency and since this is not its primary role I am considering replicating the relatively big database to another machine and have users execute their queries on that machine to offload the DW.
This replication should be done daily.
My question is: what is the faster solution for this - Doing a Backup/Restore of the original DB or setting up Snapshot Replication ?
Is there any other approach to this problem - any suggestions?
When the initial snapshot is being generated while configuring SQL Server
Transactional Replication, does anyone know if the snapshot agent places locks on the tables (articles) you have selected? I have some tables that contain 2+ millions rows and wanted to know if SQL Server actually places table locks to prevent updates while the publishing database is online. If locks are placed, then I want to run the initial snapshot during off peak hours in production.
Thanks!
In Transactional replication or any other type of replication the starting point is a snapshot of the database. The initial step of creating the snapshot is exactly the same in any type of the replication.
SQL Server does not obtain any kind of locks at all when creating a snapshot, it literally is a snapshot of the database at a certain point in time and creating snapshot does not interfere with any transactions. Uncommitted transactions are rolled back in the snapshot once it is created.
To read more about how database snapshot works read this article from MSDN How Database Snapshots Work
If you're running on an edition of SQL Server that supports database snapshots (as in create database [foo]... as snapshot of [bar]), then you can optionally use those as the basis of the snapshot. Check the #sync_method parameter of sp_addpublication. The caveat is that you still probably want to do it during a non-busy time of the day because of how database snapshots work (i.e. copy-on-write will slow down any write activity), but you won't be contending on locks.
Starting SQL Server 2005, the default #sync_method value for sp_addpublication is "concurrent", which means the tables are not locked during snaphsot agent run. Note this is not entirely true - the snapshot agent places schema locks on the tables, but the duration of that lock is mere seconds at most.
So if you set #sync_method = "concurrent", then no, updates, in theory, will not be blocked. If #sync_method = "native" (default in SQL Server 2000) or "character", then yes, updates will be blocked.
I have installed SQL server database (mainserver) in one instance and SQL server database for RerportServer in others. what is the best way to replicate data from mainServer to report Server? Data in mainServer changes frequently and actual information in the ReportSever is very important.
And there is many ways to do this:
mirroring
shipping log
transactional replication
merge replication
snapshot replication
are there some best-practices about this?
Thanks
You need Transactional Replication for your case. Here is why you would not need the other 4 cases:
Mirroring
This is generally used to increase the availability of a database server and provides for automatic failover in case of a disaster.
Typically even though you have more than a single copy of the database (recommended to be on different server instances), only one of them is active at a time, called the principle server.
Every operation on this server instance is mirrored on the others continuously (as soon as possible), so this doesn't fit your use case.
Log Shipping
In this case, apart from the production database servers, you have extra failover servers such that the backup of the production server's database, differential & transactional logs are automatically shipped (copied) to the failovers, and restored.
The replication here is relatively scheduled to be at a longer interval of time than the other mechanisms, typically ranging from an hour to a couple of hours.
This also provides for having the failver servers readies manually in case of a disaster at the production sites.
This also doesn't fit your use case.
Merge Replication
The key difference between this and the others is that the replicated database instances can communicate to the different client applications independent of the changes being made to each other.
For example a database server in North America being updated by clients across Americas & Europe and another one in Australia being updated by clients across the Asia-Pacific region, and then the changes being merged to one another.
Again, it doesn't fit your use case.
Snapshot Replication
The whole snapshot of the database is published to be replicated to the secondary database (different from just the log files being shipped for replication.)
Initially however, for each type of replication a snapshot is generated to initialized the subscribing database, i.e only once.
Why you should use Transactional Replication?
You can choose the objects (Tables, Views, etc) to be replicated continuously, so if there are only a subset of the tables which are used to reporting, it would save a lot of bandwidth. This is not possible in Mirroring and Log Shipping.
You can redirect traffic from your application to the reporting server for all the reads and reports (which you can also do in others too, btw).
You can have independent batch jobs generating some of the more used reports running on the reporting server, reducing the load on the main server if it has quite frequent Inserts, Updates or Deletes.
Going through your list from top to bottom.
Mirroring: If you mirror your data from your mainServer to your reportServer you will not be able to access your reportServer. Mirroring puts the mirrored database into a continuous restoring state. Mirroring is a High Availability solution. In your case the reportServer will only be available to query if you do a fail over. The mirrored server is never operational till fail over. This is not what you want as you cannot use the reportServer till it is operational.
Log Shipping: Log shipping will allow you to apply transactional log backups on a scheduled event to the reportServer. If you backup the transaction log every 15 minutes and apply the data to the reportServer you will have a delay of 15+ minutes between your mainServer and Log server. Mirroring is actually real time log shipping. Depending on how you setup log shipping your client will have to disconnect while the database is busy restoring the log files. Thus during a long restore it might be impossible to use reporting. Log Shipping is also a High Availability feature and not really useful for reporting. See this link for a description of trying to access a database while it is trying to restore http://social.msdn.microsoft.com/forums/en-US/sqldisasterrecovery/thread/c6931747-9dcb-41f6-bdf4-ae0f4569fda7
Replication : I am lumping all the replication together here. Replication especially transactional replication can help you scale out your reporting needs. It would generally be mush easier to implement and also you would be able to report on the data all of the time where in mirroring you cant report on the data in transaction log shipping you will have gaps. So in your case replication makes much more sense. Snapshot replication would be useful if your reports could be say a day old. You can make a snapshot every morning of the data you need from mainServer and publish this to the subscribers reportServer. However if the database is extremely large then Snapshot is going to be problematic to deal with on a daily basis. Merge replication is only usefull when you want to update the replicated data. In your case you want to have a read only copy of the data to report on so Merge replication is not going to help. Transactional Replication would allow you to send replications across the wire. In your case where you need frequently updated information in your reportServer this would be extremely useful. I would probably suggest this route for you.
Just remember that by implementing the replication/mirroring/log shipping you are creating more maintenance work. Replication CAN fail. So can mirroring and so can transaction log shipping. You will need to monitor these solutions to make sure they are running smoothly. So the question is do you really need to scale out your reports to another server or maybe spend time identifying why you cant report on the production server?
Hope that helps!
What is transaction replication used? I seemed to create transaction replication following this tutorial:
http://www.sql-server-performance.com/2010/transactional-replication-2008-r2/
And I know when I change some objects i.e any DML or DDL statement, those changes will be reflected to the other server where I did replication. But it's not clear to me why should we use transaction replication. Does SQL server automatically start using the 2nd server where replication was done when the main instance fails? or do we have to manually restore the database from the server where replication was done in case of failure of 1st instance?
Thanks in advance :)
You can use transactional replication to maintain a warm standby SQL server. Transactional replication replicates the data on one server (the publisher) to another server (the subscriber) with less latency than log shipping.
You can implement transactional replication at the database object level such as the table level. Therefore, Microsoft recommends that you use transactional replication when you have less data to protect, and you must have a fast data recovery plan.
This solution is vulnerable to the failure of the publisher and the subscriber at the same time. In such a scenario, you cannot protect your data. In all other scenarios such as the failure of a distributor or a subscriber, it is best to resynchronize the data in the subscriber with the data in the publisher.
You should use transactional replication to maintain a warm standby SQL server only when you do not implement schema changes or you do not implement other changes to your database such as security changes that replication does not support.
Note Replication is not designed for the maintenance of warm standby servers. With replication, you can use replicated data at the subscriber to generate reports. You can also use replication for other general uses without having to perform processing on your relatively busy publisher.
Disadvantages
Schema changes or security changes that are performed at the
publisher after establishing replication will not be available at
the subscriber. The distributor in transactional replication uses an
Open Database Connectivity (ODBC) connection or an OLE Database
(OLEDB) connection to distribute data. However, log shipping uses
the RESTORE TRANSACTION low-level Transact-SQL statement to
distribute the transaction logs. A RESTORE TRANSACTION statement is
much faster than an ODBC connection or an OLEDB connection.
Typically, switching servers erases replication configurations.
Therefore, you have to configure replication two times:
a. When you switch to the subscriber.
b. When you switch back to the publisher.
If a disaster occurs, you must manually switch servers by
redirecting all the applications to the subscriber.
Read more here http://sqlserverdatarecovery.com/transactional_replication.html
What's the difference between peer-to-peer replication and merge replication using SQL Server?
Peer-to-Peer Transactional
Replication is typically used to support
applications that distribute read
operations across a number of server
nodes.
Although peer-to-peer replication enables scaling out of read operations, write performance for the topology is like that for a
single node, this is because ultimately all inserts, updates, and
deletes are propagated to all nodes. If one of the nodes in the system
fails, an application layer can redirect the writes for that node to
another node, this is not a requirement but does maintain availability if a node fails.
See: Peer-To-Peer Replicaiton
Merge Replication is bi-directional
i.e. read and wrtie operations are
propogated to and from all nodes.
Merge replication often requires the
implementation of conflict
resolution.
See: How Merge Replication
Works
The main difference is that for merge replication there is only one publisher and one or more subscribers, but in peer-to-peer replication all nodes are both publishers and subscribers(though original node is highlighted with green arrow).
Secondly peer-to-peer replication is transactional which means it transmits transactionally consistent changes. In contrast, merge replication is trigger based. In the background implementation they also use different agents.
Merge replication has conflict resolution(you can specify conflict resolution priority), peer-to-peer doesn't. During a conflict peer-to-peer generates an alert if conflict resolution is enabled, stops replication while allowing both instances to work independently till the conflict is solved. In production, it is advisable to do schema changes only from the original node.
In peer-to-peer replication all nodes are identical while in merge they can differ. I mean that subscribers can get different data from the publisher.
They both are basically doing the same job - providing scale-out, disaster recovery, and in some cases where updates are rare and locks do not bother that much, also high availability by providing data redundancy. Sometimes, peer-to-peer is related as the replacement for the merge replication.
EDIT Peer to Peer replicaiton is of two types - Transactional and Snapshot. Both of these are one way - from publisher to subscriber.
Transactional and Snapshot replication move data from publisher to subscriber. They are used primarily for editing in a single place and viewing / reporting data in multiple places. Transactional is almost instantaneous, while snapshot has to be scheduled. Transactional has a heavy initial resource requirement because it creates an initial snapshot and then it reads subsequent transactions from the transaction log to send data over. Snapshot is resource intensive every time it runs because it generates a new snapshot every time.
Merge replication lets you have multiple places where you can edit the data, and have it synchronized in near-real-time with the peers. Merge replications essentially runs a transactional replication engine to distribute the transactions, and additional logic to apply the transactions at the destination(s).
Here is some reading material http://technet.microsoft.com/en-us/library/ms152531.aspx
Updateable subscribers are designed for scenarios where the majority of your changes occur at the publisher but you want to be able to have some small number of changes originate at the subscriber.
P2p does not have such a limit.
P2P is designed to scale out reads, although many people wrongly use them as an update anywhere topology. p2p is also an Enterprise Edition only feature, where as updateable subscribers work on the Standard Edition of SQL Server and above.