I've replicated a large database (close to 1TB) to three remote servers using pushing transactional replication. The subscribers are read-only. A lot of data is inserted and updated (from other sources) in one day every month. It always fail the replication after the day and we manually initialize the replication from backup every month.
Is it possible to switch to log shipping before the inserting day and switch back to transactional replication after the bulk insertions are log shipped? So there is no need to copy the big backup file for re-initialization?
No. Transactional replication is logical while log shipping is physical. You can't switch at will between the two. But if your subscribers are read only to start with then transactional replication can be replaced out of the box with log shipping, at the cost of a slight delay in updates and having to disconnect readers on the stand-by sites every time a log is being applied (usually this is nowhere near as bad as it sounds). Given how much more efficient and less problematic log shipping is compared to transactional replication, I would not hesitate for a single second in doing this replace for good.
I question your need to re-initialize on a scheduled basis. I've had replication topologies go for a really long time without having the need to re-init. And when we did, it was only because there was a schema change that didn't play nice. When you say that the large amount of data fails replication, what does that mean? Replication will gladly deliver large data changes that to the subscribers. If you're running afoul of latency limits, you can either increase those or break down large transactions into smaller ones at the publisher. You also have the option of setting the MaxCmdsInTran option for the log reader agent to have it break up your transactions for you.
Related
Our database is MSSQL and we are currently using High Availability Group with fail over in multiple node cluster so the idea of redundancy and backup is already there.
All of our current servers are in the same location; imagine a scenario where an earthquake takes out the entire hosting facility then we are sitting duck.
I'm exploring a disaster recover (DR) strategy to have another DR backup at a different location so when this happens I can bring back the entire database using the DR's backup set with a minimum down time and the data needs to be guarantee to be up to the minutes if possible.
I've read around the Microsoft doc but I don't really see one that talks about this in details.
I need a true backup that is up to the minutes, do I need to do this full backup (once every day) along with transaction log backup (one every minute) and then save it to the other different geological location? Can you point me to a guide or best practice documentation on how to achieve this?
I'm exploring a disaster recover (DR) strategy to have another DR backup at a different location so when this happens I can bring back the entire database using the DR's backup set with a minimum down time and the data needs to be guarantee to be up to the minutes if possible
Following are your options, since you already have Availability Groups in-place.
Multi sub-net WSFC You need to add additional node into WSFC, that work as DR Replica in availability group, consider it's another copy of secondary that you already have in same location but this copy laying different geographical location, since it's multi sub-net WSFC required attention for care-full Quorum configuration.
Log-Shipping it's simple solution compare to Multi sub-net WSFC and easy to administrate. It basically takes log backups on scheduled basis from Primary replica in you current Availability Group, and restores into secondary replica. You can have multiple secondary replicas and each of them different geographical locations depending on network bandwidth.
I need a true backup that is up to the minutes, do I need to do this full backup (once every day) along with transaction log backup (one every minute) and then save it to the other different geological location? Can you point me to a guide or best practice documentation on how to achieve this?
This post at DBA.SE would help you..
Could anybody tell me more about difference between physical replication and logical replication in PostgreSQL?
TL;DR: Logical replication sends row-by-row changes, physical replication sends disk block changes. Logical replication is better for some tasks, physical replication for others.
Note that in PostgreSQL 12 (current at time of update) logical replication is stable and reliable, but quite limited. Use physical replication if you are asking this question.
Streaming replication can be logical replication. It's all a bit complicated.
WAL-shipping vs streaming
There are two main ways to send data from master to replica in PostgreSQL:
WAL-shipping or continuous archiving, where individual write-ahead-log files are copied from pg_xlog by the archive_command running on the master to some other location. A restore_command configured in the replica's recovery.conf runs on the replica to fetch the archives so the replica can replay the WAL.
This is what's used for point-in-time replication (PITR), which is used as a method of continuous backup.
No direct network connection is required to the master server. Replication can have long delays, especially without an archive_timeout set. WAL shipping cannot be used for synchronous replication.
streaming replication, where each change is sent to one or more replica servers directly over a TCP/IP connection as it happens. The replicas must have a direct network connection the master configured in their recovery.conf's primary_conninfo option.
Streaming replication has little or no delay so long as the replica is fast enough to keep up. It can be used for synchronous replication. You cannot use streaming replication for PITR1 so it's not much use for continuous backup. If you drop a table on the master, oops, it's dropped on the replicas too.
Thus, the two methods have different purposes. However, both of them transport physical WAL archives from primary to replica; they differ only in the timing, and whether the WAL segments get archived somewhere else along the way.
You can and usually should combine the two methods, using streaming replication usually, but with archive_command enabled. Then on the replica, set a restore_command to allow the replica to fall back to restore from WAL archives if there are direct connectivity issues between primary and replica.
Asynchronous vs synchronous streaming
On top of that, there's synchronous and asynchronous streaming replication:
In asynchronous streaming replication the replica(s) are allowed to fall behind the master in time when the master is faster/busier. If the master crashes you might lose data that wasn't replicated yet.
If the asynchronous replica falls too far behind the master, the master might throw away information the replica needs if max_wal_size (was previously called wal_keep_segments) is too low and no slot is used, meaning you have to re-create the replica from scratch. Or the master's pg_wal(waspg_xlog) might fill up and stop the master from working until disk space is freed if max_wal_size is too high or a slot is used.
In synchronous replication the master doesn't finish committing until a replica has confirmed it received the transaction2. You never lose data if the master crashes and you have to fail over to a replica. The master will never throw away data the replica needs or fill up its xlog and run out of disk space because of replica delays. In exchange it can cause the master to slow down or even stop working if replicas have problems, and it always has some performance impact on the master due to network latency.
When there are multiple replicas, only one is synchronous at a time. See synchronous_standby_names.
You can't have synchronous log shipping.
You can actually combine log shipping and asynchronous replication to protect against having to recreate a replica if it falls too far behind, without risking affecting the master. This is an ideal configuration for many deployments, combined with monitoring how far the replica is behind the master to ensure it's within acceptable disaster recovery limits.
Logical vs physical
On top of that we have logical vs physical streaming replication, as introduced in PostgreSQL 9.4:
In physical streaming replication changes are sent at nearly disk block level, like "at offset 14 of disk page 18 of relation 12311, wrote tuple with hex value 0x2342beef1222....".
Physical replication sends everything: the contents of every database in the PostgreSQL install, all tables in every database. It sends index entries, it sends the whole new table data when you VACUUM FULL, it sends data for transactions that rolled back, etc. So it generates a lot of "noise" and sends a lot of excess data. It also requires the replica to be completely identical, so you cannot do anything that'd require a transaction, like creating temp or unlogged tables. Querying the replica delays replication, so long queries on the replica need to be cancelled.
In exchange, it's simple and efficient to apply the changes on the replica, and the replica is reliably exactly the same as the master. DDL is replicated transparently, just like everything else, so it requires no special handling. It can also stream big transactions as they happen, so there is little delay between commit on the master and commit on the replica even for big changes.
Physical replication is mature, well tested, and widely adopted.
logical streaming replication, new in 9.4, sends changes at a higher level, and much more selectively.
It replicates only one database at a time. It sends only row changes and only for committed transactions, and it doesn't have to send vacuum data, index changes, etc. It can selectively send data only for some tables within a database. This makes logical replication much more bandwidth-efficient.
Operating at a higher level also means that you can do transactions on the replica databases. You can create temporary and unlogged tables. Even normal tables, if you want. You can use foreign data wrappers, views, create functions, whatever you like. There's no need to cancel queries if they run too long either.
Logical replication can also be used to build multi-master replication in PostgreSQL, which is not possible using physical replication.
In exchange, though, it can't (currently) stream big transactions as they happen. It has to wait until they commit. So there can be a long delay between a big transaction committing on the master and being applied to the replica.
It replays transactions strictly in commit order, so small fast transactions can get stuck behind a big transaction and be delayed quite a while.
DDL isn't handled automatically. You have to keep the table definitions in sync between master and replica yourself, or the application using logical replication has to have its own facilities to do this. It can be complicated to get this right.
The apply process its self is more complicated than "write some bytes where I'm told to" as well. It also takes more resources on the replica than physical replication does.
Current logical replication implementations are not mature or widely adopted, or particularly easy to use.
Too many options, tell me what to do
Phew. Complicated, huh? And I haven't even got into the details of delayed replication, slots, max_wal_size, timelines, how promotion works, Postgres-XL, BDR and multimaster, etc.
So what should you do?
There's no single right answer. Otherwise PostgreSQL would only support that one way. But there are a few common use cases:
For backup and disaster recovery use pgbarman to make base backups and retain WAL for you, providing easy to manage continuous backup. You should still take periodic pg_dump backups as extra insurance.
For high availability with zero data loss risk use streaming synchronous replication.
For high availability with low data loss risk and better performance you should use asynchronous streaming replication. Either have WAL archiving enabled for fallback or use a replication slot. Monitor how far the replica is behind the master using external tools like Icinga.
References
continuous archiving and PITR
high availability, load balancing and replication
replication settings
recovery.conf
pgbarman
repmgr
wiki: replication, clustering and connection pooling
I have installed SQL server database (mainserver) in one instance and SQL server database for RerportServer in others. what is the best way to replicate data from mainServer to report Server? Data in mainServer changes frequently and actual information in the ReportSever is very important.
And there is many ways to do this:
mirroring
shipping log
transactional replication
merge replication
snapshot replication
are there some best-practices about this?
Thanks
You need Transactional Replication for your case. Here is why you would not need the other 4 cases:
Mirroring
This is generally used to increase the availability of a database server and provides for automatic failover in case of a disaster.
Typically even though you have more than a single copy of the database (recommended to be on different server instances), only one of them is active at a time, called the principle server.
Every operation on this server instance is mirrored on the others continuously (as soon as possible), so this doesn't fit your use case.
Log Shipping
In this case, apart from the production database servers, you have extra failover servers such that the backup of the production server's database, differential & transactional logs are automatically shipped (copied) to the failovers, and restored.
The replication here is relatively scheduled to be at a longer interval of time than the other mechanisms, typically ranging from an hour to a couple of hours.
This also provides for having the failver servers readies manually in case of a disaster at the production sites.
This also doesn't fit your use case.
Merge Replication
The key difference between this and the others is that the replicated database instances can communicate to the different client applications independent of the changes being made to each other.
For example a database server in North America being updated by clients across Americas & Europe and another one in Australia being updated by clients across the Asia-Pacific region, and then the changes being merged to one another.
Again, it doesn't fit your use case.
Snapshot Replication
The whole snapshot of the database is published to be replicated to the secondary database (different from just the log files being shipped for replication.)
Initially however, for each type of replication a snapshot is generated to initialized the subscribing database, i.e only once.
Why you should use Transactional Replication?
You can choose the objects (Tables, Views, etc) to be replicated continuously, so if there are only a subset of the tables which are used to reporting, it would save a lot of bandwidth. This is not possible in Mirroring and Log Shipping.
You can redirect traffic from your application to the reporting server for all the reads and reports (which you can also do in others too, btw).
You can have independent batch jobs generating some of the more used reports running on the reporting server, reducing the load on the main server if it has quite frequent Inserts, Updates or Deletes.
Going through your list from top to bottom.
Mirroring: If you mirror your data from your mainServer to your reportServer you will not be able to access your reportServer. Mirroring puts the mirrored database into a continuous restoring state. Mirroring is a High Availability solution. In your case the reportServer will only be available to query if you do a fail over. The mirrored server is never operational till fail over. This is not what you want as you cannot use the reportServer till it is operational.
Log Shipping: Log shipping will allow you to apply transactional log backups on a scheduled event to the reportServer. If you backup the transaction log every 15 minutes and apply the data to the reportServer you will have a delay of 15+ minutes between your mainServer and Log server. Mirroring is actually real time log shipping. Depending on how you setup log shipping your client will have to disconnect while the database is busy restoring the log files. Thus during a long restore it might be impossible to use reporting. Log Shipping is also a High Availability feature and not really useful for reporting. See this link for a description of trying to access a database while it is trying to restore http://social.msdn.microsoft.com/forums/en-US/sqldisasterrecovery/thread/c6931747-9dcb-41f6-bdf4-ae0f4569fda7
Replication : I am lumping all the replication together here. Replication especially transactional replication can help you scale out your reporting needs. It would generally be mush easier to implement and also you would be able to report on the data all of the time where in mirroring you cant report on the data in transaction log shipping you will have gaps. So in your case replication makes much more sense. Snapshot replication would be useful if your reports could be say a day old. You can make a snapshot every morning of the data you need from mainServer and publish this to the subscribers reportServer. However if the database is extremely large then Snapshot is going to be problematic to deal with on a daily basis. Merge replication is only usefull when you want to update the replicated data. In your case you want to have a read only copy of the data to report on so Merge replication is not going to help. Transactional Replication would allow you to send replications across the wire. In your case where you need frequently updated information in your reportServer this would be extremely useful. I would probably suggest this route for you.
Just remember that by implementing the replication/mirroring/log shipping you are creating more maintenance work. Replication CAN fail. So can mirroring and so can transaction log shipping. You will need to monitor these solutions to make sure they are running smoothly. So the question is do you really need to scale out your reports to another server or maybe spend time identifying why you cant report on the production server?
Hope that helps!
I have two SQL Server 2005 instances that are geographically separated. Important databases are replicated from the primary location to the secondary using transactional replication.
I'm looking for a way that I can monitor this replication and be alerted immediately if it fails.
We've had occasions in the past where the network connection between the two instances has gone down for a period of time. Because replication couldn't occur and we didn't know, the transaction log blew out and filled the disk causing an outage on the primary database as well.
My google searching some time ago led to us monitoring the MSrepl_errors table and alerting when there were any entries but this simply doesn't work. The last time replication failed (last night hence the question), errors only hit that table when it was restarted.
Does anyone else monitor replication and how do you do it?
Just a little bit of extra information:
It seems that last night the problem was that the Log Reader Agent died and didn't start up again. I believe this agent is responsible for reading the transaction log and putting records in the distribution database so they can be replicated on the secondary site.
As this agent runs inside SQL Server, we can't simply make sure a process is running in Windows.
We have emails sent to us for Merge Replication failures. I have not used Transactional Replication but I imagine you can set up similar alerts.
The easiest way is to set it up through Replication Monitor.
Go to Replication Monitor and select a particular publication. Then select the Warnings and Agents tab and then configure the particular alert you want to use. In our case it is Replication: Agent Failure.
For this alert, we have the Response set up to Execute a Job that sends an email. The job can also do some work to include details of what failed, etc.
This works well enough for alerting us to the problem so that we can fix it right away.
You could run a regular check that data changes are taking place, though this could be complex depending on your application.
If you have some form of audit train table that is very regularly updated (i.e. our main product has a base audit table that lists all actions that result in data being updated or deleted) then you could query that table on both servers and make sure the result you get back is the same. Something like:
SELECT CHECKSUM_AGG(*)
FROM audit_base
WHERE action_timestamp BETWEEN <time1> AND BETWEEN <time2>
where and are round values to allow for different delays in contacting the databases. For instance, if you are checking at ten past the hour you might check items from the start the last hour to the start of this hour. You now have two small values that you can transmit somewhere and compare. If they are different then something has most likely gone wrong in the replication process - have what-ever pocess does the check/comparison send you a mail and an SMS so you know to check and fix any problem that needs attention.
By using SELECT CHECKSUM_AGG(*) the amount of data for each table is very very small so the bandwidth use of the checks will be insignificant. You just need to make sure your checks are not too expensive in the load that apply to the servers, and that you don't check data that might be part of open replication transactions so might be expected to be different at that moment (hence checking the audit trail a few minutes back in time instead of now in my example) otherwise you'll get too many false alarms.
Depending on your database structure the above might be impractical. For tables that are not insert-only (no updates or deletes) within the timeframe of your check (like an audit-trail as above), working out what can safely be compared while avoiding false alarms is likely to be both complex and expensive if not actually impossible to do reliably.
You could manufacture a rolling insert-only table if you do not already have one, by having a small table (containing just an indexed timestamp column) to which you add one row regularly - this data serves no purpose other than to exist so you can check updates to the table are getting replicated. You can delete data older than your checking window, so the table shouldn't grow large. Only testing one table does not prove that all the other tables are replicating (or any other tables for that matter), but finding an error in this one table would be a good "canery" check (if this table isn't updating in the replica, then the others probably aren't either).
This sort of check has the advantage of being independent of the replication process - you are not waiting for the replication process to record exceptions in logs, you are instead proactively testing some of the actual data.
I have set up transactional replication between two SQL Servers on different ends of a relatively slow VPN connection. The setup is your standard "load snapshot immediately" kind of thing where the first thing it does after initializing the subscription is to drop and recreate all tables on the subscriber side and then start doing a BCP of all the data. The problem is that there are a few tables with several million rows in them, and the process either a) takes a REALLY long time or b) just flat out fails. The messages I keep getting when I look in Replication Monitor are:
The process is running and is waiting for a response from the server.
Query timeout expired
Initializing
It then tries to restart the bulk loading process (skipping any BCP files that it has already loaded).
I am currently stuck where it just keeps doing this over and over again. It's been running for a couple days now.
My questions are:
Is there something I could do to improve this situation given that the network connection is so slow? Maybe some setting or something? I don't mind waiting a long time as long as the process doesn't keep timing out.
Is there a better way to do this? Perhaps make a backup, zip it, copy it over and then restore? If so, how would the replication process know where to pick up when it starts applying the transactions, since updates will be occurring between the time I make the backup and get it restored and running on the other side.
Yes.
You can apply the initial snapshot manually.
It's been a while for me, but the link (into BOL) has alternatives to setting up the subscriber.
Edit: From BOL How-tos, Initialize a Transactional Subscriber from a Backup
In SQL 2005, you have a "compact snapshot" option, that allow you to reduce the total size of the snapshot. When applied over a network, snapshot items "travel" compacted to the suscriber, where they are then expanded.
I think you can easily figure the potential speed gain by comparing sizes of standard and compacted snapshots.
By the way, there is a (quite) similar question here for merge replication, but I think that at the snapshot level there is no difference.