SQL Server Merge Replication Problems - sql-server

I have Merge replication setup on a CRM system. Sales reps data merges when they connect to the network (I think when SQL detects the notebooks are connected), and then they take the laptops away and merge again when they come back (there are about 6 laptops in total merging via 1 server).
This system seems fine when initially setup, but then it almost grinds to a halt after about a month passes, with the merge job taking nearly 2 hours to run, per user, the server is not struggling in any way.
If I delete the whole publication and recreate all the subscriptions it seems to work fine until about another month passes, then I am back to the same problem.
The database is poorly designed with a lack of primary keys/indexes etc, but the largest table only has about 3000 rows in it.
Does anyone know why this might be happening and if there is a risk of losing data when deleting and recreating the publication?

The problem was the metadata created by sql server replication, there is an overnight job that emptys and refills a 3000 row table. This causes replication to replicate all of these rows each day.
The subscriptions were set to never expire which means the old meta data was never being deleted by sql server.
I have set the subscription period to 7 days now in the hope tat it will now clean up the meta data after this period. I did some testing and proved that changes were not lost if a subscription expired. But any updates on the server took priority over the client.

I have encountered with "Waiting 60 second(s) before polling for further changes" recently in 2008 R2.
Replication monitor shows "In progress state" for replication but only step 1 (Initialization) and step 2 (Schema changes and bulk inserts) were performed.
I was very puzzled why other steps do not executed?
The reason was simple - it seems that for merge replication demands tcp/ip (and or not sure) named pipes protocols activation.
No errors were reported.
Probably the similar problem (some sort of connection problem) became apparent in Ryan Stephens case.

Related

SQL Server 2012 replication strategy

I've got SQL2012 running on 2 different servers with public, static IP addresses. I want to implement replication in a way that will keep both servers in sync at all times, regardless of which server is actually receiving the data. I've been reading about the subscriber/publisher model but I'm not exactly sure which should be which. A few facts about our setup:
I'm trying to achieve failover. If server A goes down, I need server B to be operational and have all latest data, or as close as possible. And vice versa. When the server comes back online, I need the replication to get caught up quickly and start working again. I need failures to be graceful, in other words I can't have server A get weird just because server B went offline.
I don't need realtime replication, but close would be nice. If server A was 10 seconds behind server B with data updates, nobody would care. But if it were an hour behind, that would be bad. Fast DB performance is more important that realtime replication, but again, close would be nice.
My database is just shy of 900Mb, and grows by 3Mb per day.
I am looking for advice on the best way to set this up given my setup and needs. Much appreciated.
Since one server will be Primary and the other Failover, use Log Shipping. It will keep two databases the same for all transactions completed on Primary server upto the failure moment. All transactions that have not completed at the moment of failure, will not appear on Failover server, so they should resubmitted by the application and hit Failover server.
Also there should be a Recovery procedure, to ensure than Primary server is up to date.
Useful articles:
Database Mirroring and Log Shipping.
Configure Log Shipping

What is the best solution for POS application?

I'm current on POS project. User require this application can work both online and offline which mean they need local database. I decide to use SQL Server replication between each shop and head office. Each shop need to install SQL Server Express and head office already has SQL Server Enterprise Edition. Replication will run every 30 minutes as schedule and I choose Merge Replication because data can change at both shop and head office.
When I'm doing POC, I found this solution not work properly, sometime job is error and I need to re-initialize its. This solution also take a very long time, which obviously unacceptable to user.
I want to know, are there any solutions better than one that I'm doing now?
Update 1:
Constraints of the system are
Almost of transactions can occur at
both shop and head office.
Some transaction need to work in real-time mode, that being said,
after user save data to their local shop that data should go to update at head office too. (If they're currently online)
User can working even their shop has disconnected from head office database.
Our estimation about amount of data is at-most 2,000 rows in each day.
Windows 2003 is OS of Server at head office and Windows XP is OS of all clients.
Update 2:
Currently they're about 15 clients, but this number will growing in fairly slow rate.
Data's size is about 100 to 200 rows per replication, I think it may not more than 5 MB.
Client connect to server by lease-line connection; 128 kbps.
I'm in situation that replication take a very long time (about 55 minutes while we've only 5 minutes or so) and almost of times I need to re-initialize job to start replicate again, if I don't re-initialize job, it can't replicate at all. In my POC, I find that it always take very long time to replicate after re-initialize, amount of time doesn't depend on amount of data. By the way, re-initialize is only solution I find it work for my problem.
As above, I conclude that, replication may not suitable for my problem and I think it may has another better solution that can serve what I need in Update 1:
Sounds like you may need to roll your own bi-directional replication engine.
Part of the reason things take so long is that over such a narrow link (128kbps), the two databases have to be consistent (so they need to check all rows) before replication can start. As you can imagine, this can (and does) take a long time. Even 5Mb would take around a minute to transfer over this link.
When writing your own engine, decide what needs to be replicated (using timestamps for when items changed), figure out conflict resolution (what happens if the same record changed in both places between replication periods) and more. This is not easy.
My suggestion is to use MS access locally and keep updating data to the server after a certain interval. Add a updated column to every table. When a record is added or updated, set the updated coloumn. For deletion you need to have a seprate table where you can put primary key value and table name. When synchronizing fetch all local records whose updated field not set and update (modify or insert) it to central server. Delete all records using local deleted table and you are done!
I assume that your central server is only for collecting data.
I currently do exactly what you describe using SQL Server Merge Replication configured for Web Synchronization. I have my agents run on a 1-minute schedule and have had success.
What kind of error messages are you seeing?

How do I ensure SQL Server replication is running?

I have two SQL Server 2005 instances that are geographically separated. Important databases are replicated from the primary location to the secondary using transactional replication.
I'm looking for a way that I can monitor this replication and be alerted immediately if it fails.
We've had occasions in the past where the network connection between the two instances has gone down for a period of time. Because replication couldn't occur and we didn't know, the transaction log blew out and filled the disk causing an outage on the primary database as well.
My google searching some time ago led to us monitoring the MSrepl_errors table and alerting when there were any entries but this simply doesn't work. The last time replication failed (last night hence the question), errors only hit that table when it was restarted.
Does anyone else monitor replication and how do you do it?
Just a little bit of extra information:
It seems that last night the problem was that the Log Reader Agent died and didn't start up again. I believe this agent is responsible for reading the transaction log and putting records in the distribution database so they can be replicated on the secondary site.
As this agent runs inside SQL Server, we can't simply make sure a process is running in Windows.
We have emails sent to us for Merge Replication failures. I have not used Transactional Replication but I imagine you can set up similar alerts.
The easiest way is to set it up through Replication Monitor.
Go to Replication Monitor and select a particular publication. Then select the Warnings and Agents tab and then configure the particular alert you want to use. In our case it is Replication: Agent Failure.
For this alert, we have the Response set up to Execute a Job that sends an email. The job can also do some work to include details of what failed, etc.
This works well enough for alerting us to the problem so that we can fix it right away.
You could run a regular check that data changes are taking place, though this could be complex depending on your application.
If you have some form of audit train table that is very regularly updated (i.e. our main product has a base audit table that lists all actions that result in data being updated or deleted) then you could query that table on both servers and make sure the result you get back is the same. Something like:
SELECT CHECKSUM_AGG(*)
FROM audit_base
WHERE action_timestamp BETWEEN <time1> AND BETWEEN <time2>
where and are round values to allow for different delays in contacting the databases. For instance, if you are checking at ten past the hour you might check items from the start the last hour to the start of this hour. You now have two small values that you can transmit somewhere and compare. If they are different then something has most likely gone wrong in the replication process - have what-ever pocess does the check/comparison send you a mail and an SMS so you know to check and fix any problem that needs attention.
By using SELECT CHECKSUM_AGG(*) the amount of data for each table is very very small so the bandwidth use of the checks will be insignificant. You just need to make sure your checks are not too expensive in the load that apply to the servers, and that you don't check data that might be part of open replication transactions so might be expected to be different at that moment (hence checking the audit trail a few minutes back in time instead of now in my example) otherwise you'll get too many false alarms.
Depending on your database structure the above might be impractical. For tables that are not insert-only (no updates or deletes) within the timeframe of your check (like an audit-trail as above), working out what can safely be compared while avoiding false alarms is likely to be both complex and expensive if not actually impossible to do reliably.
You could manufacture a rolling insert-only table if you do not already have one, by having a small table (containing just an indexed timestamp column) to which you add one row regularly - this data serves no purpose other than to exist so you can check updates to the table are getting replicated. You can delete data older than your checking window, so the table shouldn't grow large. Only testing one table does not prove that all the other tables are replicating (or any other tables for that matter), but finding an error in this one table would be a good "canery" check (if this table isn't updating in the replica, then the others probably aren't either).
This sort of check has the advantage of being independent of the replication process - you are not waiting for the replication process to record exceptions in logs, you are instead proactively testing some of the actual data.

SQL Server transactional replication for very large tables

I have set up transactional replication between two SQL Servers on different ends of a relatively slow VPN connection. The setup is your standard "load snapshot immediately" kind of thing where the first thing it does after initializing the subscription is to drop and recreate all tables on the subscriber side and then start doing a BCP of all the data. The problem is that there are a few tables with several million rows in them, and the process either a) takes a REALLY long time or b) just flat out fails. The messages I keep getting when I look in Replication Monitor are:
The process is running and is waiting for a response from the server.
Query timeout expired
Initializing
It then tries to restart the bulk loading process (skipping any BCP files that it has already loaded).
I am currently stuck where it just keeps doing this over and over again. It's been running for a couple days now.
My questions are:
Is there something I could do to improve this situation given that the network connection is so slow? Maybe some setting or something? I don't mind waiting a long time as long as the process doesn't keep timing out.
Is there a better way to do this? Perhaps make a backup, zip it, copy it over and then restore? If so, how would the replication process know where to pick up when it starts applying the transactions, since updates will be occurring between the time I make the backup and get it restored and running on the other side.
Yes.
You can apply the initial snapshot manually.
It's been a while for me, but the link (into BOL) has alternatives to setting up the subscriber.
Edit: From BOL How-tos, Initialize a Transactional Subscriber from a Backup
In SQL 2005, you have a "compact snapshot" option, that allow you to reduce the total size of the snapshot. When applied over a network, snapshot items "travel" compacted to the suscriber, where they are then expanded.
I think you can easily figure the potential speed gain by comparing sizes of standard and compacted snapshots.
By the way, there is a (quite) similar question here for merge replication, but I think that at the snapshot level there is no difference.

SQL Server Merge Replication Schedule

We're replicating a database between London and Hong Kong using SQL Server 2005 Merge replication. The replication is set to synchronise every one minute and it works just fine. There is however the option to set the synchronisation to be "Continuous". Is there any real difference between replication every one minute and continuously?
The only reason for us doing every one minute rather than continuous in the first place was that it recovered better if the line went down for a few minutes, but this experience was all from SQL Server 2000 so it might not be applicable any more...
We have been trying the continuous replication solution on SQL SERVER 2005 and it appeared to be less efficient than a scheduled solution: as your process is continuous, you will not get all the info related to your passed replications (how many replications failed, how long did the process take, why was the process stopped, how many records were updated, how many database structure modifications were replicated to suscribers, and so on), making the replication follow-up a lot more difficult.
We have also been experiencing troubles while modifying database structure (ALTER TABLE instructions) and/or making bulk updates on one of the databases with continuous replication going on.
Keep you "every minute" synchro as it is and just forget about this "continuous" option.

Resources