Need a little advice here. We do some Windows Mobile development using the .NET Compact Framework and SQL CE on the mobile, along with a central SQL 2005 database at the customers' offices. Currently we synchronize the data using merge replication.
Lately we've had some annoying problems with synchronization throwing errors and generally being a bit unreliable. This is compounded by the fact that there seems to be limited information out there on replication issues. This suggests to me that it isn't a commonly used technology.
So, I was just wondering if replication is the way to go for synchronizing data, or are there more reliable methods? I was thinking web services, maybe, or something like that. What do you guys use for implementing this kind of solution?
Dave
I haven't used replication a great deal, but I have used it and I haven't had problems with it. The thing is, you need to set things up carefully. No matter which method you use you need to decide on the rules governing all of the various possible situations - changes in both databases, etc.
If you are more specific about the "generally being a bit unreliable" then maybe you'll get more useful advice. As it is all I can say is, I haven't had issues with it.
EDIT: Given your response below I'll just say that you can certainly go with a custom replication that uses SSIS or some other method, but there are definitely shops out there using replication successfully in a production environment.
Well, we've had the error occur twice, which was a real pain to fix:
The insert failed. It conflicted with an identity range check constraint in database 'egScheduler', replicated table 'dbo.tblServiceEvent', column 'serviceEventID'. If the identity column is automatically managed by replication, update the range as follows: for the Publisher, execute sp_adjustpublisheridentityrange; for the Subscriber, run the Distribution Agent or the Merge Agent.
When we tried running the stored procedure it messed with the identities, so now when we try to synchronize it throws the following error in the Replication Monitor.
The row operation cannot be reapplied due to an integrity violation. Check the Publication filter. [,,,Table,Operation,RowGuid] (Source: MSSQLServer, Error number: 28549)
We've also had a few issues where snapshots became invalid, but these were relatively easy to fix. However, all this is making me wonder whether replication is the best method for what we're trying to do here or whether there's an easier one. This is what prompted my original question.
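For reference, this is roughly what we poked at when the first error appeared (the table name comes from the error message above; treat this as a sketch of what we tried rather than a recommended fix, since running the proc is what led to the second error):

```sql
-- Look at the identity-range check constraint that merge replication
-- maintains on the article (dbo.tblServiceEvent is from the error above).
SELECT cc.name, cc.definition
FROM sys.check_constraints AS cc
WHERE cc.parent_object_id = OBJECT_ID('dbo.tblServiceEvent');

-- Ask the Publisher to hand out a fresh identity range for the article,
-- as the error message suggests (run in the publication database).
EXEC sys.sp_adjustpublisheridentityrange
    @table_name  = N'tblServiceEvent',
    @table_owner = N'dbo';
```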
We're working on a similar situation, but ours involves a tool that works in a disconnected model and runs on the Windows desktop... We're using SQL Server Compact Edition for the clients and Microsoft SQL Server 2005 with a web service for the server side.
To enable synchronization, we initially started by building our own synchronization framework, but after many issues with keeping that framework in sync with the rest of the system, we opted to go with the Microsoft Sync Framework (http://msdn.microsoft.com/en-us/sync/default.aspx for reference). Our initial requirement was to make the application as easy to install and use as packages like Intuit QuickBooks, and I think we have come close to succeeding.
The Synchronization Framework from Microsoft has its ups and downs, but the only really bad thing that I can say at this point is that the documentation is horrendous.
We're in discussions now to decide whether or not to continue using it or to go back to maintaining our own synchronization subsystem. YMMV on it, but for us, it was a quick fix to the issue.
You're definitely pushing the stability envelope for CE, aren't you?
When I've done this, I've found it necessary to add in a fair amount of conflict tolerance, by thinking of it not so much as synchronization but as simultaneous asynchronous data collection, with intermittent mutual updates and/or refreshes. In particular, I've always avoided using identity columns for anything. If you can strictly adhere to true Primary Keys based on real (not surrogate) data, it makes things easier. Sometimes a PK comprising SourceUnitNumber and timestamp works well.
If you can, view the remotely collected data as a simple timestamped, source-IDed, user-IDed log of cumulative, chronologically ordered transactions. Going the other way, the host provides static validation info which never needs to be sent back - send back the CRUD transactions instead.
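A rough sketch of the kind of log table I mean - all of these names are made up for illustration:

```sql
-- Hypothetical table illustrating the "timestamped, source-IDed log
-- of cumulative transactions" idea; every name here is invented.
CREATE TABLE dbo.RemoteTransactionLog
(
    SourceUnitNumber int      NOT NULL,  -- which handheld produced the row
    RecordedAtUtc    datetime NOT NULL,  -- when it was captured on the device
    RecordedByUser   int      NOT NULL,
    Operation        char(1)  NOT NULL,  -- 'C', 'U' or 'D'
    PayloadXml       xml      NULL,      -- the CRUD transaction itself
    CONSTRAINT PK_RemoteTransactionLog
        PRIMARY KEY (SourceUnitNumber, RecordedAtUtc, RecordedByUser)
);
```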
Post back how this turns out. I'm interested in seeing any kind of reliable Microsoft technology that helps with this.
TomH & le dorfier - I think that part of our problem is that we're allowing the customer to insert a large number of rows into one of the replicated tables with an identity field. It's a scheduling application which can automatically generate multiple tasks up to a specified month/year. One of the times that it failed was around the time they entered 15000 rows into the table. We'll look into increasing the identity range.
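If I'm reading the docs right, the per-article ranges can be bumped with sp_changemergearticle, along these lines (the publication name and range value here are placeholders, so treat this as a sketch rather than something we've verified):

```sql
-- Give each Subscriber a bigger identity range so bulk inserts
-- don't exhaust it between syncs; values are illustrative only.
EXEC sys.sp_changemergearticle
    @publication = N'egSchedulerPub',   -- placeholder publication name
    @article     = N'tblServiceEvent',
    @property    = N'identity_range',   -- range handed to Subscribers
    @value       = N'50000',
    @force_invalidate_snapshot = 1,
    @force_reinit_subscription = 0;

-- 'pub_identity_range' and 'threshold' can apparently be adjusted the same way.
```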
The synchronization framework sounds interesting, but it seems to suffer from a similar problem to replication: poor documentation. Trying to find help on replication is a bit of a nightmare and I'm not sure I want us to move to something with similar issues. Wish M'soft would stop releasing stuff that seems to have the support of beta s'ware!
We're having a problem scaling out with SQL Server, largely for two reasons: 1) poorly designed data structures, and 2) all of the heavy lifting and business/processing logic is done in T-SQL. This was verified by a Microsoft SQL guy from Redmond whom we hired to perform an analysis on our server. We're literally solving issues by continually increasing the command timeout, which is ridiculous and not a good long-term solution. We have since put together the following strategy and set of phases:
Phase 1: Throw hardware/software at the issue to stop the bleeding.
This includes a few different things like a caching server, but what I'd like to ask everyone here about is specifically related to implementing bi-directional transactional replication on a new SQL server. We have two use-cases for wanting to implement this:
We were thinking of running the long-running (and table/row-locking) SELECTs on this new SQL "processing box", throwing the results into a caching layer, and having the UI read them from the cache. These SELECTs generate reports and also return results on the web.
Most of the business logic is in SQL. We have some LONG-running queries for SELECTs, INSERTs, UPDATEs, and DELETEs which perform processing logic. The end result is really just a handful of INSERTs, UPDATEs, and DELETEs after the processing is complete (lots of cursors). The thought would be to balance the load between these two servers.
I have some questions:
Are these good use-cases for bi-directional transactional replication?
I need to ensure that this solution is going to "just work" and not have to worry about conflicts. Where would conflicts arise within this solution? I have read a few articles about resetting the increment on your identity seed in order to prevent collisions (see the sketch after this list of questions), which makes sense, but how does that handle UPDATEs/DELETEs or other places where conflicts might occur?
What other issues might I run into that we need to watch out for?
Is there a better solution to this problem?
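For reference, the identity-seed pattern from those articles looks roughly like this (made-up table; NOT FOR REPLICATION stops the replication agents from consuming the other server's values):

```sql
-- Server A hands out odd values, Server B hands out even ones,
-- so locally generated IDs can never collide.
-- On Server A:
CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1, 2) NOT FOR REPLICATION PRIMARY KEY, -- 1, 3, 5, ...
    CustomerID int      NOT NULL,
    CreatedAt  datetime NOT NULL DEFAULT GETDATE()
);

-- On Server B the only difference is the seed:
--     OrderID int IDENTITY(2, 2) NOT FOR REPLICATION PRIMARY KEY  -- 2, 4, 6, ...
```

As far as I can tell this only addresses INSERT collisions, which is exactly why I'm asking about UPDATEs/DELETEs.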
Phase 2: Rewrite the logic in .NET, where it belongs, and optimize the SQL stored procedures to perform only set-based operations, as they should.
This will obviously take a while, which is why we wanted to see if there were some preliminary steps we could take to stop the pain our users are experiencing.
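To give a flavour of the Phase 2 rewrite, the goal is to replace the cursor loops we have today with single set-based statements; the table and column names below are invented purely for illustration:

```sql
-- Instead of opening a cursor over dbo.Invoices and updating
-- dbo.Orders one row at a time, do it in one set-based UPDATE.
UPDATE o
SET    o.Status = 'Overdue'
FROM   dbo.Orders   AS o
JOIN   dbo.Invoices AS i ON i.OrderID = o.OrderID
WHERE  i.DueDate  < GETDATE()
  AND  i.PaidDate IS NULL
  AND  o.Status  <> 'Overdue';
```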
Thanks.
Imho bidirectional replication is very, very far from 'it will just work'. Preventing update conflicts requires exquisite planning, ensuring that all that 'processing' is carefully orchestrated so it never works on overlapping data. Master-master replication is one of the most difficult solutions to pull off.
Consider this: you envision a solution that provides a cheap 2x scale-out with nearly no code modification. Such a solution would be so useful that one would expect to see it deployed everywhere, yet it is nowhere to be seen.
I recommend you search for the many blogs and articles describing gotchas and warnings about (the much more popular) MySQL master-master deployments (e.g. If You Must Deploy Multi-Master Replication, Read This First) and judge for yourself if the trouble is worth it.
I don't have all the details you do, but I would focus on the application. If you want to just throw money at the problem short term, I would make sure that cheap scale-up is exhausted before considering scale-out (SSD/Fusion drives, more RAM). Also investigate the snapshot isolation level / read committed snapshot first, if locking is the main issue.
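The switch itself is small if you want to test whether readers blocking writers is the real issue (the database name is a placeholder, and flipping READ_COMMITTED_SNAPSHOT needs a moment with no other connections):

```sql
-- Row-versioning options: readers no longer block writers.
-- YourDb is a placeholder database name.
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- Long-running report batches can additionally opt into snapshot isolation
-- before running their SELECTs:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```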
Please, give me the most serious arguments against this.
The application directly opens a connection to MS SQL Server and directly executes queries.
So what I'd like to ask:
1) Why is this wrong when the number of users executing huge queries can be up to 1000?
2) What serious problems can that cause?
3) What should I do?:)
Arguments, the most serious arguments against this kind of implementation!
One of the things to consider is how the queries are done. 1000 queries against a SQL Server DB might be manageable, but 1000 Access queries in which the table is locked, or which are actually joins or views, could use dramatically more memory. It really depends on how the application is written. Some Access apps open a recordset and page through the records one at a time, or fetch a few dozen and work on those, but sometimes Access grabs the whole recordset, for example to allow users to page through data. And I have seen Access lock a set of tables to allow editing of them. That would be bad in your scenario.
Of course, I wholeheartedly agree with the "10 years out of support" issue. That is a guaranteed problem. Mine is only a possibility. And you should probably update SQL Server to a current version also, for the same reason.
What about this:
The version - Access 97 - is totally outdated and won't get any updates, has a crappy look and crappy functionality, and in general - IF it requires rework - should be updated.
Problems? You run on a 10-year-old, out-of-support platform. What problems can that cause? Well - what about limited support?
Upgrade at least to 2007, better yet 2010 (coming in a couple of weeks), when you have a moment. I personally despise Access-based applications (crappy architecture to start with, etc.), but if there is one, the update to Access 2010 is possibly the most painless way to go.
Access 2003 or 2007 would be just fine for the scenario as long as you had an Access developer who was up to speed on how to develop for client/server with large user populations.
Access 97 is still an awfully nice version of Access. I think it's the best version ever produced.
But it is out of support and predates the alteration of default permissions in Windows implemented with the release of Windows 2000. This means that it has some problems in installing with its default permissions (it expects write access to its application folders and registry keys). An installation script can easily alter these appropriately, but you're still left with problems in certain contexts, like trying to run it in Windows Terminal Server/Citrix, where it very often just completely breaks.
I would like to hear an explanation of exactly why someone would choose A97 for new development. Of course, I may be misinterpreting. You may be asking about an existing app, in which case I'd go with "if it ain't broke, don't fix it," and then ask exactly what it is that is perceived as "broken." Those things can be fixed, though it's unlikely that simply upgrading from A97 to something more recent is going to do the job.
I'm currently nearly finished with a brand new application written in Access 97 that stores its data in SQL Server 2008. As has been said many times before, the Access/SQL Server combination really works great.
In line with my other applications, it is completely unbound, using ADO to get the data from the server. I won't drag up that debate again here, but it is something you really want to look into as it can offer some great benefits.
Most of the SQL Server guides you will find will ask you to check that you have the correct indexes, and to try to identify the slowest-running parts of the system, or the ones that get called a lot, and then look at making them faster. That might lead you to create a covering index or to denormalise the data in some way.
Generally, what is good practice for JET also works well for SQL Server: make a good table schema with a good clustered index choice and good supporting indexes, and you are 95% of the way there.
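As a made-up example of the covering-index idea - a query filtering on CustomerID and OrderDate can be answered entirely from the index because the other columns it needs are INCLUDEd:

```sql
-- Hypothetical table/columns, purely for illustration.
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Date
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalAmount, Status);
```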
I've asked about each of these technologies separately, and really haven't found a suitable answer.
We have a server in our central office running SQL Server 2005 Enterprise that has several large (large in the sense that DSL is the limiting factor) databases that we need local copies of at each of our locations. We currently have a few dozen locations, and are needing to bring even more online. The total number of locations we'll need to sync these databases to will be in the several hundreds in the next 2 years.
We are trying to overcome issues with the WAN connection at each location. These are DSL lines and the wiring at the locations isn't always the best. We currently have issues with some of the locations going down as often as every hour. While we are working to resolve these issues with rewiring and assistance from the local telcos, it mainly highlights the problem at hand: we need a two-way sync that can handle being occasionally-connected.
We tried transactional replication for a while, and while it worked some of the time, it was too high maintenance for us, and it seemed to randomly error out, often with no apparent explanation, forcing us to reinitialize subscriptions (which could take upwards of 4 hours, assuming the location would stay connected long enough to get the entire snapshot in one go). We've looked at rolling our own solution from scratch, but I don't feel this would be the best idea given the scale and reliability we are needing.
So far we've also looked at Sync Framework, and as suggested by someone else, Service Broker. Sync Framework seems a better fit, but I was told that Service Broker scales better and is more reliable? I can't find any empirical data on the overhead involved with Sync Framework or Service Broker, so it's proving impossible to compare the two in this regard.
What we really need is a two-way sync between the central office server and a remote client that can run autonomously and can report to an admin in the event of a failure that requires our intervention.
There are so many possible solutions to this problem, all involving completely different technologies, that I need a fresh eye on this.
What do you think would be the optimal solution for our situation, and why?
EDIT: Obviously, upgrading to SQL Server 2008 would solve this problem easily. However, we would like to try less expensive options first.
I don't have any hard data to offer on this, but we used the Sync Framework on a project a while ago. My experience with it is really bad. It's slow (even when synchronizing relatively small tables across a LAN), scales terribly and requires a lot of work to manually handle error conditions (it'll happily produce larger packets than WCF can handle by default -- and is only able to split updates into batches when syncing one way, not the other.) And it only works with a few select databases (the client must use MS SQL Compact Edition, as I recall), unless you're willing to write your own SyncAdapter.
Overall, a lot of work just to get a fragile and inefficient solution to your problem. I wouldn't recommend it.
You can use the Sync Framework with SQL Express 2008 R1/R2 on one end and a multi-tenant SQL Server Enterprise database on the central end. Below is a sample application for n-tier sync over a secure WCF channel. You could write a Windows service to sync data from the backend:
http://www.rajneeshnoonia.com/blog/2012/03/n-tier-sync-framework/
It should be capable of handling a large number of clients (thousands).
I think we'll look into the SQL Server 2008 upgrade route. It seems the native change tracking support will be the easiest way to accomplish this.
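From what I've read so far, enabling it looks something like the following - the database and table names are placeholders and I haven't tried this yet, so treat it as a sketch:

```sql
-- Turn on change tracking for the database and a table (SQL Server 2008).
ALTER DATABASE CentralDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Customers
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);

-- A client that remembers the version it last synced to can then pull deltas:
DECLARE @last_sync_version bigint;
SET @last_sync_version = 0;   -- persisted per client in practice

SELECT ct.CustomerID, ct.SYS_CHANGE_OPERATION, c.*
FROM   CHANGETABLE(CHANGES dbo.Customers, @last_sync_version) AS ct
LEFT JOIN dbo.Customers AS c ON c.CustomerID = ct.CustomerID;
```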
This question was asked quite some time ago, and while it covers possible solutions for SQL 2005 and 2008, it lacks a good solution for SQL 2000, which is still far too common.
I need a way to monitor certain fields of a database table for changes, and notify my application when these changes occur so that I can blast them out on the local network as broadcast messages where anyone with a client can listen for them and display them as alerts (think something similar to stock market data reaching specific thresholds).
I do NOT want to poll the database, for two reasons: 1) I don't wish to add additional load to the servers, and 2) I would rather get notifications in near real time than wait for the polling interval to expire.
Now, I could put logic in the applications that update the database, but the data can be updated from several sources, including the web and I don't want to deal with web servers sending notifications across DMZ boundaries, etc.. And I don't want to have to maintain this in 20 different applications (the more overpowering issue).
I've seen this done on SQL 2000 using extended stored procs and triggers, but the XPs seem to be difficult to make cross-platform, and they break when installed on SQL 2005 and 2008. Maybe that's just bad code in the examples I've seen, I'm not sure, but I am looking for something that works in SQL 2000 and later versions.
Any ideas?
EDIT:
I've thought about dropping support for 2000, but that really doesn't solve my problem. I would like a solution that is going to continue to work for years to come. One problem with many Microsoft technologies is that support for them gets dropped. For instance, Notification Services does what I need it to do, but it was deprecated in 2008 and won't be available in the next version. So I'm looking for a solution that has a good chance of sticking around.
Very simple solution
You could have a trigger that calls a webpage, notifying of an update.
This may be quite bad, because if the server can't reach the web for some reason, it may make the insert operation quite slow. And depending on the frequency of inserts, that could hurt just as much.
Alternative plan
In a trigger, write to a queue (I happen to be in love with MSMQ). Then have something waiting on that queue, and you will get the messages in 'real time'. Again, it's sensitive to the frequency of updates, as above.
Better plan
Have a trigger that posts the data to a 'tblUpdatedThings' table, which you then poll. I know you don't want to poll, but I still consider this the better option, for the reasons described above.
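A rough sketch of the queue-table approach, with invented table and column names (the trigger only queues rows where the field you care about actually changed):

```sql
-- Queue table the applications (or a poller/listener) read from.
CREATE TABLE dbo.tblUpdatedThings
(
    UpdateID     int IDENTITY(1, 1) PRIMARY KEY,
    SourceTable  sysname  NOT NULL,
    SourceKey    int      NOT NULL,
    ChangedAtUtc datetime NOT NULL DEFAULT GETUTCDATE(),
    Processed    bit      NOT NULL DEFAULT 0
);
GO

-- Hypothetical monitored table dbo.tblStockPrices with StockID/Price columns.
CREATE TRIGGER trg_tblStockPrices_Notify
ON dbo.tblStockPrices
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.tblUpdatedThings (SourceTable, SourceKey)
    SELECT 'tblStockPrices', i.StockID
    FROM   inserted AS i
    JOIN   deleted  AS d ON d.StockID = i.StockID
    WHERE  i.Price <> d.Price;   -- only the fields being monitored
END;
```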
You want your solution to be in the database, but you want it to be database-independent. You can't have it both ways. Pick one. If you want to be independent of the database, don't allow the sources to write to the database directly; have them call a central service that you control, where you can trap any events of interest to you.
If you want to use database functionality without polling, you have to deploy code that the database invokes, and you will have a dependency on future versions supporting your code.
I'm currently evaluating how best to share data between offices at different geographical locations.
My current preference is to use SQL Server merge replication, with a main database and a handful of subscribers.
The system will also need to allow a few work sites to work disconnected (no or little connectivity on construction sites).
The amount of data is not going to be large, we're talking about sharing data from a custom ERP system between a manufacturing plant, a handful of regional offices and work sites.
The Sync Framework also looks good and seems to have good support in SQL Server 2008.
What other proven systems out there should I investigate that can answer these needs?
For those with experience of sharing data in a similar environment, do you have any particular recommendations and tips?
How difficult has it been for you to deal with data conflicts?
Definitely stick with SQL Server replication before you decide to go down the path of 'build your own replication framework.' I've seen some applications become horrible messes that way.
I've had environments that were set up for snapshot replication in a disconnected model, but the remote sites were read-only. They worked quite well, with minimal issues.
I'd also be interested in hearing people's experiences with the sync framework.
You may want to look at what Microsoft calls smart clients, which is an architecture Microsoft talks about for applications that may have intermittent network connectivity.
I have already discussed my own experience of SQL Server 2005 with #cycnus. My answer is not a real one, just a few arguments to open up a subject I am very interested in.
Our choice for 'not always connected' sites is to implement web-based merge replication. Data exchanges happen to be even quicker than through VPNs (as we also have a combination of LAN merge replications). I will easily get a speed of 30 to 40 rows per second over the web (512 down/128 up, shared) while I'll get 5 rows per second over the LAN (overseas, 256 up/down, dedicated). Don't ask me why ...
Tips are numerous: subscriptions should be of the client type (data basically circulating from the subscriber to the publisher before being distributed). Primary keys should always be GUIDs, for many reasons exposed here, but also for replication reasons: we are then sure that any newly created record will be able to find its way up to the publisher, as its PK will be unique. Moreover, I recently had a non-convergence issue with one of my replications (bad experience, exposed here), where I felt very happy not to be using natural keys, as the problem occurred on the potential "natural key" column.
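For illustration, this is what I mean by the GUID PK rule - merge replication needs a uniqueidentifier ROWGUIDCOL column anyway, so it might as well be the key (table and column names are invented):

```sql
CREATE TABLE dbo.WorkOrders
(
    -- NEWSEQUENTIALID() keeps the GUIDs index-friendly; NEWID() also works.
    WorkOrderID uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_WorkOrders_ID DEFAULT NEWSEQUENTIALID(),
    SiteCode    nvarchar(10)  NOT NULL,
    Description nvarchar(200) NULL,
    CONSTRAINT PK_WorkOrders PRIMARY KEY (WorkOrderID)
);
```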
Data conflicts should then be basically limited to work organisation problems, where (usually for bad reasons) the same data is modified by different users in different places at the same time. With our "PK is a GUID" rule, we do not have conflicts outside of these specific situations.
One should always have the possibility of modifying the database structure, even if replications are running. It is possible to keep on adding fields, indexes and constraints while merge replication processes are running. I also found a workaround for adding tables without reinitialising the replication process (exposed here; I still don't understand why I was downvoted on that answer!).