I am working on an application which will run on 5 different machines. Each machine's data will be stored locally on that machine. I have to sync the data from all machines to a centralized database. I am unable to find an answer to this question: what will be the best approach for this?
On each machine, a scheduled SQL job syncs the data with the centralized server
The server requests each machine to sync its data
Kindly advise, with the advantages and disadvantages of each.
You can configure SQL Server to use replication and publish a replica to each of the 5 machines.
You have to evaluate whether you need all changes on all machines; if so, you need merge replication. Replication also offers other structures and options, such as filters, so that only specific data is kept on your remote machines.
Microsoft has solved this problem for you - they developed the Sync Framework. I used it around 4 years ago, and it was rather good - but as @Juan says in his comment, it's no longer actively developed by Microsoft, and the open-source version seems to be pretty quiet.
There are two obvious options.
You can use (merge) replication if you want to solve this as a database issue. This is the simplest solution - replication simply copies the data from your clients to the target database, and Microsoft deals with all the scheduling, conflict resolution etc.
However, often there is some data or business logic between the clients - a common problem is that there may be a conflict between primary keys or lookup values between your clients. For instance, if you're using integers for your primary keys, and machine 1 and machine 2 both assign "1000" as the primary key for a record in the "sales" table, replication becomes tricky.
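Two common ways around that kind of collision, shown as a minimal sketch (table and column names here are invented):

```sql
-- Option 1: a GUID primary key, unique across every machine.
CREATE TABLE dbo.Sales_Guid (
    SaleID    uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY,
    Amount    decimal(18, 2)   NOT NULL,
    CreatedAt datetime         NOT NULL DEFAULT GETDATE()
);

-- Option 2: keep an integer key, but qualify it with a site identifier,
-- so machine 1 and machine 2 can both assign 1000 without conflict.
CREATE TABLE dbo.Sales_Composite (
    SiteID int NOT NULL,              -- fixed per machine, e.g. 1 to 5
    SaleID int NOT NULL,              -- locally assigned IDENTITY/sequence value
    Amount decimal(18, 2) NOT NULL,
    CONSTRAINT PK_Sales_Composite PRIMARY KEY (SiteID, SaleID)
);
```

Merge replication can also manage identity ranges for each subscriber automatically, which is a third option if you want to keep plain integer keys.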
Another common scenario is that there is some business logic that must be enforced which is simple on a single machine, but hard when dealing with multiple occasionally connected machines. For instance, if a customer places an order, the application may want to check that the total order value doesn't exceed the customer's credit limit - but what if there is another transaction pending on a separate client?
In those cases, you may want to create a service-oriented synchronization design.
Microsoft provided WCF for this. Whilst more flexible, it's probably much more effort to implement. In this model, you probably treat the local database on your clients as a cache, and you push messages to the central machine to reflect user interaction.
Related
I have one central database and 25 client databases and all have same schema.
I want that whenever some changes are done in some tables of the central database then these changes flow down to the client database.
The databases used is SQL Express so I cannot use replication.
The solution that I have today is to keep track of the changes in the central database; a program then writes these changes to a text file and sends it down to the client databases. Another program reads these text files and updates the client database.
There are three problems with this:
1. The files get lost or arrive in jumbled order, which messes up the client data
2. The process is slow
3. The programs are sometimes shut down, so the whole sync flow gets stopped.
Is there a reliable alternative that is fast and secure?
I wonder how banking software is made... it never loses transactions and it is fast.
Add an UpdateDate column to all the entities that need to be replicated. At each client add a linked server to the central repository. Now, every 5 minutes or so, poll your central repository for changes using the last UpdateDate of a client entity and grab the delta.
Then use merge or insert and update to merge data on the client. That's a very reliable way of doing homebrew replication. To keep track of deleted elements you would either want to mark them as deleted or have another table to keep track of entity kind and its reference, again combined with UpdateDate for replication.
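A rough sketch of that polling step, assuming a linked server called CENTRAL, a hypothetical Customer table, and an UpdateDate column on every replicated row:

```sql
-- Runs on each client every few minutes (a SQL Agent job, or a Windows scheduled
-- task on SQL Express, which has no Agent). CENTRAL is a linked server pointing
-- at the central repository; all table and column names are hypothetical.
DECLARE @lastSync datetime;
SELECT @lastSync = ISNULL(MAX(UpdateDate), '19000101') FROM dbo.Customer;

-- Pull only the delta from the central repository and merge it locally.
-- (MERGE needs SQL Server 2008+; on 2005 use separate UPDATE and INSERT.)
MERGE dbo.Customer AS target
USING (
    SELECT CustomerID, Name, CreditLimit, UpdateDate
    FROM CENTRAL.MainDb.dbo.Customer
    WHERE UpdateDate > @lastSync
) AS source
ON target.CustomerID = source.CustomerID
WHEN MATCHED THEN
    UPDATE SET Name        = source.Name,
               CreditLimit = source.CreditLimit,
               UpdateDate  = source.UpdateDate
WHEN NOT MATCHED THEN
    INSERT (CustomerID, Name, CreditLimit, UpdateDate)
    VALUES (source.CustomerID, source.Name, source.CreditLimit, source.UpdateDate);
```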
Update
Then you mention transactions and banking software. When you do your replication via files, we ain't talkin' about no transactional replication here, not by a long shot.
If you need transactional consistency you need to subscribe to the transaction flow of the data warehouse.
I don't want to be unhelpful and you haven't given any background about your business needs, but you have to decide if your priority is really "fast and secure" or if it's actually "cheap". Replicating changes between multiple databases in a reliable, consistent way is not easy (as you know) and it's highly unlikely that you will be able to develop a solution yourself that has the features, stability and performance of SQL Server replication.
SQL Express can be a replication subscriber, by the way, so it's not clear why it doesn't meet your needs. But if it doesn't, you should estimate the cost to your business (or customer) of dealing with issues caused by an unreliable solution: your time, business downtime, finding and correcting incorrect data, customer complaints, lost business etc. Then compare that to the cost of 25 SQL Server licenses (you should certainly be able to get a good discount when you order that volume), additional hardware (if any) and the costs of training, consulting and/or learning how to use replication. Then extrapolate those costs over 5 years or so. You may find that it's cheaper just to buy the solution you need. And of course buying the full SQL Server edition means you get a lot of other new features that might be useful to you.
If you (or your boss) is really determined to get something for nothing, you might want to investigate PostgreSQL or MySQL. They both have free replication solutions that seem to be widely enough used to be reliable for many companies. Of course, you then need to calculate the costs of switching to a new database platform.
If you have one central database and 25 clients, you can easily do it with one (yes, only one) SQL Server licence for the main database. Subscribers to this database can run SQL Express. As long as users only access the client databases, you are not even obliged to buy SQL CALs.
Back to banking software: rest assured that they pay good money for their server licenses! So don't be surprised that those systems are reliable and fast...
I'm helping out a business by providing an Access DB to manage requests of various types. As they are a construction company, they have one machine in an 'office' on the building site, plus 3 based in their main office. The machine on site has no internet connectivity.
Is there any (reasonably simple) way to synchronise the offsite and onsite databases every so often? I realise the tables could be merged, but each has an autoincrement field which must be synced between instances (i.e. when merging two tables the autoincrement should be reassigned based on the combination of records).
Cheers in advance,
Paul
Jet Replication is one answer, but not an easy one, as for a remote location you have to use indirect or Internet replication, both of which are pretty complex to set up and require regular maintenance to keep running reliably. That said, indirect replication works very well (I've never used Internet replication because of the hardwired dependency on IIS, which I consider unacceptable).
For one-stop shopping on the subject of Jet Replication, see the Jet Replication Wiki.
Microsoft is gradually phasing out support for Jet replication in Access (though I expect it to be supported as long as MDB files are supported without conversion), so a better solution to the problem might be to use the tools Microsoft has put in place to replace the functionality Jet replication provided. This would be Sharepoint, of course. In A2007, Sharepoint was way too inadequate to be a proper replacement for Jet replication, but starting with A2010 and Sharepoint 2010, all that changes.
If I had a new client coming to me with this requirement, even though I've got years and years of experience with Jet replication, I'd recommend A2010 and Sharepoint 2010 as the solution to the problem, and would say to wait.
It may be that a client doesn't want to spring for a Sharepoint server, and in that case, there's hosted Sharepoint available, which should support Sharepoint 2010 shortly after the release of Office 2010 in May.
Of course, it's also possible to program synchronization manually, but that's quite complex in a multi-master scenario. However, if the records in the two databases do not overlap (i.e., records created in one are not updated in the other, or put another way, it's mostly an add-only app for each database), it's not as bad a problem. Deletes are a harder problem, but not unresolvable.
For your Autonumber PK field use a ReplicationID (GUID) instead of a long so that the numbers will be unique across all copies of the database, even if they are disconnected.
There are a lot of options for replication with Access. Here is an article to get you started.
Understanding Access Replication
For our application (desktop, in .NET), we want to have 2 databases in 2 different remote places (different countries). Is it possible to use replication to keep the data in sync in both databases while the application changes data? What other strategies can be used? Should the sync happen instantaneously or at a scheduled time? What if we decide to keep one database 'readonly'?
thanks
You need to go back to your requirements I think.
Does data need to be shared between two sites?
Can both sites update the same data?
What's the minimum acceptable time for an update in one location to be visible in another?
Do you need failover/disaster recovery capability?
Do you actually need two databases? (e.g. is it for capacity, for failover, or simply because the network link between the two sites is slow? etc.)
Any other requirements around data access/visibility?
Real-time replication is one solution, an overnight extract-transform-load process could be another. It really depends on your requirements.
I think the readonly question is key. If one database is readonly then you can use mirroring to sync them, assuming you have a steady connection.
What is the bandwidth and reliability of connection between the sites?
If updates are happening at both locations (on the same data) then Merge Replication is a possibility. It's really designed for mobile apps where users in the field have some subset of the data and conflicts may need to be resolved at replication time.
High level explanation of the various replication types in SQL Server including the new Sync Framework in SQL Server 2008 can be found here: http://msdn.microsoft.com/en-us/library/ms151198.aspx
-Krip
I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.
The server and all the remote sites all live on a fairly decent VPN. The server is 2005, and it's not being pushed very hard at the moment. Just a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of 2000/2005.
I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. With 200 publications you're talking about 200 publisher agents, 200 subscription agents, plus the distributor maintenance. Most sites complain about the maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully, what is going to be your upgrade story? And how are you going to implement a schema change?
The largest replication deployment I heard of (some years ago) had I believe 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200 replication sites project is way more ambitious than you realize.
I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.
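If Service Broker looks interesting, the central intake side reduces to a handful of objects; here is a minimal sketch with invented names (routes and security between the servers are left out):

```sql
-- On the central server: define how row changes arrive from the sites.
CREATE MESSAGE TYPE SiteRowChange VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT SiteRowChangeContract (SiteRowChange SENT BY INITIATOR);
CREATE QUEUE SiteIntakeQueue;
CREATE SERVICE SiteIntakeService ON QUEUE SiteIntakeQueue (SiteRowChangeContract);

-- On each remote site (which has its own queue/service, e.g. SiteSenderService):
-- open a conversation and send the day's changes as an XML payload.
DECLARE @dialog uniqueidentifier;
BEGIN DIALOG CONVERSATION @dialog
    FROM SERVICE SiteSenderService
    TO SERVICE 'SiteIntakeService'
    ON CONTRACT SiteRowChangeContract
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @dialog
    MESSAGE TYPE SiteRowChange (N'<rows><row id="42" amount="19.99"/></rows>');
```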
If there is need to adjust the replication down the road, having the central server initiate a pull will be much easier to administrate than adjusting 200 sites to accomplish the same thing. Also, that would naturally manage the load, rather than some scheme to prevent, say, 100 remote sites all connecting at once.
Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.
From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.
How much data you are looking to push, and also the schedule of your updates, will determine the most appropriate replication method for you to use. For example, if you are looking to update all Subscriptions at the same time then, depending on how much data you need to push, Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific determined intervals. Your network may even be able to support near real-time replication; however, conducting a small test of your environment will determine this for you. For example, set up the Publisher, local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and replication latency.
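For reference, the publication side of a push setup on one remote site boils down to something like this (a hedged sketch only: names are invented, the Distributor is assumed to be configured already, and steps such as adding the Snapshot Agent job are omitted):

```sql
-- On a remote site (Publisher): enable the database for publishing and publish the table.
EXEC sp_replicationdboption
    @dbname = N'SiteDb', @optname = N'publish', @value = N'true';

EXEC sp_addpublication
    @publication = N'SiteDataPub',
    @repl_freq   = N'continuous',
    @status      = N'active';

EXEC sp_addarticle
    @publication   = N'SiteDataPub',
    @article       = N'SiteData',
    @source_owner  = N'dbo',
    @source_object = N'SiteData';

-- Push subscription to the central server, synchronised on your chosen schedule.
EXEC sp_addsubscription
    @publication       = N'SiteDataPub',
    @subscriber        = N'CENTRALSRV',
    @destination_db    = N'CentralDb',
    @subscription_type = N'Push';
```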
Things to consider:
How much data is to be moved across the network? Size in KB and record volume.
Consider the physical location of your sites.
What is the suitability of your network? Speed, capacity, etc.
You may wish to consider using a dedicated Distributor.
Is it possible to configure multiple database servers (all hosting the same database) to execute a single query simultaneously?
I'm not asking about executing queries using multiple CPUs simultaneously - I know this is possible.
UPDATE
What I mean is something like this:
There are two servers: Server1 and Server2
Both servers host the database Foo, and both instances of Foo are identical
I connect to Server1 and submit a complicated (lots of joins, many calculations) query
Server1 decides that some calculations should be made on Server2 and some data should be read from that server, too - appropriate parts of the query are sent to Server2
Both servers read data and perform necessary calculations
Finally, results from Server1 and Server2 are merged and returned to the client
All this should happen automatically, without need to explicitly reference Server1 or Server2. I mean such parallel query execution - is it possible?
UPDATE 2
Thanks for the tips, John and wuputah.
I am researching alternatives for increasing both the availability and capacity of a MOSS database backend. So what I'm looking for is some kind of out-of-the-box SQL Server load-balancing solution that would be transparent to the application, because I cannot modify the application in any way. I guess SQL Server has no such feature (and Oracle, as far as I understand it, does - it is RAC, mentioned by wuputah).
UPDATE 3
A quote from the Top Tips for SQL Server Clustering article:
Let's start by debunking a common misconception. You use MSCS clustering for high availability, not for load balancing. Also, SQL Server does not have any built-in, automatic load-balancing capability. You have to load balance through your application's physical design.
What you're really talking about is a clustering solution. It looks like SQL Server and Oracle have solutions to this, but I don't know anything about them. I can guess they would be very costly to buy and implement.
Possible alternate suggestions would be as follows:
Use master-slave replication, and do your complex read queries from the slave. All writes must go to the master, which are then sent to the slave, so things stay in sync. This helps things go faster because the slave only has to worry about the writes coming from the master, which are already predetermined on behalf of the slave (no deadlocks etc). If you're looking to utilize multiple servers, this is the first place I would start.
Use master-master replication. This means that all writes from both servers go to each other, so they stay in sync (at least theoretically). This has some of the benefits as master-slave but you don't have to worry about writes going to one server instead of the other. The more common use of master-master replication is for failover support; master-slave is really better suited to performance.
Use the feature John Sansom talked about. I don't know much about it, but it seems its basis is splitting your database into tables on different servers, which will have some benefits as well as drawbacks. The big issue is that since the two systems can't share memory, they will have to share a lot of data over the network to compute complex joins.
Hope this helps!
RE Update 1:
If you can't modify the application, there is hope, but it might be a bit complicated. If you were to set up master-slave replication, you can then set up a proxy to send read queries to the slave(s) and write queries to the master(s). I've seen this done with MySQL, but not SQL Server. That's a bit of a problem unless you want to write the proxy yourself.
This has been discussed on SO previously, so you can find more information there.
RE Update 2:
Microsoft's clustering might not be designed for performance, but that's Microsoft's fault. That's still the level of complexity you're talking about here. If they say it won't help, then your options are limited to those above and to what you can do with your application (like sharding, splitting into multiple databases, etc.).
Yes I believe it is possible, well sort of, let me explain.
You need to look into and research the use of Distributed Queries. A distributed query runs across multiple servers and is typically used to reference data that is not stored locally.
http://msdn.microsoft.com/en-us/library/ms191440.aspx
For example, Server A may hold my Customers table and Server B holds my Orders table. It is possible using distributed queries to run a query that references both Server A and Server B, with each server managing the processing of its local data (which could incorporate the use of parallelism).
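To make that concrete, here is a hedged sketch of such a query once Server B is registered as a linked server on Server A (database and table names are invented):

```sql
-- Run once on Server A: register Server B as a linked server.
EXEC sp_addlinkedserver @server = N'ServerB', @srvproduct = N'SQL Server';

-- A distributed query joining local Customers to remote Orders
-- using four-part names (server.database.schema.object).
SELECT c.CustomerID, c.Name, o.OrderID, o.OrderTotal
FROM   dbo.Customers AS c
JOIN   ServerB.SalesDb.dbo.Orders AS o
       ON o.CustomerID = c.CustomerID
WHERE  o.OrderDate >= '20240101';
```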
Now in theory you could store the exact same data on each server and design your queries specifically so that only certain tables were referenced on certain servers, thereby distributing the query load. This is not true parallel processing, however, in terms of CPU.
If your intended goal is to distribute the processing load of your application then the typical approach with SQL Server is to use Replication to distribute data processing across multiple servers. This method is also not to be confused with parallel processing.
http://databases.about.com/cs/sqlserver/a/aa041303a.htm
I hope this helps but of course please feel free to pose any questions you may have.
Interesting question, but I'm struggling to get my head around this being beneficial for a multi-user system.
If I'm the only user having half my query done on Server1 and the other half on Server2 sounds cool :)
If there are two concurrent users (let's say with queries of identical difficulty) then I'm struggling to see that this helps :(
I could have identical data on both servers and load balancing - so I get Server1, my mate gets Server2 - or I could have half the data on Server1 and the other half on Server2, and each will be optimised, and cache, just their own data - spreading the load. But whenever you have to do a merge to complete a query the limiting factor becomes the pipe-size between them.
Which is basically Federated Database Servers. Instead of having all my Customers on one server and all my Orders on the other I could, say, have my USA customers and their orders on one, and my European customers/orders on the other, and only if my query spans both is there any need for a merge step.
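In SQL Server that pattern is implemented with a distributed partitioned view: each server holds one region's rows (enforced with a CHECK constraint), and a UNION ALL view spans both via a linked server. A rough sketch with invented names:

```sql
-- On the US server: the local member table holds only USA rows.
-- The CHECK constraint is what lets the optimizer skip the remote server
-- when a query only touches one region.
CREATE TABLE dbo.Customers_USA (
    CustomerID int           NOT NULL,
    Region     char(3)       NOT NULL CHECK (Region = 'USA'),
    Name       nvarchar(100) NOT NULL,
    CONSTRAINT PK_Customers_USA PRIMARY KEY (CustomerID, Region)
);
GO

-- The view, defined the same way on both servers, unions the local table
-- with the remote one (EUSRV is a linked server to the European box).
CREATE VIEW dbo.Customers AS
    SELECT CustomerID, Region, Name FROM dbo.Customers_USA
    UNION ALL
    SELECT CustomerID, Region, Name FROM EUSRV.SalesDb.dbo.Customers_EU;
GO
```

A query such as SELECT ... FROM dbo.Customers WHERE Region = 'USA' is then answered entirely from the local member table, and only queries spanning both regions need the merge step across the link.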