I have two databases with the same structure, one on a local machine and one on the company's server. At a set interval, the data from the local DB should be synchronized to the server DB.
I have a general idea of how to do this - create a script that somehow "merges" the information that is not yet on the server DB, then run that script as a scheduled job on the server. However, my problem lies in the fact that I am not very experienced with this.
Does SQL Server Management Studio provide an easy way to do this (some kind of wizard) that generates this kind of script? Or is this something I'll have to build from scratch?
I've done some basic Google searches and came across the term "replication", but I don't fully understand it. I would rather hear from people who have actually done this or who are good at explaining this kind of stuff.
Thanks.
Replication sounds like a good option for this, but there would be some overhead (not technical overhead, but the knowledge needed to support it).
Another SQL Server option is SSIS. SSIS provides graphical tools to design what you're trying to do. The SSIS package can also run SQL statements, if appropriate. An SSIS package can be started, and therefore scheduled, from a SQL Server job.
You should consider the complexity of the synchronization rules when choosing your solution. For example, would it be difficult to resolve conflicts, such as a duplicate key, when merging the data? A SQL script may be easy to create if the rules are simple, but complex conflict rules may be more difficult to implement in a script (or in replication).
SQL Server Management Studio unfortunately doesn't offer much in this area.
You should have a serious look at some of the excellent commercial offerings out there:
Red Gate Software's SQL Compare and SQL Data Compare - excellent tools, highly recommended! You can even compare a live database against a backup from another database and synchronize the data - pretty nifty!
ApexSQL's SQL Diff and SQL Data Diff
They all cost money - but if you're serious about it, and you use them in your daily routine, they're paid for in no time at all - well worth every dime.
The only "free" option you have in SQL Server 2008 would be to create a linked server connection between the two servers and then use something like the MERGE statement (new in SQL Server 2008) to transfer the data. That doesn't work for structural changes, and it requires a live connection between the two servers.
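A minimal sketch of that approach, assuming a linked server named LOCALBOX that points at the local machine and a dbo.Customers table with the same structure on both sides (all names here are hypothetical):

    -- Merge rows from the local DB (via the linked server) into the server DB.
    MERGE INTO dbo.Customers AS target
    USING [LOCALBOX].SalesDb.dbo.Customers AS source
        ON target.CustomerID = source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET target.Name  = source.Name,
                   target.Email = source.Email
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, Name, Email)
        VALUES (source.CustomerID, source.Name, source.Email);

Note that MERGE requires the target table to be local; the remote table can only appear on the USING side.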
You should definitely read up on transactional replication. It sounds like a good fit for the situation you've described. Here are a few links to get you started.
How Transactional Replication Works
How do I... Configure transactional replication between two SQL Server 2005 systems?
Performance Tuning SQL Server Transactional Replication
What you want is Peer-to-Peer Transactional Replication, which allows data to be updated at both databases yet keeps them in sync through a continuous merge of changes. This is the closest match to what you want, but it is a fairly costly option (it requires Enterprise Edition at both sites). Another option is Bidirectional Transactional Replication, but since this also requires two EE licenses, I'd say that peer-to-peer is easier to deploy for the same money.
A more budget-friendly option is Updatable Subscriptions for Transactional Replication, but updatable subscriptions are being deprecated, so you'd be betting your money on a losing horse.
Another option is to use Merge Replication. And finally, for cases when the "local" database is quite mobile, there is the Sync Framework.
Note that all these options require some configuration and cooperation from the company's server DB.
There are some excellent third-party tools out there. For me, xSQL Data Compare has always done the trick, and because the comparisons are highly configurable, it is suitable for almost every data-compare or data-synchronization scenario. Hope this helps!
Recently I had to implement transactional replication to have a live copy of a database on another server for reporting purposes. While configuring replication I realized that a lot of tables didn't have a primary key, so I could not publish all the tables I wanted to.
The second option was to implement merge replication, but that would have added a GUID column to every table. Since it is a database for a vendor application, and the vendor has warned us not to "touch" the database structure (any change to it can cause their application to break), merge replication is no longer an option.
I have been doing some research on the other options available to me in this scenario; the only thing I could find is log shipping. I know it will leave my database in read-only mode, but since (to my knowledge) this is the only option I am left with, and the copy will be used strictly for reporting purposes, I think I can live with that.
Can anyone suggest a better solution for this? Or is Log Shipping the only option left for me?
It is SQL Server 2008 R2 64-bit DataCenter Edition.
Your other options are:
Database mirroring, and using a snapshot for read-only operations (see the sketch after this list). It can be a pain to manage snapshots.
Upgrading to SQL Server 2012 and making use of Readable Secondaries in Availability Groups. This can be a pain in the wallet.
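For reference, creating a reporting snapshot on the mirror looks roughly like this (database, logical file name, and path are all hypothetical); the mirror itself is not directly readable, which is why the snapshot is needed:

    -- Minimal sketch: a read-only snapshot of the mirrored database for reporting.
    CREATE DATABASE ReportingDb_Snapshot
    ON (NAME = ReportingDb_Data, FILENAME = 'D:\Snapshots\ReportingDb_Snapshot.ss')
    AS SNAPSHOT OF ReportingDb;

Each refresh means dropping the old snapshot and creating a new one, which is the management pain mentioned above.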
You mention log shipping but, based on your follow-up comments, I don't think it's clear that every time you restore a log to the log-shipped copy, you need to kick out all of the users who may be running reports. This is because you need exclusive access to the database in order to restore the log. This is another case of "you get what you pay for": you can log ship to Express instances if you want to (and if your database supports it), but it's not exactly a watertight solution.
I want to design and implement an enterprise application with Silverlight, using a SQL Server database. Many users will run SQL queries against this database.
How can I configure the SQL Server database for best performance?
How can I distribute the database across several servers for best performance?
What SQL Server technologies can I use to get the best performance?
In addition to replication, you can use mirroring or log shipping for this. Note that I am talking only about scaling out reads, not writes. So reports etc. can be run from the copies of the database, but writes must go to the main copy (unless you are using merge replication, which is frightening to me). There are some caveats, of course.
With database mirroring, you can use the secondary as a read-only reporting source by taking a snapshot. There are limits here to how many databases you can mirror, and there is of course maintenance involved in managing the snapshots. It is not quite true distribution of resources, but it can help offload some of the load. In the next version of SQL Server (Denali), you will be able to set secondaries as read-only, so you can avoid the maintenance of snapshots.
With log shipping, you can essentially keep a stale version of the database around for reporting, and refresh it periodically by restoring logs to it. You have a lot more flexibility here compared to replication or mirroring, as you can actually define a delay (e.g. every 6 hours, or once a day, you refresh the copy) - which can also serve as a "recover from a shoot-yourself-in-the-foot" scenario. The downside is that to restore a new copy of the database you need to kick all the current users out, as the database needs to be in single-user mode in order to recover.
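To make the mechanics concrete, a refresh cycle boils down to restoring the latest log backup with STANDBY so the copy stays readable between restores (file names here are hypothetical):

    -- Minimal sketch: the restore itself still needs exclusive access,
    -- hence kicking users out; STANDBY keeps the DB readable afterwards.
    RESTORE LOG ReportingDb
    FROM DISK = 'D:\LogShip\ReportingDb_20110615.trn'
    WITH STANDBY = 'D:\LogShip\ReportingDb_undo.ldf';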
Those are just a couple of ideas for helping scale out reads, but deep down I agree with @gbn - are you solving a problem you don't have yet? It's one thing to design for scalability, but it's very easy to step over that line and completely over-engineer.
Well, SQL Server doesn't really have a load-balancing mechanism in and of itself. What it does support, however, is an active/passive node configuration and also replication.
We are using the replication strategy in one application I support. You can read more about it here:
http://msdn.microsoft.com/en-us/library/ms151198.aspx
In our configuration, we basically have a transactional database and a reporting database. We replicate the data from our transactional DB to the reporting DB. Any reporting is done against this reporting DB, so that we don't slow down work being done on the transactional DB due to some long running report.
Note that the replication isn't truly real-time. In other words, there's some time involved in replicating the data from the transactional to the reporting DB, albeit a very small amount of time. But replication is certainly one strategy you could consider if you are trying to balance workload.
Another thing you might consider is partitioning large tables for better performance, as sketched below.
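A rough sketch of what table partitioning looks like, with all object names hypothetical:

    -- Rows are routed to partitions by OrderDate, so large scans and
    -- maintenance can work on one partition at a time.
    CREATE PARTITION FUNCTION pfOrderYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2010-01-01', '2011-01-01');

    CREATE PARTITION SCHEME psOrderYear
    AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

    CREATE TABLE dbo.Orders (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL,
        Amount    money    NOT NULL
    ) ON psOrderYear (OrderDate);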
As gbn pointed out in his comment, though, it's better to determine whether you actually need these strategies before implementing them, because they add a lot of complexity and maintenance effort. It's important to properly analyze how much data you think you will have, and how much activity will be occurring against that data, to determine whether strategies such as the ones I just described are even needed.
Also, you can refer to this link for some other helpful information and some links to whitepapers you may find helpful:
http://social.msdn.microsoft.com/Forums/en/sqldisasterrecovery/thread/05cf41b7-c558-44bf-86c6-12f5c2b2ffe2
Does anyone know how the underlying replication model in SQL Server works? Does it essentially depend on UTC datetime values to determine whether something is new, or does it keep a table of all the changes (like a table of tableID+rowID pairs that have changed)?
I am building my own "replication" system and was planning on using dates to know what to replicate. Then I started wondering what would happen if the computer's clock got off for some reason. The obvious alternative is to keep a log of the changes as you go and, once you replicate those changes, remove them from the log. But that's a lot of extra work compared to just checking dates.
I figure that if SQL Server replication works by just checking dates, then that should be good enough for me.
Any wisdom here?
Thanks.
As a transaction occurs in SQL Server, it is written to the transaction log along with information pertinent to the transaction.
SQL Server replication uses this transaction log to determine which transactions have not yet been processed and moves them to the subscriber. There is a lot more going on under the hood to keep track of the intersection between transactions, publications, subscriptions, etc., but I will leave that to the MSDN documentation about SQL Server replication: http://msdn.microsoft.com/en-us/library/ms151198.aspx
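If you're curious, you can peek at the raw log records with the undocumented fn_dblog function - strictly for exploration, not something to build a system on:

    -- Undocumented and unsupported: list log records of the current database.
    SELECT [Current LSN], Operation, Context, [Transaction ID]
    FROM fn_dblog(NULL, NULL);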
Moving on to your point about building your own replication system:
Do not build your own replication system. There are too many complications involved that will cause you to spend many, many days working on it. You will be much better off using the features that ship with SQL Server.
SQL Server replication methods are pretty impressive out of the box.
If you outline what causes you to think in terms of building your own replication system, we can help you figure out how to use existing items to provision what you need.
Also, read up as much as you can here to get an idea of what it can do for you: http://msdn.microsoft.com/en-us/library/ms151198.aspx
SQL Server has a Log Reader Agent job that is aptly named. Replication reads the transaction log and applies the appropriate transactions to the subscribing databases.
For one thing, SQL Server (and it's not the only one) supports multiple replication algorithms.
You can find details about the ones implemented in SQL Server 2008 here. For each type, read the X Replication Overview topic first, then follow How X Replication Works for more details.
Ok, the scenario is... two servers, on completely different parts of the internet.
The SQL 2008 database just needs to receive data updates and schema changes; it doesn't need to send anything to the 2005 database. Basically, it should just suck down data and schema as efficiently as possible, automatically, as a scheduled task.
The database is quite huge, but the changes per day are probably only around 20-30 megabytes of data.
I can't run any of the inbuilt replication on the 2005 database.
I've had a wee look at the Sync Framework; I think it might do what I want, but it seems a bit painful and requires a bit of work to get going. I'm wondering if there is tooling out there to make this easier?
Or...? I'm not quite sure what my options are.
I can't run any of the inbuilt replication on the 2005 database.
Any reason for this restriction? Replication is the way to solve your problem. Without a replication infrastructure you simply won't be able to detect data changes, nor schema changes. There are only two ways to detect changes: either via triggers and tracking tables (and that is Merge Replication) or via the database log (and that is Transactional Replication).
The Sync Framework itself, if it were used, would require either Change Tracking or Change Data Capture. But these are 2008-specific technologies, and they're really nothing else but replication in disguise (they use the very same infrastructure used by Merge and Transactional Replication, respectively).
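For illustration, enabling Change Data Capture looks roughly like this (database and table names are hypothetical) - and since it must run on the source database, it's no help when the source is a 2005 instance:

    USE SalesDb;
    EXEC sys.sp_cdc_enable_db;    -- enable CDC for the database (2008+ only)

    EXEC sys.sp_cdc_enable_table  -- start capturing changes to one table
        @source_schema = N'dbo',
        @source_name   = N'Customers',
        @role_name     = NULL;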
Even if you want to roll your own, you'll find out quickly that shipping the changes over is the trivial part (e.g. using Service Broker for reliable delivery semantics). The real hard problem is detecting the changes. Diffing a "quite huge" database over the internet to detect changes is just not going to work, so relying on the built-in infrastructure to detect changes, namely the two forms of replication, is the obvious solution.
Could you automate Red Gate's SQL Compare and/or SQL Data Compare? http://www.red-gate.com/products/SQL_Compare/index.htm ... You could at least try them out with the 14-day trial and see if they are worth the investment. Much cheaper than tooling it yourself, IMHO.
Maybe these questions will help you:
Microsoft Sync Framework Or Replication
SQL Server Data Archive Solution
Is there a way to replicate some data not all data in db by sql server replication?
You can make an application that generates a script from your changed data over your chosen period, and then runs this script on your target server.
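A minimal sketch of that idea, assuming each table carries a LastModified column (table and column names are hypothetical):

    DECLARE @LastSync datetime = '2011-06-01T00:00:00'; -- persisted after each run

    SELECT CustomerID, Name, Email, LastModified
    FROM dbo.Customers
    WHERE LastModified > @LastSync;
    -- The application would turn these rows into INSERT/UPDATE statements
    -- and run them against the target server.

Keep in mind that this only catches changes if every write reliably updates LastModified, and deletes are not detected at all.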
Are there any performance benefits of using SQL Server 2008 over SQL Server 2005?
Moving a single database from SQL Server 2005 to 2008 will not really make a noticeable difference by itself. However, there are new tools and options available in SQL Server 2008 that you MIGHT be able to leverage to provide better performance later on in your application.
One item that comes to mind is filtered indexes, which allow you to create an index on a subset of the rows in a table.
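For example, a filtered index might cover only the rows a hot query actually touches (all names here are hypothetical):

    -- New in SQL Server 2008: an index over a subset of rows.
    CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (CustomerID, OrderDate)
    WHERE Status = 'Open';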
There may be new features in the engine which execute queries in different ways. This includes changes to the optimiser.
Therefore, the only way you can POSSIBLY tell is to gather detailed performance data from your application on SQL Server 2005, and then repeat the experiment on the same (production-quality) hardware with SQL Server 2008.
You will need to make sure your application works correctly - such a migration can't be done lightly, as any change could introduce bugs.
Also, the new version of the database engine could have performance regressions, which you need to be very careful about.
So in summary:
Benchmark YOUR application on SQL2005
Benchmark it on SQL2008
Use the same production-grade test hardware in your lab both times
Don't run VMs (unless that's what you do in production)
Don't change other parameters
This may not be easy if your application is big / complicated.
Yes. You can compress data in SQL Server 2008, which can have a drastic impact on backup and data transfer times.
Actually, SQL Server 2008 has built-in compression that you can enable out of the box, which could definitely improve performance, though the benefit may depend on what is being returned. I would try this option and benchmark to see if it's a worthy change.
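A rough sketch of both kinds of compression (object names are hypothetical; note that row/page data compression requires Enterprise Edition in SQL Server 2008, while backup compression is a separate feature):

    -- Rebuild a table with page compression.
    ALTER TABLE dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Compress backups independently of data compression.
    BACKUP DATABASE SalesDb
    TO DISK = 'D:\Backups\SalesDb.bak'
    WITH COMPRESSION;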