sql server replication algorithm - sql-server

Anyone know how the underlying replication model in sql server works? Do they essentially depend on UTC datetime values to determine if something is new or do they keep a table of all the changes (like a table of tableID+rowid that have changed).
I am building my own "replication" system and was planning on using the dates to know what to replicate. Then I started wondering what would happen if the date got off in the computer for some reason. The obvious choice is to keep a log of the changes as you go and once you replicate those changes, you remove from the log of changes. But thats a lot of extra work, instead of just checking dates.
I figure if sql server replication works by just checking the dates, then that should be good enough for me.
Any wisdom here?
thanks

As a transaction occurs in SQL Server, it is written to the transaction log along with information pertinent to the transaction.
SQL Server replication uses this transaction log to determine which transactions have not yet been processed and to move them to the subscriber. There is a lot more going on under the hood to keep track of the intersection between transactions, publications, subscriptions, etc. but I will leave that to MSDN documentation about SQL Server replication http://msdn.microsoft.com/en-us/library/ms151198.aspx
Moving on to your point about building your own replication system:
Do not build your own replication system. There are too many complications involved that will cause you to spend many many days working. You will be much better off using the items that are shipped with SQL Server.
SQL Server replication methods are pretty impressive out of the box.
If you outline what causes you to think in terms of building your own replication system, we can help you figure out how to use existing items to provision what you need.
Also, read up as much as you can here to get an idea of what it can do for you http://msdn.microsoft.com/en-us/library/ms151198.aspx

SQL Server has a LogReader job that is aptly named. Replication reads the transaction log and applies appropriate transactions to the subscribing databases.

For one thing SQLServer (and it's not the only one) supports multiple replication algorithms.
You can find here details about the ones implemented in SQLServer 2008. Read first the X Replication Overview then follow the How X Replication works for more details.

Related

Replicating a SQL Server database for read access

I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too much linked to the database and to be able to use newer DAL (Entity Framework 6 Code First) I decided to start from a new empty database, and I only added the tables and columns I need (different names than the production one).
Now I need some way to update the new database with the production database regularly (would be best if it is -almost- immediate).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to only using SQL Server for the job (I can develop and run some custom application if needed).
I already made some searches and had those (part-of) solutions :
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table names / columns names will be problematic. So I can use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate this database to my new one (it may avoid my production database to be too much stressed?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
Do you some advice or maybe some other solutions that I didn't foresee?
Thanks
I think that the Transactional replication is the better than using triggers.
Too much resources would be used in source server/database due to the trigger fires by each DML transaction.
Transactional rep could be scheduled as a SQL job and run it few times a day/night or as a part of nightly scheduled job. IT really depends on how busy the source db is...
There is one more thing that you could try - DB mirroring. it depends on your sql server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
In the end, we went for the Trigger solution. We don't have that much changes a day (maybe 500, 1000 top), and it didn't put too much pressure on the current database. Thanks for your advices.

Sql Server distribution and configuration for best performance

I want design and implement an enterprise software with silverlight.I use sql server database for this.many useres run sql queireis on sql server database.
how can i configure sql server database for best performance?
how can i distribute sql server database for best performance?
how can i distribute sql server database between some servers for best performance?
and so what technologies can i use in sql server for best performance?
In addition to replication you can use mirroring or log shipping for this. Note that I am talking only about scaling out reads, not write. So reports etc. can be run from the copies of the database but writes must go to the main copy (unless you are using merge replication, which is frightening to me). There are some caveats of course.
With database mirroring, you can use the secondary as a read-only reporting source by taking a snapshot. There are limits here to how many databases you can mirror and there is of course maintenance to manage the snapshots. It is not quite true distribution of resources here, but it can be helpful to offload some of the load. In the next version of SQL Server (Denali), you will be able to set secondaries as read-only, so you can avoid the maintenance of snapshots.
With log shipping, you can essentially keep a stale version of the database around for reporting, and replace it periodically by restoring logs to it. You have a lot more flexibility here compared to replication or mirroring, as you can actually define a delay (like every 6 hours or once a day, you refresh the copy) - which can also serve as a "recover from a shoot-yourself-in-the-foot" scenario. The downside is that to restore a new copy of the database you need to kick all the current users out, as the database needs to be in single user mode in order to recover.
Those are just a couple of ideas for helping scale out reads, but deep down I agree with #gbn - are you solving a problem you don't have yet? It's one thing to design for scalability, but it's very easy to step over that line and completely over-engineer.
Well, SQL Server doesn't really have a load balancing mchanism in and off itself. What it does support, however, is an active/passive node configuration and also replication.
We are using the replication strategy in one application I support. You can read more about it here:
http://msdn.microsoft.com/en-us/library/ms151198.aspx
In our configuration, we basically have a transactional database and a reporting database. We replicate the data from our transactional DB to the reporting DB. Any reporting is done against this reporting DB, so that we don't slow down work being done on the transactional DB due to some long running report.
Note that the replication isn't truly real time. In other words, there's some time involved in replicating the data from the transactional to the reporting DB, albeit a very small time amount. But replication is certainly one strategy you could consider if you are trying to balance workload.
Other things you might consider are partitioning large tables for better performance.
As gbn pointed out in his comment though, it's better to determine if you actually need these strategies before implementing them, because they add a lot of complexity and maintenance efforts, which may not even be needed. It's important to properly analyze how much data you think you will have, and how much activity will be occurring against that data to determine if strategies such as the ones I just described are even needed.
Also, you can refer to this link for some other helpful information and some links to whitepapers you may find helpful:
http://social.msdn.microsoft.com/Forums/en/sqldisasterrecovery/thread/05cf41b7-c558-44bf-86c6-12f5c2b2ffe2

Log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008

Whats the best way to track/Log inserted/updated/deleted rows in all tables for a given database in SQL Server 2008?
Or is there a better "Audit" feature in SQL Server 2008?
Short answer is that there is no one single solution fits all. It depends on the system but and requirements but here are couple different approaches.
DML Triggers
Relatively easy to implement, because you have to write one that works well for one table and then apply it to other tables.
Downside is that it can get messy when you have a lot of tables and even more triggers. Managing 600 triggers for 200 tables (insert, update and delete trigger per table) is not an easy task.
Also, it might cause a performance impact.
Creating audit triggers in SQL Server
Log changes to database table with trigger
Change Data Capture
Very easy to implement, natively supported but only in enterprise edition which can cost a lot of $ ;). Another disadvantage is that CDC is still not as evolved as it should be. For example, if you change your schema, history data is lost.
Transaction log analysis
Biggest advantage of this is that all you need to do is to put the database in full recovery mode and all info will be stored in transaction log
However, if you want to do this correctly you’ll need a third party log reader because this is not natively supported.
Read the log file (*.LDF) in SQL Server 2008
SQL Server Transaction Log Explorer/Analyzer
If you want to implement this I’d recommend you try out some of the third party tools that exist out there. I worked with couple tools from ApexSQL but there are also good tools from Idera and Netwrix
ApexSQL Log – auditing by reading transaction log
ApexSQL Comply – uses traces in the background and then parses those traces and stores results in central database.
Disclaimer: I’m not affiliated with any of the companies mentioned above.
Change Data Capture is designed to do what you want, but it requires each table be set up individually, so depending on the number of tables you have, there may be some logistics to it. It will also only store the data in capture tables for a couple of days by default, so you may need an SSIS package to pull it out and store for longer periods.
I don't remember whether there is already some tool for this, but you could always use triggers (then you will have access for temporal tables with changed rows- INSERTED and DELETED). Unfortunately, it could be quite a work to do if you would like to track all tables. I believe that there should be some simpler solution, but do not remember as I said.
EDIT.
Maybe this could be helpful:
--Change tracking
http://msdn.microsoft.com/en-us/library/cc280462.aspx
http://msdn.microsoft.com/en-us/library/cc280386.aspx
This allows you to do audits at the database level; it may or may not be enough to meet the business requirements, as database records usually don't make all that much sense without the logic to glue them together. For instance, knowing that user x inserted a record into the "time_booked" table with a foreign key to the "projects", "users", "time_status" tables may not make all that much sense without the SQL query to glue those 4 tables together.
You may also need to have each database user connect with their own user ID - this is fine with integrated security and a client app, but probably won't work with a website using a connection pool.
The sql server logs are not possible to analyze just like that. There are some 3rd party tools available to read the logs but as far as I know you can't query them for statistics and such. If you need this kind of info you'll have to create some sort of auditing to capture all these events in separate tables. You can use "DDL triggers".

How to synchronize databases in different servers in SQL Server 2008?

I have 2 databases that have the same structure, one on a local machine and one on the company's server. Every determined amount of time, the data from the local DB should be synchronized to the server DB.
I have a general idea on how to do this - create a script that somehow "merges" the information that is not on the server DB, then make this script run as a scheduled job for the server. However, my problem lies in the fact that I am not very well experienced with this.
Does SQL Server Management Studio provide an easy way to do this (some kind of wizard) and generates this kind of script? Is this something I'll have to build from scratch?
I've done some basic google searches and came across the term 'Replication' but I don't fully understand it. I would rather hear some input from people who have actually done this or who are good with explaining this kind of stuff.
Thanks.
Replication sounds like a good option for this, but there would be some overhead (not technical overhead, but the knowledge need to support it).
Another SQL Server option is SSIS. SSIS provides graphical tools to design what you're trying to do. The SSIS package can also run SQL statements, if appropriate. An SSIS package can be started, and therefore scheduled, from a SQL Server job.
You should consider the complexity of the synchronization rules when choosing your solution. For example, would it be difficult to resolve conflicts, such as a duplicate key, when merging the data. A SQL script may be easy to create if the rules are simple. But, complex conflict rules may be more difficult to implement in a script (or, replication).
SQL Server Management Studio unfortunately doesn't offer much in this way.
You should have a serious look at some of the excellent commercial offerings out there:
Red Gate Software's SQL Compare and SQL Data Compare - excellent tools, highly recommended! You can even compare a live database against a backup from another database and synchronize the data - pretty nifty!
ApexSQL's SQL Diff and SQL Data Diff
They all cost money - but if you're serious about it, and you use them in your daily routine, they're paid for in no time at all - well worth every dime.
The only "free" option you have in SQL Server 2008 would be to create a link between the two servers and then use something like the MERGE statement (new in SQL Server 2008) to transfer the data. That doesn't work for structural changes, and it's limited only to having a live connection between the two servers.
You should definitely read up on transactional replication. It sounds like a good fit for the situation you've described. Here are a few links to get you started.
How Transactional Replication
Works
How do I... Configure
transactional replication between two
SQL Server 2005 systems?
Performance Tuning SQL Server
Transactional Replication
What you want is Peer-to-Peer Transactional Replication, which allows data to be updated at both databases yet keep them in sync through a contiguous merge of changes. This is the closes match to what you want, but is a fairly costly option (requires Enterprise Edition on both sites). Another option is Bidirectional Transactional Replication, but since this requires also two EE licenses, I say that peer-to-peer is easier to deploy for the same money.
A more budget friendly option is Updatable Subscriptions for Transactional Replication, but updatable subscriptions are being deprecated and you'd bet your money on a loosing horse.
Another option is to use Merge Replication. And finally, for the cases when the 'local' database is quite mobile there is Sync Framework.
Note that all these options require some configuration and cooperation from the Company's server DB.
There are some excellent third party tools out there. For me, xSQL Data Compare has always done the trick. And because the comparisons are highly modifiable it is suitable for almost every data compare or data-synchronization scenario. Hope this helps!

what's a good way to synchronize a sql server 2008 database from a 2005 database automatically?

Ok, the scenario is... two servers, on completely different parts of the internet.
The sql 2008 database just needs to get data updates and schema changes. It doesn't need to send anything to the 2005 database. Basically just suck data and schema as efficiently as possible automatically as a scheduled task.
The database is quite huge.... but the changes per day are probablly around 20/30 megabytes of data/
I can't run any of the inbuilt replication on the 2005 database.
I've had a wee look at the Sync Framework, I think that might do what I want, but seems a bit painful and requires a bit of work to get going. I'm wondering if there is tooling out there to make this easier?
or?? not quite sure what my options are.
I can't run any of the inbuilt
replication on the 2005 database.
Any reason for this restriction? Replication is the way to solve your problem. W/o a replication infrastructure you simply won't be able to detect data changes, nor schema changes. There are only two ways to detect the changes: either via triggers and tracking tables (and that is Merge Replication) or via the database log (and that is Transactional Replication).
Sync Framework itself, if it would be used, would require either Change Tracking or Change Data Capture. But these are 2008 specific technologies and they're really nothing else but replication in disguise (they use the very same infrastructure used by Merge and respectively Transactional Replication).
Even if you want to roll your own, you'll find out quickly that shipping the changes over is the trivial part, eg. using Service Broker for reliable delivery semantics. But the Real hard problem is detecting the changes, and that is hard. Diff-ing a 'quite huge' database over the internet to detect changes is just not going to work. So relying on the built-in infrastructure to detect changes, namely the two forms of Replication, is just the obvious solution.
Could you automate RedGate's SQL Compare and/or SQL Data Compare? http://www.red-gate.com/products/SQL_Compare/index.htm ... you could at least try that out with the 14-day trial and see if it is worth the investment. Much cheaper than tooling it yourself, IMHO.
maybe these questions help you:
Microsoft Sync Framework Or Replication
SQL Server Data Archive Solution
Is there a way to replicate some data not all data in db by sql server replication?
you can make an application that generate a script from your changed data in your favorite period and then run this script in your target server.

Resources