create data patch for database (synchronize databases) - database

There is 2 databases: "temp" and "production". Each night production database should be "synchronized", so it will have exactly same data as in "temp". Database sizes are several GB and just copying all data is not an option. But changes are usually quite small: ~100 rows added, ~1000 rows updated and some removed. About 5-50Mb per day.
I was thinking maybe there is some tool (preferably free) that could go trough both databases and create patch, that could be applied to "production database". Or as option just "synchronize" both databases. And it should quite fast. In other words something like rsync for data in databases.
If there is some solution for specific database (mysql, h2, db2, etc), it will be also fine.
PS: structure is guaranteed to be same, so this question is only about transferring data

Finally i found a way to do it in Kettle (PDI):
http://wiki.pentaho.com/display/EAI/Synchronize+after+merge
Only one con: I need create such transformation for each table separately.

Why not setup database replication from Temp Database to your Production database where your temp database will act as the Master and Production will act as a slave. Here is a link for setting up replication in MySql. MSSQL also supports database replication as well. Google should show up many tutorials.

Related

db replication vs mirroring

Can anyone explain the differences from a replication db vs a mirroring db server?
I have huge reports to run. I want to use a secondary database server to run my report so I can off load resources from the primary server.
Should I setup a replication server or a mirrored server and why?
For your requirements the replication is the way to go. (asumming you're talking about transactional replication) As stated before mirroring will "mirror" the whole database but you won't be able to query unless you create snapshots from it.
The good point of the replication is that you can select which objects will you use and you can also filter it, and since the DB will be open you can delete info if it's not required( just be careful as this can lead to problems maintaining the replication itself), or create specific indexes for the report which are not needed in "production". I used to maintain this kind of solutions for a long time with no issues.
(Assuming you are referring to Transactional Replication)
The biggest differences are: 1) Replication operates on an object-by-object basis whereas mirroring operates on an entire database. 2) You can't query a mirrored database directly - you have to create snapshots based on the mirrored copy.
In my opinion, mirroring is easier to maintain, but the constant creation of snapshots may prove to be a hassle.
As mentioned here
Database mirroring and database replication are two high data
availability techniques for database servers. In replication, data and
database objects are copied and distributed from one database to
another. It reduces the load from the original database server, and
all the servers on which the database was copied are as active as the
master server. On the other hand, database mirroring creates copies of
a database in two different server instances (principal and mirror).
These mirror copies work as standby copies and are not always active
like in the case of data replication.
This question can also be helpful or have a look at MS Documentation

Replicating a SQL Server database for read access

I have an application that is in production with its own database for more than 10 years.
I'm currently developing a new application (kind of a reporting application) that only needs read access to the database.
In order not to be too much linked to the database and to be able to use newer DAL (Entity Framework 6 Code First) I decided to start from a new empty database, and I only added the tables and columns I need (different names than the production one).
Now I need some way to update the new database with the production database regularly (would be best if it is -almost- immediate).
I hesitated to ask this question on http://dba.stackexchange.com but I'm not necessarily limited to only using SQL Server for the job (I can develop and run some custom application if needed).
I already made some searches and had those (part-of) solutions :
Using Transactional Replication to create a smaller database (with only the tables/columns I need). But as far as I can see, the fact that I have different table names / columns names will be problematic. So I can use it to create a smaller database that is automatically replicated by SQL Server, but I would still need to replicate this database to my new one (it may avoid my production database to be too much stressed?)
Using triggers to insert/update/delete the rows
Creating some custom job (either a SQL Job or some Windows Service that runs every X minutes) that updates the necessary tables (I have a LastEditDate that is updated by a trigger on my tables, so I can know that a row has been updated since my last replication)
Do you some advice or maybe some other solutions that I didn't foresee?
Thanks
I think that the Transactional replication is the better than using triggers.
Too much resources would be used in source server/database due to the trigger fires by each DML transaction.
Transactional rep could be scheduled as a SQL job and run it few times a day/night or as a part of nightly scheduled job. IT really depends on how busy the source db is...
There is one more thing that you could try - DB mirroring. it depends on your sql server version.
If it were me, I'd use transactional replication, but keep the table/column names the same. If you have some real reason why you need them to change (I honestly can't think of any good ones and a lot of bad ones), wrap each table in a view. At least that way, the view is the documentation of where the data is coming from.
I'm gonna throw this out there and say that I'd use Transaction Log shipping. You can even set the secondary DBs to read-only. There would be some setting up for full recovery mode and transaction log backups but that way you can just automatically restore the transaction logs to the secondary database and be hands-off with it and the secondary database would be as current as your last transaction log backup.
Depending on how current the data needs to be, if you only need it done daily you can set up something that will take your daily backups and then just restore them to the secondary.
In the end, we went for the Trigger solution. We don't have that much changes a day (maybe 500, 1000 top), and it didn't put too much pressure on the current database. Thanks for your advices.

SQL Server replication for 70 databases with transformation in a small time window

We have 70+ SQL Server 2008 databases that need to be copied from an OLTP environment to a separate reporting server. Once the DB's are copied, we will do some partial data transformation: de-normalization, row level security, etc.
SSRS Reports will be written based on these static denormalized tables and views.
We have a small nightly window for copying and transforming all 70 databases (3 hours).
Currently databases average about 10GB.
Options:
1. Transactional replication:
We would need to create 100+ static denormalized tables on each reporting database.
Doing this for all 70 databases almost reaches our nightly time limit.
As the databases grow we will exceed the time limit. We thought of mixing denormalized tables with views to speed up transformation. But then there would be some dynamic and some static data which is not a solution we can use.
Also with 70 databases using transactional replication we are concerned about bandwidth usage.
2. Snapshot replication:
Copy the entire database each night.
This means we could have a mixture of denormalized tables and views so the data transformation process is quicker.
But the snapshot is a full data copy, so as the DB grows, we will exceed our time limit for completing copy and transformation.
3. Log shipping:
In our nightly window, we could use the log shipping to update the reporting databases, then truncate and repopulate the denormalized tables and use some views.
However, I understand that with log shipping, extra tables and views cannot be added to the subscribing database.
4. Mirroring:
Mirroring is being deprecated, but also the DB is not active for reporting against until failover.
5. SQL Server 2012 AlwaysOn.
We don't have SQL Server 2012 yet, can this be configured to do an update once a day instead of realtime?
And can extra tables and views be created on the subscribing database (our reporting databases)?
6. Merge replication:
This is meant to be for combining multiple data sources into one database.
But is looks like it allows for a scheduled update (once per day) and only updates the subscriber DB with the latest changes rather than doing an entire snapshot.
It requires adding a rowversion column to every table but we could handle this. Also with this solution would additional tables be able to be created on the subscriber database without the update getting out of sync?
The final option is that we use SSIS to select only the data we need from the OLTP databases. I think this options creates more risk as we would have to handle inserts/updates/deletes to our denormalized tables, rather than just drop and recreate the denormalized tables daily.
Any help on our options would be greatly appreciated.
If I've made any incorrect assumptions, please say.
If it were me, I'd go with transactional replication that runs continuously and have views (possibly indexed) at the subscriber. This has the advantage of not having to wait for the data to come over since it's always coming over.

What is the DBA function/tool called for keeping many remote DBs synchronized with a main DB

I am a front-end developer being asked to fulfil some DBA tasks. Uncharted waters.
My client has 10 remote (off network) data collection terminals hosting a PostgreSQL application. My task is to take the .backup or .sql files those terminals generate and add them to the main DB. The schema for all of these DBs will match. But the merge operation will lead to many duplicates. I am looking for a tool that can add a backup file to an existing DB, filter out duplicates, and provide a report on the merge.
Is there a term for this kind of operation in the DBA domain?
Is this function normally built into basic DB admin suites (e.g. pgAdmin III), are enterprise-level tools required, or is this something that can be done on the command-line easily enough?
Update
PostgreSQL articles on DB replication here and glossary.
You can't "merge a bunch of tables" but you could use Slony to replicate child tables (i.e. one partition per location) back to a master db.
This is not an out of the box solution but with something like Bucardo or Slony it can be done, albeit with a fair bit of work and added maintenance.

Database replication. 2 servers, Master database and the 2nd is read-only

Say you have 2 database servers, one database is the 'master' database where all write operations are performed, it is treated as the 'real/original' database. The other server's database is to be a mirror copy of the master database (slave?), which will be used for read only operations for a certain part of the application.
How do you go about setting up a slave database that mirrors the data on the master database? From what I understand, the slave/readonly database is to use the master db's transaction log file to mirror the data correct?
What options do I have in terms of how often the slave db mirrors the data? (real time/every x minutes?).
What you want is called Transactional Replication in SQL Server 2005. It will replicate changes in near real time as the publisher (i.e. "master") database is updated.
Here is a pretty good walk through of how to set it up.
SQL Server 2008 has three different modes of replication.
Transactional for one way read only replication
Merge for two way replication
Snapshot
From what I understand, the slave/readonly database is to use the master db's transaction log file to mirror the data correct?
What options do I have in terms of how often the slave db mirrors the data? (real time/every x minutes?).
This sounds like you're talking about log shipping instead of replication. For what you're planning on doing though I'd agree with Jeremy McCollum and say do transactional replication. If you're going to do log shipping when the database is restored every x minutes the database won't be available.
Here's a good walkthrough of the difference between the two. Sad to say you have to sign up for an account to read it though. =/ http://www.sqlservercentral.com/articles/Replication/logshippingvsreplication/1399/
The answer to this will vary depending on the database server you are using to do this.
Edit: Sorry, maybe i need to learn to look at the tags and not just the question - i can see you tagged this as sqlserver.
Transactional replication is real time.
If you do not have any updates to be done on your database , what you need is just retrieving of data say once a day : then use snapshot replication instead of transactional replication. In snapshot replication, changes will replicate when and as defined by the user say once in 24 hrs.

Resources