Question
Is there any generic (reusable across multiple tables) and easy to implement/understand way to capture only updates on a SQL Server table?
Objective
I have a SQL Server 2012 SP4 transactional replica. I'm reading (loading) multiple 10GB+ tables on a daily basis from SQL Server to my cloud storage.
It appears those 10GB+ tables aren't 100% immutable: 98-99% of operations are inserts, but 1-2% are updates.
Mitigations
Full table load
It requires more resources on the NiFi side, but the implementation is very simple and straightforward.
Incremental load by ID column plus capturing the last day's updates with triggers
Better performance-wise, but more complex: it requires configuring triggers on the SQL Server side plus collapsing multiple versions of a row (updates) in the ETL layer.
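As a minimal sketch of what such an update-capture trigger could look like (the table dbo.BigTable, its ID key and the side table dbo.BigTable_UpdatedKeys are hypothetical names, not from the original setup):

    -- Hypothetical side table holding the keys touched by UPDATEs since the last load.
    CREATE TABLE dbo.BigTable_UpdatedKeys
    (
        ID         BIGINT    NOT NULL,
        CapturedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    -- AFTER UPDATE trigger: records only the keys of updated rows, not full row images.
    CREATE TRIGGER dbo.trg_BigTable_CaptureUpdates
    ON dbo.BigTable
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.BigTable_UpdatedKeys (ID)
        SELECT i.ID
        FROM inserted AS i;   -- 'inserted' contains the post-update row images
    END;
    GO

The daily ETL would then load new rows incrementally by ID and, separately, re-read only the rows whose IDs appear in dbo.BigTable_UpdatedKeys, deduplicating row versions on the ETL side as described above.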
Generate MODIFIEDAT column during replication for all tables
It would be great, but I think it's very hard to impossible: we don't have the right to modify the master database.
CDC doesn't look like an option because 98-99% of operations are inserts, which means CDC would nearly double the table/database volume. AFAIK, from the SQL Server CDC documentation and this post, it looks like SQL Server CDC can't be configured to capture updates only.
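For completeness: even though CDC can't be told to capture only updates, the update rows can at least be filtered out when reading the change table. A sketch, assuming a hypothetical capture instance named dbo_BigTable:

    -- Read only UPDATE after-images (__$operation = 4) from a hypothetical
    -- capture instance 'dbo_BigTable'; inserts (2) and deletes (1) are skipped.
    DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_BigTable');
    DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_BigTable(@from_lsn, @to_lsn, N'all')
    WHERE __$operation = 4;   -- 4 = row image after an UPDATE

This doesn't fix the storage concern, though: the change tables would still hold the 98-99% of insert rows, which is exactly the objection above.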
Related
We have an Oracle OLTP system and a SQL Server reporting solution. We run nightly stored procedures to extract the data using a linked server, but it is very slow as it transfers millions of records. It would be great to have transactional replication to get this data into our SQL Server environment in near real time for reporting purposes. Has anyone tried this? Without buying an expensive piece of software, what would be the best bet?
We've looked into Apache Kafka to capture the changes and apply them to another database but it seems like a high maintenance approach.
If we switched on CDC on the Oracle side could we write a stored procedure to query the CDC table and make the changes?
Is there a simpler approach that we've overlooked?
We only have read access to the Oracle server but it is managed by a third party that could set up something on our behalf.
I have 2 servers. I need to copy some columns from 4 different tables from server 1 into the corresponding (empty) tables in server 2.
So basically, it's about replicating data from one table to another. How is this done best (and easiest)? Also, how do I make sure that the copied/replicated data is updated at the same frequency as the source (which runs completely fine and automatically)?
I want to avoid using Linked Server.
How is this done best (and easiest)?
For a one-time replication, consider the SQL Server Import and Export Wizard. This approach can also be scheduled by saving the final package and scheduling it with SQL Server Agent.
Example: Simple way to import data into SQL Server
For continuous, low-latency data synchronization - SQL Server Transactional Replication.
Further read: Tutorial: Configure replication between two fully connected servers (transactional)
Worth mentioning that transactional replication is not the easiest topic; however, it fits the requirement quite well.
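Since only some columns of the four tables are needed, a transactional publication can be vertically filtered. A rough sketch of the publisher side, with hypothetical database, table and column names, assuming a distributor is already configured and the database is enabled for publishing:

    USE SourceDb;
    GO

    -- Create a transactional publication on the publisher (server 1).
    EXEC sp_addpublication
        @publication = N'ColumnsToServer2',
        @repl_freq   = N'continuous',
        @status      = N'active';

    -- Add one of the four tables as an article with a vertical partition,
    -- so only explicitly added columns are replicated.
    EXEC sp_addarticle
        @publication        = N'ColumnsToServer2',
        @article            = N'Orders',
        @source_owner       = N'dbo',
        @source_object      = N'Orders',
        @vertical_partition = N'true';

    -- Add just the columns server 2 needs (the primary key is always included).
    EXEC sp_articlecolumn @publication = N'ColumnsToServer2', @article = N'Orders',
                          @column = N'OrderDate', @operation = N'add';
    EXEC sp_articlecolumn @publication = N'ColumnsToServer2', @article = N'Orders',
                          @column = N'Amount',    @operation = N'add';

Repeat the sp_addarticle/sp_articlecolumn pair for the other three tables, then create the snapshot agent and the subscription on server 2 as described in the tutorial above.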
I have a local SQL Server database that I copy large amounts of data from and into a remote SQL Server database. Local version is 2008 and remote version is 2012.
The remote DB has transactional replication set-up to one local DB and another remote DB. This all works perfectly.
I have created an SSIS package that empties the destination tables (the remote DB) and then uses a Data Flow object to add the data from the source. For flexibility, I have each table in its own Sequence Container (this allows me to run one or many tables at a time). The data flow settings are set to Keep Identity.
Currently, prior to running the SSIS package, I drop the replication settings and then run the package. Once the package completes, I then re-create the replication settings and reinitialise the subscribers.
I do it this way (deleting the replication and then re-creating) for fear of overloading the server with replication commands. Although most tables are between 10s and 1000s of rows, a couple of them are in excess of 35 million.
Is there a recommended way of emptying and re-loading the data of a large replicated database?
I don't want to replicate my local DB to the remote DB as that would not always be appropriate and doing a back and restore of the local DB would also not work due to the nature of the more complex permissions, etc. on the remote DB.
It's not the end of the world to drop and re-create the replication settings each time as I have it all scripted. I'm just sure that there must be a recommended way of managing this...
Don't do it. Empty/reload is bad. Try updating the table via MERGE instead - this way you avoid the drop and recreate, which would also result in two replicated operations. Load the new data into temp tables on the other server (not replicated), then merge them into the replicated tables. If a lot of the data is unchanged, this will seriously reduce the replication load.
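A sketch of that staging-then-MERGE pattern, with hypothetical table and column names (dbo.BigTable being the replicated table, dbo.BigTable_Staging the non-replicated load target):

    -- Only rows that actually change generate replicated commands.
    MERGE dbo.BigTable AS target
    USING dbo.BigTable_Staging AS source
        ON target.ID = source.ID
    WHEN MATCHED AND (target.ColA <> source.ColA OR target.ColB <> source.ColB) THEN
        UPDATE SET target.ColA = source.ColA,
                   target.ColB = source.ColB
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ID, ColA, ColB)
        VALUES (source.ID, source.ColA, source.ColB)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;

(If ColA/ColB are nullable, the inequality checks need NULL-aware handling, e.g. ISNULL or an EXCEPT-based comparison.)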
We have 70+ SQL Server 2008 databases that need to be copied from an OLTP environment to a separate reporting server. Once the DBs are copied, we will do some partial data transformation: de-normalization, row-level security, etc.
SSRS Reports will be written based on these static denormalized tables and views.
We have a small nightly window for copying and transforming all 70 databases (3 hours).
Currently databases average about 10GB.
Options:
1. Transactional replication:
We would need to create 100+ static denormalized tables on each reporting database.
Doing this for all 70 databases almost reaches our nightly time limit.
As the databases grow we will exceed the time limit. We thought of mixing denormalized tables with views to speed up the transformation, but then there would be some dynamic and some static data, which is not a solution we can use.
Also with 70 databases using transactional replication we are concerned about bandwidth usage.
2. Snapshot replication:
Copy the entire database each night.
This means we could have a mixture of denormalized tables and views so the data transformation process is quicker.
But the snapshot is a full data copy, so as the DB grows, we will exceed our time limit for completing copy and transformation.
3. Log shipping:
In our nightly window, we could use the log shipping to update the reporting databases, then truncate and repopulate the denormalized tables and use some views.
However, I understand that with log shipping, extra tables and views cannot be added to the subscribing database.
4. Mirroring:
Mirroring is being deprecated, and in addition the mirrored DB is not available for reporting until failover.
5. SQL Server 2012 AlwaysOn:
We don't have SQL Server 2012 yet; can this be configured to update once a day instead of in real time?
And can extra tables and views be created on the subscribing database (our reporting databases)?
6. Merge replication:
This is meant to be for combining multiple data sources into one database.
But it looks like it allows for a scheduled update (once per day) and only updates the subscriber DB with the latest changes rather than doing an entire snapshot.
It requires adding a rowversion column to every table, but we could handle this. Also, with this solution, could additional tables be created on the subscriber database without the update getting out of sync?
The final option is to use SSIS to select only the data we need from the OLTP databases. I think this option creates more risk as we would have to handle inserts/updates/deletes to our denormalized tables, rather than just dropping and recreating the denormalized tables daily.
Any help on our options would be greatly appreciated.
If I've made any incorrect assumptions, please say.
If it were me, I'd go with transactional replication that runs continuously and have views (possibly indexed) at the subscriber. This has the advantage of not having to wait for the data to come over since it's always coming over.
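As a sketch of the "views at the subscriber" idea, with hypothetical table and column names, an indexed view materializes the denormalized shape while the replication agents keep the base tables current:

    -- Denormalizing view over the replicated base tables at the subscriber.
    CREATE VIEW dbo.vw_OrdersDenormalized
    WITH SCHEMABINDING            -- required before the view can be indexed
    AS
    SELECT  o.OrderID,
            o.OrderDate,
            c.CustomerName,
            c.Region
    FROM    dbo.Orders    AS o
    JOIN    dbo.Customers AS c ON c.CustomerID = o.CustomerID;
    GO

    -- The unique clustered index materializes the view, so reports read it like
    -- a static denormalized table that is kept up to date automatically.
    CREATE UNIQUE CLUSTERED INDEX IX_vw_OrdersDenormalized
        ON dbo.vw_OrdersDenormalized (OrderID);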
How can I record all the Inserts and Updates being performed on a database (MS SQL Server 2005 and above)?
Basically I want a table in which I can record all the inserts and updates issued on my database.
Triggers will be tough to manage because there are 100s of tables and growing.
We have hundreds of tables and growing and use triggers. In newer versions of SQL Server you can use Change Data Capture or Change Tracking, but we have not found them adequate for auditing.
What we have are two separate audit tables for each table (one recording the details of the change - 1 row even if you updated a million records - and one recording the actual old and new values), but each has the same structure and is created by running a dynamic SQL proc that looks for unaudited tables and creates the audit triggers. This proc is run every time we deploy.
Then you should also take the time to write a proc to pull the data back out of the audit tables if you want to restore the old values. This can be tricky to write on the fly with this structure, so it is best to have it handy before you have the CEO breathing down your neck while you restore the 50,000 users that were accidentally deleted.
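A much-simplified sketch of the kind of trigger such a dynamic SQL proc generates, with hypothetical audit table names (dbo.Audit_Header, dbo.Audit_Detail) and a single audited table dbo.Users:

    -- One header row per statement, one detail row per changed row.
    CREATE TRIGGER dbo.trg_Users_Audit
    ON dbo.Users
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @AuditID INT;

        -- Header: 1 row even if the statement touched a million records.
        INSERT INTO dbo.Audit_Header (TableName, Action, AuditDate, AuditUser)
        VALUES (N'dbo.Users',
                CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN N'UPDATE' ELSE N'INSERT' END,
                GETDATE(),
                SUSER_SNAME());

        SET @AuditID = SCOPE_IDENTITY();

        -- Detail: old and new values per row (deleted is empty for plain INSERTs).
        INSERT INTO dbo.Audit_Detail (AuditID, UserID, OldEmail, NewEmail)
        SELECT @AuditID, i.UserID, d.Email, i.Email
        FROM inserted AS i
        LEFT JOIN deleted AS d ON d.UserID = i.UserID;
    END;

In practice the trigger body and column lists would be generated from the system catalog by the dynamic SQL proc rather than hard-coded as here.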
As of SQL Server 2008 and above you have Change Data Capture.
Triggers, although unwieldy and a maintenance nightmare, will do the job on versions prior to 2008.
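For reference, a minimal sketch of turning Change Data Capture on for one table (hypothetical dbo.Users), assuming SQL Server 2008+ and a running SQL Server Agent:

    -- Enable CDC at the database level (requires sysadmin), then per table.
    EXEC sys.sp_cdc_enable_db;
    GO

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Users',
        @role_name     = NULL;      -- NULL = no gating role required to read changes
    GO

    -- Inserts and updates then land in cdc.dbo_Users_CT and can be read with
    -- cdc.fn_cdc_get_all_changes_dbo_Users(@from_lsn, @to_lsn, N'all').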