insert data from different db server every second - sql-server

The primary DB receives raw data every 10 minutes, but it only stores it for 1 week. I would like to keep all the raw data for 1 year in another DB, which is on a different server. How is this possible?
I have written a T-SQL query to select the required data from the primary DB. How can I keep picking up new data from the primary DB and inserting it into the secondary DB? The table has a datetime column; would it be possible to insert only the rows newer than the latest datetime already copied?
Notes: source data SQL 2012
secondary db SQL 2005

If you are on SQL Server 2008 or higher, the MERGE command (MS docs) may be very useful in your actual update process. Be sure you understand it.
Your table containing the full year of data sounds like it could be OLAP, so I occasionally refer to it that way (if you don't know what OLAP is, look it up sometime, but it does not matter to this answer).
If you are only updating 1 or 2 tables, log shipping, replication and failover may not work well for you, especially since you are not replicating the table as-is, due to different retention policies if nothing else. So make sure you understand how replication, etc. work before you go down that path. If these tables are over perhaps 50% of the total database, log shipping style methods might still be your best method. They work well and handle downtime issues for you -- you just replicate the source database to the OLAP server and then update from the duplicate database into your OLAP database.
Doing an update like this every second is an unusual requirement. However, if you create a linked server, you will be able to insert your selected rows into a staging table on the remote server and then update from them to your OLAP table(s). If you can reliably update your OLAP table(s) on the remote server in 1 second, you have a potentially useful method. If not, you may fall behind on posting data to your OLAP tables. If you can update once a minute, you may find you are much less likely to fall behind on the update cycle (at the cost of being slightly less current at all times).
You may want to consider putting AFTER triggers on the source table(s) that copy the changes into staging table(s) (still on the source database), with an identity column on each staging table along with a flag to indicate Insert, Update or Delete. You are then well positioned to ship updates for one or a few tables instead of the whole database. You don't need to requery your source database repeatedly to determine what data needs to be transmitted; just select the top 1000 rows from your staging table(s) (ordered by the staging id) and move them to the remote staging table.
If you fall behind, a top 1000 loop keeps you from trying to post too much data in any one cross-server call.
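The trigger-plus-batch idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the two SQL Server instances (two in-memory databases instead of a linked server), and the names `readings`, `readings_staging`, `ship_batch` and `ship_all` are made up for the example. Delete rows are skipped on purpose, on the assumption that the weekly purge on the source should not be replayed on the one-year copy.

```python
import sqlite3

BATCH_SIZE = 1000  # mirrors the "top 1000" in the text


def make_source() -> sqlite3.Connection:
    """Source DB: the data table plus a trigger-fed staging (change queue) table."""
    src = sqlite3.connect(":memory:")
    src.executescript("""
        CREATE TABLE readings (id INTEGER PRIMARY KEY, taken_at TEXT, value REAL);
        -- staging_id plays the role of the IDENTITY column; op is the I/U/D flag
        CREATE TABLE readings_staging (
            staging_id INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT NOT NULL, id INTEGER NOT NULL, taken_at TEXT, value REAL);
        CREATE TRIGGER readings_ai AFTER INSERT ON readings BEGIN
            INSERT INTO readings_staging (op, id, taken_at, value)
            VALUES ('I', NEW.id, NEW.taken_at, NEW.value);
        END;
        CREATE TRIGGER readings_au AFTER UPDATE ON readings BEGIN
            INSERT INTO readings_staging (op, id, taken_at, value)
            VALUES ('U', NEW.id, NEW.taken_at, NEW.value);
        END;
        CREATE TRIGGER readings_ad AFTER DELETE ON readings BEGIN
            INSERT INTO readings_staging (op, id, taken_at, value)
            VALUES ('D', OLD.id, OLD.taken_at, OLD.value);
        END;
    """)
    return src


def make_archive() -> sqlite3.Connection:
    """Archive DB (the long-retention copy on the 'other server')."""
    dst = sqlite3.connect(":memory:")
    dst.execute(
        "CREATE TABLE readings (id INTEGER PRIMARY KEY, taken_at TEXT, value REAL)")
    return dst


def ship_batch(src, dst, batch_size=BATCH_SIZE) -> int:
    """Move up to batch_size staged changes, oldest first. Returns rows handled."""
    rows = src.execute(
        "SELECT staging_id, op, id, taken_at, value FROM readings_staging "
        "ORDER BY staging_id LIMIT ?", (batch_size,)).fetchall()
    if not rows:
        return 0
    with dst:  # commit on the archive first
        for _sid, op, rid, taken_at, value in rows:
            if op == "D":
                continue  # assumption: source purges are not replayed on the archive
            dst.execute(
                "INSERT OR REPLACE INTO readings (id, taken_at, value) "
                "VALUES (?, ?, ?)", (rid, taken_at, value))
    with src:  # only then clear the shipped rows from the queue
        src.execute("DELETE FROM readings_staging WHERE staging_id <= ?",
                    (rows[-1][0],))
    return len(rows)


def ship_all(src, dst, batch_size=BATCH_SIZE) -> int:
    """Loop over batches until the queue is empty (the catch-up case)."""
    total = 0
    while (n := ship_batch(src, dst, batch_size)):
        total += n
    return total
```

Because the archive is written before the queue is cleared and the write is an upsert, a failure between the two steps only causes the same batch to be applied again on the next run.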
Depending on your data, you may be able to optimize storage and reduce log churn by not copying all columns to your staging table, just the staging id and the primary key of the source table and pretend that whatever data is in the source record at the time you post it to the OLAP database accurately reflects the data at the time the record was staged. It won't be 100% accurate on your OLAP table at all times, but it will be accurate eventually.
I cannot overemphasize that you need to accommodate downtime in your design -- unless you can live with data loss or just wrong data. Even reliable connections are not 100% reliable.

Related

Is there a way I can fast load data with SSIS?

I'm moving data from an ODBC source to an OLE DB Destination; records get inserted every day on the ODBC side in different tables. The packages get slower and slower: it takes about a day for a million records, sometimes more. The tables can have newly inserted or newly updated data, and loading and looking up the new data slows the process. Is there any way I can speed up the ETL process, or is there an open source platform I can use to load the data faster?
I tried counting the rows in the OLE DB Destination so that I could insert only the newer records from the ODBC Source, but to my surprise the ROW_NUMBER() function isn't supported in OpenEdge ODBC.
Based on the limited information in your question, I'd design your packages like the following
SEQC PG to SQL
The point of these operations is to transfer data from our source system verbatim to the target. The target table should be brand new and the SQL Server equivalent of the PG table from a data type perspective. Clustered Key if one exists, otherwise, see how a heap performs. I am going to reference this as a staging table.
The Data Flow itself is going to be bang simple
By default, the destination will perform a fast load and lock the table.
Run the package and observe times.
Edit the OLE DB Destination and change the Maximum Commit Size to something less than 2147483647. Try 100000: is it better or worse? Move up or down an order of magnitude until you have an idea of which setting lets the package move data fastest.
There are a ton of variables at this stage of the game (how busy is the source PG database, what are the data types involved, how far does the data need to travel from the Source, to your computer, to the Destination), but this can at least help you understand "can I pull (insert large number here) rows from the source system within the expected tolerance?" If you can get the data moved from PG to SQL within the expected SLA and you still have processing time left, then move on to the next section.
Otherwise, you have to rethink your strategy for what data gets brought over. Maybe there are reliable (system generated) insert/update times associated with the rows. Maybe it's a financial-like system where rows aren't updated, just new versions of the row are inserted and the net values are all that matters. Too many possibilities here, but you'll likely need to find a Subject Matter Expert on the system: someone who knows the logical business process the database models as well as how the data is stored in the database. Buy that person some tasty snacks because they are worth their weight in gold.
Now what?
At this point, we have transferred the data from PG to SQL Server and we need to figure out what to do with it. Four possibilities exist:
The data is brand new. We need to add the row into the target table
The data is unchanged. Do nothing
The data exists but is different. We need to change the existing row in the target table
There is data in the target table that isn't in the staging table. We're not going to do anything about this case either.
Adding data, inserts, are easy and can be fast - it depends on table design.
Changing data, updates, are less easy in SSIS and are slower than adding new rows. Slower because behind the scenes, the database will delete and add the row back in.
Non-Clustered indexes are also potential bottlenecks here, but they can also be beneficial. Welcome to the world of "it depends"
Option 1 is to just write the SQL statements to handle the insert and update. Yes, you have a lovely GUI tool for creating data flows but you need speed and this is how you get it (especially since we've already moved all the data from the external system to a central repository)
Option 2 is to use a Data Flow and potentially an Execute SQL Task to move the data. The idea being, the Data Flow will segment your data into New rows, which will use an OLE DB Destination to write the inserts, and Changed rows. For the updates, what makes the most sense from an efficiency perspective depends on volume. If it's tens, hundreds, thousands of rows to update, eh, take the performance penalty and use an OLE DB Command to update the row. Maybe it's hundreds of thousands and the package runs well enough, then keep it.
Otherwise, route your changed rows to yet another staging table and then do a mass update from the staged updates to the target table. But at this point, you just wrote half the query you needed for the first option so just write the Insert and be done (and speed up performance because now everything is just SQL Engine "stuff")
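As a rough sketch of the "just write the SQL" route, the four cases listed earlier (new, unchanged, changed, target-only) can be handled with one UPDATE and one INSERT run from staging to target. The example uses Python's built-in sqlite3 module so it can be run anywhere; the table names `staging` and `target`, their columns, and the function `apply_staging` are assumptions for illustration only. `IS NOT` is SQLite's null-safe comparison, so the change test would need to be written differently in T-SQL.

```python
import sqlite3


def apply_staging(conn: sqlite3.Connection) -> tuple:
    """Apply staging to target in one transaction. Returns (inserted, updated)."""
    with conn:
        # Case 3: the row exists in both tables but at least one column differs.
        updated = conn.execute(
            """UPDATE target
                  SET name   = (SELECT s.name   FROM staging s WHERE s.id = target.id),
                      amount = (SELECT s.amount FROM staging s WHERE s.id = target.id)
                WHERE EXISTS (SELECT 1 FROM staging s
                               WHERE s.id = target.id
                                 AND (s.name IS NOT target.name
                                      OR s.amount IS NOT target.amount))"""
        ).rowcount
        # Case 1: the row is in staging but not yet in target.
        inserted = conn.execute(
            """INSERT INTO target (id, name, amount)
               SELECT s.id, s.name, s.amount
                 FROM staging s
                WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)"""
        ).rowcount
        # Case 2 (unchanged) and case 4 (only in target) need no statement at all.
    return inserted, updated
```

Running it a second time against the same staging data should report no inserts and no updates, which makes for a cheap sanity check on the change test.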
You might want to investigate Progress' Change Data Capture feature. If you have a modern release of OpenEdge (11.7 or better) and the proper licenses you can enable CDC policies to track changes. Your ETL process could then use that information to target its efforts.
Warning: it's complicated. There is a lot more to actually doing it than marketing would have you believe. But if your use-case is straight-forward it might not be too terrible.
Or you could implement Progress "Pro2" product to do all the dirty work for you. (That's an extra cost option.)

SQL Server - Rolling back particular transaction only at a later date

I have SQL Server 2014, standard edition. We have several tables where we delete data from, then re-insert it under different primary keys (to merge records for two people in our system that are actually the same). All these changes are performed with a T-SQL transaction.
I understand how transactions and rollbacks work, but what I need is more of an audit/rollback since my users may need to rollback just this transaction only at a later date (not restoring the whole database or table). "Change Data Capture" is not an option since I only have standard edition.
My real question lies in how to store this auditing information. I imagine I'll need a unique key to keep track of this being one unit of work so all these table changes get tied to same group as far as the user is concerned. But if I have a DELETE WHERE ID = #ID query for example, how do I store all these deleted records before deleting so that I can re-insert them later if needed? I'm fine with even storing a large rollback T-SQL script of some kind, I'm just not sure how to generate INSERT scripts that I can store and run later for data that I'm about to delete.
I'm open to any ideas, I just need an architecture that's generic enough to handle multiple tables and the ability to rollback deletions and insertions. I care more about the rollback ability than keeping a pretty audit table.
You cannot do that out of the box: even with full logging you can roll an entire database back to a point in time, but not a specific transaction.
You will have to code something for undoing transactions, but I believe simple audit triggers will give you the data you need to make it happen. Here is a good article to get you started.
https://www.mssqltips.com/sqlservertip/4055/create-a-simple-sql-server-trigger-to-build-an-audit-trail/
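One possible shape for the "unit of work" storage the question asks about is a single generic log table that keeps each affected row as JSON under a shared batch id, so the whole merge can be reversed later. The sketch below is an assumption-laden illustration, not the approach from the linked article: it uses Python's sqlite3 in place of SQL Server and application code in place of triggers, and `undo_log`, `logged_delete`, `logged_insert` and `undo_batch` are hypothetical names. Table names and WHERE fragments are interpolated into the SQL, so they must come from trusted code, never from user input.

```python
import json
import sqlite3
import uuid


def init_undo_log(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS undo_log (
        seq        INTEGER PRIMARY KEY AUTOINCREMENT,
        batch_id   TEXT NOT NULL,   -- ties all changes of one merge together
        table_name TEXT NOT NULL,
        action     TEXT NOT NULL,   -- what the original change did: DELETE or INSERT
        row_json   TEXT NOT NULL)""")


def new_batch_id() -> str:
    return uuid.uuid4().hex


def _log(conn, batch_id, table, action, row):
    conn.execute(
        "INSERT INTO undo_log (batch_id, table_name, action, row_json) "
        "VALUES (?, ?, ?, ?)", (batch_id, table, action, json.dumps(row)))


def logged_delete(conn, batch_id, table, where, params=()):
    """Save every row matching `where`, then delete those rows."""
    cur = conn.execute(f"SELECT * FROM {table} WHERE {where}", params)
    cols = [d[0] for d in cur.description]
    for row in cur.fetchall():
        _log(conn, batch_id, table, "DELETE", dict(zip(cols, row)))
    conn.execute(f"DELETE FROM {table} WHERE {where}", params)


def logged_insert(conn, batch_id, table, row: dict):
    """Insert a row and remember it so the insert can be reversed."""
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                 tuple(row.values()))
    _log(conn, batch_id, table, "INSERT", row)


def undo_batch(conn, batch_id):
    """Reverse one batch, newest change first, inside a single transaction."""
    with conn:
        entries = conn.execute(
            "SELECT table_name, action, row_json FROM undo_log "
            "WHERE batch_id = ? ORDER BY seq DESC", (batch_id,)).fetchall()
        for table, action, row_json in entries:
            row = json.loads(row_json)
            if action == "DELETE":   # put the deleted row back
                cols = ", ".join(row)
                marks = ", ".join("?" for _ in row)
                conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                             tuple(row.values()))
            else:                    # remove the row the merge had inserted
                cond = " AND ".join(f"{c} IS ?" for c in row)
                conn.execute(f"DELETE FROM {table} WHERE {cond}",
                             tuple(row.values()))
        conn.execute("DELETE FROM undo_log WHERE batch_id = ?", (batch_id,))
```

The merge itself would call `logged_delete` and `logged_insert` inside its normal transaction, so the log and the data changes commit or roll back together. Binary columns would need extra handling, since JSON cannot hold raw bytes.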

Recording all Sql Server Inserts and Updates

How can I record all the Inserts and Updates being performed on a database (MS SQL Server 2005 and above)?
Basically I want a table in which I can record all the inserts and updates issued on my database.
Triggers will be tough to manage because there are 100s of tables and growing.
Thanks
Bullish
We have hundreds of tables and growing, and we use triggers. In newer versions of SQL Server you can use Change Data Capture or Change Tracking, but we have not found them adequate for auditing.
What we have is two separate audit tables for each table: one records the details of the instance (1 row even if you updated a million records) and one records the actual old and new values. Each has the same structure across tables and is created by running a dynamic SQL proc that looks for unaudited tables and creates the audit triggers. This proc is run every time we deploy.
Then you should also take the time to write a proc to pull the data back out of the audit tables if you want to restore the old values. This can be tricky to write on the fly with this structure, so it is best to have it handy before you have the CEO peering down your neck while you restore the 50,000 users accidentally deleted.
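The "find unaudited tables and generate their triggers" step could look something like the sketch below. It is a simplified illustration only: Python's sqlite3 stands in for SQL Server's dynamic SQL, each table gets a single `<table>_audit` table holding the old values (rather than the two-table design described above), and the naming scheme and the function `ensure_audit` are assumptions. It also assumes no audited table already has columns called `audit_id`, `action` or `changed_at`.

```python
import sqlite3


def ensure_audit(conn: sqlite3.Connection) -> list:
    """Create an audit table and triggers for every table that lacks them.

    Returns the names of the tables that were newly audited, so it is safe
    to call on every deploy.
    """
    names = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    created = []
    for table in sorted(names):
        if (table.startswith("sqlite_")          # internal tables
                or table.endswith("_audit")      # audit tables themselves
                or f"{table}_audit" in names):   # already audited
            continue
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
        col_list = ", ".join(f'"{c}"' for c in cols)
        old_list = ", ".join(f'OLD."{c}"' for c in cols)
        conn.executescript(f'''
            CREATE TABLE "{table}_audit" (
                audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
                action     TEXT NOT NULL,          -- 'U' or 'D'
                changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
                {col_list});
            -- keep the pre-change values for updates and deletes
            CREATE TRIGGER "{table}_audit_upd" AFTER UPDATE ON "{table}" BEGIN
                INSERT INTO "{table}_audit" (action, {col_list})
                VALUES ('U', {old_list});
            END;
            CREATE TRIGGER "{table}_audit_del" AFTER DELETE ON "{table}" BEGIN
                INSERT INTO "{table}_audit" (action, {col_list})
                VALUES ('D', {old_list});
            END;
        ''')
        created.append(table)
    return created
```

A restore routine would then read the `D` rows back out of the audit table for the affected time window, which is the part worth writing and testing before it is needed.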
As of SQL Server 2008 and above you have change data capture.
Triggers, although unwieldy and a maintenance nightmare, will do the job on versions prior to 2008.

Over Web Service, Update A Table From Another Same Table Which Is In Different Location

I have two different databases.
One of them is the original database and the other is a cache database.
These databases are in different locations.
Once a day, I must update the cache database from the original database.
And I must do this update through a Web Service that runs on the original database machine.
I could do it by clearing all the cache DB tables and inserting the original data on every run.
But I think that is a bad approach.
So how can I do this update efficiently?
Do you have any suggestions?
I'm pretty sure that there are DB syncing technologies out there, but since you already have the requirement, I'd recommend to use a change-log.
So, you'll have a "CHANGE_LOG" table, to which you insert rows whenever you do "writes" on your tables (INSERT,UPDATE,DELETE). Once a day, you can apply these changes one-by-one to the cache DB.
Deleting the change-log once it's applied is okay, but you can also assign a "version" to the DBs, so that each change to the DB increments the version number. That can be used to manage more than one cache DB.
To provide additional assurance, for example, you can have a trigger in each cache DB that increments its own version number. That way, your process can query a cache DB and know which changes must be applied, without maintaining that in the master DB (hooking up a new cache DB, or bringing a crashed cache DB up to date, is easy, too).
Note that you probably need to purge the change log from time to time.
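A minimal sketch of the versioned change-log idea, assuming a single `items` table: every write on the master gets an increasing version number in `change_log`, each cache remembers the last version it applied, and a sync only transfers the entries after that version. Python's sqlite3 is used as a stand-in for both databases, the web service call is reduced to the plain function `changes_since`, and all table and function names are invented for the example. For brevity the cache stores its version in a small `sync_state` table updated by the sync code rather than by a trigger.

```python
import sqlite3


def make_master() -> sqlite3.Connection:
    m = sqlite3.connect(":memory:")
    m.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
        -- version is the ever-increasing change number
        CREATE TABLE change_log (
            version INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT NOT NULL, id INTEGER NOT NULL, name TEXT);
        CREATE TRIGGER items_ai AFTER INSERT ON items BEGIN
            INSERT INTO change_log (op, id, name) VALUES ('I', NEW.id, NEW.name);
        END;
        CREATE TRIGGER items_au AFTER UPDATE ON items BEGIN
            INSERT INTO change_log (op, id, name) VALUES ('U', NEW.id, NEW.name);
        END;
        CREATE TRIGGER items_ad AFTER DELETE ON items BEGIN
            INSERT INTO change_log (op, id, name) VALUES ('D', OLD.id, NULL);
        END;
    """)
    return m


def make_cache() -> sqlite3.Connection:
    c = sqlite3.connect(":memory:")
    c.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sync_state (version INTEGER NOT NULL);
        INSERT INTO sync_state VALUES (0);
    """)
    return c


def changes_since(master, version):
    """What the web service on the master side would return to a cache."""
    return master.execute(
        "SELECT version, op, id, name FROM change_log "
        "WHERE version > ? ORDER BY version", (version,)).fetchall()


def apply_changes(cache, changes) -> int:
    """Replay changes in order; data and version move in one transaction."""
    with cache:
        for version, op, item_id, name in changes:
            if op == "D":
                cache.execute("DELETE FROM items WHERE id = ?", (item_id,))
            else:
                cache.execute(
                    "INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)",
                    (item_id, name))
            cache.execute("UPDATE sync_state SET version = ?", (version,))
    return cache.execute("SELECT version FROM sync_state").fetchone()[0]


def sync(master, cache) -> int:
    current = cache.execute("SELECT version FROM sync_state").fetchone()[0]
    return apply_changes(cache, changes_since(master, current))


def purge_log(master, applied_everywhere: int):
    """Drop entries that every cache has already applied."""
    with master:
        master.execute("DELETE FROM change_log WHERE version <= ?",
                       (applied_everywhere,))
```

A brand-new cache starts at version 0 and simply replays the whole log, which only works as long as the log has not been purged past that point; otherwise it needs a full copy first.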
The way I see it you're going to have to grab all the data from the source database, as you don't seem to have any way of interrogating it to see what data has changed. A simple way to do it would be to copy all the data from the source database into temporary or staging tables in the cache database. Then you can do a diff between both sets of tables and update the records that have changed. Or once you have all the data in the staging tables drop/rename the existing tables and rename the staging tables to the existing table names.

How reliable is SQL server replication?

We have a database on SQL Server 2000 which should be truncated from time to time. It looks like the easiest solution would be to create a duplicate database and copy the primary database there. Then the primary database may be safely truncated by specially tailored stored procedures.
One way replication would guarantee that the backup database contains all updates from the primary one.
We plan to use backup database for reporting and primary for operative data.
Primary database will be truncated at night once in 2 days.
Database is several gigabytes. Only several tables are quite large (1-2 mln rows)
What are possible pitfalls? How reliable would such a solution be? Will it slow down the primary database?
Update: The variant with DTS doing the copying sounds good but has its own disadvantages. It requires quite a robust script which would run for about an hour to copy updated rows. There is also an issue with integrity constraints in the primary database, which would make truncating it a non-trivial task. Because of this, replication could straighten things up considerably.
It is also possible, but not a very good variant, to use a UNION view, because the system works mostly in unattended mode without dedicated support personnel. It is a related issue, though not a technical one.
While replication is usually robust, there are times where it can break and require a refresh. Managing and maintaining replication can become complicated. Once the primary database is truncated, you'll have to make sure that action is not replicated. You may also need an improved system of row identification as after you've truncated the primary database tables a couple of times, you'll still have a complete history in your secondary database.
There is a performance hit on the publisher (primary) as extra threads have to run to read the transaction log. Unless you're under heavy load at the moment, you likely won't notice this effect. Transaction log management can become more important also.
Instead, I'd look at a different solution for your problem. For example, before truncating, you can take a backup of the database, and restore it as a new database name. You then have a copy of the database as it was before the truncation, and you can query both at once using three-part names.
You've mentioned that the purpose of the secondary database is reporting. In this case you can create a view like SELECT * FROM Primary.dbo.Table UNION ALL SELECT * FROM SecondaryDBJune2008.dbo.Table UNION ALL SELECT * FROM SecondaryDBOctober2008.dbo.Table. You would then need to keep this view up to date whenever you perform a truncate.
The other alternative would be to take a snapshot of the current data before truncation and insert it into a single reporting database. Then you'd just have the Primary and the Historical databases - no need to modify views once they're created.
How much data are we talking about in GB?
As you're planning to perform the truncation once every two days, I'd recommend the second alternative, snapshotting the data before truncation into a single Historical database. This can be easily done with a SQL Agent job, without having to worry about replication keeping the two sets of data in synch.
I would not use replication for this. We have a fairly complex replication setup running with 80+ branches replicating a few tables to one central database. When connectivity goes down for a few days, the data management issues are hair raising.
If you want to archive older data, rather use DTS. You can then build the copying and truncation/deletion of data into the same DTS package, setting it so that the deletion only happens if the copy was successful.
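The "delete only if the copy succeeded" rule can be illustrated with a copy and a delete wrapped in one transaction. This is a sketch under assumptions, not a DTS package: Python's sqlite3 is used, with the historical database attached to the same connection under the made-up schema name `hist`, and the `events` table and `archive_and_purge` function are hypothetical.

```python
import sqlite3


def archive_and_purge(conn: sqlite3.Connection, cutoff: str) -> int:
    """Copy rows older than `cutoff` to hist.events, then delete them from main.

    Both statements run in one transaction, so if the copy fails, or the
    counts disagree, nothing is deleted. Returns the number of rows archived.
    """
    with conn:  # commits on success, rolls back on any exception
        copied = conn.execute(
            "INSERT INTO hist.events (id, created_at, payload) "
            "SELECT id, created_at, payload FROM main.events "
            "WHERE created_at < ?", (cutoff,)).rowcount
        deleted = conn.execute(
            "DELETE FROM main.events WHERE created_at < ?", (cutoff,)).rowcount
        if copied != deleted:
            raise RuntimeError(
                f"copied {copied} rows but would delete {deleted}; rolling back")
    return copied
```

On SQL Server the same guarantee across two servers would need a distributed transaction or an explicit verify-then-delete step, so treat this as the single-connection version of the idea.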

Resources