Configuring SQL Server to Oracle initial data load in GoldenGate

As per my understanding, before setting up transaction replication in Oracle GoldenGate we have to perform an initial data load. In my case the source is SQL Server 2012 and the destination is Oracle 12c, and both reside on the same system. Now my questions are:
1. What is the best way to set up the initial load? Should I use a SQL Server utility such as SSIS, or GoldenGate's "direct bulk load" feature?
2. Since my source DB and destination DB reside on the same machine, do I still need two GoldenGate installations (one for the source and one for the destination) for transaction replication?

I used GoldenGate's direct load for a SQL Server initial load; the database was huge and it went fine. The downside is that if a failure occurs, you'll need to truncate the target table and start the load from the beginning. As for multiple installations: in one environment I have both the source and target Oracle databases running on the same machine and using the same installation, so I think you'll be fine with just one.
Take a look at this link; it could be beneficial:
http://www.ateam-oracle.com/oracle-goldengate-heterogeneous-database-initial-load-using-oracle-goldengate/
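For orientation, here is a minimal sketch of what the direct bulk load parameter files can look like. Every name, DSN, port, and schema below is a made-up example, so treat this as a starting point rather than a working configuration.

-- GGSCI on the source: ADD EXTRACT einit, SOURCEISTABLE
-- einit.prm: initial-load Extract reading the SQL Server tables directly
EXTRACT einit
SOURCEDB mssql_dsn, USERID gg_user, PASSWORD gg_pass
RMTHOST localhost, MGRPORT 7809
RMTTASK REPLICAT, GROUP rinit
TABLE dbo.*;

-- rinit.prm: initial-load Replicat, started automatically as a task by the
-- target Manager; BULKLOAD pushes rows through Oracle's bulk-load interface
REPLICAT rinit
USERID gg_user, PASSWORD gg_pass
BULKLOAD
SOURCEDEFS ./dirdef/source.def
MAP dbo.*, TARGET oracle_schema.*;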

Related

Optimizing OLE DB Destination for Fast load from Oracle to SQL Server for SSIS

I'm working with an SSIS package that imports from an Oracle table into a SQL Server table; in between I had to add a Data Conversion transformation.
In the current setup, the OLE DB Source retrieves the complete table, the rows pass through the Data Conversion, and they are then sent to the OLE DB Destination.
The table I'm trying to import has around 7.3 million records with 53 columns.
I need to know how to configure this (or what to change in the current setup) to speed the process up as much as possible.
The package will run on a schedule as a SQL Server Agent job.
The last run inserted 78k records in 15 minutes; at this pace it is far too slow.
I believe I have to tune the "Rows per batch" and "Maximum insert commit size" settings, but I haven't found guidance on what values should work, and the different settings I've tried made no noticeable difference.
UPDATE: After a bit more testing, the delay is in getting the records out of Oracle, not in inserting them into SQL Server. I need to find out how to improve that.
I think the main problem is not loading the data into SQL Server; check the OLE DB provider you are using to extract the data from Oracle.
There are several suggestions you can try:
Use the Attunity connectors, which are the fastest available
Make sure you are not using the old Microsoft OLE DB Provider for Oracle (part of MDAC); use the Oracle Provider for OLE DB (part of ODAC) instead (see the connection strings below)
If that doesn't help, try using an ODBC connection / ODBC Source to read the data from Oracle
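For reference, the two providers differ only in the provider name used in the connection string; the data source and credentials below are placeholders.
The old MDAC provider (avoid):
Provider=MSDAORA;Data Source=MyTnsAlias;User Id=scott;Password=tiger;
The Oracle ODAC provider (prefer):
Provider=OraOLEDB.Oracle;Data Source=MyTnsAlias;User Id=scott;Password=tiger;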

SQL Server Destination vs OLE DB Destination

I was using the OLE DB Destination for bulk import of multiple flat files. After some tuning I ended up with the SQL Server Destination being 25-50% faster.
However, I am confused about this destination because there is contradictory information on the web: some argue against it, some suggest using it. I would like to know: are there any serious pitfalls before I deploy it to production? Thanks.
In this answer, I will try to provide information from the official SSIS documentation, and I will describe my personal experience with the SQL Server Destination.
1. SQL Server Destination
According to the official SQL Server Destination documentation:
The SQL Server destination connects to a local SQL Server database and bulk loads data into SQL Server tables and views. You cannot use the SQL Server destination in packages that access a SQL Server database on a remote server. Instead, the packages should use the OLE DB destination.
The SQL Server destination offers the same high-speed insertion of data into SQL Server that the Bulk Insert task provides; however, by using the SQL Server destination, a package can apply transformations to column data before the data is loaded into SQL Server.
For loading data into SQL Server, you should consider using the SQL Server destination instead of the OLE DB destination.
2. OLEDB Destination
According to the official OLEDB Destination documentation:
OLEDB Destination - fast load option: load data into a table or view in the OLE DB destination using the fast load option, which is optimized for bulk inserts.
3. OLEDB Destination vs SQL Server Destination
According to SQL Server Destination Vs OLE DB Destination - MSDN topic:
Donald Farmer, the former Group Program Manager for Integration Services, said that you can get a 5 to 10% increase in performance using the SQL Server Destination.
In addition, referring to the following post by Matt Masson, a data integration specialist at Microsoft, where he answered the question:
Should I use the SQL Server Destination?
The answer was:
No
...
My recommendation is that if you need every bit of performance (a 10% perf increase on a 10 hour load can be significant), try out the SQL Server Destination to see how it works for you. However – keep in mind the following limitations of the SQL Server Destination:
You must have SSIS running on the same machine as the destination database
You must run the package as an administrator
It is very difficult to debug when things go wrong
Given these limitations, I recommend using the OLE DB Destination even if you are seeing a performance increase with the SQL Server Destination.
3.1. The Data Loading Performance Guide
(Update # 2019-03-25)
While searching for SSIS best practices, I found a very helpful Microsoft article that can be used as a reference:
The Data Loading Performance Guide
In this article they compared all data load methods, including the SQL Server destination and the OLE DB destination, and mentioned that:
SQL Server Destination: The SQL Server destination is the fastest way to bulk load data from an Integration Services data flow to SQL Server. This destination supports all the bulk load options of SQL Server, except ROWS_PER_BATCH.
Be aware that this destination requires shared memory connections to SQL Server. This means that it can only be used when Integration Services is running on the same physical computer as SQL Server.
OLE DB Destination: The OLE DB destination supports all of the bulk load options for SQL Server. However, to support ordered bulk load, some additional configuration is required. For more information, see “Sorted Input Data”. To use the bulk API, you have to configure this destination for “fast load”.
The OLE DB destination can use both TCP/IP and named pipes connections to SQL Server. This means that the OLE DB destination, unlike the SQL Server destination, can be run on a computer other than the bulk load target. Because Integration Services packages that use the OLE DB destination do not need to run on the SQL Server computer itself, you can scale out the ETL flow with workhorse servers.
3.2. Personal experience
(Update # 2019-03-25)
Since this question is used as a reference by many, and now that I am more experienced in this domain, I have added this section to describe my personal experience with the SQL Server destination.
While the official documentation states that the SQL Server destination improves performance, I don't recommend using this component at all, for several reasons:
It requires the destination server and the ETL server to be the same machine (it works only with a local SQL Server)
It throws exceptions that carry no meaningful information
After testing on a huge volume of data, the performance difference from the OLE DB destination was negligible (tested on about 500 GB of data loaded in chunks; the time difference was less than one minute)
You can also refer to the following post (from @billinkc) for more information about this topic:
Should SSIS packages and SQL database be on same server?
4. Conclusion
Based on the Microsoft articles, you can say that the SQL Server Destination increases insert performance (it uses bulk insert), but it is designed for one specific case: a local SQL Server. The OLE DB Destination is more general and is recommended in all other cases; using the Fast Load data access mode (which also uses bulk insert) on the OLE DB destination likewise increases data load performance.
On the other hand, based on my experience and on many articles written by SSIS experts, using the SQL Server Destination is not recommended at all, since it is not stable, it often throws exceptions, and its performance advantage is negligible.
Additional Information
Recently, I published a detailed article about this topic. You can check it at:
SSIS OLE DB Destination Vs SQL Server Destination
To augment Hadi's fine answer, don't use the SQL Server Destination.
In my experience, the performance benefit does not outweigh the restriction that the package must be executed on the same machine as the destination database. It forces a processing architecture that may or may not be right for you today or a year from now; it's just too inflexible for my tastes.
The other, bigger reason I advocate avoiding the SQL Server Destination is the flat-out bugginess I've experienced with it. Same flat file to an empty table: round 1, it aborts with a vague error message (I can't recall the specifics) that something went wrong; immediately restart the package and it works as expected.
Maybe you, most humble reader, can accept that trade-off of processing time against reprocessing time, but for me it hasn't been worth it since probably 2008.

Fastest way to copy large amounts of data from Oracle to SQL Server

I need to copy large amounts of data from an Oracle database to a SQL Server database. What is the fastest way to do this?
I am looking at data that takes 60-70 GB of storage in Oracle. There are no particular restrictions on the method I use: SQL Server Management Studio, the SQL Server Import/Export wizard, a .NET app, the developer interface in Oracle, third-party tools, or anything else. I just need to move the data as quickly as possible.
The data is geographically organized. The data for each state is updated separately in the Oracle database and can be moved over to SQL Server on its own, so the entire volume of data will rarely be moved all at once.
So what suggestions would people have?
The fastest way to insert large amounts of data into SQL Server is with SQL Server bulk insert. Common bulk insert techniques are:
T-SQL BULK INSERT statement
BCP command-line utility
SSIS package OLE DB destination with the fast load option
ODBC bcp API from unmanaged code
OLE DB IRowsetFastLoad from unmanaged code
SqlBulkCopy from a .NET application
T-SQL BULK INSERT and the command-line BCP utility use a flat-file source, so the implication is that you'll first need to export the data to files. The other methods can consume Oracle SELECT query results directly, without an intermediate file, which should perform better overall as long as network bandwidth and latency between source and destination aren't a concern.
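To make the flat-file route concrete, here is a minimal sketch of the two file-based options; the table name, file path, and batch size are placeholder values:

-- T-SQL bulk load from an exported delimited file
BULK INSERT dbo.StateData
FROM 'C:\exports\state_tx.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 100000);

The roughly equivalent BCP command line (character mode, comma-delimited, trusted connection):

bcp MyDb.dbo.StateData in C:\exports\state_tx.csv -S myserver -T -c -t, -b 100000 -h "TABLOCK"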
With SSIS, one would typically create a data flow task for each table to be copied, with an OLE DB source (Oracle) and an OLE DB destination (SQL Server). The Oracle source provider can be downloaded separately depending on the SSIS version; the latest is the Microsoft Connector v4.0 for Oracle. The SSMS Import wizard can be used to generate an SSIS package for the task, which may be run immediately and/or saved and customized as desired. For example, you could create a package variable for the state to be copied and use it in the source SELECT query and in a target DELETE query run before refreshing the data; that would allow the same package to be reused for any state.
OLE DB IRowsetFastLoad or ODBC bcp calls should perform similarly to SSIS, but you might be able to eke out some additional performance gains with a lot of attention to detail. However, using these APIs is not trivial unless you are already familiar with C++.
SqlBulkCopy is fast (generally millions of rows per minute), which is good enough performance for most applications without the additional complexity of unmanaged code. It is best to use the Oracle managed provider for the source SELECT query rather than an ODBC or OLE DB provider in .NET code.
My recommendation is that you consider not only performance but also your existing skill set.
I actually used the Microsoft SQL Server Migration Assistant (SSMA) once for this, and it did what it promised to do:
SQL Server Migration Assistant for Oracle (documentation)
Microsoft SQL Server Migration Assistant v6.0 for Oracle (download)
SQL Server Migration Assistant (SSMA) Team's Blog
However, in my case it was not as fast as I would have expected for an 80 GB Oracle DB (around 4 hours), and I had to do some manual steps afterwards, but the application was developed in hell anyway (one table had 90+ columns and 100+ indexes).

Reading Oracle .arc files

We have an Oracle DB that cannot take any additional insert/update load. Is it possible to extract those commands from the .arc files and apply them to another, non-Oracle DB, so that I can run reports off the new DB? Once that is done, I can move the load of all queries and reports off the main DB.
I understand that it is these very .arc files that are used for replicating to another Oracle DB, and that is what I want to do, except that the target DB is not Oracle.

Compare millions of records from Oracle to SQL server

I have an Oracle database and a SQL Server database. There is one table, say Inventory, which contains millions of rows in both databases, and it keeps growing.
I want to compare the Oracle table data with the SQL Server table data on a daily basis, to find out which records are missing from the SQL Server table.
Which is the best approach for this?
Create an SSIS package.
Create a Windows service.
I want this to consume as few resources and take as little time as possible.
E.g.: 18 million records in Oracle and 16-17 million in SQL Server.
This situation of two different databases arises because there are two different applications, one online and one offline.
EDIT: How about connecting to SQL Server from Oracle through the Oracle Database Gateway for SQL Server, and then:
1) Querying SQL Server directly from Oracle to fill in the missing records in SQL Server the first time.
2) Creating a trigger on Oracle that fires when a record is deleted from Oracle and inserts the deleted record into a new Oracle table (see the sketch below).
3) Creating an SSIS package to map the newly created Oracle table to SQL Server and update the SQL Server records. This way only a few records have to be processed daily through SSIS.
What do you think of this approach?
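For step 2, a minimal sketch of such a delete-capture trigger might look like the following; the table and column names are made up for illustration:

-- side table holding the keys of deleted Inventory rows
CREATE TABLE inventory_deleted (
  item_id    NUMBER,
  deleted_at DATE DEFAULT SYSDATE
);

CREATE OR REPLACE TRIGGER trg_inventory_del
AFTER DELETE ON inventory
FOR EACH ROW
BEGIN
  -- record the deleted key so a later ETL run can replay the delete
  INSERT INTO inventory_deleted (item_id) VALUES (:OLD.item_id);
END;
/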
I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise, the Attunity connectors are a bit faster.
Then I would load the keys from the SQL Server table into a Lookup transformation, match the two sources on the key, and direct unmatched rows into a separate output.
Finally, I would direct the unmatched-rows output to an OLE DB Command to update the SQL Server table.
This SSIS package will require a lot of memory, but since the matching is done in memory with minimal I/O, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
What you basically want to do is replication from Oracle to SQL Server.
You could do this in SSIS, a Windows service, or indeed on a multitude of platforms.
The real trick is using the correct design pattern.
There are two general design patterns:
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or on Oracle, but not yet to compare on SQL Server, although that is also valid; see the sketch after this list)
You are comparing 18 million records here, so this is a lot of work
Differential replication
You record the changes in the publisher (i.e. Oracle) since the last replication, then apply those changes to the subscriber (i.e. SQL Server)
You can do this manually by implementing triggers and log tables on the Oracle side, then using a regular ETL process (SSIS, command-line tools, text files, whatever), probably scheduled in SQL Server Agent, to apply the changes to SQL Server.
Or you could do this by using the out of the box replication capability to set up Oracle as a publisher and SQL as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
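As a rough illustration of a snapshot comparison done on the SQL Server side, assuming a linked server to Oracle has already been set up (the linked server name ORACLE_LINK, the tables, and the key column are placeholders):

-- keys present in Oracle but missing from SQL Server
SELECT item_id
FROM OPENQUERY(ORACLE_LINK, 'SELECT item_id FROM inventory')
EXCEPT
SELECT item_id FROM dbo.Inventory;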
You're going to have to try a few of these and see what works for you.
Given this objective:
I want to consume less resource to achieve this functionality which takes less time and less resource
transactional replication is far more efficient, but also more complicated. For maintenance purposes, which platforms (.NET, SSIS, Python, etc.) are you most comfortable with?
Other alternatives:
If you can use the Oracle Database Gateway for SQL Server, then you do not need to transfer the data at all and can run the query directly.
If you can't use the Oracle gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.
I think the best approach is using the Oracle gateway. Just follow these steps; I have had a similar kind of experience.
Install and configure Oracle Database Gateway for SQL Server:
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a database link from Oracle to SQL Server (see the sketch below).
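A minimal sketch of the link definition (the link name, the SQL Server credentials, and the gateway TNS alias are placeholders):

CREATE DATABASE LINK dblink_name
CONNECT TO "sql_user" IDENTIFIED BY "sql_password"
USING 'dg4msql_alias';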
Create a procedure that finds the records that exist in Oracle but are missing from SQL Server and inserts them into the SQL Server database.
For example, you can use this statement inside your procedure:
INSERT INTO "dbo"."sql_server_table"#dblink_name("column1","column2"...."column5")
VALUES
(
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"#dblink_name
)
Create a scheduler job that executes the procedure daily.
When both databases are online, the missing records will be inserted into SQL Server; otherwise the scheduled job fails, or you can execute the procedure manually.
It takes minimal resources.
I would suggest a homemade ETL solution:
Schedule an Oracle job to export the source table data (on a daily basis, following the application logic) to plain CSV format (a sketch of this export follows below).
Schedule a SQL Server job (with an acceptable delay after the first Oracle job) to read this CSV file and import it into an intermediate (staging) table inside SQL Server using BULK INSERT.
The last part of the SQL Server job reads the staging table data and applies the logic (insert/update the target table). I suggest having another table to store reports of each daily job's results.
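For the export step, a bare-bones SQL*Plus spool script is often enough; the file path, table, and columns below are placeholders:

-- export_inventory.sql, run daily by the Oracle job via sqlplus
SET HEADING OFF PAGESIZE 0 FEEDBACK OFF TRIMSPOOL ON
SPOOL /exports/inventory.csv
SELECT item_id || ',' || item_name FROM inventory;
SPOOL OFF
EXIT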
