SSIS Transferring Data to an Oracle DB is Extremely Slow - sql-server

We are transferring data to an Oracle database from two different sources and it's extremely slow.
Please see the notes below. Any suggestions?
Notes:
We're using the Microsoft OLE DB Provider for Oracle.
One data source is SQL Server and includes about 5M records.
The second data source is Oracle and includes about 700M records.
When trying to transfer the SQL Server data, we broke it up into
five "Data Flow Tasks" in the "Control Flow". Each "Data Flow Task"
uses an "OLE DB Source" which internally uses a "SQL command"
that effectively selects 1M of the 5M records. When we ran this
package, the first data flow task ran for about 3 hours and had
transferred only about 50,000 records by the time we ended the process.
We had a similar experience with the Oracle data as well.
For some reason saving to an Oracle Destination is extremely slow.
Interestingly, we once transferred the same 700M records from Oracle to
SQL Server (the opposite direction) and it worked as expected in
about 4.5 to 5 hours.

On the Oracle side you can examine v$session to see where the time is being spent (if AWR is licensed on the Oracle instance, you can also use DBA_HIST_ACTIVE_SESS_HISTORY or v$active_session_history).
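For example, a quick way to see which waits dominate during the load window (a minimal sketch using standard columns of v$active_session_history; the one-hour window and the FOREGROUND filter are assumptions to adjust):
SELECT event, wait_class, COUNT(*) AS ash_samples   -- each sample is roughly one second of DB time
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 1/24                 -- last hour
AND    session_type = 'FOREGROUND'
GROUP  BY event, wait_class
ORDER  BY ash_samples DESC;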
I work on Oracle performance problems every day (over 300 production Oracle instances), so I feel qualified to say that I can't give you a specific answer to your question, but I can point you in the right direction.
Typical process mistakes that make inserts slow:
not using array insert
connecting to the DB for each insert (sounds strange? believe me,
I've seen DataStage and other ETL tools set up this way)
app server/client not on same local area network as the Oracle instance
indexes on the table(s) being inserted into (especially problematic with
bitmapped indexes); each statement requires an index update as well as a
table update
redo log files too small on the Oracle instance (driving up
redo log file switching)
log_buffer parameter on DB side too small
not enough db writers (see db_writer_processes initialization
parameter)
committing too often (a sketch of array inserts with batched commits follows this list)
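As an illustration of the array-insert and commit-frequency points above, here is a minimal PL/SQL sketch (table and column names are hypothetical; an ETL tool achieves the same effect through its array/batch size and commit interval settings):
DECLARE
  CURSOR c IS SELECT id, name FROM staging_table;           -- hypothetical source
  TYPE t_ids   IS TABLE OF staging_table.id%TYPE;
  TYPE t_names IS TABLE OF staging_table.name%TYPE;
  l_ids   t_ids;
  l_names t_names;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_ids, l_names LIMIT 10000;   -- fetch 10,000-row arrays
    EXIT WHEN l_ids.COUNT = 0;
    FORALL i IN 1 .. l_ids.COUNT                            -- one round trip per array, not per row
      INSERT INTO target_table (id, name)
      VALUES (l_ids(i), l_names(i));
    COMMIT;                                                 -- commit per batch, not per row
  END LOOP;
  CLOSE c;
END;
/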

Not an answer, just a bunch of observations and questions...
Any one of the components in the data pipeline could be the bottleneck.
You first need to observe the row counts when running interactively in SSIS and see if there is any obvious clogging going on - i.e. do you have a large row count right before your Data Conversion transformation and a low one after? Or is it at the Oracle destination? Or is it just taking a long time to come out of SQL Server? A quick way to check the SQL side is to dump it to a local file instead - that mostly measures the SQL SELECT performance without any blocking from Oracle.
When you run your source query in SQL Server, how long does it take to return all rows?
Your data conversion transformation can be performed in the source query. Every transformation requires setting up buffers, memory, etc. and can slow down and block your data flow. Avoid these transformations and do the conversion in the source query instead.
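For example, you can cast to the destination types directly in the OLE DB Source query instead of using a Data Conversion transformation (a hypothetical sketch; table and column names are assumptions):
SELECT CAST(CustomerId   AS int)           AS CustomerId,
       CAST(CustomerName AS nvarchar(100)) AS CustomerName,
       CAST(CreatedDate  AS datetime2(0))  AS CreatedDate
FROM   dbo.SourceCustomers;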
Various buffers and configuration settings exist in the Oracle driver; these are already addressed in detail by @RogerCornejo. For read performance out of Oracle, I have found that altering FetchBufferSize made a huge difference, but you are doing writes here, so that may not apply.
Lastly, where are the two database servers and the SSIS client tool situated network wise? If you are running this across three different servers then you have network throughput to consider.
If you use a linked server as suggested, note that SSIS doesn't do any processing at all, so you take that whole piece out of the equation.
And if you're just looking for the fastest way to transfer the data, you might find that dumping to a file and bulk inserting is the fastest.

Thank you all for your suggestions. For those who may run into a similar problem in the future, I'm posting what finally worked for me. The answer was ... switching the provider. The ODBC or Attunity providers were much faster, by a factor of almost 800X.
Remember that my goal was to move data from a SQL Server Database to an Oracle database. I originally used an OLE DB provider for both the source and destination. This provider works fine if you are moving data from SQL Server to SQL Server because it allows you to use the "Fast Load" option on the destination which in turn allows you to use batch processing.
However, the OLE DB provider doesn't allow the "Fast Load" option with an Oracle DB as the destination (I couldn't get it to work and read elsewhere that it doesn't work). Because I couldn't use the "Fast Load" option, I couldn't batch and was instead inserting records row by row, which was extremely slow.
A colleague suggested trying ODBC and others suggested trying Microsoft's Attunity Connectors for Oracle. I didn't think the difference would be so great because in my experience ODBC had similar (and sometimes worse) performance compared to OLE DB (I hadn't tried Attunity). BUT... that was when moving data from and to a SQL Server database, or otherwise staying in the Microsoft world.
When moving data from a SQL Server database to an Oracle database, there was a huge difference! Both ODBC and Attunity outperformed OLE DB dramatically.
Here were my summarized performance test results inserting 5.4M records from a SQL Server database to an Oracle Database.
When doing all the work on one local computer:
OLE DB source and destination inserted 12 thousand records per minute, which would have taken approx. 7 hours to complete.
ODBC source and destination inserted 9 million records per minute, which took only approx. 30 seconds to complete.
When moving data from one network/remote computer to another network/remote computer:
OLE DB source and destination inserted 115 records per minute, which would have taken approx. 32 days to complete.
ODBC source and destination inserted 1 million records per minute, which took only approx. 5 minutes to complete.
Big difference!
Why it took only 30 seconds locally but 5 minutes remotely is another issue for another day, but for now I have something workable (it should be slower over the network, but I'm surprised it's that much slower).
Thanks again to everyone!
Extra notes:
My OLE DB results were similar with either Microsoft's or Oracle's OLE DB providers for Oracle databases.
Attunity was a little faster than ODBC. I didn't get to test on remote servers or on a larger data set, but locally it was consistently about 2 to 3 seconds faster than ODBC. Those seconds could add up on a large data set, so take note.

Related

Alteryx - bulk copy from SQL Server to Greenplum - need tips to increase performance

Need advice here: using Alteryx Designer, I'm pulling a large dataset from SQL Server (10M rows) and need to move it into a Greenplum DB.
I tried connecting with Input Data (SQL Server) and Output Data (GP), and also with Connect In-DB (SQL Server) and Write Data In-DB (GP).
Either approach takes forever, to the point that I have to cancel the process (to give an idea, over the weekend it ran for 18 hours and advanced no further than 1%).
Any good advice or tricks to speed up this sort of massive bulk data load would be very highly appreciated!
I can control or make modifications on SQL Server and Alteryx to increase performance, but not on Greenplum.
Thanks in advance.
Regards,
Erick
I'll break down the approaches that you're taking.
You won't be able to use the In-DB tools as the databases are different, hence you can't push the processing onto the DB...
Using the standard Alteryx tools, you are bringing the whole table onto your machine and then pushing it out again; there are multiple ways this could be done, depending on where your blockage is.
Looking first at the extract from SQL, 10M rows isn't that much, so you could split the process and write it out as a yxdb. If that fails or takes several hours, then you will need to look at the connection to the SQL Server or the resources available on the SQL Server.
Then for the push into Greenplum: there is no PostgreSQL bulk loader at present, so you can either just try to write the whole table, or you can write segments of the table into temp tables in Greenplum and then execute a command to combine those tables (see the sketch below).
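For example, the combine step could be a plain set-based statement run once the segment loads have finished (a hypothetical sketch; table names are assumptions):
INSERT INTO target_table
SELECT * FROM temp_part_1
UNION ALL
SELECT * FROM temp_part_2
UNION ALL
SELECT * FROM temp_part_3;
DROP TABLE temp_part_1, temp_part_2, temp_part_3;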
We are pulling millions of rows daily from SQL Server to Greenplum and we use an open-source tool called Outsourcer. It's a great tool and takes care of cleansing and more. We have been using this tool for the past 3.5 years with no issues so far. It takes care of all the parallelism, and millions of rows are loaded within minutes.
It supports incremental or full loads. If you need support, Jon Robert, the owner of Outsourcer, will respond to your email within minutes. Here is the link for the tool:
https://www.pivotalguru.com/

Speeding up SQL transfer over a network

I have two SQL Server environments: a data warehouse which collects data, and a datamart which people access for a subset of the data, each with its own SQL Server 2016 database. I run a script which pulls out data, transforms it and transfers it from the data warehouse to the datamart using linked servers. The entire process takes around 60+ hours to run. I want to avoid at all costs having the full data warehouse data in the datamart.
I experimented to see why the whole process was taking so long. I did a backup of the data warehouse, restored it onto the datamart and ran the import script, and the entire process took around 3 hours. The script itself took 1.5 hours, telling me that of the 60+ hours, it's the linked-server transfer of data between the two servers that is the slowest part. I've pretty much ruled out network speed or issues between the two servers; this is all SQL. I'm trying to avoid having to write an application to do all of this in .NET if I can keep it in SQL Server.
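For reference, the transfer step is essentially a four-part-name insert along these lines (a hypothetical sketch, not the actual script; server, database and table names are assumptions):
INSERT INTO dbo.FactOrders (OrderId, CustomerId, Amount)
SELECT o.OrderId, o.CustomerId, o.Amount
FROM DWSERVER.DataWarehouse.dbo.Orders AS o;   -- remote rows are pulled across the linked server connection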
Does anyone have any suggestions on how to improve the performance of transfers between SQL Servers?
The slowness could be coming from the destination database;
try disabling triggers, indexes, locks, etc.
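For example, a minimal T-SQL sketch of that idea (table, trigger and index names are hypothetical):
ALTER TABLE dbo.FactSales DISABLE TRIGGER ALL;            -- stop triggers firing for every inserted row
ALTER INDEX IX_FactSales_Date ON dbo.FactSales DISABLE;   -- skip index maintenance during the load

-- ... run the linked-server load here ...

ALTER INDEX IX_FactSales_Date ON dbo.FactSales REBUILD;   -- rebuild once at the end
ALTER TABLE dbo.FactSales ENABLE TRIGGER ALL;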
This link may help more:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/e04a8e21-54a9-46a4-8eb2-67da291dc7e1/slow-data-transfer-through-linked-server?forum=transactsql

SSIS 2014 ADO.NET connection to Oracle slow for 1 table (the rest is fine)

We're in the middle of doing a new data warehouse roll-out using SQL Server 2014. One of my data sources is Oracle, and unfortunately the recommended Attunity component for quick data access is not available for SSIS 2014 just yet.
I've tried to avoid using OLE DB, as that requires installing specific Oracle client tools that have caused me a lot of frustration before, and with the Attunity components supposedly in the works (MS had promised they'd arrive in August), I'm reluctant to go through the ordeal again.
Therefore, I'm using ADO.NET. All things considered, performance is acceptable for the time being, with the exception of 1 particular table.
This particular table in Oracle has a bunch of varchar columns, and I've come to the conclusion that it's the width of the selected row that makes this table perform particularly slowly. To prove that, rather than selecting all columns as they exist in Oracle (which is the original package I created), I truncated all widths to the maximum length of the values actually stored (CAST(column AS varchar(46))). This reduced the time to run the same package to 17 minutes (still nowhere near what I'd call acceptable, and it's not something I'd put in production because it would open up a world of future pain, but it proves the width of the columns is definitely a factor).
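Conceptually, the test query looked something like this (a sketch only, with hypothetical column names; in Oracle the cast target would normally be VARCHAR2):
SELECT CAST(wide_col_1 AS VARCHAR2(46))  AS wide_col_1,   -- widths trimmed to the longest value actually stored
       CAST(wide_col_2 AS VARCHAR2(120)) AS wide_col_2,
       numeric_col
FROM   some_schema.some_wide_table;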
I increased the network packet size in SQL Server, but that did not seem to help much. I have not managed to figure out a good way to alter the packet size on the ADO.NET connector for Oracle (the SQL Server connector does have that option). I attempted adding Packet Size=32000; to the connection string for the Oracle connector, but that just threw an error, indicating it simply won't be accepted. The same applies to FetchSize.
Eventually, I came up with a compromise where I split the load into three different parts, dividing the varchar columns between these parts, and used two MERGE JOIN objects to, well, merge the data back into a single combined dataset. Running that and doing some extrapolation leads me to think that method would have taken roughly 30 minutes to complete (without the potential data loss of the CAST solution above). However, that's still not acceptable.
I'm currently in the process of trying some other options (not using MERGE JOIN but dumping into three different tables, and then merging those on the SQL Server itself, and splitting the package up into even more different loads in an attempt to further speed up the individual parts of the load), but surely there must be something easier.
Does anyone have experience with how to load data from Oracle through ADO.NET, where wide rows would cause delays? If so, are there any particular guidelines I should be aware of, or any additional tricks you might have come across that could help me reduce load time while the Attunity component is unavailable?
Thanks!
The updated Attunity drivers have just been released by Microsoft:
Hi all, I am pleased to inform you that the Oracle and TeraData connector V3.0 for SQL14
SSIS is now available for download!!!!!
Microsoft SSIS Connectors by Attunity Version 3.0 is a minor release.
It supports SQL Server 2014 Integration Services and includes bug
fixes and support for updated Oracle and Teradata product releases.
For details, please look at the download page.
http://www.microsoft.com/en-us/download/details.aspx?id=44582
Source: https://connect.microsoft.com/SQLServer/feedbackdetail/view/917247/when-will-attunity-ssis-connector-support-sql-server-2014

DataFlow task in SSIS is very slow as compared to writing the sql query in Execute SQL task

I am new to SSIS and have a couple of questions.
I want to transfer 125,000 rows from one table to another in the same database. When I use a Data Flow Task, it takes too much time. I tried using an ADO.NET Destination as well as an OLE DB Destination, but the performance was unacceptable. When I wrote the equivalent query inside an Execute SQL Task, it provided acceptable performance. Why is there such a difference in performance?
INSERT INTO table1 SELECT * FROM table2;
Based on the first observation, I changed my package. It is now composed exclusively of Execute SQL Tasks, either with a direct query or with a stored procedure. If I can solve my problem using only Execute SQL Tasks, then why would one use SSIS, as so many documents and articles recommend? I have seen it described as reliable, easy to maintain and comparatively fast.
Difference in performance
There are many things that could cause the performance difference between a "straight" data flow task and the equivalent Execute SQL Task.
Network latency. You are performing an insert into table A from table B on the same server and instance. In an Execute SQL Task, that work would be performed entirely on the same machine. By contrast, I could run a package on server B that queries 1.25M rows from server A, which will then be streamed over the network to server B. That data will then be streamed back to server A for the corresponding INSERT operation. If you have a poor network, wide data (especially binary types), or simply great distance between servers (server A is in the US, server B is in India), there will be poor performance.
Memory starvation. Assuming the package executes on the same server as the target/source database, it can still be slow as the Data Flow Task is an in-memory engine. Meaning, all of the data that is going to flow from the source to the destination will get into memory. The more memory SSIS can get, the faster it's going to go. However, it's going to have to fight the OS for memory allocations as well as SQL Server itself. Even though SSIS is SQL Server Integration Services, it does not run in the same memory space as the SQL Server database. If your server has 10GB of memory allocated to it and the OS uses 2GB and SQL Server has claimed 8GB, there is little room for SSIS to operate. It cannot ask SQL Server to give up some of its memory so the OS will have to page out while trickles of data move through a constricted data pipeline.
Shoddy destination. Depending on which version of SSIS you are using, the default access mode for an OLE DB Destination was "Table or View." This was a nice setting to try and prevent a low level lock escalating to a table lock. However, this results in row by agonizing row inserts (1.25M unique insert statements being sent). Contrast that with the set-based approach of the Execute SQL Tasks INSERT INTO. More recent versions of SSIS default the access method to the "Fast" version of the destination. This will behave much more like the set-based equivalent and yield better performance.
OLE DB Command Transformation. There is an OLE DB Destination and some folks confuse that with the OLE DB Command Transformation. Those are two very different components with different uses. The former is a destination and consumes all the data. It can go very fast. The latter is always RBAR. It will perform singleton operations for each row that flows through it.
Debugging. There is overhead running a package in BIDS/SSDT. That package execution gets wrapped in the DTS Debugging Host, which can cause a "not insignificant" slowdown of package execution. There's not much the debugger can do about an Execute SQL Task: it runs or it doesn't. With a data flow, there's a lot of memory it can inspect, monitor, etc., which reduces the amount of memory available (see point 2) as well as slowing it down because of the assorted checks it performs. To get a more accurate comparison, always run packages from the command line (dtexec.exe /file MyPackage.dtsx) or schedule them from SQL Server Agent.
Package design
There is nothing inherently wrong with an SSIS package that is just Execute SQL Tasks. If the problem is easily solved by running queries, then I'd forgo SSIS entirely and write the appropriate stored procedure(s) and schedule it with SQL Agent and be done.
Maybe. What I still like about using SSIS even for "simple" cases like this is it can ensure a consistent deliverable. That may not sound like much, but from a maintenance perspective, it can be nice to know that everything that is mucking with the data is contained in these source controlled SSIS packages. I don't have to remember or train the new person that tasks A-C are "simple" so they are stored procs called from a SQL Agent job. Tasks D-J, or was it K, are even simpler than that so it's just "in line" queries in the Agent jobs to load data and then we have packages for the rest of stuff. Except for the Service Broker thing and some web services, those too update the database. The older I get and the more places I get exposed to, the more I can find value in a consistent, even if overkill, approach to solution delivery.
Performance isn't everything, but the SSIS team did set the ETL benchmarks using SSIS so it definitely has the capability to push some data in a hurry.
As this answer grows long, I'd simply leave it at this: the advantages of SSIS and the Data Flow over straight TSQL are native, out of the box
logging
error handling
configuration
parallelization
It's hard to beat those for my money.
If you are passing SSIS variables as parameters in the Parameter Mapping tab and assigning values to those variables by expression, then your Execute SQL Task can spend a lot of time evaluating that expression.
Use an Expression Task (separately) to assign the variables instead of using an expression on the variable itself.

Copying 6000 tables and data from SQL Server to Oracle ==> fastest method?

I need to copy the tables and data (about 5 years of data, 6,200 tables) stored in SQL Server. I am using DataStage and an ODBC connection to connect, and DataStage automatically creates the table with data, but it's taking 2-3 hours per table as the tables are very large (0.5 GB, 300+ columns and about 400k rows).
How can I do this faster? At this rate I am able to copy only 5 tables per day, but I need to move all of these 6,000 tables within 30 days.
6,000 tables at 0.5 GB each would be about 3 terabytes. Plus indexes.
I probably wouldn't go for an ODBC connection, but the question is where the bottleneck is.
You have an extract stage from SQL Server. You have the transport from the SQL Server box to the Oracle box. You have the load.
If the network is the limiting factor, you are probably best off extracting to a file, compressing it, transferring the compressed file, uncompressing it, and then loading it. External tables are the fastest way to load data into Oracle from flat files (delimited or fixed length), preferably spread over multiple physical disks to spread the load, and without logging.
Unless there's a significant transformation happening, I'd forget DataStage. Anything that isn't extracting or loading is excess to be minimised.
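A minimal sketch of the external-table route (directory path, file, table and column names are hypothetical):
CREATE OR REPLACE DIRECTORY load_dir AS '/data/loads';

CREATE TABLE stg_orders_ext (
  order_id   NUMBER,
  cust_name  VARCHAR2(100),
  amount     NUMBER(12,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('orders.dat')
)
REJECT LIMIT UNLIMITED;

-- Direct-path insert; combine with a NOLOGGING target table to minimise redo.
INSERT /*+ APPEND */ INTO orders SELECT * FROM stg_orders_ext;
COMMIT;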
Can you do the transfer of separate tables simultaneously in parallel?
We regularly transfer large flat files into SQL Server and I run them in parallel - it uses more bandwidth on the network and SQL Server, but they complete together faster than in series.
Have you thought about scripting out the table schemas, creating them in Oracle, and then using SSIS to bulk-copy the data into Oracle? Another alternative would be to use linked servers and a series of "SELECT * INTO xxx" statements that would copy the schema and data over (minus key constraints), but I think the performance would be quite pitiful with 6,000 tables.
