I need to copy the tables and data (about 5 years of data, 6,200 tables) stored in SQL Server. I am using DataStage with an ODBC connection, and DataStage automatically creates each table along with its data, but it is taking 2-3 hours per table because the tables are very large (0.5 GB, 300+ columns and about 400k rows).
How can I do this as fast as possible? At this rate I can only copy 5 tables per day, but I need to move these 6,000 tables within 30 days.
6000 tables at 0.5 GB each would be about 3 terabytes. Plus indexes.
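To put numbers on that estimate, here is a rough sizing sketch in Python. It only uses the figures quoted in the question (6000 tables, 0.5 GB each, a 30-day window, 5 tables per day today) and assumes decimal units (1 TB = 1000 GB); index space is not included.

```python
# Back-of-the-envelope sizing from the figures quoted in the question.
TABLES = 6000                 # approximate number of tables
GB_PER_TABLE = 0.5            # approximate size of each table
DEADLINE_DAYS = 30            # migration window
CURRENT_TABLES_PER_DAY = 5    # observed rate with the current setup


def total_terabytes(tables, gb_per_table):
    """Total data volume in TB (1 TB = 1000 GB), excluding indexes."""
    return tables * gb_per_table / 1000


def required_mb_per_second(tables, gb_per_table, days):
    """Sustained throughput needed to finish inside the window."""
    seconds = days * 24 * 60 * 60
    return tables * gb_per_table * 1000 / seconds


def days_at_current_rate(tables, tables_per_day):
    """How long the migration takes if nothing changes."""
    return tables / tables_per_day


if __name__ == "__main__":
    print(total_terabytes(TABLES, GB_PER_TABLE), "TB in total")
    print(round(required_mb_per_second(TABLES, GB_PER_TABLE, DEADLINE_DAYS), 2), "MB/s needed")
    print(days_at_current_rate(TABLES, CURRENT_TABLES_PER_DAY), "days at the current rate")
```

The required sustained rate works out to a little over 1 MB/s, or 200 tables per day, while the current rate would need about 1200 days. That suggests the raw volume is not the problem and the per-table overhead is.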
I probably wouldn't go for an ODBC connection, but the question is where is the bottleneck.
You have an extract stage from SQL Server. You have the transport from the SQL Server box to the Oracle box. You have the load.
If the network is the limiting capability, you are probably best off extracting to a file, compressing it, transferring the compressed file, uncompressing it, and then loading it. External tables in Oracle are the fastest way to load data from flat file (delimited or fixed length), preferably spread over multiple physical disks to spread the load and without logging.
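As a rough illustration of the extract, compress, transfer, uncompress sequence, here is a small Python sketch using only the standard library. The file names and sample rows are made up, the transfer step itself is left out, and the final load (for example through an Oracle external table) is not shown.

```python
import csv
import gzip
import os
import shutil
import tempfile


def export_rows(rows, csv_path):
    """Write rows to a delimited flat file (stand-in for the SQL Server extract)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def compress(src_path, gz_path):
    """Gzip the extract before it crosses the network."""
    with open(src_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def decompress(gz_path, out_path):
    """Unpack on the destination box so the loader can read a plain flat file."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def roundtrip_demo(row_count=10_000):
    """Run extract -> compress -> decompress in a temp dir and report sizes."""
    workdir = tempfile.mkdtemp()
    raw = os.path.join(workdir, "table.csv")
    packed = os.path.join(workdir, "table.csv.gz")
    restored = os.path.join(workdir, "table_restored.csv")
    # Hypothetical sample rows; a real extract would come from the source table.
    rows = ((i, "customer", "2015-01-01") for i in range(row_count))
    export_rows(rows, raw)
    compress(raw, packed)
    decompress(packed, restored)
    with open(raw, "rb") as a, open(restored, "rb") as b:
        identical = a.read() == b.read()
    return os.path.getsize(raw), os.path.getsize(packed), identical
```

How much compression helps depends entirely on the data; repetitive, text-heavy tables usually shrink a lot, already-compressed or binary columns do not.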
Unless there's a significant transformation happening, I'd forget datastage. Anything that isn't extracting or loading is excess to be minimised.
Can you do the transfer of separate tables simultaneously in parallel?
We regularly transfer large flat files into SQL Server and I run them in parallel - it uses more bandwidth on the network and SQL Server, but they complete together faster than in series.
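A minimal sketch of running several table copies at once, in Python. The `copy_table` function is a hypothetical placeholder (here it just sleeps); in practice it would run whatever extract-and-load step you use for one table. The worker count of 4 is an arbitrary assumption to tune against your network and servers.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_table(table_name):
    """Hypothetical placeholder for the real per-table extract and load."""
    time.sleep(0.05)  # simulate the work being I/O bound
    return table_name


def copy_tables_in_parallel(table_names, max_workers=4):
    """Copy tables concurrently; return (completed names, failures by name)."""
    done = []
    failed = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_table, name): name for name in table_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                done.append(future.result())
            except Exception as exc:  # keep going if one table fails
                failed[name] = exc
    return done, failed
```

Collecting failures instead of stopping at the first one matters with thousands of tables, since a single bad table should not hold up the rest of the batch.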
Have you thought about scripting out the table schemas, creating them in Oracle, and then using SSIS to bulk-copy the data into Oracle? Another alternative would be to use linked servers and a series of "Select * INTO xxx" statements that would copy the schema and data over (minus key constraints), but I think the performance would be quite pitiful with 6000 tables.
I know 2008 is outdated, but a need is a need.
I am using SSIS 2008 and SQL Server 2008 R2.
My requirements are:
Load data from multiple source files using bulk load in the control flow (each file is a 20 GB .txt file, approximately 150 GB in total for 5 files).
There is a single SQL table (Tabledata, with 243 columns).
I tried bulk loading the 5 files with SSIS, but the loads were blocked for a long time. Any help is appreciated.
To get the best performance in loading a table, we want to shovel the data in by the truckload (table lock). The problem is, there's only room for one truck in the bay. Otherwise, if you want multiple feeds into a table at one time, you're likely looking at throwing data in by the shovelful. That way, 5 workers can be there and their loading won't block each other, but the throughput is lower.
If you're Enterprise Edition, or you want to go old school with a partitioned view on Standard Edition, then you could have each partition/individual table loaded in parallel and then you have N worker processes pouring the data in as fast as the disk subsystem allows and none of the contention you're currently experiencing.
As @David Browne points out, SQL Server supports parallel bulk loads to unindexed heap tables.
To get a bulk update (BU) lock, you need to specify the TABLOCK option with each bulk import stream; the streams can then load concurrently without blocking each other.
In your OLE DB Destination, this means using Fast Load (the default) and checking the Table lock checkbox. As the MSDN article calls out, the destination table needs to be empty or the lock will be IX-Tab and not BU-Tab.
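Outside of SSIS, the same idea can be expressed as one T-SQL BULK INSERT per file, each with TABLOCK. The Python sketch below only builds the statements; it does not execute them. The file paths, the tab field terminator and the `dbo` schema are assumptions for illustration, and each statement would need to run on its own connection to actually load in parallel.

```python
def build_bulk_insert_statements(table, file_paths, field_terminator="\\t"):
    """Build one BULK INSERT ... WITH (TABLOCK) statement per source file.

    The terminators are assumptions; adjust them to the real file layout.
    """
    statements = []
    for path in file_paths:
        safe_path = path.replace("'", "''")  # escape quotes for the T-SQL literal
        statements.append(
            f"BULK INSERT {table} FROM '{safe_path}' "
            f"WITH (TABLOCK, FIELDTERMINATOR = '{field_terminator}', "
            f"ROWTERMINATOR = '\\n');"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical paths for the five source files.
    files = [f"D:\\load\\file{i}.txt" for i in range(1, 6)]
    for statement in build_bulk_insert_statements("dbo.Tabledata", files):
        print(statement)
```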
Yes, you can load from multiple sources. I read from multiple sources at the same time, and write using the fast load option.
Need advice here: using Alteryx Designer, I'm pulling a large dataset from SQL Server (10M rows) and need to move it into a Greenplum DB.
I tried both connecting with Input Data (SQL Server) and Output Data (GP), and also with Connect In-DB (SQL Server) and Write Data In-DB (GP).
Either approach takes forever to complete, to the point that I have to cancel the process (to give an idea, over the weekend it ran for 18 hours and advanced no further than 1%).
Any good advice or tricks to speed up this sort of massive bulk data loading would be very highly appreciated!
I can make modifications on SQL Server and in Alteryx to increase performance, but not in Greenplum.
Thanks in advance.
Regards,
Erick
I'll break down the approaches that you're taking.
You won't be able to use the In-DB tools because the databases are different, so you can't push the processing onto the DB.
Using the standard Alteryx tools, you are bringing the whole table onto your machine and then pushing it out again. There are multiple ways this could be done, depending on where your blockage is.
Looking first at the extract from SQL, 10M rows isn't that much, so you could split the process and write the data out as a yxdb. If that fails or takes several hours, then you will need to look at the connection to the SQL Server or the resources available on the SQL Server.
Then for the push into Greenplum, there is no PostgreSQL bulk loader at present, so you can either just try to write the whole table, or you can write segments of the table into temp tables in Greenplum and then execute a command to combine those tables.
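The "write segments into temp tables, then combine" idea can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database standing in for Greenplum; the table and column names are hypothetical, and in Alteryx the same steps would be done with output tools and a post-SQL statement rather than this code.

```python
import sqlite3


def load_in_segments(conn, rows, segment_size):
    """Load rows into per-segment staging tables, then merge them into target."""
    conn.execute("CREATE TABLE target (id INTEGER, payload TEXT)")
    staging_tables = []
    for start in range(0, len(rows), segment_size):
        name = f"stage_{len(staging_tables)}"  # hypothetical staging table name
        conn.execute(f"CREATE TEMP TABLE {name} (id INTEGER, payload TEXT)")
        conn.executemany(
            f"INSERT INTO {name} VALUES (?, ?)",
            rows[start:start + segment_size],
        )
        staging_tables.append(name)
    # Combine step: a set-based INSERT ... SELECT per staging table.
    for name in staging_tables:
        conn.execute(f"INSERT INTO target SELECT * FROM {name}")
        conn.execute(f"DROP TABLE {name}")
    conn.commit()
    return len(staging_tables)
```

The benefit of the segments is that a failed or slow chunk can be retried on its own, and the final merge runs entirely inside the database.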
We pull millions of rows daily from SQL Servers into Greenplum using an open source tool called Outsourcer. It's a great tool and takes care of cleansing and more. We have been using it for the past 3.5 years with no issues so far. It takes care of all the parallelism, and millions of rows are loaded within minutes.
It supports incremental or full loads. If you need support, Jon Robert, the owner of Outsourcer, will respond to your email within minutes. Here is the link for the tool:
https://www.pivotalguru.com/
I have two SQL Server environments, a data warehouse which collects data and a datamart which people access for a subset of the data, each with their own SQL Server 2016 databases. I run a script which pulls out data, transforms it and transfers it from the data warehouse to the datamart using Linked Servers. The entire process takes around 60+ hours to run. I want to avoid at all costs having the data warehouse data in the datamart.
I experimented to see why the whole process was taking so long. I did a backup of the data warehouse, restored it onto the datamart and ran the import script, and the entire process took around 3 hours to run. The script itself took 1.5 hours, which tells me that of the 60+ hours, the linked server transfer of data between the two servers is the slowest part. I've pretty much ruled out network speed or issues between the two servers; this is all SQL. I'm trying to avoid having to write an application to do all of this in .NET if I can keep it in SQL Server.
Does anyone have any suggestions on how to improve performance time between SQL Server transfers?
The slowness could be coming from the destination database.
Try disabling triggers, indexes, locks, etc.
This link may help more:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/e04a8e21-54a9-46a4-8eb2-67da291dc7e1/slow-data-transfer-through-linked-server?forum=transactsql
We are transferring data to an Oracle Database from two different sources and it's extremely slow.
Please see notes and images below. Any suggestions?
Notes:
We're using the Microsoft OLE DB Provider for Oracle.
One data source is SQL Server and includes about 5M records.
The second data source is Oracle and includes about 700M records.
When trying to transfer the SQL Server data, we broke it up into
five "Data Flow Tasks" in the "Control Flow". Each "Data Flow Task"
in turn uses an "OLE DB Source", which internally uses a "SQL command"
that effectively selects 1M of the 5M records. When we ran this
package, the first data flow task ran for about 3 hours and had only
transferred about 50,000 records when we ended the process.
We had a similar experience with the Oracle data as well.
For some reason saving to an Oracle Destination is extremely slow.
Interestingly, we once transferred the same 700M records from Oracle to
SQL Server (so the opposite direction) and it worked as expected in
about 4.5 to 5 hours.
On the Oracle side you can examine v$session to see where the time is being spent (if AWR is licensed on the Oracle instance you can use DBA_HIST_ACTIVE_SESS_HISTORY or v$active_session_history).
I work on Oracle performance problems every day (over 300 production Oracle instances), so I feel qualified to say that I can't give you a specific answer to your question, but I can point you in the right direction.
Typical process mistakes that make inserts slow:
not using array insert
connecting to the DB for each insert (sound strange? Believe me, I've seen DataStage and other ETL tools set up this way)
app server/client not on the same local area network as the Oracle instance
indexes on the table(s) being inserted into (especially problematic with bitmap indexes); this requires an index update and a table update per statement
redo log files too small on the Oracle instance (driving up redo log file switching)
log_buffer parameter on the DB side too small
not enough DB writers (see the db_writer_processes initialization parameter)
committing too often
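Three of the items above (array inserts, one connection for the whole load, fewer commits) can be illustrated with a short Python sketch. SQLite from the standard library is used here purely as a stand-in for Oracle, and the `target` table and the batch size of 10,000 are assumptions; with a real Oracle driver the equivalent would be its own array or batch insert call.

```python
import sqlite3
from itertools import islice


def chunks(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, rows, batch_size=10_000):
    """Reuse one connection, send rows as arrays, commit once per batch."""
    inserted = 0
    for chunk in chunks(rows, batch_size):
        # executemany sends the whole chunk instead of one statement per row.
        conn.executemany("INSERT INTO target (id, payload) VALUES (?, ?)", chunk)
        conn.commit()  # one commit per batch, not per row
        inserted += len(chunk)
    return inserted
```

The right batch size depends on row width and driver; the point is only that per-row round trips and per-row commits are what make the slow case slow.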
Not an answer, just a bunch of observations and questions...
Any one of the components in the data pipeline could be the bottleneck.
You first need to observe the row counts when running interactively in SSIS and see if there is any obvious clogging going on - i.e. do you have a large rowcount right before your Data conversion transformation and a low one after? Or is it at the Oracle destination? Or is it just taking a long time to come out of SQL? A quick way to check the SQL side is to dump it to a local file instead - that mostly measures the SQL select performance without any blocking from Oracle.
When you run your source query in SQL Server, how long does it take to return all rows?
Your data conversion transformation can be performed in the source query. Every transformation requires set-up of buffers, memory etc. and can slow down and block your dataflow. Avoid these and do the conversion in the source query instead.
Various buffers and configuration settings exist in the Oracle driver; these were already addressed in detail by @RogerCornejo. For read performance out of Oracle, I have found that altering FetchBufferSize made a huge difference, but you are doing writes here, so that doesn't apply.
Lastly, where are the two database servers and the SSIS client tool situated network wise? If you are running this across three different servers then you have network throughput to consider.
If you use a linked server as suggested, note that SSIS doesn't do any processing at all, so you take that whole piece out of the equation.
And if you're just looking for the fastest way to transfer data, you might find that dumping to a file and bulk inserting is the fastest.
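The isolation test mentioned earlier (dump the source query to a local file and time it) can be sketched like this in Python. SQLite stands in for the real source database, and the `src` table, the query and the fetch size are all hypothetical; the idea is only to measure read throughput with the destination taken out of the picture.

```python
import csv
import os
import sqlite3
import tempfile
import time


def time_query_to_file(conn, query, out_path, fetch_size=10_000):
    """Stream a query result into a CSV file; return (rows written, seconds)."""
    start = time.perf_counter()
    rows_written = 0
    cursor = conn.execute(query)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        while True:
            chunk = cursor.fetchmany(fetch_size)
            if not chunk:
                break
            writer.writerows(chunk)
            rows_written += len(chunk)
    return rows_written, time.perf_counter() - start


def demo(row_count=5000):
    """Build a throwaway source table and time dumping it to a temp file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (id INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO src VALUES (?, ?)",
                     ((i, "x") for i in range(row_count)))
    out_path = os.path.join(tempfile.mkdtemp(), "dump.csv")
    return time_query_to_file(conn, "SELECT * FROM src", out_path, 1000)
```

If the dump alone is slow, the problem is on the source side; if it is fast, look at the destination or the network.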
Thank you all for your suggestions. For those who may run into a similar problem in the future, I'm posting what finally worked for me. The answer was ... switching the provider. The ODBC or Attunity providers were much faster, by a factor of almost 800X.
Remember that my goal was to move data from a SQL Server Database to an Oracle database. I originally used an OLE DB provider for both the source and destination. This provider works fine if you are moving data from SQL Server to SQL Server because it allows you to use the "Fast Load" option on the destination which in turn allows you to use batch processing.
However, the OLE DB provider doesn't allow the "Fast Load" option with an Oracle DB as the destination (couldn't get it to work and read elsewhere that it doesn't work). Because I couldn't use the "Fast Load" option I couldn't batch and instead was inserting records row by row which was extremely slow.
A colleague suggested trying ODBC and others suggested trying Microsoft's Attunity Connectors for Oracle. I didn't think the difference would be so great because in my experience ODBC had similar (and sometimes worse) performance compared to OLE DB (I hadn't tried Attunity). BUT... that was when moving data from and to a SQL Server database or staying in the Microsoft world.
When moving data from a SQL Server database to an Oracle database, there was a huge difference! Both ODBC and Attunity outperformed OLE DB dramatically.
Here were my summarized performance test results inserting 5.4M records from a SQL Server database to an Oracle Database.
When doing all the work on one local computer.
OLE DB source and destination inserted 12 thousand records per minute which would have taken approx. 7 hours to complete.
ODBC source and destination inserted 9 Million records per minute which only took approx. 30 seconds to complete.
When moving data from one network/remote computer to another network/remote computer.
OLE DB source and destination inserted 115 records per minute which would have taken approx. 32 days to complete.
ODBC source and destination inserted 1 Million records per minute which only took approx. 5 minutes to complete.
Big difference!
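For anyone who wants to check the arithmetic behind those figures, here is a small Python sketch. It uses only the record count and the rates reported above; nothing here was measured independently.

```python
TOTAL_RECORDS = 5_400_000  # records inserted in the test

# Records per minute, as reported in the results above.
RATES_PER_MINUTE = {
    ("local", "OLE DB"): 12_000,
    ("local", "ODBC"): 9_000_000,
    ("remote", "OLE DB"): 115,
    ("remote", "ODBC"): 1_000_000,
}


def minutes_to_load(total_records, records_per_minute):
    """Projected load time in minutes at a constant rate."""
    return total_records / records_per_minute


def speedup(slow_rate, fast_rate):
    """How many times faster the fast provider is."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    for (where, provider), rate in RATES_PER_MINUTE.items():
        minutes = minutes_to_load(TOTAL_RECORDS, rate)
        print(f"{where:6} {provider:6} {minutes:10.1f} min")
    print("local speedup:", speedup(12_000, 9_000_000))
    print("remote speedup:", round(speedup(115, 1_000_000)))
```

The projections come out at 7.5 hours and 36 seconds locally, and about 32.6 days and 5.4 minutes remotely. The local ratio is 750x, which matches the "almost 800X" figure; the remote ratio is far larger, roughly 8,700x.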
Why it took only 30 seconds locally but 5 minutes remotely is an issue for another day (it should be slower over the network, but I'm surprised it's that much slower). For now I have something workable.
Thanks again to everyone!
Extra notes:
My OLE DB results were similar with either Microsoft's or Oracle OLE DB providers for Oracle databases.
Attunity was a little faster than ODBC. I didn't get to test on remote servers or on a larger data set, but locally it was consistently about 2 to 3 seconds faster than ODBC. Those seconds could add up on a large data set, so take note.
What is the fastest method to fill a database table with 10 million rows? I'm asking about the technique but also about any specific database engine that would allow for a way to do this as fast as possible. I'm not requiring this data to be indexed during this initial data-table population.
Using SQL to load a lot of data into a database will usually result in poor performance. In order to do things quickly, you need to go around the SQL engine. Most databases (including Firebird I think) have the ability to backup all the data into a text (or maybe XML) file and to restore the entire database from such a dump file. Since the restoration process doesn't need to be transaction aware and the data isn't represented as SQL, it is usually very quick.
I would write a script that generates a dump file by hand, and then use the database's restore utility to load the data.
After a bit of searching I found FBExport, that seems to be able to do exactly that - you'll just need to generate a CSV file and then use the FBExport tool to import that data into your database.
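Generating the CSV itself is the easy part. Here is a minimal Python sketch that streams rows straight to disk so memory use stays flat regardless of row count. The three columns and their value ranges are invented for illustration; the import step (FBExport or any other bulk loader) is not shown.

```python
import csv
import os
import random
import tempfile


def write_test_csv(path, row_count, seed=0):
    """Stream `row_count` synthetic rows to a CSV file with a header line."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "amount"])  # hypothetical columns
        for i in range(1, row_count + 1):
            writer.writerow([
                i,
                f"name_{rng.randrange(1000)}",
                round(rng.uniform(0, 1000), 2),
            ])
    return row_count


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "rows.csv")
    write_test_csv(out, 10_000_000)  # the 10 million rows from the question
    print("wrote", out)
```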
The fastest method is probably running an INSERT SQL statement with a SELECT FROM. I've generated test data to populate tables from other databases, and even from the same database, a number of times. But it all depends on the nature and availability of your own data. In my case I had enough rows of collected data that a few select/insert routines with random row selection, applied half-cleverly against real data, yielded decent test data quickly. In some cases where table data was uniquely identifying, I used intermediate tables and frequency distribution sorting to eliminate things like uncommon names (eliminating instances where a count with group by was less than or equal to 2).
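A small sketch of that INSERT ... SELECT approach, including the random row selection and the "drop names that occur 2 times or fewer" filter. It uses Python with SQLite as a stand-in engine, and the `real_people` and `test_people` tables are hypothetical; the SQL would need adjusting for another database (for example, the random ordering function differs between engines).

```python
import sqlite3


def build_test_table(conn, sample_size, min_name_count=3):
    """Copy a random sample of rows whose name occurs at least min_name_count times."""
    conn.execute(
        """
        INSERT INTO test_people (name, city)
        SELECT name, city
        FROM real_people
        WHERE name IN (
            SELECT name FROM real_people
            GROUP BY name
            HAVING COUNT(*) >= ?      -- skip names seen 2 times or fewer
        )
        ORDER BY RANDOM()             -- random row selection
        LIMIT ?
        """,
        (min_name_count, sample_size),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM test_people").fetchone()[0]
```

Because the whole copy is one set-based statement, no rows travel through the client at all.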
Also, Red Gate actually provides a utility to do just what you're asking. It's not free and I think it's SQL Server-specific, but their tools are top notch. Well worth the cost. There's also a free trial period.
If you don't want to pay for their utility, you could conceivably build your own pretty quickly. What they do is not magic by any means. A decent developer should be able to knock out a similarly-featured though alpha/hardcoded version of the app in a day or two...
You might be interested in the answers to this question. It looks at uploading a massive CSV file to a SQL server (2005) database. For SQL Server, it appears that a SSIS DTS package is the fastest way to bulk import data into a database.
It entirely depends on your DB. For instance, Oracle has something called direct path load (http://download.oracle.com/docs/cd/B10501_01/server.920/a96652/ch09.htm), which effectively disables indexing, and if I understand correctly, builds the binary structures that will be written to disk on the -client- side rather than sending SQL over.
Combined with partitioning and rebuilding indexes per partition, we were able to load a 1 billion row (I kid you not) database in a relatively short order. 10 million rows is nothing.
Use MySQL or MS SQL and embedded functions to generate records inside the database engine. Or generate a text file (in a CSV-like format) and then use bulk copy functionality.