We are building a DWH, and the initial load will be millions of rows (a few tables have around 300 million rows). The data will later be updated every 10 minutes using an SSIS package, moving a few thousand rows at a time. The data migration is from Oracle to SQL Server.
Can you suggest an efficient way of extracting the data initially? Is the SQL Server Import and Export wizard a good, faster option than SSIS for the initial load?
Thanks
First: the SQL Server Import and Export wizard creates an SSIS package "under the covers".
I recently had to solve the same problem - our Oracle-to-SQL Server replication infrastructure cratered and we had to rebuild it, which involved initial table loads of the same size that you describe. We used SSIS packages for all of them, and the performance was sufficient to complete the task in the window we had available.
Another option to consider would be getting the Oracle data as a flat file export and importing it with BCP, if the Oracle data are clean enough. If you go that route, though, I'm afraid that others will need to assist - I can barely spell "BCP".
I just extracted and loaded 24.5 million rows from an Oracle DB to SQL Server in 9 minutes, which I found super awesome!
Solution: use the Attunity connector for Oracle and change the batch size to whatever suits you (1000/5000/10000); 1000 worked for me (the default is 100).
I'm working with an SSIS package that imports from an Oracle table to a SQL Server table. For this, I had to put a Data Conversion transformation in between.
With the current setup, the OLE DB Source retrieves the complete table, the rows are converted by the Data Conversion, and they are then sent to the OLE DB Destination.
The table I'm trying to import has around 7.3 million records with 53 columns.
I need to know how I should set this up (or what changes to make to the current setup) to speed this process up as much as possible.
This package is going to run as a scheduled job in SQL Server Agent.
The last run inserted 78k records in 15 minutes; at this pace it is too slow.
I believe I have to tune the "Rows per batch" and "Maximum insert commit size" settings, but looking around I haven't found information about what values should work, and the different settings I've tried haven't made any actual difference.
UPDATE: After a bit more testing, the delay is in getting the records out of Oracle, not in inserting them into SQL Server. I need to check how I can improve this.
I think the main problem is not loading data into SQL Server; check the OLE DB provider you are using to extract data from Oracle.
There are several suggestions you can try:
Use the Attunity connectors, which are the fastest ones available.
Make sure you are not using the old Microsoft OLE DB Provider for Oracle (part of MDAC). Use the Oracle Provider for OLE DB (part of ODAC) instead.
If that doesn't work, try using an ODBC connection / ODBC Source to read the data from Oracle.
I have more than 100 million records of data in a txt file. I would like to import them into SQL Server. Which one should I choose between the data loader and SSIS? Thank you so much!
Assuming you are referring to the Import Data wizard when you say "the data loader", it is just a wizard that creates an SSIS package to import your data. You even get the option to save your import as an SSIS package at the end of the process.
If you care more about the speed of the import, for 100 million records within a text file you would probably (but not definitely) be better off using the Bulk Copy Program (BCP) utility provided by Microsoft.
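If you do go the bulk-load route, the T-SQL BULK INSERT statement is the in-database counterpart of the bcp utility. A minimal sketch, assuming a hypothetical pre-created staging table and a tab-delimited text file; the path, delimiters and batch size are placeholders you would adjust to match the actual export:

    -- Hypothetical staging table and file path; adjust the columns,
    -- delimiters and batch size to match the actual export file.
    BULK INSERT dbo.StagingRecords
    FROM 'D:\exports\records.txt'
    WITH (
        FIELDTERMINATOR = '\t',    -- column delimiter in the text file
        ROWTERMINATOR   = '\n',    -- row delimiter (match the file's line endings)
        BATCHSIZE       = 100000,  -- commit in chunks instead of one huge transaction
        TABLOCK                    -- table lock allows a minimally logged load
    );

The bcp utility itself takes equivalent switches (-b for the batch size, -h "TABLOCK" for the lock hint) if you prefer to run the load from the command line.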
Edit following comments
From what I can see, DataLoader.io is a Salesforce-only tool. It seems you cannot use it to load data into SQL Server. In this case, out of the two options you have suggested, SSIS is the only viable option. Whether or not SSIS is suitable for your current and ongoing situation, however, is a larger discussion not really suited to the Stack Overflow format.
I have a production database with 20 TB of data. We migrated our database from Oracle to SQL Server. Our old application was based on a COBOL-based platform. After migrating to SQL Server, the indexes are giving good results.
I am creating a schema with a new set of indexes but without any data. Now I want to migrate only the data.
The Import/Export utility will take a long time to load and will also fill up the log files. Is there any alternative to this?
My advice would be:
Set the recovery model to simple.
Remove the indexes.
Batch insert the rows or use SELECT INTO, which minimizes logging (see the sketch after this list).
Re-create the indexes.
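A rough T-SQL sketch of those steps, assuming a hypothetical DWH database, a dbo.FactSales target and a staging table that already holds the migrated rows (all names are placeholders):

    -- 1. Simple recovery model so the bulk load is minimally logged.
    ALTER DATABASE DWH SET RECOVERY SIMPLE;

    -- 2. Drop the nonclustered indexes on the target before loading.
    DROP INDEX IX_FactSales_SaleDate ON dbo.FactSales;

    -- 3. Batch insert; TABLOCK on an empty heap allows a minimally logged insert.
    --    SELECT ... INTO a brand-new table achieves the same effect.
    INSERT INTO dbo.FactSales WITH (TABLOCK)
    SELECT *
    FROM dbo.FactSales_Staging;

    -- 4. Re-create the indexes once the data is in place.
    CREATE INDEX IX_FactSales_SaleDate ON dbo.FactSales (SaleDate);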
I admit that I haven't had to do this sort of thing in a long time in SQL Server. There may be other methods that are faster -- such as backing up a table space/partition and restoring it in another location.
You may use the bcp utility to import/export the data. For full details, see https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017
I need to copy large amounts of data from an Oracle database to a SQL Server database. What is the fastest way to do this?
I am looking at data that takes 60-70 GB of storage in Oracle. There are no particular restrictions on the method I use. I can use SQL Server Management Studio, the SQL Server import/export program, a .NET app, the developer interface in Oracle, third-party tools, or anything else. I just need to move the data as quickly as possible.
The data is geographically organized. The data for each state is updated separately in the Oracle database and can be moved over to SQL Server on its own, so the entire volume of data will rarely be moved all at once.
So what suggestions would people have?
The fastest way to insert large amounts of data into SQL Server is with SQL Server bulk insert. Common bulk insert techniques are:
T-SQL BULK INSERT statement
BCP command-line utility
SSIS package OLE DB destination with the fast load option
ODBC bcp API from unmanaged code
OLE DB IRowsetFastLoad from unmanaged code
SqlBulkCopy from a .NET application
T-SQL BULK INSERT and the command-line BCP utility use a flat file source, so the implication is that you'll need to export the data to files first. The other methods can use Oracle SELECT query results directly without the need for an intermediate file, which should perform better overall as long as source/destination network bandwidth and latency aren't a concern.
With SSIS, one would typically create a data flow task for each table to be copied, with an OLE DB source (Oracle) and an OLE DB destination (SQL Server). The Oracle source provider can be downloaded separately depending on the SSIS version; the latest is the Microsoft Connector v4.0 for Oracle. The SSMS import wizard can be used to generate an SSIS package for the task, which may be run immediately and/or saved and customized as desired. For example, you could create a package variable for the state to be copied and use that in the source SELECT query and in a target DELETE query prior to refreshing the data. That would allow the same package to be reused for any state.
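As a sketch of what that parameterization might look like (the table and column names are hypothetical, and the ? placeholders stand for wherever the package variable is mapped, for example via parameter mapping or an expression):

    -- Target-side cleanup, e.g. in an Execute SQL Task before the data flow;
    -- the parameter carries the state code to be refreshed.
    DELETE FROM dbo.CustomerAddress
    WHERE StateCode = ?;

    -- Oracle-side source query for the data flow, limited to the same state.
    SELECT customer_id, address_line1, city, state_code, postal_code
    FROM customer_address
    WHERE state_code = ?;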
OLE DB IRowsetFastLoad or ODBC bcp calls should perform similarly to SSIS, but you might be able to eke out some additional performance gains with a lot of attention to detail. However, using these APIs is not trivial unless you are already familiar with C++ and the APIs.
SqlBulkCopy is fast (generally millions of rows per minute), which is good enough performance for most applications without the additional complexity of unmanaged code. It will be best to use the Oracle managed provider for the source SELECT query rather than an ODBC or OLE DB provider in .NET code.
My recommendation is that you consider not only performance but also your existing skill set.
I actually used the Microsoft SQL Server Migration Assistant (SSMA) from MS once for this, and it did what it promised to do:
SQL Server Migration Assistant for Oracle (documentation)
Microsoft SQL Server Migration Assistant v6.0 for Oracle (download)
SQL Server Migration Assistant (SSMA) Team's Blog
However, in my case it was not as fast as I would have expected for an 80 GB Oracle DB (4 hours or so), and I had to do some manual steps afterwards, but the application was developed in hell anyway (one table had 90+ columns and 100+ indexes).
I have a question regarding Apache Sqoop. I have imported data into HDFS using the Apache Sqoop import facility.
Next, I need to put the data into another database (basically I am transferring data from one database vendor to another) using Hadoop (Sqoop).
To put the data into SQL Server, there are two options:
1) Use the Sqoop export facility to connect to my RDBMS (SQL Server) and export the data directly.
2) Copy the HDFS data files (which are in CSV format) to my local machine using the copyToLocal command and then run BCP (or a BULK INSERT query) on those CSV files to load the data into the SQL Server database; a rough sketch of this option follows the list.
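For option 2, the bulk load step could look roughly like the following; the path, table name and delimiters are placeholders for whatever the Sqoop export actually produced (Sqoop's default field delimiter is a comma, and HDFS files use Unix line endings):

    -- Load one Sqoop-exported part file into a staging table.
    BULK INSERT dbo.StagingOrders
    FROM 'C:\hdfs_export\part-m-00000'
    WITH (
        FIELDTERMINATOR = ',',     -- Sqoop's default field delimiter
        ROWTERMINATOR   = '0x0a',  -- LF-only line endings from HDFS
        TABLOCK                    -- minimally logged load into an empty heap
    );

You would repeat this (or the equivalent bcp command) for each part file the export produced.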
I would like to understand which is the correct approach and which of the two is faster - the bulk insert or the Apache Sqoop export from HDFS into the RDBMS?
Are there any other ways, apart from the two mentioned above, that can transfer the data faster from one database vendor to another?
I am using 6-7 mappers (the number of records to be transferred is around 20-25 million).
Please advise, and kindly let me know if my question is unclear.
Thanks in advance.
If all you are doing is ETL from one vendor to another, then going through Sqoop/HDFS is a poor choice. Sqoop makes perfect sense if the data originates in HDFS or is meant to stay in HDFS. I would also consider Sqoop if the data set is so large as to warrant a big cluster for the transformation stage. But a mere 25 million records is not worth it.
With SQL Server, it is imperative on large imports to achieve minimal logging, which requires a bulk insert. Although 25 million rows is not so large as to make the bulk option imperative, AFAIK neither Sqoop nor Sqoop2 supports bulk insert for SQL Server yet.
I recommend SSIS instead. It is much more mature than Sqoop, it has a bulk insert task, and it has a rich transformation feature set. Your small import is well within the size SSIS can handle.