Sqoop Export into SQL Server vs. Bulk Insert into SQL Server - sql-server

I have a question regarding Apache Sqoop. I have imported data into HDFS using the Sqoop import facility.
Next, I need to put the data into another database (essentially I am transferring data from one database vendor to another) using Hadoop (Sqoop).
To put data into SQL Server, there are two options:
1) Use the Sqoop export facility to connect to my RDBMS (SQL Server) and export the data directly (a rough command sketch is below).
2) Copy the HDFS data files (which are in CSV format) to my local machine using the copyToLocal command, then run BCP (or a BULK INSERT query) on those CSV files to load the data into the SQL Server database.
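Option 1 boils down to a single Sqoop command along these lines; the connection string, credentials, table name, and HDFS path below are placeholders rather than my real values:

    # Option 1: push the HDFS files into SQL Server over JDBC.
    sqoop export \
      --connect "jdbc:sqlserver://sqlhost:1433;databaseName=TargetDb" \
      --username etl_user --password-file /user/etl/sqlserver.pw \
      --table TargetTable \
      --export-dir /user/etl/imported_data \
      --input-fields-terminated-by ',' \
      --num-mappers 6 \
      --batch

Option 2 would replace this with hadoop fs -copyToLocal followed by a BCP or BULK INSERT load of the resulting CSV files.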
I would like to understand which is the correct approach, and which of the two is faster - the bulk insert or the Sqoop export from HDFS into the RDBMS?
Are there any other ways, apart from the two mentioned above, that can transfer data faster from one database vendor to another?
I am using 6-7 mappers (around 20-25 million records need to be transferred).
Please suggest, and kindly let me know if my question is unclear.
Thanks in advance.

If all you are doing is ETL from one vendor to another, then going through Sqoop/HDFS is a poor choice. Sqoop makes perfect sense if the data originates in HDFS or is meant to stay in HDFS. I would also consider Sqoop if the data set is so large as to warrant a large cluster for the transformation stage. But a mere 25 million records is not worth it.
With SQL Server imports it is imperative, on large loads, to achieve minimal logging, which requires a bulk insert. Although 25 million rows is not so large as to make the bulk option imperative, AFAIK neither Sqoop nor Sqoop2 supports bulk insert for SQL Server yet.
I recommend SSIS instead. It is much more mature than Sqoop, it has a Bulk Insert task, and it has a rich transformation feature set. Your small import is well within the size SSIS can handle.
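If you do end up going through flat files rather than SSIS, the minimally logged path referred to above looks roughly like this; the host, login, staging table, and paths are illustrative only, and the CSV has to be readable from the SQL Server machine:

    # Pull the Sqoop-imported CSV out of HDFS, then copy it onto the SQL Server box
    # (BULK INSERT reads the file from the server's own filesystem).
    hdfs dfs -getmerge /user/etl/imported_data ./import.csv

    # Minimally logged load: heap staging table, TABLOCK hint, BATCHSIZE to keep batches small.
    sqlcmd -S sqlhost -d TargetDb -U etl_user -P "$SQL_PWD" -Q "
      BULK INSERT dbo.StagingTable
      FROM 'C:\loads\import.csv'
      WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 5000, TABLOCK);"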

Related

Data warehouse initial load from Oracle to SQL Server

We are building a DWH and the initial load will be millions of rows (a few tables have around 300 million rows). The data will later be updated every 10 minutes via an SSIS package, in batches of a few thousand rows. The data migration is from Oracle to SQL Server.
Can you suggest an efficient way of extracting the data initially? Is the SQL Server Import and Export wizard a good and faster option than SSIS for the initial load?
Thanks
First: the SQL Server Import and Export wizard creates an SSIS package "under the covers".
I recently had to solve the same problem - our Oracle-to-SQL Server replication infrastructure cratered and we had to rebuild it, which involved initial table loads of the same size that you describe. We used SSIS packages for all of them, and the performance was sufficient to complete the task in the window we had available.
Another option to consider would be getting the Oracle data as a flat-file export and doing a BCP import, if the Oracle data are clean enough. If you go that route, though, I'm afraid that others will need to assist - I can barely spell "BCP".
I just extracted and loaded 24.5 million rows in 9 minutes from an Oracle DB to SQL Server, which I found super awesome!
Solution: use the Attunity connector for Oracle and change the batch size to whatever suits you (1000/5000/10000); 1000 worked for me (the default is 100).

Bulk Insert (BCP) into SQL Server vs. Sqoop Export into SQL Server

Which of the following options is better in terms of speed and performance for exporting data from Hive/HDFS to SQL Server?
1) Use the Sqoop export facility to connect to the RDBMS (SQL Server) and export the data directly.
2) Dump CSV files from Hive using the INSERT OVERWRITE LOCAL DIRECTORY command, then run BCP (or a BULK INSERT query) on those CSV files to load the data into the SQL Server database.
Or is there another, better option?
In my experience, I use bcp whenever I can. From what I can tell, it's the fastest way to shotgun data into a database, and it is configurable at a (somewhat) fine-grained level.
A couple of things to consider:
Use a staging table: no primary key, no indexes, just raw data.
Have a "consolidation" proc to move the data around after loading.
Use a batch size of about 5000 rows to start, but test if performance is of the utmost concern.
Make sure you increase your timeout (a rough sketch of this flow is below).
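Something along these lines, assuming a Hive source table and a bare staging table on the SQL Server side (all names, paths, and the consolidation proc here are placeholders):

    # 1. Dump the Hive table to local CSV files.
    hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/export'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
             SELECT * FROM my_hive_table;"

    # 2. bcp into the bare staging table: -c character mode, -t, field terminator,
    #    -b rows per batch (start around 5000), -h TABLOCK for minimal logging,
    #    -e to capture rejected rows.
    bcp TargetDb.dbo.StagingTable in /tmp/export/000000_0 \
        -S sqlhost -U etl_user -P "$SQL_PWD" \
        -c -t, -b 5000 -h "TABLOCK" -e /tmp/bcp_errors.log

    # 3. Run the consolidation proc to move the rows into the real table.
    sqlcmd -S sqlhost -d TargetDb -U etl_user -P "$SQL_PWD" -Q "EXEC dbo.ConsolidateStaging;"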

Mass export and import of SQL table rows

I need to export the data from 36 SQL tables containing 24GB of data into flat files, copy them to the client and import them there into the existing tables in his SQL database.
And I will need this for several customers (same tables, though).
How do I mass export and import data?
Is there a command line tool for this so I can write a script for repeated use?
You will find the basic knowledge here: Importing and Exporting Bulk Data.
What is bcp?
bcp.exe is the standard bulk import/export tool for MSSQL. Using SSIS packages is an alternative, but it brings a lot of overhead with it: it's a full ETL tool. In T-SQL there's also a BULK INSERT statement that you can use as an alternative to "bcp in", but I personally haven't played around enough to see which one is faster or more useful, etc.
See "bulk exporting" and "bulk importing" in Books Online for all the details.

Best way to migrate export/import from SQL Server to Oracle

I'm faced with needing access, for reporting, to some data that lives in Oracle and other data that lives in a SQL Server 2000 database. For various reasons these live on different sides of a firewall. We're now looking at doing an export/import from SQL Server to Oracle and I'd like some advice on the best way to go about it... The procedure will need to be fully automated and run nightly, so that excludes using the SQL developer tools. I also can't make a live link between the databases from our (Oracle) side, as the firewall is in the way. The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting.
What I'm thinking of is writing a monster query for SQL Server (which I mostly have already) that will denormalise and read out the data into a flat file using the SQL Server equivalent of sqlplus as a scheduled task, dump it into a well-known location, and then on the Oracle side have a cron job that copies the file down, loads it with SQL*Loader, rebuilds indexes, etc.
This is all doable, but very manual. Is there one tool, or a combination of FOSS or standard Oracle/SQL Server tools, that could automate this for me? The irreducible complexity is the query on one side and building the indexes on the other, but I would love not to have to write the CSV-dumping detail or the SQL*Loader script - just say "dump this view out to CSV" on one side and "truncate and insert into this table from CSV" on the other, without worrying about mapping column names and all the other arcane sqlldr voodoo...
Best practices? Thoughts? Comments?
Edit: I have 50+ columns, all of varying types and lengths, in my dataset, which is why I'd prefer not to have to write out how to generate and map every single column...
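For what it's worth, the manual pipeline described above would look roughly like this; the hosts, credentials, file names, and control file are illustrative:

    # 1. SQL Server side (scheduled task): dump the denormalising query/view to CSV.
    sqlcmd -S sqlserver_host -d ReportingDb -U rptreader -P "$SQL_PWD" \
           -Q "SET NOCOUNT ON; SELECT * FROM dbo.vw_denormalised" \
           -s"," -W -h -1 -o /exports/report.csv

    # 2. Push the file to the well-known location the Oracle side can reach.
    scp /exports/report.csv oracle_host:/loads/report.csv

    # 3. Oracle side (cron): reload with SQL*Loader.
    #    report.ctl holds the TRUNCATE option and the 50+ column mappings, written once and reused nightly.
    sqlldr userid=rpt/secret control=/loads/report.ctl data=/loads/report.csv \
           direct=true log=/loads/report.log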
"The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting."
You are really looking for an ETL tool. If you have no money in the till, I suggest you check out the Open Source Talend and Pentaho offerings.

Most efficient way to move a few SQL Server tables to SQLite?

I have a fairly large SQL Server database; I'd like to pull four tables out and dump them directly into an SQLite .db file for remote querying (via a nightly batch).
I was about to write a script to step through the tables (most likely on a Unix host, kicked off via cron), but there should be a simpler method to export the tables directly (SQLite is not an option in the included DTS Import/Export wizard).
What would the most efficient method of dumping the SQL Server tables to SQLite via batch be?
You could export your data from MS SQL to a text file with sqlcmd, and then import it with a bulk import in SQLite. Read this question and its answers to get an idea of how to do this in SQLite.
You could create a batch file and run this with cron, I guess.
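A minimal sketch of that approach, per table (the server, database, table, and file names are placeholders):

    # Export the table to CSV with sqlcmd: comma-separated (-s","), no headers (-h -1),
    # trailing spaces trimmed (-W).
    sqlcmd -S dbhost -d SourceDb -U reader -P "$SQL_PWD" \
           -Q "SET NOCOUNT ON; SELECT * FROM dbo.MyTable" \
           -s"," -W -h -1 -o /tmp/MyTable.csv

    # Bulk-import the CSV into the SQLite file, replacing the table contents.
    printf '%s\n' '.mode csv' 'DELETE FROM MyTable;' '.import /tmp/MyTable.csv MyTable' \
      | sqlite3 /data/remote.db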
If you were considering DTS, then you might be able to do it via ODBC: MSSQL -> ODBC -> SQLite
http://www.ch-werner.de/sqliteodbc/
