How does sqoop handle SQL Server DB locks? - database

On the subject of importing data into Sqoop from Microsoft SQL Server: how does Sqoop handle database locks when running import table commands?
More info:
Sqoop is using a JDBC driver.

Sqoop handles database locks by taking the locks its statements require and respecting conflicting locks acquired by other processes, the same as any other client.
What exactly are you worried about? An import is just ordinary SELECT statements issued over JDBC (and an export is ordinary INSERTs).
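For context, a hedged sketch of such an import (server, database, table, column, and path names are all placeholders): each mapper opens its own JDBC connection and runs a plain ranged SELECT, so the locks involved are whatever SQL Server takes for those reads. Newer Sqoop 1.4.x releases also offer --relaxed-isolation to run the mappers at read uncommitted, if you would rather take dirty reads than block on other writers.

sqoop import \
  --connect 'jdbc:sqlserver://dbserver:1433;database=MyDb' \
  --username myuser -P \
  --table Orders \
  --split-by OrderId \
  --num-mappers 4 \
  --relaxed-isolation \
  --target-dir /data/orders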

Related

Querying a HIVE Table from SQL Server 2016 or later

I'm trying to query my Hortonworks cluster Hive tables from SQL Server. My scenario below:
HDP 2.6, Ambari, HiveServer2
SQL Server 2016 Enterprise
Kerberos configuration for secure logins in HDP
I was reading about the PolyBase service in SQL Server 2016 (and, I suppose, later versions). However, according to the documentation, what this service does in SQL Server is act as a bridge to reach my HDFS and recreate external tables based on that data source.
What I'm expecting instead is to query Hive objects as if they were SQL Server objects, for example through a linked server.
Does someone have an example, or know whether this is possible between SQL Server and Hive?
Thanks so much
Hive acts more as a job compiler than a database. This means every SQL statement you write is translated into a job for Hadoop, sent over to the cluster, and executed there. From the user's perspective it looks like querying a table.
The approach already mentioned, reading the HDFS data source and re-creating it in SQL Server, is the correct one. Since Hive and the database server are different technologies, something like a linked server does not seem technically possible to me.
Hive nowadays provides a JDBC interface that can be used to connect to it. But even with Hive JDBC, every query ends up as a cluster job for distributed computing, running over the files in HDFS, creating a result set, and presenting it to you.
If you want to query Hive from SQL Server, you can download an ODBC driver (Microsoft or Hortonworks) and create a Data Source Name (DSN) for Hive. In the driver's Advanced options, check Use Native Query. Then just create a new linked server in SQL Server whose data source is the same name as the DSN defined in the ODBC driver.
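For illustration, registering such a linked server on top of the ODBC DSN could look roughly like this (the server, DSN, and login names below are made up; MSDASQL is the OLE DB provider for ODBC):

EXEC master.dbo.sp_addlinkedserver
    @server     = N'HadoopLinkedServer',
    @srvproduct = N'Hive',
    @provider   = N'MSDASQL',
    @datasrc    = N'HiveDSN';   -- must match the ODBC Data Source Name

EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname  = N'HadoopLinkedServer',
    @useself     = N'False',
    @rmtuser     = N'hiveuser',
    @rmtpassword = N'********';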
Write openquery something like:
select top 100 * from
openquery(HadoopLinkedServer,
'select column1, column2 from databaseInHadoop.tableInHadoop')

Sqoop Export into Sql Server VS Bulk Insert into SQL server

I have a question regarding Apache Sqoop. I have imported data into my HDFS files using the Apache Sqoop import facility.
Next, I need to put the data back into another database (basically I am performing a data transfer from one database vendor to another) using Hadoop (Sqoop).
To put data into SQL Server, there are 2 options.
1) Use the Sqoop export facility to connect to my RDBMS (SQL Server) and export the data directly.
2) Copy the HDFS data files (which are in CSV format) to my local machine using the copyToLocal command and then run BCP (or a Bulk Insert query) on those CSV files to put the data into the SQL Server database (rough sketches of both options below).
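For concreteness, the two options might look roughly like this (server names, credentials, paths, and table names are placeholders):

# Option 1: Sqoop export straight from HDFS into SQL Server
sqoop export \
  --connect 'jdbc:sqlserver://dbserver:1433;database=TargetDb' \
  --username myuser -P \
  --table TargetTable \
  --export-dir /user/me/export_csv \
  --input-fields-terminated-by ',' \
  --num-mappers 6

# Option 2: Copy the CSV part files locally, then bulk-load them with bcp
hdfs dfs -copyToLocal /user/me/export_csv/part-m-* ./csv/
bcp TargetDb.dbo.TargetTable in ./csv/part-m-00000 -S dbserver -U myuser -P 'secret' -c -t,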
I would like to understand which is the correct approach, and which of the two is faster: the bulk insert, or the Apache Sqoop export from HDFS into the RDBMS?
Are there any other ways, apart from the two mentioned above, that can transfer data faster from one database vendor to another?
I am using 6-7 mappers (around 20-25 million records are to be transferred).
Please suggest, and kindly let me know if my question is unclear.
Thanks in Advance.
If all you do is ETL from one vendor to another, then going through Sqoop/HDFS is a poor choice. Sqoop makes perfect sense if the data originates in HDFS or is meant to stay in HDFS. I would also consider sqoop if the set is so large as to warrant a large cluster for the transformation stage. But a mere 25 million records is not worth it.
With SQL Server, it is imperative on large imports to achieve minimal logging, which requires bulk insert. Although 25 million rows is not so large as to make the bulk option imperative, AFAIK neither sqoop nor sqoop2 supports bulk insert for SQL Server yet.
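For reference, a minimally logged load of one of the exported CSV files would be something along these lines (file path and table name are placeholders; minimal logging also assumes the usual preconditions such as the bulk-logged or simple recovery model and a heap or empty target):

BULK INSERT dbo.TargetTable
FROM 'C:\data\part-m-00000'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);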
I recommend SSIS instead. It is much more mature than sqoop, it has a Bulk Insert task, and it has a rich transformation feature set. Your small import is well within the size SSIS can handle.

Transfer data from SQL Server to PostgreSQL on Linux

What is the best way to transfer the data from SQL Server database on Windows to a PostgreSQL database on Linux?
The current SQL Server database has about 500,000 rows per table, and about 80 tables altogether.
Thanks!
I'm not familiar enough with SQL Server to assert this is the best way, but if you just need the data, you could try using ODBC and a foreign data wrapper:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers
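As an illustration, with tds_fdw (one of the SQL Server wrappers listed on that wiki page) the setup could look roughly like the following; the option names are those documented by tds_fdw, and the server, login, and table names are placeholders:

CREATE EXTENSION tds_fdw;

CREATE SERVER mssql_svr
    FOREIGN DATA WRAPPER tds_fdw
    OPTIONS (servername 'sqlserver-host', port '1433', database 'SourceDb');

CREATE USER MAPPING FOR current_user
    SERVER mssql_svr
    OPTIONS (username 'myuser', password 'secret');

CREATE FOREIGN TABLE customers (
    id   integer,
    name text
)
    SERVER mssql_svr
    OPTIONS (table_name 'dbo.customers');

-- Copy the rows into a local PostgreSQL table:
CREATE TABLE local_customers AS SELECT * FROM customers;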

Can Sqoop export create a new table?

It is possible to export data from HDFS to an RDBMS table using Sqoop.
But it seems we need to have an existing table.
Is there some parameter to tell Sqoop to do the 'CREATE TABLE' thing and export data into this newly created table?
If yes, is it going to work with Oracle?
I'm afraid Sqoop does not support creating tables in the RDBMS at the moment. Sqoop uses the table in the RDBMS to get metadata (the number of columns and their data types), so I'm not sure where Sqoop could get the metadata to create the table for you.
You can actually execute arbitrary SQL queries and DDL via sqoop eval, at least with MySQL and MSSQL. I'd expect it to work with Oracle as well. MSSQL example:
sqoop eval \
  --connect 'jdbc:sqlserver://<DB SERVER>:<DB PORT>;database=<DB NAME>' \
  --username <USERNAME> -P \
  --query "CREATE TABLE..."
I noticed you use Oracle too. Certain vendor-specific sqoop connectors support this, including the Oracle one. Sqoop's Oracle direct connect mode has an option to do it:
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_create_oracle_tables
24.8.5.4. Create Oracle Tables
-Doraoop.template.table=TemplateTableName
Creates OracleTableName by replicating the structure and data types of
TemplateTableName. TemplateTableName is a table that exists in Oracle
prior to executing the Sqoop command.
P.S. You'll have to use the --direct sqoop export option to activate sqoop direct mode, i.e. the 'Data Connector for Oracle and Hadoop' (a.k.a. OraOOP, its older name).
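Putting that together, a hedged sketch of such an export (connection string, schema, table, and path names are placeholders; the -D property is a Hadoop generic option and goes right after the tool name):

sqoop export \
  -Doraoop.template.table=SCOTT.TEMPLATE_TABLE \
  --direct \
  --connect jdbc:oracle:thin:@//orahost:1521/ORCL \
  --username SCOTT -P \
  --table SCOTT.NEW_TABLE \
  --export-dir /user/me/export_data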

Most efficient way to move a few SQL Server tables to SQLite?

I have a fairly large SQL Server database; I'd like to pull 4 tables out and dump them directly into an SQLite .db file for remote querying (via a nightly batch).
I was about to write a script to step through the tables (most likely on a Unix host, kicked off via cron), but there should be a simpler method to export the tables directly (SQLite is not an option in the included DTS Import/Export wizard).
What would the most efficient method of dumping the SQL Server tables to SQLite via batch be?
You could export your data from MS SQL with sqlcmd to a text file, and later import this with a bulk import in SQLite. Read this question and its answers to get an idea of how to do this in SQLite.
You could create a batch file and run this with cron, I guess.
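A minimal nightly script along those lines might look like this (server, database, table, and file names are placeholders; -h -1 suppresses headers, -s sets the column separator, and -W trims trailing spaces):

#!/bin/sh
# Dump one table to comma-separated text with sqlcmd...
sqlcmd -S dbserver -d SourceDb -U myuser -P 'secret' \
  -Q "SET NOCOUNT ON; SELECT * FROM dbo.MyTable" \
  -s"," -W -h -1 -o mytable.csv

# ...then bulk-import it into the SQLite database file.
sqlite3 remote.db <<'EOF'
.mode csv
.import mytable.csv MyTable
EOF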
If you were considering DTS, then you might be able to do it via ODBC: MSSQL -> ODBC -> SQLite.
http://www.ch-werner.de/sqliteodbc/
