Importing data from SQL Server to HIVE using SQOOP - sql-server

I am able to successfully import data from SQL Server to HDFS using sqoop. However, when it tries to link to HIVE I get an error, and I am not sure I understand the error correctly. This is the command:
sudo -u hdfs sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect "jdbc:sqlserver://XX.XX.X.X:1433;instanceName=data-engr-sql-svr; databaseName=AdventureWorks2019" \
--username sa \
--password XXXXXXXX \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--warehouse-dir "/user/hive/warehouse/AdventureWorks2019.db" \
--hive-import \
--create-hive-table \
--fields-terminated-by ',' \
--hive-table AdventureWorks2019.Production.TransactionHistory \
--table Production.TransactionHistory \
--split-by TransactionID \
-- --schema Production
I don't know how to handle schemas. Most of the tutorials use a dummy database without proper schemas, which is not helpful.
Error
21/03/31 08:52:47 INFO conf.HiveConf: Using the default value passed in for log id: 95e2b831-cfe5-4108-be0f-0df1d9a8797e
21/03/31 08:52:47 INFO session.SessionState: Updating thread name to 95e2b831-cfe5-4108-be0f-0df1d9a8797e main
21/03/31 08:52:47 INFO conf.HiveConf: Using the default value passed in for log id: 95e2b831-cfe5-4108-be0f-0df1d9a8797e
21/03/31 08:52:47 INFO ql.Driver: Compiling command(queryId=hdfs_20210331085247_050638e8-593a-4d01-8020-c40b7db8e66a): CREATE TABLE IF NOT EXISTS AdventureWorks2019.Production.TransactionHistory ( TransactionID INT, ProductID INT, ReferenceOrderID INT, ReferenceOrderLineID INT, TransactionDate STRING, TransactionType STRING, Quantity INT, ActualCost DOUBLE, ModifiedDate STRING) COMMENT 'Imported by sqoop on 2021/03/31 08:52:45' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' LINES TERMINATED BY '\012' STORED AS TEXTFILE
21/03/31 08:52:49 INFO hive.metastore: HMS client filtering is enabled.
21/03/31 08:52:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://cnt7-naya-cdh63:9083
21/03/31 08:52:49 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/03/31 08:52:49 INFO hive.metastore: Connected to metastore.
21/03/31 08:52:49 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
FAILED: SemanticException [Error 10255]: Invalid table name AdventureWorks2019.Production.TransactionHistory
21/03/31 08:52:49 ERROR ql.Driver: FAILED: SemanticException [Error 10255]: Invalid table name AdventureWorks2019.Production.TransactionHistory

Hive has no concept of a schema inside a database. In Hive, database and schema mean the same thing and the two terms can be used interchangeably.
So the problem is the three-part name database.schema.table. Use the two-part name database.table in Hive.
Read the documentation: Create/Drop/Alter/UseDatabase
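Applied to the command in the question, a minimal sketch might look like the one below. Only the --hive-table value changes, to the two-part database.table form. The source-side arguments are copied from the question, minus the sudo -u hdfs prefix. The function name and the Hive table name TransactionHistory are illustrative assumptions, and the sketch assumes a Hive database called AdventureWorks2019 already exists.

```shell
# Hypothetical wrapper so the command can be reviewed before it is run.
import_transaction_history() {
  # Connection settings are copied unchanged from the question.
  # Assumption: a Hive database named AdventureWorks2019 already exists.
  sqoop import \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --connect "jdbc:sqlserver://XX.XX.X.X:1433;instanceName=data-engr-sql-svr; databaseName=AdventureWorks2019" \
    --username sa \
    --password XXXXXXXX \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --warehouse-dir "/user/hive/warehouse/AdventureWorks2019.db" \
    --hive-import \
    --create-hive-table \
    --fields-terminated-by ',' \
    --hive-table AdventureWorks2019.TransactionHistory \
    --table Production.TransactionHistory \
    --split-by TransactionID \
    -- --schema Production
}
```

Calling import_transaction_history then runs the import. The SQL Server schema (Production) stays on the source side only; Hive never sees it.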

Related

Upsert data into SQL Server from pyspark code

I have a pyspark dataframe that I wanted to upsert into a SQL Server table. I was looking at df.write modes and I do not see any upsert option. Therefore I am trying to write the dataframe into HDFS as parquet format and then sqoop the file using --update-mode allowinsert. However, I keep getting the following error:
Got exception in update thread: com.microsoft.sqlserver.jdbc.SQLServerException: One or more values is out of range of values for the datetime2 SQL Server data type
I tried to write the file as csv just to check if the contents/timestamp in the file is out of range, however the timestamps are correct.
Has anybody been able to write the pyspark dataframe into a SQL Server table?
Here's the function to write DF to HDFS:
import logging

def write_df_to_hdfs(df, filename, hdfs_location_working):
    """
    Function to write delta records dataframe to HDFS
    """
    logging.info("Started writing delta records dataframe to hdfs")
    df.write.save(hdfs_location_working, format='parquet', mode='append', timestampFormat='YYYY-MM-dd hh:mm:ss.SSS', emptyValue="")
    logging.info("Successfully written delta records dataframe to hdfs")
Also, here's the sqoop command I'm using to write that data into SQL Server:
sqoop export \
-Dmapreduce.map.memory.mb=4096 \
-Dmapreduce.map.java.opts=-Xmx3000m \
-Dmapred.job.queuename=ici \
-Dsqoop.export.records.per.statement=30 \
-Dsqoop.export.statements.per.transaction=30 \
-libjars /opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p6.12486751/lib/sqoop/lib/sqljdbc.jar \
--connect "jdbc:sqlserver://*******.hosts.cloud.ford.com;databaseName=SQTDIAPM_AM;schema=dbo;" \
--username 'user' \
--password 'pwd' \
--export-dir <HDFS Path> \
--table <tablename> \
--input-null-string '""' \
--input-null-string '\\N' \
--input-null-non-string '\\N' \
--update-key col1,col2 \
--update-mode allowinsert \
--batch \
-m 40 \
--verbose
Appreciate your help!

Sqoop export from HDFS dir to sybase IQ failed

I am trying to export a file from an HDFS directory to a Sybase IQ table.
I have placed the Sybase driver in the sqoop lib path.
Sqoop command:
sqoop export \
--connect jdbc:sybase:Tds:sybasehost:port/DATABASE=OMEGA \
--username dummy \
--password dummy \
--driver com.sybase.jdbc4.jdbc.SybDriver \
--table omega_sybase_table \
--export-dir /user/cloudera/omega/output_files/ \
--input-fields-terminated-by ','
I am getting the below error and this export failed.
17/04/25 16:17:07 INFO mapreduce.Job: Task Id : attempt_1489579695153_4935_m_000002_1, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: java.sql.SQLException: SQL Anywhere Error -210: User 'another user' has the row in 'omega_sybase_table' locked
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:233)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.sql.SQLException: SQL Anywhere Error -210: User 'another user' has the row in 'omega_sybase_table' locked
at com.sybase.jdbc4.jdbc.SybConnection.getAllExceptions(Unknown Source)
Could someone help me fix this issue?
This happens because the sqoop export command runs multiple mapper tasks.
Sybase IQ only allows one connection at a time to write here, and the mapper tasks try to insert records into the Sybase IQ table in parallel, so they run into each other's locks (the -210 error above).
The solution is to use -m 1 in the sqoop export command.
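As a sketch, the export command from the question with the single-mapper setting added could look like this. All values are the placeholders from the question; only -m 1 is new, and the function name is a hypothetical wrapper.

```shell
# Hypothetical wrapper around the export command from the question.
export_omega_to_sybase_iq() {
  # -m 1 limits the job to one mapper, so only one connection
  # writes to omega_sybase_table at a time.
  sqoop export \
    --connect jdbc:sybase:Tds:sybasehost:port/DATABASE=OMEGA \
    --username dummy \
    --password dummy \
    --driver com.sybase.jdbc4.jdbc.SybDriver \
    --table omega_sybase_table \
    --export-dir /user/cloudera/omega/output_files/ \
    --input-fields-terminated-by ',' \
    -m 1
}
```

The trade-off is throughput: with one mapper the export is serialized, which is the price of avoiding the lock conflicts.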

sqoop import all to hive from db2 specific schema

I was trying to import all tables from a specific schema in DB2 using the command line below.
sqoop import-all-tables --username user --password pass \
--connect jdbc:db2://myip:50000/databs:CurrentSchema=testdb \
--driver com.ibm.db2.jcc.DB2Driver --fields-terminated-by ',' \
--lines-terminated-by '\n' --hive-database default --hive-import --hive-overwrite \
--create-hive-table -m 1;
I am stuck with the following error:
2017-05-02 09:21:18,474 ERROR - [main:] ~ Error reading database metadata:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: [jcc][10165][10051][4.11.77]
Invalid database URL syntax:
jdbc:db2://myip:50000/msrc:CurrentSchema=testdb. ERRORCODE=-4461,
SQLSTATE=42815 (SqlManager:43)
com.ibm.db2.jcc.am.SqlSyntaxErrorException: [jcc][10165][10051][4.11.77]
Invalid database URL syntax:
jdbc:db2://myip:50000/msrc:CurrentSchema=testdb. ERRORCODE=-4461,
SQLSTATE=42815
at com.ibm.db2.jcc.am.gd.a(gd.java:676)
at com.ibm.db2.jcc.am.gd.a(gd.java:60)
at com.ibm.db2.jcc.am.gd.a(gd.java:85)
at com.ibm.db2.jcc.DB2Driver.tokenizeURLProperties(DB2Driver.java:911)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:408)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:885)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.listTables(SqlManager.java:520)
at org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:95)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at java.util.StringTokenizer.nextToken(StringTokenizer.java:377)
at com.ibm.db2.jcc.DB2Driver.tokenizeURLProperties(DB2Driver.java:899)
... 13 more
Could not retrieve tables list from server
2017-05-02 09:21:18,696 ERROR - [main:] ~ manager.listTables() returned null
(ImportAllTablesTool:98)
Command:
sqoop import-all-tables \
--driver com.ibm.db2.jcc.DB2Driver \
--connect jdbc:db2://myip:50000/databs \
--username username --password password \
--hive-database default --hive-import -m 1 \
--create-hive-table --hive-overwrite
The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.
For the import-all-tables tool to be useful, the following conditions must be met:
Each table must have a single-column primary key.
You must intend to import all columns of each table.
You must not intend to use non-default splitting column, nor impose any conditions via a WHERE clause.
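The command in the answer drops the schema from the connect string. If the schema is still needed, one thing that may be worth trying is to end the property with a semicolon, since the IBM JCC driver's URL format terminates each property with one (jdbc:db2://host:port/database:property=value;). This is an assumption about the cause of the "Invalid database URL syntax" error and is not confirmed anywhere in this thread. The function name is hypothetical, and -P (prompt for the password) replaces the inline password from the question.

```shell
# Hypothetical wrapper; values are the placeholders from the question.
import_db2_schema() {
  # Assumption: the URL syntax error comes from the missing trailing ';'
  # after the currentSchema property. The URL is single-quoted so the
  # shell does not treat ';' as a command separator.
  sqoop import-all-tables \
    --driver com.ibm.db2.jcc.DB2Driver \
    --connect 'jdbc:db2://myip:50000/databs:currentSchema=testdb;' \
    --username user \
    -P \
    --fields-terminated-by ',' \
    --hive-database default --hive-import --hive-overwrite \
    --create-hive-table \
    -m 1
}
```

This sketch only addresses the URL syntax. Whether import-all-tables then restricts itself to tables in that schema is a separate question that would need checking.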

Apache Sqoop import qualified table from SQL Server

When I try to import a table from SQL Server using
sqoop import \
-m 1 \
--connect jdbc:sqlserver://Arwen:1433 \
--username=bods \
--password=*** \
--table datamart.dbo.fct_txn \
--compression-codec=snappy \
--as-avrodatafile \
--warehouse-dir=/user/tkidb
sqoop seems to create a wrong query syntax. Apparently it expects an unqualified table name. Then the brackets would work. How to tackle this?
16/06/25 07:44:55 INFO tool.CodeGenTool: Beginning code generation
16/06/25 07:44:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [datamart.dbo.fct_txn] AS t WHERE 1=0
16/06/25 07:44:57 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'datamart.dbo.fct_txn'.
Based on a query from error log:
SELECT t.* FROM [datamart.dbo.fct_txn] AS t WHERE 1=0
The problem is that the brackets wrap the whole qualified name, [datamart.dbo.fct_txn], so SQL Server treats it as a single object name. The correct syntax would be [datamart].[dbo].[fct_txn] or datamart.dbo.fct_txn. Try changing these two arguments:
--connect 'jdbc:sqlserver://Arwen:1433;database=datamart' \
--table fct_txn
If datamart is the default database for the user you are logging in as, then change only the --table argument.
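Put together with the rest of the command from the question, the two changes might look like the sketch below. The function name is hypothetical, and -P (prompt for the password) is used in place of the masked password. It assumes the table lives in the login's default schema, as in the question (dbo).

```shell
# Hypothetical wrapper around the import command from the question.
import_fct_txn() {
  # The database moves into the JDBC URL and --table takes the bare
  # table name, so Sqoop generates SELECT t.* FROM [fct_txn] AS t.
  # The URL is single-quoted because it contains ';'.
  sqoop import \
    -m 1 \
    --connect 'jdbc:sqlserver://Arwen:1433;database=datamart' \
    --username=bods \
    -P \
    --table fct_txn \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/tkidb
}
```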

Issue using Sqoop to import data from Sybase

I am trying to import data from Sybase using Sqoop. From the logs I can say that I have managed to make a connection successfully.
But my job fails with a SQL exception from Sybase. I don't primarily work on Sybase, so
I could not dig much out of this error. Only one of my sources resides in Sybase.
I used following command:
sqoop import --verbose \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--connect jdbc:sybase:Tds:nyhostx123.sm.com:13290/DATABASE=tempdb \
--table tempdb..mit \
--split-by sipid \
--fields-terminated-by ',' \
--target-dir /home/DEVTEST/sqoop_mit \
--username user01 \
-m 1 \
-P
Error Snippet:
13/03/14 07:36:19 INFO mapred.JobClient: Running job: job_201301151126_25936
13/03/14 07:36:20 INFO mapred.JobClient: map 0% reduce 0%
13/03/14 07:36:27 INFO mapred.JobClient: Task Id : attempt_201301151126_25936_m_000000_0, Status : FAILED
java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: com.sybase.jdbc3.jdbc.SybSQLException: Incorrect syntax near '.'.
at com.sybase.jdbc3.tds.Tds.a(Unknown Source)
at
attempt_201301151126_25936_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201301151126_25936_m_000000_0: log4j:WARN Please initialize the log4j system properly.
13/03/14 07:36:33 INFO mapred.JobClient: Task Id : attempt_201301151126_25936_m_000000_1, Status : FAILED
java.io.IOException: SQLException in nextKeyValue
I believe that the problem is in the --table parameter. Sqoop expects a plain table name, but you seem to be passing an extra prefix, "tempdb.." (I guess it's a database name?). Would you mind trying it with "--table mit" only?
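A sketch of the command from the question with that single change applied is shown below. Everything except the --table value is copied from the question, and the function name is a hypothetical wrapper. The database is already selected by DATABASE=tempdb in the connect string.

```shell
# Hypothetical wrapper around the import command from the question.
import_mit() {
  # --table now carries only the table name; the "tempdb.." prefix
  # is dropped because the connect string already names the database.
  sqoop import --verbose \
    --driver com.sybase.jdbc3.jdbc.SybDriver \
    --connect jdbc:sybase:Tds:nyhostx123.sm.com:13290/DATABASE=tempdb \
    --table mit \
    --split-by sipid \
    --fields-terminated-by ',' \
    --target-dir /home/DEVTEST/sqoop_mit \
    --username user01 \
    -m 1 \
    -P
}
```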
