When I run the following Sqoop command from $SQOOP_HOME/bin, it works fine:
sqoop import --connect "jdbc:sqlserver://ip_address:port_number;database=database_name;username=sa;password=sa#Admin" --table $SQL_TABLE_NAME --hive-import --hive-home $HIVE_HOME --hive-table $HIVE_TABLE_NAME -m 1
But when I run the same command in a loop for different databases from a bash script, as follows:
while IFS='' read -r line || [[ -n $line ]]; do
    DATABASE_NAME=$line
    sqoop import --connect "jdbc:sqlserver://ip_address:port_number;database=$DATABASE_NAME;username=sa;password=sa#Admin" --table $SQL_TABLE_NAME --hive-import --hive-home $HIVE_HOME --hive-table $HIVE_TABLE_NAME -m 1
done < "$1"
I am passing the database names to my bash script in a text file as a parameter. The Hive table is the same for every iteration, because I want to append the data from all databases into one Hive table.
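For reference, this is roughly how I invoke the script (the script and file names below are placeholders, not the real ones):
# databases.txt contains one SQL Server database name per line, for example:
#   sales_db
#   inventory_db
#   reporting_db
bash import_all_databases.sh databases.txt    # the file name is passed as $1 to the loop above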
For the first two or three databases it works fine; after that it starts giving the following errors:
15/06/25 11:41:06 INFO mapreduce.Job: Job job_1435124207953_0033 failed with state FAILED due to:
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: The MapReduce job has already been retired. Performance
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: counters are unavailable. To get this information,
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: you will need to enable the completed job store on
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: the jobtracker with:
15/06/25 11:41:06 INFO mapreduce.ImportJobBase:   mapreduce.jobtracker.persist.jobstatus.active = true
15/06/25 11:41:06 INFO mapreduce.ImportJobBase:   mapreduce.jobtracker.persist.jobstatus.hours = 1
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: A jobtracker restart is required for these settings
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: to take effect.
15/06/25 11:41:06 ERROR tool.ImportTool: Error during import: Import job failed!
I have already restarted my multi-node Hadoop cluster after changing mapred-site.xml with the two parameters mentioned above, i.e.
mapreduce.jobtracker.persist.jobstatus.active = true
mapreduce.jobtracker.persist.jobstatus.hours = 1
Still I am facing the same problem. As I have just started learning Sqoop, any help will be appreciated.
I am using Sqoop to import data from SQL Server to local HDFS. I am using a simple free-form query to pull about 10 rows from the table. Below is the Sqoop command that I execute from the terminal:
sqoop import --connect 'jdbc:sqlserver://xx.xx.xx.xx;username=xx;password=xxxxx;database=DBName' --query "SELECT top 10 OrderID from DJShopcart_OrderItems where \$CONDITIONS" --split-by "OrderID" --target-dir /work/gearpurchase
When I execute this from my local machine, I get the following exception:
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The index 2 is out of range.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:191)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.verifyValidColumnIndex(SQLServerResultSet.java:543)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getterGetColumn(SQLServerResultSet.java:2066)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:2099)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getValue(SQLServerResultSet.java:2084)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getInt(SQLServerResultSet.java:2327)
at org.apache.sqoop.lib.JdbcWritableBridge.readInteger(JdbcWritableBridge.java:52)
at com.cloudera.sqoop.lib.JdbcWritableBridge.readInteger(JdbcWritableBridge.java:53)
at QueryResult.readFields(QueryResult.java:105)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:244)
If I import 2 columns, the exception says index 3 is out of range.
I also checked the SQLServerResultSet class documentation to understand what could be causing the exception, but to no avail. I only got more confused by concepts like client-side cursors vs. server-side cursors.
No matter what I try, I can't get this simple free-form query to import data from SQL Server.
Sqoop version : 1.4.6
Hadoop : 2.7.3
Machine : Ubuntu 16.04
Please help me out. Thanks in advance.
If I import 2 columns, the exception says index 3 is out of range.
Then the fault lies with SQOOP.
at org.apache.sqoop.lib.JdbcWritableBridge.readInteger(JdbcWritableBridge.java:52)
As it's passing 3 as an argument to
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.getInt(SQLServerResultSet.java:2327)
when the result only contains 2 columns, so the only valid column indexes are 1 and 2 (JDBC column indexes are 1-based).
I am trying to export a file from an HDFS directory to a Sybase IQ table.
I have placed the Sybase driver in the Sqoop lib path correctly.
Sqoop command:
sqoop export \
--connect jdbc:sybase:Tds:sybasehost:port/DATABASE=OMEGA \
--username dummy \
--password dummy \
--driver com.sybase.jdbc4.jdbc.SybDriver \
--table omega_sybase_table \
--export-dir /user/cloudera/omega/output_files/ \
--input-fields-terminated-by ','
I am getting the error below and the export fails.
17/04/25 16:17:07 INFO mapreduce.Job: Task Id : attempt_1489579695153_4935_m_000002_1, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: java.sql.SQLException: SQL Anywhere Error -210: User 'another user' has the row in 'omega_sybase_table' locked
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:233)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.sql.SQLException: SQL Anywhere Error -210: User 'another user' has the row in 'omega_sybase_table' locked
at com.sybase.jdbc4.jdbc.SybConnection.getAllExceptions(Unknown Source)
Could someone help me fix this issue?
This is happening because multiple mapper tasks are being used by the Sqoop export command.
Sybase IQ only allows one connection at a time, and the multiple mapper tasks try to insert records into the Sybase IQ table in parallel.
The solution is to use -m 1 in the Sqoop export command.
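For example, the export from the question limited to a single mapper would be the same command with -m 1 added:
sqoop export \
--connect jdbc:sybase:Tds:sybasehost:port/DATABASE=OMEGA \
--username dummy \
--password dummy \
--driver com.sybase.jdbc4.jdbc.SybDriver \
--table omega_sybase_table \
--export-dir /user/cloudera/omega/output_files/ \
--input-fields-terminated-by ',' \
-m 1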
I wrote a simple script to create a user (TestV100), create a table (Xy100) in that schema, and export a tab-delimited flat file from Hadoop to this Oracle table.
This is the shell script (ExportOracleTestV100.sh):
#!/bin/bash
# Testing connectivity to Oracle DB
#sqoop eval --connect jdbc:oracle:thin:@hostname:1521:orcl --username test --password password --query "SELECT count(*) as bob FROM \"TestV1\".\"Test\"" --verbose
HOST=$1
USER=$2
PASS=$3
SCHEMA=$4
PORT=$5
SID=$6
SQOOP=/usr/bin/sqoop
JDBC="jdbc:oracle:thin:#$1:$5:$6"
SQOOP_EVAL="$SQOOP eval --connect $JDBC --username $USER --password $PASS --query"
#Create Schema and Tables;
${SQOOP_EVAL} "CREATE USER \"TestV100\" identified by \"password\""
${SQOOP_EVAL} "GRANT CONNECT TO \"TestV100\""
${SQOOP_EVAL} "ALTER USER \"TestV100\" QUOTA UNLIMITED ON USERS"
${SQOOP_EVAL} "DROP TABLE \"TestV100\".\"Xy100\""
${SQOOP_EVAL} "CREATE TABLE \"TestV100\".\"Xy100\"( \"a\" NVARCHAR2(255) DEFAULT NULL, \"x\" NUMBER(10,0) DEFAULT NULL, \"y\" NUMBER(10,0) DEFAULT NULL )"
############################
## Load Data into tables; ##
############################
SQOOP_EXPORT="/usr/bin/sudo -u hdfs $SQOOP export --connect ${JDBC} --username $USER --password $PASS --export-dir"
${SQOOP_EXPORT} "/tmp/rv/TestV100/xy100.txt" --table "\"\"$SCHEMA\".\"Xy100\"\"" --fields-terminated-by "\t" --input-null-string null -m 1
And this is the input file (cat /tmp/rv/TestV100/Xy100.txt):
c 8 3
a 1 4
c 6 1
c 2 0
a 7 7
c 4 2
c 7 5
a 0 0
c 5 6
a 2 2
a 5 5
a 3 6
c 9 7
a 4 1
c 3 4
a 6 3
b 6 5
b 8 7
b 5 1
b 7 3
b 2 4
b 1 0
b 4 6
b 3 2
This is how the shell script is called:
sh ./ExportOracleTestV100.sh oracle11 test password TestV100 1521 orcl --verbose
Note: the 'test' user has full access to the TestV100 schema.
Output:
[root@abc-repo-app1 rv]# sh ./ExportOracleTestV100.sh oracle11 test password TestV100 1521 orcl --verbose
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/11/02 12:40:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.1
15/11/02 12:40:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/11/02 12:40:07 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
15/11/02 12:40:07 INFO manager.SqlManager: Using default fetchSize of 1000
15/11/02 12:40:07 INFO tool.CodeGenTool: Beginning code generation
15/11/02 12:40:08 INFO manager.OracleManager: Time zone has been set to GMT
15/11/02 12:40:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TestV100"."xy100" t WHERE 1=0
15/11/02 12:40:08 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."xy100". Please ensure that your table name is correct.
java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."xy100". Please ensure that your table name is correct.
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1658)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
As you can see above, Sqoop version is 1.4.5-cdh5.4.1
I somehow managed to get it this far. I see a lot of postings online about sqoop import and this error, and the solution is to change the table name to UPPERCASE in the command. But I am running an export. Also, the Oracle table HAS to be created with mixed case.
I hope I gave all the required information here. Can someone please help, or point me to something that can help me get past this error?
Using the table name in uppercase worked:
sqoop export --connect jdbc:oracle:thin:@xyzx:1569:xyz --username xyz --password xyz --table CIPHADOOPCUSTOMERREPORT --export-dir /apps/hive/warehouse/ciprpt.db/dtd_customer_report --input-fields-terminated-by "\t" --input-lines-terminated-by "\n" --verbose -m 8 --input-null-string '\N' --input-null-non-string '\N'
Supply the table name in upper case in the --table argument.
Sounds silly, but yes - it works with the table name in upper case.
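Applied to the script in the question, that would look roughly like the sketch below. Note the assumption: the user and table would need to be recreated without quoted identifiers, so that Oracle stores the names in upper case (quoted mixed-case names such as "Xy100" are case-sensitive and will not match an upper-case --table value).
# assumes the table was recreated as TESTV100.XY100 (i.e. created without double-quoted identifiers)
${SQOOP_EXPORT} "/tmp/rv/TestV100/xy100.txt" --table TESTV100.XY100 --fields-terminated-by "\t" --input-null-string null -m 1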
I had this problem because the target table for the sqoop export was one column short.
Solution:
Specify the list of columns with the --columns parameter, or regenerate the target table to match the schema of the input data.
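For illustration, here is how a --columns list could be added to the export from this question. This is a sketch only - it reuses the upper-case, unquoted identifiers suggested in the previous answer, so the columns are assumed to be A, X, and Y.
# --columns lists exactly which target-table columns the exported fields map to, in order
${SQOOP_EXPORT} "/tmp/rv/TestV100/xy100.txt" --table TESTV100.XY100 --columns "A,X,Y" --fields-terminated-by "\t" --input-null-string null -m 1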
I am trying to import data from Sybase using Sqoop. From the logs I can say that I have managed to make a connection successfully.
But my job fails, giving me an SQL exception from Sybase. I don't primarily work on Sybase, so I could not dig out much from this error. Only one of my sources resides in Sybase.
I used the following command:
sqoop import --verbose \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--connect jdbc:sybase:Tds:nyhostx123.sm.com:13290/DATABASE=tempdb \
--table tempdb..mit \
--split-by sipid \
--fields-terminated-by ',' \
--target-dir /home/DEVTEST/sqoop_mit \
--username user01 \
-m 1 \
-P
Error Snippet:
13/03/14 07:36:19 INFO mapred.JobClient: Running job: job_201301151126_25936
13/03/14 07:36:20 INFO mapred.JobClient: map 0% reduce 0%
13/03/14 07:36:27 INFO mapred.JobClient: Task Id : attempt_201301151126_25936_m_000000_0, Status : FAILED
java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:265)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:182)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: com.sybase.jdbc3.jdbc.SybSQLException: Incorrect syntax near '.'.
at com.sybase.jdbc3.tds.Tds.a(Unknown Source)
at
attempt_201301151126_25936_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
attempt_201301151126_25936_m_000000_0: log4j:WARN Please initialize the log4j system properly.
13/03/14 07:36:33 INFO mapred.JobClient: Task Id : attempt_201301151126_25936_m_000000_1, Status : FAILED
java.io.IOException: SQLException in nextKeyValue
I believe that the problem is in the --table parameter. Sqoop is expecting a pure table name, but you seem to be passing an extra value, "tempdb.." (I guess it's a database name?). Would you mind trying it out with "--table mit" only?
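With only the --table value changed (the database is already selected via DATABASE=tempdb in the connect string), the command from the question would be:
sqoop import --verbose \
--driver com.sybase.jdbc3.jdbc.SybDriver \
--connect jdbc:sybase:Tds:nyhostx123.sm.com:13290/DATABASE=tempdb \
--table mit \
--split-by sipid \
--fields-terminated-by ',' \
--target-dir /home/DEVTEST/sqoop_mit \
--username user01 \
-m 1 \
-P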
In the command line, this will successfully update table1:
pt-table-sync --execute h=host1,D=db1,t=table1 h=host2,D=db2
However, if I want to update more than one table, I'm not sure how to write it. This also only updates table1 and ignores the other tables:
pt-table-sync --execute h=host1,D=db1,t=table1,table2,table3 h=host2,D=db2
And this gives me an error:
pt-table-sync --execute h=host1,D=db1 --tables table1,table2,table3 h=host2,D=db2
Does anyone have an example of how to list the '--tables'... so that it successfully updates all the tables in the list?
The --tables option seems to be incompatible with the DSN notation; you get this error:
You specified a database but not a table in h=localhost,D=test.
Are you trying to sync only tables in the 'test' database?
If so, use '--databases test' instead.
As suggested in that error message, you can use --databases and then you can use --tables successfully.
For example, I created tables test.foo and test.bar, filled each with three rows, then deleted the rows from test.bar on the second server dewey.
I ran this:
$ pt-table-sync h=huey h=dewey --databases test --tables foo,bar --execute --verbose
# Syncing h=dewey
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 3 0 Chunk 15:26:15 15:26:15 2 test.bar
# 0 0 0 0 Chunk 15:26:15 15:26:15 0 test.foo
It successfully re-inserted the 3 missing rows in test.bar.
Other tables in my test database were ignored.
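For the hosts in the original question, the equivalent form would be roughly the sketch below. Note the assumption: --databases/--tables filter the same schema on both servers, so this only works if the database is named the same on both sides (the question maps db1 to db2, which this form does not express).
pt-table-sync --execute h=host1 h=host2 --databases db1 --tables table1,table2,table3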
This is an old question, but I searched everywhere for an answer. pt-table-sync only does one table, and there is no tool that does the same thing for a list of tables or a full database schema. Specifically, I want to run a Live server and be able to sync back to a Staging server, then edit code and files on the Staging server without fear of messing up Live or being overwritten by Live... and I want it to be free :)
I ended up writing a shell script called mysql_sync_live_to_stage.sh as follows:
#!/bin/bash
# sync db live to staging
error_log_file='./mysql_sync_errors.log'
echo $(date +"%Y %m %d %H:%M") > $error_log_file
function sync_table()
{
  # sync one table ($3) from the live database ($1) to the staging database ($2)
  pt-table-sync --no-foreign-key-checks --execute \
    h=DB_1_HOST,u=DB_1_USER,p=DB_1_PASSWORD,D=$1,t=$3 \
    h=DB_2_HOST,u=DB_2_USER,p=DB_2_PASSWORD,D=$2,t=$3 >> $error_log_file
}
# SYNC ALL TABLES IN name_of_live_database
mysql -h "DB_1_HOST" -u "DB_1_USER" -pDB_1_PASSWORD -D "DB_1_DBNAME" -e "SHOW TABLES" |
egrep -i '[0-9a-z\-\_]+' | egrep -i -v 'Tables_in' | while read -r table ; do
echo "Processing $table"
sync_table "name_of_live_database" "name_of_staging_database" $table
done
# FIX Config Settings For Staging
echo "Cleanup Queries..."
mysql -h "DB_2_HOST" -u "DB_2_USER" -pDB_2_PASSWORD -D "DB_2_DBNAME"
-e "UPDATE name_of_staging_database.nameofmyconfigtable SET value='bar'
WHERE config_id='foo'"
mysql -h "DB_2_HOST" -u "DB_2_USER" -pDB_2_PASSWORD -D "DB_2_DBNAME"
-e "UPDATE name_of_staging_database.nameofmyconfigtable SET value='bar2'
WHERE config_id='foo2'"
echo "Done"
This reads a list of table names from the live site, then executes a sync on each one via the do loop. It goes through the list alphabetically, so I recommend keeping the --no-foreign-key-checks flag.
It's not perfect... It won't sync tables that don't exist in both databases, but when combined with a "git pull -f origin master" I get a complete sync in a couple of minutes.
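To run it, something like the following should work, assuming the DB_1_*/DB_2_* placeholders in the script have been replaced with real hosts, credentials, and database names:
chmod +x mysql_sync_live_to_stage.sh
./mysql_sync_live_to_stage.sh
# per-table pt-table-sync output is appended to this log by the script
cat ./mysql_sync_errors.log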