I wrote a simple script to create a user (TestV100), create a table (Xy100) in that schema, and export a tab-delimited flat file from Hadoop to this Oracle table.
This is the shell script, ExportOracleTestV100.sh:
#!/bin/bash
# Testing connectivity to Oracle DB
#sqoop eval --connect jdbc:oracle:thin:@hostname:1521:orcl --username test --password password --query "SELECT count(*) as bob FROM \"TestV1\".\"Test\"" --verbose
HOST=$1
USER=$2
PASS=$3
SCHEMA=$4
PORT=$5
SID=$6
SQOOP=/usr/bin/sqoop
JDBC="jdbc:oracle:thin:@$1:$5:$6"
SQOOP_EVAL="$SQOOP eval --connect $JDBC --username $USER --password $PASS --query"
#Create Schema and Tables;
${SQOOP_EVAL} "CREATE USER \"TestV100\" identified by \"password\""
${SQOOP_EVAL} "GRANT CONNECT TO \"TestV100\""
${SQOOP_EVAL} "ALTER USER \"TestV100\" QUOTA UNLIMITED ON USERS"
${SQOOP_EVAL} "DROP TABLE \"TestV100\".\"Xy100\""
${SQOOP_EVAL} "CREATE TABLE \"TestV100\".\"Xy100\"( \"a\" NVARCHAR2(255) DEFAULT NULL, \"x\" NUMBER(10,0) DEFAULT NULL, \"y\" NUMBER(10,0) DEFAULT NULL )"
############################
## Load Data into tables; ##
############################
SQOOP_EXPORT="/usr/bin/sudo -u hdfs $SQOOP export --connect ${JDBC} --username $USER --password $PASS --export-dir"
${SQOOP_EXPORT} "/tmp/rv/TestV100/xy100.txt" --table "\"\"$SCHEMA\".\"Xy100\"\"" --fields-terminated-by "\t" --input-null-string null -m 1
And this is the input file: cat /tmp/rv/TestV100/Xy100.txt
c 8 3
a 1 4
c 6 1
c 2 0
a 7 7
c 4 2
c 7 5
a 0 0
c 5 6
a 2 2
a 5 5
a 3 6
c 9 7
a 4 1
c 3 4
a 6 3
b 6 5
b 8 7
b 5 1
b 7 3
b 2 4
b 1 0
b 4 6
b 3 2
This is how the shell script is called:
sh ./ExportOracleTestV100.sh oracle11 test password TestV100 1521 orcl --verbose
Note: the 'test' user has full access on the TestV100 schema.
Output:
[root@abc-repo-app1 rv]# sh ./ExportOracleTestV100.sh oracle11 test password TestV100 1521 orcl --verbose
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/11/02 12:40:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.1
15/11/02 12:40:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/11/02 12:40:07 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
15/11/02 12:40:07 INFO manager.SqlManager: Using default fetchSize of 1000
15/11/02 12:40:07 INFO tool.CodeGenTool: Beginning code generation
15/11/02 12:40:08 INFO manager.OracleManager: Time zone has been set to GMT
15/11/02 12:40:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "TestV100"."xy100" t WHERE 1=0
15/11/02 12:40:08 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."xy100". Please ensure that your table name is correct.
java.lang.IllegalArgumentException: There is no column found in the target table "TestV100"."xy100". Please ensure that your table name is correct.
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1658)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:64)
at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
As you can see above, the Sqoop version is 1.4.5-cdh5.4.1.
I somehow managed to get this far. I see a lot of posts online about this error for sqoop import, where the solution is to change the table name to UPPERCASE in the command. But I am running an export. Also, the Oracle table HAS to be created with a mixed-case name.
I hope I gave all the required information here. Can someone please help, or point me to something that can help me get past this error?
Table name in uppercase worked:
sqoop export --connect jdbc:oracle:thin:@xyzx:1569:xyz --username xyz --password xyz --table CIPHADOOPCUSTOMERREPORT --export-dir /apps/hive/warehouse/ciprpt.db/dtd_customer_report --input-fields-terminated-by "\t" --input-lines-terminated-by "\n" --verbose -m 8 --input-null-string '\N' --input-null-non-string '\N'
Supply the table name in upper case in the --table argument.
Sounds silly, but yes, it works with the table name in upper case.
I had this problem because the target table for the sqoop export was one column short.
Solution:
specify the list of columns with the --columns parameter, or regenerate the target table to match the schema of the input data.
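For example, a minimal sketch of the first option, assuming an uppercase target table XY100 with columns A, X and Y (these names, and the connection details, are illustrative and not taken from any of the posts above):
sqoop export \
--connect jdbc:oracle:thin:@dbhost:1521:orcl \
--username test --password password \
--table XY100 \
--columns "A,X,Y" \
--export-dir /tmp/rv/TestV100/xy100.txt \
--fields-terminated-by "\t" \
--input-null-string null \
-m 1
Here --columns tells Sqoop which target-table columns to populate, and in what order, so the table's column list no longer has to match the input file exactly.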
Related
I am able to successfully import data from SQL Server to HDFS using Sqoop. However, when it tries to load the data into Hive I get an error, and I am not sure I understand the error correctly.
sudo -u hdfs sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect "jdbc:sqlserver://XX.XX.X.X:1433;instanceName=data-engr-sql-svr; databaseName=AdventureWorks2019" \
--username sa \
--password XXXXXXXX \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--warehouse-dir "/user/hive/warehouse/AdventureWorks2019.db" \
--hive-import \
--create-hive-table \
--fields-terminated-by ',' \
--hive-table AdventureWorks2019.Production.TransactionHistory \
--table Production.TransactionHistory \
--split-by TransactionID \
-- --schema Production
I don't know how to handle schemas; most of the tutorials use a dummy database without proper schemas, which is not helpful.
Error
21/03/31 08:52:47 INFO conf.HiveConf: Using the default value passed in for log id: 95e2b831-cfe5-4108-be0f-0df1d9a8797e
21/03/31 08:52:47 INFO session.SessionState: Updating thread name to 95e2b831-cfe5-4108-be0f-0df1d9a8797e main
21/03/31 08:52:47 INFO conf.HiveConf: Using the default value passed in for log id: 95e2b831-cfe5-4108-be0f-0df1d9a8797e
21/03/31 08:52:47 INFO ql.Driver: Compiling command(queryId=hdfs_20210331085247_050638e8-593a-4d01-8020-c40b7db8e66a): CREATE TABLE IF NOT EXISTS AdventureWorks2019.Production.TransactionHistory ( TransactionID INT, ProductID INT, ReferenceOrderID INT, ReferenceOrderLineID INT, TransactionDate STRING, TransactionType STRING, Quantity INT, ActualCost DOUBLE, ModifiedDate STRING) COMMENT 'Imported by sqoop on 2021/03/31 08:52:45' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' LINES TERMINATED BY '\012' STORED AS TEXTFILE
21/03/31 08:52:49 INFO hive.metastore: HMS client filtering is enabled.
21/03/31 08:52:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://cnt7-naya-cdh63:9083
21/03/31 08:52:49 INFO hive.metastore: Opened a connection to metastore, current connections: 1
21/03/31 08:52:49 INFO hive.metastore: Connected to metastore.
21/03/31 08:52:49 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
FAILED: SemanticException [Error 10255]: Invalid table name AdventureWorks2019.Production.TransactionHistory
21/03/31 08:52:49 ERROR ql.Driver: FAILED: SemanticException [Error 10255]: Invalid table name AdventureWorks2019.Production.TransactionHistory
There is no such thing as schema inside the database in Hive. Database and schema mean the same thing and can be used interchangeably.
So, the bug is in using database.schema.table. Use database.table in Hive.
Read the documentation: Create/Drop/Alter/UseDatabase
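A hedged sketch of the corrected command (only the --hive-table value changes, everything else is copied from the question); it assumes the AdventureWorks2019 database already exists in Hive, since --create-hive-table only creates the table:
sudo -u hdfs sqoop import \
-Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect "jdbc:sqlserver://XX.XX.X.X:1433;instanceName=data-engr-sql-svr; databaseName=AdventureWorks2019" \
--username sa \
--password XXXXXXXX \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--warehouse-dir "/user/hive/warehouse/AdventureWorks2019.db" \
--hive-import \
--create-hive-table \
--fields-terminated-by ',' \
--hive-table AdventureWorks2019.TransactionHistory \
--table Production.TransactionHistory \
--split-by TransactionID \
-- --schema Production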
I was trying to import all tables from a specific schema in DB2 using the command line below.
sqoop import-all-tables --username user --password pass \
--connect jdbc:db2://myip:50000/databs:CurrentSchema=testdb \
--driver com.ibm.db2.jcc.DB2Driver --fields-terminated-by ',' \
--lines-terminated-by '\n' --hive-database default --hive-import --hive-overwrite \
--create-hive-table -m 1;
I am stuck with the following error:
2017-05-02 09:21:18,474 ERROR - [main:] ~ Error reading database metadata:
com.ibm.db2.jcc.am.SqlSyntaxErrorException: [jcc][10165][10051][4.11.77]
Invalid database URL syntax:
jdbc:db2://myip:50000/msrc:CurrentSchema=testdb. ERRORCODE=-4461,
SQLSTATE=42815 (SqlManager:43)
com.ibm.db2.jcc.am.SqlSyntaxErrorException: [jcc][10165][10051][4.11.77]
Invalid database URL syntax:
jdbc:db2://myip:50000/msrc:CurrentSchema=testdb. ERRORCODE=-4461,
SQLSTATE=42815
at com.ibm.db2.jcc.am.gd.a(gd.java:676)
at com.ibm.db2.jcc.am.gd.a(gd.java:60)
at com.ibm.db2.jcc.am.gd.a(gd.java:85)
at com.ibm.db2.jcc.DB2Driver.tokenizeURLProperties(DB2Driver.java:911)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:408)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at
org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:885)
at
org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.listTables(SqlManager.java:520)
at
org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:95)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
at java.util.StringTokenizer.nextToken(StringTokenizer.java:377)
at com.ibm.db2.jcc.DB2Driver.tokenizeURLProperties(DB2Driver.java:899)
... 13 more
Could not retrieve tables list from server
2017-05-02 09:21:18,696 ERROR - [main:] ~ manager.listTables() returned null
(ImportAllTablesTool:98)
The following command works; note that the CurrentSchema setting has been dropped from the JDBC URL:
sqoop import-all-tables \
--driver com.ibm.db2.jcc.DB2Driver \
--connect jdbc:db2://myip:50000/databs \
--username username --password password \
--hive-database default --hive-import --m 1 \
--create-hive-table --hive-overwrite
The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.
For the import-all-tables tool to be useful, the following conditions must be met:
Each table must have a single-column primary key.
You must intend to import all columns of each table.
You must not intend to use non-default splitting column, nor impose any conditions via a WHERE clause.
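If the intent was to keep the import restricted to a specific schema rather than drop the setting, the DB2 JDBC driver expects URL properties to be appended after the database name and terminated with a semicolon. A hedged variant of the original command under that assumption (exact property handling may vary by driver version; the URL is quoted so the shell does not treat the semicolon as a command separator):
sqoop import-all-tables --username user --password pass \
--connect "jdbc:db2://myip:50000/databs:currentSchema=TESTDB;" \
--driver com.ibm.db2.jcc.DB2Driver --fields-terminated-by ',' \
--lines-terminated-by '\n' --hive-database default --hive-import --hive-overwrite \
--create-hive-table -m 1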
When I run the following sqoop command from $SQOOP_HOME/bin, it works fine:
sqoop import --connect "jdbc:sqlserver://ip_address:port_number;database=database_name;username=sa;password=sa#Admin" --table $SQL_TABLE_NAME --hive-import --hive-home $HIVE_HOME --hive-table $HIVE_TABLE_NAME -m 1
But when I run the same command in a loop over different databases from a bash script, as follows:
while IFS='' read -r line || [[ -n $line ]]; do
DATABASE_NAME=$line
sqoop import --connect "jdbc:sqlserver://ip_address:port_number;database=$DATABASE_NAME;username=sa;password=sa#Admin" --table $SQL_TABLE_NAME --hive-import --hive-home $HIVE_HOME --hive-table $HIVE_TABLE_NAME -m 1
done < "$1"
I am passing the database names to my bash script in a text file as a parameter. My Hive table stays the same because I want to append data from all databases into one Hive table.
For the first two or three databases it works fine; after that it starts giving the following errors:
15/06/25 11:41:06 INFO mapreduce.Job: Job job_1435124207953_0033 failed with state FAILED due to:
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: The MapReduce job has already been retired. Performance
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: counters are unavailable. To get this information,
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: you will need to enable the completed job store on
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: the jobtracker with:
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: mapreduce.jobtracker.persist.jobstatus.active = true
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: mapreduce.jobtracker.persist.jobstatus.hours = 1
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: A jobtracker restart is required for these settings
15/06/25 11:41:06 INFO mapreduce.ImportJobBase: to take effect.
15/06/25 11:41:06 ERROR tool.ImportTool: Error during import: Import job failed!
I have already restarted my multi-node Hadoop cluster after changing mapred-site.xml with the two parameters mentioned above, i.e.
mapreduce.jobtracker.persist.jobstatus.active = true
mapreduce.jobtracker.persist.jobstatus.hours = 1
Still I am facing the same problem. As I have just started learning Sqoop, any help will be appreciated.
In the command line, this will successfully update table1:
pt-table-sync --execute h=host1,D=db1,t=table1 h=host2,D=db2
However, if I want to update more than one table, I'm not sure how to write it. This also updates only table1 and ignores the other tables:
pt-table-sync --execute h=host1,D=db1,t=table1,table2,table3 h=host2,D=db2
And this gives me an error:
pt-table-sync --execute h=host1,D=db1 --tables table1,table2,table3 h=host2,D=db2
Does anyone have an example of how to list the '--tables' so that it successfully updates all the tables in the list?
The --tables option seems to be incompatible with the DSN notation; you get this error:
You specified a database but not a table in h=localhost,D=test.
Are you trying to sync only tables in the 'test' database?
If so, use '--databases test' instead.
As suggested in that error message, you can use --databases and then you can use --tables successfully.
For example, I created tables test.foo and test.bar, filled each with three rows, then deleted the rows from test.bar on the second server dewey.
I ran this:
$ pt-table-sync h=huey h=dewey --databases test --tables foo,bar --execute --verbose
# Syncing h=dewey
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 3 0 Chunk 15:26:15 15:26:15 2 test.bar
# 0 0 0 0 Chunk 15:26:15 15:26:15 0 test.foo
It successfully re-inserted the 3 missing rows in test.bar.
Other tables in my test database were ignored.
This is an old question, but I searched everywhere for an answer. pt-table-sync only handles one table at a time, and there is no tool that does the same thing for a list of tables or a full database schema. Specifically, I want to run a Live server and be able to sync back to a Staging server, then edit code and files on the Staging server without fear of messing up Live or being overwritten by Live... and I want it to be free :)
I ended up writing a shell script called mysql_sync_live_to_stage.sh as follows:
#!/bin/bash
# sync db live to staging
error_log_file='./mysql_sync_errors.log'
echo $(date +"%Y %m %d %H:%M") > $error_log_file
function sync_table()
{
# sync one table ($3) from the live database ($1) to the staging database ($2)
pt-table-sync --no-foreign-key-checks --execute \
  h=DB_1_HOST,u=DB_1_USER,p=DB_1_PASSWORD,D=$1,t=$3 \
  h=DB_2_HOST,u=DB_2_USER,p=DB_2_PASSWORD,D=$2,t=$3 >> $error_log_file
}
# SYNC ALL TABLES IN name_of_live_database
mysql -h "DB_1_HOST" -u "DB_1_USER" -pDB_1_PASSWORD -D "DB_1_DBNAME" -e "SHOW TABLES" |
egrep -i '[0-9a-z\-\_]+' | egrep -i -v 'Tables_in' | while read -r table ; do
echo "Processing $table"
sync_table "name_of_live_database" "name_of_staging_database" $table
done
# FIX Config Settings For Staging
echo "Cleanup Queries..."
mysql -h "DB_2_HOST" -u "DB_2_USER" -pDB_2_PASSWORD -D "DB_2_DBNAME" \
  -e "UPDATE name_of_staging_database.nameofmyconfigtable SET value='bar'
      WHERE config_id='foo'"
mysql -h "DB_2_HOST" -u "DB_2_USER" -pDB_2_PASSWORD -D "DB_2_DBNAME" \
  -e "UPDATE name_of_staging_database.nameofmyconfigtable SET value='bar2'
      WHERE config_id='foo2'"
echo "Done"
This reads a list of table names from the live site and then runs a sync on each one via the do loop. It goes through the list alphabetically, so I recommend keeping the --no-foreign-key-checks flag.
It's not perfect... It won't sync tables that don't exist in both databases, but when combined with a "git pull -f origin master" I get a complete sync in a couple of minutes.
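A hedged usage sketch, assuming the script has been saved as mysql_sync_live_to_stage.sh and the DB_1_*/DB_2_* placeholders and database names have been filled in:
chmod +x mysql_sync_live_to_stage.sh
./mysql_sync_live_to_stage.sh
cat ./mysql_sync_errors.log    # per-table output that pt-table-sync appended during the run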
I have a database on a server with 120 tables.
I want to clone the whole database with a new database name and the data copied.
Is there an efficient way to do this?
$ mysqldump yourFirstDatabase -u user -ppassword > yourDatabase.sql
$ mysql yourSecondDatabase -u user -ppassword < yourDatabase.sql
mysqldump -u <user> --password=<password> <DATABASE_NAME> | mysql -u <user> --password=<password> -h <hostname> <DATABASE_NAME_NEW>
Like the accepted answer, but without .sql files:
mysqldump sourcedb -u <USERNAME> -p<PASS> | mysql destdb -u <USERNAME> -p<PASS>
In case you use phpMyAdmin
Select the database you wish to copy (by clicking on the database from the phpMyAdmin home screen).
Once inside the database, select the Operations tab.
Scroll down to the section where it says "Copy database to:"
Type in the name of the new database.
Select "structure and data" to copy everything. Alternatively, you can select "Structure only" if you want the columns but not the data.
Check the box "CREATE DATABASE before copying" to create a new database.
Check the box "Add AUTO_INCREMENT value."
Click on the Go button to proceed.
There is a mysqldbcopy tool in the MySQL Utilities package.
http://dev.mysql.com/doc/mysql-utilities/1.3/en/mysqldbcopy.html
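A hedged example of invoking it, following the option syntax documented for MySQL Utilities 1.3 (host names, credentials, and database names are placeholders):
mysqldbcopy --source=user:password@localhost --destination=user:password@localhost yourFirstDatabase:yourSecondDatabase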
If you want to make sure it is an exact clone, the receiving database needs to be entirely cleared/dropped first. That way, the new db only has the tables in your import file and nothing else. Otherwise, the receiving database could retain tables that weren't specified in your import file.
Example, following the prior answers:
DB1 == tableA, tableB
DB2 == tableB, tableC
DB1 imported to -> DB2
DB2 == tableA, tableB, tableC //true clone should not contain tableC
The change is easy with --databases and --add-drop-database (see the mysqldump docs). This adds the DROP DATABASE statement to the dump, so your new database will be an exact replica:
$ mysqldump -h $ip -u $user -p$pass --databases $dbname --add-drop-database > $file.sql
$ mysql -h $ip $dbname -u $user -p$pass < $file.sql
Of course, replace the $ variables, and as always there is no space between -p and the password. For extra security, strip $pass and pass just -p so mysql prompts for the password.
Another option is to clone the database table by table from PHP with mysqli: drop and recreate a target database (here named after the previous year) and then copy every table from the source schema rds:
$newdb = (date('Y')-1); // target database name, e.g. 2024 when run in 2025
$mysqli->query("DROP DATABASE `".$newdb."`;");
$mysqli->query("CREATE DATABASE `".$newdb."`;");
// list every table in the source schema
$query = "
SELECT
    TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA LIKE 'rds'
";
$result = $mysqli->query($query)->fetch_all(MYSQLI_ASSOC);
foreach($result as $val) {
    echo $val['TABLE_NAME'].PHP_EOL;
    // recreate the table structure, then copy the rows
    $mysqli->query("CREATE TABLE `".$newdb."`.`".$val['TABLE_NAME']."` LIKE rds.`".$val['TABLE_NAME']."`");
    $mysqli->query("INSERT `".$newdb."`.`".$val['TABLE_NAME']."` SELECT * FROM rds.`".$val['TABLE_NAME']."`");
}