I want to migrate my data from a SQL database to HBase. One of my problems is that my SQL tables don't have a primary key, so to overcome this I am using a composite key in the Sqoop query. I have successfully imported data from SQL to HBase, but the main problem is that the imported data no longer contains the columns used for the candidate key, and I need those columns in the imported data. Kindly suggest a resolution.
The Sqoop query I am currently using is of the below format:
sqoop import --connect "jdbc:sqlserver://Ip:1433;database=dbname;username=test;password=test" --table TableName --hbase-create-table --hbase-table TableName --column-family NameSpace --hbase-row-key Candidate1,Candidate2,Candidate3 -m 1
Also, let me know if anyone knows a query to import the complete database rather than a single table.
After lots of research, I came across the correct syntax, which let me load all of the data without losing a single column:
sqoop import -D sqoop.hbase.add.row.key=true --connect "jdbc:sqlserver://IP:1433;database=DBNAME;username=UNAME;password=PWD" --table SQLTABLENAME --hbase-create-table --hbase-table HBASETABLENAME --column-family COLUMNFAMILYNAME --hbase-row-key PRIMARYKEY -m 1
OR
sqoop import -D sqoop.hbase.add.row.key=true --connect "jdbc:sqlserver://IP:1433;database=DBNAME;username=UNAME;password=PWD" --table SQLTABLENAME --hbase-create-table --hbase-table HBASETABLENAME --column-family COLUMNFAMILYNAME --hbase-row-key CANDIDATEKEY1,CANDIDATEKEY2,CANDIDATEKEY3 -m 1
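For the second part of the question (importing the whole database rather than one table), Sqoop's import-all-tables tool can handle that, although it targets HDFS/Hive rather than creating one HBase table per source table, and each table needs a single-column primary key unless you add --autoreset-to-one-mapper. A minimal sketch, reusing the connection string from above (the warehouse directory is just an example path):

# import every table of DBNAME into one directory per table under /user/sqoop/DBNAME
sqoop import-all-tables \
  --connect "jdbc:sqlserver://IP:1433;database=DBNAME;username=UNAME;password=PWD" \
  --warehouse-dir /user/sqoop/DBNAME \
  --autoreset-to-one-mapper

For per-table HBase imports you would still loop over the table names (e.g. from sqoop list-tables) and run the single-table command above for each one.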
Related
I need a simple example of how to copy data from database DB1, table T1, to database DB2, table T2.
T2 has an identical structure to T1 (same column names and properties, just different data).
DB2 runs on the same server as DB1, but on a different port.
If the two databases are on two different server instances, you could export to CSV from db1 and then import the data into db2:
COPY (SELECT * FROM t1) TO '/home/export.csv';
and then load it back into db2:
COPY t2 FROM '/home/export.csv';
Again, the two tables on the two different database instances must have the same structure.
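If you don't have filesystem access on the database server (COPY ... TO writes the file on the server side and needs the corresponding privileges), psql's client-side \copy meta-command is an alternative; a sketch with the same table names and a hypothetical file path:

# export from db1 to a file on the client machine
psql -d db1 -c "\copy (SELECT * FROM t1) TO '/tmp/export.csv' CSV"
# load the same file into db2
psql -d db2 -c "\copy t2 FROM '/tmp/export.csv' CSV"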
Using the command line tools pg_dump and psql, you could also do it this way:
pg_dump -U postgres -t t1 db1 | psql -U postgres -d db2
You can pass command line arguments to both pg_dump and psql to specify the address and/or port of each server.
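For the original scenario (same host, different ports), that would look something like this, with placeholder port numbers:

# dump t1 from db1 on port 5432 and pipe it into db2 on port 5433
pg_dump -U postgres -h localhost -p 5432 -t t1 db1 | psql -U postgres -h localhost -p 5433 -d db2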
Another option would be to use an external tool like openDBcopy to perform the migration/copy of the table.
You can try this one -
pg_dump -t table_name_to_copy source_db | psql target_db
We are working on importing data from MS SQL Server to Hive through Sqoop. Since we use incremental append mode, which is the requirement, we need to specify the --last-value of the row id that we inserted last time.
I have to import about 100 tables into Hive.
What is the practice for saving the row id value for each table and passing it to the Sqoop --last-value option?
Why doesn't Sqoop itself check the row ids of the source and destination tables and then import only the rows beyond the last row id already present in the destination table?
If I save the last row id value for all tables in a Hive table and want to use those values in a Sqoop job, how is that possible?
Above all, I want to automate the data import job so that I do not have to provide the value manually for each table's daily import.
Any pointers?
Thanks
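One thing worth looking at is Sqoop's saved jobs: a job created with sqoop job --create keeps its incremental state in the Sqoop metastore and updates --last-value automatically after every run, so you don't have to track it yourself. A rough sketch, assuming a check column named row_id and placeholder connection details:

# create the saved job once per table
sqoop job --create table1_daily_import -- import \
  --connect "jdbc:sqlserver://IP:1433;database=DBNAME;username=UNAME;password=PWD" \
  --table TABLE1 \
  --target-dir /user/sqoop/TABLE1 \
  --incremental append \
  --check-column row_id \
  --last-value 0 \
  -m 1

# run it from cron/Oozie; the metastore remembers the new last-value each time
sqoop job --exec table1_daily_import

For ~100 tables you could generate the --create calls from a list of table names in a small shell loop.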
I have to use Sqoop on Hadoop with an existing MSSQL Database structure.
All permissions seem to be okay. Using an authorized user john with SQL Studio, the correct working query would look like this:
SELECT TOP 1000 [ksttyp_id]
,[orgunit_nr]
,[ksttyp_nr]
,[bezeichnung]
FROM [egec01_t].[integris].[kst_typ]
I run this import command:
sqoop import --connect "jdbc:sqlserver://example.com;username=john;password=1234;database=egec01_t" --table "integris.kst_typ" --target-dir /home/sqoop/ -as-textfile
The expected FROM clause is:
FROM [egec01_t].[integris].[kst_typ]
Instead I get:
Executing SQL statement: SELECT t.* FROM [integris.kst_typ] AS t WHERE 1=0
ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'integris.kst_typ'.
I am not sure whether this is related to Sqoop, the SqlManager, or the CodeGenTool.
I also tried a general query to list all databases:
sqoop list-databases...
Same issue:
ERROR manager.CatalogQueryManager: Failed to list databases
com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'SYS.DAT...
The bottom line is that it seems to be a problem with dots being used in the names instead of e.g. underscores:
sqoop unable to import table with dot
But unfortunately I am forced to use the existing structure.
I posted an issue but have received no response yet:
https://issues.apache.org/jira/browse/SQOOP-2706
Possibly related issue:
https://issues.apache.org/jira/browse/SQOOP-476
Is there a solution to maybe escape the table name?
Have you tried
--table "egec01_t.integris.kst_typ"
I have the below column in a SQL database; I need to Sqoop-import the table into Hive and get the MAX of that column. Kindly help with the conversion:
Column name __$seqval in the CDC table:
Datatype in the SQL database shows as: __$seqval (binary(10), NOT NULL)
Values of the column are as below:
0x000001D1000003520003
0x000001D1000003520003
0x000001D10000035A0003
0x000001D1000003630003
0x000001D1000006FB0003
0x000001D1000007090003
0x000001D1000007100003
0x000001D1000007170003
0x000001D10000071E0003
0x000001D100000747002C
0x000001D100000747002C
0x000001D100000747002E
0x000001D100000747002E
0x000001D1000007470030
0x000001D1000007470030
0x000001D1000007850002
0x000001D1000007850002
0x000001D1000007AA002C
0x000001D1000007AA002C
How do I convert these and get the MAX of them in Hive?
When you're pulling it out of SQL Server, you can use:
select convert(bigint, binaryColumnName)
...which will pull the binary values out as BIGINTs. Then, when presented to Hadoop, they should be treated as BIGINTs. (I don't know Hadoop, so I can't tell you how to get a MAX out of them.)
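Building on that, one way to wire it together (the CDC table and Hive table names below are made up for illustration) is to push the CONVERT into a free-form Sqoop query, so the value lands in Hive as a BIGINT, and then take the MAX there:

# import with the conversion done on the SQL Server side
sqoop import \
  --connect "jdbc:sqlserver://IP:1433;database=DBNAME;username=UNAME;password=PWD" \
  --query 'SELECT CONVERT(BIGINT, [__$seqval]) AS seqval_num FROM [cdc].[dbo_mytable_CT] WHERE $CONDITIONS' \
  --target-dir /user/sqoop/cdc_mytable \
  --hive-import --hive-table cdc_mytable \
  -m 1

Then in Hive:

SELECT MAX(seqval_num) FROM cdc_mytable;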
I am exporting a simple Hive table to SQL Server. Both tables have exactly the same schema. There is an identity column in SQL Server, and I have run "set identity_insert table_name on" on it.
But when I export from Sqoop to SQL Server, Sqoop gives me an error saying that "IDENTITY_INSERT is set to OFF".
If I export to a SQL Server table that has no identity column, everything works fine.
Any idea about this? Has anyone faced this issue while exporting from Sqoop to SQL Server?
Thanks
In short:
Append -- --identity-insert to your Sqoop export command.
Detailed:
Here is an example for anyone searching (and possibly for my own later reference).
SQLSERVER_JDBC_URI="jdbc:sqlserver://<address>:<port>;username=<username>;password=<password>"
HIVE_PATH="/user/hive/warehouse/"
TABLENAME=<tablename>
sqoop-export \
-D mapreduce.job.queuename=<queuename> \
--connect $SQLSERVER_JDBC_URI \
--export-dir "$HIVE_PATH""$TABLENAME" \
--input-fields-terminated-by , \
--table "$TABLENAME" \
-- --schema <schema> \
--identity-insert
Note the particular bits at the end: -- --schema <schema> --identity-insert. You can omit the schema part, but keep the extra --.
That allows you to enable identity insert for that table within your Sqoop session. (source)
Tell SQL Server to let you insert into the table with the IDENTITY column. That's an autoincrement column that you normally can't write to, but you can change that; see here or here. It will still fail if one of your values conflicts with one that already exists in that column.
The SET IDENTITY_INSERT statement is session-specific, so if you set it by opening a query window and executing the statement, and then ran the export somewhere else, IDENTITY_INSERT was only set in that session, not in the export session. You need to modify the export itself if possible. If not, a direct export from Sqoop to MSSQL will not be possible; instead you will need to dump the data from Sqoop to a file that MSSQL can read (such as tab-delimited) and then write a statement that first does SET IDENTITY_INSERT ON, then BULK INSERTs the file, then does SET IDENTITY_INSERT OFF.
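A sketch of that fallback in T-SQL (table and file names are made up; note that BULK INSERT also needs the KEEPIDENTITY option to take identity values from the file rather than generating new ones):

-- run as one batch so IDENTITY_INSERT applies to the same session as the load
SET IDENTITY_INSERT dbo.target_table ON;

BULK INSERT dbo.target_table
FROM 'C:\exports\hive_dump.tsv'
WITH (KEEPIDENTITY, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

SET IDENTITY_INSERT dbo.target_table OFF;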