I have the below column in a SQL Server database. I need to Sqoop-import the table into Hive and get the MAX of that column. Kindly help with the conversion:
Column name: __$seqval in the CDC table.
Datatype in the SQL database shows as: __$seqval (binary(10), NOT NULL).
Values of the column are as below:
0x000001D1000003520003
0x000001D1000003520003
0x000001D10000035A0003
0x000001D1000003630003
0x000001D1000006FB0003
0x000001D1000007090003
0x000001D1000007100003
0x000001D1000007170003
0x000001D10000071E0003
0x000001D100000747002C
0x000001D100000747002C
0x000001D100000747002E
0x000001D100000747002E
0x000001D1000007470030
0x000001D1000007470030
0x000001D1000007850002
0x000001D1000007850002
0x000001D1000007AA002C
0x000001D1000007AA002C
How do I convert these and get the MAX of them in Hive?
When you're pulling it out of SQL Server, you can use:
select convert(bigint, binaryColumnName)
...which will pull the binary values out as BIGINTs. Then, when presented to Hadoop, it should be treated as a BIGINT. (I don't know Hadoop so can't tell you how to get a MAX out of it.)
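For example, a Sqoop free-form query import along those lines might look like this (a sketch only; the connection string, CDC table name, and Hive table name are placeholders, and the CONVERT runs on the SQL Server side so Hive receives a BIGINT it can take a MAX over):

sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;databaseName=<db>" \
  --username <user> -P \
  --query "SELECT CONVERT(BIGINT, [__\$seqval]) AS seqval_num FROM cdc.dbo_MyTable_CT WHERE \$CONDITIONS" \
  -m 1 \
  --target-dir /tmp/cdc_mytable \
  --hive-import --hive-table cdc_mytable

-- then, in Hive:
SELECT MAX(seqval_num) FROM cdc_mytable;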
I'm in the process of migrating data from DB2 to SQL Server using a linked server and OPENQUERY, like below:
--SET STATISTICS IO on
-- Number of records are: 18176484
select * INTO [DBName].[DBO].Table1
FROM OPENQUERY(DB2,
'Select * From OPERATIONS.Table1')
This query takes 9 hours and 17 minutes to insert the 18,176,484 records.
Is there any other way to insert the records more quickly? Can I use the OPENROWSET function to do a bulk insert? Or would an SSIS package improve performance and take less time? Please help.
You probably want to export the data to a CSV file, as in this answer on Stack Overflow:
EXPORT TO result.csv OF DEL MODIFIED BY NOCHARDEL SELECT col1, col2, coln FROM testtable;
(Exporting result of select statement to CSV format in DB2)
Once it's a CSV file you can import it into SQL Server using either BCP or SSIS, both of which are extremely fast, especially if you use a table lock on the target table.
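For example, loading the exported file with bcp might look like this (a sketch; the server, database, and table names are placeholders, and the DB2 DEL export above is comma-delimited by default):

bcp DBName.dbo.Table1 in result.csv -S <server> -T -c -t, -b 50000 -h "TABLOCK"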
I'm using Sqoop to import data from SQL Server into Hive and then later export that data out of Hive into another SQL Server. The Sqoop import works fine and converts the VARCHAR/NVARCHAR data types into String.
My question is: what is the best column type to define on the target table, since Hive currently holds the data type as String? I originally defined most of my columns on the target table as VARCHAR(100) and it had been working, but now the export fails for some strings and I get:
SQL State: 22001, error code: 8152
"java.sql.BatchUpdateException: String or binary data would be
truncated."
Sample row that failed:
"HEALTH SITE PROVIDERS LLC"|" "|"3435673"|"UHGID0000547777"|"906225"|"\\N"|"\\N"|"\\N"
Clearly this data has far fewer than 100 characters in each column (columns delimited by |), so I'm confused as to how Hive/Sqoop is converting this String, or whether it does any conversion at all during the export.
I was thinking of defining my columns in the target table as NVARCHAR(max), but is this a bit extreme? Also, I need to index some of these columns, and NVARCHAR(max) columns can't be used as index key columns in SQL Server.
Regards,
Since your data is mostly of type VARCHAR(100), there is no need to store it as Hive's STRING. You can keep VARCHAR and NVARCHAR data in Hive's VARCHAR type.
Use --map-column-hive <column-name>=<hive-type>,... in your sqoop import command.
Example:
Say col1 is VARCHAR(100) and col2 is NVARCHAR(100)
--map-column-hive col1='varchar(100)',col2='varchar(100)',....
Now you can export it back to a SQL Server table having VARCHAR/NVARCHAR columns.
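For example, the import might look something like this (a sketch with placeholder connection details; only two mapped columns are shown):

sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;databaseName=SourceDB" \
  --username sqoopuser -P \
  --table SourceTable \
  --hive-import --hive-table source_table \
  --map-column-hive "col1=varchar(100),col2=varchar(100)" \
  -m 4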
We are working on importing data from MS SQL Server to Hive through Sqoop. If we use incremental append mode, which is the requirement, then we need to specify the --last-value of the row id that we inserted last time.
I have to update about 100 tables in Hive.
What is the usual practice for saving the last row id value of each table and passing it to Sqoop as --last-value?
Why doesn't Sqoop itself check the row id of the source and destination tables and then import only the rows beyond the destination table's last row id?
If I save the last row id values for all tables in a Hive table and want to use those values in a Sqoop job, how is that possible?
Above all, I want to automate the daily import job so that I do not have to provide the value manually for each table.
Any pointers?
Thanks
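For reference, Sqoop's saved-job mechanism stores the last value in its metastore and updates it after each run, which is one way to avoid passing --last-value by hand. A minimal sketch (connection details, table, and check column are placeholders):

sqoop job --create orders_incr -- import \
  --connect "jdbc:sqlserver://<host>:1433;databaseName=SourceDB" \
  --username sqoopuser --password-file /user/sqoop/.pwd \
  --table Orders \
  --target-dir /user/hive/warehouse/orders \
  --incremental append \
  --check-column row_id \
  --last-value 0

Each scheduled run then just executes sqoop job --exec orders_incr and continues from the stored value.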
I have a question about the SQL Server 8 KB row size limitation. I have a destination table with 453 columns, all of type varchar(max). My import table has the same number of columns with the same datatype. Basically: import table -> destination table. I am sure people are going to suggest normalizing and redesigning, but I am more interested in why SQL Server is behaving this way.
The character counts of the rows in the import table are around 4000 to 4500. Below are the scenarios I need help with:
If I do a select * into sometable from "import table", I get a successful run.
If I do a insert into "destination table" select * from "import table" I get an error saying "Cannot create a row of size 8239 which is greater than the allowable maximum row size of 8060."
I am totally losing my mind. I thought using varchar(max) allows me to stretch the 8060-byte limitation up to 2 GB. All my destination table and import table columns are of type varchar(max), which allows for LOB and/or out-of-row storage.
Need help please.
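For reference, one way to see how many bytes the widest rows actually use is to sum DATALENGTH over the columns (a sketch with placeholder column names; a real check would list all 453 columns, and this ignores per-row and per-column overhead):

SELECT TOP (10)
       ISNULL(DATALENGTH(col1), 0) + ISNULL(DATALENGTH(col2), 0) + ISNULL(DATALENGTH(col3), 0) AS row_bytes
FROM ImportTable
ORDER BY row_bytes DESC;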
When importing a file.csv into SQL Server 2008 I am having a problem.
In my file.csv the decimals are written like this (e.g. 1234,34112), and it seems that SQL Server does not understand the ',' as a decimal separator.
My solution has been to import it using BULK INSERT as VARCHAR and convert it to decimal afterwards. It works, but I guess there may be a better solution that I'm not seeing.
Could you help me with that?
Thanks in advance
There are only two ways of doing it. One you have already mentioned: import into SQL Server as text and then do something like this:
CREATE TABLE ImportTable (Value NVARCHAR(1000)); --<-- Import the data as it is

INSERT INTO ImportTable VALUES
('1234,34112'), ('12651635,68466'), ('1234574,5874');

-- Add a NUMERIC column, then populate it by swapping the decimal separator
ALTER TABLE ImportTable ADD NewColumn NUMERIC(28,8);
GO
UPDATE ImportTable
SET NewColumn = CAST(REPLACE(Value, ',', '.') AS NUMERIC(28,8));
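For completeness, the BULK INSERT load into that staging table might look like this (a sketch; the file path is a placeholder, and it assumes one value per line, since a multi-column file would need a matching multi-column staging table and a FIELDTERMINATOR):

BULK INSERT ImportTable
FROM 'C:\data\file.csv'
WITH (ROWTERMINATOR = '\n');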
Or you can change it in your Excel sheet before you import it.
Unless you are using SSIS to import the data, it is usually best to get your data into SQL Server first using loose datatypes and then do any data manipulation needed.
SQL Server Management Studio 17 provides a new direct option to import flat files that handles decimal CSV columns for you. Right-click your database, then click Tasks > Import Flat File...