SSIS Oracle table w/BLOBs XML to SQL Server table - sql-server

We have an Oracle table that contains archived data in the table format:
BLOB | BLOBID
Each BLOB is an XML file that contains all the business objects we need.
Every BLOB needs to be read, its XML parsed, and the results loaded into one SQL Server table that will hold all of the data.
The table has 5 columns: R | S | T | D | BLOBID
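For reference, the destination table we have in mind looks roughly like this (only a sketch; the table name is a placeholder and the column types are our best guess from the sample XML below):
-- Sketch of the SQL Server destination table; name and types are assumptions
-- based on the sample XML (R/S integers, T a datetime with milliseconds, D a float).
CREATE TABLE dbo.ArchiveData (
    R      INT,
    S      INT,
    T      DATETIME2(3),
    D      FLOAT,
    BLOBID BIGINT
);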
Sample XML derived from BLOB:
<N>
<E>
<R>33</R>
<S>1</S>
<T>2012-01-25T09:48:43.213</T>
<D>6.9534619e+003</D>
</E>
<E>
<R>33</R>
<S>1</S>
<T>2012-01-25T09:48:45.227</T>
<D>1.1085871e+004</D>
</E>
<E>
<R>33</R>
<S>1</S>
<T>2012-01-25T09:48:47.227</T>
<D>1.1561764e+004</D>
</E>
</N>
There are a few million BLOBs, and we want to avoid copying all the data over as an XML column and then into a table; instead, we want to go from BLOB to table in one step.
What is the best approach to doing this with SSIS/SQL Server?
The code below almost does what we are looking for, but only in Oracle SQL Developer and only for one BLOB:
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='yyyy-mm-dd HH24:MI:SS.FF';

SELECT b.BLOBID, a.R as R, a.S as S, a.T as T, cast(a.D AS float(23)) as D
FROM XMLTABLE('/N/E'
       PASSING (SELECT xmltype.createxml(BLOB, NLS_CHARSET_ID('UTF16'), null)
                FROM CLOUD -- Oracle BLOB Cloud
                WHERE BLOBID = 23321835)
       COLUMNS
         R int          PATH 'R',
         S int          PATH 'S',
         T TIMESTAMP(3) PATH 'T',
         D VARCHAR(23)  PATH 'D'
     ) a
Removing WHERE BLOBID = 23321835 gives the error ORA-01427: single-row subquery returns more than one row, since there are millions of BLOBs. Even so, is there a way to run this through SSIS? Adding the SQL to the OLE DB Source did not work for pulling the data from Oracle, even for one BLOB, and resulted in errors.
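The shape we think we need is a correlated join, so each row of CLOUD feeds its own XMLTABLE call instead of going through a single-row subquery. This is only an untested sketch reusing the same table and column names as above:
SELECT b.BLOBID, a.R AS R, a.S AS S, a.T AS T, CAST(a.D AS FLOAT(23)) AS D
FROM CLOUD b,
     XMLTABLE('/N/E'
       PASSING xmltype.createxml(b.BLOB, NLS_CHARSET_ID('UTF16'), null)
       COLUMNS
         R int          PATH 'R',
         S int          PATH 'S',
         T TIMESTAMP(3) PATH 'T',
         D VARCHAR(23)  PATH 'D'
     ) a;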
We are using SQL Server 2012 and Oracle 10g.
To summarize: how would we go from an Oracle BLOB containing XML to a SQL Server table holding the business objects derived from that XML, using SSIS?
I'm new to working with Oracle; any help would be greatly appreciated!
Update:
I was able to get some of my code to work in SSIS by modifying the Oracle Source in SSIS to use the SQL command above, minus the first line:
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='yyyy-mm-dd HH24:MI:SS.FF';
SSIS doesn't like this line.
Error message with the ALTER SESSION line above included:
No column information was returned by the SQL Command
Would there be another way to format the date without losing data? I'll try experimenting more, possibly using varchar(23) for the date instead of timestamp.
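One idea I plan to try for the date (again only a sketch, not yet verified in SSIS) is formatting the timestamp with TO_CHAR inside the query itself, so the ALTER SESSION line is not needed and the column arrives as a plain string:
SELECT b.BLOBID,
       a.R AS R,
       a.S AS S,
       TO_CHAR(a.T, 'YYYY-MM-DD HH24:MI:SS.FF3') AS T, -- string with milliseconds, fits varchar(23)
       CAST(a.D AS FLOAT(23)) AS D
FROM CLOUD b,
     XMLTABLE('/N/E'
       PASSING xmltype.createxml(b.BLOB, NLS_CHARSET_ID('UTF16'), null)
       COLUMNS
         R int          PATH 'R',
         S int          PATH 'S',
         T TIMESTAMP(3) PATH 'T',
         D VARCHAR(23)  PATH 'D'
     ) a
WHERE b.BLOBID = 23321835;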

Related

Azure SQL to DB2 data migration via SSIS

I want to create an SSIS package with an Azure SQL database as the source and a DB2 table as the destination.
I have to check whether the Id (int) in SQL matches the Id (X12, basically a varchar) in DB2; if it matches, I have to update the name in DB2 with the name from SQL, otherwise insert a new record into DB2.
Since the data types of the Id columns do not match, I have to do a data type conversion: if the Id in SQL is 1, I have to convert it to a varchar with leading zeros, i.e. 00000000001.
Can someone please provide a step-by-step approach, and tell me where exactly I can make this conversion, i.e. in which task?
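For what it's worth, the zero-padding itself could be done in the source query rather than in a Derived Column transformation. A T-SQL sketch, where the table and column names are placeholders:
-- Sketch only: pad the int Id to an 11-character string with leading zeros.
-- dbo.SourceTable, Id, and Name are placeholder names.
SELECT RIGHT(REPLICATE('0', 11) + CAST(Id AS VARCHAR(11)), 11) AS IdPadded,
       Name
FROM dbo.SourceTable;
The same expression logic could instead be applied in a Derived Column transformation between the source and the destination.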

Migrating from SQL Server to Hive Table using flat file

I am migrating my data from SQL Server to Hive using the following steps, but there is a data issue with the resulting table. I tried various options, including checking data types and using OpenCSVSerde, but I am not able to get the data aligned properly in the respective columns. These are the steps I followed:
Export the SQL Server data to a flat file with fields separated by commas.
Create an external table in Hive as given below and load the data.
CREATE EXTERNAL TABLE IF NOT EXISTS myschema.mytable (
r_date timestamp
, v_nbr varchar(12)
, d_account int
, d_amount decimal(19,4)
, a_account varchar(14)
)
row format delimited
fields terminated by ','
stored as textfile;
LOAD DATA INPATH 'gs://mybucket/myschema.db/mytable/mytable.txt' OVERWRITE INTO TABLE myschema.mytable;
There is an issue with the data in every combination I could try.
I also tried OpenCSVSerde, but the result was worse than with the plain text file. I also tried changing the delimiter to a semicolon, but no luck.
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties ( "separatorChar" = ",") stored as textfile
location 'gs://mybucket/myschema.db/mytable/';
Can you please suggest a robust approach so that I don't have to deal with these data issues?
Note: currently I don't have the option of connecting my SQL Server table with Sqoop.
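One workaround worth trying, assuming the misalignment comes from commas embedded in the data: export the flat file with a delimiter that cannot appear in the values (for example a pipe, e.g. via bcp with -t"|") and declare the same delimiter in the Hive DDL. A sketch, using the same columns as above:
-- Sketch: same table as above, but with a pipe as the field delimiter
-- so commas inside the data no longer split columns.
CREATE EXTERNAL TABLE IF NOT EXISTS myschema.mytable (
  r_date timestamp
, v_nbr varchar(12)
, d_account int
, d_amount decimal(19,4)
, a_account varchar(14)
)
row format delimited
fields terminated by '|'
stored as textfile
location 'gs://mybucket/myschema.db/mytable/';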

SSIS package breaks accented characters on SELECT

I have a SQL Server SSIS package that inserts data into a table based on data in another table. The problem is that this job breaks the accents in the varchar data along the way. I suppose it has something to do with encoding.
Simplified, my package does the following:
Select data with an OLE DB Source through SQL command data access mode
SELECT id, name, lastname
FROM Client
<WHERE id = 1>
Insert the selected data with an OLE DB Destination through mapping to the ClientCopy table.
I noticed that previewing the data in the first step already returns the broken accents. Because of this, the data inserted into ClientCopy obviously has those broken accents.
id name lastname
1 Andr‚ BriŠre
The same query, when executed directly in SQL Server, returns the data correctly, so I'm a bit lost right now.
id name lastname
1 André Brière
Thanks for helping me out!
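One thing worth checking, assuming the source connection is applying a non-Unicode code page: force the columns to Unicode in the source query so SSIS receives them as DT_WSTR. A sketch (the NVARCHAR lengths are guesses):
-- Sketch: cast the affected columns to NVARCHAR so they travel as Unicode (DT_WSTR)
-- instead of going through a code-page conversion; the lengths are assumptions.
SELECT id,
       CAST(name AS NVARCHAR(100)) AS name,
       CAST(lastname AS NVARCHAR(100)) AS lastname
FROM Client
WHERE id = 1;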

How do I avoid a date-type column from MSSQL becoming null in Pivotal HAWQ during DBMS migration

We are trying to pull data from an external source (MSSQL) into Postgres, but when I checked, the invoicedate column entries are coming through blank even though MSSQL shows invoicedate values for those same entries.
For example, we tried the following query on both DBMSs.
When the query is executed in SQL Server:
select * from tablename where salesorder='168490'
we get 12 rows where the invoicedate column is '2015-10-26 00:00:00.000'.
But when the same query is executed on Postgres:
select "InvoceDt" from tablename where salesorder='168490'
we get 12 rows where the invoicedate column is null.
The question is why: the Postgres InvoiceDt column is coming through as null, whereas SQL Server shows proper values for the same rows. Why is the data different between SQL Server and Postgres for this particular column?
Vicps, you aren't using Postgres, and that is why a_horse_with_no_name is having such a hard time trying to understand your question. You are using Pivotal HDB (formerly called HAWQ). HAWQ is now associated with the incubator project "Apache HAWQ", and the commercial version is "Pivotal HDB".
Pivotal HDB is a fork of the Pivotal Greenplum database, which is a fork of PostgreSQL 8.2. It has many similarities to Postgres, but it is most definitely not Postgres.
You are also using Spring-XD to move the data from SQL Server to HDFS, which is critical to understanding what the true problem is.
You provided this example:
CREATE TABLE tablename ( "InvoiceDt" timestamp )
LOCATION ('pxf://hostname/path/to/hdfs/?profile=HdfsTextSimple')
FORMAT 'csv' ( delimiter '^' null 'null' quote '~');
Your file only has one column in it? How is this possible? Above, you mention the salesorder column. Secondly, have you tried looking at the file written by Spring-XD?
hdfs dfs -cat hdfs://hostname:8020/path/to/hdfs | grep 168490
I bet you have an extra delimiter, a null character, or an escape character in the data that is causing the problem. You may also want to tag your question with spring-xd.

Slow performance for package with XML destination column

I have done several SSIS packages over the past few months to move data from a legacy database to a SQL Server database. It normally takes 10-20 minutes to process around 5 million records, depending on the transformation.
The issue I am experiencing with one of my packages is very poor performance, because one of the columns in my destination is of the SQL Server XML data type.
Data comes in like this: 5
A script creates a Unicode string like this: <XmlData><Value>5</Value></XmlData>
Destination is simply a column with XML data type
This is really slow. Any advice?
I did a SQL trace and noticed that behind the scenes SSIS executes a convert on each row before the insert:
declare @p as xml
set @p = convert(xml, N'<XmlData><Value>5</Value></XmlData>')
Try using a temporary table to store the resulting 5 million records without the XML transformation and then use SQL Server itself to move them from tempDB to the final destination:
INSERT INTO final_destination (...)
SELECT cast(N'<XmlData><Value>5</Value></XmlData>' AS XML) AS batch_converted_xml, col1, col2, colX
FROM #tempTable
If 5,000,000 rows turn out to be too much data for a single batch, you can do it in smaller batches (100k rows should work like a charm).
The statement captured by the profiler looks like an OLE DB transformation issuing one command per row.
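A sketch of the staging step that goes with this approach (the table and column names are assumptions): the data flow lands the XML as a plain Unicode string, and the conversion to XML happens once, set-based:
-- Hypothetical staging table: the SSIS data flow writes here with no XML conversion.
CREATE TABLE #tempTable (
    xml_string NVARCHAR(MAX), -- e.g. N'<XmlData><Value>5</Value></XmlData>'
    col1 INT,
    col2 INT,
    colX INT
);

-- One set-based conversion into the real destination instead of a per-row convert.
INSERT INTO final_destination (xml_col, col1, col2, colX)
SELECT CAST(xml_string AS XML), col1, col2, colX
FROM #tempTable;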
