Slow performance for package with XML destination column - sql-server

I have built several SSIS packages over the past few months to move data from a legacy database to a SQL Server database. It normally takes 10-20 minutes to process around 5 million records, depending on the transformation.
The issue I am experiencing with one of my packages is very poor performance, because one of the columns in my destination is of the SQL Server XML data type.
Data comes in like this: 5
A script creates a Unicode string like this: <XmlData><Value>5</Value></XmlData>
Destination is simply a column with XML data type
This is really slow. Any advice?
I did a SQL trace and noticed that behind the scenes SSIS executes a convert on every row before the insert:
declare @p as xml
set @p = convert(xml, N'<XmlData><Value>5</Value></XmlData>')

Try using a temporary table to store the resulting 5 million records without the XML transformation, and then use SQL Server itself to move them from tempdb to the final destination:
INSERT INTO final_destination (...)
SELECT cast(N'<XmlData><Value>5</Value></XmlData>' AS XML) AS batch_converted_xml, col1, col2, colX
FROM #tempTable
If 5,000,000 rows turn out to be too much data for a single batch, you can do it in smaller batches (100k rows should work like a charm).
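As a minimal sketch of what that batching could look like (assuming the staging table has an identity column staging_id and an nvarchar column xml_string holding the generated XML text; all names here are placeholders, not from the original package):
DECLARE @batchSize INT = 100000;
DECLARE @lastId INT = 0;
DECLARE @maxId INT = (SELECT MAX(staging_id) FROM #tempTable);

WHILE @lastId < @maxId
BEGIN
    INSERT INTO final_destination (batch_converted_xml, col1, col2, colX)
    SELECT CAST(t.xml_string AS XML), t.col1, t.col2, t.colX
    FROM #tempTable AS t
    WHERE t.staging_id > @lastId
      AND t.staging_id <= @lastId + @batchSize;   -- one ~100k-row batch per iteration

    SET @lastId = @lastId + @batchSize;
END;
This keeps the XML conversion set-based on the server instead of one CONVERT per row issued by SSIS.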
The statement captured by the profiler looks like an OLE DB command executed once per row.

Related

How to avoid re-inserting data (duplicates) into SQL Server table while re-running SSIS package that loads data?

I have created a package in SSIS. It works fine for the first insertion, but when I run the package through a SQL Server Agent job, duplicates are inserted every time the scheduled job loads data.
I don't know how to stop duplicate records from being inserted.
I want to prevent duplicate insertions when running the deployed package through SQL Server Agent jobs.
There are two approaches to do that:
(1) using a SQL Command
This option can be used if the source and destination are on the same server.
Since you are using an ADO.NET source, you can change the Data Access Mode to SQL Command and select only the data that does not exist in the destination:
SELECT *
FROM SourceTable
WHERE NOT EXISTS (
    SELECT 1
    FROM DestinationTable
    WHERE SourceTable.ID = DestinationTable.ID)
(2) using Lookup Transformation
You can use a Lookup transformation to get the non-matching rows between Source and destination and ignore duplicates:
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
SSIS - only insert rows that do not exists
SSIS import data or insert data if no match
Implementing Lookup Logic in SQL Server Integration Services
In order to remove duplicates, use an Execute SQL Task with the following query (assuming you are not extracting millions of rows, and that you want to remove duplicates from the extracted data, not from the destination):
with cte as (
    select field1, field2,
           row_number() over (partition by allfieldsfromPK order by allfieldsfromPK) as rownum
    from stagingTable   -- placeholder for the table holding the extracted data
)
delete from cte where rownum > 1
Then use a Data Flow Task and insert clean data into destination table.
If you just want to avoid inserting duplicates, a very good option is the MERGE statement, which is a more performant alternative.
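A minimal sketch of that MERGE, assuming ID is the business key and that the table and column names are placeholders:
MERGE INTO DestinationTable AS dst
USING SourceTable AS src
    ON dst.ID = src.ID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Field1, Field2)
    VALUES (src.ID, src.Field1, src.Field2);
Rows whose ID already exists in the destination are simply skipped, so re-running the package does not create duplicates.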

Logic App Execute SQL TO JSON automatically chunks output

The logic app I am working with is intended to quickly update a json file, which is based on a SQL Server table (1000 rows, 6 columns).
The SQL statement resembles this:
SELECT ID, NAME, FIELD1, FIELD2, FIELD3, FIELD4 FROM TABLENAME FOR JSON PATH;
There are ~1000 rows in the table, with little variance or changes.
When I run this SQL in SSMS or locally, my output is a single row of consolidated JSON; when I run the same SQL via the Logic App, it batches the output into groups of 10 JSON rows.
screenshot of output from stored proc / execute sql
If I use a stored procedure with SET NOCOUNT ON, the same behavior results.
Does anyone know a way to force the Execute SQL action in Logic Apps NOT to chunk/batch the return into different result sets?
I've since learned that Execute SQL automatically casts its output to JSON.
To fix this, I changed my SQL to remove the FOR JSON PATH and used ResultSet.Table1 as the source for a Compose action. This wraps the array in the JSON-specific square brackets, and the output is now as expected.
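For illustration, the adjusted query just returns a plain result set; the Compose action, fed from ResultSet.Table1 as described above, then produces the consolidated JSON array:
SELECT ID, NAME, FIELD1, FIELD2, FIELD3, FIELD4
FROM TABLENAME;   -- FOR JSON PATH removed; the Logic App serializes the rows itself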

Executing query from SSDT

I'm using Visual Studio 2015 / SSIS to run a set of SQL statements in an Execute SQL Task and then transfer data between SQL Server tables by executing the package. When we run a series of SQL statements in SSMS, we get a "rows affected" message for every successful statement. Now I want to automate the process using SSIS to reduce the turnaround time, and I would like to get the rows affected for every SQL statement (select, insert, delete) inside the Execute SQL Task. How can it be done in SSIS? I don't have db_owner permission to create stored procedures, so I'm thinking SSIS would be a quick way. It is very important for me to log the rows affected in order to validate the data, as it is financial data. I have nearly 10 SQL statements in each SQL task, such as selects and deletes, but the output is only one table.
For example, my SQL task is like below:
select * from dbo.table1;
select * from dbo.table2 where city = 'Chicago';
create table dbo.table3(id int, name varchar(50));
insert into dbo.table3 values (1, 'a');
select * from dbo.table3;
If I execute this in SSMS, I get rows affected for each statement and the table is also created. If I execute the same statements through an SSIS package, how will I get those messages for each of them?
I assume your data lies in SQL Server. For the selects, you could use Data Flow Tasks and Row Count transformations instead of Execute SQL Tasks.
For inserts and updates there are a few ways to get the affected row count, for example: https://stackoverflow.com/a/1834264/5605866
or: http://microsoft-ssis.blogspot.fi/2011/03/rowcount-for-execute-sql-statement.html
Basically the same thing, but with slightly different syntax.
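As a minimal sketch of the approach those links describe, each statement in the Execute SQL Task can be followed by SELECT @@ROWCOUNT, with the single-row result set mapped to an SSIS variable for logging (the variable name User::RowsAffected is an assumption, not from the question):
insert into dbo.table3 values (1, 'a');
select @@ROWCOUNT as RowsAffected;   -- returned to the package via the task's Result Set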
You can use the Row Count transformation after the data source and save the count into a variable. You can refer to that variable to get the number of rows returned from the source that should be processed.
Hope this helps.

Quicker ways to migrate data from DB2 to SQL Server?

I'm in the process of migrating data from DB2 to SQL Server using a linked server and OPENQUERY, like below:
--SET STATISTICS IO on
-- Number of records are: 18176484
select * INTO [DBName].[DBO].Table1
FROM OPENQUERY(DB2,
'Select * From OPERATIONS.Table1')
This query takes 9 hours and 17 minutes to insert the 18,176,484 records.
Is there any other way to insert the records more quickly? Can I use the OPENROWSET function to do a bulk insert? Or would an SSIS package improve performance and take less time? Please help.
You probably want to export the data to a CSV file, as in this answer on Stack Overflow:
EXPORT TO result.csv OF DEL MODIFIED BY NOCHARDEL SELECT col1, col2, coln FROM testtable;
(Exporting result of select statement to CSV format in DB2)
Once it's a CSV file, you can import it into SQL Server using either BCP or SSIS, both of which are extremely fast, especially if you use a table lock on the target table.
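For example, a hedged sketch of loading the exported file with BULK INSERT and a table lock (the file path, terminators, and batch size are assumptions to adjust to the actual DEL export):
BULK INSERT [DBName].[DBO].Table1
FROM 'C:\export\result.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    TABLOCK,            -- table-level lock for a faster bulk load
    BATCHSIZE = 100000
);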

SSIS Oracle table w/BLOBs XML to SQL Server table

We have an Oracle table that contains archived data in the table format:
BLOB | BLOBID
Each BLOB is an XML file that contains all the business objects we need.
Every BLOB needs to be read, the XML parsed, and the data put into one SQL Server table that will hold everything.
The table has 5 columns: R | S | T | D | BLOBID
Sample XML derived from BLOB:
<N>
<E>
<R>33</R>
<S>1</S>
<T>2012-01-25T09:48:43.213</T>
<D>6.9534619e+003</D>
</E>
<E>
<R>33</R>
<S>1</S>
<T>2012-01-25T09:48:45.227</T>
<D>1.1085871e+004</D>
</E>
<E>
<R>33</R>
<S>1</S>
<T>2012-01-25T09:48:47.227</T>
<D>1.1561764e+004</D>
</E>
</N>
There are a few million BLOBs, and we want to avoid copying all the data over as an XML column and then into a table; instead we want to go from BLOB to table in one step.
What is the best approach to doing this with SSIS/SQL Server?
The code below almost does what we are looking for but only in Oracle Developer and only for one BLOB:
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='yyyy-mm-dd HH24:MI:SS.FF';
SELECT b.BLOBID, a.R as R, a.S as S, a.T as T, cast(a.D AS float(23)) as D
FROM XMLTABLE('/N/E' PASSING
(SELECT xmltype.createxml(BLOB, NLS_CHARSET_ID('UTF16'), null)
FROM CLOUD --Oracle BLOB Cloud
WHERE BLOBID = 23321835)
COLUMNS
R int PATH 'R',
S int PATH 'S',
T TIMESTAMP(3) PATH 'T',
D VARCHAR(23) PATH 'D'
) a
Removing WHERE BLOBID = 23321835 gives the error ORA-01427: single-row subquery returns more than one row, since there are millions of BLOBs. Even so, is there a way to run this through SSIS? Adding the SQL to the OLE DB Source did not work for pulling the data from Oracle, even for one BLOB, and resulted in errors.
Using SQL Server 2012 and Oracle 10g
To summarize, how would we go from an Oracle BLOB containing XML to a SQL Server table of business objects derived from that XML, using SSIS?
I'm new to working with Oracle, any help would be greatly appreciated!
Update:
I was able to get some of my code to work in SSIS by modifying the Oracle Source in SSIS to use the SQL command code above, minus the first line:
ALTER SESSION SET NLS_TIMESTAMP_FORMAT='yyyy-mm-dd HH24:MI:SS.FF';
SSIS doesn't like this line.
Error message with the ALTER SESSION line above included:
No column information was returned by the SQL Command
Would there be another way to format the date without losing data? I'll keep experimenting, possibly using varchar(23) for the date instead of timestamp.
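One untested sketch of a direction that might avoid both problems: correlate XMLTABLE with the base table instead of using a scalar subquery, so every BLOB is shredded in one statement (no ORA-01427), and read T as a plain string so no ALTER SESSION timestamp format is needed from SSIS. Table and column names follow the question; whether this performs acceptably on Oracle 10g is an assumption:
SELECT c.BLOBID,
       x.R, x.S, x.T, CAST(x.D AS FLOAT(23)) AS D
FROM CLOUD c,
     XMLTABLE('/N/E'
              PASSING xmltype.createxml(c.BLOB, NLS_CHARSET_ID('UTF16'), null)
              COLUMNS
                  R INT PATH 'R',
                  S INT PATH 'S',
                  T VARCHAR2(23) PATH 'T',
                  D VARCHAR2(23) PATH 'D') x;
The T column could then be converted to a datetime on the SQL Server side.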
