Data Flow Task in SSIS package stuck for an indefinite period - sql-server

We use SQL Server 2012 and SSDT 2010 to develop and debug SSIS packages.
I have a simple data flow task with only two components: an OLE DB source and an OLE DB target. The target table stores data for multiple dates and is loaded incrementally from the source table, i.e. whenever the source table receives data with a new date, that data is loaded into the target table. No transformation, calculation or logic is applied in the data flow.
The OLE DB source uses the following query to select data:
SELECT * FROM source_table WHERE date_col NOT IN
(SELECT DISTINCT date_col FROM target_table);
On the OLE DB target page, the data access mode is set to Fast Load and the Table Lock option is unchecked.
Whenever we execute the package from SSDT, the data flow path shows a row count for a few rows being processed and then the package hangs. The count does not grow and the package never finishes until it is stopped forcibly.
Thanks in advance.

Try replacing the subquery with a left join, as below:
SELECT a.* FROM source_table a
LEFT JOIN target_table b
ON a.date_col = b.date_col WHERE b.date_col IS NULL
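To see how the two query shapes compare, here is a minimal sketch that runs both against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server here, and the amount column and sample rows are made up for illustration. It also shows one behavioral difference that is easy to miss: if date_col in the target ever contains NULL, the NOT IN form returns no rows at all, while the left join still returns the new dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_table (date_col TEXT, amount INTEGER);
    CREATE TABLE target_table (date_col TEXT, amount INTEGER);
    INSERT INTO source_table VALUES
        ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-03', 30);
    INSERT INTO target_table VALUES
        ('2024-01-01', 10), ('2024-01-02', 20);
""")

# Query from the question.
NOT_IN = """
    SELECT * FROM source_table
    WHERE date_col NOT IN (SELECT DISTINCT date_col FROM target_table)
"""
# Query from the answer.
LEFT_JOIN = """
    SELECT a.* FROM source_table a
    LEFT JOIN target_table b ON a.date_col = b.date_col
    WHERE b.date_col IS NULL
"""

# With no NULL dates in the target, both forms pick up only the new date.
not_in_rows = conn.execute(NOT_IN).fetchall()
left_join_rows = conn.execute(LEFT_JOIN).fetchall()

# A NULL date in the target makes NOT IN evaluate to unknown for every
# non-matching source row, so that query stops returning rows.
conn.execute("INSERT INTO target_table VALUES (NULL, 0)")
not_in_with_null = conn.execute(NOT_IN).fetchall()
left_join_with_null = conn.execute(LEFT_JOIN).fetchall()

print(not_in_rows, left_join_rows, not_in_with_null, left_join_with_null)
```

This only compares result sets. It does not reproduce the hang from the question, which may also depend on locking between the read of target_table and the load into it.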

Related

How to avoid re-inserting data (duplicates) into SQL Server table while re-running SSIS package that loads data?

I have created a package in SSIS. It works fine for the first insertion. When I run the package through a SQL Server Agent job, the scheduled job inserts duplicate records.
I don't have any idea how to stop the duplicate records from being inserted.
I want to prevent duplicate insertion when the deployed package runs through SQL Server Agent jobs.
There are 2 approaches to do that:
(1) using SQL Command
This option can be used if source and destination are on the same server
Since you are using an ADO.NET source, you can change the Data Access mode to SQL Command and select only the data that does not exist in the destination:
SELECT *
FROM SourceTable
WHERE NOT EXISTS(
SELECT 1
FROM DestinationTable
WHERE SourceTable.ID = DestinationTable.ID)
(2) using Lookup Transformation
You can use a Lookup transformation to get the non-matching rows between Source and destination and ignore duplicates:
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
SSIS - only insert rows that do not exists
SSIS import data or insert data if no match
Implementing Lookup Logic in SQL Server Integration Services
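Approach (1) can be tried out with a small sketch. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server, and the Name column and sample rows are assumptions made for illustration. The point is that running the same load twice leaves the destination unchanged the second time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE DestinationTable (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO SourceTable VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Insert only the source rows whose ID is not yet in the destination.
LOAD = """
    INSERT INTO DestinationTable (ID, Name)
    SELECT ID, Name FROM SourceTable
    WHERE NOT EXISTS (
        SELECT 1 FROM DestinationTable
        WHERE DestinationTable.ID = SourceTable.ID)
"""

def run_load(connection):
    """Run the load once and return the destination row count afterwards."""
    connection.execute(LOAD)
    connection.commit()
    return connection.execute(
        "SELECT COUNT(*) FROM DestinationTable").fetchone()[0]

count_after_first_run = run_load(conn)
count_after_second_run = run_load(conn)  # simulates the scheduled re-run
print(count_after_first_run, count_after_second_run)
```

In the package itself the SELECT part would go into the source's SQL Command and the destination component would do the insert; the sketch folds both into one statement only to keep it short.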
To remove duplicates, use an Execute SQL Task with the following query (assuming that you are not extracting millions of rows and that you want to remove duplicates from the extracted data, not from the destination):
with cte as (
select field1, field2, row_number() over (partition by allfieldsfromPK order by allfieldsfromPK) as rownum
from extractedTable -- placeholder: the table that holds the extracted data
)
delete from cte where rownum > 1
Then use a Data Flow Task and insert clean data into destination table.
If you just want to avoid inserting duplicates, a very good and more performant alternative is the MERGE statement.
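The row_number() cleanup can be sketched as follows, again with an in-memory SQLite database through Python's sqlite3 module standing in for SQL Server. The staging table and its columns are hypothetical. SQLite cannot delete through a CTE the way T-SQL can, so this version deletes by rowid instead; it also needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (order_id INTEGER, carrier TEXT);
    INSERT INTO staging VALUES
        (1, 'A'), (1, 'A'), (2, 'B'), (3, 'C'), (3, 'C'), (3, 'C');
""")

# Number the rows within each key and delete every row after the first.
conn.execute("""
    DELETE FROM staging
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY order_id ORDER BY rowid) AS rownum
            FROM staging)
        WHERE rownum > 1)
""")
conn.commit()

remaining = conn.execute(
    "SELECT order_id, carrier FROM staging ORDER BY order_id").fetchall()
print(remaining)
```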

SSIS OLE DB Destination Table View Fastload or normal does not give error on duplicate key

I am filling a table using a package. After preparing the data to be saved, I send it to an OLE DB destination for SQL Server, with the data access mode set to either Table or View - Fast Load or just Table or View (no fast load).
There is no error message but it does not write to the table.
I switched over to the normal Table or View mode so that each record is inserted with a separate INSERT command instead of a BULK INSERT.
When nothing happened, I stopped the execution of the package and ran select * from the destination table. I saw that it had inserted 20 records. After investigating the data sent to the OLE DB destination, I saw that record 21 results in a duplicate key.
Instead of raising an error message, the package simply does not continue its execution flow.
What am I doing wrong?
Go through all the elements of the package, including the package itself, and set:
FailPackageOnFailure = True

Handling duplicate in SSIS when extracting data from MS Access and loading them to SQL db

I have created an SSIS package to extract and load data from an Access table (.mdb) named TmShipping into a SQL table, say TmShippingImport. This SQL table has Id and importedDate as additional columns. I have scheduled the package to run every 30 minutes.
TmShipping.mdb
----------------------------------------
OrderId CarrierId TotalCharge
TmShippingImport (SQL table)
-------------------------------------------------------
Id OrderId CarrierId TotalCharge importedDate
In the Data Flow Task:
I get the data from the source using an OLE DB connection and extract all rows from the Access table. The output is connected to a Recordset Destination so that I can process each row.
In the Control flow task:
I have a loop container (connected to the data flow task's output) which inserts each row into the SQL table with a SQL query, loading all the row data along with the current datetime.
Package execution
When the SSIS package is executed for the first time, it loads each row into the SQL table and adds a datetime to the importedDate column. When new records are created in the Access table, the package takes all the rows in the Access table (rows that were extracted previously plus the new rows) and loads them into the SQL table again. How can I avoid duplicates? My primary key in the SQL table is Id, which is not present in the Access table.
I tried using a Lookup in the data flow between the source and the Recordset Destination, but it failed saying I can't connect the available column to a BLOB.
Should I try a Lookup or Merge in the data flow, or should I change the Foreach Loop container so that it checks for duplicates before inserting the rows into the SQL table, or...?
First of all, your approach to inserting the data into the SQL table seems very expensive. Is there a reason you can't simply insert the data into the SQL table in the data flow using a data flow destination? It will be much, much more performant.
If the reason you've not done this is the timestamp, you can achieve it by using a Derived Column transformation and GETDATE().
Once you've done that you can implement the pattern used in the answer Tab linked to.
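The combined effect of that advice (one set-based insert that stamps the import time and skips rows already loaded) can be sketched with an in-memory SQLite database through Python's sqlite3 module. This is only an illustration under assumptions: both tables live in one database here, whereas in the package the source is Access and the filtering would happen in the data flow; OrderId is assumed to identify a row; CURRENT_TIMESTAMP plays the role of GETDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TmShipping (
        OrderId INTEGER, CarrierId INTEGER, TotalCharge REAL);
    CREATE TABLE TmShippingImport (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        OrderId INTEGER, CarrierId INTEGER, TotalCharge REAL,
        importedDate TEXT);
    INSERT INTO TmShipping VALUES (100, 1, 9.5), (101, 2, 12.0);
""")

# Set-based load: stamp each new row and skip OrderIds already imported.
LOAD = """
    INSERT INTO TmShippingImport
        (OrderId, CarrierId, TotalCharge, importedDate)
    SELECT s.OrderId, s.CarrierId, s.TotalCharge, CURRENT_TIMESTAMP
    FROM TmShipping s
    WHERE NOT EXISTS (
        SELECT 1 FROM TmShippingImport t WHERE t.OrderId = s.OrderId)
"""

conn.execute(LOAD)                                        # first scheduled run
conn.execute("INSERT INTO TmShipping VALUES (102, 1, 7.25)")  # new source row
conn.execute(LOAD)                                        # second scheduled run
conn.commit()

imported_order_ids = [row[0] for row in conn.execute(
    "SELECT OrderId FROM TmShippingImport ORDER BY OrderId")]
missing_dates = conn.execute(
    "SELECT COUNT(*) FROM TmShippingImport WHERE importedDate IS NULL"
).fetchone()[0]
print(imported_order_ids, missing_dates)
```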

SSIS OLEDB Command transformation (Insert if not exists)

According to the Microsoft docs, the OLE DB Command Transformation in SSIS does this:
The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.
So I want to write some SQL that inserts rows into one of my tables only if the record doesn't already exist.
I tried the following, but the control keeps complaining about bad syntax:
IF NOT EXISTS
(SELECT * FROM M_Employee_Login WHERE
Column1=?
AND Column2=?
AND Column3=?)
INSERT INTO [M_Employee_Login]
([Column1]
,[Column2]
,[Column3])
VALUES
(?,?,?)
However, if I remove the IF NOT EXISTS section (leaving only the insert), the control says my code is OK. What am I doing wrong?
Is there an easier solution?
Update: BTW, my source is a flat file (CSV file).
Update since answer: Just to let people know, I ended up using the OLE DB Command Transformation as I had planned, because it is better than the OLE DB Destination for this operation. The difference is that I used the Lookup Component to filter out all the already existing records (as the answer suggested), then used the OLE DB Command Transformation with the insert SQL from the question, and it worked as expected. Hope it helps.
The OLE DB Command object is not the same as the OLE DB Destination.
Rather than doing it as you describe, use a Lookup Component instead. Your data flow becomes Flat File Source -> Lookup Component -> OLE DB Destination.
In your lookup, write the query SELECT Column1, Column2, Column3 FROM M_Employee_Login and configure it to redirect no-match rows to the stream instead of failing (depending on your version, 2005 vs. not 2005, this will be the default).
After the lookup, the No Match output will contain the rows that didn't find a corresponding match in the target table.
Finally, configure your OLE DB Destination to use the fast load option.
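Conceptually, the Lookup step behaves like the following sketch: the reference query result is cached, and each incoming row goes to either the match or the no-match output. This is an illustration in plain Python, not SSIS code; the function name and the sample rows are made up.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (Column1, Column2, Column3)

def split_by_lookup(
    incoming: Iterable[Row], reference: Iterable[Row]
) -> Tuple[List[Row], List[Row]]:
    """Return (matched, no_match) for rows checked against a cached reference set."""
    cache = set(reference)  # plays the role of the lookup's reference cache
    matched: List[Row] = []
    no_match: List[Row] = []
    for row in incoming:
        (matched if row in cache else no_match).append(row)
    return matched, no_match

# Rows already in the target table (what the lookup query would return).
existing = [("alice", "hr", "x1"), ("bob", "it", "x2")]
# Rows read from the flat file.
from_file = [("alice", "hr", "x1"), ("carol", "ops", "x3")]

matched, no_match = split_by_lookup(from_file, existing)
print(no_match)  # only these rows would continue to the destination
```

Note that the cache reflects the target as it was when the lookup was loaded, so two identical rows in the same file would both land in the no-match output.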
Using the Lookup component in SSIS is the best approach for avoiding duplicates. If you would rather do it with a query, you can insert all the data into a temp/staging table in your database and then run the following query.
INSERT INTO M_Employee_Login(Column1, Column2, Column3)
SELECT vAL1,vAL2,vAL3 from Staging_Table
EXCEPT
SELECT Column1, Column2, Column3 FROM M_Employee_Login
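A quick way to check how the EXCEPT query behaves is to run it against an in-memory SQLite database through Python's sqlite3 module, used here as a stand-in for SQL Server with made-up sample rows. EXCEPT removes rows that already exist in the target and also collapses duplicates inside the staging table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE M_Employee_Login (Column1 TEXT, Column2 TEXT, Column3 TEXT);
    CREATE TABLE Staging_Table (vAL1 TEXT, vAL2 TEXT, vAL3 TEXT);
    INSERT INTO M_Employee_Login VALUES ('d', 'e', 'f');
    INSERT INTO Staging_Table VALUES
        ('a', 'b', 'c'), ('a', 'b', 'c'), ('d', 'e', 'f');
""")

# Same statement as in the answer above.
LOAD = """
    INSERT INTO M_Employee_Login (Column1, Column2, Column3)
    SELECT vAL1, vAL2, vAL3 FROM Staging_Table
    EXCEPT
    SELECT Column1, Column2, Column3 FROM M_Employee_Login
"""

conn.execute(LOAD)
conn.execute(LOAD)  # a second run finds nothing new to insert
conn.commit()

rows = conn.execute(
    "SELECT Column1, Column2, Column3 FROM M_Employee_Login ORDER BY Column1"
).fetchall()
print(rows)
```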

Netezza Incremental load from Sql server using SSIS

I am trying to do an incremental load from SQL Server 2008 to Netezza (NPS 6) using SSIS.
I am using the Netezza OLE DB driver, version 5.x, with the Table or View - Fast Load option and Maximum insert commit size = 0.
I am trying to insert a few thousand records into a Netezza table that already contains millions of records. This data flow task was taking hours to complete. When I looked at the active queries in Netezza Administrator, I could see that a query like the one below was the problem:
SELECT * FROM Destination_Table;
The next step is an external table load like below,
insert into "destination_table"(col1, col2, col3)
select c0, c1, c2 from external '/dev/null' (c0, c1, c2) using (
remotesource 'odbc' delimiter ' ' escapechar '\' ctrlchars 'yes' crinstring 'yes' timeroundnanos 'yes' encoding 'internal' maxerrors 1
) ;
Can anyone help me understand why a SELECT * FROM the destination table is required for the load, or how the Netezza OLE DB driver works with SSIS?
Appreciate your help.
Without seeing the details of your package: the behavior you describe occurs if you have not selected the Table or View - fast load option as the data access mode in your OLE DB Destination component. The fast load option internally uses a BULK INSERT to upload data into the destination table.
The plain Table or view mode behaves like a SELECT * and pulls all the columns. That access mode should be used only if you need all the columns of the table or view from the source to the destination.
The problem in your case is that the fast load option might not appear by default, since you are using Netezza.
See issue discussed here along with possible workarounds:
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/965b6d83-cf5e-405b-8784-7981e4386adc
Official bug report raised here:
https://connect.microsoft.com/SQLServer/feedback/details/569087
After installing the OLE DB 6.x driver, the "SELECT * FROM DESTINATION TABLE" issue no longer occurs, and I could see a good performance improvement. If you are stuck on the OLE DB 5.x driver, I believe it is better to load into a stage table first and then load from there into the destination table.
