SSIS OLEDB Command transformation (Insert if not exists) - sql-server

Ok so according to Microsoft docs the OLE DB Command Transformation in SSIS does this
The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.
So I want to write some SQL to Insert rows in one of my tables only IF the record doesn't exists
So I tried this but the controls keeps complaining of bad sintaxys
IF NOT EXISTS
(SELECT * FROM M_Employee_Login WHERE
Column1=?
AND Column2=?
AND Column3=?)
INSERT INTO [M_Employee_Login]
([Column1]
,[Column2]
,[Column3])
VALUES
(?,?,?)
However if I remove the IF NOT EXISTS section (leaving the insert only) the controls says may code is Ok, what am I doing wrong.
Is there an easier solution?
Update: BTW My source is a Flat File (csv file)
Update since answer: Just to let people know. I ended up using the OLE DB Command Transformation like I planned cause is better than the OLE DB Destination for this operation. The difference is that I did used the Lookup Component to filter all the already existent records (like the answer suggested). Then use the OLE DB Command Transformation with the Insert SQL that I had in the question and it worked as expected. Hope it helps

OLEDB Command object is not the same as the OLE DB Destination
Rather than doing it as you describe, instead use a Lookup Component. Your data flow becomes Flat File Source -> Lookup Component -> OLE DB Destination
In your lookup, you will write the query SELECT Column1, Column2, Column3 FROM M_Employee_Login and configure it such that it will redirect no match entities to the stream instead of failure (depending on your version 2005 vs not 2005) this will be the default.
After the lookup, the output of No Match will contain the values that didn't find a corresponding match in the target table.
Finally, configure your OLEDB Destination to perform the fast load option.

Though you can make use of Look up component in SSIS to avoid the duplicates which is the best possible approach, but if you are looking for some query to avoid the duplicates then, you can simply insert all the data in some temp/staging table in your database, and run the following query.
INSERT INTO M_Employee_Login(Column1, Column2, Column3)
SELECT vAL1,vAL2,vAL3 from Staging_Table
EXCEPT
SELECT Column1, Column2, Column3 FROM M_Employee_Login

Related

How to avoid re-inserting data (duplicates) into SQL Server table while re-running SSIS package that loads data?

I have created a package is SSIS. It's working fine for first time insertion. When I am running the package through SQL Server agent jobs, I am getting duplicates inserted when the scheduled job is inserting data.
I don't have any idea about how to stop inserting multiple duplicate records.
I am expecting to remove duplicates insertion while running deployed package through SQL Server Jobs
There are 2 approaches to do that:
(1) using SQL Command
This option can be used if source and destination are on the same server
Since you are using ADO.NET source you can change the Data Access mode to SQL Command and select only data that not exists in the destination:
SELECT *
FROM SourceTable
WHERE NOT EXISTS(
SELECT 1
FROM DestinationTable
WHERE SourceTable.ID = DestinationColumn.ID)
(2) using Lookup Transformation
You can use a Lookup transformation to get the non-matching rows between Source and destination and ignore duplicates:
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
SSIS - only insert rows that do not exists
SSIS import data or insert data if no match
Implementing Lookup Logic in SQL Server Integration Services
In order to remove duplicates use SQL Task with the following query (assuming that you are not extracting million of rows and you want to remove duplicates on the extracted data, not destination) :
with cte as (
select field1,field2, row_number() over(partition by allfieldsfromPK order by allfieldsfromPK) as rownum)
delete from cte where rownum > 1
Then use a Data Flow Task and insert clean data into destination table.
In case you just want to not insert duplicates , a very good option is to use MERGE statement, a more performant alternative.

Insert a data in OLE DB destination

I have created a SSIS package. It'll load a XML file and store the data in the database.
I need to use a value (FILE_NAME) from FILE_INFO which is passed from the XML file into the OLE DB Destination in SQL Command mode. How can I use the FILE_NAME in the sql query.
This is my SQL query inside OLE DB Destination
Insert into DummyFile(DummyFileName, DummyFileStatusID)
VALUES ('I NEED TO INSERT THE FILE NAME HERE',
(Select FileStatusID from DummyFileStatus where StatusName='Created'));
Please advice.
I think you are looking for a Derived Column
If you want to insert the values into another OLEDB Destination DummyFileStatus, you can add a MutliCast transformation that allow you to insert data into multiple OLEDB Destination.
Or Just add another DataFlow Task that will be executed after this DataFlow task, to import Data from DummyFile to DummyFileStatus and use a Derived Column within it.
Additional Informations
Derived Column Transformation
Multicast Transformation

Netezza Incremental load from Sql server using SSIS

I am trying to do a incremental load from Sql server 2008 to Netezza (Nps6) using SSIS.
Netezza 5.x version OLEDB driver used. I am using Table or View - Fast Load option with Maximum insert commit size = 0.
Here I am trying to insert few thousands of records to a Netezza table. This destination table contains millions of records. This Data flow task was taking a hours to complete. When I looked into the Netezza Administrator Active Queries I could see that a query like below was the problem,
SELECT * FROM Destination_Table;
The next step is an external table load like below,
insert into "destination_table"(col1, col2, col3)
select c0, c1, c2 from external '/dev/null' (c0, c1, c2) using (
remotesource odbc' delimiter ' ' escapechar '\' ctrlchars 'yes' crinstring 'yes' timeroundnanos 'yes' encoding 'internal' maxerrors 1
) ;
Can anyone help me understand why a SELECT * FROM the Destination Table is required for load. Or how a Netezza OLEDB driver works with SSIS.
Appreciate your help.
Without looking at details in your package, the behavior which you have explained occurs if you have not selected the Table or View -fast load option for your Data access mode in your OLE DB Destination component. The fast load option would internally use a BULK INSERT for uploading data into the destination table.
Using the Table or view behaves like a SELECT * and pulls all the columns. This access mode should be used only if you need all the columns of the table or view from the source to the destination.
The problem for you is that this option might not be appearing for you by default, since you are using Netezza.
See issue discussed here along with possible workarounds:
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/965b6d83-cf5e-405b-8784-7981e4386adc
Official bug report raised here:
https://connect.microsoft.com/SQLServer/feedback/details/569087
After installing OLEDB 6.x version this "SELECT * FROM DESTINATION TABLE" issue is not occurring. I could see a good performance improvement with OLEDB 6 version. But, If we are working on OLEDB 5.x version, i believe it is better to load to a stage table and then load to the destination table

Sql Query Task in SSIS

I have added a Execute Sql Task in my Project.I have added a Sql query in it
Insert into M1
select * from M4
But the problem is M1 table is in AAA database & M4 table is in DDD Database.
It is showing some error...?
If both databases are on the same server then fully qualify the table names:
insert into AAA.dbo.M1 (col1, col2, ...)
select col1, col2, ...
from DDD.dbo.M4
Of course, if your objects are not in the dbo schema then you need to put the correct one. You should never use SELECT * by the way, it can lead to problems if you ever change the table structure (or someone else does). Instead, always specify the column names.
An alternative would be to use a data flow to copy the data, but that's probably unnecessary here.
you can use a Data Flow Task. add a OLE DB Source and a OLE DB Destination. Then configure source and destination as required.
Take a look at here

SQL Server Insert failure due to XML Schema validation error

I have a XML column in a table and it is defined by a schema. I am trying to insert values into this table by using Insert into tbl1 Select * from tbl for xml. But this is failing due to schema validation failure for one of the records. But i want to insert the records which have passed the validation atleast and i can capture the others later. Can someone help me in this.
SQL server validates all dataset, not single row. If you want to validate Row-by-Row using SQL server tools, methods are:
SQLCLR (fastest) link
SSIS (easy to create) - using loop FOREACH you try to insert row into table. All failed rows are redirecting to another table.
TSQL TRY/CATCH Block - insert xml from single row to schema validated variable. Slowest one.

Resources