Netezza incremental load from SQL Server using SSIS - sql-server

I am trying to do an incremental load from SQL Server 2008 to Netezza (NPS6) using SSIS.
I am using the Netezza 5.x OLE DB driver, with the Table or View - Fast Load option and Maximum insert commit size = 0.
I am trying to insert a few thousand records into a Netezza table that already contains millions of records. This Data Flow Task was taking hours to complete. When I looked into the Netezza Administrator Active Queries, I could see that a query like the one below was the problem:
SELECT * FROM Destination_Table;
The next step is an external table load, like the one below:
insert into "destination_table"(col1, col2, col3)
select c0, c1, c2 from external '/dev/null' (c0, c1, c2) using (
remotesource 'odbc' delimiter ' ' escapechar '\' ctrlchars 'yes' crinstring 'yes' timeroundnanos 'yes' encoding 'internal' maxerrors 1
) ;
Can anyone help me understand why a SELECT * FROM the destination table is required for the load, or how the Netezza OLE DB driver works with SSIS?
Appreciate your help.

Without looking at the details of your package, the behavior you describe occurs if you have not selected the Table or View - Fast Load option as the Data access mode in your OLE DB Destination component. The fast load option internally uses a BULK INSERT to upload data into the destination table.
The plain Table or View mode behaves like a SELECT * and pulls in all the columns. That access mode should be used only if you need all the columns of the table or view moved from the source to the destination.
The problem is that the fast load option might not appear by default, since you are using Netezza.
See issue discussed here along with possible workarounds:
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/965b6d83-cf5e-405b-8784-7981e4386adc
Official bug report raised here:
https://connect.microsoft.com/SQLServer/feedback/details/569087

After installing the OLE DB 6.x driver, this "SELECT * FROM DESTINATION TABLE" issue no longer occurs, and I could see a good performance improvement with the 6.x version. But if we are working with the OLE DB 5.x driver, I believe it is better to load into a staging table first and then load from there into the destination table.
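A minimal sketch of that staged approach, with illustrative table and column names (none of them are from the original post): let SSIS fast-load into an empty staging table, then finish the load inside Netezza with a set-based insert:
-- Illustrative names; run inside Netezza after SSIS has fast-loaded stage_table.
INSERT INTO destination_table (col1, col2, col3)
SELECT s.col1, s.col2, s.col3
FROM stage_table s;
TRUNCATE TABLE stage_table;
This way the driver only ever touches the small staging table, not the large destination.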

Related

How to avoid re-inserting data (duplicates) into SQL Server table while re-running SSIS package that loads data?

I have created a package in SSIS. It works fine for the first insertion, but when I run the package through a SQL Server Agent job, duplicates are inserted each time the scheduled job loads data.
I have no idea how to stop the insertion of duplicate records.
I want to prevent duplicate insertions when running the deployed package through SQL Server Agent jobs.
There are 2 approaches to do that:
(1) Using SQL Command
This option can be used if the source and destination are on the same server.
Since you are using an ADO.NET source, you can change the Data Access mode to SQL Command and select only the data that does not exist in the destination:
SELECT *
FROM SourceTable
WHERE NOT EXISTS(
SELECT 1
FROM DestinationTable
WHERE SourceTable.ID = DestinationTable.ID)
(2) Using Lookup Transformation
You can use a Lookup transformation to get the non-matching rows between source and destination and ignore the duplicates:
Understand SSIS Lookup Transformation With an Example Step by Step
SSIS - only insert rows that do not exists
SSIS import data or insert data if no match
Implementing Lookup Logic in SQL Server Integration Services
In order to remove duplicates, use an Execute SQL Task with the following query (assuming that you are not extracting millions of rows, and that you want to remove duplicates from the extracted data, not the destination):
with cte as (
select field1, field2, row_number() over(partition by allfieldsfromPK order by allfieldsfromPK) as rownum
from staging_table)
delete from cte where rownum > 1
Then use a Data Flow Task and insert clean data into destination table.
If you just want to avoid inserting duplicates, a very good option is the MERGE statement, a more performant alternative.
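A minimal MERGE sketch, assuming an ID key; the table and column names here are placeholders, not from the question:
-- Placeholder names: DestinationTable, SourceTable, ID, field1, field2.
MERGE DestinationTable AS tgt
USING SourceTable AS src
    ON tgt.ID = src.ID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, field1, field2)
    VALUES (src.ID, src.field1, src.field2);
This inserts only the rows whose keys are not already present, in a single set-based statement.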

Data Flow Task in SSIS package stuck for an indefinite period

We are using SQL Server 2012 and SSDT 2010 for developing and debugging SSIS packages.
I have a simple Data Flow Task with only two components: an OLE DB source and an OLE DB destination. The target table stores data for multiple dates and is loaded incrementally from the source table, i.e. whenever the source table receives data with a new date, that data is loaded into the target table. There is no transformation, calculation, or logic applied in the data flow.
The OLE DB source uses the following query to select data from the source:
SELECT * FROM source_table WHERE date_col NOT IN
(SELECT DISTINCT date_col FROM target_table);
In the OLE DB destination, the Data access mode is set to fast load, and the Table Lock option is unchecked.
Whenever we execute the package from SSDT, it shows only some rows as processed in the data flow path and then hangs indefinitely: the row count does not grow, and the package never finishes unless it is stopped forcibly.
Thanks in advance.
Try substituting the subquery with a LEFT JOIN, as below (joining against the distinct dates so that multiple target rows per date do not multiply the source rows):
SELECT a.* FROM source_table a
left join (select distinct date_col from target_table) b
on a.date_col = b.date_col
where b.date_col is null
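Another rewrite worth trying, using the same tables and columns as the question, is a NOT EXISTS semi-join; unlike NOT IN, it also behaves predictably if date_col is nullable:
-- Same source_table/target_table/date_col as in the question.
SELECT a.*
FROM source_table a
WHERE NOT EXISTS (
    SELECT 1 FROM target_table b
    WHERE b.date_col = a.date_col
);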

Loading data of one table into another residing on different databases - Netezza

I have a big file which I have loaded into a table in a Netezza database using an ETL tool; let's call this database Staging_DB. Now, after some verifications, the content of this table needs to be inserted into a similarly structured table residing in another Netezza DB; let's call this one PROD_DB. What is the fastest way to transfer data from Staging_DB to PROD_DB?
Should I be using the ETL tool to load the data into PROD_DB? Or
should the transfer be done using the external tables concept?
If no transformation needs to be done, then the better way to transfer is a cross-database data transfer. As described in the Netezza documentation, Netezza supports cross-database access where the user has object-level permissions on both databases.
You can check permissions with the following command:
dbname.schemaname(loggedin_username)=> \dpu username
A working example is below:
INSERT INTO PROD_DB..TBL1 SELECT * FROM Staging_DB..TBL1
If you want to do some transformation before inserting into the other database, you can write UDT procedures (also called resultset procedures).
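As a rough sketch of that procedure route (the syntax is from memory, and the column names and transformation are made up, so check the NZPLSQL documentation before relying on it):
-- Hypothetical example: create in PROD_DB, the database that owns the target table.
CREATE OR REPLACE PROCEDURE LOAD_TBL1()
RETURNS BOOLEAN
LANGUAGE NZPLSQL
AS
BEGIN_PROC
BEGIN
    INSERT INTO PROD_DB..TBL1
    SELECT col1, UPPER(col2), col3  -- example transformation on the way through
    FROM Staging_DB..TBL1;
    RETURN TRUE;
END;
END_PROC;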
Hope this will help.
One way you could move the data is by using Transient External Tables. Start by creating a flat file from your source table/DB. Because you are moving from Netezza to Netezza, you can save time and space by turning on compression and using internal formatting.
CREATE EXTERNAL TABLE 'C:\FileName.dat'
USING (
delim 167
datestyle 'MDY'
datedelim '/'
maxerrors 2
encoding 'internal'
Compress True
REMOTESOURCE 'ODBC'
logDir 'c:\' ) AS
SELECT * FROM source_table;
Then create the table in your target database using the same DDL as the source, and just load it up:
INSERT INTO target SELECT * FROM external 'C:\FileName.dat'
USING (
delim 167
datestyle 'MDY'
datedelim '/'
maxerrors 2
encoding 'internal'
Compress True
REMOTESOURCE 'ODBC'
logDir 'c:\' );
I would write a stored procedure on the production DB and do a CTAS (CREATE TABLE AS SELECT) from the staging to the production database. The beauty of a stored procedure is that you can add transformations as well.
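A minimal CTAS sketch, assuming you are connected to PROD_DB and the target table does not yet exist there (the database and table names are the ones from the question):
-- Run while connected to PROD_DB; TBL1 must not already exist in it.
CREATE TABLE TBL1 AS
SELECT * FROM Staging_DB..TBL1;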
One other option is the nz_migrate utility provided by Netezza, and that is the fastest route, I believe.
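An invocation looks roughly like this; the flags are from memory and the host names are placeholders, so run nz_migrate -h to confirm the options for your version:
# Hypothetical sketch: copy TBL1 from Staging_DB to PROD_DB.
nz_migrate -shost src_host -thost tgt_host -sdb Staging_DB -tdb PROD_DB -t TBL1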
A simple SQL query like
INSERT INTO PROD_DB..TBL1 SELECT * FROM Staging_DB..TBL1
works great if that is all you need to do.
Just be aware that you have to be connected to the destination database when executing the query; otherwise you will get the error
HY0000: "Cross Database Access not supported for this type of command"
even if you have read/write access to both databases and tables.
In most cases you can simply change the catalog using a SET CATALOG command:
https://www-304.ibm.com/support/knowledgecenter/SSULQD_7.0.3/com.ibm.nz.dbu.doc/r_dbuser_set_catalog.html
set catalog='database_name';
insert into target_db.target_schema.target_table select * from source_db.source_schema.source_table;

SSIS OLEDB Command transformation (Insert if not exists)

OK, so according to the Microsoft docs, the OLE DB Command transformation in SSIS does this:
The OLE DB Command transformation runs an SQL statement for each row in a data flow. For example, you can run an SQL statement that inserts, updates, or deletes rows in a database table.
So I want to write some SQL to insert rows into one of my tables only IF the record doesn't exist.
So I tried this, but the control keeps complaining about bad syntax:
IF NOT EXISTS
(SELECT * FROM M_Employee_Login WHERE
Column1=?
AND Column2=?
AND Column3=?)
INSERT INTO [M_Employee_Login]
([Column1]
,[Column2]
,[Column3])
VALUES
(?,?,?)
However, if I remove the IF NOT EXISTS section (leaving only the INSERT), the control says my code is OK. What am I doing wrong?
Is there an easier solution?
Update: BTW, my source is a flat file (CSV file).
Update since the answer: just to let people know, I ended up using the OLE DB Command transformation as I planned, because it is better than the OLE DB Destination for this operation. The difference is that I used the Lookup component to filter out all the already existing records (as the answer suggested), then used the OLE DB Command transformation with the INSERT SQL from my question, and it worked as expected. Hope it helps.
The OLE DB Command object is not the same as the OLE DB Destination.
Rather than doing it as you describe, use a Lookup component. Your data flow becomes Flat File Source -> Lookup Component -> OLE DB Destination.
In your Lookup, write the query SELECT Column1, Column2, Column3 FROM M_Employee_Login and configure it to redirect non-matching rows to the no-match output instead of failing (depending on your version, 2005 vs. later, this will be the default).
After the Lookup, the No Match output will contain the rows that did not find a corresponding match in the target table.
Finally, configure your OLE DB Destination to use the fast load option.
Although you can make use of the Lookup component in SSIS to avoid the duplicates, which is the best possible approach, if you are looking for a query-based solution you can simply insert all the data into a temp/staging table in your database and run the following query:
INSERT INTO M_Employee_Login(Column1, Column2, Column3)
SELECT vAL1,vAL2,vAL3 from Staging_Table
EXCEPT
SELECT Column1, Column2, Column3 FROM M_Employee_Login

SQL Query Task in SSIS

I have added an Execute SQL Task to my project and added a SQL query to it:
Insert into M1
select * from M4
But the problem is that table M1 is in database AAA and table M4 is in database DDD.
It is showing an error.
If both databases are on the same server then fully qualify the table names:
insert into AAA.dbo.M1 (col1, col2, ...)
select col1, col2, ...
from DDD.dbo.M4
Of course, if your objects are not in the dbo schema, you need to specify the correct one. You should never use SELECT *, by the way; it can lead to problems if you (or someone else) ever change the table structure. Instead, always specify the column names.
An alternative would be to use a data flow to copy the data, but that's probably unnecessary here.
You can use a Data Flow Task: add an OLE DB Source and an OLE DB Destination, then configure the source and destination as required.
