I'm migrating a large amount of data, and for each record I need to perform some matching to identify which operation should be applied to it. Currently I read the data from the source and then match the records using a SQL command, so I hit the database twice for each record. Would it improve performance if I read the data into a recordset and did the matching there instead?
I'm reading from SQL Server 2008 R2.
1) Using a Lookup transformation is one efficient way of merging records.
2) Use a MERGE statement, for example:
MERGE [dbo].[Value] AS TARGET
USING [dbo].[view_Value] AS SOURCE
ON (
TARGET.[Col1] = SOURCE.[col1]
)
WHEN MATCHED
THEN
UPDATE SET
TARGET.[col3] = SOURCE.[col3],
TARGET.[col2] = SOURCE.[col2]
WHEN NOT MATCHED BY TARGET THEN
INSERT ([col1], [col2], [col3])
VALUES (SOURCE.[col1], SOURCE.[col2], SOURCE.[col3]);
I am updating an on-premises SQL Server database table with data from a csv file using a Copy Data activity. There is an int identity Id column on the sink table that gets generated when I do the Upsert. I would like to retrieve the Id value generated in that table to use later in the pipeline.
Is there a way to do this?
I can't use a data flow as I am using a self-hosted Integration Runtime.
Hi @Nick.McDermaid, I am loading about 7,000 rows from the file into the database. I want to store the generated identities back in the database the file comes from.
Edit:
I have 2 databases (source/target). I want to upsert (using the MERGE SQL below, with the OUTPUT clause) into the target DB from the source DB and then return the Ids (via the OUTPUT result set) to the source DB. The problem I have is that the MERGE's SELECT statement has to come from the same target DB the target table is in (when using a Copy Data activity), but I need the SELECT to run against the source DB. Is there a way to do this, maybe using the Script activity?
Edit 2: To clarify, the 2 databases are on different servers.
Edit 3 (MERGE Update):
MERGE Product AS target
USING (SELECT [epProductDescription]
,[epProductPrimaryReference]
FROM [epProduct]
WHERE [epEndpointId] = '438E5150-8B7C-493C-9E79-AF4E990DEA04') AS source
ON target.[Sku] = source.[epProductPrimaryReference]
WHEN MATCHED THEN
UPDATE SET [Name] = source.[epProductDescription]
,[Sku] = source.[epProductPrimaryReference]
WHEN NOT MATCHED THEN
INSERT ([Name]
,[Sku])
VALUES (source.[epProductDescription]
,source.[epProductPrimaryReference])
OUTPUT $action, inserted.*, deleted.*;
Edit 3 (sample data): the source sample and target output screenshots are not reproduced here.
Is there a way to do this, maybe using the Script activity?
Yes, you can execute this script using the Script activity in ADF.
As your tables are on different SQL Server instances, you first have to create a linked server to the source database on the target database.
Go to Server Objects >> Linked Servers >> New Linked Server and create a linked server to the source database on the target database as below.
While creating the linked server, make sure the same user exists on both databases.
Then I wrote the MERGE query using this linked-server source.
My Sample Query:
MERGE INTO PersonsTarget as trg
USING (SELECT [LastName],[FirstName],[State]
FROM [OP3].[sample1].[dbo].[Personssource]) AS src
ON trg.[State] = src.[State]
WHEN MATCHED THEN
UPDATE SET [LastName] = src.[LastName]
,[FirstName] = src.[FirstName]
WHEN NOT MATCHED THEN
INSERT ([LastName],[FirstName],[State])
VALUES (src.[LastName],src.[FirstName],src.[State])
OUTPUT $action, inserted.*;
Then in the Script activity I provided the script.
Note: in the linked service for the on-premises target table, use the same user that you used when creating the linked server.
It executed successfully and returns the Ids:
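The same round trip can be sketched outside ADF. This is a rough analogue, not SQL Server syntax: SQLite's `INSERT ... ON CONFLICT DO UPDATE` upserts a row, after which the generated identity can be read back keyed on the natural key. The table and column names loosely follow the Product example above but are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        Id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- analogue of the int identity column
        Sku  TEXT UNIQUE NOT NULL,
        Name TEXT
    )""")

rows = [("SKU-1", "Widget"), ("SKU-2", "Gadget")]

# Upsert each row, then read the identity back keyed on the natural key (Sku).
ids = []
for sku, name in rows:
    conn.execute(
        "INSERT INTO Product (Sku, Name) VALUES (?, ?) "
        "ON CONFLICT(Sku) DO UPDATE SET Name = excluded.Name",
        (sku, name),
    )
    (gen_id,) = conn.execute("SELECT Id FROM Product WHERE Sku = ?", (sku,)).fetchone()
    ids.append(gen_id)

print(ids)  # the generated identity values, available for later steps
```

On SQL Server the `OUTPUT` clause of `MERGE` returns the same information in a single statement, which is what the query above relies on.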
I have created a package in SSIS. It works fine for the first insertion, but when I run the package through a SQL Server Agent job, duplicate records are inserted on each scheduled run.
I have no idea how to stop the package from inserting duplicate records.
I expect the deployed package to avoid inserting duplicates when run through SQL Server Agent jobs.
There are 2 approaches to do that:
(1) Using a SQL command
This option can be used if the source and destination are on the same server.
Since you are using an ADO.NET source, you can change the Data Access mode to SQL Command and select only the data that does not exist in the destination:
SELECT *
FROM SourceTable
WHERE NOT EXISTS(
SELECT 1
FROM DestinationTable
WHERE SourceTable.ID = DestinationTable.ID)
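As a runnable sketch of this anti-join idea (using SQLite and invented table contents), only the source rows whose ID is missing from the destination get inserted, so re-running the load does not create duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable      (ID INTEGER PRIMARY KEY, Val TEXT);
    CREATE TABLE DestinationTable (ID INTEGER PRIMARY KEY, Val TEXT);
    INSERT INTO SourceTable      VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO DestinationTable VALUES (1, 'a');
""")

# Copy across only the rows whose ID is not yet in the destination.
conn.execute("""
    INSERT INTO DestinationTable (ID, Val)
    SELECT s.ID, s.Val
    FROM SourceTable s
    WHERE NOT EXISTS (
        SELECT 1 FROM DestinationTable d WHERE d.ID = s.ID)
""")

count = conn.execute("SELECT COUNT(*) FROM DestinationTable").fetchone()[0]
print(count)  # 3 rows total; running the insert again leaves it at 3
```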
(2) using Lookup Transformation
You can use a Lookup transformation to get the non-matching rows between Source and destination and ignore duplicates:
UNDERSTAND SSIS LOOKUP TRANSFORMATION WITH AN EXAMPLE STEP BY STEP
SSIS - only insert rows that do not exists
SSIS import data or insert data if no match
Implementing Lookup Logic in SQL Server Integration Services
In order to remove duplicates, use an Execute SQL Task with the following query (assuming that you are not extracting millions of rows and you want to remove duplicates from the extracted data, not from the destination):
with cte as (
select field1, field2, row_number() over(partition by allfieldsfromPK order by allfieldsfromPK) as rownum
from StagingTable)  -- the table holding the extracted data
delete from cte where rownum > 1
Then use a Data Flow Task and insert clean data into destination table.
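The dedup step above can be sketched end-to-end with SQLite. SQLite does not allow DELETE through a CTE the way SQL Server does, so this sketch deletes via rowid instead; the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staging (field1 TEXT, field2 TEXT);
    INSERT INTO Staging VALUES
        ('a', 'x'), ('a', 'x'),   -- duplicate on the key fields
        ('b', 'y');
""")

# Keep the first row of each (field1, field2) group and delete the rest,
# mirroring the row_number()-over-partition pattern.
conn.execute("""
    DELETE FROM Staging
    WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   row_number() OVER (PARTITION BY field1, field2
                                      ORDER BY field1, field2) AS rownum
            FROM Staging)
        WHERE rownum > 1)
""")

remaining = conn.execute("SELECT COUNT(*) FROM Staging").fetchone()[0]
print(remaining)  # 2
```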
If you just want to avoid inserting duplicates, a very good option is the MERGE statement, which is a more performant alternative.
I am using the Mule database connector to insert and update records in a database. I have different queries, such as inserts and updates on different tables, and the payload for each will be different as well. How can I achieve bulk operations with this? Can I save the queries in a flow variable as a list, save the corresponding values in another list, and pass both to the database flow? Will that work?
I want to generate raw SQL queries, save them to a file, and then use Bulk Execute on that file. Does Mule provide any toString-style method to convert a query with placeholders into the actual raw query?
For example, I have the query
update mytable set column1 = #[payload.column1], column2 = #[payload.id]
which I want turned into
update mytable set column1 = 'stringvalue', column2 = 1234;
Mule's database component does support bulk operations: select Bulk Execute as the Operation. The behavior is described in the connector once you select the operation.
As regards making the query dynamic, you can pass the values from variables or property files, as per your convenience.
You can have a stored procedure for insert and update that accepts its input parameters as arrays. Send the records in blocks inside a for loop by setting a batch size; this results in fewer round trips.
Below is a link to an article with all the details:
https://dzone.com/articles/passing-java-arrays-in-oracle-stored-procedure-fro
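The batching idea (fewer round trips by sending records in blocks) can be sketched in Python with `executemany` over fixed-size chunks; the table name and batch size here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (column1 TEXT, column2 INTEGER)")

records = [(f"value{i}", i) for i in range(10)]
BATCH_SIZE = 4  # send 4 records per round trip

# Insert the records in blocks instead of one statement per record.
for start in range(0, len(records), BATCH_SIZE):
    batch = records[start:start + BATCH_SIZE]
    conn.executemany(
        "INSERT INTO mytable (column1, column2) VALUES (?, ?)", batch)
    conn.commit()  # one commit per block

total = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(total)  # 10
```

A stored procedure taking array parameters, as suggested above, pushes the same idea into the database itself; the chunked client-side loop is the simplest portable variant.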
I have imported an Excel table into SQL Server 2016 Express, creating a table with the import wizard.
Now when I update my Excel sheet, e.g. add one or more rows to it, I also want to update my SQL Server table, that is, append those rows to it as well. What's the easiest way to do this, in an "append rows" manner? I don't want to re-import the whole Excel sheet.
You asked for the easiest solution. Since you are already comfortable using the wizard, it would seem to me that the easiest way is to import the updated Excel sheet/file into SQL Server Express using the wizard as well, but into a new table (without removing the old one).
Afterwards, insert new rows or update the existing records on the SQL Server with a simple SQL MERGE statement. You can then drop/delete the imported table again (because the existing table has been updated).
While I do not know your tables, the following SQL code sample shows a simple merge on a basic customer table, where tblCustomers is the existing table (to be updated/have new rows inserted) and tblCustomersNEW is the new import (which will be deleted again once the update/append is complete):
merge dbo.tblCustomers as target
using dbo.tblCustomersNEW as source
on source.ID = target.ID
when matched then
update set target.Name = source.Name,
target.Country = source.Country
when not matched by target then
insert (Name, Country)
values (
source.Name,
source.Country
);
Note that the MERGE statement requires a terminating semicolon, much like a CTE requires a semicolon before the WITH when it follows another statement (;WITH myTable AS ...).
For more information on the MERGE statement you might want to read the following on article on MSDN: https://msdn.microsoft.com/en-us/library/bb510625.aspx
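The import-then-merge workflow above can be sketched with SQLite's upsert syntax (SQLite has no MERGE statement; the `WHERE true` is required to disambiguate SQLite's upsert parsing when the source is a SELECT). The table names follow the example, the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCustomers    (ID INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
    CREATE TABLE tblCustomersNEW (ID INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
    INSERT INTO tblCustomers    VALUES (1, 'Alice', 'US');
    INSERT INTO tblCustomersNEW VALUES (1, 'Alice', 'DE'), (2, 'Bob', 'UK');
""")

# Merge the freshly imported table into the existing one, then drop the import.
conn.execute("""
    INSERT INTO tblCustomers (ID, Name, Country)
    SELECT ID, Name, Country FROM tblCustomersNEW WHERE true
    ON CONFLICT(ID) DO UPDATE SET
        Name    = excluded.Name,
        Country = excluded.Country
""")
conn.execute("DROP TABLE tblCustomersNEW")

rows = conn.execute("SELECT ID, Name, Country FROM tblCustomers ORDER BY ID").fetchall()
print(rows)  # [(1, 'Alice', 'DE'), (2, 'Bob', 'UK')]
```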
I have one table in SQL Server and 5 tables in Teradata. I want to join those 5 Teradata tables with the SQL Server table and store the result in a Teradata table.
I have the SQL Server name, but I don't know how to run a query on SQL Server and Teradata simultaneously.
I want to do this:
SQL Server query:
Select distinct store
from store_Desc
Teradata query:
select cmp_id,state,sde
from xyz
where store in (
select distinct store
from sql server table)
You can create a table (or a volatile table if you do not have write privileges) in Teradata to do this. Export the result from SQL Server as text or into the language of your choice.
CREATE VOLATILE TABLE store_table (
column_1 datatype_1,
column_2 datatype_2,
...
column_n datatype_n);
You may need to add ON COMMIT PRESERVE ROWS before the ; to the above depending on your transaction settings.
From a language you can loop the below or do an execute many.
INSERT INTO store_table VALUES(value_1, value_2, ..., value_n);
Or you can use the import from text using Teradata SQL Assistant by going to File and selecting Import. Then execute the below and navigate to your file.
INSERT INTO store_table VALUES(?, ?, ..., n);
Once you have inserted your data you can query it by simply referencing the table name.
SELECT cmp_id,state,sde
FROM xyz
WHERE store IN(
SELECT store
FROM store_table)
The DISTINCT is most easily applied on export from SQL Server, to minimize the rows you need to upload.
EDIT:
If you are doing this many times you can do this with a script, here is a very simple example in Python:
import pyodbc
con_ss = pyodbc.connect('sql_server_odbc_connection_string...')
crs_ss = con_ss.cursor()
con_td = pyodbc.connect('teradata_odbc_connection_string...')
crs_td = con_td.cursor()
# pull data from SQL Server
data_ss = crs_ss.execute('''
SELECT distinct store AS store
from store_Desc
''').fetchall()
# create table in teradata
crs_td.execute('''
CREATE VOLATILE TABLE store_table (
store DEC(4, 0)
) PRIMARY INDEX (store)
ON COMMIT PRESERVE ROWS;''')
con_td.commit()
# insert values; you can also use an execute many, but this is easier to read...
for row in data_ss:
crs_td.execute('''INSERT INTO store_table VALUES(?)''', row)
con_td.commit()
# get final data
data_td = crs_td.execute('''SELECT cmp_id,state,sde
FROM xyz
WHERE store IN(
SELECT store
FROM store_table);''').fetchall()
# from here write to file or whatever you would like.
Is fetching the data from SQL Server through ODBC an option?
The best option may be to use Teradata Parallel Transporter (TPT) to fetch data from SQL Server using its ODBC operator (as a producer) combined with Load or Update operator as the consumer to insert it into an intermediate table on Teradata. You must then perform rest of the operations on Teradata. For the rest of the operations, you can use BTEQ/SQLA to store the results in the final Teradata table. You can also put the same SQL in TPT's DDL operator instead of BTEQ/SQLA and get it done in a single job script.
To allow use of tables residing in separate DB environments (in your case SQL Server and Teradata) in a single SELECT statement, Teradata has recently released Teradata QueryGrid. But I'm not sure about the exact level of support for SQL Server, and it would involve licensing hassle and quite a learning curve for this simple job.