Is it possible to set up a pipeline in Azure Data Factory that performs a MERGE between the source and the destination rather than an INSERT? I have been able to successfully select data from my source on-prem table and insert into my destination, but I'd really like to set up a pipeline that will continually update my destination with any changes in the source, e.g. copying new records that are added to the source, or updating any data that changes on an existing record.
I've seen references to the Data Sync framework, but from what I can tell that is only supported in the legacy portal. My V12 databases do not even show up in the classic Azure portal.
There is the Stored Procedure activity, which could handle this. You could use Data Factory to land the data in a staging table and then call the stored procedure to perform the MERGE. Beyond that, Data Factory's logic is not sophisticated enough to perform a merge the way you could in SSIS, for example. Custom activities are probably not suitable for this, IMHO. This is also in line with Data Factory being ELT rather than ETL.
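For example, the Copy activity could land the source rows in a staging table and a Stored Procedure activity could then call something like the sketch below. This is only a minimal illustration; dbo.StagingCustomer, dbo.DimCustomer and their columns are hypothetical names, not from the original question.

```sql
CREATE PROCEDURE dbo.usp_MergeCustomer
AS
BEGIN
    SET NOCOUNT ON;

    -- Upsert staged rows into the destination table.
    MERGE dbo.DimCustomer AS target
    USING dbo.StagingCustomer AS source
        ON target.CustomerId = source.CustomerId
    WHEN MATCHED AND (target.Name <> source.Name OR target.Email <> source.Email) THEN
        UPDATE SET target.Name  = source.Name,
                   target.Email = source.Email
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerId, Name, Email)
        VALUES (source.CustomerId, source.Name, source.Email);

    -- Clear the staging table ready for the next pipeline run.
    TRUNCATE TABLE dbo.StagingCustomer;
END;
```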
Related
I am looking for a way in Informatica to pull data from a table in a database, load it into Snowflake, and then move on to the next table in that same DB, repeating that for the remaining tables in the database.
We currently have this set up and running in Matillion, where an orchestration grabs all of the table names in a database and then loops through each of those tables to send the data into Snowflake.
My team and I have asked Informatica Global Support, but they have not been very helpful in figuring out how to accomplish this. They have suggested things like Dynamic Mapping, which I do not think will work for our particular case, since we are essentially trying to get data from one database to a Snowflake database and do not need to do any other transformations.
Please let me know if any additional clarification is needed.
Dynamic Mapping Task is your answer. You create one mapping, with or without transformations, as you need. Then you set up a Dynamic Mapping Task to execute that mapping across the whole set of your 60+ different sources and targets.
Please note that this is available as part of the Cloud Data Integration module of IICS. It is not available in PowerCenter.
I want to create a daily process that reloads all rows from table A into table B. Over time, table A rows will change due to changes in the source system and also because of aging/deletion of records in the origin table. Table A gets truncated and reloaded daily in step 1. Table B is the master table that only receives new/updated rows.
From a historical point of view, I want to keep track of ALL the rows in table B and be able to do a point-in-time comparison for analytics purposes.
So I need to do two things: daily, insert rows from table A into table B if they don't exist, and also create a new record in table B if the record already exists but ANY of the columns have changed. At one point I attempted to use temporal tables, but I had too many false positives on 'real' changes; certain columns were throwing things off because a date/time column was being updated (the only real change in the row).
I'm using an Azure SQL Managed Instance database (Microsoft SQL Azure (RTM) - 12.0.2000.8).
At my disposal I have SSMS, SQL Server, and also Azure Data Factory.
Any suggestions on the best way to do this or tools to help with this?
There are two approaches, and you can implement either one:
Temporal tables (see the sketch after this list)
Change Data Capture (CDC)
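For the temporal table route, a minimal sketch might look like the following; the table and column names are hypothetical, not from the question. SQL Server then keeps row history automatically and lets you query the table as of a point in time.

```sql
-- Hypothetical master table kept as a system-versioned temporal table.
CREATE TABLE dbo.TableB
(
    Id        INT           NOT NULL PRIMARY KEY,
    SomeValue NVARCHAR(100) NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TableB_History));

-- Point-in-time comparison for analytics.
SELECT * FROM dbo.TableB FOR SYSTEM_TIME AS OF '2024-01-01T00:00:00';
```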
CDC is the more commonly used approach: you can create an Azure Data Factory pipeline that loads delta data, based on change data capture (CDC) information in the source Azure SQL Managed Instance database, into Azure Blob storage.
To implement CDC, you can follow this simple Microsoft tutorial: Incrementally load data from Azure SQL Managed Instance to Azure Storage using change data capture (CDC)
Note: You also need to create a storage account, which is required but not covered in the above tutorial.
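As a rough sketch of the prerequisite on the source side, CDC is enabled first on the database and then on each tracked table. The schema and table name below (dbo.TableA) are hypothetical.

```sql
-- Enable CDC at the database level (run in the source database).
EXEC sys.sp_cdc_enable_db;

-- Enable CDC for a specific source table.
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'TableA',
    @role_name     = NULL;  -- NULL: no gating role required to query the change data
```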
I have staging tables in my SQL Server database, views that transform and combine those tables, and final tables that I populate from the output of the views.
I could automate the process by creating a stored procedure that would truncate the final table and insert the data from the view.
I want to know if it's possible to do this operation with an Azure Data Factory copy activity using the view as source and the table as sink.
Thank you for your help!
ADF supports SQL Server as both a source and a sink.
So there are two options:
You can use a Copy activity with the view as your source and the table as the destination.
You can use a Stored Procedure activity, where all the data ingestion/transformation logic lives inside a stored procedure, and call that stored procedure (see the sketch after this list).
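For the second option, a minimal sketch of such a procedure could look like this; dbo.FinalTable, dbo.vw_Transformed, and the column names are hypothetical.

```sql
CREATE PROCEDURE dbo.usp_LoadFinalTable
AS
BEGIN
    SET NOCOUNT ON;

    -- Replace the contents of the final table with the current output of the view.
    TRUNCATE TABLE dbo.FinalTable;

    INSERT INTO dbo.FinalTable (Col1, Col2, Col3)
    SELECT Col1, Col2, Col3
    FROM dbo.vw_Transformed;
END;
```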
I am creating a new Azure Data Factory pipeline. In it I need to copy one table to Azure Blob storage and delete the data after the copy succeeds. Before deleting the data, I need to create a view of the copied data and compare it against the data in the source database that is going to be deleted. I should only delete the data from the source table if the data in the view and the source table match.
As far as I know, Azure Data Factory doesn't support creating views, so you cannot do that.
Hope this helps.
We have the following question about SQL Server Master Data Services (MDS):
We have already integrated data from different clients into MDS with the help of the MS Excel plugin; now we want to push updated or newly added records back to the source database. Is that possible using MDS?
Is there any background sync process that automatically syncs data between the subscriber and MDS?
Create subscription views in MDS. Then you can leverage SSIS to pull the data from MDS using the subscription view and merge it into the source database.
Used in conjunction with Business Rules in MDS, the ETL can be coded to query only for "validated" members in MDS. This allows for a nice separation of concerns, so the ETL doesn't have to be overly complex about the data it needs to retrieve from MDS.
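As a rough illustration of the merge step, assuming a subscription view named mds.vw_Customer and a source table dbo.SourceCustomer (both hypothetical names, as are the columns), and filtering on the ValidationStatus column that MDS exposes in subscription views:

```sql
-- Pull only validated members from the MDS subscription view and merge them
-- into the source table.
MERGE dbo.SourceCustomer AS target
USING (
    SELECT Code, Name
    FROM mds.vw_Customer                     -- MDS subscription view
    WHERE ValidationStatus = 'Validated'     -- only members that passed business rules
) AS source
    ON target.CustomerCode = source.Code
WHEN MATCHED AND target.CustomerName <> source.Name THEN
    UPDATE SET target.CustomerName = source.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerCode, CustomerName)
    VALUES (source.Code, source.Name);
```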