SQL Server / SSIS: Create new ID for inserted row of merge - sql-server

I am kind of stuck:
Scenario:
I have an SSIS package that loads data incrementally. In the first step, I load rows that have been a) inserted or b) updated in the source into a staging table. I identify these rows using the last timestamp of the source table.
In the next step, I am trying to use a MERGE-Statement to update the data in another database (similar to a data warehouse). I have no control over this other database, otherwise my task would be quite easy.
Problem:
The data warehouse table includes an ID-Column ([cId], BIGINT), which it does not set by itself. I have tried to create a sequence, from which I pull a value whenever I insert a new row into the data warehouse (not when I update a row, since that row will already have an ID). However, as specified here, SQL Server will not let me use the next value from my sequence for the target of a MERGE-Statement. Since I have no control over the data warehouse, I cannot change this.
Another solution would be to get the next value from my sequence when I load the data into the staging table. This, however, will result in "holes" in my ID sequence, because when I update a row in the data warehouse from my staging table, the [cId] column would not be updated, since that row already has an ID.
Does anyone have an idea how to solve this? I am basically trying to pull a new, unique BIGINT, whenever I do an insert inside my MERGE-Statement.
Thanks!

Related

AzureSynapse pipeline how to add guid to raw data

I am new to Azure Synapse and am technically a Data Scientist who's doing a Data Engineering task. Please help!
I have some xlsx files containing raw data that I need to import into an SQL database table. The issue is that the raw data does not have a uniqueidentifier column and I need to add one before inserting the data into my SQL database.
I have been able to successfully add all the rows to the table by adding a new column on the Copy Data command and setting it to be @guid(). However, this sets the guid of every row to the same value (not unique for each row).
If I do not add this mapping, the pipeline throws an error stating that it cannot import a NULL Id into the column Id. Which makes sense as this column does not accept NULL values.
Is there a way to have Azure Synapse Analytics read in a raw xlsx file and then import it into my DB with a unique identifier for each row? If so, how can I accomplish this?
Many many thanks for any support.
Supplying dynamic content for a column in this way evaluates the expression once, so the entire column gets the same value.
Instead, you can generate a new guid for each row using a ForEach activity.
You can retrieve the data from your source Excel file using a Lookup activity (my source only has a name column). Pass the output array of the Lookup activity to the ForEach activity.
@activity('Lookup1').output.value
Inside the ForEach, since you already have a linked service, create a Script activity. In this Script activity, you can use dynamic content to build a query that inserts values into the destination table. The following is the query I built using dynamic content.
insert into demo values ('@{guid()}','@{item().name}')
This allows you to iterate through the source rows and insert each row individually, generating a new guid every time.
You can follow the above procedure to build a query that inserts each row with a unique identifier value. In my test, I used copy data to insert the first 2 rows (same as yours) and inserted the next 2 rows using the above procedure.
NOTE: I have taken Azure SQL database for demo, but that does not affect the procedure.
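The difference between evaluating the guid expression once for the whole column and once per row can be illustrated outside of Synapse. The following is only a minimal Python sketch of that idea, not the pipeline itself; the sample rows and the helper name `tag_rows_with_guid` are made up for illustration.

```python
import uuid


def tag_rows_with_guid(rows):
    """Return copies of the rows, each with its own freshly generated Id."""
    return [{"Id": str(uuid.uuid4()), **row} for row in rows]


# Hypothetical source rows (the answer's source only had a name column).
rows = [{"name": "Ana"}, {"name": "Ben"}, {"name": "Cy"}]

# Evaluated once and reused: every row shares one Id (the behaviour in the question).
shared_id = str(uuid.uuid4())
same_for_all = [{"Id": shared_id, **row} for row in rows]

# Evaluated inside the loop: every row gets a different Id (the ForEach behaviour).
per_row = tag_rows_with_guid(rows)
```

The trade-off is the same as in the pipeline: per-row evaluation gives unique values, at the cost of handling rows one at a time.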

Datafactory - dynamically copy subsection of columns from one database table to another

I have a database on SQL Server on premises and need to regularly copy the data from 80 different tables to an Azure SQL Database. For each table, the columns I need to select and map are different. For example, for TableA I need columns 1, 2 and 5; for TableB I need just column 1. The tables are named the same in the source and target, but the column names are different.
I could create multiple Copy data pipelines and select the source and target data sets and map to the target table structures, but that seems like a lot of work for what is ultimately the same process repeated.
I've so far created a meta table, which lists all the tables and the column mapping information. This table holds the following data:
SourceSchema, SourceTableName, SourceColumnName, TargetSchema, TargetTableName, TargetColumnName.
For each table, data is held in this table to map the source tables to the target tables.
I have then created a lookup which selects each table from the mapping table. It then does a for each loop and does another lookup to get the source and target column data for the table in the foreach iteration.
From this information, I'm able to map the Source table and the Sink table in a Copy Data activity created within the foreach loop, but I'm not sure how I can dynamically map the columns, or dynamically select only the columns I require from each source table.
I have the "activity('LookupColumns').output" from the column lookup, but would be grateful if someone could suggest how I can use this to then map the source columns to the target columns for the copy activity. Thanks.
In your case, you can use an expression in the mapping setting.
You need to provide an expression whose value looks like this:
{"type":"TabularTranslator","mappings":[{"source":{"name":"Id"},"sink":{"name":"CustomerID"}},{"source":{"name":"Name"},"sink":{"name":"LastName"}},{"source":{"name":"LastModifiedDate"},"sink":{"name":"ModifiedDate"}}]}
So you need to add a column named Translator to your meta table, and its value should be like the above JSON data. Then use this expression to do the mapping:
@item().Translator
Reference: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#parameterize-mapping
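If the meta table already holds one row per column pair, the Translator value could be generated rather than typed by hand. The following is a rough Python sketch under that assumption; the function name `build_translator` is hypothetical, and the row keys reuse the SourceColumnName and TargetColumnName columns from the question.

```python
import json


def build_translator(column_rows):
    """Build a TabularTranslator JSON string from source/target column pairs."""
    mappings = [
        {
            "source": {"name": row["SourceColumnName"]},
            "sink": {"name": row["TargetColumnName"]},
        }
        for row in column_rows
    ]
    return json.dumps({"type": "TabularTranslator", "mappings": mappings})


# Column pairs for one table, matching the JSON example in the answer.
column_rows = [
    {"SourceColumnName": "Id", "TargetColumnName": "CustomerID"},
    {"SourceColumnName": "Name", "TargetColumnName": "LastName"},
    {"SourceColumnName": "LastModifiedDate", "TargetColumnName": "ModifiedDate"},
]

translator = build_translator(column_rows)
```

The resulting string is what would be stored in the Translator column for that table.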

How to use the pre-copy script from the copy activity to remove records in the sink based on the change tracking table from the source?

I am trying to use change tracking to copy data incrementally from a SQL Server to an Azure SQL Database. I followed the tutorial on Microsoft Azure documentation but I ran into some problems when implementing this for a large number of tables.
In the source part of the copy activity I can use a query that gives me a change table of all the records that are updated, inserted or deleted since the last change tracking version. This table will look something like
PersonID Age Name SYS_CHANGE_OPERATION
---------------------------------------------
1 12 John U
2 15 James U
3 NULL NULL D
4 25 Jane I
with PersonID being the primary key for this table.
The problem is that the copy activity can only append the data to the Azure SQL Database so when a record gets updated it gives an error because of a duplicate primary key. I can deal with this problem by letting the copy activity use a stored procedure that merges the data into the table on the Azure SQL Database, but the problem is that I have a large number of tables.
I would like the pre-copy script to delete the deleted and updated records on the Azure SQL Database, but I can't figure out how to do this. Do I need to create separate stored procedures and corresponding table types for each table that I want to copy or is there a way for the pre-copy script to delete records based on the change tracking table?
You have to use a Lookup activity before the Copy activity. With that Lookup activity you can query the database so that you get the deleted and updated PersonIDs, preferably all in one field, separated by commas (so it's easier to use in the pre-copy script). More information here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
Then you can do the following in your pre-copy script:
delete from TableName where PersonID in (@{activity('MyLookUp').output.firstRow.PersonIDs})
This way you will be deleting all the deleted or updated rows before inserting the new ones.
Hope this helped!
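To make the shape of that comma-separated field concrete, here is a small Python sketch that derives it from the change table shown in the question. It only illustrates the string that ends up in the pre-copy script; the function name `build_pre_copy_delete` is hypothetical, and it assumes integer keys.

```python
def build_pre_copy_delete(table_name, key_column, change_rows):
    """Return a DELETE statement for updated/deleted keys, or None if there are none."""
    ids = [
        int(row[key_column])  # int() rejects anything that is not a plain number
        for row in change_rows
        if row["SYS_CHANGE_OPERATION"] in ("U", "D")
    ]
    if not ids:
        # An empty "in ()" list is not valid SQL, so skip the delete entirely.
        return None
    id_list = ",".join(str(i) for i in ids)
    return f"delete from {table_name} where {key_column} in ({id_list})"


# The change table from the question.
changes = [
    {"PersonID": 1, "SYS_CHANGE_OPERATION": "U"},
    {"PersonID": 2, "SYS_CHANGE_OPERATION": "U"},
    {"PersonID": 3, "SYS_CHANGE_OPERATION": "D"},
    {"PersonID": 4, "SYS_CHANGE_OPERATION": "I"},
]

statement = build_pre_copy_delete("TableName", "PersonID", changes)
no_changes = build_pre_copy_delete("TableName", "PersonID", changes[3:])
```

Note the empty case: if no rows were updated or deleted since the last version, the lookup field would be empty and the pre-copy script needs to handle that.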
In the meantime, Azure Data Factory has added a metadata-driven copy task. After going through the dialogue-driven setup, a metadata table is created, which has one row for each dataset to be synchronized. I solved this UPSERT problem by adding a stored procedure as well as a table type for each dataset to be synchronized. Then I added the relevant information to the metadata table for each row like this:
{
  "preCopyScript": null,
  "tableOption": "autoCreate",
  "storedProcedure": "schemaname.UPSERT_SHOP_SP",
  "tableType": "schemaname.TABLE_TYPE_SHOP",
  "tableTypeParameterName": "shops"
}
After that you need to adapt the sink properties of the copy task like this (stored procedure, table type, table type parameter name):
@json(item().CopySinkSettings).storedProcedure
@json(item().CopySinkSettings).tableType
@json(item().CopySinkSettings).tableTypeParameterName
If the destination table does not exist, you need to run the whole task once before adding the above variables, because auto-create of tables works only as long as no stored procedure is given in the sink properties.
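The three sink expressions above each parse the CopySinkSettings string and pick out one property. As a plain illustration of that lookup, here is the same idea as a Python sketch; the variable names are made up, and the JSON is the example from this answer.

```python
import json

# The CopySinkSettings value stored in the metadata table for one dataset.
copy_sink_settings = """
{
  "preCopyScript": null,
  "tableOption": "autoCreate",
  "storedProcedure": "schemaname.UPSERT_SHOP_SP",
  "tableType": "schemaname.TABLE_TYPE_SHOP",
  "tableTypeParameterName": "shops"
}
"""

# Parse the string once, then read the three properties the sink needs.
settings = json.loads(copy_sink_settings)
stored_procedure = settings["storedProcedure"]
table_type = settings["tableType"]
table_type_parameter_name = settings["tableTypeParameterName"]
```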

Change tracking -- simplest scenario

I am coding in ASP.NET C# 4. The database is SQL Server 2012.
I have a table that has 2000 rows and 10 columns. I want to load this table in memory and if the table is updated/inserted in any way, I want to refresh the in-memory copy from the DB.
I looked into SQL Server Change Tracking, and while it does what I need, it appears I have to write quite a bit of code to select from the change functions -- more coding than I want to do for a simple scenario that I have.
What is the best (simplest) solution for this problem? Do I go with CacheDependency?
I currently have a similar problem: I'm implementing a REST service that returns a table with 50+ columns and I want to cache the data on the client to reduce traffic.
I'm thinking about this implementation:
All my tables have the fields
ID AutoIncrement (primary key)
Version RowVersion (a numeric value that will be incremented every time the record is updated)
To calculate a "fingerprint" of the table I use the select
select count(*), max(id), sum(version) from ...
Deleting records changes the first value, inserting records changes the second value, and updating records changes the third value.
So if any of the three values changes, I have to reload the table.
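The fingerprint idea can be tried out quickly with an in-memory SQLite database. This is only a sketch under assumptions: the `items` table is hypothetical, SQLite stands in for SQL Server, and a plain integer `version` column that the UPDATE increments by hand stands in for the RowVersion column described above.

```python
import sqlite3


def fingerprint(conn):
    """Return (row count, highest id, sum of versions) for the items table."""
    return conn.execute(
        "SELECT COUNT(*), MAX(id), SUM(version) FROM items"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])
base = fingerprint(conn)

# Update: the version sum changes.
conn.execute("UPDATE items SET name = 'b2', version = version + 1 WHERE id = 2")
after_update = fingerprint(conn)

# Insert: the count and the highest id change.
conn.execute("INSERT INTO items (name) VALUES ('d')")
after_insert = fingerprint(conn)

# Delete: the count changes.
conn.execute("DELETE FROM items WHERE id = 1")
after_delete = fingerprint(conn)
```

A cached copy would be reloaded whenever the tuple returned by `fingerprint` differs from the one stored alongside the cache.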

Select last updated row ID and last deleted row ID in trigger before they occur in SQL Server

I need to get the ID of the updated row in a table so that I can use it to update another table via a trigger.
I also need to get the ID of the deleted row in a table so that I can use it to update another table via a trigger.
How can I do this?
Are there any built-in functions in SQL Server for this?
If not, what kind of trick can help to accomplish this?
Within your trigger, if you want to know the OLD value that was either updated or deleted:
SELECT idColumnName FROM deleted
Where idColumnName is the column that contains the ID that you are interested in.
You can then use this ID value to perform whatever processing you need.
Additionally, if you want to use the NEW value being updated, the query below gives you that. This is especially useful for updates where you want to compare old and new values of certain fields. In your case, since it's an ID column, this will probably not be relevant.
SELECT idColumnName FROM inserted
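For experimenting with the idea without a SQL Server instance, the following Python sketch uses SQLite, which is an analogue rather than the same mechanism: SQLite triggers fire per row and expose the old and new values as OLD and NEW, whereas SQL Server exposes them as the deleted and inserted tables. The `orders` and `audit` tables and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE audit (order_id INTEGER, action TEXT);

    -- On update, record the ID of the row that was changed.
    CREATE TRIGGER orders_after_update AFTER UPDATE ON orders
    BEGIN
        INSERT INTO audit (order_id, action) VALUES (OLD.id, 'updated');
    END;

    -- On delete, record the ID of the row that was removed.
    CREATE TRIGGER orders_after_delete AFTER DELETE ON orders
    BEGIN
        INSERT INTO audit (order_id, action) VALUES (OLD.id, 'deleted');
    END;
    """
)

conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
conn.execute("INSERT INTO orders (id, status) VALUES (2, 'new')")
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 2")

audit_rows = conn.execute(
    "SELECT order_id, action FROM audit ORDER BY rowid"
).fetchall()
```

In SQL Server the same second-table update would read its IDs from the deleted table inside the trigger, and a single statement can put several rows in that table at once.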
