I have a data flow that does some joins and a count on columns, which then sinks into an Azure SQL table.
The Azure SQL table has a GUID as its ID column.
What I want is for the Azure SQL table to be updated, rather than inserted into, when the values in the count column change.
The SQL code below works, but I want to implement the same logic in an ADF data flow. I am using an Alter Row expression of true() for upserts, but it doesn't seem to work when it comes to counts. Not sure where I am wrong.
Please ask for clarification if needed.
Below is the SQL code that works, which I want to implement in ADF.
MERGE [Target] AS [T]
USING [#Temp] AS [S]
ON [T].[Filter] = [S].[Filter]
AND ISNULL([T].[OrganisationGroup], '') = ISNULL([S].[OrganisationGroup], '')
AND ISNULL([T].[ItemCatalog], '') = ISNULL([S].[FinancialReference], '')
AND ISNULL([T].[CIType], '') = ISNULL([S].[CIType], '')
AND ISNULL([T].[ConfigurationItemTypeName], '') = ISNULL([S].[ConfigurationItemTypeName], '')
AND ISNULL([T].[Status], '') = ISNULL([S].[Status], '')
AND ISNULL([T].[Further_Detail], '') = ISNULL([S].[Further_Detail], '')
AND ISNULL([T].[PartitionKey], '') = ISNULL([S].[PartitionKey], '')
WHEN MATCHED AND [T].[CICount] <> [S].[CICount] THEN
UPDATE SET [T].[CICount] = [S].[CICount]
WHEN NOT MATCHED BY SOURCE THEN
DELETE
WHEN NOT MATCHED THEN
INSERT ([Filter], [OrganisationGroup], [ItemCatalog], [CIType], [ConfigurationItemTypeName], [Status], [Further_Detail], [PartitionKey], [CICount])
VALUES ([S].[Filter], [S].[OrganisationGroup], [S].[FinancialReference], [S].[CIType], [S].[ConfigurationItemTypeName], [S].[Status], [S].[Further_Detail], [S].[PartitionKey], [S].[CICount]);
I have repro'd this in my lab with some sample data; please find the steps below.
Existing data in the SQL table (table tb2):
ADF data flow:
Connect the source to the input dataset. Here I have new records compared to existing SQL data, so the count will be updated, and new records will be inserted if not matched.
Add an Aggregate transformation to get the count of the input records.
Aggregate data preview:
Add an Alter Row transformation to perform the upsert.
Upsert condition:
and(and(isNull(Name) == false(), isNull(Department) == false()), isNull(Status) == false())
Note: include all columns except Count in the condition, since Count is the column whose value the upsert is updating.
Alter row data preview:
Connect the sink to the SQL table. In the sink settings, enable Allow upsert and provide the list of key columns.
Sink preview:
After execution, Count in the sink table is updated for existing records whose count does not match, and new records are inserted.
I have a local SQL Server DB table with about 5 million records.
I have a Snowflake server with a similar table that is updated daily.
I need to update my local table with the new records that are added on the Snowflake table.
This code works, but it takes about an hour to retrieve about 200,000 records. I insert the records into a local temp table and then insert them into my SQL Server DB.
Is there a faster way to retrieve the records from Snowflake and get them into SQL Server?
TIA
JohnB
SELECT A.*
INTO #Sale2020New
FROM OPENQUERY(SNOW, 'SELECT * FROM "DATA"."PUBLIC"."Sales" WHERE "Sales"."Date" >= ''1/1/2020'' AND "Sales"."Date" <= ''12/31/2020''') A
LEFT JOIN [SnowFlake].[dbo].Sale2020 B
    ON B.PrimaryKey = A.PrimaryKey
WHERE B.PrimaryKey IS NULL;
Does it take 1 hour just to retrieve the data from Snowflake, or for the whole process?
To speed up data retrieval from Snowflake, implement clustering on the DATE column of the Snowflake table. This prunes micro-partitions and avoids a full table scan. You can get more information on clustering here
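For example, a clustering key can be defined with a single statement on the Snowflake side (a sketch, using the fully qualified table name from the OPENQUERY string; run this in Snowflake, not through the linked server):
-- Sketch: cluster the Sales table on its Date column so date-range
-- predicates can prune micro-partitions.
ALTER TABLE "DATA"."PUBLIC"."Sales" CLUSTER BY ("Date");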
As for the delta load, instead of a join you can apply a filter on the DATE column for the current date. This avoids a costly join operation and filters the data at the source.
SELECT * FROM "SALES"
where "Sales"."Date" = '2020-04-07'
I'm trying to insert daily imported data into a SQL Server (2017) table. While most of the time the imported data has a fixed number of columns, sometimes the client wants to add a new column to the data to be imported.
I'm seeking a solution where, when the data gets imported (whether from another table, from R, or from .csv files; the source doesn't matter), SQL would automatically add the missing (extra) column to the parent table, using the provided column name and assigning NULL to all previous entries.
I've tried both UNION ALL and BULK INSERT, but both of these require the same number of columns. I'm working with SSMS 2017 and R 3.4.1.
Next, I tried with a staging table and modifying the UNION clause as:
SELECT * FROM Table_new
UNION ALL
SELECT Tp.*, '' FROM Table_parent Tp;
But more often than not the extra column isn't present, so the column-count mismatch occurs again.
I also thought about running the queries from R with DBI and odbc dbWriteTable(), handling the invalid-column error with tryCatch(), parsing the column name from the error message, and so on, but this would be the shakiest thing I've ever built and I'd prefer not to do it.
Ultimately I thought of adding an if clause in R and, depending on the number of new columns, looping to append the ', ""' part to the SQL query to create the extra columns. I'm convinced this is too complex a solution to this problem.
# Pseudo-R (sketch; dbQuery stands in for a DBI call such as dbExecute)
# calculate the difference in column counts
diff <- length(colnames_new) - length(colnames_parent)
if (diff == 0) {
  dbQuery("INSERT INTO old SELECT * FROM new;")
} else if (diff > 0) {
  # pad the parent side with one empty string column per missing column
  dbQuery(paste0("SELECT * FROM new UNION ALL SELECT Tp.*",
                 strrep(", ''", diff), " FROM parent Tp;"))
} else {
  # pad the new side instead
  dbQuery(paste0("SELECT * FROM parent UNION ALL SELECT Tn.*",
                 strrep(", ''", -diff), " FROM new Tn;"))
}
To summarize: when inserting data to SQL table, how to (automatically) append the columns in the parent table, when necessary? Thanks!
The things in your database, such as tables, columns, primary keys, foreign keys, and check constraints, are all part of the database schema. People design the schema before adding data to the database.
If you want to add new columns, then you have to redesign your schema. When you do, you will also have to rewrite some of the CRUD procedures.
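If you do decide to extend the schema on the fly, a guarded ALTER TABLE is the usual building block (a minimal sketch; the table name, column name, and type are placeholders you would derive from the incoming data):
-- Sketch: add the column only if it doesn't exist yet; existing rows
-- get NULL automatically because the new column is nullable.
IF COL_LENGTH('dbo.Table_parent', 'NewColumn') IS NULL
    ALTER TABLE dbo.Table_parent ADD NewColumn NVARCHAR(255) NULL;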
I have two SQL Server tables where I need to add records from one table to the other. If the unique identifier already exists in the target table, then update the record with the data coming from the source table; if the unique identifier doesn't exist, then insert the entire new record into the target table.
I seem to have gotten the initial part to work, where I update the records in the target table, but the part where I would INSERT new records does not seem to be working.
if exists (
select 1
from SCM_Top_Up_Operational O
join SCM_Top_Up_Rolling R ON O.String = R.string
)
begin
update O
set O.Date_Added = R.Date_Added,
O.Real_Exfact = R.Real_Exfact,
O.Excess_Top_Up = R.Excess_Top_Up
from SCM_Top_Up_Operational O
join SCM_Top_Up_Rolling R on O.String = R.String
where O.String = R.string and R.date_added > O.date_added
end
else
begin
insert into SCM_Top_Up_Operational (String,Date_Added,Real_Exfact,Article_ID,Excess_Top_Up,Plant)
select String,Date_Added,Real_Exfact,Article_ID,Excess_Top_Up,Plant
from SCM_Top_Up_Rolling
end
If I followed you correctly, you should be able to solve this with a single SQL query, using SQL Server MERGE syntax, available since SQL Server 2008.
From the documentation:
Runs insert, update, or delete operations on a target table from the results of a join with a source table. For example, synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.
Consider the following query:
MERGE SCM_Top_Up_Operational O
USING SCM_Top_Up_Rolling R ON (O.String = R.String)
WHEN MATCHED THEN
    UPDATE SET
        O.Date_Added = R.Date_Added,
        O.Real_Exfact = R.Real_Exfact,
        O.Excess_Top_Up = R.Excess_Top_Up
WHEN NOT MATCHED BY TARGET THEN
    INSERT (String, Date_Added, Real_Exfact, Article_ID, Excess_Top_Up, Plant)
    VALUES (R.String, R.Date_Added, R.Real_Exfact, R.Article_ID, R.Excess_Top_Up, R.Plant);
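One caveat: your original UPDATE only fired when R.date_added > O.date_added. If you want to keep that behavior, MERGE lets you add the predicate to the match branch (a sketch of the same statement with that one condition added):
MERGE SCM_Top_Up_Operational O
USING SCM_Top_Up_Rolling R ON (O.String = R.String)
WHEN MATCHED AND R.Date_Added > O.Date_Added THEN
    -- only overwrite when the incoming row is newer
    UPDATE SET
        O.Date_Added = R.Date_Added,
        O.Real_Exfact = R.Real_Exfact,
        O.Excess_Top_Up = R.Excess_Top_Up
WHEN NOT MATCHED BY TARGET THEN
    INSERT (String, Date_Added, Real_Exfact, Article_ID, Excess_Top_Up, Plant)
    VALUES (R.String, R.Date_Added, R.Real_Exfact, R.Article_ID, R.Excess_Top_Up, R.Plant);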
Here is the situation:
I have a table of StudentsA which needs to be synchronized with another table, on a different server, StudentsB. It's a one-way sync from A to B.
Since the table StudentsA can hold a large number of rows, we have a table called StudentsSync (on the input server) containing the ID of StudentsA which have been modified since the last copy from StudentsA to StudentsB.
I made the following SSIS Data Flow task:
The only problem is that I need to DELETE the row from StudentsSync after a successful copy or update. Something like this:
Any idea how this can be achieved?
This can be achieved using three methods.
1. If your target table in OutputDB has timestamp columns, such as create and modified timestamps, then the rows which were updated or inserted can be obtained with a simple query. Put the query below in an Execute SQL Task in the Control Flow to delete those rows from the sync table.
DELETE FROM SyncTable
WHERE keyColumn IN (SELECT primary_key FROM target
                    WHERE ModifiedTimeStamp >= CAST(GETDATE() AS date)
                       OR (ModifiedTimeStamp IS NULL
                           AND CreateTimeStamp >= CAST(GETDATE() AS date)));
-- CAST(GETDATE() AS date) is midnight today, so this catches every row
-- created or modified during today's run
I assume StudentsA's primary key is present in the sync table along with the primary key of the target table. The condition above basically checks: if a new row was added, the CreateTimeStamp column will have the current date and ModifiedTimeStamp will be NULL; if the values were updated, ModifiedTimeStamp will have the current date.
The above query will work if you have timestamp columns in your target table, which I feel should be there if you're loading data into a data warehouse.
2. You can use MERGE syntax to perform the update and insert in the Control Flow with an Execute SQL Task; no need for a Data Flow Task. The query below can be used even if you don't have any timestamp columns.
DECLARE @Output TABLE (ActionType VARCHAR(20), SourcePrimaryKey INT);

MERGE StudentsB AS TARGET
USING StudentsA AS SOURCE
ON (TARGET.CommonColumn = SOURCE.CommonColumn)
WHEN MATCHED THEN
    UPDATE SET TARGET.column = SOURCE.Column, TARGET.ModifiedTimeStamp = GETDATE()
WHEN NOT MATCHED BY TARGET THEN
    INSERT (col1, col2, Col3)
    VALUES (SOURCE.col1, SOURCE.col2, SOURCE.Col3)
-- capture what MERGE did so the sync table can be cleaned up afterwards
OUTPUT $action, INSERTED.PrimaryKey
    INTO @Output (ActionType, SourcePrimaryKey);

DELETE FROM SyncTable
WHERE PrimaryKey IN (SELECT SourcePrimaryKey FROM @Output
                     WHERE ActionType = 'INSERT' OR ActionType = 'UPDATE');
The code is not tested as I'm running out of time, but at least it should give you a fair idea of how to proceed. For further detail on MERGE syntax, read this and this.
3. Use a Multicast component to duplicate the dataset for the insert and the update. Connect one Multicast to the Lookup Match output and another Multicast to the Lookup No Match output.
Add a task after "Update existing entry" and after "Insert new entry" to add the student ID to a variable which will contain the list of IDs to delete.
Enclose all of the tasks in a sequence container.
After the sequence container executes, add a task to delete all the records from the sync table that are in the variable you've been populating.
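The final Execute SQL Task would then run something like the statement below (a sketch; it assumes the package expands the variable into a literal ID list via an expression, and StudentID is a placeholder column name):
-- Sketch: the literal ID list stands in for the contents of the
-- package variable, injected through an SSIS expression.
DELETE FROM StudentsSync
WHERE StudentID IN (101, 102, 103);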
I am using SSIS to move Excel data to a temp SQL Server table and from there to the target table.
So my temp table consists of only varchar columns, while my target table expects money values for some of them. The original Excel columns have a formula but leave an empty cell on some rows, which shows up in the temp table as an empty cell as well. But when I cast one of these columns to money, those originally blank cells become 0,00 in the target column.
Of course that is not what I want, so how can I get NULL values in there, keeping in mind that a legitimate 0,00 can also show up in these columns?
I guess I would need to edit my temp table to turn the empty cells into NULL. Can I do this from within an SSIS package, or is there a table setting I could use?
thank you.
For existing data you can write a simple script that updates data to NULL where empty.
UPDATE YourTable SET Column = NULL WHERE Column = ''
For inserts you can use the NULLIF function to insert NULLs where the source is empty:
INSERT INTO YourTable (yourColumn)
SELECT NULLIF(sourceColumn, '') FROM SourceTable
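Combined with the cast from the question, the empty strings become NULL instead of 0 (a sketch; the table and column names are placeholders):
-- Sketch: NULLIF turns '' into NULL before the cast, so blank cells load
-- as NULL money values while a real 0,00 still loads as 0.
INSERT INTO TargetTable (MoneyColumn)
SELECT CAST(NULLIF(StagingColumn, '') AS money)
FROM TempTable;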
Edit: for multiple column updates you need to combine the two solutions and write something like:
UPDATE YourTable SET
    Column1 = NULLIF(Column1, ''),
    Column2 = NULLIF(Column2, '')
WHERE Column1 = '' OR Column2 = ''
etc., adding a NULLIF assignment and a WHERE predicate for each column. That will update all rows that have an empty string in at least one of the listed columns.