I am trying to update a Snowflake table from Databricks. I have created a Databricks temp table and written a query based on it that should update the Snowflake table, but I am not sure whether this is possible at all. Could someone help me with this?
query = """MERGE INTO dw_3nf.temp_tgt target
USING
(SELECT source1.id as mergekey, 0 as deleted, source1.* FROM dw_3nf.temp_src as source1
UNION ALL
SELECT NULL as mergekey,0 as deleted, source1.*
FROM dw_3nf.temp_src source1 JOIN dw_3nf.temp_tgt target
ON source1.id = target.id
WHERE target.live_flag = 1 AND source1.name <> target.name
UNION ALL
SELECT target.id as mergekey, 1 as deleted, source.*
FROM dw_3nf.temp_tgt as target left join dw_3nf.temp_src as source
ON source.id = target.id
WHERE source.id is null and target.live_flag=1
) staged_updates
ON target.id = mergekey
WHEN MATCHED AND target.live_flag = 1 AND staged_updates.name <> target.name THEN
UPDATE SET live_flag = 0
WHEN MATCHED AND staged_updates.deleted = 1 and target.live_flag=1 THEN
UPDATE SET live_flag=2
WHEN NOT MATCHED THEN
INSERT (id, name, live_flag)
VALUES(staged_updates.id,staged_updates.name,1)"""
df.createOrReplaceTempView("source")
df.write \
.format("snowflake") \
.options(**options) \
.option("query", query) \
.save()
Take a step back and think about the systems.
'Databricks Cluster' <---> 'Snowflake Cluster'
You want to avoid too much communication between the two systems, because the network between them is slow.
So I would recommend either:
copying/inserting your data into Snowflake and transforming/merging it there, or
preparing the data in Databricks, copying the result to Snowflake, and merging it there.
So, can you merge into a Snowflake table from a Databricks table in one statement? I don't know. Should you do it? Probably not.
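Here is a minimal sketch of the second option. It assumes the Snowflake Spark connector is installed (on Databricks the `snowflake` format name is available) and that `options` is the same connection dict as in the question. As far as I know, the connector's `query` option is meant for reading, while writes target a table named with `dbtable`. The sketch therefore ships the DataFrame once into a staging table, then runs the MERGE inside Snowflake through the connector's `Utils.runQuery`, which Snowflake documents for executing DDL/DML from PySpark. The helper names and the staging table name are hypothetical, and it uses a plain upsert rather than the question's `live_flag` logic.

```python
from typing import Sequence


def build_merge_sql(target: str, staging: str, key: str, columns: Sequence[str]) -> str:
    """Build a Snowflake MERGE that upserts `staging` into `target` on `key` (hypothetical helper)."""
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")
    set_clause = ", ".join(f"{c} = s.{c}" for c in non_key)
    col_list = ", ".join(columns)
    val_list = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )


def stage_and_merge(spark, df, options: dict, staging: str, target: str, key: str) -> None:
    """Copy `df` into a Snowflake staging table, then merge it into `target` inside Snowflake."""
    # Step 1: one bulk write over the network into a staging table (replaced on each run).
    (df.write.format("snowflake")
        .options(**options)
        .option("dbtable", staging)
        .mode("overwrite")
        .save())
    # Step 2: the MERGE executes in Snowflake, so no rows travel back to Databricks.
    sql = build_merge_sql(target, staging, key, df.columns)
    spark.sparkContext._jvm.net.snowflake.spark.snowflake.Utils.runQuery(options, sql)
```

For example, `build_merge_sql("dw_3nf.temp_tgt", "dw_3nf.temp_src_stage", "id", ["id", "name"])` yields a single MERGE statement that Snowflake runs against its own tables. The question's SCD-style MERGE text could be passed to `runQuery` in the same way, once its source tables live in Snowflake.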
I have data flowing into one table, let's say Table_A, from multiple other tables.
Then I have a MERGE stored procedure that takes the data from Table_A and merges it into Table_B.
However, something doesn't seem right. If I truncate and load the data, it works fine, but if I don't truncate and instead just run the load query every hour, I get this error message:
Msg 8672, Level 16, State 1, Procedure Merge_Table_A, Line 4 [Batch Start Line 0]
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
How can I overcome this?
I want to be able to load the data incrementally instead of doing truncate loads, while still having a stored procedure that updates or inserts rows and doesn't care whether a row already exists.
It seems you have duplicate rows in your source table (Table_A), loaded there by your previous runs, so a single target row matches more than one source row.
Note: MERGE matching does not consider rows inserted (even duplicates) while the MERGE itself is running.
Below is my repro example with sample data:
Table1: Initial data
Table2: Target table
Merge Statement:
MERGE tb2 AS Target
USING tb1 AS Source
ON Source.firstname = Target.firstname and
Source.lastname = Target.lastname
-- For Inserts
WHEN NOT MATCHED BY Target THEN
INSERT (firstname, lastname, updated_date)
VALUES (Source.firstname, Source.lastname, source.updated_date)
-- For Updates
WHEN MATCHED THEN UPDATE SET
Target.updated_date = Source.updated_date
-- For Deletes
WHEN NOT MATCHED BY Source THEN
DELETE;
When the MERGE is executed, it inserts all the data without any errors.
New data in tb1:
When I run the Merge statement, it gives me the same error as yours.
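You can reproduce the error outside the database with a small model of the rule MERGE enforces: each target row may be matched by at most one source row. The sketch below is Python, not T-SQL, and the helper name and row dicts are hypothetical. In SQL, the equivalent diagnostic is a GROUP BY on the ON-clause columns with HAVING COUNT(*) > 1, joined to the target. The model also reflects the note above. Against an empty target, duplicate source rows are simply inserted, and the error appears only once those keys already exist in the target.

```python
from collections import Counter


def keys_matching_more_than_once(source_rows, target_rows, key_cols):
    """Return the ON-clause keys where an existing target row would match several source rows.

    These are the keys that make SQL Server raise Msg 8672. Duplicate source keys that are
    absent from the target are not reported: MERGE just inserts them, because rows inserted
    by the MERGE itself are not considered for matching.
    """
    def key_of(row):
        return tuple(row[c] for c in key_cols)

    source_counts = Counter(key_of(r) for r in source_rows)
    target_keys = {key_of(r) for r in target_rows}
    return sorted(k for k, n in source_counts.items() if n > 1 and k in target_keys)
```

Running it over the rows of the source and target before the procedure shows which keys need deduplicating. This assumes Python's string comparison behaves like your column collation, which is not true for case-insensitive collations.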
As a workaround, use one of the options below:
Add additional conditions to the ON clause, if possible, so that each row is uniquely identified.
Remove the duplicates from the source and then merge the data into tb2, as below.
--temp table
drop table if exists #tb1;
select * into #tb1 from (
select *, row_number() over(partition by firstname, lastname order by firstname, lastname, updated_date desc) as rn from tb1) a
where rn = 1
MERGE tb2 AS Target
USING #tb1 AS Source
ON Source.firstname = Target.firstname and
Source.lastname = Target.lastname
-- For Inserts
WHEN NOT MATCHED BY Target THEN
INSERT (firstname, lastname, updated_date)
VALUES (Source.firstname, Source.lastname, source.updated_date)
-- For Updates
WHEN MATCHED THEN UPDATE SET
Target.updated_date = Source.updated_date
-- For Deletes
WHEN NOT MATCHED BY Source THEN
DELETE;
Data merged into tb2 successfully.
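The deduplicate-then-merge workaround can be modelled in the same way. This Python sketch uses hypothetical helper names and ISO date strings, which compare correctly as text. It keeps the newest row per (firstname, lastname), like ROW_NUMBER() ... ORDER BY updated_date DESC with rn = 1, and then applies the three branches of the MERGE above.

```python
def latest_per_key(rows, key_cols, order_col):
    """Keep one row per key, the one with the largest order_col (like ROW_NUMBER() ... rn = 1)."""
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in best or row[order_col] > best[key][order_col]:
            best[key] = row
    return list(best.values())


def merge_into(target, source, key_cols):
    """Model of the tb1 -> tb2 MERGE; assumes source keys are already unique."""
    source_by_key = {tuple(r[c] for c in key_cols): r for r in source}
    target_keys = set()
    result = []
    for row in target:
        key = tuple(row[c] for c in key_cols)
        target_keys.add(key)
        if key in source_by_key:
            # WHEN MATCHED THEN UPDATE SET Target.updated_date = Source.updated_date
            result.append({**row, "updated_date": source_by_key[key]["updated_date"]})
        # otherwise WHEN NOT MATCHED BY SOURCE THEN DELETE: the row is not kept
    # WHEN NOT MATCHED BY TARGET THEN INSERT
    result += [dict(r) for k, r in source_by_key.items() if k not in target_keys]
    return result
```

Calling `merge_into(tb2, latest_per_key(tb1, ["firstname", "lastname"], "updated_date"), ["firstname", "lastname"])` never sees two source rows for one key, which is what the temp-table step guarantees in T-SQL.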
I have a task to create a stored procedure in an Oracle DB. There are two different databases: DB1 with a student_lookup table and DB2 with a student_master table. The SP needs to check whether each record in DB2.student_master exists in the DB1.student_lookup table:
If the record exists in DB1, don't do anything.
If the record doesn't exist in DB1, add it from DB2.
If the record is in DB1 but not in DB2, update that record and set its partition_key column to 1.
Any help will be appreciated. I am completely new to Oracle DBA.
If they are two users (schemas) in the same database, use:
MERGE INTO db1.student_lookup a
USING
(select * from db2.student_master) b
ON (
a.id = b.id
AND <others join column>
)
WHEN MATCHED THEN UPDATE SET a.partition_key = 1
WHEN NOT MATCHED THEN INSERT (<a.column>)
VALUES (<b.column>)
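If they are two separate databases, first create a database link:
CREATE DATABASE LINK DBLINK_DB1_DB2
CONNECT TO DB2 IDENTIFIED BY <ENTER USER PASSWORD HERE>
USING '<FROM tnsnames>';
Then reference the remote table through the link with the @ suffix: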
MERGE INTO db1.student_lookup a
USING
(select * from student_master@DBLINK_DB1_DB2) b
ON (
a.id = b.id
AND <others join column>
)
WHEN MATCHED THEN UPDATE SET a.partition_key = 1
WHEN NOT MATCHED THEN INSERT (<a.column>)
VALUES (<b.column>)
If you need a stored procedure, simply put the MERGE inside it.
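Before writing the procedure, it helps to pin the three rules down with a small in-memory model. This is a Python sketch with hypothetical row dicts keyed by `id`, not PL/SQL. It makes one point explicit: rule 1 leaves rows found in both tables untouched, while rule 3 flags DB1 rows that have no DB2 counterpart. Oracle's MERGE only has WHEN MATCHED and WHEN NOT MATCHED branches. The MERGE above can therefore cover rule 2, but rule 3 would need a separate UPDATE ... WHERE NOT EXISTS (...) against the master table. If rule 1 is meant literally, the WHEN MATCHED update should also be dropped.

```python
def sync_student_lookup(lookup, master, key="id"):
    """Apply the question's three rules to student_lookup rows (in-memory model, not Oracle code)."""
    master_ids = {r[key] for r in master}
    lookup_ids = {r[key] for r in lookup}
    result = []
    for row in lookup:
        if row[key] in master_ids:
            result.append(dict(row))  # rule 1: in both DB1 and DB2 -> leave it alone
        else:
            result.append({**row, "partition_key": 1})  # rule 3: only in DB1 -> flag it
    # rule 2: only in DB2 -> insert it into DB1
    result += [dict(r) for r in master if r[key] not in lookup_ids]
    return result
```

A few sample rows fed through this model make a handy checklist for testing the real procedure.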
Is there a way to use the statement:
MERGE INTO MySchema.MyTable AS Target
USING (VALUES
........
)
With nothing in place of the dots? Usually there is a list of (FirstValue, SecondValue, ..., LastValue) tuples, one for each row you want to merge, but I'd like to be able to write the statement with NO rows, so that the DELETE part of the MERGE deletes all the rows.
This is because I am using a stored procedure that generates the MERGE statement automatically, but sometimes the table I am starting from is empty.
Of course I tried with:
MERGE INTO MySchema.MyTable AS Target USING (VALUES)
but it is not accepted.
Example:
MERGE INTO [dbo].[MyTable] AS Target
USING (VALUES (1,'Attivo') ,(2,'Disabilitato') ,(3,'Bloccato') ) AS Source ([IDMyField],[MyField]) ON (Target.[IDMyField] = Source.[IDMyField])
WHEN MATCHED AND ( NULLIF(Source.[MyField], Target.[MyField]) IS NOT NULL OR NULLIF(Target.[MyField], Source.[MyField]) IS NOT NULL)
THEN UPDATE SET [MyField] = Source.[MyField]
WHEN NOT MATCHED BY TARGET
THEN INSERT([IDMyField],[MyField]) VALUES(Source.[IDMyField],Source.[MyField])
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
A viable solution is to select zero rows from the table itself (the derived table needs an alias):
USING (SELECT * FROM MyTable WHERE 1 = 0) AS Source
If you're generating the inner query and the outer query matches on a predefined ID field, the following will work:
MERGE INTO tester AS Target
USING (
select null as test1 --generate select null, alias as your id field
) as SOURCE on target.test1 = source.test1
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
For your particular case:
MERGE INTO table1 AS Target
USING (
values(null)
) as SOURCE(id) on target.id = source.id
WHEN NOT MATCHED BY SOURCE
THEN DELETE;
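Here is a sketch of how the generating procedure could switch to the zero-row form above when the table is empty. It is written in Python rather than T-SQL, and `build_using_clause` and `sql_literal` are hypothetical helpers. The empty branch selects from the target table itself, so the column names and types line up with the ON clause. The literal quoting covers only None, bools, numbers and strings, so treat it as illustrative rather than injection-safe.

```python
def sql_literal(value):
    """Render a Python value as a T-SQL literal (sketch: None, bool, int, float, str only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"  # double embedded single quotes


def build_using_clause(table, columns, rows):
    """Build the USING part of a generated MERGE; an empty `rows` list yields an empty source."""
    col_list = ", ".join(f"[{c}]" for c in columns)
    if not rows:
        # VALUES needs at least one row, so select zero rows from the target table instead;
        # WHEN NOT MATCHED BY SOURCE THEN DELETE then removes every target row.
        return f"USING (SELECT {col_list} FROM {table} WHERE 1 = 0) AS Source"
    values = ", ".join("(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows)
    return f"USING (VALUES {values}) AS Source ({col_list})"
```

With rows, the output has the same shape as the VALUES clause in the question's example; the ON and WHEN branches can stay unchanged in both cases.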
I have following scenario.
Table A // Dev
ID
NAME
Address
Table A // Prod
ID
NAME
Address
When deleting from Dev, I need to check whether each row exists in Prod. If it does, I need to restore its values from Prod, and I need to delete all rows that don't exist in Prod. Any SQL help with this? Can anyone suggest a query?
You should use a Lookup transformation:
1. Your source will be the development table.
2. Drag a Lookup and write a query to get the ID from the production table. Match the IDs from the source and Production in the Lookup, and select the ID from Production as well as the other Production columns.
3. Drag an OLE DB Command and write a query to update dev:
update d set d.Col1 = ?, d.Col2 =?
from dev.tableA d
where d.id = ?
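4. Similarly, configure the Lookup to redirect rows with no match to its no-match output (these are the IDs that don't exist in Production), and connect that output to a second OLE DB Command that deletes them:
delete from dev.tableA
where id = ?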
Note: the OLE DB Command executes once per row, so it will be slow if you have many rows. If performance is a main concern, you can dump all the data after the Lookup into a table on your development server and then use MERGE syntax in an Execute SQL Task to perform the update and delete operations.
If both databases are in the same instance then you could try following queries:
--update query
update d set d.name = p.name, d.address = p.address
from dev.dbo.tableA d
join prod.dbo.tableA p on d.id = p.id
--delete query
delete dev.dbo.tableA
where id not in (select id from prod.dbo.tableA)
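Here is a Python model of what the two queries do together, using hypothetical rows shaped like Table A (ID, NAME, Address). The dictionary lookup behaves like NOT EXISTS. Be aware that the NOT IN form in the delete query deletes nothing if prod.dbo.tableA.id ever contains a NULL, so NOT EXISTS is the safer choice there.

```python
def restore_from_prod(dev_rows, prod_rows):
    """Model of the update + delete pair: restore dev rows from prod, drop dev rows missing in prod."""
    prod_by_id = {r["id"]: r for r in prod_rows}
    restored = []
    for row in dev_rows:
        prod = prod_by_id.get(row["id"])
        if prod is None:
            continue  # delete ... where id not in (select id from prod.dbo.tableA)
        # update d set d.name = p.name, d.address = p.address ... join on d.id = p.id
        restored.append({**row, "name": prod["name"], "address": prod["address"]})
    return restored
```

Prod rows that are missing from Dev are not added, which matches the question: it asks only for restoring and deleting.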
I have a scenario of loading data from a source table into a target table. If a row from the source is not present in the target, I need to insert it. If it is already present in the target table, I need to update the status of the existing row to 'expire' and insert the source row as a new row. I used a MERGE query to do this. I can insert if the row doesn't exist, and I can update too, but when I try to insert in the WHEN MATCHED clause, it says INSERT is not allowed in the 'WHEN MATCHED' clause.
Please help me. Thanks in advance.
If you want to perform multiple actions for a single row of source data, you need to duplicate that row somehow.
Something like the following (making up table names, etc):
;WITH Source as (
SELECT Col1,Col2,Col3,t.Dupl
FROM SourceTable,(select 0 union all select 1) t(Dupl)
)
MERGE INTO Target t
USING Source s ON t.Col1 = s.Col1 and s.Dupl=0 /* Key columns here */
WHEN MATCHED THEN UPDATE SET Expired = 1
WHEN NOT MATCHED AND s.Dupl=1 THEN INSERT (Col1,Col2,Col3) VALUES (s.Col1,s.Col2,s.Col3);
You always want the s.Dupl condition in the not matched branch, because otherwise source rows which don't match any target rows would be inserted twice.
From the example you posted as a comment, I'd change:
MERGE target AS tar USING source AS src ON src.id = tar.id
WHEN MATCHED THEN UPDATE SET D_VALID_TO=@nowdate-1, C_IS_ACTIVE='N', D_LAST_UPDATED_DATE=@nowdate
WHEN NOT MATCHED THEN INSERT (col1,col2,col3) VALUES (tar.col1,tar.col2,tar.col3);
into:
;WITH SourceDupl AS (
SELECT id,col1,col2,col3,t.Dupl
FROM source,(select 0 union all select 1) t(Dupl)
)
MERGE target AS tar USING SourceDupl as src on src.id = tar.id AND Dupl=0
WHEN MATCHED THEN UPDATE SET D_VALID_TO=@nowdate-1, C_IS_ACTIVE='N', D_LAST_UPDATED_DATE=@nowdate
WHEN NOT MATCHED AND Dupl=1 THEN INSERT (col1,col2,col3) VALUES (src.col1,src.col2,src.col3);
I've changed the values in the VALUES clause, since in a NOT MATCHED branch, the tar table doesn't have a row to select values from.
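The following Python model of the rewritten statement shows why the Dupl = 0 / Dupl = 1 conditions produce exactly one expire and one insert per matching key. The row dicts are hypothetical, and only C_IS_ACTIVE is tracked (the date columns are left out). New rows are assumed to be active ('Y'); the original INSERT presumably leaves that to a column default.

```python
def merge_with_dupl(target, source, key="id"):
    """Model of the doubled-source MERGE: copy 0 expires a match, copy 1 inserts the new row."""
    # FROM source, (select 0 union all select 1) t(Dupl): every source row appears twice
    doubled = [(row, dupl) for row in source for dupl in (0, 1)]
    target_keys = {t[key] for t in target}
    expired_keys = set()
    inserted = []
    for row, dupl in doubled:
        # ON src.id = tar.id AND Dupl = 0: only copy 0 can ever match a target row
        if dupl == 0 and row[key] in target_keys:
            expired_keys.add(row[key])  # WHEN MATCHED THEN UPDATE SET C_IS_ACTIVE = 'N'
        elif dupl == 1:
            inserted.append({**row, "C_IS_ACTIVE": "Y"})  # WHEN NOT MATCHED AND Dupl = 1
        # an unmatched copy 0 is skipped; without the Dupl = 1 test it would be inserted as well
    updated = [{**t, "C_IS_ACTIVE": "N"} if t[key] in expired_keys else dict(t) for t in target]
    return updated + inserted
```

A key that already exists ends up with its old row expired and a fresh active row. A brand-new key is inserted once, because its copy 0 matches nothing and is filtered out by the Dupl = 1 condition.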
Check out some of these links:
Using SQL Server 2008's MERGE Statement
MERGE on Technet
Introduction to MERGE statement
SQL Server 2008 MERGE
Without knowing what your database tables look like, we cannot help much more; you need to read those articles and work out how to apply them to your concrete situation.