How to convert delete+insert SQL into a dbt model (Snowflake)

I'm learning dbt and would like to rewrite the following Snowflake procedure as a dbt model.
Unfortunately I don't know how to express SQL deletes/inserts in dbt.
Here is my procedure:
create or replace procedure staging.ingest_google_campaigns_into_master()
returns varchar
language sql
as
$$
begin
    DELETE FROM GOOGLE_ADWORD_CAMPAIGN
    WHERE DT IN (SELECT DISTINCT ORIGINALDATE AS DT FROM GOOGLEADWORDS_CAMPAIGN);

    INSERT INTO GOOGLE_ADWORD_CAMPAIGN
    SELECT DISTINCT *
    FROM
    (
        SELECT g.*,
               YEAR(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)) AS YEAR,
               LPAD(MONTH(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)), 2, 0) AS MONTH,
               LPAD(DAY(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)), 2, 0) AS DAY,
               TO_DATE(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR) AS DT
        FROM GOOGLEADWORDS_CAMPAIGN g
    ) t;
end;
$$
;
The procedure first removes old rows from the GOOGLE_ADWORD_CAMPAIGN table and then replaces them with fresh ones.

This pattern is called an "incremental" materialization in dbt. See the docs for more background.
On Snowflake, there are a few different "strategies" you can use for incremental materializations. One strategy is called delete+insert, which does exactly what your stored procedure does. Another option is merge, which may perform better.
Adapting your code to dbt would look like this:
{{
    config(
        materialized='incremental',
        unique_key='ORIGINALDATE',
        incremental_strategy='delete+insert',
    )
}}
SELECT DISTINCT
    g.*,
    YEAR(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)) AS YEAR,
    LPAD(MONTH(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)), 2, 0) AS MONTH,
    LPAD(DAY(TO_TIMESTAMP(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR)), 2, 0) AS DAY,
    TO_DATE(DATE_PART(EPOCH_SECOND, ORIGINALDATE::TIMESTAMP)::VARCHAR) AS DT
FROM GOOGLEADWORDS_CAMPAIGN g
Note that this assumes there is just a single record in GOOGLEADWORDS_CAMPAIGN for each ORIGINALDATE -- if that is not true, you will want to substitute your own unique_key in the config block.
This also assumes that GOOGLEADWORDS_CAMPAIGN already contains a sliding date window. If that isn't the case, and you want to filter the dates contained in that table (and only update a subset of the data), you will want to add a conditional WHERE clause that is applied only when the model is built in incremental mode:
...
FROM GOOGLEADWORDS_CAMPAIGN g
{% if is_incremental() %}
WHERE
DT >= (SELECT MAX(DT) FROM {{ this }}) - INTERVAL '7 DAYS'
{% endif %}
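If you would rather try the merge strategy mentioned above, only the config block changes; a minimal sketch (still assuming ORIGINALDATE uniquely identifies a row, which you should verify):
{{
    config(
        materialized='incremental',
        unique_key='ORIGINALDATE',
        incremental_strategy='merge',
    )
}}
With merge, Snowflake updates matching rows in place instead of deleting and re-inserting them, which is why it may perform better.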

Related

How to generate an excel file (.xlsx) from SQL Server

I have this query:
WITH InfoNeg AS
(
SELECT DISTINCT
n.idcliente,
CASE
WHEN DATEDIFF(MONTH, MAX(n.fechanegociacion), GETDATE()) <= 2
THEN 'Negociado 6 meses'
ELSE NULL
END AS TipoNeg
FROM
SAB2NewExports.dbo.negociaciones AS n
WHERE
Aprobacion = 'Si'
AND cerrado = 'Si'
GROUP BY
n.idcliente
), Multi AS
(
SELECT DISTINCT
idcliente, COUNT(distinct idportafolio) AS NumPorts
FROM
orangerefi.wfm.wf_master_HIST
WHERE
YEAR(Fecha_BKP) = 2021
AND MONTH(Fecha_BKP) = 08
GROUP BY
idcliente
)
SELECT DISTINCT
m.IdCliente, c.Nombre1
FROM
orangerefi.wfm.wf_master_HIST as m
LEFT JOIN
InfoNeg ON m.idcliente = InfoNeg.idcliente
LEFT JOIN
Multi ON m.IdCliente = Multi.idcliente
LEFT JOIN
SAB2NewExports.dbo.Clientes AS c ON m.IdCliente = c.IdCliente
WHERE
CanalTrabajo = 'Callcenter - Outbound' -- Cambiar aca
AND YEAR (Fecha_BKP) = 2021
AND MONTH(Fecha_BKP) = 08
AND GrupoTrabajo IN ('Alto') -- Cambiar aca
AND Bucket IN (1, 2) -- Cambiar aca
AND Multi.NumPorts > 1
AND Infoneg.TipoNeg IS NULL
When I run it, I get 30 thousand rows, and the columns returned by the query are ClientID and Name. I would like the result to be saved to an Excel file when I run it; I don't know if that's possible.
Another question: is it possible to create a variable that stores text?
I used CONCAT(), but the text is so long that it's a bit cumbersome; I don't know if there is an alternative.
If you can help me, I appreciate it.
To declare a variable:
DECLARE @string VARCHAR(MAX)
SET @string = CONCAT(...)
then insert whatever you are concatenating.
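If the single CONCAT() call gets unwieldy for long text, one alternative (a sketch, valid on SQL Server 2008 and later) is to build the string in steps with the += operator:
DECLARE @string VARCHAR(MAX) = '';
SET @string += 'first long fragment, ';
SET @string += 'second long fragment';
PRINT @string;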
Here is an answer given by carusyte to Export SQL query data to Excel:
I don't know if this is what you're looking for, but you can export the results to Excel like this:
In the results pane, click the top-left cell to highlight all the records, and then right-click the top-left cell and click "Save Results As". One of the export options is CSV.
You might give this a shot too:
-- assumes the Jet provider is available and c:\Test.xls already exists
-- with a sheet named Sheet1 whose first row holds the column headers
INSERT INTO OPENROWSET('Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=c:\Test.xls;',
    'SELECT productid, price FROM [Sheet1$]')
SELECT productid, price FROM dbo.product;
Lastly, you can look into using SSIS (which replaced DTS) for data exports. Here is a link to a tutorial:
http://www.accelebrate.com/sql_training/ssis_2008_tutorial.htm
== Update #1 ==
To save the result as CSV file with column headers, one can follow the steps shown below:
Go to Tools->Options
Query Results->SQL Server->Results to Grid
Check “Include column headers when copying or saving results”
Click OK.
Note that the new settings won’t affect any existing Query tabs — you’ll need to open new ones and/or restart SSMS.
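Beyond SSMS, one more option (an addition here, not from the answers above) is the bcp command-line utility, which writes query results straight to a delimited file that Excel can open; the server name and output path below are placeholders:
bcp "SELECT IdCliente, Nombre1 FROM SAB2NewExports.dbo.Clientes" queryout "C:\out\clientes.csv" -c -t, -S YourServer -T
Here -c writes character data, -t, makes the output comma-delimited, and -T uses Windows authentication; swap in the full query from the question as needed.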

How to write user-defined functions in snowflake?

I am very new to Snowflake. Until now I have used Teradata for writing complex SQL queries.
In Snowflake, I need to create and call macros (similar to Teradata), where I have to pass a date as a parameter, and within the function I have to append records to a table. Something along these lines:
CREATE TABLE SFAAP.WS_DIRBNK_DPST.PV_HIGH_RISK_FI_LIST
(
APP_DT DATE
,FI_NAME VARCHAR(50)
);
CREATE OR REPLACE FUNCTION SFAAP.INSERT_FI (DT DATE, CRED CHAR(5))
--RETURNS NULL
--COMMENT='Create list of high risk FI by date'
AS
'
INSERT INTO SFAAP.WS_DIRBNK_DPST.PV_HIGH_RISK_FI_LIST
TO_DATE(DD) --------------Passed Parameter
,FI_NAME
FROM
(
SELECT
FINANCIAL_INSTITUTION AS FI_NAME
,COUNT(DISTINCT CASE WHEN IND_FPFA_FRAUD = 1 THEN APP_ID ELSE NULL END) AS TOT_FPFA_APPS
,COUNT(DISTINCT APP_ID) AS TOT_APPS
,CAST(TOT_FPFA_APPS AS DECIMAL(38,2))/TOT_APPS AS FRAUD_RATE
FROM
(
SELECT
A.*
,C.FINANCIAL_INSTITUTION
FROM BASE_05 A
LEFT JOIN
(
SELECT
BNK_ACCT_NBR_TOK
,BNK_TRAN_TYP_CDE
,ALT_DR_CR_CDE
,TRAN_1_DSC_TOK
,TRAN_DT
,TRAN_AMT
FROM "SFAAP"."V_SOT_DIRBNK_CLB_FRD_CRD"."BNK_DPS_TRAN_RLT_INFO"
WHERE TRAN_DT BETWEEN DATEADD(Day,-90,TO_DATE(DD)) AND TO_DATE(DD) --------------Passed Parameter, does calculation in the 90 days window from the passed date
AND ALT_DR_CR_CDE = TO_CHAR(CRED) --------------Passed Parameter
AND BNK_TRAN_TYP_CDE IN (22901,56003,56002,56302,56303,56102,70302)
AND TRAN_AMT>=5
QUALIFY ROW_NUMBER() OVER(PARTITION BY BNK_ACCT_NBR_TOK, TRAN_DT, TRAN_AMT, BNK_TRAN_TYP_CDE ORDER BY TRAN_DT ASC, TRAN_AMT DESC)=1
) B
ON A.BNK_ACCT_NBR = B.BNK_ACCT_NBR_TOK
LEFT JOIN SFAAP.WS_DIRBNK_DPST.PV_FRAUD_METRICS_03 C
ON B.TRAN_1_DSC_TOK = C.TOKEN_NAME
)SUB_A
GROUP BY 1
)SUB_B
WHERE FINANCIAL_INSTITUTION IS NOT NULL
AND TOT_APPS>=3
AND FRAUD_RATE>=0.20
'
;
I took some guidance from this answer, but I am still not there yet. Here's the error I am getting (screenshot not reproduced here):
Due to my lack of experience writing Snowflake user-defined functions, I think I am messing up the syntax somewhere (it could be the way I am passing those two parameters). Comments/suggestions are most welcome.
Thanks in advance.
It looks like SFAAP is your database name. Please include your schema name if you are going to use fully qualified names, or change your session context to use a database and schema and then create the function without the database and schema prefix.
Example:
CREATE OR REPLACE FUNCTION SFAAP.WS_DIRBNK_DPST.INSERT_FI (
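Also worth knowing: a Snowflake UDF cannot run DML such as INSERT, so appending rows to a table needs a stored procedure rather than a function. A minimal sketch of the shape, using the table and parameter names from the question (the SELECT body is abbreviated; substitute the full query from the function above):
CREATE OR REPLACE PROCEDURE SFAAP.WS_DIRBNK_DPST.INSERT_FI(DT DATE, CRED CHAR(5))
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    INSERT INTO SFAAP.WS_DIRBNK_DPST.PV_HIGH_RISK_FI_LIST (APP_DT, FI_NAME)
    SELECT :DT, FINANCIAL_INSTITUTION               -- arguments are bound with a colon inside SQL statements
    FROM SFAAP.WS_DIRBNK_DPST.PV_FRAUD_METRICS_03   -- placeholder: put the full aggregation query here
    WHERE FINANCIAL_INSTITUTION IS NOT NULL;        -- CRED would filter ALT_DR_CR_CDE in the full query
    RETURN 'done';
END;
$$;

CALL SFAAP.WS_DIRBNK_DPST.INSERT_FI('2021-08-31'::DATE, 'C');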

Joining continuous queries in Flink SQL

I'm trying to join two continuous queries, but keep running into the following error:
Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before. Please check the documentation for the set of currently supported SQL features.
Here's the table definition:
CREATE TABLE `Combined` (
    `machineID` STRING,
    `cycleID` BIGINT,
    `start` TIMESTAMP(3),
    `end` TIMESTAMP(3),
    WATERMARK FOR `end` AS `end` - INTERVAL '5' SECOND,
    `sensor1` FLOAT,
    `sensor2` FLOAT
)
and the insert query:
INSERT INTO `Combined`
SELECT
    a.`MachineID`,
    a.`cycleID`,
    MAX(a.`start`) AS `start`,
    MAX(a.`end`) AS `end`,
    MAX(a.`sensor1`) AS `sensor1`,
    MAX(m.`sensor2`) AS `sensor2`
FROM `Aggregated` a, `MachineStatus` m
WHERE
    a.`MachineID` = m.`MachineID` AND
    a.`cycleID` = m.`cycleID` AND
    a.`start` = m.`timestamp`
GROUP BY a.`MachineID`, a.`cycleID`, SESSION(a.`start`, INTERVAL '1' SECOND)
In the source tables Aggregated and MachineStatus, the start and timestamp columns are time attributes with a watermark.
I've tried casting the input rows of the join to timestamps, but that didn't fix the issue and would mean that I cannot use SESSION, which is supposed to ensure that only one data point gets recorded per cycle.
Any help is greatly appreciated!
I investigated this a little further and noticed that the GROUP BY doesn't make sense in this context.
Furthermore, the SESSION window can be replaced by an interval join, which is the more idiomatic approach:
INSERT INTO `Combined`
SELECT
    a.`MachineID`,
    a.`cycleID`,
    a.`start`,
    a.`end`,
    a.`sensor1`,
    m.`sensor2`
FROM `Aggregated` a, `MachineStatus` m
WHERE
    a.`MachineID` = m.`MachineID` AND
    a.`cycleID` = m.`cycleID` AND
    m.`timestamp` BETWEEN a.`start` AND a.`start` + INTERVAL '0' SECOND
To understand the different ways to join dynamic tables, I found the Ververica SQL training extremely helpful.
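For completeness, the workaround named in the error message also works when you really do need a regular join: cast the rowtime attributes to plain timestamps in derived tables before joining. A sketch (the cast drops the watermark, so no event-time operations are possible downstream):
SELECT
    a.`MachineID`,
    a.`cycleID`,
    a.`start`,
    a.`end`,
    a.`sensor1`,
    m.`sensor2`
FROM (
    SELECT `MachineID`, `cycleID`, `sensor1`,
           CAST(`start` AS TIMESTAMP(3)) AS `start`,
           CAST(`end` AS TIMESTAMP(3)) AS `end`
    FROM `Aggregated`
) a
JOIN (
    SELECT `MachineID`, `cycleID`, `sensor2`,
           CAST(`timestamp` AS TIMESTAMP(3)) AS `ts`
    FROM `MachineStatus`
) m
ON a.`MachineID` = m.`MachineID`
AND a.`cycleID` = m.`cycleID`
AND a.`start` = m.`ts`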

How do I update an XML column in SQL Server by checking the values of two nodes, one of which needs a contains (LIKE) comparison?

I have an XML column called OrderXML in an Orders table.
There is an XPath like this in the XML:
/Order/InternalInformation/InternalOrderBreakout/InternalOrderHeader/InternalOrderDetails/InternalOrderDetail
The InternalOrderDetails element contains many InternalOrderDetail nodes like this:
<InternalOrderDetails>
<InternalOrderDetail>
<Item_Number>FBL11REFBK</Item_Number>
<CountOfNumber>10</CountOfNumber>
<PriceLevel>FREE</PriceLevel>
</InternalOrderDetail>
<InternalOrderDetail>
<Item_Number>FCL13COTRGUID</Item_Number>
<CountOfNumber>2</CountOfNumber>
<PriceLevel>NONFREE</PriceLevel>
</InternalOrderDetail>
</InternalOrderDetails>
My end goal is to modify the XML in the OrderXML column IF the Item_Number of the node contains COTRGUID (LIKE '%COTRGUID') AND the PriceLevel = NONFREE. If that condition is met, I want to change that PriceLevel to FREE.
I am having trouble both with creating the XPath expression that finds the correct nodes (using the OrderXML.value or OrderXML.exist functions) and with updating the XML (using the OrderXML.modify function).
I have tried the following for the where clause:
WHERE OrderXML.value('(/Order/InternalInformation/InternalOrderBreakout/InternalOrderHeader/InternalOrderDetails/InternalOrderDetail/Item_Number/node())[1]','nvarchar(64)') like '%13COTRGUID'
That does work, but it seems to me that I need to ALSO include my second condition (PriceLevel=NONFREE) in the same where clause and I cannot figure out how to do it. Perhaps I can put in an AND for the second condition like this...
AND OrderXML.value('(/Order/InternalInformation/InternalOrderBreakout/InternalOrderHeader/InternalOrderDetails/InternalOrderDetail/PriceLevel/node())[1]','nvarchar(64)') = 'NONFREE'
but I am afraid it will end up operating like an OR since it is an XML query.
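One way around that worry is to test both child values inside a single predicate on the same InternalOrderDetail node, so the conditions cannot pair up across different nodes. A sketch of such a WHERE clause (the same idea the answers below use):
WHERE OrderXML.exist('/Order/InternalInformation/InternalOrderBreakout/InternalOrderHeader/InternalOrderDetails/InternalOrderDetail[Item_Number[contains(., "COTRGUID")] and PriceLevel = "NONFREE"]') = 1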
Once I get the WHERE clause right I will update the column using a SET like this:
UPDATE Orders SET orderXml.modify('replace value of (/Order/InternalInformation/InternalOrderBreakout/InternalOrderHeader/InternalOrderDetails/InternalOrderDetail/PriceLevel[1]/text())[1] with "NONFREE"')
However, I ran this statement on some test data and none of the XML columns were updated (even though it said zz rows affected).
I have been at this for several hours to no avail. Help is appreciated. Thanks.
If you don't have more than one node matching your condition in each row of the Orders table, you can use this:
update orders set
data.modify('
replace value of
(
/Order/InternalInformation/InternalOrderBreakout/
InternalOrderHeader/InternalOrderDetails/
InternalOrderDetail[
Item_Number[contains(., "COTRGUID")] and
PriceLevel="NONFREE"
]/PriceLevel/text()
)[1]
with "FREE"
');
sql fiddle demo
If you could have more than one matching node in one row, there are several possible solutions, none of which is really elegant, sadly.
You can reconstruct all the XML in the table - sql fiddle demo
Or you can do your updates in a loop - sql fiddle demo (a sketch of the loop follows)
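A minimal sketch of that loop, assuming the question's Orders/OrderXML names: modify() can only replace one node per statement (hence the [1]), so repeat the single-node update until no matching node remains:
DECLARE @updated INT = 1;
WHILE @updated > 0
BEGIN
    UPDATE Orders SET OrderXML.modify('
        replace value of
        (
            /Order/InternalInformation/InternalOrderBreakout/
            InternalOrderHeader/InternalOrderDetails/
            InternalOrderDetail[
                Item_Number[contains(., "COTRGUID")] and
                PriceLevel = "NONFREE"
            ]/PriceLevel/text()
        )[1]
        with "FREE"')
    WHERE OrderXML.exist('
        /Order/InternalInformation/InternalOrderBreakout/
        InternalOrderHeader/InternalOrderDetails/
        InternalOrderDetail[
            Item_Number[contains(., "COTRGUID")] and
            PriceLevel = "NONFREE"
        ]') = 1;

    SET @updated = @@ROWCOUNT;
END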
This may get you off the hump.
Replace #HolderTable with the name of your table.
SELECT T2.myAlias.query('./../PriceLevel[1]').value('.' , 'varchar(64)') as MyXmlFragmentValue
FROM #HolderTable
CROSS APPLY OrderXML.nodes('/InternalOrderDetails/InternalOrderDetail/Item_Number') as T2(myAlias)
SELECT T2.myAlias.query('.') as MyXmlFragment
FROM #HolderTable
CROSS APPLY OrderXML.nodes('/InternalOrderDetails/InternalOrderDetail/Item_Number') as T2(myAlias)
EDIT:
UPDATE
#HolderTable
SET
OrderXML.modify('replace value of (/InternalOrderDetails/InternalOrderDetail/PriceLevel/text())[1] with "MyNewValue"')
WHERE
OrderXML.value('(/InternalOrderDetails/InternalOrderDetail/PriceLevel)[1]', 'varchar(64)') = 'FREE'
PRINT @@ROWCOUNT
Your issue is the [1] in the above.
Why did I put it there?
Here is a sentence from the URL listed below.
Note that the target being updated must be, at most, one node that is explicitly specified in the path expression by adding a "[1]" at the end of the expression.
http://msdn.microsoft.com/en-us/library/ms190675.aspx
EDIT.
I think I've discovered the root of your frustration. (No fix, just the problem.)
Note below that the second query works.
So I think the [1] in some cases is saying "only look at the first node"... and not (as you and I were hoping) "use the first node... after you find a match".
UPDATE
#HolderTable
SET
OrderXML.modify('replace value of (/InternalOrderDetails/InternalOrderDetail/PriceLevel/text())[1] with "MyNewValue001"')
WHERE
OrderXML.value('(/InternalOrderDetails/InternalOrderDetail/PriceLevel[text() = "NONFREE"])[1]', 'varchar(64)') = 'NONFREE'
/* and OrderXML.value('(/InternalOrderDetails/InternalOrderDetail/Item_Number)[1]', 'varchar(64)') like '%COTRGUID' */
UPDATE
#HolderTable
SET
OrderXML.modify('replace value of (/InternalOrderDetails/InternalOrderDetail/PriceLevel/text())[1] with "MyNewValue002"')
WHERE
OrderXML.value('(/InternalOrderDetails/InternalOrderDetail/PriceLevel[text() = "FREE"])[1]', 'varchar(64)') = 'FREE'
Try this:
;WITH InternalOrderDetail AS
(
    SELECT id,
           Tbl.Col.value('Item_Number[1]', 'varchar(40)') AS Item_Number,
           Tbl.Col.value('CountOfNumber[1]', 'int') AS CountOfNumber,
           CASE
               WHEN Tbl.Col.value('Item_Number[1]', 'varchar(40)') LIKE '%COTRGUID'
                    AND Tbl.Col.value('PriceLevel[1]', 'varchar(40)') = 'NONFREE'
               THEN 'FREE'
               ELSE Tbl.Col.value('PriceLevel[1]', 'varchar(40)')
           END AS PriceLevel
    FROM (SELECT id, orderxml FROM demo) AS a
    CROSS APPLY orderxml.nodes('//InternalOrderDetail') AS Tbl(Col)
),
cte_data AS
(
    SELECT ID,
           '<InternalOrderDetails>' +
           (SELECT ITEM_NUMBER, COUNTOFNUMBER, PRICELEVEL
            FROM InternalOrderDetail
            WHERE ID = Results.ID
            FOR XML AUTO, ELEMENTS) + '</InternalOrderDetails>' AS XML_data
    FROM InternalOrderDetail Results
    GROUP BY ID
)
UPDATE demo
SET orderxml = CAST(xml_data AS XML)
FROM demo
INNER JOIN cte_data ON demo.id = cte_data.id
WHERE CAST(orderxml AS VARCHAR(2000)) != xml_data;

SELECT * FROM demo;
SQL Fiddle
I have handled the following cases:
1. Both WHERE conditions required in the question.
2. It updates every <Item_Number> LIKE '%COTRGUID' with <PriceLevel> = NONFREE in a row, not just the first one.
It may require minor changes for your data and tables.

How can I copy records between tables only if they are valid according to check constraints in Oracle?

I don't know if this is possible, but I want to copy a bunch of records from a temp table to a normal table. The problem is that some records may violate check constraints, so I want to insert everything that is valid and generate error logs somewhere else for the invalid records.
If I execute:
INSERT INTO normal_table
SELECT ... FROM temp_table
nothing would be inserted if any record violates any constraint. I could make a loop and insert rows one by one, but I think the performance would be lower.
PS: if possible, I'd like a solution that works with Oracle 9i
From Oracle 10gR2, you can use the log errors clause:
EXECUTE DBMS_ERRLOG.CREATE_ERROR_LOG('NORMAL_TABLE');
INSERT INTO normal_table
SELECT ... FROM temp_table
LOG ERRORS REJECT LIMIT UNLIMITED;
That's the clause in its simplest form. You can then see what errors you got:
SELECT ora_err_mesg$
FROM err$_normal_table;
More on the CREATE_ERROR_LOG step here.
The bulk-exception approach below should work from 9i, but I didn't have an instance available to test on at first, so it was actually run on 11gR2.
Update: tested and tweaked (to avoid PLS-00436) in 9i:
declare
    type t_temp_table is table of temp_table%rowtype;
    l_temp_table t_temp_table;
    l_err_code   err_table.err_code%type;
    l_err_msg    err_table.err_msg%type;
    l_id         err_table.id%type;
    cursor c is select * from temp_table;
    error_array exception;
    pragma exception_init(error_array, -24381);
begin
    open c;
    loop
        fetch c bulk collect into l_temp_table limit 100;
        exit when l_temp_table.count = 0;
        begin
            forall i in 1..l_temp_table.count save exceptions
                insert into normal_table
                values l_temp_table(i);
        exception
            when error_array then
                for j in 1..sql%bulk_exceptions.count loop
                    l_id := l_temp_table(sql%bulk_exceptions(j).error_index).id;
                    l_err_code := sql%bulk_exceptions(j).error_code;
                    l_err_msg := sqlerrm(-1 * sql%bulk_exceptions(j).error_code);
                    insert into err_table(id, err_code, err_msg)
                    values (l_id, l_err_code, l_err_msg);
                end loop;
        end;
    end loop;
end;
/
Use all your real columns instead of just id; I've used id alone for demo purposes:
create table normal_table(id number primary key);
create table temp_table(id number);
create table err_table(id number, err_code number, err_msg varchar2(2000));
insert into temp_table values(42);
insert into temp_table values(42);
Then run the anonymous block above...
select * from normal_table;
ID
----------
42
column err_msg format a50
select * from err_table;
ID ERR_CODE ERR_MSG
---------- ---------- --------------------------------------------------
42 1 ORA-00001: unique constraint (.) violated
This is less satisfactory on a few levels - more coding, slower if you have a lot of exceptions (because of the individual inserts for those), doesn't show which constraint was violated (or any other error details), and won't retain the errors if you rollback - though you could call an autonomous transaction to log it if that was an issue, which I doubt here.
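For reference, a sketch of that autonomous-transaction idea (a hypothetical helper; it lets the error rows survive a rollback of the main transaction):
create or replace procedure log_err(p_id number, p_code number, p_msg varchar2) is
    pragma autonomous_transaction;
begin
    insert into err_table(id, err_code, err_msg)
    values (p_id, p_code, p_msg);
    commit;  -- committed independently of the caller's transaction
end;
/
You would then call log_err(...) inside the bulk-exceptions loop instead of the direct insert.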
If you have a small enough volume of data to not want to worry about the limit clause you can simplify it a bit:
declare
    type t_temp_table is table of temp_table%rowtype;
    l_temp_table t_temp_table;
    l_err_code   err_table.err_code%type;
    l_err_msg    err_table.err_msg%type;
    l_id         err_table.id%type;
    error_array exception;
    pragma exception_init(error_array, -24381);
begin
    select * bulk collect into l_temp_table from temp_table;
    forall i in 1..l_temp_table.count save exceptions
        insert into normal_table
        values l_temp_table(i);
exception
    when error_array then
        for j in 1..sql%bulk_exceptions.count loop
            l_id := l_temp_table(sql%bulk_exceptions(j).error_index).id;
            l_err_code := sql%bulk_exceptions(j).error_code;
            l_err_msg := sqlerrm(-1 * sql%bulk_exceptions(j).error_code);
            insert into err_table(id, err_code, err_msg)
            values (l_id, l_err_code, l_err_msg);
        end loop;
end;
/
The 9i documentation doesn't seem to be online any more, but this is in a new-features document, and lots of people have written about it - it's been asked about here before too.
If you're specifically interested only in check constraints, then one method to consider is to read the definitions of the target check constraints from the data dictionary and apply them as predicates to the query that extracts data from the source table, using dynamic SQL.
Given:
create table t1 (
col1 number check (col1 between 3 and 10))
You can:
select constraint_name,
search_condition
from user_constraints
where constraint_type = 'C' and
table_name = 'T1'
The result being:
"SYS_C00226681", "col1 between 3 and 10"
From there it's "a simple matter of coding", as they say, and the method will work on just about any version of Oracle. The most efficient method would probably be to use a multitable insert that directs rows either to the intended target table or to an error logging table, based on the result of a CASE expression that applies the check constraint predicates, as sketched below.
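A rough sketch of that idea, with hypothetical names (t1 as the target with a single check constraint, temp_t1 as the source, err_t1 as the error bucket). A real version would need to handle multiple constraints, and note that a check constraint actually passes when its predicate evaluates to UNKNOWN (e.g. on NULLs), which this simple CASE does not reproduce:
declare
    -- search_condition is a LONG column; a PL/SQL variable of the same type can hold it
    l_pred user_constraints.search_condition%type;
begin
    select search_condition
    into l_pred
    from user_constraints
    where constraint_type = 'C'
    and table_name = 'T1'
    and rownum = 1;  -- simplification: only one check constraint handled

    execute immediate
        'insert all'
        || ' when ok = 1 then into t1 (col1) values (col1)'
        || ' else into err_t1 (col1) values (col1)'
        || ' select col1, case when ' || l_pred || ' then 1 else 0 end as ok'
        || ' from temp_t1';
end;
/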
