Snowflake/DBT Incremental table, inserting checksum value on the fly - snowflake-cloud-data-platform

I have an incremental table in dbt (running on Snowflake) for which I need to create a checksum (i.e. a hash over all columns except the metadata columns) from the original table. I have the query below (I also referenced help from this post to create it). Is this even the right approach to a) get an initial table loaded and b) have it load every day (i.e. run dbt run --full-refresh once and then schedule dbt run daily)?
/*Set schema to vault */
{{ config(
schema = 'vault',
materialized = 'incremental',
unique_key = 'o_hashdiff'
) }}
with values_table as (
SELECT
sha2_binary(id, 256) AS o_hash_id,
id AS o_id,
min(pbi_import_dt) AS load_timestamp,
's.a' AS record_source,
CAST(
sha2_binary(
CONCAT_WS(
'||',
IFNULL(
NULLIF(UPPER(TRIM(CAST(id AS VARCHAR))), ''),
'^^'
),
IFNULL(
NULLIF(
UPPER(
TRIM(CAST('s.a' AS VARCHAR))
),
''
),
'^^'
)
),
256
) AS BINARY(64)
) AS o_hashdiff,
current_date() AS insert_date,
'{{ invocation_id }}' AS _audit_invocation_id
FROM
{{ source ('s', 'a') }}
WHERE
ultimate_id_c IS NULL
GROUP BY
1,
2,
4
)
SELECT t1.*,
t2.hash_value
FROM values_table as t1
,LATERAL (SELECT HASH(*) AS hash_value
FROM (SELECT * EXCLUDE (a,b,c)
FROM {{ source ('s', 'a') }} AS t2
WHERE t1.o_id = t2.id and t2.date is null)
) AS t2
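To see what the o_hashdiff expression does, here is a minimal Python sketch that reproduces the same normalisation outside Snowflake. It assumes SHA2_BINARY(..., 256) is a plain SHA-256 over the UTF-8 bytes of the joined string and that TRIM without a character list strips blank spaces; the helper names normalise and hashdiff are hypothetical. It also shows why the '^^' placeholder is there: Snowflake's CONCAT_WS returns NULL if any argument is NULL, so every value is mapped to a non-NULL token before joining.

```python
import hashlib
from typing import Optional


def normalise(value: Optional[object]) -> str:
    """Mimic IFNULL(NULLIF(UPPER(TRIM(CAST(x AS VARCHAR))), ''), '^^').

    Assumptions: TRIM with no character list strips blank spaces only,
    and str() matches CAST(x AS VARCHAR) for the simple values used here.
    """
    if value is None:
        return "^^"                        # IFNULL(..., '^^') for NULL input
    text = str(value).strip(" ").upper()   # TRIM + UPPER
    return text if text else "^^"          # NULLIF(..., '') then IFNULL


def hashdiff(*values: Optional[object]) -> bytes:
    """SHA-256 of the '||'-joined normalised values (like SHA2_BINARY(..., 256))."""
    joined = "||".join(normalise(v) for v in values)  # CONCAT_WS('||', ...)
    return hashlib.sha256(joined.encode("utf-8")).digest()


if __name__ == "__main__":
    # Case and surrounding spaces do not change the hashdiff
    print(hashdiff(42, "s.a") == hashdiff(" 42 ", "S.A"))  # True
    # NULL and '' both become '^^', so they hash identically
    print(hashdiff(None, "s.a") == hashdiff("", "s.a"))     # True
    # A SHA-256 digest is 32 bytes
    print(len(hashdiff(42, "s.a")))                          # 32
```

Because the digest is always 32 bytes, BINARY(64) in the model is simply a larger maximum length than the hash needs.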

Related

How can I refer to a LAG() function column in SQL Server?

I have a query in which I use the LAG function:
WITH Tr AS
(
SELECT
DocDtls.Warehouse, Transactions.Code, DocDtls.zDate,
Transactions.ID, Transactions.QtyIn, Transactions.QtyOut,
Transactions.BalanceAfter
FROM
DocDtls
INNER JOIN
Transactions ON DocDtls.[PrimDocNum] = Transactions.[DocNum]
)
SELECT
ID, Code, QtyIn, QtyOut, BalanceAfter,
LAG(BalanceAfter, 1, 0) OVER (PARTITION BY Warehouse, Code
ORDER BY Code, ID) Prev_BlncAfter
FROM
Tr;
It's working fine, but when I try to add this column before FROM:
SUM(Prev_BlncAfter + QtyIn) - QtyOut AS NewBlncAfter
I get this error:
Msg 207, Level 16, State 1, Line 3
Invalid column name 'Prev_BlncAfter'
How can I fix this? Thanks
You can create the LAG column inside the CTE instead of in the outer query. E.g.
declare @DocDtls table (Warehouse int, zDate date, [PrimDocNum] int);
declare @Transactions table (code int, id int, QtyIn int, QtyOut int, balanceafter int, [DocNum] int)
;with Tr As
(
SELECT
d.Warehouse
, t.Code
, d.zDate
, t.ID
, t.QtyIn
, t.QtyOut
, t.BalanceAfter
,LAG(BalanceAfter,1,0) Over (partition by Warehouse,Code order by Code,ID) Prev_BlncAfter
FROM @DocDtls d
INNER JOIN @Transactions t ON d.[PrimDocNum] = t.[DocNum]
)
select ID,Code,QtyIn,QtyOut,BalanceAfter
,SUM(Prev_BlncAfter + QtyIn)-QtyOut As NewBlncAfter
from Tr
group by ID,Code,QtyIn,QtyOut,BalanceAfter;
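The key point here is that an alias defined in a SELECT list (the LAG result) cannot be referenced in that same SELECT list; it only becomes visible one level up. Below is a small runnable sketch of the pattern, using Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is assumed for LAG) and hypothetical sample rows. Assuming ID is unique, each ID forms its own group in the GROUP BY above, so plain row-level arithmetic gives the same NewBlncAfter as SUM(...).

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for SQL Server here;
# the tables and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DocDtls (Warehouse INTEGER, zDate TEXT, PrimDocNum INTEGER);
CREATE TABLE Transactions (Code INTEGER, ID INTEGER, QtyIn INTEGER,
                           QtyOut INTEGER, BalanceAfter INTEGER, DocNum INTEGER);
INSERT INTO DocDtls VALUES (1, '2021-01-01', 100), (1, '2021-01-02', 101);
INSERT INTO Transactions VALUES (7, 1, 10, 0, 10, 100), (7, 2, 5, 3, 12, 101);
""")

rows = conn.execute("""
WITH Tr AS (
    SELECT d.Warehouse, t.Code, d.zDate, t.ID, t.QtyIn, t.QtyOut, t.BalanceAfter,
           -- the window function is computed and aliased inside the CTE ...
           LAG(t.BalanceAfter, 1, 0) OVER (PARTITION BY d.Warehouse, t.Code
                                           ORDER BY d.zDate, t.ID) AS Prev_BlncAfter
    FROM DocDtls d
    JOIN Transactions t ON d.PrimDocNum = t.DocNum
)
-- ... so the outer query can use Prev_BlncAfter like any other column
SELECT ID, Prev_BlncAfter, Prev_BlncAfter + QtyIn - QtyOut AS NewBlncAfter
FROM Tr
ORDER BY ID
""").fetchall()

print(rows)  # [(1, 0, 10), (2, 10, 12)]
```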
You can nest this query so the newly added column can be referenced from the outer scope, or create another CTE, as you already did with Tr, and reference the column from there:
with Tr As (
SELECT
DocDtls.Warehouse,
Transactions.Code,
DocDtls.zDate,
Transactions.ID,
Transactions.QtyIn,
Transactions.QtyOut,
Transactions.BalanceAfter
FROM
DocDtls
INNER JOIN Transactions ON DocDtls.[PrimDocNum] = Transactions.[DocNum]
),
formatted_tr as (
select
ID,
Code,
QtyIn,
QtyOut,
BalanceAfter,
LAG(BalanceAfter, 1, 0) Over (
partition by Warehouse,
Code
order by
Code,
ID
) Prev_BlncAfter
from
Tr
)
select
SUM(Prev_BlncAfter + QtyIn) - QtyOut As NewBlncAfter
from
formatted_tr
group by
ID, QtyOut
;
Based on the comments, I combined the two answers to get what I need:
with Tr As (
SELECT
DocDtls.Warehouse,
Transactions.Code,
DocDtls.zDate,
Transactions.ID,
Transactions.QtyIn,
Transactions.QtyOut,
Transactions.BalanceAfter
FROM
DocDtls
INNER JOIN Transactions ON DocDtls.[PrimDocNum] = Transactions.[DocNum]
),
formatted_tr as (
select
ID,
Code,
QtyIn,
QtyOut,
BalanceAfter,
LAG(BalanceAfter, 1, 0) Over (
partition by Warehouse,
Code
order by
Code,zDate,ID
) Prev_BlncAfter
from
Tr
)
select ID,Code,QtyIn,QtyOut,BalanceAfter
,SUM(Prev_BlncAfter + QtyIn)-QtyOut As NewBlncAfter
from formatted_tr
group by ID,Code,QtyIn,QtyOut,BalanceAfter;

SQL unpivot of multiple columns

I would like the following wide table to be unpivoted, but only where a user has a true value against the field, along with the appropriate date.
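Current State:
+-------------+-------------------+-------------------+-------------------------+-------------------------+
| CUSTOMER_ID | First_Party_Email | Third_Party_Email | First_Party_Email_Date  | Third_Party_Email_Date  |
+-------------+-------------------+-------------------+-------------------------+-------------------------+
| 40011111    | 1                 | 1                 | 2021-01-22 04:38:00.000 | 2021-01-17 06:38:00.000 |
| 50022222    | NULL              | 1                 | NULL                    | 2021-01-18 04:38:00.000 |
| 80066666    | 1                 | NULL              | 2021-01-24 05:38:00.000 | NULL                    |
+-------------+-------------------+-------------------+-------------------------+-------------------------+
Required State:
+-------------+-------------------+-------+------------------+
| Customer_ID | Type              | Value | Date             |
+-------------+-------------------+-------+------------------+
| 40011111    | First_Party_Email | 1     | 22/01/2021 04:38 |
| 40011111    | Third_Party_Email | 1     | 17/01/2021 06:38 |
| 50022222    | Third_Party_Email | 1     | 18/01/2021 04:38 |
| 80066666    | First_Party_Email | 1     | 24/01/2021 05:38 |
+-------------+-------------------+-------+------------------+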
Associated query to create table and my attempt that doesn't work:
create table Permissions_Obtained
(Customer_ID bigint
,First_Party_Email bit
,Third_Party_Email bit
,First_Party_Email_Date datetime
,Third_Party_Email_Date datetime
)
insert into Permissions_Obtained
(Customer_ID
,First_Party_Email
,Third_Party_Email
,First_Party_Email_Date
,Third_Party_Email_Date
)
VALUES
(40011111, 1, 1, '2021-01-22 04:38', '2021-01-17 06:38'),
(50022222, NULL, 1, NULL, '2021-01-18 04:38'),
(80066666, 1, NULL, '2021-01-24 05:38', null)
select *
from Permissions_Obtained
select
customer_id, Permission
from Permissions_Obtained
unpivot
(
GivenPermission
for Permission in (
First_Party_Email, Third_Party_Email
)
) unpiv1,
unpivot
(
GivenPermissionDate
for PermissionDate in (
First_Party_Email_Date, Third_Party_Email_Date
)
) unpiv2
where GivenPermission = 1
--drop table Permissions_Obtained
Any help would be massively appreciated. TIA
You can't combine two UNPIVOT clauses with a comma as in your attempt, and chaining them would pair every permission with every date. Instead you can use CROSS APPLY, an INNER JOIN, or UNION / UNION ALL, depending on your requirement. I have added a sample answer using a join and UNPIVOT.
SELECT
unpvt.Customer_ID
, [Type]
, unpvt.[Value]
,CASE WHEN unpvt.Type = 'First_Party_Email' THEN po.First_Party_Email_Date
ELSE po.Third_Party_Email_Date
END AS [Date]
FROM
(
SELECT
Customer_ID, First_Party_Email , Third_Party_Email
FROM Permissions_Obtained
) p
UNPIVOT
( [Value] FOR [Type] IN
(First_Party_Email , Third_Party_Email )
)AS unpvt
INNER JOIN Permissions_Obtained [po]
on [po].Customer_ID = unpvt.Customer_ID
When unpivoting multiple columns, CROSS APPLY (VALUES ...) is often the easiest and most effective solution.
It builds a small virtual table for each row of the preceding table, and therefore unpivots each row into separate rows.
SELECT
p.Customer_Id,
v.[Type],
v.Value,
v.Date
FROM Permissions_Obtained p
CROSS APPLY (VALUES
('First_Party_Email', p.First_Party_Email, p.First_Party_Email_Date),
('Third_Party_Email', p.Third_Party_Email, p.Third_Party_Email_Date)
) v([Type], Value, Date)
where v.Value IS NOT NULL;
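SQLite has neither CROSS APPLY nor UNPIVOT, but the same row-per-pair idea can be sketched with UNION ALL: one SELECT per (flag, date) column pair, which is a portable equivalent of the CROSS APPLY (VALUES ...) answer. This sketch uses Python's sqlite3 module with the sample data from the question; the dates stay as the inserted strings rather than being reformatted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Permissions_Obtained (
    Customer_ID INTEGER, First_Party_Email INTEGER, Third_Party_Email INTEGER,
    First_Party_Email_Date TEXT, Third_Party_Email_Date TEXT);
INSERT INTO Permissions_Obtained VALUES
    (40011111, 1, 1, '2021-01-22 04:38', '2021-01-17 06:38'),
    (50022222, NULL, 1, NULL, '2021-01-18 04:38'),
    (80066666, 1, NULL, '2021-01-24 05:38', NULL);
""")

# One SELECT per (flag column, date column) pair; keeping each pair in the
# same SELECT is what stops flags and dates from being cross-matched.
rows = conn.execute("""
SELECT Customer_ID, 'First_Party_Email' AS "Type",
       First_Party_Email AS "Value", First_Party_Email_Date AS "Date"
FROM Permissions_Obtained WHERE First_Party_Email = 1
UNION ALL
SELECT Customer_ID, 'Third_Party_Email',
       Third_Party_Email, Third_Party_Email_Date
FROM Permissions_Obtained WHERE Third_Party_Email = 1
ORDER BY 1, 2
""").fetchall()

for row in rows:
    print(row)
```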

Optimizing query with huge amount of data

How can I optimize this query? I looked at the execution plan and created all the indexes. Every table holds a huge amount of data, and the query takes a very long time to execute. Looking at the query, could you suggest where I can optimize further?
Some background on the query structure:
There are many companies
Each company can have multiple managers
Data is paginated
Filtering is done on #parent_manager, but its name column holds comma-separated values, so a second temp table, #parent_manager_filter, is created purely for filtering
CREATE TABLE #parent_manager
(
cid NUMERIC(18) PRIMARY KEY,
name NVARCHAR(MAX),
code NVARCHAR(MAX)
);
CREATE INDEX cte_parent_manager ON #parent_manager(cid);
CREATE TABLE #parent_manager_filter
(
cid NUMERIC(18),
name NVARCHAR(1000),
code NVARCHAR(1000)
);
CREATE INDEX cte_parent_manager_filter_idx ON #parent_manager_filter(cid);
INSERT INTO #parent_manager
SELECT DISTINCT
mgrc.cid,
name = CAST (STUFF ((SELECT ', ' + CAST(c.company_name AS varchar(2000))
FROM manager_company mc
INNER JOIN company c ON (mc.mgr_cid = c.cid )
WHERE mc.cid = mgrc.cid
AND c.company_name IS NOT NULL
FOR XML PATH ('')), 1, 2, '') AS VARCHAR(2000)), -- strip the leading ', ' (two characters)
code = CAST (STUFF ((SELECT ', ' + CAST(c.code AS varchar(2000))
FROM manager_company mc
INNER JOIN company c ON (mc.mgr_cid = c.cid )
WHERE mc.cid = mgrc.cid
AND c.company_name IS NOT NULL
FOR XML PATH ('')), 1, 2, '') AS VARCHAR(2000))
FROM
manager_company mgrc
INNER JOIN
company c ON (mgrc.mgr_cid = c.cid )
JOIN
handler h ON (c.handlerId = h.handlerid )
WHERE
h.handlerid = 5800657002370
INSERT INTO #parent_manager_filter
SELECT DISTINCT
mc.cid,
c.company_name as name,
c.code as code
FROM
manager_company mc
INNER JOIN
company c ON (mc.mgr_cid = c.cid )
JOIN
handler h ON (h.handlerid = c.handlerid)
WHERE
h.handlerid = 5800657002370 ;
WITH company AS
(
SELECT DISTINCT
c.cid AS cid,
parentManager.name AS MANAGER_NAME,
parentManager.code AS code
FROM
company c
LEFT JOIN
#parent_manager parentManager ON (parentManager.cid = c.cid)
LEFT JOIN
#parent_manager_filter parentManagerFilter ON (parentManagerFilter.cid = c.cid)
WHERE
parentManagerFilter.name IN (:managerList)
),
total_rows AS
(
SELECT
COUNT(*) OVER () AS TOTALCOUNT,
ROW_NUMBER() OVER (ORDER BY company_name ASC) AS rnum,
grid.*
FROM
company grid
)
SELECT *
FROM total_rows rnum
WHERE rnum >= 1
AND rnum <= 10
DROP TABLE #parent_manager;
DROP TABLE #parent_manager_filter;
If you are building temp tables, make sure each one has a clustered index; otherwise it is simply a heap. #parent_manager gets one from its PRIMARY KEY (clustered by default), but #parent_manager_filter only has a nonclustered index:
INSERT INTO #parent_manager_filter ...
CREATE CLUSTERED INDEX cte_parent_manager_filter On #parent_manager_filter(cid);
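The paging part of the question (COUNT(*) OVER () plus ROW_NUMBER() filtered to a range) can be sketched with Python's sqlite3 module as a stand-in for SQL Server; the table contents and the page helper are hypothetical. Every row of the filtered set is numbered and counted before the page is cut, so the cost of this pattern grows with the size of the filtered set rather than with the page size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (cid INTEGER PRIMARY KEY, company_name TEXT)")
conn.executemany(
    "INSERT INTO company VALUES (?, ?)",
    [(i, f"Company {i:02d}") for i in range(1, 26)],  # 25 hypothetical rows
)


def page(conn: sqlite3.Connection, page_no: int, page_size: int = 10) -> list:
    """Return one page of rows plus the total count, numbered by company_name."""
    first = (page_no - 1) * page_size + 1
    last = page_no * page_size
    return conn.execute(
        """
        WITH total_rows AS (
            SELECT COUNT(*) OVER () AS total_count,        -- size of the whole set
                   ROW_NUMBER() OVER (ORDER BY company_name) AS rnum,
                   cid, company_name
            FROM company
        )
        SELECT total_count, rnum, cid, company_name
        FROM total_rows
        WHERE rnum BETWEEN ? AND ?
        ORDER BY rnum
        """,
        (first, last),
    ).fetchall()


print(page(conn, 3))  # rows 21-25, each carrying total_count = 25
```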

SQL Server Query for required result

I am using SQL Server with my application.
The table data is as follows:
And I want the result in the following format:
I have tried a split function, but it's not working properly.
Is it possible to get such a result?
Please suggest.
Thank you.
Try this. I did not manage to collapse it to a single Not Req; it comes out as "Not Req/Not Req".
drop table if exists dbo.TableB;
create table dbo.TableB (
OldSPC varchar(100)
, old_freq varchar(100)
, NewSPC varchar(100)
, new_freq varchar(100)
);
insert into dbo.TableB(OldSPC, old_freq, NewSPC, new_freq)
values ('ADH,BAP', '7,7', 'ADH,BAP', '7,7')
, ('Not Req', 'Not Req', 'ADH,BAP', '7,7')
, ('BAP,EXT,ADL', '35,7,42', 'BAP,EXT,BAP,ADL', '21,7,35,42');
select
tt1.OldSPCOldFreq
, tt2.NewSPCNewFreq
from (
select
t.OldSPC, t.old_freq, t.NewSPC, t.new_freq
, STRING_AGG(t1.value + '/' + t2.value, ',') OldSPCOldFreq
from dbo.TableB t
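cross apply (
-- caution: STRING_SPLIT does not guarantee output order; SQL Server 2022+ can return an ordinal via enable_ordinal
select
ROW_NUMBER () over (order by t.OldSPC) as Rbr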
, ss.value
from string_split (t.OldSPC, ',') ss
) t1
cross apply (
select
ROW_NUMBER () over (order by t.old_freq) as Rbr
, ss.value
from string_split (t.old_freq, ',') ss
) t2
where t1.Rbr = t2.Rbr
group by t.OldSPC, t.old_freq, t.NewSPC, t.new_freq
) tt1
inner join (
select
t.OldSPC, t.old_freq, t.NewSPC, t.new_freq
, STRING_AGG(t3.value + '/' + t4.value, ',') NewSPCNewFreq
from dbo.TableB t
cross apply (
select
ROW_NUMBER () over (order by t.NewSPC) as Rbr
, ss.value
from string_split (t.NewSPC, ',') ss
) t3
cross apply (
select
ROW_NUMBER () over (order by t.new_freq) as Rbr
, ss.value
from string_split (t.new_freq, ',') ss
) t4
where t3.Rbr = t4.Rbr
group by t.OldSPC, t.old_freq, t.NewSPC, t.new_freq
) tt2 on tt1.OldSPC = tt2.OldSPC
and tt1.old_freq = tt2.old_freq
and tt1.NewSPC = tt2.NewSPC
and tt1.new_freq = tt2.new_freq
As mentioned in the comments, it might be easier to do this in the front end, but it can be done in SQL Server as well.
Partial Rextester Demo
I didn't replicate your whole scenario, but I got it working for two columns. First, you need a unique identifier for each row; I am using a sequence number (1, 2, 3, ...).
Next, refer to this answer, which uses a recursive CTE to split CSV values into rows. Then I used FOR XML PATH to turn the rows back into CSV.
This is the query that does it for OldSPC and OLD_FREQ:
;with tmp(SEQ,OldSPCItem,OldSPC,OLD_FREQ_item,OLD_FREQ) as (
select SEQ, LEFT(OldSPC, CHARINDEX(',',OldSPC+',')-1),
STUFF(OldSPC, 1, CHARINDEX(',',OldSPC+','), ''),
LEFT(OLD_FREQ, CHARINDEX(',',OLD_FREQ+',')-1),
STUFF(OLD_FREQ, 1, CHARINDEX(',',OLD_FREQ+','), '')
from table1
union all
select SEQ, LEFT(OldSPC, CHARINDEX(',',OldSPC+',')-1),
STUFF(OldSPC, 1, CHARINDEX(',',OldSPC+','), ''),
LEFT(OLD_FREQ, CHARINDEX(',',OLD_FREQ+',')-1),
STUFF(OLD_FREQ, 1, CHARINDEX(',',OLD_FREQ+','), '')
from tmp
where OldSPC > ''
)
select seq,STUFF( (SELECT ',' + CONCAT(OldSPCItem,'/',OLD_FREQ_item) FROM TMP I
WHERE I.seq = O.seq FOR XML PATH('')),1,1,'') OLD_SPC_OLD_FREQ
from tmp O
GROUP BY seq
;
It will give you this output:
+-----+------------------+
| seq | OLD_SPC_OLD_FREQ |
+-----+------------------+
| 1 | ADH/7,BAP/9 |
| 2 | NOT REQ/NOT REQ |
+-----+------------------+
What you have to do now:
- Find a way to generate a sequence number that uniquely identifies each row. If you already have such a column, use it instead of SEQ.
- Add the same logic for NEW_SPC and NEW_FREQ (copy the LEFT and STUFF expressions used for OLD_FREQ and change them to NEW_SPC and NEW_FREQ).
- Replace the repeated "NOT REQ/" with '' using the REPLACE function, so you get a single NOT REQ.
If you face any issue or error while doing so, add it to the Rextester demo and share the URL, and we will check it.
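If you follow the suggestion to do this in the front end, the core of both answers is pairing the n-th code with the n-th frequency. A minimal Python sketch of that pairing, assuming both lists always have the same number of items and that "Not Req" appears on its own in both columns; the function name pair_csv and the NOT_REQ constant are hypothetical. Python's str.split keeps items in their original order, so the positional pairing is reliable here.

```python
NOT_REQ = "not req"  # assumed placeholder value, compared case-insensitively


def pair_csv(codes: str, freqs: str, sep: str = ",") -> str:
    """Pair items by position: 'ADH,BAP' + '7,7' -> 'ADH/7,BAP/7'."""
    code_items = [c.strip() for c in codes.split(sep)]
    freq_items = [f.strip() for f in freqs.split(sep)]
    if len(code_items) != len(freq_items):
        raise ValueError(f"{codes!r} and {freqs!r} have different item counts")

    pairs = []
    for code, freq in zip(code_items, freq_items):
        if code.lower() == NOT_REQ and freq.lower() == NOT_REQ:
            pairs.append(code)              # a single 'Not Req', not 'Not Req/Not Req'
        else:
            pairs.append(f"{code}/{freq}")
    return sep.join(pairs)


for old_spc, old_freq in [("ADH,BAP", "7,7"), ("Not Req", "Not Req"),
                          ("BAP,EXT,ADL", "35,7,42")]:
    print(pair_csv(old_spc, old_freq))
```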

Finding difference between 2 tables in MS Access or SQL Server

I have 2 Excel files which I imported into MS Access as two tables. These two tables are identical but imported on different dates.
Now, how can I find out which rows and which fields were updated on the later date? Any help would be highly appreciated.
Finding inserted records is easy:
select * from B where not exists (select 1 from A where A.pk=B.pk)
Finding deleted records is just as easy:
select * from A where not exists (select 1 from B where A.pk=B.pk)
Finding updated records is a pain. The following rigorous query assumes your columns are nullable and should work in all situations:
select B.*
from B
inner join A on B.pk=A.pk
where A.col1<>B.col1 or (IsNull(A.col1) and not IsNull(B.col1)) or (not IsNull(A.col1) and IsNull(B.col1))
or A.col2<>B.col2 or (IsNull(A.col2) and not IsNull(B.col2)) or (not IsNull(A.col2) and IsNull(B.col2))
or A.col3<>B.col3 or (IsNull(A.col3) and not IsNull(B.col3)) or (not IsNull(A.col3) and IsNull(B.col3))
etc...
If the columns are defined as NOT NULL, the query is much simpler: just remove all the NULL tests.
If the columns are nullable but you can identify a value that will never appear in the data, then use a simple comparison like:
Nz(A.col1,neverAppearingValue)<>Nz(B.col1,neverAppearingValue)
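The three checks above can be tried out with Python's sqlite3 module. SQLite's IS NOT operator is a NULL-safe "not equal", so it collapses the three-part test per column into one comparison (SQL Server 2022 and later offer IS DISTINCT FROM for the same purpose). Tables A and B and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (pk INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER);  -- earlier import
CREATE TABLE B (pk INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER);  -- later import
INSERT INTO A VALUES (1, 'x', 10), (2, 'y', NULL), (3, 'z', 30);
INSERT INTO B VALUES (1, 'x', 10), (2, 'y', 20),   (4, 'w', 40);
""")

inserted = conn.execute(
    "SELECT pk FROM B WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.pk = B.pk)").fetchall()
deleted = conn.execute(
    "SELECT pk FROM A WHERE NOT EXISTS (SELECT 1 FROM B WHERE A.pk = B.pk)").fetchall()
# NULL IS NOT 20 is true and NULL IS NOT NULL is false, so NULLs compare safely
updated = conn.execute("""
SELECT B.pk FROM B INNER JOIN A ON A.pk = B.pk
WHERE A.col1 IS NOT B.col1 OR A.col2 IS NOT B.col2
""").fetchall()

print(inserted, deleted, updated)  # [(4,)] [(3,)] [(2,)]
```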
I believe this should be as simple as running a query like this:
SELECT *
FROM Table1
INNER JOIN Table2
ON Table1.ID = Table2.ID AND Table1.Date <> Table2.Date
One way to do this is by unpivoting both tables, so you get a new table with id, column name, and value. Note, though, that you have to take types into account: UNPIVOT requires all the listed columns to have the same data type.
For example, the following gets differences in fields:
with oldt as (select id, col, val
from <old table> t
unpivot (val for col in (<column list>)) unpvt
),
newt as (select id, col, val
from <new table> t
unpivot (val for col in (<column list>)) unpvt
)
select *
from oldt full outer join newt on oldt.id = newt.id and oldt.col = newt.col
where oldt.id is null or newt.id is null or oldt.val <> newt.val
The alternative way with a join is rather cumbersome. This version shows whether rows were added or deleted, and which columns changed, if any:
select *
from (select coalesce(oldt.id, newt.id) as id,
(case when oldt.id is null and newt.id is not null then 'ADDED'
when oldt.id is not null and newt.id is null then 'DELETED'
else 'SAME'
end) as stat,
(case when oldt.col1 <> newt.col1
or (oldt.col1 is null and newt.col1 is not null)
or (oldt.col1 is not null and newt.col1 is null)
then 1 else 0 end) as diff_col1,
(case when oldt.col2 <> newt.col2
or (oldt.col2 is null and newt.col2 is not null)
or (oldt.col2 is not null and newt.col2 is null)
then 1 else 0 end) as diff_col2,
...
from <old table> oldt full outer join <new table> newt on oldt.id = newt.id
) c
where stat in ('ADDED', 'DELETED') or
(diff_col1 + diff_col2 + ... ) > 0
It does have the advantage of working for any data type.
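In SQL Server (Access SQL does not support EXCEPT), a set-based alternative returns every row that differs between the two tables:
(Select * from OldTable Except Select * from NewTable)
Union All
(Select * from NewTable Except Select * from OldTable)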
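Here is the same EXCEPT idea run in Python's sqlite3 module, with a side column added (an addition, not part of the answer) so you can tell which table each differing row came from. SQLite does not accept a parenthesised compound SELECT, so each EXCEPT is wrapped in a subquery instead. Set operators treat two NULLs as equal, so the unchanged row with a NULL qty is correctly left out; the tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OldTable (id INTEGER, name TEXT, qty INTEGER);
CREATE TABLE NewTable (id INTEGER, name TEXT, qty INTEGER);
INSERT INTO OldTable VALUES (1, 'a', 5), (2, 'b', NULL), (3, 'c', 7);
INSERT INTO NewTable VALUES (1, 'a', 5), (2, 'b', NULL), (3, 'c', 8), (4, 'd', 1);
""")

# Rows only in OldTable, plus rows only in NewTable (symmetric difference)
rows = conn.execute("""
SELECT 'old' AS side, * FROM (SELECT * FROM OldTable EXCEPT SELECT * FROM NewTable)
UNION ALL
SELECT 'new' AS side, * FROM (SELECT * FROM NewTable EXCEPT SELECT * FROM OldTable)
ORDER BY 2, 1
""").fetchall()

for row in rows:
    print(row)
```

A changed row shows up twice (once per side), while a row that exists in only one table shows up once.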
