How to convert a Snowflake table to a different structure

| id  | name   | DESCRIPTION | ACTIVE | UPDATED_JSON |
|-----|--------|-------------|--------|--------------|
| id1 | name-1 | desc-1      | true   | {"diffFields": [{"fieldName": "name","valueAfter": "new-segment-name-1","valueBefore": null},{"fieldName": "active","valueAfter": true,"valueBefore": null}],"segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4"} |
| id2 | name-2 | desc-2      | true   | {"diffFields": [{"fieldName": "name","valueAfter": "new-segment-name-2","valueBefore": null},{"fieldName": "active","valueAfter": true,"valueBefore": null}],"segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4"} |
I have a table with the above structure in Snowflake. UPDATED_JSON is a VARIANT column. I want to change this table to have a structure similar to the one below.
In UPDATED_JSON I have fieldName; when its value is name, I need to update the name column with the valueAfter data. diffFields is not ordered. If name is not present in UPDATED_JSON, I want to leave the name column with its current value.
In the example below, name-1 changed to new-segment-name-1 because UPDATED_JSON has a fieldName with value name and a valueAfter with value new-segment-name-1.
| id  | name               | DESCRIPTION | ACTIVE |
|-----|--------------------|-------------|--------|
| id1 | new-segment-name-1 | desc-1      | true   |
| id2 | new-segment-name-2 | desc-2      | true   |
I am trying to do this with dbt.

Your data as a CTE:
WITH data(id, name, DESCRIPTION, ACTIVE, UPDATED_JSON) as (
    select column1, column2, column3, column4, parse_json(column5) from values
    ('id1', 'name-1', 'desc-1', true, '{"diffFields": [{"fieldName": "name","valueAfter": "new-segment-name-1","valueBefore": null},{"fieldName": "active","valueAfter": true,"valueBefore": null}],"segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4"}'),
    ('id2', 'name-2', 'desc-2', true, '{"diffFields": [{"fieldName": "name","valueAfter": "new-segment-name-2","valueBefore": null},{"fieldName": "active","valueAfter": true,"valueBefore": null}],"segmentId": "b204c220-ea8d-4cf4-b579-30eb59a1a2a4"}')
)
select id
    ,max(iff(f.value:fieldName::text = 'name', f.value:valueAfter::text, null)) as name
    ,DESCRIPTION
    ,ACTIVE
from data, table(flatten(input=>UPDATED_JSON:diffFields)) f
group by 1,3,4;
gives:
| ID  | NAME               | DESCRIPTION | ACTIVE |
|-----|--------------------|-------------|--------|
| id2 | new-segment-name-2 | desc-2      | TRUE   |
| id1 | new-segment-name-1 | desc-1      | TRUE   |
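Note that the question also wants the current name kept when UPDATED_JSON has no name entry in diffFields. A minimal variation of the query above under that assumption, reusing the same data CTE (substitute your real table name), falls back to the existing value via an outer flatten and COALESCE:

-- sketch: keep the existing name when diffFields has no "name" entry
-- (outer => true keeps rows whose diffFields array is empty or missing)
select d.id
    ,coalesce(
        max(iff(f.value:fieldName::text = 'name', f.value:valueAfter::text, null)),
        d.name
     ) as name
    ,d.DESCRIPTION
    ,d.ACTIVE
from data d
    ,table(flatten(input => d.UPDATED_JSON:diffFields, outer => true)) f
group by d.id, d.name, d.DESCRIPTION, d.ACTIVE;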

Renaming a JSON column for a UNION

Remark: my example is overly simplified. In reality, I am dealing with a huge query. But to illustrate the issue/errors, let us resort to apples and oranges.
My original query looked like this:
SELECT 'FruitsCount' AS "Type", (SELECT count(id) as Counter, [Name] FROM Fruits group by name FOR JSON PATH) AS "Value"
This would result in something like the following. Let's refer to this as Format A:
|---------------------|------------------------------------------------------------------------------|
| Type | Value |
|---------------------|------------------------------------------------------------------------------|
| FruitsCount                  | [{"Counter":2, "Name":"Apple"},{"Counter":3, "Name":"Orange"}]                             |
|---------------------|------------------------------------------------------------------------------|
However, now I want to create a union of Fruit and Vegetable counts. My query now looks like this
(SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION
SELECT count(id) as Counter, [Name] FROM Vegetables group by name)
FOR JSON PATH
|------------------------------------------------------------------------------------------------|
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B                                                       |
|------------------------------------------------------------------------------------------------|
| [{"Counter":2, "Name":"Apple"},{"Counter":3, "Name":"Orange"},{"Counter":7, "Name":"Tomato"}]   |
|------------------------------------------------------------------------------------------------|
However, I want it in the same format as before, with Type and Value columns (Format A).
I tried doing the following:
SELECT 'FruitsCount' AS "Type", ((SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION
SELECT count(id) as Counter, [Name] FROM Vegetables group by name) FOR JSON PATH) as "Value"
However, I am presented with Error 156: Incorrect syntax near the keyword 'FOR'.
Then I tried the following:
SELECT 'FruitsAndVegCount' AS "Type", (SELECT count(id) as Counter, [Name] FROM Fruits group by name
UNION
SELECT count(id) as Counter, [Name] FROM Vegetables group by name FOR JSON PATH) as "Value"
However, I am presented with Error 1086: The FOR XML and FOR JSON clauses are invalid in views, inline functions, derived tables, and subqueries when they contain a set operator.
I'm stuck in trying to get my "union-ized" query to be in Format A.
Update 1: Here is the desired output
|---------------------|------------------------------------------------------------------------------------------------|
| Type | Value |
|---------------------|------------------------------------------------------------------------------------------------|
| FruitAndVegCount | [{"Counter":2, "Name":"Apple"},{"Counter":3, "Name":"Orange"},{"Counter":7, "Name":"Tomato"}] |
|---------------------|------------------------------------------------------------------------------------------------|
The goal is to only have a single row, with 2 columns (Type, Value) where Type is whatever I specify (i.e. FruitAndVegCount) and Value is a JSON of the ResultSet that is created by the union query.
If I understand the question correctly, the following statement is an option:
SELECT
    [Type] = 'FruitAndVegCount',
    [Value] = (
        SELECT Counter, Name
        FROM (
            SELECT count(id) as Counter, [Name] FROM Fruits group by name
            UNION ALL
            SELECT count(id) as Counter, [Name] FROM Vegetables group by name
        ) t
        FOR JSON PATH
    )
You could do it with two columns, Type and Value, like this:
select 'FruitAndVegCount' as [Type],
       (select [Counter], [Name]
        from (select count(id) as Counter, [Name] from #Fruits group by [name]
              union all
              select count(id) as Counter, [Name] from #Vegetables group by [name]) u
        for json path) as [Value];
Output
Type Value
FruitAndVegCount [{"Counter":2,"Name":"apple"},{"Counter":1,"Name":"pear"},{"Counter":2,"Name":"carrot"},{"Counter":1,"Name":"kale"},{"Counter":2,"Name":"lettuce"}]
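The #Fruits and #Vegetables temp tables used in the query above are not defined in the thread; a minimal setup that would reproduce output of this shape (the sample rows below are assumptions chosen to match the counts shown) could be:

-- hypothetical sample data, for illustration only
create table #Fruits (id int identity(1,1), [Name] varchar(50));
create table #Vegetables (id int identity(1,1), [Name] varchar(50));

insert into #Fruits ([Name]) values ('apple'), ('apple'), ('pear');
insert into #Vegetables ([Name]) values ('carrot'), ('carrot'), ('kale'), ('lettuce'), ('lettuce');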

How to recreate old snapshot using field history table in BigQuery

I'm currently working on an interesting problem. I am trying to recreate the state of a table as it was on a given previous date. I have 2 tables:
Table A: consists of live data, gets refreshed on an hourly basis.
Table A_field_history: consists of changes made to the fields in Table A.
The following image shows the current state, where Table A has the live, updated data and Table A_field_history only captures changes made to the fields in Table A.
I am trying to recreate Table A as of a particular given date. The following image shows the table state as it was on 06/30/2020.
The requirement is to have the capability to recreate the state of Table A for any given date.
I actually identified a way to roll back (virtually, not on the actual table) all the updates made after a given date. These are the steps I followed:
Create dummy tables:
WITH Table_A AS (
    SELECT 1 AS ID, '2020-6-28' AS created_date, 10 AS qty, 100 AS value
    UNION ALL
    SELECT 2 AS ID, '2020-5-29' AS created_date, 20 AS qty, 200 AS value
),
Table_A_field_history AS (
    SELECT 'xyz' id, '2020-07-29' created_date, '12345' created_by, 'qty' field, '10' new_value, '200' old_value, '1' A_id
    UNION ALL
    SELECT 'abc' id, '2020-07-24' created_date, '12345' created_by, 'qty' field, '20' new_value, '10' old_value, '2' A_id
    UNION ALL
    SELECT 'xyz' id, '2020-07-29' created_date, '12345' created_by, 'value' field, '100' new_value, '2000' old_value, '1' A_id
    UNION ALL
    SELECT 'abc' id, '2020-07-24' created_date, '12345' created_by, 'value' field, '200' new_value, '5000' old_value, '2' A_id
    UNION ALL
    SELECT 'xyz' id, '2020-06-29' created_date, '12345' created_by, 'qty' field, '200' new_value, '' old_value, '1' A_id
    UNION ALL
    SELECT 'abc' id, '2020-05-30' created_date, '12345' created_by, 'qty' field, '10' new_value, '' old_value, '2' A_id
    UNION ALL
    SELECT 'xyz' id, '2020-06-29' created_date, '12345' created_by, 'value' field, '2000' new_value, '' old_value, '1' A_id
    UNION ALL
    SELECT 'abc' id, '2020-05-30' created_date, '12345' created_by, 'value' field, '5000' new_value, '' old_value, '2' A_id
),
Step 1. Create a date CTE to filter data based on the given date:
date_spine AS (
    SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 DAY)) AS as_of_date
),
Step 2. The date CTE created above can be used as a spine for our query; cross join it to map as_of_date to all the changes made in the history table.
date_changes AS (
    SELECT DISTINCT
        date.as_of_date,
        hist.A_id
    FROM Table_A_field_history hist
    CROSS JOIN date_spine date
),
Step 3. Now that we have as_of_date mapped to all historical transactions, we can get the max change date.
most_recent_changes AS (
    SELECT
        dc.as_of_date,
        dc.A_id,
        MAX(fh.created_date) AS created_date
    FROM date_changes dc
    LEFT JOIN Table_A_field_history AS fh
        ON dc.A_id = fh.A_id
    WHERE CAST(fh.created_date AS DATE) <= dc.as_of_date
    GROUP BY dc.as_of_date, dc.A_id
),
Step 4. Now map the max change date back to the history table using the actual created_date:
past_changes AS (
    SELECT
        mr.as_of_date,
        mr.A_id,
        mr.created_date,
        a.id AS entry_id,
        a.created_by AS created_by_id,
        CASE WHEN a.field = 'qty' THEN a.new_value ELSE '' END AS qty,
        CASE WHEN a.field = 'value' THEN a.new_value ELSE '' END AS value
    FROM most_recent_changes AS mr
    LEFT JOIN Table_A_field_history AS a
        ON mr.A_id = a.A_id
        AND mr.created_date = a.created_date
    WHERE a.id IS NOT NULL
)
Step 5. Now we can use as_of_date to get the historical state of Table A:
SELECT *
FROM past_changes x
WHERE x.as_of_date = '2020-07-29'
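Note that past_changes still has one row per changed field (qty and value come back on separate rows), so if you want one reconstructed row per record you can collapse it per A_id. A minimal sketch under that assumption, reusing the CTEs and column names above; it only covers fields that appear in the history rows for that change date:

-- collapse the per-field rows into one row per A_id for the chosen date;
-- NULLIF turns the '' placeholders into NULLs so MAX picks the real values
SELECT
    pc.A_id,
    MAX(NULLIF(pc.qty, '')) AS qty_as_of_date,
    MAX(NULLIF(pc.value, '')) AS value_as_of_date
FROM past_changes pc
WHERE pc.as_of_date = '2020-07-29'
GROUP BY pc.A_id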

How to skip groups that have only one entry when I do a GROUP BY in SQL Server

I have a requirement where I do a GROUP BY on the following table.
Table
Name salary
------------
abc 10000
abc 1000
def 100
Query:
select Name, max(salary)
from table
group by Name
Result:
abc 10000
def 100
I don't want 'def' to be displayed since it's a single entry in the table. How can I achieve this?
You can add a HAVING clause.
HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
select
    Name
    ,max(salary)
from table
group by Name
having count(*) > 1
This will only return the aggregates for names that have more than 1 row, which seems to be what you want.
EXAMPLE
declare @table table (name varchar(16), salary int)

insert into @table
values
    ('abc', 10000),
    ('abc', 1000),
    ('def', 100),
    ('xxf', 100)

select
    Name
    ,max(salary)
from @table
group by Name
having count(*) > 1
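An equivalent way to express this, useful if you later also need the per-row counts, is a window function. A sketch against the same @table variable as above:

-- count rows per name with a window function, then keep only names
-- that appear more than once before aggregating
select Name, max(salary) as max_salary
from (
    select Name, salary,
           count(*) over (partition by Name) as cnt
    from @table
) t
where cnt > 1
group by Name;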

Query to find the record with most matching columns, where the number of columns and names of columns is unknown?

I have two tables, X and Y, with identical schema but different records. Given a record from X, I need a query to find the closest matching record in Y that contains NULL values for non-matching columns. Identity columns should be excluded from the comparison. For example, if my record looked like this:
------------------------
id | col1 | col2 | col3
------------------------
0 |'abc' |'def' | 'ghi'
And table Y looked like this:
------------------------
id | col1 | col2 | col3
------------------------
6 |'abc' |'def' | 'zzz'
8 | NULL |'def' | NULL
Then the closest match would be record 8, since where the columns don't match, there are NULL values. 6 WOULD have been the closest match, but the 'zzz' disqualified it.
What's unique about this problem is that the schema of the tables is unknown besides the id column and the data types. There could be 4 columns, or there could be 7 columns. We just don't know - it's dynamic. All we know is that there is going to be an 'id' column and that the columns will be strings, either varchar or nvarchar.
What is the best query in this case to pick the closest matching record out of Y, given a record from X? I'm actually writing a function. The input is an integer (the id of a record in X) and the output is an integer (the id of a record in Y, or NULL). I'm an SQL novice, so a brief explanation of what's happening in your solution would help me greatly.
There could be 4 columns, or there could be 7 columns.... I'm actually writing a function.
This is an impossible task as a function: T-SQL functions cannot execute dynamic SQL, so you cannot have a function that works against an arbitrary table structure. A stored procedure, sure, but not a function.
However, the below shows you a way using FOR XML and some decomposing of the XML to unpivot rows into column names and values which can then be compared. The technique used here and the queries can be incorporated into a stored procedure.
MS SQL Server 2008 Schema Setup:
-- this is the data table to match against
create table t1 (
    id int,
    col1 varchar(10),
    col2 varchar(20),
    col3 nvarchar(40));

insert t1
select 6, 'abc', 'def', 'zzz' union all
select 8, null , 'def', null;

-- this is the data with the row you want to match
create table t2 (
    id int,
    col1 varchar(10),
    col2 varchar(20),
    col3 nvarchar(40));

insert t2
select 0, 'abc', 'def', 'ghi';
GO
Query 1:
;with unpivoted1 as (
    select n.n.value('local-name(.)','nvarchar(max)') colname,
           n.n.value('.','nvarchar(max)') value
    from (select (select * from t2 where id=0 for xml path(''), type)) x(xml)
    cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
), unpivoted2 as (
    select x.id,
           n.n.value('local-name(.)','nvarchar(max)') colname,
           n.n.value('.','nvarchar(max)') value
    from (select id, (select * from t1 where id=outr.id for xml path(''), type) from t1 outr) x(id,xml)
    cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
)
select TOP(1) WITH TIES
       B.id,
       sum(case when A.value=B.value then 1 else 0 end) as matches
from unpivoted1 A
join unpivoted2 B on A.colname = B.colname
group by B.id
having max(case when A.value <> B.value then 1 end) is null
ORDER BY matches DESC;
Results:
| ID | MATCHES |
----------------
| 8 | 1 |
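As the answer notes, this belongs in a stored procedure rather than a function. A sketch of such a wrapper for the fixed t1/t2 tables above; the procedure name and @x_id parameter are illustrative assumptions, and a fully generic version would build the same two queries with dynamic SQL:

-- illustrative wrapper only: reuses the unpivot/compare query above,
-- parameterised on the id of the t2 (X-side) row
create procedure dbo.FindClosestMatch
    @x_id int
as
begin
    set nocount on;

    with unpivoted1 as (
        select n.n.value('local-name(.)','nvarchar(max)') colname,
               n.n.value('.','nvarchar(max)') value
        from (select (select * from t2 where id=@x_id for xml path(''), type)) x(xml)
        cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
    ), unpivoted2 as (
        select x.id,
               n.n.value('local-name(.)','nvarchar(max)') colname,
               n.n.value('.','nvarchar(max)') value
        from (select id, (select * from t1 where id=outr.id for xml path(''), type) from t1 outr) x(id,xml)
        cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
    )
    select top(1) with ties
           B.id,
           sum(case when A.value=B.value then 1 else 0 end) as matches
    from unpivoted1 A
    join unpivoted2 B on A.colname = B.colname
    group by B.id
    having max(case when A.value <> B.value then 1 end) is null
    order by matches desc;
end

-- usage: exec dbo.FindClosestMatch @x_id = 0;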

Include NULL in each "Details Group" in SSRS

In SSRS I have a List with, say, a table with two columns: name and number e.g.:
NAME NUMBER
John 123
John 456
John NULL
Name is never null, but number may be. In this case I want the report to include the NULL in each group, like this:
GROUP 1:
John NULL
John 123
GROUP 2:
John NULL
John 456
SSRS, however, puts the NULL in a group of its own. How do I accomplish this?
You have told SSRS to group on the NUMBER column, so it will generate a separate group for each value in the NUMBER column and then display those rows. To get what you want, you have to make the data set have the rows you want.
Select Name, Number, cast(Number as varchar(50)) as DisplayValue
From mytable
UNION ALL
Select m.Name, m.Number, 'NULL' as DisplayValue
From mytable m
Where exists (Select 1 from mytable where Name = m.Name and Number is NULL)
Group by m.Name, m.Number
Then group on the Number column but report on the DisplayValue column.
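If the standalone NULL group should disappear entirely (so the NULL row only ever shows up inside the real number groups), one possible refinement of that dataset query, assuming the same mytable columns, is to restrict both branches to non-NULL numbers:

-- sketch: keep only real number groups, and inject a 'NULL' display row into
-- each group whose Name also has a NULL Number somewhere in the table
Select Name, Number, cast(Number as varchar(50)) as DisplayValue
From mytable
Where Number is not null
UNION ALL
Select distinct m.Name, m.Number, 'NULL' as DisplayValue
From mytable m
Where m.Number is not null
  and exists (Select 1 from mytable where Name = m.Name and Number is NULL);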
