Difference between LATERAL FLATTEN(...) and TABLE(FLATTEN(...)) in Snowflake

What is the difference between the use of LATERAL FLATTEN(...) and TABLE(FLATTEN(...)) in Snowflake? I checked the documentation on FLATTEN, LATERAL and TABLE and cannot make heads or tails of a functional difference between the following queries.
select
id as account_id,
account_regions.value::string as region
from
salesforce.accounts,
lateral flatten(split(salesforce.accounts.regions, ', ')) account_regions;
select
id as account_id,
account_regions.value::string as region
from
salesforce.accounts,
table(flatten(split(salesforce.accounts.regions, ', '))) account_regions;

In the queries presented there is no difference: the lateral join is implicit, because the table is created dynamically from values that come out of each row.
The real need for the lateral keyword comes up in queries like this:
select *
from departments as d
, lateral (
select *
from employees as e
where e.department_id = d.department_id
) as iv2
order by employee_id;
-- https://docs.snowflake.com/en/sql-reference/constructs/join-lateral.html
Without the lateral keyword for this join, you get an Error: invalid identifier 'D.DEPARTMENT_ID'.
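A minimal sketch for checking the equivalence yourself, assuming a small inline data set in place of salesforce.accounts (the accounts CTE and its two rows are made up for illustration):

```sql
-- Hypothetical inline rows standing in for salesforce.accounts
with accounts as (
    select column1 as id, column2 as regions
    from values (1, 'EMEA, APAC'), (2, 'AMER')
)
select a.id as account_id, r.value::string as region
from accounts a,
     lateral flatten(input => split(a.regions, ', ')) r   -- explicit LATERAL
order by account_id, r.index;

-- Same hypothetical rows, with TABLE(FLATTEN(...)) instead of LATERAL FLATTEN(...)
with accounts as (
    select column1 as id, column2 as regions
    from values (1, 'EMEA, APAC'), (2, 'AMER')
)
select a.id as account_id, r.value::string as region
from accounts a,
     table(flatten(input => split(a.regions, ', '))) r    -- lateral join is implicit
order by account_id, r.index;
```

Both statements should return the same three rows: one row per region for account 1 and a single row for account 2.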

Related

Snowflake - Invalid Identifier

Hi, I am getting an invalid identifier error for "owner" when executing the query below. I have tried single quotes, double quotes and tilde.
In the SHOW ROLES output the comment column contains email addresses, and I am trying to get the owner and the owner's email address.
comments example - {RoleType:"Access",Workload:"RTR",Application:"Business Automation",AppName:"RTR",Contact:"sabfinw#gmail.com",Director:"sabrish133#gmail.com",Environment:"DEV",Owner:"sanjayd.980#gmail.com",VP:"sabrish"}
show roles;
select owner,split_part(b.data,':',2) as Owner_Email from (
select a.value::string as data from TABLE(RESULT_SCAN(LAST_QUERY_ID())),
lateral flatten(input=>split("comment",',')) a) b where b.data like '%Owner%';
You will need to select the owner column in the inner select. The column names in SHOW output are lowercase, so they also have to be double-quoted when read through RESULT_SCAN:
SELECT
b.owner,
split_part(b.data,':',2) as Owner_Email
FROM (
SELECT r."owner" as owner,
a.value::string as data
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) r,
LATERAL FLATTEN(input=>split(r."comment", ',')) a
) b
WHERE b.data LIKE '%Owner%';
Or you can split and filter at the same query level. A plain WHERE is enough here, since QUALIFY is meant for filtering on window function results:
SELECT r."owner",
split_part(a.value::string, ':', 2) as Owner_Email
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())) r,
LATERAL FLATTEN(input=>split(r."comment", ',')) a
WHERE a.value::string LIKE '%Owner%';

Rows to columns without PIVOT in SQL Server

I have 3 tables which contain this data:
Table 1:
Table 2:
Table 3:
Output:
I have tried using Pivot but it has to have an aggregate function in it.
SELECT
project_code, project_name, fk_prj_project_id,
[A], [B], [C], [D]
FROM
(SELECT
project_code, project_name, employee_name,
fk_prj_project_id, fk_prj_project_id AS nm,
activity_details
FROM
PRJ_MST_PROJECT AS a
LEFT JOIN
PRJ_TNS_DAILY_SUMMARY AS b ON a.pk_prj_project_id = b.fk_prj_project_id
LEFT JOIN
HRM_EMP_MST_EMPLOYEE AS c ON b.fk_hrm_emp_employee_id = c.pk_hrm_emp_employee_id
WHERE
a.project_status = 0
AND b.transaction_status = 1
AND CONVERT(date, b.transaction_date, 103) = CONVERT(date, '15/04/2021', 103)) x
PIVOT
(MAX(nm)
FOR nm IN ([A], [B], [C], [D])
) p
The problem is you set your PIVOT to look for values of nm in A, B, C, and D, but nm is an alias for fk_prj_project_id, which has possible values of 1, 2, 3, 4, and 5. So there are no A, B, C, or D values to be had. I don't even see a name for the column that holds A, B, C, and D, but whatever column that is needs to be what you put in the "FOR ___ IN" section of your pivot.
Test your query by commenting out the reference to the pivot columns in the SELECT and comment out the word PIVOT and everything after it and re-run your query. You should see some column with values A, B, C, D. If you don't, fix your query so you do. Once you do, that column is what you PIVOT on (put it between FOR and IN in the pivot block).
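As a rough illustration of that rule, here is a minimal static PIVOT in T-SQL over made-up rows (the src CTE, its values, and the short remarks are hypothetical stand-ins for the joined project, comment, and employee data). The column named between FOR and IN is the one that actually holds A, B, and C:

```sql
-- Hypothetical sample rows; in the real query these come from the three joined tables
WITH src AS (
    SELECT project_code, employee_name, activity_details
    FROM (VALUES
        ('MOA20171', 'A', 'remark by A'),
        ('MOA20171', 'C', 'remark by C'),
        ('MOA20172', 'C', 'another remark by C')
    ) AS v(project_code, employee_name, activity_details)
)
SELECT project_code, [A], [B], [C]
FROM src
PIVOT (
    MAX(activity_details)                 -- the value placed in each new column
    FOR employee_name IN ([A], [B], [C])  -- employee_name is the column holding A, B, C
) AS p;
```

With these rows, MOA20171 should get values under A and C and NULL under B, while MOA20172 should only get a value under C.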
Oh, and if you provide data in a usable format, people might run your query and give you directly usable results. It's a lot to ask people to enter your data by hand in order to help you, so meet them halfway. A link to sqlfiddle is ideal, but even just a bunch of DECLARE @T1 and INSERT INTO @T1 VALUES statements is usually enough to get significantly better help.
EDIT:
Nice job with the Fiddle!
OK, so using your data, we can test out actual queries. For PIVOT to work, we need a column to look up (employee name), a column to aggregate (activity_details), and some columns that will be constant across the rows produced (the project's name and ID). You're working with text, not numbers, so your aggregation can't be mathematical, leaving you with pretty much just MAX or MIN. To make sure you get the right (newest) one, I first built a table of comments and numbered them by how new they were, then I picked just the newest comment for each (project, user) pair. cteCommentsNewest is the result of that.
Now with a clean (and verified) table to pivot, the actual pivot syntax is simple. Well, as simple as Pivot can be, it's inherently pretty confusing IMHO, but structuring it this way keeps the actual PIVOT as clean as possible.
Note that the query appears twice. I tested it as a static query before converting it to dynamic, because a static query is much easier to troubleshoot, and then I left the static version in, in case you want to experiment with it. You don't need it for the final solution to work.
Here's the final code, fully tested and producing the specified output:
DECLARE @cols3 AS NVARCHAR(MAX)
DECLARE @query3 AS NVARCHAR(MAX)=''
DECLARE @dt varchar(100)='14/04/2021'
select @cols3 = STUFF((SELECT ',' + QUOTENAME(employee_name)
from dbo.HRM_EMP_MST_EMPLOYEE
order by employee_name
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
--SELECT @cols3 --Test column list for dynamic query
--Test the core functions of pivot before making dynamic
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)
SELECT *
FROM cteCommentsNewest as N --TEST up to this point to see the raw table
PIVOT (MAX(activity_details) FOR employee_name IN (A, B, C) ) as P
--Put the working query, modified for dynamic columns, into a variable
set @query3 = N'
;with cteCommentsAll as (
SELECT P.project_code , P.project_name, C.activity_details , E.employee_name
, ROW_NUMBER () over (PARTITION BY P.project_code , E.employee_name ORDER BY C.transaction_date DESC) as Newness
FROM dbo.PRJ_MST_PROJECT as P --Projects
LEFT OUTER JOIN dbo.PRJ_TNS_DAILY_SUMMARY as C --Comments on projects
ON P.pk_prj_project_id = C.fk_prj_project_id --Get all projects, then all comments for each project
LEFT OUTER JOIN dbo.HRM_EMP_MST_EMPLOYEE as E --Employees who commented
on E.pk_hrm_emp_employee_id = C.fk_hrm_emp_employee_id
), cteCommentsNewest as (
SELECT project_code , project_name, activity_details , employee_name
FROM cteCommentsAll WHERE Newness = 1 --Only one comment per user per project of CROSS problems
)SELECT *
FROM cteCommentsNewest as N
PIVOT (MAX(activity_details) FOR employee_name IN (' + @cols3 + ') ) as P
'
exec sp_executesql @query3
which produces the following output:
project_code  project_name  A                                B     C
MOA20171      Project A     some remark By Employee A on 14  NULL  some remark By Employee C on 14
MOA20172      Project B     NULL                             NULL  some remark By Employee C on 15
MOA20173      Project C     NULL                             NULL  NULL

Lateral Flatten two columns with different array length in snowflake

I am new to Snowflake and currently learning to use LATERAL FLATTEN.
I currently have a dummy table which looks like this:
The data type used for "Customer_Number" & "Cities" is array.
I have managed to understand and apply the Flatten concept to explode the data using the following sql statement:
select c.customer_id, c.last_name, f.value as cust_num, f1.value as city
from customers as c,
lateral flatten(input => c.customer_number) f,
lateral flatten(input => c.cities) f1
where f.index = f1.index
order by customer_id;
The output shown is:
As we can clearly see from the dummy table, in row 4 customer_id 104 has 3 numbers. I would like to see all three of them in my output, and if there is no matching index value in cities I would like to just see "Null" in "City".
My expected output is:
Is this possible?
The trick is to remove the second lateral, and use the index from the first to choose values from the second array:
select c.customer_id, c.last_name, f.value as cust_num, c.cities[f.index] as city
from customers as c,
lateral flatten(input => c.customer_number) f
order by customer_id;
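A small self-contained sketch of the same idea, assuming a single made-up customer whose customer_number array is longer than its cities array (the inline customers CTE and its values are hypothetical). Indexing past the end of an array yields NULL, which is what fills the missing city:

```sql
-- Hypothetical row: three customer numbers but only two cities
with customers as (
    select column1 as customer_id,
           parse_json(column2) as customer_number,
           parse_json(column3) as cities
    from values (104, '[111, 222, 333]', '["Oslo", "Bergen"]')
)
select c.customer_id,
       f.value as cust_num,
       c.cities[f.index]::string as city   -- NULL when f.index is beyond the end of cities
from customers c,
     lateral flatten(input => c.customer_number) f
order by c.customer_id, f.index;
```

This should return three rows for customer 104, with the third row showing NULL for city.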
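As long as you can be sure the second list is never longer than the first, you can do the following (this version assumes the columns hold comma-separated strings rather than arrays):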
select customer_id, last_name, list1_table.value::varchar as customer_number,
split(cities,',')[list1_table.index]::varchar as city
from customers, lateral flatten(input=>split(customer_number, ',')) list1_table;
Otherwise you'd have to do a union between the 2 sets of records (a regular union will eliminate duplicates).
You may want to use a LEFT OUTER JOIN for this task, but need to create a rowset version of the cities first.
select c.customer_id, c.last_name, f.value as cust_num, f1.value as city
from customers as c
cross join lateral flatten(input => c.customer_number) f
left outer join (select * from customers, lateral flatten(input => cities)) f1
on f1.customer_id = c.customer_id and f.index = f1.index
order by customer_id;

SQL Server: COUNT used with WHERE

I'd consider myself really new to SQL Server, so I'm not familiar with the less-used keywords like HAVING and COUNT(). So when I got this error:
An aggregate may not appear in the WHERE clause unless it is in a
subquery contained in a HAVING clause or a select list, and the column
being aggregated is an outer reference.
I was really confused by the last bit. "a select list?" "column being aggregated is an outer reference?" Can anyone explain this in layman's terms?
It's basically saying that an aggregate can't be used directly in WHERE; you need to put it in a subquery if you want to use aggregates in those places:
SELECT A,
B,
C
FROM Table T
WHERE A = (SELECT MAX(D) FROM Table T2 WHERE T2.A = T.A)
--Valid: MAX(D) sits inside a subquery on T2, and T.A is the outer reference it is correlated on
SELECT A,
B,
C
FROM Table T
WHERE A = MAX(D) --Invalid
The HAVING version would be something like this:
SELECT A,
B,
C
FROM Table T
GROUP BY A,
B,
C
HAVING COUNT(*) > (SELECT MAX(D) FROM Table T2) --Valid
SELECT A,
B,
C
FROM Table T
GROUP BY A,
B,
C
HAVING COUNT(*) > MAX(D) --Also valid: aggregates may be used directly in HAVING
The SELECT-list is
SELECT a, b, c ... <=== this list of expressions after SELECT
An outer reference is a column of the surrounding query referenced in a subquery. This is clearly explained here: Aggregates with an Outer Reference
Note that the WHERE-clause is applied before grouping (with GROUP BY) and the HAVING-clause after grouping. Therefore the aggregate functions can appear in the HAVING-clause but not in the WHERE clause.
SELECT customer_id, COUNT(*) as number_of_orders, SUM(amount) AS total_amount
FROM cust_orders
WHERE year(order_date) = 2017 -- filters records before grouping.
GROUP BY customer_id -- groups while counting and summing up.
HAVING COUNT(*) > 2 -- count is available here.
This selects all the customer orders of the year 2017 and calculates the totals per customer. Only customers having more than 2 orders in this year are returned.
Basically what it says is that you cannot do this:
WHERE COUNT(ColumnA) = 100
You need a HAVING after the GROUP BY:
SELECT COUNT(ColumnA) AS CountA, ColumnB, ColumnC
FROM Table
GROUP BY ColumnB, ColumnC
HAVING COUNT(ColumnA) = 100

TSQL self join to get results

I run the following query
Select * From
(
Select
GUID,
MFG_CODE,
STK_NAME,
parentid,
masteritem,
ROW_NUMBER() over(order by guid) r
From Fstock Where MasterItem=1 OR isNull(parentID, '')=''
) a
Where r between 4716 And 4716
And I get following results
GUID MFG_CODE parentid masteritem r
31955 369553 0 1 4717
As you can see, GUID 31955 is actually a parent item, and I need to bring in all the children of this parent item within the same query.
For example if I do:
Select * From Fstock where parentID = 31955
It returns 3 children of it
GUID
31956
31957
31958
So is there a way to combine these two queries? I only want to return a fixed number of rows using the row_number() function. However, those returned rows sometimes contain a parent item, and I would like to return the children of those parent items as well within the same query.
Performance is very important for me.
--- EDIT ----
I got it to work with following query, does anyone have other ideas?
With CTE
As
(
Select
GUID,
Manufacturer,
SELL_PRICE,
MFG_CODE,
parentid,
masteritem,
ROW_NUMBER() over(order by GUID) r
From Fstock Where MasterItem=1 OR isNull(parentID, '')=''
)
Select A.*,F.parentID From
(
Select * From CTE
Where r between 4717 And 6000
) A
Left join Fstock F on F.parentID = A.GUID
Order by A.r
This is crude and untested, but I believe you're looking for a recursive Common Table Expression (CTE) that will combine the parent-child relationships for you. Now, natively, this does not integrate any row limitations you mentioned in terms of returning a "fixed number of rows," which I was not precisely sure how to interpret, but the basic query below should be a start for you.
With Products(GUID, MFG_CODE,STK_NAME, parentid,masteritem)
as
(
Select GUID,MFG_CODE,STK_NAME,parentid,masteritem
from fstock
where masteritem=1 OR isNull(parentID, '')=''
Union all
Select f.GUID,f.MFG_CODE,f.STK_NAME,f.parentid,f.masteritem
from fstock f
inner join products g
on f.parentid=g.guid
)
Select * from Products
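If the fixed row window from the question also has to be kept, one possible (untested) variation is to page the parent rows first and then append their direct children with UNION ALL. This is only a sketch under the assumptions that children never satisfy the parent filter themselves and that a single level of children is enough; the CTE names Parents and Page are made up here:

```sql
WITH Parents AS (
    -- Number only the top-level rows, as in the question
    SELECT GUID, MFG_CODE, STK_NAME, parentid, masteritem,
           ROW_NUMBER() OVER (ORDER BY GUID) AS r
    FROM Fstock
    WHERE masteritem = 1 OR ISNULL(parentID, '') = ''
),
Page AS (
    -- The requested window of parent rows
    SELECT * FROM Parents WHERE r BETWEEN 4717 AND 6000
)
SELECT GUID, MFG_CODE, STK_NAME, parentid, masteritem, r
FROM Page
UNION ALL
-- Direct children of the parents in the window; they inherit the parent's r
SELECT f.GUID, f.MFG_CODE, f.STK_NAME, f.parentid, f.masteritem, p.r
FROM Fstock f
INNER JOIN Page p ON f.parentid = p.GUID
ORDER BY r, parentid;
```

Because each child carries its parent's r, ordering by r keeps a parent and its children next to each other. The row count is then fixed for parents only, not for the total result.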
