Is it possible to create SQL query templates? - snowflake-cloud-data-platform

I have a couple of tables as data sources that have extremely similar structures. I only care about some of their columns, and I want to join them. What I do at the moment is:
SELECT 'table_a' AS source, col1, col2, col3, col4
FROM table_a as source_table
INNER JOIN other on source_table.id = other.id
UNION ALL
SELECT 'table_b' AS source, col1, col2, col3, col4
FROM table_b as source_table
INNER JOIN other on source_table.id = other.id
UNION ALL
SELECT 'table_c' AS source, col1, col2, col3, col4
FROM table_c as source_table
INNER JOIN other on source_table.id = other.id
UNION ALL
SELECT 'table_d' AS source, col1, col2, col3, col4
FROM table_d as source_table
INNER JOIN other on source_table.id = other.id
I would like to do something like this:
query(param1, param2) := {
SELECT param1 AS source, col1, col2, col3, col4
FROM param2 as source_table
INNER JOIN other on source_table.id = other.id
}
query('table_a', table_a)
UNION ALL
query('table_b', table_b)
UNION ALL
query('table_c', table_c)
UNION ALL
query('table_d', table_d)
I know how to do this within the programming language (using a templating engine and constructing the query string).
Is something like this possible within SQL (Snowflake Warehouse)?

You can't do exactly that, I'm afraid. However, you can use Snowflake stored procedures (SPs) to achieve effectively the same thing. You can construct the SQL query text in the SP based on the parameters passed to it, and then execute it. For example, you can pass it an array of table names.
One problem is that, at the time of writing, SPs in Snowflake do not return the result of the query directly. To work around this, you can save the result of the query in a new table (with the name hardcoded in the SP, or passed to it as a parameter) and then query that table with a separate SELECT.
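The idea of building the query text from a list of table names can be illustrated outside Snowflake too. Below is a minimal Python sketch, assuming an in-memory SQLite database as a stand-in for the warehouse. `build_union_query` and the sample tables are hypothetical names, not part of any Snowflake API; a stored procedure would do the same string assembly before executing the text.

```python
import re
import sqlite3

# Hypothetical helper: name and signature are assumptions for illustration.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_union_query(table_names):
    """Return one SELECT per table, glued together with UNION ALL."""
    parts = []
    for name in table_names:
        # Table names cannot be bound as query parameters, so validate
        # them before splicing them into the SQL text.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe table name: {name!r}")
        parts.append(
            f"SELECT '{name}' AS source, col1, col2\n"
            f"FROM {name} AS source_table\n"
            f"INNER JOIN other ON source_table.id = other.id"
        )
    return "\nUNION ALL\n".join(parts)

# Demo data in an in-memory SQLite database (assumption: stand-in for Snowflake).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE other (id INTEGER);
    CREATE TABLE table_a (id INTEGER, col1 TEXT, col2 TEXT);
    CREATE TABLE table_b (id INTEGER, col1 TEXT, col2 TEXT);
    INSERT INTO other VALUES (1), (2);
    INSERT INTO table_a VALUES (1, 'a1', 'a2'), (3, 'skipped', 'no match');
    INSERT INTO table_b VALUES (2, 'b1', 'b2');
""")
query = build_union_query(["table_a", "table_b"])
rows = conn.execute(query).fetchall()
```

The identifier check matters because the table name is spliced into the text rather than bound as a parameter, so anything that is not a plain identifier is rejected.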

Related

Federated queries in Snowflake?

I have three organizations that want to collaborate. All three have the same backend database and tables, and want to run a federated query across these three tables. Is that possible using Snowflake?
If each of them has one "table" and shares it with the other two through data sharing, then each of them can see all three "tables" and run
SELECT a.*, b.*, c.*
FROM mytable AS a
JOIN their_table_one AS b
JOIN the_other_table AS c
just fine.
You can import all tables into Snowflake and then create views that combine these tables so that they are visible as one view.
Example:
CREATE VIEW Table1_v
AS
SELECT col1, col2, col3, 'Source A' AS src
FROM SourceA_Table1
UNION ALL
SELECT col1, col2, col3, 'Source B' AS src
FROM SourceB_Table1
UNION ALL
SELECT col1, col2, col3, 'Source C' AS src
FROM SourceC_Table1;

SQL query optimization for select query over IN query

I have a view and I want to add pagination logic to it. It contains over 1.5 million records. Getting results takes a long time when my WHERE condition selects only the records mapped to one Id.
I am thinking of getting only those mapped records from the main table and then selecting only those records from the view. Will this be faster?
Select top 10 col1, col2, col3, ROW_NUMBER() OVER (ORDER BY col4 desc) from vMyView where someid=1
Then
Select top 10 col1, col2, col3 from vMyView where col1 in (Select col1 from tMyTable where someid=1)
FYI, I am not an expert.
Assuming typical cardinality, I tend to write it more like this:
select top 10 col1, col2, col3
from vMyView v
inner join tMyTable t ON t.col1 = v.col1
WHERE t.someid = 1
However, if it's possible to match more than one row in tMyTable for each col1 value in vMyView, this could possibly result in duplicating rows from vMyView. If duplicating rows is possible, a solution based on row_number() is typically the fastest option.
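The duplication risk is easy to reproduce. Here is a minimal Python sketch, assuming an in-memory SQLite database in which a plain table stands in for vMyView; the sample rows are invented for illustration. One col1 value is mapped twice in tMyTable, so the join returns it twice while the IN form returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- A plain table stands in for the view (assumption for the demo).
    CREATE TABLE vMyView (col1 INTEGER, col2 TEXT);
    CREATE TABLE tMyTable (col1 INTEGER, someid INTEGER);
    INSERT INTO vMyView VALUES (10, 'x'), (20, 'y');
    -- col1 = 10 is mapped twice to someid = 1.
    INSERT INTO tMyTable VALUES (10, 1), (10, 1), (20, 2);
""")

# Join form: one output row per matching row in tMyTable.
join_rows = conn.execute("""
    SELECT v.col1, v.col2
    FROM vMyView v
    INNER JOIN tMyTable t ON t.col1 = v.col1
    WHERE t.someid = 1
""").fetchall()

# IN form: each row of the view appears at most once.
in_rows = conn.execute("""
    SELECT col1, col2
    FROM vMyView
    WHERE col1 IN (SELECT col1 FROM tMyTable WHERE someid = 1)
""").fetchall()

print(len(join_rows), len(in_rows))  # 2 1
```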
"I want to add pagination logic on this view"
As for paging, you should look into OFFSET/FETCH syntax, rather than TOP n.
SELECT col1, col2, col3
FROM vMyView v
ORDER BY <need an order by clause for paging to work>
OFFSET <pagenumber * pagesize> ROWS FETCH NEXT <pagesize> ROWS ONLY
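The page arithmetic can be sketched as follows. This is a minimal Python example, assuming an in-memory SQLite database, which spells paging as LIMIT/OFFSET rather than OFFSET/FETCH; `fetch_page` is a hypothetical helper and the page number is assumed to be zero-based, so that the offset is pagenumber * pagesize.

```python
import sqlite3

def fetch_page(conn, page_number, page_size):
    """Return one page of rows; page_number is zero-based."""
    offset = page_number * page_size
    # SQLite uses LIMIT/OFFSET; the ORDER BY is what makes pages stable.
    return conn.execute(
        "SELECT col1 FROM vMyView ORDER BY col1 LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

conn = sqlite3.connect(":memory:")
# A plain table stands in for the view (assumption for the demo).
conn.execute("CREATE TABLE vMyView (col1 INTEGER)")
conn.executemany("INSERT INTO vMyView VALUES (?)", [(n,) for n in range(1, 8)])

first_page = fetch_page(conn, 0, 3)  # rows 1..3
last_page = fetch_page(conn, 2, 3)   # only row 7 is left
```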

Keeping the results with GROUP BY

I join four tables, but if I do a GROUP BY with a property of the fourth table, I get different results. This is the query:
There are basically two options:
First option: JOIN back to the original table using a nested query.
SELECT TA.col1, AggrFunc(col2) AS col2,
(SELECT col3 -- TOP 1? MAX? It must be a single row
FROM table1 AS TB
JOIN TA ON TA. = TB. -- INNER JOIN? LEFT OUTER JOIN?
) AS col3
FROM table1 AS TA JOIN table2 JOIN table3
GROUP BY TA.col1;
Or use a CTE. You have more control over how many rows of the extra columns are returned.
WITH CTE AS
(
SELECT col1, AggrFunc(col2) AS col2
FROM ... JOINs
GROUP BY col1
)
SELECT CTE.*, table1.col3
FROM CTE
JOIN table1 --INNER JOIN? LEFT OUTER JOIN?
Second option: use a window function if possible.
SELECT col1, AggrFunc(col2) OVER (PARTITION BY col1) AS col2, extra_col3
FROM ...JOINs...
Then you can put the above query in a CTE or a FROM clause for further filtering or grouping.
SELECT
FROM (query above)
WHERE ...
GROUP BY ...
The question remains the same: how do you get a single extra_col3 (SKU.[Reorder Cycle] in your case) row? How do you pick one record when there are multiple matches for your grouped data?
OK, this did the job (OK, I made a pivot of it):
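The difference between GROUP BY and the window-function approach can be seen in a small example. Below is a minimal Python sketch, assuming an in-memory SQLite build with window-function support (3.25 or later); the `sales` table and its rows are hypothetical, and SUM stands in for AggrFunc. GROUP BY collapses each group to one row, while the windowed aggregate keeps every row, so extra_col3 stays available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data for the demo.
    CREATE TABLE sales (col1 TEXT, col2 INTEGER, extra_col3 TEXT);
    INSERT INTO sales VALUES ('a', 1, 'p'), ('a', 2, 'q'), ('b', 5, 'r');
""")

# GROUP BY: one row per col1, so extra_col3 cannot be selected reliably.
grouped = conn.execute(
    "SELECT col1, SUM(col2) FROM sales GROUP BY col1 ORDER BY col1"
).fetchall()

# Window function: same totals, but every source row survives.
windowed = conn.execute("""
    SELECT col1, SUM(col2) OVER (PARTITION BY col1) AS col2, extra_col3
    FROM sales
    ORDER BY col1, extra_col3
""").fetchall()
```

Group 'a' still yields two rows in the windowed result, which is exactly the "which single row do you want?" decision that remains to be made.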

Column names in each view or function must be unique

While creating a view I am getting the error "Column names in each view or function must be unique", but when I run the select query on its own I get only one record. I have to use col1 and col2 for both tables: if the data does not exist in table A, it should be taken from table B. How can I do this? Thanks in advance.
Create View ViewName AS
select
A.col1 as col1,
A.col2 as col2,
null as col1,
B.col2 as col2
from table A,table B where A.col3=B.col3
From your comments, it looks like you are looking for this:
Create View ViewName AS
select
A.col1,
COALESCE(A.col2, B.col2) AS col2
from table A
left join table B
on A.col3=B.col3;
Use a LEFT OUTER JOIN to handle the condition where the join fails.
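The fallback behaviour of COALESCE over a LEFT JOIN can be checked with a small example. Here is a minimal Python sketch, assuming an in-memory SQLite database; the tables are named A and B directly and the sample rows are invented for illustration. The value from A wins when it is present, the value from B fills in when A's is NULL, and a row with no match in B still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data for the demo.
    CREATE TABLE A (col1 TEXT, col2 TEXT, col3 INTEGER);
    CREATE TABLE B (col2 TEXT, col3 INTEGER);
    INSERT INTO A VALUES ('a-one', 'from A', 1), ('a-two', NULL, 2), ('a-three', NULL, 3);
    INSERT INTO B VALUES ('from B', 1), ('from B', 2);

    -- Each output column has exactly one name, so the view is valid.
    CREATE VIEW ViewName AS
    SELECT A.col1 AS col1, COALESCE(A.col2, B.col2) AS col2
    FROM A
    LEFT JOIN B ON A.col3 = B.col3;
""")

rows = conn.execute("SELECT col1, col2 FROM ViewName ORDER BY col1").fetchall()
```

The row 'a-three' has no partner in B, so the LEFT JOIN keeps it and col2 stays NULL; an INNER JOIN would have dropped it.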
Explanation of the actual error:
The error says it all: your view has two columns named col1 and two columns named col2. Change the column names of one of the sets. Unlike views, an ad hoc select query doesn't require unique names (or any column name at all, for that matter).
Based on your comment, you will probably need something like this:
CREATE VIEW ViewName AS
SELECT ISNULL(A.col1, B.col1) as col1, -- This will return B.Col1 if A.Col1 is null
ISNULL(A.col2, B.col2) as col2
FROM table A INNER JOIN table B ON(A.col3 = B.col3)
Edit
Based on your comments to this answer, you can do something like this:
ALTER VIEW temp AS
SELECT COALESCE(A.col1, D.col1) as col1,
COALESCE(A.col2, B.col2, C.col2, D.col2) as col2
FROM table A
INNER JOIN table1 B ON (A.col3=B.col3)
INNER JOIN table3 C ON (A.col3=C.col3)
INNER JOIN table4 D ON (A.col3=D.col3)
Note: you wrote COALESCE(A.col1,null,null,D.col1); this is equivalent to COALESCE(A.col1, D.col1), since the COALESCE function returns the first argument it receives that is not null.
Can we do it like this?
alter view temp as select COALESCE(A.col1,null,null,D.col1) as col1, COALESCE(A.col2,B.col2,C.col2,D.col2) as col2 from table A INNER JOIN table1 B INNER JOIN table3 C INNER JOIN table4 ON (A.col3=B.col3 and A.col3=C.col3 and A.col3=D.col3)

What's the most efficient syntax to use Merge to upsert many rows at once?

There are two ways I've found of upserting many rows into a table with SQL Server 2008.
One of them, found at http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx, is to create a temp table, insert the values into the temp table, and finally merge that table with the target table.
This doesn't seem very efficient to me because you have to create a table, fill the table, merge to target table, and then delete the temp table.
The only other thing I can think of is as follows...
MERGE dbo.targettable as tgt
USING (
SELECT 12 as col1, 13 as col2, 'abc' as col3, 'zyx' as col4
UNION ALL
SELECT 11 as col1, 11 as col2, 'def' as col3, 'def' as col4
(etc etc)
UNION ALL
SELECT 7 as col1, 10 as col2, 'jfj' as col3, 'tub' as col4)
as new
ON tgt.col1=new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2=new.col2, tgt.col3=new.col3, tgt.col4=new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES(new.col1, new.col2, new.col3, new.col4);
Based on usr's answer I was able to find http://msdn.microsoft.com/en-us/library/bb510625.aspx
I think this is the way to do it. Could someone verify that this syntax appears correct?
MERGE dbo.targettable as tgt
USING (VALUES(12, 13, 'abc', 'zyx'), (11, 11, 'def', 'def'),(7, 10, 'jfj', 'tub'))
AS new (col1, col2, col3, col4)
ON tgt.col1=new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2=new.col2, tgt.col3=new.col3, tgt.col4=new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES(new.col1, new.col2, new.col3, new.col4);
Where does the data to be merged come from?
If it comes from a query, inline the query into the merge.
If it comes from the app, use table-valued parameters.
If it is generated iteratively, use a temp table or table variable.
If it is a constant like in your example use the VALUES clause. Don't use UNION ALL because it is more verbose, does not document semantics nicely and increases query compile time because the optimizer has to convert it to VALUES form.
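For the case where the rows come from the app, the same upsert effect can be sketched in Python. This is only an analogue, assuming an in-memory SQLite database (3.24 or later): SQLite has no MERGE statement and no table-valued parameters, so INSERT ... ON CONFLICT DO UPDATE with bound parameters is swapped in to play the same role. The pre-existing row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE targettable (col1 INTEGER PRIMARY KEY, col2 INTEGER, col3 TEXT, col4 TEXT);
    -- One pre-existing row so that the update branch is exercised.
    INSERT INTO targettable VALUES (12, 0, 'old', 'old');
""")

# Rows supplied by the application, bound as parameters rather than inlined.
new_rows = [(12, 13, "abc", "zyx"), (11, 11, "def", "def"), (7, 10, "jfj", "tub")]

# SQLite analogue of MERGE: insert, or update the row with the same col1.
conn.executemany("""
    INSERT INTO targettable (col1, col2, col3, col4)
    VALUES (?, ?, ?, ?)
    ON CONFLICT(col1) DO UPDATE SET
        col2 = excluded.col2, col3 = excluded.col3, col4 = excluded.col4
""", new_rows)

rows = conn.execute("SELECT * FROM targettable ORDER BY col1").fetchall()
```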

Resources