Run Multiple Snowflake UDF's Given Parameters in an Array - snowflake-cloud-data-platform

Basically, what I really want to do is
for i in arr:
foo(i)
where foo is a snowflake script that looks up records matching i and merges them into a different table.
Is there anyway to do this?

SQL Scripting is currently not supported yet, but simple Stored Procedure with something like below should do the trick:
create or replace table table1 (a int);
create or replace table table2 (a int);
insert into table1 values (1), (2), (3), (4), (5);
select * from table1;
select * from table2; -- empty at this stage
CREATE OR REPLACE PROCEDURE my_proc(INPUT VARCHAR)
RETURNS BOOLEAN
LANGUAGE JAVASCRIPT
AS
$$
function my_insert(i) {
var stmt = snowflake.createStatement( {
sqlText: "INSERT INTO table2 SELECT * FROM table1 WHERE a = ?",
binds: [i]
} );
stmt.execute();
}
arr = INPUT.split(",");
for (var i=0; i<arr.length; i++) {
my_insert(arr[i])
}
return true;
$$ ;
call my_proc('1,2,5');
select * from table2; -- should have values 1, 2 and 5

For a purely Snowflake solution then you can only do this within a stored procedure.
Obviously, any coding language (that can connect to Snowflake) can do this.

Related

Stored procedure handling multiple SQL statements in Snowflake

I'm creating a stored procedure in Snowflake that will eventually be called by a task.
However I'm getting the following error:
Multiple SQL statements in a single API call are not supported; use one API call per statement instead
And not sure how approach the advised solution within my Javascript implementation.
Here's what I have
CREATE OR REPLACE PROCEDURE myStoreProcName()
RETURNS VARCHAR
LANGUAGE javascript
AS
$$
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
with cte1 as (
SELECT
*
FROM Table1
where date = $curr_date
)
,cte2 as (
SELECT
*
FROM Table2
where date = $curr_date
)
select * from
cte1 as 1
inner join cte2 as 2
on(1.key = 2.key)
`
} );
return 'Done.';
$$;
You could write your own helper function(idea of user: waldente):
this.executeMany=(s) => s.split(';').map(sqlText => snowflake.createStatement({sqlText}).execute());
executeMany('set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE ...');
The last statement should not contain ; it also may fail if there is ; in one of DDL which was not intended as separator.
You can't have:
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
...
`
Instead you need to call execute twice (or more). Each for a different query ending in ;.

Combine multiple tables into one in Snowflake

Let's say I have the following monthly tables with table names formatted such that the number after the underscore refers to the month. What I want to do is to combine these 12 tables into one without having to write 10-30 insert/union all statements
table_1
table_2
table_3
table_4
table_5
table_6
table_7
table_8
table_9
table_10
table_11
table_12 -- (only 12 in this instance but could be as many as 36)
My current approach is to first create the master table with data from table_1.
create temporary table master_table_1_12 as
select * -- * to keep it simple for this example
from table_1;
Then use variables such that I can simply keep hitting the run button until it errors out with "table_13 does not exist"
set month_id=(select max(month_id) from master_table_1_12) + 1;
set table_name=concat('table_',$month_id);
insert into master_table_1_12
select *
from identifier($table_name);
Note: All monthly tables have a month_id column
Sure it saves some space on the console(compared to multiple inserts), but I still have to run it 12 times. Are Snowflake Tasks something I could use for this? I couldn't find a fitting example from their documentation to code that up but, if anyone had success with that or with a Javascript based SP for a problem like this, please enlighten.
Here's a stored procedure that will insert into master_table_1_12 from selects on table_1 through table_12. Modify as required:
create or replace procedure FILL_MASTER_TABLE()
returns string
language javascript
as
$$
var rows = 0;
for (var i=1; i<=12; i++) {
rows += insertRows(i);
}
return rows + " rows inserted into master_table_1_12.";
// End of main function
function insertRows(i) {
sql =
`insert into master_table_1_12
select *
from table_${i};`;
return doInsert(sql);
}
function doInsert(queryString) {
var out;
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs = stmt.execute();;
rs.next();
return rs.getColumnValue(1);
}
$$;
call fill_master_table();
By the way, if you don't have any processing to do and just need to consolidate the tables, you can do something like this:
insert into master_table_1_12
select * from table_1
union all
select * from table_2
union all
select * from table_3
union all
select * from table_4
union all
select * from table_5
union all
select * from table_6
union all
select * from table_7
union all
select * from table_8
union all
select * from table_9
union all
select * from table_10
union all
select * from table_11
union all
select * from table_12
;
Can you not create a view on top of these 12 tables. The view will be an union of all these tables.
Based on the comments below, I further elaborated my answer. please try this approach. It will provide better performance when your table is large. Partitioning it will improve performance. This is based on real experience.
CREATE TABLE SALES_2000 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2001 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2002 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2003 (REGION VARCHAR, UNITS_SOLD NUMBER);
INSERT INTO SALES_2000 VALUES('ASIA', 25);
INSERT INTO SALES_2001 VALUES('ASIA', 50);
INSERT INTO SALES_2002 VALUES('ASIA', 55);
INSERT INTO SALES_2003 VALUES('ASIA', 65);
CREATE VIEW ALL_SALES AS
SELECT * FROM SALES_2000
UNION
SELECT * FROM SALES_2001
UNION
SELECT * FROM SALES_2002
UNION
SELECT * FROM SALES_2003;
SELECT * FROM ALL_SALES WHERE UNITS_SOLD = 25;
I ended up creating a UDF that spits out a create view statement and a stored procedure that executes it to create a temporary view. I work with tables following specific naming convention, so you might have to tweak this solution a little for your use case. The separation of UDF and stored proc actually helps with that as you'd mostly need to tweak the SQL UDF. I am sharing a simplified version of what I actually have in the interest of keeping it representative of the tables I listed in my question.
SQL UDF FOR GENERATING A CREATE VIEW STATETEMENT
create or replace function sandbox.public.define_view(table_pattern varchar, start_month varchar, end_month varchar)
returns table ("" varchar) as
$$
with cte1(month_id) as
(select start_month::int + row_number() over (order by 1) - 1
from table(generator(rowcount=> end_month::int - start_month::int + 1)))
,cte2(month_id,statement) as
(select 0,
concat('create or replace temporary view master_',
split_part(table_pattern,'.',-1),
start_month,
'_',
end_month,
' as ')
union all
select month_id,
concat('select * from ',
table_pattern,
month_id,
case when month_id=end_month::int then ';' else ' union all ' end)
from cte1)
select listagg(statement, '\n') within group (order by month_id) as create_view_statement
from cte2
$$;
PROCEDURE FOR EXECUTING THE OUTPUT OF THE UDF ABOVE
create or replace procedure sandbox.public.create_view(TABLE_PATTERN varchar, START_MONTH varchar,END_MONTH varchar)
returns varchar not null
language Javascript
execute as caller
as
$$
sql_command = 'select * from table(sandbox.public.define_view(:1, :2, :3))';
var stmt = snowflake.createStatement({sqlText: sql_command ,binds: [TABLE_PATTERN, START_MONTH, END_MONTH]}).execute();
stmt.next();
var ddl = stmt.getColumnValue(1);
var run=snowflake.createStatement({sqlText: ddl}).execute();
run.next();
var message=run.getColumnValue(1);
return "Temporary " + message;
$$;
USAGE DEMO
set table_pattern ='sandbox.public.table_';
set start_month ='1';
set end_month = '12';
set master_view='master_'||split_part($table_pattern,'.',-1)||$start_month||'_'||$end_month;
call create_view($table_pattern, $start_month, $end_month);
select top 100 *
from identifier($master_view);

Use dynamic value in snowflake time travel sql

I want to use dynamic value in time travel sql of snowflake
select * from my_table at (timestamp => (select max(COMPLETION_DATE) from my_table_2):: timestamp)
When I ran something similar i got this error:
SQL compilation error: argument TIMESTAMP to function AT needs to be constant
I believe that a variable will work for this though:
set x = (select max(COMPLETION_DATE)::timestamp from my_table_2);
select * from my_table at (timestamp => $x);
Edit for a view
I don't think you can do this in a veiw, but you could do something like this probably with a UDF. See documentation here: https://docs.snowflake.net/manuals/sql-reference/udf-table-functions.html
Something like this:
create or replace function My_timetravel()
returns table (column1 varchar, column2 numeric(11, 2))
as
$$
set x = (select max(COMPLETION_DATE)::timestamp from my_table_2);
select column1, column2 from my_table at (timestamp => $x);
$$
;
select * from table(My_timetravel());
Time travel can only be specified with a constant, which makes i impossible to parametrize (currently).
The only way to do this is via a stored procedure, where you can supply a query as text.
I answered a similar question some days ago with this procedure:
CREATE OR REPLACE PROCEDURE TIME_TRAVEL(QUERY TEXT, DAYS FLOAT)
RETURNS VARIANT LANGUAGE JAVASCRIPT AS
$$
function run_query(query, offset) {
try {
var sqlText = query.replace('"at"', " AT(OFFSET => " + (offset + 0) + ") ");
return (snowflake.execute({sqlText: sqlText})).next();
}
catch(e) { return false }
}
var days, result = [];
for (days = 0; days < DAYS; days++)
if (run_query(QUERY, -days * 86400)) result.push(days);
return result;
$$;
CALL TIME_TRAVEL('SELECT * FROM TASK_HISTORY "at" WHERE QUERY_ID = ''019024ef-002e-8f71-0000-05e10030a782''', 7);
Session variable in Snowflake can work with query substitution and would help in the timestamp clause of your time travel query. Currently, this feature is not supported inside a UDF so you would declare the session variable and then use it with your UDF:
SET x = (SELECT current_timestamp());
CREATE OR REPLACE FUNCTION my_timetravel()
RETURNS TABLE (c1 int)
AS
$$
SELECT c1 FROM t1 AT (TIMESTAMP => $x)
$$
;
SELECT c1 FROM TABLE(my_timetravel());

UNION ALL on all tables starting with a certain string

I would like to combine tables starting with the same name into one table.
For example let's say I have a database with tables 'EXT_ABVD', 'EXT_ADAD','EXT_AVSA','OTHER', and I want to combine all tables beginning with 'EXT_', I would want the result of
select col1 ,col2 from EXT_ABVD
union all
select col1 ,col2 from EXT_ADAD
union all
select col1 ,col2 from EXT_AVSA;
I would like to do this on a regular basis (daily for example), and every time this runs there may be new tables starting with 'EXT_'. I don't want to update the union_all query manually.
I am new to Snowflake and don't know how can I do that? Can I use a script inside Snowflake?
Given these tables:
CREATE TABLE TEST_DB.PUBLIC.EXT_ABVD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAQ (col1 INTEGER, col2 INTEGER);
A view like this could be dynamically created:
CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS
SELECT * FROM TEST_DB.PUBLIC.EXT_ABVD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAQ
Using this Procedure:
create or replace procedure TEST_DB.PUBLIC.CREATE_UNION_VEIW(TBL_PREFIX VARCHAR)
returns VARCHAR -- return final create statement
language javascript
as
$$
// build query to get tables from information_schema
var get_tables_stmt = "SELECT Table_Name FROM TEST_DB.INFORMATION_SCHEMA.TABLES \
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_NAME LIKE '"+ TBL_PREFIX + "%';"
var get_tables_stmt = snowflake.createStatement({sqlText:get_tables_stmt });
// get result set containing all table names
var tables = get_tables_stmt.execute();
// to control if UNION ALL should be added or not
// this could likely be handled more elegantly but i don't know JavaScript :)
var row_count = get_tables_stmt.getRowCount();
var rows_iterated = 0;
// define view name
var create_statement = "CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS \n";
// loop over result set to build statement
while (tables.next()) {
rows_iterated += 1;
// we get values from the first (and only) column in the result set
var table_name = tables.getColumnValue(1);
// this will obviously fail if the column count doesnt match
create_statement += "SELECT * FROM TEST_DB.PUBLIC." + table_name
// add union all to all but last row
if (rows_iterated < row_count){
create_statement += "\n UNION ALL \n"
}
}
// create the view
var create_statement = snowflake.createStatement( {sqlText: create_statement} );
create_statement.execute();
// return the create statement as text
return create_statement.getSqlText();
$$
;
Which we would call like this: CALL CREATE_UNION_VIEW('EXT_A');
This is just a basic example so logic for column counts, schemas etc. likely needs to be added. But given this I think you will be able to figure out how to deal with result sets, parameters and statements.
Edit: See here for how to set up a task that would run a procedure on daily basis. The most basic would in this case look like this:
create or replace task create_union_task
warehouse = COMPUTE_WH
schedule = '1440 minute' -- once every day
as
CALL CREATE_UNION_VIEW('EXT_A');
The only way you can achieve this currently is via a Snowflake Stored Procedure.
You don't specify how you want to consume the result of the query, but a convenient way is via a VIEW. So the Stored Procedure has to generate a VIEW definition containing the query in your question.

Postgresql stored procedure return select result set

In Microsoft SQL server I could do something like this :
create procedure my_procedure #argument1 int, #argument2 int
as
select *
from my_table
where ID > #argument1 and ID < #argument2
And that would return me table with all columns from my_table.
Closest thing to that what I managed to do in postgresql is :
create or replace function
get_test()
returns setof record
as
$$ select * from my_table $$
language sql
or i could define my table type, but manually recreating what technically already exists is very impractical.
create or replace function
get_agent_summary()
returns table (
column1 type, column2 type, ...
)
as
$$
begin
return query select col1, col2, ... from my_existing_table;
...
and it is pain to maintain.
So, how can I easily return resultset without redefining defining every single column from table that I want to return?
In Postgres a table automatically defines the corresponding type:
create or replace function select_my_table(argument1 int, argument2 int)
returns setof my_table language sql as $$
select *
from my_table
where id > argument1 and id < argument2;
$$;
select * from select_my_table(0, 2);
The syntax is more verbose than in MS SQL Server because you can create functions in one of several languages and functions may be overloaded.

Resources