Snowflake - How to call a UDF within a Procedure

In the following, TEST_DATA is my data and TEST_VALIDATE is the table I am validating against. I want to test TEST_DATA.DVALUE against TEST_VALIDATE and adjust the DVALUE such that it is always within a given range. The UDF TEST_RANGE() does this and returns the adjusted buffered value:
CREATE TABLE TEST_DATA
(
MNEMONIC VARCHAR2(3),
DVALUE NUMBER(5,1)
);
INSERT INTO TEST_DATA ( MNEMONIC, DVALUE ) VALUES ( 'AT', 2 );
INSERT INTO TEST_DATA ( MNEMONIC, DVALUE ) VALUES ( 'BP', 88 );
CREATE TABLE TEST_VALIDATE
(
MNEMONIC VARCHAR2(3),
MINIMUM_VALUE NUMBER(5,1),
MAXIMUM_VALUE NUMBER(5,1)
);
INSERT INTO TEST_VALIDATE ( MNEMONIC, MINIMUM_VALUE, MAXIMUM_VALUE ) VALUES ( 'AT', 5.1, 10.1 );
INSERT INTO TEST_VALIDATE ( MNEMONIC, MINIMUM_VALUE, MAXIMUM_VALUE ) VALUES ( 'BP', 2.2, 100.2 );
CREATE OR REPLACE FUNCTION TEST_RANGE( p_mnemonic VARCHAR2, p_reading NUMBER(5,1) )
RETURNS NUMBER AS
'SELECT
MAX(CASE
WHEN p_reading BETWEEN MINIMUM_VALUE AND MAXIMUM_VALUE THEN p_reading
WHEN p_reading < MINIMUM_VALUE THEN MINIMUM_VALUE
WHEN p_reading > MAXIMUM_VALUE THEN MAXIMUM_VALUE
ELSE NULL
END)
FROM TEST_VALIDATE
WHERE MNEMONIC = p_mnemonic';
This test executes fine:
SELECT TEST_RANGE( 'AT', 55 );
but this raises the error "002031 (42601): SQL compilation error: Unsupported subquery type cannot be evaluated". Why?
SELECT TEST_RANGE( MNEMONIC, DVALUE ) FROM TEST_DATA;
So I thought of creating a Stored Procedure to do this, but how do you call a UDF in a Procedure? I thought this was pretty basic but I can't find any post on it. Do I have to wrap the UDF in a createStatement().execute().getColumnValue() routine? Is there a more readable way than this?
CREATE OR REPLACE PROCEDURE EDW_WEATHER.TEST_SP ()
RETURNS VARIANT
LANGUAGE JAVASCRIPT
AS $$
var rs = snowflake.createStatement( { sqlText: `SELECT MNEMONIC, DVALUE FROM TEST_DATA`, binds:[] } ).execute();
var results_array = [];
while (rs.next()) {
// This is wrong. How do I call the UDF???
results_array.push( rs.select( TEST_RANGE( rs.getColumnValue('MNEMONIC'), rs.getColumnValue('DVALUE') ) ) );
}
return results_array;
$$

Your SP can look something like this:
CREATE OR REPLACE PROCEDURE TEST_SP ()
RETURNS VARIANT
LANGUAGE JAVASCRIPT
AS $$
var rs = snowflake.createStatement( { sqlText: `SELECT MNEMONIC, DVALUE FROM TEST_DATA`, binds:[] } ).execute();
var results_array = [];
while (rs.next()) {
var sub_rs = snowflake.createStatement( {
sqlText: `SELECT TEST_RANGE(?, ?)`,
binds:[
rs.getColumnValue('MNEMONIC'),
rs.getColumnValue('DVALUE')
] }
).execute();
sub_rs.next();
results_array.push(sub_rs.getColumnValue(1));
}
return results_array;
$$;
Basically, you call the UDF the same way you would run any other SELECT statement.

To answer your last question, how to call a UDF within a stored procedure: you call the UDF the same way you do outside the SP.
Use the .execute() method to run a SELECT myUDF(), then retrieve the result set for further processing.
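If the procedure calls the UDF in several places, the createStatement/execute/getColumnValue sequence can be wrapped in a small helper to keep the body readable. The following is only a sketch: `callUdf` is a hypothetical helper name, not a Snowflake API, and `fakeSnowflake` is a stand-in stub so the snippet runs outside Snowflake. Inside a stored procedure you would pass the built-in `snowflake` object instead.

```javascript
// Hypothetical helper (not a Snowflake API): runs SELECT <udf>(?, ?, ...)
// with bind variables and returns the single scalar result.
// udfName is interpolated into the SQL text, so it must be a trusted name.
function callUdf(sf, udfName, args) {
  const placeholders = args.map(() => "?").join(", ");
  const rs = sf.createStatement({
    sqlText: `SELECT ${udfName}(${placeholders})`,
    binds: args
  }).execute();
  rs.next();                   // move to the first (and only) row
  return rs.getColumnValue(1); // column indexes are 1-based
}

// Stand-in for the built-in `snowflake` object, for local testing only.
// It clamps the reading to [5.1, 10.1], mimicking TEST_RANGE for 'AT'.
const fakeSnowflake = {
  lastSql: null,
  createStatement({ sqlText, binds }) {
    this.lastSql = sqlText; // remember the generated SQL for inspection
    return {
      execute() {
        return {
          next() { return true; },
          getColumnValue() { return Math.min(10.1, Math.max(5.1, binds[1])); }
        };
      }
    };
  }
};

const adjusted = callUdf(fakeSnowflake, "TEST_RANGE", ["AT", 55]);
```

In the loop of the procedure above, the inner block would then reduce to something like results_array.push(callUdf(snowflake, 'TEST_RANGE', [rs.getColumnValue('MNEMONIC'), rs.getColumnValue('DVALUE')])).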

Related

Pass array to Snowflake UDF

My goal is to create a Snowflake UDF that, given an array of values from different columns, returns the maximum value.
This is the function I currently have:
CREATE OR REPLACE FUNCTION get_max(input_array array)
RETURNS double precision
AS '
WITH t AS
(
SELECT value::integer as val from table(flatten(input => input_array))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
'
When I pass different columns from a table, e.g. select get_max(to_array([table.col1, table.col2, table.col3])) I get the error
Unsupported subquery type cannot be evaluated
However, if I run the sql query only and replace input_array with an array such as array_construct(7, 120, 2, 4, 5, 80) there is no error and the correct value is returned.
WITH t AS
(
SELECT value::integer as val from table(flatten(input => array_construct(2,4,5)))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2

When flattening arrays in a SQL UDF gives you trouble, you can always write a JS, Java, or Python UDF instead.
Here you can see a JS and a Python UDF in action:
CREATE OR REPLACE FUNCTION get_max_from_array_js(input_array array)
RETURNS double precision
language javascript
as
$$
return Math.max(...INPUT_ARRAY)
$$;
CREATE OR REPLACE FUNCTION get_max_from_array_py(input_array array)
RETURNS double precision
language python
handler = 'x'
runtime_version = 3.8
as
$$
def x(input_array):
    return max(input_array)
$$;
select get_max_from_array_js([1.1,7.7,2.2,3.3,4.4]);
select get_max_from_array_py([1.1,7.7,2.2,3.3,4.4]);
But given the problem statement, consider using GREATEST in SQL instead:
select greatest(table.col1, table.col2, table.col3)
Performance-wise, pure SQL is the fastest, then JS, then Python (the timings in the comments are for S and 3XL warehouses):
select current_date()
, max(greatest(c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk)) m
from snowflake_sample_data.tpcds_sf10tcl.customer
-- 692ms S
-- 155ms 3XL
;
select current_date()
, max(get_max_from_array_js([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 15s S
-- 1.2s 3XL
;
select current_date()
, max(get_max_from_array_py([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 32s S
-- 4.3s 3XL
;
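The benchmark queries above filter out NULL inputs before calling the UDFs. If missing values can reach the JavaScript UDF, a defensive variant could skip them explicitly, since Math.max treats null as 0 and returns NaN for undefined. This is a minimal sketch, assuming missing values arrive as null or undefined in the array (how SQL NULLs are surfaced inside the array is not verified here); `maxIgnoringNulls` is a hypothetical name.

```javascript
// Returns the largest present value, or null if nothing is present.
function maxIgnoringNulls(values) {
  // Drop null and undefined so they cannot distort the result.
  const present = values.filter(v => v !== null && v !== undefined);
  return present.length === 0 ? null : Math.max(...present);
}

const example = maxIgnoringNulls([1.1, null, 7.7, undefined, 2.2]);
```

Inside the UDF body, the function would be defined first and then called as return maxIgnoringNulls(INPUT_ARRAY);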

Snowflake case statement to return multiple columns value in procedure

Below is the Snowflake procedure I am trying to run.
CREATE OR REPLACE PROCEDURE test()
RETURNS VARCHAR(16777216)
LANGUAGE SQL
AS
$$
DECLARE
V_LAT varchar;
V_LNG varchar;
BEGIN
INSERT INTO test.crs_compact.case_test
(
c_address,
c_comment
)
SELECT
(CASE WHEN c_nationkey = 0 then (:V_LAT=a.c_address, :V_LNG=c_comment)
END)
from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."CUSTOMER" as a;
return 'Data Loaded Successfully';
END;
$$
When calling the procedure, I get the error below.
Uncaught exception of type 'STATEMENT_ERROR' on line 11 at position 1 : SQL compilation error: error line 7 at position 3 Invalid argument types for function 'IFF': (BOOLEAN, ROW(BOOLEAN, BOOLEAN), NULL)

(a.c_address, c_comment) and (:V_LAT=a.c_address, :V_LNG=c_comment) are both ROWs, and you cannot put a ROW inside a CASE (which Snowflake treats as an IFF, as the error message shows).
From what you have typed, it looks like you are trying to do this:
SELECT
a.c_address, c_comment
from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."CUSTOMER" as a
WHERE c_nationkey = 0;
Or more to the point:
INSERT INTO test.crs_compact.case_test
(
c_address,
c_comment
)
SELECT
a.c_address, c_comment
from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."CUSTOMER" as a
WHERE c_nationkey = 0;
If you instead want to insert NULLs for the non-matching rows rather than skipping them:
INSERT INTO test.crs_compact.case_test
(
c_address,
c_comment
)
SELECT
iff(c_nationkey = 0, a.c_address, NULL)
,iff(c_nationkey = 0, c_comment, NULL)
from "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."CUSTOMER" as a;

Stored procedure not working while calling in snowflake

Hope everyone is doing good.
I have started working on Snowflake. I am trying to create a stored procedure and call it. The stored procedure is created without any issue, but when I call it, it reports an invalid or unexpected token:
JavaScript compilation error: Uncaught Syntax issue: Invalid or unexpected token in USP_USERS at INSERT INTO stg.users_Temp position 0
CREATE OR REPLACE PROCEDURE ods.usp_Users()
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
var sql_command =
"INSERT INTO stg.users_Temp
SELECT
LTRIM(RTRIM(id)) AS Id
, LTRIM(RTRIM(name)) AS Name
, LTRIM(RTRIM(divisionId)) AS DivisionId
, LTRIM(RTRIM(divisionName)) AS DivisionName
, LTRIM(RTRIM(department)) AS Department
, LTRIM(RTRIM(email)) AS Email
, LTRIM(RTRIM(state)) AS State
, LTRIM(RTRIM(title)) AS Title
, LTRIM(RTRIM(username)) AS Username
, LTRIM(RTRIM(managerId)) AS ManagerId
, LTRIM(RTRIM(employeeId)) AS employeeId
, LTRIM(RTRIM(employeeType)) AS employeeType
, LTRIM(RTRIM(officialName)) AS officialName
, LTRIM(RTRIM(dateHire)) AS dateHire
, LTRIM(RTRIM(LocationId)) AS LocationId
FROM stg.users;
MERGE INTO
ods.Users AS t
USING
stg.users_Temp s
ON
(
s.Id = t.Id
)
WHEN
MATCHED
THEN
UPDATE
SET
t.Name = s.Name
,t.DivisionId = s.DivisionId
,t.DivisionName = s.DivisionName
,t.Department = s.Department
,t.Email = s.Email
,t.State = s.State
,t.Title = s.Title
,t.Username = s.Username
,t.LocationId = s.LocationId
,t.ManagerId = s.ManagerId
,t.employeeId = s.employeeId
,t.employeeType = s.employeeType
,t.officialName = s.officialName
,t.dateHire = s.dateHire
,t.EtlLastUpdatedDate = CURRENT_TIMESTAMP() :: TIMESTAMP
WHEN NOT MATCHED
THEN INSERT
(
id
,Name
,DivisionId
,DivisionName
,Department
,Email
,State
,Title
,Username
,LocationId
,ManagerId
,employeeId
,employeeType
,officialName
,dateHire
,CurrentRecord
,EtlCreatedDate
)
VALUES
(
s.id
,s.Name
,s.DivisionId
,s.DivisionName
,s.Department
,s.Email
,s.State
,s.Title
,s.Username
,s.LocationId
,s.ManagerId
,s.employeeId
,s.employeeType
,s.officialName
,s.dateHire
,'1'
,CURRENT_TIMESTAMP() :: TIMESTAMP
);"
try {
snowflake.execute (
{sqlText: sql_command}
);
return "Succeeded."; // Return a success/error indicator.
}
catch (err) {
return "Failed: " + err; // Return a success/error indicator.
}
TRUNCATE TABLE stg.users_Temp;
$$
;
//call ods.usp_Users();
Calling the SP throws the error shown above.
Could someone help me with this issue?
Regards,
Khatija

You need to use the backtick (`) character to define multi-line strings:
CREATE OR REPLACE PROCEDURE ods.usp_Users()
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS
$$
var sql_command =
`BEGIN
INSERT INTO stg.users_Temp
SELECT
LTRIM(RTRIM(id)) AS Id
, LTRIM(RTRIM(name)) AS Name
, LTRIM(RTRIM(divisionId)) AS DivisionId
, LTRIM(RTRIM(divisionName)) AS DivisionName
, LTRIM(RTRIM(department)) AS Department
, LTRIM(RTRIM(email)) AS Email
, LTRIM(RTRIM(state)) AS State
, LTRIM(RTRIM(title)) AS Title
, LTRIM(RTRIM(username)) AS Username
, LTRIM(RTRIM(managerId)) AS ManagerId
, LTRIM(RTRIM(employeeId)) AS employeeId
, LTRIM(RTRIM(employeeType)) AS employeeType
, LTRIM(RTRIM(officialName)) AS officialName
, LTRIM(RTRIM(dateHire)) AS dateHire
, LTRIM(RTRIM(LocationId)) AS LocationId
FROM stg.users;
MERGE INTO
ods.Users AS t
USING
stg.users_Temp s
ON
(
s.Id = t.Id
)
WHEN
MATCHED
THEN
UPDATE
SET
t.Name = s.Name
,t.DivisionId = s.DivisionId
,t.DivisionName = s.DivisionName
,t.Department = s.Department
,t.Email = s.Email
,t.State = s.State
,t.Title = s.Title
,t.Username = s.Username
,t.LocationId = s.LocationId
,t.ManagerId = s.ManagerId
,t.employeeId = s.employeeId
,t.employeeType = s.employeeType
,t.officialName = s.officialName
,t.dateHire = s.dateHire
,t.EtlLastUpdatedDate = CURRENT_TIMESTAMP() :: TIMESTAMP
WHEN NOT MATCHED
THEN INSERT
(
id
,Name
,DivisionId
,DivisionName
,Department
,Email
,State
,Title
,Username
,LocationId
,ManagerId
,employeeId
,employeeType
,officialName
,dateHire
,CurrentRecord
,EtlCreatedDate
)
VALUES
(
s.id
,s.Name
,s.DivisionId
,s.DivisionName
,s.Department
,s.Email
,s.State
,s.Title
,s.Username
,s.LocationId
,s.ManagerId
,s.employeeId
,s.employeeType
,s.officialName
,s.dateHire
,'1'
,CURRENT_TIMESTAMP() :: TIMESTAMP
);
END`
try {
snowflake.execute (
{sqlText: sql_command}
);
return "Succeeded."; // Return a success/error indicator.
}
catch (err) {
return "Failed: " + err; // Return a success/error indicator.
}
// TRUNCATE TABLE stg.users_Temp;
$$
;
PS: You can't use TRUNCATE as a bare statement in the JavaScript body as you tried to do; it also has to be run through snowflake.execute. In addition, with backticks alone the procedure would fail with "Failed: Multiple SQL statements in a single API call are not supported; use one API call per statement instead.", because the string contains two statements. To overcome this, you can either break up the queries and execute them separately, or surround them in a BEGIN/END block.
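The "execute them separately" option can be sketched as follows. This is an illustration under assumptions, not the original poster's code: `runAll` is a hypothetical helper, the three statements are shortened placeholders for the full INSERT, MERGE and TRUNCATE, and `fakeSnowflake` is a stub that only exists so the snippet runs outside Snowflake. Note that the TRUNCATE is part of the list, so it runs before the function returns, whereas in the question it sat after the return statements and could never be reached.

```javascript
// Runs each statement with its own execute() call and reports the outcome.
// `sf` is the built-in `snowflake` object inside a stored procedure.
function runAll(sf, statements) {
  try {
    for (const sqlText of statements) {
      sf.execute({ sqlText }); // one API call per statement
    }
    return "Succeeded.";
  } catch (err) {
    return "Failed: " + err;
  }
}

// Stand-in for `snowflake`, for local testing only: it records what was run
// and rejects any text that contains more than one statement.
const executed = [];
const fakeSnowflake = {
  execute({ sqlText }) {
    if (sqlText.includes(";")) {
      throw new Error("Multiple SQL statements in a single API call are not supported");
    }
    executed.push(sqlText);
  }
};

// Shortened placeholders for the full statements shown above.
const status = runAll(fakeSnowflake, [
  "INSERT INTO stg.users_Temp SELECT /* trimmed columns */ * FROM stg.users",
  "MERGE INTO ods.Users AS t USING stg.users_Temp s ON (s.Id = t.Id) /* WHEN clauses */",
  "TRUNCATE TABLE stg.users_Temp"
]);
```

In the real procedure, the body would end with return runAll(snowflake, [ ... ]); using the complete statements.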

Stored procedure handling multiple SQL statements in Snowflake

I'm creating a stored procedure in Snowflake that will eventually be called by a task.
However I'm getting the following error:
Multiple SQL statements in a single API call are not supported; use one API call per statement instead
I'm not sure how to apply the advised solution within my JavaScript implementation.
Here's what I have
CREATE OR REPLACE PROCEDURE myStoreProcName()
RETURNS VARCHAR
LANGUAGE javascript
AS
$$
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
with cte1 as (
SELECT
*
FROM Table1
where date = $curr_date
)
,cte2 as (
SELECT
*
FROM Table2
where date = $curr_date
)
select * from
cte1 as 1
inner join cte2 as 2
on(1.key = 2.key)
`
} );
return 'Done.';
$$;

You could write your own helper function (idea from user waldente):
this.executeMany=(s) => s.split(';').map(sqlText => snowflake.createStatement({sqlText}).execute());
executeMany(`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE ...`);
The last statement should not end with ; because the split would then produce an empty statement. It may also fail if a ; that was not intended as a separator appears inside one of the statements.
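Both caveats of the plain split(';') approach can be softened with a slightly more careful splitter. The sketch below is an assumption-laden illustration, not a full SQL parser: `splitStatements` is a hypothetical name, it only understands single-quoted string literals, and it does not handle comments, $$ blocks or double-quoted identifiers.

```javascript
// Splits a script on ';' while ignoring semicolons inside single-quoted
// SQL string literals, and drops empty statements (e.g. after a trailing ';').
function splitStatements(script) {
  const statements = [];
  let current = "";
  let inString = false;
  for (const ch of script) {
    // A doubled quote ('') toggles twice, so the state stays correct.
    if (ch === "'") inString = !inString;
    if (ch === ";" && !inString) {
      if (current.trim() !== "") statements.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") statements.push(current.trim());
  return statements;
}

const parts = splitStatements("set curr_date = '2015-01-01'; select 'a;b';");
```

Inside a stored procedure, each element could then be passed to snowflake.execute({sqlText}) in turn.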

You can't have:
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
...
`
Instead, you need to call execute twice (or more), once for each statement that ends in ;.

Use dynamic value in snowflake time travel sql

I want to use dynamic value in time travel sql of snowflake
select * from my_table at (timestamp => (select max(COMPLETION_DATE) from my_table_2):: timestamp)

When I ran something similar, I got this error:
SQL compilation error: argument TIMESTAMP to function AT needs to be constant
I believe that a variable will work for this though:
set x = (select max(COMPLETION_DATE)::timestamp from my_table_2);
select * from my_table at (timestamp => $x);
Edit for a view:
I don't think you can do this in a view, but you could probably do something like this with a UDF. See the documentation here: https://docs.snowflake.net/manuals/sql-reference/udf-table-functions.html
Something like this:
create or replace function My_timetravel()
returns table (column1 varchar, column2 numeric(11, 2))
as
$$
set x = (select max(COMPLETION_DATE)::timestamp from my_table_2);
select column1, column2 from my_table at (timestamp => $x);
$$
;
select * from table(My_timetravel());

Time travel can only be specified with a constant, which makes it impossible to parametrize (currently).
The only way to do this is via a stored procedure, where you can supply a query as text.
I answered a similar question some days ago with this procedure:
CREATE OR REPLACE PROCEDURE TIME_TRAVEL(QUERY TEXT, DAYS FLOAT)
RETURNS VARIANT LANGUAGE JAVASCRIPT AS
$$
function run_query(query, offset) {
try {
var sqlText = query.replace('"at"', " AT(OFFSET => " + (offset + 0) + ") ");
return (snowflake.execute({sqlText: sqlText})).next();
}
catch(e) { return false }
}
var days, result = [];
for (days = 0; days < DAYS; days++)
if (run_query(QUERY, -days * 86400)) result.push(days);
return result;
$$;
CALL TIME_TRAVEL('SELECT * FROM TASK_HISTORY "at" WHERE QUERY_ID = ''019024ef-002e-8f71-0000-05e10030a782''', 7);
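To make the text substitution in run_query easier to follow, here is the same replace step isolated as a pure function. `buildTimeTravelQuery` is a hypothetical name introduced only for this illustration, and the table and column in the example are placeholders; the logic mirrors the query.replace call in the procedure above.

```javascript
// Replaces the first "at" marker in the query text with an AT(OFFSET => n)
// clause, where n is a (negative) number of seconds relative to now.
function buildTimeTravelQuery(query, offsetSeconds) {
  return query.replace('"at"', " AT(OFFSET => " + (offsetSeconds + 0) + ") ");
}

// One day back: -1 * 86400 seconds.
const rewritten = buildTimeTravelQuery('SELECT * FROM T "at" WHERE X = 1', -86400);
// rewritten is 'SELECT * FROM T  AT(OFFSET => -86400)  WHERE X = 1'
```

Because the offset is spliced into the SQL text before execution, the AT clause still sees a constant, which is what the compiler requires.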

Session variables in Snowflake work with query substitution and can help in the timestamp clause of your time travel query. Currently, setting one is not supported inside a UDF, so you would declare the session variable first and then use it in your UDF:
SET x = (SELECT current_timestamp());
CREATE OR REPLACE FUNCTION my_timetravel()
RETURNS TABLE (c1 int)
AS
$$
SELECT c1 FROM t1 AT (TIMESTAMP => $x)
$$
;
SELECT c1 FROM TABLE(my_timetravel());