Pass array to Snowflake UDF

My goal is to create a Snowflake UDF that, given an array of values from different columns, returns the maximum value.
This is the function I currently have:
CREATE OR REPLACE FUNCTION get_max(input_array array)
RETURNS double precision
AS '
WITH t AS
(
    SELECT value::integer AS val FROM table(flatten(input => input_array))
    WHERE val IS NOT NULL
),
cnt AS
(
    SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
    SELECT val FROM t
) t2
'
When I pass different columns from a table, e.g. select get_max(to_array([table.col1, table.col2, table.col3])), I get the error:
Unsupported subquery type cannot be evaluated
However, if I run the SQL query on its own and replace input_array with a literal array such as array_construct(7, 120, 2, 4, 5, 80), there is no error and the correct value is returned:
WITH t AS
(
    SELECT value::integer AS val FROM table(flatten(input => array_construct(2,4,5)))
    WHERE val IS NOT NULL
),
cnt AS
(
    SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
    SELECT val FROM t
) t2

A SQL UDF is inlined into the calling query, so when its body FLATTENs an argument built from outer columns, the result is effectively a correlated subquery of a kind Snowflake's optimizer cannot evaluate; a literal array has no such correlation, which is why your standalone test works. When flattening arrays in a SQL UDF gives you trouble, you can always write a JS, Java, or Python UDF instead.
Here you can see a JS and a Python UDF in action:
CREATE OR REPLACE FUNCTION get_max_from_array_js(input_array array)
RETURNS double precision
language javascript
as
$$
return Math.max(...INPUT_ARRAY)
$$;
CREATE OR REPLACE FUNCTION get_max_from_array_py(input_array array)
RETURNS double precision
language python
handler = 'x'
runtime_version = '3.8'
as
$$
def x(input_array):
    return max(input_array)
$$;
select get_max_from_array_js([1.1,7.7,2.2,3.3,4.4]);
select get_max_from_array_py([1.1,7.7,2.2,3.3,4.4]);
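Note that neither sketch guards against NULLs inside the array: Math.max yields NaN if any element comes through as undefined (and silently treats null as 0), while Python's max() raises on an empty sequence. A defensive variant of the Python handler could look like this (an illustrative sketch, not part of the original answer; get_max_from_array_py_safe is a hypothetical name):
CREATE OR REPLACE FUNCTION get_max_from_array_py_safe(input_array array)
RETURNS double precision
language python
handler = 'x'
runtime_version = '3.8'
as
$$
def x(input_array):
    # SQL NULL elements arrive as Python None; drop them before taking the max
    vals = [v for v in input_array if v is not None]
    return max(vals) if vals else None
$$;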
But given the problem statement, consider using GREATEST in SQL instead:
select greatest(table.col1, table.col2, table.col3)
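One caveat: Snowflake's GREATEST returns NULL as soon as any argument is NULL, so if the columns are nullable you may want to coalesce them first (a sketch; the sentinel value is an assumption to adapt to your data):
select greatest(coalesce(table.col1, -1e308),
                coalesce(table.col2, -1e308),
                coalesce(table.col3, -1e308))
Newer accounts may also have GREATEST_IGNORE_NULLS, which skips NULL arguments directly.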
Performance-wise, pure SQL is fastest, then JS, then Python (the timings in the comments below are from a Small and a 3X-Large warehouse):
select current_date()
, max(greatest(c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk)) m
from snowflake_sample_data.tpcds_sf10tcl.customer
-- 692ms S
-- 155ms 3XL
;
select current_date()
, max(get_max_from_array_js([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 15s S
-- 1.2s 3XL
;
select current_date()
, max(get_max_from_array_py([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 32s S
-- 4.3s 3XL
;
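For completeness: newer Snowflake releases also ship a built-in ARRAY_MAX which, if it is available on your account, does the whole job in one call; it returns a VARIANT, so cast the result (a sketch):
select max(array_max([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])::float) m
from snowflake_sample_data.tpcds_sf10tcl.customer;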

Related

Nested JSON extraction into Postgres table

I have used the following query to parse and store JSON elements into the table 'pl';
the 'test' table is used to store the raw JSON.
select
    each_attribute ->> 'id' id,
    each_attribute ->> 'sd' sd,
    each_attribute ->> 'v' v
from test
cross join json_array_elements(json_array) each_section
cross join json_array_elements(each_section -> 'firstt') each_attribute
I am able to view the JSON values using the above query, but I am not able to insert them into another table using json_populate_recordset.
The table definition I need to insert the nested JSON into:
id integer, character varying(6666), character varying(99999)
Table1 (for the above definition) should store the values for key firstt
Table2 (for the above definition) should store the values for key secondt
JSON format:
{
  "firstt": [
    {
      "id": 1,
      "sd": "test3",
      "v": "2223"
    },
    {
      "id": 2,
      "sd": "test2",
      "v": "2222"
    }
  ],
  "secondt": [
    {
      "id": 1,
      "sd": "test3",
      "v": "2223"
    },
    {
      "id": 2,
      "sd": "test2",
      "v": "2222"
    }
  ]
}
Please assist. I have tried every possible thing from Stack Overflow solutions, but nothing covers insertion for a nested array like this.
Adding my code for the dynamic query. It does not work; the error is 'too few arguments for format'.
do $$
DECLARE
    my record;
    tb_n varchar(50);
BEGIN
    FOR my IN
        SELECT json_object_keys(json_array) as t FROM test
    LOOP
        tb_n := my.t;
        EXECUTE format($$ WITH tbl_record_arrays as(
            SELECT
                entries.*
            FROM
                test
                JOIN LATERAL json_each(json_array) as entries(tbl_name,tbl_data_arr) ON TRUE
        )
        INSERT INTO %I
        SELECT
            records.*
        FROM
            tbl_record_arrays
            JOIN LATERAL json_populate_recordset(null::%I,tbl_data_arr) records ON TRUE
        WHERE
            tbl_name = %I$$,tb_n);
    END LOOP;
END;
$$;
The immediate error is that your format() string contains three placeholders but you pass only one argument (and the last placeholder should be %L, a literal, rather than %I, an identifier). To create a plpgsql function that dynamically inserts a JSON array for a specified key into a specified table, you can do:
CREATE OR REPLACE FUNCTION dynamic_json_insert(key_name text, tbl text) RETURNS VOID AS $$
BEGIN
    -- the $<tag>$ syntax allows for generating a multiline string
    EXECUTE format($sql$
        INSERT INTO %1$I
        SELECT
            entries.*
        FROM test
        JOIN LATERAL json_populate_recordset(null::%1$I, json_data -> $1) as entries ON TRUE;
    $sql$::text, tbl) USING dynamic_json_insert.key_name;
END;
$$ LANGUAGE plpgsql
VOLATILE  -- modifies data
STRICT    -- returns NULL if any argument is NULL
SECURITY INVOKER;  -- execute with the role of the caller, rather than the role that defined the function
and call it like:
SELECT dynamic_json_insert('firstt','table_1');
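For reference, the statement that this call builds and EXECUTEs looks roughly like the following, with the key name bound as $1 via the USING clause rather than spliced into the text (assuming, as the function does, that the JSON column in test is named json_data):
INSERT INTO table_1
SELECT
    entries.*
FROM test
JOIN LATERAL json_populate_recordset(null::table_1, json_data -> $1) as entries ON TRUE;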
If you want to insert into multiple tables using multiple (key, table) pairs, you can write a plpgsql function that takes a variadic array of such pairs and generates a single Common Table Expression (CTE) with all of the INSERTs in one atomic statement.
First create a custom type:
CREATE TYPE table_key as (
tbl_key text,
relation regclass -- special type that refers to a Postgresql relation
);
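A convenient property of regclass, and why it is used here instead of plain text: the cast resolves the name against the catalog immediately, so a bad table name fails fast instead of producing a broken INSERT later. For example:
SELECT 'table_1'::regclass;  -- errors with "relation does not exist" unless table_1 is present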
Then define the function:
CREATE OR REPLACE FUNCTION dynamic_json_insert(variadic table_keys table_key[]) RETURNS VOID AS $$
DECLARE
    tbl_key_len integer = array_length(dynamic_json_insert.table_keys, 1);
BEGIN
    IF tbl_key_len > 0 THEN
        EXECUTE (
            -- generates a single atomic INSERT CTE when there are multiple table_keys,
            -- or a single INSERT statement otherwise.
            -- the SELECT is enclosed in parentheses because it generates a single text
            -- value, which EXECUTE receives.
            SELECT
                -- prepend WITH if we have more than 1 table_key (for the CTE)
                CASE WHEN tbl_key_len > 1 THEN 'WITH ' ELSE '' END
                || string_agg(
                       CASE
                           WHEN
                               -- name the auxiliary statement and put it in parentheses
                               is_aux THEN format('%1$I as (%2$s)', 'ins_' || tk.tbl_key, stmt) || end_char
                           ELSE stmt
                       END, E'\n') || ';'
            FROM
                -- unnest the table_keys argument and get each element's index (rn)
                unnest(dynamic_json_insert.table_keys) WITH ORDINALITY AS tk(tbl_key, relation, rn)
                -- the JOIN LATERAL here means "for each unnested table_key, generate
                -- the rows of the following subquery"
                JOIN LATERAL (
                    SELECT
                        rn < tbl_key_len is_aux,
                        -- we need a comma between auxiliary statements
                        CASE WHEN rn = tbl_key_len - 1 THEN '' ELSE ',' END end_char,
                        -- dynamically generate the INSERT statement
                        format($sql$
                            INSERT INTO %1$I
                            SELECT
                                entries.*
                            FROM test
                            JOIN LATERAL json_populate_recordset(null::%1$I, json_data -> %2$L) as entries ON TRUE
                        $sql$::text, tk.relation, tk.tbl_key) stmt
                ) stmts ON TRUE
        );
    END IF;
END;
$$ LANGUAGE plpgsql
VOLATILE  -- modifies data
STRICT    -- returns NULL if any argument is NULL
SECURITY INVOKER;  -- execute with the role of the caller, rather than the role that defined the function
Then call the function like:
SELECT dynamic_json_insert(
('firstt','table_1'),
('secondt','table_2')
);
Because of the variadic keyword, you can pass each element of the array as an individual argument and Postgres will cast them to the appropriate types automatically.
The generated/executed SQL for the above function call will be:
WITH ins_firstt as (
INSERT INTO table_1
SELECT
entries.*
FROM test
JOIN LATERAL json_populate_recordset(null::table_1,json_data -> 'firstt') as entries ON TRUE
)
INSERT INTO table_2
SELECT
entries.*
FROM test
JOIN LATERAL json_populate_recordset(null::table_2,json_data -> 'secondt') as entries ON TRUE
;

Scalar function with WHILE loop to inline function

Hi, I have a view which is used in lots of search queries in my application.
The issue is that the application queries which use this view are running very slowly. I am investigating this, and I found a particular portion of the view definition which is making it slow.
create view Demoview AS
Select
    p.Id as Id,
    ----------,
    STUFF((SELECT ',' + [dbo].[OnlyAlphaNum](colDesc)
           FROM dbo.ContactInfoDetails cd
           WHERE pp.FormId = f.Id AND ppc.PageId = pp.Id
           FOR XML PATH('')), 1, 1, '') AS PhoneNumber,
    p.FirstName as Fname,
From
---
This is one of the columns in the view.
The scalar function [OnlyAlphaNum] is making it slow, as it prevents a parallel execution plan for the query.
The function is as below:
CREATE FUNCTION [dbo].[OnlyAlphaNum]
(
    @String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
    WHILE PATINDEX('%[^A-Z0-9]%', @String) > 0
        SET @String = STUFF(@String, PATINDEX('%[^A-Z0-9]%', @String), 1, '')
    RETURN @String
END
How can I convert it into an inline function?
I tried with CASE, but wasn't successful. I have read that a CTE is a good option.
Any ideas on how to tackle this problem?
I already did this; you can read more about it here.
The function:
CREATE FUNCTION dbo.alphaNumericOnly8K(@pString varchar(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
Purpose:
Given a varchar(8000) string or smaller, this function strips all but the alphanumeric
characters that exist in @pString.
Compatibility:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
Parameters:
@pString = varchar(8000); Input string to be cleaned
Returns:
AlphaNumericOnly - varchar(8000)
Syntax:
--===== Autonomous
SELECT ca.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K(@pString) ca;
--===== CROSS APPLY example
SELECT ca.AlphaNumericOnly
FROM dbo.SomeTable st
CROSS APPLY dbo.alphaNumericOnly8K(st.SomeVarcharCol) ca;
Programmer's Notes:
1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
2. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) except that it requires the APPLY table operator.
Note the usage examples below and see this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yield
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability to generate a parallel query plan. Initial testing
showed that the function generally gets a parallel execution plan.
3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you
have two or more logical CPUs and MAXDOP is not set to 1 on your SQL Instance).
4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline
Scalar Function) in that it returns a single value in the returned table and should
normally be used in the FROM clause as with any other iTVF.
5. CHECKSUM returns an INT and will return the exact number given if given an INT to
begin with. It's also faster than a CAST or CONVERT and is used as a performance
enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
6. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatenation of empty strings normally
determined by a CASE statement in the XML "loop".
7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small
numbers, consider casting or converting your values to a numeric data type if you are
inserting the return value into a new table or using it for joins or comparison
purposes.
8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministic
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt);
---------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20150526 - Initial Creation - Alan Burstein
Rev 00 - 20150526 - 3rd line in WHERE clause to correct something that was missed
- Eirikur Eiriksson
Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH
E1(N) AS
(
    SELECT N
    FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) x(N)
),
iTally(N) AS
(
    SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
)
SELECT AlphaNumericOnly =
(
    SELECT SUBSTRING(@pString,CHECKSUM(N),1)
    FROM iTally
    WHERE
        ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
     OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
     OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26
    ORDER BY N
    FOR XML PATH('')
);
Note the examples in the code comments:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt);
Returns:
AlphaNumericOnly
-------------------
xxx123abc999
txtID OldTxt AlphaNumericOnly
----------- ------------- -----------------
1 !!!A555A!!! A555A
2 NULL NULL
3 AAA.999 AAA999
It's the fastest of its kind. It runs extra fast with a parallel execution plan. To force a parallel execution plan, grab a copy of make_parallel by Adam Machanic. Then you would run it like this:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao
CROSS APPLY dbo.make_parallel();
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt)
CROSS APPLY dbo.make_parallel();
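To wire the iTVF back into the view from the question, replace the scalar call with a CROSS APPLY. A sketch against the question's (partial) view definition, keeping its correlation predicates exactly as given:
STUFF((SELECT ',' + ao.AlphaNumericOnly
       FROM dbo.ContactInfoDetails cd
       CROSS APPLY dbo.alphaNumericOnly8K(cd.colDesc) ao
       WHERE pp.FormId = f.Id AND ppc.PageId = pp.Id
       FOR XML PATH('')), 1, 1, '') AS PhoneNumber,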
Surely there is scope to improve this; test it out.
;WITH CTE AS (
SELECT (CASE WHEN PATINDEX('%[^A-Z0-9]%', D.Name) > 0
THEN STUFF(D.Name, PATINDEX('%[^A-Z0-9]%', D.Name), 1, '')
ELSE D.NAME
END ) NameString
FROM #dept D
UNION ALL
SELECT STUFF(C.NameString, PATINDEX('%[^A-Z0-9]%', C.NameString), 1, '')
FROM CTE C
WHERE PATINDEX('%[^A-Z0-9]%', C.NameString) > 0
)
Select STUFF((SELECT ',' + E.NameString from CTE E
WHERE PATINDEX('%[^A-Z0-9]%', E.NameString) = 0
FOR XML PATH('')), 1, 1, '') AS NAME
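One caveat on the recursive approach: SQL Server caps recursion at 100 levels by default, so a value with more than 100 non-alphanumeric characters would abort the query. Appending a MAXRECURSION hint to the statement lifts the cap (a sketch of the final SELECT only):
Select STUFF((SELECT ',' + E.NameString from CTE E
              WHERE PATINDEX('%[^A-Z0-9]%', E.NameString) = 0
              FOR XML PATH('')), 1, 1, '') AS NAME
OPTION (MAXRECURSION 0);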

Is there a way to add a logical Operator in a WHERE clause using CASE statements? - T-SQL

I searched the web but cannot find a solution for my problem (but perhaps I am using the wrong keywords ;) ).
I've got a stored procedure which does some automatic validation (every night) for a bunch of records. However, sometimes a user wants to run the same validation for a single record manually. I thought about calling the stored procedure with a parameter: when it is set, the original SELECT statement (which loops through all the records) should get an extra AND condition on the specified record ID. I want to do it this way so that I don't have to copy the entire SELECT statement and maintain a modified copy just for the manual case.
The original statement is as follows:
DECLARE GenerateFacturen CURSOR LOCAL FOR
SELECT TOP 100 PERCENT becode, dtreknr, franchisebecode, franchisenemer, fakgroep, vonummer, vovolgnr, count(*) as nrVerOrd,
       FaktuurEindeMaand, FaktuurEindeWeek
FROM (
    SELECT becode, vonummer, vovolgnr, FaktuurEindeMaand, FaktuurEindeWeek, uitgestfaktuurdat, levdat, voomschrijving, vonetto,
           faktureerperorder, dtreknr, franchisebecode, franchisenemer, fakgroep, levscandat
    FROM vwOpenVerOrd
    WHERE becode = @BecondeIN AND levdat IS NOT NULL AND fakstatus = 0
      AND isAllFaktuurStukPrijsChecked = 1 AND IsAllFaktuurVrChecked = 1
      AND (uitgestfaktuurdat IS NULL OR uitgestfaktuurdat <= @FactuurDate)
) sub
WHERE faktureerperorder = 1
GROUP BY becode, dtreknr, franchisebecode, franchisenemer, fakgroep, vonummer, vovolgnr,
         FaktuurEindeMaand, FaktuurEindeWeek
ORDER BY MIN(levscandat)
At the WHERE faktureerperorder = 1 I came up with something like this:
WHERE faktureerperorder = 1 AND CASE WHEN @myParameterManual = 1 THEN vonummer = @vonummer ELSE 1=1 END
But this doesn't work. @myParameterManual indicates whether or not it should select only a specific record; @vonummer is the record's ID. I thought that with 1=1 I would get all the records.
Any ideas on how to achieve my goal (or perhaps more efficient or better approaches)?
I'm finding it difficult to read your query, but this is hopefully a simple example of what you're trying to achieve.
I've used a WHERE clause with an OR operator to give you 2 options on the filter. Using the same query you will get different outputs depending on the filter value:
CREATE TABLE #test ( id INT, val INT );
INSERT INTO #test
    ( id, val )
VALUES ( 1, 10 ),
       ( 2, 20 ),
       ( 3, 30 );
DECLARE @filter INT;
-- null filter returns all rows
SET @filter = NULL;
SELECT *
FROM #test
WHERE ( @filter IS NULL
        AND id < 5
      )
   OR ( @filter IS NOT NULL
        AND id = @filter
      );
-- filter a specific record
SET @filter = 2;
SELECT *
FROM #test
WHERE ( @filter IS NULL
        AND id < 5
      )
   OR ( @filter IS NOT NULL
        AND id = @filter
      );
DROP TABLE #test;
First query returns all:
id val
1 10
2 20
3 30
Second query returns a single row:
id val
2 20
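Applied to the question's cursor query, the same pattern needs no CASE at all; the manual filter collapses into one extra predicate (a sketch using the question's variable names, treating a NULL @myParameterManual as "not manual"):
WHERE faktureerperorder = 1
  AND (ISNULL(@myParameterManual, 0) = 0 OR vonummer = @vonummer)
Catch-all filters like this can trap a one-size-fits-all plan in cache, so adding OPTION (RECOMPILE) to the statement is a common companion measure.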

SQL: Filter on Alpha and numeric on column of Type TEXT

I've got a column of type TEXT. In the column are numeric values such as 4, 8, 3.2, etc... and also values such as 'Negative', 'Positive', '27A', '2pos 1neg'.
The user needs to be able to say: "Give me all the values between 10 and 30, and also the values that are 'Negative'." The WHERE clause would need to do something along the lines of:
WHERE Tbl.Col > 10
AND Tbl.Col < 30
AND Tbl.Col = 'Negative'
This is problematic for obvious reasons. I've tried using the ISNUMERIC function to alleviate the issue but can't seem to get exactly what I need. I can either get all the alpha values in the column, or all the numeric values in the column as floats, but I can't seem to filter on both at the same time. To grab all the numeric values I've been using this:
SELECT Num.Val FROM
(
    SELECT Val = CASE ISNUMERIC(CAST(TBL.COL AS VARCHAR)) WHEN 1
                 THEN CAST(CAST(TBL.COL AS VARCHAR) AS FLOAT) ELSE NULL END
    FROM Table TBL
    WHERE TBL.COL IS NOT NULL
) as Num
WHERE Num.val IS NOT NULL
AND Num.val > 10
If I understand the issue correctly, something like this should get you close.
with MyNumbers as
(
    select t.Col
    from Tbl t
    --where ISNUMERIC(t.Col) = 1
    where t.Col NOT LIKE '%[^0-9.]%'
)
, MyAlpha as
(
    select t.Col
    from Tbl t
    where ISNUMERIC(t.Col) = 0
)
select Col
from MyNumbers
where Col > 10
  and Col < 30
union all
select Col
from MyAlpha
where Col = 'Negative'
First I would go slap the person who designed the table (hopefully it isn't you) :>
Go here and get the split table function. I would then convert the text column (like you have in the example above) into varchar(max) and supply it as the parameter to the split function. Then you can select from the table results of the split function using the user-supplied parameters.
I have found the answer to my problem:
SELECT
    al_Value = Table.Column
FROM Table
WHERE (
        ISNUMERIC(CAST(Table.Column AS VARCHAR)) = 1 AND
        CONVERT(FLOAT, CAST(Table.Column AS VARCHAR)) > 1.0 AND
        CONVERT(FLOAT, CAST(Table.Column AS VARCHAR)) < 10.0
      )
   OR (
        CAST(Table.Column AS VARCHAR) IN ('negative', 'no bueno')
      )
This returns one column named 'al_Value' that filters on Table.Column (which is of datatype TEXT) and applies the filters in the WHERE clause above.
Thanks everyone for trying to help me with this issue.
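If you are on SQL Server 2012 or later, TRY_CAST sidesteps the ISNUMERIC edge cases (ISNUMERIC returns 1 for strings such as '$' or ',' that won't convert to FLOAT). A sketch of the filter from the question; TEXT cannot be cast to FLOAT directly, hence the intermediate VARCHAR:
SELECT t.Col
FROM Tbl t
WHERE TRY_CAST(CAST(t.Col AS VARCHAR(8000)) AS FLOAT) BETWEEN 10 AND 30
   OR CAST(t.Col AS VARCHAR(8000)) = 'Negative';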

SQL Server casting from a table

In SQL Server I have a query that looks like this (part of the WHERE clause of a larger query)
SELECT 1
WHERE TPR.GRDE_PK IN
(
    SELECT CAST(String AS INT)
    FROM dbo.Split_New(@GRADES, ',')
)
@Grades is equal to '14,15' and dbo.Split_New is a function that returns a table with a single column called String that will contain '14' and '15'. TPR.GRDE_PK is of type INT. I get a conversion error when I try to execute this line; can anyone tell me how to fix it?
Here is what the Split_New function looks like (written by someone more skilled than me, so I don't understand all of it):
create function [dbo].[Split_New] (
    @StringToSplit nvarchar(4000),
    @Separator varchar(128))
returns table as return
with indices as
(
    select 0 S, 1 E
    union all
    select E, charindex(@Separator, @StringToSplit, E) + len(@Separator)
    from indices
    where E > S
)
select substring(@StringToSplit, S,
       case when E > len(@Separator) then e-s-len(@Separator) else len(@StringToSplit) - s + 1 end) String
       --,S StartIndex
from indices where S > 0
The problem is that your TPR.GRDE_PK value is an integer; cast it as a VARCHAR instead:
SELECT 1
WHERE CAST(TPR.GRDE_PK AS VARCHAR(25)) IN
(
    SELECT *
    FROM dbo.Split_New(@GRADES, ',')
)
The function works fine; it returns the expected table of results given your string.
Alternatively, you can avoid using the function at all with LIKE:
WHERE ','+CAST(TPR.GRDE_PK AS VARCHAR(25))+',' LIKE '%,'+@GRADES+',%'
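One caveat on the LIKE trick: it assumes @GRADES contains no spaces around the commas; with '14, 15' the pattern would fail to find ',15,'. Stripping spaces first keeps it safe (a sketch):
WHERE ','+CAST(TPR.GRDE_PK AS VARCHAR(25))+',' LIKE '%,'+REPLACE(@GRADES,' ','')+',%'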
It is difficult to say exactly what the problem is without looking at the function.
First see if you get the correct results from the function:
SELECT String FROM dbo.Split_New(@GRADES, ',')
String may have leading/trailing spaces. Try to trim them before converting/casting using the LTRIM() and RTRIM() functions:
SELECT CONVERT(INT, LTRIM(RTRIM(String))) FROM dbo.Split_New(@GRADES, ',')
The ISNUMERIC() function is not ideal for filtering before a convert, as it returns 1 for some strings (e.g. '$' or '1e4') that cannot be cast to INT.
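On SQL Server 2016 and later (database compatibility level 130+), the built-in STRING_SPLIT makes the custom splitter unnecessary; its output column is named value, so combined with trim-and-cast the original predicate becomes (a sketch):
SELECT 1
WHERE TPR.GRDE_PK IN
(
    SELECT CAST(LTRIM(RTRIM(value)) AS INT)
    FROM STRING_SPLIT(@GRADES, ',')
)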
