UDF as default value in Snowflake - snowflake-cloud-data-platform

I am trying to use a sequence with a limited range as an ID column in a table. Right now, Snowflake does not support an upper limit on a sequence, so I am thinking of using a UDF to get around this:
create or replace sequence seq1 with start = 1 ;
create or replace function seq1_with_max()
returns number
as
$$
select case when a.s < 10 then a.s else null end as id from ( select seq1.nextval as s from dual ) a
$$
;
select seq1_with_max() ;
create or replace table f (
id number not null default seq1_with_max(),
c varchar
) ;
insert into f(c) values ('a') ;
This returns:
SQL compilation error:
syntax error line 1 at position 1 unexpected 'SELECT'.
syntax error line 1 at position 12 unexpected 'A'.
I don't quite understand why this isn't working. How could the UDF be modified to achieve the same goal?

You are right: it looks like we cannot use a UDF in a Snowflake CREATE statement; a column default only accepts simple expressions such as constants or a sequence reference, not a UDF call. I tried this instead.
create or replace table f (
id number not null ,
c varchar
) ;
insert into f(id,c) select seq1_with_max(),'hello';
select * from f;
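For reference, a plain sequence reference (unlike a UDF call) is accepted as a column default, so the unbounded version works directly. A minimal sketch building on the seq1 above:
create or replace table f (
id number not null default seq1.nextval,
c varchar
) ;
insert into f(c) values ('a') ; -- id is filled from the sequence, but with no upper bound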

Related

Pass array to Snowflake UDF

My goal is to create a Snowflake UDF that, given an array of values from different columns, returns the maximum value.
This is the function I currently have:
CREATE OR REPLACE FUNCTION get_max(input_array array)
RETURNS double precision
AS '
WITH t AS
(
SELECT value::integer as val from table(flatten(input => input_array))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
'
When I pass different columns from a table, e.g. select get_max(to_array([table.col1, table.col2, table.col3])) I get the error
Unsupported subquery type cannot be evaluated
However, if I run the sql query only and replace input_array with an array such as array_construct(7, 120, 2, 4, 5, 80) there is no error and the correct value is returned.
WITH t AS
(
SELECT value::integer as val from table(flatten(input => array_construct(2,4,5)))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
When flattening arrays in a SQL UDF gives you trouble (a SQL UDF is essentially inlined into the calling query, and FLATTEN over an argument built from table columns can become a correlated subquery that Snowflake cannot evaluate, hence the error), you can always write a JS, Java, or Python UDF instead.
Here you can see a JS and a Python UDF in action:
CREATE OR REPLACE FUNCTION get_max_from_array_js(input_array array)
RETURNS double precision
language javascript
as
$$
return Math.max(...INPUT_ARRAY)
$$;
CREATE OR REPLACE FUNCTION get_max_from_array_py(input_array array)
RETURNS double precision
language python
handler = 'x'
runtime_version = '3.8'
as
$$
def x(input_array):
    return max(input_array)
$$;
select get_max_from_array_js([1.1,7.7,2.2,3.3,4.4]);
select get_max_from_array_py([1.1,7.7,2.2,3.3,4.4]);
But given the problem statement, consider using GREATEST in SQL instead (note that GREATEST returns NULL if any argument is NULL):
select greatest(table.col1, table.col2, table.col3)
Performance-wise, pure SQL is fastest, then JS, then Python (the timings below were taken on S and 3XL warehouses):
select current_date()
, max(greatest(c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk)) m
from snowflake_sample_data.tpcds_sf10tcl.customer
-- 692ms S
-- 155ms 3XL
;
select current_date()
, max(get_max_from_array_js([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 15s S
-- 1.2s 3XL
;
select current_date()
, max(get_max_from_array_py([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 32s S
-- 4.3s 3XL
;

Transform JSON array to boolean columns in PostgreSQL

I have a column that contains a JSON array of strings, which I would like to transform into boolean columns. These columns are true if the value was present in the array.
Let's say I have the following column in Postgres.
|"countries"|
---------------
["NL", "BE"]
["UK"]
I would like to transform this into boolean columns per market, e.g.
|"BE" |"NL" |"UK" |
-------------------
|True |True |False|
|False|False|True |
I know I can manually expand it using case statements for each country code, but there are 200+ countries.
Is there a more elegant solution?
Displaying a variable list of columns whose labels are known only at runtime is not so obvious with Postgres: you need some dynamic SQL.
Here is a fully dynamic solution whose result is close to your expected result and which relies on the creation of a user-defined composite type and on the standard functions jsonb_populate_record and jsonb_object_agg.
First you create the list of countries as a new composite type:
CREATE TYPE country_list AS () ;
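-- empty placeholder; the procedure below drops and recreates this type with one text attribute per country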
CREATE OR REPLACE PROCEDURE country_list () LANGUAGE plpgsql AS
$$
DECLARE country_list text ;
BEGIN
SELECT string_agg(DISTINCT c.country || ' text', ',')
INTO country_list
FROM your_table
CROSS JOIN LATERAL jsonb_array_elements_text(countries) AS c(country) ;
EXECUTE 'DROP TYPE IF EXISTS country_list' ;
EXECUTE 'CREATE TYPE country_list AS (' || country_list || ')' ;
END ;
$$ ;
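With the sample data above, the dynamically generated statement would look roughly like this (an illustration, not actual procedure output; unquoted identifiers fold to lower case, which is why the final column labels come out lower case):
CREATE TYPE country_list AS (BE text, NL text, UK text);
-- effective attribute names after case folding: be, nl, uk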
Then you can call the procedure country_list() just before executing the final query:
CALL country_list () ;
or, even better, call the procedure country_list() from triggers whenever the list of countries may be modified:
CREATE OR REPLACE FUNCTION your_table_insert_update()
RETURNS trigger LANGUAGE plpgsql VOLATILE AS
$$
BEGIN
IF EXISTS ( SELECT 1
FROM (SELECT jsonb_object_keys(to_jsonb(a.*)) FROM (SELECT(null :: country_list).*) AS a) AS b(key)
RIGHT JOIN jsonb_array_elements_text(NEW.countries) AS c(country)
ON c.country = b.key
WHERE b.key IS NULL
)
THEN CALL country_list () ;
END IF ;
RETURN NEW ;
END ;
$$ ;
CREATE OR REPLACE TRIGGER your_table_insert_update AFTER INSERT OR UPDATE OF countries ON your_table
FOR EACH ROW EXECUTE FUNCTION your_table_insert_update() ;
CREATE OR REPLACE FUNCTION your_table_delete()
RETURNS trigger LANGUAGE plpgsql VOLATILE AS
$$
BEGIN
CALL country_list () ;
RETURN OLD ;
END ;
$$ ;
CREATE OR REPLACE TRIGGER your_table_delete AFTER DELETE ON your_table
FOR EACH ROW EXECUTE FUNCTION your_table_delete() ;
Finally, you should get the expected result with the following query, except that the column labels are lower case and NULL appears instead of false:
SELECT (jsonb_populate_record(NULL :: country_list, jsonb_object_agg(lower(c.country), true))).*
FROM your_table AS t
CROSS JOIN LATERAL jsonb_array_elements_text(t.countries) AS c(country)
GROUP BY t
Full test result in dbfiddle.

Looping in array to combine jsonb

I'm unable to loop over the array and construct the results into jsonb.
CREATE TABLE emr.azt_macres (
id serial NOT NULL,
analyzer_test_full_desc varchar(50) NULL,
specimen_id varchar(100) NULL,
data_reading varchar(20) NULL,
data_result varchar(20) NULL,
result_status varchar(20) NULL,
analyzer_message text NULL,
test_result jsonb NULL,
CONSTRAINT azt_macres_pkey PRIMARY KEY (id)
);
The array looks like this. Bear in mind there are two samples in the array: Sample123 and Sample456.
1H|*^&|||Mindry^^|||||||PR|1394-97|20210225142532
P|3||QC1||^^||^^|U||||||||||||||||||||||||||
O|3|1^Sample123^1||TBILV^Bilirubin Total (VOX Method)^^
R|23|KA^Bilirubin Total (VOX Method)^^F|17.648074^^^^|µmol/L|
R|24|ATU^Alanine Aminotransferase^^F|58.934098^^^^|U/L|
L|1|N
1H|*^&|||Mindry^^|||||||PR|1394-97|20210225142532
P|3||QC1||^^||^^|U||||||||||||||||||||||||||
O|3|1^Sample456^1||TBILV^Bilirubin Total (VOX Method)
R|23|TBILV^Bilirubin Total (VOX Method)^^F|17.648074^^^^|
R|24|ALT^Alanine Aminotransferase^^F|58.934098^^^^|U/L|
R|25|TTU^Alkaline phosphatase^^F|92.675340^^^^|U/L|^|N||
I'm able to insert if there is only one sample barcode; however, the second specimen_id is not inserted. Do I need to loop over the array again?
my code:
CREATE OR REPLACE FUNCTION emr.azintin(v_msgar text[])
RETURNS jsonb
LANGUAGE plpgsql
AS $function$
DECLARE
v_cnt INT = 0;
v_msgln text;
v_msgtyp character varying(3);
v_tmp text;
macres azt_macres%rowtype;
BEGIN
macres.analyzer_test_full_desc := 'CHEMO';
SELECT split_part(items[3], '^', 2)
INTO macres.specimen_id
FROM (
SELECT string_to_array(element, '|') as items
FROM unnest(v_msgar) as t(element)) t
WHERE items[1] = 'O';
SELECT jsonb_agg(jsonb_build_object('resultId', split_part(items[3],'^',1),'resultValue',split_part(items[4],'^',1)))
INTO macres.test_result
FROM (
SELECT string_to_array(element, '|') as items
FROM unnest(v_msgar) as t(element)) t
WHERE items[1] = 'R';
v_cnt := v_cnt + 1;
BEGIN
INSERT INTO azt_macres(analyzer_test_full_desc, specimen_id, data_reading ,
data_result,result_status,analyzer_message,test_result)
VALUES (macres.analyzer_test_full_desc, macres.specimen_id,macres.data_reading,
macres.data_result,macres.result_status,macres.analyzer_message, macres.test_result);
END;
END
$function$
;
Currently my output is like this:
specimen_id | test_result
sample123 | [{"resultId": "KA", "resultValue": "17.648074"}, {"resultId": "ATU", "resultValue": "58.934098"}, {"resultId": "TBILV", "resultValue": "17.648074"}, {"resultId": "ALT", "resultValue": "58.934098"}, {"resultId": "TTU", "resultValue": "92.675340"}]
Supposedly my output should be like this:
specimen_id | test_result
sample123 | [{"resultId": "KA", "resultValue": "17.648074"}, {"resultId": "ATU", "resultValue": "58.934098"}]
sample456 | [{"resultId": "TBILV", "resultValue": "17.648074"}, {"resultId": "ALT", "resultValue": "58.934098"}, {"resultId": "TTU", "resultValue": "92.675340"}]
1. Creating the output:
As I see it, you need to group the R elements by the preceding O element. That means creating groups of connected R records which share the same O value:
step-by-step demo: db<>fiddle
Don't be afraid, it looks heavier than it is ;) To understand all described intermediate steps, please have a look at the fiddle where every step is executed separately to demonstrate its impact.
SELECT
    specimen_id,
    json_agg(result) as test_result                        -- 9
FROM (
    SELECT
        code,
        first_value(                                       -- 5
            split_part(id, '^', 2)
        ) OVER (PARTITION BY group_id ORDER BY index) as specimen_id,
        json_build_object(                                 -- 7
            'result_id',
            split_part(id, '^', 1),                        -- 6
            'result_value',
            split_part(value, '^', 1)
        ) as result
    FROM (
        SELECT
            parts[1] as code,
            parts[3] as id,
            parts[4] as value,
            elem.index,
            SUM((parts[1] = 'O')::int) OVER (ORDER BY elem.index) as group_id  -- 4
        FROM mytable t,
            unnest(myarray) WITH ORDINALITY as elem(value, index),             -- 1
            regexp_split_to_array(elem.value, '\|') as parts                   -- 2
        WHERE parts[1] IN ('R', 'O')                                           -- 3
    ) s
) s
WHERE code = 'R'                                           -- 8
GROUP BY specimen_id
1. Put all array elements into separate records. WITH ORDINALITY adds an index to the records which represents the original position of the element in the array.
2. Split the array elements into their parts (delimited by the | character).
3. Filter only the R and O elements.
4. This is the most interesting part: creating the groups of connected records. The idea: every O record gets the value 1 (boolean true cast to int is 1, hence (parts[1] = 'O')::int), and every R record gets 0. The cumulative SUM() window function sums the current and all previous records, so every O increases the running total while the following R records (adding 0) keep it. This generates the same group identifier for every O record and all directly following R records. To ensure the correct record order, we use the index created by WITH ORDINALITY in step 1 (see the toy example after this list).
5. The first_value() window function gives the first value of a group (= partition); here it adds to every record the first value of the recently created group. This finally associates the O value with each related R record.
6. split_part() retrieves the correct values from the strings.
7. Build your JSON object for every record.
8. Remove the O records.
9. Group by the specimen_id (which was added in step 5) and aggregate the result values into an array.
Now we have the expected output.
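To make step 4 concrete, here is a tiny stand-alone illustration of the running-sum grouping (hypothetical values, runnable as-is):
SELECT idx, code, SUM((code = 'O')::int) OVER (ORDER BY idx) AS group_id
FROM (VALUES (1, 'O'), (2, 'R'), (3, 'R'), (4, 'O'), (5, 'R')) AS v(idx, code);
-- group_id comes out as 1, 1, 1, 2, 2: each O opens a new group,
-- and the following R rows inherit it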
2. Inserting:
Inserting is easy: take the query we just created (if you need more columns, simply add them to the SELECT) and use it in an INSERT statement:
INSERT INTO mytable (col1, col2)
SELECT -- <query from above>
3. Create function:
Here it is not clear to me what you want to achieve. Do you want to pass the array as an input parameter, or do you want to query a table? And why are you using a jsonb return type although nothing is returned?
However, of course, you can put all of this into a stored function. Here I assume you pass the array as an input parameter (for example, see the answer to your previous question):
CREATE FUNCTION my_function(myarray text[]) RETURNS void AS $$
BEGIN
INSERT INTO ...
SELECT ...
END;
$$ LANGUAGE plpgsql;
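For illustration, here is a hypothetical end-to-end version under those assumptions: the name azt_macres_load is made up, the grouping query is the one from section 1, and only the derived columns plus the constant analyzer_test_full_desc are inserted (the JSON keys follow the question's expected output):
CREATE OR REPLACE FUNCTION azt_macres_load(myarray text[]) RETURNS void AS $$
BEGIN
    INSERT INTO azt_macres (analyzer_test_full_desc, specimen_id, test_result)
    SELECT 'CHEMO', specimen_id, jsonb_agg(result)
    FROM (
        SELECT
            code,
            -- attach the O record's sample id to every row of its group
            first_value(split_part(id, '^', 2))
                OVER (PARTITION BY group_id ORDER BY index) AS specimen_id,
            jsonb_build_object(
                'resultId',    split_part(id, '^', 1),
                'resultValue', split_part(value, '^', 1)
            ) AS result
        FROM (
            SELECT parts[1] AS code,
                   parts[3] AS id,
                   parts[4] AS value,
                   elem.index,
                   -- running sum: each O record starts a new group
                   SUM((parts[1] = 'O')::int) OVER (ORDER BY elem.index) AS group_id
            FROM unnest(myarray) WITH ORDINALITY AS elem(value, index),
                 regexp_split_to_array(elem.value, '\|') AS parts
            WHERE parts[1] IN ('R', 'O')
        ) raw
    ) grouped
    WHERE code = 'R'
    GROUP BY specimen_id;
END;
$$ LANGUAGE plpgsql;
Calling it with the raw message lines as a text[] should then insert one row per specimen.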

SQL Server to Oracle - using Cross Apply with Oracle

I have a function that takes a list of primary keys separated by commas and splits them.
Oracle function:
create or replace function split(
list in CHAR,
delimiter in CHAR default ','
)
return split_tbl as
splitted split_tbl := split_tbl();
i pls_integer := 0;
list_ varchar2(32767) := list;
begin
loop
i := instr(list_, delimiter);
if i > 0 then
splitted.extend(1);
splitted(splitted.last) := substr(list_, 1, i - 1);
list_ := substr(list_, i + length(delimiter));
else
splitted.extend(1);
splitted(splitted.last) := list_;
return splitted;
end if;
end loop;
end;
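Note that this function depends on a SQL collection type that isn't shown in the question. A minimal sketch of the missing piece, assuming the name split_tbl, plus a usage example:
create or replace type split_tbl as table of varchar2(32767);
-- each element comes back as a row in the COLUMN_VALUE pseudocolumn:
select column_value from table(split('FOX IN SOCKS,THING ONE,THING TWO'));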
And I have this query in SQL Server that, for each ID returned by the function, picks the most recent User_Salary row:
select maxUserSalary.id as 'UserSalary'
into #usersalary
from dbo.Split(@usersalary, ';') as userid
cross apply (
select top 1 * from User_Salary as usersalary
where usersalary.User_Id = userid.item
order by usersalary.Date desc
) as maxUserSalary
The problem is, I'm not able to use cross apply in Oracle to throw this data into this function that is returning a table.
How can I use cross apply with Oracle to return this data in function?
You're using Oracle 18c, so you can use the CROSS APPLY syntax. Oracle added it (as well as LATERAL and OUTER APPLY) in 12c.
Here is a simplified version of your logic:
select us.name
, us.salary
from table(split('FOX IN SOCKS,THING ONE,THING TWO')) t
cross apply (select us.name, max(us.salary) as salary
from user_salaries us
where us.name = t.column_value ) us
There is a working demo on db<>fiddle.
If this doesn't completely solve your problem please post a complete question with table structures, sample data and expected output derived from that sample.
I think APC answered your direct question well. As a side note, I wanted to suggest NOT writing your own function to do this at all. There are several existing solutions to split delimited string values into virtual tables that don't require you to create your own custom types, and don't have the performance overhead of context switching between the SQL and PL/SQL engines.
-- example data - remove this to test with your User_Salary table
with User_Salary as (select 1 as id, 'A' as user_id, sysdate as "Date" from dual
union select 2, 'B', sysdate from dual)
-- your query:
select maxUserSalary.id as "UserSalary"
from (select trim(COLUMN_VALUE) as item
from xmltable(('"'||replace(:usersalary, ';', '","')||'"'))) userid -- note ';' delimiter
cross apply (
select * from User_Salary usersalary
where usersalary.User_Id = userid.item
order by usersalary."Date" desc
fetch first 1 row only
) maxUserSalary;
If you run this and pass in 'A;B;C' for :usersalary, you'll get 1 and 2 back.
A few notes:
In this example, I'm using ; as the delimiter, since that's what your query used.
I tried to match your table/column names, but your column name Date is invalid - it's an Oracle reserved keyword, so it has to be put in quotes to be a valid column name.
As a column identifier, "UserSalary" should also have double quotes, not single.
You can't use AS in table aliases in Oracle.
I removed into usersalary, since into is only used with queries which return a single row, and your query can return multiple rows.

Is there a way to add a logical Operator in a WHERE clause using CASE statements? - T-SQL

I searched the web but cannot find a solution for my problem (but perhaps I am using the wrong keywords ;) ).
I've got a Stored Procedure which does some automatic validation (every night) for a bunch of records. However, sometimes a user wants to run the same validation for a single record manually. I thought about calling the Stored Procedure with a parameter: when it is set, the original SELECT statement (which loops through all the records) should get an additional AND condition with the specified record ID. I want to do it this way so that I don't have to copy the entire SELECT statement and modify it just for the manual part.
The original statement is as follows:
DECLARE GenerateFacturen CURSOR LOCAL FOR
SELECT TOP 100 PERCENT becode, dtreknr, franchisebecode, franchisenemer, fakgroep, vonummer, vovolgnr, count(*) as nrVerOrd,
FaktuurEindeMaand, FaktuurEindeWeek
FROM (
SELECT becode, vonummer, vovolgnr, FaktuurEindeMaand, FaktuurEindeWeek, uitgestfaktuurdat, levdat, voomschrijving, vonetto,
faktureerperorder, dtreknr, franchisebecode, franchisenemer, fakgroep, levscandat
FROM vwOpenVerOrd WHERE becode = @BecondeIN AND levdat IS NOT NULL AND fakstatus = 0
AND isAllFaktuurStukPrijsChecked = 1 AND IsAllFaktuurVrChecked = 1
AND (uitgestfaktuurdat IS NULL OR uitgestfaktuurdat <= @FactuurDate)
) sub
WHERE faktureerperorder = 1
GROUP BY becode, dtreknr, franchisebecode, franchisenemer, fakgroep, vonummer, vovolgnr,
FaktuurEindeMaand, FaktuurEindeWeek
ORDER BY MIN(levscandat)
At the WHERE faktureerperorder = 1 I came up with something like this:
WHERE faktureerperorder = 1 AND CASE WHEN @myParameterManual = 1 THEN vonummer = @vonummer ELSE 1=1 END
But this doesn't work. The @myParameterManual parameter indicates whether or not it should select only a specific record. The vonummer = @vonummer is the record's ID. I thought that by setting 1=1 I would get all the records.
Any ideas how to achieve my goal (perhaps more efficient ideas or better ideas)?
I'm finding it difficult to read your query, but this is hopefully a simple example of what you're trying to achieve.
I've used a WHERE clause with an OR operator to give two options on the filter. Using the same query, you will get different outputs depending on the filter value:
CREATE TABLE #test ( id INT, val INT );
INSERT INTO #test
( id, val )
VALUES ( 1, 10 ),
( 2, 20 ),
( 3, 30 );
DECLARE @filter INT;
-- null filter returns all rows
SET @filter = NULL;
SELECT *
FROM #test
WHERE ( @filter IS NULL
AND id < 5
)
OR ( @filter IS NOT NULL
AND id = @filter
);
-- filter a specific record
SET @filter = 2;
SELECT *
FROM #test
WHERE ( @filter IS NULL
AND id < 5
)
OR ( @filter IS NOT NULL
AND id = @filter
);
DROP TABLE #test;
First query returns all:
id val
1 10
2 20
3 30
Second query returns a single row:
id val
2 20
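As a side note, the same catch-all filter is often written more compactly, as in this sketch; the OPTION (RECOMPILE) hint is optional, but it lets the optimizer compile a plan for the actual parameter value instead of the generic catch-all shape:
SELECT *
FROM #test
WHERE ( @filter IS NULL AND id < 5 )
OR id = @filter
OPTION ( RECOMPILE );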
