Nested json extraction into postgres table - arrays

I have used the following query to parse JSON elements and store them into table 'pl'.
The 'test' table is used to store the raw JSON.
select
each_attribute ->> 'id' id,
each_attribute ->> 'sd' sd,
each_attribute ->> 'v' v
from test
cross join json_array_elements(json_array) each_section
cross join json_array_elements(each_section -> 'firstt') each_attribute
I am able to view the JSON values using the above query, but I am not able to insert them into another table using json_populate_recordset.
Definition of the tables I need to insert the nested JSON into:
id integer, character varying(6666), character varying(99999)
Table 1 (with the above definition) should store the values for key firstt.
Table 2 (with the above definition) should store the values for key secondt.
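For illustration, the target tables would look something like this (column names taken from the JSON keys id, sd and v; they are not spelled out in the definition above):
CREATE TABLE table_1 (id integer, sd character varying(6666), v character varying(99999));
CREATE TABLE table_2 (id integer, sd character varying(6666), v character varying(99999));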
Json format:
{
  "firstt": [
    {
      "id": 1,
      "sd": "test3",
      "v": "2223"
    },
    {
      "id": 2,
      "sd": "test2",
      "v": "2222"
    }
  ],
  "secondt": [
    {
      "id": 1,
      "sd": "test3",
      "v": "2223"
    },
    {
      "id": 2,
      "sd": "test2",
      "v": "2222"
    }
  ]
}
Please assist. I have tried every Stack Overflow solution I could find, but none of them covers inserting a nested array like this.
Adding my code for the dynamic query. It does not work; the error is 'too few arguments for format'.
do $$
DECLARE
    my record;
    tb_n varchar(50);
BEGIN
    FOR my IN
        SELECT json_object_keys(json_array) as t FROM test
    LOOP
        tb_n := my.t;
        EXECUTE format($$ WITH tbl_record_arrays as(
            SELECT
                entries.*
            FROM
                test
            JOIN LATERAL json_each(json_array) as entries(tbl_name,tbl_data_arr) ON TRUE
            )
            INSERT INTO %I
            SELECT
                records.*
            FROM
                tbl_record_arrays
            JOIN LATERAL json_populate_recordset(null::%I,tbl_data_arr) records ON TRUE
            WHERE
                tbl_name = %I$$,tb_n);
    END LOOP;
END;
$$;

To create a plpgsql function that dynamically inserts a json array for a specified key into a specified table, you can do:
CREATE OR REPLACE FUNCTION dynamic_json_insert(key_name text,tbl text) RETURNS VOID AS $$
BEGIN
    -- the $<tag>$ syntax allows for generating a multiline string
    EXECUTE format($sql$
        INSERT INTO %1$I
        SELECT
            entries.*
        FROM test
        JOIN LATERAL json_populate_recordset(null::%1$I,json_data -> $1) as entries ON TRUE;
    $sql$::text,tbl) USING dynamic_json_insert.key_name;
END;
$$ LANGUAGE plpgsql
VOLATILE --modifies data
STRICT -- Returns NULL if any arguments are NULL
SECURITY INVOKER; --Execute this function with the Role of the caller, rather than the Role that defined the function
and call it like
SELECT dynamic_json_insert('firstt','table_1')
If you want to insert into multiple tables using multiple key/table pairs, you can make a plpgsql function that takes a variadic array of (key, table) pairs and then generates a single Common Table Expression (CTE) with all of the INSERTs in one atomic statement.
First create a custom type:
CREATE TYPE table_key as (
tbl_key text,
relation regclass -- special type that refers to a Postgresql relation
);
Then define the function:
CREATE OR REPLACE FUNCTION dynamic_json_insert(variadic table_keys table_key[]) RETURNS VOID AS $$
DECLARE
    tbl_key_len integer = array_length(dynamic_json_insert.table_keys,1);
BEGIN
    IF tbl_key_len > 0 THEN
        EXECUTE (
            --generates a single atomic insert CTE when there are multiple table_keys OR a single insert statement otherwise
            --the SELECT is enclosed in parenthesis because it generates a single text value which EXECUTE receives.
            SELECT
                --append WITH if We have more than 1 table_key (for CTE)
                CASE WHEN tbl_key_len > 1 THEN 'WITH ' ELSE '' END
                || string_agg(
                    CASE
                        WHEN
                            --name the auxiliary statement and put it in parenthesis.
                            is_aux THEN format('%1$I as (%2$s)','ins_' || tk.tbl_key,stmt) || end_char
                        ELSE stmt
                    END,E'\n') || ';'
            FROM
                --unnest the table_keys argument and get its index (rn)
                unnest(dynamic_json_insert.table_keys) WITH ORDINALITY AS tk(tbl_key,relation,rn)
                -- the JOIN LATERAL here means "for each unnested table_key, generate the rows of the following subquery"
                JOIN LATERAL (
                    SELECT
                        rn < tbl_key_len is_aux,
                        --we need a comma between auxiliary statements
                        CASE WHEN rn = tbl_key_len - 1 THEN '' ELSE ',' END end_char,
                        --dynamically generate INSERT statement
                        format($sql$
                            INSERT INTO %1$I
                            SELECT
                                entries.*
                            FROM test
                            JOIN LATERAL json_populate_recordset(null::%1$I,json_data -> %2$L) as entries ON TRUE
                        $sql$::text,tk.relation,tk.tbl_key) stmt
                ) stmts ON TRUE
        );
    END IF;
END;
$$ LANGUAGE plpgsql
VOLATILE --modifies data
STRICT -- Returns NULL if any arguments are NULL
SECURITY INVOKER; --Execute this function with the Role of the caller, rather than the Role that defined the function
Then call the function like:
SELECT dynamic_json_insert(
('firstt','table_1'),
('secondt','table_2')
);
Because of the use of the variadic keyword, you can pass in each element of the array as an individual argument and Postgres will cast to the appropriate types automatically.
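Equivalently, if it is more convenient to build the array up front, you can pass it explicitly with the VARIADIC keyword and explicit casts (a sketch of the same call):
SELECT dynamic_json_insert(
    VARIADIC ARRAY[
        ('firstt','table_1')::table_key,
        ('secondt','table_2')::table_key
    ]
);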
The generated/executed SQL for the above function call will be:
WITH ins_firstt as (
INSERT INTO table_1
SELECT
entries.*
FROM test
JOIN LATERAL json_populate_recordset(null::table_1,json_data -> 'firstt') as entries ON TRUE
)
INSERT INTO table_2
SELECT
entries.*
FROM test
JOIN LATERAL json_populate_recordset(null::table_2,json_data -> 'secondt') as entries ON TRUE
;
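For completeness: the loop-based DO block from the question can also be made to work. The reported error comes from format() being given three placeholders but only one argument; the key comparison also needs a literal (%L) rather than an identifier (%I), and the nested dollar quote needs a tag distinct from the outer one. A corrected sketch, assuming (as the question's loop does) that each JSON key has a target table of the same name and that the raw column is named json_array:
DO $$
DECLARE
    my record;
BEGIN
    FOR my IN SELECT DISTINCT json_object_keys(json_array) AS t FROM test
    LOOP
        EXECUTE format($sql$
            INSERT INTO %1$I
            SELECT records.*
            FROM test
            JOIN LATERAL json_each(json_array) AS entries(tbl_name, tbl_data_arr) ON TRUE
            JOIN LATERAL json_populate_recordset(null::%1$I, entries.tbl_data_arr) AS records ON TRUE
            WHERE entries.tbl_name = %1$L
        $sql$, my.t);
    END LOOP;
END;
$$;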

Related

Pass array to Snowflake UDF

My goal is to create a Snowflake UDF that, given an array of values from different columns, returns the maximum value.
This is the function I currently have:
CREATE OR REPLACE FUNCTION get_max(input_array array)
RETURNS double precision
AS '
WITH t AS
(
SELECT value::integer as val from table(flatten(input => input_array))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
'
When I pass different columns from a table, e.g. select get_max(to_array([table.col1, table.col2, table.col3])) I get the error
Unsupported subquery type cannot be evaluated
However, if I run the sql query only and replace input_array with an array such as array_construct(7, 120, 2, 4, 5, 80) there is no error and the correct value is returned.
WITH t AS
(
SELECT value::integer as val from table(flatten(input => array_construct(2,4,5)))
WHERE VAL IS NOT NULL
),
cnt AS
(
SELECT COUNT(*) AS c FROM t
)
SELECT MAX(val)::float
FROM
(
SELECT val FROM t
) t2
When flattening arrays in a SQL UDF gives you trouble, you can always write a JS, Java, or Python UDF instead.
Here you can see a JS and a Python UDF in action:
CREATE OR REPLACE FUNCTION get_max_from_array_js(input_array array)
RETURNS double precision
language javascript
as
$$
return Math.max(...INPUT_ARRAY)
$$;

CREATE OR REPLACE FUNCTION get_max_from_array_py(input_array array)
RETURNS double precision
language python
handler = 'x'
runtime_version = 3.8
as
$$
def x(input_array):
    return max(input_array)
$$;
select get_max_from_array_js([1.1,7.7,2.2,3.3,4.4]);
select get_max_from_array_py([1.1,7.7,2.2,3.3,4.4]);
But given the problem statement, consider using GREATEST in SQL instead:
select greatest(table.col1, table.col2, table.col3)
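One caveat: Snowflake's GREATEST returns NULL as soon as any argument is NULL. If you would rather ignore NULLs, one possible workaround (a sketch, using an arbitrary sentinel value) is to COALESCE each argument:
select greatest(coalesce(table.col1, -1e308),
                coalesce(table.col2, -1e308),
                coalesce(table.col3, -1e308))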
Performance-wise, pure SQL is the fastest, then JS, then Python:
select current_date()
, max(greatest(c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk)) m
from snowflake_sample_data.tpcds_sf10tcl.customer
-- 692ms S
-- 155ms 3XL
;
select current_date()
, max(get_max_from_array_js([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 15s S
-- 1.2s 3XL
;
select current_date()
, max(get_max_from_array_py([c_customer_sk, c_current_cdemo_sk, c_current_hdemo_sk, c_current_addr_sk, c_first_shipto_date_sk])) m
from snowflake_sample_data.tpcds_sf10tcl.customer
where c_customer_sk is not null
and c_current_cdemo_sk is not null
and c_current_hdemo_sk is not null
and c_current_addr_sk is not null
and c_first_shipto_date_sk is not null
-- 32s S
-- 4.3s 3XL
;

Extract words in between the separator

I have input like below, and I want output like below.
I was trying with:
Sales External?HR?Purchase Department
I did LISTAGG because I ultimately want the values in separate columns.
The query output would be like below, meaning it should search for the first occurrence of the separator (in this case '?'; it can be anything, but not a common character like '-' or '/', since the separator needs to be distinct from the string values), extract the phrase before that first separator, and put the value in one column. Then it should look for the second occurrence of the separator, extract the next word, and keep creating columns; there can be multiple separators.
I tried SPLIT_PART, but it does not maintain the sequence in a real data scenario and the values do not come out in the correct order.
I also tried REGEXP_INSTR, but I was unable to use special characters as separators.
Any thoughts?
Regex Extract should work for you:
SELECT
    REGEXP_SUBSTR_ALL('Sales External?HR?Purchase Department', '(.*)\\?') AS parts;
You can use LATERAL FLATTEN to convert the resulting array into rows:
WITH MY_CTE AS (
    SELECT
        REGEXP_SUBSTR_ALL('Sales External?HR?Purchase Department', '(.*)\\?') AS parts
)
SELECT
    *
FROM
    MY_CTE,
    LATERAL FLATTEN(INPUT => parts, MODE => 'ARRAY')
Deeper dive into some more cases: https://dwgeek.com/snowflake-convert-array-to-rows-methods-and-examples.html/
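If the number of segments has a known upper bound, another option is a plain SPLIT, which keeps the original order and can be unpacked positionally without regular expressions (a minimal sketch):
SELECT
    parts[0]::string AS col1,
    parts[1]::string AS col2,
    parts[2]::string AS col3
FROM (
    SELECT SPLIT('Sales External?HR?Purchase Department', '?') AS parts
) s;
Indexes beyond the end of the array come back as NULL, so trailing columns are safe to select even when a row has fewer segments.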
Here's a simplified version of the data. It uses a CTE with array_agg to group the rows. It then changes from arrays to columns. To add more columns, you can use max(), min(), or any_value() functions to get them through the aggregation. (Note that use of any_value() will not allow use of cached results from the result set cache since it's flagged as nondeterministic.)
create or replace table T1 (EMPID int, ROLE string, ACCESS string, ACCESS_LVL string, ITERATION string);
insert into T1(EMPID, ROLE, ACCESS, ACCESS_LVL, ITERATION) values
(1234, 'Sales Rep', 'Specific', 'REGION', 'DEV'),
(1234, 'Purchase Rep', 'Specific', 'EVERY', 'PROD'),
(1234, 'HR', NULL, 'Dept', 'PROD'),
(4321, 'HR', 'Foo', 'Foo', 'Foo')
;
with X as
(
select EMPID
,array_agg(nvl(ROLE,'')) within group (order by ROLE) ARR_ROLE
,array_agg(nvl(ACCESS,'')) within group (order by ROLE) ARR_ACCESS
,array_agg(nvl(ACCESS_LVL,'')) within group (order by ROLE) ARR_ACCESS_LVL
,array_agg(nvl(ITERATION,'')) within group (order by ROLE) ARR_ITERATION
from T1
group by EMPID
)
select EMPID
,ARR_ROLE[0]::string as ROLE1
,ARR_ROLE[1]::string as ROLE2
,ARR_ROLE[2]::string as ROLE3
,ARR_ACCESS[0]::string as ACCESS1
,ARR_ACCESS[1]::string as ACCESS2
,ARR_ACCESS[2]::string as ACCESS3
,ARR_ACCESS_LVL[0]::string as ACCESS_LVL1
,ARR_ACCESS_LVL[1]::string as ACCESS_LVL2
,ARR_ACCESS_LVL[2]::string as ACCESS_LVL3
,ARR_ITERATION[0]::string as ITERATION1
,ARR_ITERATION[1]::string as ITERATION2
,ARR_ITERATION[2]::string as ITERATION3
from X
;
There's nothing in particular in the question to sort the rows on so that ROLE1, ROLE2, ROLE3, etc. are deterministic. I simply sorted on the name of the role, but it could be any ORDER BY within that group.
Here's a stored proc that will produce a table result with a dynamic set of columns based on the input string and specified delimiter.
If you are looking for a way to generate dynamic column names based on values, I recommend visiting Felipe Hoffa's blog entry here:
https://medium.com/snowflake/dynamic-pivots-in-sql-with-snowflake-c763933987c
create or replace procedure pivot_dyn_results(input string, delimiter string)
returns table ()
language SQL
AS
declare
max_count integer default 0;
lcount integer default 0;
rs resultset;
stmt1 string;
stmt2 string;
begin
-- Get number of delimiter separated values (assumes no leading or trailing delimiter)
select regexp_count(:input, '\\'||:delimiter, 1) into :max_count from dual;
-- Generate the initial row-based result set of parsed values
stmt1 := 'SELECT * from lateral split_to_table(?,?)';
-- Build dynamic query to produce the pivoted column based results
stmt2 := 'select * from (select * from table(result_scan(last_query_id(-1)))) pivot(max(value) for index in (';
-- initialize loop counter for resulting columns
lcount := 1;
stmt2 := stmt2 || '\'' || lcount || '\'';
-- append pivot statement for each column to be represented
FOR l in 1 to max_count do
lcount := lcount + 1;
stmt2 := stmt2 || ',\'' || lcount || '\'';
END FOR;
-- close out the pivot statement
stmt2 := stmt2 || '))';
-- execute the parse statement, then the pivot statement
EXECUTE IMMEDIATE :stmt1 using (input, delimiter);
rs := (EXECUTE IMMEDIATE :stmt2);
return table(rs);
end;
Invocation:
call pivot_dyn_results([string],[delimiter]);
call pivot_dyn_results('Sales External?HR?Billing?Purchase Department','?');
Results:

Transform JSON array to boolean columns in PostgreSQL

I have a column that contains a JSON array of strings, which I would like to transform into boolean columns. These columns are true if the value was present in the array.
Let's say I have the following column in Postgres.
|"countries"|
---------------
["NL", "BE"]
["UK"]
I would like to transform this into boolean columns per market. e.g.
|"BE"|"NL"|"UK"|
--------------------
|True|True|False|
|False|False|True|
I know I can manually expand it using case statements for each country code, but there are 200+ countries.
Is there a more elegant solution?
Displaying a variable list of columns whose labels are known only at runtime is not so obvious with Postgres. You need some dynamic SQL.
Here is a fully dynamic solution whose result is close to your expected result and which relies on the creation of a user-defined composite type and on the standard functions jsonb_populate_record and jsonb_object_agg.
First you create the list of countries as a new composite type:
CREATE TYPE country_list AS () ;
CREATE OR REPLACE PROCEDURE country_list () LANGUAGE plpgsql AS
$$
DECLARE country_list text ;
BEGIN
SELECT string_agg(DISTINCT c.country || ' text', ',')
INTO country_list
FROM your_table
CROSS JOIN LATERAL jsonb_array_elements_text(countries) AS c(country) ;
EXECUTE 'DROP TYPE IF EXISTS country_list' ;
EXECUTE 'CREATE TYPE country_list AS (' || country_list || ')' ;
END ;
$$ ;
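With the two sample rows above, the procedure ends up executing something roughly equivalent to:
DROP TYPE IF EXISTS country_list;
CREATE TYPE country_list AS (BE text, NL text, UK text);
The unquoted identifiers fold to lower case, which is why the final column labels come out as be, nl and uk.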
Then you can call the procedure country_list() just before executing the final query:
CALL country_list();
or, even better, call the procedure country_list() from triggers whenever the list of countries may be modified:
CREATE OR REPLACE FUNCTION your_table_insert_update()
RETURNS trigger LANGUAGE plpgsql VOLATILE AS
$$
BEGIN
IF EXISTS ( SELECT 1
FROM (SELECT jsonb_object_keys(to_jsonb(a.*)) FROM (SELECT(null :: country_list).*) AS a) AS b(key)
RIGHT JOIN jsonb_array_elements_text(NEW.countries) AS c(country)
ON c.country = b.key
WHERE b.key IS NULL
)
THEN CALL country_list () ;
END IF ;
RETURN NEW ;
END ;
$$ ;
CREATE OR REPLACE TRIGGER your_table_insert_update AFTER INSERT OR UPDATE OF countries ON your_table
FOR EACH ROW EXECUTE FUNCTION your_table_insert_update() ;
CREATE OR REPLACE FUNCTION your_table_delete()
RETURNS trigger LANGUAGE plpgsql VOLATILE AS
$$
BEGIN
CALL country_list () ;
RETURN OLD ;
END ;
$$ ;
CREATE OR REPLACE TRIGGER your_table_delete AFTER DELETE ON your_table
FOR EACH ROW EXECUTE FUNCTION your_table_delete() ;
Finally, you should get the expected result with the following query, except that the column labels are lower case and NULL replaces false in the result:
SELECT (jsonb_populate_record(NULL :: country_list, jsonb_object_agg(lower(c.country), true))).*
FROM your_table AS t
CROSS JOIN LATERAL jsonb_array_elements_text(t.countries) AS c(country)
GROUP BY t
full test result in dbfiddle.
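If you only need a handful of specific countries in a given query, the jsonb ? operator already tells you whether a string is present in a JSON array, so a static version needs no dynamic SQL at all (a sketch assuming the column is jsonb; cast with ::jsonb if it is stored as json):
SELECT countries ? 'BE' AS "BE",
       countries ? 'NL' AS "NL",
       countries ? 'UK' AS "UK"
FROM your_table;
Unlike the dynamic version, this returns false rather than NULL for missing countries.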

How to use stored procedure with table-valued parameters to compare two input tables

I am trying to write a stored procedure to compute the differences between two input tables.
The stored procedure calculates the differences between two tables (both tables have the same predefined structure): when comparing table 1 to table 2, it reports the records that were added, removed or updated.
Example:
table 1 (new) has 3 records: A, B and C
table 2 has 3 records: B', C and D
B' denotes a change to one or multiple fields within the record B
The output of this stored procedure call will be
A-addition
B-update
D-Removal
I have written a query to compute the difference between two tables, but I am finding it hard to translate it into a stored procedure.
Table structure:
X varchar (10)
Y int
Z datetime
SELECT
table1.*, ChangeType = 'Addition'
FROM
table1
WHERE
NOT EXISTS (SELECT *
FROM table2
WHERE table1.x = table2.x)
UNION ALL
SELECT
table2.*, ChangeType = 'Removal'
FROM
table2
WHERE
NOT EXISTS (SELECT *
FROM table1
WHERE table1.x = table2.x)
UNION ALL
SELECT
table1.*, ChangeType = 'Update'
FROM
table2
INNER JOIN
table1 ON table1.x = table2.x
WHERE
table1.Y <> table2.Y OR table1.Z <> table2.Z
Please also include the stored procedure execution script.
I think you are looking for the MERGE statement. You can put table1 as the target and table2 as the source, match them on certain values, and decide what to do when they match or not: https://msdn.microsoft.com/en-us/library/bb510625.aspx
In your case it would be something like:
MERGE table1 AS target
USING table2 AS source (x, y, z)
ON (target.x= source.x)
WHEN MATCHED THEN
--do something
WHEN NOT MATCHED BY TARGET THEN
--do something different
WHEN NOT MATCHED BY SOURCE THEN
--something else
As for how to receive a table as a parameter in a stored procedure, follow these steps:
Create a table TYPE:
CREATE TYPE tableExample AS TABLE (X varchar (10),
Y int,
Z datetime)
Pass it to the SP (table-valued parameters must be declared READONLY):
CREATE PROC sp_mysp @table1 tableExample READONLY, @table2 tableExample READONLY
AS ...
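Since the question also asks for an execution script, here is a minimal sketch (with illustrative data) of declaring two variables of that type, filling them and calling the procedure:
DECLARE @t1 tableExample, @t2 tableExample;

INSERT INTO @t1 (X, Y, Z) VALUES ('A', 1, '2017-01-01'), ('B', 2, '2017-01-02'), ('C', 3, '2017-01-03');
INSERT INTO @t2 (X, Y, Z) VALUES ('B', 5, '2017-01-02'), ('C', 3, '2017-01-03'), ('D', 4, '2017-01-04');

EXEC sp_mysp @table1 = @t1, @table2 = @t2;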
I prefer a single pass, using a case statement to classify the action.
CREATE PROCEDURE CompareTables
AS
BEGIN
SELECT ChangeType = CASE
WHEN table2.x IS NULL THEN
'Addition'
WHEN table1.x IS NULL THEN
'Removal'
WHEN table1.Y <> table2.Y
OR table1.Z <> table2.Z THEN
'Update'
ELSE
'No Change'
END,
table1.*,
table2.*
FROM table2
FULL OUTER JOIN table1
ON table1.x = table2.x
WHERE table2.x IS NULL
OR table1.x IS NULL
OR NOT ( table1.Y = table2.Y
AND table1.Z = table2.Z
);
END;
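For reference, here is a hedged sketch combining the two answers: the single-pass FULL OUTER JOIN comparison, but reading from two table-valued parameters of the tableExample type defined above (the procedure name is illustrative, not from the question):
CREATE PROCEDURE CompareTableParams
    @table1 tableExample READONLY,
    @table2 tableExample READONLY
AS
BEGIN
    SELECT ChangeType = CASE
                            WHEN t2.X IS NULL THEN 'Addition'
                            WHEN t1.X IS NULL THEN 'Removal'
                            ELSE 'Update'
                        END,
           COALESCE(t1.X, t2.X) AS X
    FROM @table1 AS t1
    FULL OUTER JOIN @table2 AS t2 ON t1.X = t2.X
    WHERE t2.X IS NULL
       OR t1.X IS NULL
       OR t1.Y <> t2.Y
       OR t1.Z <> t2.Z;
END;
Call it the same way as above: EXEC CompareTableParams @table1 = @t1, @table2 = @t2;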

Stored procedure syntax with IN condition

(1)
=>CREATE TABLE T1(id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE
(2)
=>INSERT INTO T1
(name) VALUES
('Robert'),
('Simone');
INSERT 0 2
(3)
SELECT * FROM T1;
id | name
----+--------
1 | Robert
2 | Simone
(2 rows)
(4)
CREATE OR REPLACE FUNCTION test_me(id_list BIGINT[])
RETURNS BOOLEAN AS
$$
BEGIN
PERFORM * FROM T1 WHERE id IN ($1);
IF FOUND THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
$$
LANGUAGE 'plpgsql';
CREATE FUNCTION
My problem is when calling the procedure. I'm not able to find an example showing how to pass a list of values of type BIGINT (or integer, for that matter).
I tried what follows without success (syntax errors):
First syntax:
eway=> SELECT * FROM test_me('{1,2}'::BIGINT[]);
ERROR: operator does not exist: bigint = bigint[]
LINE 1: SELECT * FROM T1 WHERE id IN ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
QUERY: SELECT * FROM T1 WHERE id IN ($1)
CONTEXT: PL/pgSQL function test_me(bigint[]) line 3 at PERFORM
Second syntax:
eway=> SELECT * FROM test_me('{1,2}');
ERROR: operator does not exist: bigint = bigint[]
LINE 1: SELECT * FROM T1 WHERE id IN ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
QUERY: SELECT * FROM T1 WHERE id IN ($1)
CONTEXT: PL/pgSQL function test_me(bigint[]) line 3 at PERFORM
Third syntax:
eway=> SELECT * FROM test_me(ARRAY [1,2]);
ERROR: operator does not exist: bigint = bigint[]
LINE 1: SELECT * FROM T1 WHERE id IN ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
QUERY: SELECT * FROM T1 WHERE id IN ($1)
CONTEXT: PL/pgSQL function test_me(bigint[]) line 3 at PERFORM
Any clues about a working syntax?
It's like the parser was trying to translate a BIGINT to BIGINT[] in the PERFORM statement, but it doesn't make any sense to me...
All your syntax variants to pass an array are correct.
Pass array literal to PostgreSQL function
The problem is with the expression inside the function. You can test with the ANY construct like @Mureinik provided, or a number of other syntax variants. In any case, run the test with an EXISTS expression:
CREATE OR REPLACE FUNCTION test_me(id_list bigint[])
RETURNS bool AS
$func$
BEGIN
IF EXISTS (SELECT 1 FROM t1 WHERE id = ANY ($1)) THEN
RETURN true;
ELSE
RETURN false;
END IF;
END
$func$ LANGUAGE plpgsql STABLE;
Notes
EXISTS is shortest and most efficient:
PL/pgSQL checking if a row exists - SELECT INTO boolean
The ANY construct applied to arrays is only efficient with small arrays. For longer arrays, other syntax variants are faster. Like:
IF EXISTS (SELECT 1 FROM unnest($1) id JOIN t1 USING (id)) THEN ...
How to do WHERE x IN (val1, val2,…) in plpgsql
Don't quote the language name, it's an identifier, not a string: LANGUAGE plpgsql
Simple variant
While you are returning a boolean value, it can be even simpler. It's probably just for the demo, but as a proof of concept:
CREATE OR REPLACE FUNCTION test_me(id_list bigint[])
RETURNS bool AS
$func$
SELECT EXISTS (SELECT 1 FROM t1 WHERE id = ANY ($1))
$func$ LANGUAGE sql STABLE;
Same result.
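With either version, all of the original call syntaxes then work:
SELECT * FROM test_me('{1,2}'::bigint[]);
SELECT * FROM test_me('{1,2}');
SELECT * FROM test_me(ARRAY[1,2]);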
The easiest way to check if an item is in an array is with = ANY:
CREATE OR REPLACE FUNCTION test_me(id_list BIGINT[])
RETURNS BOOLEAN AS
$$
BEGIN
PERFORM * FROM T1 WHERE id = ANY ($1);
IF FOUND THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
$$
LANGUAGE 'plpgsql';
