Create View from Pivot - snowflake-cloud-data-platform

Is there a way to create a view in Snowflake that holds the pivoted data produced by the dynamic pivot technique? I am able to run the code and it works well, but I would like to use the result as a view. The pivot column names change, which is why I like the dynamic pivot technique.
select name pivot_column
, date_trunc(quarter, month) month
, sum(month_views) pivot_value
from hero_views
group by 1,2;
call pivot_prev_results();
select *
from table(result_scan(last_query_id(-2)));
create or replace procedure pivot_prev_results()
returns string
language javascript
execute as caller as
$$
var cols_query = `
select '\\''
|| listagg(distinct pivot_column, '\\',\\'') within group (order by pivot_column)
|| '\\''
from table(result_scan(last_query_id(-1)))`;
var stmt1 = snowflake.createStatement({sqlText: cols_query});
var results1 = stmt1.execute();
results1.next();
var col_list = results1.getColumnValue(1);
pivot_query = `
select *
from (select * from table(result_scan(last_query_id(-2))))
pivot(max(pivot_value) for pivot_column in (${col_list}))
`
var stmt2 = snowflake.createStatement({sqlText: pivot_query});
stmt2.execute();
return `select * from table(result_scan('${stmt2.getQueryId()}'));\n select * from table(result_scan(last_query_id(-2)));`;
$$;

The problem with pivots in SQL is that, for a query to work, the output columns usually need to be known at compile time.
I'm the author of that dynamic pivot code. It works this way because, by the nature of a dynamic pivot, we don't know which columns will be in the output. That is also why we can't create a view for it: a view has to know its columns when it is defined.
The native pivot asks for a static list of columns for that reason: with a static list, the columns are known at compile time.
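To illustrate the point about static column lists, here is a minimal sketch of a JavaScript helper that generates a CREATE VIEW statement with the pivot values written out explicitly. The helper name buildPivotViewSql, the view name hero_views_pivot and the sample values are assumptions made for illustration; only the inner query comes from the question. A view created this way has a fixed column list, so it would have to be regenerated whenever new pivot values appear.

```javascript
// Hypothetical helper (not part of the original answer): builds a CREATE VIEW
// statement whose pivot column list is fixed at generation time.
function buildPivotViewSql(viewName, pivotValues) {
  // Double any single quote so each value is a valid SQL string literal.
  const inList = pivotValues
    .map((v) => "'" + String(v).replace(/'/g, "''") + "'")
    .join(", ");
  return [
    `create or replace view ${viewName} as`,
    "select *",
    "from (select name pivot_column, date_trunc(quarter, month) month, sum(month_views) pivot_value",
    "      from hero_views group by 1, 2)",
    `pivot(max(pivot_value) for pivot_column in (${inList}))`,
  ].join("\n");
}

// The view name and values below are made up for the example.
const pivotViewSql = buildPivotViewSql("hero_views_pivot", ["Batman", "O'Hara"]);
console.log(pivotViewSql);
```

Inside a stored procedure the generated text could be passed to snowflake.createStatement({sqlText: ...}).execute(), the same call the procedure in the question already uses.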

Related

Stored procedure handling multiple SQL statements in Snowflake

I'm creating a stored procedure in Snowflake that will eventually be called by a task.
However I'm getting the following error:
Multiple SQL statements in a single API call are not supported; use one API call per statement instead
And I'm not sure how to approach the advised solution within my JavaScript implementation.
Here's what I have
CREATE OR REPLACE PROCEDURE myStoreProcName()
RETURNS VARCHAR
LANGUAGE javascript
AS
$$
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
with cte1 as (
SELECT
*
FROM Table1
where date = $curr_date
)
,cte2 as (
SELECT
*
FROM Table2
where date = $curr_date
)
select * from
cte1 as 1
inner join cte2 as 2
on(1.key = 2.key)
`
} );
return 'Done.';
$$;

You could write your own helper function (idea from user waldente):
this.executeMany = (s) => s.split(';').map(sqlText => snowflake.createStatement({sqlText}).execute());
executeMany(`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE ...`);
The last statement should not end with ;, because splitting would then produce an empty trailing statement. The helper may also fail if a ; appears inside one of the statements where it was not intended as a separator.
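The two caveats above (a trailing ; and a ; inside a statement) can be softened with a slightly more careful splitter. This is only a sketch: splitStatements is a hypothetical name, and it only understands single-quoted string literals, not comments, double-quoted identifiers or $$ blocks.

```javascript
// Hypothetical helper: split a script on semicolons that sit outside
// single-quoted string literals, dropping empty statements.
function splitStatements(script) {
  const statements = [];
  let current = "";
  let inString = false;
  for (const ch of script) {
    if (ch === "'") {
      // A doubled quote ('') toggles twice, so the state ends up unchanged.
      inString = !inString;
    }
    if (ch === ";" && !inString) {
      if (current.trim() !== "") statements.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  // Keep a final statement that has no terminating semicolon.
  if (current.trim() !== "") statements.push(current.trim());
  return statements;
}

console.log(splitStatements("set curr_date = '2015-01-01';\nselect ';' as sep;\n"));
// Inside a stored procedure each piece would then be run on its own, e.g.:
// splitStatements(script).forEach(sqlText => snowflake.createStatement({sqlText}).execute());
```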

You can't have:
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
...
`
Instead you need to call execute twice (or more), once for each statement.

Combine multiple tables into one in Snowflake

Let's say I have the following monthly tables with table names formatted such that the number after the underscore refers to the month. What I want to do is to combine these 12 tables into one without having to write 10-30 insert/union all statements
table_1
table_2
table_3
table_4
table_5
table_6
table_7
table_8
table_9
table_10
table_11
table_12 -- (only 12 in this instance but could be as many as 36)
My current approach is to first create the master table with data from table_1.
create temporary table master_table_1_12 as
select * -- * to keep it simple for this example
from table_1;
Then use variables such that I can simply keep hitting the run button until it errors out with "table_13 does not exist"
set month_id=(select max(month_id) from master_table_1_12) + 1;
set table_name=concat('table_',$month_id);
insert into master_table_1_12
select *
from identifier($table_name);
Note: All monthly tables have a month_id column
Sure, it saves some space on the console (compared to multiple inserts), but I still have to run it 12 times. Are Snowflake Tasks something I could use for this? I couldn't find a fitting example in their documentation, but if anyone has had success with that or with a JavaScript-based stored procedure for a problem like this, please enlighten me.

Here's a stored procedure that will insert into master_table_1_12 from selects on table_1 through table_12. Modify as required:
create or replace procedure FILL_MASTER_TABLE()
returns string
language javascript
as
$$
var rows = 0;
for (var i=1; i<=12; i++) {
rows += insertRows(i);
}
return rows + " rows inserted into master_table_1_12.";
// End of main function
function insertRows(i) {
// Build the insert for one monthly table, e.g. table_7
var sql =
`insert into master_table_1_12
select *
from table_${i};`;
return doInsert(sql);
}
function doInsert(queryString) {
var stmt = snowflake.createStatement({sqlText: queryString});
var rs = stmt.execute();
rs.next();
// An insert returns one row whose first column is the number of rows inserted
return rs.getColumnValue(1);
}
$$;
call fill_master_table();
By the way, if you don't have any processing to do and just need to consolidate the tables, you can do something like this:
insert into master_table_1_12
select * from table_1
union all
select * from table_2
union all
select * from table_3
union all
select * from table_4
union all
select * from table_5
union all
select * from table_6
union all
select * from table_7
union all
select * from table_8
union all
select * from table_9
union all
select * from table_10
union all
select * from table_11
union all
select * from table_12
;

Could you create a view on top of these 12 tables instead? The view would be a union of all these tables.
Based on the comments below, I have elaborated my answer; please try this approach. It should perform better when the data is large, since keeping it partitioned across tables helps performance. This is based on real experience.
CREATE TABLE SALES_2000 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2001 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2002 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2003 (REGION VARCHAR, UNITS_SOLD NUMBER);
INSERT INTO SALES_2000 VALUES('ASIA', 25);
INSERT INTO SALES_2001 VALUES('ASIA', 50);
INSERT INTO SALES_2002 VALUES('ASIA', 55);
INSERT INTO SALES_2003 VALUES('ASIA', 65);
CREATE VIEW ALL_SALES AS
SELECT * FROM SALES_2000
UNION
SELECT * FROM SALES_2001
UNION
SELECT * FROM SALES_2002
UNION
SELECT * FROM SALES_2003;
SELECT * FROM ALL_SALES WHERE UNITS_SOLD = 25;

I ended up creating a UDF that spits out a create view statement and a stored procedure that executes it to create a temporary view. I work with tables following specific naming convention, so you might have to tweak this solution a little for your use case. The separation of UDF and stored proc actually helps with that as you'd mostly need to tweak the SQL UDF. I am sharing a simplified version of what I actually have in the interest of keeping it representative of the tables I listed in my question.
SQL UDF FOR GENERATING A CREATE VIEW STATEMENT
create or replace function sandbox.public.define_view(table_pattern varchar, start_month varchar, end_month varchar)
returns table ("" varchar) as
$$
with cte1(month_id) as
(select start_month::int + row_number() over (order by 1) - 1
from table(generator(rowcount=> end_month::int - start_month::int + 1)))
,cte2(month_id,statement) as
(select 0,
concat('create or replace temporary view master_',
split_part(table_pattern,'.',-1),
start_month,
'_',
end_month,
' as ')
union all
select month_id,
concat('select * from ',
table_pattern,
month_id,
case when month_id=end_month::int then ';' else ' union all ' end)
from cte1)
select listagg(statement, '\n') within group (order by month_id) as create_view_statement
from cte2
$$;
PROCEDURE FOR EXECUTING THE OUTPUT OF THE UDF ABOVE
create or replace procedure sandbox.public.create_view(TABLE_PATTERN varchar, START_MONTH varchar,END_MONTH varchar)
returns varchar not null
language Javascript
execute as caller
as
$$
sql_command = 'select * from table(sandbox.public.define_view(:1, :2, :3))';
var stmt = snowflake.createStatement({sqlText: sql_command ,binds: [TABLE_PATTERN, START_MONTH, END_MONTH]}).execute();
stmt.next();
var ddl = stmt.getColumnValue(1);
var run=snowflake.createStatement({sqlText: ddl}).execute();
run.next();
var message=run.getColumnValue(1);
return "Temporary " + message;
$$;
USAGE DEMO
set table_pattern ='sandbox.public.table_';
set start_month ='1';
set end_month = '12';
set master_view='master_'||split_part($table_pattern,'.',-1)||$start_month||'_'||$end_month;
call create_view($table_pattern, $start_month, $end_month);
select top 100 *
from identifier($master_view);

SQL Server to Oracle - using Cross Apply with Oracle

I have a function that takes primary keys and separates them with commas.
Oracle function:
create or replace function split(
list in CHAR,
delimiter in CHAR default ','
)
return split_tbl as
splitted split_tbl := split_tbl();
i pls_integer := 0;
list_ varchar2(32767) := list;
begin
loop
i := instr(list_, delimiter);
if i > 0 then
splitted.extend(1);
splitted(splitted.last) := substr(list_, 1, i - 1);
list_ := substr(list_, i + length(delimiter));
else
splitted.extend(1);
splitted(splitted.last) := list_;
return splitted;
end if;
end loop;
end;
and I have this query in SQL Server that uses the table returned by the function:
select maxUserSalary.id as 'UserSalary'
into #usersalary
from dbo.Split(#usersalary,';') as userid
cross apply (
select top 1 * from User_Salary as usersalary
where usersalary.User_Id= userid.item
order by usersalary.Date desc
) as maxUserSalary
The problem is, I'm not able to use cross apply in Oracle to throw this data into this function that is returning a table.
How can I use cross apply with Oracle to return this data in function?

You're using Oracle 18c, so you can use the CROSS APPLY syntax. Oracle added it (as well as LATERAL and OUTER APPLY) in 12c.
Here is a simplified version of your logic:
select us.name
, us.salary
from table(split('FOX IN SOCKS,THING ONE,THING TWO')) t
cross apply (select us.name, max(us.salary) as salary
from user_salaries us
where us.name = t.column_value
group by us.name ) us
There is a working demo on db<>fiddle.
If this doesn't completely solve your problem please post a complete question with table structures, sample data and expected output derived from that sample.

I think APC answered your direct question well. As a side note, I wanted to suggest NOT writing your own function to do this at all. There are several existing solutions to split delimited string values into virtual tables that don't require you to create your own custom types, and don't have the performance overhead of context switching between the SQL and PL/SQL engines.
-- example data - remove this to test with your User_Salary table
with User_Salary as (select 1 as id, 'A' as user_id, sysdate as "Date" from dual
union select 2, 'B', sysdate from dual)
-- your query:
select maxUserSalary.id as "UserSalary"
from (select trim(COLUMN_VALUE) as item
from xmltable(('"'||replace(:usersalary, ';', '","')||'"'))) userid -- note ';' delimiter
cross apply (
select * from User_Salary usersalary
where usersalary.User_Id = userid.item
order by usersalary."Date" desc
fetch first 1 row only
) maxUserSalary;
If you run this and pass in 'A;B;C' for :usersalary, you'll get 1 and 2 back.
A few notes:
In this example, I'm using ; as the delimiter, since that's what your query used.
I tried to match your table/column names, but your column name Date is invalid - it's an Oracle reserved keyword, so it has to be put in quotes to be a valid column name.
As a column identifier, "UserSalary" should also have double quotes, not single.
You can't use as in table aliases.
I removed into usersalary, since into is only used with queries which return a single row, and your query can return multiple rows.

UNION ALL on all tables starting with a certain string

I would like to combine tables starting with the same name into one table.
For example let's say I have a database with tables 'EXT_ABVD', 'EXT_ADAD','EXT_AVSA','OTHER', and I want to combine all tables beginning with 'EXT_', I would want the result of
select col1 ,col2 from EXT_ABVD
union all
select col1 ,col2 from EXT_ADAD
union all
select col1 ,col2 from EXT_AVSA;
I would like to do this on a regular basis (daily for example), and every time this runs there may be new tables starting with 'EXT_'. I don't want to update the union_all query manually.
I am new to Snowflake and don't know how to do that. Can I use a script inside Snowflake?

Given these tables:
CREATE TABLE TEST_DB.PUBLIC.EXT_ABVD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAQ (col1 INTEGER, col2 INTEGER);
A view like this could be dynamically created:
CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS
SELECT * FROM TEST_DB.PUBLIC.EXT_ABVD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAQ
Using this Procedure:
create or replace procedure TEST_DB.PUBLIC.CREATE_UNION_VIEW(TBL_PREFIX VARCHAR)
returns VARCHAR -- return final create statement
language javascript
as
$$
// build query to get tables from information_schema
var get_tables_stmt = "SELECT Table_Name FROM TEST_DB.INFORMATION_SCHEMA.TABLES \
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_NAME LIKE '"+ TBL_PREFIX + "%';"
var get_tables_stmt = snowflake.createStatement({sqlText:get_tables_stmt });
// get result set containing all table names
var tables = get_tables_stmt.execute();
// to control if UNION ALL should be added or not
// this could likely be handled more elegantly but i don't know JavaScript :)
var row_count = get_tables_stmt.getRowCount();
var rows_iterated = 0;
// define view name
var create_statement = "CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS \n";
// loop over result set to build statement
while (tables.next()) {
rows_iterated += 1;
// we get values from the first (and only) column in the result set
var table_name = tables.getColumnValue(1);
// this will obviously fail if the column count doesnt match
create_statement += "SELECT * FROM TEST_DB.PUBLIC." + table_name
// add union all to all but last row
if (rows_iterated < row_count){
create_statement += "\n UNION ALL \n"
}
}
// create the view
var create_statement = snowflake.createStatement( {sqlText: create_statement} );
create_statement.execute();
// return the create statement as text
return create_statement.getSqlText();
$$
;
Which we would call like this: CALL CREATE_UNION_VIEW('EXT_A');
This is just a basic example so logic for column counts, schemas etc. likely needs to be added. But given this I think you will be able to figure out how to deal with result sets, parameters and statements.
Edit: See here for how to set up a task that would run a procedure on daily basis. The most basic would in this case look like this:
create or replace task create_union_task
warehouse = COMPUTE_WH
schedule = '1440 minute' -- once every day
as
CALL CREATE_UNION_VIEW('EXT_A');
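The row counter in the procedure above is only there to decide where UNION ALL goes. As a sketch of one alternative, the statement text can be assembled by joining an array of SELECTs, which also makes it easy to stop early when no table matches the prefix. The function name buildUnionViewSql is hypothetical, and the table names are passed in as a plain array rather than read from a Snowflake result set.

```javascript
// Hypothetical helper: builds the CREATE VIEW text from a list of table names.
function buildUnionViewSql(viewName, tableNames) {
  if (tableNames.length === 0) {
    // Without this guard the statement would end right after "AS".
    throw new Error("no tables matched the prefix");
  }
  const selects = tableNames.map((t) => `SELECT * FROM ${t}`);
  // join() puts UNION ALL between the SELECTs only, never after the last one.
  return `CREATE OR REPLACE VIEW ${viewName} AS\n` + selects.join("\nUNION ALL\n");
}

console.log(
  buildUnionViewSql("TEST_DB.PUBLIC.union_view", [
    "TEST_DB.PUBLIC.EXT_ABVD",
    "TEST_DB.PUBLIC.EXT_ADAD",
    "TEST_DB.PUBLIC.EXT_ADAQ",
  ])
);
```

Within the procedure, the array would be filled inside the while (tables.next()) loop with tables.getColumnValue(1), and the returned text passed to snowflake.createStatement as before.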

The only way you can achieve this currently is via a Snowflake Stored Procedure.
You don't specify how you want to consume the result of the query, but a convenient way is via a VIEW. So the Stored Procedure has to generate a VIEW definition containing the query in your question.

PostgreSQL - join statement duplicate row data combine to single row [duplicate]

I am looking for a way to concatenate the strings of a field within a group by query. So for example, I have a table:
ID COMPANY_ID EMPLOYEE
1 1 Anna
2 1 Bill
3 2 Carol
4 2 Dave
and I wanted to group by company_id to get something like:
COMPANY_ID EMPLOYEE
1 Anna, Bill
2 Carol, Dave
There is a built-in function in MySQL to do this: group_concat.

PostgreSQL 9.0 or later:
Modern Postgres (since 2010) has the string_agg(expression, delimiter) function which will do exactly what the asker was looking for:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;
Postgres 9 also added the ability to specify an ORDER BY clause in any aggregate expression; otherwise you have to order all your results or deal with an undefined order. So you can now write:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee)
FROM mytable
GROUP BY company_id;
PostgreSQL 8.4.x:
PostgreSQL 8.4 (in 2009) introduced the aggregate function array_agg(expression) which collects the values in an array. Then array_to_string() can be used to give the desired result:
SELECT company_id, array_to_string(array_agg(employee), ', ')
FROM mytable
GROUP BY company_id;
PostgreSQL 8.3.x and older:
When this question was originally posed, there was no built-in aggregate function to concatenate strings. The simplest custom implementation (suggested by Vajda Gabo in this mailing list post, among many others) is to use the built-in textcat function (which lies behind the || operator):
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
Here is the CREATE AGGREGATE documentation.
This simply glues all the strings together, with no separator. In order to get a ", " inserted in between them without having it at the end, you might want to make your own concatenation function and substitute it for the "textcat" above. Here is one I put together and tested on 8.3.12:
CREATE FUNCTION commacat(acc text, instr text) RETURNS text AS $$
BEGIN
IF acc IS NULL OR acc = '' THEN
RETURN instr;
ELSE
RETURN acc || ', ' || instr;
END IF;
END;
$$ LANGUAGE plpgsql;
This version will output a comma even if the value in the row is null or empty, so you get output like this:
a, b, c, , e, , g
If you would prefer to remove extra commas to output this:
a, b, c, e, g
Then add an ELSIF check to the function like this:
CREATE FUNCTION commacat_ignore_nulls(acc text, instr text) RETURNS text AS $$
BEGIN
IF acc IS NULL OR acc = '' THEN
RETURN instr;
ELSIF instr IS NULL OR instr = '' THEN
RETURN acc;
ELSE
RETURN acc || ', ' || instr;
END IF;
END;
$$ LANGUAGE plpgsql;
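To see how the two state functions behave when an aggregate calls them row by row, here is a small JavaScript sketch that folds the same logic over a list with reduce. It only models empty strings. In PostgreSQL, concatenating NULL with || yields NULL, which this sketch does not reproduce. The function names mirror the ones above but are otherwise just illustration.

```javascript
// Same branching as commacat: start with the first value, then append ", value".
function commacat(acc, instr) {
  if (acc === null || acc === "") return instr;
  return acc + ", " + instr;
}

// Same branching as commacat_ignore_nulls: skip empty values entirely.
function commacatIgnoreEmpty(acc, instr) {
  if (acc === null || acc === "") return instr;
  if (instr === null || instr === "") return acc;
  return acc + ", " + instr;
}

// reduce plays the role of the aggregate; "" corresponds to initcond = ''.
const values = ["a", "b", "c", "", "e", "", "g"];
console.log(values.reduce(commacat, ""));            // a, b, c, , e, , g
console.log(values.reduce(commacatIgnoreEmpty, "")); // a, b, c, e, g
```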

How about using Postgres built-in array functions? At least on 8.4 this works out of the box:
SELECT company_id, array_to_string(array_agg(employee), ',')
FROM mytable
GROUP BY company_id;

As of PostgreSQL 9.0 you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;

I claim no credit for the answer because I found it after some searching:
What I didn't know is that PostgreSQL allows you to define your own aggregate functions with CREATE AGGREGATE
This post on the PostgreSQL list shows how trivial it is to create a function to do what's required:
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
SELECT company_id, textcat_all(employee || ', ')
FROM mytable
GROUP BY company_id;

As already mentioned, creating your own aggregate function is the right thing to do. Here is my concatenation aggregate function (you can find details in French):
CREATE OR REPLACE FUNCTION concat2(text, text) RETURNS text AS $$
SELECT CASE WHEN $1 IS NULL OR $1 = '' THEN $2
WHEN $2 IS NULL OR $2 = '' THEN $1
ELSE $1 || ' / ' || $2
END;
$$
LANGUAGE SQL;
CREATE AGGREGATE concatenate (
sfunc = concat2,
basetype = text,
stype = text,
initcond = ''
);
And then use it as:
SELECT company_id, concatenate(employee) AS employees FROM ...

This latest announcement list snippet might be of interest if you'll be upgrading to 8.4:
Until 8.4 comes out with a super-efficient native one, you can add the array_accum() function in the PostgreSQL documentation for rolling up any column into an array, which can then be used by application code, or combined with array_to_string() to format it as a list:
http://www.postgresql.org/docs/current/static/xaggr.html
I'd link to the 8.4 development docs but they don't seem to list this feature yet.

Following up on Kev's answer, using the Postgres docs:
First, create an array of the elements, then use the built-in array_to_string function.
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
select array_to_string(array_accum(name),'|') from table group by id;

Following yet again on the use of a custom aggregate function of string concatenation: you need to remember that the select statement will place rows in any order, so you will need to do a sub select in the from statement with an order by clause, and then an outer select with a group by clause to aggregate the strings, thus:
SELECT custom_aggregate(MY.special_strings)
FROM (SELECT special_strings, grouping_column
FROM a_table
ORDER BY ordering_column) MY
GROUP BY MY.grouping_column

Use STRING_AGG function for PostgreSQL and Google BigQuery SQL:
SELECT company_id, STRING_AGG(employee, ', ')
FROM employees
GROUP BY company_id;

I found this PostgreSQL documentation helpful: http://www.postgresql.org/docs/8.0/interactive/functions-conditional.html.
In my case, I sought plain SQL to concatenate a field with brackets around it, if the field is not empty.
select itemid,
CASE
itemdescription WHEN '' THEN itemname
ELSE itemname || ' (' || itemdescription || ')'
END
from items;

If you are on Amazon Redshift, where string_agg is not supported, try using listagg.
SELECT company_id, listagg(EMPLOYEE, ', ') as employees
FROM EMPLOYEE_table
GROUP BY company_id;

With PostgreSQL 9.0 and above you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable GROUP BY company_id;

You can also use the format function, which implicitly takes care of type conversion of text, int, etc.
create or replace function concat_return_row_count(tbl_name text, column_name text, value int)
returns integer as $row_count$
declare
total integer;
begin
EXECUTE format('select count(*) from %s WHERE %s = %s', tbl_name, column_name, value) INTO total;
return total;
end;
$row_count$ language plpgsql;
postgres=# select concat_return_row_count('tbl_name','column_name',2); --2 is the value

I'm using Jetbrains Rider and it was a hassle copying the results from above examples to re-execute because it seemed to wrap it all in JSON. This joins them into a single statement that was easier to run
select string_agg('drop table if exists "' || tablename || '" cascade', ';')
from pg_tables where schemaname != $$pg_catalog$$ and tableName like $$rm_%$$
