UNION ALL on all tables starting with a certain string - snowflake-cloud-data-platform

I would like to combine tables starting with the same name into one table.
For example let's say I have a database with tables 'EXT_ABVD', 'EXT_ADAD','EXT_AVSA','OTHER', and I want to combine all tables beginning with 'EXT_', I would want the result of
select col1 ,col2 from EXT_ABVD
union all
select col1 ,col2 from EXT_ADAD
union all
select col1 ,col2 from EXT_AVSA;
I would like to do this on a regular basis (daily for example), and every time this runs there may be new tables starting with 'EXT_'. I don't want to update the union_all query manually.
I am new to Snowflake and don't know how I can do that. Can I use a script inside Snowflake?

Given these tables:
CREATE TABLE TEST_DB.PUBLIC.EXT_ABVD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAD (col1 INTEGER, col2 INTEGER);
CREATE TABLE TEST_DB.PUBLIC.EXT_ADAQ (col1 INTEGER, col2 INTEGER);
A view like this could be dynamically created:
CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS
SELECT * FROM TEST_DB.PUBLIC.EXT_ABVD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAD
UNION ALL
SELECT * FROM TEST_DB.PUBLIC.EXT_ADAQ
Using this Procedure:
create or replace procedure TEST_DB.PUBLIC.CREATE_UNION_VIEW(TBL_PREFIX VARCHAR)
returns VARCHAR -- return final create statement
language javascript
as
$$
// build query to get tables from information_schema
var get_tables_sql = "SELECT Table_Name FROM TEST_DB.INFORMATION_SCHEMA.TABLES \
WHERE TABLE_TYPE = 'BASE TABLE' AND TABLE_NAME LIKE '" + TBL_PREFIX + "%';"
var get_tables_stmt = snowflake.createStatement({sqlText: get_tables_sql});
// get result set containing all table names
var tables = get_tables_stmt.execute();
// to control if UNION ALL should be added or not
// this could likely be handled more elegantly but i don't know JavaScript :)
var row_count = get_tables_stmt.getRowCount();
var rows_iterated = 0;
// define view name
var create_statement = "CREATE OR REPLACE VIEW TEST_DB.PUBLIC.union_view AS \n";
// loop over result set to build statement
while (tables.next()) {
rows_iterated += 1;
// we get values from the first (and only) column in the result set
var table_name = tables.getColumnValue(1);
// this will obviously fail if the column count doesn't match
create_statement += "SELECT * FROM TEST_DB.PUBLIC." + table_name
// add union all to all but last row
if (rows_iterated < row_count){
create_statement += "\n UNION ALL \n"
}
}
// create the view
var create_stmt = snowflake.createStatement({sqlText: create_statement});
create_stmt.execute();
// return the create statement as text
return create_stmt.getSqlText();
$$
;
Which we would call like this: CALL CREATE_UNION_VIEW('EXT_A');
This is just a basic example, so logic for column counts, schemas, etc. likely needs to be added. But given this I think you will be able to figure out how to deal with result sets, parameters and statements.
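For instance, a quick sanity check on column counts before (re)building the view could look something like this (a sketch against the same TEST_DB and prefix; adjust to your environment):
SELECT TABLE_NAME, COUNT(*) AS COL_COUNT
FROM TEST_DB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME LIKE 'EXT_A%'
GROUP BY TABLE_NAME
ORDER BY COL_COUNT;
Any table whose COL_COUNT differs from the others would break the generated UNION ALL view.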
Edit: See here for how to set up a task that would run a procedure on a daily basis. The most basic would in this case look like this:
create or replace task create_union_task
warehouse = COMPUTE_WH
schedule = '1440 minute' -- once every day
as
CALL CREATE_UNION_VIEW('EXT_A');
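Note that a newly created task is suspended by default, so it also has to be resumed once before it starts running on its schedule:
alter task create_union_task resume;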

The only way you can achieve this currently is via a Snowflake Stored Procedure.
You don't specify how you want to consume the result of the query, but a convenient way is via a VIEW. So the Stored Procedure has to generate a VIEW definition containing the query in your question.

Related

how to insert a value to both tables using join codeigniter 4 sql server

I'm trying to insert values into both tables at the same time. I'm using a form in my application, and I use the values entered in the form to insert them into the db. But for now I'm only inserting values into one table (Users).
public function registerUser($formdata){
helper('global'); // a helper for randomString().
//Asign value to columns
$db_data['Emailaddress'] = $formdata['emailaddress'];
$db_data['Password'] = password_hash($formdata['password'], PASSWORD_DEFAULT);
$db_data['Status'] = 'Free';
$db_data['Token'] = randomString(32);
$db_data['FirstLogin'] = 0;
$db_data['Users.UsersKey'] = $db_data['UsersSettings.UsersKey'];
//insert to db
$this->db->table('Users', 'UsersSettings')->join('UsersSettings','Users.UsersKey = UsersSettings.UsersKey', 'inner')->insert($db_data);
}
public function updateUserSetting_proccess(){
$formdata = $this->request->getPostGet();
return $this->SettingsModel->update_user_settings($formdata);
}
The content of the Users table is:
SELECT TOP (1000) [UsersKey]
,[UniqueID]
,[Token]
,[ResetToken]
,[Emailaddress]
,[Password]
,[Status]
,[DateTimeAdded]
,[DateTimeLastUpdated]
,[FirstLogin]
FROM [dbo].[Users]
The UsersKey is inserted automatically because of the auto increment.
The second table I want to use is UsersSettings with content:
SELECT TOP (1000) [UsersSettingsKey]
,[UsersKey]
,[FirstName]
,[LastName]
,[Logo]
,[Organization]
,[Address]
,[Number]
,[Addition]
,[Postcode]
,[City]
,[Country]
,[Language]
,[Theme]
,[CalcPercentage]
,[CalcAdminFee]
,[ColorPrimary]
,[ColorSecondary]
,[DateTimeLastUpdated]
FROM [dbo].[UsersSettings]
I want the UsersKey in UsersSettings to have the same value as the UsersKey in Users.
I tried this:
join('UsersSettings','Users.UsersKey = UsersSettings.UsersKey', 'inner')
but it didn't help. Can someone give me some suggestions?
You'll need to perform the insert into table Users first in order to get the generated UsersKey.
An explanation of why inserting into table Users first is required may be shown with the SQL equivalent:
declare @lv_UsersKey int
-- insert into table Users (only essential parts shown)
insert into Users(....) values (...)
-- capture UsersKey for inserted record
select @lv_UsersKey = cast(SCOPE_IDENTITY() as int)
-- then insert into UsersSettings (only essential parts shown)
insert into UsersSettings (UsersKey, ....) values (@lv_UsersKey, ...)
Transferring the above SQL to codeigniter will look like this:
$this->db->table('Users')->insert($db_data);
$inserted_users_key = $this->db->insert_id();
$db_data2['UsersKey'] = $inserted_users_key;
// some more init of $db_data2 here
$this->db->table('UsersSettings')->insert($db_data2);
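Since the two inserts depend on each other, it may also be worth wrapping them in a transaction so a failed UsersSettings insert does not leave an orphaned Users row. Sticking with the SQL skeleton above (only essential parts shown):
declare @lv_UsersKey int
begin transaction
insert into Users(....) values (...)
select @lv_UsersKey = cast(SCOPE_IDENTITY() as int)
insert into UsersSettings (UsersKey, ....) values (@lv_UsersKey, ...)
commit transaction
In CodeIgniter 4, wrapping the two model calls in transStart()/transComplete() would give the same protection.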

How do I parametrize Lua script to go through table values executing queries

I'm new to Lua but trying.
I have multiple "Create table" queries which I need to execute; the only things that change are the schema and table names.
At the moment I am explicitly defining each query.
I want to parametrize the Lua script using the mapping table below, passing the table name as an argument, since there are 100+ tables that need to be generated this way.
MappingTable
targetSchema | targetTable | originSchema | originTable
-------------+-------------+--------------+------------
schema1      | table1      | schema3      | table3
schema2      | table2      | schema4      | table4
Current solution
CREATE LUA SCRIPT "ScriptName" () RETURNS ROWCOUNT AS
query([[
Create or replace table schema1.table1 as
select * from schema3.table3;
]])
query([[
Create or replace table schema2.table2 as
select * from schema4.table4;
]])
What I've tried:
CREATE OR REPLACE LUA SCRIPT "ScriptName"('MappingTable') RETURNS ROWCOUNT AS
map_table = execute[[ SELECT * FROM .."'MappingTableName'"..;]] -- passing argument of the script, mapping table name
-- passing values from the columns
load = [[Create or replace table ]]..
[[']]..targetSchema..[['.']]..
[[']]..targetTable..]]..
[[as select * from]]..
[[']]..originSchema..[['.']]..
[[']]..originTable..[[']]
I'm not sure about the syntax, and I guess I also need to loop through the values of the table.
Thank you
Here is a sample script:
create or replace lua script ScriptName (
t_MappingTable
, s_ConditionColumn
, s_ConditionValue
)
returns rowcount as
-- passing argument of the script, mapping table name
local map_table = query ([[
select * from ::MappingTable where ::ConditionColumn = :ConditionValue
]],{
MappingTable = t_MappingTable
, ConditionColumn = s_ConditionColumn
, ConditionValue = s_ConditionValue
});
-- passing values from the columns
for i = 1, #map_table do
query ([[
create or replace table ::targetSchema.::targetTable as
select * from ::originSchema.::originTable
]],{
targetSchema = map_table[i].TARGETSCHEMA
, targetTable = map_table[i].TARGETTABLE
, originSchema = map_table[i].ORIGINSCHEMA
, originTable = map_table[i].ORIGINTABLE
});
end
/
You may want to read the values from map_table differently.
If you have case-sensitive column names:
targetSchema = map_table[i]["targetSchema"]
, targetTable = map_table[i]["targetTable"]
, originSchema = map_table[i]["originSchema"]
, originTable = map_table[i]["originTable"]
If you are sure of the column order and don't want to worry about column names:
targetSchema = map_table[i][1]
, targetTable = map_table[i][2]
, originSchema = map_table[i][3]
, originTable = map_table[i][4]

Stored procedure handling multiple SQL statements in Snowflake

I'm creating a stored procedure in Snowflake that will eventually be called by a task.
However I'm getting the following error:
Multiple SQL statements in a single API call are not supported; use one API call per statement instead
And I'm not sure how to approach the advised solution within my JavaScript implementation.
Here's what I have
CREATE OR REPLACE PROCEDURE myStoreProcName()
RETURNS VARCHAR
LANGUAGE javascript
AS
$$
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
with cte1 as (
SELECT
*
FROM Table1
where date = $curr_date
)
,cte2 as (
SELECT
*
FROM Table2
where date = $curr_date
)
select * from
cte1 as 1
inner join cte2 as 2
on(1.key = 2.key)
`
} );
return 'Done.';
$$;
You could write your own helper function (idea from user waldente):
this.executeMany = (s) => s.split(';').map(sqlText => snowflake.createStatement({sqlText}).execute());
executeMany(`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE ...`);
The last statement should not end with a semicolon. It may also fail if one of the statements contains a semicolon that was not intended as a separator.
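For example, a statement like the following would be split in the middle of its string literal by the naive split on ';':
CREATE OR REPLACE TABLE t AS SELECT 'a;b' AS col;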
You can't have:
var rs = snowflake.execute( { sqlText:
`set curr_date = '2015-01-01';
CREATE OR REPLACE TABLE myTableName AS
...
`
Instead you need to call execute twice (or more), once for each statement ending in ;.

Combine multiple tables into one in Snowflake

Let's say I have the following monthly tables, with table names formatted such that the number after the underscore refers to the month. What I want to do is combine these 12 tables into one without having to write 10-30 insert/union all statements:
table_1
table_2
table_3
table_4
table_5
table_6
table_7
table_8
table_9
table_10
table_11
table_12 -- (only 12 in this instance but could be as many as 36)
My current approach is to first create the master table with data from table_1.
create temporary table master_table_1_12 as
select * -- * to keep it simple for this example
from table_1;
Then use variables such that I can simply keep hitting the run button until it errors out with "table_13 does not exist"
set month_id=(select max(month_id) from master_table_1_12) + 1;
set table_name=concat('table_',$month_id);
insert into master_table_1_12
select *
from identifier($table_name);
Note: All monthly tables have a month_id column
Sure, it saves some space on the console (compared to multiple inserts), but I still have to run it 12 times. Are Snowflake Tasks something I could use for this? I couldn't find a fitting example in their documentation to code that up, but if anyone has had success with that, or with a JavaScript-based SP for a problem like this, please enlighten me.
Here's a stored procedure that will insert into master_table_1_12 from selects on table_1 through table_12. Modify as required:
create or replace procedure FILL_MASTER_TABLE()
returns string
language javascript
as
$$
var rows = 0;
for (var i=1; i<=12; i++) {
rows += insertRows(i);
}
return rows + " rows inserted into master_table_1_12.";
// End of main function
function insertRows(i) {
sql =
`insert into master_table_1_12
select *
from table_${i};`;
return doInsert(sql);
}
function doInsert(queryString) {
var out;
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs = stmt.execute();
rs.next();
return rs.getColumnValue(1);
}
$$;
call fill_master_table();
By the way, if you don't have any processing to do and just need to consolidate the tables, you can do something like this:
insert into master_table_1_12
select * from table_1
union all
select * from table_2
union all
select * from table_3
union all
select * from table_4
union all
select * from table_5
union all
select * from table_6
union all
select * from table_7
union all
select * from table_8
union all
select * from table_9
union all
select * from table_10
union all
select * from table_11
union all
select * from table_12
;
Can you not create a view on top of these 12 tables? The view would be a union of all these tables.
Based on the comments below, I have further elaborated my answer. Please try this approach. It will provide better performance when your table is large. Partitioning it will improve performance. This is based on real experience.
CREATE TABLE SALES_2000 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2001 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2002 (REGION VARCHAR, UNITS_SOLD NUMBER);
CREATE TABLE SALES_2003 (REGION VARCHAR, UNITS_SOLD NUMBER);
INSERT INTO SALES_2000 VALUES('ASIA', 25);
INSERT INTO SALES_2001 VALUES('ASIA', 50);
INSERT INTO SALES_2002 VALUES('ASIA', 55);
INSERT INTO SALES_2003 VALUES('ASIA', 65);
CREATE VIEW ALL_SALES AS
SELECT * FROM SALES_2000
UNION
SELECT * FROM SALES_2001
UNION
SELECT * FROM SALES_2002
UNION
SELECT * FROM SALES_2003;
SELECT * FROM ALL_SALES WHERE UNITS_SOLD = 25;
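One caveat with the view above: UNION removes duplicate rows (and pays the cost of deduplication), while the question is about simply stacking the tables. A UNION ALL variant of the same view would be:
CREATE OR REPLACE VIEW ALL_SALES AS
SELECT * FROM SALES_2000
UNION ALL
SELECT * FROM SALES_2001
UNION ALL
SELECT * FROM SALES_2002
UNION ALL
SELECT * FROM SALES_2003;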
I ended up creating a UDF that spits out a create view statement and a stored procedure that executes it to create a temporary view. I work with tables following a specific naming convention, so you might have to tweak this solution a little for your use case. The separation of the UDF and the stored proc actually helps with that, as you'd mostly need to tweak the SQL UDF. I am sharing a simplified version of what I actually have, in the interest of keeping it representative of the tables I listed in my question.
SQL UDF FOR GENERATING A CREATE VIEW STATEMENT
create or replace function sandbox.public.define_view(table_pattern varchar, start_month varchar, end_month varchar)
returns table (create_view_statement varchar) as
$$
with cte1(month_id) as
(select start_month::int + row_number() over (order by 1) - 1
from table(generator(rowcount=> end_month::int - start_month::int + 1)))
,cte2(month_id,statement) as
(select 0,
concat('create or replace temporary view master_',
split_part(table_pattern,'.',-1),
start_month,
'_',
end_month,
' as ')
union all
select month_id,
concat('select * from ',
table_pattern,
month_id,
case when month_id=end_month::int then ';' else ' union all ' end)
from cte1)
select listagg(statement, '\n') within group (order by month_id) as create_view_statement
from cte2
$$;
PROCEDURE FOR EXECUTING THE OUTPUT OF THE UDF ABOVE
create or replace procedure sandbox.public.create_view(TABLE_PATTERN varchar, START_MONTH varchar,END_MONTH varchar)
returns varchar not null
language Javascript
execute as caller
as
$$
sql_command = 'select * from table(sandbox.public.define_view(:1, :2, :3))';
var stmt = snowflake.createStatement({sqlText: sql_command ,binds: [TABLE_PATTERN, START_MONTH, END_MONTH]}).execute();
stmt.next();
var ddl = stmt.getColumnValue(1);
var run=snowflake.createStatement({sqlText: ddl}).execute();
run.next();
var message=run.getColumnValue(1);
return "Temporary " + message;
$$;
USAGE DEMO
set table_pattern ='sandbox.public.table_';
set start_month ='1';
set end_month = '12';
set master_view='master_'||split_part($table_pattern,'.',-1)||$start_month||'_'||$end_month;
call create_view($table_pattern, $start_month, $end_month);
select top 100 *
from identifier($master_view);

Copy records with dynamic column names

I have two tables with different columns in PostgreSQL 9.3:
CREATE TABLE person1(
NAME TEXT NOT NULL,
AGE INT NOT NULL
);
CREATE TABLE person2(
NAME TEXT NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR(50),
SALARY REAL
);
INSERT INTO person2 (Name, Age, ADDRESS, SALARY)
VALUES ('Piotr', 20, 'London', 80);
I would like to copy records from person2 to person1, but the column names can change in the program, so I would like to select the common column names programmatically. So I create an array containing the intersection of the column names. Next I use a function with insert into ... select, but I get an error when I pass the array variable to the function by name. Like this:
select column_name into name1 from information_schema.columns where table_name = 'person1';
select column_name into name2 from information_schema.columns where table_name = 'person2';
select * into cols from ( select * from name1 intersect select * from name2) as tmp;
-- Create array with name of columns
select array (select column_name::text from cols) into cols2;
CREATE OR REPLACE FUNCTION f_insert_these_columns(VARIADIC _cols text[])
RETURNS void AS
$func$
BEGIN
EXECUTE (
SELECT 'INSERT INTO person1 SELECT '
|| string_agg(quote_ident(col), ', ')
|| ' FROM person2'
FROM unnest(_cols) col
);
END
$func$ LANGUAGE plpgsql;
select * from cols2;
array
------------
{name,age}
(1 row)
SELECT f_insert_these_columns(VARIADIC cols2);
ERROR: column "cols2" does not exist
What's wrong here?
You seem to assume that SELECT INTO in SQL would assign a variable. But that is not so.
It creates a new table and its use is discouraged in Postgres. Use the superior CREATE TABLE AS instead. Not least, because the meaning of SELECT INTO inside plpgsql is different:
Combine two tables into a new one so that select rows from the other one are ignored
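To make the distinction concrete: in plain SQL, the first statement from the question creates a new table called name1 rather than assigning anything (a minimal sketch; CREATE TABLE AS is the preferred form):
-- plain SQL: creates a table named name1
SELECT column_name INTO name1
FROM information_schema.columns
WHERE table_name = 'person1';
-- preferred, equivalent form
CREATE TABLE name1 AS
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'person1';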
Concerning SQL variables:
User defined variables in PostgreSQL
Hence you cannot call the function like this:
SELECT f_insert_these_columns(VARIADIC cols2);
This would work:
SELECT f_insert_these_columns(VARIADIC (TABLE cols2 LIMIT 1));
Or cleaner:
SELECT f_insert_these_columns(VARIADIC array) -- "array" being the unfortunate column name
FROM cols2
LIMIT 1;
About the short TABLE syntax:
Is there a shortcut for SELECT * FROM?
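The TABLE syntax is just shorthand for SELECT * FROM, so for the single-row table cols2 these are equivalent:
TABLE cols2;
SELECT * FROM cols2;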
Better solution
To copy all rows with columns sharing the same name between two tables:
CREATE OR REPLACE FUNCTION f_copy_rows_with_shared_cols(
IN _tbl1 regclass
, IN _tbl2 regclass
, OUT rows int
, OUT columns text)
LANGUAGE plpgsql AS
$func$
BEGIN
SELECT INTO columns -- proper use of SELECT INTO!
string_agg(quote_ident(attname), ', ')
FROM (
SELECT attname
FROM pg_attribute
WHERE attrelid IN (_tbl1, _tbl2)
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
GROUP BY 1
HAVING count(*) = 2
) sub;
EXECUTE format('INSERT INTO %1$s(%2$s) SELECT %2$s FROM %3$s'
, _tbl1, columns, _tbl2);
GET DIAGNOSTICS rows = ROW_COUNT; -- return number of rows copied
END
$func$;
Call:
SELECT * FROM f_copy_rows_with_shared_cols('public.person2', 'public.person1');
Result:
rows | columns
-----+---------
3 | name, age
Major points
Note the proper use of SELECT INTO for assignment inside plpgsql.
Note the use of the data type regclass. This allows the use of schema-qualified table names (optionally) and defends against SQL injection attempts:
Table name as a PostgreSQL function parameter
About GET DIAGNOSTICS:
Count rows affected by DELETE
About OUT parameters:
Returning from a function with OUT parameter
The manual about format().
Information schema vs. system catalogs.
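As a quick illustration of that last point, the column lookup used here can be written against either source (the pg_attribute form is what the function above uses):
-- information_schema (standard SQL view)
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'public' AND table_name = 'person1';
-- system catalog (Postgres-specific, as in the function)
SELECT attname
FROM pg_attribute
WHERE attrelid = 'public.person1'::regclass
AND attnum > 0
AND NOT attisdropped;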
