I have the following problem:
I have used the function:
CREATE OR REPLACE TABLE myschema.public.table1 AS (
    SELECT *
    FROM myschema.public.table1 BEFORE(OFFSET => -60*4*15)
    WHERE MARKET = 'ES'
);
I accidentally left the filter MARKET = 'ES' in place, and now all rows where MARKET is not 'ES' are gone. Can I still undo this?
Because you issued CREATE OR REPLACE, the original table was dropped. So you need to rename your existing table, undrop the original table, and then re-run your time travel command:
alter table table1 rename to table1_bak;
undrop table table1;
CREATE OR REPLACE TABLE myschema.public.table1 as (
    SELECT * FROM myschema.public.table1 BEFORE(OFFSET => -60*4*15)
);
https://docs.snowflake.com/en/sql-reference/sql/undrop-table.html
I am trying to set up continuous data replication into Snowflake. I receive the transactions that happened in the source system, and I need to apply them in Snowflake in the same order as in the source system. I am trying to use MERGE for this, but when there are multiple operations on the same key in the source system, MERGE does not work correctly: it either misses an operation or returns a "duplicate row detected during DML operation" error.
Please note that the transactions need to happen in the exact same order, and it is not possible to take only the latest transaction for a key and apply just that (e.g. if a record has been INSERTed and then UPDATEd, it needs to be inserted first and then updated in Snowflake too, even though the insert is only a transient state).
Here is the example:
create or replace table employee_source (
    id int,
    first_name varchar(255),
    last_name varchar(255),
    operation_name varchar(255),
    binlogkey integer
);
create or replace table employee_destination ( id int, first_name varchar(255), last_name varchar(255) );
insert into employee_source values (1,'Wayne','Bells','INSERT',11);
insert into employee_source values (1,'Wayne','BellsT','UPDATE',12);
insert into employee_source values (2,'Anthony','Allen','INSERT',13);
insert into employee_source values (3,'Eric','Henderson','INSERT',14);
insert into employee_source values (4,'Jimmy','Smith','INSERT',15);
insert into employee_source values (1,'Wayne','Bellsa','UPDATE',16);
insert into employee_source values (1,'Wayner','Bellsat','UPDATE',17);
insert into employee_source values (2,'Anthony','Allen','DELETE',18);
MERGE INTO employee_destination AS T
USING (SELECT * FROM employee_source ORDER BY binlogkey) AS S
ON T.id = S.id
WHEN NOT MATCHED AND S.operation_name = 'INSERT' THEN
    INSERT (id, first_name, last_name)
    VALUES (S.id, S.first_name, S.last_name)
WHEN MATCHED AND S.operation_name = 'UPDATE' THEN
    UPDATE SET T.first_name = S.first_name, T.last_name = S.last_name
WHEN MATCHED AND S.operation_name = 'DELETE' THEN
    DELETE;
After all rows are processed, I am expecting to see 'Bellsat' as the last name for employee id 1 in the employee_destination table. Likewise, I should not see employee id 2 in the employee_destination table at all.
Is there any alternative to MERGE to achieve this? Basically something that replays every single DML in the same order (using the binlogkey column for ordering).
Thanks.
You need to manipulate your source data to ensure that you only have one record per key/operation; otherwise the join will be non-deterministic and will (depending on your settings) either error out or update using a random one of the applicable source records. This is covered in the documentation here: https://docs.snowflake.com/en/sql-reference/sql/merge.html#duplicate-join-behavior.
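For reference, the session parameter that governs this duplicate-join behavior (erroring is the default, as far as I know):
-- when TRUE, a MERGE that joins one target row to multiple source rows raises an error
ALTER SESSION SET ERROR_ON_NONDETERMINISTIC_MERGE = TRUE;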
In any case, why would you want to update a record only for it to be overwritten by another update? That would be incredibly inefficient.
Since your updates appear to include the new values for all columns, you can use a window function to get just the latest incoming change per key, and then merge those results into the target table. For example, the SELECT for that merge (with the window function to get only the latest change) would look like this:
with SOURCE_DATA as
(
    select COLUMN1::int    ID
          ,COLUMN2::string FIRST_NAME
          ,COLUMN3::string LAST_NAME
          ,COLUMN4::string OPERATION_NAME
          ,COLUMN5::int    PROCESSING_ORDER
    from values
         (1,'Wayne','Bells','INSERT',11),
         (1,'Wayne','BellsT','UPDATE',12),
         (2,'Anthony','Allen','INSERT',13),
         (3,'Eric','Henderson','INSERT',14),
         (4,'Jimmy','Smith','INSERT',15),
         (1,'Wayne','Bellsa','UPDATE',16),
         (1,'Wayne','Bellsat','UPDATE',17),
         (2,'Anthony','Allen','DELETE',18)
)
select *
from SOURCE_DATA
qualify row_number() over (partition by ID order by PROCESSING_ORDER desc) = 1;
That will produce a result set that has only the changes required to merge into the target table:
ID | FIRST_NAME | LAST_NAME | OPERATION_NAME | PROCESSING_ORDER
---+------------+-----------+----------------+-----------------
 1 | Wayne      | Bellsat   | UPDATE         | 17
 2 | Anthony    | Allen     | DELETE         | 18
 3 | Eric       | Henderson | INSERT         | 14
 4 | Jimmy      | Smith     | INSERT         | 15
You can then change the WHEN NOT MATCHED clause to remove the operation_name check. If a row is listed as an UPDATE but is not in the target table, that's because it was inserted by an earlier operation within the new batch of changes.
For the WHEN MATCHED clause, you can use the operation_name to determine whether the row should be updated or deleted.
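Putting it together, the complete merge might look something like this (a sketch only; the extra guard on WHEN NOT MATCHED skips keys that were inserted and then deleted within the same batch, like id 2):
MERGE INTO employee_destination T
USING (
    select *
    from employee_source
    qualify row_number() over (partition by id order by binlogkey desc) = 1
) S
ON T.id = S.id
WHEN MATCHED AND S.operation_name = 'DELETE' THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET T.first_name = S.first_name, T.last_name = S.last_name
WHEN NOT MATCHED AND S.operation_name <> 'DELETE' THEN
    INSERT (id, first_name, last_name) VALUES (S.id, S.first_name, S.last_name);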
I need to restore tables using Snowflake time travel, but without dropping tables and creating new objects. I'm looking for the most efficient way to do that.
Example:
CREATE OR REPLACE TABLE TABLE_SCHEMA.TABLE_NAME AS SELECT * FROM TABLE_SCHEMA.TABLE_NAME at(timestamp => '2020-11-01 07:00:00'::timestamp);
In the Snowflake documentation I found that by running CREATE OR REPLACE, the original table is dropped and replaced with a new table, so there will be no time travel data from before the replace was issued. If I need to remove the data from a table but retain it for time travel purposes, I can use the TRUNCATE TABLE statement. My question is: should I use TRUNCATE or CREATE OR REPLACE? Thank you in advance.
The simplest approach is INSERT OVERWRITE INTO:
INSERT OVERWRITE INTO TABLE_SCHEMA.TABLE_NAME
SELECT *
FROM TABLE_SCHEMA.TABLE_NAME at(timestamp => '2020-11-01 07:00:00'::timestamp);
I would also like to know how the overwrite affects time travel / historical data.
It is still possible to access the data from before the INSERT OVERWRITE, as long as it is within the data retention period:
CREATE OR REPLACE TABLE TAB(id INT) AS SELECT 1 UNION SELECT 2;
-- original table
SELECT * FROM TAB;
-- 1
-- 2
UPDATE TAB SET id = id * 10;
SET queryupdate = LAST_QUERY_ID();
-- after update
SELECT * FROM TAB;
-- 10
-- 20
-- restoring state before update
INSERT OVERWRITE INTO TAB SELECT * FROM TAB BEFORE(STATEMENT => $queryupdate);
SET queryinsertoverwrite = LAST_QUERY_ID();
-- current state
SELECT * FROM TAB;
-- 1
-- 2
-- state before update
SELECT * FROM TAB BEFORE(STATEMENT => $queryupdate);
-- 1
-- 2
-- state before insert overwrite
SELECT * FROM TAB BEFORE(STATEMENT => $queryinsertoverwrite);
-- 10
-- 20
Is it possible to set a variable during a query (valid only for the query in question) that can be captured by a TRIGGER procedure?
For example, I want to record the ID of the executor of a query (current_user is always the same).
So I would do something like this:
tbl_executor (
id PRIMARY KEY,
name VARCHAR
);
tbl_log (
executor REFERENCE tbl_executor(id),
op VARCHAR
);
tbl_other ...
CREATE TRIGGER t AFTER INSERT OR UPDATE OR DELETE ON tbl_executor
FOR EACH ROW
EXECUTE PROCEDURE (INSERT INTO tbl_log VALUES( ID_VAR_OF_THIS_QUERY ,TG_OP))
Now if I run a query like:
INSERT INTO tbl_other
VALUES(.......) - and set ID_VAR_OF_THIS_QUERY='id of executor' -
I get the following result:
tbl_log
-----------------------------
id | op |
-----------------------------
'id of executor' | 'INSERT'|
I hope I've made the idea clear... I suspect it is hardly feasible, but is there anyone who could help me?
To answer the question
You can SET a customized option like this:
SET myvar.role_id = '123';
But that requires a literal value. There is also the function set_config(). Quoting the manual:
set_config(setting_name, new_value, is_local) ... set parameter and return new value
set_config sets the parameter setting_name to new_value. If is_local is true, the new value will only apply to the current transaction.
Correspondingly, read option values with SHOW or current_setting(). Related:
How to use variable settings in trigger functions?
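A quick round-trip for illustration (within a single transaction, since is_local = true scopes the value to the transaction):
BEGIN;
SELECT set_config('myvar.role_id', '123', true);  -- true = transaction-local
SELECT current_setting('myvar.role_id');          -- returns '123'
COMMIT;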
But your trigger is on the wrong table (tbl_executor) and uses the wrong syntax. It looks like Oracle code, where you can provide code to CREATE TRIGGER directly. In Postgres you need a trigger function first:
How to use PostgreSQL triggers?
So:
CREATE OR REPLACE FUNCTION trg_log_who()
RETURNS trigger AS
$func$
BEGIN
INSERT INTO tbl_log(executor, op)
VALUES(current_setting('myvar.role_id')::int, TG_OP); -- !
RETURN NULL; -- irrelevant for AFTER trigger
END
$func$ LANGUAGE plpgsql;
Your example setup requires a type cast to ::int.
Then:
CREATE TRIGGER trg_log_who
AFTER INSERT OR UPDATE OR DELETE ON tbl_other -- !
FOR EACH ROW EXECUTE PROCEDURE trg_log_who(); -- !
Finally, fetching id from the table tbl_executor to set the variable:
BEGIN;
SELECT set_config('myvar.role_id', id::text, true) -- !
FROM tbl_executor
WHERE name = current_user;
INSERT INTO tbl_other VALUES( ... );
INSERT INTO tbl_other VALUES( ... );
-- more?
COMMIT;
Set the third parameter (is_local) of set_config() to true to make it transaction-local as requested (the equivalent of SET LOCAL).
But why per row? It would seem more reasonable to make this per statement:
...
FOR EACH STATEMENT EXECUTE PROCEDURE trg_foo();
Different approach
All that aside, I'd consider a different approach: a simple function returning the id, used as a column default:
CREATE OR REPLACE FUNCTION f_current_role_id()
RETURNS int LANGUAGE sql STABLE AS
'SELECT id FROM tbl_executor WHERE name = current_user';
CREATE TABLE tbl_log (
executor int DEFAULT f_current_role_id() REFERENCES tbl_executor(id)
, op VARCHAR
);
Then, in the trigger function, simply ignore the executor column; it will be filled in automatically:
...
INSERT INTO tbl_log(op) VALUES(TG_OP);
...
Be aware of the difference between current_user and session_user. See:
How to check role of current PostgreSQL user from Qt application?
One option is to create a shared table to hold this information. Since it's per-connection, the primary key should be pg_backend_pid().
create table connection_global_vars(
backend_pid bigint primary key,
id_of_executor varchar(50)
);
insert into connection_global_vars(backend_pid) select pg_backend_pid() on conflict do nothing;
update connection_global_vars set id_of_executor ='id goes here' where backend_pid = pg_backend_pid();
-- in the trigger: Postgres needs a named trigger function, and the trigger
-- belongs on the audited table (e.g. tbl_other):
CREATE FUNCTION trg_log_executor() RETURNS trigger AS $$
BEGIN
    INSERT INTO tbl_log(executor, op)
    SELECT id_of_executor, TG_OP
    FROM connection_global_vars
    WHERE backend_pid = pg_backend_pid();
    RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER t AFTER INSERT OR UPDATE OR DELETE ON tbl_other
FOR EACH ROW EXECUTE PROCEDURE trg_log_executor();
Another option is to create a temporary table (which exists per-connection).
create temporary table if not exists connection_global_vars(
id_of_executor varchar(50)
) on commit delete rows;
insert into connection_global_vars(id_of_executor) select null where not exists (select 1 from connection_global_vars);
update connection_global_vars set id_of_executor ='id goes here';
-- in the trigger: same pattern as above, except the temp table is already
-- per-connection, so no pg_backend_pid() filter is needed:
CREATE FUNCTION trg_log_executor() RETURNS trigger AS $$
BEGIN
    INSERT INTO tbl_log(executor, op)
    SELECT id_of_executor, TG_OP FROM connection_global_vars;
    RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER t AFTER INSERT OR UPDATE OR DELETE ON tbl_other
FOR EACH ROW EXECUTE PROCEDURE trg_log_executor();
For PostgreSQL in particular it probably won't make much difference to performance, except that the temporary table (which is not WAL-logged) may just possibly be slightly faster.
If you run into performance issues because the planner doesn't recognise that it's a single-row table, you can run ANALYZE on it.
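For example, right after setting the value:
-- helps the planner see that this is a one-row table
ANALYZE connection_global_vars;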
Is it possible to write a script in HANA that creates a temporary table based on an existing table, without hard-coding the columns and column types as in:
create local temporary table #mytemp (id integer, name varchar(20));
Can I create a temporary table with the same column definitions that contains the same data? If so, I'd be glad to see some examples.
I have been searching the internet for two days and couldn't find anything useful.
Thanks.
Creating local temporary tables based on a dynamic structure definition is not supported in SQLScript.
The question would be: what do you want to use it for?
Instead of a local temporary table, you can use a table variable in most cases.
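For illustration, a table variable picks up its structure from the query, so no column definitions are needed (MYTABLE is a placeholder):
DO BEGIN
    -- the table variable inherits columns and types from the SELECT
    tab_data = SELECT * FROM MYTABLE;
    SELECT COUNT(*) AS cnt FROM :tab_data;
END;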
By querying the sys.table_columns view, you can get the list and properties of the source table's columns, build a dynamic CREATE statement, and then execute it to create the table.
You can find SQL codes for a sample case at Create Table Dynamically on HANA Database
To read the table's columns:
select * from sys.table_columns where table_name = 'TABLENAME';
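A rough sketch of that idea (MYTABLE and #MYTEMP are placeholders; the length handling here only covers character types, and whether a #temp table created through dynamic SQL stays visible afterwards depends on your HANA version and scope):
DO BEGIN
    DECLARE v_ddl NVARCHAR(5000);
    -- assemble a CREATE statement from the catalog metadata
    SELECT 'CREATE LOCAL TEMPORARY TABLE #MYTEMP ('
           || STRING_AGG(column_name || ' ' || data_type_name
                || CASE WHEN data_type_name IN ('VARCHAR', 'NVARCHAR')
                        THEN '(' || TO_VARCHAR(length) || ')' ELSE '' END,
              ', ' ORDER BY position)
           || ')'
      INTO v_ddl
      FROM sys.table_columns
     WHERE table_name = 'MYTABLE';
    EXECUTE IMMEDIATE :v_ddl;
    EXECUTE IMMEDIATE 'INSERT INTO #MYTEMP SELECT * FROM MYTABLE';
END;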
This seems to work in the HANA version I have (I'm not sure how to find out which version that is).
PROCEDURE "xxx.yyy.zzz::MY_TEST"(
OUT "OUT_COL" NVARCHAR(200)
)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
AS
BEGIN
create LOCAL TEMPORARY TABLE #LOCALTEMPTABLE
as
(
SELECT distinct 'Cola' as out_col
FROM "SYNONYMS1"
);
select * from #LOCALTEMPTABLE ;
DROP TABLE #LOCALTEMPTABLE;
END
A newer HANA version (HANA 2 SPS 04 Patch 5, Build 4.4.17) supports your request:
create local temporary table #tempTableName like "tableTypeName";
This should inherit the data types and the exact values from whatever query is in the parentheses:
CREATE LOCAL TEMPORARY COLUMN TABLE #mytemp AS (
SELECT
"COLUMN1",
"COLUMN2",
"COLUMN3"
FROM MyTable
);
-- Now you can add the rest of your query here as such:
SELECT * FROM #mytemp
I suppose you can just write:
create local temporary column table #MyTempTable as (select * from MySourceTable);
I am trying to write a stored procedure in SQL Server (2005) to do something that sounds simple, but is actually proving to be more difficult than I thought.
I have a table with 30 columns and 50,000 rows.
The number of records is fixed, but users can edit the fields of existing records.
To save them having to re-key repetitive data, I want to give them the ability to select a record, and specify a range of IDs to copy those details to.
The SP I'm trying to write will take 3 parameters: The source record primary key, and the lower and upper primary keys of the range of records that the data will be copied into.
Obviously the PKs of the destination records remain unchanged.
So I figured the SP needs to do a SELECT - to get all the data to be copied, and an UPDATE - to write the data into the specified destination records.
I just don't know how to store the results of the SELECT to slot them into the UPDATE.
A temp table wouldn't help - selecting from that would be just the same as selecting from the table!
What I need is a variable that is effectively a single record, so I can go something like:
@tempRECORD = SELECT * FROM SOURCETABLE WHERE ID = @sourcePK
UPDATE SOURCETABLE
SET FIELD1 = @tempRECORD.FIELD1,
    FIELD2 = @tempRECORD.FIELD2,
    ...
    FIELD30 = @tempRECORD.FIELD30
WHERE ID >= @LOWER_id AND ID <= @UPPER_id
But I don't know how, or if you even can.
I'm also open to any other clever way I haven't even thought of!
Thanks guys!
So I figured the SP needs to do a SELECT - to get all the data to be copied, and an UPDATE - to write the data into the specified destination records.
What you need is the T-SQL-specific extension to UPDATE, UPDATE ... FROM:
UPDATE T
SET
Field1 = source.Field1
, Field2 = source.Field2
, Field3 = source.Field3
FROM
(SELECT * FROM T AS source_T WHERE source_T.ID = #sourcePK) as source
WHERE
T.ID BETWEEN #LOWER_Id AND #UPPER_Id
Note that this ability to put a FROM clause in an UPDATE statement is not standard ANSI SQL, so I don't know how this would be done in other RDBMSs.
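For the record, a portable way to express the same idea is one scalar subquery per column (a sketch; more verbose, though most optimizers collapse the repeated lookups):
UPDATE SOURCETABLE
SET FIELD1 = (SELECT s.FIELD1 FROM SOURCETABLE s WHERE s.ID = @sourcePK),
    FIELD2 = (SELECT s.FIELD2 FROM SOURCETABLE s WHERE s.ID = @sourcePK)
    -- ... and so on for the remaining columns
WHERE ID BETWEEN @LOWER_id AND @UPPER_id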
I am pretty sure this ain't the easiest way to do it, but it should work without any problems:
DECLARE @tempField1 varchar(255)
DECLARE @tempField2 varchar(255)
...
DECLARE @tempField30 varchar(255)
SELECT @tempField1 = FIELD1, @tempField2 = FIELD2, ... , @tempField30 = FIELD30 FROM SOURCETABLE WHERE ID = @sourcePK
UPDATE SOURCETABLE
SET FIELD1 = @tempField1,
    FIELD2 = @tempField2,
    ...
    FIELD30 = @tempField30
WHERE ID >= @LOWER_id AND ID <= @UPPER_id
You would need to edit the tempField variables so that they have the right type.