Snowflake data unloading - multiple tables

Does Snowflake provide any utility to unload data from multiple tables at the same time, somewhat similar to expdp (or export) in Oracle? I can write a procedure to fetch the table names and copy/unload each one to stage files, but I was wondering whether Snowflake offers anything out of the box for this.
It would also be helpful if someone could point out an approach or best practices for schema refreshes.

You can do what I mention above with SQL like this:
create table test.test.table_to_export(full_table_name text, write_path text);
create table test.test.table_a(id int);
insert into test.test.table_to_export values ('test.test.table_a', 'table_a/a_files_');
Running the block below then uses COPY INTO to write each table listed in table_to_export to my_stage, under the write path given in its row. Every table is read as of the same point in time (via Time Travel), so the data is coherent between them; for that reason this method would only work on permanent tables.
declare
  c1 cursor for select full_table_name, write_path from test.test.table_to_export;
  sql text;
  sql_template text;
  id text;
begin
  -- run a trivial statement and capture its query ID as the shared point in time
  select 1;
  select last_query_id(-1) into id;
  sql_template := $$copy into @my_stage/result/WRITE_PATH FROM (select * FROM TABLE_NAME AT(STATEMENT => '$$ || id || $$'))
    file_format=(format_name='myformat' compression='gzip');$$;
  -- one COPY INTO per row of table_to_export, all read as of that statement
  for record in c1 do
    sql := replace(replace(sql_template, 'TABLE_NAME', record.full_table_name), 'WRITE_PATH', record.write_path);
    execute immediate sql;
  end for;
end;
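If you would rather drive the unload from a client script, the same idea can be sketched in Python: build one COPY INTO statement per table, all pinned to the same statement ID. This is only a sketch under assumptions, not a Snowflake utility. The function name build_unload_statements and the sample query ID are hypothetical; the stage and file format names are the ones used in the answer above. The code only builds strings; executing them is left to whatever Snowflake client you use.

```python
from typing import Iterable, List, Tuple

# Template mirrors the COPY INTO statement from the answer above.
# @my_stage and myformat come from that answer; nothing here talks to Snowflake.
TEMPLATE = (
    "copy into @my_stage/result/{write_path} "
    "from (select * from {table_name} at(statement => '{query_id}')) "
    "file_format=(format_name='myformat' compression='gzip');"
)


def build_unload_statements(
    tables: Iterable[Tuple[str, str]], query_id: str
) -> List[str]:
    """Return one COPY INTO statement per (full_table_name, write_path) pair.

    Every statement uses the same query_id, so all tables are read
    as of the same point in time.
    """
    return [
        TEMPLATE.format(table_name=name, write_path=path, query_id=query_id)
        for name, path in tables
    ]


# Hypothetical query ID, used only for illustration.
statements = build_unload_statements(
    [("test.test.table_a", "table_a/a_files_")],
    query_id="01a2b3c4-0000-0000-0000-000000000001",
)
for statement in statements:
    print(statement)
```

Table names and paths are interpolated as-is, so they should come from a trusted source such as your own control table.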

Related

Rewriting a trigger in SQL-Server in Oracle SQL

I created a trigger in SQL Server that was designed to act whenever data was entered into a certain table, in this case a table called FORECAST_TEST_DATA. The trigger was to then take certain values from the inserted row and insert them into a table called PRODUCT_TEST_DATA. The other columns in that table were then to be filled with values which already existed within the table, using products that shared a common PROD_NUM value.
The query in SQL Server looks as follows:
CREATE OR ALTER TRIGGER FORECAST_TRIGGER ON FORECAST_TEST_DATA
FOR INSERT
AS
INSERT INTO PRODUCT_TEST_DATA
(PRODUCT_TEST_DATA.PROD_NUM, PRODUCT_TEST_DATA.MONTH, PRODUCT_TEST_DATA.STORE_TYPE,
PRODUCT_TEST_DATA.PRODUCT_KEY, PRODUCT_TEST_DATA.CATEGORY,
PRODUCT_TEST_DATA.BRAND_NAME,PRODUCT_TEST_DATA.COLOUR)
SELECT
inserted.PROD_NUM, inserted.MONTH, inserted.STORE_TYPE, inserted.PRODUCT_KEY,
PRODUCT_TEST_DATA.CATEGORY, PRODUCT_TEST_DATA.BRAND_NAME,PRODUCT_TEST_DATA.COLOUR
FROM inserted, PRODUCT_TEST_DATA
WHERE inserted.PROD_NUM = PRODUCT_TEST_DATA.PROD_NUM
GO
The trigger already has the desired functionality, it just needs to be rewritten into Oracle SQL.
Thanks for taking the time to read through this problem, any help is appreciated.
Here is the Oracle syntax. A row-level trigger (FOR EACH ROW) is needed to reference :new, and the remaining columns are read from the existing PRODUCT_TEST_DATA rows, as in the SQL Server version:
CREATE OR REPLACE TRIGGER FORECAST_TRIGGER
AFTER INSERT ON FORECAST_TEST_DATA
FOR EACH ROW
BEGIN
  INSERT INTO PRODUCT_TEST_DATA
    (PROD_NUM, MONTH, STORE_TYPE, PRODUCT_KEY, CATEGORY, BRAND_NAME, COLOUR)
  SELECT
    :new.PROD_NUM, :new.MONTH, :new.STORE_TYPE, :new.PRODUCT_KEY,
    p.CATEGORY, p.BRAND_NAME, p.COLOUR
  FROM PRODUCT_TEST_DATA p
  WHERE p.PROD_NUM = :new.PROD_NUM;
END;

Create global temporary table from result of sql code with multi joins in SAS(TSQL Code)

I have a scenario where I have several sql server data source tables. I have read only access to these sources. I cannot create permanent tables in Sql Server environment. I can however create temporary tables.
I thought of creating global temporary table out of scenario 1 result set and reference that in scenario 2(again create second global temp table in scenario 2 ) and 3rd global temp table out of third sql code. I have included generic sql below
Finally create a SAS data set out of each of these global temp tables.(We want to ensure all joins, data transformation needs to happen in sql server and not perform it in SAS)
Scenario1-
select * from table1 join table2
on table1.id=table2.id
where table1.product='Apple'
Scenario-2
Above result is then used in another query as
select * from table3 m
left join above _result_table t
on m.id=t.id
and the above result is again referenced further.
I tried researching online to find similar issue implementation and I could not find it.
I have tried below code, but this creates a SAS data set , I want to instead create a global temporary table so that another query such as below can reference it. How do I accomplish that?
proc sql ;
connect to odbc(dsn=abc user=123 pw=****** connection=shared);
create table xyz as select * from
connection to ODBC
(
select * from table1 join table2
on table1.id=table2.id
where table1.product='Apple'
);
DISCONNECT FROM ODBC;
QUIT;
Your help is greatly appreciated.
Thanks,
Upon further research, I think I have part of the solution; I am still working on creating a SAS data set from the temporary SQL Server table.
I created a temp table structure, let's say ##abc, and then proceeded with the following steps:
PROC SQL;
CONNECT TO ODBC AS T1(DSN=SRC INSERTBUFF=32767 USER=UID PW="PD." CONNECTION=SHARED);
EXECUTE(
CREATE TABLE ##abc
(id int,
name varchar(50)
)
) by T1;
EXECUTE(INSERT ##abc
SELECT * FROM SqlServerTable
) by T1;
SELECT * FROM CONNECTION TO T1(SELECT * FROM ##abc);
QUIT;
I got the

Use Sybase triggers to write dynamic statement using all old and new values for creating your own replication transaction statement log?

PROBLEM SUMMARY
I have to write I/U/D-statement-generating-triggers for a bucardo/symmetricDS-inspired homemade bidirectional replication system between Sybase ADS and Postgresql 11 groups of nodes, using BEFORE triggers on any Postgresql and Sybase DB that creates Insert/Update/Delete commands based on the command entered in a replicating source table: e.g. an INSERT INTO PERSON (first_name,last_name,gender,age,ethnicity) Values ('John','Doe','M',42,'C') and manipulate them into a corresponding Insert statement, and UPDATE by getting OLD and NEW values to dynamically make an UPDATE statement, along with getting OLD values to make a DELETE command, all to run per command on a destination at some interval.
I know this is difficult and no one does this but it is for a job and I have no other options and can't object to offer a different solution. I have no other teammates or human resources to help outside of SO and something like Codementors, which was not so helpful. My idea/strategy is to copy parts of bucardo/SymmetricDS when inserting OLD and NEW values for generating a statement/command to run on the destination. Right now, I am snapshotting the whole table to a CSV as opposed to doing by individual command, but by command and looping through table that generates and saves commands will make the job much easier.
One big issue is that they come from Sybase ADS and have a mixed Key/Index structure (many tables have NO PK) and are mirroring that in Postgresql, so I am trying to write PK-less statements, or all-column commands to get around the no-pk tables. They also will only replicate certain columns for certain tables, so I have a column in a table for them to insert the column names delimited by ';' and then split it out into an array and link the column names to the values for each statement to generate a full command for I/U/D, Hopefully. I am open to other strategies but this is a big solo project and I have gone at it many ways with much difficulty.
I mostly come from a DBA background and have some programming experience with the fundamentals, so I am mostly pseudocoding each major sequence, googling for syntax part by part, and adjusting as I go or when I hit a language limitation. I am thankful for any help given, as I am getting a bit desperate and discouraged.
WHAT I HAVE TRIED
I have to do this for Sybase ADS and Postgresql, but this question is initially about ADS since it's more challenging and older.
To have one "Log" table which tracks row changes for each of the replicating tables and records and ultimately dynamically generates a command is the goal for both platforms. I am trying to make trigger statements like:
CREATE TRIGGER PERSON_INSERT
ON PERSON
BEFORE
INSERT
BEGIN
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) select ID, 'INSERT','READY', NOW(),''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
END;
CREATE TRIGGER PERSON_UPDATE
ON PERSON
BEFORE
UPDATE
BEGIN
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) select ID, 'U','UPDATE','READY', NOW(),''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
UPDATE Backlog SET OldValues=select ''first_name';'last_name';'gender';'age';'ethnicity'' from __old where SourceTableID=select ID from __old;
END;
CREATE TRIGGER PERSON_DELETE
ON PERSON
BEFORE
DELETE
BEGIN
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, OldValues) select ID, 'D','DELETE','READY', NOW(),''first_name';'last_name';'gender';'age';'ethnicity'' from __old;
END;
but I would like the "''first_name';'last_name';'gender';'age';'ethnicity''" to come from another table as a value to make it dynamic since multiple tables will write their value and statement info to the single log table. Then, it can be made into a variable and then probably split to link to the corresponding values so the IUD statements can be made which will be executed on the destination one at a time.
ATTEMPTED INCOMPLETE SAMPLE TRIGGER CODE
CREATE TRIGGER PERSON_INSERT
ON PERSON
BEFORE
INSERT
BEGIN
--Declare @Columns string
--@Columns=select Columns from metatable where tablename='PERSON'
--String Split(@Columns,';') into array to correspond to new and old VALUES
--@NewValues=@['@Columns='+NEW.@Columns+'']
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) select ID, 'INSERT','READY', NOW(),''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
END;
CREATE TRIGGER PERSON_UPDATE
ON PERSON
BEFORE
UPDATE
BEGIN
--Declare @Columns string
--@Columns=select Columns from metatable where tablename='PERSON'
--String Split(@Columns,';') into array to correspond to new and old VALUES
--@NewValues=@['@Columns='+NEW.@Columns+'']
--@OldValues=@['@Columns='+OLD.@Columns+'']
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues) select ID, 'U','UPDATE','READY', NOW(),''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
UPDATE Backlog SET OldValues=select ''first_name';'last_name';'gender';'age';'ethnicity'' from __old where SourceTableID=select ID from __old;
END;
CREATE TRIGGER PERSON_DELETE
ON PERSON
BEFORE
DELETE
BEGIN
--Declare @Columns string
--@Columns=select Columns from metatable where tablename='PERSON'
--String Split(@Columns,',') into array to correspond to new and old VALUES
--@OldValues=@['@Columns='+OLD.@Columns+'']
INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, OldValues) select ID, 'D','DELETE','READY', NOW(),''first_name';'last_name';'gender';'age';'ethnicity'' from __old;
END;
CONCLUSION
For each row inserted, updated, or deleted, I am trying to generate, in a COMMAND column in the log table, a corresponding 'INSERT INTO PERSON ('+@Columns+') VALUES ('+@NewValues+')' type statement, or an UPDATE or DELETE. A Foreach service will then run each command value ordered by create time, as the main replication service.
To be clear, I am trying to make my sample code trigger write all old values and new values to a column in a dynamic way without hardcoding the columns in each trigger since it will be used for multiple tables, and writing the values into a single column delimited by a comma or semicolon.
An even bigger wish or goal behind this is to find a way to save/script each IUD command and then be able to run them on subscriber server.DBs of postgresql and Sybase platform, therefore making my own replication from a log
It is a complex but solvable problem that would take time and careful planning to write. I think what you are looking for is the "Execute Immediate" command in ADS SQL syntax. With this command you can create a dynamic statement to then be executed once construction of the SQL statement is terminated. Save each desired column value to a temp table by carefully constructing the statement as a string and then execute it with Execute Immediate. For example:
DECLARE TableColumns Cursor ;
DECLARE FldName Char(100) ;
...
OPEN TableColumns AS SELECT *
FROM system.columns
WHERE parent = @cTableName
AND field_type < 21 //ADS_ROWVERSION
AND field_type <> 6 //ADS_BINARY
AND field_type <> 7; //ADS_IMAGE
While Fetch TableColumns DO
FldName = Trim( TableColumns.Name) ;
// note the trailing space after newVal and the alias n used consistently
StrSql = 'SELECT n.[' + Trim( FldName ) + '] newVal ' +
'INTO #myTmpTable FROM __new n' ;
After constructing the statement as a string it can then be executed like this:
EXECUTE IMMEDIATE STRSQL ;
You can pickup old and new values from __old and __new temp tables that are always available to triggers. Insert values into temp table myTmpTable and then use it to update the target. Remember to drop myTmpTable at the end.
Furthermore, I would think you can create a function on the DD that can actually be called from each trigger on the tables you want to keep track of instead of writing a long trigger for each table and cTableName can be a parameter sent to the function. That would make maintenance a little easier.
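To make the string-building step concrete, here is a small Python sketch of turning a ';'-delimited column list plus a row of new values into an INSERT command, which is the shape of statement the question wants stored in the log table. It is not ADS code and not part of the answer above; sql_literal and build_insert are hypothetical helper names, and the quoting rules are deliberately minimal (strings, numbers and NULL only).

```python
from typing import Any, Dict


def sql_literal(value: Any) -> str:
    """Render a Python value as a SQL literal (strings quoted, None as NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any embedded single quotes so the literal stays well-formed.
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table: str, columns_spec: str, new_row: Dict[str, Any]) -> str:
    """Build an INSERT for the columns listed in a ';'-delimited spec."""
    columns = [c.strip() for c in columns_spec.split(";") if c.strip()]
    values = ", ".join(sql_literal(new_row[c]) for c in columns)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values})"


command = build_insert(
    "PERSON",
    "first_name;last_name;gender;age;ethnicity",
    {
        "first_name": "John",
        "last_name": "Doe",
        "gender": "M",
        "age": 42,
        "ethnicity": "C",
    },
)
print(command)
```

The same pattern extends to UPDATE and DELETE by building a WHERE clause from the old values, which is one way to cope with tables that have no primary key.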

How to include custom data migrations and static/reference data in an SSDT project?

We have a moderately-sized SSDT project (~100 tables) that's deployed to dozens of different database instances. As part of our build process we generate a .dacpac file and then when we're ready to upgrade a database we generate a publish script and run it against the database. Some db instances are upgraded at different times so it's important that we have a structured process for these upgrades and versioning.
Most of the generated migration script is dropping and (re)creating procs, functions, indexes and performing any structural changes, plus some data scripts included in a Post-Deployment script. It's these two data-related items I'd like to know how best to structure within the project:
1. Custom data migrations needed between versions
2. Static or reference data
Custom data migrations needed between versions
Sometimes we want to perform a one-off data migration as part of an upgrade and I'm not sure the best way to incorporate this into our SSDT project. For example, recently I added a new bit column dbo.Charge.HasComments to contain (redundant) derived data based on another table and will be kept in sync via triggers. An annoying but necessary performance improvement (only added after careful consideration & measurement). As part of the upgrade the SSDT-generated Publish script will contain the necessary ALTER TABLE and CREATE TRIGGER statements, but I also want to update this column based on data in another table:
update dbo.Charge
set HasComments = 1
where exists ( select *
from dbo.ChargeComment
where ChargeComment.ChargeId = Charge.ChargeId )
and HasComments = 0
What's the best way to include this data migration script in my SSDT project?
Currently I have each of these types of migrations in a separate file that's included in the Post-Deployment script, so my Post-Deployment script ends up looking like this:
-- data migrations
:r "data migration\Update dbo.Charge.HasComments if never populated.sql"
go
:r "data migration\Update some other new table or column.sql"
go
Is this the right way to do it, or is there some way to tie in with SSDT and its version tracking better, so those scripts aren't even run when the SSDT Publish is being run against a database that's already at a more recent version. I could have my own table for tracking which migrations have been run, but would prefer not to roll-my-own if there's a standard way of doing this stuff.
Static or reference data
Some of the database tables contain what we call static or reference data, e.g. list of possible timezones, setting types, currencies, various 'type' tables etc. Currently we populate these by having a separate script for each table that is run as part of the Post-Deployment script. Each static data script inserts all the 'correct' static data into a table variable and then inserts/updates/deletes the static data table as needed. Depending on the table it might be appropriate only to insert or only insert and delete but not to update existing records. So each script looks something like this:
-- table listing all the correct static data
declare @working_data table (...)
-- add all the static data that should exist into the working table
insert into @working_data (...) select null, null, null where 1=0
union all select 'row1 col1 value', 'col2 value', etc...
union all select 'row2 col1 value', 'col2 value', etc...
...
-- insert any missing records in the live table
insert into staticDataTableX (...)
select * from @working_data
where not exists ( select * from staticDataTableX
                   where [... primary key join on @working_data...] )
-- update any columns that should be updated
update staticDataTableX
set ...
from staticDataTableX
inner join @working_data on [... primary key join on @working_data...]
-- delete any records, if appropriate with this sort of static data
delete from staticDataTableX
where not exists ( select * from @working_data
                   where [... primary key join on @working_data...] )
and then my Post-Deployment script has a section like this:
-- static data. each script adds any missing static/reference data:
:r "static_data\settings.sql"
go
:r "static_data\other_static_data.sql"
go
:r "static_data\more_static_data.sql"
go
Is there a better or more conventional way to structure such static data scripts as part of an SSDT project?
To track whether or not the field has already been initialized, try adding an Extended Property when the initialize is performed (it can also be used to determine the need for the initialize):
To add the extended property:
EXEC sys.sp_addextendedproperty
@name = N'EP_Charge_HasComments',
@value = N'Initialized',
@level0type = N'SCHEMA', @level0name = dbo,
@level1type = N'TABLE', @level1name = Charge,
@level2type = N'COLUMN', @level2name = HasComments;
To check for the extended property:
SELECT objtype, objname, name, value
FROM fn_listextendedproperty (NULL,
'SCHEMA', 'dbo',
'TABLE', 'Charge',
'COLUMN', 'HasComments');
For reference data, try using a MERGE. It's MUCH cleaner than the triple-set of queries you're using.
MERGE INTO staticDataTableX AS Target
USING (
VALUES
('row1_UniqueID', 'row1_col1_value', 'col2_value'),
('row2_UniqueID', 'row2_col1_value', 'col2_value'),
('row3_UniqueID', 'row3_col1_value', 'col2_value'),
('row4_UniqueID', 'row4_col1_value', 'col2_value')
) AS Source (TableXID, col1, col2)
ON Target.TableXID = Source.TableXID
WHEN MATCHED THEN
UPDATE SET
Target.col1 = Source.col1,
Target.col2 = Source.col2
WHEN NOT MATCHED BY TARGET THEN
INSERT (TableXID, col1, col2)
VALUES (Source.TableXID, Source.col1, Source.col2)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
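The three branches of that MERGE can be easier to reason about outside SQL. The Python sketch below mimics them on plain dictionaries keyed by the unique ID; it is an illustration of the semantics only, not something SSDT or SQL Server provides, and merge_reference_data is a hypothetical name. One deliberate difference: it only counts a row as updated when its values actually changed, whereas the MERGE above updates every matched row.

```python
from typing import Dict, Tuple

Row = Tuple[str, str]  # (col1, col2)


def merge_reference_data(
    target: Dict[str, Row], source: Dict[str, Row]
) -> Dict[str, int]:
    """Make target match source in place and return a count per action."""
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for key, row in source.items():
        if key not in target:
            target[key] = row  # WHEN NOT MATCHED BY TARGET -> INSERT
            counts["inserted"] += 1
        elif target[key] != row:
            target[key] = row  # WHEN MATCHED -> UPDATE (only if changed)
            counts["updated"] += 1
    for key in list(target):
        if key not in source:
            del target[key]  # WHEN NOT MATCHED BY SOURCE -> DELETE
            counts["deleted"] += 1
    return counts


live = {"row1": ("old", "x"), "row9": ("stale", "y")}
wanted = {"row1": ("new", "x"), "row2": ("b", "z")}
counts = merge_reference_data(live, wanted)
print(live, counts)
```

For tables where deletes or updates are not appropriate, the corresponding branch is simply left out, just as you would omit that WHEN clause from the MERGE.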

SQL 2005 copy single column between databases

I'm still fairly new to T-SQL and SQL 2005. I need to import a column of integers from a table in database1 to a identical table (only missing the column I need) in database2. Both are sql 2005 databases. I've tried the built in import command in Server Management Studio but it's forcing me to copy the entire table. This causes errors due to constraints and 'read-only' columns (whatever 'read-only' means in sql2005). I just want to grab a single column and copy it to a table.
There must be a simple way of doing this. Something like:
INSERT INTO database1.myTable columnINeed
SELECT columnINeed from database2.myTable
Inserting won't do it, since it will attempt to insert new rows at the end of the table. What it sounds like you're trying to do is add a column to the existing rows.
I'm not sure if the syntax is exactly right but, if I understood you then this will do what you're after.
Create the column allowing nulls in database2.
Perform an update:
UPDATE t
SET t.colname = s.colname
FROM database2.dbo.tablename AS t
INNER JOIN database1.dbo.tablename AS s ON t.keycol = s.keycol
There is a simple way very much like this, as long as both databases are on the same server. The fully qualified name is dbname.owner.table; the owner is normally dbo, and ".." is a shortcut that omits it and falls back to your default schema (usually dbo), so...
INSERT INTO Database1..MyTable
(ColumnList)
SELECT FieldsIWant
FROM Database2..MyTable
First create the column if it doesn't exist:
ALTER TABLE database2..targetTable
ADD targetColumn int null -- or whatever column definition is needed
Then, if you are on SQL Server 2008 or later (MERGE is not available in SQL Server 2005), you can use the MERGE statement.
The MERGE statement has the advantage of being able to treat all situations in one statement like missing rows from source (can do inserts), missing rows from destination (can do deletes), matching rows (can do updates), and everything is done atomically in a single transaction. Example:
MERGE database2..targetTable AS t
USING (SELECT PrimaryKeyCol, sourceColumn FROM sourceDatabase1..sourceTable) AS s
ON t.PrimaryKeyCol = s.PrimaryKeyCol -- or whatever the match should be based on
WHEN MATCHED THEN
UPDATE SET t.targetColumn = s.sourceColumn
WHEN NOT MATCHED THEN
INSERT (targetColumn, [other columns ...]) VALUES (s.sourceColumn, [other values ..]);
The MERGE statement was introduced to solve cases like yours, and I recommend using it: it does in one statement what would otherwise take several separate statements, without the added complexity.
You could also use a cursor. Assuming you want to iterate all the records in the first table and populate the second table with new rows then something like this would be the way to go:
DECLARE @FirstField nvarchar(100)
DECLARE ACursor CURSOR FOR
SELECT FirstField FROM FirstTable
OPEN ACursor
FETCH NEXT FROM ACursor INTO @FirstField
WHILE @@FETCH_STATUS = 0
BEGIN
  INSERT INTO SecondTable ( SecondField ) VALUES ( @FirstField )
  FETCH NEXT FROM ACursor INTO @FirstField
END
CLOSE ACursor
DEALLOCATE ACursor
MERGE is only available in SQL Server 2008 and later, not SQL Server 2005.
insert into Test2.dbo.MyTable (MyValue) select MyValue from Test1.dbo.MyTable
This is assuming a great deal. First that the destination database is empty. Second that the other columns are nullable. You may need an update instead. To do that you will need to have a common key.
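The "add the column, then update by a common key" approach can be tried end to end with a small self-contained script. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server, with two tables in one in-memory database rather than two databases; the table and column names (source_table, target_table, keycol, colname) are hypothetical. SQLite's syntax differs from T-SQL, so the UPDATE is written with a correlated subquery rather than UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_table (keycol INTEGER PRIMARY KEY, colname INTEGER);
    CREATE TABLE target_table (keycol INTEGER PRIMARY KEY, other TEXT);
    INSERT INTO source_table VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO target_table VALUES (1, 'a'), (2, 'b');

    -- Step 1: add the missing column; it is nullable by default.
    ALTER TABLE target_table ADD COLUMN colname INTEGER;

    -- Step 2: fill it from the source row that shares the same key.
    UPDATE target_table
    SET colname = (
        SELECT s.colname
        FROM source_table s
        WHERE s.keycol = target_table.keycol
    );
    """
)
rows = conn.execute(
    "SELECT keycol, other, colname FROM target_table ORDER BY keycol"
).fetchall()
print(rows)
conn.close()
```

Existing rows keep their other columns and no new rows are added, which is the difference from the plain INSERT ... SELECT shown above.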
