Clone Schemas in Snowflake

Is it possible to clone schemas selectively in Snowflake?
For example:
Original:
DB_OG
--schema1
--schema2
--schema3
Clone:
DB_Clone
--schema1
--schema3

The CREATE <object> … CLONE statement does not support applying a filter, a pattern, or multiple objects, and its behaviour is to recursively clone every object underneath:
For databases and schemas, cloning is recursive:
Cloning a database clones all the schemas and other objects in the database.
There are a few explicit ways to filter the clone:
Clone the whole database, then follow up with DROP SCHEMA commands to remove the unnecessary schemas
Create an empty database and selectively clone only the required schemas from the source database into it
Both of the above can also be automated by logic embedded within a stored procedure that takes a pattern or a list of names as its input and runs the appropriate SQL commands.
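A minimal sketch of the two manual approaches, using the DB_OG / DB_Clone names from the question and assuming schema2 is the one to exclude:
-- Option 1: clone everything, then drop what isn't needed
CREATE DATABASE DB_Clone CLONE DB_OG;
DROP SCHEMA DB_Clone.schema2;
-- Option 2: create an empty database and clone only the required schemas
CREATE DATABASE DB_Clone;
CREATE SCHEMA DB_Clone.schema1 CLONE DB_OG.schema1;
CREATE SCHEMA DB_Clone.schema3 CLONE DB_OG.schema3;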

Currently, excluding certain schemas while cloning all the other schemas of a database is not supported.
If the schemas that are not required happen to be the most recently created ones, you could use the AT | BEFORE clause to exclude them (clone up to a particular timestamp, which excludes any schemas created after that timestamp).
Ref: https://docs.snowflake.com/en/sql-reference/sql/create-clone.html#notes-for-cloning-with-time-travel-databases-schemas-tables-and-streams-only
Other options include dropping the unwanted schemas after the cloning operation, or cloning only the required schemas.
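For example, a minimal sketch of the AT clause approach (the timestamp is illustrative):
-- Clone the database as it existed at the given point in time;
-- schemas created after this timestamp are excluded from the clone.
CREATE DATABASE DB_Clone CLONE DB_OG
  AT (TIMESTAMP => '2022-06-01 00:00:00'::TIMESTAMP_LTZ);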

Related

Can a table cloned in Snowflake become the source?

In Snowflake:
I want to clone the source
Do operations on the clone
Turn clone into the source
Drop the original source
Reading the documentation, I interpret a clone to be a unique, independent object that's identical to the source. So, in my head, I can create a clone, drop the source, and be OK. Has anyone ever done this in a prod environment?
Thanks for any guidance. We've tested the theory and it doesn't look like there are any side effects with the exception of losing time travel and file loading history on the old source; but we're OK with that.
Yes, you can create a clone and drop the "source" of the clone. You might also be able to achieve the same effect with a transaction, with simpler code:
begin transaction;
[do operation 1 on source table];
[do operation 2 on source table];
commit;
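If you do want the clone to literally take over the source's name, a minimal sketch of that pattern (table names are hypothetical):
CREATE TABLE my_table_clone CLONE my_table;
-- ... run the operations against my_table_clone ...
ALTER TABLE my_table SWAP WITH my_table_clone;  -- atomically exchanges the two names
DROP TABLE my_table_clone;                      -- now holds the old source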
I guess your use case may be the same as the use case I had.
As Simon said, it is possible, but with a clone you need to be careful about the access control privileges. Please also note that you will lose the load history of the source table. Also, if a table is cloned, historical data for the table clone begins at the point when the clone was created.
Access Control Privileges for Cloned Objects
A cloned object does not retain any granted privileges on the source object itself (i.e. clones do not automatically have the same privileges as their sources). A system administrator or the owner of the cloned object must explicitly grant any required privileges to the newly-created clone.
However, if the source object is a database or schema, for child objects contained in the source, the clone replicates all granted privileges on the corresponding child objects:
For databases, contained objects include schemas, tables, views, etc.
For schemas, contained objects include tables, views, etc.
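So after cloning a standalone table, any required grants have to be re-applied explicitly; a minimal sketch (role and table names are hypothetical):
-- The clone starts with no grants of its own; re-grant as needed.
GRANT SELECT, INSERT ON TABLE my_table_clone TO ROLE reporting_role;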

Using schedule tasks in snowflake to clone DB's with dynamic names

I want to use the Snowflake task scheduler to clone one or all of the DBs with a dynamic clone DB name, something like below. Is it possible to do this without creating a stored procedure? As I have multiple DBs under my account, I would prefer to clone all of them in one task:
create database xx_date
clone xx
I appreciate your response
Thanks,
Is it possible to do it without creating a Stored Procedure
The CREATE TASK statement syntax only allows for a single SQL statement to be specified, and the CREATE … CLONE statement syntax does not permit specifying more than one object at a time.
Given the above, this isn't currently possible. You will need to iterate over database names from within a stored procedure. The same stored procedure can also be used to clean up older dated clones from previous task invocations.
For incorporating dates into a dynamically generated statement within the stored procedure, check out this question.
P.S. If the underlying goal of the numerous clones is to maintain backups, also consider cross-account, cross-region, and/or cross-cloud replication for better safety.
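A minimal sketch of the stored-procedure route in Snowflake Scripting, wrapped in a task (the procedure, task, and warehouse names are hypothetical, and the database-name filter is illustrative):
CREATE OR REPLACE PROCEDURE clone_all_dbs()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  -- Skip system databases; in practice, also filter out previously created clones.
  c1 CURSOR FOR SELECT database_name FROM information_schema.databases
                WHERE database_name NOT IN ('SNOWFLAKE', 'SNOWFLAKE_SAMPLE_DATA');
BEGIN
  FOR rec IN c1 DO
    -- Build e.g. "CREATE DATABASE XX_20240101 CLONE XX" for each database.
    EXECUTE IMMEDIATE 'CREATE DATABASE ' || rec.database_name || '_' ||
                      TO_CHAR(CURRENT_DATE, 'YYYYMMDD') ||
                      ' CLONE ' || rec.database_name;
  END FOR;
  RETURN 'done';
END;
$$;

CREATE OR REPLACE TASK clone_dbs_nightly
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CALL clone_all_dbs();

ALTER TASK clone_dbs_nightly RESUME;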

Are Flyway schemas merely informational or do they affect function?

I have a SQL Server database that uses schemas to logically group objects; there are ten schemas in the database.
If I baseline my database and create the schema history table in the “foo” schema, will Flyway apply a migration from the migration folder that operates on an object in the “bar” schema?
Do I need one folder of migration scripts for each schema in the database? The documentation explains how to specify schemas on the command line but doesn’t make it clear as to why I must.
The list of schemas on the command line has two effects:
the first named schema is where the history table goes
the named schemas are the ones that are cleaned by flyway clean
(Note - in 6.1 the command line defaultSchema parameter was introduced to separate these usages)
Migrations can refer to any schema in the database that you have access to - indeed, some objects may exist in one schema but depend on objects in another. If you're happy with the history table to go in dbo, and want to control the whole database with Flyway, just don't set these parameters. A folder of scripts per schema may help you with maintaining them but it is not necessary.
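For example, a minimal invocation along those lines (connection settings elided; schema names are hypothetical, and -defaultSchema requires Flyway 6.1 or later per the note above):
flyway -schemas=foo,bar -defaultSchema=foo migrate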

How to copy complete schema to a new schema

I'm working on a SaaS application that has a SQL Server database. I have tables, functions, and stored procedures in a particular schema (i.e. customer1.table1, customer1.spGetCustomers, etc.). I would like a way of copying the entire schema (tables, functions, stored procedures, indexes, keys, etc.) to a new, empty schema for each new customer. I was hoping there was a fast, easy way to do this so that I can add a new schema for every new customer and keep everything completely separate. I don't want a new database for each customer because of the cost and extra maintenance.
Please help.
Use SQLCMD variables in create scripts.
You will write schema-independent object creation scripts, something like:
CREATE TABLE [$(schema)].Table1 ...
CREATE PROCEDURE [$(schema)].Proc1...
Then you will execute it as:
sqlcmd -v schema="Customer1" -i c:\script.sql
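A minimal sketch of what c:\script.sql could contain (the CREATE SCHEMA step and the column definitions are illustrative):
CREATE SCHEMA [$(schema)];
GO
CREATE TABLE [$(schema)].Table1 (Id INT PRIMARY KEY, Name NVARCHAR(100));
GO
CREATE PROCEDURE [$(schema)].Proc1
AS
  SELECT Id, Name FROM [$(schema)].Table1;
GO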

Grouping ETL Staging Tables With User Schemas?

I was thinking of putting staging tables and the stored procedures that update those tables into their own schema, such that when importing data from SomeTable to the data warehouse, I would run an Initial.StageSomeTable procedure that inserts the data into the Initial.SomeTable table. This way, all the procs and tables dealing with the Initial staging are grouped together. Then I'd have a Validation schema for that stage of the ETL, etc.
This seems cleaner than trying to uniquely name all these very similar tables, since each table will have multiple instances of itself throughout the staging process.
Question: Is using a user schema to group tables/procs/views together an appropriate use of user schemas in MS SQL Server? Or are user schemas supposed to be used for security, such as grouping permissions together for objects?
This is actually a recommended practice. Take a look at the Microsoft Business Intelligence ETL Design Practices from Project REAL. You will find (download the doc from the first link) that they use quite a few schemata to group and identify objects in the warehouse.
In addition to dbo and etl, they also use admin, audit, part, olap and a few more.
I think it's appropriate enough; it doesn't really matter. You could use another database if you liked, which is actually what we do.
I'm not sure why you would want a validation schema, though; what are you going to do there?
Both the reasons you list (purpose/intent, security) are valid reasons to use schemas. Once you start using them, you should always specify schema when referencing an object (although I'm lazy and never specify dbo).
One trick we use is to have the same-named table in each of several schemas, combined with table partitioning (available in SQL 2005 and up). Load the data into the first schema, then when it's validated, "swap" the partition into dbo, after first swapping the dbo partition into a "dumpster" schema copy of the table. Net production downtime is measured in seconds, and it's all carefully wrapped in a declared transaction.
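A minimal sketch of that swap, assuming hypothetical FactSales tables with identical structure in each schema:
BEGIN TRANSACTION;
-- Move the current production partition out of the way into the "dumpster" copy
ALTER TABLE dbo.FactSales SWITCH PARTITION 1 TO dumpster.FactSales PARTITION 1;
-- Swap the validated staging partition into production
ALTER TABLE staging.FactSales SWITCH PARTITION 1 TO dbo.FactSales PARTITION 1;
COMMIT;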
