In Snowflake, which of the following objects can be cloned?
A. Tables
B. Named File Formats
C. Schemas
D. Shares
E. Databases
F. Users
I know that Tables, Schemas, and Databases can be cloned, but can we also clone Users or Named File Formats?
You can directly clone DATABASE, SCHEMA, TABLE, STREAM, STAGE, FILE FORMAT, SEQUENCE, and TASK objects:
https://docs.snowflake.com/en/sql-reference/sql/create-clone.html#syntax
In Snowflake, the following objects can be cloned:
Data Containment Objects
Databases
Schemas
Tables
Streams
Data Configuration and Transformation Objects
Stages (external only - not internal)
File Formats
Sequences
Tasks
The following account level objects cannot be cloned:
Users
Roles
Grants
Virtual Warehouses
Resource monitors
Storage integrations
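To make the two lists concrete, here is a minimal Python sketch that builds a CREATE ... CLONE statement and refuses object types that are not in the cloneable list above. The helper name clone_statement and the object names in the usage note are made up for illustration; the statement shape follows the CREATE <object> ... CLONE syntax linked earlier.

```python
# Object types listed as cloneable in the answer above.
CLONEABLE_TYPES = {
    "DATABASE", "SCHEMA", "TABLE", "STREAM",
    "STAGE", "FILE FORMAT", "SEQUENCE", "TASK",
}


def clone_statement(object_type, source, target):
    """Return a CREATE ... CLONE statement, or raise for non-cloneable types.

    Hypothetical helper: it only formats text and assumes the identifiers
    passed in are trusted.
    """
    kind = " ".join(object_type.upper().split())  # normalise case and spacing
    if kind not in CLONEABLE_TYPES:
        raise ValueError(f"{kind} objects cannot be cloned")
    return f"CREATE {kind} {target} CLONE {source}"
```

For example, clone_statement("table", "orders", "orders_dev") gives a statement you could run in a worksheet, while clone_statement("user", "alice", "alice_copy") raises a ValueError, matching option F in the question.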
I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS/S3 onto snowflake and then perform some modeling by DataRobot
We have some tables that contain PII data and we would like to hide those columns from Datarobot, what suggestion do you have for this problem?
The schema in AWS needs to match the schema in Snowflake for the copying process.
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake:
Use COPY INTO statements to load the data into the tables
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot is using.
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one could be, say, data_owner (the role that creates the table and loads the data into it) and the other, say, data_modelling (the role used by DataRobot).
Create masking policies as the data owner so that the DataRobot role cannot see the column data.
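As a rough sketch of the two-role approach, the Python helper below builds the two statements involved: one that creates a masking policy and one that attaches it to a column. The helper name masking_statements, the policy, table, and column names, and the '***MASKED***' placeholder are all assumptions for illustration; only the role whose name you pass in sees the real value.

```python
def masking_statements(policy, table, column, unmasked_role):
    """Return [create-policy, apply-policy] statements for one string column.

    Hypothetical helper: it only formats text and assumes trusted identifiers.
    The role name is upper-cased before it is compared with CURRENT_ROLE().
    """
    create_policy = (
        f"CREATE MASKING POLICY {policy} AS (val STRING) RETURNS STRING -> "
        f"CASE WHEN CURRENT_ROLE() IN ('{unmasked_role.upper()}') "
        f"THEN val ELSE '***MASKED***' END"
    )
    apply_policy = (
        f"ALTER TABLE {table} MODIFY COLUMN {column} "
        f"SET MASKING POLICY {policy}"
    )
    return [create_policy, apply_policy]


# Example with made-up names: data_owner sees emails, every other role does not.
for statement in masking_statements("pii_mask", "customers", "email", "data_owner"):
    print(statement + ";")
```

You would run the generated statements as the data owner; queries issued under the data_modelling role would then get the masked value back for that column.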
As for your question about copying the data, there is no requirement that the AWS S3 folder be in sync with Snowflake. You can create the external stage with any name and point it to any S3 folder.
The Snowflake documentation has a good example that helps you get some hands-on practice:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
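For a feel of what the stage-plus-COPY route looks like, here is a small Python sketch that builds the two statements. The function names, the stage, bucket, and integration names, and the CSV-with-one-header-row file format are assumptions for illustration, not details from the question.

```python
def stage_statement(stage, s3_url, storage_integration):
    """CREATE STAGE statement for an external stage pointing at an S3 folder.

    Hypothetical helper; assumes a storage integration with this name exists.
    """
    return (
        f"CREATE STAGE {stage} URL = '{s3_url}' "
        f"STORAGE_INTEGRATION = {storage_integration}"
    )


def copy_statement(table, stage):
    """COPY INTO statement loading CSV files (one header row) from the stage."""
    return (
        f"COPY INTO {table} FROM @{stage} "
        f"FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )


# Example with made-up names; note the stage name is unrelated to the S3 path.
print(stage_statement("my_s3_stage", "s3://my-bucket/exports/", "my_s3_integration") + ";")
print(copy_statement("customers", "my_s3_stage") + ";")
```

The target table has to exist with columns that fit the files, but the stage name and the S3 folder layout are independent of the Snowflake database and schema names.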
These two concepts have confused me a lot recently.
Snowflake, the database, refers to the data service whose website address is below:
https://www.snowflake.com/
This is more like a data platform or data warehouse on the cloud that provides SQL engine functionalities.
On the other hand, a snowflake schema is more like a technique for designing a database schema.
Are they two totally different things that just coincidentally share a name?
Databases and schemas are used to organize data stored in Snowflake:
A database is a logical grouping of schemas. Each database belongs to a single Snowflake account.
A schema is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database.
Together, a database and schema comprise a namespace in Snowflake.
Source: https://docs.snowflake.com/en/sql-reference/ddl-database.html
I am unable to create objects (views, file formats, stages, etc.) in the shared sample database (SNOWFLAKE_SAMPLE_DATA).
Kindly let me know how I can get access to the data.
Regards,
DB
The SNOWFLAKE_SAMPLE_DATA database contains a schema for each data set, with the sample data stored in the tables in each schema. You can execute queries on the tables in this database just as you would in any other database in your account.
The database and schemas do not utilize any data storage, so they do not incur storage charges for your account.
However, just as with other databases, executing queries requires a running, current warehouse for your session, which consumes credits.
You can refer to snowflake documentation: DOCS » USING SNOWFLAKE » SAMPLE DATASETS.
Hope this helps answer your question.
Shared databases are read-only. Users in a consumer account can view/query data, but cannot insert or update data, or create any objects in the database. This is why you cannot create any objects in the shared database (SNOWFLAKE_SAMPLE_DATA).
https://docs.snowflake.com/en/user-guide/data-share-consumers.html#general-limitations-for-shared-databases
You can query the data in a shared database like any other database.
https://docs.snowflake.com/en/user-guide/data-share-consumers.html#querying-a-shared-database
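If you need your own views over the sample data, one option is to create them in a database you own and have them select from the shared tables. The Python sketch below just builds those statements; the helper name and the my_db, my_schema, and customer_v names are made up, and the shared table is assumed to be the CUSTOMER table in the TPCH_SF1 sample schema.

```python
SHARED_TABLE = "SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER"


def local_view_statements(database, schema, view, shared_table=SHARED_TABLE):
    """Statements that create a view in your own database over a shared table.

    Hypothetical helper: it only formats text and assumes trusted identifiers.
    Nothing is created inside the read-only shared database.
    """
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
        f"CREATE OR REPLACE VIEW {database}.{schema}.{view} AS "
        f"SELECT * FROM {shared_table}",
    ]


for statement in local_view_statements("my_db", "my_schema", "customer_v"):
    print(statement + ";")
```

File formats and stages can be created in the same way, in your own database and schema rather than in SNOWFLAKE_SAMPLE_DATA.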
I was doing my databases reading when I came across...
Schema: Is a container for objects
Tablespace: A logical storage unit for objects
Can anyone explain the difference between these?
A schema is a namespace - a logical thing. It is used to organize the names of database objects. It has nothing to do with the way the data is stored.
A tablespace is a physical thing. It's a container for data and has nothing to do with the logical organization of the database objects.
A single object (e.g. a table) could be spread across multiple tablespaces (depending on the DBMS being used) but it can only be defined in a single schema. The table schema_1.table_1 is a different table than schema_2.table_1 - although the "plain" name is the same, the fully qualified name is different and therefore those are two different tables.
Objects that are organized in the same schema are not necessarily stored in the same tablespace. And a single tablespace can contain objects from different schemas.
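The namespace point is easy to try out. The sketch below uses Python's built-in sqlite3 module, with one caveat: SQLite has no CREATE SCHEMA, so two attached in-memory databases stand in for schema_1 and schema_2 here. That substitution is mine, purely to get something runnable; the qualified-name behaviour is what matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each ATTACH of ':memory:' creates a separate namespace (standing in for a schema).
conn.execute("ATTACH DATABASE ':memory:' AS schema_1")
conn.execute("ATTACH DATABASE ':memory:' AS schema_2")

# Same plain name, different fully qualified names: two distinct tables.
conn.execute("CREATE TABLE schema_1.table_1 (id INTEGER, note TEXT)")
conn.execute("CREATE TABLE schema_2.table_1 (id INTEGER, note TEXT)")

conn.execute("INSERT INTO schema_1.table_1 VALUES (1, 'row in schema_1')")
conn.execute("INSERT INTO schema_2.table_1 VALUES (1, 'row in schema_2')")
conn.execute("INSERT INTO schema_2.table_1 VALUES (2, 'another row in schema_2')")


def row_count(qualified_name):
    """Count rows in a table given its schema-qualified name."""
    return conn.execute(f"SELECT COUNT(*) FROM {qualified_name}").fetchone()[0]


print(row_count("schema_1.table_1"))  # prints 1
print(row_count("schema_2.table_1"))  # prints 2
```

The two counts differ because schema_1.table_1 and schema_2.table_1 are separate tables. Nothing in the code says where either table is physically stored, which is the part a tablespace would govern in a DBMS that has them.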
Schemas (and catalogs, which are another level of namespace) are part of the SQL language and are defined in the SQL standard.
Tablespaces are part of the physical storage and are DBMS-specific (although nearly all DBMS support a concept like that) and are not part of the SQL query language (as defined by the SQL standard). They are, however, defined and managed through vendor-specific SQL/DDL statements.
A schema deals with logical structures, while tablespaces deal with the physical datafiles that constitute the database.
From Oracle documentation:
Schema:
A schema is a collection of database objects. A schema is owned by a database user and has the same name as that user. Schema objects are the logical structures that directly refer to the database's data. Schema objects include structures like tables, views, and indexes. (There is no relationship between a tablespace and a schema. Objects in the same schema can be in different tablespaces, and a tablespace can hold objects from different schemas.)
Tablespaces:
A database is divided into one or more logical storage units called tablespaces. Tablespaces are divided into logical units of storage called segments, which are further divided into extents. Extents are a collection of contiguous blocks.
The size of a tablespace is the size of the datafiles that constitute the tablespace. The size of a database is the collective size of the tablespaces that constitute the database.
You can enlarge a database in three ways:
Add a datafile to a tablespace
Add a new tablespace
Increase the size of a datafile
There is no relationship between schemas and tablespaces: a tablespace can contain objects from different schemas, and the objects for a schema can be contained in different tablespaces.
Source: Oracle documentation.
https://docs.oracle.com/cd/B10500_01/server.920/a96524/c11schem.htm
Consider a database server whose job today is to house one database. Likely the database will be moved in the future to another database instance which houses multiple databases & schemas.
Let's pretend the app/project is called Invoicer 2.0. The database is called AcmeInvoice. The database holds all the invoice, customer, and product information. Here's a diagram of the actors and their roles and behaviour.
The schema(s) will largely be used to easily assign permissions to roles. The added benefit here is that the objects aren't under dbo, and that the objects & permissions can be ported to another machine in the future.
Question
What conventions do you use when naming the schema?
Is it good form to name the schema the same as the database?
I would think that if your schema name ends up being the same as your database name, then you are just adding redundancy to your database. Find objects in your database that have a common scope or purpose and create a schema to reflect that scope. So, for example, if you have an entity for invoices and some supporting lookup tables for invoice states, etc., then put them all in an invoice schema.
As a general rule of thumb, I would try to avoid using a name that reflects the application name, database name, or other concrete/physical things, because they can change. Instead, find a name that conceptually represents the scope of the objects that will go into the schema.
Your comment states that "the schemas will largely be used to easily assign permissions to roles". Your diagram shows specific user types having access to some/all tables or some/all stored procs. I think organizing objects into schemas conceptually and organizing them into schemas from a security standpoint are conflicting goals. I am in favour of creating roles in SQL Server to reflect the types of users and granting those roles access to the specific objects that each user type needs, as opposed to granting the role or user access to the schema to build your security framework.
Why would you name the schema the same as the database? This means all database objects fall under the same schema. If this is the case, why have a schema at all?
Typically, schemas are used to group objects within a common scope of activity or function. For example, given what you've described, you might have an Invoice schema, a Customer schema, and a Product schema. All Invoice-related objects would go into the Invoice schema, all Customer-related objects would go into the Customer schema, and the same for Products.
We often will use a Common schema as well which includes objects that might be common to our entire application.
I would call the database AcmeInvoice (or another suitable name) and the schema Invoicer2.
My reasons are as follows: AcmeInvoice means I am grouping all of that application's objects/data together. It can therefore be moved as one unit to other machines (a backup/restore or detach/attach).
The schema would be Invoicer2. Applications change; maybe in the future you will have Invoicer21 (for which you would create a schema), or perhaps a reporting module or system (a Reports schema).
I find that the use of schemas allows me to separate data/procedures in one database into different groups, which makes it easier to administer permissions.