Snowflake Sample data set - snowflake-cloud-data-platform

I am unable to create objects (views, file formats, stages, etc.) in the shared sample database (SNOWFLAKE_SAMPLE_DATA).
Kindly let me know how I can get access to the data.
Regards,
DB

The SNOWFLAKE_SAMPLE_DATA database contains a schema for each data set, with the sample data stored in the tables in each schema. You can execute queries on the tables in this database just as you would in any other database in your account.
The database and schemas do not utilize any data storage so they do not incur storage charges for your account.
However, just as with other databases, executing queries requires a running, current warehouse for your session, which consumes credits.
You can refer to the Snowflake documentation: DOCS » USING SNOWFLAKE » SAMPLE DATASETS.
Hope this helps answer your question.

Shared databases are read-only. Users in a consumer account can view/query data, but cannot insert or update data, or create any objects in the database. This is why you cannot create any objects in the shared database (SNOWFLAKE_SAMPLE_DATA).
https://docs.snowflake.com/en/user-guide/data-share-consumers.html#general-limitations-for-shared-databases
You can query the data in a shared database just like in any other database.
https://docs.snowflake.com/en/user-guide/data-share-consumers.html#querying-a-shared-database
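Since the shared database is read-only, one common workaround is to create your objects in a database you own and have them read from the sample data. A minimal sketch, assuming a hypothetical database MY_DB and hypothetical object names, and assuming the TPCH_SF1.CUSTOMER sample table is present in your account:

```sql
-- MY_DB, CUSTOMER_V and CUSTOMER_COPY are hypothetical names.
-- Run with a role that is allowed to create databases.
CREATE DATABASE IF NOT EXISTS MY_DB;
USE SCHEMA MY_DB.PUBLIC;

-- A view in your own database can select from the read-only shared database.
CREATE OR REPLACE VIEW CUSTOMER_V AS
SELECT C_CUSTKEY, C_NAME, C_MKTSEGMENT
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER;

-- If you need to modify the rows, copy them into a table you own.
-- Unlike the shared data, this copy uses storage in your account.
CREATE OR REPLACE TABLE CUSTOMER_COPY AS
SELECT *
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER;
```

File formats and stages can be created the same way, in a schema of your own database rather than in SNOWFLAKE_SAMPLE_DATA.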

Related

How to move data from S3 to Snowflake

I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS/S3 into Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII data, and we would like to hide those columns from DataRobot. What do you suggest for this problem?
Does the schema in AWS need to match the schema in Snowflake for the copying process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake:
Use COPY INTO statements to load the data into the tables
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot is using.
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one, say data_owner, that creates the tables and loads the data into them, and another, say data_modelling, that DataRobot uses.
Create masking policies with the data_owner role so that the DataRobot role cannot see the column data.
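The two-role setup could look roughly like the sketch below. The table CUSTOMERS, the column EMAIL, and the policy name are hypothetical, the role names follow the ones suggested above, and masking policies may not be available on every Snowflake edition, so check your account first:

```sql
-- Hypothetical names: table CUSTOMERS, column EMAIL, policy EMAIL_MASK.
-- Roles DATA_OWNER and DATA_MODELLING are the two roles described above.
CREATE OR REPLACE MASKING POLICY EMAIL_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() = 'DATA_OWNER' THEN VAL  -- the owner sees the real value
    ELSE '*** MASKED ***'                        -- every other role sees a placeholder
  END;

-- Attach the policy to the PII column.
ALTER TABLE CUSTOMERS MODIFY COLUMN EMAIL SET MASKING POLICY EMAIL_MASK;
```

With this in place, a session running as DATA_MODELLING can still select the EMAIL column, but only gets the masked placeholder back.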
As for your question on copying the data, there is no requirement that the AWS S3 folder be in sync with Snowflake. You can create the external stage with any name and point it to any S3 folder.
The Snowflake documentation has a good example that helps you get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
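As a rough sketch of the stage plus COPY INTO approach: the bucket path, the storage integration S3_INT, the stage name, and the target table ORDERS below are all hypothetical, and the file format options are assumptions about CSV files with a header row:

```sql
-- Hypothetical names: bucket path, storage integration S3_INT, stage MY_S3_STAGE, table ORDERS.
-- The storage integration and the target table must already exist.
CREATE OR REPLACE STAGE MY_S3_STAGE
  URL = 's3://my-bucket/exports/orders/'
  STORAGE_INTEGRATION = S3_INT
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

-- Load every CSV file under the stage location into the target table.
COPY INTO ORDERS
FROM @MY_S3_STAGE
PATTERN = '.*[.]csv';
```

The stage name is independent of the S3 folder name, which is why the two do not have to match.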

What is difference between Snowflake Database And Snowflake Schema

The two concepts confused me a lot recently.
Snowflake Database refers more to the data service, whose website is below:
https://www.snowflake.com/
This is more like a data platform or data warehouse on the cloud that provides SQL engine functionalities.
On the other hand, a snowflake schema is more like a technique for designing a database schema.
Are they two totally different things that just happen to share the same name?
Databases and schemas are used to organize data stored in Snowflake:
A database is a logical grouping of schemas. Each database belongs to a single Snowflake account.
A schema is a logical grouping of database objects (tables, views, etc.). Each schema belongs to a single database.
Together, a database and schema comprise a namespace in Snowflake.
Source: https://docs.snowflake.com/en/sql-reference/ddl-database.html
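To make the namespace idea concrete, here is a small sketch. The names SALES_DB, REPORTING, and ORDERS are hypothetical and only illustrate the database.schema.object pattern:

```sql
-- Hypothetical names throughout.
CREATE DATABASE IF NOT EXISTS SALES_DB;           -- a database groups schemas
CREATE SCHEMA IF NOT EXISTS SALES_DB.REPORTING;   -- a schema groups tables, views, etc.

CREATE TABLE IF NOT EXISTS SALES_DB.REPORTING.ORDERS (
  ORDER_ID NUMBER,
  AMOUNT   NUMBER(10, 2)
);

-- The fully qualified name is <database>.<schema>.<object>.
SELECT ORDER_ID, AMOUNT
FROM SALES_DB.REPORTING.ORDERS;
```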

Can the default databases shipped with a new Snowflake account be deleted?

Snowflake ships with a number of databases - which of the following are ok to delete and which are critical for operations to retain?
image showing databases DEMO_DB, UTIL_DB, SNOWFLAKE, and SNOWFLAKE_SAMPLE_DATA
Thanks, Jason
You can delete all of them except for SNOWFLAKE. Snowflake the company/service is actually the owner of the SNOWFLAKE database so you couldn't delete that even if you tried, but you can delete all other databases using the right ROLE without any issues as they just have sample data in them.
The SNOWFLAKE database is extremely useful, as it keeps a history of all queries and other activity within your account.
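For illustration, dropping the sample databases and looking at the query history might look like the sketch below. It assumes your current role owns the databases being dropped and has been granted access to the SNOWFLAKE database:

```sql
-- Drop the databases that only hold sample data.
DROP DATABASE IF EXISTS DEMO_DB;
DROP DATABASE IF EXISTS UTIL_DB;

-- The SNOWFLAKE database stays; its ACCOUNT_USAGE schema exposes account activity.
SELECT QUERY_TEXT, USER_NAME, START_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
ORDER BY START_TIME DESC
LIMIT 10;
```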

How to share SNOWFLAKE.ACCOUNT_USAGE schema using managed/reader account?

I am trying to share the Query_History table in the Account_Usage schema of my Snowflake database (the default metadata database) with a managed account (i.e. a reader account), but the data is not visible in the other account.
Is there any way to share the Snowflake database without duplicating the data?
I am getting an error saying that an already imported database (i.e. Snowflake) cannot be shared through the data share option.
In the managed account, the Snowflake database and its schemas are available, but I am not able to see the data in them.
According to the documentation you can't re-share any database that is shared with you:
Shared databases and all the objects in the database cannot be forwarded (i.e. re-shared with other accounts).
Since the Snowflake database is one that is shared to you from Snowflake, this is probably why you're having issues.
If you need to do this your best bet is to create a table and populate it with the data you need from the Snowflake database and share that table instead. Although it is strange that you'd want to share this info with another account.
Your other option would be to create a database/schema in your account with views over the account usage data that you want to share, create a role that can access only that, and then provide a user login with only that role to the group that needs to do analytics on your data.
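The first suggestion (copy the data into your own table and share that) could be sketched as follows. The database, table, and share names and the account identifier MY_READER_ACCT are hypothetical, and the copy is a snapshot that you would have to refresh yourself:

```sql
-- Hypothetical names: USAGE_SHARE_DB, QUERY_HISTORY_COPY, USAGE_SHARE, MY_READER_ACCT.
CREATE DATABASE IF NOT EXISTS USAGE_SHARE_DB;

-- Snapshot the columns you want to expose into a table you own.
CREATE OR REPLACE TABLE USAGE_SHARE_DB.PUBLIC.QUERY_HISTORY_COPY AS
SELECT QUERY_ID, QUERY_TEXT, USER_NAME, START_TIME, TOTAL_ELAPSED_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;

-- Share the copy (not the imported SNOWFLAKE database) with the reader account.
CREATE SHARE USAGE_SHARE;
GRANT USAGE ON DATABASE USAGE_SHARE_DB TO SHARE USAGE_SHARE;
GRANT USAGE ON SCHEMA USAGE_SHARE_DB.PUBLIC TO SHARE USAGE_SHARE;
GRANT SELECT ON TABLE USAGE_SHARE_DB.PUBLIC.QUERY_HISTORY_COPY TO SHARE USAGE_SHARE;
ALTER SHARE USAGE_SHARE ADD ACCOUNTS = MY_READER_ACCT;
```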

Best options to manage large sets of data in SQL Server

I am currently working on a project which involves the following:
The application I am working on is connected to a SQL Server database.
SAP loads information into multiple tables (on a daily and also hourly basis) in a MASTER database.
There are 5 other databases (hosted on the same server) that access this information via synonyms and stored procedure calls to the MASTER database.
The MASTER database is used purely for storing the data and routing it to the other databases.
Master Database -
Tables:
MASTER_TABLE1 <------- SAP inserts data into this table. Triggers are used to process the valid data and insert it into secondary staging tables, say MASTER_TABLE1_SEC
MASTER_TABLE1_SEC -- Holds processed data coming into MASTER_TABLE1
Five other databases (one for each manufacturing facility) are present on the same server. My application is connected to the facility databases (not the MASTER database).
FACILITY1
FACILITY2
....
FACILITY5
Synonyms of MASTER_TABLE1_SEC are created in each of these 5 facility databases.
Stored procedures are then called from the facility databases in order to load data from MASTER_TABLE1_SEC into the respective tables (within each facility), based on the business logic.
Is there a better architecture to handle this kind of a project? I am a beginner when it comes to advanced data management. Can anyone suggest a better architecture or tools to handle this?
There are a lot of patterns that would actually meet the needs described here. It seems that you are working with a type of Data Warehouse. I use Data Vault for my Enterprise Data Warehouses. It is an Ensemble Modeling technique designed for integration and master data preparation. You can think of it as a way to house all data from all time. You would then generate Data Marts (Kimball Method) for each of the facilities, containing only their data or whatever is required for their needs.
