In Hive, SHOW CREATE TABLE gives the storage path of a table. I am trying to find the storage path for a Snowflake table, but I don't see SHOW CREATE TABLE or DESC TABLE returning a storage path.
One of the main advantages of the Snowflake Data Platform is automatic storage handling:
Key Concepts & Architecture
Database Storage
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.
Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage are handled by Snowflake. The data objects stored by Snowflake are not directly visible nor accessible by customers; they are only accessible through SQL query operations run using Snowflake.
Related
I perform data unloads from Snowflake to S3, or locally by using SnowSQL.
I'd like to know if there is any kind of data tracing (for data governance) that always records, or tags and saves somewhere in Snowflake, the fact that data was unloaded.
Thanks
COPY INTO <location> used for data unloading will leave a trace in
ACCESS_HISTORY:
This Account Usage view can be used to query the access history of Snowflake objects (e.g. table, view, column) within the last 365 days (1 year).
This view supports write operations of the following type:
Data unloading statements:
COPY INTO internalStage FROM TABLE
COPY INTO externalStage FROM TABLE
COPY INTO externalLocation FROM TABLE
The unload statements themselves also show up in QUERY_HISTORY.
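For ACCESS_HISTORY, here is a minimal query sketch, assuming your role can read the SNOWFLAKE.ACCOUNT_USAGE share; the 30-day window and selected columns are only illustrative:

    SELECT query_id,
           query_start_time,
           user_name,
           objects_modified      -- unload targets such as stages appear here
    FROM   snowflake.account_usage.access_history
    WHERE  query_start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
    ORDER BY query_start_time DESC;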
From a data metering perspective, there is also DATA_TRANSFER_HISTORY:
This Account Usage view can be used to query the history of data transferred from Snowflake tables into a different cloud storage provider’s network (i.e. from Snowflake on AWS, Google Cloud Platform, or Microsoft Azure into the other cloud provider’s network) and/or geographical region within the last 365 days (1 year).
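A similar sketch for that view (again assuming access to SNOWFLAKE.ACCOUNT_USAGE; the time window is arbitrary):

    SELECT start_time,
           end_time,
           source_cloud,
           source_region,
           target_cloud,
           target_region,
           bytes_transferred,
           transfer_type
    FROM   snowflake.account_usage.data_transfer_history
    WHERE  start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP());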
I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS/S3 onto Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII data and we would like to hide those columns from DataRobot; what suggestion do you have for this problem?
Does the schema in AWS need to match the schema in Snowflake for the copying process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake (a rough sketch of the first two follows this list):
Use COPY INTO statements to load the data into the tables
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
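A rough sketch of the first two options, assuming a target table my_table and an external stage my_s3_stage that already points at the S3 location; all names and the file format are placeholders:

    -- Option 1: bulk load with COPY INTO
    COPY INTO my_table
    FROM @my_s3_stage/path/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Option 2: Snowpipe, auto-loading files as they land in S3
    -- (AUTO_INGEST = TRUE also requires S3 event notifications to the pipe's queue)
    CREATE PIPE my_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO my_table
      FROM @my_s3_stage/path/
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);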
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot is using.
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one would be, say, data_owner (the role that creates the table and loads the data into it) and another, say, data_modelling (for use by DataRobot).
Create masking policies as the data_owner role so that the data_modelling role (and therefore DataRobot) cannot see the column data.
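A minimal sketch of such a policy; the policy, table, column, and role names below are only placeholders for whatever you use:

    -- run as a role with the CREATE MASKING POLICY privilege (e.g. data_owner)
    CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('DATA_OWNER') THEN val
        ELSE NULL   -- data_modelling / DataRobot sees NULL instead of the PII value
      END;

    ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY mask_pii;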
About your question on copying the data, there is no requirement that the AWS S3 folder needs to be in sync with Snowflake. You can create the external stage with any name and point it at any S3 folder.
The Snowflake documentation has a good example that helps to get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
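For instance, an external stage can be given any name and pointed at an arbitrary S3 path; the bucket, storage integration, and file format below are placeholders:

    CREATE STAGE my_s3_stage
      URL = 's3://my-bucket/any/folder/'
      STORAGE_INTEGRATION = my_s3_integration   -- or CREDENTIALS = (...)
      FILE_FORMAT = (TYPE = CSV);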
Is it possible to create an external table in Snowflake referring to an on-premises Oracle database?
No, Snowflake does not presently support query federation to other DBMS software.
External tables in Snowflake exist only to expose a collection of data files (commonly found in data-lake architectures) as a qualified table without requiring a load first.
Querying your Oracle tables will currently require an explicit export of their data to a cloud storage location that Snowflake can access.
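As a rough illustration, once the Oracle data has been exported as, say, CSV files to a bucket, it could be exposed like this (stage, integration, table, and column names are hypothetical):

    CREATE STAGE oracle_export_stage
      URL = 's3://my-bucket/oracle-export/'
      STORAGE_INTEGRATION = my_s3_integration;

    CREATE EXTERNAL TABLE orders_ext (
      order_id   NUMBER AS (VALUE:c1::NUMBER),   -- CSV columns surface as c1, c2, ...
      order_date DATE   AS (VALUE:c2::DATE)
    )
    LOCATION = @oracle_export_stage
    AUTO_REFRESH = FALSE    -- refresh metadata with ALTER EXTERNAL TABLE ... REFRESH
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);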
I loaded data into a table from an external stage by using the COPY command. I know Snowflake compresses, encrypts, and saves all data to its "Storage Layer", which is shared across multiple virtual warehouses. Can I access my table's data directly on the S3 storage layer?
I am not considering the unloading option.
The data is not stored in virtual warehouses, but in an underlying storage account.
You cannot access the files used by Snowflake after the data has been ingested.
(You can upload the files using PUT with SnowSQL to an internal snowflake stage, and then download the files using GET)
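For completeness, a SnowSQL sketch of that round trip through an internal stage (the stage name and local paths are placeholders; PUT gzip-compresses files by default):

    -- upload a local file to a named internal stage
    PUT file:///tmp/mydata.csv @my_int_stage;

    -- later, download it again
    GET @my_int_stage/mydata.csv.gz file:///tmp/downloads/;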
I have a blob (storage account) that is housed on Azure. I also have a SQL Server table that is housed on Azure. I have a couple of questions:
Is it possible to create a join between the blob and the table?
Is it possible to store all of the information in the blob?
The table has address information on it and I wanted to be able to pull that information from the table and associate it or join it to the proper image by the ID in the SQL table (if that is the best way).
Is it possible to create a join between the blob and the table?
No.
Is it possible to store all of the information in the blob?
You possibly could (by storing the address information as blob metadata) but it is not recommended because then you would lose searching capability. Blob storage is simply an object store. You won't be able to query on address information.
The table has address information on it and I wanted to be able to pull that information from the table and associate it or join it to the proper image by the ID in the SQL table (if that is the best way)
Recommended way of doing this is storing the images in blob storage. Each blob in blob storage gets a unique URL (https://account.blob.core.windows.net/container/blob.png) that you can store in your database along with other address fields (e.g. create a column called ImageUrl and store the URL there).
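A minimal sketch on the SQL Server side, assuming a hypothetical Address table keyed by Id (the table and column names are made up):

    -- add a column to hold the blob's URL
    ALTER TABLE Address ADD ImageUrl NVARCHAR(400) NULL;

    -- later, look up the image URL for a given address row
    SELECT Id, Street, City, ImageUrl
    FROM   Address
    WHERE  Id = 42;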
Azure Storage (blob, in your case) and SQL Server are completely separate, independent data stores. You cannot do joins, transactions, or really any type of query, across both at the same time.
What you store in each is totally up to you. Typically, people store searchable/indexable metadata within a database engine (such as SQL Server in your case), and non-searchable (binary etc) content in bulk storage (such as blobs).
As far as "best way"? Not sure what you're looking for, but there is no best way. Like I said, some people will store anything searchable in their database. On top of this, they'd store a url to specific blobs that are related to that metadata. There's no specific rule about doing it this way, of course. Whatever works for you and your app...
Note: Blobs have metadata as well, but that metadata is not indexable; it would require searching through all blobs (or all blobs in a container) to perform specific searches.