Snowflake External Stage & PII Data

We plan to load sensitive PII data from Azure Blob Storage (ADLS Gen2) into Snowflake using an external stage secured by Azure credentials (a service principal) for the container where the data is stored.
However, this is not an acceptable solution to our cyber-security team. The use of an encryption key was considered, but the key issues that were raised were:
The complete container that stages the PII data could potentially be exposed.
Allowing the Snowflake VNet subnet IDs.
Hence I am looking for any best practices or further suggestions anyone may have for using Azure external stages to load into Snowflake.

The complete container that stages the PII data could potentially be exposed.
You may want to make sure the PII data is encrypted in Azure Blob Storage. Snowflake supports ingesting client-side encrypted data. You can read more here:
https://docs.snowflake.com/en/user-guide/security-encryption-end-to-end.html#ingesting-client-side-encrypted-data-into-snowflake
And this document describes how to create a stage with client-side encryption:
https://docs.snowflake.com/en/sql-reference/sql/create-stage.html#external-stage-parameters-externalstageparams
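For illustration, a minimal sketch of such a stage definition, assuming the master key is a Base64-encoded AES key; the URL, SAS token, and key below are placeholders, not working values:

    -- Sketch: external Azure stage with client-side encryption (AZURE_CSE).
    -- All values below are placeholders.
    CREATE OR REPLACE STAGE my_azure_pii_stage
      URL = 'azure://myaccount.blob.core.windows.net/pii-container'
      CREDENTIALS = ( AZURE_SAS_TOKEN = '<sas-token>' )
      ENCRYPTION = ( TYPE = 'AZURE_CSE' MASTER_KEY = '<base64-aes-key>' );

With this, the blobs sit encrypted in the container itself, and Snowflake decrypts the files during loading using the supplied master key.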
Allowing the Snowflake VNet subnet IDs
You can add the Snowflake VNet subnet IDs to the storage account's network rules, restricting access to those subnets only; more on this here:
https://docs.snowflake.com/en/user-guide/data-load-azure-allow.html
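The subnet IDs to allow can be retrieved from Snowflake itself; this system function is documented on that page for exactly this purpose:

    -- Returns platform info for the account, including the Snowflake VNet
    -- subnet IDs to add to the storage account's network rules on Azure.
    SELECT SYSTEM$GET_SNOWFLAKE_PLATFORM_INFO();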

Related

Is it possible to make REST API calls in Snowflake?

I'm new to Snowflake and I'm not so strong in SQL, but my question is whether it is possible to perform a (POST) request from Snowflake to the Azure Blob Service REST API in order to obtain a User Delegation Key. Can this be easily done?
In Snowflake's documentation I read about external functions, which could perhaps be used to execute some kind of script for acquiring a User Delegation Key, but it seems to be a hassle to set this all up (see https://docs.snowflake.com/en/sql-reference/external-functions-creating-azure.html)
If not, do you by any chance have an idea on how to design a manageable process for obtaining access to an Azure Blob Storage via Azure AD user credentials?
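For reference, a rough sketch of the external-function setup mentioned above (every name, ID, and endpoint here is a hypothetical placeholder; the actual REST call for the User Delegation Key would live in the remote service, e.g. an Azure Function behind API Management):

    -- Sketch: API integration pointing at an Azure API Management endpoint.
    CREATE OR REPLACE API INTEGRATION my_azure_api_integration
      API_PROVIDER = azure_api_management
      AZURE_TENANT_ID = '<tenant-id>'
      AZURE_AD_APPLICATION_ID = '<app-id>'
      API_ALLOWED_PREFIXES = ('https://my-apim.azure-api.net/')
      ENABLED = TRUE;

    -- Sketch: external function that calls the remote service, which in
    -- turn performs the POST against the Azure Blob Service REST API.
    CREATE OR REPLACE EXTERNAL FUNCTION get_user_delegation_key()
      RETURNS VARIANT
      API_INTEGRATION = my_azure_api_integration
      AS 'https://my-apim.azure-api.net/delegation-key';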

Data security with Azure Cognitive Search

I have a client that does not want the documents to permanently live in Azure. They are ok with moving files up to Azure to be indexed but after indexing they want the files to be removed from Azure and results to point to their on-prem storage.
Is this use-case possible with Azure Cognitive Search?
You can push any data into a search index via the service's REST API, as long as you have network connectivity to the search service.
I'm not sure why your client doesn't want to store documents in Azure, but you should make sure they're aware that the ingested document data exists in the search index independently of any source data. That is, if they're concerned about their data being stored in Azure, the indexed data will always be stored in Azure, since that's how the search service works.
If you're asking whether it's possible to point an Azure Search indexer at a data source that is not hosted in Azure, then no, that's not generally supported. There are some third-party organizations (e.g. Accenture, BA Insight) that will host a connector to a non-Azure data source on your behalf, though.

Where does Snowflake store data like metadata and table data

Where does Snowflake store data such as metadata, table data, and all other data? Does it use the public cloud which we configured while creating the Snowflake account, and if yes, where under that cloud does it keep it? And if not, which cloud provider does it use for storage?
Each Snowflake deployment has its own metadata servers. You can find more information on what is used for storing metadata here:
https://www.snowflake.com/blog/how-foundationdb-powers-snowflake-metadata-forward/
Based on the additional questions:
The data (micro-partitions) is stored in the object storage service of the same cloud provider (i.e. S3 for AWS, etc.).
Yes, all the data and metadata are stored in the cloud where the account is deployed.
Yes, it's deployed on the cloud service linked to the account.
Snowflake consists of the following three layers: Database Storage, Query Processing, and Cloud Services.
https://docs.snowflake.com/en/user-guide/intro-key-concepts.html
Metadata is managed in the Cloud Services layer and is clearly divided from database storage.
A core Snowflake feature is that micro-partitions are immutable: Snowflake doesn't overwrite the original micro-partitions but writes copies and has the Cloud Services layer update the references to them whenever an update is required.
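One small way to see that separation, assuming any existing table in your account: metadata-only queries like the ones below can often be answered by the Cloud Services layer from micro-partition statistics, without scanning the storage layer at all:

    -- Often served from micro-partition metadata alone
    -- (no scan of the underlying object storage required).
    SELECT COUNT(*) FROM my_table;
    SELECT MIN(order_date), MAX(order_date) FROM my_table;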

How can I manage the Always Encrypted technique and TDE together and mitigate the risks of this?

Under our electronic criminal law, customer-sensitive data must be encrypted at rest, and all admins working on servers and DBs must not be able to access this information in the clear.
Microsoft provides three methods to encrypt sensitive information:
1- TDE (Transparent Data Encryption).
2- Always Encrypted.
3- Always Encrypted with secure enclaves (not supported on our platform).
According to the documentation, TDE encrypts data at rest (the .mdf, .ldf, and .bak files are encrypted),
but once you have accessed the instance you can see all data in cleartext.
Always Encrypted can encrypt data inside the instance to prevent privileged users from accessing data in cleartext unless they have the certificates, which could be deployed on the IIS server or a development server, specifically in the Windows certificate store or Azure Key Vault.
Anyhow, by mixing both methods together, data is encrypted at rest and is also encrypted against everyone who cannot access the master certificate.
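A minimal T-SQL sketch of the combination described above (all database, certificate, and key names are hypothetical placeholders; the column encryption key CEK1 is assumed to have been created already):

    -- TDE: encrypts the database files (.mdf/.ldf/.bak) at rest.
    USE master;
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';
    CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
    GO
    USE SensitiveDb;
    CREATE DATABASE ENCRYPTION KEY
      WITH ALGORITHM = AES_256
      ENCRYPTION BY SERVER CERTIFICATE TDECert;
    ALTER DATABASE SensitiveDb SET ENCRYPTION ON;

    -- Always Encrypted: protects individual columns inside the instance,
    -- so even sysadmins only ever see ciphertext for this column.
    CREATE TABLE dbo.Customers (
      CustomerId INT PRIMARY KEY,
      NationalId NVARCHAR(20) COLLATE Latin1_General_BIN2
        ENCRYPTED WITH (
          COLUMN_ENCRYPTION_KEY = CEK1,
          ENCRYPTION_TYPE = DETERMINISTIC,
          ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
        ) NOT NULL
    );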
In a given structure that is managed by many teams:
1- DBA
2- DB backup
3- Domain Admin
4- Local admin
Also, consider a two-tier system that uses two different servers, one for IIS and the other for the MS SQL server.
By applying the above-mentioned mixture, the admins of both machines can together access the split secrets of Always Encrypted and then access the information.
My question:
How feasible is it to prevent those admins from accessing the information?

SQL Server 2016 - Always Encrypted

Do I have to secure the column master key on the client side so that nobody can read it?
Is it correct that when somebody has both the column encryption key and the column master key, the data can be decrypted by an attacker?
Do I have to secure ... so that nobody can read it
This statement can never be true. If the application needs to read a secret (the key), then so can an administrator on the site. If you have an application running at a client side, there is nothing you can do to prevent a determined client from finding the key. Ditto for an attacker that has compromised the location.
The Always Encrypted scenario is for applications that do not trust their hosting service (think Azure SQL Database). The application has the key and can manipulate the data, and the data travels to the hosting service and is stored encrypted. The hosting service cannot decrypt your data. All this is in the opening paragraph describing the feature:
Always Encrypted is a feature designed to protect sensitive data, such as credit card numbers or national identification numbers (e.g. U.S. social security numbers), stored in Azure SQL Database or SQL Server databases. Always Encrypted allows clients to encrypt sensitive data inside client applications and never reveal the encryption keys to the Database Engine ( SQL Database or SQL Server). As a result, Always Encrypted provides a separation between those who own the data (and can view it) and those who manage the data (but should have no access). By ensuring on-premises database administrators, cloud database operators, or other high-privileged, but unauthorized users, cannot access the encrypted data, Always Encrypted enables customers to confidently store sensitive data outside of their direct control. This allows organizations to encrypt data at rest and in use for storage in Azure, to enable delegation of on-premises database administration to third parties, or to reduce security clearance requirements for their own DBA staff.
Your understanding is correct. Roughly speaking, Always Encrypted provides the following security guarantee: plaintext data will only be visible to entities that have access to the column master key (certificate). So you would have to ensure that your CMK is only accessible by trusted entities. Also, the best practice is to have the client application and the database on separate machines.
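To make that custody boundary concrete, a hedged sketch of the key metadata as it appears in the database (names and the key path are placeholders; the ENCRYPTED_VALUE is normally generated by tooling such as SSMS, and a dummy value stands in for it here):

    -- The database stores only the *location* of the column master key;
    -- the certificate itself lives in the client's store, never in SQL Server.
    CREATE COLUMN MASTER KEY CMK1
    WITH (
      KEY_STORE_PROVIDER_NAME = 'MSSQL_CERTIFICATE_STORE',
      KEY_PATH = 'CurrentUser/My/<certificate-thumbprint>'
    );

    -- The column encryption key is stored only in encrypted form, wrapped
    -- by the CMK, so a DBA querying this metadata sees ciphertext only.
    CREATE COLUMN ENCRYPTION KEY CEK1
    WITH VALUES (
      COLUMN_MASTER_KEY = CMK1,
      ALGORITHM = 'RSA_OAEP',
      ENCRYPTED_VALUE = 0x016E000001 -- placeholder; real value comes from SSMS/PowerShell
    );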
I have provided a short, detailed explanation of the security guarantee provided by Always Encrypted here. You might find it useful. If you have additional questions, please leave a comment and I will try my best to help.