I know Snowflake encrypts data at rest by default. But I wanted to know if there is a built-in feature to encrypt a column value and store it in Snowflake tables, or if it is something I need to code using Snowpark and store as a procedure.
I would recommend that you read up on the Dynamic Data Masking that is available natively in Snowflake, which allows you to mask or encrypt column data for specific roles or for all roles, as needed. And if there is an external encryption tool with an API that you'd like to use, Dynamic Data Masking can leverage that API via an External Function. You shouldn't need Snowpark or a Stored Procedure to accomplish this.
https://docs.snowflake.com/en/user-guide/security-column-ddm.html
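For reference, a minimal sketch of a masking policy might look like the following, assuming a hypothetical customers table with an email column and a PII_READER role (none of these names come from the question):

```sql
-- Sketch of a Snowflake masking policy; table, column and role names are assumptions.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_READER') THEN val  -- privileged role sees the raw value
    ELSE '*** MASKED ***'                           -- everyone else sees a masked value
  END;

-- Attach the policy to the column; Snowflake applies it at query time.
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
```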
Related
I am looking for a way in Informatica to pull data from a table in a database, load it into Snowflake, then move on to the next table in that same DB, repeating that for the remaining tables in the database.
We currently have this set up and running in Matillion, where an orchestration grabs all of the table names in a database and then loops through each of those tables to send the data into Snowflake.
My team and I have asked Informatica Global Support, but they have not been very helpful in figuring out how to accomplish this. They have suggested things like Dynamic Mapping, which I do not think will work for our particular case, since we are essentially just moving data from one database to a Snowflake database and do not need to do any other transformations.
Please let me know if any additional clarification is needed.
A Dynamic Mapping Task is your answer. You create one mapping, with or without transformations, as you need. Then you set up a Dynamic Mapping Task to execute that mapping across your whole set of 60+ different sources and targets.
Please note that this is available as part of the Cloud Data Integration module of IICS. It's not available in PowerCenter.
I have a scenario where I need to read data from a SQL Server database (Azure), perform calculations, and save the calculated data back to the SQL Server database.
Here, I'm using a Timer Trigger Function so that I can schedule the calculations one after another, as they are dependent on each other (a total of 10 calculations running in sequence).
The same can be achieved via stored procedures in an easy way, as they reside in the backend. I want to understand which is the better way to handle such a scenario in terms of performance, scalability, debugging capabilities, cost, etc.
If you are using SQL Server, then a SQL stored procedure is definitely the right approach because of its compatibility with SQL Server.
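To illustrate the stored-procedure route, here is a hypothetical sketch of a wrapper procedure that runs the dependent calculation steps in order inside one transaction; all procedure names are made up for the example:

```sql
-- Hypothetical wrapper: run dependent calculation steps in order, in one transaction.
CREATE OR ALTER PROCEDURE dbo.usp_RunAllCalculations
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;
            EXEC dbo.usp_Calculation01;  -- each step depends on the previous one
            EXEC dbo.usp_Calculation02;
            -- ... remaining dependent calculations ...
            EXEC dbo.usp_Calculation10;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        THROW;  -- surface the error to the caller (e.g. an ADF activity or scheduler)
    END CATCH;
END;
```

A wrapper like this can then be scheduled from a single trigger or pipeline instead of chaining ten separate timer functions.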
Another recommended approach is to use the Data Flow activity in Azure Data Factory and transform the data using the available functions. This is an easy-to-use method, as all the required transformation functions are built in.
You can also run a stored procedure in Azure Data Factory using the Stored Procedure activity.
Refer: Create Azure Data Factory data flows
I am able to read parameters from a local config file in Snowflake (using SnowSQL). But in the production environment, the SQL will run in an automated manner (using Snowflake Tasks).
I have created a task in Snowflake which calls a stored procedure. The stored procedure takes a few parameters which I want to read from a config file, so that the same stored procedure can be used for multiple similar use cases.
Please suggest if there is any workaround.
Reference Link : https://docs.snowflake.net/manuals/user-guide/tasks-intro.html
Although it says "Note that a task does not support account or user parameters."
You can't read a config file from a task. The easiest way in my opinion is to put your configuration in a Snowflake table and have your Stored Procedure read any configuration from the table instead.
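A rough sketch of that idea, using Snowflake Scripting and made-up table/procedure names (a JavaScript procedure would follow the same pattern):

```sql
-- Hypothetical key/value config table, one row per parameter per use case.
CREATE TABLE IF NOT EXISTS sp_config (
    use_case    STRING,
    param_name  STRING,
    param_value STRING
);

-- Sketch of a procedure that reads its parameters from the table
-- instead of a local config file, so a task can call it directly.
CREATE OR REPLACE PROCEDURE run_with_config(p_use_case STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
    target_table STRING;
BEGIN
    SELECT param_value INTO :target_table
      FROM sp_config
     WHERE use_case = :p_use_case
       AND param_name = 'TARGET_TABLE';
    -- ... use :target_table in the rest of the procedure ...
    RETURN 'Loaded config for ' || p_use_case;
END;
$$;
```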
I am not very sure whether a stored proc can read a config file, so I agree with the approach #SimonD suggested.
Another alternative (though a bit more complex in design) is to keep the config file in JSON format in an S3 bucket, which you can load via a stage. Refer to the $ notation to access the respective JSON properties, read the key-value pairs, and inject them where needed in the stored procedure (see the sketch below). This way, your configuration stays in JSON or text format outside Snowflake and can be managed via S3 (if you are using AWS).
Though I have not tried this approach, it looks like it should work. This way, direct Snowflake access or accidental DB updates to the configuration can be prevented.
I hope this idea makes sense to you?
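A rough sketch of the S3/stage idea might look like this; the bucket path, file name, and property names are all assumptions:

```sql
-- Hypothetical external stage pointing at the S3 location of the config file
-- (credentials / storage integration omitted for brevity).
CREATE OR REPLACE STAGE config_stage
  URL = 's3://my-config-bucket/config/'
  FILE_FORMAT = (TYPE = JSON);

-- Query the staged JSON directly: $1 is the parsed document,
-- and individual properties are pulled out with the : notation.
SELECT
    $1:target_table::STRING AS target_table,
    $1:batch_size::NUMBER   AS batch_size
FROM @config_stage/config.json;
```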
I'm an SSIS developer. I do a lot of lookups in SSIS using SQL stored procedures. But in Azure Data Factory, I have no idea how to perform a lookup using a SQL stored procedure.
Could anyone please guide me on this?
Thanks in advance !
Jay
Azure Data Factory (ADF) is more of an ELT tool than an ETL tool, so direct lookups are not supported. Instead, this type of operation, along with other transforms, is pushed down into the compute you are actually using. For example, if you are moving data to SQL Server, Azure SQL Database or Azure SQL Data Warehouse, you would ensure all data is on the same server and use a Stored Procedure activity to execute the lookups using T-SQL and joins (see the sketch below). If you are using Azure Data Lake Analytics (ADLA), you would use the U-SQL activity to run U-SQL or execute ADLA stored procedures, again doing lookups via joins or custom U-SQL code such as a Combiner, Applier or Reducer. In fact, you can use any of the ADF compute options such as SQL, HDInsight (including Hive, Pig, Map Reduce, Streaming and Spark scripts), Machine Learning or custom .NET activities.
So you need to think about things differently with ADF. Have a look through this article to gain greater understanding of transforming data in ADF:
Transform data in Azure Data Factory
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-data-transformation-activities
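For example, an SSIS-style lookup pushed down into T-SQL might look something like this, callable from an ADF Stored Procedure activity (table and procedure names are purely illustrative):

```sql
-- Hypothetical sketch: the "lookup" becomes a join inside a stored procedure.
CREATE OR ALTER PROCEDURE dbo.usp_LoadOrdersWithCustomerLookup
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.FactOrders (OrderId, CustomerKey, Amount)
    SELECT s.OrderId,
           c.CustomerKey,          -- the lookup: join against the dimension table
           s.Amount
    FROM   staging.Orders   AS s
    JOIN   dbo.DimCustomer  AS c
           ON c.CustomerCode = s.CustomerCode;
END;
```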
As an aside, I would rarely use Lookups in SSIS, as performance in early versions used to be poor. Although this has improved in later versions, generally if you can do it in SQL you probably should. That pattern harnesses the power of SQL Server, rather than dragging data up into the SSIS pipeline, e.g. for the purposes of lookups (which are essentially joins), and pushing it back out again. I reserve Data Flow transformations mainly for when non-relational data is involved, e.g. XML, or joining your email server with relational data. This is my personal view anyway : )
I've tried using EncryptByPassPhrase, which would work for me if I could get the underlying key generated from the passphrase, but nowhere on the internet have I found an explanation of how to do that.
My reading of the relevant EncryptByCert or EncryptByKey documentation is that I have to create and store the keys in the database.
I have a constraint that I must not update the source database, hence I cannot create and store keys on the database.
What I really want is a way, using an existing external public key or certificate, to encrypt the data and then decrypt it on a different system.
If you cannot update the source database, I would guess your best option is to use CLR to create a custom stored procedure; then you can use any external library to perform the encryption you require (a rough sketch of the SQL side is shown after the link below).
There's an example of this here:
http://sachabarbs.wordpress.com/2007/06/06/sql-server-clr-functions/
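If you go that route, the SQL side of the CLR registration might look roughly like this, on a separate utility database rather than the source database; the assembly path, class, and method names are assumptions:

```sql
-- Hypothetical CLR registration on a utility database (not the source database).
CREATE ASSEMBLY CryptoUtils
FROM 'C:\clr\CryptoUtils.dll'
WITH PERMISSION_SET = UNSAFE;   -- external crypto libraries typically need elevated permissions
GO

-- Expose the assembly's encryption method as a T-SQL function.
CREATE FUNCTION dbo.EncryptWithPublicKey (@plainText    NVARCHAR(MAX),
                                          @publicKeyPem NVARCHAR(MAX))
RETURNS VARBINARY(MAX)
AS EXTERNAL NAME CryptoUtils.[CryptoUtils.Encryption].EncryptWithPublicKey;
GO
```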