Tried searching for Snowflake tags on Meta Stack Exchange and Super User, and couldn't find them. Hence asking the questions here.
I have two Snowflake accounts, and I need to copy data from the production account to the testing account.
How can I do that? I read the Snowflake documentation on loading and unloading data using S3, but is there a quicker way to get the data across?
As Howard said, you could use Data Sharing. Create a share with the data you want to copy, grant the testing account access to the share, then create a new database in the testing account from the share. Now you can query that data as if it were in the test account. If you need an actual copy of the data (so you can change it, etc.), do a CTAS from the share into an empty table in another database. This will be much faster than unloading and reloading all the data.
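Here is a minimal sketch of those steps; the share, database, account, and table names are all hypothetical:

    -- In the production account: create a share and add the data to it
    CREATE SHARE prod_share;
    GRANT USAGE ON DATABASE prod_db TO SHARE prod_share;
    GRANT USAGE ON SCHEMA prod_db.public TO SHARE prod_share;
    GRANT SELECT ON TABLE prod_db.public.orders TO SHARE prod_share;
    ALTER SHARE prod_share ADD ACCOUNTS = testing_account;

    -- In the testing account: create a read-only database from the share
    CREATE DATABASE prod_copy FROM SHARE prod_account.prod_share;

    -- If you need a writable copy, CTAS from the shared database
    CREATE TABLE test_db.public.orders AS
    SELECT * FROM prod_copy.public.orders;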
Here is the doc to get started: https://docs.snowflake.net/manuals/user-guide/data-sharing-intro.html
I am looking for a way in Informatica to pull data from a table in a database, load it into Snowflake, then move on to the next table in that same DB, and repeat that for the remaining tables in the database.
We currently have this running in Matillion, where an orchestration grabs all of the table names in a database and then loops through each table to send its data into Snowflake.
My team and I have asked Informatica Global Support, but they have not been much help in figuring out how to accomplish this. They have suggested things like Dynamic Mapping, which I do not think will work for our particular case, since we are in essence just trying to get data from one database into a Snowflake database and do not need to do any other transformations.
Please let me know if any additional clarification is needed.
Dynamic Mapping Task is your answer. You create one mapping, with or without transformations, as you need. Then you set up a Dynamic Mapping Task to execute the mapping across your whole set of 60+ different sources and targets.
Please note that this is available as part of the Cloud Data Integration module of IICS. It's not available in PowerCenter.
I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS/S3 into Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII data, and we would like to hide those columns from DataRobot. What do you suggest for this problem?
Does the schema in AWS need to match the schema in Snowflake for the copy process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake:
Use COPY INTO statements to load the data into the tables (see the sketch after this list)
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
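For example, here is a minimal sketch of the COPY INTO and Snowpipe options; the stage, pipe, table, and bucket names are hypothetical, and a storage integration could stand in for the inline credentials:

    -- An external stage pointing at the S3 location
    CREATE STAGE my_s3_stage
      URL = 's3://my-bucket/exports/'
      CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
      FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- One-off bulk load into an existing table
    COPY INTO mydb.public.my_table FROM @my_s3_stage;

    -- Or a pipe that auto-loads new files as they land in S3
    CREATE PIPE my_pipe AUTO_INGEST = TRUE AS
      COPY INTO mydb.public.my_table FROM @my_s3_stage;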
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot uses.
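A minimal sketch of such a masking policy; the policy, table, column, and role names are hypothetical:

    -- Return the real value only to the data-owner role;
    -- every other role (including DataRobot's) sees NULL
    CREATE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() = 'DATA_OWNER' THEN val ELSE NULL END;

    -- Attach the policy to the PII column
    ALTER TABLE customers MODIFY COLUMN email
      SET MASKING POLICY mask_pii;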
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: say, data_owner (the role that creates the table and loads the data into it) and data_modelling (the role DataRobot uses).
Create masking policies as data_owner such that data_modelling, and therefore DataRobot, cannot see the column data.
About your question on copying the data: there is no requirement that the AWS S3 folders be in sync with Snowflake. You can create the external stage with any name and point it at any S3 folder.
The Snowflake documentation has a good example that helps you get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html
I have a question regarding incremental refresh from Snowflake to Tableau. I know the incremental refresh/incremental extract feature is available in Tableau, but can it be used for incremental loads from Snowflake? And how does it work?
The reason I am asking is that I know query folding, which other BI tools on the market use for incremental refreshes, isn't possible in Snowflake.
Thanks!
/P
Tableau incremental refreshes work the same for Snowflake as they do for other databases.
"Query folding" looks like a Microsoft (and specifically Power BI) term. According to this article, https://exceleratorbi.com.au/how-query-folding-works/, "query folding" is the process of pushing the workload down to the database, which is what Tableau does when querying Snowflake tables directly.
With Snowflake I would recommend querying the tables directly, as they are already stored in columnar format, and you can avoid moving the data to a Tableau Server and waiting on refreshes. Snowflake has effectively unlimited storage, whereas you might be limited by your Tableau Server.
If you need the tables in Snowflake to only show data as of a point in time, there are different ways you could accomplish this including:
Preset date filters (or parameters as filter within Tableau) that are pushed down to Snowflake
Using Tasks in Snowflake to run at a specific time (a sketch follows this list) to:
Clone your tables, and use the clones for reporting
Update existing reporting tables
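As a sketch of the task option (the warehouse, task, and table names are hypothetical):

    -- Recreate a zero-copy clone for reporting every night at 2 AM UTC
    CREATE OR REPLACE TASK refresh_reporting_clone
      WAREHOUSE = report_wh
      SCHEDULE = 'USING CRON 0 2 * * * UTC'
    AS
      CREATE OR REPLACE TABLE reporting.sales_snapshot CLONE prod.sales;

    -- Tasks are created suspended; resume to start the schedule
    ALTER TASK refresh_reporting_clone RESUME;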
I agree with Chris' answer except for avoiding the extracts on Tableau Server. There can be a lot of performance gains from using Tableau to extract the data. We run extracts out of Snowflake for most of our data sources. We also test both live connections and extracts for each to see which performs best. If timing is an issue, extracts can be set to refresh as often as every 15 minutes.
To get extracts loaded and refreshing, use the following steps.
Switch your data source to an extract in Tableau Desktop
This creates a local copy of the data, which will be published in the next step.
Select Server/Publish Workbook
In the Publish settings, choose your refresh schedule and publish to Tableau Server. The workbook and data source will be loaded to Server.
You can also update the refresh schedules directly in Server by navigating to the new data source and going to the Extract Refreshes tab.
If you don't have the correct schedule available, you can create one in the Schedules menu for the site.
I am trying to share the SNOWFLAKE database (the default metadata database) --> ACCOUNT_USAGE schema --> QUERY_HISTORY view with another managed account (i.e. a reader account), but the data is not visible in the other account.
Is there any way to share the SNOWFLAKE database without duplicating the data?
When I try the data share option, I get an error saying that an already imported database (i.e. SNOWFLAKE) cannot be shared.
In the managed account, the SNOWFLAKE database and its schemas are visible, but I am not able to see the data in them.
According to the documentation you can't re-share any database that is shared with you:
Shared databases and all the objects in the database cannot be forwarded (i.e. re-shared with other accounts).
Since the Snowflake database is one that is shared to you from Snowflake, this is probably why you're having issues.
If you need to do this, your best bet is to create a table, populate it with the data you need from the SNOWFLAKE database, and share that table instead. (Although it is strange that you'd want to share this info with another account.)
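A sketch of that approach (the table, share, and account names are hypothetical):

    -- Copy the usage data into a local table you own
    CREATE TABLE mydb.public.query_history_copy AS
      SELECT * FROM snowflake.account_usage.query_history;

    -- Share the copy instead of the SNOWFLAKE database itself
    CREATE SHARE usage_share;
    GRANT USAGE ON DATABASE mydb TO SHARE usage_share;
    GRANT USAGE ON SCHEMA mydb.public TO SHARE usage_share;
    GRANT SELECT ON TABLE mydb.public.query_history_copy TO SHARE usage_share;
    ALTER SHARE usage_share ADD ACCOUNTS = reader_account;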
Your other option would be to create a database/schema in your account with views over the account usage data that you want to share, create a role that can access only that, and then provide a user login with only that role to the group needing to do analytics on your data.
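A sketch of that alternative (the database, view, role, and user names are hypothetical; the view resolves ACCOUNT_USAGE access against its owner's privileges, so the reader role only needs SELECT on the view):

    -- A view over just the usage data you want to expose
    CREATE DATABASE usage_reporting;
    CREATE VIEW usage_reporting.public.recent_queries AS
      SELECT query_id, query_text, user_name, start_time
      FROM snowflake.account_usage.query_history
      WHERE start_time > DATEADD(day, -30, CURRENT_TIMESTAMP());

    -- A role that can read only that view
    CREATE ROLE usage_reader;
    GRANT USAGE ON DATABASE usage_reporting TO ROLE usage_reader;
    GRANT USAGE ON SCHEMA usage_reporting.public TO ROLE usage_reader;
    GRANT SELECT ON VIEW usage_reporting.public.recent_queries TO ROLE usage_reader;
    GRANT ROLE usage_reader TO USER analytics_user;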
Since KahaDB is used to store the persistent data, is there any method to access the unconsumed messages in the db? Please suggest a UI through which the data in KahaDB can be accessed. Is there any way to use queries to access the data from KahaDB? Please help me out with a solution.
Is there any query browser that can be used to query KahaDB?
KahaDB is a file-based persistence database used with ActiveMQ to store queuing data. It stores messages in journal files, with a B-tree index over the message locations in those journal files. From what I can find, no query browsers are available for KahaDB, but there are options to inspect the journal files in which KahaDB stores the data.
I was unable to find any UI to access it, but you can use the amq-kahadb-tool below to see what's inside and get a summary of the KahaDB logs:
https://github.com/Hill30/amq-kahadb-tool