How to know what warehouse snowpipe uses to process data? - snowflake-cloud-data-platform

Like the title says, I'd like to know how I can find out which warehouse Snowpipe uses to run the COPY queries that load data.

Snowflake uses an "internal" warehouse (a Snowflake-provided warehouse) called Snowpipe to process pipes. This is documented at the link below, which covers it pretty well.
https://docs.snowflake.net/manuals/user-guide/data-load-snowpipe-billing.html#viewing-the-data-load-history-for-your-account
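Since there is no user-managed warehouse to inspect, the practical check is the usage views. Below is a minimal sketch, assuming the snowflake-connector-python package and placeholder credentials, that sums Snowpipe credits per pipe over the last week:

    import snowflake.connector

    # Placeholder connection details; account-level usage views require the
    # ACCOUNTADMIN role (or a role granted IMPORTED PRIVILEGES on the
    # SNOWFLAKE database).
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="my_password",
        role="ACCOUNTADMIN",
    )

    with conn.cursor() as cur:
        # Credits billed to Snowpipe over the last 7 days, per pipe.
        cur.execute("""
            SELECT pipe_name,
                   SUM(credits_used)   AS credits_used,
                   SUM(bytes_inserted) AS bytes_inserted,
                   SUM(files_inserted) AS files_inserted
            FROM snowflake.account_usage.pipe_usage_history
            WHERE start_time > DATEADD(day, -7, CURRENT_TIMESTAMP())
            GROUP BY pipe_name
            ORDER BY credits_used DESC
        """)
        for row in cur:
            print(row)

    conn.close()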
I hope this helps...Rich

Related

Snowflake pulling MaestroQA data

Has anyone here tried pulling data from MaestroQA to Snowflake?
There is a way to push data from MaestroQA to Snowflake, but I'm wondering if it works the other way around: Snowflake pulling MaestroQA data, without using any APIs.
In addition, I'm trying to find a way to automate this.
I tried looking for documentation and threads online, but couldn't find any.
Below are the documents/links I have seen so far, but they describe MaestroQA pushing data to Snowflake.
https://help.maestroqa.com/en/articles/1982484-data-warehouse-table-overview
https://help.maestroqa.com/en/articles/1557390-push-qa-data-to-your-data-warehouse.
Snowflake can only load data from its internal/external stages. It has no capability to pull data from other systems.
You'll either need to use a tool with ETL capabilities or write your own process in, for example, Python.
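A minimal sketch of the "write your own process" route, assuming you can already get a MaestroQA export onto local disk (Snowflake itself won't fetch it). The stage, table, file, and connection names here are hypothetical, and it uses the snowflake-connector-python package:

    import snowflake.connector

    # Placeholder connection details and hypothetical object names.
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="my_password",
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    with conn.cursor() as cur:
        # Upload the exported file to a named internal stage.
        cur.execute("CREATE STAGE IF NOT EXISTS maestroqa_stage")
        cur.execute(
            "PUT file:///tmp/maestroqa_export.csv @maestroqa_stage AUTO_COMPRESS=TRUE"
        )

        # Load it into a pre-created target table.
        cur.execute("""
            COPY INTO maestroqa_scores
            FROM @maestroqa_stage
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """)

    conn.close()

Scheduled with cron or any orchestrator, this also covers the automation part of the question.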

Snowflake Tasks and Streams - Complexity and Visualization

We're heading into a POC and need to determine whether Snowflake tasks and streams are useful for CDC and data transformation. I have read the Snowflake documentation, and the more I read, the more it seems like this will be a complex mess to handle. Thinking about thousands of tables and complex transformations, how will tasks and streams scale up? Consider a table that gets loaded from 5 other feeds: what will the process look like? On top of that, Snowflake doesn't offer any visualization to work with tasks. Can some of you who have worked with Snowflake streams/tasks comment and share your opinion on using them? If you went with an alternative after trying them out, was it a commercial ETL tool or Databricks? If we're already using Qlik to bring data into AWS S3 (our data lake), would it make sense to use streams to ingest from the data lake into Snowflake?
TIA
This question seems too broad for the typical Stack Overflow process (so the community might choose to close it).
In the meantime, I'll reply here to one of the stated questions: "On top of that, snowflake doesn't offer any visualization to work with tasks"
There is a tool to visualize tasks, created by a Snowflake SE:
https://medium.com/snowflake/visualizing-task-hierarchies-and-dependencies-in-snowflake-snowsight-d28298d0f0ed
For the larger picture: Snowflake streams and tasks are basic building blocks for more complex solutions. As your use case grows more complex, you'll need to find ways to manage this complexity - either with your own tools, Snowflake's, or a third party's.
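To make those building blocks concrete, here is a minimal one-stream/one-task sketch run through the Python connector. The object names (raw_orders, orders_stream, orders_merge_task, TRANSFORM_WH) are hypothetical, and at the scale you describe you would generate this DDL from metadata rather than hand-write it:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",  # placeholders
        database="ANALYTICS", schema="PUBLIC", role="SYSADMIN",
    )

    ddl_statements = [
        # Hypothetical landing and target tables.
        "CREATE TABLE IF NOT EXISTS raw_orders (order_id INT, amount NUMBER, updated_at TIMESTAMP)",
        "CREATE TABLE IF NOT EXISTS orders_clean LIKE raw_orders",

        # Capture row-level changes on the landing table.
        "CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders",

        # A task that wakes up every 5 minutes, but only runs when the
        # stream actually has data to process.
        """
        CREATE OR REPLACE TASK orders_merge_task
            WAREHOUSE = TRANSFORM_WH
            SCHEDULE = '5 MINUTE'
            WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
        AS
            INSERT INTO orders_clean
            SELECT order_id, amount, updated_at
            FROM orders_stream
            WHERE METADATA$ACTION = 'INSERT'
        """,

        # Tasks are created suspended; resume to start the schedule.
        "ALTER TASK orders_merge_task RESUME",
    ]

    with conn.cursor() as cur:
        for ddl in ddl_statements:
            cur.execute(ddl)

    conn.close()

For a table fed by 5 feeds, the usual pattern is one stream per feed plus a small task graph (child tasks chained with AFTER) that merges them, which is exactly where the complexity management mentioned above comes in.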
Since you are running a POC: Make sure to ask your Snowflake sales contact. Engineers like Dave are ready and eager to find a solution that fits your needs.

What kind of connector to Snowflake that automatically uploads new data would you use for IoT data?

I am just starting to set up a project to keep track of some open home devices that are enabled on an at-home network. I have a program that saves this data, and I am putting together a process to upload that data to Snowflake automatically. I would like to know what you would recommend so I can easily access the home device information from anywhere.
The two options I am considering are AWS's and Snowflake's auto-ingest options using the Snowpipe REST API, which I have tested with only a few devices.
The main factor I am weighing is which method I can set up to upload and query data quickly from a mobile app written in Python or Ruby, depending on the device.
Any advice or resources you can point me to on this?
Thank you!
Your question is pretty open-ended, so more details from you would allow a more detailed answer as well. However, in general, I would suggest that if your IoT data can be stored directly to blob storage (S3 in the case of AWS), then you should leverage Snowflake's Snowpipe for continuous ingestion. Also, look into Tasks and Streams to automate moving that data through whatever processes you'll set up once the data is in Snowflake.
A good reference for you:
https://docs.snowflake.net/manuals/user-guide/data-pipelines-intro.html
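If you go the S3 + auto-ingest route, the Snowflake side is roughly the sketch below, again via the Python connector. Bucket, credentials, and object names are placeholders, and the S3 event-notification wiring on the AWS side is not shown:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="my_password",  # placeholders
        database="IOT", schema="PUBLIC", role="SYSADMIN",
    )

    with conn.cursor() as cur:
        # External stage pointing at the bucket your devices (or gateway) write to.
        cur.execute("""
            CREATE STAGE IF NOT EXISTS iot_stage
            URL = 's3://my-iot-bucket/readings/'
            CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
            FILE_FORMAT = (TYPE = JSON)
        """)

        # Landing table for the raw JSON readings.
        cur.execute("CREATE TABLE IF NOT EXISTS raw_readings (payload VARIANT)")

        # Auto-ingest pipe: Snowflake loads each new file when S3 sends an
        # event notification to the pipe's queue.
        cur.execute("""
            CREATE PIPE IF NOT EXISTS iot_pipe AUTO_INGEST = TRUE AS
            COPY INTO raw_readings
            FROM @iot_stage
        """)

        # SHOW PIPES returns the notification_channel (an SQS ARN) to
        # configure in the bucket's event notifications.
        cur.execute("SHOW PIPES LIKE 'IOT_PIPE'")
        print(cur.fetchall())

    conn.close()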

Snowpipe continuous data load setup - billing question: REST API vs Snowflake UI DDL?

I am interested in automating the data job we run weekly.
I have been working with SnowSQL and have just started to see what Snowpipe can do.
Noting (per "How Does Snowpipe Work?") that Snowflake internal stages do not yet have an option to automate data loads with cloud messaging, I went with trying the Snowpipe REST endpoints.
Per the recommendation to separate the files loaded with the COPY command from those loaded through Snowpipe, I made sure they went into different tables.
However, with the Snowpipe DDL and the Python APIs, will both the pipe called through the REST endpoints and the pipe created in the user interface appear in the Snowpipe billing section?
Yes - for example: call the REST endpoints from the Python client, and CREATE PIPE in a worksheet using the DDL.
Do they both show up? Try creating a pipe through both connections and view them in the Billing & Usage section under the SNOWPIPE warehouse. Make sure you are using the ACCOUNTADMIN role to view the usage information.
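For the REST-endpoint side, here is a sketch using the snowflake-ingest Python package. The account, user, pipe name, and key path are placeholders; the package requires key-pair authentication, and the pipe itself is still created with CREATE PIPE DDL:

    from snowflake.ingest import SimpleIngestManager, StagedFile

    # Unencrypted private key in PEM format registered for the ingest user
    # (key-pair authentication is required by the Snowpipe REST endpoints).
    with open("/path/to/rsa_key.pem") as f:  # placeholder path
        private_key_pem = f.read()

    ingest_manager = SimpleIngestManager(
        account="my_account",                      # placeholder
        host="my_account.snowflakecomputing.com",  # placeholder
        user="ingest_user",                        # placeholder
        pipe="ANALYTICS.PUBLIC.WEEKLY_PIPE",       # hypothetical fully qualified pipe
        private_key=private_key_pem,
    )

    # Files already uploaded to the pipe's stage (e.g. with PUT or an S3 upload).
    staged_files = [StagedFile("weekly_batch.csv.gz", None)]

    # Ask Snowpipe to load them; the response echoes the request id.
    print(ingest_manager.ingest_files(staged_files))

    # Poll the load history afterwards to confirm the files landed.
    print(ingest_manager.get_history())

Loads submitted this way and loads from a pipe created purely through the UI DDL both appear in the Snowpipe usage view described above.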
Hope that helps.

Load data from firebase to amazon redshift

I have around 500 MB of data in Firebase and I want to move it to Amazon Redshift on a daily basis. What is the best way to do this?
Thanks in advance.
What is "the best way" depends on your criteria, and often highly subjective. But a few pointers may help you get started:
don't download the entire data with a single ref.once('value'. Loading that much data will take time and all your regular users will be blocked while your read is being fulfilled.
do consider using Firebase's private backups. These are coming out of a different data stream, so will not interfere with your regular users. But the downside is that you'll need to a paid app to be able to use this feature.
do consider how you can make your backup process streaming, instead of daily. Firebase is a real-time database, and typically works best when you consider the data flow to be real-time too.
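One way to act on that last point is a small listener process that spools changes continuously for a later bulk load into Redshift (via S3 and COPY). Below is a sketch using the firebase_admin Python SDK (the ref.once('value') call above is from the JavaScript SDK), with placeholder credentials, paths, and node names:

    import json
    import time

    import firebase_admin
    from firebase_admin import credentials, db

    # Placeholder service-account key and database URL.
    cred = credentials.Certificate("/path/to/service-account.json")
    firebase_admin.initialize_app(cred, {
        "databaseURL": "https://my-project.firebaseio.com",
    })

    def on_change(event):
        # event.event_type is 'put' or 'patch'; event.path and event.data
        # describe what changed. Append it to a local spool file that a
        # separate job uploads to S3 and COPYs into Redshift.
        record = {"type": event.event_type, "path": event.path, "data": event.data}
        with open("/var/spool/firebase/changes.ndjson", "a") as f:
            f.write(json.dumps(record) + "\n")

    # Delivers the node's current contents once, then every subsequent change.
    registration = db.reference("devices").listen(on_change)

    # Keep the process alive; call registration.close() to stop listening.
    while True:
        time.sleep(60)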
