My team is trying to integrate Datadog's RUM data into Snowflake for our data scientists to consume. Is this possible? If so, how?
So far I have found documentation on how to integrate data from Snowflake into the Datadog dashboard, but not the other way around.
There are a number of options:
Use an ETL tool that can connect to both Snowflake and Datadog
Bulk load: export the data to a file in S3 (or similar storage) and use the Snowflake COPY INTO command
Streaming: stream the data out of Datadog and then into Snowflake using Snowpipe
Poll the RUM events API with an application you develop yourself (a rough sketch of this is below).
https://docs.datadoghq.com/api/latest/rum/
Write microbatches to your target tables using one of the language connectors, or the Spark connector.
https://docs.snowflake.com/en/user-guide/spark-connector-use.html
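To make the last two options more concrete, here is a minimal Python sketch that pages through the RUM events endpoint and writes each micro-batch into a raw landing table with the Snowflake Python connector. Treat it as an outline only: the environment variables, warehouse/database/schema names and the RUM_EVENTS_RAW(raw_json STRING) landing table are assumptions, and the query parameters should be checked against the Datadog RUM API docs linked above.

```python
# Hedged sketch: poll Datadog RUM events and micro-batch them into Snowflake.
import json
import os

import requests
import snowflake.connector

DD_URL = "https://api.datadoghq.com/api/v2/rum/events"
HEADERS = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

def fetch_rum_events(from_ts, to_ts, limit=1000):
    """Page through RUM events between two timestamps using cursor pagination."""
    params = {"filter[from]": from_ts, "filter[to]": to_ts, "page[limit]": limit}
    while True:
        resp = requests.get(DD_URL, headers=HEADERS, params=params, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        yield from body.get("data", [])
        cursor = body.get("meta", {}).get("page", {}).get("after")
        if not cursor:
            break
        params["page[cursor]"] = cursor

def load_microbatch(events):
    """Write one micro-batch into a raw landing table via the Python connector."""
    conn = snowflake.connector.connect(
        account=os.environ["SF_ACCOUNT"],
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        warehouse="LOAD_WH",      # placeholder names
        database="RAW",
        schema="DATADOG",
    )
    try:
        conn.cursor().executemany(
            "INSERT INTO RUM_EVENTS_RAW (raw_json) VALUES (%s)",
            [(json.dumps(e),) for e in events],
        )
    finally:
        conn.close()

if __name__ == "__main__":
    batch = list(fetch_rum_events("now-15m", "now"))
    if batch:
        load_microbatch(batch)
```

From there you can schedule the script with whatever orchestrator you already run and parse the raw JSON into modelled tables inside Snowflake.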
Related
We have Snowflake in our organization and currently we don't have an ETL tool.
I would like to pull data directly from Salesforce into a Snowflake staging table manually for an analysis.
Would it be possible to do this with Python or Java code?
Many thanks,
This article is a pretty useful reference for this requirement: https://rudderstack.com/guides/how-to-load-data-from-salesforce-to-snowflake-step-by-step
You can do this using the simple-salesforce library for Python.
https://pypi.org/project/simple-salesforce/
Run the query, write out the results to a CSV, then load the CSV into Snowflake.
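A rough sketch of the query-to-CSV half (credentials, SOQL and field names are placeholders); the resulting file can then be loaded with PUT/COPY INTO as sketched under the next answer:

```python
import csv

from simple_salesforce import Salesforce

# Placeholder credentials; a security token is required for username/password auth
sf = Salesforce(
    username="user@example.com",
    password="password",
    security_token="token",
)

# query_all follows the pagination links for larger result sets
records = sf.query_all("SELECT Id, Name, CreatedDate FROM Account")["records"]

fields = ["Id", "Name", "CreatedDate"]
with open("accounts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for rec in records:
        # each record also carries an "attributes" key we don't need
        writer.writerow({k: rec[k] for k in fields})
```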
You can use Salesforce Data Loader (provided by Salesforce): a simple interface to export Salesforce data to a .csv file. It's a Java-based tool.
Once exported, you can PUT your flat file into a Snowflake stage, then use COPY INTO to load it into your final table.
Documentation available here
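For the load step, here is a hedged sketch using the Snowflake Python connector to PUT the exported CSV into the table's internal stage and COPY it in. The connection details, file path and SALESFORCE_ACCOUNTS table are placeholders; the same two statements can equally be run from SnowSQL or a worksheet.

```python
import snowflake.connector

# Placeholder connection details
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()

# Upload the local CSV to the table's internal stage (@%table)
cur.execute("PUT file:///tmp/accounts.csv @%SALESFORCE_ACCOUNTS AUTO_COMPRESS=TRUE")

# Bulk load from the stage into the target table
cur.execute(
    "COPY INTO SALESFORCE_ACCOUNTS FROM @%SALESFORCE_ACCOUNTS "
    "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' SKIP_HEADER = 1)"
)
conn.close()
```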
We chose Snowflake as our DWH and we would like to connect different data sources (Salesforce, HubSpot and Zendesk).
Is there a way to extract data from these sources and store it in Snowflake in a staging schema, without having to store the data in cloud storage like S3 and then read it into Snowflake?
Many thanks in advance.
You can use any of the connectors Snowflake provides (ODBC, JDBC, Python, etc.) and any tool that can use one of these connectors. However, they won't perform well compared to the COPY INTO approach, which is optimised for bulk loading.
There are ETL tools, such as Matillion, that use the stage/COPY INTO approach but do it in the background, so it appears that you are loading directly into Snowflake.
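As a small illustration of that pattern, the Python connector's write_pandas helper PUTs temporary files to a stage and runs COPY INTO behind the scenes, so from your code it looks like a direct load. All names below are placeholders and the target table is assumed to already exist (newer connector versions can create it with auto_create_table=True).

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Placeholder connection details
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

# Imagine this frame was pulled from the Salesforce / HubSpot / Zendesk APIs
df = pd.DataFrame({"ID": ["001", "002"], "NAME": ["Acme", "Globex"]})

# Stages and COPYs under the hood; no S3 bucket for you to manage
success, n_chunks, n_rows, _ = write_pandas(conn, df, table_name="CRM_ACCOUNTS")
conn.close()
```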
I'm using Zapier with Redshift to fetch data from custom queries and trigger a wide array of actions when new rows are detected from either a table or custom query, including sending emails through Gmail or Mailchimp, exporting data to Google Sheets, and more. Zapier's UI enables our non-technical product stakeholders to take over these workflows and customize them as needed. Zapier has several integrations built for Postgres, and since Redshift supports the Postgres protocol, these custom workflows can be easily built in Zapier.
I'm switching our data warehouse from Redshift to Snowflake and the final obstacle is moving these Zapier integrations. Snowflake doesn't support the Postgres protocol, so it cannot be used as a drop-in replacement for these workflows. No other data source has all the information that we need for these workflows, so connecting to an upstream data source of Snowflake is not an option. I would appreciate guidance on alternatives I could pursue, including the following:
Moving these workflows into application code (a rough sketch of this option follows this list)
Using a foreign data wrapper in Postgres for Snowflake to continue using the existing workflows from a dummy Postgres instance
Using custom-code blocks in Zapier instead of the Postgres integration
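For reference, this is the sort of thing I mean by the application-code option: a rough sketch (table, column and environment variable names are made up) that polls Snowflake for rows past a watermark and forwards each one to a Zapier catch-hook webhook.

```python
import os

import requests
import snowflake.connector

ZAPIER_HOOK_URL = os.environ["ZAPIER_HOOK_URL"]  # from a "Webhooks by Zapier" trigger

def forward_new_rows(last_seen_id):
    """Poll a table for new rows and push each one to Zapier as JSON."""
    conn = snowflake.connector.connect(
        account=os.environ["SF_ACCOUNT"],
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        warehouse="REPORTING_WH",   # placeholder names
        database="ANALYTICS",
        schema="PUBLIC",
    )
    try:
        cur = conn.cursor(snowflake.connector.DictCursor)
        cur.execute(
            "SELECT id, email, status FROM orders WHERE id > %s ORDER BY id",
            (last_seen_id,),
        )
        for row in cur:
            # Zapier receives the row as JSON and runs the downstream actions
            requests.post(ZAPIER_HOOK_URL, json=row, timeout=10).raise_for_status()
            last_seen_id = row["ID"]  # unquoted identifiers come back uppercase
    finally:
        conn.close()
    return last_seen_id
```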
I'm not sure if Snowflake has an API that will allow you to do what you want, but you can create a private Zapier integration that has all the same features and permissions as a public integration and can be customized for your team.
There's info about that process here: https://platform.zapier.com/
You might find it easier to use a vendor solution like Census to forward rows as events to Zapier. Their free plan is pretty sizeable for getting started. More info here https://www.getcensus.com/integrations/zapier
I'm new to building data pipelines where dumping files in the cloud is one or more steps in the data flow. Our goal is to store large, raw sets of data from various APIs in the cloud, then pull only what we need (summaries of this raw data) and store that in our on-premises SQL Server for reporting and analytics. We want to do this in the easiest, most logical and robust way. We have chosen AWS as our cloud provider, but since we're at the beginning phases we are not attached to any particular architecture/services. Because I'm no expert with the cloud or AWS, I thought I'd post my thoughts on how we can accomplish our goal and see if anyone has any advice for us. Does this architecture for our data pipeline make sense? Are there any alternative services or data flows we should look into? Thanks in advance.
1) Gather data from multiple sources (using APIs)
2) Dump responses from APIs into S3 buckets
3) Use Glue Crawlers to create a Data Catalog of data in S3 buckets
4) Use Athena to query summaries of the data in S3
5) Store data summaries obtained from Athena queries in on-premises SQL Server (a rough sketch of steps 4 and 5 is below)
Note: We will program the entire data pipeline using Python (which seems like a good call and easy no matter what AWS services we utilize, as boto3 is pretty awesome from what I've seen thus far).
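For steps 4 and 5 I am imagining something roughly like the sketch below: run the Athena summary query with boto3, then copy the result rows into SQL Server with pyodbc. The database, bucket, DSN and table names are placeholders, and get_query_results only returns the first 1000 rows, so bigger summaries would need pagination or reading the CSV that Athena writes to the output location.

```python
import time

import boto3
import pyodbc

athena = boto3.client("athena", region_name="us-east-1")

def run_athena_query(sql):
    """Start an Athena query, wait for it, and return result rows as lists of strings."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "raw_api_data"},        # placeholder
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # First row is the header; every cell comes back as a string under "VarCharValue"
    return [[c.get("VarCharValue") for c in r["Data"]] for r in rows[1:]]

summary = run_athena_query(
    "SELECT source, date(event_ts) AS day, count(*) AS events "
    "FROM api_responses GROUP BY source, date(event_ts)"
)

# Load the summary rows into the on-premises SQL Server
with pyodbc.connect("DSN=onprem-sqlserver") as cxn:
    cxn.cursor().executemany(
        "INSERT INTO dbo.api_summary (source, day, events) VALUES (?, ?, ?)",
        summary,
    )
```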
You can use Glue jobs (PySpark) for #4 and #5, and automate the flow using Glue triggers.
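A hedged sketch of what such a Glue (PySpark) job could look like, assuming a catalog database/table created by the crawler and a Glue JDBC connection to the on-premises SQL Server; all of those names are invented here.

```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the raw API data registered in the Data Catalog by the crawler
raw = glueContext.create_dynamic_frame.from_catalog(
    database="raw_api_data", table_name="api_responses"
)

# Summarise with Spark SQL (step 4)
raw.toDF().createOrReplaceTempView("api_responses")
summary = spark.sql(
    "SELECT source, date(event_ts) AS day, count(*) AS events "
    "FROM api_responses GROUP BY source, date(event_ts)"
)

# Write the summary to the on-prem SQL Server over a Glue JDBC connection (step 5)
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=DynamicFrame.fromDF(summary, glueContext, "summary"),
    catalog_connection="onprem-sqlserver",
    connection_options={"dbtable": "dbo.api_summary", "database": "reporting"},
)

job.commit()
```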
I am using Twitter Streaming and want to visualize my data. Which database would be the most compatible and feature-rich choice?
You could set up a data pipeline where you fetch and move your data using a tool like Apache Flume and/or Apache Kafka, analyze it with Spark, and store it in a sink like Elasticsearch (or any other NoSQL DB). After that you can query your data using a visualization tool like Kibana.
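As a rough sketch of the Spark-to-Elasticsearch leg (assuming tweets land on a Kafka topic called "tweets", the elasticsearch-spark / es-hadoop connector is on the Spark classpath, and a much-simplified tweet schema):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("tweets-to-elasticsearch").getOrCreate()

# Minimal schema for the fields we care about; real tweets have many more
schema = StructType([
    StructField("id_str", StringType()),
    StructField("text", StringType()),
    StructField("created_at", StringType()),
    StructField("lang", StringType()),
])

# Read raw tweet JSON from the assumed Kafka topic
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "tweets")
       .load())

tweets = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("t"))
             .select("t.*"))

# Write each micro-batch to Elasticsearch via the es-hadoop connector
query = (tweets.writeStream
         .format("es")
         .option("checkpointLocation", "/tmp/checkpoints/tweets")
         .option("es.nodes", "localhost:9200")
         .start("tweets"))

query.awaitTermination()
```

Kibana can then be pointed at the resulting index for dashboards.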