I'm discovering Talend Data Quality Dashboards through a tutorial, and I want to create a schema as shown below, but I can't find how:
What are the steps to be taken to migrate historical data load from Teradata to Snowflake?
Imagine there is 200TB+ of historical data combined from all tables.
I am thinking of two approaches, but I don't have enough expertise and experience to execute them, so I'm looking for someone to fill in the gaps and offer some suggestions.
Approach 1 - Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that can generate files for all the tables in the database? (Creating 500-odd scripts, one per table, is impractical.)
Once this script is ready, how is it executed in practice? Do we wrap it in a shell script and schedule it through an enterprise scheduler like Autosys/Tidal?
Once these files are generated, how do you split them on a Linux machine if each file is huge (the recommended size for loading into Snowflake is roughly 100-250 MB per file)? (See the sketch after this list.)
How to move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
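For the splitting step above, a minimal sketch (assuming plain CSV extracts on a Linux host; the path and chunk size below are placeholders) of one way to cut an oversized file into roughly 200 MB pieces while repeating the header in each piece:

    # Minimal sketch: split a large CSV extract into ~200 MB chunks so each file
    # lands in Snowflake's recommended 100-250 MB range. Path and size are placeholders.
    CHUNK_BYTES = 200 * 1024 * 1024          # ~200 MB per output file
    SOURCE = "/data/exports/orders.csv"      # hypothetical TPT/FEXP output file

    def split_csv(source, chunk_bytes=CHUNK_BYTES):
        with open(source, "r", encoding="utf-8") as src:
            header = src.readline()
            part, out, written = 0, None, 0
            for line in src:
                # Start a new chunk when none is open or the current one is full.
                if out is None or written >= chunk_bytes:
                    if out:
                        out.close()
                    part += 1
                    out = open(f"{source}.part{part:04d}.csv", "w", encoding="utf-8")
                    out.write(header)
                    written = 0
                out.write(line)
                written += len(line)         # character count, close enough here
            if out:
                out.close()

    split_csv(SOURCE)

The coreutils split command (with --line-bytes) can do much the same job, minus the header handling.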
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the suggested approach?
In general, what are the challenges faced with each of these approaches?
Using ADF will be a much better solution. It also allows you to make the data lake part of your solution.
You can design a generic solution that imports all the tables listed in a configuration. For this you can choose the recommended file format (Parquet), the target file size, and the degree of parallel loading.
The main challenge you will encounter is the poorly working ADF connector for Snowflake; here you will find my recommendations on how to work around the connector problem and how to use Data Lake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More about the recommendation on how to build Azure Data Lake Storage Gen2 structures can be found here: Best practices for using Azure Data Lake Storage Gen2
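For the load step that both approaches end with, a minimal sketch (assuming an external stage over ADLS Gen2 has already been created with a SAS token or storage integration; the stage, table, and connection names below are made up) of running COPY INTO through the Python connector:

    # Minimal sketch: load Parquet files staged in ADLS Gen2 into a Snowflake table.
    # All identifiers and credentials below are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="...", password="...", account="...",
        warehouse="LOAD_WH", database="EDW", schema="STAGING",
    )
    conn.cursor().execute("""
        COPY INTO STAGING.ORDERS
        FROM @ADLS_STAGE/teradata/orders/
        FILE_FORMAT = (TYPE = PARQUET)
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE  -- map Parquet columns to table columns by name
    """)
    conn.close()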
We have Snowflake in our organization and currently we don't have an ETL tool.
I would like to pull data directly from Salesforce into Snowflake staging table manually for an analysis.
Would it be possible to do this with Python or Java code?
Many thanks,
This article is pretty useful reference for this requirement: https://rudderstack.com/guides/how-to-load-data-from-salesforce-to-snowflake-step-by-step
You can do this using the simple-salesforce library for Python.
https://pypi.org/project/simple-salesforce/
Run the query, write the results out to a CSV, then load the CSV into Snowflake.
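A minimal sketch of that flow, assuming the simple-salesforce and snowflake-connector-python packages and a target table that already exists (every credential, account locator, and object name below is a placeholder):

    # Minimal sketch: Salesforce -> CSV -> Snowflake. Placeholders throughout.
    import csv
    import snowflake.connector
    from simple_salesforce import Salesforce

    # Pull rows from Salesforce with SOQL; query_all handles pagination.
    sf = Salesforce(username="user@example.com", password="...", security_token="...")
    rows = sf.query_all("SELECT Id, Name, Industry FROM Account")["records"]

    # Write to a local CSV, ignoring the 'attributes' metadata key on each record.
    fields = ["Id", "Name", "Industry"]
    with open("/tmp/accounts.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

    # Stage the file and load it into an existing staging table.
    conn = snowflake.connector.connect(
        user="...", password="...", account="...",
        warehouse="LOAD_WH", database="ANALYTICS", schema="STAGING",
    )
    cur = conn.cursor()
    cur.execute("PUT file:///tmp/accounts.csv @%SF_ACCOUNTS AUTO_COMPRESS=TRUE")
    cur.execute("""
        COPY INTO SF_ACCOUNTS
        FROM @%SF_ACCOUNTS
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    """)
    conn.close()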
You can use Salesforce Data Loader (provided by Salesforce): a simple interface to export Salesforce data to a .csv file. It's a Java-based tool.
Once exported you can PUT your flat file in a Snowflake Stage then use COPY INTO to load into your final table.
Documentation available here
I am using Google Sheets to create a database that is connected to Google Data Studio. But the database is growing fast and will soon outgrow Sheets' limits.
I am looking for a cloud service that is simple to use like Sheets, where I can manually add data, do calculations (like formulas in Sheets) and also use Python to update the data there. I also need it to connect to Google Data Studio for visualisation.
I have been recommended Firestore, Cloud SQL, and BigQuery, but I still do not understand the differences between them. I am looking for something cheap where I can do the things I mentioned above.
P.S. I am new to SQL, so I would prefer a visual database (like Sheets).
Thank you all!
Sheets is not a database, but you can use it as one. There are other types of databases on Google Cloud, such as:
Firestore, a document-oriented database, not really similar to a tabular sheet
BigQuery, a very powerful data warehouse and the most similar to Sheets in its design, checks, and controls
Cloud SQL, which hosts relational database engines, similar to BigQuery but with, in addition, the capacity to create constraints (unique values, primary keys, foreign keys referencing a value in another table)
However, none of them offers the ease of Sheets in terms of graphical interface. The engines are powerful, but they are developer-oriented rather than desktop-user-oriented.
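If you end up choosing BigQuery, a minimal sketch of updating it from Python with the google-cloud-bigquery client (the project, dataset, and table names here are made up; Data Studio can then connect to that table directly):

    # Minimal sketch: append a DataFrame to a BigQuery table. Names are placeholders.
    # Requires google-cloud-bigquery and pyarrow; uses your local gcloud credentials.
    import pandas as pd
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    df = pd.DataFrame({
        "date": ["2021-01-01", "2021-01-02"],
        "sales": [120, 95],
    })

    # Load (append) the DataFrame into project.dataset.table and wait for it to finish.
    job = client.load_table_from_dataframe(df, "my-project.reporting.daily_sales")
    job.result()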
I'm using Zapier with Redshift to fetch data from custom queries and trigger a wide array of actions when new rows are detected from either a table or custom query, including sending emails through Gmail or Mailchimp, exporting data to Google Sheets, and more. Zapier's UI enables our non-technical product stakeholders to take over these workflows and customize them as needed. Zapier has several integrations built for Postgres, and since Redshift supports the Postgres protocol, these custom workflows can be easily built in Zapier.
I'm switching our data warehouse from Redshift to Snowflake and the final obstacle is moving these Zapier integrations. Snowflake doesn't support the Postgres protocol, so it cannot be used as a drop-in replacement for these workflows. No other data source has all the information we need for these workflows, so connecting to an upstream data source of Snowflake is not an option. I would appreciate guidance on alternatives I could pursue, including the following:
Moving these workflows into application code (see the sketch after this list)
Using a foreign data wrapper in Postgres for Snowflake to continue using the existing workflows from a dummy Postgres instance
Using custom-code blocks in Zapier instead of the Postgres integration
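For the first option, a minimal sketch of moving the trigger into application code: poll Snowflake for new rows and forward each one to a Zapier "Catch Hook" webhook trigger, which the existing Zap actions can then consume (the table, columns, connection details, and hook URL are all made up):

    # Minimal sketch: forward new Snowflake rows to a Zapier webhook. Placeholders throughout.
    import requests
    import snowflake.connector

    ZAPIER_HOOK_URL = "https://hooks.zapier.com/hooks/catch/123456/abcdef/"  # hypothetical

    conn = snowflake.connector.connect(
        user="...", password="...", account="...",
        warehouse="REPORTING_WH", database="ANALYTICS", schema="PUBLIC",
    )
    cur = conn.cursor(snowflake.connector.DictCursor)

    # In a real job, last_seen_id would be persisted between runs (file, state table, etc.).
    last_seen_id = 0
    cur.execute(
        "SELECT id, email, created_at FROM new_signups WHERE id > %s ORDER BY id",
        (last_seen_id,),
    )
    for row in cur:
        # Each row arrives in the Zap as a JSON payload and drives the downstream actions.
        requests.post(ZAPIER_HOOK_URL, json={k.lower(): str(v) for k, v in row.items()})

    conn.close()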
I'm not sure if Snowflake has an API that will allow you to do what you want, but you can create a private Zapier Integration that will have all the same features and permissions as a public integration, but you can customize it for your team.
There's info about that process here: https://platform.zapier.com/
You might find it easier to use a vendor solution like Census to forward rows as events to Zapier. Their free plan is pretty sizeable for getting started. More info here https://www.getcensus.com/integrations/zapier
I'm planning to make a small web app on Salesforce that sells e-magazines, but I don't know how to create a database and use it. I'm using the 30-day free trial of Salesforce.
Help me.
Salesforce has a built-in database engine that runs under each Salesforce org. You do not need any preparation to work with it.
Salesforce database model has the following structure:
An sObject is the analog of a database table
An sObject's field is the analog of a database column
Salesforce has a set of default tables (standard Salesforce objects). You can create your own tables via Setup --> Create --> Object and connect them to standard or custom objects via relationships. You can also manipulate data via SOQL (an SQL-like query language).
I would suggest reading these tutorials.
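As a small, hedged illustration of what SOQL against a custom object can look like from Python (the object and field names, e.g. Magazine__c and Price__c, are invented for the example; simple-salesforce is just one way to run SOQL):

    # Minimal sketch: query a hypothetical custom object with SOQL.
    from simple_salesforce import Salesforce

    sf = Salesforce(username="user@example.com", password="...", security_token="...")
    magazines = sf.query_all("SELECT Name, Price__c FROM Magazine__c")["records"]
    for m in magazines:
        print(m["Name"], m["Price__c"])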