I'm trying to monitor ADF pipeline runs in a Snowflake table. I've managed to use a REST API to get the data into Power BI, but now I need to get the data from ADF into Snowflake. If anyone has any examples, that would be of great help. The data I need includes pipeline name, run time, start time, error message, etc.
Please check the approach below.
Capture the pipeline name, run time, start time, error message, etc. in pipeline variables.
In the Copy activity's Source tab, point to a dummy file on Blob Storage or Data Lake, then add additional columns for the pipeline name, run time, start time, error message, etc.
In the Copy activity's Sink tab, point to your Snowflake table.
In the Copy activity's Mapping tab, map your source and sink columns accordingly.
Please check the video below, where the author does the same thing but with a data flow. In your case you can use a Copy activity as explained above.
https://www.youtube.com/watch?v=-xna7n33lmc
To learn how to access the error message of any activity failure, please check this video: https://www.youtube.com/watch?v=_lSB7jaDnG0
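For the Snowflake side, a minimal sketch of what the sink table could look like (the table and column names are assumptions, not from the original answer):

```sql
-- Hypothetical monitoring table for ADF pipeline runs
CREATE TABLE IF NOT EXISTS ADF_PIPELINE_RUN_LOG (
    PIPELINE_NAME  VARCHAR,
    RUN_ID         VARCHAR,
    RUN_START      TIMESTAMP_NTZ,
    RUN_END        TIMESTAMP_NTZ,
    STATUS         VARCHAR,
    ERROR_MESSAGE  VARCHAR
);
```

In the Copy activity's additional columns, ADF system expressions such as @pipeline().Pipeline, @pipeline().RunId, and @utcnow() can supply the values that map to these columns.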
I want to clarify my understanding of the following.
Use case
Basically, I am running a Flink batch job. My requirements are as follows:
I have 10 tables of raw data in PostgreSQL.
I want to aggregate that data in a 10-minute tumbling window.
I need to store the aggregated data in aggregated PostgreSQL tables.
My pseudo code somewhat looks like this
initialize StreamExecutionEnvironment, StreamTableEnvironment
load all the configs from file
configs.foreach(
load data from table
aggregate
store data
delete temporary views created
)
streamExecutionEnvironment.execute()
Everything works fine for now, but I still have one question. I think that with this approach all the load functions would be executed simultaneously, so it would put load on Flink, right, since all the data is getting loaded at the same time? Or is my understanding wrong, and would the data get loaded, processed, and stored one table at a time? Please guide me.
After 4 days of trying everything to load data into Snowflake, nothing seems to work at all.
Now, as my last option, I want to load a local CSV file into Snowflake so that I can follow the tutorial I am watching.
Unfortunately, even this step seems to be hard in Snowflake. I have seen that I need to create an internal stage for this, so I went to Stages and created a "Snowflake Managed" stage, which I believe is an internal stage. I called that stage "MY_CSV_STAGE".
Internal Stage option on snowflake:
Then I went back to the worksheet and tried the following command:
PUT file://C:\Users\User\Downloads\Projekte/csv_dateien_fuer_snowflake.csv @MY_CSV_STAGE AUTO_COMPRESS=TRUE;
Now, when I try to run the command, I just receive a weird error that I don't understand:
Error Message:
I really would like to understand what exactly I am doing wrong. I have also read elsewhere that I might need SnowSQL to import data from a local machine into Snowflake, but I could not figure out how to install SnowSQL.
How can I write this command in Snowflake so that I can import the CSV file?
I have loaded data into an internal/named stage in Snowflake and now want to copy data from the internal stage to a Snowflake table continuously. I have read that continuous loading via Snowpipe is supported only for external stages, not internal stages. So what is the way to load data from an internal stage into a Snowflake table continuously?
Thanks in advance.
Amrita
I think you can use Snowpipe, but you'll need to call the Snowpipe REST API to load the data.
Have you considered simply running a COPY statement immediately after you run the PUT command to get the file into the table?
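A minimal sketch of that PUT-then-COPY approach (the stage, table, and file path are placeholders):

```sql
-- Upload the local file to the internal stage (run from SnowSQL, not the web worksheet)
PUT file:///tmp/data.csv @MY_STAGE AUTO_COMPRESS=TRUE;

-- Then load it into the target table immediately
COPY INTO MY_TABLE
FROM @MY_STAGE
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
PURGE = TRUE;  -- optionally remove the staged files after a successful load
```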
Try using Tasks and schedule one to run every 5 minutes.
// Task DDL
CREATE OR REPLACE TASK MASTER_TASK
WAREHOUSE = LOAD_WH
SCHEDULE = '5 MINUTE'
AS
COPY INTO STAGE_TABLE
FROM @STAGE;

// Tasks are created suspended; resume the task to start the schedule
ALTER TASK MASTER_TASK RESUME;
You can always change the frequency based on your requirements. Refer to this documentation.
For external stages, please refer to this link.
I am planning to use Snowpipe to load data from Kafka, but the support team monitoring the pipe jobs needs an alert mechanism.
How can I implement an alert mechanism for Snowpipe via email/slack/etc?
The interface provided by Snowflake between the database and surroundings is mainly with cloud storage. There is no out-of-the-box integration with messaging apart from cloud storage events.
All other integration and messaging must be provided by client solutions.
Snowflake also provides scheduled tasks that can be used for monitoring purposes, but the interface limitations are the same as described above.
Snowflake is a database-as-a-service and relies on other (external) cloud services for a complete systems solution.
This is different from installing your own copy of database software on your own compute resources, where you can install any software alongside the database.
Please correct me if anything I say is incorrect. I believe Snowpipe is great for continuous data loading, but it is hard (or impossible) to track all the errors in the source file. As mentioned in the previous suggestions, we could build a visualization querying COPY_HISTORY and/or PIPE_USAGE_HISTORY, but that doesn't give you ALL the errors in the source file; it only gives you the first error message and the error counts.
PIPE_USAGE_HISTORY will tell you nothing about the errors in the source file.
The only function that can be helpful (for returning all errors) is the VALIDATE table function, but it only validates COPY INTO loads.
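For reference, a hedged example of the VALIDATE table function (the table name is a placeholder):

```sql
-- Return all errors from the most recent COPY INTO execution on MY_TABLE
SELECT * FROM TABLE(VALIDATE(MY_TABLE, JOB_ID => '_last'));
```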
There is a similar function for pipes, called VALIDATE_PIPE_LOAD, but according to the documentation it returns only the first error. Snowflake says "This function returns details about ANY errors encountered during an attempted data load into Snowflake tables," but the ERROR output column contains only the first error in the source file.
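A sketch of calling VALIDATE_PIPE_LOAD (the pipe name and time window are assumptions):

```sql
-- Check a pipe's load errors over the last hour
SELECT * FROM TABLE(INFORMATION_SCHEMA.VALIDATE_PIPE_LOAD(
    PIPE_NAME  => 'MY_DB.MY_SCHEMA.MY_PIPE',
    START_TIME => DATEADD(HOUR, -1, CURRENT_TIMESTAMP())));
```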
So here is my question: for those of you who have successfully used Snowpipe to load data in a real-time production environment, how are you doing the error handling and alerting?
I think that, compared to Snowpipe, using COPY INTO within a stored procedure, having a shell script call that stored procedure, and then scheduling the script with an enterprise scheduler like Autosys/Control-M is a much more streamlined solution.
Using external functions, streams, and tasks for alerting may be an elegant solution, but again I am not sure it solves the problem of error tracking.
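A rough sketch of that COPY-INTO-in-a-procedure pattern, with all object names assumed:

```sql
CREATE OR REPLACE PROCEDURE LOAD_FROM_STAGE()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    -- Load whatever is currently in the stage; skip bad rows instead of aborting
    COPY INTO MY_TABLE
    FROM @MY_STAGE
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    ON_ERROR = 'CONTINUE';
    RETURN 'load complete';
END;
$$;

-- The scheduler's shell script would then run something like:
-- snowsql -q "CALL LOAD_FROM_STAGE();"
```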
Both email and Slack alerts can be implemented via external functions.
EDIT (2022-04-27): Snowflake now officially supports Error Notifications for Snowpipe (currently in Public Preview, for AWS only).
"Monitoring" and "alert mechanism" are very broad terms. What do you want to monitor? What should trigger the alerts? The answer can only be as good as the question, so adding more details would be helpful.
As Hans mentioned in his answer, any solution would require the use of systems external to Snowflake. However, Snowflake can be the source of the alerts by leveraging external functions or notification integrations.
Here are some options:
If you want to monitor Snowpipe's usage or performance:
You could simply hook up a BI visualization tool to Snowflake's COPY_HISTORY and/or PIPE_USAGE_HISTORY. You could also use Snowflake's own visualization tool, called Snowsight.
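As a sketch, the kind of COPY_HISTORY query such a visualization would be built on (the table name and time window are assumptions):

```sql
-- Load history for one table over the last 24 hours
SELECT FILE_NAME, LAST_LOAD_TIME, STATUS, ROW_COUNT, FIRST_ERROR_MESSAGE
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'MY_TABLE',
    START_TIME => DATEADD(HOUR, -24, CURRENT_TIMESTAMP())));
```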
If you want to be alerted about data loading issues:
You could create a data test against COPY_HISTORY in DBT, and schedule it to run on a regular basis in DBT Cloud.
Alternatively, you could create a task that calls a procedure on a schedule. Your procedure would check COPY_HISTORY first, then call an external function to report failures.
Some notes about COPY_HISTORY:
Please be aware of the limitations described in the documentation (in terms of the privileges required, etc.)
Because COPY_HISTORY is an INFORMATION_SCHEMA function, it can only operate on one database at a time.
To query multiple databases at once, UNION could be used to combine the results.
COPY_HISTORY can be used for alerting only, not diagnostics. Diagnosing data load errors is another topic entirely (the VALIDATE_PIPE_LOAD function is probably a good place to start).
If you want to be immediately notified of every successful data load performed by Snowpipe:
Create an external function to send notifications/alerts to your service(s) of choice.
Create a stream on the table that Snowpipe loads into.
Add a task that runs every minute, but only when the stream contains data, and have it call your external function to send out the alerts/notifications.
EDIT: This solution does not provide alerting for errors - only for successful data loads! To send alerts for errors, see the solutions above ("If you want to be alerted about data loading issues").
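The stream-plus-task steps above could be sketched as follows; every object name, and the NOTIFY_SLACK external function, are assumptions:

```sql
-- Stream on the table that Snowpipe loads into
CREATE OR REPLACE STREAM LOAD_ALERT_STREAM ON TABLE MY_TABLE;

-- Task that runs every minute, but only when the stream has data.
-- Inserting from the stream both records the alert and advances the
-- stream offset, so the task does not re-fire for the same rows.
CREATE OR REPLACE TASK LOAD_ALERT_TASK
  WAREHOUSE = ALERT_WH
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('LOAD_ALERT_STREAM')
AS
  INSERT INTO LOAD_ALERT_LOG (ALERT_TIME, ALERT_RESULT)
  SELECT CURRENT_TIMESTAMP(),
         NOTIFY_SLACK('Snowpipe loaded new rows')  -- hypothetical external function
  FROM (SELECT DISTINCT 1 FROM LOAD_ALERT_STREAM);

ALTER TASK LOAD_ALERT_TASK RESUME;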
We have a DOS Batch job which runs a multi-step process to:
Delete all records from salesforce for a specific object (download IDs and then delete them using Data Loader)
Delete all records from a database table which mirrors the Salesforce data.
Extract data from a database and upload it to the Salesforce objects using Data Loader.
Download the Salesforce data into the database table.
Recently, the first step has been failing with a QUERY-TIMEOUT error. If I rerun the process, it generally works OK without any other changes. This is being investigated, but is not my question.
My question is: How can I detect when step 1 (which uses Data Loader) in the batch file fails? If this fails, I do not want to proceed with the rest of the process, as this deletes the database data which is used elsewhere for reporting.
Does the Apex Data Loader set an ERRORLEVEL if it fails? How else can I determine that there was a failure?
Thanks.
Ron Ventura
For more detail, please refer to the link below. Basically, the idea is to check the error log file that Data Loader generates: if the pass is 100% successful, the error log will have a header line and no rows.
https://www.nimbleuser.com/blog/failing-safe-with-the-apex-data-loader-for-salesforce-crm
You can also refer to this answer.
https://salesforce.stackexchange.com/questions/14466/availability-of-apex-data-loader-error-file-from-local-pc-to-salesforce
Regards.