Snowflake Snowpipe - Email Alert Mechanism

I am planning to use Snowpipe to load data from Kafka, but the support team monitoring the pipe jobs needs an alert mechanism.
How can I implement an alert mechanism for Snowpipe via email/Slack/etc.?

The interface Snowflake provides between the database and its surroundings is mainly cloud storage. There is no out-of-the-box integration for messaging apart from cloud storage events.
All other integration and messaging must be provided by client solutions.
Snowflake also provides scheduled tasks that can be used for monitoring purposes, but the interface limitations are the same as described above.
Snowflake is a database-as-a-service and relies on other (external) cloud services for a complete system solution.
This is different from installing your own copy of database software on your own compute resources, where you can install any software alongside the database.

Please correct my understanding if anything I say is incorrect. I believe Snowpipe is great for continuous data loading, but there is little or no way to track all the errors in the source file. As mentioned in the previous suggestions, we could build a visualization querying against COPY_HISTORY and/or PIPE_USAGE_HISTORY, but that doesn't give you ALL the errors in the source file; it only gives you summary details about them.
PIPE_USAGE_HISTORY will tell you nothing about the errors in the source file.
The only function that can be helpful (for returning all errors) is the VALIDATE table function in INFORMATION_SCHEMA, but it only validates loads performed with COPY INTO.
There is a similar function for pipes called VALIDATE_PIPE_LOAD, but according to the documentation it returns only the first error. Snowflake says "This function returns details about ANY errors encountered during an attempted data load into Snowflake tables," yet the description of the ERROR output column says it contains only the first error in the source file.
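For reference, this is roughly how I am calling it (the pipe name and time window are placeholders):

SELECT *
FROM TABLE(INFORMATION_SCHEMA.VALIDATE_PIPE_LOAD(
    PIPE_NAME  => 'MY_DB.MY_SCHEMA.MY_PIPE',
    START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));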
So here is my question: if any of you have successfully used Snowpipe to load data in a real-time production environment, how are you handling error tracking and alerting?
I think that, compared to Snowpipe, using COPY INTO within a stored procedure, having a shell script call that stored procedure, and then scheduling the script with an enterprise scheduler like Autosys or Control-M is a much more streamlined solution.
Using external functions with a stream and task for alerting may be an elegant solution, but again I am not sure it solves the problem of error tracking.

Both email and Slack alerts can be implemented via external functions.

EDIT (2022-04-27): Snowflake now officially supports Error Notifications for Snowpipe (currently in Public Preview, for AWS only).
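In rough terms, that feature works by creating a notification integration and attaching it to the pipe; the ARNs, object names, and COPY statement below are placeholders:

CREATE NOTIFICATION INTEGRATION pipe_errors_int
    ENABLED = TRUE
    TYPE = QUEUE
    DIRECTION = OUTBOUND
    NOTIFICATION_PROVIDER = AWS_SNS
    AWS_SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:001234567890:snowpipe-errors'
    AWS_SNS_ROLE_ARN = 'arn:aws:iam::001234567890:role/snowpipe-errors-role';

CREATE OR REPLACE PIPE my_pipe
    AUTO_INGEST = TRUE
    ERROR_INTEGRATION = pipe_errors_int
AS
    COPY INTO my_target_table FROM @my_stage;

Anything subscribed to the SNS topic (an email address, a Lambda that posts to Slack, etc.) then receives the error notifications.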
"Monitoring" & "alert mechanism" are a very broad terms. What do you want to monitor? What should be triggering the alerts? The answer can only be as good as the question, so adding more details would be helpful.
As Hans mentioned in his answer, any solution would require the use of systems external to Snowflake. However, Snowflake can be the source of the alerts by leveraging external functions or notification integrations.
Here are some options:
If you want to monitor Snowpipe's usage or performance:
You could simply hook up a BI visualization tool to Snowflake's COPY_HISTORY and/or PIPE_USAGE_HISTORY. You could also use Snowflake's own visualization tool, called Snowsight.
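For example, a dashboard could be built on a query like this (the table name and time window are assumed):

SELECT file_name, last_load_time, row_count, error_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'MY_TARGET_TABLE',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));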
If you want to be alerted about data loading issues:
You could create a data test against COPY_HISTORY in DBT, and schedule it to run on a regular basis in DBT Cloud.
Alternatively, you could create a task that calls a procedure on a schedule. Your procedure would check COPY_HISTORY first, then call an external function to report failures.
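A minimal sketch of that approach, assuming a hypothetical procedure named report_copy_failures that checks COPY_HISTORY and calls the external function on failures:

CREATE OR REPLACE TASK check_load_errors
    WAREHOUSE = alerting_wh
    SCHEDULE = '15 MINUTE'
AS
    CALL report_copy_failures();

ALTER TASK check_load_errors RESUME; -- tasks are created in a suspended state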
Some notes about COPY_HISTORY:
Please be aware of the limitations described in the documentation (in terms of the privileges required, etc.)
Because COPY_HISTORY is an INFORMATION_SCHEMA function, it can only operate on one database at a time.
To query multiple databases at once, UNION can be used to combine the results (see the sketch after these notes).
COPY_HISTORY can be used for alerting only, not for diagnostics. Diagnosing data load errors is another topic entirely (the VALIDATE_PIPE_LOAD function is probably a good place to start).
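Here is the cross-database UNION sketch mentioned above (the database names, table names, and time window are all placeholders):

SELECT * FROM TABLE(db1.INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'T1', START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())))
UNION ALL
SELECT * FROM TABLE(db2.INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'T2', START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));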
If you want to be immediately notified of every successful data load performed by Snowpipe:
Create an external function to send notifications/alerts to your service(s) of choice.
Create a stream on the table that Snowpipe loads into.
Add a task that runs every minute, but only when the stream contains data, and have it call your external function to send out the alerts/notifications.
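A minimal sketch of those three steps, where every object name and the send_alert external function are assumptions:

CREATE OR REPLACE STREAM snowpipe_loads_stream ON TABLE my_target_table;

CREATE OR REPLACE TASK notify_on_load
    WAREHOUSE = alerting_wh
    SCHEDULE = '1 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('SNOWPIPE_LOADS_STREAM')
AS
    -- the INSERT consumes the stream, advancing its offset past the notified rows
    INSERT INTO load_notifications
        SELECT send_alert('Snowpipe loaded ' || COUNT(*) || ' new row(s)')
        FROM snowpipe_loads_stream;

ALTER TASK notify_on_load RESUME;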
EDIT: This solution does not provide alerting for errors - only for successful data loads! To send alerts for errors, see the solutions above ("If you want to be alerted about data loading issues").

Related

Automate the execution of C# code that uses Entity Framework to treat data?

I have code that uses Entity Framework to treat data (it retrieves data from multiple tables, then performs operations on it before saving to a SQL database). The code was supposed to run when a button is clicked in an MVC web application that I created. But now the client wants the data treatment to run automatically every day at a set time (like an SSIS package). How do I go about this?
In addition to adding a job scheduler to your MVC application as Pac0 suggests, here are a couple of other options:
Leave the code in the MVC project and create an API endpoint that you can invoke on some sort of schedule. Give the client a PowerShell script that calls the API and let them take it from there.
Or
Refactor the code into a DLL, or copy/paste it into a console application that can be run on a schedule using the Windows Task Scheduler, SQL Server Agent, or some other external scheduler.
You could use a tool/library that does this for you. I can recommend Hangfire; it works well (there are others, but I have not tried them).
The example on their homepage is pretty explicit:
RecurringJob.AddOrUpdate(
    () => Console.WriteLine("Recurring!"),
    Cron.Daily);
The above code needs to be executed once when your application has started up, and you're good to go. Just replace the lambda with a call to your method.
Adjust the time parameter as you wish, or even better, make it configurable, because we know customers like to change their minds.
Hangfire needs to create its own database, which usually stays pretty small for this kind of thing. You can also monitor whether the jobs ran well and check some useful stats on the Hangfire server.

How to automatically save received pdf files from gmail into a database?

I would like to know if this scenario would be possible in any programming language combined with any database technology.
I would like to automatically save received pdf files that are attached in emails into a database. Is this possible? Is there any library or framework available to do so?
Yes, I would recommend using Google Apps Script for this. The approach to follow is to use the GmailApp class (documentation here) to get the messages you need; you can use methods like getInboxThreads() (documentation) to retrieve them.
After you've found the message and retrieved the attachment (which you can do with getAttachments() (documentation)), you can use the JDBC Service to connect to external databases. The specifics here depend a lot on which database you want to connect to, but the documentation will lead you in the right direction.

What can I do with generated error logs?

I'm currently working on a web application which generates daily error (and non error) logs.
The current system outputs a log per task to a text file, and outputs critical errors as well as "start" and "finish" type messages to an email account.
The current workflow is as follows: scour the email box for errors, then go and find the .txt file to look at the associated errors and find the cause.
There are around 30 txt files split across about 5 servers.
This system was set up before me, but I'm looking for any advice on how to deal with the situation.
I have control of the script that produces the error logs, so I can do pretty much anything - but I'm not sure where to start: I'd considered some kind of web-facing dashboard tool, or maybe outputting the files to RSS or something?
Are there any external or internal tools I should be using?
Of course you could use SQL Server Reporting Services, or review this comparison table; there are some packages that support SQL Server, but they may be overwhelming for your task.
It's not really clear what your problem is or what you want to do, but if I understand correctly, your biggest problem is that some messages are logged to a log file but others are sent by email. Therefore, there is no single location that has all error messages in it and that makes analysis and troubleshooting difficult.
The best solution would be to use a logging framework that supports multiple logging destinations (file, DB, email) and severities. That would allow you to specify a configuration like "all errors are logged to a text file and critical ones are also sent by email", so you can ensure that you have everything in one place for general analysis but critical errors are also handled with priority.
You didn't mention what programming language you use, but assuming it's .NET-based then log4net and Enterprise Library are two common frameworks and there are many questions about them here on SO. Googling should give you a good idea of the pros and cons for your situation. If you're using a different language then you can look for the equivalent package: log4j (Java), logging (Python) etc.

Design database table structure

I am new to this and have tried to find information on the web without success. I need to create some log tables but have no idea what information these tables should contain or how to organize them.
For example:
LogErrorTable, LogChangesTable, etc.
Could anyone give me some articles about this, or a link to a site with example solutions that others have used?
First of all, what logging library do you use? If you're on Java, go for log4j; if you're on .NET, go for log4net. Both of these frameworks provide DB log appenders that log to the database out of the box.
In case you're not using a log library: use a log library :)
In case you really want to do this on your own, I can recommend a layout I used in a project where log messages were stored in a logs table and exceptions associated with an entry in the logs table were stored in an exceptions table, but that highly depends on your platform.
You can find a lot of useful information on how to design your log tables in the log4net and log4j documentation. For example, take a look at the log4net AdoNetAppender class.
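As an illustration only, the two-table layout described above might look like this in T-SQL (all names and columns are assumptions; in practice the appender's documentation dictates the schema):

CREATE TABLE logs (
    id        BIGINT IDENTITY(1,1) PRIMARY KEY,
    logged_at DATETIME2 NOT NULL,
    level     VARCHAR(10) NOT NULL,   -- DEBUG / INFO / WARN / ERROR / FATAL
    logger    VARCHAR(255) NOT NULL,
    message   NVARCHAR(4000) NOT NULL
);

CREATE TABLE exceptions (
    id             BIGINT IDENTITY(1,1) PRIMARY KEY,
    log_id         BIGINT NOT NULL REFERENCES logs(id),
    exception_type VARCHAR(255) NOT NULL,
    stack_trace    NVARCHAR(MAX) NULL
);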

How do I elegantly import an Excel file into Sql Server via a Coldfusion HTML form?

Does anyone have an elegant suggestion for how to get the contents of an Excel spreadsheet into SQL Server via a web form? I need to allow our clients to upload modest amounts of structured data, and I need that data to ultimately reside in a sql table. I really can't expect the clientele to produce anything but an Excel file, but I could require that it be an xlsx.
The web app is written in Coldfusion; it doesn't need to be able to handle huge numbers of simultaneous requests, but I don't want to consider some sort of server-side batch job processing or shunt the user to an asp.net page (which is what we are doing now).
Any recommendations (or examples of how others are successfully doing this) would be appreciated. Due to the sensitivity of the data, we really can't do anything to compromise the security of the web or sql servers.
If you are using CF9, then you could easily use the cfspreadsheet tag too. I mention this one specifically because Shawn's link did not (presumably because it is relatively new on the CF scene). Here's the livedocs link: http://help.adobe.com/en_US/ColdFusion/9.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec17cba-7f87.html
For full use, I would create a web form with a standard file upload field. On the backend handling the form submission, get a copy of the file with:
<cffile action="upload" destination="uploaded.xls".....>
Then use:
<cfspreadsheet action="read" query="myExcelData" src="uploaded.xls" ...>
At that point, your spreadsheet content will be available as a query object. You can then loop over this query, running an insert query against your SQL Server on each iteration. That should do it.
Here are the most notable options to help point you in the right direction; choose what you are most comfortable with (Source: Charlie Arehart).
CFXL
JXLS
CFX_Excel
My personal recommendation is to go the CFX_Excel route. Although a commercial product, it will grant you the most functionality/flexibility of the options listed.
