Snowflake - Task not running

I have created a simple task with the below script and for some reason it never ran.
CREATE OR REPLACE TASK dbo.tab_update
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON * * * * * UTC'
AS CALL dbo.my_procedure();
I am using a Snowflake trial Enterprise edition.

Did you RESUME? From the docs -- "After creating a task, you must execute ALTER TASK … RESUME before the task will run"
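For the task in the question, a minimal sketch of the two required steps (the role name is illustrative; the privilege grant is explained in the clarification below) would be:
GRANT EXECUTE TASK ON ACCOUNT TO ROLE my_task_role;
ALTER TASK dbo.tab_update RESUME;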

A bit of clarification:
Both steps, while possibly annoying, are needed.
Tasks can consume warehouse time (credits) repeatedly (e.g. up to every minute), so we wanted to make sure that the execute privilege was granted explicitly to a role.
Tasks can have dependencies, and task trees (eventually DAGs) shouldn't start executing as soon as one or more tasks are created. Resume provides an explicit sync point at which a data engineer can tell us that the task tree is ready for validation, and execution can start at the next interval.
Dinesh Kulkarni
(PM, Snowflake)

Related

TASKS history in Snowflake

Is there an efficient way to see the logs of task runs in Snowflake?
I am using this. Is there a possibility to wipe off the history from here?
select *
from table(information_schema.task_history(
scheduled_time_range_start=>dateadd('hour',-1,current_timestamp()),
result_limit => 1000,
task_name=>'TASKNAME'));
Is there an efficient way to see the logs of task runs in Snowflake?
Depending on what you mean by efficient, Snowflake offers a UI to monitor task dependencies and run history.
Run History
Task run history includes details about each execution of a given task. You can view the scheduled time, the actual start time, duration of a task and other information.
Account Level Task History:
Task history displays task information at the account level, and is divided into three sections:
Selection (1) - Defines the set of task history to display, and includes types of tasks, date range and other information
Histogram (2) - Displays a bar graph of task runs over time.
Task list (3) - a list of selected tasks.
Is there a possibility to wipe off the history from here?
Task History
This Account Usage view enables you to retrieve the history of task usage within the last 365 days (1 year). The view displays one row for each run of a task in the history.
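For example, a query against that Account Usage view (note that Account Usage views have some ingestion latency) could look like this:
select name, state, scheduled_time, completed_time, error_message
from snowflake.account_usage.task_history
where scheduled_time >= dateadd('day', -30, current_timestamp())
order by scheduled_time desc;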

Snowflake orchestration of tasks

I have a batch load process that loads data into a staging database. I have a number of tasks that execute stored procedures which move the data to a different database. The tasks are executed when the SYSTEM$STREAM_HAS_DATA condition is satisfied on a given table.
I have a separate stored procedure that I want to execute only after the tasks have completed moving the data.
However, I have no way to know which tables will receive data and therefore do not know which tasks will be executed.
How can I know when all the tasks that satisfied the SYSTEM$STREAM_HAS_DATA condition are finished and I can now kick off the other stored procedure? Is there a way to orchestrate this step by step process similar to how you would in a SQL job?
There is no automated way but you can do it with some coding.
You may create a stored procedure to check the STATE column returned by the task_history table function, to see if the tasks are completed or skipped:
https://docs.snowflake.com/en/sql-reference/functions/task_history.html
You can call this stored procedure periodically using a task (like every 5 minutes etc).
Based on your checks inside the stored procedure (all tasks succeeded, the target SP hasn't been executed today yet, etc.), you can execute the target stored procedure which needs to run after all tasks have completed.
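As a minimal sketch, the check inside that stored procedure could simply look for anything still scheduled or running in a recent window (the one-hour window is illustrative):
select name, state, scheduled_time
from table(information_schema.task_history(
scheduled_time_range_start=>dateadd('hour',-1,current_timestamp())))
where state in ('SCHEDULED', 'EXECUTING');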
You can also check the status of all the streams via SELECT SYSTEM$STREAM_HAS_DATA('<stream_name>'), which does not consume the stream, or via SELECT COUNT(*) FROM <stream_name>.
Look into using IDENTIFIER for dynamic queries.
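For example, with the stream name held in a session variable (names are illustrative), IDENTIFIER lets you resolve it at query time:
set stream_name = 'MY_DB.MY_SCHEMA.MY_STREAM';
select count(*) from identifier($stream_name);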

How to find which warehouse a Snowflake Task is running against?

I can see the history of my task using:
select *
from table(information_schema.task_history())
where NAME = 'MY_TASK'
order by scheduled_time;
But this specific task failed because of:
Statement reached its statement or warehouse timeout of 3,600 second(s) and was canceled.
So I issued the following command to increase the timeout of the warehouse I think it's running against:
ALTER WAREHOUSE "MY_WAREHOUSE" SET STATEMENT_TIMEOUT_IN_SECONDS = 18000
But the task still gets the same error. How can I conclusively identify the warehouse I need to issue this command?
If you want your task to use a specific warehouse, you can define it when creating the task using the WAREHOUSE parameter,
otherwise it will be a serverless task and you can only define the USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE parameter.
If you have problems with a timeout on a task, change the USER_TASK_TIMEOUT_MS parameter; by default it is 3,600,000 ms (3,600 seconds).
If you already have a task, you can change this parameter using the ALTER command, for example to change it to 4 hours:
ALTER TASK IF EXISTS mytask
SET USER_TASK_TIMEOUT_MS = 14400000;
Reference: CREATE TASK, ALTER TASK
Remember that the task_history() function is quite limited: by default it returns only 100 rows, and it only covers the last 7 days.
It's much better to use the TASK_HISTORY view.
Reference: task_history() function, TASK_HISTORY view
Try running SHOW TASKS
SHOW TASKS documentation
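For example (database and schema names are placeholders), the warehouse column in the SHOW TASKS output is empty when the task is serverless:
show tasks like 'MY_TASK' in schema my_db.my_schema;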

Launch stored procedure and continue running it even if disconnected

I have a database where data is processed in some kind of batches, where each batch may contain even a million records. I am processing data in a console application, and when I'm done with a batch, I mark it as Done (to avoid reading it again in case it does not get deleted), delete it and move on to a next batch.
I have the following simple stored procedure which deletes processed "batches" of data
CREATE PROCEDURE [dbo].[DeleteBatch]
(
@BatchId bigint
)
AS
SET XACT_ABORT ON
BEGIN TRANSACTION
DELETE FROM table1 WHERE BatchId = @BatchId
DELETE FROM table2 WHERE BatchId = @BatchId
DELETE FROM table3 WHERE BatchId = @BatchId
COMMIT
RETURN @@ERROR
I am using NHibernate with a command timeout value of 10 minutes, and the DeleteBatch procedure call occasionally times out.
Actually, I don't want to wait for DeleteBatch to complete. I have already marked the batch as Done, so I want to go on processing the next batch, or maybe even exit my console application if there are no more pending batches.
I am using Microsoft SQL Express 2012.
Is there any simple solution to tell the SQL server - "launch DeleteBatch and run it asynchronously even if I disconnect, and I don't even need the result of the procedure"?
It would also be great if I could set a lower processing priority for DeleteBatch because other queries are more important than DeleteBatch.
I don't know much about NHibernate, but if you can use ADO.NET in this scenario then you can implement asynchronous database operations easily using the SqlCommand.BeginExecuteNonQuery method in C#. This method starts the process of asynchronously executing a Transact-SQL statement or stored procedure that does not return rows, so that other tasks can run concurrently while the statement is executing.
EDIT: If you really want to exit from your console app before the db operation ends, then you will have to manually create threads in your code and perform the db operation in those threads. When you close your console app these threads would still be alive, because threads created using System.Threading.Thread are foreground threads by default. Having said that, it is also important to consider how many threads you will create. In your case you would have to assign one thread for each batch. If the number of batches is very large, then a large number of threads would need to be created, which would in turn eat a large amount of your CPU resources and could even freeze your OS for a long time.
Another simple solution I could suggest is to insert the BatchIds into some database table. Create an INSERT TRIGGER on that table. This trigger would then call a stored proc with BatchId as its parameter and would perform the required tasks.
Hope it helps.
What if your console application, instead of trying to delete the batch, just wrote the batch id into a "BatchIdsToDelete" table? Then you could use an agent job running every x minutes/seconds, or whatever, to delete the top x percent of records for a given batch id, maybe sleeping a little before tackling the next x percent.
Maybe worth having a look at that?
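A rough sketch of what that background job could run for one of the tables (table name, chunk size and delay are illustrative):
DECLARE @BatchId bigint;
SELECT TOP (1) @BatchId = BatchId FROM dbo.BatchIdsToDelete ORDER BY BatchId;
WHILE EXISTS (SELECT 1 FROM dbo.table1 WHERE BatchId = @BatchId)
BEGIN
    -- delete in small chunks so more important queries are not blocked for long
    DELETE TOP (5000) FROM dbo.table1 WHERE BatchId = @BatchId;
    WAITFOR DELAY '00:00:02';
END
DELETE FROM dbo.BatchIdsToDelete WHERE BatchId = @BatchId; -- this batch is done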
Look at this article, which explains how to do reliable asynchronous procedure execution, code included. It is based on Service Broker.
The problem with trying to use .NET async features (like BeginExecute, tasks, etc.) is that the call is unreliable: if the process exits before the procedure completes, the execution is canceled on the server as the session is disconnected.
But you also need to look at the task itself: why is the deletion taking over 10 minutes? Is it blocked by contention? Are you missing indexes on BatchId? Use the Performance Troubleshooting Flowchart.
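If the indexes are in fact missing, adding one per table (names are illustrative) usually turns each delete from a full scan into a seek:
CREATE INDEX IX_table1_BatchId ON dbo.table1 (BatchId);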
Late to the party, but if someone else has this problem, use sqlcmd. With Express you are limited in the number of users (I think 2, but it may have changed since the last time I did much with Express). You can have sqlcmd run queries, stored procedures, ...
And you can kick off sqlcmd with the Windows Scheduler. A script, an Outlook rule, ...
I used it to manage three or four thousand SQL Server Express instances, with their nightly maintenance scheduled with the Windows Scheduler.
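For example, a scheduled command along these lines runs the procedure without the console application staying connected (server, database and batch id are illustrative):
sqlcmd -S .\SQLEXPRESS -d MyDatabase -E -Q "EXEC dbo.DeleteBatch @BatchId = 42"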
You could also create and run a PowerShell script; it's more versatile and probably more widely used than sqlcmd.
I needed the same thing.
After searching for a long time I found the solution.
It's the easiest way:
SqlConnection connection = new SqlConnection();
connection.ConnectionString = "your connection string";
// Enable asynchronous processing on the connection string
SqlConnectionStringBuilder builder = new SqlConnectionStringBuilder(connection.ConnectionString);
builder.AsynchronousProcessing = true;
// Open a new connection built from the modified connection string
SqlConnection newSqlConn = new SqlConnection(builder.ConnectionString);
newSqlConn.Open();
// Start the stored procedure without waiting for it to finish
SqlCommand cmd = new SqlCommand(storeProcedureName, newSqlConn);
cmd.CommandType = CommandType.StoredProcedure;
cmd.BeginExecuteNonQuery(null, null);
Ideally, the SqlConnection object would take an optional parameter/property, the URL of a web service (be that WCF, WebApi, or something yet to be named), and, if the user wishes, notify the user of execution progress and/or completion status by calling that URL with a well-known message.
Theoretically, DbConnection is an extensible object that one is free to implement. However, it would take some review of what really can and needs to be done before this approach could be called feasible.

SQL Server trigger an asynchronous update from trigger?

If a user inserts rows into a table, I would like SQL Server to perform some additional processing - but not in the context of the user's transaction.
e.g. The user gives read access to a folder:
UPDATE Folders SET ReadAccess = 1
WHERE FolderID = 7
As far as the user is concerned I want that to be the end of the atomic operation. In reality I have to now go find all child files and folders and give them ReadAccess.
EXECUTE SynchronizePermissions
This is a potentially lengthy operation (over 2s). I want this lengthy operation to happen "later". It can happen 0 seconds later, and before the carbon-unit has a chance to think about it the asynchronous update is done.
How can I run this required operation asynchronously when it's required (i.e. triggered)?
The ideal would be:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTEASYNCHRONOUS SynchronizePermissions
or
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTE SynchronizePermissions WITH(ASYNCHRONOUS)
Right now this happens as a trigger:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTE SynchronizePermissions
and the user is forced to wait the 3 seconds every time they make a change to the Folders table.
I've thought about creating a Scheduled Task that runs every minute and checks for a PermissionsNeedsSynchronizing flag:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
UPDATE SystemState SET PermissionsNeedsSynchronizing = 1
The scheduled task binary can check for this flag and run if the flag is on:
DECLARE @FlagValue int
SET @FlagValue = 0;
UPDATE SystemState SET @FlagValue = PermissionsNeedsSynchronizing+1
WHERE PermissionsNeedsSynchronizing = 1
IF @FlagValue = 2
BEGIN
EXECUTE SynchronizePermissions
UPDATE SystemState SET PermissionsNeedsSynchronizing = 0
WHERE PermissionsNeedsSynchronizing = 2
END
The problem with a scheduled task is:
- the fastest it can run is every 60 seconds
- it suffers from being a polling solution
- it requires an executable
What i'd prefer is a way that SQL Server could trigger the scheduled task:
CREATE TRIGGER dbo.Folders FOR INSERT, UPDATE, DELETE AS
EXECUTE SynchronizePermissionsAsychronous
CREATE PROCEDURE dbo.SynchronizePermissionsAsychronous AS
EXECUTE sp_ms_StartWindowsScheduledTask @taskName = 'SynchronousPermissions'
The problem with this is:
- there is no sp_ms_StartWindowsScheduledTask system stored procedure
So i'm looking for ideas for better solutions.
Update: The previous example is a problem that has had no good solution for five years now. A problem from 3 years ago that has no good solution is a table where I need to update a meta-data column after an insert/update. The metadata takes too long to calculate during online transaction processing, but I am OK with it appearing 3 or 5 seconds later:
CREATE TRIGGER dbo.UpdateFundsTransferValues FOR INSERT, UPDATE AS
UPDATE FundsTransfers
SET TotalOrderValue = (SELECT ....[snip]....),
TotalDropValue = (SELECT ....,[snip]....)
WHERE FundsTransfers.FundsTransferID IN (
SELECT i.FundsTransferID
FROM INSERTED i
)
And the problem that I'm having today is a way to asynchronously update some metadata after a row has been transactionally inserted or modified:
CREATE TRIGGER dbo.UpdateCDRValue FOR INSERT, UPDATE AS
UPDATE LCDs
SET CDRValue = (SELECT ....[snip]....)
WHERE LCDs.LCDGUID IN (
SELECT i.LCDGUID
FROM INSERTED i
)
Update 2: I've thought about creating a native, or managed, DLL and using it as an extended stored procedure. The problem with that is:
- you can't script a binary
- I'm not allowed to do it
Use a queue table, and have a different background process pick things up off the queue and process them. The trigger itself is by definition a part of the user's transaction - this is precisely why they are often discouraged (or at least people are warned to not use expensive techniques inside triggers).
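A minimal sketch of that pattern for the folder-permissions example (the queue table, trigger and column names are illustrative):
CREATE TABLE dbo.PermissionSyncQueue
(
    QueueId    int IDENTITY(1,1) PRIMARY KEY,
    FolderID   int NOT NULL,
    EnqueuedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
CREATE TRIGGER dbo.trFolders_Enqueue ON dbo.Folders
FOR INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- only record which folders changed; the expensive SynchronizePermissions
    -- work is done later by the background process reading this queue
    INSERT INTO dbo.PermissionSyncQueue (FolderID)
    SELECT FolderID FROM inserted
    UNION
    SELECT FolderID FROM deleted;
END;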
Create a SQL Agent job and run it with sp_start_job; it shouldn't wait for completion.
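For example, assuming a job named SynchronizePermissions already exists, the trigger can start it like this; sp_start_job returns as soon as the job is queued:
EXEC msdb.dbo.sp_start_job @job_name = N'SynchronizePermissions';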
However, you need the proper permission to run jobs:
Members of SQLAgentUserRole and SQLAgentReaderRole can only start jobs
that they own. Members of SQLAgentOperatorRole can start all local
jobs including those that are owned by other users. Members of
sysadmin can start all local and multiserver jobs.
The problem with this approach is that if the job is already running, it can't be started again until it has finished.
Otherwise, go with the queue table that Aaron suggested; it is cleaner and better.
We came across this problem some time ago, and I figured out a solution that works beautifully. I do have a process running in the background, but just like you, I didn't want it to have to poll every 60 seconds.
Here are the steps:
(1) Our trigger doesn't run the db update itself. It merely throws a "flag file" into a folder that is monitored by the background process.
(2) The background process monitors that folder using Windows Change Notification (this is the really cool part, because you don't have to poll the folder-- your process sleeps until Windows notifies it that a file has appeared). Whenever the background process is awoken by Windows, it runs the db update. Then it deletes the flag file(s), goes to sleep again and tells Windows to wake it up when another file appears in the folder.
This is working exactly as you described: the triggered update runs shortly after the main database event, and voila, the user doesn't have to wait the extra few seconds. I just love it.
You don't necessarily need to compile your own executable to do this: many scripting languages can use Windows Change Notification. I wrote the background process in Perl and it only took a few minutes to get it working.
