How to find which warehouse a Snowflake Task is running against? - snowflake-cloud-data-platform

I can see the history of my task using:
select *
from table(information_schema.task_history())
where NAME = 'MY_TASK'
order by scheduled_time;
But this specific task failed because of:
Statement reached its statement or warehouse timeout of 3,600 second(s) and was canceled.
So I issued the following command to increase the timeout of the warehouse I think it's running against:
ALTER WAREHOUSE "MY_WAREHOUSE" SET STATEMENT_TIMEOUT_IN_SECONDS = 18000
But the task still gets the same error. How can I conclusively identify the warehouse I need to issue this command against?

If you want your task to use a specific warehouse, define it with the WAREHOUSE parameter when you create the task;
otherwise it will be a serverless task, and you can only set the USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE parameter.
If the task is timing out, increase its USER_TASK_TIMEOUT_MS parameter. The default is 3,600,000 ms (3,600 seconds, i.e. one hour).
If the task already exists, you can change this parameter with ALTER TASK, for example to 4 hours:
ALTER TASK IF EXISTS mytask
SET USER_TASK_TIMEOUT_MS = 14400000;
Reference: CREATE TASK, ALTER TASK
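To illustrate why raising only the warehouse timeout in the question may not help, here is a small Python sketch. It assumes that when both STATEMENT_TIMEOUT_IN_SECONDS and USER_TASK_TIMEOUT_MS apply to a task run, the lowest non-zero value is enforced. Treat that rule as an assumption to check against the Snowflake docs for your account. The sketch also shows the hours-to-milliseconds arithmetic behind 14400000.

```python
def hours_to_ms(hours: float) -> int:
    """Convert hours to milliseconds, the unit used by USER_TASK_TIMEOUT_MS."""
    return int(hours * 60 * 60 * 1000)


def effective_timeout_seconds(*timeouts_seconds: float) -> float:
    """Return the lowest non-zero timeout, treating 0 as "not set".

    Assumption: the lowest non-zero value among the applicable timeouts is enforced.
    """
    non_zero = [t for t in timeouts_seconds if t > 0]
    return min(non_zero) if non_zero else 0


# Values from the question: warehouse raised to 18000 s, task left at its 1-hour default.
warehouse_timeout_s = 18000
task_timeout_s = hours_to_ms(1) / 1000
print(effective_timeout_seconds(warehouse_timeout_s, task_timeout_s))  # 3600.0
print(hours_to_ms(4))  # 14400000
```

Under the same assumption, setting USER_TASK_TIMEOUT_MS to 14400000 would raise the effective limit to 14,400 seconds. That is the lower of the new task timeout and the 18,000-second warehouse setting.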
Keep in mind that the TASK_HISTORY() table function is quite limited. By default it returns only 100 rows (see its RESULT_LIMIT argument), and it only covers the last 7 days.
It's usually much better to use the TASK_HISTORY view.
Reference: TASK_HISTORY() function, TASK_HISTORY view

Try running SHOW TASKS. The warehouse column in its output shows which warehouse each task is configured to use.
Reference: SHOW TASKS

Related

Snowflake orchestration of tasks

I have a batch load process that loads data into a staging database. I have a number of tasks that execute stored procedures which move the data to a different database. The tasks are executed when the SYSTEM$STREAM_HAS_DATA condition is satisfied on a given table.
I have a separate stored procedure that I want to execute only after the tasks have completed moving the data.
However, I have no way to know which tables will receive data and therefore do not know which tasks will be executed.
How can I know when all the tasks that satisfied the SYSTEM$STREAM_HAS_DATA condition are finished and I can now kick off the other stored procedure? Is there a way to orchestrate this step by step process similar to how you would in a SQL job?
There is no built-in way to do this, but you can implement it with some code.
You could create a stored procedure that checks the STATE column returned by TASK_HISTORY to see whether the tasks have completed or been skipped:
https://docs.snowflake.com/en/sql-reference/functions/task_history.html
You can call this stored procedure periodically from a task (for example, every 5 minutes).
The stored procedure would run its checks (for example, all tasks succeeded and the target procedure hasn't run yet today). If they pass, it executes the target stored procedure that must run after all tasks have completed.
You can also check the status of each stream with SELECT SYSTEM$STREAM_HAS_DATA('<stream_name>'), which does not consume the stream, or with SELECT COUNT(*) FROM <stream_name>.
Look into using IDENTIFIER() to supply the stream names dynamically in these queries.
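The decision the checker procedure has to make can be sketched in Python. In practice it would live inside a Snowflake stored procedure. The sketch assumes you have already fetched the most recent STATE for each task from TASK_HISTORY into a dictionary. The task names, this input shape, and the rule that a FAILED task blocks the follow-up step are illustrative assumptions.

```python
from datetime import date
from typing import Dict, Optional

# TASK_HISTORY states treated as "finished" (per the answer: completed or skipped).
FINISHED_STATES = {"SUCCEEDED", "SKIPPED"}


def should_run_final_procedure(
    latest_state_by_task: Dict[str, str],
    last_final_run: Optional[date],
    today: date,
) -> bool:
    """Return True when every task has finished and the target procedure has not run today."""
    if last_final_run == today:
        return False  # the target procedure already ran today
    if not latest_state_by_task:
        return False  # no task history to judge yet
    # Any other state (e.g. EXECUTING, SCHEDULED, FAILED, CANCELLED) blocks the follow-up step.
    return all(state in FINISHED_STATES for state in latest_state_by_task.values())


states = {"LOAD_ORDERS": "SUCCEEDED", "LOAD_ITEMS": "SKIPPED"}  # hypothetical task names
print(should_run_final_procedure(states, None, date(2024, 1, 2)))  # True
states["LOAD_CUSTOMERS"] = "EXECUTING"
print(should_run_final_procedure(states, None, date(2024, 1, 2)))  # False
```

The scheduled checker task would call this logic on every run. Only the run that finds everything finished, and no earlier run today, goes on to start the target procedure.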

Snowflake - Task not running

I have created a simple task with the below script and for some reason it never ran.
CREATE OR REPLACE TASK dbo.tab_update
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON * * * * * UTC'
AS CALL dbo.my_procedure();
I am using a Snowflake trial (Enterprise edition).
Did you resume it? From the docs: "After creating a task, you must execute ALTER TASK … RESUME before the task will run". In this case: ALTER TASK dbo.tab_update RESUME;
A bit of clarification:
Both steps, while possibly annoying, are needed.
Tasks can consume warehouse time (credits) repeatedly (e.g. up to every minute), so we wanted to make sure that the execute privilege was granted explicitly to a role.
Tasks can have dependencies, and task trees (eventually DAGs) shouldn't start executing as soon as one or more tasks are created. Resume provides an explicit sync point at which a data engineer can tell us that the task tree is ready for validation and execution can start at the next interval.
Dinesh Kulkarni
(PM, Snowflake)

SSIS - How do I check job status in a table continuously from the SSIS control flow?

We have a requirement where an SSIS job should be triggered when a value becomes available in a status table we maintain. We don't know exactly when the status will be available, so the SSIS process must continuously check the status table. If the value (e.g. success) is present, the job should be triggered. We have 20 different SSIS batch processes, each of which should start when its related status value becomes available.
What you can do is:
Schedule the SSIS package to run frequently.
In the scheduled package, assign the value from the table to a package variable.
Use either an expression to disable the task or a precedence constraint expression to let the package proceed.
Starting an SSIS package takes some time, so I would recommend creating a package with the following structure:
A package variable Check_run of type int with an initial value of 1440, so the package stops after 24 hours if the check runs every minute. This prevents the package from running forever.
A For Loop that checks whether Check_run is greater than zero and decrements it on each iteration.
Inside the For Loop, check your flag in an Execute SQL task: select a single result value and assign it to a variable, say Flag.
Create conditional execution branches based on the value of Flag. If Flag is set to run, start the other packages. Otherwise, wait for a minute with the Execute SQL command WAITFOR DELAY '00:01'.
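The bounded polling loop described above (a Check_run countdown, a flag check, then a one-minute wait) looks roughly like this in Python. read_flag and start_packages are hypothetical stand-ins for the Execute SQL task and the Execute Package tasks. This sketches the control flow only; it is not SSIS.

```python
import time
from typing import Callable


def poll_until_ready(
    read_flag: Callable[[], str],
    start_packages: Callable[[], None],
    max_checks: int = 1440,      # 1440 one-minute checks is about 24 hours
    wait_seconds: float = 60.0,  # one-minute wait between checks
) -> bool:
    """Check the status flag up to max_checks times; start the packages once it reads 'success'."""
    for remaining in range(max_checks, 0, -1):  # mirrors decrementing Check_run
        if read_flag() == "success":
            start_packages()
            return True
        if remaining > 1:
            time.sleep(wait_seconds)  # no need to wait after the final check
    return False  # gave up without ever seeing the flag


flags = iter(["pending", "pending", "success"])
print(poll_until_ready(lambda: next(flags), lambda: print("starting packages"),
                       max_checks=5, wait_seconds=0))
```

The countdown is what keeps a never-arriving status from holding the package open indefinitely. When it reaches zero the run ends cleanly, and the next scheduled run starts over.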
You mentioned the word trigger. How about creating a trigger that fires when the status column meets the criteria to run the packages?
Also, this is how to run a package from T-SQL:
https://www.timmitchell.net/post/2016/11/28/a-better-way-to-execute-ssis-packages-with-t-sql/
You might want to consider creating a master package that runs all the packages associated with this trigger.
I would take @Long's approach, but enhance it as follows:
1.) use Execute SQL Task to query the status table for all records that pertain to the specific job function and load the results into a recordset. Note: the variable that you are loading the recordset into must be of type object.
2.) Create a Foreach Loop enumerator of type ADO to loop over the recordset.
3.) Do stuff.
4.) When the job is complete, go back to the status table and mark the record complete so that it is not processed again.
5.) Set the job to run periodically (e.g. every minute, hourly, daily, etc.).
The enhancement here is that no flags are needed to govern the job. If records exist, the Foreach loop processes them; if the recordset is empty, the job exits successfully. This simplifies the design.
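The recordset-and-mark-complete pattern can be sketched with Python's built-in sqlite3 module standing in for the real status table. The table layout (id, job_name, status, processed) and the do_stuff callback are hypothetical. In SSIS, the SELECT would be the Execute SQL task and the loop would be the Foreach ADO enumerator.

```python
import sqlite3
from typing import Callable, List


def process_pending(conn: sqlite3.Connection, job_name: str,
                    do_stuff: Callable[[int], None]) -> List[int]:
    """Process every unhandled 'success' record for one job, then mark it complete."""
    # Step 1: load the matching records (the "recordset").
    rows = conn.execute(
        "SELECT id FROM status_table "
        "WHERE job_name = ? AND status = 'success' AND processed = 0 ORDER BY id",
        (job_name,),
    ).fetchall()
    handled = []
    for (record_id,) in rows:  # Step 2: loop over the recordset.
        do_stuff(record_id)    # Step 3: do the actual work.
        # Step 4: mark the record complete so the next scheduled run skips it.
        conn.execute("UPDATE status_table SET processed = 1 WHERE id = ?", (record_id,))
        conn.commit()
        handled.append(record_id)
    return handled  # an empty list means the run simply exits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_table (id INTEGER PRIMARY KEY, job_name TEXT, "
             "status TEXT, processed INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO status_table (job_name, status) VALUES (?, ?)",
                 [("load_sales", "success"), ("load_sales", "pending"), ("load_hr", "success")])
print(process_pending(conn, "load_sales", lambda rid: None))  # [1]
print(process_pending(conn, "load_sales", lambda rid: None))  # []
```

The second call returns an empty list because the first call marked the record as processed. That is the "no flags needed" property the answer describes.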

Launch stored procedure and continue running it even if disconnected

I have a database where data is processed in some kind of batches, where each batch may contain even a million records. I am processing data in a console application, and when I'm done with a batch, I mark it as Done (to avoid reading it again in case it does not get deleted), delete it and move on to a next batch.
I have the following simple stored procedure which deletes processed "batches" of data
CREATE PROCEDURE [dbo].[DeleteBatch]
(
@BatchId bigint
)
AS
SET XACT_ABORT ON
BEGIN TRANSACTION
DELETE FROM table1 WHERE BatchId = @BatchId
DELETE FROM table2 WHERE BatchId = @BatchId
DELETE FROM table3 WHERE BatchId = @BatchId
COMMIT
RETURN @@ERROR
I am using NHibernate with command timeout value 10 minutes, and the DeleteBatch procedure call times out occasionally.
Actually I don't want to wait for DeleteBatch to complete. I already have marked the batch as Done, so I want to go processing a next batch or maybe even exit my console application, if there are no more pending batches.
I am using Microsoft SQL Server 2012 Express.
Is there any simple solution to tell the SQL server - "launch DeleteBatch and run it asynchronously even if I disconnect, and I don't even need the result of the procedure"?
It would also be great if I could set a lower processing priority for DeleteBatch because other queries are more important than DeleteBatch.
I don't know much about NHibernate, but if you can use ADO.NET in this scenario, you can easily implement asynchronous database operations with the SqlCommand.BeginExecuteNonQuery method in C#. This method starts asynchronous execution of a Transact-SQL statement or stored procedure that does not return rows, so other work can run concurrently while the statement executes.
EDIT: If you really want your console app to finish its own work before the database operation ends, you will have to create threads manually and perform the database operations on them. Threads created with System.Threading.Thread are foreground threads by default, so the process stays alive until they finish, even after your main code returns. That said, consider how many threads you create. With one thread per batch, a large number of batches means a large number of threads, which can consume a lot of CPU and memory and slow the whole machine down.
Another simple solution is to insert the BatchIds into a database table and create an INSERT trigger on that table. The trigger would then call a stored procedure with the BatchId as its parameter to perform the required work.
Hope it helps.
What if your console application, instead of trying to delete the batch, just wrote the batch ID into a "BatchIdsToDelete" table? Then an Agent job running every x minutes or seconds (whatever suits you) could delete the top x percent of the records for a given batch ID. It could also sleep a little before tackling the next x percent.
Maybe worth having a look at that?
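Here is a sketch of what the scheduled cleanup job could do. It uses Python's sqlite3 module in place of SQL Server Agent and T-SQL. It deletes a fixed number of rows per transaction rather than a percentage. That is a swapped-in simplification; in T-SQL, the closer equivalent is DELETE TOP (n) in a loop. The BatchIdsToDelete queue table comes from the answer, and the column names are assumed. Unlike the original DeleteBatch procedure, this approach gives up deleting from all three tables in one atomic transaction.

```python
import sqlite3
import time

TABLES = ("table1", "table2", "table3")  # the tables DeleteBatch cleans up


def delete_in_chunks(conn: sqlite3.Connection, table: str, batch_id: int,
                     chunk_size: int = 1000, pause_seconds: float = 0.0) -> int:
    """Delete one batch from one table, chunk_size rows per short transaction."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE BatchId = ? LIMIT ?)",
            (batch_id, chunk_size),
        )
        conn.commit()  # commit each chunk so locks are held only briefly
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(pause_seconds)  # give more important queries a turn


def drain_queue(conn: sqlite3.Connection, chunk_size: int = 1000) -> None:
    """Delete every queued batch, then remove it from BatchIdsToDelete."""
    queued = conn.execute("SELECT BatchId FROM BatchIdsToDelete").fetchall()
    for (batch_id,) in queued:
        for table in TABLES:  # table names are fixed constants, not user input
            delete_in_chunks(conn, table, batch_id, chunk_size)
        conn.execute("DELETE FROM BatchIdsToDelete WHERE BatchId = ?", (batch_id,))
        conn.commit()
```

With this design, the console application only inserts a row into BatchIdsToDelete and moves on. drain_queue would run on a schedule, and the chunk size and pause control how much the cleanup competes with other queries.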
Look at this article, which explains how to do reliable asynchronous procedure execution (code included). It is based on Service Broker.
The problem with using .NET async features (BeginExecuteNonQuery, tasks, etc.) is that the call is unreliable. If the process exits before the procedure completes, the session is disconnected and the server cancels the execution.
But you also need to look at the task itself: why does the deletion take more than 10 minutes? Is it blocked by contention? Are you missing indexes on BatchId? Use the Performance Troubleshooting Flowchart.
Late to the party, but if someone else has this problem, use SQLCMD. With Express you are limited in the number of users (I think 2, but it may have changed since the last time I did much with Express). With sqlcmd you can run queries, stored procedures, and so on.
You can kick off sqlcmd from Windows Task Scheduler, a script, an Outlook rule, and so on.
I used it to manage about 3,000 to 4,000 SQL Server Express instances, with their nightly maintenance scheduled through Windows Task Scheduler.
You could also create and run a PowerShell script; it's more versatile and probably more widely used than sqlcmd.
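As a sketch, the command line a scheduled job would run can be built in Python. The server name .\SQLEXPRESS, the database name MyDatabase, and calling DeleteBatch directly are illustrative assumptions. The options used are standard sqlcmd switches: -S (server), -d (database), -E (Windows authentication), -b (exit with an error code on failure), and -Q (run one query, then exit).

```python
from typing import List


def sqlcmd_args(server: str, database: str, batch_id: int) -> List[str]:
    """Build a sqlcmd invocation that runs DeleteBatch once and exits."""
    query = f"EXEC dbo.DeleteBatch @BatchId = {int(batch_id)}"  # int() keeps the value numeric
    return [
        "sqlcmd",
        "-S", server,    # SQL Server instance
        "-d", database,  # database to connect to
        "-E",            # use Windows (trusted) authentication
        "-b",            # return a non-zero exit code if the batch fails
        "-Q", query,     # run the query, then exit
    ]


args = sqlcmd_args(r".\SQLEXPRESS", "MyDatabase", 42)  # hypothetical instance and database
```

On a machine where sqlcmd is installed, the list can be passed to subprocess.run(args, check=True). For a Windows Task Scheduler action, the same arguments go into the program and arguments fields. Because sqlcmd runs as its own process, the console application does not have to stay connected while the procedure runs.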
I needed the same thing.
After searching for a long time, I found a solution.
It's the easiest way:
using System.Data;
using System.Data.SqlClient;

var builder = new SqlConnectionStringBuilder("your connection string");
// Required by BeginExecuteNonQuery on .NET Framework before 4.5; ignored from 4.5 on
builder.AsynchronousProcessing = true;
var newSqlConn = new SqlConnection(builder.ConnectionString);
newSqlConn.Open();
var cmd = new SqlCommand(storeProcedureName, newSqlConn);
cmd.CommandType = CommandType.StoredProcedure;
// Fire and forget: no callback is passed and EndExecuteNonQuery is never called
cmd.BeginExecuteNonQuery(null, null);
Ideally, the SqlConnection object would take an optional parameter or property: the URL of a web service (WCF, Web API, or something yet to be named). If the user wishes, the connection would call that URL with a well-known message to report execution progress and/or completion status.
Theoretically, DbConnection is an extensible class that anyone is free to implement. However, it would take some review of what can and needs to be done before this approach could be called feasible.

How to simulate an ODBC time-out error?

I am testing the error-handling of an Access-VBA controlled process:
A script in an Access 'controller' DB starts.
The script starts a macro in a 2nd Access file (the 'database').
The macro in the 'database' file runs a bunch of maketable queries.
These queries pull from tables linked to an ODBC source (SQL-Server actually).
When this process runs in the early morning hours, sometimes the queries time out. Today, I've updated the error-handling in the controller script, so I want to simulate a time-out error.
I've looked at the ODBC administrator and Advanced options in MS Access, but I'm not finding what I need. Ideas?
Open your macro in Design view. Under the View menu, select Properties.
There should be a Timeout property; set it to a short value and test.
Re: "sometimes the queries time out":
Make sure the ODBC Timeout property of your queries is set to zero, so they don't raise an error but keep running.
If your queries are modifications, you can add a trigger which invokes WAITFOR. Described here.
Within your SQL queries, add the following statement; it should cause a timeout:
--waits for 5 mins
WaitFor Delay '00:05'
Or, if you don't want to amend the existing queries, you can run this against one of the tables that the macro queries. It will lock the table for 3 minutes:
begin transaction
Select *
From MyTable with (TABLOCKX)
--wait for 3 min
WaitFor Delay '00:03'
rollback transaction
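You can try the lock-a-table trick above locally with Python's sqlite3 module as an analog. This is SQLite, not SQL Server or ODBC. One connection holds an exclusive lock while a second connection, configured with a short timeout, tries to read and fails. It only demonstrates provoking a timeout by blocking; the error your Access/ODBC code receives will be different.

```python
import os
import sqlite3
import tempfile


def provoke_lock_timeout(timeout_seconds: float = 0.2) -> str:
    """Return the error a blocked reader gets while another connection holds an exclusive lock."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # file DB so both connections share it
    holder = sqlite3.connect(path, isolation_level=None)  # autocommit; we issue BEGIN ourselves
    holder.execute("CREATE TABLE MyTable (id INTEGER)")
    holder.execute("BEGIN EXCLUSIVE")  # plays the role of the TABLOCKX transaction
    reader = sqlite3.connect(path, timeout=timeout_seconds)
    try:
        reader.execute("SELECT * FROM MyTable").fetchall()
        return "no error"
    except sqlite3.OperationalError as exc:
        return str(exc)  # the simulated time-out
    finally:
        reader.close()
        holder.execute("ROLLBACK")  # releases the lock, like rollback transaction above
        holder.close()


print(provoke_lock_timeout())  # database is locked
```

The short reader timeout stands in for a short ODBC timeout. The same idea applies to the real setup: hold the lock longer than the client is willing to wait.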
