TASKS in Snowflake

I have created two tasks to run once a day:
create or replace task TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH
warehouse=W_TEST_DEVELOPER
schedule='USING CRON 0 4 * * * UTC'
TIMESTAMP_INPUT_FORMAT='YYYY-MM-DD HH24'
as
call TESTDB.TESTSCHEMA.TEST_EXTERNAL_TABLE_REFRESH();
create or replace task TESTDB.TESTSCHEMA.TASK_LOAD_TABLES
warehouse=W_TEST_DEVELOPER
schedule='USING CRON 0 5 * * * UTC'
TIMESTAMP_INPUT_FORMAT='YYYY-MM-DD HH24'
as
call TESTDB.TESTSCHEMA.TEST_LOAD_TABLES();
Now I want to ensure that TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH runs before TASK_LOAD_TABLES.
How should I do this?
Also, should the error details from the task runs be captured in config tables? What if "TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH" fails? If it fails, the next task should not run.

The precedence rule should be added in place of the schedule on the second task:
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES
ADD AFTER TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH;
CREATE TASK:
AFTER string [ , string , ... ]
Specifies one or more predecessor tasks for the current task. Use this option to create a DAG of tasks or add this task to an existing DAG. A DAG is a series of tasks that starts with a scheduled root task and is linked together by dependencies.
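A minimal sketch of the full sequence, assuming the two tasks were created as in the question (a task cannot have both a schedule and a predecessor, and per the documentation the root task must be suspended while the DAG is modified):
-- Suspend both tasks before changing the DAG
ALTER TASK TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH SUSPEND;
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES SUSPEND;
-- Drop the child's own CRON schedule, then link it to the root
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES UNSET SCHEDULE;
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES
ADD AFTER TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH;
-- Resume the child first, then the root (resuming the root re-enables the schedule)
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES RESUME;
ALTER TASK TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH RESUME;
Because a child task is triggered only when the predecessor run finishes successfully, a failed TASK_EXTERNAL_REFRESH run will not start TASK_LOAD_TABLES, which covers the second part of the question.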
Related: Snowflake - Many tasks dependencies for a task

To chain predecessor and successor tasks, use the "after taskname" option:
create task task2
after task1
as
insert into t1(ts) values(current_timestamp);
https://docs.snowflake.com/en/sql-reference/sql/create-task.html#single-sql-statement
A few options to check the status of a task and decide whether the successor/child task should execute are given below.
You can set SUSPEND_TASK_AFTER_FAILURES = <num> so that a task suspends itself after a number of consecutive failed runs.
https://docs.snowflake.com/en/user-guide/tasks-intro.html#automatically-suspend-tasks-after-failed-runs
Create a task that calls a stored procedure or UDF to check the SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY view or the INFORMATION_SCHEMA.TASK_HISTORY table function for task status (see the sketch after this list).
You can use external tools to check the status of the task and integrate it.
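For example (a sketch; the failure threshold and the one-day lookback window are arbitrary choices, not from the question):
-- Suspend the root task automatically after three consecutive failed runs
ALTER TASK TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH
SET SUSPEND_TASK_AFTER_FAILURES = 3;
-- Inspect the most recent run of the root task, including error details
SELECT name, state, error_code, error_message, scheduled_time, completed_time
FROM TABLE(TESTDB.INFORMATION_SCHEMA.TASK_HISTORY(
TASK_NAME => 'TASK_EXTERNAL_REFRESH',
SCHEDULED_TIME_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP())))
ORDER BY scheduled_time DESC
LIMIT 1;
A follow-up task or procedure could copy error_code and error_message into a config table if you want to persist failure details.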

Related

AFTER parameter not recognized as part of CREATE TASK

I've created a task and verified that it exists using SHOW TASKS. I'm now trying to create a subtask using the AFTER parameter of CREATE TASK, but I'm getting the following error: invalid property 'AFTER' for 'TASK'. I can't find any documentation on why this is happening. I think my syntax is correct; it appears to match Snowflake's documentation. This is my code:
//Create the task and set the schedule
CREATE TASK NIGHTLY_1
WAREHOUSE = ADMIN_WH
SCHEDULE = 'USING CRON 0 02 * * * America/Chicago'
AS
CALL SP_LOAD_STAGING_TABLE('param1');
//Create the subtask
CREATE TASK NIGHTLY_2
WAREHOUSE = ADMIN_WH
AFTER = NIGHTLY_1
AS
CALL SP_LOAD_STAGING_TABLE('param2');
The notes on the AFTER param state (my emphasis):
The root task in a tree of tasks must be suspended before any task in the tree is recreated (using the CREATE OR REPLACE TASK syntax) or a child task is added (using CREATE TASK … AFTER).
I've verified with SHOW TASKS that the parent task is suspended.
Any thoughts on what is causing the issue?
The equals sign should be removed from AFTER = NIGHTLY_1:
//Create the subtask
CREATE TASK NIGHTLY_2
WAREHOUSE = ADMIN_WH
AFTER NIGHTLY_1
AS
CALL SP_LOAD_STAGING_TABLE('param2');
CREATE TASK:
AFTER <string>
Specifies the predecessor task for the current task. When a run of the predecessor task finishes successfully, it triggers this task (after a brief lag).
The same rule applies to ALTER:
ALTER TASK [ IF EXISTS ] <name> REMOVE AFTER <string> | ADD AFTER <string>
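So, using the task names from the question, the dependency can later be unlinked and re-linked without recreating the task (a small sketch following the quoted syntax):
ALTER TASK NIGHTLY_2 REMOVE AFTER NIGHTLY_1;
ALTER TASK NIGHTLY_2 ADD AFTER NIGHTLY_1;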

How to cancel a job in DolphinDB

I use the submitJob function to get a jobId, and I try to cancel the job with the cancelJob function, but the job does not stop. What function should I use to stop the job?
I use the code below:
submitJob("aa", "a1", replay, [ds], [sink], date, `time, 10)
cancelJob(aa)
The submitJob function returns the actual jobId, which might differ from the jobId you passed in, so use the jobId returned by submitJob.
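A minimal sketch of the corrected call, with the same arguments as in the question (the key change is capturing the returned ID and passing it to cancelJob):
// capture the jobId actually assigned by the scheduler
jobId = submitJob("aa", "a1", replay, [ds], [sink], date, `time, 10)
// cancel using the returned jobId, not the submitted name
cancelJob(jobId)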
Also note that DolphinDB uses a thread pool to run jobs, so if the job is simple and contains no subtasks, it still cannot be cancelled once it is running.

How to execute a sample just before thread shutdown in JMeter?

Is there a way in JMeter to execute a sample just before thread shutdown?
For example, I have a test plan that inserts data into a database, and autocommit is disabled on the connection. Each thread spawns its own connection to the database. The plan runs on a schedule (i.e. I don't know the sample count), and I want to commit all inserted rows at the end of the test. Is there a way to do that?
The easiest option is a tearDown Thread Group, which is designed for performing clean-up actions.
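For example, a JSR223 Sampler in the tearDown Thread Group could commit and close the connections (a sketch that assumes each test thread registered its java.sql.Connection in a shared map stored as the JMeter property "connections"; that property name and the registration step are assumptions, not part of the question):
// iterate over the connections the test threads registered in props
def connections = props.get('connections')   // assumed: a Map<String, java.sql.Connection>
connections?.each { name, connection ->
    try {
        connection.commit()                  // flush the pending inserts
        connection.close()
    } catch (exception) {
        log.error('commit failed for ' + name, exception)
    }
}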
The harder way is to add a separate Thread Group with 1 thread and 1 iteration and 1 JSR223 Sampler with the following Groovy code:
class ShutdownListener implements Runnable {
    @Override
    public void run() {
        // your code which needs to be executed before the test ends
    }
}
new ShutdownListener().run()
Try running the commit sampler based on an If condition on duration or iteration number.
For example, if you are supposed to run 100 iterations, an If Controller with the condition
${__groovy(${__iterationNum} == 100)}
should help.
This might not be the most optimal approach, but it is workable.
Add the following code in a JSR223 Sampler inside a Once Only Controller:
def scenarioStartTime = System.currentTimeMillis();
def timeLimit = ctx.getThreadGroup().getDuration() - 10; // fire the commit sampler 10 seconds before the scheduled end (duration is in seconds)
vars.put("scenarioStartTime", scenarioStartTime.toString());
vars.put("timeLimit", timeLimit.toString());
Now, after your DB insert sampler, add the following condition in an If Controller and put the commit sampler inside it.
${__groovy(System.currentTimeMillis()-Long.valueOf(vars.get("scenarioStartTime"))>=Long.valueOf(vars.get("timeLimit"))*1000)}
This condition should let you execute the commit sampler just before the end of test duration.

Polling a RESTful service in Talend

I am building a job in Talend that queries a RESTful service. In the Talend job, I initiate a remote job and get a job ID back. I then query a status service and need to wait for the remote job to complete. How would I go about doing this in Talend? I have been playing around with the tLoop, tFlowToIterate, tIterateToFlow and tJavaRow components to try to get this to work, but am not sure how to configure it.
Here's a summary of what I'm trying to do:
1. tRest: Start a job and get job ID
|
--> 2. tRest: Poll status of job
|
--> 3. tUnknown?: If the job is running, sleep and re-run Step 2.
|
--> 4. tRest: when the job is complete, retrieve the results
How would I set up step 3 above?
Basically you want something like:
tInfiniteLoop --iterate--> (subjob that queries the service and determines whether the result is ready) --if result is ready--> (subjob that fetches the result) --on subjob ok--> tJava with "counter_tInfiniteLoop_1 = -1;" to leave the loop (I don't know of a better alternative).
I would advise implementing a timeout or a maximum number of lookups, and maybe even an automatically increasing sleep time.
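In plain Java, the loop that the tInfiniteLoop/If-trigger arrangement expresses looks roughly like this (a sketch: statusOf() and fetchResults() are hypothetical stand-ins for the two tRest calls, and the timeout and backoff values are arbitrary):
public class PollSketch {
    // hypothetical stand-ins for the tRest components
    static String statusOf(String jobId) { return "COMPLETE"; }
    static void fetchResults(String jobId) { }

    public static void main(String[] args) throws InterruptedException {
        String jobId = "job-123";                        // from step 1
        long sleepMs = 1_000;
        long deadline = System.currentTimeMillis() + 10 * 60 * 1_000; // overall timeout
        while (System.currentTimeMillis() < deadline) {
            if ("COMPLETE".equals(statusOf(jobId))) {    // step 2: poll status
                fetchResults(jobId);                     // step 4: fetch results
                return;
            }
            Thread.sleep(sleepMs);                       // step 3: sleep, re-poll
            sleepMs = Math.min(sleepMs * 2, 30_000);     // increasing sleep time
        }
        throw new IllegalStateException("job did not finish before the timeout");
    }
}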

How long can the map call last?

I want to do some heavy processing in the map() call of the mapper.
I was going through the source file MapReduceServlet.java:
// Amount of time to spend on actual map() calls per task execution.
public static final int PROCESSING_TIME_PER_TASK_MS = 10000;
Does it mean the map call can last only 10 seconds? What happens after 10 seconds?
Can I increase this to a large value like 1 minute or 10 minutes?
-Aswath
MapReduce operations are executed in tasks using Push Queues, and as the documentation says, the task deadline is currently 10 minutes (a limit after which you will get a DeadlineExceededException).
If a task fails to execute, App Engine by default retries it until it succeeds. If you need a deadline longer than 10 minutes, you can use a Backend for executing your tasks.
Looking at the actual usage of PROCESSING_TIME_PER_TASK_MS in Worker.java, this value is used to limit the number of map calls done in a single task.
After each map call has executed, if more than 10 seconds have elapsed since the beginning of the task, a new task is spawned to handle the rest of the map calls:
1. Worker.scheduleWorker spawns a new task for each given shard.
2. Each task calls Worker.processMapper.
3. processMapper executes one map call.
4. If less than PROCESSING_TIME_PER_TASK_MS has elapsed since step 2, go back to step 3; otherwise, if processing is not finished, reschedule a new worker task.
In the worst-case scenario, the default task request deadline (10 minutes) should apply to each of your individual map calls.
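The control flow amounts to a few lines (a hedged sketch, not the actual Worker.java source; map() and scheduleContinuation() are illustrative stand-ins):
import java.util.ArrayDeque;
import java.util.Deque;

public class TimeSliceSketch {
    static final int PROCESSING_TIME_PER_TASK_MS = 10_000;

    static void runWorkerTask(Deque<String> remainingInputs) {
        long start = System.currentTimeMillis();
        while (!remainingInputs.isEmpty()) {
            map(remainingInputs.poll()); // one map() call, however long it takes
            // the budget is checked only BETWEEN map() calls, so a single
            // long-running call overruns it and is bounded only by the
            // 10-minute task deadline
            if (System.currentTimeMillis() - start >= PROCESSING_TIME_PER_TASK_MS
                    && !remainingInputs.isEmpty()) {
                scheduleContinuation(remainingInputs); // new push-queue task
                return;
            }
        }
    }

    static void map(String input) { /* the user's map() logic */ }
    static void scheduleContinuation(Deque<String> rest) { /* enqueue the next worker task */ }

    public static void main(String[] args) {
        Deque<String> inputs = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) inputs.add("record-" + i);
        runWorkerTask(inputs);
    }
}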
