AFTER parameter not recognized as part of CREATE TASK - snowflake-cloud-data-platform

I've created a task and verified that it exists using SHOW TASKS. I'm now trying to create a subtask using the AFTER parameter of CREATE TASK, but I'm getting the following error: invalid property 'AFTER' for 'TASK'. I can't find any documentation on why this is happening. I think my syntax is correct; it appears to match Snowflake's documentation. This is my code:
//Create the task and set the schedule
CREATE TASK NIGHTLY_1
WAREHOUSE = ADMIN_WH
SCHEDULE = 'USING CRON 0 02 * * * America/Chicago'
AS
CALL SP_LOAD_STAGING_TABLE('param1');
//Create the subtask
CREATE TASK NIGHTLY_2
WAREHOUSE = ADMIN_WH
AFTER = NIGHTLY_1
AS
CALL SP_LOAD_STAGING_TABLE('param2');
The notes on the AFTER param state (my emphasis):
The root task in a tree of tasks must be suspended before any task in the tree is recreated (using the CREATE OR REPLACE TASK syntax) or a child task is added (using CREATE TASK … AFTER).
I've verified with SHOW TASKS that the parent task is suspended.
Any thoughts on what is causing the issue?

The equals sign should be removed from AFTER = NIGHTLY_1:
//Create the subtask
CREATE TASK NIGHTLY_2
WAREHOUSE = ADMIN_WH
AFTER NIGHTLY_1
AS
CALL SP_LOAD_STAGING_TABLE('param2');
From the CREATE TASK documentation:
AFTER <string>
Specifies the predecessor task for the current task. When a run of the predecessor task finishes successfully, it triggers this task (after a brief lag).
The same rule (no equals sign) applies to ALTER TASK:
ALTER TASK [ IF EXISTS ] <name> REMOVE AFTER <string> | ADD AFTER <string>
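One follow-up worth noting (a sketch using the task names from the question): tasks are created in a suspended state, so once the subtask exists, both tasks still have to be resumed, child first and root last, before the chain will actually run.
-- Resume the child, then the root, so NIGHTLY_2 runs after each successful NIGHTLY_1 run
ALTER TASK NIGHTLY_2 RESUME;
ALTER TASK NIGHTLY_1 RESUME;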

Related

Two Flink jobs running in one application result in the first completing and the second failing with an NPE

I have two Flink jobs in one application:
1) The first is a Flink batch job that sends events to Kafka, which are then written by someone else to S3.
2) The second is a Flink batch job that checks the generated data (it reads from S3).
Considerations: these two jobs work fine separately. When combined, only the first job completes and sends events to Kafka, but the second fails when I'm traversing the result of the SQL query.
...
//First job
val env = org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.getExecutionEnvironment
...
//Creates Datastream from generated events and gets the store
streamingDataset.write(store)
env.execute()
...
// Second job
val flinkEnv = org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionEnvironment
val batchStream: DataStream[RowData] =
FlinkSource.forRowData()
.env(flinkEnv)
.tableLoader(tableLoader)
.streaming(false)
.build()
val tableEnv = StreamTableEnvironment.create(flinkEnv)
val inputTable = tableEnv.fromDataStream(batchStream)
tableEnv.createTemporaryView("InputTable", inputTable)
val resultTable: TableResult = tableEnv
.sqlQuery("SELECT * FROM InputTable")
.fetch(3)
.execute()
val results: CloseableIterator[Row] = resultTable.collect()
while (results.hasNext) {
  print("Result test " + results.next())
}
...
org.apache.flink.streaming.api.operators.collect.CollectResultFetcher [] - An exception occurred when fetching query results
java.lang.NullPointerException: Unknown operator ID. This is a bug.
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:76)
at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.sendRequest(CollectResultFetcher.java:166)
at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:129)
at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106)
at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222) ~[?:?]
I want to have two jobs in one application so that the generated data stays in memory (so I don't have to take care of saving it somewhere else). Is it possible to combine these two jobs, or do I have to run them separately? Or is there a better way to restructure my code to make it work?

TASKS in Snowflake

I have created two tasks to run once a day
create or replace task TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH
warehouse=W_TEST_DEVELOPER
schedule='USING CRON 0 4 * * * UTC'
TIMESTAMP_INPUT_FORMAT='YYYY-MM-DD HH24'
as
call TESTDB.TESTSCHEMA.TEST_EXTERNAL_TABLE_REFRESH();
create or replace task TESTDB.TESTSCHEMA.TASK_LOAD_TABLES
warehouse=W_TEST_DEVELOPER
schedule='USING CRON 0 5 * * * UTC'
TIMESTAMP_INPUT_FORMAT='YYYY-MM-DD HH24'
as
call TESTDB.TESTSCHEMA.TEST_LOAD_TABLES();
Now I want to ensure that TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH runs before TASK_LOAD_TABLES runs.
How should I do this?
Also, should the error details from the task runs be captured in config tables? What if TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH fails? If it fails, the next task should not run.
The precedence (AFTER) relationship should be added instead of the schedule:
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES
ADD AFTER TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH;
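A fuller sketch of that change (assuming both tasks already exist as in the question; the child's own schedule is removed so it runs only from the dependency, and the root of the task tree is suspended while the graph is modified):
-- Suspend both tasks before changing the dependency graph
ALTER TASK TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH SUSPEND;
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES SUSPEND;
-- Drop the child's own schedule so it is triggered only by its predecessor
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES UNSET SCHEDULE;
-- Make TASK_LOAD_TABLES run after a successful run of TASK_EXTERNAL_REFRESH
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES
ADD AFTER TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH;
-- Resume the child first, then the root
ALTER TASK TESTDB.TESTSCHEMA.TASK_LOAD_TABLES RESUME;
ALTER TASK TESTDB.TESTSCHEMA.TASK_EXTERNAL_REFRESH RESUME;
Because a child task only runs when its predecessor finishes successfully, this also covers the requirement that TASK_LOAD_TABLES should not run if the refresh fails.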
From the CREATE TASK documentation:
AFTER string [ , string , ... ]
Specifies one or more predecessor tasks for the current task. Use this option to create a DAG of tasks or add this task to an existing DAG. A DAG is a series of tasks that starts with a scheduled root task and is linked together by dependencies.
Related: Snowflake - Many tasks dependencies for a task
To set up the predecessor and successor tasks, you should use the "after taskname" option:
create task task2
after task1
as
insert into t1(ts) values(current_timestamp);
https://docs.snowflake.com/en/sql-reference/sql/create-task.html#single-sql-statement
A few options to check the status of the task and decide on the successor/child task execution are given below.
You can use the SUSPEND_TASK_AFTER_NUM_FAILURES = <num> parameter to suspend a task automatically after consecutive failed runs:
https://docs.snowflake.com/en/user-guide/tasks-intro.html#automatically-suspend-tasks-after-failed-runs
Create a task that calls a UDF to check the ACCOUNT_USAGE.TASK_HISTORY view or the INFORMATION_SCHEMA.TASK_HISTORY table function for task status.
You can use external tools to check the status of the task and integrate it.
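As a sketch of the TASK_HISTORY option (assuming the INFORMATION_SCHEMA.TASK_HISTORY table function; the task name and time range are only examples):
-- Recent failed runs of the refresh task over the last day
SELECT name, state, error_code, error_message, scheduled_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    SCHEDULED_TIME_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP()),
    TASK_NAME => 'TASK_EXTERNAL_REFRESH'))
WHERE state = 'FAILED'
ORDER BY scheduled_time DESC;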

Apache Flink (How to uniquely tag Jobs)

Is it possible to tag jobs with a unique name so I can stop them at a later date? I don't really want to grep and persist job IDs.
In a nutshell I want to stop a job as part of my deployment and deploy the new one.
You can name jobs when you start them in the execute(name: String) call, e.g.,
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment()
val result: DataStream[String] = ??? // your job logic (element type is just an example)
result.addSink(new YourSinkFunction) // add a sink
env.execute("Name of your job") // execute and assign a name
The REST API of the JobManager provides a list of job details which include the name of the job and its JobId.

Multiple emails sent by sp_send_dbmail when run as an SSIS package

I've a procedure which generates a tab delimited text file and also sends an email with a list of students as attachment using msdb.dbo.sp_send_dbmail.
When I execute the procedure through SQL Server Management Studio, it sends only one email.
But I created an SSIS package and scheduled a job to run it nightly. This job sends 4 copies of the email to each recipient.
EXEC msdb.dbo.sp_send_dbmail @profile_name = 'A'
,@recipients = @email_address
,@subject = 'Error Records'
,@query = 'SELECT * FROM ##xxxx'
,@attach_query_result_as_file = 1
,@query_attachment_filename = 'results.txt'
,@query_result_header = 1
,@query_result_width = 8000
,@body = 'These students were not imported'
I've set the following parameters to 0 (within the Database Mail configuration wizard) to see if it makes any difference, but it didn't resolve the problem.
AccountRetryAttempts 0
AccountRetryDelay 0
DatabaseMailExeMinimumLifeTime 0
Any suggestions?
I assume you have this email wired up to an event, like OnError/OnTaskFailed, probably at the root level.
Every item you add to a Control Flow adds another layer of potential events. Imagine a Control Flow with a Sequence Container which contains a ForEach Enumerator which contains a Data Flow Task. That's a fairly common design. Each of those objects has the ability to raise/handle events based on the objects it contains. The distance between the Control Flow's OnTaskFailed event handler and the Data Flow's OnTaskFailed event handler is 5 objects deep.
The data flow fails and raises the OnTaskFailed message. That message bubbles all the way up to the Control Flow, resulting in email 1 being fired. The data flow then terminates. The ForEach loop receives the signal that the Data Flow has completed and that the return status was a failure, so now the OnTaskFailed event fires for the ForEach loop. Repeat this pattern ad nauseam until every task/container has raised its own event.
Resolution depends, but usually folks get around this by either only putting the notification at the innermost objects (data flow in my example) or disabling the percolation of event handlers.
Check the solution here (it worked for me as I was getting 2 at a time) - Stored procedure using SP_SEND_DBMAIL sending duplicate emails to all recipients
Change the number of retries from X to 0; now I only get 1 email. Retries are the likely cause if your users are getting 4 emails exactly 1 minute apart.
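If the retries being referred to are the Database Mail account settings the question lists, they can also be scripted instead of set through the wizard; a sketch using msdb.dbo.sysmail_configure_sp with the same parameter names and values:
-- Set the Database Mail retry/lifetime parameters via T-SQL
EXEC msdb.dbo.sysmail_configure_sp @parameter_name = 'AccountRetryAttempts', @parameter_value = '0';
EXEC msdb.dbo.sysmail_configure_sp @parameter_name = 'AccountRetryDelay', @parameter_value = '0';
EXEC msdb.dbo.sysmail_configure_sp @parameter_name = 'DatabaseMailExeMinimumLifeTime', @parameter_value = '0';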

Get list of scheduled jobs by name/type

I wrote a schedulable class which is also a batch class. I'm scheduling it like so:
String cronExpression = String.format('0 {0} {1} * * ?', new List<String> { String.valueOf(minute), String.valueOf(hour) });
String jobName = 'roomSyncronizationJob' + Integer.valueOf(hour);
return System.schedule(jobName, cronExpression, batch);
Then I have a page to display form for scheduling and table that should display scheduled jobs. For the moment, it displays all scheduled jobs in the system.
My question: Is there any way to get the jobName so I can filter out jobs that were not scheduled by the code above? Does anybody know any other workaround besides storing all scheduled job IDs in the database?
There is a standard object called "CronTrigger". Maybe this object will help you get the name.
Old, but I was searching for this. CronTrigger contains the field CronJobDetailId, and CronJobDetail has the field "Name".
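For reference, a sketch of such a query (SOQL, filtering on the job-name prefix built in the question's code):
SELECT Id, State, NextFireTime, CronJobDetail.Name
FROM CronTrigger
WHERE CronJobDetail.Name LIKE 'roomSyncronizationJob%'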
