Task history in Snowflake

Is there an efficient way to see the logs of task runs in Snowflake?
I am using the query below. Is there a possibility to wipe the history from here?
select *
from table(information_schema.task_history(
    scheduled_time_range_start => dateadd('hour', -1, current_timestamp()),
    result_limit => 1000,
    task_name => 'TASKNAME'));

Is there an efficient way to see the logs of task runs in Snowflake?
Depending on what you mean by efficient, Snowflake offers a UI to monitor task dependencies and run history.
Run History
Task run history includes details about each execution of a given task. You can view the scheduled time, the actual start time, the duration of a task, and other information.
Account-Level Task History:
Task history displays task information at the account level and is divided into three sections:
Selection (1) - Defines the set of task history to display; includes the types of tasks, date range, and other information.
Histogram (2) - Displays a bar graph of task runs over the selected date range.
Task list (3) - A list of the selected tasks.
Is there a possibility to wipe the history from here?
TASK_HISTORY (Account Usage view)
This Account Usage view enables you to retrieve the history of task usage within the last 365 days (1 year). The view displays one row for each run of a task in the history.
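For a query-based view of the same history with the longer 365-day retention, here is a minimal sketch against the Account Usage view (this assumes your role has been granted access to the shared SNOWFLAKE database; 'TASKNAME' is the placeholder from the question):
select name,
       state,
       scheduled_time,
       query_start_time,
       completed_time,
       error_code,
       error_message
from snowflake.account_usage.task_history
where name = 'TASKNAME'
order by scheduled_time desc
limit 1000;
Account Usage views lag real time by some latency, so very recent runs are still best checked with the INFORMATION_SCHEMA table function from the question.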

Related

Avoiding stale data when two cron jobs access the same DB

I have two tables (let's say Table A and Table B) that store the status of two independent activities, respectively. I have two cron jobs (let's call them A and B again) which run 15 minutes apart, every 30 minutes. They both have the same logic of updating the data in their table; the only difference is the table they work on.
Let me explain Cron A:
Cron A picks up users with "pending" status in Table A and performs some logic to filter out the now "active" users among them. It then queries data from Table B for these active users only. It then begins to update the status of those users in Table A from pending to active, checks whether each user is active in Table B as well (from the data queried from Table B), and stores such users in a set (active in both A and B) for further processing.
This DB update to Table A happens one record at a time rather than as a bulk update, so it consumes a lot of time.
Note: the users present in Table A are also present in Table B.
Similar logic applies to cron B, which works on the other table.
You can see the problem: if there are too many records to update and, say, cron A is still running when cron B starts its processing, cron B will query users that are still being updated by cron A, i.e. cron B will end up with stale data.
I have one approach to solve this but wanted to know about better or more practised solutions to such issues. One constraint: we currently don't have the time or resources to fully optimize the legacy code in these cron jobs to improve their processing time.
I was thinking of having a Redis key which will be set to True when one of the cron jobs begins to update its table's data. While the Redis key is set, if it's time for the other cron job, it will first check the value of this Redis key, and if it is True it should stop processing for that schedule.
The same Redis key logic goes for the other cron job as well.
There is not much impact in terms of TAT since the next cron will run in 30 minutes. Moreover, I can set up an ElastAlert alert saying the cron job had to be terminated, so anyone can monitor and trigger the job manually once the other job has completed.
I wanted to check whether this is a viable approach.

Credit usage within a task executing a copy statement

We have some tasks that execute a COPY statement against an external S3 bucket/prefix. The bucket/prefix has millions of files. Even when there are no additional files to load, the task still takes 7 minutes, with LIST_EXTERNAL_FILES_TIME in query history showing that this is where the time is spent.
Ignoring the design for a moment :) does the LIST_EXTERNAL_FILES_TIME consume credits? Will it consume credits even when utilizing a SERVERLESS warehouse?
Thanks
I would have assumed it would... but:
show warehouses;

name       | state     | type
-----------+-----------+---------
COMPUTE_WH | SUSPENDED | STANDARD

list @citibike_trips;
(4.2K files listed)

show warehouses;

name       | state     | type
-----------+-----------+---------
COMPUTE_WH | SUSPENDED | STANDARD
So it seems the listing happens on cloud services compute (CLOUD_COMPUTE), which you get billed for only for the portion of a day's cloud services usage that exceeds 10% of your normal compute usage.
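If you want to see how much of that cloud services time is actually billed, a sketch of one way to check (assuming your role can read the SNOWFLAKE.ACCOUNT_USAGE share) is the daily metering view, which breaks out the 10% adjustment explicitly:
select usage_date,
       credits_used_compute,
       credits_used_cloud_services,
       credits_adjustment_cloud_services, -- portion written off under the 10% allowance
       credits_billed
from snowflake.account_usage.metering_daily_history
where usage_date >= dateadd('day', -7, current_date())
order by usage_date;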

Snowflake - Task not running

I have created a simple task with the script below, and for some reason it never ran.
CREATE OR REPLACE TASK dbo.tab_update
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON * * * * * UTC'
AS CALL dbo.my_procedure();
I am using a Snowflake trial Enterprise edition.
Did you RESUME? From the docs -- "After creating a task, you must execute ALTER TASK … RESUME before the task will run"
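A minimal sketch of the two steps discussed below, applied to this particular task (SYSADMIN is an assumed role name; the grant is typically run by an ACCOUNTADMIN):
-- 1. make sure the role that owns/runs the task holds the EXECUTE TASK privilege
GRANT EXECUTE TASK ON ACCOUNT TO ROLE SYSADMIN;
-- 2. tasks are created suspended, so start the schedule explicitly
ALTER TASK dbo.tab_update RESUME;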
A bit of clarification:
Both the steps, while possibly annoying, are needed.
Tasks can consume warehouse time (credits) repeatedly (e.g. up to every minute), so we wanted to make sure that the execute privilege was granted explicitly to a role.
Tasks can have dependencies, and task trees (eventually DAGs) shouldn't start executing as soon as one or more tasks are created. Resume provides an explicit sync point at which a data engineer can tell us that the task tree is ready for validation and execution can start at the next interval.
Dinesh Kulkarni (PM, Snowflake)

Auto-updating Access database (can't be linked)

I've got a CSV file that refreshes every 60 seconds with live data from the internet. I want to automatically update my Access database (on a 60-second or so interval) with the new rows that get downloaded; however, I can't simply link the DB to the CSV.
The CSV always contains exactly 365 days of data, so when another day ticks over, a day of data drops off. If I were to link to the CSV, my DB would only ever have those 365 days of data, whereas I want to append the new data to the existing database.
Any help with this would be appreciated.
Thanks.
As per the comments, the first step is to link your CSV to the database, not as your main table but as a secondary table that will be used to update your main table.
Once you do that, you have two problems to solve:
Identify the new records
I assume there is a way to do so by timestamp or ID, so all you have to do is hold on to the last ID or timestamp imported (that will require an additional mini-table to hold the value persistently); see the append-query sketch after this list.
Make it happen every 60 seconds. To get that update on a regular interval you have two options:
A form's 'OnTimer' event is the easy way but requires very specific conditions: you have to make sure the form that triggers the event is only open once. This is possible even in a multi-user environment with some smart tracking.
If having an Access form open to do the updating is not workable, then you have to work with Windows scheduled tasks. You can set up an Access macro to run as a Windows scheduled task.
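A minimal sketch of the append step in Access SQL, assuming the linked CSV table is tblCsvLinked, the main table is tblMain with matching columns, both share a numeric ID column, and tblLastImport is a hypothetical one-row table holding the last imported ID (each statement runs as its own saved query or VBA Execute call, since Access executes one statement at a time):
INSERT INTO tblMain
SELECT c.*
FROM tblCsvLinked AS c
WHERE c.ID > DMax("LastID", "tblLastImport");

UPDATE tblLastImport
SET LastID = DMax("ID", "tblCsvLinked");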

QBO3's Batch Apply does not apply to all matching records

In our QBO3 system, we have intermittent problems generating documents and saving them to our FTP site. When this occurs, the Workflow > Dashboard By Errors report shows us the errored steps.
Once our FTP server is no longer under load, I use ImportFile/BatchApply to re-execute the relevant workflow steps. Specifically, my query is:
DecisionStep/Search?DecisionStepTemplateID=X&ErrorDate!=&SqlFilter=Active&DisplaySize=0&Batch=1000
with an action of:
DecisionStep/Start?ID={DecisionStepID}
Observations:
when I click Query, I see the appropriate results
when I click Preview, I see the appropriate results
when I click Batch, an Import File is created, all Import File Queue records are processed successfully
the Workflow > Dashboard By Errors shows some errors remaining
Why are there errors remaining after using Batch Apply?
In your query, you specify BatchSize=1000; this limits the results of the query to the top 1000 rows. Set BatchSize to a value that is larger than your expected result set. In this example, you should be able to determine your expected result set from Workflow > Dashboard By Errors.
TL;DR: Use BatchSize=25 for Query and Preview, then change it to 10000 for Batch.
The UI for ImportFile/BatchApply offers 4 buttons:
Query: executes your query so you can see the results in the browser,
Preview: executes your query, and displays the matching Actions, so you can see the results in the browser,
Apply: queues execution of the query + action for later execution as one long transaction, and
Batch: queues execution of the query + action for later execution as an Import File.
The first two of these buttons can turn into long-running requests that time out at the load balancer. The Apply and Batch buttons queue an operation instead, so it will never be a long-running transaction subject to a timeout.
When using the Query and Preview buttons, consider making your DisplaySize and BatchSize reasonably small (like 25 or 50), so you're less likely to be subjected to timeouts. When you're ready to Apply or Batch, you can change your DisplaySize / BatchSize to a large value (like 10000) so all matching records have the Action applied to them.
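Concretely, re-using the query from the question with the batch size raised to cover the whole error backlog (10000 here is just an assumed upper bound, per the guidance above):
DecisionStep/Search?DecisionStepTemplateID=X&ErrorDate!=&SqlFilter=Active&DisplaySize=0&Batch=10000
with the same action:
DecisionStep/Start?ID={DecisionStepID}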
