Snowflake UDF execution time limit - snowflake-cloud-data-platform

I've got an error when calling a UDF once the amount of data increases:
Database Error in model test_model (models/prep/test_model.sql)
100132 (P0000): JavaScript execution error: UDF execution time limit exceeded by function IMAGE_URL
compiled SQL at target/run/models/test/test_model.sql
As mentioned in the documentation, there is an execution time limit for JavaScript UDFs. How long is that time limit, and is it configurable?

So let's write a function and use it to test this out.
CREATE OR REPLACE FUNCTION long_time(D DOUBLE)
RETURNS variant
LANGUAGE JAVASCRIPT
AS $$
    function sleepFor(sleepDuration){
        var now = new Date().getTime();
        while(new Date().getTime() < now + sleepDuration){
            /* Do nothing */
        }
    }
    sleepFor(D*1000);
    return D;
$$;
select long_time(1); -- takes 1.09s
select long_time(10); -- takes 10.32s
select long_time(60); -- Explodes with
JavaScript execution error: UDF execution time limit exceeded by function LONG_TIME
BUT it ran for 31.33s, so it seems you have 30 seconds to complete, which feels like a rather large amount of time per call to me.

The default timeout limit for JS UDFs is 30 seconds per row set! Snowflake sends rows to the JavaScript engine in sets, and it tries to process all rows in a set within that time. The size of a row set may vary, but you may assume it will be around 1K rows (this is just an estimate; the number of rows in a set could be much higher or lower).
The timeout limit is different for Java and Python UDFs: it's 300 seconds for them.
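To illustrate the difference, here is a minimal sketch of the same sleep test rewritten as a Python UDF. The name long_time_py, the FLOAT types and the 3.8 runtime are just illustrative choices, not anything from the question; and if the 300-second budget mentioned above applies, a 60-second sleep per call should fit well within it:
CREATE OR REPLACE FUNCTION long_time_py(D FLOAT)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
HANDLER = 'sleep_for'
AS $$
import time

def sleep_for(d):
    # sleep for d seconds, then return the input, mirroring the JS version above
    time.sleep(d)
    return d
$$;
select long_time_py(60); -- expected to complete if the 300s Python UDF budget applies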
As @Felipe said, you may contact Snowflake Support and share your query ID to get help with this error. Support may guide you on how to mitigate the issue.

Related

How to fix System.LimitException: Apex CPU time limit exceeded caused by workflows with email alert?

I'm trying to execute a batch test method on 100 records and I get a CPU Runtime Limit error.
I placed the Limits.getCpuTime() method in my code and noticed that my code without the workflow segment takes 3148 ms to complete. However, when I activate two workflows that each send an email to one user, I get the CPU runtime limit error. In total, my process without those two workflows takes around 10 seconds to complete, while with them activated it takes around 20 seconds.
@IsTest
static void returnIncClientAddress(){
    // Select required records
    User incidentClient = [SELECT Id FROM User WHERE Username = 'bbaggins@shire.qa.com' LIMIT 1];
    BMCServiceDesk__Category__c category = [SELECT Id FROM BMCServiceDesk__Category__c WHERE Name = 'TestCategory'];
    BMCServiceDesk__BMC_BaseElement__c service = [SELECT Id FROM BMCServiceDesk__BMC_BaseElement__c WHERE Name = 'TestService'];
    BMCServiceDesk__BMC_BaseElement__c serviceOffering = [SELECT Id FROM BMCServiceDesk__BMC_BaseElement__c WHERE Name = 'TestServiceOffering'];
    // Create incidents
    List<BMCServiceDesk__Incident__c> incidents = new List<BMCServiceDesk__Incident__c>();
    for(Integer i = 0; i < 100; i++){
        BMCServiceDesk__Incident__c incident = new BMCServiceDesk__Incident__c(
            BMCServiceDesk__FKClient__c = incidentClient.Id,
            BMCServiceDesk__FKCategory__c = category.Id,
            BMCServiceDesk__FKServiceOffering__c = serviceOffering.Id,
            BMCServiceDesk__FKBusinessService__c = service.Id,
            BMCServiceDesk__FKStatus__c = awaiting_for_handling
        );
        incidents.add(incident);
    }
    Test.startTest();
    insert incidents;
    Test.stopTest();
}
I expected the email workflows and alerts to be processed in batch and sent without being so expensive in CPU time, but it seems that Salesforce takes a lot of time both checking the workflow rules and executing them when needed. The majority of the process's time seems to be spent on sending the workflows' emails (which it doesn't actually do, because it's a test method).
There's not much you can do to control the execution time of Workflow Rules. You could try converting them into Apex and benchmarking to see whether that results in improvement in time consumed, but I suspect the real solution is that you're going to have to dial down your bulk test.
The CPU limit for a transaction is 10 seconds. If your unit test code is already taking approximately 10 seconds to complete without Workflows (I'm not sure exactly what bounds your 3148 ms and 10 s refer to), you've really got only two choices:
Make the sum total of automation running on insert of this object faster;
Reduce the quantity of data you're processing in this unit test.
It's not clear what you're actually testing here, but if it's an Apex trigger, you should make sure that it's properly bulkified and does not consume unnecessary CPU time, including through trigger recursion. Reviewing the call stack in your logs (or simply adding System.debug() statements) may help with that.
Lastly, make sure you write assertions in your test method. Test methods without assertions are close to worthless.
Are there triggers on BMCServiceDesk__Incident__c or on objects modified by the workflows? Triggers on updates could possibly cause the code to execute multiple times in the same execution context, causing you to hit the CPU limit. Consider preventing re-entry into triggers, or performing a check so that triggers only run when specific criteria are met.
Otherwise, consider refactoring the code, where possible, to have work executed within the same loop, as loops (especially nested loops) drive up your CPU usage. Usually workflows on their own don't drive up CPU consumption unless triggers are executed due to workflow updates.

CucumberJS timeout error. Can I set "setDefaultTimeout" in my step?

I am testing a set of rules defined in a database using CucumberJS and Protractor. I am doing a database call to fetch the set of rules (~1000). When I run the scenario 1000 times, it takes a whole lot of time and times out after only 2/3 loops.
Is there a way to set "setDefaultTimeout" in my code for every loop, so that the timeout keeps increasing by that much time? Is there a better way to implement this, such as using Examples in CucumberJS and feeding the Examples table or data table with SQL-queried data?
Thanks for your help.
You can set a step-specific timeout in the step implementation that will override any default timeout.
https://github.com/cucumber/cucumber-js#timeouts
this.When(/^I do something that takes ages$/, {timeout: 30 * 1000}, function (next) {
    // Make db call and process results
});

Which NDB query function is more efficient to iterate through a big set of query results?

I use NDB for my app and use iter() with a limit and starting cursor to iterate through 20,000 query results in a task. A lot of the time I run into the timeout error.
Timeout: The datastore operation timed out, or the data was temporarily unavailable.
The way I make the call is like this:
results = query.iter(limit=20000, start_cursor=cursor, produce_cursors=True)
for item in results:
    process(item)
save_cursor_for_next_time(results.cursor_after().urlsafe())
I can reduce the limit, but I thought a task can run for as long as 10 minutes. 10 minutes should be more than enough time to go through 20,000 results. In fact, on a good run the task can complete in just about a minute.
If I switched to fetch() or fetch_page(), would they be more efficient and less likely to run into the timeout error? I suspect there's a lot of overhead in iter() that causes the timeout error.
Thanks.
Fetch is not really any more efficient; they all use the same mechanism. Unless you know how many entities you want upfront - then fetch can be more efficient, as you end up with just one round trip.
You can increase the batch size for iter, that can improve things. See https://developers.google.com/appengine/docs/python/ndb/queryclass#kwdargs_options
From the docs the default batch size is 20, which would mean for 20,000 entities a lot of batches.
Other things that can help: consider using map and/or map_async for the processing, rather than explicitly calling process(entity). Have a read of https://developers.google.com/appengine/docs/python/ndb/queries#map - introducing async into your processing can also mean improved concurrency.
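As a rough sketch (the batch_size of 500 is just a number to tune, not a recommendation from the docs), that could look like:
# Larger batches mean fewer datastore round trips for the same 20,000 entities.
results = query.iter(limit=20000,
                     start_cursor=cursor,
                     produce_cursors=True,
                     batch_size=500)
for item in results:
    process(item)
save_cursor_for_next_time(results.cursor_after().urlsafe())

# Or let NDB drive the iteration itself via map(); the same query options are
# accepted here, and a tasklet-style callback can overlap RPCs with processing.
query.map(process, limit=20000, batch_size=500)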
Having said all of that, you should profile so you can understand where the time is used. For instance, the delays could be in your processing, due to things you are doing there.
There are other things to consider with NDB, like context caching - you need to disable it (see the sketch after the mapper example below). But I also used the iter method for these. I also made an NDB version of the mapper API based on the old db mapper.
Here is my NDB mapper API, which should solve timeout problems and NDB caching, and makes it easy to create this kind of thing:
http://blog.altlimit.com/2013/05/simple-mapper-class-for-ndb-on-app.html
With this mapper API you can create something like the following, or you can just improve on it:
class NameYourJob(Mapper):

    def init(self):
        self.KIND = YourItemModel
        self.FILTERS = [YourItemModel.send_email == True]

    def map(self, item):
        # here is your process(item)
        # process here
        item.send_email = False
        self.update(item)

# Then run it like this
from google.appengine.ext import deferred
deferred.defer(NameYourJob().run, 50,  # <-- this is your batch
               _target='backend_name_if_you_want', _name='a_name_to_avoid_dups')
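On the context-caching point above, a minimal sketch of what disabling it can look like (whether you turn off just the in-context cache, memcache, or both depends on your app; the batch_size value is only a placeholder):
from google.appengine.ext import ndb

# Turn off NDB's in-context cache (and optionally its memcache layer) so that
# iterating over thousands of entities does not keep them all cached.
ctx = ndb.get_context()
ctx.set_cache_policy(False)
ctx.set_memcache_policy(False)

# The same thing can be done per query via context options:
for item in query.iter(batch_size=500, use_cache=False, use_memcache=False):
    process(item)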
For potentially long query iterations, we use a time check to ensure slow processing can be handled. Given the disparities in GAE infrastructure performance, you will likely never find an optimal processing number. The code excerpt below is from an on-line maintenance handler we use which generally runs within ten seconds. If not, we get a return code saying it needs to be run again thanks to our timer check. In your case, you would likely break the process after passing the cursor to your next queue task. Here is some sample code which is edited down to hopefully give you a good idea of our logic. One other note: you may choose to break this up into smaller bites and then fan out the smaller tasks by re-enqueueing the task until it completes. Doing 20k things at once seems very aggressive in GAE's highly variable environment. HTH -stevep
import datetime

def over_dt_limit(start, milliseconds):
    dt = datetime.datetime.now() - start
    mt = float(dt.seconds * 1000) + (float(dt.microseconds) / float(1000))
    if mt > float(milliseconds):
        return True
    return False

# set a start time
start = datetime.datetime.now()

# handle a timeout issue inside your query iteration
for item in query.iter():
    # do your loop logic
    if over_dt_limit(start, 9000):
        # your specific time-out logic here
        break
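To tie that back to the question, a rough sketch of the "break after passing the cursor to your next queue task" part might look like this (process_chunk is a made-up name, the 9000 ms budget and batch_size are placeholders, and query, process() and over_dt_limit() are assumed to be the ones defined above):
import datetime

from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import deferred

def process_chunk(cursor_urlsafe=None):
    # query, process() and over_dt_limit() as defined in the question/answer above
    start = datetime.datetime.now()
    cursor = Cursor(urlsafe=cursor_urlsafe) if cursor_urlsafe else None
    results = query.iter(start_cursor=cursor, produce_cursors=True, batch_size=500)
    for item in results:
        process(item)
        if over_dt_limit(start, 9000):
            # Out of time: re-enqueue the remaining work with the current cursor.
            deferred.defer(process_chunk, results.cursor_after().urlsafe())
            break

deferred.defer(process_chunk)  # kick off the first chunk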

CPU Usage Date Predicate SQL Server

I'm retrieving some data using a date field predicate (non-clustered index) and I'm trying two alternatives to achieve my goal.
The field is a datetime, and in one of the statements I'm using literally the value stored in the field:
WHERE dataload ='2012-07-16 10:13:01.307'
In the other statement I'm using this predicate:
WHERE dataload >= DATEADD(DAY,0,datediff(day,0,@Dataload))
and dataload <= dateadd(ms,-3,DATEADD(DAY,1,datediff(day,0,@Dataload)))
Looking at the STATISTICS IO/TIME output, the reads are the same, but method #1 used 16 ms of CPU and method #2 used 0 ms, which makes method #1 account for 61% of the total cost versus 39% for method #2.
I don't really understand why CPU is being used in method #1 when method #2 has so many functions in it and still shows 0 ms.
Is there any basic explanation for it?
The first method retrieves the field from disk, taking 16 msec. The second method just fetches the data from memory. If you reversed the order, the other method would take longer.

QODBC and SQL Server query performance

My application makes about 5 queries per second to a SQL Server database. Each query returns about 1500 rows on average. The application is written in C++/Qt, and database operations are implemented using the QODBC driver. I determined that query processing takes about 25 ms, but fetching the result takes 800 ms. Here is what the code querying the database looks like:
QSqlQuery query(db);
query.prepare(queryStr);
query.setForwardOnly(true);
if (query.exec())
{
    while (query.next())
    {
        int v = query.value(0).toInt();
        .....
    }
}
How can I optimize result fetching?
This does not directly answer your question, as I haven't used Qt in years. In the raw ODBC API you can often speed up the retrieval of rows by setting SQL_ATTR_ROW_ARRAY_SIZE to N; then each call to SQLFetch returns N rows at once. I took a look at QSqlQuery in Qt and could not see a way to do this, but it may be something you could look into with Qt, or you could simply write to the ODBC API directly. You can find an example at Preparing to Return Multiple Rows.
